
Experts Blog: Philip Russom

Philip Russom is director of TDWI Research for data management and oversees many of TDWI’s research-oriented publications, services, and events. He is a well-known figure in data warehousing and business intelligence, having published over 500 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at prussom@tdwi.org, @prussom on Twitter, and on LinkedIn at linkedin.com/in/philiprussom.




Q&A RE: Data Warehouse Architecture Issues

Attendees of a recent TDWI Webinar asked excellent questions.
By Philip Russom, TDWI Research Director for Data Management

On Tuesday, April 15, 2014, I broadcast a TDWI Webinar in which I presented some of the findings from my new TDWI report, Evolving Data Warehouse Architectures in the Age of Big Data. You can download a free copy of the report in a PDF file. And you can replay the Webinar.


Posted on April 30, 2014


Evolving Data Warehouse Architectures: An Overview in 35 Tweets

By Philip Russom
Research Director for Data Management, TDWI

To help you better understand the ongoing evolution of data warehouse architectures and why you should care, I’d like to share with you the series of 35 tweets I recently issued on the topic. I think you’ll find the tweets interesting because they provide an overview of big data management and its best practices in a form that’s compact, yet amazingly comprehensive.

Every tweet I wrote was a short sound bite or stat bite drawn from my recent TDWI report Evolving Data Warehouse Architectures in the Age of Big Data. Many of the tweets focus on a statistic cited in the report, while other tweets are definitions stated in the report.


Posted on April 15, 2014


Q&A RE: The State of Big Data Integration

It’s still early days, but users are starting to integrate big data with enterprise data, largely for business value via analytics.

By Philip Russom, TDWI Research Director for Data Management

A journalist from the IT press recently sent me an e-mail containing several very good questions about the state of big data relative to integrating it with other enterprise data. Please allow me to share the journalist’s questions and my answers:

How far along are enterprises in their big data integration efforts?

According to my survey data, approximately 38% of organizations don’t even have big data, in any definition, so they’ve no need to do anything. See Figure 1 in my 2013 TDWI report Managing Big Data. Likewise, 23% have no plans for managing big data with a dedicated solution. See Figure 5 in that same report.

Even so, some organizations have big data, and they are already managing it actively. Eleven percent have a solution in production today, with another 61% coming in the next three years. See Figure 6.

Does data integration now tend to be haphazard, or one-off projects, in many enterprises, or are architectural strategies emerging?

I see all of the above, whether with big data or the usual enterprise data. Many organizations have consolidated most of their data integration efforts into a centralized competency center, along with a centrally controlled DI architecture, whereas a slight majority tend to staff and fund DI on a per-application or per-department basis, without an enterprise strategy or architecture. Personally, I’d like to see more of the former and less of the latter.

What are the best approaches for big data integration architecture?

Depends on many things, including what kind of big data you have (relational, other structures, human language text, XML docs, etc.) and what you’ll do with it (analytics, reporting, archiving, content management). Multiple big data types demand multiple data platforms for storing big data, whereas multiple applications consuming big data require multiple processing types to prepare big data for those applications. For these reasons, in most cases, managing big data and getting business use from it involves multiple data management platforms (from relational DBMSs to Hadoop to NoSQL databases to clouds) and multiple integration tools (from ETL to replication to federation and virtualization).

Furthermore, capturing and integrating big data can be challenging from a data integration viewpoint. For example, the streaming big data that comes from sensors, devices, vehicles, and other machines requires special event-processing technologies to capture, triage, and route time-sensitive data—all in a matter of milliseconds. As with all data, you must transform big data as you move it from a source to a target, and the transformations may be simple (moving a click record from a Web log to a sessionization database) or complex (deducing a fact from human language text and generating a relational record from it).
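To make the "simple" end of that range concrete, here is a minimal sketch of sessionizing click records in Python. It is an illustration only, not a method from the report: the record layout (user, timestamp, URL), the 30-minute inactivity timeout, and the function name sessionize are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed inactivity timeout; a gap longer than this starts a new session.
SESSION_GAP = timedelta(minutes=30)

def sessionize(clicks):
    """Assign a per-user session number to each (user, timestamp, url) record."""
    last_seen = {}   # user -> timestamp of that user's previous click
    session_no = {}  # user -> current session counter
    rows = []
    # Process clicks per user in time order.
    for user, ts, url in sorted(clicks, key=lambda c: (c[0], c[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev > SESSION_GAP:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        rows.append((user, session_no[user], ts, url))
    return rows

# Hypothetical click records, as they might be parsed from a Web log.
clicks = [
    ("u1", datetime(2014, 1, 22, 9, 0), "/home"),
    ("u1", datetime(2014, 1, 22, 9, 10), "/reports"),
    ("u1", datetime(2014, 1, 22, 11, 0), "/home"),
    ("u2", datetime(2014, 1, 22, 9, 5), "/home"),
]
rows = sessionize(clicks)
```

Each output row carries a session number that a load step could write to a sessionization table. The complex transformations mentioned above, such as deducing facts from human language text, need far more than this.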

What "traditional" approaches are being updated with new capabilities and connectors?

The most common data platforms being used for capturing, storing, and managing big data today are relational databases, whether based on MPP, SMP, appliance, or columnar architectures. See Figure 16 in the Managing Big Data report. This makes sense, given that in a quarter of organizations big data is mostly or exclusively structured data. Even in organizations that have diverse big data types, structured and relational types are still the most common. See Figure 1.

IMHO, we’re fortunate that vendors’ relational database management systems (RDBMSs) (from the old brands to the new columnar and appliance-based ones) have evolved to scale up to tens and hundreds of terabytes of relational and otherwise structured data. Data integration tools have likewise evolved. Hence, scalability is NOT a primary barrier to managing big data.

If we consider how promising Hadoop technologies are for managing big data, it’s no surprise that vendors have already built interfaces, semantic layers, and tool functionality for accessing a broad range of big data managed in the Hadoop Distributed File System (HDFS). This includes tools for data integration, reporting, analysis, and visualization, plus some RDBMSs.

What are the enterprise "deliverables" coming from users’ efforts with big data (e.g., analytics, business intelligence)?

Analytics is the top priority and hence a common deliverable from big data initiatives. Some reports also benefit from big data. A few organizations are rethinking their archiving and content management infrastructures, based on big data and the potential use of Hadoop in these areas.

How is the role of data warehousing evolving to meet the emergence of Big Data?

Big data is a huge business opportunity, with few technical challenges or downsides. See figures 2 through 4 in the report Managing Big Data. Conventional wisdom says that the opportunity for business value is best seized via analytics. So the collection, integration, and management of big data are not an academic exercise in a vacuum. They are foundational to enabling the analytics that give an organization new and broader insights. Any calculus for the business return on managing big data should be based largely on the benefits of new analytics applied to big data.

On April 1, 2014, TDWI will publish my next big report on Evolving Data Warehouse Architectures in the Age of Big Data. At that time, anyone will be able to download the report for free from www.tdwi.org.

How are the new platforms (such as Hadoop) getting along with traditional platforms such as data warehouses?

We say “data warehouse” as if it’s a single monolith. That’s convenient, but not very accurate. From the beginning, data warehouses have been environments of multiple platforms. It’s common that the core warehouse, data marts, operational data stores, and data staging areas are each on their own standalone platforms. The number of platforms increased early this century, as data warehouse appliances and columnar RDBMSs arrived. It’s now increasing again, as data warehouse environments now fold in new data platforms in the form of the Hadoop Distributed File System (HDFS) and NoSQL databases. The warehouse has always evolved to address new technology requirements and business opportunities; it’s now evolving again to assure that big data is managed appropriately for the new high-value analytic applications that many businesses need.

For an exhaustive discussion of this, see my 2013 TDWI report Integrating Hadoop into Business Intelligence and Data Warehousing.

Posted on January 22, 2014


Managing Big Data: An Overview in 30 Tweets

By Philip Russom
Research Director for Data Management, TDWI

To help you better understand new practices for managing big data and why you should care, I’d like to share with you the series of 30 tweets I recently issued on the topic. I think you’ll find the tweets interesting, because they provide an overview of big data management and its best practices in a form that’s compact, yet amazingly comprehensive.

Every tweet I wrote was a short sound bite or stat bite drawn from my recent TDWI report “Managing Big Data.” Many of the tweets focus on a statistic cited in the report, while other tweets are definitions stated in the report.

I left in the arcane acronyms, abbreviations, and incomplete sentences typical of tweets, because I think that all of you already know them or can figure them out. Even so, I deleted a few tiny URLs, hashtags, and repetitive phrases. I issued the tweets in groups, on related topics; so I’ve added some headings to this blog to show that organization. Otherwise, these are raw tweets.

Types of Multi-Structured Data Managed as Big Data
1. #TDWI SURVEY SEZ: 26% of users manage #BigData that’s ONLY structured, usually relational.
2. #TDWI SURVEY SEZ: 31% manage #BigData that’s eclectic mix of struc, unstruc, semi, etc.
3. #TDWI SURVEY SEZ: 38% don’t have #BigData by any definition. Hear more in #TDWI Webinar Oct.8 noonET http://bit.ly/BDMweb
4. Structured (relational) data from traditional apps is most common form of #BigData.
5. #BigData can be industry specific, like unstruc’d text in insurance, healthcare & gov.
6. Machine data is special area of #BigData, with as yet untapped biz value & opportunity.

Reasons for Managing Big Data Well
7. Why manage #BigData? Keep pace w/growth, biz ROI, extend ent data arch, new apps.
8. Want to get biz value from #BigData? Manage #BigData for purposes of advanced #analytics.
9. #BigDataMgt yields larger samples for apps that need it: 360° views, risk, fraud, customer seg.
10. #TDWI SURVEY SEZ: 89% feel #BigDataMgt is opportunity. Mere 11% think it’s a problem.
11. Key benefits of #BigDataMgt are better #analytics, datasets, biz value, sales/marketing.
12. Barriers to #BigDataMgt: low maturity, weak biz support, new design paradigms.
13. #BigDataMgt non-issues: bulk load, query speed, scalability, network bandwidth.

Strategies for Users’ Big Data Management Solutions
14. #TDWI SURVEY SEZ: 10% have #BigDataMgt solution in production; 10% in dev; 20% prototype; 60% nada. #TDWI Webinar Oct.8 http://bit.ly/BDMweb
15. #TDWI SURVEY SEZ: Most common strategy for #BigDataMgt: extend existing DataMgt systems.
16. #TDWI SURVEY SEZ: 2nd most common strategy for #BigDataMgt: deploy new DataMgt systems for #BigData.
17. #TDWI SURVEY SEZ: 30% have no strategy for #BigDataMgt though they need one.
18. #TDWI SURVEY SEZ: 15% have no strategy for #BigDataMgt cuz they don’t need one.

Ownership and Use of Big Data Management Solutions
19. Some depts. & groups have own #BigDataMgt platforms, including #Hadoop. Beware teramart silos!
20. Trend: #BigDataMgt platforms supplied by IT as infrastructure. Imagine shared #Hadoop cluster.
21. Who does #BigDataMgt? analysts 22%; architects 21%; mgrs 21%; tech admin 13%; app dev 11%.

Tech Specs for Big Data Management Solutions
22. #TDWI SURVEY SEZ: 97% of orgs manage structured #BigData, followed by legacy, semi-struc, Web data etc.
23. Most #BigData stored on trad drives, but solid state drives & in-memory functions are gaining.
24. #TDWI SURVEY SEZ: 10-to-99 terabytes is the norm for #BigData today.
25. #TDWI SURVEY SEZ: 10% have broken the 1 petabyte #BigData barrier. Another 13% will within 3 years.

A Few Best Practices for Managing Big Data
26. For open-ended discovery-oriented #analytics, manage #BigData in original form wo/transformation.
27. Reporting and #analytics are different practices; managing #BigData for each is, too.
28. #BigData needs data standards, but different ones compared to other enterprise data.
29. Streaming #BigData is easy to capture & manage offline, but tough to process in #RealTime.
30. Non-SQL, non-relational platforms are coming on strong; BI/DW needs them for diverse #BigData.

Want to learn more about managing big data?

For a much more detailed discussion—in a traditional publication!—get the TDWI Best Practices Report, titled Managing Big Data, available in a PDF file via a free download.

You can also register for and replay my TDWI Webinar, where I present the findings of Managing Big Data.

Please consider taking courses at the TDWI World Conference in Boston, October 20–25, 2013. Enroll online.

Posted on October 11, 2013


Analytics and Reporting Are Two Different Practices

Treat them differently, if you want to get the most out of each.

By Philip Russom, TDWI Research Director for Data Management

I regularly get somewhat off-base questions from users who are in the thick of implementing or growing their analytic programs, and therefore get a bit carried away. Here’s a question I’ve heard a lot recently: “Our analytic applications generate so many insights that I should decommission my enterprise reporting platform, right?” And here’s a related question: “Should we implement Hadoop to replace our data warehouse and/or reporting platform?”

Posted on September 26, 2013


Evolving Data Warehouse Architectures: Integrating HDFS with an RDBMS Alleviates the Limitations of Both

Hadoop has limitations. But the relational database management systems used for data warehousing do, too. Luckily, their strengths are complementary.

By Philip Russom, TDWI Research Director for Data Management

In a recent blog in this series, I discussed “The Roles of Hadoop” in evolving data warehouse architectures. (There’s a link to that blog at the end of this blog.) In response, a few people asked me (I’m paraphrasing): “Since the Hadoop Distributed File System (HDFS) is so useful, can it replace the relational database management system (RDBMS) that’s at the base of my current data warehouse and its architecture?”

Posted on September 2, 2013

