MDM, Quality & Governance: More Vibrant Than Ever

By Philip Russom, TDWI Research Director for Data Management

This week, we at TDWI produced our fifth annual Solution Summit on Master Data, Quality, and Governance, this time on Coronado Island off San Diego. I moderated the conference, along with David Loshin (president of Knowledge Integrity). We lined up a host of great user speakers and vendor panelists. The audience asked dozens of insightful questions, and the event included hundreds of one-to-on meetings among attendees, speakers, and vendor sponsors. The aggregated result was a massive knowledge transfer that illustrates how master data management (MDM), data quality (DQ), and data governance (DG) are more vibrant than ever.

To give you a sense of the range of data management topics addressed at the summit, here’s an overview of some of the presentations heard at the TDWI Solution Summit for Master Data, Quality, and Governance:

Luuk van den Berg (Data Governance Lead, Cisco Systems) talked about MDM and DG tools and processes that are unique to Cisco, especially their method of “watermarking reports,” which enables them to certify the quality, source, and content of data, as well as reports that contain certified data.

Joe Royer (Enterprise Architect, Principal Financial Group) discussed how Principal found successful starting points for their DG program. Joe also discussed strategies for sustaining DG, different DG models, and how DG can accelerate MDM and DQ programs.

Rich Murnane (Director of Enterprise Data Operations, iJET International) described how iJET collects billions of data points annually to provide operational risk management solutions to their clients. To that end, iJET applies numerous best practices in MDM, DQ, DG, stewardship, and data integration.

Hope Johansen (Master Data Project Manager, Schlumberger). Because of mergers, Schlumberger ended up with many far-flung facilities, plus related physical assets and locations. Hope explained how her team applied MDM techniques to data about facilities (not the usual customer and product data domains!) to uniquely identify facilities and similar assets, so they could be optimized by the business.

Bruce Harris (Vice President, Chief Risk and Strategy Officer, Volkswagen Credit). VW Credit has the long-term goal of becoming a “Strategy Focused Organization”. Bruce explained how DG supports that business goal. As an experienced management executive who has long depended on data for operational excellence, Bruce recounted many actionable tips for aligning data management work with stated business goals.

Kenny Sargent (Technical Lead, Enterprise Data Products, Compassion International) described his method for “Validation-Driven Development.” In this iterative agile method, a developer gets an early prototype in front of business users ASAP to get feedback and to make course corrections as early as possible. The focus is on data, to validate early on that the right data is being used correctly and that the data meets business requirements.

Besides the excellent user speakers described above, the summit also had break out sessions with speeches by representatives from vendor firms that sponsored the summit. Sponsoring firms included: IBM, SAS, BackOffice Associates, Collibra, Compact Solutions, DataMentors, Information Builders, and WhereScape.

To learn more about the event, visit the Web site for the 2013 TDWI Solution Summit on Master Data, Quality and Governance.

Posted by Philip Russom, Ph.D. on June 5, 20130 comments

Five Trends in Predictive Analytics

Predictive analytics, a technology that has been around for decades has gotten a lot of attention over the past few years, and for good reason.  Companies understand that looking in the rear-view mirror is not enough to remain competitive in the current economy.  Today, adoption of predictive analytics is increasing for a number of reasons including a better understanding of the value of the technology, the availability of compute power, and the expanding toolset to make it happen. In fact, in a recent TDWI survey at our Chicago World Conference earlier this month, more than 50% of the respondents said that they planned to use predictive analytics in their organization over the next three years. The techniques for predictive analytics are being used on both traditional data sets as well as on big data.

Here are five trends that I’m seeing in predictive analytics:

  • Ease of use.  Whereas in the past, statisticians used some sort of scripting language to build a predictive model, vendors are now making their software easier to use.  This includes hiding the complexity of the model building process and the data preparation process via the user interface.  This is not a new trend but it is worth mentioning because it opens up predictive analytics to a wider audience such as marketing. For example, vendors such as Pitney Bowes, Pegasystems, and KXEN provide solutions targeted to marketing professionals with ease of use as a primary feature.  The caveat here, of course, is that marketers still need the skills and judgment to make sure the software is used properly.
  • Text hits the mainstream in predictive analytics. The kind of data being used as part of the predictive analytics process continues to grow in scope.  For example, some companies are routinely using text data to improve the lift of their predictive models because it helps provide the “why” behind the “what”.  Predictive analytics providers such as IBM and SAS provide text analytics as part of their solution.  Others, such as Angoss and Pegasystems have partnered with text analytics vendors (such as Lexalytics and Attensity) to integrate this functionality in their products. 
  • Geospatial data use is on the rise.  Geospatial data is also becoming more popular for use in and with predictive analytics.  For instance, geospatial predictive analytics is being used to predict crime and terrorism.  On the business front, location based data is starting to be used in conjunction with predictive modeling to target specific offers to customers based on where they are (i.e. traveling from work, at home) and their behavior. 
  • Operationalizing the analytics for action. Operationalizing means making something part of a business process.  For example, companies are using predictive analytics to predict maintenance failures, predict collections, predict churn, and the list goes on.  In these examples, predictive models are actually incorporated into the business process of an organization.  For example, if a customer takes a certain action that puts them at risk for churn, that customer’s information is routed to the appropriate department for action.  In fact, the term “action” and “insight to action” has come up quite a bit in recent conversations I’ve had with vendors.
  • Adaptive learning: I’ve heard this go by a number of names – adaptive intelligence, automated learning, and adaptive learning. The idea is about continuously learning.  For example, a model to understand behavior might be deployed against customer data. As the data changes, the model might change too.  This kind prediction could also work against streaming data.  Adaptive intelligence is still pretty early in the adoption cycle, but I expect it to increase. 

These are just a few of the trends that I’m seeing in predictive analytics.  As the technology continues to be adopted, new trends will certainly emerge.  I used predictive analytics back in the late 1980s when I was at AT&T to understand customer behavior and I’m very happy to see that it’s a technology whose time has finally come!  I’m now starting work on TDWI’s Best Practices Report on Predictive Analytics.  Expect more from me on this topic in the future.

Posted by Fern Halper, Ph.D. on May 22, 20130 comments

Big Data Management: What to Expect

By Philip Russom, TDWI Research Director

Think about everything you know about data management, including its constituent disciplines for integration, quality, master data, metadata, data modeling, event processing, data warehousing, governance, administration, capacity planning, hand coding, and so on. Now, write down everything you know on a series of index cards that are about the same size as playing cards. Next, do some reading and studying to determine the new things you need to learn about managing so-called “big data,” and write those on more index cards. Finally, shuffle the index cards and deal them into several large hands, as you would do with playing cards.

That’s pretty much what will happen to data management in the next several years, under the influence of big data and related phenomena like advanced analytics, real-time operation, and multi-structured data. You won’t stop doing the old tried-and-true practices, and the new stuff won’t replace the old best practices. You’ll play every card in the newly expanded deck, but in hands, suites, and straights that are new to you, as well as at unprecedented levels of scale, speed, complexity, and concurrency. And every hand dealt from the deck will require tweaking and tuning to make it perform at the new level.

The result is Big Data Management, the next generation of data management best practices and technologies, driven by new business and technical requirements for big data and related practices for analytics, real time, and diverse data types. Big Data Management is an amalgam of old and new techniques, best practices, teams, data types, and home-grown or vendor-built functionality. One assumption is that all these are being expanded and realigned so businesses can fully leverage big data, not merely manage it. Another assumption is that big data will eventually assimilate into enterprise data.

To help user organizations understand and embrace the next generation of data management, TDWI has commissioned a report titled: “Managing Big Data.” I will research and write this 36-page report. It will catalog new user practices and technical functions in Big Data Management, as well as explain how the older ones need to be adjusted to serve the new world of big data. This report will bring readers up-to-date, so they can make intelligent decisions about which tools, techniques, and team structures to apply to their next-generation Big Data Management solutions. TDWI will publish the report “Managing Big Data” on or about October 1, 2013.

Got #BigData? How do you manage it? Share your experiences and opinions by taking the TDWI survey for the upcoming report on “Managing Big Data,” If you complete the survey, TDWI will send you an email explaining how to download a free copy of the report in October.

Thanks! I really appreciate you taking the survey.

Posted by Philip Russom, Ph.D. on May 20, 20130 comments

Premises vs. Premise in the Cloud

With all of the research I’ve been doing around cloud computing over the past few years, I’ve noticed something very disturbing about how people use the word premises.  I’ve blogged about this before but it merits repeating on my TDWI blog.  Maybe it’s because I come from a telecommunications background that this bothers me so much – but has anyone else noticed that people are misusing the words premise/premises when describing aspects of the cloud?  The proper term is generally premises, people, as in – on your premises (see below).


Premise:  a proposition supporting or helping to support a conclusion, a statement considered to be true.

Premises:  a tract of land including its buildings.

Therefore, when discussing where servers, services, etc. are located, for instance, you should use the term premises.

Even vendors in the space make this mistake and I cringe every time I hear it.  I used to correct them, but I’ve given up doing that.  I could list hundreds, if not thousands, of examples of this error.  Has the definition of the word changed and I’m missing something?  Or, has the word been used incorrectly so many times that it doesn’t matter anymore?  My POV:  It still matters. 

Posted by Fern Halper, Ph.D. on April 17, 20130 comments

Integrating Hadoop into Business Intelligence and Data Warehousing: An Overview in 27 Tweets

Blog by Philip Russom
Research Director for Data Management, TDWI

To help you better understand how Hadoop can be integrated into business intelligence (BE) and data warehousing (DW) and why you should care, I’d like to share with you the series of 27 tweets I recently issued on the topic. I think you’ll find the tweets interesting, because they provide an overview of these issues and best practices in a form that’s compact, yet amazingly comprehensive.

Every tweet I wrote was a short sound bite or stat bite drawn from my recent TDWI report “Integrating Hadoop in Business Intelligence and Data Warehousing.” Many of the tweets focus on a statistic cited in the report, while other tweets are definitions stated in the report.

I left in the arcane acronyms, abbreviations, and incomplete sentences typical of tweets, because I think that all of you already know them or can figure them out. Even so, I deleted a few tiny URLs, hashtags, and repetitive phrases. I issued the tweets in groups, on related topics; so I’ve added some headings to this blog to show that organization. Otherwise, these are raw tweets.

Status of Users’ Efforts at Integrating Hadoop into BI/DW
1. #TDWI SURVEY SEZ: Shocking 26% don’t know what #Hadoop is. Ignorance of #Hadoop too common in BI & IT.
2. #TDWI SURVEY SEZ: Mere 18% have had experience w/#HDFS & #Hadoop. Only 2/3rds of 18% have deployed HDFS.
3. Use of #Hadoop Distributed File System (#HDFS) will go from scarce to ensconced in 3 yrs.
4. #TDWI SURVEY SEZ: Only 10% have deployed #HDFS today, yet another 63% expect to within 3 yrs.
5. #TDWI SURVEY SEZ: A mere 27% say their organization will never deploy #HDFS.

Hadoop Technologies Used Today in BI/DW
6. #TDWI SURVEY SEZ: #MapReduce (69%) & #HDFS (67%) are the most used #Hadoop techs today.
7. #TDWI SURVEY SEZ: #Hive (60%) & #HBase (54%) are the #Hadoop techs most commonly used w/#HDFS.
8. #TDWI SURVEY SEZ: #Hadoop technologies used least today are: Chukwa, Ambari, Oozie, Hue, Flume.
9. #TDWI SURVEY SEZ: #Hadoop techs etc poised for adoption: Mahout, R, Zookeeper, HCatalog, Oozie.

What Hadoop will and won’t do for BI/DW
10. #TDWI SURVEY: 88% say #Hadoop for BI/DW (#Hadoop4BIDW) is opportunity cuz enables new app types.
11. #TDWI SURVEY: Can #Hadoop Distributed File System (#HDFS) replace #EDW? Mere 4% said yes.
12. #TDWI SURVEY: Can #Hadoop Distributed File System (#HDFS) augment #EDW? Mere 3% said no.
13. #TDWI SURVEY: Can #Hadoop Distributed File System (#HDFS) expand your #Analytics? Mere 1% said no.

Hadoop Use Case with BI/DW
14. #TDWI SURVEY: 78% of respondents say #HDFS complements #EDW. That’s leading use case in survey.
15. #TDWI SURVEY: Other #HDFS use cases: archive (52%), data stage (41%), sandbox (41%), content mgt (35%).

Hadoop Benefits and Barriers
16. #TDWI SURVEY: Best #Hadoop4BIDW benefits: #BigData source, #analytics, data explore, info discover.
17. #TDWI SURVEY: Worst #Hadoop4BIDW barriers: lacking skill, biz case, sponsor, cost, lousy #Hadoop tools.

Best Practices among Users who’ve deployed Hadoop
18. #TDWI SURVEY: Why adopt #Hadoop4BIDW? Scale, augment DW, new #analytics, low cost, diverse data types.
19. #TDWI SURVEY: Job titles of #Hadoop4BIDW workers: data developer, architect, scientist, analyst.
20. Organizations surveyed with #Hadoop in production average 12 clusters; median is 2.
21. Orgs surveyed with #Hadoop in production average 45 nodes per cluster; median is 12.
22. Orgs surveyed with #Hadoop in production manage a few TBs today but expect ~.5PB within 3yrs.
23. Orgs surveyed with #Hadoop in production mostly load it via batch every 24 hrs.
24. #TDWI SURVEY: Worst #Hadoop functions: security, admin tools, namenode, data quality, loading, dev tools.

BI/DW Tools etc. Integrated Today & Tomorrow with Hadoop
25. #TDWI SURVEY: BI/DW tools commonly integrated with #Hadoop: #analytics, DWs, reporting, webservers, DI.
26. Other BI/DW tools integrated with #Hadoop: analytic DBMSs, #DataViz, OpApps, marts, DQ, MDM.
27. #TDWI SURVEY: Machinery (13%) & sensors (8%) are seldom integrated w/#Hadoop today, but coming.

Want to learn more about big data and its management?
Take courses at the TDWI World Conference in Chicago, May 5-10, 2013. Enroll online.

For a more detailed discussion – in a traditional publication! – get the TDWI Best Practices Report, titled “Integrating Hadoop into Business Intelligence and Data Warehousing,” which is available in a PDF file via a free download.

You can also register online for and replay my TDWI Webinar, where I present the findings of the TDWI report “Integrating Hadoop into BI/DW.”

Philip Russom is the research director for data management at The Data Warehousing Institute (TDWI). You can reach him at or follow him as @prussom on Twitter.

Posted by Philip Russom, Ph.D. on April 12, 20130 comments

Bringing Big Data Down to Earth

We are just weeks away from the TDWI World Conference in Chicago (May 5-10), where the theme will be “Big Data Tipping Point.” I have it on good authority that by then, the current coldness will have passed and Chicago will be basking in beautiful spring weather. (If not, as they say, wait five minutes.) The theme of the World Conference is “Big Data Tipping Point,” which means that TDWI will feature many educational sessions to help you get beyond the big data hype and learn how to apply best practices and new technologies for conquering the challenges posed by rising data volumes and increased data variety.

I would like to highlight three sessions to be held at the conference that I see as important to this objective. The first actually does not have “big data” in its description but addresses what always appears in our research as a topmost concern: data integration. In many organizations, the biggest “big data” challenge is not so much about dealing with one large source as integrating many sources and performing analytics across them. Mark Peco will be teaching “TDWI Data Integration Principles and Practices: Creating Information Unity from Data Disparity” on Monday, May 6.

On Wednesday, May 7, Dave Wells will head up “TDWI Business Analytics: Exploration, Experimentation, and Discovery.” For most organizations, the central focus of big data thinking is about analytics; business leaders want to anchor decisions in sound data analysis and use data science practices to uncover new insights in trends, patterns, and correlations. Yet, understanding analytics techniques how to align them with business demands remains a barrier. Dave Wells does a great job of explaining analytics, how the practices relate to business intelligence, and how to bring the practices to bear to solve business problems.

The third session I’d like to spotlight is “Building a Business Case for Big Data in Your Data Warehouse,” taught by Krish Krishnan. A critical starting point for big data projects and determining their relationship to the existing data warehouse is building the business case. Krish is great at helping professionals get the big picture and then see where to begin, so that you don’t get intimated by the scale. He will cover building the business case, the role of data scientists, and how next-generation business intelligence fits into the big data picture.

These are just three of the many sessions to be held during the week, on topics ranging from data mining, Hadoop, and social analytics to advanced data modeling and data virtualization. I hope you can attend the Chicago TDWI World Conference!

Posted by David Stodder on April 10, 20130 comments