This week, we at TDWI produced our third annual Solution Summit on Master Data, Quality, and Governance, again in Savannah, Georgia. Jill Dyché and I moderated the conference, and we lined up a host of great user speakers and vendor panelists. The audience asked dozens of insightful questions, and the event included hundreds of one-to-one meetings among attendees, speakers, and vendor sponsors. The aggregated result was a massive knowledge transfer that highlights most of today’s burning issues in master data management, data quality, and data governance. I’d like to share with you some of the themes and insights that arose at the TDWI Solution Summit.
RECURRING THEMES
As you can see in the title of the conference, it brings together the data disciplines of master data management (MDM), data quality (DQ), and data governance (DG). TDWI has noted that many of its members coordinate these three very closely, sometimes with additional coordination for data integration, business intelligence, data warehousing, and metadata management. Most of the user case study speakers explained how their organizations are handling the coordination, including Cathy Burrows from Royal Bank of Canada (RBC), Becky Briggs from Airlines Reporting Corporation (ARC), and Mark Love and Sara Temlitz from the Veterans Health Administration (VHA).
A number of recurring themes were heard across the presentations and panels. But three stood out prominently: the importance of identity resolution, accurate matching, and de-duplication of redundant records. All three techniques apply to both MDM and DQ. It was obvious from speeches and questions from the audience that most organizations currently have matching and de-duplication in place, but need to complement these with identity resolution in the near future.
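To make matching and de-duplication concrete, here is a minimal sketch using only Python's standard library. It is not any speaker's or vendor's method: the sample names, the 0.85 threshold, and the greedy grouping strategy are all assumptions chosen for illustration. Identity resolution would typically go further, weighing several attributes (address, date of birth, and so on) rather than a single name field.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedily group records that match the first member of an existing group.

    The threshold is an illustrative assumption, not a recommended value.
    """
    groups: list[list[str]] = []
    for record in records:
        for group in groups:
            if similarity(record, group[0]) >= threshold:
                group.append(record)
                break
        else:
            # No existing group matched, so this record starts a new one.
            groups.append([record])
    return groups


# Hypothetical customer names with redundant variants.
customers = ["Acme Corp.", "ACME Corp", "Globex Inc", "acme  corp", "Initech LLC"]
clusters = deduplicate(customers)
```

Each inner list in `clusters` holds records judged to be the same entity; a steward or survivorship rule would then pick or assemble the "golden" record from each group.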
It would appear that data stewardship plays an important role in MDM, not just DQ. For example, speaker Becky Briggs (ARC) has won TDWI Best Practices awards for her applications of stewardship in data quality and analytics implementations. For many organizations, stewardship is a first step toward data governance. Mark Love from the VHA explained: “Our purpose for stewardship is to create a framework for data-related decision making, collaboration, and governance.” Furthermore, the VHA has expanded the concept of stewardship by hiring Identity Management Stewards who support MDM.
SPEAKER INSIGHTS
We all know that taking DQ functions upstream into the operational applications where data is entered or altered can substantially reduce DQ problems in applications databases and downstream databases (like data warehouses). But did you know that the same applies to MDM? Rick Clements from IBM played an eye-opening video that shows how to embed MDM matching and identity resolution functions in the GUIs of salesforce.com and other operational applications.
David Smith (who’s on the CIO team at Citrix) shared with the audience his method for quantifying vendors and their tools, thereby giving structure and hard facts to the otherwise ad hoc process of selecting a vendor tool. Although the method can apply to many tool types, David explained how to apply it to the selection of an MDM tool.
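The details of David's method are not reproduced here, but one common way to give structure to tool selection is a weighted scoring matrix. The sketch below is only that generic technique, not his approach: the criteria, weights, tool names, and ratings are all hypothetical.

```python
# Hypothetical selection criteria and their relative weights (they sum to 1.0).
WEIGHTS = {"matching": 0.4, "integration": 0.3, "stewardship_ui": 0.2, "cost": 0.1}

# Hypothetical 1-5 ratings for two made-up MDM tools.
RATINGS = {
    "Tool A": {"matching": 5, "integration": 3, "stewardship_ui": 4, "cost": 2},
    "Tool B": {"matching": 3, "integration": 5, "stewardship_ui": 3, "cost": 4},
}


def weighted_score(ratings: dict[str, int], weights: dict[str, float]) -> float:
    """Sum each criterion's rating multiplied by its weight."""
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


def rank_tools(
    all_ratings: dict[str, dict[str, int]], weights: dict[str, float]
) -> list[tuple[str, float]]:
    """Return (tool, score) pairs sorted from highest to lowest score."""
    scored = {tool: weighted_score(r, weights) for tool, r in all_ratings.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


ranking = rank_tools(RATINGS, WEIGHTS)
```

The value of such a matrix lies less in the arithmetic than in forcing stakeholders to agree on criteria and weights before the vendor demos begin.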
In discussions of the future of MDM, we heard about how MDM is now available through clouds and software as a service (SaaS). For example, Peter Kulupka from Acxiom described the cutting edge of MDM, where it’s provided as a service via a third-party public cloud. Similarly, Dan Soceanu from DataFlux pointed out that “The private cloud’s where it’s at for most enterprises doing DQ and MDM.”
The TDWI Solution Summit on Master Data, Quality, and Governance concluded with John Biderman of Harvard Pilgrim Health Care, who explained in detail his mature master reference data strategy, which includes a business-friendly, browser-based tool for entering, validating, and studying master data.
To learn more about the event, visit its Web site at: http://events.tdwi.org/events/solution-summit-savannah-2011/home.aspx. You can also read its tweets by searching Twitter for #tdwimdm.
Posted by Philip Russom, Ph.D. on March 14, 2011
Teradata’s recent acquisition of Aster Data Systems is a huge signal that the worlds of “big data” and data warehousing are coming together. The deal itself was not a surprise; Teradata made a down payment on Aster last September, when it bought 11 percent of the company. And before making that initial investment, Teradata proved that it was not averse to bringing in other people’s database engines by acquiring Kickfire, an innovator in MySQL and analytic appliances. However, unlike Kickfire, which was floundering in the market but offered interesting “SQL on a chip” technology, Aster was successful and well-funded. Teradata will now have an opportunity to expand its appeal beyond traditional, SQL-based data warehousing into the realm of big data, particularly unstructured data – and provide the technology to bring these worlds together.
“Big data” refers to the massive volumes of structured and unstructured data being generated by relatively new data sources such as Web and smart phone applications, social networks, sensors and robots, GPS systems, genomics and multimedia. For customer interaction, fraud detection, risk management and other purposes, it is often vital to analyze this data in something close to real time so that decision makers can be aware of events, trends and patterns for immediate response or predictive understanding.
The extreme requirements brought on by big data have accelerated the technology shift toward massively parallel processing (MPP) systems, which generally offer better speed and scale for the size and workloads involved in big data analysis compared with traditional symmetric multiprocessing (SMP) systems. TDWI survey data shows that data warehouse professionals intend to abandon SMP in favor of MPP. Not surprisingly, MPP’s growing appeal was a driver behind the market explosion in recent years of new data management systems and appliances that could take advantage of parallelism. Now, that market is consolidating; EMC bought Greenplum, IBM bought Netezza, HP bought Vertica and now Teradata has picked up Aster. And during this period, we’ve seen Oracle introduce Exadata, IBM introduce its Smart Analytics Systems and other developments that are bringing MPP into the mainstream for advanced analytics.
To take advantage of MPP for big data, many developers, particularly at Google, Yahoo! and other firms that bet their business on analysis of online data, have chosen to look beyond SQL, the lingua franca of relational databases, and implement Hadoop and MapReduce, which offer programming models and tools specifically for building applications and services that will run on MPP and clustered systems. Aster, with its nCluster platform, has strongly supported MapReduce implementations; as part of its “universal query framework” introduced with the 4.6 release of nCluster last fall, Aster released SQL-MapReduce to support a wider spectrum of applications.
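For readers who know SQL but not MapReduce, the programming model can be sketched in a few lines. This is a single-process illustration of the map, shuffle, and reduce steps only; it does not use the Hadoop API or Aster’s SQL-MapReduce, and the function names and sample input are assumptions. In a real cluster, the map and reduce calls would be spread across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big analytics", "data warehouse"])
```

Because each map call sees only one line and each reduce call sees only one key, the work partitions naturally across machines, which is what makes the model a good fit for MPP and clustered systems.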
My colleague at TDWI Research, Philip Russom, notes that while there are many synergies between Teradata and Aster – the technologies from both companies are fully capable of handling extreme big data and both assume use cases involving both big data and analytics – there are significant differences. “Teradata is designed for data that’s ruthlessly structured, even third normal form, whereas Aster, especially with its recent support for Hadoop, is known for handling a far wider range of data types, models, and standards,” Philip noted. “Most Teradata users are data warehouse professionals who are hand-cuffed to SQL, whereas Aster’s user base includes lots of application developers and other non-warehouse folk who are more interested in Pig and Hive. It’s a good thing that having diversity is strength. Assuming the Teradata and Aster camps can overcome their differences, they have a lot of great things to learn from each other.”
TDWI members have been ramping up use of advanced analytics against multi-terabyte data sets for the last several years, and Teradata platforms have been in the middle of that trend. Teradata’s move gives data warehouse professionals a strong reason to evaluate whether Aster’s technology can enable them to further exploit the power of MPP for both SQL and non-SQL applications that require advanced analytics of big data.
Stay tuned to TDWI for more insight into how organizations can expand data warehousing into the realm of big data. We are in the planning stages now for our TDWI Solution Summit, “Deep Analytics for Big Data,” to be held in San Diego, September 25-27.
Posted by David Stodder on March 11, 2011
The Teradata Partners User Conference is one of the largest dedicated data warehousing conferences, pulling BI professionals from all over the world. Its attendees are collectively the most sophisticated users of BI anywhere. Given that Teradata is reinvigorated since the spinoff from NCR and more nimble and responsive, this is a good fit. Here are highlights of what I learned in my brief (1 day) visit to Partners.
Teradata Appliances
Teradata is having good success selling the 2650 machine, which customers are using primarily for departmental warehouses and dependent data marts. Customers like the product because unlike some competitors (e.g., Netezza), the box is easily expandable and highly scalable. You buy only the nodes you need, and if you reach the capacity of the box, you simply buy another, connect it with the first via gigabit Ethernet, and redistribute the data. With other appliances, you need a forklift upgrade. (Although a Netezza customer said this wasn’t a major inconvenience.) The only downside to the 2650 appliance is that many people still don’t know that it exists. And Teradata, which has always priced its products at a premium, recently lowered the list price on the 2650, making it competitively affordable.
Solid State
I also had a chance to bump into the always pleasant and insightful Dan Graham, who was recently promoted to general manager of enterprise systems at Teradata Labs. Congrats, Dan! In his new role, he is spending a lot of time thinking about how to integrate solid state disk into Teradata’s big iron. He said it’s a big challenge to do it right and he’s determined to make Teradata a leader in dynamic allocation of data to solid state drives. He also said solid state has thorny implications for pricing.
Hadoop
I also had fun talking with Dan two weeks ago at Hadoop World where he was nonplussed about comments made by some of the more exuberant voices in the Hadoop and NoSQL community about how Hadoop will displace relational databases. A recent Tweet of his sums it up: “It’s Hadoop 1.ohmygod.” Seriously, as a distributed file system, Hadoop in its current and near future states won’t offer the analytical interactivity of a relational database. The two are complementary, not competitive—at least for the foreseeable future.
Viz Contenders
Tableau. Visualization vendor Tableau had a sizable booth at Partners, reflecting its rising star status. Tableau is a Windows desktop tool that is easy to install, easy to use, and affordable (less than $1,000 for a single user). It makes it easy for users to explore data visually and publish live views to a server environment ($10,000+) for others to consume. It works well with new analytic appliances because it queries remote databases, and its recently added in-memory database can cache frequently used data sets to improve performance.
BIS2. Also exhibiting was BIS2, an up-and-coming visualization vendor that takes a different approach than Tableau to the graphical display of quantitative information. BIS2 is a server-based environment that generates complex SQL against large, complex relational databases to render what it calls “super graphics.” These are rich, multi-layered visualizations often displayed as heatmaps. Unlike the desktop-oriented Tableau, BIS2 is an enterprise visualization platform that can be embedded in other applications. Today, most of the world’s airlines use BIS2.
Managed Excel
I bumped into Donald Farmer of Microsoft, Suzanne Hoffman of Simba, and Sam Tawfik of Teradata within about 30 minutes of each other, and they all had the same message: Excel now runs directly against major databases without a mid-tier. Microsoft’s PowerPivot queries remote data sources to populate a scalable, local column store, whereas Teradata’s newly launched and misnamed Teradata OLAP enables Excel to query Teradata using MDX running against Aggregate Join Indexes (a virtual cube). Bravo Teradata for getting out in front of the new charge to make Excel a managed BI environment. Of course, other vendors have already jumped on that bandwagon, including Lyzasoft, Quest (Toad), and startup Bonavista Systems (which makes Excel look like Tableau). Others that will soon join the parade are SAP with its BEx replacement called Pioneer and IBM Cognos with a desktop product in the labs.
Farmer spooked me a bit when he said one customer in Europe has deployed 100,000 copies of PowerPivot. I’ve assumed PowerPivot would be sparsely deployed since it’s an Excel add-in designed for hard core data jockeys. But Microsoft thinks otherwise: Donald reminded me that PowerPivot is free and that it will soon be included in trial versions of Microsoft Office to improve the product’s visibility outside of BI professionals, who are the product’s biggest champions currently. When I voiced concern about spreadmart proliferation among PowerPivot users, he said the big gating factor is the 10MB limit for attachments imposed by most email servers. Whew! He also said Microsoft is trying to make SharePoint so attractive as a collaboration platform that it becomes the preferred method for sharing PowerPivot data. We’ll see!
Mobile
MicroStrategy. I talked briefly with MicroStrategy’s lead man, Sanju Bansal, on the show floor, and attended the company’s Tuesday evening session on mobile BI. I told Sanju that I admired MicroStrategy’s “make or break” bet on mobile BI, and its first mover advantage. I added that if the bet succeeds, MicroStrategy could be a very different company in five years, selling information-driven apps and development platforms into a consumer market. Sanju corrected some of my conjectures. He said the bet on mobile was a “make” play only and the target market for its mobile products will be business customers, not consumers.
MicroStrategy, like Teradata, has been pigeonholed at the high end of the BI market for a long time, generating much of its revenues from its installed base. To continue to grow and exert influence over the industry, it needs to make a legitimate play for the “low end” where there are more customers. MicroStrategy estimates that there are 5 billion mobile users. That’s a pretty big low-end market to play in and is a contributing factor to its aggressive push into the mobile arena.
Mobile Visualization. What’s great about mobile BI is that the limited screen real estate forces developers to adhere to key visualization principles, like “less is more.” Mark LaRow of MicroStrategy told me a few months ago that mobile BI apps are more usable than their Web counterparts because they are purpose-built and highly focused. They are designed to enable a single function, and as a result, mobile applications are much more intuitive to use than general-purpose BI applications that are overloaded with functionality. Sometimes strait jackets bring enlightenment.
Posted on October 28, 2010