Q&A with the Experts
A business intelligence or data warehouse implementation can be a formidable undertaking. In these pages, leading business intelligence and data warehousing solution providers share their answers to the questions they hear often from industry professionals. Mark Hammond, an independent consultant, provides his analyst viewpoint to each Q&A.
We want to enhance our data warehouse with current information stored in our operational systems. How can we do that without degrading source system performance?
A well-designed data federation tool will allow you to extend your data warehouse with up-to-the-second information from your operational systems while limiting disruption to those systems. It does this through intelligent query techniques, optimized join strategies, specialized algorithms that minimize data transfer, and by leveraging database query optimizers. With these advanced features, data federation can be used to create a virtual data warehouse that provides users with both real-time and historical information without physically moving data or bringing your operational systems to a crawl. In the end, users will have access to trusted, accurate, timely information for better decision making.
A federated approach to business intelligence is a sound alternative for organizations needing to track key metrics from source systems with a relatively low cost and rapid implementation, as long as the organization does not require the richer query functionality and long-term historical analysis possible in a traditional data warehouse. Other options to incorporate timely information into a warehouse include real-time trickle feed technology from data integration vendors, as well as changed data capture (CDC), which moves only data updated in source systems since the last batch load to the warehouse. By reducing data volumes, CDC can enable greater load frequency with minimal impact on operational performance.
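The CDC pattern described above can be sketched in a few lines: only rows touched since the last successful batch load are pulled from the source system. This is a minimal, illustrative sketch using an in-memory SQLite table; the `orders` table, its columns, and the timestamps are hypothetical, and real CDC tooling typically reads the database log rather than a timestamp column.

```python
import sqlite3

def extract_changes(conn, last_load_ts):
    """Return source rows modified after the previous batch load (timestamp-based CDC)."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_load_ts,),
    )
    return cur.fetchall()

# Hypothetical source system with three orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-03"), (3, 5.0, "2024-01-05")],
)

# Last load ran on 2024-01-02, so only orders 2 and 3 are extracted.
delta = extract_changes(conn, "2024-01-02")
```

Because only the delta crosses the wire, load frequency can rise without a proportional rise in load on the operational system.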
Does data governance require a budget?
Absolutely! If you want to accomplish something tangible, you need to obtain management support and an approved budget that weighs costs against benefits. Grassroots data governance usually leads to failure. Identify the goals and objectives of your data governance program, and estimate the resources required to accomplish the goals. Take into account that data governance often requires new software solutions and external guidance. Since effective data governance requires commitment from IT and business users, resources will be diverted from other corporate projects. Make a conscious decision to make the trade-off, and your data governance efforts will flourish.
A data governance budget is not only a sound practice to help ensure alignment of resources with business objectives; it can also help prevent “project creep” and unanticipated expenses. Especially for large organizations with a variety of data and metadata management systems, a data governance initiative can take on a life of its own with multiple facets, competing responsibilities, and diluted objectives. A strategic budget, strong executive sponsorship, and cross-enterprise data stewards are key to helping you realize the data governance benefits of common data definitions and closer IT/business alignment across business units. Ideally, a data governance budget will map out total cost of ownership as well as the expected tangible and intangible benefits.
What is the role of data quality in master data management (MDM) initiatives?
MDM is emerging as a way for organizations to deliver a single, unified view of the truth from multiple operational applications. At the core of successful MDM engagements is data quality technology, which helps companies standardize, verify, and correct data across sources. This technology is also adept at matching similar pieces of data in different sources and resolving numerous instances of a customer, product, or asset into a “master record.” Although MDM approaches often focus on the access and movement of data, an MDM approach driven by data quality will lead to better data within the master repository—and a more accurate reflection of the truth.
Data quality should be very much a foundational element of an MDM solution. Without data quality in place, organizations run a significant risk of perpetuating the old “garbage in, garbage out” phenomenon across the MDM platform. Close attention to data quality can exponentially increase the value of MDM by cleansing, matching, and standardizing data from operational customer, product, inventory, and other systems. In tandem, data quality and MDM help enable organizations to precisely target customers, eliminate duplicate communications, and improve supply-chain efficiency across regions or globally. Deduplication of data from multiple sources can also help improve MDM operational performance by reducing the number of records to be loaded by a factor of 5, 10, or even more.
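The deduplication step mentioned above can be sketched as follows: records from multiple sources are normalized to a crude match key, then collapsed to one master record per key. This is an illustrative sketch only; the field names are hypothetical, and production MDM matching uses far richer probabilistic comparison than an exact normalized key.

```python
def match_key(record):
    """Build a crude match key from a normalized name plus postal code."""
    name = "".join(record["name"].lower().split())  # strip case and whitespace
    return (name, record["zip"])

def dedupe(records):
    """Collapse records sharing a match key into a single master record."""
    masters = {}
    for rec in records:
        masters.setdefault(match_key(rec), rec)  # first record per key wins
    return list(masters.values())

# Hypothetical records from two source systems.
source_a = [{"name": "Acme Corp", "zip": "10001"}]
source_b = [{"name": "ACME  CORP", "zip": "10001"},
            {"name": "Globex", "zip": "94105"}]

masters = dedupe(source_a + source_b)  # three inputs collapse to two masters
```

Shrinking the record set before the MDM load is exactly where the operational-performance gain cited above comes from.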
What are the advantages of using a data warehouse appliance based on a commodity versus a proprietary platform?
Commodity-based appliances offer significant advantages:
- Major partnerships. Commodity players reduce risk by attracting the attention and support of major partners.
- Rationalized research and development. Development is focused and advancement is quicker when harnessing the power of major partners to provide hardware reliability and innovation.
- High availability. Commodity hardware offers higher reliability and improved mean time between failures (MTBF).
- Performance and scalability at lower prices. Commodity players offer higher scalability and performance at lower cost because of strong competition.
- Easier upgrade path. Plug and play prevents forklift upgrades.
The increasing availability of commodity-based systems has introduced yet another twist in the fast-changing DW appliance market. Many appliances based on commodity hardware (e.g., Intel and AMD microprocessors) and software are competitively priced because vendors can sidestep the dedicated R&D required to develop and evolve proprietary systems, and over time commodity systems generally mature to approach, if not match, the performance of proprietary technology. Organizations in the market for a DW appliance need to weigh price, performance, data volumes, infrastructure integration, and energy efficiency considerations—as well as their long-term road map—in deciding between commodity and proprietary DW appliances.
Hyperion Solutions Corporation
What are the key points to consider for an MDM initiative related to performance management and BI?
MDM projects vary widely, but it’s key to start small and plan for growth from the beginning. First, identify one or two types of master data (dimensions) and hierarchies to tackle. You can start by defining and agreeing on a master data lifecycle, definitions, and attributes, and by determining the system of record and system of entry for changes.
Next, identify one or two systems that will interact with the MDM application initially. This means defining the integration method(s) and establishing the required frequency of updates to each system.
Finally, ensure business user involvement in the maintenance of master data. This includes securing management sponsorship and participation, establishing policies and business rules for managing changes, and defining change approval levels, internal controls, and reporting.
A good rule of thumb when strategizing on an MDM initiative for BI and performance management is to start small but think big. Driven by tactical needs, some organizations apply MDM in isolated areas without a vision for enterprisewide MDM—inviting silo problems and focusing more on reactive MDM rather than proactive MDM. Given differing interpretations of MDM, it’s important to invest time up front in defining what MDM means for the business. Organizations should clearly distinguish between analytic and operational MDM, baseline problems to be addressed, and outline the expected benefits of MDM both immediately and in longer-range pursuit of MDM’s promise of a single view of the business.
What’s wrong with data warehouse tuning techniques used today?
Data warehouse performance tuning techniques are designed to overcome the architectural limitations of the underlying relational database. E.F. Codd, the relational database pioneer, acknowledged that relational databases need such workarounds. Whether you’re denormalizing, indexing, partitioning, striping, or creating summary tables or views, you’re overcoming existing relational database management system (RDBMS) limitations. These techniques are labeled as tuning, when in fact they’re an engineering compromise that trades speed for adaptability, reliability, and cost. With the rapid growth in data and the increasing need for more insightful information balanced with constrained IT budgets, it is time for innovation. Disruptive approaches to high-performance data access are a necessity for success.
We all know that RDBMSs were originally built with the relatively minimal data model requirements of transactional applications in mind. The RDBMS vendors have done a good job of extending their database products to enable the much more complex multi-dimensional data models required of data warehousing. But there’s still room for improvement. To fill the void, a few vendors have produced software tools—sometimes called accelerators—that sit atop popular RDBMSs, giving them greater dimensionality in modeling, speed with multi-dimensional queries, and efficiency in managing dimensional data.
What is the role of profiling in data quality?
The goal of profiling is to identify problems that could prevent effective matching of data. Data quality profiling enables you to answer the following questions about customer data:
- What data fields are suitable for use in the matching processes?
- What standardization/cleansing is required for each data field prior to the matching process?
- What matching rules are likely to be effective? For example, partially incomplete or invalid fields can be used in the matching process, but rules must be formulated to ensure that they are used only when a valid value is present.
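The profiling questions above can be answered with simple field-level statistics: completeness per field, and format validity for fields with a known structure. A minimal sketch, with hypothetical data and an illustrative five-digit ZIP rule:

```python
import re

# Hypothetical customer records with varying completeness and validity.
records = [
    {"name": "Ann Lee",  "phone": "555-0101", "zip": "10001"},
    {"name": "B. Ortiz", "phone": "",         "zip": "1001"},
    {"name": "",         "phone": "555-0199", "zip": "94105"},
]

def completeness(rows, field):
    """Fraction of rows where the field is populated."""
    filled = sum(1 for r in rows if r[field].strip())
    return filled / len(rows)

def zip_validity(rows):
    """Fraction of rows whose zip matches a five-digit format."""
    valid = sum(1 for r in rows if re.fullmatch(r"\d{5}", r["zip"]))
    return valid / len(rows)

profile = {
    "name": completeness(records, "name"),    # 2 of 3 populated
    "phone": completeness(records, "phone"),  # 2 of 3 populated
    "zip_valid": zip_validity(records),       # 2 of 3 pass the format check
}
```

A profile like this indicates which fields are fit for match rules and which need standardization or conditional handling first.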
Organizations that overlook data profiling in a broader data quality initiative do so at their own peril. Though short-term savings might be realized by avoiding both software licensing and personnel costs, skipping the data profiling phase can undermine the integrity of the entire process and mean more time in manually troubleshooting problem spots. When properly executed, data profiling assesses weaknesses and discrepancies in data prior to cleansing and matching and enables reconciliation up front, as well as providing a radar screen for ongoing data monitoring. Especially for high-impact initiatives like customer data integration, data profiling can be the make-or-break element in the project’s overall success.
Initiate Systems, Inc.
What are the necessary features for accurate and effective customer recognition and data matching?
- Dual thresholds. Most systems based on probabilistic algorithms can be tuned to achieve specific false positive and false negative rates. However, look for a system that provides the ability to set multiple thresholds for each search.
- Real-time response. Avoid solutions that offload batch processing with no emphasis on performance. Instead, look for a system that can scale to support millions or even billions of records for on-demand record lookups.
- Adaptability. Businesses concerned with high accuracy should also look for a highly adaptive system—one that adjusts according to the data contained in individual files.
- Extensibility. To ensure high accuracy, companies must be able to include search parameters specific to their business or industry.
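The dual-threshold idea above can be sketched simply: a match score above an upper threshold auto-links, below a lower threshold auto-rejects, and anything in between is routed to clerical review. The threshold values, field list, and scoring function here are deliberately crude placeholders; real systems compute probabilistic, weighted scores.

```python
UPPER, LOWER = 0.85, 0.60  # illustrative thresholds, tuned per deployment

def score(a, b):
    """Fraction of fields that agree exactly (stand-in for a probabilistic score)."""
    fields = ["name", "zip", "phone"]
    return sum(a[f] == b[f] for f in fields) / len(fields)

def decide(a, b):
    """Apply dual thresholds: auto-link, auto-reject, or send to review."""
    s = score(a, b)
    if s >= UPPER:
        return "auto-link"
    if s <= LOWER:
        return "auto-reject"
    return "clerical-review"

rec1 = {"name": "Ann Lee", "zip": "10001", "phone": "555-0101"}
rec2 = {"name": "Ann Lee", "zip": "10001", "phone": "555-0199"}  # phone differs
decision = decide(rec1, rec2)  # 2 of 3 fields agree: between the thresholds
```

Setting the two thresholds independently is what lets an organization trade false positives against false negatives per search.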
As customer data integration (CDI) systems proliferate, more attention is being paid to the functionality and reach of data cleansing and matching solutions. Matching should be supported for virtually all data types—structured, unstructured, and information from external third parties. The solution should also provide outlier reporting to enable users to monitor its effectiveness and determine the root cause of problems. Because packaged data matching software can be complex, with probabilistic algorithms, heuristics, comparisons, and scorings, organizations may want to consider engaging an independent expert to assess the pros and cons of competing applications and help select the solution that best meets their business requirements.
How can we efficiently tap into our data to enhance the decision-making process?
Effective decision making requires a user to view a sequence of interrelated data. With reporting technology, a series of 5 to 10 reports may be needed to make analytically based decisions. This process is time-consuming, both for IT to design the reports and for the businessperson to uncover critical insights.
Dynamic dashboards with relational online analytical processing (ROLAP) functionality offer an advanced approach to decision making. These informational dashboards collapse data from dozens of reports into one dashboard, and provide intuitive and rapid navigation across the data. With ROLAP’s “drill-anywhere” capability, users can easily surf the data warehouse to find the data without requiring an explicit report to be designed by IT.
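The "drill-anywhere" behavior described above amounts to re-aggregating the same fact rows at whatever level the user navigates to, rather than freezing each view into a predesigned report. A minimal sketch with hypothetical fact data and dimensions:

```python
facts = [
    {"region": "East", "product": "A", "sales": 100},
    {"region": "East", "product": "B", "sales": 50},
    {"region": "West", "product": "A", "sales": 70},
]

def rollup(rows, dim):
    """Aggregate sales along any chosen dimension."""
    totals = {}
    for r in rows:
        totals[r[dim]] = totals.get(r[dim], 0) + r["sales"]
    return totals

by_region = rollup(facts, "region")  # dashboard's top-level view
east_by_product = rollup(            # user drills into East by product
    [r for r in facts if r["region"] == "East"], "product"
)
```

In a ROLAP tool the same navigation generates SQL against the warehouse on the fly, so no IT-designed report is needed for each drill path.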
A key culprit behind inefficiency in tapping data is user reluctance to embrace sophisticated business intelligence tools. Rather than attempting to force-feed BI down users’ throats, many organizations have benefited from lowering the barrier to entry through third-party Microsoft Excel add-ons that enable analysts to continue using a favored tool but with richer ad hoc analysis and reporting functionality. Similarly, dynamic, graphical dashboards with easy-to-read metrics and drill-through, workflow-driven decision making, as well as collaborative training involving both business and IT, can help improve user efficiency. A business requirements analyst can help understand user needs and complaints and tailor solutions accordingly.
Data warehouse appliances have been acknowledged as successful at supporting data mart analytics. But can they handle the requirements of enterprise-class data warehousing?
Increasingly, large organizations are implementing Netezza data warehouse appliances to handle enterprise-class data warehouse applications because of their ability to easily manage large workloads of varying complexity. These systems have proven they can deliver high performance against large, mixed workloads across many business applications and for hundreds of concurrent users. This capability provides organizations with faster, deeper insight into data from multiple departments across the enterprise.
Appliances are steadily gaining credibility as scalable platforms for enterprise-class data warehouse systems—one reason why IDC predicts the appliance market will grow from its current estimated $50–75 million to roughly $500 million in five years. Continued R&D by DW appliance vendors and the advent of multicore processors are combining to increase appliance scalability to support large data volumes, complex queries, and high numbers of concurrent sessions. Prospective buyers should conduct rigorous proof-of-concept testing that mirrors the production environment to ensure performance matches expectations. If performance is up to par, an appliance may be a smart choice, especially for organizations that need to reduce expenses for deployment and maintenance.
What can I accomplish with enterprise data mash-ups (EDM) that I can’t from application integration?
Enterprise data mash-ups (EDM) are lightweight, Web-based SOA applications that leave source data in its original location and state. They do all the work on the fly, usually without the need for “big iron.” The smaller footprint of EDM enables companies to quickly construct new dynamic and flexible applications for specific tasks by assembling ingredients from existing applications, Web services, and databases. This empowers companies to solve problems incrementally, realizing ROI gains in small, measurable steps and avoiding the huge burden of a larger enterprise development project that could take months and millions. Management, developers, and end users all benefit from this approach.
What is the value of prebuilt analytic applications for data integration and business intelligence?
The use of analytic applications is a rapidly growing trend as organizations look to deploy BI more broadly and in a more integrated fashion with operational applications and processes. A key part of the value of analytic applications is the prebuilt ETL adapters that are typically included, enabling organizations to very quickly and easily deploy a data warehouse populated with data from common enterprise applications such as Oracle, Siebel, and SAP. Analytic applications also offer significant value when it comes time to upgrade, with utilities to help organizations upgrade the analytic environment in concert with the operational applications running in parallel.
Analytic applications can offer the best of both worlds—prebuilt functionality that zeroes in on issues common to a functional area (e.g., supply chain or finance) or a particular industry, as well as customization options that enable developers to tailor the application to an organization’s unique needs. As a result, they have become a popular alternative to both vanilla query and reporting tools and to building an application from scratch. While customization is always required, analytic applications can accelerate time to value for both developers and end users, with prebuilt ETL connectivity and business-focused workflows and other analytic functionality geared for the business side.
Petris Technology, Inc.
What can owners of long-lived assets do to maintain knowledge across the generations of technology, science, and staffing?
Managing the information associated with these extremely large and long-lived assets poses unprecedented challenges—but also opportunities. A robust data management framework and process are required to ensure that relevant information is captured and managed for future use. A workflow that automates many of these processes removes the need to manually handle inputs, while the collected insights of experts provide built-in training for the next generation who are just learning the ropes. The long life of a field means that technology, scientific concepts, and analytical approaches will evolve over its lifetime, and adjustments must be made. A solid data management approach and integrated workflow will ensure these advancements can be applied to distill more knowledge from old data.
The question has more to do with political will and strategic vision than with technology. Organizations in highly technical industries do face unique challenges with massive and complex data volumes, the need to preserve knowledge across generations, and the certainty that information will only continue to grow. On the other hand, data management systems and storage capacity have matured to help meet the challenge. For one, the discipline of information lifecycle management (ILM) provides a framework to align the business value of data to its most appropriate and cost-effective storage medium. Implementing an overarching framework like ILM can be a massive challenge, but it’s also increasingly crucial for scientific and research organizations.
Pitney Bowes Group 1 Software
Why has interest in improving master data management surged in recent years?
In many respects, the interest in master data management (MDM) is a result of lessons learned during struggles to implement effective CRM and ERP applications, which were intended to provide a 360-degree view of a given customer. Many organizations come up with a fuzzy or distorted view because the underlying data used to generate it was of poor quality. Now businesses recognize the importance of high-quality data as the basis for any business intelligence programs that they implement. MDM can help align and maintain master data assets.
Data quality and consistency have long been the crazy uncle that no one in the family wanted to acknowledge. Organizations have historically had a limited understanding of the costs and inefficiencies associated with data discrepancies across multiple systems, as well as the return on investment (ROI) that can be derived from MDM and a single set of consistent data. Now, the maturation of MDM platforms and data quality technology has given enterprises a means of automating what would 10 years ago have been a costly and painstaking manual process. Publicized MDM successes and the perceived opportunity costs of failing to address MDM also contribute to MDM’s ascent on the IT radar screen.
Certain IT departments are still finding it challenging to build the right business case for master data management initiatives. How do you recommend creating a compelling business case?
This is an important question—MDM is no longer exclusively an IT-driven project. MDM can provide significant business benefits, and gaining business sponsorship up front is critical. We recommend doing a quick ROI audit to identify the specific set of business drivers by function—whether in marketing, sales, contracting, procurement, or compliance. The key is to identify the hard cost savings associated with an MDM project that will capture the attention of business managers. For example, one manufacturer identified over $11 million in cost savings from its sales operations over a five-year period, effectively paying for the MDM project. However, it is equally important not to leave out the soft benefits, which are essential to sell the larger vision.
Most IT professionals, much less business sponsors, have little insight into the magnitude of data quality and inconsistency problems across multiple business units. A sound first step in building a business case for MDM is to document the extent of the problem through a data life-cycle audit (and in many cases, the problem will be greater than expected). A thorough assessment of the issue supplies a foundation to 1) document the cost of data discrepancies to the business, and 2) quantify business benefits and ROI that can be realized through MDM, ideally over a 5- or 10-year period. It may be helpful to document the larger industry trend toward a comprehensive data infrastructure—and the opportunity costs of falling behind.
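The business-case arithmetic described above can be made concrete with a simple multi-year model: annual savings from fewer data discrepancies are netted against project and run costs over the evaluation horizon. All figures below are hypothetical placeholders, not benchmarks.

```python
project_cost = 2_000_000    # one-time MDM implementation (hypothetical)
annual_run_cost = 200_000   # licenses, data stewardship (hypothetical)
annual_savings = 1_000_000  # e.g., reduced duplicate mailings and rework

years = 5  # evaluation horizon, per the 5- to 10-year guidance above
net_benefit = years * (annual_savings - annual_run_cost) - project_cost
roi = net_benefit / project_cost  # net benefit per dollar invested
```

Even this back-of-the-envelope form forces the audit to produce numbers for each term, which is where the discipline of the exercise lies; soft benefits are then layered on qualitatively.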
What are the most important criteria for creating a usable data aggregation environment?
There are five requirements that stand out. A data aggregation environment should:
- Support multiple data integration technology requirements, sharing common administration, design, and metadata.
- Meet rigorous standards for ad hoc querying and data warehouse load performance, well beyond what can be delivered by traditional relational database systems.
- Be significantly easier to deploy and maintain than what is possible through traditional database systems.
- Enable near-linear user and data scalability to support thousands of users and terabytes of data and, in some cases, also the flexibility to allow multiple grades of SLAs.
- Provide a cost-effective solution that drives data aggregators’ top-line growth while containing data warehouse infrastructure costs.
Throughout the process, it’s important to keep the overarching objective in mind—building measurable and sustainable business value, which can become diluted amid the complexity and time pressures of a large-scale data aggregation project. Key criteria supporting that principle include 1) executive-sponsored collaboration between business and IT, 2) improving the quality and consistency of enterprise data, 3) ensuring performance and scalability, and 4) implementing a standards-based system that may be rapidly and affordably extended to support future initiatives or mergers/acquisitions. Establishing a “center of excellence” is a proven way to coordinate competing priorities and share best practices to help ensure that business value is realized.
How can I improve the quality of my data throughout my data integration project?
A thorough data quality program includes two phases. First, data is either captured in a standard, error-proof way, or it is cleansed in preparation for loading. Then, data can be enhanced for further analysis. For example, demographic and/or lifestyle information can be added to customer records before the actual load. Enhancing data usually entails combining multiple sources of data, and this data is often held in multiple databases on disparate platforms. Using a data manipulation tool makes coordinating all of these different sources much easier.
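The two phases above can be sketched as a small pipeline: records are first standardized (cleansed), then enhanced by joining demographic data from a second source before loading. Field names, the state-code table, and the demographic lookup are hypothetical stand-ins for real reference data.

```python
customers = [
    {"id": 1, "state": "new york"},
    {"id": 2, "state": "CA "},
]
# Hypothetical second source holding demographic enhancements.
demographics = {1: {"segment": "urban"}, 2: {"segment": "suburban"}}

STATE_CODES = {"new york": "NY", "ca": "CA"}  # illustrative reference table

def cleanse(rec):
    """Standardize the state field to a two-letter code."""
    raw = rec["state"].strip().lower()
    rec["state"] = STATE_CODES.get(raw, rec["state"].strip().upper())
    return rec

def enhance(rec):
    """Append demographic attributes from the second source."""
    rec.update(demographics.get(rec["id"], {}))
    return rec

load_ready = [enhance(cleanse(r)) for r in customers]
```

Running cleansing before enhancement matters: the join to the demographic source is only as reliable as the standardized keys feeding it.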
With growing recognition of the risk that poor data quality poses to the business, many organizations are taking the smart step of incorporating data profiling and cleansing into broader data integration initiatives, often multisource integration into a warehouse or migration from legacy systems into a modern, Web-based application. Data profiling is a key first step that assesses the content and structure of data—an information reconnaissance mission that is prerequisite to thorough cleansing and reconciliation across multiple sources. Ideally, data quality is viewed not as a collateral project, but as integral to a broader integration initiative with appropriate scoping, tools, and resources allocated to help ensure its success.