Q&A with the Experts
A business intelligence or data warehouse implementation can be a formidable undertaking. In these pages, leading business intelligence and data warehousing solution providers share their answers to the questions they hear often from industry professionals. Mark Hammond, an independent consultant, provides his analyst viewpoint to each Q&A.
Business Objects, an SAP company
What are the key trends affecting business users, and how will Business Objects, an SAP company, empower you to respond?
Three trends are transforming the business user environment.
- Companies need to make decisions based on data from inside and outside the enterprise and from structured and unstructured sources.
- Users need better tools and applications that support collaborative decision making.
- Companies are seeking competitive advantage by extending their business networks with partners, suppliers, and customers.
Today, Business Objects provides the market-leading BI platform and tools that unlock trusted information and enable business insight, performance management, and financial management—independent of the underlying business applications and data stores. While maintaining this portfolio, it will extend its applications offering to help business users, teams, and companies enhance collaboration through networks.
Business users are being prodded away from reliance on spreadsheets towards more sophisticated BI tools that better support collaborative decision making and near-real-time interaction with internal and external data and partners. With 44,000 global customers, Business Objects is a leading option for transitioning from spreadsheet environments; the question is how SAP’s $6.8 billion acquisition of Business Objects influences the installed base and future adoption. A TDWI survey, “BI Solutions for SAP,” found 46 percent of respondents used Business Objects, versus 37 percent for Cognos (IBM) and 30 percent for Hyperion (Oracle). The ingredients are in place for an end-to-end SAP/Business Objects platform, but as with any acquisition, integration will be key.
Do any solution providers have a complete integration platform that would cover all aspects of data integration?
Data integration has evolved from fundamentally two architectures: batch-based (ETL) and message-based (EAI). ETL vendors have focused on more complex transformations and on processing sets of records quickly. EAI vendors have emphasized individual transactions or messages, simpler transformations, and real-time data movement. While the tools have begun to converge, there is still a substantial functionality difference between the two technologies. Most ETL providers, while purporting to have real-time capabilities, don’t have as robust a messaging architecture as EAI vendors. Conversely, while most EAI solutions have data transformation capabilities, they don’t have nearly as complete a set of functions or the volume processing capabilities of ETL. Most complex data integration environments will find a use for both architectures.
So far, no mature one-size-fits-all integration platform has emerged. Large enterprises instead have tended to take advantage of capabilities unique to ETL and to EAI with discrete deployments that answer tactical business needs—for instance, real-time EAI capabilities in financial services and the high data volume capacity of ETL for CRM. Both ETL and EAI vendors have made strides in interoperability between the technologies, and some large organizations with complex data integration needs have successfully customized hybrid solutions that capitalize on the best of both worlds. Look for further integration of the integrators, so to speak, as demands for enterprise integration continue to grow.
How does data governance drive data integration and MDM projects?
The policies and practices that form a data governance program provide the core discipline and perspective needed for successful data integration and MDM programs. Any time data is integrated or consolidated, you need a set of uniform policies to guide this process. Data governance techniques and technologies facilitate the creation of these policies, which become the business rules that govern the consistency, accuracy, and reliability of corporate data.
Data governance is occasionally paid little more than lip service. That’s a recipe for failure. For larger-scale data integration and MDM projects, a clearly articulated data governance plan is essential to long-term success. Start small and prepare to grow your data governance program as stakeholders hammer out common data definitions and reconcile data, process, and political issues. A data governance initiative with a set budget, strong executive sponsorship, and tight collaboration between business and IT can and should be in place to help organizations make the most of data integration and MDM.
How can DATAllegro be used to augment my Teradata investment?
As batch windows for data loads grow smaller and smaller, Teradata customers are running out of bandwidth to perform aggregations of data for business intelligence reporting. DATAllegro provides a suite of utilities to import data from Teradata. The DATAllegro appliance can then be used to run the aggregations and export the data back into Teradata.
A key component of the Teradata Utilities Suite is the ability to directly load Teradata’s binary export file format. The atomic-level data is first exported as a binary file from Teradata. Then the data is aggregated into “summary tables” using DATAllegro’s high-speed appliance. Finally, the aggregated tables are exported and loaded back into Teradata.
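The export–aggregate–reload workflow described above is, at its core, a group-by over atomic-level rows that emits one summary row per key combination. A minimal sketch of that aggregation step, with plain Python standing in for the appliance and purely illustrative column names:

```python
from collections import defaultdict

def build_summary(rows, group_keys, measure):
    """Aggregate atomic-level rows into a summary table keyed on group_keys."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        totals[key] += row[measure]
    # Emit one summary row per distinct key combination.
    return [dict(zip(group_keys, key), **{measure: total})
            for key, total in sorted(totals.items())]

# Hypothetical atomic sales facts exported from the warehouse.
facts = [
    {"region": "East", "day": "2008-03-01", "sales": 100.0},
    {"region": "East", "day": "2008-03-01", "sales": 50.0},
    {"region": "West", "day": "2008-03-01", "sales": 75.0},
]

summary = build_summary(facts, ["region", "day"], "sales")
# summary now holds one row per (region, day), ready to load back.
```

In the real workflow, the input would be Teradata's binary export file and the output would be bulk-loaded back into Teradata; only the middle aggregation step is sketched here.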
DATAllegro fired a salvo in the ongoing data warehousing appliance war with its February 2008 announcement of a new set of utilities to migrate from or augment Teradata systems. A month earlier, a report from Ventana Research recommended that customers considering deployment of data warehouse appliances rather than increasing the number of their Teradata systems should look carefully at the forthcoming Teradata 12.0, with faster batch loading and query performance. For customers, it’s all good news—competition is helping to drive down price points, spurring innovation, and increasing choice and flexibility.
We have several data warehouses built on various platforms. How do we achieve transparency across our large and diverse data sets?
Having multiple platforms with more than one architecture is a common situation. One approach is to bring data together and rationalize on one platform. Another is to integrate it before the application layer. Both are costly in terms of hardware acquisition, implementation, and infrastructure maintenance and can be very disruptive.
An alternative is to create a single, but virtual, pool of data where applications can access data, regardless of the source platform, in the format to which they’re accustomed. This level of transparency is achieved at the platform’s federation layer without disrupting interfaces between applications, database servers, and the physical data.
Federation, or enterprise information integration (EII), is an increasingly common way to leverage data not only in multiple and heterogeneous data warehouses, but in data marts, relational databases, and applications as well. Federation offers the advantages of quick-hit successes and of enabling business users to query independent data sources without the data actually being moved. On the downside, this “loosely coupled” data warehouse architecture is not well suited for computationally intensive queries. For some organizations, a hybrid approach that utilizes both federation and a single enterprise data warehouse will deliver the greatest bang for the buck.
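The core idea of a federation layer is a common query interface over heterogeneous stores, with results unioned at query time rather than copied into one warehouse. A minimal sketch of that pattern, with all class and field names illustrative (not a real EII product API):

```python
class ListSource:
    """Adapter over one physical store; here just an in-memory 'data mart'."""
    def __init__(self, rows):
        self.rows = rows

    def query(self, predicate):
        return [r for r in self.rows if predicate(r)]

class Federator:
    """Fans a query out to every registered source and unions the results."""
    def __init__(self, sources):
        self.sources = sources

    def query(self, predicate):
        results = []
        for src in self.sources:
            results.extend(src.query(predicate))
        return results

# Two independent 'platforms' queried through one virtual pool.
mart_a = ListSource([{"cust": "Acme", "rev": 10}])
mart_b = ListSource([{"cust": "Beta", "rev": 20}, {"cust": "Acme", "rev": 5}])
fed = Federator([mart_a, mart_b])

acme_rows = fed.query(lambda r: r["cust"] == "Acme")
```

A production federation layer would also push predicates down to each source's native query engine; the sketch only shows why applications see a single pool of data while the rows never move.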
What impact does real-time reporting have on existing systems?
The impact depends on the business need. For transactional-level reporting, minor changes may have to be made to the warehouse schema to support transactional details from operational systems. If the need is for frequent reporting on aggregates, the frequency and its impact on ETL jobs will have to be determined. The amount of data may have an impact on operational systems if more frequent extracts are required. Understanding the impact will help in deciding whether to leverage changed data capture technologies that can mitigate the risk to production environments.
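One way to see why changed data capture reduces the load on production systems: instead of re-extracting everything, only rows that were inserted or updated since the last extract move downstream. A naive snapshot-diff sketch (illustrative only; production CDC tools typically read database transaction logs rather than diffing extracts):

```python
def changed_rows(previous, current, key="id"):
    """Return only rows that are new or changed between two extracts,
    so a frequent real-time load touches a fraction of the source."""
    prev_by_key = {r[key]: r for r in previous}
    return [r for r in current
            if r[key] not in prev_by_key or prev_by_key[r[key]] != r]

# Hypothetical order extracts from an operational system.
yesterday = [{"id": 1, "qty": 5}, {"id": 2, "qty": 8}]
today = [{"id": 1, "qty": 5}, {"id": 2, "qty": 9}, {"id": 3, "qty": 1}]

delta = changed_rows(yesterday, today)
# Only the updated row (id 2) and the new row (id 3) flow to the warehouse.
```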
The demand for real-time information has given rise to operational BI and introduced vexing questions that organizations must address. The first technical question is whether to use a data warehousing architecture to deliver just-in-time data or bypass it altogether. Without a warehouse, organizations need to be careful to avoid generating conflicting data sets. Operational BI alternatives to the warehouse approach include federated query models, event-driven analytic engines, integration with a message bus, and other techniques. Interestingly, a TDWI survey found 51 percent of respondents running both operational and analytic BI in the same environment.
What is the most effective and efficient strategy for finding duplicates in a large customer database?
The most successful way of finding a name match in a database is, first, to perform a search on an index built from name alone, thus building a candidate list of possible matches. Then refine, rank, or select the matches in that candidate list, based on other identification data.
The more of the name used in the key, and the greater the number of keys built per name, the greater the variety of search/match strategies that can be supported.
In large-scale systems, the choice and sophistication of the search/match strategy has significant consequences for performance, the risk of missing critical matches, the ability to avoid duplicate records, and the volume of data that must be indexed.
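The two-step strategy above—build a candidate list from a name-only index, then rank candidates on other identification data—can be sketched as follows. The key function here is deliberately crude and purely illustrative; real identity-search products build multiple, far more robust keys per name:

```python
from collections import defaultdict

def name_key(name):
    """Crude search key: keep letters, uppercase, drop vowels after the
    first character so common spelling variants collide on one key."""
    n = "".join(c for c in name.upper() if c.isalpha())
    return n[:1] + "".join(c for c in n[1:] if c not in "AEIOUY")

def build_index(records):
    """Index built from name alone."""
    index = defaultdict(list)
    for rec in records:
        index[name_key(rec["name"])].append(rec)
    return index

def find_candidates(index, probe_name):
    # Step 1: candidate list of possible matches from the name index.
    return index.get(name_key(probe_name), [])

def rank(candidates, probe):
    # Step 2: refine and rank using other identification data (here, zip).
    return sorted(candidates,
                  key=lambda r: r["zip"] == probe["zip"], reverse=True)

db = [
    {"name": "Smith", "zip": "10001"},
    {"name": "Smyth", "zip": "94105"},
    {"name": "Jones", "zip": "10001"},
]
idx = build_index(db)
matches = rank(find_candidates(idx, "Smith"), {"name": "Smith", "zip": "94105"})
```

Note that "Smyth" lands in the same candidate list as "Smith" because both reduce to the same key, then wins the ranking on matching zip code, which is exactly the behavior a name-only exact match would miss.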
This approach requires that the database have a customer name index. Luckily, database administrators usually create such an index in databases that manage customer data. However, it might be preferable to create a specialized index just for customer lookups and matches, to give these faster performance. Some vendor products for data quality or identity searching can automatically create and maintain specialized indices which—unlike an index in a standard database—can cope with the misspellings, typos, and other quality problems typical of customer data.
What’s the basic differencebetween analytical BI systemsand operational BI systems?
Analytical BI systems generally access a data warehouse. They give users an excellent view of past business events and entities but not of current business processes, which are ongoing. Analytical BI applications rely on extract, transform, and load (ETL) tools to keep a data warehouse current, perhaps once a day or once a week. Operational business intelligence systems, by contrast, give users a real-time view of business events as they occur. These BI applications generally obtain information from an automated workflow process or directly from production systems.
Another notable difference is the application and functional areas to which analytic BI and operational BI are applied. So far, operational BI has had the greatest appeal for time-sensitive, mission-critical systems such as fraud detection and supply chain, with negligible value for, say, HR. Operational BI adoption is steadily growing. A TDWI survey in mid-2007 found that 53 percent of respondents reported that their organizations were doing operational BI with intraday data delivery, though only 16 percent characterized their implementations as mature. Expect both adoption and maturity to increase as organizations take advantage of the real-time capabilities of operational BI.
What should I look for when selecting an MDM vendor?
Look for a vendor focused on MDM. Your needs will evolve over time, so you need a vendor that will grow with you and whose product roadmap won’t get diluted by other products.
Look for a vendor that develops its core capabilities. As your needs evolve, you need a vendor that has control over its roadmap to ensure your future needs are supported.
Finally, look for a vendor focused on quick time to value. The longer your project incurs costs with no benefits, the less likely it is that you will recover your investment or obtain future executive support.
Organizations will do well to assess both how well the solution addresses today’s tactical challenges and how it aligns with longer-term strategic needs. As MDM is fundamentally a data integration practice, assess the compatibility of an MDM solution with your existing ETL, EII, EAI, or other integration infrastructure. Look for cost efficiencies in repurposing your data integration technologies and expertise. From a strategic perspective, recognize that MDM is still in its early phases and is likely to evolve beyond customers and products to include such entities as employees, suppliers, and partners. Ensuring an MDM vendor’s roadmap aligns with your strategic objectives is rarely easy but is a key factor in realizing long-term value.
Why use advanced data visualizations to display business intelligence information?
Advanced visualizations express data in more meaningful ways than is possible with traditional grids and graphs. Two important facets in visualizations improve data comprehension:
- Information presentation. The size, color, and animation of visualizations are based on the data, making it possible to quickly spot exceptions and anomalies.
- Information density. Large reports are collapsed into a single visualization, providing a bird’s-eye view of the entire data landscape without the need to scroll through the data.
Advanced visualizations let users make more informed decisions by providing timely, relevant, and accurate information to answer their business questions.
Fundamentally, data visualization is nothing new. But in the past year or two, this field has hit its stride as vendors successfully married rich visualization capabilities with practical tools and technologies (e.g., dashboards and event processing). The best data visualization solutions excel at both style and substance to give users an engaging visual medium atop a strong analytics platform with deep reach into disparate data, ranging from desktop sources to data warehouses. Done right, data visualization can encourage strong user adoption and deliver genuine business insights by transforming slate-gray numeric data into a lively visual environment that highlights outliers and enables drill-through to generate business answers.
What is the advantage of using a different data warehouse solution for event data over my current solution?
While your current data warehouse solution is more than capable of storing event data, the biggest advantage is cost. Performance improvements are worth noting, but most executives are more interested in capital expenditures.
Since the SenSage Event Data Warehouse uses a columnar database for storage, event data is compressed at a very high rate. This reduces storage requirements by an order of magnitude and therefore reduces storage capital expenditures. Additionally, less CPU is required to search the reduced amount of data, which further reduces costs. Finally, software licensing from SenSage is typically less expensive than traditional data warehouse solutions while including the ETL and analytic tools required for event data.
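The high compression rates claimed for columnar storage of event data come largely from how repetitive event columns (status codes, device IDs) are once stored column-wise. A toy run-length-encoding sketch shows the effect; real columnar engines use far more sophisticated compression schemes:

```python
def rle_encode(column):
    """Run-length encode one column: consecutive repeats collapse
    to (value, count) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def rle_decode(runs):
    """Expand the runs back to the original column, losslessly."""
    return [v for v, count in runs for _ in range(count)]

# A row store would repeat "ACCEPT" once per event record; stored as a
# column, 1,502 values collapse to three runs.
status_col = ["ACCEPT"] * 1000 + ["DENY"] * 2 + ["ACCEPT"] * 500
runs = rle_encode(status_col)
```

Less stored data also means fewer bytes scanned per search, which is the CPU saving the answer above alludes to.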
A dedicated event data warehouse can make sense for large organizations with a need to closely manage and analyze log files and other event data to strengthen security and regulatory compliance. Banking, telco, payment processing, and healthcare companies are among those increasingly deploying security information and event management (SIEM) solutions and dedicated event warehouses to manage terabytes of event data, including log information generated by network equipment. Prospective buyers should recognize that SIEM solutions can be complex and be prepared to scrutinize an array of solutions in this rapidly expanding sector before selecting one that best meets their needs.
What’s the difference betweentechnology-focused andbusiness-focused MDM starts?
Technology-focused MDM starts advocate that companies start with a single data type (such as customer), implement MDM using a small footprint (such as registry style), or deploy MDM solely with a data warehouse to improve reporting. These approaches may limit the scope and potential return on investment (ROI) from MDM, since they do not attempt to solve the most pressing and difficult business problems. MDM is more precisely about solving business problems by efficiently managing master data that is critical to a company’s business operations. Consequently, a business-focused approach can provide a complete MDM solution that addresses the specific business problem and provides tangible business value and significant ROI in a short-term timeframe.
A technology-focused approach to MDM can make sense when an organization is suddenly confronted by large volumes of inconsistent and redundant customer, product, or other master data that compromises business performance. A merger or acquisition, for instance, can result in a master data influx of crisis proportions. What’s important is that any tactical effort to attack the problem at a technology level aligns with a broader, business-focused MDM strategy. Technology- and business-focused MDM can and should evolve in lockstep in an incremental, multi-year evolution towards the common goal of a single, trusted set of master data that delivers measurable business value.
How can I speed querying in a data warehouse?
Aggregates are the best way to speed warehouse queries. A query answered from base-level data can take hours and involve millions of data records and millions of calculations. With precalculated aggregates, the same query can be answered in seconds with just a few records and calculations.
High-performance aggregation simplifies the creation, administration, and execution of aggregation jobs. It summarizes data much faster than other aggregation methods such as C programs, SQL statements, or third-party multipurpose data warehouse packages. It provides the necessary flexibility to select the best aggregates for optimizing query performance.
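The speedup from precalculated aggregates comes from doing the expensive pass over base-level records once, so each later query becomes a cheap lookup rather than a full scan. A minimal sketch of the contrast, with illustrative names:

```python
def scan_total(base_rows, region):
    """Answer from base-level data: touches every record, every query."""
    return sum(r["amount"] for r in base_rows if r["region"] == region)

def precalculate(base_rows):
    """One pass over the base data builds the aggregate table."""
    agg = {}
    for r in base_rows:
        agg[r["region"]] = agg.get(r["region"], 0) + r["amount"]
    return agg

# Tiny stand-in for millions of warehouse fact rows.
base = ([{"region": "East", "amount": 10}] * 3
        + [{"region": "West", "amount": 7}] * 2)

agg = precalculate(base)

# Both paths agree; the aggregate answers each query without rescanning.
assert scan_total(base, "East") == agg["East"]
```

The trade-off, as the answer above notes, is choosing which aggregates to precalculate: each one costs storage and refresh time in the ETL window.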
With data volumes growing inexorably, data warehouse query performance is becoming a significant issue that can derail BI effectiveness. Increasing user populations, complex and concurrent queries, and demand for near-real-time information complicate the challenge. Organizations in a performance pinch need to closely examine root causes in their DW environment and select the solution, or combination of solutions, that best addresses their situation. Six common approaches are (1) brute force (more hardware and software licenses), (2) incremental tuning, (3) migrating to a new DW platform, (4) using memory caching or DW appliances, (5) aggregating data subsets for rapid retrieval, and (6) indexing data to accelerate retrieval from base tables.
What are the main benefits of open source data integration?
Open source data integration has matured and is now technically on par with, or superior to, traditional proprietary solutions. Open source brings a different business model to the table. It requires no initial investment, so projects can get started easily, and it carries no per-source/target or per-CPU costs, so deployments are not restricted by funding issues. Users pay only for the support they use. From the openness perspective, it reduces the user’s dependency on the vendor, and additional connectors and features can be built into the product easily. However, open source is not free: technical support, training, and development costs must be considered, but it is still a lot less expensive than the alternatives.
Open source data integration software is proving attractive for departmental or tactical deployments; for resource-constrained government, educational, and nonprofit organizations; and for ISVs and Web 2.0 players using LAMP (Linux, Apache, MySQL, PHP). From a cost perspective, it’s hard to beat free downloads, low-cost support, and a freely available community knowledge base. While it’s steadily maturing, open source is still a ways from matching the big data integration vendors on such capabilities as integrated data profiling and quality, changed data capture, high availability, and robust metadata management. Most organizations running data integration in mission-critical systems will stick with the status quo for now.