Q&A with the Experts
A business intelligence or data warehouse implementation can be a formidable undertaking. In these pages, leading business intelligence and data warehousing solution providers share their answers to the questions they hear often from industry professionals. Mark Hammond, an independent consultant, provides his analyst viewpoint to each Q&A.
How can I accomplish true spreadsheet automation?
Eliminate common problems at their source so that the data is correct, complete, and secure, the formulas and design structure of the document are sturdy and accurate, and the process is repeatable and reliable. Focus on:
- The data. Eliminate manual entry, copy and paste, and .CSV importing as much as possible; instead, manage queries through an enterprise metadata layer to ensure accuracy and security.
- The design. Use a "blueprint-based" design approach and tools specifically meant to improve design practices.
- The distribution. Automate with a server-centric architecture that manufactures pure Excel documents and does not require an Office plug-in to pull data in order to refresh the document.
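The "manage queries through an enterprise metadata layer" idea can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `LOGICAL_QUERIES` stands in for a governed metadata layer, so report authors reference a logical name (with access control) instead of hand-writing SQL or pasting CSV extracts. An in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical enterprise metadata layer: logical name -> (governed SQL, allowed roles)
LOGICAL_QUERIES = {
    "regional_sales": ("SELECT region, SUM(amount) FROM sales GROUP BY region",
                       {"analyst", "finance"}),
}

def fetch(logical_name, role, conn):
    """Resolve a logical query through the metadata layer, enforcing access."""
    sql, roles = LOGICAL_QUERIES[logical_name]
    if role not in roles:
        raise PermissionError(f"role {role!r} may not run {logical_name!r}")
    return conn.execute(sql).fetchall()

# Demo with an in-memory database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 100.0), ("East", 50.0), ("West", 75.0)])
rows = fetch("regional_sales", "analyst", conn)
```

Because every spreadsheet refresh resolves through the same governed query, manual entry and copy-and-paste drop out of the process, and security travels with the metadata rather than with each document.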
Despite glitzier BI applications, spreadsheets remain a highly popular data analysis tool. Loyal users have been willing to tolerate the risk of errors introduced through manual data entry, as well as the time required to load and manipulate data. The business as a whole, of course, is more risk-averse and frowns on unrepeatable processes, redundancy, and duplication of spreadmarts. Spreadsheet automation can offer a significant improvement by centralizing spreadsheet creation and distribution in a consistent framework. Adopters should take pains to implement appropriate access and security mechanisms, and some organizations will want to examine spreadsheet automation from a compliance perspective to ensure verifiable data integrity.
Our company has seen sizable growth in our Internet sales channel over the past few years, but we know very little about the customers who use or don't use this channel. How can we drive even more sales and profitability through this channel?
Integrate customer segmentation data from a third-party source with your transactional data from your Web site to target customers who are not yet using this channel or who abandon a purchase before completion. An online ticket broker recently did this and gained greater understanding of customer buying patterns. Now it can anticipate demand and deploy changes to its marketplace in near real time to accommodate market-changing events.
Third-party demographic data that matches the identity of customers who have purchased from your Web site can provide a huge advantage in building out profiles by age, gender, location, and buying patterns. Look to capture as much information as possible about buyers during your e-commerce transaction processes (short, unobtrusive surveys can be effective). Close analysis of Web site traffic patterns can pay off with revealing trends showing how surfers got to your site, and where they go next. Over time, you can build a foundation of segmented customer information that is ripe for BI analysis to determine sales channel effectiveness and customer preferences and to support targeted marketing campaigns with special offers, e-newsletters, and the like.
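The integration itself is essentially a left join on a shared customer identifier. The sketch below is illustrative only: the field names and the `demographics` feed are hypothetical stand-ins for a purchased third-party data set.

```python
transactions = [
    {"customer_id": "C1", "order_total": 120.0, "completed": True},
    {"customer_id": "C2", "order_total": 45.0, "completed": False},  # abandoned
    {"customer_id": "C3", "order_total": 80.0, "completed": True},
]

# Hypothetical third-party demographic feed, keyed on customer identity.
demographics = {
    "C1": {"age_band": "25-34", "region": "Northeast"},
    "C2": {"age_band": "45-54", "region": "Midwest"},
}

def enrich(txns, demo):
    """Left-join transactions to demographics; unmatched customers get None."""
    return [dict(t, segment=demo.get(t["customer_id"])) for t in txns]

enriched = enrich(transactions, demographics)
# Abandoned purchases, now carrying segment attributes for targeted follow-up.
abandoned = [t for t in enriched if not t["completed"]]
```

In production this would be a warehouse join with identity-matching rules, but the shape of the result is the same: transactional behavior enriched with segment attributes, ready for channel and campaign analysis.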
We have an existing data warehouse that was built to support the business, but usage is lower than expected. How can we foster adoption?
The answer is simple, but the challenge is not. Give the business what it needs, not what you think it needs, or what they ask for the first time out. It is difficult for users to articulate exactly what they want before they see it for the first time. Low adoption or waning adoption is not uncommon once the initial enthusiasm wears off and the limitations become apparent. Revisiting the business purpose and applicability frequently with the users is key to success. Continually ask the question, "What purpose does the data warehouse serve for the business, and how well is it serving that purpose?"
To increase the adoption and value from a data warehouse, it's essential to first understand why usage is lower than expected. Survey the user population on what they like and what they don't. Identify the pain points that users experience. Assemble a focus group that drives collaboration between business and IT. Training sessions provide an opportunity to hear user concerns and communicate the DW value proposition. Recognize that force-feeding new technologies often generates resistance and subpar adoption. If your users are reluctant to abandon Excel, look into Excel add-ons that interface with your data warehouse to help incrementally boost usage.
What's the best way to start a data governance program?
Any data governance program will cover three elements: people, policies, and technology. To begin any effort, companies should assess their level of maturity across each of those elements. For the "people" component, organizations can measure what level of support the governance effort has (executive, director) as well as the number of resources devoted to data stewardship. "Policies" refers to the number and complexity of business rules and controls that the company can implement within the IT environment. Finally, "technology" refers to the data profiling, data quality, and data monitoring capabilities available to enhance and maintain the value of corporate information.
The short answer: keep it simple! Start your data governance program by using a framework of the six basic questions of who, what, where, when, why, and how. This clearly defined structure can bring focus to your efforts and plainly delineate challenges, objectives, and road maps for a diverse group of stakeholders. Organizations need to be sensitive to the risk of a data governance program spiraling into unnecessary complexity and contradictory concepts from the start. By starting simply, you can build a readily understood foundation that progressively matures as participants master the nuances and details typical to data governance. Naturally, organizations will benefit by researching best practices and lessons learned from companies that have adopted data governance programs.
We would like to encrypt our data warehouse in order to be compliant with recent legislation, but we are concerned about performance. Is it possible to encrypt data but not affect data load and data query performance?
Absolutely. Many data warehouse administrators assume that data encryption is expensive, difficult to administer, and will create a significant degradation to performance. This is because most encryption schemes rely on software-based encryption, or they try to employ encryption of data "in flight." With hardware-based encryption for data "at rest," a dedicated encryption engine sits between the database and the disk storage used by each server. Since all data is decrypted before the database software starts processing it, all queries and loads work the same as before, with no significant impact on performance.
Identifying and implementing an ideal encryption solution can be nearly as complex as the encryption itself. The choices typically involve trade-offs in price, performance, and degrees of security. Hardware-based solutions can sidestep the load and query performance penalty inevitable with database-level encryption by offloading encryption to a dedicated machine; however, some of these systems can be expensive and introduce an additional layer of interfaces to manage. Despite the management overhead that both software- and hardware-based encryption can impose, encryption offers the ultimate in data security and is attracting increased attention from organizations with a need to protect sensitive data.
GIS seems to be deployed departmentally; can it be a strategic enterprise resource?
ESRI has developed its products based on open standards, ensuring a high level of interoperability across platforms, databases, development languages, and applications. ESRI's GIS applications are engineered to be an integral part of an organization's strategic IT infrastructure. Our use of Web standards such as XML, SOAP, UDDI, and WSDL has led to successful enterprise SOA implementations where specific location services such as geocoding are being used throughout the organization. Our support of standard developer environments including VB, C++, .NET, and Java has led to integrations with all major BI platforms.
To date, most GIS deployments in mainstream business (outside of specialized utilities and energy systems, for instance) have focused on specific issues such as store location, customer clustering, and logistics. Lately, however, spatial data is increasingly featured in BI dashboards and provides business users with a visual means of understanding and analyzing information in a location-based context. Giving users an opportunity to explore the capabilities of GIS in dashboards is a good way to further adoption across the enterprise. Partnerships between GIS and BI vendors have made spatial analysis an option for many organizations, and, interestingly, Web 2.0 sites such as frappr.com are helping to put spatial data more squarely on the map.
What exactly is ad hoc query and reporting and how can I use it to improve adoption of my BI solution?
Ad hoc tools allow users to spontaneously browse, slice and dice, and discover aggregate intelligence and detailed evidence. This contrasts with more static BI models where subscribers receive reports predetermined by power users.
To provide such an application, relevant datastores (e.g., a data warehouse or internal/external sources) are connected to a data cleansing, text analytics, and high-performance index and query engine. On this platform, one or more ad hoc entry points can be provided (keyword querying, dynamically built drill-down trees, etc.) to access real-time aggregates and distribution charts.
With an ad hoc reporting tool, individuals can view the facts and figures relevant to their tasks, thereby increasing their performance and effectiveness. The combined intuitive simplicity and direct feedback are proven to dramatically increase user adoption of BI tools.
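The "slice and dice" core of ad hoc query can be illustrated with a tiny group-by routine: the user picks the dimensions at query time rather than consuming a predetermined report. The data and field names below are purely illustrative.

```python
from collections import defaultdict

sales = [
    {"region": "East", "product": "A", "revenue": 100},
    {"region": "East", "product": "B", "revenue": 50},
    {"region": "West", "product": "A", "revenue": 75},
]

def slice_and_dice(rows, dims, measure):
    """Aggregate `measure` over every combination of the user-chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)

# The same data answers two spontaneous questions without a new report build:
by_region = slice_and_dice(sales, ["region"], "revenue")
by_region_product = slice_and_dice(sales, ["region", "product"], "revenue")
```

Commercial tools layer metadata, security, and a visual front end over this idea, but the essential contrast with static reporting is exactly this: the dimension list is a runtime parameter, not a design-time decision.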
Theoretically, ad hoc query and reporting empowers enterprising business users to explore data by drilling down, slicing and dicing, and conducting what-if analysis. The dividends can be huge if a user discovers an unrecognized relationship that bears on efficiency or the bottom line. On the other hand, organizations need to recognize that extending ad hoc query across a broad user base can mean that some users will waste hours running queries of little business value, sometimes merely to satisfy intellectual curiosity. Organizations will profit by a judicious allocation of ad hoc query capabilities and by monitoring the usage and overall effectiveness of these analytic tools.
How important is metadata for successful data warehousing projects?
Extremely important. A centralized metadata repository for information integration, one that supports analysis and active sharing of metadata artifacts across a full range of integration activities and user roles, increases the trust and understanding of the information. These artifacts include: technical metadata about the various sources of information; business metadata that describes the business meaning and usage of information; and operational metadata that describes what happens within the integration process. As companies seek to gain a more comprehensive viewpoint of their information, they need a way to keep track of how these different sources are related and how information is being pulled together and presented in reporting and analysis tools.
The need for regulatory compliance is a high-profile business driver that is prompting organizations to pay more attention to the management of metadata in data warehousing and operational systems. Leading metadata management tools feature robust lineage tracking, change analysis, and auditability features that can help organizations better account for the iterations and lifecycle of enterprise information at a granular metadata level. However, organizations need to be wary of the risk of architecting metadata silos and should strive towards standards-based, extensible metadata models that reduce, rather than add to, complexity in the data management infrastructure.
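Lineage tracking, one of the metadata capabilities mentioned above, reduces to a graph walk over "which sources feed this artifact" records. The following is a deliberately minimal sketch with hypothetical artifact names, not a model of any particular repository product.

```python
# artifact -> set of direct upstream sources
lineage = {}

def record(artifact, sources):
    """Register the direct upstream sources of a derived artifact."""
    lineage.setdefault(artifact, set()).update(sources)

def upstream(artifact):
    """Return all transitive upstream sources of an artifact."""
    seen, stack = set(), [artifact]
    while stack:
        node = stack.pop()
        for src in lineage.get(node, ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen

record("staging.orders", ["crm.orders", "erp.orders"])
record("dw.fact_sales", ["staging.orders", "ref.currency"])
record("report.q3_revenue", ["dw.fact_sales"])
```

A compliance question such as "which operational systems contributed to this quarterly revenue figure?" then becomes a single `upstream()` call, which is precisely the auditability that centralized metadata repositories are meant to provide.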
What is the role of data integration for enterprise data warehousing?
A critical consideration for deploying enterprise data warehousing is creating an architecture that can support the data requirements of all types of applications. This is a departure from traditional thinking, in which the data warehouse was built to support a relatively limited set of reports and analytics, and was tightly coupled to the business intelligence application.
In contrast, the enterprise data warehousing (EDW) process must function atop a data integration platform that powers the ongoing lifecycle of data access, discovery, quality, integration, and delivery via data services that can support multiple applications as part of a service-oriented architecture (SOA).
Because it spans multiple divisions and geographic locations and is aimed at improving enterprise performance, enterprise data warehousing puts an even greater premium on the quality, consistency, and accuracy of data. As a result, the data integration processes that support EDW are arguably the most important element in the EDW equation. Perfecting data integration technologies and processes for EDW can be a significant challenge given the breadth and complexity of enterprise data and the range of disparate source systems. However, organizations that sidestep this critical element introduce the significant risk of the "garbage in, garbage out" phenomenon and can undermine the value that EDW offers.
Use of business intelligence and analytics is not new. How have customers' requirements for these decision-support solutions changed in the last few years?
The change occurs in two areas: people and data. Organizations access and analyze less than 20 percent of the data they collect from their business processes and systems. Nontechnical managers, operational employees, business partners, and customers demand access to this information, and we have an opportunity to address the demands of masses of users. The interface for BI is no longer a complex tool or a manually generated spreadsheet; instead, it takes the form of custom applications, dashboards, operational scorecards, maps, and data visualization elements. Data architectures for BI must go beyond the traditional data warehouse and incorporate operational and third-party data, as well as new types of data.
Over the past few years, broadened BI adoption has given organizations a good read on the insights and value that may be generated by an effective BI deployment. Now customers are looking to build on success in several key areas, including: 1) enhancing the business relevance of data; 2) enabling more proactive decision making; 3) tightening real-time BI integration with business processes; and 4) improving end-user trust in data accuracy. To achieve those objectives, customers are looking for standards-based solutions that integrate in an SOA and reduce the complexity of siloed BI deployments, transitioning towards holistic enterprise BI systems.
How does Microsoft plan to democratize performance management?
Traditional inhibitors to the broad adoption of performance management have been high costs associated with implementations, complex tools and user interfaces that require costly and time-consuming training for employees, and confusion over disparate systems and tools for the various capabilities, including planning, budgeting, forecasting, analytics and scorecarding. Performance management tools and processes traditionally have also been siloed or stovepiped, meaning that they sit outside the day-to-day business processes of most employees. Office PerformancePoint Server 2007 was designed to address and eliminate these inhibitors, enabling performance management across the enterprise, not just for the CFO and financial analyst.
Giving lower-level employees tools that encourage contributions to enterprise performance is a sound idea. The concept of democratizing performance management, a key theme in Microsoft's rollout of its PerformancePoint CPM solution, will prompt customer organizations to ask what performance management means for a CFO versus a call center rep versus other user roles. Organizations that extend greater performance management to lower-level employees will need to safeguard information access to ensure security of sensitive sales, productivity, and other performance-related information.
What's the best way for executives to consume all of the data needed to manage their businesses?
Every day, executives and managers are faced with the challenge of reviewing volumes of data to make important business decisions. Traditionally, managers would have to find, run, and review dozens of reports and piece together similar data among those reports. With the new breed of dynamic dashboards, information from a dozen or more reports can be compressed into one expressive dashboard. Intuitive on-dashboard controls allow users to flip rapidly through interrelated information views without changing focus or context. The dashboard buttons and selectors are easy to use, so users don't need training. Managers can now find all of the interrelated data in a single dashboard.
Dashboards and scorecards have emerged as preferred ways for many executives to view performance information and quickly find, analyze, and explore the data they need to perform their jobs. Which solution is selected will depend on business needs, and it's worthwhile to understand the subtle distinction between the two. Dashboards are geared to inform users what the organization is doing. Scorecards emphasize how well they are doing it by tracking progress against metrics. Clarifying business objectives for a dashboard (management) or scorecard (monitoring) at the start will help your organization select and deploy the optimal solution for executives to consume and react to business information.
What can my organization gain by integrating business intelligence technologies with service-oriented architectures (SOA)?
First, it can gain an understanding of the business performance impact of business processes. Improving business results requires detailed business insight. Without enough insight into the impact and contribution of a business process on overall performance and goals, it is difficult to determine what changes should be made to improve business performance.
Second, it can help business users take action as a result of insight. When business users gain insights that flag a business performance issue, the resulting corrective actions often require invoking a business process. These processes could now be started directly from the BI dashboards used to analyze performance, thereby improving efficiency and agility.
Business intelligence is a natural fit for the service-oriented architecture (SOA). Properly implemented, an SOA can provide a loosely coupled and extensible environment in which previously siloed BI applications can interoperate, an important step toward realizing the ideal of enterprise data warehousing and enterprise performance management. Query and reporting can be rendered as services (executing in concert with requisite data integration services) that readily extend the functional reach of analytics across a broader number of source systems. Start with small, quick-hit projects that deliver high value and build a foundation of modular components and best practices that can be incrementally extended across your infrastructure.
What can BI vendors do to become more customer-centric?
Given the increasing adoption and strategic importance of BI, customers are now expecting greater transparency from their BI vendors. In many instances, customers and prospects want access to product documentation, product road maps, and unfiltered feedback from other users, or even trial software for products that they have not purchased. Customers shouldn't be afraid to ask for any of these, and vendors shouldn't hesitate to provide them. BI licensing is another area that needs more transparency. Customers hate repurchasing products or paying enablement fees for products they thought they already had paid for.
Steady growth in the availability and functionality of open source BI software is beginning to drive greater customer-centricity in the BI market. Traditional BI vendors are having to rethink licensing terms, free trials, upgrade cycles, and source documentation as the number of open source BI providers grows and more enterprises explore open source BI as a lower-cost, quick-hit alternative to traditional solutions. On the technology side, commercial vendors are challenged to answer increased customer demand for highly flexible, SOA-ready applications based on open standards that can be readily embedded in existing systems. As open source BI continues to evolve, transparency and flexibility in meeting customer expectations will be key.
Why is it important to empower business users to own and manage CPM solutions?
Businesses change continuously. IT doesn't understand the nuances of the changes, and the business users can't implement them. As a result, systems are months behind business requirements. It is important that business users are empowered to react to change and implement the changes on their own. This is most critical for corporate performance management (CPM) solutions. With budgeting or variance analysis reporting, or a strategy review, the needs of users are dynamic and often ad hoc; it is essential that business users model and implement changes. CPM offerings enable the business user to own, model, and maintain the solutions. This, along with the advantage of integration with ERP and built-in governance, risk, and compliance (GRC), can help business users implement changes without depending on IT.
The desirability of business owners using and managing corporate performance management (CPM) solutions reflects a broader trend towards greater business-side involvement in and management of BI systems. Organizations are gravitating toward a more user-centric BI model and away from the IT-led, systems-oriented data management practices of the past. Achieving this ideal requires greater collaboration between business and IT, with tech-savvy business sponsors who can bridge the gap between corporate objectives and supporting technology. CPM systems are an obvious target for business ownership because they are near the top of the BI food chain and are expected to enable game-changing insights by putting useful performance management information in the hands of executives and managers.
How can I extract intelligence from textual data?
Text is very different from the conventional, structured data held in customer data warehouses, but it is rich in context. You can mine text (for example, call center notes or responses to survey questions) to extract concepts, the most important and relevant words and phrases. These concepts, along with information on how they are linked to each other and the sentiment (positive/negative) associated with them, are effectively structured data. They can be merged with behavioral, descriptive, and attitudinal data to build a holistic customer view for predictive analytics.
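To make "concepts plus sentiment become structured data" concrete, here is a deliberately naive sketch: a fixed concept list matched against lowercased text and a crude word-level sentiment score. Real text analytics products use far richer linguistic models; the lexicons and note below are invented for illustration.

```python
import re

# Toy lexicons: a real product would derive these linguistically.
CONCEPTS = {"late delivery", "billing error", "friendly staff"}
POSITIVE = {"great", "friendly", "resolved", "happy"}
NEGATIVE = {"late", "error", "angry", "refund"}

def analyze(note):
    """Turn one free-text note into structured concept and sentiment fields."""
    text = note.lower()
    words = re.findall(r"[a-z]+", text)
    found = sorted(c for c in CONCEPTS if c in text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {"concepts": found, "sentiment": score}

row = analyze("Customer angry about late delivery; billing error resolved.")
```

The returned dictionary is exactly the kind of record that can be joined to a customer key and loaded alongside behavioral and descriptive attributes for predictive modeling.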
Vendors have made significant strides in the sophistication and functionality of text analytics software in recent years. The "how" of text mining will vary by product, but the "why" is common across many organizations: unstructured and semistructured data is necessary for organizations that want to achieve a fully mature view of enterprise performance. TDWI research shows that organizations' use of unstructured data in data warehouses is on the rise, while vendors and data warehousing professionals are working to refine techniques and technologies needed to transform it into structures meaningful to a warehouse or to a reporting tool.
How can I increase user scalability in a distributed analytics environment?
A Sybase IQ multiplex configuration provides virtually unlimited user scalability. To increase the number of users, add a Sybase IQ node running on an additional server. Each node can support different users or manage different reporting or query workloads. Nodes can also partition the user base by service quality, so that higher performing nodes can support high value users or applications.
Multiplexing enables Sybase IQ to optimize workloads across multiple servers within a single environment. With one writer node and multiple reader nodes, adding additional applications and users to a database can be solved simply by adding reader nodes. The impact is minimal, as nodes do not depend upon or interfere with each other.
Grid computing environments are gaining attention as viable platforms to enhance scalability for large-scale data warehousing as RDBMS, BI, and data integration vendors increasingly add native functionality to take advantage of multi-node clusters. These platforms will feature load-balancing algorithms to distribute queries across nodes based on resource utilization (e.g., CPU usage, memory, disk speed), and enable administrators to prioritize task execution based on user role, time of day, short- and long-running queries, and other criteria. Naturally, diligent trial-and-error application of performance tuning techniques, partitioning, and parallel processing can also pay significant scalability dividends. Apart from scalability, grid systems offer an opportunity to build in high availability and failover to better ensure system continuity.
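The load-balancing idea described above can be sketched as a simple dispatcher: route each query to the eligible node with the most spare capacity, reserving a premium tier for high-value users. Node names, tiers, and the cost model are all illustrative assumptions, not any vendor's scheduler.

```python
nodes = [
    {"name": "reader-1", "capacity": 1.0, "load": 0.2, "tier": "standard"},
    {"name": "reader-2", "capacity": 2.0, "load": 0.9, "tier": "premium"},
    {"name": "reader-3", "capacity": 1.0, "load": 0.7, "tier": "standard"},
]

def route(query_cost, priority="standard"):
    """Pick the eligible node with the most spare capacity, then book the work."""
    eligible = [n for n in nodes
                if priority == "standard" or n["tier"] == priority]
    best = max(eligible, key=lambda n: n["capacity"] - n["load"])
    best["load"] += query_cost
    return best["name"]
```

Adding a reader node is just another entry in `nodes`, which is why reader/writer architectures scale user counts so cleanly: the dispatcher, not the application, absorbs the topology change.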
As data volumes continue to grow, uploading data to the data warehouse has become much too time-consuming. Is there a way to speed this process?
Uploading data can be simplified by using a data management solution with changed data capture (CDC). Also known as delta processing, the technique compares old and new files to identify changes. Only the changed data is updated in the master file, reducing the amount of data uploaded to a data warehouse, thus improving performance.
In a simple example, create a join using yesterday's transaction file as one source and today's transaction file as the other. Next, define the join key from yesterday's data to be the entire record, and define the join key from today's transaction file to be today's entire record. Keep matched records, unmatched records from yesterday's data, and unmatched records from today's data. Create three conditions (Added, Deleted, and Updated) using a tool such as Syncsort's DMExpress. Finally, add a target file and partition the data by these three conditions, creating one output file for each condition.
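The steps above can be sketched in plain Python. For clarity this version keys the comparison on a business key (`id`) rather than the entire record as in the DMExpress example, and then partitions the result into the same three conditions: Added, Deleted, and Updated.

```python
def delta(yesterday, today, key="id"):
    """Compare two daily snapshots and return (added, deleted, updated) rows."""
    old = {r[key]: r for r in yesterday}
    new = {r[key]: r for r in today}
    added   = [new[k] for k in new.keys() - old.keys()]
    deleted = [old[k] for k in old.keys() - new.keys()]
    updated = [new[k] for k in new.keys() & old.keys() if new[k] != old[k]]
    return added, deleted, updated

yesterday = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
today     = [{"id": 2, "qty": 4}, {"id": 3, "qty": 7}]
added, deleted, updated = delta(yesterday, today)
```

Only `added`, `deleted`, and `updated` need to travel to the warehouse, which is the entire point of CDC: the load volume tracks the rate of change, not the size of the table.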
Though not new, changed data capture (CDC) has found its sweet spot with the advent of the real-time data warehouse. Available as a built-in or optional component in leading ETL tools, CDC can drastically reduce the volume of data to be moved into a warehouse. Reducing data loads by 85 percent can reduce load times at a similar rate, making multiple intraday loads practical. The additional benefits of minimizing impact on operational applications, bandwidth, and I/O make CDC worth considering by any organization that wants to freshen its warehouse data to enhance its business value.
How do we enhance the level of end-user confidence in the data warehouse?
To increase end users' confidence in the data warehouse, it is imperative to build a transparent audit process using business controls for the data integration. Business controls are essentially sets of data elements at higher levels of grain that determine business performance, e.g., sales revenue or the number of new businesses acquired. It is essential to establish an automated balancing control process through alternate data flows and to establish a baseline for correctness. Approaches such as general ledger balancing for financial metrics also help.
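A balancing control ultimately reduces to recomputing a high-level metric through an alternate flow and flagging drift beyond an agreed tolerance. A minimal sketch, with an assumed 0.1% default tolerance:

```python
def balance_check(warehouse_total, source_total, tolerance=0.001):
    """Compare a warehouse control total against an alternate-flow total.

    Returns (passed, relative_difference). The tolerance is a relative
    threshold agreed with the business for this control metric.
    """
    diff = abs(warehouse_total - source_total)
    rel = diff / source_total if source_total else diff
    return rel <= tolerance, rel

# Example: warehouse sales revenue vs. a general-ledger-derived total.
ok, drift = balance_check(1_000_250.0, 1_000_000.0)
```

Running such checks automatically after every load, and publishing the results to users, turns data quality from an assertion into an auditable fact.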
It's essential first to gauge the degree of end-user confidence in a data warehouse, and to pinpoint specific issues that may be undermining user confidence (and causing underutilization of the system). Depending on the gravity of the situation, this effort could be either an informal survey and follow-up, or a structured program that spans multiple divisions and extends for several months. By assessing end-user confidence in such areas as data quality, data timeliness, ease of use, and query/reporting effectiveness, organizations can build a road map to attack problem areas and continually enhance the business value of DW systems.