Data Warehousing on a Shoestring Budget (Part 3 of 3)
You can implement data warehouse solutions on a small budget by focusing on system, database, ETL, and reporting technologies that work in concert with requirements gathering, development, testing, and training. In the final part of our series, we look at how to choose the right data warehouse project.
by Nathan Rawling
Making informed decisions about hardware, software, and staffing, as detailed in the prior two parts of this series, are important facets within every data warehousing project. However, a final (and critical) aspect of completing a data warehouse on a shoestring budget is choosing the right project for your organization.
There are data warehouse requirements that are beyond simple limits -- a project with several source systems on different mainframe platforms; data with multiple languages and character sets; extreme availability requirements; and vast data volumes. These are just a few examples of such overwhelming requirements, particularly in combination, which make shoestring budgets impractical or unsatisfactory. For most organizations, though, there are several opportunities for relatively straightforward data warehousing projects that do not require a substantial investment to produce results.
An organization building a data warehouse on a limited budget must select projects that meet four requirements:
- Provide a clear benefit to the business
- Utilize a small number of source systems
- Work with source systems that are well understood
- Allow creation of straightforward data maps
The goal must be to build a data warehouse that will allow your organization to begin realizing the benefits of such an implementation with a minimum investment. By building slowly and experiencing a few small successes, organizations will be able to better justify making larger investments in data warehousing and pursuing future projects of larger scope.
When success factors are not considered and/or pursued, a common "horror story" of data warehousing implementations often emerges. The process seems to be endless, utilizing precious budgets and resources, without bringing any real benefits to your organization. Perhaps it is the result of scope creep, technical breakdowns, shifts in organizational direction, or simply communication failures. Regardless, addressing success factors and breaking down projects into smaller phases can help you avoid the "warehouse without end" scenario and allow for ready execution of projects in a timely, cost-effective manner. A careful look at the business benefits of the data warehousing project will help you to choose an appropriate project.
A data warehouse is rarely an end in itself. It is a tool to help your organization handle data more effectively, answer questions, analyze data, and achieve success. A thriving data warehousing project must bring data together or reorganize data so it can be used in new and innovative ways.
Consider the real life importance of the data warehouse. If customers are stored in one system, but sales are stored in another system, it can be difficult or impossible to report on sales for each customer or per channel. Similarly, if sales are stored separately from production costs, calculating the unit margins can pose a real challenge. Simply put, the goal of the data warehouse is to make an organization operate better.
Selecting Initial Projects
Finding the right data warehousing project to begin with is never easy. However, many common requests for a data warehouse can serve as potential starting points, such as financial reports, management dashboards, purchases of third-party data (such as market-share information), and cross-functional reporting. A major part of managing any data warehousing project is controlling the scope. These example projects all lend themselves to situations where the needs of the users can be separated into "must-have," "nice-to-have," and "future-capabilities" categories to isolate the immediate business needs from features with possibly limited returns. Focusing development on immediate needs make it possible to deliver a data warehouse solution with minimal scope creep on a tight budget.
Quarterly and biannual financial reports often bring together many sources into a spreadsheet. In addition to the sheer volume of manual effort and time required to complete such reporting requirements, the approach can introduce typographical errors and frequently, the visual appeal of the final product is unsatisfactory. Often, though, the most difficult challenge in transitioning this process to a data warehouse is collecting the business rules contained within the spreadsheets. If the required raw data is available in existing source systems, the potential savings can make such a project truly shine.
Certain financial reporting projects, can realize substantial benefits to the organization through simple consolidation functionality and available analytical tools, especially when the number of sources is limited (i.e., one budget system and one general ledger). Also, the potential savings from eliminating manual processes from the organization's account practice can justify data warehouse expenditures.
Dashboards pose a unique challenge. Typically, they summarize data at a high level from source data that is finely detailed. Sliding windows, fall-back criteria, and other rather difficult requirements can quickly complicate a dashboard view. Dashboard performance is typically a challenge as well. Running a dashboard directly from a live system may provide a popular real-time view, but to generate monthly, quarterly, or yearly statistics, the dashboard will have to scan enormous volumes of source data and aggregate it.
The visual appeal of dashboards is hard to deny, though. The key to maximum DW performance is for dashboard developers to gather detailed data into the warehouse, where it should be aggressively aggregated, rather than forcing the reporting suite to perform the aggregations in real-time. In some cases, the dashboard's source data might come from only one or two systems. However, those systems are transactional and not capable of producing answers rapidly enough to satisfy end users. In such situations, data warehousing techniques can quickly and efficiently show benefits. A report that in the past might have taken 45 minutes can be optimized to run in seconds just by creating high-level aggregate tables.
Third-Party Information/Unstructured Data ETL
Many organizations receive data from third parties. Whether from electronic orders, through electronic-data-interchange (EDI), purchased data, customer profiles, or other standardized data where the format is "take what we give you," the data from third parties is almost always produced in a structure convenient to the third party. This format is not likely to be ideal for every organization, which makes third-party data tricky to work with.
Unstructured data from documents such as reports, spreadsheets, Web sites, and other non-database sources can similarly hold useful information that is difficult to access in a consolidated way. Unstructured data ETL is a relatively new feature for traditional ETL tools, but it can provide a view into this repository of business information which was previously unavailable.
However, DW technologies can combine such data with internal systems so end users can make better, more-informed decisions. The combination of internal relational data with unstructured data and/or third-party data typically means building conforming dimensions, which is an ideal opportunity for a data warehouse.
Many organizations have data stored in silos by function. Your organization may, in fact, have a database of customer contacts used by your sales team, a scheduling system used by field technicians, and a general ledger application used by the financial team.
Unfortunately, such data silos do not reflect how a business truly operates. At some point, a call to a customer generates an order; a field technician is scheduled to install a system; and invoices are sent. How can an organization figure out the average revenue per sales call or the average cost per installation? Can an organization detect fraudulent transactions in any of the three silos without looking in the others?
A data warehouse can become a powerful tool in cross-functional reporting as it can help an organization answer these types of questions and provide a more coherent view for management.
It is certainly possible to build a data warehouse on a shoestring budget. Discipline when choosing technology, accounting for the human costs of data warehousing, and selecting the right project can help your organization reap the benefits of data warehousing without becoming financially overextended.
Each step in the data warehousing lifecycle allows you to balance requirements and costs. Moreover, taking a fresh and realistic approach to reviewing current organizational capabilities can help you avoid spending on unnecessary technology features or selecting projects that simply are too complicated to justify until more resources are available.
Data warehousing can be an outstanding solution for organizations seeking to maximize its information. It is, however, a process that should be undertaken with a full understanding of what is needed and how organizational demands can be reconciled with the bottom line.
Nathan Rawling is a data integration/business intelligence consultant with ISA Consulting, a systems integration firm. You can contact him at firstname.lastname@example.org.