The True Cost of Integration in the World of BI
The cost of data integration is widely misunderstood by those building BI systems. It requires more understanding than you think, and it takes more time than you expect.
By David S. Linthicum, SVP Cloud Technology Partner
Loraine Lawson has done a tremendous job covering the data integration space for many years. I recently caught one of her posts that resonated with me, and it should resonate with you, too, if you’re looking to implement systems in the world of BI.
According to Loraine, “When it comes to the cost of a BI deployment, it’s not the software that will get you; it’s the miscellany -- the miscellaneous integration work, in particular.” She was referring to Forrester analyst Boris Evelson, who focuses on application development and delivery and recently published a report that explored how to estimate the cost of a BI deployment.
Evelson applied the 80/20 rule in a previous post about BI costs, stating that the initial design and build of the data integration should account for about 80 percent of your start-up costs, compared with about 20 percent for the reporting and analytics tasks. Considering my background in data integration, I did not find that surprising, but perhaps you did.
Indeed, it’s logical that integration is -- and will be -- a significant cost for any BI project. The data integration hassles are legendary and include bringing together all relevant data from various operational systems that were never designed to feed BI systems. As Lawson’s post notes, “When you look at ongoing costs, though, the roles reverse, making data integration 20 percent of the costs versus reporting and analytics.”
Why so expensive? Well, it’s data integration. It’s hard, it’s complex, it’s perhaps one of the most difficult jobs in the world of BI, and it’s often unsung. However, it’s also critical. Indeed, the ability to bring in the right data on a timely basis is directly related to the value that the BI system will bring to the business -- more so than the types of analytics it’s looking to drive. It’s the old “garbage-in-garbage-out” concept.
The problems that most professionals who leverage data integration for BI need to solve include the following (a brief sketch after the list illustrates several of them):
- The ability to connect to different operational data stores using whatever interface is provided, if any
- The ability to consume the data at the volumes required, making sure not to alter the quality or content of the data
- The ability to deal with various semantics and structures, and alter the semantics and the structures as the information flows from the source operational data store to the BI target
- The ability to deal with exceptions and manage each so as not to stop the flow of data
- The ability to maintain data governance and data security, making sure that all laws and policies are followed
- The ability to alter the flow of the data as needed; this includes adding and deleting systems as required by the analytical engine
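To make several of these concerns concrete, here is a minimal Python sketch -- not from Lawson’s or Evelson’s work; the source fields, record shapes, and names are invented -- showing a batch pulled from an operational source, reshaped to the BI target’s structure, and processed so that a bad record is logged and skipped rather than stopping the flow.

```python
# Minimal sketch of the per-record concerns above: extract a batch, transform
# semantics/structure, and isolate exceptions so one bad record does not halt
# the flow. All names and data here are hypothetical.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bi_integration")

# Stand-in for rows pulled from an operational source via whatever interface
# it provides (API, JDBC, flat-file export, and so on).
SOURCE_ROWS = [
    {"cust_no": "1001", "fname": "Ada", "lname": "Lovelace", "rev": "1200.50"},
    {"cust_no": "1002", "fname": "Alan", "lname": "Turing", "rev": "not-a-number"},
    {"cust_no": "1003", "fname": "Grace", "lname": "Hopper", "rev": "980.00"},
]

def transform(row):
    """Map source semantics and structure to the BI target's canonical form."""
    return {
        "customer_id": int(row["cust_no"]),
        "customer_name": f'{row["fname"]} {row["lname"]}',
        "revenue": float(row["rev"]),  # may raise ValueError on bad source data
    }

def load(target, record):
    """Append to the BI target (here, just an in-memory list)."""
    target.append(record)

def run_batch(source_rows, target):
    """Process a batch, logging exceptions per record instead of halting."""
    loaded, failed = 0, 0
    for row in source_rows:
        try:
            load(target, transform(row))
            loaded += 1
        except (KeyError, ValueError) as exc:
            failed += 1
            log.warning("Skipping bad record %s: %s", row.get("cust_no"), exc)
    return loaded, failed

if __name__ == "__main__":
    bi_target = []
    ok, bad = run_batch(SOURCE_ROWS, bi_target)
    log.info("Batch complete: %d loaded, %d routed to exception handling", ok, bad)
```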
Thus, the initial approach when creating a BI system should be to understand where the information will come from to support the required analytics. This includes creating a data integration strategy along with selecting some data integration technology to solve the problem.
Although many people believe this is a rather simple problem to solve, one that merely requires selecting and implementing any of a number of data integration engines, nothing could be further from the truth. It requires careful planning, including an understanding of all of the data in the source systems and what that data means.
Many people approach BI differently, but I typically take a data-driven approach. That means starting with the information at the source and understanding it: What does customer information mean in one system versus another? What about the differences in the way the data is structured? What about compliance issues? All of these concepts (and more) should be addressed.
Once we understand what data is present and how it can be externalized to the BI system, we have our starting point. For the best results, it helps to work with your data as abstracted data -- meaning you place a more solid, well-defined structure on top of the physical databases. That abstraction may live within the data integration technology, within the BI tool(s), or within your own designs.
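As a hypothetical illustration (the source systems, field names, and mappings below are invented, not drawn from the article), the abstraction can be as simple as a canonical record that every source is mapped into before the BI layer ever sees it:

```python
# One way to put a "more solid and well-defined structure" on top of differing
# physical sources: a canonical record plus per-source mapping functions.
# The source systems and field names are illustrative only.

from dataclasses import dataclass

@dataclass
class Customer:                      # the abstracted, BI-facing structure
    customer_id: int
    full_name: str
    country_code: str                # normalized to a two-letter code here, once

def from_crm(row: dict) -> Customer:
    """The CRM stores the name in one field and a two-letter country code."""
    return Customer(
        customer_id=int(row["id"]),
        full_name=row["name"],
        country_code=row["country"].upper(),
    )

def from_billing(row: dict) -> Customer:
    """The billing system splits the name and spells out the country."""
    country_map = {"United States": "US", "Germany": "DE"}  # illustrative only
    return Customer(
        customer_id=int(row["acct_num"]),
        full_name=f'{row["first"]} {row["last"]}',
        country_code=country_map.get(row["country_name"], "??"),
    )

# Both sources now externalize the same shape to the BI layer:
print(from_crm({"id": "7", "name": "Ada Lovelace", "country": "us"}))
print(from_billing({"acct_num": "7", "first": "Ada", "last": "Lovelace",
                    "country_name": "United States"}))
```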
Data latency issues must also be addressed, including how often data will flow from the source system to the analytics engine or to intermediary storage. Of course, everyone wants something approaching real time. However, that is not always possible given the limits of operational data sources; refreshes several times a day, week, or month are the typical limits you’ll encounter. The trick is to understand those constraints up front and build the BI system around them.
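One lightweight way to build around those limits is to make each source’s realistic refresh window an explicit part of the design rather than an afterthought. A small sketch, with hypothetical sources and intervals:

```python
# Make latency limits explicit up front: each source declares how often it can
# realistically be pulled, and any combined view is only as fresh as its
# slowest source. Source names and intervals are hypothetical.

from datetime import timedelta

REFRESH_LIMITS = {
    "crm":        timedelta(hours=4),    # operational API allows a few pulls per day
    "billing":    timedelta(days=1),     # nightly export only
    "web_events": timedelta(minutes=15), # near-real-time feed
}

def freshest_possible(sources):
    """The combined view can only be as fresh as its slowest source."""
    return max(REFRESH_LIMITS[s] for s in sources)

# A dashboard joining CRM and billing data can be, at best, a day old:
print(freshest_possible(["crm", "billing"]))   # -> 1 day, 0:00:00
```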
The cost of data integration in the world of BI is not fully understood by most who attempt BI. The good news is that we are beginning to understand the role and the true cost of data integration better than we did in years prior.
David S. Linthicum is a big data and cloud computing expert and consultant. He is the author or co-author of 13 books on computing, including Enterprise Application Integration (Addison Wesley). You can contact the author at www.davidlinthicum.com.