How Operational and Informational Data Management Have Converged
Three examples demonstrate how insights obtained through incremental and real-time analytics can enhance strategic decisions and operational processes.
By Jesús Barrasa, Ph.D., Senior Solutions Architect, Denodo
For years there has been a clear divide, both technological and conceptual, between the operational and informational aspects of data management. This has long been imposed by the limitations of the technologies supporting each of these spaces and the data stack they enforced on data processing. This reality even shaped organizations by splitting data owners and subject matter experts from analysts. Business intelligence (BI) professionals had to live with the fact that pulling real-time data from transactional systems for analysis could not be done without significantly affecting operational performance. Similarly, transactional application designers had to accept that data analysis could not be carried out in time to feed relevant insights into operational processes.
However, the introduction of in-memory databases, specialized storage, and massively parallel processing (MPP) is helping to overcome these limitations. In the transactional space, it is now possible to apply analytics to day-to-day business operations to make them more effective, and this knowledge-driven approach is attracting growing attention across industries. Similarly, in the informational space, enriching strategic decision-making with insights gleaned from the continuous analysis of "big" operational data is quickly becoming a reality.
Today, these technology and business drivers are bringing about an effective convergence of the operational and informational spaces in data management. This is evident in the shortening of the data pipeline -- collection, analysis, decision, action -- to the point where it becomes a fast, agile, fully automated continuum driven by models and algorithms. The result is a dramatically shorter interval between the discovery of actionable knowledge and the action taken on it. From an architectural perspective, the convergence is also evident in the collapse of the data stacks: removing unnecessary replication has markedly reduced the number of technology layers involved in processing data.
However, at the same time, the adoption of these enabling technologies is contributing to the creation of a much more dispersed and heterogeneous data management ecosystem, which does not necessarily go hand in hand with the elimination of the technical and organizational silos. One consequence is the requirement for some degree of simplification to abstract the business from this underlying complexity. In the data integration layer, this simplification is coming from the application of modern, lightweight integration approaches such as data virtualization.
The following three examples demonstrate how insights obtained through incremental and real-time analytics can not only enhance strategic decisions but also be automatically translated into rules and injected back into operational processes, with significant benefits. In all three scenarios, the convergence of operational and informational spaces described earlier is apparent.
The first example revolves around dynamic pricing in e-commerce. Online retailers are increasingly using algorithmic approaches to implement repricing strategies in order to remain competitive. The algorithms driving the price fluctuations are typically based on intensive data munging and the application of pattern detection models to data retrieved in real or near-real time. The data can include intelligence from monitoring competitor prices, customer behavior, or both, along with other data streams.
This real-time analysis not only adds value by helping to drive an overarching sales strategy; it can also improve search engine marketing performance. For instance, real-time analysis can set rules to disable specific ads for a given product when it is priced higher than the competition, or to reduce search keyword bids when the data suggests that a competitor is out of stock on a shared item. From a long-term perspective, the real-time view of ad performance enables the definition of more agile and effective marketing strategies.
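To make the two rules above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `ProductSnapshot` and `AdDecision` types, the `decide_ad` function, and the `bid_cut` factor are hypothetical names, not part of any ad platform's API. How the two rules should interact when both apply is a design choice the text leaves open; this sketch simply applies them independently.

```python
from dataclasses import dataclass


@dataclass
class ProductSnapshot:
    """Hypothetical near-real-time view of one product shared with a competitor."""
    sku: str
    our_price: float
    competitor_price: float
    competitor_in_stock: bool
    keyword_bid: float


@dataclass
class AdDecision:
    """Hypothetical outcome to be pushed back into the ad campaign."""
    sku: str
    ad_enabled: bool
    keyword_bid: float


def decide_ad(snapshot: ProductSnapshot, bid_cut: float = 0.5) -> AdDecision:
    """Apply the two rules from the text to a single product."""
    # Rule 1: disable the ad when we are priced higher than the competition.
    ad_enabled = snapshot.our_price <= snapshot.competitor_price

    # Rule 2: lower the keyword bid when the competitor appears out of stock.
    # The size of the cut (bid_cut) is an assumed tuning parameter.
    bid = snapshot.keyword_bid
    if not snapshot.competitor_in_stock:
        bid = round(bid * bid_cut, 2)

    return AdDecision(snapshot.sku, ad_enabled, bid)
```

In practice the snapshot would be refreshed continuously from price monitoring and inventory feeds, and the decision would be applied to the campaign automatically rather than reviewed by hand.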
Another use case is that of equipment failure prediction, failure impact analysis, and root cause analysis. In this example, consider the oil and gas industry (although the same ideas could be applied to manufacturers or telcos). Activities such as exploration and production generate large amounts of heterogeneous data from sensors, logistics, and business operations. The diversity and fragmentation of the data produced make gaining actionable insights a significant challenge.
However, new technologies such as specialized big data stores or in-memory computing make it possible to construct key-component failure prediction models from "big" operational data such as data from drill-rig sensors and drilling operations. Given the huge costs associated with the non-productive time that results from drill motor failure during well drilling, models such as these are being used to build early warning systems, reduce costs, and minimize unplanned downtime. This is achieved by driving the definition of proactive and cost-effective equipment maintenance schedules and strategies that can outperform traditional schedule-based or corrective maintenance.
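As a rough illustration of the early warning idea, the sketch below flags a sensor reading that drifts far from its recent baseline using a rolling z-score. This is a deliberately simple stand-in, not the kind of failure prediction model described above; the `EarlyWarning` class, the window size, and the threshold are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


class EarlyWarning:
    """Flags readings that deviate strongly from a rolling baseline."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.readings = deque(maxlen=window)  # most recent normal readings
        self.threshold = threshold            # z-score that triggers an alert

    def observe(self, value: float) -> bool:
        """Return True if the value is anomalous relative to the window."""
        alert = False
        if len(self.readings) >= 2:
            baseline = mean(self.readings)
            spread = stdev(self.readings)
            if spread > 0 and abs(value - baseline) / spread > self.threshold:
                alert = True
        # Only normal readings update the baseline, so a developing fault
        # is not absorbed into what counts as "normal".
        if not alert:
            self.readings.append(value)
        return alert
```

Feeding each new reading (for example, a motor torque value) through `observe` as it arrives gives a signal that could trigger an inspection before a failure, which is the operational side of the maintenance strategy described above.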
A recent study by Accenture across eight industries, including oil and gas and manufacturing, indicated that predictive maintenance driven by big data analytics can generate savings of up to 12 percent over scheduled repairs, leading to a 30 percent reduction in maintenance costs and a 70 percent cut in downtime from equipment breakdowns.
Our final example comes from the finance industry and involves applications that prevent cybertheft and fraud. Although financial services firms may have initially approached these new enabling technologies with an analytical/informational eye, they are now expanding their view to form an integrated picture of the analytical and operational uses of such hyper-scale processing. This kind of data-intensive real-time analysis is becoming an affordable objective within reach of every organization.
Interestingly, and common to all three scenarios, the obstacles to adopting analytics in day-to-day operations stem less from technological limitations such as processing speed or actionable online storage capacity than from the difficulty of accessing the necessary data streams and integrating their diversity into a single picture.
The adoption of best-of-breed platforms (the technologies enabling the convergence described in the scenarios above) inevitably introduces technical heterogeneity. Together with the need to make these platforms coexist with legacy systems, this heterogeneity calls for a certain level of abstraction. Abstraction is required to decouple the physical layer from the business layer and to enable a beneficial shift in the focus of organizations from a physical/implementation viewpoint to a more logical/semantic perspective. More importantly, this shift is taking place at different levels in the architecture.
We see it at the data source level, as modern storage systems try to alleviate the rigidity and limitations of the relational paradigm. The notion of schema-on-read proposes this raw-physical/view-logical separation, especially during the design phase: it eliminates the need to decide up front what data structures must look like by anticipating how they will be consumed. Such up-front anticipation can become a limiting factor when the same data later needs to be reused or repurposed.
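A small sketch can show what schema-on-read means in practice. The raw records below are stored as they arrived, and each consumer applies its own logical view when reading. The sample events, the `read_with_schema` helper, and the two view definitions are invented for this illustration and do not reflect any particular storage product.

```python
import json

# Made-up raw events, stored exactly as they arrived. No schema is enforced
# on write, so records may carry different fields.
RAW_EVENTS = [
    '{"rig": "R-7", "torque": 41.5, "depth_m": 1200}',
    '{"rig": "R-7", "torque": 43.0}',
    '{"rig": "R-9", "depth_m": 850, "operator": "crew-b"}',
]


def read_with_schema(raw_lines, schema):
    """Project raw JSON lines onto a schema chosen by the reader.

    `schema` maps output field names to (source key, converter) pairs.
    A missing source key yields None instead of failing the read.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (convert(record[key]) if key in record else None)
            for field, (key, convert) in schema.items()
        }


# Two consumers read the same raw data through different logical views.
TORQUE_VIEW = {"rig": ("rig", str), "torque": ("torque", float)}
DEPTH_VIEW = {"rig": ("rig", str), "depth_km": ("depth_m", lambda m: m / 1000)}

torque_rows = list(read_with_schema(RAW_EVENTS, TORQUE_VIEW))
depth_rows = list(read_with_schema(RAW_EVENTS, DEPTH_VIEW))
```

Neither view had to exist when the events were written, and a third view can be added later without touching the stored data, which is what makes reuse and repurposing easier.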
Similarly, at the consumer level, we are seeing adaptive user experience platforms that no longer need to know in advance how the information will be consumed, in terms of either the access paradigm or the type of device or application used. The same shift is taking place in the integration layer through declarative approaches such as data virtualization.
Data virtualization technology enables the separation of logical from physical integration so that hybrid techniques (such as real-time, cache, and batch) can leverage the advances in real-time push-down query optimization, in-memory caching, and selective ETL. The result is less of the wasteful full replication typical of traditional ETL.
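The idea of a logical view that pushes filters down to the sources and caches results can be sketched with two in-memory SQLite databases standing in for separate physical systems. This is a toy under stated assumptions, not how any data virtualization product is implemented: the table layouts, the sample rows, and the `customer_spend` function are hypothetical.

```python
import sqlite3
from functools import lru_cache

# Two independent in-memory databases stand in for separate physical sources.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (1, 35.0), (2, 10.0)]
)

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm_db.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)


@lru_cache(maxsize=128)  # in-memory cache of recent answers
def customer_spend(customer_id: int):
    """Logical view: customer name plus total spend, resolved at query time.

    The filter is pushed down to each source as a WHERE clause, so only
    matching rows leave each database and nothing is replicated up front.
    """
    name_row = crm_db.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if name_row is None:
        return None
    total_row = orders_db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return {"name": name_row[0], "total": total_row[0]}
```

The consumer calls `customer_spend` without knowing where the name or the amounts live. The cache trades freshness for speed, so a real deployment would need an expiry or invalidation policy; choosing between live queries, cache, and batch per view is the hybrid approach the paragraph above describes.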
Organizations that can fully leverage all operational "big" data, both for decision support and for injecting analytics into operational processes, will be able to react quickly and insightfully to changes and ultimately gain significant competitive advantage. Data virtualization is an effective enabler of the convergence between operational and informational spaces in data management. It is quickly becoming a key element of modern data architectures at organizations of all sizes and in every industry, helping them maximize the use of enterprise data assets and optimize how users leverage them.
Jesús Barrasa is a senior solutions architect at Denodo Technologies. He has over 15 years of experience in the software industry. Prior to joining Denodo, he worked for large IT services companies such as Atos and Steria and more recently in data integration using semantic technologies at Ontology Systems. He's been involved in large integration projects with customers including Vodafone, T-Mobile, and Level3. Barrasa holds a Ph.D. in computer science and was also a lecturer at the Technical University of Madrid for over six years. You can contact the author at email@example.com.