How Data Virtualization Turbocharges Cognitive Analytics and Big Data
Data virtualization helps organizations access and virtually integrate disparate data to feed cognitive analytical tools and delivers that intelligence more easily and widely through data services.
By Suresh Chandrasekaran, Senior Vice President, Denodo Technologies
Cognitive analytics (CA) refers to a range of analytical strategies inspired by how the human brain processes information, draws conclusions, and translates learning into decisions and actions. In today's business technology world, it refers to how we use large amounts of data to learn about certain types of business-related functions such as customer loyalty and buying behavior, predictive analytics for operational processes such as pipeline maintenance, and other uses of data mining for business intelligence.
Many of the practical issues surrounding high-level analytics involve challenges such as the precise methods used to collect and store data in a central location (at least traditionally), the need for a higher level of interaction for learning and decision making, and the availability of the required tools to interpret this data. Companies getting involved with cognitive analytics need to build good systems for cross-platform data usage and processing to a particular end. Enter data virtualization (DV).
DV isn't the "engine" behind cognitive analytics, but it is part of the "power and usability system" that significantly increases its business value. Acting like a car's turbocharger, DV provides faster access to more diverse data on the input side, and like a car's transmission, alternator, or air-conditioning units, DV delivers the power output where it is needed for a variety of operational and informational purposes throughout the enterprise. This increases the exposure and use of the resulting analytics on the consumption side and therefore its business value. Without the data access, real-time integration, and data services delivery provided by data virtualization, cognitive analytics can still operate, but not to its potential. Examples of the benefits provided by DV across the value chain from access, integration, and delivery include advantages to:
Data access: DV uniquely enables efficient access to data sources that enhance cognitive analysis. Open data is one such category. All across the world, governments, communities, companies, and individuals are generating new information at an unprecedented rate. Increasingly, this information is available through open data APIs and feeds. Today's data virtualization platforms make it point-and-click easy to acquire this data (typically via REST or even direct Web access, pre-integrating security protocols such as OAuth) and combine it with internal and traditional data sources. Because the volumes of open data can be so large, DV almost always makes more sense than physically replicating the data to the enterprise. Furthermore, depending on the business area, open data can significantly enhance the effectiveness of cognitive analytics.
Data integration: Data virtualization's inherent real-time integration capabilities add value to any cognitive analytics system. Capturing, replicating, and storing data in a central location is expensive and slow. Although that cannot be completely eliminated for some types of analytics, data virtualization provides more flexibility. It lets the data live wherever it is located but provides abstracted and integrated access to the users. This reduces the time it takes to make the data actionable, which is crucial for cognitive analytics. DV also enables business leaders, data scientists, innovators, and operational decision makers to achieve better outcomes, all without having to worry about how to access, store, convert, integrate, copy, move, secure, and distribute data upfront and on an ongoing basis.
Data delivery: Another noteworthy value of DV is its ability to democratize complex analytics by breaking the results into nuggets that are easily consumed across the enterprise. This often is done by making the result sets consumable in mobile environments and as part of operational processes, alerts, and decision support delivered just-in-time. Technically, cognitive analytics platforms produce complex output formats that need special tools to visualize properly. Data virtualization can repurpose analytic result sets as easily consumed outputs delivered as SQL-based interfaces, or more commonly as REST data service APIs and widgets used by mobile and wearable devices, operational UI, and external partners.
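The access, integration, and delivery steps described above can be illustrated with a minimal sketch. This is not how any particular DV platform is implemented; it only shows the idea of joining sources at query time instead of copying them into a central store. Everything named here is a hypothetical stand-in: fetch_open_weather plays the role of an open data REST feed, the in-memory SQLite policies table plays the role of an internal transactional system, and VirtualView and as_data_service are illustrative names.

```python
import json
import sqlite3
from typing import Callable, Dict, List


def fetch_open_weather() -> List[Dict]:
    """Hypothetical stand-in for an open data REST feed.

    A real source would be called over HTTP at query time; the rows are
    hard-coded here so the sketch runs without a network.
    """
    return [
        {"region": "north", "rainfall_mm": 12.5},
        {"region": "south", "rainfall_mm": 80.0},
    ]


def make_internal_db() -> sqlite3.Connection:
    """Hypothetical internal transactional source (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE policies (customer TEXT, region TEXT, insured_value REAL)"
    )
    conn.executemany(
        "INSERT INTO policies VALUES (?, ?, ?)",
        [("Acme Farms", "north", 100000.0), ("Birch Co", "south", 250000.0)],
    )
    return conn


class VirtualView:
    """Joins two sources on demand; nothing is replicated or stored."""

    def __init__(
        self,
        conn: sqlite3.Connection,
        feed: Callable[[], List[Dict]],
        key: str,
    ) -> None:
        self.conn = conn
        self.feed = feed
        self.key = key

    def query(self) -> List[Dict]:
        # Access: read the external feed at query time and index it by key.
        external = {row[self.key]: row for row in self.feed()}
        cursor = self.conn.execute(
            "SELECT customer, region, insured_value FROM policies "
            "ORDER BY customer"
        )
        columns = [col[0] for col in cursor.description]
        result = []
        # Integration: merge each internal row with its matching feed row.
        for values in cursor:
            record = dict(zip(columns, values))
            match = external.get(record[self.key])
            if match is not None:
                record.update(match)
                result.append(record)
        return result


def as_data_service(view: VirtualView) -> str:
    """Delivery: expose the integrated rows as a JSON payload."""
    return json.dumps(view.query(), sort_keys=True)
```

A consumer would call as_data_service(VirtualView(make_internal_db(), fetch_open_weather, "region")) and receive the combined rows as JSON, without knowing that two different sources sit behind the view.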
Data Virtualization in the Real World
Across all industries, DV is enabling organizations to perform a range of different analytical strategies that are used to learn about certain types of business-related functions. Real-world examples of how data virtualization can enhance cognitive analytics include:
Climate data analytics: A San Francisco-based company has built an innovative and successful business model by providing crop insurance and weather-related data services based on predictive analytics of large amounts of climate data. Specifically, the company explains how it performs several types of cognitive and probabilistic analytics with the aid of DV:
- Weather models and prediction of adverse weather events
- Crop yield models that predict the impact of adverse weather on crop yields
- Financial impact and risk models
DV has made a huge impact on how quickly they can react to new data sets and new requirements -- three times faster, with a third of the staff -- and still deliver integrated information to sales, customer service, and eventually to farmers around the world to help protect and improve their farming operations. Data virtualization integrates their scientific data sets on Hadoop, Amazon S3, and statistical systems with business and transactional data from ERP, Salesforce.com, and traditional databases and provides meaningful business information to users via mobile and BI reports.
European Union Homeland Security project: Saab Aerospace systems led a consortium of nine companies to create a unique multi-country, multi-agency, multi-tier system called HITS/ISAC, an acronym for its functions of early detection, dissemination, and interception of perceived threats to the EU countries. Data virtualization was used to bring together several sub-systems and specialized components for cognitive analytics, pattern recognition, linguistics, and messaging and security. Between the voluminous raw data collection efforts and timely intelligence dispatch were multiple layers of abstraction and virtualization that provide both security and independence to the collaborating agencies, as well as a flexible data architecture. The holistic solution applies predictive and cognitive pattern analysis to the movement of people, goods, communications, or money that could point to probable money laundering or a terrorist attack, and it would not have been possible without data virtualization.
Smart network management: Telefonica, a giant telco operating in Europe and South America, uses data virtualization to knit together network monitoring and analytics applications, call logs, communication services usage patterns, guaranteed service levels to premium customers, and other factors to recommend ongoing optimizations to the network and to prioritize critical response actions when something goes wrong.
Data Virtualization Query Performance Optimization: The final example is both interesting and unique -- it is, in a sense, the reverse scenario of cognitive analytics enhancing data virtualization. One leading data virtualization vendor is using cognitive analytics internally within its own platform to optimize query performance. The platform internally collects statistics on query execution times for various distributed data sources under many scenarios. As a result, thousands of data points on query times are stored as histograms and analyzed to automatically create predictive rules for which data should be cached or pre-fetched, and when, to optimize performance. As in other examples, these cognitive analytic results and their implications are easily visualized through system monitoring tools, so the developers or architects responsible for the solution can confirm or override them in operation.
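The article does not describe the vendor's actual algorithm, so the following is only a rough sketch of the general idea in the last example: record query times per source in a histogram, then derive a simple caching rule from it. The bucket boundaries, thresholds, and the names QueryStats, slow_fraction, and cache_candidates are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical bucket boundaries in milliseconds. A query lands in the first
# bucket whose upper edge it is below; slower queries go in a final overflow
# bucket.
BUCKET_EDGES_MS = [50, 200, 1000]


def bucket_index(elapsed_ms: float) -> int:
    """Return the histogram bucket for one observed query time."""
    for i, edge in enumerate(BUCKET_EDGES_MS):
        if elapsed_ms < edge:
            return i
    return len(BUCKET_EDGES_MS)


class QueryStats:
    """Keeps one query-time histogram per data source."""

    def __init__(self) -> None:
        self.histograms: Dict[str, List[int]] = defaultdict(
            lambda: [0] * (len(BUCKET_EDGES_MS) + 1)
        )

    def record(self, source: str, elapsed_ms: float) -> None:
        self.histograms[source][bucket_index(elapsed_ms)] += 1

    def slow_fraction(self, source: str, slow_from_bucket: int = 2) -> float:
        """Share of queries that fell in the slow buckets (200 ms and up)."""
        counts = self.histograms.get(source)
        if not counts or sum(counts) == 0:
            return 0.0
        return sum(counts[slow_from_bucket:]) / sum(counts)

    def cache_candidates(
        self, min_slow_fraction: float = 0.5, min_samples: int = 5
    ) -> List[str]:
        """Assumed rule: suggest caching sources that are slow most of the time."""
        return sorted(
            source
            for source, counts in self.histograms.items()
            if sum(counts) >= min_samples
            and self.slow_fraction(source) >= min_slow_fraction
        )
```

In this sketch the output of cache_candidates is only a recommendation, which mirrors the point that developers or architects can confirm or override the rule before it takes effect.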
How DV Bridges the Gap between Big Data and Cognitive Analytics
In each of these real-world examples, DV has helped organizations to access, virtually integrate, and provide data services to the analytical tools that actually perform the cognitive analytics. It also takes the specialized result sets those tools produce and makes them easier to consume in the context of other business information. As a result, data virtualization, which is already a must-have capability for agile BI and logical data warehousing, is now also gaining traction with cognitive, predictive, and big data analytics projects to reduce their upfront costs and accelerate value.
Suresh Chandrasekaran, senior vice president at Denodo Technologies, is responsible for global strategic marketing and growth. Before Denodo, Chandrasekaran served in executive roles as general manager and VP of product management and marketing at leading Web and enterprise software companies and as a management consultant at Booz Allen & Hamilton. You can contact the author at [email protected].