CEO Perspective: Data Engineering Key to Data Analytics
Despite the push into AI and ML, enterprises should be focused on the data platform.
- By James E. Powell
- May 14, 2019
What's ahead for BI and analytics? We spoke to Matt Cain, president and CEO of Couchbase, about edge computing, AI and machine learning (ML) strategies, and the power and promise of NoSQL. Cain joined Couchbase in 2017 as president and CEO, where he is responsible for driving the company's mission "to be the database that revolutionizes digital innovation." Prior to Couchbase, he held global leadership roles at Veritas, Symantec, and Cisco.
Upside: What technology or methodology must be part of an enterprise's data strategy if it wants to be competitive today? Why?
Matt Cain: We believe that three major things must be part of an enterprise's data strategy:
- Moving away from data silos and data sprawl using modern technologies such as NoSQL to create greater flexibility and actionable insights on operational data in near real time
- Capturing and managing data on the edge where more of the interactions are happening
- Leveraging ML and AI strategies to take advantage of the data you have
In today's world, it all comes down to end customers interacting with the freshest and most relevant data through modern applications so that their expectations are not just met but exceeded. From a business perspective, unleashing the power of data through faster and more precisely targeted insights helps create opportunities to increase revenue and reduce operational costs.
What one emerging technology are you most excited about and think has the greatest potential? What's so special about this technology?
Even though NoSQL is no longer emerging per se, it's still in the early adoption stage within many companies. Every enterprise has a database and data processing modernization strategy, yet we're just scratching the surface of NoSQL's potential uses and benefits.
What is the single biggest challenge enterprises face today? How do most enterprises respond (and is it working)?
I'd say it's the volume, velocity, variety, and veracity of data, which fundamentally affect how data should be stored and processed. Most often, data resides in disparate databases, data silos, and/or applications, which presents major problems for organizations. Companies spend over three-quarters of their time and resources simply processing the data so that it can be used by the business.
Business-critical decisions, customer experiences, and employee effectiveness all rely on timely and relevant data insights. To unleash the benefits contained in the data, it must be stored and processed on a platform built to provide the performance, scalability, and flexibility necessary to create those operational insights.
Is there a new technology in data and analytics that is creating more challenges than most people realize? How should enterprises adjust their approach to it?
Although ML and AI have been highlighted as the next great technology breakthrough for businesses, it's clear that there is a big disconnect between theory and practice. The challenge is in clearly understanding the difference between data science and data engineering. Data scientists tend to create custom algorithms and data models that require precise types of data input in order to generate specific results or output. This approach is often brittle. When the quality of the input varies or the data model doesn't accurately reflect real-world behavior, the results can lead to incorrect decisions and actions (garbage in/garbage out).
Businesses rely on data engineering, which creates general-purpose processes with trusted and verifiable inputs and outputs. Enterprises should approach ML and AI as an exercise in data engineering, where the incoming data is well understood (and validated), the algorithms are well described (applicable use cases and limitations), and the actionable outcomes are well tested and validated to ensure they are producing the expected results.
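The data engineering discipline Cain describes -- validated inputs, well-described algorithms, tested outputs -- can be sketched in miniature. This is an illustrative assumption, not anything from Couchbase; all function names, fields, and thresholds here are hypothetical:

```python
# Minimal sketch of a data-engineering-style pipeline:
# validate inputs, apply a documented model, verify outputs.
# All names, fields, and thresholds are illustrative assumptions.

def validate_record(record: dict) -> dict:
    """Reject malformed input before it reaches the model (garbage in)."""
    if not isinstance(record.get("age"), (int, float)):
        raise ValueError("age must be numeric")
    if not 0 <= record["age"] <= 120:
        raise ValueError("age out of plausible range")
    return record

def score(record: dict) -> float:
    """A stand-in 'model': documented input, bounded output."""
    return min(1.0, record["age"] / 100)

def validate_output(prob: float) -> float:
    """Confirm the result is in the expected range (garbage out)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("score outside [0, 1]")
    return prob

def pipeline(record: dict) -> float:
    """Each stage is independently testable, unlike a brittle custom model."""
    return validate_output(score(validate_record(record)))
```

The point is structural: every stage has a contract that can be tested on its own, so a bad input fails loudly at the boundary instead of silently producing a wrong decision downstream.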
What initiative is your organization spending the most time/resources on today?
As a B2B organization, our focus is on capturing all possible data from a variety of sources (Web traffic, sales interactions, customer feedback, etc.) and leveraging that data to optimize business decisions that result in better products, better customer interactions, and reduced operating costs. To provide a specific example, Web traffic analysis over time allows us to optimize our online and offline content, in turn enabling us to improve the outcomes of our sales and marketing efforts. The changes we make to our content can be used as a predictor of future interactions. Our ongoing metrics serve as a feedback mechanism to see if our interaction model and our corresponding changes are generating the results we predict and expect.
Where do you see analytics and data management headed in 2019 and beyond? What's just over the horizon that we haven't heard much about yet?
In terms of specific technologies, we see a lot of conversations in the industry about ML, AI, and predictive analytics across massive data sets. However, we do not hear as much dialogue about edge analytics, which is the ability to apply analytical functionality directly to IoT data. Edge analytics makes sense of the terabytes of continuously flowing IoT data to empower field workers and line-of-business decisions as well as help organizations to better serve remote customers in different ways. Edge and predictive analytics will mature and complement each other, allowing businesses to generate insights and outcomes from all of their data, regardless of where it resides.
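Edge analytics of the kind described above -- reducing a continuous sensor stream to local summaries and decisions before anything leaves the device -- might be sketched like this. The class, window size, and alert threshold are illustrative assumptions, not any particular product's API:

```python
from collections import deque

class EdgeAggregator:
    """Hypothetical edge-analytics sketch: keep a bounded rolling window
    of sensor readings on the device and emit only compact summaries,
    instead of shipping every raw data point to a central platform."""

    def __init__(self, window: int = 60, alert_threshold: float = 90.0):
        # deque with maxlen evicts the oldest reading automatically
        self.readings = deque(maxlen=window)
        self.alert_threshold = alert_threshold

    def ingest(self, value: float) -> None:
        """Called for each raw reading; raw data stays local."""
        self.readings.append(value)

    def summary(self) -> dict:
        """The only payload sent upstream: a few numbers, not the stream.
        The 'alert' flag is a line-of-business decision made at the edge."""
        vals = list(self.readings)
        return {
            "count": len(vals),
            "mean": sum(vals) / len(vals),
            "max": max(vals),
            "alert": max(vals) > self.alert_threshold,
        }
```

The design choice this illustrates is the one Cain points to: analysis happens where the data is generated, so terabytes of raw IoT readings become a handful of summary fields, and a field worker or remote system can act on the alert without a round trip to a central cluster.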
As we look further into the future, we believe a fundamental issue that is going to become crucial is the data processing platform itself. We've seen many attempts at point solutions that provide ML, AI, and predictive analytics (with edge analytics to be added in the coming years), but how you integrate these technologies into an extensible, functional, manageable data processing platform has mostly been left as an exercise for the reader.
The ability to easily integrate these disparate technologies into a seamless platform that can bring different data processing approaches to business data, regardless of where and how it is stored, will be a key differentiator for the solutions that endure.
Describe your product/solution and the problem it solves for enterprises.
Developed as an alternative to inflexible relational databases, Couchbase was architected for today's massively interactive Web, mobile, and IoT applications. By making it easier to capture, manipulate, and retrieve data in every interaction between customers, employees, and machines, Couchbase enables businesses to deliver innovative digital solutions that drive deeper customer and employee engagement.
Our cloud-native, NoSQL, document-oriented database and key-value store provides developer agility, manageability, and strong performance at any scale -- from any cloud to the edge. With cross-datacenter replication technology and the Couchbase Autonomous Operator for Kubernetes, Couchbase provides cloud interoperability that gives customers the freedom to pick any cloud.
James E. Powell is the editorial director of TDWI, overseeing research reports, the Business Intelligence Journal, and the Upside newsletter. You can contact him via email.