The Future of DataOps: Four Trends to Expect
What's ahead for DataOps? From automated data analysis to the transformation of subject matter experts into data curators, we look at what's next in the last article in our four-part series.
- By Mark Marinelli
- April 30, 2019
In this series on DataOps, we've covered the three key functions necessary to build a DataOps team and the need to think about agility. We've also examined real-world examples of how taking a DataOps approach to data can bring tremendous advantages to organizations.
What does the future hold? Here are four trends you can expect.
Trend #1: Ever-increasing sources of valuable data will intensify the need for smart, automated data analysis
IoT devices will generate enormous volumes of data that must be analyzed if organizations want to gain insights -- such as when crops need water or heavy equipment needs service. John Chambers, former CEO of Cisco, declared there will be 500 billion connected devices by 2025. That's more than 60 times the number of people on the planet. These devices are going to create a data tsunami.
People typically enter data into apps using keyboards, mice, or finger swipes. IoT devices have many more ways to communicate data. A typical mobile phone has about 14 sensors, including an accelerometer, GPS, and even a radiation detector. Industrial machines such as wind turbines and gene sequencers can easily have 100 sensors. Reporting rates vary just as widely: a utility grid power sensor can send data 60 times per second, whereas a construction forklift might report once per minute.
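To put those two reporting rates in perspective, here is a quick back-of-the-envelope calculation in Python. It is only a sketch: it assumes each device reports continuously around the clock, and the function and variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_day(readings_per_second: float) -> int:
    """Readings a single sensor emits in one day at a constant rate."""
    return round(readings_per_second * SECONDS_PER_DAY)


# Rates taken from the article; continuous operation is an assumption.
grid_sensor_daily = readings_per_day(60)      # 60 readings per second
forklift_daily = readings_per_day(1 / 60)     # 1 reading per minute

print(f"Grid power sensor: {grid_sensor_daily:,} readings per day")
print(f"Construction forklift: {forklift_daily:,} readings per day")
print(f"Ratio: {grid_sensor_daily // forklift_daily:,} to 1")
```

Under these assumptions a single grid sensor produces several million readings a day while the forklift produces fewer than two thousand, so any pipeline that ingests both has to cope with very different volumes and cadences.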
IoT devices are just one factor driving this massive increase in the amount of data enterprises can leverage. The end result of all this new data is that its management and analysis will become harder and continue to strain or break traditional data management processes and tools. Only through increased automation via artificial intelligence and machine learning will this diverse and dynamic data be manageable economically.
Trend #2: You'll want to stitch together your own custom solution from purpose-built components
The explosion of new types of data in great volumes has demolished the (erroneous) assumption that you can master big data through a single platform (assuming you'd even want to). The attraction of an integrated, single-vendor platform that turns dirty data into valuable information lies in its ability to avoid integration costs and risks.
The truth is, no one vendor can keep up with the ever-evolving landscape of tools to build enterprise data management pipelines and package the best ones into a unified solution. You end up with an assembly of second-tier approaches rather than a platform composed of best-of-breed components. When someone comes up with a better mousetrap, you won't be able to swap it in.
Many companies are buying applications designed specifically to acquire, organize, prepare, and analyze/visualize their own unique types of data. (Part 2 of this series discusses the kinds of tools needed for every DataOps team.) In the future, the need to stitch together purpose-built, interoperable technologies will become increasingly important for success with big data, and we will see some reference architectures coalesce, just as we've seen historically with LAMP, ELK, etc.
Organizations will have to turn to both open source and commercial components that can address the complexity of the modern data supply chain. These components will need to be integrated into end-to-end solutions, but fortunately they have been built to support the interoperability necessary to make them work together.
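One way to picture this kind of interoperability is as a pipeline of stages that share a common interface, so any one stage can be replaced without touching the others. The Python sketch below is purely illustrative: the stage names, the record format, and the `build_pipeline` helper are hypothetical and do not correspond to any particular product.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]


def build_pipeline(*stages: Stage) -> Stage:
    """Chain stages that all accept and return a list of records."""
    def run(records: List[Record]) -> List[Record]:
        for stage in stages:
            records = stage(records)
        return records
    return run


# Hypothetical purpose-built components sharing one interface.
def acquire(records: List[Record]) -> List[Record]:
    return list(records)


def organize(records: List[Record]) -> List[Record]:
    return sorted(records, key=lambda r: r["id"])


def prepare_basic(records: List[Record]) -> List[Record]:
    return [{**r, "name": str(r["name"]).strip()} for r in records]


def prepare_better(records: List[Record]) -> List[Record]:
    # The "better mousetrap": also normalizes capitalization.
    return [{**r, "name": str(r["name"]).strip().title()} for r in records]


raw = [{"id": 2, "name": " acme corp "}, {"id": 1, "name": "globex "}]

pipeline_v1 = build_pipeline(acquire, organize, prepare_basic)
pipeline_v2 = build_pipeline(acquire, organize, prepare_better)  # one stage swapped

print(pipeline_v1(raw))
print(pipeline_v2(raw))
```

Because every stage honors the same contract, swapping `prepare_basic` for `prepare_better` is a one-line change. That is the property a single-vendor platform tends to make difficult.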
Trend #3: User sophistication will increase and advanced tools will become more approachable
The next few years of data analysis will require a symbiotic relationship between human knowledge and technology. With more data in a variety of formats to deal with, organizations will need to take advantage of advancements in automation (AI and machine learning) to augment human talent.
Simultaneously, knowledge workers will need to improve their technical skills to bridge the gaps that technology cannot fill completely. Only through the combination of automation and increased human knowledge can organizations solve the problem of getting the right data to the right users so they can make smarter, more beneficial decisions.
As more graduates who have studied data science and/or data engineering enter the workforce, and as existing knowledge workers upgrade their skills, the supply of data-proficient workers will increase. At the same time, as more data management tools package AI/ML behind cleaner user interfaces, abstracting away the arcana of their underlying algorithms, the barriers to adopting these newer techniques will fall. These factors combine to form a powerful dynamic that will accelerate progress in the domain.
Trend #4: Subject matter experts will become data curators and stewards
Organizations will need to think about crowdsourcing when it comes to data discoverability, maintenance, and quality improvement. The people ultimately required to make data unification truly effective are not data engineers but subject matter experts. Acting as highly contextual recommenders, they can enable a new level of productivity in data delivery if they are directly engaged in the unification process.
Data consumers -- nontechnical users -- know customer, sales, HR, and other data by heart. They can assess the quality of the data and contribute their expertise to projects to improve data integrity. However, they are too busy to devote their time to the focused tasks of data curation and stewardship. There will be a huge opportunity for improvement as more people are allowed to work with the data they know best and provide feedback on whether the data is accurate and valuable from within their existing tools and workflows.
Incorporating this data feedback systematically, instead of having it locked up in emails or possibly never provided at all, will produce dramatic gains in the ability to focus data quality efforts on the right problem sets, to correct issues with source data, and ultimately to prevent bad data from entering the enterprise in the first place.
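As a rough illustration of what capturing feedback systematically could look like, the Python sketch below records each expert judgment as a structured item and counts the reported problems per source system. The `Feedback` fields, the source names, and the sample entries are all made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Feedback:
    source: str      # system the record came from (hypothetical names)
    field: str       # attribute the expert reviewed
    accurate: bool   # the expert's verdict


def rank_problem_sources(feedback: Iterable[Feedback]) -> List[Tuple[str, int]]:
    """Count inaccurate-data reports per source, most reported first."""
    issues = Counter(f.source for f in feedback if not f.accurate)
    return issues.most_common()


# Invented sample feedback, as it might be captured inside users' existing tools.
collected = [
    Feedback("crm", "customer_name", accurate=False),
    Feedback("crm", "billing_address", accurate=False),
    Feedback("hr", "job_title", accurate=False),
    Feedback("sales", "deal_size", accurate=True),
    Feedback("crm", "email", accurate=True),
]

ranking = rank_problem_sources(collected)
print(ranking)
```

Even a simple tally like this turns scattered comments into a ranked list of sources, which suggests where to focus data quality work first.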
A Final Word
Traditional data management techniques are adequate when data sets are static and relatively few, but they break down in environments of high volume and complexity. This is largely due to their top-down, rules-based approaches, which often require significant manual effort to build and maintain. These approaches are quickly becoming extinct.
The direction is clear -- more data, continued technology advancements, and an increasing need for curation by SMEs. Data unification technology will help by connecting and mastering data sets through human-guided machine learning. The future is bright for organizations that embrace this new approach.
Mark Marinelli is head of product with Tamr, which builds innovative solutions to help enterprises unify and leverage their key data. A 20-year veteran of enterprise data management and analytics software, Mark has held engineering, product management, and technology strategy roles at Lucent Technologies, Macrovision, and most recently at Lavastorm, where he was chief technology officer.