BI Advances to Watch For in 2020
In 2020, enterprises may realize the cloud is not everything they’d hoped for, migrations to NoSQL will cease, and ML will be operationalized.
- By Monte Zweben
- December 13, 2019
[Editor’s note: Upside asked executives from around the country to tell us what top three trends they believe data professionals should pay attention to in 2020. Monte Zweben, CEO and cofounder of Splice Machine, focused on the cloud, NoSQL, and machine learning.]
2020 Trend #1: Enterprises become disillusioned by the cloud
Cloud disillusionment will blossom because the meter is always running. Companies that rushed to the cloud will finish their first phase of projects and realize that they are running the same applications as before, without taking advantage of new data sources to supercharge them with AI. Their operating expenses will actually have increased because the savings in human operators are overwhelmed by the cost of cloud compute resources for applications that are always on. On-premises, these resources were capitalized; in the cloud, they hit the P&L as operating expenses.
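The always-on arithmetic can be sketched in a few lines of Python. Every figure below (the hourly rate, hardware price, useful life, and operations cost) is a hypothetical placeholder chosen only to show the shape of the comparison, not data from any vendor or company, and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # an always-on application is metered every hour


def cloud_annual_cost(hourly_rate: float, hours_on: float = HOURS_PER_YEAR) -> float:
    """Yearly operating expense for a metered cloud resource."""
    return hourly_rate * hours_on


def on_prem_annual_expense(purchase_price: float,
                           useful_life_years: int,
                           annual_ops_cost: float) -> float:
    """Straight-line depreciation of capitalized hardware plus operator cost."""
    return purchase_price / useful_life_years + annual_ops_cost


# Hypothetical inputs, for illustration only.
always_on_cloud = cloud_annual_cost(hourly_rate=2.0)
business_hours_cloud = cloud_annual_cost(hourly_rate=2.0, hours_on=8 * 5 * 52)
on_prem = on_prem_annual_expense(purchase_price=30_000,
                                 useful_life_years=5,
                                 annual_ops_cost=8_000)

print(f"cloud, always on:      {always_on_cloud:>10,.2f}")
print(f"cloud, business hours: {business_hours_cloud:>10,.2f}")
print(f"on-premises:           {on_prem:>10,.2f}")
```

With these made-up inputs the always-on cloud bill exceeds the on-premises expense, while the same resource metered only during business hours does not. The point is that the result depends heavily on how many hours the meter runs.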
In addition, the burden of integrating the various components of the enterprise data infrastructure follows enterprises to the cloud. For example, a company can easily subscribe to Amazon S3, Redshift or Snowflake, RDS or DynamoDB, and any number of machine learning engines available on AWS, but the task of integrating them into a unified platform continues to rest on that company's shoulders. In this scenario, migrating to the cloud does not provide the company with any competitive advantage.
2020 Trend #2: Migrations to NoSQL cease
Projects that modernize SQL apps by moving them to NoSQL will end because the availability of distributed SQL removes the reason for them. New, specialized apps may still use NoSQL, but SQL-to-NoSQL projects to modernize legacy applications will stop. The absence of SQL in NoSQL systems offers no inherent advantage. Abandoning a well-understood query language with a large industry footprint for a NoSQL-specific language is a risky proposition. Without full SQL support, the application must be rewritten almost from scratch, and this involves building a new data model.
Despite the heavy lifting, companies might still find it challenging to match the performance of SQL databases. Additionally, finding developers with the required expertise is significantly more difficult than hiring one of the vast numbers of developers already fluent in SQL. Tools to assist in converting existing SQL to NoSQL dialects are incomplete or nonexistent.
A much less risky approach to modernizing legacy applications is to replace your legacy database with a database that scales out, allows storage of diverse data types, and supports in-database machine learning and artificial intelligence -- all without abandoning SQL. By taking this approach, you will keep your application, its workflow, and its logic intact, thereby eliminating the need to retrain users on a brand new application.
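A minimal sketch of why keeping SQL keeps the application intact: the application's query is written once and runs unchanged against whichever SQL database sits behind it. Here Python's built-in `sqlite3` module stands in for both the legacy and the replacement database, which is an assumption made purely for illustration; the `orders` table and its rows are hypothetical.

```python
import sqlite3

# The application's query. It does not change when the database is replaced.
ORDER_TOTALS_SQL = """
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY customer_id
"""


def make_database() -> sqlite3.Connection:
    """Create an in-memory stand-in database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )
    conn.commit()
    return conn


def order_totals(conn: sqlite3.Connection) -> list:
    """Application logic: depends only on SQL, not on a specific engine."""
    return conn.execute(ORDER_TOTALS_SQL).fetchall()


legacy_db = make_database()       # stand-in for the legacy database
replacement_db = make_database()  # stand-in for a scale-out SQL database

# The same function and the same SQL serve both backends.
print(order_totals(legacy_db))
print(order_totals(replacement_db))
```

A move to a NoSQL store would instead require replacing `ORDER_TOTALS_SQL` and `order_totals` with code written against that store's own query interface and data model.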
2020 Trend #3: ML is operationalized
Companies will adopt best practices to operationalize machine learning and move it into production for mission-critical processes. They will do this by abandoning separate platforms for building, training, and deploying models and instead running ML models where the transactional data is stored. Other best practices include:
- Build feature factories -- environments that foster experimentation. When data scientists build a machine learning model, they start with a large number of attributes, or features, that are predictive of the event they are interested in. You can think of data scientists as working in a factory, with feature vectors as the raw material they shape into a model. To build a robust model, data scientists continuously experiment with different data sets, parameters, and algorithms. They try out different combinations of features, often swapping out a less predictive feature for a new one or adding a feature transformed or derived from other features. Only through continuous experimentation on the feature production line can data scientists build a model accurate enough to be put into production. The same process is at the heart of keeping models accurate and improving them over time.
- Form interdisciplinary teams. Organizational silos will break down, and multidisciplinary teams comprising data engineers, application developers, data scientists, and subject matter experts will emerge.
- Start with applications, not data. Companies should kill the complexity of data lakes and start focusing on applications. The DevOps burden and development complexity of data lakes, which require duct-taping many disparate compute and storage components together to form a functional solution, will become too heavy for companies to bear. Companies will realize that they don't need to assemble massively scalable infrastructure with combined operational, analytical, and ML capabilities themselves to modernize mission-critical applications.
- Use new tools. New tools to track the data science workflow will become the standard. These tools are required to keep the feature factory running smoothly. Internal stakeholders and external regulators, especially in regulated industries such as financial services and insurance, will require companies to keep track of the model-building process and to provide evidence of non-discriminatory behavior.
- Eliminate Lambda architecture. New, comprehensive data platforms will kill the Lambda architecture. Companies will consider an integrated platform to modernize and extend the functionality of their legacy applications. These are intelligent SQL platforms that enable companies to be agile, data-rich, and intelligent without the risk and expense of rewriting their applications.
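To make the feature factory and its tracking requirement concrete, here is a minimal sketch of an experiment loop that tries every combination of candidate features and records each run. The data, the feature names, and the nearest-centroid classifier are all hypothetical stand-ins chosen to keep the example self-contained; a real team would use its own models and a dedicated tracking tool.

```python
from itertools import combinations

# Hypothetical toy data: each row is (feature values, label).
FEATURES = ("amount", "noise", "visits")
TRAIN = [
    ({"amount": 1.0, "noise": 5.0, "visits": 1.0}, 0),
    ({"amount": 2.0, "noise": 1.0, "visits": 2.0}, 0),
    ({"amount": 9.0, "noise": 4.0, "visits": 8.0}, 1),
    ({"amount": 10.0, "noise": 2.0, "visits": 9.0}, 1),
]
TEST = [
    ({"amount": 1.5, "noise": 2.0, "visits": 1.0}, 0),
    ({"amount": 9.5, "noise": 5.0, "visits": 9.0}, 1),
]


def fit(train, features):
    """Nearest-centroid model: the mean feature vector for each label."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return {
        y: [sum(r[f] for r in rows) / len(rows) for f in features]
        for y, rows in by_label.items()
    }


def predict(model, x, features):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist(centroid):
        return sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))


def accuracy(model, test, features):
    """Fraction of held-out rows the model labels correctly."""
    hits = sum(predict(model, x, features) == y for x, y in test)
    return hits / len(test)


def run_experiments(train, test, all_features):
    """Try every non-empty feature subset and keep a record of each run."""
    runs = []
    for k in range(1, len(all_features) + 1):
        for features in combinations(all_features, k):
            model = fit(train, features)
            runs.append({
                "features": features,
                "accuracy": accuracy(model, test, features),
            })
    return runs


runs = run_experiments(TRAIN, TEST, FEATURES)
best = max(runs, key=lambda r: r["accuracy"])
for run in runs:
    print(run)
print("best run:", best)
```

The `runs` list is the part that matters for stakeholders and regulators: it records which features went into each candidate model and how that model scored, so the path to the production model can be reviewed later.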
Monte Zweben is the CEO and cofounder of Splice Machine. A technology industry veteran, Monte’s early career was spent with the NASA Ames Research Center as the deputy chief of the artificial intelligence branch, where he won the prestigious Space Act Award for his work on the Space Shuttle program. Monte is the coauthor of Intelligent Scheduling and has published articles in the Harvard Business Review and computer science journals and conference proceedings. He was Chairman of Rocket Fuel Inc. and serves on the Dean’s Advisory Board for Carnegie Mellon University’s School of Computer Science.