Prerequisite: None
For this talk, “advanced analytics” means data science, machine learning, and artificial intelligence: the collection of specialized techniques that have evolved over the last 20 years to extract more value and meaning from data.
ChatGPT, for all the sensation around it, is just one example of advanced analytics: machine learning models have been deployed in consumer-facing businesses for years to recommend the next product or service, interrupt fraud, evaluate loan applications, and make other decisions. Similarly, machine learning is used in maintenance and operations to predict failures, pre-position repair resources, and autonomously pilot factory vehicles. And machine learning is itself just one of several forms of advanced analytics.
Although many companies are investing in such advances, relatively little attention has been given to the data platform needed to enable them at scale. Data scientists and machine learning specialists typically focus on the tools and algorithms that they want to use, rather than the data platform.
However, the data platform and the strategies used to create the analytics data repository (data lake, data lakehouse, or data warehouse) are—or should be—the foundation of the advanced analytics program. With the right approach to the platform and the database, most advanced analytics can be performed at scale inside the database. Further, a feature store shared by multiple machine learning models can be maintained inside the database.
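As a rough illustration of the in-database idea, the sketch below computes features with SQL inside the database and exposes them through a single view that two different consumers read. It is a minimal sketch under stated assumptions, not a description of any particular platform: SQLite stands in for the analytics database, and the table `transactions`, the view `customer_features`, and the two functions are hypothetical names chosen for the example.

```python
import sqlite3


def build_feature_store(conn):
    """Create sample data and a shared feature view (all names hypothetical)."""
    conn.executescript(
        """
        CREATE TABLE transactions (customer_id INTEGER, amount REAL);
        INSERT INTO transactions VALUES (1, 20.0), (1, 40.0), (2, 500.0);

        -- Features are computed by the database engine, not in client code.
        CREATE VIEW customer_features AS
        SELECT customer_id,
               COUNT(*)    AS txn_count,
               AVG(amount) AS avg_amount,
               MAX(amount) AS max_amount
        FROM transactions
        GROUP BY customer_id;
        """
    )


def fraud_candidates(conn, threshold):
    """First consumer: flag customers whose largest transaction exceeds a threshold."""
    rows = conn.execute(
        "SELECT customer_id FROM customer_features "
        "WHERE max_amount > ? ORDER BY customer_id",
        (threshold,),
    )
    return [row[0] for row in rows]


def offer_candidates(conn, min_txns):
    """Second consumer: pick frequent customers, reading the same feature view."""
    rows = conn.execute(
        "SELECT customer_id FROM customer_features "
        "WHERE txn_count >= ? ORDER BY customer_id",
        (min_txns,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_feature_store(conn)
    print(fraud_candidates(conn, 100.0))
    print(offer_candidates(conn, 2))
```

Because both consumers read `customer_features`, the feature definitions live in one place and only the small result sets leave the database, which is the kind of consistency and efficiency gain the talk refers to.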
In-database advanced analytics confers advantages in scalability, efficiency, security, consistency, and cycle time. This talk covers strategies for making the data platform an enabler of advanced analytics, and the benefits that a strategic approach can deliver.