March 30, 2018
As organizations collect and analyze increasing amounts of data, they are turning to the data lake as a platform for more advanced analytics, such as machine learning.
Why a data lake? Machine learning is often an iterative process, and repeated runs can degrade performance on a traditional data warehouse. Data lakes are built for scale and experimentation. They also hold large volumes of diverse training data, which can help models learn more comprehensively and perform more accurately once they are put into production.
This TDWI Checklist Report describes the data requirements for advanced analytics on a data lake in more detail. Most of the report covers best practices for analytics performed on data lakes, with a focus on machine learning.