Lead Inventive Scientist, Data Science and AI Research
AT&T Chief Data Office, USA
This session will include a moderated Q&A featuring questions from the live audience.
As the world moves toward automation by running statistical, machine learning, and AI algorithms on exponentially increasing amounts of data, the ability to extract actionable insights depends on the quality of that data. Data science pipelines that support ML/AI activities focus primarily on the scale and speed of data delivery, often at the cost of quality. Data quality issues frequently manifest as anomalies and, if not detected and addressed promptly, contaminate data in warehouses and data lakes, leading to erroneous conclusions and to expensive, time-consuming data reconciliation and repairs. In this talk, we discuss the process of ensuring data quality by managing and addressing anomalies throughout the entire data science pipeline, with examples drawn from real life.