Prerequisite: None
Chloe Mawer
Ph.D.
Senior Data Scientist
Silicon Valley Data Science
With recent advances in machine learning algorithms and statistical techniques, and the increasing ease of implementing them, it is tempting to overlook the power and necessity of exploratory data analysis (EDA), the crucial step before diving into machine learning or statistical modeling. Applying machine learning algorithms without first getting oriented to the dataset can lead to wasted time and spurious conclusions.
EDA allows practitioners to build intuition for the patterns in the data, identify anomalies, narrow down a set of alternative modeling approaches, devise strategies to handle missing data, and ensure that results are interpreted correctly. Further, EDA can rapidly generate insights and answer many questions without requiring complex modeling.
In this talk, I will show just how valuable exploratory data analysis can be before any modeling takes place, both in the insight it can bring and in the improvements it can make to the modeling process. Through examples, I will outline best practices for EDA of cross-sectional, time series, and panel data. I will also distinguish the EDA needed for categorical data from that needed for numerical data. This talk is for data practitioners who are interested in strengthening their EDA skills.