Exploratory data analysis (EDA) involves understanding the data by visualizing it, calculating statistics, and creating correlation matrices. This stage helps identify patterns, relationships, and anomalies in the data, which can guide further steps in the ML pipeline.
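As a rough illustration of the activities named above, the following is a minimal sketch using only the Python standard library. The dataset, the column names, and the helper functions (`summarize`, `pearson`, `correlation_matrix`, `text_histogram`) are hypothetical and invented for this example. They are not part of any AWS service or API, and in practice a library such as pandas would typically do this work.

```python
import statistics

# Hypothetical toy dataset: column name -> list of numeric values.
data = {
    "hours_studied": [1.0, 2.0, 3.0, 4.0, 5.0],
    "exam_score": [52.0, 58.0, 65.0, 70.0, 79.0],
    "hours_slept": [8.0, 7.5, 7.0, 6.0, 5.5],
}


def summarize(values):
    """Calculate basic descriptive statistics for one column."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(columns):
    """Pairwise correlations as a nested dict: matrix[a][b]."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def text_histogram(values, bins=3):
    """Count values per equal-width bin (a crude stand-in for a plot)."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid zero width for constant columns
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)  # clamp the maximum value
        counts[index] += 1
    return counts


if __name__ == "__main__":
    for name, values in data.items():
        print(name, summarize(values), text_histogram(values))
    for name, row in correlation_matrix(data).items():
        print(name, {other: round(r, 3) for other, r in row.items()})
```

In this toy data, the correlation matrix shows a strong positive relationship between `hours_studied` and `exam_score` and a strong negative one between `hours_studied` and `hours_slept`, the kind of pattern EDA is meant to surface before any cleaning, feature creation, or model tuning takes place.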
Option C (Correct): "Exploratory data analysis" is correct because the tasks described (building a correlation matrix, calculating statistics, and visualizing data) are all part of the EDA process.
Option A: "Data pre-processing" is incorrect because it involves cleaning and transforming data, not initial analysis.
Option B: "Feature engineering" is incorrect because it involves creating new features from raw data, not analyzing the data's existing structure.
Option D: "Hyperparameter tuning" is incorrect because it refers to optimizing the settings that control how a model is trained (for example, the learning rate), not analyzing the data.
AWS AI Practitioner References:
Stages of the Machine Learning Pipeline: AWS describes EDA as an early phase for understanding and exploring data, before the more specific preprocessing, feature engineering, and model training stages.