PMI-CPMAI highlights data pipelines and preprocessing as critical components of AI/ML configuration management. A core principle is that all evaluation datasets must pass through consistent, validated preprocessing steps (cleaning, normalization, feature engineering, encoding, etc.). If different test datasets are subjected to different preprocessing logic, parameter settings, or transformations, performance metrics will naturally appear inconsistent — not because of the model itself, but because the inputs are not comparable.
The guidance notes that configuration management for AI must track not only model versions but also data transformations, feature pipelines, and parameter settings. Inconsistent metrics across test datasets are a classic symptom of mismatched preprocessing, such as applying different scaling, missing-value handling, text tokenization, or feature-selection strategies to different datasets. Overfitting and excessive model complexity do harm generalization, but they typically manifest as consistently poor performance on out-of-sample data, rather than as erratic metrics across correctly prepared test sets.
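To make the scaling example concrete, here is a minimal NumPy sketch (all data and names are hypothetical, not from PMI-CPMAI). A fixed, already-trained linear model is evaluated on two test sets: once with the correct practice of reusing the training set's normalization statistics, and once with the mistake of normalizing each test set by its own statistics. The mistake shifts predictions by a different amount on each dataset, which is exactly how inconsistent metrics arise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data and a fixed, already-trained linear model.
X_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
w = np.array([0.5, -1.0, 2.0])

# Correct practice: reuse the *training* statistics for every dataset.
train_mean, train_std = X_train.mean(axis=0), X_train.std(axis=0)

def scale_with_train_stats(X):
    return (X - train_mean) / train_std

def scale_with_own_stats(X):
    # Mistake: each test set is normalized with its own statistics.
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Two test sets; B comes from a slightly shifted distribution.
X_test_a = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
X_test_b = rng.normal(loc=6.0, scale=3.0, size=(200, 3))

for name, X in [("A", X_test_a), ("B", X_test_b)]:
    consistent = scale_with_train_stats(X) @ w
    inconsistent = scale_with_own_stats(X) @ w
    drift = np.abs(consistent - inconsistent).mean()
    print(f"test set {name}: mean prediction drift = {drift:.3f}")
```

The drift is much larger on test set B, so any metric computed over these predictions would diverge between the two sets even though the model never changed.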
Therefore, when a team observes inconsistent performance metrics across different test datasets, PMI-CPMAI would direct them to first verify that the data preprocessing steps are implemented correctly and consistently across those datasets. The most likely cause of the inconsistency is incorrect, or inconsistent, data preprocessing.
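One lightweight way to perform that first check, in the spirit of tracking pipelines under configuration management, is to record a fingerprint of the preprocessing configuration alongside each dataset and compare fingerprints before comparing metrics. The sketch below is an illustrative assumption, not a PMI-CPMAI-prescribed mechanism; the config keys are hypothetical.

```python
import hashlib
import json

def pipeline_fingerprint(config: dict) -> str:
    """Hash a preprocessing configuration so dataset preparations can be compared."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]

# Hypothetical configs recorded when each test dataset was prepared.
config_a = {"scaler": "zscore", "impute": "median", "vocab_version": 3}
config_b = {"scaler": "minmax", "impute": "median", "vocab_version": 3}

fp_a = pipeline_fingerprint(config_a)
fp_b = pipeline_fingerprint(config_b)

if fp_a != fp_b:
    print("preprocessing mismatch: metrics across these datasets are not comparable")
```

Because the JSON serialization is key-sorted, identical configurations always produce identical fingerprints, so a mismatch reliably flags that two datasets were not prepared the same way.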