Basic Concept: Data integrity in AI systems requires not only that data be accurate at a point in time, but also that its entire history of transformation and usage can be traced and verified. Tracking how data has been used and transformed throughout the AI system lifecycle provides ongoing integrity assurance. The CompTIA SecAI+ Study Guide covers data governance controls, including data lineage, for AI integrity.
Why D is Correct: Data lineage tracks and documents the complete journey of data from its origin through every transformation, processing step, and use within an AI system. By recording what happened to the data, when, by whom, and through which processes, data lineage provides the audit trail needed to ensure data integrity throughout the AI system's data usage lifecycle. It enables verification that data has been used as intended and has not been improperly modified at any stage.
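To make the audit-trail idea concrete, the following is a minimal sketch of a lineage log, not a reference implementation. It assumes a hypothetical `LineageLog` class that records the step, actor, timestamp, and a SHA-256 hash of the data at each stage, and chains each entry to the previous one so that later edits to the history become detectable. The class name, field names, and sample data are illustrative assumptions; production systems typically rely on dedicated lineage or metadata tooling.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


class LineageLog:
    """Hypothetical append-only lineage log; each entry is chained to the previous one."""

    def __init__(self):
        self.entries = []

    def record(self, step: str, actor: str, data: bytes) -> dict:
        """Append an entry describing what happened, when, by whom, and to which data."""
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else ""
        entry = {
            "step": step,                                         # which process
            "actor": actor,                                       # by whom
            "timestamp": datetime.now(timezone.utc).isoformat(),  # when
            "data_hash": sha256_hex(data),                        # state of the data
            "prev_hash": prev_hash,                               # link to prior entry
        }
        # Hash the entry body so later edits to any field can be detected.
        entry["entry_hash"] = sha256_hex(json.dumps(entry, sort_keys=True).encode())
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every entry hash and check that the chain links are intact."""
        prev_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            expected = sha256_hex(json.dumps(body, sort_keys=True).encode())
            if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected:
                return False
            prev_hash = entry["entry_hash"]
        return True


# Illustrative usage with made-up data and actor names.
log = LineageLog()
raw = b"age,income\n34,52000\n"
log.record("ingest", "etl-service", raw)
cleaned = raw.replace(b"52000", b"52000.0")
log.record("normalize", "prep-job", cleaned)
print(log.verify())  # True: history is intact

log.entries[0]["actor"] = "someone-else"  # simulate tampering with the history
print(log.verify())  # False: the altered entry no longer matches its hash
```

The point of the sketch is the contrast with the other options: masking, cleansing, and verification each act on the data at one moment, whereas a lineage record accumulates across every stage and can be re-checked at any later time.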
Why A is Wrong: Data masking replaces sensitive data values with anonymized equivalents to protect privacy. It is a confidentiality control that modifies data values rather than a mechanism for ensuring or tracking data integrity across the system.
Why B is Wrong: Data cleansing removes or corrects errors, inconsistencies, and noise in datasets to improve data quality. It is a data preparation activity that improves data accuracy at a point in time but does not track data usage or provide ongoing integrity assurance throughout the AI system lifecycle.
Why C is Wrong: Data verification confirms that data meets expected quality standards and validates its accuracy at a specific checkpoint. While important for quality assurance, it provides a point-in-time check rather than the continuous tracking of data usage and transformations that data lineage provides.