Basic Concept: When an AI chatbot powered by retrieval-augmented generation (RAG) retrieves and outputs sensitive patient information such as names, medical histories, or identifiers, the risk of Protected Health Information (PHI) disclosure must be mitigated. The CompTIA SecAI+ Study Guide covers data protection techniques for AI systems that handle sensitive health data.
Why A is Correct: Masking replaces sensitive data values such as patient names, dates of birth, medical record numbers, and diagnoses with redacted or anonymized equivalents in the chatbot's output. Even if the RAG system retrieves records containing patient information, masking ensures that the sensitive fields are obscured before the response is presented to the user. This directly prevents PHI disclosure while still allowing the chatbot to give useful responses based on the non-sensitive content of the retrieved data.
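As a rough illustration of masking applied as an output control, the sketch below redacts a few PHI fields from a chatbot response before it is shown to the user. The function name `mask_phi`, the `PHI_PATTERNS` rules, and the sample record are all hypothetical and chosen for illustration. A real deployment would need a far broader, validated detection approach (including free-text diagnoses, which this sketch does not cover).

```python
import re

# Hypothetical, deliberately narrow patterns; assumed formats for illustration only.
PHI_PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),   # medical record number
    "DOB": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),   # date written as MM/DD/YYYY
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # US Social Security number
}


def mask_phi(text, known_names=()):
    """Return text with matched PHI fields replaced by redaction labels."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    # Names are assumed to come from the retrieved record's metadata.
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME REDACTED]", text, flags=re.IGNORECASE)
    return text


# Made-up response text standing in for what the RAG pipeline generated.
response = "Patient Jane Doe (MRN: 00123456, DOB 04/12/1987) was diagnosed with asthma."
masked = mask_phi(response, known_names=["Jane Doe"])
print(masked)
```

The key design point is where the control sits: masking runs on the generated response, so it still applies even when retrieval has already pulled sensitive records into the context.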
Why B is Wrong: Classification categorizes data by its sensitivity level, such as public, internal, confidential, or restricted. While it identifies which data requires protection, classification alone does not transform or obscure the sensitive values in chatbot outputs.
Why C is Wrong: Data minimization is a privacy principle that limits data collection to only what is necessary for the specified purpose. While valuable as a design principle when building the RAG knowledge base, it is a data governance strategy rather than a technical output control that can be applied to mitigate existing PHI disclosure in chatbot responses.
Why D is Wrong: Normalization is a data processing technique that scales numerical values to a standard range or standardizes data formats. It is a preprocessing step for improving model training efficiency, not a data protection technique for preventing patient information disclosure in AI outputs.
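To make the contrast concrete, here is a minimal sketch of min-max normalization, one common form of the scaling described above. The helper name `min_max_normalize` and the sample values are assumptions made for illustration. Note that the function only rescales numbers; it never touches or hides identifying fields.

```python
def min_max_normalize(values):
    """Scale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up patient ages used as a numeric feature.
ages = [20, 40, 60]
scaled = min_max_normalize(ages)
print(scaled)  # [0.0, 0.5, 1.0]
```

Because the transformation is a simple linear rescaling of numeric features, it offers no protection for names, record numbers, or other identifiers in a chatbot's output.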