The goal is to prevent a fine-tuned large language model (LLM) on Amazon Bedrock from revealing private customer data. Let’s analyze the options:
A. Amazon Bedrock Guardrails: Guardrails in Amazon Bedrock allow users to define policies to filter harmful or sensitive content in model inputs and outputs. While useful for real-time content moderation, they do not address the risk of private data being embedded in the model during fine-tuning, as the model could still memorize sensitive information.
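For illustration, here is a minimal sketch (using boto3; the guardrail name and the particular PII entity types are assumptions, not part of the question) of configuring a guardrail to mask or block PII at inference time. Note that this filters model inputs and outputs only; it does not remove anything the model memorized during fine-tuning:

```python
import boto3

bedrock = boto3.client("bedrock")  # Bedrock control-plane client

# Hypothetical guardrail that anonymizes or blocks common PII types
# in model inputs and outputs. Name and entity list are illustrative.
response = bedrock.create_guardrail(
    name="pii-output-filter",
    blockedInputMessaging="Sorry, that input cannot be processed.",
    blockedOutputsMessaging="Sorry, that response was blocked.",
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "US_BANK_ACCOUNT_NUMBER", "action": "BLOCK"},
        ]
    },
)
print(response["guardrailId"], response["version"])
```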
B. Remove personally identifiable information (PII) from the customer data before fine-tuning the LLM: Removing PII (e.g., names, addresses, account numbers) from the training dataset ensures that the model does not learn or memorize sensitive customer data, reducing the risk of data leakage. This is a proactive and effective approach to data privacy during model training.
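One common way to implement this preprocessing step is sketched below using Amazon Comprehend's detect_pii_entities API; the redact_pii helper and the sample record are illustrative, not part of the question:

```python
import boto3

comprehend = boto3.client("comprehend")

def redact_pii(text: str, language: str = "en") -> str:
    """Replace detected PII spans with their entity type, e.g. [NAME]."""
    entities = comprehend.detect_pii_entities(
        Text=text, LanguageCode=language
    )["Entities"]
    # Redact from the highest offset downward so earlier offsets
    # remain valid as the string length changes.
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text

record = "Jane Doe (acct 1234-5678) emailed jane@example.com about her bill."
print(redact_pii(record))
# e.g. "[NAME] (acct [BANK_ACCOUNT_NUMBER]) emailed [EMAIL] about her bill."
```

Running this over every training record before fine-tuning means the sensitive values never reach the model, which is exactly the mitigation the question is asking for.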
C. Increase the Top-K parameter of the LLM: Top-K is a sampling parameter that restricts generation to the K most probable tokens at each step; lower values make output more deterministic, higher values more diverse. Adjusting it changes output randomness but does nothing about private data the model memorized during fine-tuning.
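To make clear where Top-K actually lives, here is a sketch of passing it as a per-request inference parameter (the model ID and prompt are placeholders, and the top_k field shown follows the Anthropic Claude request format; other model families name this parameter differently):

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime")  # Bedrock data-plane client

# top_k=50 samples only from the 50 most probable tokens per step;
# this shapes output diversity, not data privacy.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "top_k": 50,
    "messages": [{"role": "user", "content": "Summarize our refund policy."}],
}
response = runtime.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```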
D. Store customer data in Amazon S3. Encrypt the data before fine-tuning the LLM: Server-side encryption in Amazon S3 protects data at rest (and TLS protects it in transit), but the data is decrypted before it is used to train the model. If PII is present in that plaintext, the model can still learn and potentially expose it, so encryption alone does not solve the problem.
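For completeness, a minimal sketch of storing the training file with server-side KMS encryption (the bucket name and key ARN are placeholders); the fine-tuning job still reads the decrypted plaintext, which is why this is orthogonal to memorization risk:

```python
import boto3

s3 = boto3.client("s3")

# Encrypt the training file at rest with a customer-managed KMS key.
with open("records.jsonl", "rb") as f:
    s3.put_object(
        Bucket="my-finetune-data",  # placeholder bucket
        Key="training/records.jsonl",
        Body=f,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
    )
```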
Exact Extract Reference: AWS emphasizes data privacy in AI/ML workflows, stating, “To protect sensitive data, you can preprocess datasets to remove personally identifiable information (PII) before using them for model training. This reduces the risk of models inadvertently learning or exposing sensitive information.” (Source: AWS Best Practices for Responsible AI, https://aws.amazon.com/machine-learning/responsible-ai/). Additionally, the Amazon Bedrock documentation notes that users are responsible for ensuring compliance with data privacy regulations during fine-tuning (https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization.html).
Removing PII before fine-tuning is the most direct and effective way to prevent the model from revealing private customer data, making B the correct answer.
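Tying the pieces together, here is a hedged sketch of submitting a Bedrock fine-tuning job whose training data points at the redacted dataset; all names, ARNs, S3 URIs, and hyperparameter values below are placeholders, not prescribed settings:

```python
import boto3

bedrock = boto3.client("bedrock")

# The key point: trainingDataConfig references the *redacted* dataset
# produced by the PII-removal step (option B).
bedrock.create_model_customization_job(
    jobName="pii-safe-finetune",
    customModelName="support-model-pii-safe",
    roleArn="arn:aws:iam::123456789012:role/BedrockFineTuneRole",
    baseModelIdentifier="amazon.titan-text-express-v1",  # example base model
    customizationType="FINE_TUNING",
    trainingDataConfig={
        "s3Uri": "s3://my-finetune-data/training/records-redacted.jsonl"
    },
    outputDataConfig={"s3Uri": "s3://my-finetune-data/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
)
```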
References: AWS Bedrock Documentation: Model Customization (https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization.html); AWS Responsible AI Best Practices (https://aws.amazon.com/machine-learning/responsible-ai/); AWS AI Practitioner Study Guide (emphasis on data privacy in LLM fine-tuning).