The best option for reducing the sensitivity of the dataset before training the model is to use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and then use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption. This option keeps every column in the dataset while protecting the sensitive data from unauthorized access or exposure. The Cloud DLP API can detect and classify many types of sensitive data, such as names, email addresses, phone numbers, and credit card numbers [1]. Dataflow can create scalable and reliable pipelines to process large volumes of data from BigQuery and other sources [2]. Format Preserving Encryption (FPE) encrypts sensitive data while preserving its original format and length, which helps maintain the utility and validity of the data [3]. By using Dataflow with the DLP API, you can apply FPE to the sensitive values in the dataset and store the encrypted data in BigQuery or another destination. You can also use the same pipeline to decrypt the data when needed, by using the same encryption key and method [4].
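As a minimal sketch of what such a de-identification step might look like, the dict below mirrors the shape of a Cloud DLP `DeidentifyConfig` that uses `CryptoReplaceFfxFpeConfig` (the FPE transformation). The project ID, key ring, and wrapped key shown here are placeholders, not real resources; in a real pipeline this config would be passed to the DLP client's `deidentify_content` call (and the same config with `reidentify_content` to reverse it).

```python
# Illustrative Cloud DLP de-identify configuration using Format Preserving
# Encryption. All resource names and key material below are placeholders.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                "info_types": [
                    {"name": "PHONE_NUMBER"},
                    {"name": "CREDIT_CARD_NUMBER"},
                ],
                "primitive_transformation": {
                    "crypto_replace_ffx_fpe_config": {
                        "crypto_key": {
                            "kms_wrapped": {
                                # placeholder: a data key wrapped by Cloud KMS
                                "wrapped_key": "<base64-wrapped-key>",
                                "crypto_key_name": (
                                    "projects/my-project/locations/global/"
                                    "keyRings/my-ring/cryptoKeys/my-key"
                                ),
                            }
                        },
                        # NUMERIC keeps digits as digits, so a 10-digit phone
                        # number encrypts to another 10-digit string.
                        "common_alphabet": "NUMERIC",
                    }
                },
            }
        ]
    }
}

fpe = deidentify_config["info_type_transformations"]["transformations"][0][
    "primitive_transformation"]["crypto_replace_ffx_fpe_config"]
print(fpe["common_alphabet"])  # NUMERIC
```

Because the same key and config deterministically reverse the transformation, the pipeline can tokenize on ingest and detokenize for authorized consumers without changing the table schema.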
The other options are not as suitable as option B, for the following reasons:
Option A: Using Dataflow to ingest the columns with sensitive data from BigQuery and then randomizing the values in each sensitive column would reduce the sensitivity of the data, but also its utility and accuracy. Randomization replaces sensitive data with random values, which prevents re-identification but also distorts the distribution of the data and its relationships with other columns [3]. This can hurt the performance and quality of the ML model, especially if every column is critical to the model.
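The toy example below (not GCP-specific; the column names are made up) illustrates why randomization hurts a model: shuffling a sensitive column preserves its marginal distribution but destroys its correlation with the target the model is trying to learn.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, computed from scratch for the demo."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

rng = random.Random(0)
income = [20_000 + 100 * i for i in range(1_000)]          # sensitive column
label = [0.5 * x + rng.gauss(0, 1_000) for x in income]    # target depends on it

corr_before = pearson(income, label)

randomized = income[:]        # "randomization": shuffle the column's values
rng.shuffle(randomized)
corr_after = pearson(randomized, label)

# The column still has the same values overall, but its predictive
# relationship with the label is gone.
print(round(corr_before, 3), round(corr_after, 3))
```

FPE avoids this loss for use cases that only need consistent tokens, because equal inputs map to equal outputs, so joins and groupings on the column survive.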
Option C: Using the Cloud DLP API to scan for sensitive data, and using Dataflow to replace all sensitive data with the encryption algorithm AES-256 and a salt, would reduce the sensitivity of the data, but also its utility and validity. AES-256 is a symmetric encryption algorithm that uses a 256-bit key to encrypt and decrypt data. A salt is a random value added to the data before encryption to increase the randomness and security of the output. However, AES-256 does not preserve the format or length of the original data, which can cause problems when storing or processing it. For example, if the original data is a 10-digit phone number, AES-256 produces a much longer string with a different character set, which can break the schema or logic of the dataset [3].
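The format problem can be shown concretely. Python's standard library has no AES implementation, so the sketch below uses a salted SHA-256 digest purely as a stand-in: like AES-256 output, the token is neither 10 characters long nor numeric, so it no longer fits a phone-number column.

```python
import base64
import hashlib
import os

# Stand-in for AES-256 output: any non-format-preserving transformation
# (salted hash or block cipher alike) breaks the original value's shape.
phone = "5551234567"            # 10-digit value that fits the column schema
salt = os.urandom(16)           # random salt, as in option C
token = base64.b64encode(
    hashlib.sha256(salt + phone.encode()).digest()
).decode()

print(len(phone), len(token))   # 10 vs 44: the token overflows the schema
print(token.isdigit())          # False: it is not even numeric
```

A downstream consumer expecting `STRING(10)` of digits would reject or mishandle such tokens, which is exactly the breakage FPE is designed to avoid.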
Option D: Before training, using BigQuery to select only the columns that do not contain sensitive data, and creating an authorized view so that sensitive values cannot be accessed by unauthorized individuals, would reduce the exposure of the sensitive data, but also the completeness and relevance of the dataset. An authorized view is a BigQuery view that lets you share query results with particular users or groups without giving them access to the underlying tables. However, this option assumes that you can reliably identify which columns contain sensitive data, which may not be easy or accurate. Moreover, it removes columns from the dataset, which can hurt the performance and quality of the ML model, especially if every column is critical to the model.
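For concreteness, a hypothetical version of option D might look like the DDL below (held here as a Python string; every project, dataset, table, and column name is made up). The view's dataset would then be granted authorized access to the source dataset, so users querying the view never need permissions on the underlying table, but the omitted columns are simply unavailable to the model.

```python
# Hypothetical BigQuery DDL for option D: a view exposing only the columns
# believed to be non-sensitive. All names below are illustrative.
create_view_sql = """
CREATE OR REPLACE VIEW `my-project.shared_views.training_data` AS
SELECT
  age,          -- kept: not sensitive
  tenure_days,  -- kept: not sensitive
  plan_type     -- kept: not sensitive
  -- email, phone_number, and ssn are deliberately omitted
FROM `my-project.raw.customers`
"""

print("CREATE OR REPLACE VIEW" in create_view_sql)  # True
```

The trade-off is visible in the query itself: the dropped columns protect privacy but cannot contribute any signal to training, unlike FPE-tokenized columns, which remain in place.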
References:
- Preparing for Google Cloud Certification: Machine Learning Engineer, Course 5: Responsible AI, Week 2: Privacy
- Google Cloud Professional Machine Learning Engineer Exam Guide, Section 5: Developing responsible AI solutions, 5.2 Implementing privacy techniques
- Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 9: Responsible AI, Section 9.4: Privacy
- De-identification techniques
- Cloud Data Loss Prevention (DLP) API
- Dataflow
- Using Dataflow and Sensitive Data Protection to securely tokenize and import data from a relational database to BigQuery
- AES encryption
- Salt (cryptography)
- Authorized views