Questions and Answers

Question jdF2V1IHrIh9F64AgOdg

Question

A company uses AWS Glue Data Catalog to index data that is uploaded to an Amazon S3 bucket every day. The company uses a daily batch process in an extract, transform, and load (ETL) pipeline to upload data from external sources into the S3 bucket.

The company runs a daily report on the S3 data. Some days, the company runs the report before all the daily data has been uploaded to the S3 bucket. A data engineer must be able to send a message that identifies any incomplete data to an existing Amazon Simple Notification Service (Amazon SNS) topic.

Which solution will meet this requirement with the LEAST operational overhead?

Choices

  • A: Create data quality checks for the source datasets that the daily reports use. Create a new AWS managed Apache Airflow cluster. Run the data quality checks by using Airflow tasks that run data quality queries on the columns' data types and the presence of null values. Configure Airflow Directed Acyclic Graphs (DAGs) to publish a notification about the incomplete datasets to the SNS topic to inform the data engineer.
  • B: Create data quality checks on the source datasets that the daily reports use. Create a new Amazon EMR cluster. Use Apache Spark SQL to create Apache Spark jobs in the EMR cluster that run data quality queries on the columns' data types and the presence of null values. Orchestrate the ETL pipeline by using an AWS Step Functions workflow. Configure the workflow to publish a notification about the incomplete datasets to the SNS topic to inform the data engineer.
  • C: Create data quality checks on the source datasets that the daily reports use. Create data quality actions by using AWS Glue workflows to confirm the completeness and consistency of the datasets. Configure the data quality actions to create an event in Amazon EventBridge if a dataset is incomplete. Configure EventBridge to forward the event to the SNS topic to inform the data engineer about the incomplete datasets.
  • D: Create AWS Lambda functions that run data quality queries on the columns' data types and the presence of null values. Orchestrate the ETL pipeline by using an AWS Step Functions workflow that runs the Lambda functions. Configure the Step Functions workflow to publish a notification about the incomplete datasets to the SNS topic to inform the data engineer.
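
For reference, the AWS Glue Data Quality approach described in option C can be wired up with a few API calls and no cluster to manage. The boto3 sketch below is illustrative only: the ruleset name, database and table names, topic ARN, and the EventBridge source and detail-type values are assumptions to verify in your account, not details taken from the question.

```python
import json
import boto3

glue = boto3.client("glue")
events = boto3.client("events")

# Hypothetical ARN and names for illustration only.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:daily-report-alerts"

# Define a Glue Data Quality ruleset (DQDL) that flags incomplete daily data.
glue.create_data_quality_ruleset(
    Name="daily-upload-completeness",
    Ruleset='Rules = [ RowCount > 0, IsComplete "record_id" ]',
    TargetTable={"DatabaseName": "daily_reports", "TableName": "source_uploads"},
)

# Route Glue Data Quality evaluation results to the existing SNS topic.
# The source/detail-type strings below are assumptions; confirm them against
# the events that Glue Data Quality actually emits. The SNS topic policy must
# also allow EventBridge to publish.
events.put_rule(
    Name="glue-dq-results-to-sns",
    EventPattern=json.dumps({
        "source": ["aws.glue-dataquality"],
        "detail-type": ["Data Quality Evaluation Results Available"],
    }),
)
events.put_targets(
    Rule="glue-dq-results-to-sns",
    Targets=[{"Id": "sns-target", "Arn": TOPIC_ARN}],
)
```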

Question UPm9nHOaC1Ol2lifCyzb

Question

A company stores customer data that contains personally identifiable information (PII) in an Amazon Redshift cluster. The company’s marketing, claims, and analytics teams need to be able to access the customer data.

The marketing team should have access to obfuscated claim information but should have full access to customer contact information. The claims team should have access to customer information for each claim that the team processes. The analytics team should have access only to obfuscated PII data.

Which solution will enforce these data access requirements with the LEAST administrative overhead?

Choices

  • A: Create a separate Redshift cluster for each team. Load only the required data for each team. Restrict access to clusters based on the teams.
  • B: Create views that include required fields for each of the data requirements. Grant the teams access only to the view that each team requires.
  • C: Create a separate Amazon Redshift database role for each team. Define masking policies that apply for each team separately. Attach appropriate masking policies to each team role.
  • D: Move the customer data to an Amazon S3 bucket. Use AWS Lake Formation to create a data lake. Use fine-grained security capabilities to grant each team appropriate permissions to access the data.
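
As background for option C, Amazon Redshift supports role-based dynamic data masking that is defined once and attached per role. The sketch below submits the DDL through the Redshift Data API; the cluster identifier, database, user, role, table, and column names are hypothetical placeholders, not values from the question.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Hypothetical role, policy, table, and column names for illustration only.
statements = [
    "CREATE ROLE analytics_role",
    # Obfuscate the SSN column for the analytics team; other roles see raw data.
    "CREATE MASKING POLICY mask_ssn WITH (ssn VARCHAR(11)) USING ('***-**-****')",
    "ATTACH MASKING POLICY mask_ssn ON customers(ssn) TO ROLE analytics_role",
]

# Run the statements against the cluster without managing any connections.
redshift_data.batch_execute_statement(
    ClusterIdentifier="customer-cluster",
    Database="dev",
    DbUser="admin",
    Sqls=statements,
)
```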

Question t5GGzFQalaXm9CLcW8en

Question

A financial company recently added more features to its mobile app. The new features required the company to create a new topic in an existing Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster.

A few days after the company added the new topic, Amazon CloudWatch raised an alarm on the RootDiskUsed metric for the MSK cluster.

How should the company address the CloudWatch alarm?

Choices

  • A: Expand the storage of the MSK broker. Configure the MSK cluster storage to expand automatically.
  • B: Expand the storage of the Apache ZooKeeper nodes.
  • C: Update the MSK broker instance to a larger instance type. Restart the MSK cluster.
  • D: Specify the Target Volume-in-GiB parameter for the existing topic.
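
For context on option A, broker storage can be expanded once with the Amazon MSK API and then kept ahead of growth with MSK storage auto scaling through Application Auto Scaling. The cluster ARN, volume sizes, and utilization target in the sketch below are illustrative assumptions.

```python
import boto3

kafka = boto3.client("kafka")
autoscaling = boto3.client("application-autoscaling")

# Hypothetical cluster ARN and sizes for illustration only.
CLUSTER_ARN = "arn:aws:kafka:us-east-1:123456789012:cluster/app-cluster/abc-123"

# One-time expansion of the per-broker EBS storage.
cluster = kafka.describe_cluster(ClusterArn=CLUSTER_ARN)["ClusterInfo"]
kafka.update_broker_storage(
    ClusterArn=CLUSTER_ARN,
    CurrentVersion=cluster["CurrentVersion"],
    TargetBrokerEBSVolumeInfo=[{"KafkaBrokerNodeId": "All", "VolumeSizeGB": 500}],
)

# Automatic expansion: register broker storage as a scalable target and attach
# a target-tracking policy on storage utilization.
autoscaling.register_scalable_target(
    ServiceNamespace="kafka",
    ResourceId=CLUSTER_ARN,
    ScalableDimension="kafka:broker-storage:VolumeSize",
    MinCapacity=500,
    MaxCapacity=2000,
)
autoscaling.put_scaling_policy(
    PolicyName="msk-storage-autoscaling",
    ServiceNamespace="kafka",
    ResourceId=CLUSTER_ARN,
    ScalableDimension="kafka:broker-storage:VolumeSize",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "KafkaBrokerStorageUtilization"
        },
        # Broker storage can only grow, so scale-in is disabled.
        "DisableScaleIn": True,
    },
)
```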

Question 8N1yJt5DkQQRKJ9K6abi

Question

A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour.

Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)

Choices

  • A: Configure AWS Glue triggers to run the ETL jobs every hour.
  • B: Use AWS Glue DataBrew to clean and prepare the data for analytics.
  • C: Use AWS Lambda functions to schedule and run the ETL jobs every hour.
  • D: Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
  • E: Use the Redshift Data API to load transformed data into Amazon Redshift.
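
To illustrate options A and D, the sketch below creates a Glue connection for the MongoDB source and a scheduled trigger that starts the ETL job every hour. The connection properties, network settings, job name, and schedule are hypothetical values, not details from the question; in practice the credentials would come from AWS Secrets Manager rather than being inlined.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical connection for the MongoDB source (a JDBC connection would be
# created the same way for the Amazon RDS source).
glue.create_connection(
    ConnectionInput={
        "Name": "mongodb-source",
        "ConnectionType": "MONGODB",
        "ConnectionProperties": {
            "CONNECTION_URL": "mongodb://mongo.example.internal:27017/appdb",
            "USERNAME": "etl_user",
            "PASSWORD": "example-password",  # placeholder; prefer Secrets Manager
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],
        },
    }
)

# Run the existing ETL job at the top of every hour.
glue.create_trigger(
    Name="hourly-etl-trigger",
    Type="SCHEDULED",
    Schedule="cron(0 * * * ? *)",
    Actions=[{"JobName": "rds-mongo-to-redshift"}],
    StartOnCreation=True,
)
```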

Question U7V6JGtJgE5QlSKO9IfW

Question

A data engineer needs to build an enterprise data catalog based on the company’s Amazon S3 buckets and Amazon RDS databases. The data catalog must include storage format metadata for the data in the catalog.

Which solution will meet these requirements with the LEAST effort?

Choices

  • A: Use an AWS Glue crawler to scan the S3 buckets and RDS databases and build a data catalog. Use data stewards to inspect the data and update the data catalog with the data format.
  • B: Use an AWS Glue crawler to build a data catalog. Use AWS Glue crawler classifiers to recognize the format of data and store the format in the catalog.
  • C: Use Amazon Macie to build a data catalog and to identify sensitive data elements. Collect the data format information from Macie.
  • D: Use scripts to scan data elements and to assign data classifications based on the format of the data.
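
For option B, here is a minimal sketch of a single crawler that catalogs both the S3 buckets and an RDS database (through an existing JDBC connection). Glue's built-in classifiers detect common formats such as CSV, JSON, and Parquet and record the format in the catalog; the bucket path, connection name, IAM role, custom classifier, and schedule below are assumptions for illustration.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical bucket, connection, role, and classifier names.
glue.create_crawler(
    Name="enterprise-data-catalog-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="enterprise_catalog",
    Targets={
        "S3Targets": [{"Path": "s3://example-data-bucket/"}],
        "JdbcTargets": [
            {"ConnectionName": "rds-postgres-connection", "Path": "appdb/%"}
        ],
    },
    # Built-in classifiers infer common storage formats automatically;
    # add custom classifiers only for formats the built-ins miss.
    Classifiers=["custom-pipe-delimited-csv"],
    Schedule="cron(0 2 * * ? *)",
)

glue.start_crawler(Name="enterprise-data-catalog-crawler")
```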