Questions and Answers

Question P2ON0ECNRy7k0rDkxllw

Question

A data engineer needs to debug an AWS Glue job that reads from Amazon S3 and writes to Amazon Redshift. The data engineer enabled the bookmark feature for the AWS Glue job. The data engineer has set the maximum concurrency for the AWS Glue job to 1.

The AWS Glue job is successfully writing the output to Amazon Redshift. However, the Amazon S3 files that were loaded during previous runs of the AWS Glue job are being reprocessed by subsequent runs.

What is the likely reason the AWS Glue job is reprocessing the files?

Choices

  • A: The AWS Glue job does not have the s3:GetObjectAcl permission that is required for bookmarks to work correctly.
  • B: The maximum concurrency for the AWS Glue job is set to 1.
  • C: The data engineer incorrectly specified an older version of AWS Glue for the Glue job.
  • D: The AWS Glue job does not have a required commit statement.
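
For context on choice D: in an AWS Glue script, bookmark state is initialized with `job.init(...)` and persisted only when the script calls `job.commit()`. The sketch below is a toy model of that behavior in plain Python, not the AWS Glue API. The class names, the `run` helper, and the S3 paths are all hypothetical and exist only for illustration.

```python
class ToyBookmarkStore:
    """Hypothetical stand-in for persisted bookmark state (not the Glue API)."""

    def __init__(self):
        self.committed = set()


class ToyJob:
    """Hypothetical job that tracks files seen in the current run."""

    def __init__(self, store):
        self.store = store
        self.pending = set()

    def read_new_files(self, available):
        # Only files that are not yet recorded in the persisted state are returned.
        new_files = [f for f in available if f not in self.store.committed]
        self.pending.update(new_files)
        return new_files

    def commit(self):
        # Persist this run's progress so later runs can skip these files.
        self.store.committed |= self.pending
        self.pending.clear()


def run(store, available, commit):
    """Simulate one job run; return the files that the run processed."""
    job = ToyJob(store)
    processed = job.read_new_files(available)
    if commit:
        job.commit()
    return processed


# Hypothetical input files.
files = ["s3://example-bucket/a.csv", "s3://example-bucket/b.csv"]

# Two consecutive runs that never commit: the second run sees every file again.
no_commit_store = ToyBookmarkStore()
first_without = run(no_commit_store, files, commit=False)
second_without = run(no_commit_store, files, commit=False)

# Two consecutive runs that commit: the second run has nothing new to process.
commit_store = ToyBookmarkStore()
first_with = run(commit_store, files, commit=True)
second_with = run(commit_store, files, commit=True)

print("second run without commit:", second_without)
print("second run with commit:", second_with)
```

In this toy model, the only difference between the two scenarios is whether progress is persisted at the end of the run, which mirrors the symptom described in the question.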

Question JG1Sy4gdmRfIa3jod11G

Question

An ecommerce company wants to use AWS to migrate data pipelines from an on-premises environment into the AWS Cloud. The company currently uses a third-party tool in the on-premises environment to orchestrate data ingestion processes.

The company wants a migration solution that does not require the company to manage servers. The solution must be able to orchestrate Python and Bash scripts. The solution must not require the company to refactor any code.

Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: AWS Lambda
  • B: Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
  • C: AWS Step Functions
  • D: AWS Glue

Question l6atByHKAMPCauQHRONx

Question

A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column.

Which solution will MOST speed up the Athena query performance?

Choices

  • A: Change the data format from .csv to JSON format. Apply Snappy compression.
  • B: Compress the .csv files by using Snappy compression.
  • C: Change the data format from .csv to Apache Parquet. Apply Snappy compression.
  • D: Compress the .csv files by using gzip compression.
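
The choices above differ in storage layout as well as compression. A rough way to see why layout matters for single-column queries is to compare how many bytes must be read in each case. The sketch below uses only the Python standard library and a made-up three-column dataset. It does not write real Parquet files; the per-column byte strings are a simplified stand-in for a columnar format.

```python
import csv
import io

# Made-up sample data: 1000 rows with three columns.
columns = ["id", "name", "amount"]
rows = [(i, f"user{i}", i * 1.5) for i in range(1000)]

# Row-oriented layout (CSV): values from every column are interleaved,
# so reading one column still means scanning the whole file.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(columns)
writer.writerows(rows)
row_bytes_scanned = len(buf.getvalue().encode("utf-8"))

# Column-oriented layout (simplified): each column is stored contiguously,
# so a query on one column only needs that column's bytes.
column_blocks = {
    name: "\n".join(str(row[idx]) for row in rows).encode("utf-8")
    for idx, name in enumerate(columns)
}
column_bytes_scanned = len(column_blocks["amount"])

print("bytes scanned, row layout:   ", row_bytes_scanned)
print("bytes scanned, column layout:", column_bytes_scanned)
```

Compressing a CSV file reduces its size on disk, but a query that needs one column still has to decompress and parse every row. A columnar format avoids reading the unused columns in the first place, and compression then applies on top of that.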

Question NN6uiHStJJsoMvogsNkU

Question

A retail company stores data from a product lifecycle management (PLM) application in an on-premises MySQL database. The PLM application frequently updates the database when transactions occur.

The company wants to gather insights from the PLM application in near real time. The company wants to integrate the insights with other business datasets and to analyze the combined dataset by using an Amazon Redshift data warehouse.

The company has already established an AWS Direct Connect connection between the on-premises infrastructure and AWS.

Which solution will meet these requirements with the LEAST development effort?

Choices

  • A: Run a scheduled AWS Glue extract, transform, and load (ETL) job to get the MySQL database updates by using a Java Database Connectivity (JDBC) connection. Set Amazon Redshift as the destination for the ETL job.
  • B: Run a full load plus CDC task in AWS Database Migration Service (AWS DMS) to continuously replicate the MySQL database changes. Set Amazon Redshift as the destination for the task.
  • C: Use the Amazon AppFlow SDK to build a custom connector for the MySQL database to continuously replicate the database changes. Set Amazon Redshift as the destination for the connector.
  • D: Run scheduled AWS DataSync tasks to synchronize data from the MySQL database. Set Amazon Redshift as the destination for the tasks.

Question MYA0pP4zVyD0unCKwb2K

Question

A marketing company uses Amazon S3 to store clickstream data. The company queries the data at the end of each day by using a SQL JOIN clause on S3 objects that are stored in separate buckets.

The company creates key performance indicators (KPIs) based on the objects. The company needs a serverless solution that will give users the ability to query data by partitioning the data. The solution must maintain the atomicity, consistency, isolation, and durability (ACID) properties of the data.

Which solution will meet these requirements MOST cost-effectively?

Choices

  • A: Amazon S3 Select
  • B: Amazon Redshift Spectrum
  • C: Amazon Athena
  • D: Amazon EMR