Questions and Answers

Question CDVc5g6TdzBiVbj0MfuF

Question

A data engineer is using Amazon Athena to analyze sales data that is in Amazon S3. The data engineer writes a query to retrieve sales amounts for 2023 for several products from a table named sales_data. However, the query does not return results for all of the products that are in the sales_data table. The data engineer needs to troubleshoot the query to resolve the issue. The data engineer’s original query is as follows:

    SELECT product_name, sum(sales_amount)
    FROM sales_data
    WHERE year = 2023
    GROUP BY product_name

How should the data engineer modify the Athena query to meet these requirements?

Choices

  • A: Replace sum(sales_amount) with count(*) for the aggregation.
  • B: Change WHERE year = 2023 to WHERE extract(year FROM sales_data) = 2023.
  • C: Add HAVING sum(sales_amount) > 0 after the GROUP BY clause.
  • D: Remove the GROUP BY clause.
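
For context, a minimal boto3 sketch of how the original query can be run against Athena while troubleshooting. It illustrates the Athena API mechanics only; the database name and results location are assumed placeholders, and the query shown is the engineer's original, not a confirmed fix.

    import time
    import boto3

    athena = boto3.client("athena")

    # The engineer's original query, cleaned up.
    QUERY = """
    SELECT product_name, sum(sales_amount)
    FROM sales_data
    WHERE year = 2023
    GROUP BY product_name
    """

    # Athena runs queries asynchronously; start one and note its ID.
    # "sales_db" and the S3 output location are assumed placeholders.
    execution = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "sales_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    # Print each result row (the first row holds the column headers).
    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)
        for row in rows["ResultSet"]["Rows"]:
            print([col.get("VarCharValue") for col in row["Data"]])

Inspecting the rows that do come back is usually the first step in working out why some products are missing from the output.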

Question iISCzyGUXj0nvAaziDoD

Question

A data engineer has a one-time task to read data from objects that are in Apache Parquet format in an Amazon S3 bucket. The data engineer needs to query only one column of the data. Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: Configure an AWS Lambda function to load data from the S3 bucket into a pandas dataframe. Write a SQL SELECT statement on the dataframe to query the required column.
  • B: Use S3 Select to write a SQL SELECT statement to retrieve the required column from the S3 objects.
  • C: Prepare an AWS Glue DataBrew project to consume the S3 objects and to query the required column.
  • D: Run an AWS Glue crawler on the S3 objects. Use a SQL SELECT statement in Amazon Athena to query the required column.
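
To make choice B concrete, a hedged boto3 sketch of S3 Select pulling a single column straight from a Parquet object; the bucket, key, and column name are hypothetical.

    import boto3

    s3 = boto3.client("s3")

    # Bucket, key, and column name are hypothetical placeholders.
    response = s3.select_object_content(
        Bucket="example-bucket",
        Key="data/sales.parquet",
        ExpressionType="SQL",
        Expression="SELECT s.product_name FROM S3Object s",
        InputSerialization={"Parquet": {}},
        OutputSerialization={"CSV": {}},
    )

    # The result arrives as an event stream; Records events carry the bytes.
    for event in response["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode("utf-8"), end="")

No clusters, crawlers, or catalogs are involved, which is what keeps the operational overhead low for a one-time read of a single column.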

Question edtce3XeFZm5iaVJNh7S

Question

A data engineer maintains custom Python scripts that perform a data formatting process that many AWS Lambda functions use. When the data engineer needs to modify the Python scripts, the data engineer must manually update all the Lambda functions. The data engineer requires a less manual way to update the Lambda functions. Which solution will meet this requirement?

Choices

  • A: Store a pointer to the custom Python scripts in the execution context object in a shared Amazon S3 bucket.
  • B: Package the custom Python scripts into Lambda layers. Apply the Lambda layers to the Lambda functions.
  • C: Store a pointer to the custom Python scripts in environment variables in a shared Amazon S3 bucket.
  • D: Assign the same alias to each Lambda function. Call each Lambda function by specifying the function’s alias.
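
To illustrate the mechanics behind choice B, a minimal boto3 sketch that publishes the shared scripts as a layer version and attaches it to one function; the layer name, artifact location, runtime, and function name are assumptions.

    import boto3

    lambda_client = boto3.client("lambda")

    # Publish the shared scripts as a new layer version. The zip is expected
    # to place modules under a top-level python/ directory so Lambda can
    # import them. All names and locations here are placeholders.
    layer = lambda_client.publish_layer_version(
        LayerName="shared-formatting-scripts",
        Content={"S3Bucket": "example-artifacts", "S3Key": "layers/formatting.zip"},
        CompatibleRuntimes=["python3.12"],
    )

    # Attach the new version to a consuming function.
    lambda_client.update_function_configuration(
        FunctionName="example-formatter",
        Layers=[layer["LayerVersionArn"]],
    )

With this approach, updating every consumer becomes a loop over function names rather than a manual edit of each function's deployment package.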

Question u1vZCTFgF12c7U5mpJF2

Question

A company uses Amazon Redshift for its data warehouse. The company must automate refresh schedules for Amazon Redshift materialized views. Which solution will meet this requirement with the LEAST effort?

Choices

  • A: Use Apache Airflow to refresh the materialized views.
  • B: Use an AWS Lambda user-defined function (UDF) within Amazon Redshift to refresh the materialized views.
  • C: Use the query editor v2 in Amazon Redshift to refresh the materialized views.
  • D: Use an AWS Glue workflow to refresh the materialized views.
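
For comparison with the scheduler built into query editor v2, a hedged sketch of what a code-based refresh looks like through the Redshift Data API; the cluster, database, secret, and view names are placeholders.

    import boto3

    redshift_data = boto3.client("redshift-data")

    # REFRESH MATERIALIZED VIEW is the standard Redshift statement; every
    # identifier below is an assumed placeholder.
    redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="analytics",
        SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
        Sql="REFRESH MATERIALIZED VIEW mv_daily_sales;",
    )

Query editor v2 can schedule the same statement directly, with no code or trigger to maintain.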

Question EzmG52BuwPapIdLcVqj7

Question

A data engineer must orchestrate a data pipeline that consists of one AWS Lambda function and one AWS Glue job. The solution must integrate with AWS services. Which solution will meet these requirements with the LEAST management overhead?

Choices

  • A: Use an AWS Step Functions workflow that includes a state machine. Configure the state machine to run the Lambda function and then the AWS Glue job.
  • B: Use an Apache Airflow workflow that is deployed on an Amazon EC2 instance. Define a directed acyclic graph (DAG) in which the first task is to call the Lambda function and the second task is to call the AWS Glue job.
  • C: Use an AWS Glue workflow to run the Lambda function and then the AWS Glue job.
  • D: Use an Apache Airflow workflow that is deployed on Amazon Elastic Kubernetes Service (Amazon EKS). Define a directed acyclic graph (DAG) in which the first task is to call the Lambda function and the second task is to call the AWS Glue job.
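
A minimal sketch of the Step Functions approach in choice A, defining a two-step state machine with boto3; the function, job, role, and state machine names are assumptions.

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # The Lambda function runs first, then the Glue job. The .sync variant of
    # glue:startJobRun makes Step Functions wait for the Glue job to finish,
    # so no custom polling code is needed.
    definition = {
        "StartAt": "FormatData",
        "States": {
            "FormatData": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": "example-prepare-fn"},
                "Next": "RunGlueJob",
            },
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": "example-etl-job"},
                "End": True,
            },
        },
    }

    sfn.create_state_machine(
        name="example-pipeline",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/example-sfn-role",  # assumed role
    )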