Questions and Answers

Question POjliky7vY1cSOhizJln

Question

A company stores petabytes of data in thousands of Amazon S3 buckets in the S3 Standard storage class. The data supports analytics workloads that have unpredictable and variable data access patterns. The company does not access some data for months. However, the company must be able to retrieve all data within milliseconds. The company needs to optimize S3 storage costs. Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: Use S3 Storage Lens standard metrics to determine when to move objects to more cost-optimized storage classes. Create S3 Lifecycle policies for the S3 buckets to move objects to cost-optimized storage classes. Continue to refine the S3 Lifecycle policies in the future to optimize storage costs.
  • B: Use S3 Storage Lens activity metrics to identify S3 buckets that the company accesses infrequently. Configure S3 Lifecycle rules to move objects from S3 Standard to the S3 Standard-Infrequent Access (S3 Standard-IA) and S3 Glacier storage classes based on the age of the data.
  • C: Use S3 Intelligent-Tiering. Activate the Deep Archive Access tier.
  • D: Use S3 Intelligent-Tiering. Use the default access tier.
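
For context on the Intelligent-Tiering options: a minimal boto3 sketch (bucket name is hypothetical) of an S3 Lifecycle rule that transitions objects from S3 Standard into S3 Intelligent-Tiering. The default Frequent, Infrequent, and Archive Instant Access tiers all keep millisecond retrieval, whereas the optional Deep Archive Access tier requires a restore before data can be read.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name. This lifecycle rule transitions all objects from
# S3 Standard into S3 Intelligent-Tiering, which then shifts each object
# between access tiers automatically based on its access pattern.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    },
)
```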

Question kF7JkGwSBIWPnFZRnYHI

Question

During a security review, a company identified a vulnerability in an AWS Glue job. The company discovered that credentials to access an Amazon Redshift cluster were hard coded in the job script. A data engineer must remediate the security vulnerability in the AWS Glue job. The solution must securely store the credentials. Which combination of steps should the data engineer take to meet these requirements? (Choose two.)

Choices

  • A: Store the credentials in the AWS Glue job parameters.
  • B: Store the credentials in a configuration file that is in an Amazon S3 bucket.
  • C: Access the credentials from a configuration file that is in an Amazon S3 bucket by using the AWS Glue job.
  • D: Store the credentials in AWS Secrets Manager.
  • E: Grant the AWS Glue job IAM role access to the stored credentials.
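
A minimal sketch of how a Glue job script could read the Redshift credentials from AWS Secrets Manager at run time instead of hard coding them. The secret name and key names are hypothetical, and the job's IAM role must be allowed to call secretsmanager:GetSecretValue on the secret.

```python
import json

import boto3

# Hypothetical secret name; the Glue job's IAM role needs
# secretsmanager:GetSecretValue permission on this secret.
SECRET_ID = "analytics/redshift-cluster-credentials"

secrets = boto3.client("secretsmanager")
secret = json.loads(secrets.get_secret_value(SecretId=SECRET_ID)["SecretString"])

# Build the Redshift connection settings from the secret instead of
# embedding them in the job script.
jdbc_url = f"jdbc:redshift://{secret['host']}:{secret['port']}/{secret['dbname']}"
redshift_user = secret["username"]
redshift_password = secret["password"]
```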

Question sMTE6fsGvzkykm9R4oKT

Question

A data engineer uses Amazon Redshift to run resource-intensive analytics processes once every month. Every month, the data engineer creates a new Redshift provisioned cluster. The data engineer deletes the Redshift provisioned cluster after the analytics processes are complete every month. Before the data engineer deletes the cluster each month, the data engineer unloads backup data from the cluster to an Amazon S3 bucket. The data engineer needs a solution to run the monthly analytics processes that does not require the data engineer to manage the infrastructure manually. Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: Use AWS Step Functions to pause the Redshift cluster when the analytics processes are complete and to resume the cluster to run new processes every month.
  • B: Use Amazon Redshift Serverless to automatically process the analytics workload.
  • C: Use the AWS CLI to automatically process the analytics workload.
  • D: Use AWS CloudFormation templates to automatically process the analytics workload.
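
For context on the serverless approach: a minimal boto3 sketch of a one-time Amazon Redshift Serverless setup (names, credentials, and capacity are hypothetical). After this, the warehouse scales on demand and bills only for the compute used while the monthly queries run, so there is no cluster to create, back up, or delete each month.

```python
import boto3

serverless = boto3.client("redshift-serverless")

# One-time setup: a namespace holds the databases and users, and a workgroup
# provides the compute. All names and credentials here are hypothetical.
serverless.create_namespace(
    namespaceName="monthly-analytics",
    dbName="analytics",
    adminUsername="admin",
    adminUserPassword="Example-Passw0rd!",
)

serverless.create_workgroup(
    workgroupName="monthly-analytics-wg",
    namespaceName="monthly-analytics",
    baseCapacity=32,  # base Redshift Processing Units; scales automatically
)
```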

Question MlyJ2jo9DsR0Arvs0ms1

Question

A company receives a daily file that contains customer data in .xls format. The company stores the file in Amazon S3. The daily file is approximately 2 GB in size. A data engineer concatenates the column in the file that contains customer first names and the column that contains customer last names. The data engineer needs to determine the number of distinct customers in the file. Which solution will meet this requirement with the LEAST operational effort?

Choices

  • A: Create and run an Apache Spark job in an AWS Glue notebook. Configure the job to read the S3 file and calculate the number of distinct customers.
  • B: Create an AWS Glue crawler to create an AWS Glue Data Catalog of the S3 file. Run SQL queries from Amazon Athena to calculate the number of distinct customers.
  • C: Create and run an Apache Spark job in Amazon EMR Serverless to calculate the number of distinct customers.
  • D: Use AWS Glue DataBrew to create a recipe that uses the COUNT_DISTINCT aggregate function to calculate the number of distinct customers.
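
Independent of which option is chosen, the concatenate-then-count-distinct step itself looks roughly like the PySpark sketch below. The path and column names are hypothetical, and it assumes the daily .xls file has first been converted to a Spark-readable format such as CSV, since Spark does not read .xls natively.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("distinct-customers").getOrCreate()

# Hypothetical S3 path and column names; assumes the daily .xls file has been
# converted to CSV (or another Spark-readable format) beforehand.
customers = spark.read.option("header", "true").csv(
    "s3://example-bucket/daily/customers.csv"
)

distinct_customers = (
    customers
    .withColumn("full_name", F.concat_ws(" ", F.col("first_name"), F.col("last_name")))
    .select("full_name")
    .distinct()
    .count()
)

print(f"Distinct customers: {distinct_customers}")
```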

Question tyDYP36owfIY0DtoHaYv

Question

A healthcare company uses Amazon Kinesis Data Streams to stream real-time health data from wearable devices, hospital equipment, and patient records. A data engineer needs to find a solution to process the streaming data. The data engineer needs to store the data in an Amazon Redshift Serverless warehouse. The solution must support near real-time analytics of the streaming data and the previous day’s data. Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: Load data into Amazon Kinesis Data Firehose. Then load the data into Amazon Redshift.
  • B: Use the streaming ingestion feature of Amazon Redshift.
  • C: Load the data into Amazon S3. Use the COPY command to load the data into Amazon Redshift.
  • D: Use the Amazon Aurora zero-ETL integration with Amazon Redshift.
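
A minimal sketch of Redshift streaming ingestion from Kinesis Data Streams, issued here through the Redshift Data API against a Redshift Serverless workgroup (the workgroup, database, stream, and IAM role names are hypothetical). The materialized view reads directly from the stream and can be auto-refreshed to support near real-time queries.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Hypothetical IAM role ARN and stream name. The external schema maps the
# Kinesis stream into Redshift; the materialized view ingests its records.
create_schema_sql = """
CREATE EXTERNAL SCHEMA kinesis_schema
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole';
"""

create_view_sql = """
CREATE MATERIALIZED VIEW health_events AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis_schema."wearable-health-stream";
"""

for sql in (create_schema_sql, create_view_sql):
    redshift_data.execute_statement(
        WorkgroupName="health-analytics",  # hypothetical Redshift Serverless workgroup
        Database="dev",
        Sql=sql,
    )
```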