Questions and Answers

Question MKrVF9K3aA6KlbZET9Dx

Question

A media company wants to improve a system that recommends media content to customers based on user behavior and preferences. To improve the recommendation system, the company needs to incorporate insights from third-party datasets into the company's existing analytics platform. The company wants to minimize the effort and time required to incorporate third-party datasets. Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: Use API calls to access and integrate third-party datasets from AWS Data Exchange.
  • B: Use API calls to access and integrate third-party datasets from AWS DataSync.
  • C: Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories.
  • D: Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR).
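Choice A's approach can be sketched in a few lines. This is a minimal, illustrative sketch of building the request body for the AWS Data Exchange `CreateJob` API (job type `EXPORT_REVISIONS_TO_S3`); the dataset, revision, and bucket identifiers are placeholder assumptions, not values from the question.

```python
# Sketch of choice A: export a subscribed AWS Data Exchange revision to S3.
# All identifiers below are illustrative placeholders.

def build_export_job(data_set_id, revision_id, bucket, key_pattern="${Asset.Name}"):
    """Build the request body for dataexchange:CreateJob (EXPORT_REVISIONS_TO_S3)."""
    return {
        "Type": "EXPORT_REVISIONS_TO_S3",
        "Details": {
            "ExportRevisionsToS3": {
                "DataSetId": data_set_id,
                "RevisionDestinations": [
                    {
                        "RevisionId": revision_id,
                        "Bucket": bucket,
                        "KeyPattern": key_pattern,
                    }
                ],
            }
        },
    }

# With boto3 (not executed here):
#   dx = boto3.client("dataexchange")
#   job = dx.create_job(**build_export_job("ds-...", "rev-...", "my-analytics-bucket"))
#   dx.start_job(JobId=job["Id"])
```

Because the datasets are already delivered through Data Exchange, a single export job lands them in S3 alongside the existing analytics platform, with no pipeline to build or maintain.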

Question anmqEXJWq0fWcXuASMXo

Question

A company is migrating a legacy application to an Amazon S3-based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information. The data engineer must identify and remove duplicate information from the legacy application data. Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: Write a custom extract, transform, and load (ETL) job in Python. Use the DataFrame.drop_duplicates() function by importing the Pandas library to perform data deduplication.
  • B: Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to transform the data to perform data deduplication.
  • C: Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
  • D: Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
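For context on choice A, here is a minimal sketch of exact-match deduplication with the pandas `DataFrame.drop_duplicates()` function (the sample records are invented; this assumes pandas is installed). Note that Glue's FindMatches transform, by contrast, can also catch fuzzy duplicates that do not match exactly.

```python
import pandas as pd

# Sample legacy records containing one exact duplicate row (illustrative data).
df = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 3],
        "email": ["a@x.com", "b@x.com", "b@x.com", "c@x.com"],
    }
)

# drop_duplicates removes rows whose values match exactly; keep="first"
# retains the first occurrence. Pass subset=[...] to match on chosen columns.
deduped = df.drop_duplicates(keep="first").reset_index(drop=True)
print(len(deduped))  # 3
```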

Question UUGBzJe20I95UfeKQPcG

Question

A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3. Which actions will provide the FASTEST queries? (Choose two.)

Choices

  • A: Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.
  • B: Use a columnar storage file format.
  • C: Partition the data based on the most common query predicates.
  • D: Split the data into files that are less than 10 KB.
  • E: Use file formats that are not splittable.
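Choices B and C both speed up Redshift Spectrum by letting it scan less data. A columnar format such as Parquet lets Spectrum read only the referenced columns, and Hive-style partitioned key prefixes let it skip S3 prefixes excluded by a query predicate. The sketch below shows the partitioned key layout only; the bucket, prefix, and field names are invented, and the columnar file format would be chosen when writing the files.

```python
from collections import defaultdict

# Hypothetical click records; "event_date" is the most common query predicate.
records = [
    {"event_date": "2024-05-01", "user": "u1"},
    {"event_date": "2024-05-01", "user": "u2"},
    {"event_date": "2024-05-02", "user": "u3"},
]

# Hive-style partitioned layout: one S3 prefix per predicate value, so a query
# such as WHERE event_date = '2024-05-02' scans only the matching prefix.
partitions = defaultdict(list)
for rec in records:
    prefix = f"s3://analytics-lake/clicks/event_date={rec['event_date']}/"
    partitions[prefix].append(rec)

for prefix in sorted(partitions):
    print(prefix, len(partitions[prefix]))
```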

Question I07dgTyOtcTFYZgTuwet

Question

A company uses Amazon RDS to store transactional data. The company runs an RDS DB instance in a private subnet. A developer wrote an AWS Lambda function with default settings to insert, update, or delete data in the DB instance. The developer needs to give the Lambda function the ability to connect to the DB instance privately without using the public internet. Which combination of steps will meet this requirement with the LEAST operational overhead? (Choose two.)

Choices

  • A: Turn on the public access setting for the DB instance.
  • B: Update the security group of the DB instance to allow only Lambda function invocations on the database port.
  • C: Configure the Lambda function to run in the same subnet that the DB instance uses.
  • D: Attach the same security group to the Lambda function and the DB instance. Include a self-referencing rule that allows access through the database port.
  • E: Update the network ACL of the private subnet to include a self-referencing rule that allows access through the database port.
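Choices C and D describe attaching the Lambda function to the DB instance's subnets and sharing a security group that references itself on the database port. A minimal sketch of the two request payloads involved, with placeholder subnet and security group IDs:

```python
# Sketch of choices C and D. Identifiers below are illustrative placeholders.

def lambda_vpc_config(subnet_ids, shared_sg_id):
    """VpcConfig payload for lambda:UpdateFunctionConfiguration (choice C:
    run the function in the same subnets as the DB instance)."""
    return {"SubnetIds": subnet_ids, "SecurityGroupIds": [shared_sg_id]}

def self_referencing_rule(shared_sg_id, db_port=3306):
    """IpPermissions entry for ec2:AuthorizeSecurityGroupIngress (choice D:
    members of the shared group may reach each other on the database port)."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": db_port,
            "ToPort": db_port,
            "UserIdGroupPairs": [{"GroupId": shared_sg_id}],
        }
    ]

print(lambda_vpc_config(["subnet-0abc123"], "sg-0def456"))
```

Traffic between the function and the database then stays inside the VPC, so no public access (choice A) is needed.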

Question BF47CG7McNndhYmimi0r

Question

A company has a frontend ReactJS website that uses Amazon API Gateway to invoke REST APIs. The APIs perform the functionality of the website. A data engineer needs to write a Python script that can be occasionally invoked through API Gateway. The code must return results to API Gateway. Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: Deploy a custom Python script on an Amazon Elastic Container Service (Amazon ECS) cluster.
  • B: Create an AWS Lambda Python function with provisioned concurrency.
  • C: Deploy a custom Python script that can integrate with API Gateway on Amazon Elastic Kubernetes Service (Amazon EKS).
  • D: Create an AWS Lambda function. Ensure that the function is warm by scheduling an Amazon EventBridge rule to invoke the Lambda function every 5 minutes by using mock events.
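For the Lambda-based choices, a handler invoked through an API Gateway proxy integration returns results as a status code, headers, and a JSON string body. A minimal sketch (the query parameter name and message are illustrative assumptions):

```python
import json

def lambda_handler(event, context):
    """Minimal handler for an API Gateway Lambda proxy integration.
    Reads an optional "name" query string parameter (an illustrative field)
    and returns a JSON response in the shape API Gateway expects."""
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```

Because invocations are only occasional, a plain on-demand function suffices; provisioned concurrency (choice B) and scheduled warm-up invocations (choice D) add cost and configuration to keep the function warm.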