Questions and Answers

Question TXZkNyEdRRYN7kSsZ197

Question

A financial company wants to use Amazon Athena to run on-demand SQL queries on a petabyte-scale dataset to support a business intelligence (BI) application. An AWS Glue job that runs during non-business hours updates the dataset once every day. The BI application has a standard data refresh frequency of 1 hour to comply with company policies. A data engineer wants to cost-optimize the company’s use of Amazon Athena without adding infrastructure costs. Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: Configure an Amazon S3 Lifecycle policy to move data to the S3 Glacier Deep Archive storage class after 1 day.
  • B: Use the query result reuse feature of Amazon Athena for the SQL queries.
  • C: Add an Amazon ElastiCache cluster between the BI application and Athena.
  • D: Change the format of the files that are in the dataset to Apache Parquet.
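As background for choice B: Athena's query result reuse is enabled per query through the `ResultReuseConfiguration` parameter of `StartQueryExecution`. A minimal sketch follows, assuming boto3; the workgroup, bucket, and query are hypothetical, and the actual API call is shown commented out.

```python
# Sketch: enabling Athena query result reuse (choice B).
# Cached results are returned for repeat queries up to MaxAgeInMinutes old,
# which can be aligned with the BI app's 1-hour refresh policy.

query_kwargs = {
    "QueryString": "SELECT region, SUM(amount) FROM trades GROUP BY region",
    "WorkGroup": "bi-workgroup",  # hypothetical workgroup name
    "ResultConfiguration": {"OutputLocation": "s3://example-athena-results/"},
    "ResultReuseConfiguration": {
        "ResultReuseByAgeConfiguration": {
            "Enabled": True,
            "MaxAgeInMinutes": 60,  # match the 1-hour refresh frequency
        }
    },
}

# In a real environment:
# import boto3
# athena = boto3.client("athena")
# response = athena.start_query_execution(**query_kwargs)
```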

Question Oor5ZbSRCetBdy92wq4f

Question

A company’s data engineer needs to optimize the performance of SQL queries that run against the company’s tables. The company stores data in an Amazon Redshift cluster. The data engineer cannot increase the size of the cluster because of budget constraints. The company stores the data in multiple tables and loads the data by using the EVEN distribution style. Some tables are hundreds of gigabytes in size. Other tables are less than 10 MB in size. Which solution will meet these requirements?

Choices

  • A: Keep using the EVEN distribution style for all tables. Specify primary and foreign keys for all tables.
  • B: Use the ALL distribution style for large tables. Specify primary and foreign keys for all tables.
  • C: Use the ALL distribution style for rarely updated small tables. Specify primary and foreign keys for all tables.
  • D: Specify a combination of distribution, sort, and partition keys for all tables.
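For context on the distribution styles in these choices: `DISTSTYLE ALL` copies a table to every compute node, which suits small, rarely updated tables but would be costly for large ones. The sketch below builds hypothetical DDL strings illustrating the contrast; table and column names are invented.

```python
# Sketch: Redshift DDL contrasting distribution styles.
# A small, rarely updated dimension table is replicated to every node
# (DISTSTYLE ALL), so joins against it avoid data redistribution.

small_dim_ddl = """
CREATE TABLE currency_codes (
    code CHAR(3) PRIMARY KEY,    -- keys inform the query planner
    description VARCHAR(64)
)
DISTSTYLE ALL;
""".strip()

# A large fact table keeps EVEN distribution to spread rows across slices.
large_fact_ddl = """
CREATE TABLE trades (
    trade_id BIGINT,
    code CHAR(3) REFERENCES currency_codes(code),
    amount DECIMAL(18,2)
)
DISTSTYLE EVEN;
""".strip()

print(small_dim_ddl)
print(large_fact_ddl)
```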

Question XhN4qorEZugGY1k6u3rS

Question

A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format: [image showing the target single-column format]

Which solution will meet this requirement with the LEAST coding effort?

Choices

  • A: Use AWS Glue DataBrew to read the files. Use the NEST_TO_ARRAY transformation to create the new column.
  • B: Use AWS Glue DataBrew to read the files. Use the NEST_TO_MAP transformation to create the new column.
  • C: Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.
  • D: Write a Lambda function in Python to read the files. Use the Python data dictionary type to create the new column.
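To illustrate what a map-style nesting (as in choice B) produces: the source columns become key–value pairs in one new column. A plain-Python approximation follows; the sample row values and the `Address` column name are hypothetical.

```python
# Sketch: combining the address columns into a single map-like column,
# approximating the result of a nest-to-map style transformation.
import json

row = {
    "Door_No": "42",
    "Street_Name": "Main St",
    "City": "Springfield",
    "Zip_Code": "12345",
}
address_columns = ["Door_No", "Street_Name", "City", "Zip_Code"]

# One nested column holding column-name -> value pairs
row["Address"] = json.dumps({col: row[col] for col in address_columns})
print(row["Address"])
```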

Question cZ42nzaegEGf7uUn6DBW

Question

A company receives call logs as Amazon S3 objects that contain sensitive customer information. The company must protect the S3 objects by using encryption. The company must also use encryption keys that only specific employees can access. Which solution will meet these requirements with the LEAST effort?

Choices

  • A: Use an AWS CloudHSM cluster to store the encryption keys. Configure the process that writes to Amazon S3 to make calls to CloudHSM to encrypt and decrypt the objects. Deploy an IAM policy that restricts access to the CloudHSM cluster.
  • B: Use server-side encryption with customer-provided keys (SSE-C) to encrypt the objects that contain customer information. Restrict access to the keys that encrypt the objects.
  • C: Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.
  • D: Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the Amazon S3 managed keys that encrypt the objects.
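As background for choice C: with SSE-KMS, the writer names a KMS key at upload time, and access to that key is then restricted through IAM or the key policy. A minimal sketch, assuming boto3; the bucket, object key, and KMS key alias are hypothetical, and the actual call is shown commented out.

```python
# Sketch: uploading a call-log object with SSE-KMS (choice C).
# Decryption then requires kms:Decrypt on the named key, so an IAM
# policy on that key limits which employees can read the objects.

put_kwargs = {
    "Bucket": "example-call-logs",           # hypothetical bucket
    "Key": "logs/2024-01-01/call-0001.json", # hypothetical object key
    "Body": b"{}",
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "alias/call-logs-key",    # hypothetical key alias
}

# In a real environment:
# import boto3
# s3 = boto3.client("s3")
# s3.put_object(**put_kwargs)
```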

Question qTQ0caTJet4BbUPTXc1b

Question

A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application. Which solution will meet these requirements with the LEAST operational overhead?

Choices

  • A: Establish WebSocket connections to Amazon Redshift.
  • B: Use the Amazon Redshift Data API.
  • C: Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.
  • D: Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.
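For context on choice B: the Redshift Data API lets an application submit SQL over HTTPS without holding persistent JDBC or WebSocket connections. A minimal sketch, assuming boto3; the cluster identifier, database, secret ARN, and query are hypothetical, and the actual calls are shown commented out.

```python
# Sketch: running a query through the Redshift Data API (choice B).
# Credentials come from an AWS Secrets Manager secret, so the web
# application never manages database connections directly.

statement_kwargs = {
    "ClusterIdentifier": "trading-cluster",  # hypothetical cluster
    "Database": "finance",
    "SecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds",
    "Sql": "SELECT symbol, price FROM quotes WHERE symbol = :symbol",
    "Parameters": [{"name": "symbol", "value": "AMZN"}],
}

# In a real environment:
# import boto3
# client = boto3.client("redshift-data")
# response = client.execute_statement(**statement_kwargs)
# result = client.get_statement_result(Id=response["Id"])
```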