Questions and Answers
Question TXZkNyEdRRYN7kSsZ197
Question
A financial company wants to use Amazon Athena to run on-demand SQL queries on a petabyte-scale dataset to support a business intelligence (BI) application. An AWS Glue job that runs during non-business hours updates the dataset once every day. The BI application has a standard data refresh frequency of 1 hour to comply with company policies. A data engineer wants to cost optimize the company’s use of Amazon Athena without adding any additional infrastructure costs. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Configure an Amazon S3 Lifecycle policy to move data to the S3 Glacier Deep Archive storage class after 1 day.
- B: Use the query result reuse feature of Amazon Athena for the SQL queries.
- C: Add an Amazon ElastiCache cluster between the BI application and Athena.
- D: Change the format of the files that are in the dataset to Apache Parquet.
Answer: B (Answer_ET: B). Community answer: B (91%), other (9%).
Discussion
Comment 1138517 by rralucard_
- Upvotes: 6
Selected Answer: B https://docs.aws.amazon.com/athena/latest/ug/performance-tuning.html Use the Query Result Reuse Feature of Amazon Athena. This leverages Athena’s built-in feature to reduce redundant data scans and thus lowers query costs.
Comment 1361480 by Ell89
- Upvotes: 1
Selected Answer: D Query result reuse will only benefit the same queries that are re-run; it won't benefit new queries. Parquet will benefit all queries.
Comment 1301534 by rsmf
- Upvotes: 1
Selected Answer: B Why not D? The question specifies the option with the least overhead, and it clearly states that the Glue job runs once a day. Since the data for that day will not change, there’s no need for additional overhead.
Comment 1259744 by MinTheRanger
- Upvotes: 1
D. Because the query result reuse feature only helps when the identical query is re-run, but here the hourly refresh might issue new queries on data related to that hour.
Comment 1259418 by MinTheRanger
- Upvotes: 3
Why not D?
Comment 1196840 by Ousseyni
- Upvotes: 2
Selected Answer: B B. Use the query result reuse feature of Amazon Athena for the SQL queries.
Comment 1184119 by FuriouZ
- Upvotes: 1
Selected Answer: B It’s B: Glacier adds more retrieval time and the other options cost some money
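For reference, a minimal boto3 sketch of how answer B is enabled per query; the workgroup, database, table, and S3 output location below are placeholders, and the 60-minute cache age is chosen to line up with the BI application's 1-hour refresh policy.

```python
# Minimal sketch: enable Athena query result reuse per query via boto3.
# The workgroup, database, table, and S3 output location are placeholders.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT trade_date, SUM(amount) FROM trades GROUP BY trade_date",
    QueryExecutionContext={"Database": "finance_db"},
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    # Reuse cached results for up to 60 minutes, matching the BI application's
    # 1-hour refresh policy, so repeated identical queries do not rescan data.
    ResultReuseConfiguration={
        "ResultReuseByAgeConfiguration": {"Enabled": True, "MaxAgeInMinutes": 60}
    },
)
print(response["QueryExecutionId"])
```

Because the dataset only changes once a day, reusing results for an hour costs nothing extra and requires no new infrastructure.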
Question Oor5ZbSRCetBdy92wq4f
Question
A company’s data engineer needs to optimize the performance of table SQL queries. The company stores data in an Amazon Redshift cluster. The data engineer cannot increase the size of the cluster because of budget constraints. The company stores the data in multiple tables and loads the data by using the EVEN distribution style. Some tables are hundreds of gigabytes in size. Other tables are less than 10 MB in size. Which solution will meet these requirements?
Choices
- A: Keep using the EVEN distribution style for all tables. Specify primary and foreign keys for all tables.
- B: Use the ALL distribution style for large tables. Specify primary and foreign keys for all tables.
- C: Use the ALL distribution style for rarely updated small tables. Specify primary and foreign keys for all tables.
- D: Specify a combination of distribution, sort, and partition keys for all tables.
Answer: C (Answer_ET: C). Community answer: C (100%).
Discussion
Comment 1138522 by rralucard_
- Upvotes: 8
Selected Answer: C Use the ALL Distribution Style for Rarely Updated Small Tables. This approach optimizes the performance of joins involving these smaller tables and is a common best practice in Redshift data warehousing. For the larger tables, maintaining the EVEN distribution style or considering a KEY-based distribution (if there are common join columns) could be more appropriate.
Comment 1310413 by jk15997
- Upvotes: 3
why not D?
Comment 1228701 by pypelyncar
- Upvotes: 3
Selected Answer: C For small tables (less than 10 MB in size) that are rarely updated, using the ALL distribution style can provide better query performance. With the ALL distribution style, each compute node stores a copy of the entire table, eliminating the need for data redistribution or shuffling during certain queries. This can significantly improve query performance, especially for joins and aggregations involving small tables.
Comment 1207322 by DevoteamAnalytix
- Upvotes: 2
Selected Answer: C “ALL distribution is appropriate only for relatively slow moving tables; that is, tables that are not updated frequently or extensively.” (https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html)
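To make answer C concrete, here is a minimal sketch of the DDL it implies, submitted through the Redshift Data API. Cluster, database, table, and column names are hypothetical: the small, rarely updated lookup table is replicated with DISTSTYLE ALL, while the large fact table keeps a KEY (or EVEN) distribution, with primary and foreign keys declared as planner hints.

```python
# Minimal sketch of the DDL behind answer C, submitted through the Redshift
# Data API. Cluster, database, table, and column names are hypothetical.
import boto3

rsd = boto3.client("redshift-data")

ddl_small_dim = """
CREATE TABLE currency_codes (
    currency_id  INTEGER,
    iso_code     CHAR(3),
    description  VARCHAR(100),
    PRIMARY KEY (currency_id)
)
DISTSTYLE ALL;  -- small, rarely updated table: full copy on every node, no join redistribution
"""

ddl_large_fact = """
CREATE TABLE trades (
    trade_id     BIGINT,
    currency_id  INTEGER REFERENCES currency_codes (currency_id),
    trade_ts     TIMESTAMP,
    amount       DECIMAL(18,2),
    PRIMARY KEY (trade_id)
)
DISTKEY (currency_id)  -- or keep EVEN if there is no dominant join column
SORTKEY (trade_ts);
"""

for sql in (ddl_small_dim, ddl_large_fact):
    rsd.execute_statement(
        ClusterIdentifier="example-cluster",  # placeholder
        Database="analytics",
        DbUser="admin",
        Sql=sql,
    )
```

In Redshift the primary and foreign key constraints are informational only, but the planner uses them as hints, which is why the answer still asks for them.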
Question XhN4qorEZugGY1k6u3rS
Question
A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format: //IMG//
Which solution will meet this requirement with the LEAST coding effort?
Choices
- A: Use AWS Glue DataBrew to read the files. Use the NEST_TO_ARRAY transformation to create the new column.
- B: Use AWS Glue DataBrew to read the files. Use the NEST_TO_MAP transformation to create the new column.
- C: Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.
- D: Write a Lambda function in Python to read the files. Use the Python data dictionary type to create the new column.
Answer: B (Answer_ET: B). Community answer: B (95%), other (5%).
Discussion
Comment 1181412 by FuriouZ
- Upvotes: 12
Selected Answer: B NEST_TO_ARRAY would result in: [ {"key": "key1", "value": "value1"}, {"key": "key2", "value": "value2"}, {"key": "key3", "value": "value3"} ]
while NEST_TO_MAP results in: { "key1": "value1", "key2": "value2", "key3": "value3" }. Therefore go with B.
Comment 1228703 by pypelyncar
- Upvotes: 3
Selected Answer: B The NEST_TO_MAP transformation is designed to convert selected columns (here, the address fields of each CSV row) into key-value pairs, which matches the requirement of creating a single new column that holds the address components as key-value pairs.
Comment 1196842 by Ousseyni
- Upvotes: 4
Selected Answer: B AWS Glue DataBrew is a visual data preparation tool that allows for easy transformation of data without requiring extensive coding. The NEST_TO_MAP transformation in DataBrew allows you to convert columns into a JSON map, which aligns with the desired JSON format for the address data.
Comment 1177241 by GiorgioGss
- Upvotes: 1
Selected Answer: A Come on guys. That's an array there, so…
Comment 1173310 by kj07
- Upvotes: 2
Option B: NEST_TO_MAP: Converts user-selected columns into key-value pairs, each with a key representing the column name and a value representing the row value. The order of the selected column is not maintained while creating the resultant map. The different column data types are typecast to a common type that supports the data types of all columns. https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.NEST_TO_MAP.html
PIVOT: Converts all the row values in a selected column into individual columns with values.
NEST_TO_ARRAY: Converts user-selected columns into array values. The order of the selected columns is maintained while creating the resultant array. The different column data types are typecast to a common type that supports the data types of all columns.
Comment 1168062 by damaldon
- Upvotes: 1
Ans. A NEST_TO_ARRAY Converts user-selected columns into array values. The order of the selected columns is maintained while creating the resultant array. https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.NEST_TO_ARRAY.html
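A minimal boto3 sketch of answer B as a one-step DataBrew recipe. The recipe name is made up, and the NEST_TO_MAP parameter names (sourceColumns, targetColumn, removeSourceColumns) are assumptions based on the recipe-actions reference linked above, so verify the exact spelling there before using this.

```python
# Minimal sketch: register a one-step DataBrew recipe that nests the four
# address columns into a single map column (answer B). The recipe name is
# hypothetical, and the NEST_TO_MAP parameter names below are assumptions
# taken from the recipe-actions reference cited above.
import boto3

databrew = boto3.client("databrew")

databrew.create_recipe(
    Name="address-nest-to-map",  # hypothetical recipe name
    Steps=[
        {
            "Action": {
                "Operation": "NEST_TO_MAP",
                "Parameters": {
                    "sourceColumns": '["Door_No","Street_Name","City","Zip_Code"]',
                    "targetColumn": "Address",
                    "removeSourceColumns": "true",
                },
            }
        }
    ],
)
```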
Question cZ42nzaegEGf7uUn6DBW
Question
A company receives call logs as Amazon S3 objects that contain sensitive customer information. The company must protect the S3 objects by using encryption. The company must also use encryption keys that only specific employees can access. Which solution will meet these requirements with the LEAST effort?
Choices
- A: Use an AWS CloudHSM cluster to store the encryption keys. Configure the process that writes to Amazon S3 to make calls to CloudHSM to encrypt and decrypt the objects. Deploy an IAM policy that restricts access to the CloudHSM cluster.
- B: Use server-side encryption with customer-provided keys (SSE-C) to encrypt the objects that contain customer information. Restrict access to the keys that encrypt the objects.
- C: Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.
- D: Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the Amazon S3 managed keys that encrypt the objects.
Answer: C (Answer_ET: C). Community answer: C (100%).
Discussion
Comment 1196843 by Ousseyni
- Upvotes: 3
Selected Answer: C C. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.
Server-side encryption with AWS KMS (SSE-KMS) provides strong encryption for S3 objects while allowing fine-grained access control through AWS Key Management Service (KMS). With SSE-KMS, you can control access to encryption keys using IAM policies, ensuring that only specific employees can access them.
This solution requires minimal effort as it leverages AWS’s managed encryption service (SSE-KMS) and integrates seamlessly with S3. Additionally, IAM policies can be easily configured to restrict access to the KMS keys, providing granular control over who can access the encryption keys.
Comment 1194771 by Christina666
- Upvotes: 3
Selected Answer: C Encryption at Rest: SSE-KMS provides robust encryption of the sensitive call log data while it’s stored in S3. Key Management and Access Control: AWS KMS offers centralized key management. You can easily create and manage KMS keys (Customer Master Keys - CMKs) and use fine-grained IAM policies to restrict access to specific employees. Minimal Effort: SSE-KMS is a built-in S3 feature. Enabling it requires minimal configuration and no custom code for encryption/decryption.
Comment 1184201 by FuriouZ
- Upvotes: 1
Selected Answer: C KMS because you can restrict access and of course for pricing ;)
Comment 1177248 by GiorgioGss
- Upvotes: 4
Selected Answer: C Least effort = C
Comment 1138534 by rralucard_
- Upvotes: 4
Selected Answer: C Option D does not provide the ability to restrict access to the encryption keys
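A minimal sketch of answer C, where the bucket, key ARN, and group name are placeholders: objects are written with SSE-KMS under a specific customer managed key, and an IAM policy granted only to the approved employees allows use of that key.

```python
# Minimal sketch of answer C: write call-log objects with SSE-KMS under a
# specific customer managed key, then grant use of that key only to the
# approved employees. Bucket, key ARN, and group name are placeholders.
import json
import boto3

s3 = boto3.client("s3")
iam = boto3.client("iam")

KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"  # placeholder

# Server-side encryption with the specific KMS key.
s3.put_object(
    Bucket="example-call-logs",
    Key="2024/06/01/call-log.json",
    Body=b'{"caller": "..."}',
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId=KMS_KEY_ARN,
)

# Policy attached only to the approved employees' group; anyone without
# kms:Decrypt on this key cannot read the encrypted objects.
key_access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": KMS_KEY_ARN,
        }
    ],
}

iam.put_group_policy(
    GroupName="call-log-analysts",  # hypothetical group of approved employees
    PolicyName="allow-call-log-kms-key",
    PolicyDocument=json.dumps(key_access_policy),
)
```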
Question qTQ0caTJet4BbUPTXc1b
Question
A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Establish WebSocket connections to Amazon Redshift.
- B: Use the Amazon Redshift Data API.
- C: Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.
- D: Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.
Answer: B (Answer_ET: B). Community answer: B (100%).
Discussion
Comment 1558783 by ninomfr64
- Upvotes: 1
Selected Answer: B You can query a Redshift cluster with either JDBC/ODBC or the Data API. The latter only requires you to maintain the AWS SDK and an IAM role, while the former needs network plumbing to reach the VPC, a JDBC/ODBC driver, and database credentials possibly stored in Secrets Manager.
Comment 1409665 by Scotty_Nguyen
- Upvotes: 1
Selected Answer: B B is correct
Comment 1388081 by Palee
- Upvotes: 1
Selected Answer: B Most efficient solution
Comment 1315087 by bhawna901
- Upvotes: 2
Amazon Redshift Data API:
- Provides a serverless and simple HTTP-based API to interact with Redshift.
- Ideal for web-based applications since it eliminates the need to manage persistent database connections (like JDBC or ODBC).
- Allows the trading application to send queries directly to Redshift using HTTPS requests, making it easy to integrate with modern applications.
- Removes the complexity of managing database connection pooling in real time, which reduces operational overhead.
- Securely integrates with IAM roles and policies for authentication and access control.
Comment 1281789 by markill123
- Upvotes: 1
Selected Answer: B A) Redshift doesn't support WebSockets. C) Managing JDBC connections yourself is much harder than using the Redshift Data API, which lets you run SQL queries directly. D) Copying data to S3 and querying it with S3 Select adds extra moving parts and doesn't suit real-time queries against Redshift data.
Comment 1243838 by 04e06cb
- Upvotes: 1
Selected Answer: B B is correct
Comment 1209085 by k350Secops
- Upvotes: 2
Selected Answer: B Inside application with minimal effort then using API would be correct
Comment 1205975 by DevoteamAnalytix
- Upvotes: 3
Selected Answer: B “The Amazon Redshift Data API enables you to painlessly access data from Amazon Redshift with all types of traditional, cloud-native, and containerized, serverless web service-based applications and event-driven applications.” https://aws.amazon.com/de/blogs/big-data/using-the-amazon-redshift-data-api-to-interact-with-amazon-redshift-clusters/
Comment 1171034 by GiorgioGss
- Upvotes: 3
Selected Answer: B Even if you don't know anything about them, you will still choose B because it seems the “LEAST operational overhead” :)
Comment 1158039 by Alcee
- Upvotes: 1
B. DATA API
Comment 1137882 by TonyStark0122
- Upvotes: 4
B. Use the Amazon Redshift Data API.
Explanation: The Amazon Redshift Data API is a lightweight, HTTPS-based API that provides an alternative to using JDBC or ODBC drivers for running queries against Amazon Redshift. It allows you to execute SQL queries directly from within your application without the need for managing connections or drivers. This reduces operational overhead as there’s no need to manage and maintain WebSocket or JDBC connections.
Comment 1125633 by milofficial
- Upvotes: 4
Selected Answer: B Real time queries with S3 are obviously BS. B it is:
https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html
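Finally, a minimal sketch of answer B: the trading application submits SQL over HTTPS with the Redshift Data API and polls for the result, with no JDBC driver or persistent connection to manage. The cluster, database, secret ARN, and quotes table below are placeholders.

```python
# Minimal sketch of answer B: submit SQL to Redshift over HTTPS via the
# Data API and poll for the result. Cluster, database, secret ARN, and the
# quotes table are placeholders.
import time
import boto3

rsd = boto3.client("redshift-data")

resp = rsd.execute_statement(
    ClusterIdentifier="trading-cluster",  # placeholder
    Database="finance",
    SecretArn="arn:aws:secretsmanager:us-east-1:111122223333:secret:redshift-creds",  # placeholder
    Sql="SELECT symbol, last_price FROM quotes WHERE symbol = :symbol",
    Parameters=[{"name": "symbol", "value": "AMZN"}],
)

statement_id = resp["Id"]
while True:  # poll until the statement finishes
    status = rsd.describe_statement(Id=statement_id)["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(0.2)

if status == "FINISHED":
    print(rsd.get_statement_result(Id=statement_id)["Records"])
```

Authentication here goes through a Secrets Manager secret and IAM, which is exactly the "no drivers, no connection management" point the comments above make.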