Questions and Answers
Question TXZkNyEdRRYN7kSsZ197
Question
A financial company wants to use Amazon Athena to run on-demand SQL queries on a petabyte-scale dataset to support a business intelligence (BI) application. An AWS Glue job that runs during non-business hours updates the dataset once every day. The BI application has a standard data refresh frequency of 1 hour to comply with company policies. A data engineer wants to cost optimize the company’s use of Amazon Athena without adding any additional infrastructure costs. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Configure an Amazon S3 Lifecycle policy to move data to the S3 Glacier Deep Archive storage class after 1 day.
- B: Use the query result reuse feature of Amazon Athena for the SQL queries.
- C: Add an Amazon ElastiCache cluster between the BI application and Athena.
- D: Change the format of the files that are in the dataset to Apache Parquet.
Answer: B (Answer_ET: B). Community answer: B (91%), other (9%).
Discussion
Comment 1138517 by rralucard_
- Upvotes: 6
Selected Answer: B https://docs.aws.amazon.com/athena/latest/ug/performance-tuning.html Use the Query Result Reuse Feature of Amazon Athena. This leverages Athena’s built-in feature to reduce redundant data scans and thus lowers query costs.
Comment 1361480 by Ell89
- Upvotes: 1
Selected Answer: D Query result reuse will only benefit the same queries that are re-run; it won't benefit new queries. Parquet will benefit all queries.
Comment 1301534 by rsmf
- Upvotes: 1
Selected Answer: B Why not D? The question specifies the option with the least overhead, and it clearly states that the Glue job runs once a day. Since the data for that day will not change, there’s no need for additional overhead.
Comment 1259744 by MinTheRanger
- Upvotes: 1
D. Because the query result reuse feature only helps when the identical query is re-run, but here the hourly refresh might issue new queries on data related to that hour.
Comment 1259418 by MinTheRanger
- Upvotes: 3
Why not D?
Comment 1196840 by Ousseyni
- Upvotes: 2
Selected Answer: B B. Use the query result reuse feature of Amazon Athena for the SQL queries.
Comment 1184119 by FuriouZ
- Upvotes: 1
Selected Answer: B It’s B: Glacier adds more retrieval time and the other options cost some money
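For reference, a minimal boto3 sketch of how answer B is enabled per query; the workgroup, database, table, and S3 output location below are placeholders, and the 60-minute cache age is chosen to line up with the BI application's 1-hour refresh policy.

```python
# Minimal sketch: enable Athena query result reuse per query via boto3.
# The workgroup, database, table, and S3 output location are placeholders.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT trade_date, SUM(amount) FROM trades GROUP BY trade_date",
    QueryExecutionContext={"Database": "finance_db"},
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    # Reuse cached results for up to 60 minutes, matching the BI application's
    # 1-hour refresh policy, so repeated identical queries do not rescan data.
    ResultReuseConfiguration={
        "ResultReuseByAgeConfiguration": {"Enabled": True, "MaxAgeInMinutes": 60}
    },
)
print(response["QueryExecutionId"])
```

Because the dataset only changes once a day, reusing results for an hour costs nothing extra and requires no new infrastructure.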
Question Oor5ZbSRCetBdy92wq4f
Question
A company’s data engineer needs to optimize the performance of table SQL queries. The company stores data in an Amazon Redshift cluster. The data engineer cannot increase the size of the cluster because of budget constraints. The company stores the data in multiple tables and loads the data by using the EVEN distribution style. Some tables are hundreds of gigabytes in size. Other tables are less than 10 MB in size. Which solution will meet these requirements?
Choices
- A: Keep using the EVEN distribution style for all tables. Specify primary and foreign keys for all tables.
- B: Use the ALL distribution style for large tables. Specify primary and foreign keys for all tables.
- C: Use the ALL distribution style for rarely updated small tables. Specify primary and foreign keys for all tables.
- D: Specify a combination of distribution, sort, and partition keys for all tables.
Answer: C (Answer_ET: C). Community answer: C (100%).
Discussion
Comment 1138522 by rralucard_
- Upvotes: 8
Selected Answer: C Use the ALL Distribution Style for Rarely Updated Small Tables. This approach optimizes the performance of joins involving these smaller tables and is a common best practice in Redshift data warehousing. For the larger tables, maintaining the EVEN distribution style or considering a KEY-based distribution (if there are common join columns) could be more appropriate.
Comment 1310413 by jk15997
- Upvotes: 3
why not D?
Comment 1228701 by pypelyncar
- Upvotes: 3
Selected Answer: C For small tables (less than 10 MB in size) that are rarely updated, using the ALL distribution style can provide better query performance. With the ALL distribution style, each compute node stores a copy of the entire table, eliminating the need for data redistribution or shuffling during certain queries. This can significantly improve query performance, especially for joins and aggregations involving small tables.
Comment 1207322 by DevoteamAnalytix
- Upvotes: 2
Selected Answer: C “ALL distribution is appropriate only for relatively slow moving tables; that is, tables that are not updated frequently or extensively.” (https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html)
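To make answer C concrete, here is a minimal sketch of the DDL it implies, submitted through the Redshift Data API. Cluster, database, table, and column names are hypothetical: the small, rarely updated lookup table is replicated with DISTSTYLE ALL, while the large fact table keeps a KEY (or EVEN) distribution, with primary and foreign keys declared as planner hints.

```python
# Minimal sketch of the DDL behind answer C, submitted through the Redshift
# Data API. Cluster, database, table, and column names are hypothetical.
import boto3

rsd = boto3.client("redshift-data")

ddl_small_dim = """
CREATE TABLE currency_codes (
    currency_id  INTEGER,
    iso_code     CHAR(3),
    description  VARCHAR(100),
    PRIMARY KEY (currency_id)
)
DISTSTYLE ALL;  -- small, rarely updated table: full copy on every node, no join redistribution
"""

ddl_large_fact = """
CREATE TABLE trades (
    trade_id     BIGINT,
    currency_id  INTEGER REFERENCES currency_codes (currency_id),
    trade_ts     TIMESTAMP,
    amount       DECIMAL(18,2),
    PRIMARY KEY (trade_id)
)
DISTKEY (currency_id)  -- or keep EVEN if there is no dominant join column
SORTKEY (trade_ts);
"""

for sql in (ddl_small_dim, ddl_large_fact):
    rsd.execute_statement(
        ClusterIdentifier="example-cluster",  # placeholder
        Database="analytics",
        DbUser="admin",
        Sql=sql,
    )
```

In Redshift the primary and foreign key constraints are informational only, but the planner uses them as hints, which is why the answer still asks for them.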
Question XhN4qorEZugGY1k6u3rS
Question
A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format: //IMG//
Which solution will meet this requirement with the LEAST coding effort?
Choices
- A: Use AWS Glue DataBrew to read the files. Use the NEST_TO_ARRAY transformation to create the new column.
- B: Use AWS Glue DataBrew to read the files. Use the NEST_TO_MAP transformation to create the new column.
- C: Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.
- D: Write a Lambda function in Python to read the files. Use the Python data dictionary type to create the new column.
Answer: B (Answer_ET: B). Community answer: B (95%), other (5%).
Discussion
Comment 1181412 by FuriouZ
- Upvotes: 12
Selected Answer: B NEST_TO_ARRAY would result in: [ {"key": "key1", "value": "value1"}, {"key": "key2", "value": "value2"}, {"key": "key3", "value": "value3"} ]
while NEST_TO_MAP results in: { "key1": "value1", "key2": "value2", "key3": "value3" }. Therefore go with B.
Comment 1228703 by pypelyncar
- Upvotes: 3
Selected Answer: B The NEST_TO_MAP transformation is designed to convert selected columns (here, the address fields of each CSV row) into key-value pairs, which matches the requirement of creating a single new column that holds the address components as key-value pairs.
Comment 1196842 by Ousseyni
- Upvotes: 4
Selected Answer: B AWS Glue DataBrew is a visual data preparation tool that allows for easy transformation of data without requiring extensive coding. The NEST_TO_MAP transformation in DataBrew allows you to convert columns into a JSON map, which aligns with the desired JSON format for the address data.
Comment 1177241 by GiorgioGss
- Upvotes: 1
Selected Answer: A Come on guys. That's an array there, so…
Comment 1173310 by kj07
- Upvotes: 2
Option B: NEST_TO_MAP: Converts user-selected columns into key-value pairs, each with a key representing the column name and a value representing the row value. The order of the selected column is not maintained while creating the resultant map. The different column data types are typecast to a common type that supports the data types of all columns. https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.NEST_TO_MAP.html
PIVOT: Converts all the row values in a selected column into individual columns with values.
NEST_TO_ARRAY: Converts user-selected columns into array values. The order of the selected columns is maintained while creating the resultant array. The different column data types are typecast to a common type that supports the data types of all columns.
Comment 1168062 by damaldon
- Upvotes: 1
Ans. A NEST_TO_ARRAY Converts user-selected columns into array values. The order of the selected columns is maintained while creating the resultant array. https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.NEST_TO_ARRAY.html
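A minimal boto3 sketch of answer B as a one-step DataBrew recipe. The recipe name is made up, and the NEST_TO_MAP parameter names (sourceColumns, targetColumn, removeSourceColumns) are assumptions based on the recipe-actions reference linked above, so verify the exact spelling there before using this.

```python
# Minimal sketch: register a one-step DataBrew recipe that nests the four
# address columns into a single map column (answer B). The recipe name is
# hypothetical, and the NEST_TO_MAP parameter names below are assumptions
# taken from the recipe-actions reference cited above.
import boto3

databrew = boto3.client("databrew")

databrew.create_recipe(
    Name="address-nest-to-map",  # hypothetical recipe name
    Steps=[
        {
            "Action": {
                "Operation": "NEST_TO_MAP",
                "Parameters": {
                    "sourceColumns": '["Door_No","Street_Name","City","Zip_Code"]',
                    "targetColumn": "Address",
                    "removeSourceColumns": "true",
                },
            }
        }
    ],
)
```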
Question cZ42nzaegEGf7uUn6DBW
Question
A company receives call logs as Amazon S3 objects that contain sensitive customer information. The company must protect the S3 objects by using encryption. The company must also use encryption keys that only specific employees can access. Which solution will meet these requirements with the LEAST effort?
Choices
- A: Use an AWS CloudHSM cluster to store the encryption keys. Configure the process that writes to Amazon S3 to make calls to CloudHSM to encrypt and decrypt the objects. Deploy an IAM policy that restricts access to the CloudHSM cluster.
- B: Use server-side encryption with customer-provided keys (SSE-C) to encrypt the objects that contain customer information. Restrict access to the keys that encrypt the objects.
- C: Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.
- D: Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the Amazon S3 managed keys that encrypt the objects.
Answer: C (Answer_ET: C). Community answer: C (100%).
Discussion
Comment 1196843 by Ousseyni
- Upvotes: 3
Selected Answer: C C. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.
Server-side encryption with AWS KMS (SSE-KMS) provides strong encryption for S3 objects while allowing fine-grained access control through AWS Key Management Service (KMS). With SSE-KMS, you can control access to encryption keys using IAM policies, ensuring that only specific employees can access them.
This solution requires minimal effort as it leverages AWS’s managed encryption service (SSE-KMS) and integrates seamlessly with S3. Additionally, IAM policies can be easily configured to restrict access to the KMS keys, providing granular control over who can access the encryption keys.
Comment 1194771 by Christina666
- Upvotes: 3
Selected Answer: C Encryption at Rest: SSE-KMS provides robust encryption of the sensitive call log data while it’s stored in S3. Key Management and Access Control: AWS KMS offers centralized key management. You can easily create and manage KMS keys (Customer Master Keys - CMKs) and use fine-grained IAM policies to restrict access to specific employees. Minimal Effort: SSE-KMS is a built-in S3 feature. Enabling it requires minimal configuration and no custom code for encryption/decryption.
Comment 1184201 by FuriouZ
- Upvotes: 1
Selected Answer: C KMS because you can restrict access and of course for pricing ;)
Comment 1177248 by GiorgioGss
- Upvotes: 4
Selected Answer: C Least effort = C
Comment 1138534 by rralucard_
- Upvotes: 4
Selected Answer: C Option D does not provide the ability to restrict access to the encryption keys
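A minimal sketch of answer C, where the bucket, key ARN, and group name are placeholders: objects are written with SSE-KMS under a specific customer managed key, and an IAM policy granted only to the approved employees allows use of that key.

```python
# Minimal sketch of answer C: write call-log objects with SSE-KMS under a
# specific customer managed key, then grant use of that key only to the
# approved employees. Bucket, key ARN, and group name are placeholders.
import json
import boto3

s3 = boto3.client("s3")
iam = boto3.client("iam")

KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"  # placeholder

# Server-side encryption with the specific KMS key.
s3.put_object(
    Bucket="example-call-logs",
    Key="2024/06/01/call-log.json",
    Body=b'{"caller": "..."}',
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId=KMS_KEY_ARN,
)

# Policy attached only to the approved employees' group; anyone without
# kms:Decrypt on this key cannot read the encrypted objects.
key_access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": KMS_KEY_ARN,
        }
    ],
}

iam.put_group_policy(
    GroupName="call-log-analysts",  # hypothetical group of approved employees
    PolicyName="allow-call-log-kms-key",
    PolicyDocument=json.dumps(key_access_policy),
)
```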
Question qTQ0caTJet4BbUPTXc1b
Question
A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Establish WebSocket connections to Amazon Redshift.
- B: Use the Amazon Redshift Data API.
- C: Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.
- D: Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.
Answer: B (Answer_ET: B). Community answer: B (100%).
Discussion
Comment 1558783 by ninomfr64
- Upvotes: 1
Selected Answer: B You can query a Redshift cluster with either JDBC/ODBC or the Data API. The latter only requires you to maintain the AWS SDK and an IAM role, while the former needs network plumbing to reach the VPC, a JDBC/ODBC driver, and database credentials possibly stored in Secrets Manager.
Comment 1409665 by Scotty_Nguyen
- Upvotes: 1
Selected Answer: B B is correct
Comment 1388081 by Palee
- Upvotes: 1
Selected Answer: B Most efficient solution
Comment 1315087 by bhawna901
- Upvotes: 2
Amazon Redshift Data API:
- Provides a serverless and simple HTTP-based API to interact with Redshift.
- Ideal for web-based applications since it eliminates the need to manage persistent database connections (like JDBC or ODBC).
- Allows the trading application to send queries directly to Redshift using HTTPS requests, making it easy to integrate with modern applications.
- Removes the complexity of managing database connection pooling in real time, which reduces operational overhead.
- Securely integrates with IAM roles and policies for authentication and access control.
Comment 1281789 by markill123
- Upvotes: 1
Selected Answer: B A) Redshift doesn't support WebSockets. C) Managing JDBC connections yourself is much harder than using the Redshift Data API, which lets you run SQL queries directly. D) Copying data to S3 and querying it with S3 Select adds extra moving parts and doesn't suit real-time queries against Redshift data.
Comment 1243838 by 04e06cb
- Upvotes: 1
Selected Answer: B B is correct
Comment 1209085 by k350Secops
- Upvotes: 2
Selected Answer: B Inside application with minimal effort then using API would be correct
Comment 1205975 by DevoteamAnalytix
- Upvotes: 3
Selected Answer: B “The Amazon Redshift Data API enables you to painlessly access data from Amazon Redshift with all types of traditional, cloud-native, and containerized, serverless web service-based applications and event-driven applications.” https://aws.amazon.com/de/blogs/big-data/using-the-amazon-redshift-data-api-to-interact-with-amazon-redshift-clusters/
Comment 1171034 by GiorgioGss
- Upvotes: 3
Selected Answer: B Even if you don't know anything about them, you will still choose B because it seems the “LEAST operational overhead” :)
Comment 1158039 by Alcee
- Upvotes: 1
B. DATA API
Comment 1137882 by TonyStark0122
- Upvotes: 4
B. Use the Amazon Redshift Data API.
Explanation: The Amazon Redshift Data API is a lightweight, HTTPS-based API that provides an alternative to using JDBC or ODBC drivers for running queries against Amazon Redshift. It allows you to execute SQL queries directly from within your application without the need for managing connections or drivers. This reduces operational overhead as there’s no need to manage and maintain WebSocket or JDBC connections.
Comment 1125633 by milofficial
- Upvotes: 4
Selected Answer: B Real time queries with S3 are obviously BS. B it is:
https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html
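Finally, a minimal sketch of answer B: the trading application submits SQL over HTTPS with the Redshift Data API and polls for the result, with no JDBC driver or persistent connection to manage. The cluster, database, secret ARN, and quotes table below are placeholders.

```python
# Minimal sketch of answer B: submit SQL to Redshift over HTTPS via the
# Data API and poll for the result. Cluster, database, secret ARN, and the
# quotes table are placeholders.
import time
import boto3

rsd = boto3.client("redshift-data")

resp = rsd.execute_statement(
    ClusterIdentifier="trading-cluster",  # placeholder
    Database="finance",
    SecretArn="arn:aws:secretsmanager:us-east-1:111122223333:secret:redshift-creds",  # placeholder
    Sql="SELECT symbol, last_price FROM quotes WHERE symbol = :symbol",
    Parameters=[{"name": "symbol", "value": "AMZN"}],
)

statement_id = resp["Id"]
while True:  # poll until the statement finishes
    status = rsd.describe_statement(Id=statement_id)["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(0.2)

if status == "FINISHED":
    print(rsd.get_statement_result(Id=statement_id)["Records"])
```

Authentication here goes through a Secrets Manager secret and IAM, which is exactly the "no drivers, no connection management" point the comments above make.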