Questions and Answers
Question 6U0NnOAX76T9TBuQwixa
Question
A company uses Amazon RDS for MySQL as the database for a critical application. The database workload is mostly writes, with a small number of reads. A data engineer notices that the CPU utilization of the DB instance is very high. The high CPU utilization is slowing down the application. The data engineer must reduce the CPU utilization of the DB instance. Which actions should the data engineer take to meet this requirement? (Choose two.)
Choices
- A: Use the Performance Insights feature of Amazon RDS to identify queries that have high CPU utilization. Optimize the problematic queries.
- B: Modify the database schema to include additional tables and indexes.
- C: Reboot the RDS DB instance once each week.
- D: Upgrade to a larger instance size.
- E: Implement caching to reduce the database query load.
Answer: AD | Answer_ET: AD | Community answer: AD (76%), BD (24%)
Discussion
Comment 1173246 by kj07
- Upvotes: 10
Here the issue is with writes, and caching will not solve them.
I will go with A and D.
Comment 1187784 by lucas_rfsb
- Upvotes: 9
Selected Answer: AD I will go for A and D, since the other options are more likely to address read performance issues.
Comment 1307900 by michele_scar
- Upvotes: 2
Selected Answer: AD With A you can understand why the CPU load is so high. B is already implied by the last phrase of A (optimizing the queries). That leaves only D as valid.
Comment 1207931 by sdas1
- Upvotes: 2
A and D With a workload that is mostly writes and a small number of reads, caching will not be as effective in reducing CPU utilization compared to read-heavy workloads. https://repost.aws/knowledge-center/rds-aurora-postgresql-high-cpu
Comment 1181455 by fceb2c1
- Upvotes: 5
Selected Answer: AD A and D.
For A it is mentioned here https://repost.aws/knowledge-center/rds-instance-high-cpu
Comment 1177208 by GiorgioGss
- Upvotes: 5
Selected Answer: BD Since the question states that “the database workload is mostly writes”, let’s eliminate the options that improve reads.
Comment 1168355 by damaldon
- Upvotes: 4
Ans. AE
A) Use Amazon RDS Performance Insights to identify the query that’s responsible for the database load. Check the SQL tab that corresponds to a particular timeframe.
E) If there’s a query that’s repeatedly running, use prepared statements to lower the pressure on your CPU. Repeated running of prepared statements caches the query plan. Because the plan is already in cache for further runs, the time for planning is much less.
https://repost.aws/knowledge-center/rds-aurora-postgresql-high-cpu
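To make option A (the step most commenters agree on) concrete, here is a minimal boto3 sketch that asks Performance Insights for the SQL statements contributing the most DB load over the last hour. The instance resource ID, time window, and result limit are placeholder assumptions; `get_resource_metrics` grouped by `db.sql` is the documented way to surface the heaviest queries.

```python
import boto3
from datetime import datetime, timedelta, timezone

# Hypothetical DbiResourceId of the RDS for MySQL instance
# (visible on the instance's Configuration tab or via describe_db_instances).
DB_RESOURCE_ID = "db-EXAMPLERESOURCEID"

pi = boto3.client("pi")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Average DB load over the last hour, grouped by SQL statement,
# so the queries driving CPU utilization surface first.
response = pi.get_resource_metrics(
    ServiceType="RDS",
    Identifier=DB_RESOURCE_ID,
    StartTime=start,
    EndTime=end,
    PeriodInSeconds=300,
    MetricQueries=[
        {"Metric": "db.load.avg", "GroupBy": {"Group": "db.sql", "Limit": 10}}
    ],
)

# Each entry in MetricList carries the SQL dimension and its load data points.
for metric in response["MetricList"]:
    statement = metric["Key"].get("Dimensions", {}).get("db.sql.statement", "<total load>")
    peak = max((dp["Value"] for dp in metric["DataPoints"] if "Value" in dp), default=0)
    print(f"peak load {peak:.2f} AAS | {statement[:80]}")
```

Option D is then just a scaling action, e.g. `aws rds modify-db-instance` with a larger `DBInstanceClass`; no code sketch is needed for that.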
Question 1Olof47WGx9kxR3pAo1y
Question
A company has used an Amazon Redshift table that is named Orders for 6 months. The company performs weekly updates and deletes on the table. The table has an interleaved sort key on a column that contains AWS Regions. The company wants to reclaim disk space so that the company will not run out of storage space. The company also wants to analyze the sort key column. Which Amazon Redshift command will meet these requirements?
Choices
- A: VACUUM FULL Orders
- B: VACUUM DELETE ONLY Orders
- C: VACUUM REINDEX Orders
- D: VACUUM SORT ONLY Orders
Answer: C | Answer_ET: C | Community answer: C (83%), A (17%)
Discussion
Comment 1177214 by GiorgioGss
- Upvotes: 12
Selected Answer: C
https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html
“A full vacuum doesn’t perform a reindex for interleaved tables. To reindex interleaved tables followed by a full vacuum, use the VACUUM REINDEX option.”
A: “A full vacuum doesn’t perform a reindex for interleaved tables.” (from the docs above)
B: “A DELETE ONLY vacuum operation doesn’t sort table data.” (from the docs above)
D: “without reclaiming space freed by deleted rows.” (from the docs above)
Comment 1292373 by Shatheesh
- Upvotes: 1
Selected Answer: A
Comment 1208413 by d8945a1
- Upvotes: 1
Selected Answer: C VACUUM REINDEX makes an additional pass to analyze the interleaved sort keys.
https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html#r_VACUUM_command-parameters
Comment 1194731 by Christina666
- Upvotes: 1
Selected Answer: C
Reclaiming space: After updates and deletes, Redshift tables can retain deleted data blocks, taking up space. The VACUUM REINDEX command reclaims the space taken up by the deleted rows and rebuilds the interleaved sort on the sort key column.
Analyzing the sort key: Since the sort key column contains AWS Regions, reindexing this column re-analyzes the key distribution and clusters data by Region. This clustering can improve performance for queries that filter or group by Region.
Comment 1190303 by arvehisa
- Upvotes: 1
Selected Answer: C
Requirements:
- reclaim the disk space
- analyze the sort key column
Document: https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html#vacuum-reindex
VACUUM FULL: A full vacuum doesn’t perform a reindex for interleaved tables. To reindex interleaved tables followed by a full vacuum, use the VACUUM REINDEX option.
VACUUM REINDEX: Analyzes the distribution of the values in interleaved sort key columns, then performs a full VACUUM operation.
Comment 1188812 by lucas_rfsb
- Upvotes: 3
Selected Answer: A FULL is the only one that reclaims space and sorts. FULL: Sorts the specified table (or all tables in the current database) and reclaims disk space occupied by rows that were marked for deletion by previous UPDATE and DELETE operations. VACUUM FULL is the default.
https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html
Comment 1174320 by damaldon
- Upvotes: 1
B is the answer
Comment 1173255 by kj07
- Upvotes: 1
Option C
Analyzes the distribution of the values in interleaved sort key columns, then performs a full VACUUM operation. If REINDEX is used, a table name is required.
VACUUM REINDEX takes significantly longer than VACUUM FULL because it makes an additional pass to analyze the interleaved sort keys. The sort and merge operation can take longer for interleaved tables because the interleaved sort might need to rearrange more rows than a compound sort.
If a VACUUM REINDEX operation terminates before it completes, the next VACUUM resumes the reindex operation before performing the full vacuum operation.
Comment 1164502 by CalvinL4
- Upvotes: 1
The answer should be B. The VACUUM DELETE ONLY command is used in Amazon Redshift to remove rows that have been marked for deletion due to updates and deletes in a table.
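As a concrete illustration of the accepted answer, the sketch below runs VACUUM REINDEX on the Orders table through the Redshift Data API, which avoids keeping a client connection open for a long-running vacuum. The cluster identifier, database name, and secret ARN are placeholder assumptions.

```python
import boto3

# Placeholder connection details for the provisioned Redshift cluster.
CLUSTER_ID = "example-cluster"
DATABASE = "dev"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-example"

client = boto3.client("redshift-data")

# VACUUM REINDEX re-analyzes the interleaved sort key distribution and then
# performs a full vacuum, reclaiming space left by the weekly updates/deletes.
response = client.execute_statement(
    ClusterIdentifier=CLUSTER_ID,
    Database=DATABASE,
    SecretArn=SECRET_ARN,
    Sql="VACUUM REINDEX orders;",
)

print("statement id:", response["Id"])

# The statement runs asynchronously; poll describe_statement for completion.
status = client.describe_statement(Id=response["Id"])["Status"]
print("current status:", status)
```

As kj07 quotes from the docs, VACUUM REINDEX takes significantly longer than VACUUM FULL, so scheduling it during a low-traffic window is the usual design choice.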
Question IRkHcAQpv5FkuXpJgPX4
Question
A manufacturing company wants to collect data from sensors. A data engineer needs to implement a solution that ingests sensor data in near real time. The solution must store the data to a persistent data store. The solution must store the data in nested JSON format. The company must have the ability to query from the data store with a latency of less than 10 milliseconds. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Use a self-hosted Apache Kafka cluster to capture the sensor data. Store the data in Amazon S3 for querying.
- B: Use AWS Lambda to process the sensor data. Store the data in Amazon S3 for querying.
- C: Use Amazon Kinesis Data Streams to capture the sensor data. Store the data in Amazon DynamoDB for querying.
- D: Use Amazon Simple Queue Service (Amazon SQS) to buffer incoming sensor data. Use AWS Glue to store the data in Amazon RDS for querying.
Answer: C | Answer_ET: C | Community answer: C (100%)
Discussion
Comment 1228685 by pypelyncar
- Upvotes: 2
Selected Answer: C Amazon Kinesis Data Streams is a fully managed service that allows for seamless integration of diverse data sources, including IoT sensors. By using Kinesis Data Streams as the ingestion mechanism, the company can avoid the overhead of setting up and managing an Apache Kafka cluster or other data ingestion pipelines.
Comment 1205246 by Snape
- Upvotes: 3
Selected Answer: C near real time = Kinesis Data Streams
Comment 1196832 by Ousseyni
- Upvotes: 1
Selected Answer: C Option C is the best solution to meet the requirements
Comment 1164063 by Felix_G
- Upvotes: 3
Option C is the best solution to meet the requirements with the least operational overhead:
- Use Amazon Kinesis Data Streams to ingest real-time sensor data.
- Store the nested JSON data in Amazon DynamoDB for low-latency queries.
The key advantages of Option C are:
- Kinesis Data Streams fully manages real-time data ingestion with auto-scaling and persistence.
- DynamoDB provides single-digit millisecond latency for queries.
- DynamoDB natively supports nested JSON data models.
- Fully managed services minimize operational overhead.
In contrast:
- Option A requires managing Kafka clusters.
- Option B uses Lambda, which can’t provide persistent storage.
- Option D requires integrating SQS, Glue, and RDS, leading to complexity.
Comment 1138497 by rralucard_
- Upvotes: 3
Selected Answer: C Option C, using Amazon Kinesis Data Streams to capture the sensor data and storing it in Amazon DynamoDB for querying, is the best solution to meet the requirements with the least operational overhead. This solution is well-optimized for real-time data ingestion, supports the desired data format, and provides the necessary query performance.
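To make option C tangible, here is a minimal sketch of both halves of the pipeline: a producer that pushes a nested JSON sensor reading into a Kinesis data stream, and the consumer-side write that stores the same nested document in DynamoDB for single-digit-millisecond lookups. The stream name, table name, and key schema are assumptions for illustration.

```python
import json
from decimal import Decimal

import boto3

STREAM_NAME = "sensor-readings"   # assumed Kinesis data stream
TABLE_NAME = "SensorReadings"     # assumed DynamoDB table (PK: sensor_id, SK: ts)

kinesis = boto3.client("kinesis")
table = boto3.resource("dynamodb").Table(TABLE_NAME)

# Nested JSON payload as produced by a sensor.
reading = {
    "sensor_id": "press-17",
    "ts": "2024-05-01T12:00:00Z",
    "measurements": {"temperature_c": 71.4, "vibration": {"x": 0.02, "y": 0.01}},
}

# Producer side: near-real-time ingestion into the stream.
kinesis.put_record(
    StreamName=STREAM_NAME,
    Data=json.dumps(reading).encode("utf-8"),
    PartitionKey=reading["sensor_id"],
)

# Consumer side (e.g. a Lambda on the stream): persist the nested document.
# DynamoDB stores the nested dict natively as a Map attribute; the boto3
# resource layer requires Decimal for numbers, so re-parse the floats.
item = json.loads(json.dumps(reading), parse_float=Decimal)
table.put_item(Item=item)

# Point lookup with single-digit-millisecond latency.
stored = table.get_item(Key={"sensor_id": "press-17", "ts": "2024-05-01T12:00:00Z"})["Item"]
print(stored["measurements"]["vibration"]["x"])
```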
Question uMbpK5mZDDSG9UZ4Eq66
Question
A company stores data in a data lake that is in Amazon S3. Some data that the company stores in the data lake contains personally identifiable information (PII). Multiple user groups need to access the raw data. The company must ensure that user groups can access only the PII that they require. Which solution will meet these requirements with the LEAST effort?
Choices
- A: Use Amazon Athena to query the data. Set up AWS Lake Formation and create data filters to establish levels of access for the company’s IAM roles. Assign each user to the IAM role that matches the user’s PII access requirements.
- B: Use Amazon QuickSight to access the data. Use column-level security features in QuickSight to limit the PII that users can retrieve from Amazon S3 by using Amazon Athena. Define QuickSight access levels based on the PII access requirements of the users.
- C: Build a custom query builder UI that will run Athena queries in the background to access the data. Create user groups in Amazon Cognito. Assign access levels to the user groups based on the PII access requirements of the users.
- D: Create IAM roles that have different levels of granular access. Assign the IAM roles to IAM user groups. Use an identity-based policy to assign access levels to user groups at the column level.
Answer: A | Answer_ET: A | Community answer: A (100%)
Discussion
Comment 1187788 by lucas_rfsb
- Upvotes: 3
Selected Answer: A By using Amazon Athena to query the data and setting up AWS Lake Formation with data filters, the company can ensure that user groups can access only the personally identifiable information (PII) that they require. The combination of Athena for querying and Lake Formation for access control provides a comprehensive solution for managing PII access requirements effectively and securely.
Comment 1165092 by Felix_G
- Upvotes: 2
Selected Answer: A The solution that will meet the requirements with the LEAST effort is:
A. Use Amazon Athena to query the data. Set up AWS Lake Formation and create data filters to establish levels of access for the company’s IAM roles. Assign each user to the IAM role that matches the user’s PII access requirements.
This option leverages AWS Lake Formation to create data filters and establish access levels for IAM roles, providing a straightforward approach to managing user access based on PII requirements.
Comment 1138501 by rralucard_
- Upvotes: 4
Selected Answer: A Option A, using Amazon Athena with AWS Lake Formation, is the most suitable solution. Lake Formation is designed to provide fine-grained access control to data lakes stored in S3 and integrates well with Athena, thereby meeting the requirements with the least effort. https://aws.amazon.com/blogs/big-data/anonymize-and-manage-data-in-your-data-lake-with-amazon-athena-and-aws-lake-formation/
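For reference, the sketch below shows how a Lake Formation data cells filter of the kind described in option A might be created and granted to an IAM role with boto3. The database, table, column, account, and role names are placeholder assumptions; `create_data_cells_filter` and `grant_permissions` are the relevant Lake Formation API calls.

```python
import boto3

lf = boto3.client("lakeformation")

ACCOUNT_ID = "123456789012"                                  # assumed catalog/account id
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/AnalystsNoPii"   # assumed IAM role

# Column-level filter: this role can read the raw table but never the PII columns.
lf.create_data_cells_filter(
    TableData={
        "TableCatalogId": ACCOUNT_ID,
        "DatabaseName": "datalake_raw",   # assumed Glue database
        "TableName": "customers",         # assumed table
        "Name": "customers_no_pii",
        "RowFilter": {"AllRowsWildcard": {}},
        "ColumnWildcard": {"ExcludedColumnNames": ["ssn", "email", "phone"]},
    }
)

# Grant SELECT through the filter; users assume the IAM role that matches their
# PII access requirements and query the data lake with Athena as usual.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": ROLE_ARN},
    Resource={
        "DataCellsFilter": {
            "TableCatalogId": ACCOUNT_ID,
            "DatabaseName": "datalake_raw",
            "TableName": "customers",
            "Name": "customers_no_pii",
        }
    },
    Permissions=["SELECT"],
)
```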
Question lqOJC09SJoogEsSUssiC
Question
A data engineer must build an extract, transform, and load (ETL) pipeline to process and load data from 10 source systems into 10 tables that are in an Amazon Redshift database. All the source systems generate .csv, JSON, or Apache Parquet files every 15 minutes. The source systems all deliver files into one Amazon S3 bucket. The file sizes range from 10 MB to 20 GB. The ETL pipeline must function correctly despite changes to the data schema. Which data pipeline solutions will meet these requirements? (Choose two.)
Choices
- A: Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
- B: Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
- C: Configure an AWS Lambda function to invoke an AWS Glue crawler when a file is loaded into the S3 bucket. Configure an AWS Glue job to process and load the data into the Amazon Redshift tables. Create a second Lambda function to run the AWS Glue job. Create an Amazon EventBridge rule to invoke the second Lambda function when the AWS Glue crawler finishes running successfully.
- D: Configure an AWS Lambda function to invoke an AWS Glue workflow when a file is loaded into the S3 bucket. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
- E: Configure an AWS Lambda function to invoke an AWS Glue job when a file is loaded into the S3 bucket. Configure the AWS Glue job to read the files from the S3 bucket into an Apache Spark DataFrame. Configure the AWS Glue job to also put smaller partitions of the DataFrame into an Amazon Kinesis Data Firehose delivery stream. Configure the delivery stream to load data into the Amazon Redshift tables.
Answer: BD | Answer_ET: BD | Community answer: BD (94%), other (6%)
Discussion
Comment 1140287 by rralucard_
- Upvotes: 10
Selected Answer: BD Option B: Amazon EventBridge Rule with AWS Glue Workflow Job Every 15 Minutes - for its streamlined process, automated scheduling, and ability to handle schema changes.
Option D: AWS Lambda to Invoke AWS Glue Workflow When a File is Loaded - for its responsiveness to file arrival and adaptability to schema changes, though it is slightly more complex than option B.
Comment 1329072 by HagarTheHorrible
- Upvotes: 1
Selected Answer: BD Change of schema is the key.
Comment 1218346 by valuedate
- Upvotes: 1
Selected Answer: BD eventbridge rule or event trigger
Comment 1196836 by Ousseyni
- Upvotes: 1
Selected Answer: AE ChatGPT said A and E.
Comment 1194738 by Christina666
- Upvotes: 1
Selected Answer: BD eventbridge rule or event trigger
Comment 1190312 by arvehisa
- Upvotes: 1
I don’t think this pipeline should be triggered by an S3 file upload. However, it seems A cannot handle the data schema change.
If an S3 trigger is acceptable, then C and E are unnecessarily complex, so I would go with B & D (despite the S3 trigger).
Comment 1188118 by lucas_rfsb
- Upvotes: 3
Selected Answer: BD I will go with BD
Comment 1165100 by Felix_G
- Upvotes: 2
Selected Answer: AB The two data pipeline solutions that will meet the requirements are:
A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
These solutions leverage AWS Glue to process and load the data from different file formats in the S3 bucket into the Amazon Redshift tables, while also handling changes to the data schema.
Comment 1139449 by evntdrvn76
- Upvotes: 1
The correct answers are A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables and B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables. These solutions automate the ETL pipeline with minimal operational overhead.
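To illustrate the event-driven half of the accepted answer (option D), here is a minimal sketch of a Lambda handler that starts an AWS Glue workflow whenever S3 reports a new object; the workflow itself would chain the crawler and the load job through its own triggers, which is what keeps the pipeline working when the schema changes. The workflow name and the idea of recording the object key as a run property are assumptions for illustration.

```python
import boto3

glue = boto3.client("glue")

WORKFLOW_NAME = "ingest-to-redshift"   # assumed Glue workflow (crawler -> job)


def lambda_handler(event, context):
    """Triggered by the S3 ObjectCreated notification on the landing bucket."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Start the workflow; the crawler refreshes the Data Catalog schema
        # before the Glue job runs, so schema changes don't break the load.
        run = glue.start_workflow_run(Name=WORKFLOW_NAME)

        # Attach the triggering object as a run property for traceability.
        glue.put_workflow_run_properties(
            Name=WORKFLOW_NAME,
            RunId=run["RunId"],
            RunProperties={"source_object": f"s3://{bucket}/{key}"},
        )

    return {"statusCode": 200}
```

Option B differs only in the trigger: an EventBridge schedule every 15 minutes would call the same workflow instead of this S3-notification Lambda.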