Questions and Answers
Question 17yAKeHZ8hrdV7tVD31d
Question
A data engineer is processing and analyzing multiple terabytes of raw data that is in Amazon S3. The data engineer needs to clean and prepare the data. Then the data engineer needs to load the data into Amazon Redshift for analytics.
The data engineer needs a solution that will give data analysts the ability to perform complex queries. The solution must eliminate the need to perform complex extract, transform, and load (ETL) processes or to manage infrastructure.
Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Use Amazon EMR to prepare the data. Use AWS Step Functions to load the data into Amazon Redshift. Use Amazon QuickSight to run queries.
- B: Use AWS Glue DataBrew to prepare the data. Use AWS Glue to load the data into Amazon Redshift. Use Amazon Redshift to run queries.
- C: Use AWS Lambda to prepare the data. Use Amazon Kinesis Data Firehose to load the data into Amazon Redshift. Use Amazon Athena to run queries.
- D: Use AWS Glue to prepare the data. Use AWS Database Migration Service (AWS DMS) to load the data into Amazon Redshift. Use Amazon Redshift Spectrum to run queries.
Answer: B (Answer_ET: B)
Community answer: B (67%), D (33%)
Discussion
Comment 1272036 by teo2157
- Upvotes: 1
Selected Answer: B It can't be D, as DMS doesn't support S3 as a source; it's B, as it achieves all the goals described in the question.
Comment 1270475 by seouk
- Upvotes: 1
Selected Answer: D the LEAST operational overhead …
Comment 1264184 by catoteja
- Upvotes: 1
Selected Answer: B They can do the "complex" queries in Redshift.
Comment 1261866 by phkhadse
- Upvotes: 2
Option B
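For illustration only (not part of the original discussion), a minimal sketch of option B's load step: a Glue ETL job that reads the cleaned output a DataBrew recipe job wrote to S3 and loads it into Redshift through a pre-created Glue JDBC connection. The bucket, connection, database, and table names are placeholder assumptions.

```python
# Hypothetical Glue ETL job: load DataBrew-prepared data from S3 into Redshift.
# All bucket, connection, database, and table names below are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the cleaned output that a DataBrew recipe job wrote to S3 (Parquet assumed).
prepared = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/databrew-output/"]},
    format="parquet",
)

# Write into Redshift through a pre-created Glue JDBC connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=prepared,
    catalog_connection="redshift-connection",  # assumed Glue connection name
    connection_options={"dbtable": "analytics.sales", "database": "dev"},
    redshift_tmp_dir="s3://example-bucket/redshift-temp/",
)

job.commit()
```

Analysts then run their complex queries directly in Redshift, with no clusters or ETL servers to manage.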
Question 2Rd77GRTCqBJVTIpJGvi
Question
A company uses an AWS Lambda function to transfer files from a legacy SFTP environment to Amazon S3 buckets. The Lambda function is VPC enabled to ensure that all communications between the Lambda function and other AWS services that are in the same VPC environment will occur over a secure network.
The Lambda function is able to connect to the SFTP environment successfully. However, when the Lambda function attempts to upload files to the S3 buckets, the Lambda function returns timeout errors. A data engineer must resolve the timeout issues in a secure way.
Which solution will meet these requirements in the MOST cost-effective way?
Choices
- A: Create a NAT gateway in the public subnet of the VPC. Route network traffic to the NAT gateway.
- B: Create a VPC gateway endpoint for Amazon S3. Route network traffic to the VPC gateway endpoint.
- C: Create a VPC interface endpoint for Amazon S3. Route network traffic to the VPC interface endpoint.
- D: Use a VPC internet gateway to connect to the internet. Route network traffic to the VPC internet gateway.
Answer: B (Answer_ET: B)
Community answer: B (86%), other: 14%
Discussion
Comment 1262108 by ArunRav
- Upvotes: 6
Selected Answer: B Option B - VPC Gateway Endpoint for Amazon S3
Comment 1261865 by phkhadse
- Upvotes: 5
Option B - VPC Gateway Endpoint for Amazon S3. While an interface endpoint is a viable solution, it is more complex and expensive than a gateway endpoint: VPC interface endpoints are charged per hour and per gigabyte of data transferred.
Comment 1271020 by Ashishk1
- Upvotes: 1
Selected Answer: C The solution that resolves the timeout issues when the Lambda function uploads files to the S3 buckets in a secure and cost-effective way is C: create a VPC interface endpoint for Amazon S3 and route network traffic to the VPC interface endpoint.
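For illustration (not part of the discussion), a minimal boto3 sketch of option B: creating an S3 gateway endpoint and associating it with the route table used by the Lambda function's private subnets. The VPC ID, route table ID, and region are placeholder assumptions.

```python
# Minimal sketch of option B: an S3 gateway endpoint routed from the Lambda
# function's private subnets. IDs and region below are placeholder values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",                # gateway endpoints incur no hourly charge
    VpcId="vpc-0123456789abcdef0",            # assumed VPC of the Lambda function
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # route table of the private subnets
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```

Once the route table entry exists, S3 traffic from the Lambda function stays on the AWS network and the upload timeouts stop, with no NAT gateway or interface endpoint costs.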
Question IzBIvnb6XUfBu4cUw1gs
Question
A company reads data from customer databases that run on Amazon RDS. The databases contain many inconsistent fields. For example, a customer record field that is named place_id in one database is named location_id in another database. The company needs to link customer records across different databases, even when customer record fields do not match.
Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Create a provisioned Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook. Use the FindMatches transform to find duplicate records in the data.
- B: Create an AWS Glue crawler to crawl the databases. Use the FindMatches transform to find duplicate records in the data. Evaluate and tune the transform by evaluating the performance and results.
- C: Create an AWS Glue crawler to crawl the databases. Use Amazon SageMaker to construct Apache Spark ML pipelines to find duplicate records in the data.
- D: Create a provisioned Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook. Use an Apache Spark ML model to find duplicate records in the data. Evaluate and tune the model by evaluating the performance and results.
Answer: B (Answer_ET: B)
Community answer: B (100%)
Discussion
Comment 1328353 by HagarTheHorrible
- Upvotes: 1
Selected Answer: B AWS Glue Crawler: Automatically discovers the schema and structure of data in the RDS databases, saving significant manual effort. Creates a unified data catalog that can be queried or transformed.
Comment 1263249 by komorebi
- Upvotes: 4
Selected Answer: B Answer is B
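As an illustrative sketch of option B (not from the discussion), the snippet below applies a Glue FindMatches ML transform, assumed to be already created and trained, to a table that a Glue crawler produced from one of the RDS databases; the catalog database, table name, and transform ID are placeholders.

```python
# Hypothetical Glue job: apply a pre-trained FindMatches ML transform to a
# crawled customer table. Catalog names and the transform ID are placeholders.
from awsglue.context import GlueContext
from awsglueml.transforms import FindMatches
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Table created by the Glue crawler that crawled one of the RDS databases.
customers = glue_context.create_dynamic_frame.from_catalog(
    database="customer_catalog",          # assumed catalog database name
    table_name="customers_db1",           # assumed crawled table name
)

# Apply a FindMatches ML transform that was created and trained beforehand.
matched = FindMatches.apply(
    frame=customers,
    transformId="tfm-0123456789abcdef0",  # assumed transform ID
    transformation_ctx="find_matches",
)

# FindMatches adds a match_id column; rows that share a match_id likely refer
# to the same customer even when field names (place_id vs. location_id) differ.
matched.toDF().groupBy("match_id").count().show()
```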
Question ATffmBNIqO3Vhjb7e0c0
Question
A finance company receives data from third-party data providers and stores the data as objects in an Amazon S3 bucket.
The company ran an AWS Glue crawler on the objects to create a data catalog. The AWS Glue crawler created multiple tables. However, the company expected that the crawler would create only one table.
The company needs a solution that will ensure the AWS Glue crawler creates only one table.
Which combination of solutions will meet this requirement? (Choose two.)
Choices
- A: Ensure that the object format, compression type, and schema are the same for each object.
- B: Ensure that the object format and schema are the same for each object. Do not enforce consistency for the compression type of each object.
- C: Ensure that the schema is the same for each object. Do not enforce consistency for the file format and compression type of each object.
- D: Ensure that the structure of the prefix for each S3 object name is consistent.
- E: Ensure that all S3 object names follow a similar pattern.
Answer: AD (Answer_ET: AD)
Community answer: AD (83%), AB (17%)
Discussion
Comment 1346612 by Salam9
- Upvotes: 1
Selected Answer: AD I have seen this official answer in the practice exam on the AWS Skills Builder website.
Comment 1338145 by kailu
- Upvotes: 1
Selected Answer: AB D focuses on the S3 prefix structure, which affects partitioning but not the creation of a single table. Consistency in file format and schema is much more important in determining how AWS Glue handles the data.
Comment 1263252 by komorebi
- Upvotes: 1
Selected Answer: AD Answer is AD
Comment 1262806 by teo2157
- Upvotes: 3
Selected Answer: AD To ensure that the AWS Glue crawler creates only one table, keep the object format, compression type, schema, and prefix structure consistent:
- Consistent object format: ensure that all objects in the S3 bucket are in the same format (e.g., CSV, JSON, Parquet).
- Consistent compression type: ensure that all objects use the same compression type (e.g., GZIP, Snappy).
- Consistent schema: ensure that all objects have the same schema (i.e., the same fields with the same data types).
- Consistent prefix structure: ensure that all objects follow a consistent naming convention and prefix structure in the S3 bucket (e.g., s3://your-bucket/path/to/data/).
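As an illustration of answers A and D (not from the discussion), a minimal boto3 sketch of a crawler whose single S3 target holds objects that share one format, one compression type, one schema, and a consistent prefix; the crawler name, IAM role, database, and path are placeholder assumptions.

```python
# Minimal boto3 sketch related to answers A and D: one crawler target whose
# objects all share the same format, compression, and schema under a consistent
# prefix, so the crawler infers a single table. Names and ARNs are placeholders.
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="third-party-feed-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # assumed IAM role
    DatabaseName="finance_catalog",
    Targets={
        "S3Targets": [
            # Every object under this prefix should be, e.g., GZIP-compressed CSV
            # with an identical schema so the crawler does not split it into
            # multiple tables.
            {"Path": "s3://example-bucket/third-party/daily/"}
        ]
    },
)
```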
Question IzQseynAwdztRgTfMupX
Question
An application consumes messages from an Amazon Simple Queue Service (Amazon SQS) queue. The application experiences occasional downtime. As a result of the downtime, messages within the queue expire and are deleted after 1 day. The message deletions cause data loss for the application.
Which solutions will minimize data loss for the application? (Choose two.)
Choices
- A: Increase the message retention period
- B: Increase the visibility timeout.
- C: Attach a dead-letter queue (DLQ) to the SQS queue.
- D: Use a delay queue to delay message delivery
- E: Reduce message processing time.
Answer: AC (Answer_ET: AC)
Community answer: AC (63%), AE (38%)
Discussion
Comment 1331279 by axantroff
- Upvotes: 1
Selected Answer: AE In my opinion, A is obvious and one of the two correct answers. Additionally, I checked B, C, and D in more detail, and they basically do not make sense as they do not contribute in any way to handling messages that were just delayed. See the documentation for reference:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html
https://aws.amazon.com/what-is/dead-letter-queue/
So only E remains as another valid option. It makes sense because the faster we are able to process events, the less likely we are to violate the expiration policy.
Comment 1328359 by HagarTheHorrible
- Upvotes: 1
Selected Answer: AC Increasing the message retention period (A) ensures messages are available longer, while attaching a dead-letter queue (C) allows recovery and reprocessing of unprocessed messages, effectively minimizing data loss.
Comment 1323908 by altonh
- Upvotes: 2
Selected Answer: AE It cannot be C. Messages go to the DLQ only after they have been received and failed processing. But if a message is not processed at all and it expires, it is simply deleted from the queue.
Comment 1265754 by aragon_saa
- Upvotes: 1
Selected Answer: AC Answer is AC
Comment 1265671 by matt200
- Upvotes: 3
Selected Answer: AC To minimize data loss for the application consuming messages from an Amazon SQS queue, the following two solutions are most effective:
A. Increase the message retention period: By increasing the message retention period, you ensure that messages remain in the queue for a longer duration before being automatically deleted. This provides more time for the application to recover from downtime and process the messages, thereby reducing the chance of data loss due to message expiration.
C. Attach a dead-letter queue (DLQ) to the SQS queue: A DLQ can be used to capture messages that cannot be processed successfully. When messages fail to be processed after a certain number of attempts (as defined by the redrive policy), they are moved to the DLQ. This allows you to investigate and handle these messages separately, preventing data loss.
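For reference, a hedged boto3 sketch of answers A and C: raising the retention period on the main queue and attaching a DLQ through a redrive policy. The queue URL, DLQ ARN, and maxReceiveCount value are placeholder assumptions.

```python
# Sketch of answers A and C: longer retention plus a dead-letter queue.
# Queue URL, DLQ ARN, and the receive-count threshold are placeholders.
import json

import boto3

sqs = boto3.client("sqs")

main_queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/app-queue"
dlq_arn = "arn:aws:sqs:us-east-1:123456789012:app-queue-dlq"  # pre-created DLQ

sqs.set_queue_attributes(
    QueueUrl=main_queue_url,
    Attributes={
        # A: keep messages for 14 days (the maximum) instead of 1 day.
        "MessageRetentionPeriod": str(14 * 24 * 60 * 60),
        # C: after 5 failed receives, move the message to the DLQ instead of losing it.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        ),
    },
)
```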