Questions and Answers
Question uX3yFQqCjyO2vfCeJsqQ
Question
A company stores details about transactions in an Amazon S3 bucket. The company wants to log all writes to the S3 bucket into another S3 bucket that is in the same AWS Region. Which solution will meet this requirement with the LEAST operational effort?
Choices
- A: Configure an S3 Event Notifications rule for all activities on the transactions S3 bucket to invoke an AWS Lambda function. Program the Lambda function to write the event to Amazon Kinesis Data Firehose. Configure Kinesis Data Firehose to write the event to the logs S3 bucket.
- B: Create a trail of management events in AWS CloudTrail. Configure the trail to receive data from the transactions S3 bucket. Specify an empty prefix and write-only events. Specify the logs S3 bucket as the destination bucket.
- C: Configure an S3 Event Notifications rule for all activities on the transactions S3 bucket to invoke an AWS Lambda function. Program the Lambda function to write the events to the logs S3 bucket.
- D: Create a trail of data events in AWS CloudTrail. Configure the trail to receive data from the transactions S3 bucket. Specify an empty prefix and write-only events. Specify the logs S3 bucket as the destination bucket.
answer?
Answer: D Answer_ET: D Community answer D (100%)
Discussion
Comment 1138441 by rralucard_
- Upvotes: 5
Selected Answer: D https://docs.aws.amazon.com/AmazonS3/latest/userguide/logging-with-S3.html Option D, creating a trail of data events in AWS CloudTrail, is the best solution to meet the requirement with the least operational effort. It directly logs the desired activities to another S3 bucket and does not involve the development and maintenance of additional resources like Lambda functions or Kinesis Data Firehose streams.
Comment 1213880 by VerRi
- Upvotes: 5
Selected Answer: D
A: We don't need all activities on the S3 bucket.
B: Management events capture administrative (control-plane) activity, not the object-level data events we need to log.
C: We don't need all activities on the S3 bucket.
D: Data events with write-only filtering require the LEAST operational effort.
Comment 1203388 by khchan123
- Upvotes: 3
Selected Answer: D Correct answer is D.
Option A or C require writing custom Lambda code to handle the events and write them to the Kinesis or S3 bucket so they are not the LEAST operational effort.
Comment 1201421 by LanoraMoe
- Upvotes: 1
S3 object-level activities such as GetObject, DeleteObject, and PutObject are considered data events in CloudTrail. Read and write events can be monitored separately.
Comment 1200117 by okechi
- Upvotes: 1
The correct answer is B. A CloudTrail trail of management events includes logging setups like this.
Comment 1177174 by GiorgioGss
- Upvotes: 3
Although it might be tempting to go with C, keep in mind that C requires us to write Lambda code and define Lambda permissions, triggers, and so on. With D we just enable a trail of data events, and that's pretty much it.
Comment 1164024 by Felix_G
- Upvotes: 2
The other options were less efficient:
- A: Leveraging S3 Event Notifications, a Lambda function, and Kinesis Data Firehose works, but it involves setting up and managing three services, increasing complexity and operational overhead. Kinesis Data Firehose introduces an unnecessary intermediary step for a straightforward logging task.
- B: CloudTrail management events primarily track API calls and management activities related to S3 buckets, not data events like writes to objects. Consequently, it wouldn't capture the desired S3 bucket writes.
- D: Similar to option B, CloudTrail with data events doesn't track individual object writes within a bucket. It focuses on object-level changes like creation, deletion, or metadata modification.
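For readers who want to see what option D looks like in practice, here is a minimal boto3 sketch. The trail name and bucket names are hypothetical, and the logs bucket is assumed to already have a bucket policy that allows CloudTrail to deliver log files to it.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Hypothetical names for illustration only.
TRAIL_NAME = "s3-write-logging-trail"
TRANSACTIONS_BUCKET = "transactions-bucket"
LOGS_BUCKET = "logs-bucket"

# Create the trail and deliver its log files to the logs bucket
# (the logs bucket must already allow CloudTrail to write to it).
cloudtrail.create_trail(Name=TRAIL_NAME, S3BucketName=LOGS_BUCKET)

# Log only write data events for every object in the transactions bucket
# (the trailing "/" with nothing after it is the empty prefix).
cloudtrail.put_event_selectors(
    TrailName=TRAIL_NAME,
    EventSelectors=[
        {
            "ReadWriteType": "WriteOnly",
            "IncludeManagementEvents": False,
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    "Values": [f"arn:aws:s3:::{TRANSACTIONS_BUCKET}/"],
                }
            ],
        }
    ],
)

# Start logging.
cloudtrail.start_logging(Name=TRAIL_NAME)
```

No Lambda code, streams, or retries to maintain: CloudTrail delivers the write events to the logs bucket on its own.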
Question afAuKhJsDol2yynjEBpQ
Question
A data engineer needs to maintain a central metadata repository that users access through Amazon EMR and Amazon Athena queries. The repository needs to provide the schema and properties of many tables. Some of the metadata is stored in Apache Hive. The data engineer needs to import the metadata from Hive into the central metadata repository. Which solution will meet these requirements with the LEAST development effort?
Choices
- A: Use Amazon EMR and Apache Ranger.
- B: Use a Hive metastore on an EMR cluster.
- C: Use the AWS Glue Data Catalog.
- D: Use a metastore on an Amazon RDS for MySQL DB instance.
answer?
Answer: C Answer_ET: C Community answer C (100%)
Discussion
Comment 1138463 by rralucard_
- Upvotes: 6
Selected Answer: C https://aws.amazon.com/blogs/big-data/metadata-classification-lineage-and-discovery-using-apache-atlas-on-amazon-emr/ Option C, using the AWS Glue Data Catalog, is the best solution to meet the requirements with the least development effort. The AWS Glue Data Catalog is designed to be a central metadata repository that can integrate with various AWS services including EMR and Athena, providing a managed and scalable solution for metadata management with built-in Hive compatibility.
Comment 1233897 by vic614
- Upvotes: 1
Selected Answer: C Data Catalog.
Comment 1164039 by Felix_G
- Upvotes: 2
Option C, using the AWS Glue Data Catalog, requires the least development effort to meet the requirements for a central metadata repository accessed from EMR and Athena.
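As a rough illustration of option C, the sketch below launches an EMR cluster whose Hive and Spark SQL metastore is the AWS Glue Data Catalog; Athena reads the same catalog by default. The cluster name, release label, instance types, and IAM role names are placeholders. Existing Hive metastore tables can then be brought into the catalog, for example with a Glue crawler over the underlying data or AWS's Hive metastore migration scripts.

```python
import boto3

emr = boto3.client("emr")

# Point Hive and Spark SQL on EMR at the Glue Data Catalog
# instead of a cluster-local Hive metastore.
glue_catalog_config = [
    {
        "Classification": "hive-site",
        "Properties": {
            "hive.metastore.client.factory.class":
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        },
    },
    {
        "Classification": "spark-hive-site",
        "Properties": {
            "hive.metastore.client.factory.class":
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        },
    },
]

emr.run_job_flow(
    Name="analytics-cluster",       # placeholder
    ReleaseLabel="emr-6.15.0",      # placeholder
    Applications=[{"Name": "Hive"}, {"Name": "Spark"}],
    Configurations=glue_catalog_config,
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",   # placeholder roles
    ServiceRole="EMR_DefaultRole",
)
```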
Question yETEE7lxUq9avCDFgY7f
Question
A company needs to build a data lake in AWS. The company must provide row-level data access and column-level data access to specific teams. The teams will access the data by using Amazon Athena, Amazon Redshift Spectrum, and Apache Hive from Amazon EMR. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Use Amazon S3 for data lake storage. Use S3 access policies to restrict data access by rows and columns. Provide data access through Amazon S3.
- B: Use Amazon S3 for data lake storage. Use Apache Ranger through Amazon EMR to restrict data access by rows and columns. Provide data access by using Apache Pig.
- C: Use Amazon Redshift for data lake storage. Use Redshift security policies to restrict data access by rows and columns. Provide data access by using Apache Spark and Amazon Athena federated queries.
- D: Use Amazon S3 for data lake storage. Use AWS Lake Formation to restrict data access by rows and columns. Provide data access through AWS Lake Formation.
answer?
Answer: D Answer_ET: D Community answer D (100%)
Discussion
Comment 1273735 by Shanmahi
- Upvotes: 1
Selected Answer: D Use Amazon S3 for storage and AWS Lake Formation for fine-grained access control, such as row-level and column-level access.
Comment 1266970 by cas_tori
- Upvotes: 3
Selected Answer: D This is D.
Comment 1164051 by Felix_G
- Upvotes: 4
Option D is the best solution to meet the requirements with the least operational overhead.
Using Amazon S3 for storage and AWS Lake Formation for access control and data access delivers the following advantages:
- S3 provides a highly durable, available, and scalable data lake storage layer.
- Lake Formation enables fine-grained access control down to the column and row level.
- It integrates natively with Athena, Redshift Spectrum, and EMR for simplified data access.
- A fully managed service minimizes admin overhead versus self-managing Ranger or piecemeal solutions.
Comment 1138475 by rralucard_
- Upvotes: 4
Selected Answer: D https://docs.aws.amazon.com/lake-formation/latest/dg/cbac-tutorial.html Option D, using Amazon S3 for data lake storage and AWS Lake Formation for access control, is the most suitable solution. It meets the requirements for row-level and column-level access control and integrates well with Amazon Athena, Amazon Redshift Spectrum, and Apache Hive on EMR, all with lower operational overhead compared to the other options.
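To make option D concrete, here is a minimal Lake Formation sketch that combines row-level and column-level restrictions in a data cells filter and then grants SELECT on it to a team role. The account ID, database, table, filter, column, and role names are all hypothetical.

```python
import boto3

lf = boto3.client("lakeformation")

# Placeholder identifiers for illustration.
CATALOG_ID = "111122223333"   # AWS account ID that owns the Data Catalog
DATABASE = "sales_db"
TABLE = "transactions"
TEAM_ROLE_ARN = "arn:aws:iam::111122223333:role/analytics-team"

# Row- and column-level restriction: expose only three columns
# and only the rows for one region.
lf.create_data_cells_filter(
    TableData={
        "TableCatalogId": CATALOG_ID,
        "DatabaseName": DATABASE,
        "TableName": TABLE,
        "Name": "analytics_team_filter",
        "RowFilter": {"FilterExpression": "region = 'EMEA'"},
        "ColumnNames": ["txn_id", "txn_date", "amount"],
    }
)

# Grant SELECT on the filtered view of the table to the team's role.
# Athena, Redshift Spectrum, and EMR (with Lake Formation integration
# enabled) enforce this grant when querying through the Glue Data Catalog.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": TEAM_ROLE_ARN},
    Resource={
        "DataCellsFilter": {
            "TableCatalogId": CATALOG_ID,
            "DatabaseName": DATABASE,
            "TableName": TABLE,
            "Name": "analytics_team_filter",
        }
    },
    Permissions=["SELECT"],
)
```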
Question Hiu68ITqAgklcR2HrngD
Question
A company created an extract, transform, and load (ETL) data pipeline in AWS Glue. A data engineer must crawl a table that is in Microsoft SQL Server. The data engineer needs to extract, transform, and load the output of the crawl to an Amazon S3 bucket. The data engineer also must orchestrate the data pipeline. Which AWS service or feature will meet these requirements MOST cost-effectively?
Choices
- A: AWS Step Functions
- B: AWS Glue workflows
- C: AWS Glue Studio
- D: Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
answer?
Answer: B Answer_ET: B Community answer B (100%)
Discussion
Comment 1125629 by milofficial
- Upvotes: 9
Selected Answer: B Glue workflows are the easiest solution here.
Comment 1182435 by dev_vicente
- Upvotes: 7
Selected Answer: B I asked an AI. Analysis of the answers:
- A: AWS Step Functions is a good option for orchestrating workflows that span different AWS services, but it requires additional development to connect to Microsoft SQL Server.
- B: AWS Glue workflows is the best and most cost-effective option. AWS Glue is designed specifically for ETL on AWS and integrates directly with data sources such as Microsoft SQL Server through connectors, which simplifies configuration and avoids extra development.
- C: AWS Glue Studio is a visual interface for AWS Glue that makes it easy to create and manage ETL jobs, but the underlying orchestration still comes from AWS Glue workflows (B).
- D: Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is viable, but it is generally more expensive than native AWS services like AWS Glue workflows, and it requires some Airflow experience for setup and maintenance.
Comment 1291351 by Adrifersilva
- Upvotes: 1
Selected Answer: B https://community.aws/content/2iBQiAGS4RvEolgSQKu4iF8InTV/choose-the-right-data-orchestration-service-for-your-data-pipeline?lang=en
Comment 1288857 by Shubham1989
- Upvotes: 1
Selected Answer: B Glue is easiest here to choose from.
Comment 1205968 by DevoteamAnalytix
- Upvotes: 3
Selected Answer: B Agree with B. CRAWLING and ETL are the main functions of a Glue workflow and MS SQL is supported: https://docs.aws.amazon.com/glue/latest/dg/crawler-data-stores.html
Comment 1158037 by Alcee
- Upvotes: 1
It's B!
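As a sketch of option B, the following boto3 calls wire a Glue crawler and a Glue ETL job into a single Glue workflow: the crawler catalogs the SQL Server table (through a Glue JDBC connection) and, once it succeeds, a conditional trigger starts the job that writes the output to Amazon S3. The workflow, crawler, and job names are placeholders, and the crawler and job are assumed to already exist.

```python
import boto3

glue = boto3.client("glue")

# Placeholder names for illustration.
WORKFLOW = "sqlserver-to-s3"
CRAWLER = "sqlserver-table-crawler"   # crawls the SQL Server table via a Glue connection
JOB = "sqlserver-to-s3-etl"           # Glue job that writes the transformed output to S3

# The workflow ties the crawler and the ETL job into one orchestrated pipeline.
glue.create_workflow(Name=WORKFLOW, Description="Crawl SQL Server, then load to S3")

# Start the crawler on demand (or switch to a SCHEDULED trigger with a cron expression).
glue.create_trigger(
    Name="start-crawl",
    WorkflowName=WORKFLOW,
    Type="ON_DEMAND",
    Actions=[{"CrawlerName": CRAWLER}],
)

# Run the ETL job only after the crawler succeeds.
glue.create_trigger(
    Name="run-etl-after-crawl",
    WorkflowName=WORKFLOW,
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "CrawlerName": CRAWLER,
                "CrawlState": "SUCCEEDED",
            }
        ]
    },
    Actions=[{"JobName": JOB}],
)
```

Everything stays inside Glue, so there is no separate orchestration service (Step Functions or MWAA) to pay for or maintain.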
Question yynVfzCTEKiPlBbmcgWr
Question
An airline company is collecting metrics about flight activities for analytics. The company is conducting a proof of concept (POC) test to show how analytics can provide insights that the company can use to increase on-time departures. The POC test uses objects in Amazon S3 that contain the metrics in .csv format. The POC test uses Amazon Athena to query the data. The data is partitioned in the S3 bucket by date. As the amount of data increases, the company wants to optimize the storage solution to improve query performance. Which combination of solutions will meet these requirements? (Choose two.)
Choices
- A: Add a randomized string to the beginning of the keys in Amazon S3 to get more throughput across partitions.
- B: Use an S3 bucket that is in the same account that uses Athena to query the data.
- C: Use an S3 bucket that is in the same AWS Region where the company runs Athena queries.
- D: Preprocess the .csv data to JSON format by fetching only the document keys that the query requires.
- E: Preprocess the .csv data to Apache Parquet format by fetching only the data blocks that are needed for predicates.
answer?
Answer: CE Answer_ET: CE Community answer CE (100%)
Discussion
Comment 1138486 by rralucard_
- Upvotes: 6
Selected Answer: CE https://docs.aws.amazon.com/athena/latest/ug/performance-tuning.html
Comment 1369677 by Ramdi1
- Upvotes: 1
Selected Answer: CE
C: Keeping the S3 bucket in the same AWS Region as the Athena queries reduces latency and network costs. The data never crosses Regions, so there are no inter-Region transfer delays, and cross-Region data transfer charges are avoided.
E: Parquet is a columnar storage format, so queries can fetch only the columns and data blocks they need for their predicates, reducing the amount of data scanned and therefore the scanning cost.
Comment 1220065 by tgv
- Upvotes: 1
Selected Answer: CE I will go with C and E.
Comment 1193465 by matasejem
- Upvotes: 1
C is not mentioned anywhere in https://docs.aws.amazon.com/athena/latest/ug/performance-tuning.html.
Comment 1167502 by damaldon
- Upvotes: 1
Answer C and E
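To illustrate option E, the sketch below uses an Athena CTAS statement to rewrite the .csv data as compressed, partitioned Parquet. The database, table, bucket, and output locations are placeholders, and the source CSV table is assumed to already be registered in the Glue Data Catalog (for example by a crawler). CTAS requires partition columns to come last in the select list, so the partition column is assumed to be the last column of the source table.

```python
import boto3

# Run Athena in the same Region as the S3 bucket (option C).
athena = boto3.client("athena", region_name="us-east-1")  # placeholder Region

# Convert the CSV table to Snappy-compressed Parquet, partitioned by date.
# flight_metrics_csv, flight_metrics_parquet, poc_analytics, and the S3 paths
# are hypothetical names.
ctas = """
CREATE TABLE flight_metrics_parquet
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    external_location = 's3://poc-metrics-bucket/parquet/',
    partitioned_by = ARRAY['flight_date']
) AS
SELECT *          -- flight_date is assumed to be the last column of the source table
FROM flight_metrics_csv
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "poc_analytics"},
    ResultConfiguration={"OutputLocation": "s3://poc-metrics-bucket/athena-results/"},
)
```

After the conversion, queries with predicates on the partition column and on a subset of columns scan far less data than the original CSV layout.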