Questions and Answers
Question lF89nQz3ngxh6PM1Yc1R
Question
A data engineer wants to improve the performance of SQL queries in Amazon Athena that run against a sales data table.
The data engineer wants to understand the execution plan of a specific SQL statement. The data engineer also wants to see the computational cost of each operation in a SQL query.
Which statement does the data engineer need to run to meet these requirements?
Choices
- A: EXPLAIN SELECT * FROM sales;
- B: EXPLAIN ANALYZE FROM sales;
- C: EXPLAIN ANALYZE SELECT * FROM sales;
- D: EXPLAIN FROM sales;
Answer: C Answer_ET: C Community answer: C (100%)
Discussion
Comment 1233772 by FunkyFresco
- Upvotes: 5
Selected Answer: C Use EXPLAIN ANALYZE. https://docs.aws.amazon.com/athena/latest/ug/athena-explain-statement.html
Comment 1241795 by HunkyBunky
- Upvotes: 1
Selected Answer: C explain analyze + select * from table
Comment 1230811 by tgv
- Upvotes: 4
Selected Answer: C
A - Only partially meets the requirements, as it does not include computational costs.
B - Incorrect syntax and does not meet the requirements.
C - Fully meets the requirements by providing both the execution plan and the computational costs.
D - Incorrect syntax and does not meet the requirements.
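For reference, a minimal boto3 sketch of running the chosen statement. The database name and S3 output location are placeholders for illustration; EXPLAIN ANALYZE returns the distributed execution plan together with per-operation runtime statistics (CPU, rows, bytes) as ordinary query result rows.

```python
# Minimal sketch: run EXPLAIN ANALYZE in Athena via boto3 and print the plan.
# "sales_db" and the output bucket are assumed names, not from the question.
import time
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="EXPLAIN ANALYZE SELECT * FROM sales",
    QueryExecutionContext={"Database": "sales_db"},                      # assumed database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},   # assumed bucket
)
query_id = response["QueryExecutionId"]

# Poll until the statement finishes, then print the plan with runtime statistics.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
    print(row["Data"][0].get("VarCharValue", ""))
```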
Question ypALFrZLCZIyw8BIdHLU
Question
A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The data engineer does not require the Glue jobs to run or finish at a specific time. Which solution will run the Glue jobs in the MOST cost-effective way?
Choices
- A: Choose the FLEX execution class in the Glue job properties.
- B: Use the Spot Instance type in Glue job properties.
- C: Choose the STANDARD execution class in the Glue job properties.
- D: Choose the latest version in the GlueVersion field in the Glue job properties.
Answer: A Answer_ET: A Community answer: A (100%)
Discussion
Comment 1226790 by pypelyncar
- Upvotes: 7
Selected Answer: A The FLEX execution class leverages spare capacity within the AWS infrastructure to run Glue jobs at a discounted price compared to the standard execution class. Since the data engineer doesn't have specific time constraints, utilizing spare capacity is ideal for cost savings. Today it is simply a checkbox that opts the job into spare capacity; it also means we don't know when the job will finish, so increasing the job timeout is recommended.
Comment 1137887 by TonyStark0122
- Upvotes: 6
A. Choose the FLEX execution class in the Glue job properties.
Explanation: The FLEX execution class in AWS Glue allows jobs to use idle resources within the Glue service, which can significantly reduce costs compared to the STANDARD execution class. With FLEX, Glue jobs run when resources are available, which is a cost-effective approach for jobs that don’t need to be completed within a specific timeframe.
Comment 1254500 by GabrielSGoncalves
- Upvotes: 1
Selected Answer: A FLEX is how you lower Glue cost when you don't have urgency to run ETLs.
Comment 1209087 by k350Secops
- Upvotes: 3
Selected Answer: A As noted, a FLEX job comes out cheaper than using a Spot Instance.
Comment 1188287 by lucas_rfsb
- Upvotes: 1
Selected Answer: A I’d go with A
Comment 1137269 by lalitjhawar
- Upvotes: 5
A Flex allows you to optimize your costs on your non-urgent or non-time sensitive data integration workloads such as testing, and one-time data loads. With Flex, AWS Glue jobs run on spare compute capacity instead of dedicated hardware. The start and runtimes of jobs using Flex can vary because spare compute resources aren’t readily available and can be reclaimed during the run of a job
https://aws.amazon.com/blogs/big-data/introducing-aws-glue-flex-jobs-cost-savings-on-etl-workloads/
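As a rough illustration of the FLEX option discussed above, the sketch below creates a daily-scheduled Glue job with boto3. The job name, IAM role, script location, and cron expression are placeholders; ExecutionClass="FLEX" (supported for Glue 3.0+ Spark jobs) is the property the question is about.

```python
# Minimal sketch: create a Glue job that runs on spare capacity (FLEX) and a
# daily schedule. All names, ARNs, and the schedule are hypothetical.
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="daily-sales-etl",                                   # hypothetical job name
    Role="arn:aws:iam::123456789012:role/GlueJobRole",        # hypothetical role
    Command={"Name": "glueetl", "ScriptLocation": "s3://my-scripts/etl.py"},
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=2,
    ExecutionClass="FLEX",   # run on spare capacity at a lower price; start time not guaranteed
    Timeout=480,             # minutes; generous timeout because FLEX start/finish can vary
)

# Daily trigger: the job just needs to run each day, not at an exact time.
glue.create_trigger(
    Name="daily-sales-etl-trigger",
    Type="SCHEDULED",
    Schedule="cron(0 3 * * ? *)",   # 03:00 UTC every day (hypothetical)
    Actions=[{"JobName": "daily-sales-etl"}],
    StartOnCreation=True,
)
```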
Question XL69xrSDiTPxT7DGhuc4
Question
A company plans to provision a log delivery stream within a VPC. The company configured the VPC flow logs to publish to Amazon CloudWatch Logs. The company needs to send the flow logs to Splunk in near real time for further analysis.
Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Configure an Amazon Kinesis Data Streams data stream to use Splunk as the destination. Create a CloudWatch Logs subscription filter to send log events to the data stream.
- B: Create an Amazon Kinesis Data Firehose delivery stream to use Splunk as the destination. Create a CloudWatch Logs subscription filter to send log events to the delivery stream.
- C: Create an Amazon Kinesis Data Firehose delivery stream to use Splunk as the destination. Create an AWS Lambda function to send the flow logs from CloudWatch Logs to the delivery stream.
- D: Configure an Amazon Kinesis Data Streams data stream to use Splunk as the destination. Create an AWS Lambda function to send the flow logs from CloudWatch Logs to the data stream.
Answer: B Answer_ET: B Community answer: B (100%)
Discussion
Comment 1230814 by tgv
- Upvotes: 6
Selected Answer: B Kinesis Data Firehose has built-in support for Splunk as a destination, making the integration straightforward. Using a CloudWatch Logs subscription filter directly to Firehose simplifies the data flow, eliminating the need for additional Lambda functions or custom integrations.
Comment 1242310 by bakarys
- Upvotes: 4
Selected Answer: B Creating an Amazon Kinesis Data Firehose delivery stream to use Splunk as the destination and creating a CloudWatch Logs subscription filter to send log events to the delivery stream would meet these requirements with the least operational overhead.
Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services. It can capture, transform, and deliver streaming data to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, generic HTTP endpoints, and service providers like Splunk.
CloudWatch Logs subscription filters allow you to send real-time log events to Kinesis Data Firehose and are ideal for scenarios where you want to forward the logs to other services for further analysis.
Options A and D involve Kinesis Data Streams, which would require additional management and operational overhead. Option C involves creating a Lambda function, which also adds operational overhead. Therefore, option B is the best choice.
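A minimal sketch of the Firehose-plus-subscription-filter setup with boto3. All names, ARNs, and the Splunk HEC endpoint and token are placeholders, and the referenced IAM roles are assumed to already exist with the required permissions (Firehose writing failed events to S3, and CloudWatch Logs putting records to Firehose).

```python
# Minimal sketch: Firehose delivery stream with a Splunk destination, plus a
# CloudWatch Logs subscription filter for the VPC flow log group.
# Every name, ARN, endpoint, and token below is a placeholder.
import boto3

firehose = boto3.client("firehose")
logs = boto3.client("logs")

firehose.create_delivery_stream(
    DeliveryStreamName="vpc-flow-logs-to-splunk",
    SplunkDestinationConfiguration={
        "HECEndpoint": "https://splunk.example.com:8088",    # placeholder HEC endpoint
        "HECEndpointType": "Raw",
        "HECToken": "00000000-0000-0000-0000-000000000000",  # placeholder token
        "S3BackupMode": "FailedEventsOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseBackupRole",
            "BucketARN": "arn:aws:s3:::flow-log-backup-bucket",
        },
    },
)

# Subscription filter: no Lambda needed, CloudWatch Logs delivers directly to Firehose.
logs.put_subscription_filter(
    logGroupName="/vpc/flow-logs",                            # placeholder log group
    filterName="flow-logs-to-firehose",
    filterPattern="",                                         # forward every event
    destinationArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/vpc-flow-logs-to-splunk",
    roleArn="arn:aws:iam::123456789012:role/CWLtoFirehoseRole",
)
```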
Question Vn7FPeskwiBIURn1NckS
Question
A company has a data lake on AWS. The data lake ingests sources of data from business units. The company uses Amazon Athena for queries. The storage layer is Amazon S3 with an AWS Glue Data Catalog as a metadata repository.
The company wants to make the data available to data scientists and business analysts. However, the company first needs to manage fine-grained, column-level data access for Athena based on the user roles and responsibilities.
Which solution will meet these requirements?
Choices
- A: Set up AWS Lake Formation. Define security policy-based rules for the users and applications by IAM role in Lake Formation.
- B: Define an IAM resource-based policy for AWS Glue tables. Attach the same policy to IAM user groups.
- C: Define an IAM identity-based policy for AWS Glue tables. Attach the same policy to IAM roles. Associate the IAM roles with IAM groups that contain the users.
- D: Create a resource share in AWS Resource Access Manager (AWS RAM) to grant access to IAM users.
Answer: A Answer_ET: A Community answer: A (100%)
Discussion
Comment 1244008 by Ja13
- Upvotes: 6
Selected Answer: A Correct Solution: A. Set up AWS Lake Formation. Define security policy-based rules for the users and applications by IAM role in Lake Formation.
Explanation: AWS Lake Formation: This service simplifies and automates the process of securing and managing data lakes. It allows you to define fine-grained access control policies at the database, table, and column levels. Security Policy-Based Rules: Lake Formation allows you to create policies that specify which users or roles have access to specific data, including column-level access controls. This makes it easier to manage access based on roles and responsibilities.
Comment 1329435 by HagarTheHorrible
- Upvotes: 1
Selected Answer: A Lake Formation for any fine-grained access.
Comment 1234146 by HunkyBunky
- Upvotes: 1
Selected Answer: A A - Lake formation
Comment 1230815 by tgv
- Upvotes: 4
Selected Answer: A Lake Formation supports fine-grained access control, including column-level permissions.
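A minimal sketch of a column-level Lake Formation grant with boto3. The role, database, table, and column names are placeholders; once the table is governed by Lake Formation rather than IAM-only policies, Athena enforces this grant for queries issued under that role.

```python
# Minimal sketch: grant SELECT on specific columns of a Data Catalog table to
# an analyst IAM role via Lake Formation. Names and ARNs are placeholders.
import boto3

lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/BusinessAnalystRole"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales_db",                         # placeholder database
            "Name": "sales",                                    # placeholder table
            "ColumnNames": ["region", "order_date", "amount"],  # only these columns are visible
        }
    },
    Permissions=["SELECT"],
)
```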
Question n5PPm7nmYeLrZ1gXh6P1
Question
A company has developed several AWS Glue extract, transform, and load (ETL) jobs to validate and transform data from Amazon S3. The ETL jobs load the data into Amazon RDS for MySQL in batches once every day. The ETL jobs use a DynamicFrame to read the S3 data.
The ETL jobs currently process all the data that is in the S3 bucket. However, the company wants the jobs to process only the daily incremental data.
Which solution will meet this requirement with the LEAST coding effort?
Choices
- A: Create an ETL job that reads the S3 file status and logs the status in Amazon DynamoDB.
- B: Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.
- C: Enable job metrics for the ETL jobs to help keep track of processed objects in Amazon CloudWatch.
- D: Configure the ETL jobs to delete processed objects from Amazon S3 after each run.
Answer: B Answer_ET: B Community answer: B (100%)
Discussion
Comment 1230816 by tgv
- Upvotes: 8
Selected Answer: B AWS Glue job bookmarks are designed to handle incremental data processing by automatically tracking the state.
Comment 1249191 by andrologin
- Upvotes: 1
Selected Answer: B AWS Glue Bookmarks can be used to pin where the data processing last stopped hence help with incremental processing.
Comment 1241794 by HunkyBunky
- Upvotes: 1
Selected Answer: B B - bookmarks is a key
Comment 1240723 by bakarys
- Upvotes: 3
Selected Answer: B The solution that will meet this requirement with the least coding effort is Option B: Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.
AWS Glue job bookmarks help ETL jobs to keep track of data that has already been processed during previous runs. By enabling job bookmarks, the ETL jobs can skip the processed data and only process the new, incremental data. This feature is designed specifically for this use case and requires minimal coding effort.
Options A, C, and D would require additional coding and operational effort. Option A would require creating a new ETL job and managing a DynamoDB table. Option C would involve setting up job metrics and CloudWatch, which doesn’t directly address processing incremental data. Option D would involve deleting data from S3 after processing, which might not be desirable if the original data needs to be retained. Therefore, Option B is the most suitable solution.
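A minimal sketch of a bookmark-aware Glue job script (PySpark, so it only runs inside the Glue job environment). The database and table names are placeholders; bookmarks must also be enabled on the job itself (the --job-bookmark-option job-bookmark-enable argument), the DynamicFrame read must pass a transformation_ctx, and the script must call job.commit() so the new state is saved for the next daily run.

```python
# Minimal sketch of a Glue PySpark job that uses job bookmarks to process only
# new S3 data each day. "sales_db" / "daily_sales" are hypothetical catalog names.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# transformation_ctx is what lets the bookmark remember which S3 objects were read.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",
    table_name="daily_sales",
    transformation_ctx="read_daily_sales",
)

# ... validate/transform and write the batch to Amazon RDS for MySQL via a JDBC connection ...

job.commit()  # persists the bookmark state so the next run skips already-processed data
```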