Questions and Answers
Question ufiWl4fuwq6KVdcyov4B
Question
A data engineer must orchestrate a series of Amazon Athena queries that will run every day. Each query can run for more than 15 minutes. Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)
Choices
- A: Use an AWS Lambda function and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.
- B: Create an AWS Step Functions workflow and add two states. Add the first state before the Lambda function. Configure the second state as a Wait state to periodically check whether the Athena query has finished using the Athena Boto3 get_query_execution API call. Configure the workflow to invoke the next query when the current query has finished running.
- C: Use an AWS Glue Python shell job and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.
- D: Use an AWS Glue Python shell script to run a sleep timer that checks every 5 minutes to determine whether the current Athena query has finished running successfully. Configure the Python shell script to invoke the next query when the current query has finished running.
- E: Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the Athena queries in AWS Batch.
Answer: AB Answer_ET: AB Community answer: AB (66%), BE (16%), Other
Discussion
Comment 1140646 by rralucard_
- Upvotes: 10
Selected Answer: AB AWS Lambda can be effectively used to trigger Athena queries. By using the start_query_execution API from the Athena Boto3 client, you can programmatically start Athena queries. Lambda functions are cost-effective as they charge based on the compute time used, and there’s no charge when the code is not running. However, Lambda has a maximum execution timeout of 15 minutes, which means it’s not suitable for long-running operations but can be used to trigger or start queries. AWS Step Functions can orchestrate multiple AWS services in workflows. By using a Wait state, the workflow can periodically check the status of the Athena query, and proceed to the next step once the query is complete. This approach is more scalable and reliable compared to continuously running a Lambda function, as Step Functions can handle long-running processes better and can maintain the state of each step in the workflow.
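For context, the trigger Lambda in option A reduces to a single asynchronous Boto3 call. A minimal sketch (the database name and output location below are placeholders, not from the question):

```python
import boto3

athena = boto3.client("athena")

def lambda_handler(event, context):
    # start_query_execution is asynchronous: it returns a QueryExecutionId
    # immediately, so the Lambda never waits on the 15+ minute query itself.
    response = athena.start_query_execution(
        QueryString=event["query"],  # e.g. the day's SQL, passed in as input
        QueryExecutionContext={"Database": "analytics_db"},    # placeholder
        ResultConfiguration={
            "OutputLocation": "s3://example-athena-results/",  # placeholder
        },
    )
    # Hand the id to the Step Functions workflow so it can poll for completion.
    return {"QueryExecutionId": response["QueryExecutionId"]}
```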
Comment 1167986 by GiorgioGss
- Upvotes: 8
Selected Answer: BE B - because https://docs.aws.amazon.com/step-functions/latest/dg/sample-athena-query.html E - because https://aws.amazon.com/blogs/big-data/orchestrate-amazon-emr-serverless-spark-jobs-with-amazon-mwaa-and-data-validation-using-amazon-athena/
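The status check in option B's Wait loop amounts to a call like the following, typically packaged as a second Lambda that the state machine invokes after each Wait tick (a sketch; the field names in `event` are assumptions):

```python
import boto3

athena = boto3.client("athena")

def lambda_handler(event, context):
    # Called from the state machine after each Wait tick; the state machine,
    # not this function, does the waiting.
    result = athena.get_query_execution(
        QueryExecutionId=event["QueryExecutionId"]
    )
    state = result["QueryExecution"]["Status"]["State"]
    # State is one of QUEUED | RUNNING | SUCCEEDED | FAILED | CANCELLED;
    # a Choice state loops back to Wait while the query is still running.
    return {"QueryExecutionId": event["QueryExecutionId"], "State": state}
```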
Comment 1351766 by Evan_Lin
- Upvotes: 2
Selected Answer: AB After real-world testing, A is a valid answer. This is because the Lambda only sends the API request to Athena, which runs the query. Even if the Lambda times out, the query result is still stored in the designated S3 bucket.
Comment 1339175 by Udyan
- Upvotes: 2
Selected Answer: AB Why? B (Step Functions): Step Functions are ideal for orchestrating long-running workflows, including polling the Athena query status and invoking the next query when ready. A (Lambda): Lambda is used to programmatically trigger Athena queries within Step Functions, despite its 15-minute limitation, because Step Functions can manage the long runtime using Wait states. Why Not C, D, or E? C and D involve Glue, which is better suited for ETL jobs than orchestration, making them less efficient and cost-effective. E (Amazon MWAA) introduces unnecessary cost and complexity for a straightforward workflow.
Comment 1326504 by haby
- Upvotes: 4
Selected Answer: BC BC for me. A - the Lambda function will stop at 900 seconds, so it will stop before the query finishes (queries run more than 15 minutes). E - Airflow is way more complex and expensive than Step Functions.
Comment 1321633 by altonh
- Upvotes: 1
Selected Answer: CE Not AB, because of the Lambda timeout.
CE is correct. The query will be executed by a Glue job, which will be orchestrated by Airflow. The job will be scheduled using AWS Batch.
Comment 1320504 by Eleftheriia
- Upvotes: 2
Selected Answer: AB Not E because “You should use Step Functions if you prioritize cost and performance” https://aws.amazon.com/managed-workflows-for-apache-airflow/faqs/
Also, the fact that the queries take longer than 15 minutes can be handled with Step Functions, therefore AB.
Comment 1313060 by truongnguyen86
- Upvotes: 1
A. Why it’s correct: AWS Lambda is a cost-effective, serverless option for invoking Athena queries using the Boto3 API. Lambda charges are based on execution time and memory usage, making it an efficient solution for periodic query execution. B. Why it’s correct: Step Functions provide a serverless orchestration option with a pay-per-use pricing model. Adding a Wait state prevents excessive API calls and ensures queries are executed in sequence, making it a cost-effective and scalable solution. Why the other options are less optimal:
- E. Use Amazon Managed Workflows for Apache Airflow (MWAA): MWAA is powerful for complex workflows, but its pricing includes environment uptime costs, which can be higher than Lambda and Step Functions for simple tasks like orchestrating Athena queries. By choosing A and B, you balance cost-effectiveness and simplicity for orchestrating daily Athena queries.
Comment 1270616 by San_Juan
- Upvotes: 2
Selected Answer: BD
The Lambda maximum timeout is 15 minutes, so the query takes longer than Lambda can manage; you cannot use Lambda. Use Step Functions (answer B) or a Glue Python shell job (answer D). Airflow is more expensive than Glue/Step Functions, so E is discarded as well.
Comment 1260934 by V0811
- Upvotes: 2
Selected Answer: AB It should be AB
Comment 1239594 by alex1991
- Upvotes: 2
Selected Answer: AB Since the Athena API is asynchronous, users are able to separate the steps into triggering the queries and getting the results after 15 minutes.
Comment 1227000 by pypelyncar
- Upvotes: 1
Selected Answer: BE Tricky; A is valid. Still, on cost-effectiveness: B, no one doubts it. Then why E? MWAA offers a managed Apache Airflow environment for orchestrating complex workflows. It can handle long-running tasks like Athena queries efficiently. Batch processing: leveraging AWS Batch within the Airflow workflow allows for distributed and scalable execution of the Athena queries, improving overall processing efficiency.
Comment 1218436 by valuedate
- Upvotes: 2
Selected Answer: AB My opinion.
Comment 1215632 by valuedate
- Upvotes: 2
Selected Answer: AB I would prefer AB
Comment 1213595 by VerRi
- Upvotes: 3
Selected Answer: AB Lambda to kick-start Athena; Step Functions for orchestration.
Comment 1206425 by sdas1
- Upvotes: 1
Option C and D involve using an AWS Glue Python shell script to run a sleep timer and periodically check whether the current Athena query has finished running. While this approach might seem cost-effective in terms of using AWS Glue, it’s not the most efficient way to manage the execution of Athena queries. AWS Glue is primarily designed for ETL (Extract, Transform, Load) tasks rather than orchestrating long-running query execution.
Therefore, while options B, C, and D could technically work, they might not be the most cost-effective or efficient solutions for orchestrating long-running Athena queries. Instead, options A and E would likely be more cost-effective and suitable for this scenario.
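For comparison, the sleep-timer approach of options C/D inside a Glue Python shell job would look roughly like this (a sketch; the query list, output location, and 5-minute interval are illustrative placeholders):

```python
import time

import boto3

athena = boto3.client("athena")

QUERIES = [
    "SELECT ...",  # placeholder daily queries, run in sequence
    "SELECT ...",
]

for query in QUERIES:
    qid = athena.start_query_execution(
        QueryString=query,
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )["QueryExecutionId"]

    # Poll every 5 minutes as option D describes; the Glue job, not Lambda,
    # carries the wait, so the 15-minute Lambda timeout never applies.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(300)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {qid} finished in state {state}")
```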
Comment 1195283 by Christina666
- Upvotes: 4
Selected Answer: AB Lambda calls the Athena query; Step Functions orchestrates the query workflow.
Comment 1186100 by arvehisa
- Upvotes: 4
Selected Answer: AB A: Lambda is a good option; it only triggers the Athena query, it does not actually run it, so it does not need 15 minutes. B: The question mentions a series of Athena queries, which may mean each query should wait until the previous one has finished. B is the perfect way to do that. And Lambda and Step Functions are very cost-effective.
Comment 1182108 by cd93
- Upvotes: 2
Selected Answer: CD Remember, anything that involves writing code is going to be cheaper than automated/UI-guided workflows, which leaves A, C, and D; and AWS Lambda can’t run for more than 15 minutes, so CD.
Guys, go take the associate architect certificate first… this is basic knowledge… stop spamming (wrongly) ChatGPT-generated answers.
Comment 1178514 by certplan
- Upvotes: 1
- AWS Glue Documentation:
- AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics. AWS Glue offers capabilities for running Python shell jobs, which can be used to execute custom scripts for various data processing tasks. The documentation provides details on how to create and manage Python shell jobs, including examples of using scripts to interact with AWS services like Athena.
- Reference: [AWS Glue Documentation](https://docs.aws.amazon.com/glue/index.html)
Comment 1178513 by certplan
- Upvotes: 2
To justify the selection of options B and D as the most cost-effective combination for orchestrating Amazon Athena queries, let’s refer to the official AWS documentation:
- AWS Step Functions Documentation:
- AWS Step Functions is a fully managed service provided by AWS for coordinating the components of distributed applications and microservices using visual workflows. With Step Functions, you can build workflows that execute a sequence of AWS Lambda functions, API calls, and other AWS services. The documentation provides information on how to create workflows, define states, and configure wait states for checking the status of tasks, which aligns with the requirements of orchestrating Amazon Athena queries.
- Reference: [AWS Step Functions Documentation](https://docs.aws.amazon.com/step-functions/index.html)
Comment 1178509 by certplan
- Upvotes: 2
Option E: Using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate Athena queries in AWS Batch.
While Amazon MWAA provides managed Apache Airflow environments, which can be used for orchestrating workflows, it might not be the most cost-effective option for orchestrating Athena queries due to:
Complexity: Setting up and managing an Amazon MWAA environment can introduce additional complexity and potentially higher costs compared to other options.
Resource Allocation: Amazon MWAA environments come with a minimum cost, regardless of usage, and managing resources in AWS Batch might not be as cost-efficient for this specific use case compared to simpler solutions like Step Functions or Glue Python shell scripts.
Reference: [Amazon Managed Workflows for Apache Airflow Documentation](https://docs.aws.amazon.com/mwaa/index.html)
Comment 1176607 by milofficial
- Upvotes: 2
Selected Answer: AB I changed my mind
Comment 1167978 by GiorgioGss
- Upvotes: 3
Guys… stop pasting from GPT… paste some official docs to prove your choice of options.
Comment 1164388 by CalvinL4
- Upvotes: 1
Lambda is out because it cannot run over 15 min.
Comment 1147076 by BartoszGolebiowski24
- Upvotes: 2
Selected Answer: AB We do not need to wait until Athena completes the query, so we will not hit the 15-minute hard timeout limit. So "A" is valid, and we will use the Step Functions workflow to orchestrate the process.
Comment 1137948 by TonyStark0122
- Upvotes: 1
A. Use an AWS Lambda function and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically. B. Create an AWS Step Functions workflow and add two states. Add the first state before the Lambda function. Configure the second state as a Wait state to periodically check whether the Athena query has finished using the Athena Boto3 get_query_execution API call. Configure the workflow to invoke the next query when the current query has finished running.
Comment 1127228 by milofficial
- Upvotes: 3
Selected Answer: BC https://docs.aws.amazon.com/step-functions/latest/dg/sample-athena-query.html
Question mou0rOijjMVKHDEU7z5Z
Question
A data engineer wants to orchestrate a set of extract, transform, and load (ETL) jobs that run on AWS. The ETL jobs contain tasks that must run Apache Spark jobs on Amazon EMR, make API calls to Salesforce, and load data into Amazon Redshift.
The ETL jobs need to handle failures and retries automatically. The data engineer needs to use Python to orchestrate the jobs.
Which service will meet these requirements?
Choices
- A: Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
- B: AWS Step Functions
- C: AWS Glue
- D: Amazon EventBridge
Answer: A Answer_ET: A Community answer: A (100%)
Discussion
Comment 1308283 by Eleftheriia
- Upvotes: 7
Selected Answer: A Even though both MWAA and Step Functions can be used for managing task failures, MWAA is more suitable since the engineer wants to use Python to orchestrate the jobs. Step Functions is usually chosen when minimal infrastructure management is the priority.
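A minimal sketch of what the MWAA answer looks like in practice: an Airflow DAG written in Python with built-in retry handling. The DAG id, schedule, and the Salesforce callable are illustrative placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def call_salesforce_api():
    """Placeholder: make the Salesforce API calls here."""


with DAG(
    dag_id="etl_pipeline",                  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={
        "retries": 3,                       # automatic retries on failure
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    salesforce_extract = PythonOperator(
        task_id="salesforce_extract",
        python_callable=call_salesforce_api,
    )
    # The EMR Spark steps and the Redshift load would be added as further
    # tasks (e.g. EmrAddStepsOperator, an S3-to-Redshift transfer) and
    # chained with >> so failures and retries are handled per task.
```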
Comment 1307111 by kupo777
- Upvotes: 1
Correct Answer: B
AWS Step Functions allows you to build serverless workflows that coordinate various AWS services. It supports integrating with EMR for running Spark jobs, making API calls (including to Salesforce), and loading data into Redshift. Step Functions provide built-in error handling and retry capabilities, making it easier to manage failures in your workflow. Additionally, you can use AWS SDK for Python (Boto3) to interact with Step Functions, enabling you to write your orchestration logic in Python.
Question A2omQ4JSy3CrcGVNqyed
Question
A data engineer maintains custom Python scripts that perform a data formatting process that many AWS Lambda functions use. When the data engineer needs to modify the Python scripts, the data engineer must manually update all the Lambda functions.
The data engineer requires a less manual way to update the Lambda functions.
Which solution will meet this requirement?
Choices
- A: Store the custom Python scripts in a shared Amazon S3 bucket. Store a pointer to the custom scripts in the execution context object.
- B: Package the custom Python scripts into Lambda layers. Apply the Lambda layers to the Lambda functions.
- C: Store the custom Python scripts in a shared Amazon S3 bucket. Store a pointer to the custom scripts in environment variables.
- D: Assign the same alias to each Lambda function. Call each Lambda function by specifying the function’s alias.
Answer: B Answer_ET: B Community answer: B (100%)
Discussion
Comment 1330751 by HagarTheHorrible
- Upvotes: 2
Selected Answer: B Textbook example of Lambda layers…
Comment 1307107 by kupo777
- Upvotes: 3
Correct Answer: B
Lambda layers allow you to package common code and dependencies that can be shared across multiple Lambda functions. By placing the custom Python scripts in a layer, you can update the layer once and then update the version used by each Lambda function without needing to modify the function code directly. This approach reduces redundancy, streamlines updates, and ensures that all functions using the layer have access to the latest version of the scripts with minimal manual effort.
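A short sketch of how a layered script is consumed (the layer zip layout follows Lambda’s documented `python/` convention; the module and helper names are hypothetical):

```python
# Assumed layer zip layout (Lambda's documented convention for Python):
#   python/
#     data_formatting.py   <- the shared custom script
#
# After publishing a new layer version and pointing the functions at it,
# every function picks up the updated script without code changes.
from data_formatting import format_record  # hypothetical shared helper

def lambda_handler(event, context):
    # Reuse the shared formatting logic provided by the layer.
    return [format_record(record) for record in event["records"]]
```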
Question LDjobmz7wVUAn3cPJ2zW
Question
A company stores customer data in an Amazon S3 bucket. Multiple teams in the company want to use the customer data for downstream analysis. The company needs to ensure that the teams do not have access to personally identifiable information (PII) about the customers.
Which solution will meet this requirement with LEAST operational overhead?
Choices
- A: Use Amazon Macie to create and run a sensitive data discovery job to detect and remove PII.
- B: Use S3 Object Lambda to access the data, and use Amazon Comprehend to detect and remove PII.
- C: Use Amazon Data Firehose and Amazon Comprehend to detect and remove PII.
- D: Use an AWS Glue DataBrew job to store the PII data in a second S3 bucket. Perform analysis on the data that remains in the original S3 bucket.
Answer: B Answer_ET: B Community answer: B (90%), Other (10%)
Discussion
Comment 1330752 by HagarTheHorrible
- Upvotes: 3
Selected Answer: B It is not A; Macie can only detect the PII, it cannot remove it.
Comment 1328237 by WarPig666
- Upvotes: 2
Selected Answer: B A can’t be correct. Macie can discover PII, but not automatically redact it.
Comment 1327836 by paali
- Upvotes: 2
Selected Answer: B Macie will only detect sensitive data, it can’t redact it. So, we can use option B
With S3 Object Lambda and a prebuilt AWS Lambda function powered by Amazon Comprehend, you can protect PII data retrieved from S3 before returning it to an application.
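A rough sketch of what such an Object Lambda handler does (the prebuilt AWS function is more robust; the redaction logic here is deliberately simplified):

```python
import urllib.request

import boto3

s3 = boto3.client("s3")
comprehend = boto3.client("comprehend")

def lambda_handler(event, context):
    ctx = event["getObjectContext"]
    # S3 Object Lambda supplies a presigned URL for the original object.
    text = urllib.request.urlopen(ctx["inputS3Url"]).read().decode("utf-8")

    # Find PII spans (note: detect_pii_entities has a text-size limit,
    # which the prebuilt function handles by chunking).
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")

    # Redact from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities["Entities"], key=lambda e: -e["BeginOffset"]):
        text = text[: ent["BeginOffset"]] + "*****" + text[ent["EndOffset"]:]

    # Return the redacted content to the caller instead of the original.
    s3.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=text.encode("utf-8"),
    )
    return {"statusCode": 200}
```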
Comment 1326768 by 7a1d491
- Upvotes: 1
Selected Answer: A Amazon Macie is a fully managed data security and privacy service that uses machine learning to automatically discover, classify, and protect sensitive data, including personally identifiable information (PII). By running a sensitive data discovery job in Macie, the company can automatically identify PII in the S3 bucket and provide actionable insights to help secure it. The operational overhead is minimized because Macie handles the discovery and classification of PII automatically.
Comment 1313981 by michele_scar
- Upvotes: 2
Selected Answer: B https://docs.aws.amazon.com/AmazonS3/latest/userguide/tutorial-s3-object-lambda-redact-pii.html
Comment 1307102 by kupo777
- Upvotes: 3
Correct Answer: A
Amazon Macie is designed specifically for discovering and protecting sensitive data within AWS environments. It automates the process of identifying PII in your S3 buckets, allowing you to create jobs that can regularly scan for and manage sensitive information. This approach minimizes manual effort and integrates well into existing workflows, providing ongoing protection without requiring additional infrastructure or complex setups.
Question CY0ZKEDRKup1HszKaefN
Question
A company stores its processed data in an S3 bucket. The company has a strict data access policy. The company uses IAM roles to grant teams within the company different levels of access to the S3 bucket.
The company wants to receive notifications when a user violates the data access policy. Each notification must include the username of the user who violated the policy.
Which solution will meet these requirements?
Choices
- A: Use AWS Config rules to detect violations of the data access policy. Set up compliance alarms.
- B: Use Amazon CloudWatch metrics to gather object-level metrics. Set up CloudWatch alarms.
- C: Use AWS CloudTrail to track object-level events for the S3 bucket. Forward events to Amazon CloudWatch to set up CloudWatch alarms.
- D: Use Amazon S3 server access logs to monitor access to the bucket. Forward the access logs to an Amazon CloudWatch log group. Use metric filters on the log group to set up CloudWatch alarms.
Answer: C Answer_ET: C Community answer: C (100%)
Discussion
Comment 1330754 by HagarTheHorrible
- Upvotes: 2
Selected Answer: C For monitoring API calls, use CloudTrail. It is that simple.
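A sketch of the option C wiring, assuming the trail already delivers S3 data events to a CloudWatch Logs group named /cloudtrail/s3-data-events (an assumed name). The CloudTrail log events carry the username in their userIdentity field, which satisfies the notification requirement:

```python
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Count denied S3 object-level calls recorded by CloudTrail.
logs.put_metric_filter(
    logGroupName="/cloudtrail/s3-data-events",   # assumed log group name
    filterName="s3-access-denied",
    filterPattern='{ ($.errorCode = "AccessDenied") && ($.eventSource = "s3.amazonaws.com") }',
    metricTransformations=[{
        "metricName": "S3AccessDenied",
        "metricNamespace": "DataAccessPolicy",
        "metricValue": "1",
    }],
)

# Alarm on any violation in a 5-minute window; AlarmActions would point
# at an SNS topic that delivers the notification.
cloudwatch.put_metric_alarm(
    AlarmName="s3-access-policy-violation",
    Namespace="DataAccessPolicy",
    MetricName="S3AccessDenied",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
)
```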
Comment 1307098 by kupo777
- Upvotes: 2
C is correct.