Questions and Answers
Question O103GC1OQDMqnJVGxkaZ
Question
A company uses AWS Step Functions to orchestrate a data pipeline. The pipeline consists of Amazon EMR jobs that ingest data from data sources and store the data in an Amazon S3 bucket. The pipeline also includes EMR jobs that load the data to Amazon Redshift. The company’s cloud infrastructure team manually built a Step Functions state machine. The cloud infrastructure team launched an EMR cluster into a VPC to support the EMR jobs. However, the deployed Step Functions state machine is not able to run the EMR jobs. Which combination of steps should the company take to identify the reason the Step Functions state machine is not able to run the EMR jobs? (Choose two.)
Choices
- A: Use AWS CloudFormation to automate the Step Functions state machine deployment. Create a step to pause the state machine during the EMR jobs that fail. Configure the step to wait for a human user to send approval through an email message. Include details of the EMR task in the email message for further analysis.
- B: Verify that the Step Functions state machine code has all IAM permissions that are necessary to create and run the EMR jobs. Verify that the Step Functions state machine code also includes IAM permissions to access the Amazon S3 buckets that the EMR jobs use. Use Access Analyzer for S3 to check the S3 access properties.
- C: Check for entries in Amazon CloudWatch for the newly created EMR cluster. Change the AWS Step Functions state machine code to use Amazon EMR on EKS. Change the IAM access policies and the security group configuration for the Step Functions state machine code to reflect inclusion of Amazon Elastic Kubernetes Service (Amazon EKS).
- D: Query the flow logs for the VPC. Determine whether the traffic that originates from the EMR cluster can successfully reach the data providers. Determine whether any security group that might be attached to the Amazon EMR cluster allows connections to the data source servers on the informed ports.
- E: Check the retry scenarios that the company configured for the EMR jobs. Increase the number of seconds in the interval between each EMR task. Validate that each fallback state has the appropriate catch for each decision state. Configure an Amazon Simple Notification Service (Amazon SNS) topic to store the error messages.
Answer: BD. Community answer: BD (81%).
Discussion
Comment 1138098 by rralucard_
- Upvotes: 5
Selected Answer: BD https://docs.aws.amazon.com/step-functions/latest/dg/procedure-create-iam-role.html https://docs.aws.amazon.com/step-functions/latest/dg/service-integration-iam-templates.html
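As a rough illustration of the permission check in option B, the sketch below uses the IAM policy simulator through Boto3 to test whether the state machine's execution role can call the EMR and S3 actions it needs. The role ARN and the action list are assumptions for illustration, not values from the question.

```python
import boto3

# Hypothetical role ARN -- substitute the execution role attached to the
# Step Functions state machine.
ROLE_ARN = "arn:aws:iam::123456789012:role/StepFunctionsEmrExecutionRole"

iam = boto3.client("iam")

# Simulate the EMR and S3 actions the state machine needs (option B).
response = iam.simulate_principal_policy(
    PolicySourceArn=ROLE_ARN,
    ActionNames=[
        "elasticmapreduce:AddJobFlowSteps",
        "elasticmapreduce:DescribeStep",
        "s3:GetObject",
        "s3:PutObject",
    ],
)

# Print allowed/denied per action so missing permissions stand out.
for result in response["EvaluationResults"]:
    print(result["EvalActionName"], result["EvalDecision"])
```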
Comment 1177082 by GiorgioGss
- Upvotes: 5
Selected Answer: BD Permissions, of course, and we need to see whether the traffic is blocked at any hop. They mention that the EMR cluster is in a VPC, so flow logs.
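For option D, a hedged sketch of one way to inspect the VPC Flow Logs, assuming they are delivered to CloudWatch Logs. The log group name and the source address filter are placeholders.

```python
import time
import boto3

logs = boto3.client("logs")

# Hypothetical log group -- assumes VPC Flow Logs are published to CloudWatch Logs.
LOG_GROUP = "/vpc/flow-logs/emr-vpc"

# Look for rejected traffic from the EMR cluster's subnet over the last hour.
query = """
fields @timestamp, srcAddr, dstAddr, dstPort, action
| filter action = "REJECT" and srcAddr like "10.0."
| sort @timestamp desc
| limit 50
"""

start = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString=query,
)

# Poll until the Logs Insights query finishes, then print any rejected flows.
while True:
    result = logs.get_query_results(queryId=start["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})
```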
Comment 1426383 by sam_pre
- Upvotes: 1
Selected Answer: DE As far as I know, the Step Functions state machine does not need the S3 access permissions that EMR uses, so that eliminates B. D and E make sense; E is a bit less likely as a troubleshooting step, but still valid.
Comment 1188715 by lucas_rfsb
- Upvotes: 3
Selected Answer: BD I'd go with BD.
Comment 1173453 by kj07
- Upvotes: 1
B&D. E is not an option to identify the failure reason.
Comment 1134471 by atu1789
- Upvotes: 2
Selected Answer: BE The other options contain red-flag keywords.
Question VthOUreYtkaYTJitsZwQ
Question
A company is developing an application that runs on Amazon EC2 instances. Currently, the data that the application generates is temporary. However, the company needs to persist the data, even if the EC2 instances are terminated. A data engineer must launch new EC2 instances from an Amazon Machine Image (AMI) and configure the instances to preserve the data. Which solution will meet this requirement?
Choices
- A: Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume that contains the application data. Apply the default settings to the EC2 instances.
- B: Launch new EC2 instances by using an AMI that is backed by a root Amazon Elastic Block Store (Amazon EBS) volume that contains the application data. Apply the default settings to the EC2 instances.
- C: Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume. Attach an Amazon Elastic Block Store (Amazon EBS) volume to contain the application data. Apply the default settings to the EC2 instances.
- D: Launch new EC2 instances by using an AMI that is backed by an Amazon Elastic Block Store (Amazon EBS) volume. Attach an additional EC2 instance store volume to contain the application data. Apply the default settings to the EC2 instances.
Answer: C. Community answer: C (73%), B (27%).
Discussion
Comment 1203051 by khchan123
- Upvotes: 12
Selected Answer: C. You need to attach an extra EBS volume.
When an instance terminates, the value of the DeleteOnTermination attribute for each attached EBS volume determines whether to preserve or delete the volume. By default, the DeleteOnTermination attribute is set to True for the root volume. ref: https://repost.aws/knowledge-center/deleteontermination-ebs
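A minimal sketch of how option C could be put into practice: launch from the AMI and map an additional (non-root) EBS data volume, which is preserved on termination by default. The AMI ID, instance type, device name, and volume size are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder values for illustration.
AMI_ID = "ami-0123456789abcdef0"

response = ec2.run_instances(
    ImageId=AMI_ID,
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {
            # Additional (non-root) EBS data volume. For non-root volumes the
            # default DeleteOnTermination is false, so the data survives
            # instance termination; it is set explicitly here for clarity.
            "DeviceName": "/dev/sdf",
            "Ebs": {
                "VolumeSize": 100,
                "VolumeType": "gp3",
                "DeleteOnTermination": False,
            },
        }
    ],
)

print(response["Instances"][0]["InstanceId"])
```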
Comment 1215820 by hnk
- Upvotes: 5
Selected Answer: C C is correct
Comment 1364014 by saqib839
- Upvotes: 2
Selected Answer: C The correct answer is C.
Explanation:
When you launch an EC2 instance from an AMI, the root volume’s DeleteOnTermination attribute is set to True by default, which means the data on that volume will be deleted when the instance is terminated. To persist data beyond the lifetime of the instance without changing any settings, you should store the data on an additional (non-root) EBS volume because non-root volumes are not automatically deleted on termination.
Comment 1364012 by saqib839
- Upvotes: 2
Selected Answer: B The correct answer is B.
Explanation:
Amazon EBS volumes are persistent, meaning that the data remains intact even after an EC2 instance is terminated, provided that the volume isn’t set to be deleted on termination. By using an AMI that is backed by a root Amazon EBS volume that contains the application data, the data engineer ensures that the application data is stored persistently. In contrast, EC2 instance store volumes are ephemeral and would lose data when the instance terminates.
Comment 1359687 by Chanduchanti
- Upvotes: 2
Selected Answer: C When an instance terminates, the value of the DeleteOnTermination attribute for each attached EBS volume determines whether to preserve or delete the volume. By default, the DeleteOnTermination attribute is set to True for the root volume.
Comment 1357179 by saransh_001
- Upvotes: 3
Selected Answer: C Note that options B and C both mention the default settings. By default, when an EC2 instance terminates, its root volume is also deleted. So launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume, attach an Amazon Elastic Block Store (Amazon EBS) volume to contain the application data, and apply the default settings to the EC2 instances.
Comment 1300404 by mohamedTR
- Upvotes: 4
Selected Answer: C Not B: by default, delete on termination is checked for the root volume.
Comment 1295128 by mohamedTR
- Upvotes: 2
Selected Answer: B By using an AMI backed by an Amazon EBS root volume, you ensure that the application data is preserved, even if the EC2 instances are terminated, because EBS volumes persist independently of the EC2 lifecycle.
Comment 1292053 by ElFaramawi
- Upvotes: 2
Selected Answer: B This is because Amazon EBS volumes are persistent, meaning the data is preserved even if the EC2 instance is terminated, which meets the requirement to persist the data. C is incorrect because it suggests launching instances using an EC2 instance store volume, which is ephemeral. Even though it proposes attaching an Amazon EBS volume for data, the root volume remains an instance store.
Comment 1261813 by portland
- Upvotes: 3
Selected Answer: C Using the default settings means B won't work.
Comment 1247670 by sdas1
- Upvotes: 2
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/preserving-volumes-on-termination.html
Root volume: By default, when you launch an instance, the DeleteOnTermination attribute for the root volume is set to true. Therefore, the default is to delete the root volume of the instance when the instance terminates.
Non-root volume: By default, when you attach a non-root EBS volume to an instance, its DeleteOnTermination attribute is set to false. Therefore, the default is to preserve these volumes.
Answer is C
Comment 1245432 by GustonMari
- Upvotes: 3
Selected Answer: C It's C. With the default settings, B will delete the root EBS volume on termination.
Comment 1227581 by pypelyncar
- Upvotes: 3
Selected Answer: B Amazon EBS volumes provide persistent block storage for EC2 instances. Data written to an EBS volume is independent of the EC2 instance lifecycle. Even if the EC2 instance is terminated, the data on the EBS volume remains intact. Launching new EC2 instances from an AMI backed by an EBS volume containing the application data ensures the data persists across instance restarts or terminations
Comment 1213776 by VerRi
- Upvotes: 1
Selected Answer: B launch EC2 using AMI with root EBS that contains data
Comment 1212262 by ampersandor
- Upvotes: 4
B: the root EBS volume will be deleted on termination by default. C: the attached EBS volume is independent of EC2 termination.
Comment 1207694 by HunkyBunky
- Upvotes: 5
Selected Answer: C C - Looks better, because it will save data in all cases
Comment 1195272 by Christina666
- Upvotes: 5
Selected Answer: C
Comment 1189370 by Luke97
- Upvotes: 2
Can someone explain why C is NOT right?
Comment 1177084 by GiorgioGss
- Upvotes: 3
Selected Answer: B This question is more for practitioner exam :)
Comment 1138296 by rralucard_
- Upvotes: 2
Selected Answer: B Amazon EBS volumes are network-attached, and they persist independently of the life of an EC2 instance. By using an AMI backed by an Amazon EBS volume, the root device for the instance is an EBS volume, which means the data will persist.
Comment 1134480 by atu1789
- Upvotes: 1
Selected Answer: B Voting for B
Question ndYOG8udGockGxKmqDLL
Question
A company uses Amazon Athena to run SQL queries for extract, transform, and load (ETL) tasks by using Create Table As Select (CTAS). The company must use Apache Spark instead of SQL to generate analytics. Which solution will give the company the ability to use Spark to access Athena?
Choices
- A: Athena query settings
- B: Athena workgroup
- C: Athena data source
- D: Athena query editor
Answer: B. Community answer: B (72%), C (28%).
Discussion
Comment 1177095 by GiorgioGss
- Upvotes: 9
Selected Answer: B https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark-getting-started.html “To use Apache Spark in Amazon Athena, you create an Amazon Athena workgroup that uses a Spark engine.”
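A minimal sketch, using placeholder names, of creating a Spark-enabled workgroup with Boto3. It assumes an existing IAM execution role and S3 results location; the engine version string may differ from what the service currently offers.

```python
import boto3

athena = boto3.client("athena")

# Placeholder ARN and bucket for illustration.
EXECUTION_ROLE_ARN = "arn:aws:iam::123456789012:role/AthenaSparkExecutionRole"

athena.create_work_group(
    Name="spark-analytics",
    Description="Workgroup that runs the Apache Spark engine",
    Configuration={
        # Selecting a PySpark engine version is what makes this a
        # Spark-enabled workgroup.
        "EngineVersion": {"SelectedEngineVersion": "PySpark engine version 3"},
        "ExecutionRole": EXECUTION_ROLE_ARN,
        "ResultConfiguration": {
            "OutputLocation": "s3://example-athena-spark-results/"
        },
    },
)
```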
Comment 1227583 by pypelyncar
- Upvotes: 5
Selected Answer: C The Athena data source acts as a bridge between Athena and other analytics engines, such as Apache Spark. By using the Athena data source connector, you can access data stored in various formats (e.g., CSV, JSON, Parquet) and locations (e.g., Amazon S3, Apache Hive Metastore) through Spark applications
Comment 1313056 by lsj900605
- Upvotes: 2
Selected Answer: B It is B, not C. The workgroup is for organizing, controlling, and monitoring queries. The Data source is the mechanism that enables Spark to query data via Athena. It allows Spark to interact with Athena. The question focuses on enabling Apache Spark within Athena to generate analytics instead of using SQL. Thus, you must create a Spark-enabled workgroup
Comment 1303118 by theloseralreadytaken
- Upvotes: 2
Selected Answer: B The Athena data source doesn't specifically enable Spark access.
Comment 1245849 by andrologin
- Upvotes: 2
Selected Answer: B https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark-getting-started.html To get started with Apache Spark on Amazon Athena, you must first create a Spark enabled workgroup. After you switch to the workgroup, you can create a notebook or open an existing notebook. When you open a notebook in Athena, a new session is started for it automatically and you can work with it directly in the Athena notebook editor.
Comment 1220134 by lalitjhawar
- Upvotes: 4
C. Athena data source
The Athena data source is a specific connector or library that allows Apache Spark to interact with data stored in Amazon Athena. This connector enables Spark to read data from Athena tables directly into Spark DataFrames or RDDs (Resilient Distributed Datasets), allowing you to perform analytics and transformations using Spark’s capabilities.
Comment 1186101 by blackgamer
- Upvotes: 3
Selected Answer: B https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark-getting-started.html
Comment 1173174 by kj07
- Upvotes: 1
B is the correct answer. https://aws.amazon.com/blogs/big-data/explore-your-data-lake-using-amazon-athena-for-apache-spark/ You need an Athena workgroup as a prerequisite to use Apache Spark.
Comment 1167424 by damaldon
- Upvotes: 3
B is the correct answer. To use Apache Spark in Amazon Athena, you create an Amazon Athena workgroup that uses a Spark engine. https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark-getting-started.html
Comment 1138312 by rralucard_
- Upvotes: 2
Selected Answer: C https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark.html
Question 28k2apVlfYPh0NRnUHV1
Question
A company needs to partition the Amazon S3 storage that the company uses for a data lake. The partitioning will use a path of the S3 object keys in the following format: s3://bucket/prefix/year=2023/month=01/day=01. A data engineer must ensure that the AWS Glue Data Catalog synchronizes with the S3 storage when the company adds new partitions to the bucket. Which solution will meet these requirements with the LEAST latency?
Choices
- A: Schedule an AWS Glue crawler to run every morning.
- B: Manually run the AWS Glue CreatePartition API twice each day.
- C: Use code that writes data to Amazon S3 to invoke the Boto3 AWS Glue create_partition API call.
- D: Run the MSCK REPAIR TABLE command from the AWS Glue console.
Answer: C. Community answer: C (94%), other (6%).
Discussion
Comment 1138318 by rralucard_
- Upvotes: 8
Selected Answer: C Use code that writes data to Amazon S3 to invoke the Boto3 AWS Glue create_partition API call. This approach ensures that the Data Catalog is updated as soon as new data is written to S3, providing the least latency in reflecting new partitions.
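A minimal sketch of that pattern, with placeholder bucket, database, and table names: the same code path that writes an object to the partitioned S3 prefix also registers the partition in the Glue Data Catalog. It assumes a table partitioned by year, month, and day and Parquet-formatted data.

```python
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# Placeholder names for illustration.
BUCKET = "bucket"
DATABASE = "datalake_db"
TABLE = "events"

def write_and_register(body: bytes, year: str, month: str, day: str) -> None:
    prefix = f"prefix/year={year}/month={month}/day={day}"

    # 1. Write the data file to the partitioned S3 path.
    s3.put_object(Bucket=BUCKET, Key=f"{prefix}/part-0000.parquet", Body=body)

    # 2. Register the new partition in the Glue Data Catalog immediately,
    #    so the catalog reflects the new data with minimal latency.
    glue.create_partition(
        DatabaseName=DATABASE,
        TableName=TABLE,
        PartitionInput={
            "Values": [year, month, day],
            "StorageDescriptor": {
                "Location": f"s3://{BUCKET}/{prefix}/",
                "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
                "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
                "SerdeInfo": {
                    "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
                },
            },
        },
    )
```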
Comment 1227584 by pypelyncar
- Upvotes: 4
Selected Answer: C By embedding the Boto3 create_partition API call within the code that writes data to S3, you achieve near real-time synchronization. The Data Catalog is updated immediately after a new partition is created in S3.
Comment 1222501 by tgv
- Upvotes: 2
Selected Answer: C The explanation could be more precise regarding the interaction with Amazon S3 and AWS Glue. The key point is that the process should be triggered immediately when new data is added to S3. This can be achieved through event-driven architecture, which indeed makes the solution intuitive and efficient.
Comment 1217479 by valuedate
- Upvotes: 1
Selected Answer: C Add the partition right after writing the data to S3.
Comment 1211298 by DevoteamAnalytix
- Upvotes: 1
Selected Answer: D It’s about “synchronizing AWS Glue Data Catalog with S3”. So for me it’s D - using MSCK REPAIR TABLE for existing S3 partitions (https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html)
Comment 1195095 by okechi
- Upvotes: 2
The answer is D
Comment 1177101 by GiorgioGss
- Upvotes: 1
Selected Answer: C It’s pure event-driven so… C
Comment 1134533 by atu1789
- Upvotes: 1
Selected Answer: C C. Least latency
Question 665xyLJVNoyrnepY4oFx
Question
A media company uses software as a service (SaaS) applications to gather data by using third-party tools. The company needs to store the data in an Amazon S3 bucket. The company will use Amazon Redshift to perform analytics based on the data. Which AWS service or feature will meet these requirements with the LEAST operational overhead?
Choices
- A: Amazon Managed Streaming for Apache Kafka (Amazon MSK)
- B: Amazon AppFlow
- C: AWS Glue Data Catalog
- D: Amazon Kinesis
Answer: B. Community answer: B (100%).
Discussion
Comment 1222504 by tgv
- Upvotes: 5
Selected Answer: B That’s exactly the purpose of AppFlow: “fully-managed integration service that enables you to securely exchange data between software as a service (SaaS) applications, such as Salesforce, and AWS services, such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift. For example, you can ingest contact records from Salesforce to Amazon Redshift or pull support tickets from Zendesk to an Amazon S3 bucket.”
https://docs.aws.amazon.com/appflow/latest/userguide/what-is-appflow.html
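A rough sketch of what such a flow definition might look like in Boto3, assuming a Salesforce source with an already-created connector profile and an S3 destination. Every name here is a placeholder, and the single Map_all task simply passes all source fields through.

```python
import boto3

appflow = boto3.client("appflow")

# Placeholder names for illustration; the connector profile must already exist.
appflow.create_flow(
    flowName="salesforce-accounts-to-s3",
    triggerConfig={"triggerType": "OnDemand"},
    sourceFlowConfig={
        "connectorType": "Salesforce",
        "connectorProfileName": "salesforce-profile",
        "sourceConnectorProperties": {"Salesforce": {"object": "Account"}},
    },
    destinationFlowConfigList=[
        {
            "connectorType": "S3",
            "destinationConnectorProperties": {
                "S3": {
                    "bucketName": "example-saas-landing-bucket",
                    "bucketPrefix": "salesforce/accounts",
                }
            },
        }
    ],
    # Map every source field straight through to the destination.
    tasks=[
        {
            "sourceFields": [],
            "connectorOperator": {"Salesforce": "NO_OP"},
            "taskType": "Map_all",
            "taskProperties": {"EXCLUDE_SOURCE_FIELDS_LIST": "[]"},
        }
    ],
)
```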
Comment 1227589 by pypelyncar
- Upvotes: 4
Selected Answer: B the media company can leverage a fully managed service that simplifies the process of ingesting data from their third-party SaaS applications into an Amazon S3 bucket, with minimal operational overhead. Additionally, AppFlow can integrate with Amazon Redshift, allowing the company to load the ingested data directly into their analytics environment for further processing and analysis
Comment 1177108 by GiorgioGss
- Upvotes: 3
Selected Answer: B https://docs.aws.amazon.com/appflow/latest/userguide/flow-tutorial.html
Comment 1173177 by kj07
- Upvotes: 1
B seems the right choice here.
Comment 1138323 by rralucard_
- Upvotes: 2
Selected Answer: B https://d1.awsstatic.com/solutions/guidance/architecture-diagrams/integrating-third-party-saas-data-using-amazon-appflow.pdf Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between Software as a Service (SaaS) applications like Salesforce, Marketo, Slack, and ServiceNow, and AWS services like Amazon S3 and Amazon Redshift, in just a few clicks. It can store the raw data pulled from SaaS applications in Amazon S3, and integrates with AWS Glue Data Catalog to catalog and store metadata