Questions and Answers
Question ieiNW4gZCJHilOCQ33SX
Question
A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data engineer has set up the necessary AWS Glue connection details and an associated IAM role. However, when the data engineer attempts to run the AWS Glue job, the data engineer receives an error message that indicates that there are problems with the Amazon S3 VPC gateway endpoint. The data engineer must resolve the error and connect the AWS Glue job to the S3 bucket. Which solution will meet this requirement?
Choices
- A: Update the AWS Glue security group to allow inbound traffic from the Amazon S3 VPC gateway endpoint.
- B: Configure an S3 bucket policy to explicitly grant the AWS Glue job permissions to access the S3 bucket.
- C: Review the AWS Glue job code to ensure that the AWS Glue connection details include a fully qualified domain name.
- D: Verify that the VPC’s route table includes inbound and outbound routes for the Amazon S3 VPC gateway endpoint.
Answer: D | Answer_ET: D | Community answer: D (95%), other (5%)
Discussion
Comment 1154639 by HunkyBunky
- Upvotes: 6
Selected Answer: D
A - wrong - AWS Glue is a serverless service, so it doesn't have its own security groups.
B - wrong - because the error is with the VPC, not with S3 itself.
C - wrong - because with S3 we always have only a FQDN for buckets.
Comment 1558598 by ninomfr64
- Upvotes: 1
Selected Answer: D
A - NO: on the SG we only need to allow outbound traffic; since SGs are stateful, return traffic is allowed.
B - NO: since we configured IAM permissions for the Glue job, there is no need to configure a resource policy (cross-account access is not mentioned).
C - NO: in the bucket connection configuration you just need to provide s3://bucket-name/prefix.
D - YES: although route tables don't really have "inbound and outbound routes", we need to ensure a route is in place to reach the S3 VPC gateway endpoint.
Comment 1360500 by MephiboshethGumani
- Upvotes: 1
Selected Answer: D D. Verify that the VPC’s route table includes inbound and outbound routes for the Amazon S3 VPC gateway endpoint.
Explanation:
AWS Glue jobs need to connect to the S3 bucket through the Amazon S3 VPC gateway endpoint when they are in a VPC. If the route table does not have proper inbound and outbound routes to the S3 VPC gateway endpoint, the AWS Glue job will not be able to access S3, which results in an error.
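As a concrete illustration of the check the discussion describes, here is a minimal boto3 sketch, assuming a known VPC ID, Region, and the subnet used by the Glue connection (all identifiers below are placeholders), that verifies the S3 gateway endpoint is associated with the route table the subnet actually uses:

```python
import boto3

ec2 = boto3.client("ec2")

VPC_ID = "vpc-0123456789abcdef0"        # placeholder: VPC of the Glue connection
SUBNET_ID = "subnet-0123456789abcdef0"  # placeholder: subnet of the Glue connection
REGION = "us-east-1"                    # placeholder: Region of the bucket/endpoint

# Find the S3 gateway endpoint in the VPC and the route tables it is attached to.
endpoints = ec2.describe_vpc_endpoints(
    Filters=[
        {"Name": "vpc-id", "Values": [VPC_ID]},
        {"Name": "service-name", "Values": [f"com.amazonaws.{REGION}.s3"]},
        {"Name": "vpc-endpoint-type", "Values": ["Gateway"]},
    ]
)["VpcEndpoints"]
endpoint_route_tables = {rt for ep in endpoints for rt in ep.get("RouteTableIds", [])}

# Find the route table associated with the Glue connection's subnet,
# falling back to the VPC's main route table if there is no explicit association.
route_tables = ec2.describe_route_tables(
    Filters=[{"Name": "association.subnet-id", "Values": [SUBNET_ID]}]
)["RouteTables"]
if not route_tables:
    route_tables = ec2.describe_route_tables(
        Filters=[
            {"Name": "vpc-id", "Values": [VPC_ID]},
            {"Name": "association.main", "Values": ["true"]},
        ]
    )["RouteTables"]
subnet_route_tables = {rt["RouteTableId"] for rt in route_tables}

if subnet_route_tables & endpoint_route_tables:
    print("The subnet's route table has a route to the S3 gateway endpoint.")
else:
    print("Missing route: associate the S3 gateway endpoint with", subnet_route_tables)
```

If the endpoint is missing from the subnet's route table, associating it there (or adding a NAT route, as one commenter notes) resolves the error.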
Comment 1288880 by wilsonfromnyc9
- Upvotes: 1
D is valid
Comment 1167764 by GiorgioGss
- Upvotes: 4
Selected Answer: D Although there is no such thing as "inbound and outbound routes" when we talk about a VPC route table, when we define an S3 gateway endpoint we must have proper routes in place. I will go with D.
Comment 1202956 by ampersandor
- Upvotes: 2
Selected Answer: D Be sure that the subnet configured for your AWS Glue connection has an Amazon S3 VPC gateway endpoint or a route to a NAT gateway in the subnet’s route table.
Comment 1280025 by GZMartinelli
- Upvotes: 1
Selected Answer: D D is correct
Comment 1250058 by lunachi4
- Upvotes: 1
Selected Answer: D I think D. We check “VPC’s route table”
Comment 1241275 by teo2157
- Upvotes: 1
Selected Answer: C
A - wrong - AWS Glue doesn't have any security groups.
B - wrong - you can't grant S3 permissions to the AWS Glue job itself, only to its role.
D - wrong - there has to be a defined route for the S3 gateway endpoint in the route table of the subnet assigned to the Glue job, not in "the VPC's route table", and route tables don't have inbound and outbound routes.
Comment 1223301 by nanaw770
- Upvotes: 2
Selected Answer: D D is correct answer.
Comment 1219358 by tgv
- Upvotes: 1
I will go with D; the other options don't seem to be related.
Comment 1213563 by VerRi
- Upvotes: 2
Selected Answer: D “problems with the Amazon S3 VPC gateway endpoint”
Comment 1150011 by damaldon
- Upvotes: 1
Go with A: If you receive an error, check the following:
- The correct privileges are provided to the role selected.
- The correct Amazon S3 bucket is provided.
- The security groups and network ACL allow the required incoming and outgoing traffic.
- The VPC you specified is connected to an Amazon S3 VPC endpoint.
Comment 1142067 by Aesthet
- Upvotes: 2
Some relevant info:
Main: https://docs.aws.amazon.com/glue/latest/dg/connection-VPC-disable-proxy.html
Additional (a Glue crawler instead of a Glue job here, but I think this is relevant for both): https://docs.aws.amazon.com/glue/latest/dg/connection-S3-VPC.html
Comment 1142057 by Aesthet
- Upvotes: 4
Both ChatGPT and I agree with D
Question x2srhrlPFFMG1EyGFMrw
Question
A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
- B: Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
- C: Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
- D: Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.
Answer: A | Answer_ET: A | Community answer: A (100%)
Discussion
Comment 1125639 by milofficial
- Upvotes: 13
Selected Answer: A “only if a user uploads data to an Amazon S3 bucket” - that excludes B and C because we need s3:ObjectCreated:*.
You don't need SNS for S3 event notifications, so A is easier.
Comment 1137889 by TonyStark0122
- Upvotes: 8
A. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
Explanation: This solution directly triggers the Lambda function only when a .csv file is uploaded to the S3 bucket, minimizing unnecessary invocations of the Lambda function. It uses a specific event type (s3:ObjectCreated:*) and a filter rule to ensure that the Lambda function is invoked only for relevant events. Additionally, it directly invokes the Lambda function without the need for additional services like Amazon SNS, reducing operational overhead.
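To make option A concrete, here is a hedged boto3 sketch; the bucket name, function ARN, and statement ID are placeholders, and S3 must first be granted permission to invoke the function:

```python
import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

BUCKET = "my-upload-bucket"  # placeholder bucket
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet"  # placeholder

# Allow the S3 bucket to invoke the Lambda function.
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="s3-invoke-csv-to-parquet",  # placeholder statement ID
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn=f"arn:aws:s3:::{BUCKET}",
)

# Notify the function only for newly created objects whose key ends in .csv.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": FUNCTION_ARN,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    },
)
```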
Comment 1291364 by Adrifersilva
- Upvotes: 1
Selected Answer: A s3:ObjectCreated:* instead of s3:*: triggers the Lambda function only when objects are created in the bucket.
Comment 1288447 by theloseralreadytaken
- Upvotes: 1
Selected Answer: A A is the answer for the least operational overhead. C is also correct!
Comment 1226792 by pypelyncar
- Upvotes: 1
Selected Answer: A Since we want the least operational overhead, D is a candidate, but it adds an SNS step that is not needed in this case. A uses only S3 triggering the Lambda function directly, so just two services.
Comment 1209088 by k350Secops
- Upvotes: 1
Selected Answer: A An S3 event notification to Lambda with a .csv suffix filter is the lowest-overhead way.
Comment 1205983 by DevoteamAnalytix
- Upvotes: 2
Selected Answer: A “You can use Lambda to process event notifications from Amazon Simple Storage Service. Amazon S3 can send an event to a Lambda function when an object is created or deleted” https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
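For completeness, the conversion function itself can stay small. A minimal sketch, assuming a Lambda layer or container image that provides AWS SDK for pandas (awswrangler) and an output prefix chosen here purely for illustration:

```python
import urllib.parse

import awswrangler as wr  # assumption: provided via a Lambda layer or container image


def handler(event, context):
    # Invoked by the s3:ObjectCreated:* notification with the .csv suffix filter.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Read the uploaded CSV and write it back under a parquet/ prefix (illustrative layout).
    df = wr.s3.read_csv(path=f"s3://{bucket}/{key}")
    filename = key.rsplit("/", 1)[-1].removesuffix(".csv")
    wr.s3.to_parquet(df=df, path=f"s3://{bucket}/parquet/{filename}.parquet")
```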
Question GCnQ3QXxxgqP7T4hz1B3
Question
An insurance company stores transaction data that the company compressed with gzip.
The company needs to query the transaction data for occasional audits.
Which solution will meet this requirement in the MOST cost-effective way?
Choices
- A: Store the data in Amazon Glacier Flexible Retrieval. Use Amazon S3 Glacier Select to query the data.
- B: Store the data in Amazon S3. Use Amazon S3 Select to query the data.
- C: Store the data in Amazon S3. Use Amazon Athena to query the data.
- D: Store the data in Amazon Glacier Instant Retrieval. Use Amazon Athena to query the data.
Answer: B | Answer_ET: B | Community answer: B (53%), A (42%), other (5%)
Discussion
Comment 1231313 by tgv
- Upvotes: 9
Selected Answer: A Actually, I think A makes more sense.
Comment 1559179 by AM027
- Upvotes: 1
Selected Answer: A cold data stored in Glacier can be easily queried within minutes.
Comment 1347373 by YUICH
- Upvotes: 1
Selected Answer: B For workloads with low access frequency where you only need to query data occasionally (for example, during audits), option (A)—S3 Glacier Flexible Retrieval combined with S3 Glacier Select—provides the most cost-effective solution.
Comment 1344640 by div_div
- Upvotes: 1
Selected Answer: C Transaction data refers to data that is updated frequently, and querying that data occasionally means it can be queried at any time (the question does not define a timeframe). We can't risk making the customer wait hours for results, and the best way to query data on top of the S3 bucket is Athena.
Comment 1336834 by BigMrT
- Upvotes: 1
Selected Answer: B Glacier Select incurs higher costs compared to S3 Select.
Comment 1313770 by ctndba
- Upvotes: 1
You cannot use S3 Select on the S3 Glacier Flexible Retrieval storage class, so the answer is B based on the given options.
Comment 1294234 by mohamedTR
- Upvotes: 2
Selected Answer: B B is the more cost-effective solution for occasional audits. It allows for easier access to the data without incurring high retrieval costs
Comment 1290442 by manig
- Upvotes: 1
Querying gzip-compressed data → S3 Select. Answer: B.
Comment 1289138 by LR2023
- Upvotes: 2
Selected Answer: A https://aws.amazon.com/blogs/aws/s3-glacier-select/
Option B is not cost-effective because the data is stored in standard S3.
Comment 1285202 by PashoQ
- Upvotes: 2
Selected Answer: A Occasional audits, so go for S3 glacier select
Comment 1265602 by cas_tori
- Upvotes: 4
Selected Answer: B this is B
Comment 1264489 by IanJang
- Upvotes: 1
It is A.
Comment 1263040 by mns0173
- Upvotes: 2
Glacier is an expensive option when you need to access data occasionally.
Comment 1260524 by lenneth39
- Upvotes: 1
Selected Answer: C I am not sure whether to go for B or C. Can anyone comment on this? B: no problem, but S3 Select is not available if the Parquet is gzip-compressed; however, the problem statement doesn't say the gzip-compressed data is Parquet. C: correct if the data is gzip-compressed Parquet, but B is more cost-effective if the gzip-compressed data is CSV or JSON.
Comment 1250086 by andrologin
- Upvotes: 3
Selected Answer: B I think the solution is either B or D, but I would go with B because they mentioned storing the data in gzip rather than Parquet, which is what is optimised for Athena queries.
Comment 1245770 by 4bc91ae
- Upvotes: 2
There is no such thing as "Glacier Flexible Retrieval", so it's not A. It's either B or D, and most likely D for the cost.
Comment 1242620 by bakarys
- Upvotes: 2
Selected Answer: B B. Store the data in Amazon S3. Use Amazon S3 Select to query the data.
Amazon S3 is a cost-effective object storage service, and S3 Select allows you to retrieve only a subset of data from an object by using simple SQL expressions. S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also supports GZIP and BZIP2 compression formats, which makes it suitable for the given scenario where the data is compressed with gzip.
While Amazon Athena is a powerful query service, it can be more expensive than S3 Select for occasional queries. Amazon Glacier and Glacier Select are designed for long-term archival storage and not for frequent access or queries, which might not be suitable for occasional audits. Therefore, option B is the most cost-effective choice for this scenario.
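To ground the S3 Select argument, here is a minimal boto3 sketch; the bucket, key, and column name in the SQL expression are hypothetical, and the gzip compression is declared in the input serialization:

```python
import boto3

s3 = boto3.client("s3")

# Query a gzip-compressed CSV object in place; only the matching rows are returned.
response = s3.select_object_content(
    Bucket="insurance-transactions",   # placeholder bucket
    Key="2024/transactions.csv.gz",    # placeholder key
    ExpressionType="SQL",
    Expression="SELECT * FROM s3object s WHERE s.policy_id = '12345'",  # hypothetical column
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "GZIP"},
    OutputSerialization={"CSV": {}},
)

# The result arrives as an event stream; print the record payloads.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```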
Comment 1241726 by FunkyFresco
- Upvotes: 2
Selected Answer: B I'll go with B because of the query cost.
Comment 1240769 by bakarys
- Upvotes: 2
Selected Answer: B B. Store the data in Amazon S3. Use Amazon S3 Select to query the data.
Amazon S3 is a cost-effective storage service, and S3 Select allows you to retrieve only a subset of data from an object by using simple SQL expressions. S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also supports GZIP compression, which is the format used by the company. This makes it a cost-effective solution for occasional queries needed for audits.
Comment 1239896 by Alagong
- Upvotes: 3
Selected Answer: B Option B (Amazon S3 with S3 Select) is generally more cost-effective and operationally efficient for occasional audits of gzip-compressed data. It provides faster access to data and lower querying costs, which are critical factors for ad-hoc and timely data retrievals. While Option A (Amazon Glacier Flexible Retrieval with S3 Glacier Select) offers cheaper storage, its longer retrieval times and potential higher querying costs make it less suitable for use cases requiring timely access to data.
Comment 1234149 by HunkyBunky
- Upvotes: 3
Looks like A fits the question requirements better.
Comment 1231202 by GHill1982
- Upvotes: 3
Selected Answer: B On the assumption that querying the audit data is time-sensitive and the transaction data is compressed into a single object, I would go with using S3 and S3 Select to query the data.
Comment 1230895 by artworkad
- Upvotes: 4
Selected Answer: A B and C are not cost effective. A is more cost effective than D. I go with A.
Question Qra1B2c2PlT9gWAfdXvt
Question
A data engineer finished testing an Amazon Redshift stored procedure that processes and inserts data into a table that is not mission critical. The engineer wants to automatically run the stored procedure on a daily basis.
Which solution will meet this requirement in the MOST cost-effective way?
Choices
- A: Create an AWS Lambda function to schedule a cron job to run the stored procedure.
- B: Schedule and run the stored procedure by using the Amazon Redshift Data API in an Amazon EC2 Spot Instance.
- C: Use query editor v2 to run the stored procedure on a schedule.
- D: Schedule an AWS Glue Python shell job to run the stored procedure.
Answer: C | Answer_ET: C | Community answer: C (72%), A (17%), other (11%)
Discussion
Comment 1230897 by artworkad
- Upvotes: 7
Selected Answer: C This can be achieved with query editor v2 (https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2-schedule-query.html)
Comment 1250089 by andrologin
- Upvotes: 4
Selected Answer: C I go with C because it runs the query within the Redshift instance. B may not be appropriate because it involves other services on top of the Redshift instance and moves data across services.
Comment 1247424 by 4d716d6
- Upvotes: 1
Selected Answer: B Given that the table is not mission-critical and the question asks for the "MOST cost-effective way."
Comment 1238353 by salayea28
- Upvotes: 1
Selected Answer: B I am going with option B, given that the table is not mission-critical and requires the “MOST cost-effective way.” AWS Spot Instances are Amazon EC2 instances that allow you to utilize spare EC2 capacity at a significantly lower cost than On-Demand instances. These instances are ideal for flexible workloads that can tolerate interruptions, such as batch processing, data analysis, and background processing jobs.
Comment 1231203 by GHill1982
- Upvotes: 2
Selected Answer: C I think all options other than using the query editor will incur additional costs.
Comment 1230856 by tgv
- Upvotes: 3
Selected Answer: A AWS Lambda, combined with Amazon CloudWatch Events for scheduling, provides a low-cost, serverless, and reliable way to automatically run the stored procedure daily.
Comment 1230662 by lalitjhawar
- Upvotes: 1
A. Create an AWS Lambda function to schedule a cron job to run the stored procedure.
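For comparison with option A, the Lambda handler would reduce to a single asynchronous Redshift Data API call; a minimal sketch with placeholder cluster, database, user, and procedure names:

```python
import boto3

redshift_data = boto3.client("redshift-data")


def handler(event, context):
    # Fire-and-forget: the Data API runs the CALL asynchronously inside Redshift.
    response = redshift_data.execute_statement(
        ClusterIdentifier="my-redshift-cluster",  # placeholder cluster
        Database="dev",                           # placeholder database
        DbUser="etl_user",                        # placeholder user (or pass SecretArn instead)
        Sql="CALL daily_load_proc();",            # placeholder stored procedure
    )
    return {"statementId": response["Id"]}
```

The query editor v2 schedule in option C achieves the same daily run without any Lambda code to maintain, which is why most commenters consider it the lower-overhead choice.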
Question dCIhCgzu42WYnfI1aSK5
Question
A marketing company collects clickstream data. The company sends the clickstream data to Amazon Kinesis Data Firehose and stores the clickstream data in Amazon S3. The company wants to build a series of dashboards that hundreds of users from multiple departments will use.
The company will use Amazon QuickSight to develop the dashboards. The company wants a solution that can scale and provide daily updates about clickstream activity.
Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)
Choices
- A: Use Amazon Redshift to store and query the clickstream data.
- B: Use Amazon Athena to query the clickstream data
- C: Use Amazon S3 analytics to query the clickstream data.
- D: Access the query data through a QuickSight direct SQL query.
- E: Access the query data through QuickSight SPICE (Super-fast, Parallel, In-memory Calculation Engine). Configure a daily refresh for the dataset.
Answer: BE | Answer_ET: BE | Community answer: BE (100%)
Discussion
Comment 1241462 by Ja13
- Upvotes: 5
Selected Answer: BE B. Use Amazon Athena to query the clickstream data. E. Access the query data through QuickSight SPICE (Super-fast, Parallel, In-memory Calculation Engine). Configure a daily refresh for the dataset.
Here’s why:
B. Use Amazon Athena to query the clickstream data: Amazon Athena allows you to run SQL queries directly on data stored in Amazon S3 without the need for complex ETL processes. It is a cost-effective solution for querying large datasets on S3.
E. Access the query data through QuickSight SPICE: QuickSight SPICE is designed for fast, in-memory data analysis and can scale to support many users and large datasets. By configuring a daily refresh, you ensure that the dashboards are updated with the latest data while keeping query performance high and costs low.
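To ground the Athena half of the answer, here is a hedged boto3 sketch of the kind of daily rollup a QuickSight SPICE dataset could refresh from; the database, table, and output location are placeholders:

```python
import boto3

athena = boto3.client("athena")

# Daily clickstream rollup; a QuickSight SPICE dataset built on this query can refresh daily.
response = athena.start_query_execution(
    QueryString="""
        SELECT date_trunc('day', event_timestamp) AS activity_day,
               count(*) AS clicks
        FROM clickstream_events            -- placeholder table over the S3 data
        GROUP BY 1
        ORDER BY 1
    """,
    QueryExecutionContext={"Database": "clickstream_db"},               # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder location
)
print(response["QueryExecutionId"])
```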
Comment 1331437 by HagarTheHorrible
- Upvotes: 1
Selected Answer: BE Both are more or less the only possible options.
Comment 1231205 by GHill1982
- Upvotes: 2
Selected Answer: BE Agree with B & E. Athena would be cheaper than Redshift. S3 analytics is irrelevant. The functionality in SPICE should be more cost effective than direct SQL by reducing the frequency and volume of queries.
Comment 1230861 by tgv
- Upvotes: 1
Selected Answer: BE Athena charges based on the amount of data scanned per query, which can be cost-effective for ad-hoc querying and periodic updates.
SPICE can be more cost-effective for frequent access and analysis by multiple users as it reduces the load on the underlying data source.