Questions and Answers
Question LFstZv6LOrzcl2rCxeyL
Question
A company has an application that uses an Amazon API Gateway REST API and an AWS Lambda function to retrieve data from an Amazon DynamoDB table. Users recently reported intermittent high latency in the application’s response times. A data engineer finds that the Lambda function experiences frequent throttling when the company’s other Lambda functions experience increased invocations.
The company wants to ensure that the API’s Lambda function operates without being affected by the other Lambda functions.
Which solution will meet this requirement MOST cost-effectively?
Choices
- A: Increase the number of read capacity units (RCUs) in DynamoDB.
- B: Configure provisioned concurrency for the Lambda function.
- C: Configure reserved concurrency for the Lambda function.
- D: Increase the Lambda function timeout and allocated memory.
answer?
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 1387448 by daed09
- Upvotes: 1
Selected Answer: C C: It is more cost-effective, because we reserve slots for the critical/specific Lambda function (a configuration change instead of more resources).
With answer B we pay for provisioned concurrency, which increases costs; additionally, if more Lambda functions are created, this issue will occur again.
Comment 1358492 by italiancloud2025
- Upvotes: 1
Selected Answer: C C: Yes, reserved concurrency isolates the Lambda function from the impact of other functions, making it the most cost-effective option.
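As a side note on option C: reserved concurrency is a free per-function setting rather than extra billed capacity. A minimal boto3 sketch of applying it follows; the function name and the value of 100 are placeholder assumptions.
import boto3

lambda_client = boto3.client("lambda")

# Carve out 100 concurrent executions for the API-facing function.
# They come out of the account's unreserved pool, so spikes in other
# functions can no longer throttle this one (and it cannot exceed 100).
lambda_client.put_function_concurrency(
    FunctionName="api-data-retrieval-fn",  # placeholder function name
    ReservedConcurrentExecutions=100,      # placeholder value
)

# Verify the setting.
resp = lambda_client.get_function_concurrency(FunctionName="api-data-retrieval-fn")
print(resp["ReservedConcurrentExecutions"])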
Question Zgk6zC4rpHb1WSSIuZ1T
Question
A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII. Which solution will meet this requirement with the LEAST operational effort?
Choices
- A: Use an Amazon Kinesis Data Firehose delivery stream to process the dataset. Create an AWS Lambda transform function to identify the PII. Use an AWS SDK to obfuscate the PII. Set the S3 data lake as the target for the delivery stream.
- B: Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
- C: Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
- D: Ingest the dataset into Amazon DynamoDB. Create an AWS Lambda function to identify and obfuscate the PII in the DynamoDB table and to transform the data. Use the same Lambda function to ingest the data into the S3 data lake.
answer?
Answer: B Answer_ET: B Community answer B (68%) C (30%) 2% Discussion
Comment 1176613 by milofficial
- Upvotes: 12
Selected Answer: B How does Data Quality obfuscate PII? You can do this directly in Glue Studio: https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html
Comment 1235549 by Khooks
- Upvotes: 5
Selected Answer: B Option C involves additional steps and complexity with creating rules in AWS Glue Data Quality, which adds more operational effort compared to directly using AWS Glue Studio’s capabilities.
Comment 1411897 by Kalyso
- Upvotes: 1
Selected Answer: B Actually it is B. No need to create a rule in AWS Glue.
Comment 1339468 by plutonash
- Upvotes: 1
Selected Answer: C Regarding B ("Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake"): the Detect PII transform only detects. "Obfuscate the PII" is fine, but how? Answer C explains how.
Comment 1339178 by Udyan
- Upvotes: 1
Selected Answer: C Why C is better than B: Obfuscation clarity: Option C explicitly mentions using a Glue Data Quality rule to obfuscate PII, while option B does not specify how obfuscation is implemented. Accuracy: Glue Data Quality provides a more structured way to handle obfuscation compared to relying solely on Glue Studio’s PII detection. Thus, C is the most accurate and operationally efficient solution.
Comment 1282335 by markill123
- Upvotes: 1
The keyt
Comment 1262275 by antun3ra
- Upvotes: 2
Selected Answer: B B provides a streamlined, mostly visual approach using purpose-built tools for data processing and PII handling, making it the solution with the least operational effort.
Comment 1256447 by portland
- Upvotes: 1
Selected Answer: C https://aws.amazon.com/blogs/big-data/automated-data-governance-with-aws-glue-data-quality-sensitive-data-detection-and-aws-lake-formation/
Comment 1246544 by qwertyuio
- Upvotes: 2
Selected Answer: B https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html
Comment 1239905 by bakarys
- Upvotes: 1
Selected Answer: C The answer is C.
Comment 1231172 by bigfoot1501
- Upvotes: 3
I don’t think we need to use much more services to fulfill these requirements. Just AWS Glue is enough, it can detect and obfuscate PII data already. Source: https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html#choose-action-pii
Comment 1213614 by VerRi
- Upvotes: 3
Selected Answer: C We cannot directly handle PII with Glue Studio, and Glue Data Quality can be used to handle PII.
Comment 1208166 by Just_Ninja
- Upvotes: 1
Selected Answer: A A very easy way is to use the SDK to identify PII.
Comment 1206501 by kairosfc
- Upvotes: 3
Selected Answer: C The transform Detect PII in AWS Glue Studio is specifically used to identify personally identifiable information (PII) within the data. It can detect and flag this information, but on its own, it does not perform the obfuscation or removal of these details.
To effectively obfuscate or alter the identified PII, an additional transformation would be necessary. This could be accomplished in several ways, such as:
- Writing a custom script within the same AWS Glue job, using Python or Scala, to modify the PII data as needed.
- Using AWS Glue Data Quality, if available, to create rules that automatically obfuscate or modify the data identified as PII. AWS Glue Data Quality is a newer tool that helps improve data quality through rules and transformations, but whether it is needed will depend on the functionality’s availability and the specificity of the obfuscation requirements.
Comment 1195086 by okechi
- Upvotes: 2
Answer is option C. Period
Comment 1186108 by arvehisa
- Upvotes: 4
Selected Answer: B B is correct. C: Glue Data Quality cannot obfuscate the PII. D: requires writing code, but the question asks for the LEAST operational effort.
Comment 1178611 by certplan
- Upvotes: 2
In Python:
# Imports and Spark/Glue initialization
import sys
import boto3
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Example Glue Job").getOrCreate()
glueContext = GlueContext(SparkContext.getOrCreate())
# Retrieve Glue job arguments
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
# Define your EMR step
emr_step = [{
    "Name": "My EMR Step",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {"Jar": "s3://your-bucket/emr-scripts/your_script.jar",
                      "Args": ["arg1", "arg2"]},
}]
# Execute the EMR step on an existing cluster (cluster ID is a placeholder)
emr = boto3.client("emr")
response = emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=emr_step)
print(response)
Comment 1178582 by certplan
- Upvotes: 2
B. Utilizes AWS Glue Studio for PII detection, AWS Step Functions for orchestration, and S3 for storage. Glue Studio simplifies PII detection, and Step Functions can streamline the data pipeline orchestration, potentially reducing operational effort compared to option A.
C. Similar to option B, but it additionally includes AWS Glue Data Quality for obfuscating PII. This might add a bit more complexity but can also streamline the process if Glue Data Quality offers convenient features for PII obfuscation.
Comment 1176241 by jellybella
- Upvotes: 4
Selected Answer: B AWS Glue Data Quality is a feature that automatically validates the quality of the data during a Glue job run, but it’s not typically used for data obfuscation.
Comment 1167996 by GiorgioGss
- Upvotes: 1
Selected Answer: C https://dev.to/awscommunity-asean/validating-data-quality-with-aws-glue-databrew-4df4 https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html
Comment 1147090 by BartoszGolebiowski24
- Upvotes: 1
I think this is A. We ingest data to S3 with a PII transformation. We do not need to use Glue or Step Functions in that case.
Comment 1140072 by rralucard_
- Upvotes: 2
Selected Answer: C Option C seems to be the best solution to meet the requirement with the least operational effort. It leverages AWS Glue Studio for PII detection, AWS Glue Data Quality for obfuscation, and AWS Step Functions for orchestration, minimizing the need for custom coding and manual processes.
Comment 1137955 by TonyStark0122
- Upvotes: 1
C. Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake
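To make the "obfuscate" step concrete: the Detect PII transform flags where PII appears, and the job can then mask those values. Below is a minimal PySpark sketch of the masking step only, assuming the flagged column names are already known; the column list, paths, and SHA-256 hashing choice are illustrative assumptions, not the code Glue Studio generates.
from pyspark.sql import SparkSession
from pyspark.sql.functions import sha2, col

spark = SparkSession.builder.appName("pii-obfuscation-sketch").getOrCreate()

# Placeholder input path for the profiled dataset.
df = spark.read.parquet("s3://example-bucket/raw/dataset/")

# Columns previously flagged by the Detect PII transform (illustrative list).
pii_columns = ["email", "phone_number", "ssn"]

# Obfuscate each flagged column by replacing values with a SHA-256 hash.
for c in pii_columns:
    df = df.withColumn(c, sha2(col(c).cast("string"), 256))

# Placeholder output path in the S3 data lake.
df.write.mode("overwrite").parquet("s3://example-bucket/data-lake/dataset/")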
Question x7gOZZm0eUap17diRR3h
Question
A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from the company’s operational databases into an Amazon S3 based data lake. The ETL workflows use AWS Glue and Amazon EMR to process data. The company wants to improve the existing architecture to provide automated orchestration and to require minimal manual effort. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: AWS Glue workflows
- B: AWS Step Functions tasks
- C: AWS Lambda functions
- D: Amazon Managed Workflows for Apache Airflow (Amazon MWAA) workflows
answer?
Answer: B Answer_ET: B Community answer B (74%) A (26%) Discussion
Comment 1215631 by valuedate
- Upvotes: 15
Selected Answer: B Glue workflows only orchestrate crawlers and Glue jobs.
Comment 1206035 by DevoteamAnalytix
- Upvotes: 7
Selected Answer: B For me it’s B because I could not find a way for Glue workflows to trigger/orchestrate EMR processes out of the box. But with Step Functions there is a way: https://aws.amazon.com/blogs/big-data/orchestrate-amazon-emr-serverless-jobs-with-aws-step-functions/
Comment 1402185 by Rpathak4
- Upvotes: 1
Selected Answer: A Why Not the Other Options?
- B. AWS Step Functions: more flexible but requires manual setup of states and transitions for Glue and EMR. Higher operational overhead than Glue workflows.
- C. AWS Lambda: not ideal for long-running ETL workflows. Best suited for lightweight data transformations or event-driven tasks.
- D. Amazon MWAA (Apache Airflow): more control but requires cluster management and custom DAGs. Higher maintenance than Glue workflows.
Comment 1400074 by Palee
- Upvotes: 1
Selected Answer: B The company wants to improve the existing architecture so A cannot be the right choice
Comment 1339477 by plutonash
- Upvotes: 1
Selected Answer: B It is tempting to choose A for minimum effort, but only Step Functions can trigger work on both EMR and Glue jobs.
Comment 1330983 by ttpro1995
- Upvotes: 2
Selected Answer: B We have both Glue jobs and EMR jobs, so we need Step Functions to connect them. Airflow can do it, but it requires more dev work.
Comment 1292212 by Adrifersilva
- Upvotes: 1
Selected Answer: A Glue workflows are part of the Glue ecosystem, so they provide seamless integration with minimal changes.
Comment 1292155 by Shatheesh
- Upvotes: 1
Answer A, Glue workflows
Comment 1271802 by Shanmahi
- Upvotes: 1
Selected Answer: A Glue workflows are a managed feature and the best option when considering the least operational overhead.
Comment 1260937 by V0811
- Upvotes: 1
Selected Answer: A AWS Glue workflows are specifically designed for orchestrating ETL jobs in AWS Glue. They allow you to define and manage complex workflows that include multiple jobs and triggers, all within the AWS Glue environment. Integration: AWS Glue workflows seamlessly integrate with other AWS Glue components, making it easier to manage ETL processes without the need for external orchestration tools. Minimal operational overhead: since AWS Glue is a fully managed service, using Glue workflows will reduce the operational overhead compared to managing separate orchestrators or building custom solutions. While D (Amazon Managed Workflows for Apache Airflow) is also a good choice for more complex orchestration, it may involve more management overhead than the more straightforward AWS Glue workflows. Thus, AWS Glue workflows provide the least operational overhead given the context of this scenario.
Comment 1241793 by HunkyBunky
- Upvotes: 1
Selected Answer: B B - because AWS Glue can’t trigger EMR
Comment 1218976 by FunkyFresco
- Upvotes: 3
Selected Answer: B EMR in Glue workflows? I don’t think so.
Comment 1213623 by VerRi
- Upvotes: 4
Selected Answer: B There is no way for Glue Workflow to trigger EMR
Comment 1204111 by acoshi
- Upvotes: 2
Selected Answer: A https://aws.amazon.com/blogs/big-data/orchestrate-an-etl-pipeline-using-aws-glue-workflows-triggers-and-crawlers-with-custom-classifiers/
Comment 1187012 by lucas_rfsb
- Upvotes: 6
Selected Answer: A Since it seems to me that this pipeline is complex, with multiple workflows, I would go for Glue workflows.
Comment 1184835 by jasango
- Upvotes: 3
I’m going with D) Amazon MWAA because Glue workflows only support Glue jobs, and Step Functions can work but they are not data workflows. Amazon MWAA provides data workflows and is integrated with both Glue and EMR: https://aws.amazon.com/blogs/big-data/simplify-aws-glue-job-orchestration-and-monitoring-with-amazon-mwaa/
Comment 1178609 by certplan
- Upvotes: 2
Here’s an example of how you can use AWS Glue to initiate an EMR (Elastic MapReduce) job:
Let’s assume you have an AWS Glue job that performs ETL tasks on data stored in Amazon S3. You want to leverage EMR for a specific task within this job, such as running a complex Spark job.
Define a Glue Job: Create an AWS Glue job using the AWS Glue console, SDK, or CLI. Define the input and output data sources, as well as the transformations you want to apply.
Incorporate EMR Step: Within the Glue job script, include a section where you define an EMR step. An EMR step is a unit of work that performs a specific task on an EMR cluster.
Code follows in the next entry…
Comment 1170845 by GiorgioGss
- Upvotes: 5
Selected Answer: B orchestrating = step function
Comment 1140077 by rralucard_
- Upvotes: 3
Selected Answer: A Option A, AWS Glue Workflows, seems to be the best solution to meet the requirements with the least operational overhead. It offers a seamless integration with the company’s existing AWS Glue and Amazon EMR setup, providing a managed and straightforward way to orchestrate their ETL workflows without extensive additional setup or manual intervention.
Comment 1137956 by TonyStark0122
- Upvotes: 2
Glue Work flows
Comment 1127573 by [Removed]
- Upvotes: 4
Selected Answer: B Orchestrating different AWS services is a typical use case for Step Functions: https://docs.aws.amazon.com/step-functions/latest/dg/connect-emr.html https://docs.aws.amazon.com/step-functions/latest/dg/connect-glue.html
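To illustrate the Step Functions approach that several comments describe, here is a minimal sketch of a state machine that runs a Glue job and then adds a step to an existing EMR cluster, using the Step Functions service integrations for Glue and EMR; the job name, cluster ID, script path, and role ARN are placeholder assumptions.
import json
import boto3

definition = {
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "daily-etl-job"},  # placeholder Glue job
            "Next": "RunEmrStep",
        },
        "RunEmrStep": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
            "Parameters": {
                "ClusterId": "j-XXXXXXXXXXXXX",  # placeholder EMR cluster ID
                "Step": {
                    "Name": "ProcessData",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args": ["spark-submit", "s3://example-bucket/scripts/process.py"],
                    },
                },
            },
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="etl-orchestration",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsEtlRole",  # placeholder
)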
Question B3DNZ3VldhpQJPc7I1ku
Question
A company currently stores all of its data in Amazon S3 by using the S3 Standard storage class. A data engineer examined data access patterns to identify trends. During the first 6 months, most data files are accessed several times each day. Between 6 months and 2 years, most data files are accessed once or twice each month. After 2 years, data files are accessed only once or twice each year. The data engineer needs to use an S3 Lifecycle policy to develop new data storage rules. The new storage solution must continue to provide high availability. Which solution will meet these requirements in the MOST cost-effective way?
Choices
- A: Transition objects to S3 One Zone-Infrequent Access (S3 One Zone-IA) after 6 months. Transfer objects to S3 Glacier Flexible Retrieval after 2 years.
- B: Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months. Transfer objects to S3 Glacier Flexible Retrieval after 2 years.
- C: Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months. Transfer objects to S3 Glacier Deep Archive after 2 years.
- D: Transition objects to S3 One Zone-Infrequent Access (S3 One Zone-IA) after 6 months. Transfer objects to S3 Glacier Deep Archive after 2 years.
answer?
Answer: C Answer_ET: C Community answer C (58%) B (42%) Discussion
Comment 1174656 by helpaws
- Upvotes: 19
Selected Answer: B “S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost (than S3 Glacier Instant Retrieval), for archive data that is accessed 1-2 times per year and is retrieved asynchronously”
Source: https://aws.amazon.com/s3/storage-classes/glacier/
Comment 1325939 by WarPig666
- Upvotes: 7
Selected Answer: C Flexible retrieval will be higher cost than deep archive. If records only need to be retrieved once or twice a year, this doesn’t mean they need to be instantly available.
Comment 1402188 by Rpathak4
- Upvotes: 1
Selected Answer: C Why Not the Other Options?
A. S3 One Zone-IA → Glacier Flexible Retrieval ❌ One Zone-IA is risky (data loss if the AZ fails). Glacier Flexible Retrieval is more expensive than Deep Archive.
B. S3 Standard-IA → Glacier Flexible Retrieval ❌ Glacier Flexible Retrieval is not the cheapest long-term storage. Deep Archive costs much less.
D. S3 One Zone-IA → Glacier Deep Archive ❌ One Zone-IA lacks high availability (single AZ failure = data loss). S3 Standard-IA is safer.
Comment 1358447 by anonymous_learner_2
- Upvotes: 2
Selected Answer: C Glacier deep archive has the same availability as flexible retrieval and there’s no retrieval time requirement so C is the most cost effective that meets the requirements.
Comment 1345000 by luigiDDD
- Upvotes: 2
Selected Answer: C C is the most cost effective
Comment 1339487 by plutonash
- Upvotes: 1
Selected Answer: B “Data files are accessed only once or twice each year” — this is the definition of S3 Glacier Flexible Retrieval.
Comment 1339184 by Udyan
- Upvotes: 2
Selected Answer: C The question does not say retrieval time is a constraint. So if an engineer needs to access the data, say in May and in November, they can wait a day or two for the restore; with a year to analyze the data, Deep Archive only saves costs in the long run.
Comment 1339183 by Udyan
- Upvotes: 2
Selected Answer: C This question appears in Stephane Maarek’s Udemy practice questions too. No concern is given here for retrieval time, so just look at cost-friendliness; thus C over B.
Comment 1329009 by HagarTheHorrible
- Upvotes: 1
Selected Answer: B deep archive doesn’t make sense
Comment 1322650 by Eleftheriia
- Upvotes: 1
Selected Answer: B For once or twice a year it is flexible retrieval.
Comment 1322512 by jk15997
- Upvotes: 2
Selected Answer: C There is no requirement for the retrieval time.
Comment 1321709 by altonh
- Upvotes: 4
Selected Answer: C There is no requirement for the retrieval time. So this is more cost-effective.
Comment 1318218 by iamwatchingyoualways
- Upvotes: 3
Selected Answer: C No instant-access requirement is mentioned. Most cost-effective.
Comment 1313062 by truongnguyen86
- Upvotes: 2
Selected Answer: B Option B is the correct answer because it balances cost-effectiveness and availability:
S3 Standard-IA offers cost savings for infrequently accessed data while maintaining high availability across multiple zones. S3 Glacier Flexible Retrieval is a good balance for archiving with occasional access needs.
Comment 1308362 by lsj900605
- Upvotes: 1
Selected Answer: B High availability means the need for a readily available service. S3 Standard-IA delivers 99.9% availability vs. S3 One Zone-IA, which delivers 99.5% availability. S3 Glacier Flexible Retrieval has configurable retrieval times, from minutes to hours, with free bulk retrievals, but S3 Glacier Deep Archive’s retrieval time is within 12 hours. https://aws.amazon.com/s3/storage-classes/
Comment 1303392 by LrdKanien
- Upvotes: 1
Selected Answer: B B due to flex
Comment 1292794 by michele_scar
- Upvotes: 3
Selected Answer: C It’s C. The requirement is about high availability, not less time to retrieve the data. The other requirement is most cost-effective, so eliminate A and D on the HA requirement; B and C both satisfy HA. Now choose the most cost-effective ⇒ C.
Comment 1288506 by theloseralreadytaken
- Upvotes: 1
Selected Answer: B Glacier Deep Archive takes longer to retrieve (hours to days) than Glacier Flexible Retrieval.
Comment 1285515 by GZMartinelli
- Upvotes: 1
Selected Answer: B It should be B. The question asks for high availability; when a file is in Glacier Deep Archive, it takes more time to become available for use.
Comment 1279871 by shammous
- Upvotes: 1
Selected Answer: B To retrieve data from the S3 Glacier Deep Archive, you need more than 12 hours! The scenario mentions that we still need to access data instantly once or twice yearly. S3 Glacier Flexible Retrieval is more appropriate in this case.
Comment 1264964 by sachin
- Upvotes: 1
- S3 Glacier Instant Retrieval delivers the lowest-cost storage, up to 68% lower cost (than S3 Standard-Infrequent Access), for long-lived data that is accessed once per quarter and requires millisecond retrieval.
- S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost (than S3 Glacier Instant Retrieval), for archive data that is accessed 1-2 times per year and is retrieved asynchronously.
- S3 Glacier Deep Archive delivers the lowest-cost storage, up to 75% lower cost (than S3 Glacier Flexible Retrieval), for long-lived archive data that is accessed less than once per year and is retrieved asynchronously.
Comment 1252338 by andrologin
- Upvotes: 2
Selected Answer: B Based on this link https://aws.amazon.com/s3/storage-classes/glacier/ Glacier Flexible Retrieval is cheaper than Instant Retrieval.
Comment 1246317 by LR2023
- Upvotes: 2
The question doesn’t provide clarity on when the data is accessed: does it need to be made available instantly or not? Deep Archive retrieval times are longer.
Comment 1244675 by imymoco
- Upvotes: 1
Comment 1235554 by Khooks
- Upvotes: 1
Selected Answer: C Answer should be C. B: While this option transitions to S3 Glacier Flexible Retrieval after 2 years, which provides quicker retrieval times than Glacier Deep Archive, it is more expensive. Given the infrequent access pattern after 2 years, the additional cost is not justified.
Comment 1228948 by Fexo
- Upvotes: 1
Selected Answer: B The answer is B, because objects need to be retrieved once or twice a month, hence Glacier Flexible Retrieval. S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost (than S3 Glacier Instant Retrieval), for archive data that is accessed 1-2 times per year and is retrieved asynchronously.
https://aws.amazon.com/s3/storage-classes/glacier/
Comment 1219400 by tgv
- Upvotes: 1
Selected Answer: C I will go with C because Glacier Flexible Retrieval is way more expensive than Glacier Deep Archive.
Comment 1208177 by Just_Ninja
- Upvotes: 2
Selected Answer: C HA and cost-effective. There is no hint in the question about instant access.
Comment 1194551 by Christina666
- Upvotes: 1
Selected Answer: C HA - C
Comment 1187042 by lucas_rfsb
- Upvotes: 4
Selected Answer: C Since high availability was requested, it can’t be One Zone-IA. And since the most cost-effective option after 2 years was asked for, Glacier Deep Archive.
Comment 1186112 by arvehisa
- Upvotes: 2
Selected Answer: C Two requirements: 1) highly available, 2) cost-effective. No mention of retrieval time, so C is correct.
Comment 1185478 by blackgamer
- Upvotes: 2
Selected Answer: C The question mentions “the MOST cost-effective way”. C is the most cost-effective and still highly available. The requirements don’t include a retrieval-time constraint.
Comment 1174561 by CalvinL4
- Upvotes: 1
C is wrong. Deep Archive takes a very long time to retrieve.
Comment 1170846 by GiorgioGss
- Upvotes: 4
Selected Answer: C The new storage solution must continue to provide high availability = A and D out. After 2 years, data files are accessed only once or twice each year, and the MOST cost-effective way = C. C is more cost-effective than B.
Comment 1164402 by CalvinL4
- Upvotes: 2
Will go with B.
Comment 1137958 by TonyStark0122
- Upvotes: 3
Option B: Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months. Transfer objects to S3 Glacier Flexible Retrieval after 2 years.
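For reference, the chosen answer maps to a single S3 Lifecycle rule with two transitions. A minimal boto3 sketch, assuming a placeholder bucket name and using 180 and 730 days to approximate 6 months and 2 years:
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-with-age",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to all objects
                "Transitions": [
                    {"Days": 180, "StorageClass": "STANDARD_IA"},   # ~6 months
                    {"Days": 730, "StorageClass": "DEEP_ARCHIVE"},  # ~2 years
                ],
            }
        ]
    },
)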
Question SPEX1gRxOyP0BIkQx8Ux
Question
A company maintains an Amazon Redshift provisioned cluster that the company uses for extract, transform, and load (ETL) operations to support critical analysis tasks. A sales team within the company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks. The sales team recently requested access to the data that is in the ETL Redshift cluster so the team can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster with data that is in the sales team’s BI cluster. The company needs a solution that will share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution must minimize usage of the computing resources of the ETL cluster. Which solution will meet these requirements?
Choices
- A: Set up the sales team BI cluster as a consumer of the ETL cluster by using Redshift data sharing.
- B: Create materialized views based on the sales team’s requirements. Grant the sales team direct access to the ETL cluster.
- C: Create database views based on the sales team’s requirements. Grant the sales team direct access to the ETL cluster.
- D: Unload a copy of the data from the ETL cluster to an Amazon S3 bucket every week. Create an Amazon Redshift Spectrum table based on the content of the ETL cluster.
answer?
Answer: A Answer_ET: A Community answer A (70%) D (30%) Discussion
Comment 1186113 by arvehisa
- Upvotes: 5
Selected Answer: A A: Redshift data sharing: https://docs.aws.amazon.com/redshift/latest/dg/data_sharing_intro.html With data sharing, you can securely and easily share live data across Amazon Redshift clusters. B: a materialized view exists only within one Redshift cluster (across different tables).
Comment 1187071 by lucas_rfsb
- Upvotes: 5
Selected Answer: D In my opinion, using Redshift data sharing will consume fewer resources. ‘D’ involves using an S3 bucket.
Comment 1286657 by motk123
- Upvotes: 2
Seems that the performance of the critical ETL cluster should not be affected when using data sharing, so the answer is likely A:
https://docs.aws.amazon.com/redshift/latest/dg/data_sharing_intro.html
Supporting different kinds of business-critical workloads – Use a central extract, transform, and load (ETL) cluster that shares data with multiple business intelligence (BI) or analytic clusters. This approach provides read workload isolation and chargeback for individual workloads. You can size and scale your individual workload compute according to the workload-specific requirements of price and performance.
https://docs.aws.amazon.com/redshift/latest/dg/considerations.html The performance of the queries on shared data depends on the compute capacity of the consumer clusters.
Comment 1277860 by wimalik
- Upvotes: 1
A, as Redshift data sharing allows you to share live data across Redshift clusters without having to duplicate the data. This feature enables the sales team to access the data from the ETL cluster directly, without interrupting the critical analysis tasks or overloading the ETL cluster’s resources. The sales team can efficiently join this shared data with their own data in the BI cluster.
Comment 1273861 by San_Juan
- Upvotes: 1
Selected Answer: D “The solution must minimize usage of the computing resources of the ETL cluster.” That is key. You shouldn’t use the ETL cluster, so unload the data to S3 and query it through Redshift Spectrum. The ETL cluster does nothing in the meantime.
Comment 1213627 by VerRi
- Upvotes: 3
Selected Answer: A Typical Redshift data sharing use case.
Comment 1209382 by valuedate
- Upvotes: 2
key words: “weekly” “The solution must minimize usage of the computing resources of the ETL cluster.”
Answer:D
Comment 1207697 by d8945a1
- Upvotes: 4
Selected Answer: A Typical use case of data sharing in Redshift.
The question mentions that the ‘team needs to join data from the ETL cluster with data that is in the sales team’s BI cluster.’ This is possible with a datashare.
Comment 1184845 by jasango
- Upvotes: 3
Selected Answer: D The spectrum table is accessed from the sales cluster with zero impact on the ETL cluster.
Comment 1178917 by certplan
- Upvotes: 1
Options A, B, and C involve granting the sales team direct access to the ETL cluster, which could potentially impact the performance of the ETL cluster and interfere with its critical analysis tasks. Option D provides a more isolated and scalable approach by leveraging Amazon S3 and Redshift Spectrum for data sharing while minimizing the usage of the ETL cluster’s computing resources.
https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum-sharing-data.html https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-design-tables.html
Comment 1170851 by GiorgioGss
- Upvotes: 5
Selected Answer: A Initially I would have gone with B, but that would definitely use more resources.
Comment 1127576 by [Removed]
- Upvotes: 4
Selected Answer: A To share data between Redshift clusters and meet the requirements of sharing ETL cluster data with the sales team without interrupting critical analysis tasks and minimizing the usage of the ETL cluster’s computing resources, Redshift Data Sharing is the way to go
https://docs.aws.amazon.com/redshift/latest/dg/data_sharing_intro.html
“Supporting different kinds of business-critical workloads – Use a central extract, transform, and load (ETL) cluster that shares data with multiple business intelligence (BI) or analytic clusters. This approach provides read workload isolation and chargeback for individual workloads. You can size and scale your individual workload compute according to the workload-specific requirements of price and performance”
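To make the data-sharing flow concrete, here is a minimal sketch of the SQL involved, submitted through the Redshift Data API from Python; the cluster identifiers, database, user, datashare name, and namespace IDs are placeholder assumptions.
import boto3

rsd = boto3.client("redshift-data")

# On the producer (ETL) cluster: create the datashare and grant it to the BI cluster.
producer_sql = [
    "CREATE DATASHARE etl_share;",
    "ALTER DATASHARE etl_share ADD SCHEMA public;",
    "ALTER DATASHARE etl_share ADD ALL TABLES IN SCHEMA public;",
    "GRANT USAGE ON DATASHARE etl_share TO NAMESPACE 'consumer-namespace-id';",
]
for sql in producer_sql:
    rsd.execute_statement(
        ClusterIdentifier="etl-cluster",  # placeholder producer cluster
        Database="dev",
        DbUser="awsuser",                 # placeholder user
        Sql=sql,
    )

# On the consumer (BI) cluster: expose the share as a local database so the
# sales team can join shared tables with their own tables.
rsd.execute_statement(
    ClusterIdentifier="sales-bi-cluster",  # placeholder consumer cluster
    Database="dev",
    DbUser="awsuser",
    Sql="CREATE DATABASE etl_data FROM DATASHARE etl_share OF NAMESPACE 'producer-namespace-id';",
)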