Questions and Answers
Question LFstZv6LOrzcl2rCxeyL
Question
A company has an application that uses an Amazon API Gateway REST API and an AWS Lambda function to retrieve data from an Amazon DynamoDB table. Users recently reported intermittent high latency in the application’s response times. A data engineer finds that the Lambda function experiences frequent throttling when the company’s other Lambda functions experience increased invocations.
The company wants to ensure that the API’s Lambda function operates without being affected by the other Lambda functions.
Which solution will meet this requirement MOST cost-effectively?
Choices
- A: Increase the number of read capacity units (RCUs) in DynamoDB.
- B: Configure provisioned concurrency for the Lambda function.
- C: Configure reserved concurrency for the Lambda function.
- D: Increase the Lambda function timeout and allocated memory.
answer?
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 1387448 by daed09
- Upvotes: 1
Selected Answer: C C: It is more cost-effective, because we reserve slots for the critical/specific Lambda function (a configuration change instead of more resources).
With answer B we pay for provisioned concurrency, which increases costs; additionally, if more Lambda functions are created, this issue will occur again.
Comment 1358492 by italiancloud2025
- Upvotes: 1
Selected Answer: C C: Yes, reserved concurrency isolates the Lambda function from the impact of other functions, making it the most cost-effective option.
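As a side note on option C: reserved concurrency is a free per-function setting rather than extra billed capacity. A minimal boto3 sketch of applying it follows; the function name and the value of 100 are placeholder assumptions.
import boto3

lambda_client = boto3.client("lambda")

# Carve out 100 concurrent executions for the API-facing function.
# They come out of the account's unreserved pool, so spikes in other
# functions can no longer throttle this one (and it cannot exceed 100).
lambda_client.put_function_concurrency(
    FunctionName="api-data-retrieval-fn",  # placeholder function name
    ReservedConcurrentExecutions=100,      # placeholder value
)

# Verify the setting.
resp = lambda_client.get_function_concurrency(FunctionName="api-data-retrieval-fn")
print(resp["ReservedConcurrentExecutions"])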
Question Zgk6zC4rpHb1WSSIuZ1T
Question
A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII. Which solution will meet this requirement with the LEAST operational effort?
Choices
- A: Use an Amazon Kinesis Data Firehose delivery stream to process the dataset. Create an AWS Lambda transform function to identify the PII. Use an AWS SDK to obfuscate the PII. Set the S3 data lake as the target for the delivery stream.
- B: Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
- C: Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
- D: Ingest the dataset into Amazon DynamoDB. Create an AWS Lambda function to identify and obfuscate the PII in the DynamoDB table and to transform the data. Use the same Lambda function to ingest the data into the S3 data lake.
answer?
Answer: B Answer_ET: B Community answer B (68%) C (30%) 2% Discussion
Comment 1176613 by milofficial
- Upvotes: 12
Selected Answer: B How does Data Quality obfuscate PII? You can do this directly in Glue Studio: https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html
Comment 1235549 by Khooks
- Upvotes: 5
Selected Answer: B Option C involves additional steps and complexity with creating rules in AWS Glue Data Quality, which adds more operational effort compared to directly using AWS Glue Studio’s capabilities.
Comment 1411897 by Kalyso
- Upvotes: 1
Selected Answer: B Actually it is B. No need to create a rule in AWS Glue.
Comment 1339468 by plutonash
- Upvotes: 1
Selected Answer: C Regarding B ("Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake"): the Detect PII transform only detects. "Obfuscate the PII" is fine, but how? Answer C explains how.
Comment 1339178 by Udyan
- Upvotes: 1
Selected Answer: C Why C is better than B: Obfuscation clarity: Option C explicitly mentions using a Glue Data Quality rule to obfuscate PII, while option B does not specify how obfuscation is implemented. Accuracy: Glue Data Quality provides a more structured way to handle obfuscation compared to relying solely on Glue Studio’s PII detection. Thus, C is the most accurate and operationally efficient solution.
Comment 1282335 by markill123
- Upvotes: 1
The keyt
Comment 1262275 by antun3ra
- Upvotes: 2
Selected Answer: B B provides a streamlined, mostly visual approach using purpose-built tools for data processing and PII handling, making it the solution with the least operational effort.
Comment 1256447 by portland
- Upvotes: 1
Selected Answer: C https://aws.amazon.com/blogs/big-data/automated-data-governance-with-aws-glue-data-quality-sensitive-data-detection-and-aws-lake-formation/
Comment 1246544 by qwertyuio
- Upvotes: 2
Selected Answer: B https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html
Comment 1239905 by bakarys
- Upvotes: 1
Selected Answer: C The answer is C.
Comment 1231172 by bigfoot1501
- Upvotes: 3
I don’t think we need to use much more services to fulfill these requirements. Just AWS Glue is enough, it can detect and obfuscate PII data already. Source: https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html#choose-action-pii
Comment 1213614 by VerRi
- Upvotes: 3
Selected Answer: C We cannot directly handle PII with Glue Studio, and Glue Data Quality can be used to handle PII.
Comment 1208166 by Just_Ninja
- Upvotes: 1
Selected Answer: A A very easy way is to use the SDK to identify PII.
Comment 1206501 by kairosfc
- Upvotes: 3
Selected Answer: C The transform Detect PII in AWS Glue Studio is specifically used to identify personally identifiable information (PII) within the data. It can detect and flag this information, but on its own, it does not perform the obfuscation or removal of these details.
To effectively obfuscate or alter the identified PII, an additional transformation would be necessary. This could be accomplished in several ways, such as:
- Writing a custom script within the same AWS Glue job, using Python or Scala, to modify the PII data as needed.
- Using AWS Glue Data Quality, if available, to create rules that automatically obfuscate or modify the data identified as PII. AWS Glue Data Quality is a newer tool that helps improve data quality through rules and transformations, but whether it is needed will depend on the functionality’s availability and the specificity of the obfuscation requirements.
Comment 1195086 by okechi
- Upvotes: 2
Answer is option C. Period
Comment 1186108 by arvehisa
- Upvotes: 4
Selected Answer: B B is correct. C: Glue Data Quality cannot obfuscate the PII. D: requires writing code, but the question asks for the LEAST operational effort.
Comment 1178611 by certplan
- Upvotes: 2
In Python:
# Imports and Spark/Glue initialization
import sys
import boto3
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Example Glue Job").getOrCreate()
glueContext = GlueContext(SparkContext.getOrCreate())
# Retrieve Glue job arguments
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
# Define your EMR step
emr_step = [{
    "Name": "My EMR Step",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {"Jar": "s3://your-bucket/emr-scripts/your_script.jar",
                      "Args": ["arg1", "arg2"]},
}]
# Execute the EMR step on an existing cluster (cluster ID is a placeholder)
emr = boto3.client("emr")
response = emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=emr_step)
print(response)
Comment 1178582 by certplan
- Upvotes: 2
B. Utilizes AWS Glue Studio for PII detection, AWS Step Functions for orchestration, and S3 for storage. Glue Studio simplifies PII detection, and Step Functions can streamline the data pipeline orchestration, potentially reducing operational effort compared to option A.
C. Similar to option B, but it additionally includes AWS Glue Data Quality for obfuscating PII. This might add a bit more complexity but can also streamline the process if Glue Data Quality offers convenient features for PII obfuscation.
Comment 1176241 by jellybella
- Upvotes: 4
Selected Answer: B AWS Glue Data Quality is a feature that automatically validates the quality of the data during a Glue job run, but it’s not typically used for data obfuscation.
Comment 1167996 by GiorgioGss
- Upvotes: 1
Selected Answer: C https://dev.to/awscommunity-asean/validating-data-quality-with-aws-glue-databrew-4df4 https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html
Comment 1147090 by BartoszGolebiowski24
- Upvotes: 1
I think this is A. We ingest data to S3 with a PII transformation. We do not need to use Glue or Step Functions in that case.
Comment 1140072 by rralucard_
- Upvotes: 2
Selected Answer: C Option C seems to be the best solution to meet the requirement with the least operational effort. It leverages AWS Glue Studio for PII detection, AWS Glue Data Quality for obfuscation, and AWS Step Functions for orchestration, minimizing the need for custom coding and manual processes.
Comment 1137955 by TonyStark0122
- Upvotes: 1
C. Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake
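To make the "obfuscate" step concrete: the Detect PII transform flags where PII appears, and the job can then mask those values. Below is a minimal PySpark sketch of the masking step only, assuming the flagged column names are already known; the column list, paths, and SHA-256 hashing choice are illustrative assumptions, not the code Glue Studio generates.
from pyspark.sql import SparkSession
from pyspark.sql.functions import sha2, col

spark = SparkSession.builder.appName("pii-obfuscation-sketch").getOrCreate()

# Placeholder input path for the profiled dataset.
df = spark.read.parquet("s3://example-bucket/raw/dataset/")

# Columns previously flagged by the Detect PII transform (illustrative list).
pii_columns = ["email", "phone_number", "ssn"]

# Obfuscate each flagged column by replacing values with a SHA-256 hash.
for c in pii_columns:
    df = df.withColumn(c, sha2(col(c).cast("string"), 256))

# Placeholder output path in the S3 data lake.
df.write.mode("overwrite").parquet("s3://example-bucket/data-lake/dataset/")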
Question x7gOZZm0eUap17diRR3h
Question
A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from the company’s operational databases into an Amazon S3 based data lake. The ETL workflows use AWS Glue and Amazon EMR to process data. The company wants to improve the existing architecture to provide automated orchestration and to require minimal manual effort. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: AWS Glue workflows
- B: AWS Step Functions tasks
- C: AWS Lambda functions
- D: Amazon Managed Workflows for Apache Airflow (Amazon MWAA) workflows
answer?
Answer: B Answer_ET: B Community answer B (74%) A (26%) Discussion
Comment 1215631 by valuedate
- Upvotes: 15
Selected Answer: B Glue workflows only orchestrate crawlers and Glue jobs.
Comment 1206035 by DevoteamAnalytix
- Upvotes: 7
Selected Answer: B For me it’s B because I could not find a way for Glue workflows to trigger/orchestrate EMR processes out of the box. But with Step Functions there is a way: https://aws.amazon.com/blogs/big-data/orchestrate-amazon-emr-serverless-jobs-with-aws-step-functions/
Comment 1402185 by Rpathak4
- Upvotes: 1
Selected Answer: A Why Not the Other Options?
- B. AWS Step Functions: more flexible but requires manual setup of states and transitions for Glue and EMR. Higher operational overhead than Glue workflows.
- C. AWS Lambda: not ideal for long-running ETL workflows. Best suited for lightweight data transformations or event-driven tasks.
- D. Amazon MWAA (Apache Airflow): more control but requires cluster management and custom DAGs. Higher maintenance than Glue workflows.
Comment 1400074 by Palee
- Upvotes: 1
Selected Answer: B The company wants to improve the existing architecture so A cannot be the right choice
Comment 1339477 by plutonash
- Upvotes: 1
Selected Answer: B It is tempting to choose A for minimum effort, but only Step Functions can trigger work on both EMR and Glue jobs.
Comment 1330983 by ttpro1995
- Upvotes: 2
Selected Answer: B We have both Glue jobs and EMR jobs, so we need Step Functions to connect them. Airflow can do it, but it requires more dev work.
Comment 1292212 by Adrifersilva
- Upvotes: 1
Selected Answer: A Glue workflows are part of the Glue ecosystem, so they provide seamless integration with minimal changes.
Comment 1292155 by Shatheesh
- Upvotes: 1
Answer A, Glue workflows
Comment 1271802 by Shanmahi
- Upvotes: 1
Selected Answer: A Glue workflows are a managed feature and the best option when considering the least operational overhead.
Comment 1260937 by V0811
- Upvotes: 1
Selected Answer: A AWS Glue workflows are specifically designed for orchestrating ETL jobs in AWS Glue. They allow you to define and manage complex workflows that include multiple jobs and triggers, all within the AWS Glue environment. Integration: AWS Glue workflows seamlessly integrate with other AWS Glue components, making it easier to manage ETL processes without the need for external orchestration tools. Minimal operational overhead: since AWS Glue is a fully managed service, using Glue workflows will reduce the operational overhead compared to managing separate orchestrators or building custom solutions. While D (Amazon Managed Workflows for Apache Airflow) is also a good choice for more complex orchestration, it may involve more management overhead than the more straightforward AWS Glue workflows. Thus, AWS Glue workflows provide the least operational overhead given the context of this scenario.
Comment 1241793 by HunkyBunky
- Upvotes: 1
Selected Answer: B B - because AWS Glue can’t trigger EMR
Comment 1218976 by FunkyFresco
- Upvotes: 3
Selected Answer: B EMR in Glue workflows? I don’t think so.
Comment 1213623 by VerRi
- Upvotes: 4
Selected Answer: B There is no way for Glue Workflow to trigger EMR
Comment 1204111 by acoshi
- Upvotes: 2
Selected Answer: A https://aws.amazon.com/blogs/big-data/orchestrate-an-etl-pipeline-using-aws-glue-workflows-triggers-and-crawlers-with-custom-classifiers/
Comment 1187012 by lucas_rfsb
- Upvotes: 6
Selected Answer: A Since it seems to me that this pipeline is complex, with multiple workflows, I would go for Glue workflows.
Comment 1184835 by jasango
- Upvotes: 3
I’m going with D) Amazon MWAA because Glue workflows only support Glue jobs, and Step Functions can work but they are not data workflows. Amazon MWAA provides data workflows and is integrated with both Glue and EMR: https://aws.amazon.com/blogs/big-data/simplify-aws-glue-job-orchestration-and-monitoring-with-amazon-mwaa/
Comment 1178609 by certplan
- Upvotes: 2
Here’s an example of how you can use AWS Glue to initiate an EMR (Elastic MapReduce) job:
Let’s assume you have an AWS Glue job that performs ETL tasks on data stored in Amazon S3. You want to leverage EMR for a specific task within this job, such as running a complex Spark job.
Define a Glue Job: Create an AWS Glue job using the AWS Glue console, SDK, or CLI. Define the input and output data sources, as well as the transformations you want to apply.
Incorporate EMR Step: Within the Glue job script, include a section where you define an EMR step. An EMR step is a unit of work that performs a specific task on an EMR cluster.
Code follows in the next entry…
Comment 1170845 by GiorgioGss
- Upvotes: 5
Selected Answer: B orchestrating = step function
Comment 1140077 by rralucard_
- Upvotes: 3
Selected Answer: A Option A, AWS Glue Workflows, seems to be the best solution to meet the requirements with the least operational overhead. It offers a seamless integration with the company’s existing AWS Glue and Amazon EMR setup, providing a managed and straightforward way to orchestrate their ETL workflows without extensive additional setup or manual intervention.
Comment 1137956 by TonyStark0122
- Upvotes: 2
Glue Work flows
Comment 1127573 by [Removed]
- Upvotes: 4
Selected Answer: B Orchestrating different AWS services is a typical use case for Step Functions: https://docs.aws.amazon.com/step-functions/latest/dg/connect-emr.html https://docs.aws.amazon.com/step-functions/latest/dg/connect-glue.html
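To illustrate the Step Functions approach that several comments describe, here is a minimal sketch of a state machine that runs a Glue job and then adds a step to an existing EMR cluster, using the Step Functions service integrations for Glue and EMR; the job name, cluster ID, script path, and role ARN are placeholder assumptions.
import json
import boto3

definition = {
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "daily-etl-job"},  # placeholder Glue job
            "Next": "RunEmrStep",
        },
        "RunEmrStep": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
            "Parameters": {
                "ClusterId": "j-XXXXXXXXXXXXX",  # placeholder EMR cluster ID
                "Step": {
                    "Name": "ProcessData",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args": ["spark-submit", "s3://example-bucket/scripts/process.py"],
                    },
                },
            },
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="etl-orchestration",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsEtlRole",  # placeholder
)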
Question B3DNZ3VldhpQJPc7I1ku
Question
A company currently stores all of its data in Amazon S3 by using the S3 Standard storage class. A data engineer examined data access patterns to identify trends. During the first 6 months, most data files are accessed several times each day. Between 6 months and 2 years, most data files are accessed once or twice each month. After 2 years, data files are accessed only once or twice each year. The data engineer needs to use an S3 Lifecycle policy to develop new data storage rules. The new storage solution must continue to provide high availability. Which solution will meet these requirements in the MOST cost-effective way?
Choices
- A: Transition objects to S3 One Zone-Infrequent Access (S3 One Zone-IA) after 6 months. Transfer objects to S3 Glacier Flexible Retrieval after 2 years.
- B: Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months. Transfer objects to S3 Glacier Flexible Retrieval after 2 years.
- C: Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months. Transfer objects to S3 Glacier Deep Archive after 2 years.
- D: Transition objects to S3 One Zone-Infrequent Access (S3 One Zone-IA) after 6 months. Transfer objects to S3 Glacier Deep Archive after 2 years.
answer?
Answer: C Answer_ET: C Community answer C (58%) B (42%) Discussion
Comment 1174656 by helpaws
- Upvotes: 19
Selected Answer: B “S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost (than S3 Glacier Instant Retrieval), for archive data that is accessed 1-2 times per year and is retrieved asynchronously”
Source: https://aws.amazon.com/s3/storage-classes/glacier/
Comment 1325939 by WarPig666
- Upvotes: 7
Selected Answer: C Flexible retrieval will be higher cost than deep archive. If records only need to be retrieved once or twice a year, this doesn’t mean they need to be instantly available.
Comment 1402188 by Rpathak4
- Upvotes: 1
Selected Answer: C Why Not the Other Options?
A. S3 One Zone-IA → Glacier Flexible Retrieval ❌ One Zone-IA is risky (data loss if the AZ fails). Glacier Flexible Retrieval is more expensive than Deep Archive.
B. S3 Standard-IA → Glacier Flexible Retrieval ❌ Glacier Flexible Retrieval is not the cheapest long-term storage. Deep Archive costs much less.
D. S3 One Zone-IA → Glacier Deep Archive ❌ One Zone-IA lacks high availability (single AZ failure = data loss). S3 Standard-IA is safer.
Comment 1358447 by anonymous_learner_2
- Upvotes: 2
Selected Answer: C Glacier deep archive has the same availability as flexible retrieval and there’s no retrieval time requirement so C is the most cost effective that meets the requirements.
Comment 1345000 by luigiDDD
- Upvotes: 2
Selected Answer: C C is the most cost effective
Comment 1339487 by plutonash
- Upvotes: 1
Selected Answer: B “Data files are accessed only once or twice each year” — this is the definition of S3 Glacier Flexible Retrieval.
Comment 1339184 by Udyan
- Upvotes: 2
Selected Answer: C The question does not say retrieval time is a constraint. So if an engineer needs to access the data, say in May and in November, they can wait a day or two for the restore; with a year to analyze the data, Deep Archive only saves costs in the long run.
Comment 1339183 by Udyan
- Upvotes: 2
Selected Answer: C This question appears in Stephane Maarek’s Udemy practice questions too. No concern is given here for retrieval time, so just look at cost-friendliness; thus C over B.
Comment 1329009 by HagarTheHorrible
- Upvotes: 1
Selected Answer: B deep archive doesn’t make sense
Comment 1322650 by Eleftheriia
- Upvotes: 1
Selected Answer: B For once or twice a year it is flexible retrieval.
Comment 1322512 by jk15997
- Upvotes: 2
Selected Answer: C There is no requirement for the retrieval time.
Comment 1321709 by altonh
- Upvotes: 4
Selected Answer: C There is no requirement for the retrieval time. So this is more cost-effective.
Comment 1318218 by iamwatchingyoualways
- Upvotes: 3
Selected Answer: C No instant-access requirement is mentioned. Most cost-effective.
Comment 1313062 by truongnguyen86
- Upvotes: 2
Selected Answer: B Option B is the correct answer because it balances cost-effectiveness and availability:
S3 Standard-IA offers cost savings for infrequently accessed data while maintaining high availability across multiple zones. S3 Glacier Flexible Retrieval is a good balance for archiving with occasional access needs.
Comment 1308362 by lsj900605
- Upvotes: 1
Selected Answer: B High availability means the need for a readily available service. S3 Standard-IA delivers 99.9% availability vs. S3 One Zone-IA, which delivers 99.5% availability. S3 Glacier Flexible Retrieval has configurable retrieval times, from minutes to hours, with free bulk retrievals, but S3 Glacier Deep Archive’s retrieval time is within 12 hours. https://aws.amazon.com/s3/storage-classes/
Comment 1303392 by LrdKanien
- Upvotes: 1
Selected Answer: B B due to flex
Comment 1292794 by michele_scar
- Upvotes: 3
Selected Answer: C It’s C. The requirement is about high availability, not less time to retrieve the data. The other requirement is most cost-effective, so eliminate A and D on the HA requirement; B and C both satisfy HA. Now choose the most cost-effective ⇒ C.
Comment 1288506 by theloseralreadytaken
- Upvotes: 1
Selected Answer: B Glacier Deep Archive takes longer to retrieve (hours to days) than Glacier Flexible Retrieval.
Comment 1285515 by GZMartinelli
- Upvotes: 1
Selected Answer: B It should be B. The question asks for high availability; when a file is in Glacier Deep Archive, it takes more time to become available for use.
Comment 1279871 by shammous
- Upvotes: 1
Selected Answer: B To retrieve data from the S3 Glacier Deep Archive, you need more than 12 hours! The scenario mentions that we still need to access data instantly once or twice yearly. S3 Glacier Flexible Retrieval is more appropriate in this case.
Comment 1264964 by sachin
- Upvotes: 1
- S3 Glacier Instant Retrieval delivers the lowest-cost storage, up to 68% lower cost (than S3 Standard-Infrequent Access), for long-lived data that is accessed once per quarter and requires millisecond retrieval.
- S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost (than S3 Glacier Instant Retrieval), for archive data that is accessed 1-2 times per year and is retrieved asynchronously.
- S3 Glacier Deep Archive delivers the lowest-cost storage, up to 75% lower cost (than S3 Glacier Flexible Retrieval), for long-lived archive data that is accessed less than once per year and is retrieved asynchronously.
Comment 1252338 by andrologin
- Upvotes: 2
Selected Answer: B Based on this link https://aws.amazon.com/s3/storage-classes/glacier/ Glacier Flexible Retrieval is cheaper than Instant Retrieval.
Comment 1246317 by LR2023
- Upvotes: 2
The question doesn’t provide clarity on when the data is accessed: does it need to be made available instantly or not? Deep Archive retrieval times are longer.
Comment 1244675 by imymoco
- Upvotes: 1
Comment 1235554 by Khooks
- Upvotes: 1
Selected Answer: C Answer should be C. B: While this option transitions to S3 Glacier Flexible Retrieval after 2 years, which provides quicker retrieval times than Glacier Deep Archive, it is more expensive. Given the infrequent access pattern after 2 years, the additional cost is not justified.
Comment 1228948 by Fexo
- Upvotes: 1
Selected Answer: B The answer is B, because objects need to be retrieved once or twice a month, hence Glacier Flexible Retrieval. S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost (than S3 Glacier Instant Retrieval), for archive data that is accessed 1-2 times per year and is retrieved asynchronously.
https://aws.amazon.com/s3/storage-classes/glacier/
Comment 1219400 by tgv
- Upvotes: 1
Selected Answer: C I will go with C because Glacier Flexible Retrieval is way more expensive than Glacier Deep Archive.
Comment 1208177 by Just_Ninja
- Upvotes: 2
Selected Answer: C HA and cost-effective. There is no hint in the question about instant access.
Comment 1194551 by Christina666
- Upvotes: 1
Selected Answer: C HA - C
Comment 1187042 by lucas_rfsb
- Upvotes: 4
Selected Answer: C Since high availability was requested, it can’t be One Zone-IA. And since the most cost-effective option after 2 years was asked for, Glacier Deep Archive.
Comment 1186112 by arvehisa
- Upvotes: 2
Selected Answer: C Two requirements: 1) highly available, 2) cost-effective. No mention of retrieval time, so C is correct.
Comment 1185478 by blackgamer
- Upvotes: 2
Selected Answer: C The question mentions “the MOST cost-effective way”. C is the most cost-effective and still highly available. The requirements don’t include a retrieval-time constraint.
Comment 1174561 by CalvinL4
- Upvotes: 1
C is wrong. Deep Archive takes a very long time to retrieve.
Comment 1170846 by GiorgioGss
- Upvotes: 4
Selected Answer: C The new storage solution must continue to provide high availability = A and D out. After 2 years, data files are accessed only once or twice each year, and the MOST cost-effective way = C. C is more cost-effective than B.
Comment 1164402 by CalvinL4
- Upvotes: 2
Will go with B.
Comment 1137958 by TonyStark0122
- Upvotes: 3
Option B: Transition objects to S3 Standard-Infrequent Access (S3 Standard-IA) after 6 months. Transfer objects to S3 Glacier Flexible Retrieval after 2 years.
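For reference, the chosen answer maps to a single S3 Lifecycle rule with two transitions. A minimal boto3 sketch, assuming a placeholder bucket name and using 180 and 730 days to approximate 6 months and 2 years:
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-with-age",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to all objects
                "Transitions": [
                    {"Days": 180, "StorageClass": "STANDARD_IA"},   # ~6 months
                    {"Days": 730, "StorageClass": "DEEP_ARCHIVE"},  # ~2 years
                ],
            }
        ]
    },
)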
Question SPEX1gRxOyP0BIkQx8Ux
Question
A company maintains an Amazon Redshift provisioned cluster that the company uses for extract, transform, and load (ETL) operations to support critical analysis tasks. A sales team within the company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks. The sales team recently requested access to the data that is in the ETL Redshift cluster so the team can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster with data that is in the sales team’s BI cluster. The company needs a solution that will share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution must minimize usage of the computing resources of the ETL cluster. Which solution will meet these requirements?
Choices
- A: Set up the sales team BI cluster as a consumer of the ETL cluster by using Redshift data sharing.
- B: Create materialized views based on the sales team’s requirements. Grant the sales team direct access to the ETL cluster.
- C: Create database views based on the sales team’s requirements. Grant the sales team direct access to the ETL cluster.
- D: Unload a copy of the data from the ETL cluster to an Amazon S3 bucket every week. Create an Amazon Redshift Spectrum table based on the content of the ETL cluster.
answer?
Answer: A Answer_ET: A Community answer A (70%) D (30%) Discussion
Comment 1186113 by arvehisa
- Upvotes: 5
Selected Answer: A A: Redshift data sharing: https://docs.aws.amazon.com/redshift/latest/dg/data_sharing_intro.html With data sharing, you can securely and easily share live data across Amazon Redshift clusters. B: a materialized view exists only within one Redshift cluster (across different tables).
Comment 1187071 by lucas_rfsb
- Upvotes: 5
Selected Answer: D In my opinion, using Redshift data sharing will consume fewer resources. ‘D’ involves using an S3 bucket.
Comment 1286657 by motk123
- Upvotes: 2
Seems that the performance of the critical ETL cluster should not be affected when using data sharing, so the answer is likely A:
https://docs.aws.amazon.com/redshift/latest/dg/data_sharing_intro.html
Supporting different kinds of business-critical workloads – Use a central extract, transform, and load (ETL) cluster that shares data with multiple business intelligence (BI) or analytic clusters. This approach provides read workload isolation and chargeback for individual workloads. You can size and scale your individual workload compute according to the workload-specific requirements of price and performance.
https://docs.aws.amazon.com/redshift/latest/dg/considerations.html The performance of the queries on shared data depends on the compute capacity of the consumer clusters.
Comment 1277860 by wimalik
- Upvotes: 1
A, as Redshift data sharing allows you to share live data across Redshift clusters without having to duplicate the data. This feature enables the sales team to access the data from the ETL cluster directly, without interrupting the critical analysis tasks or overloading the ETL cluster’s resources. The sales team can efficiently join this shared data with their own data in the BI cluster.
Comment 1273861 by San_Juan
- Upvotes: 1
Selected Answer: D “The solution must minimize usage of the computing resources of the ETL cluster.” That is key. You shouldn’t use the ETL cluster, so unload the data to S3 and query it through Redshift Spectrum. The ETL cluster does nothing in the meantime.
Comment 1213627 by VerRi
- Upvotes: 3
Selected Answer: A Typical Redshift data sharing use case.
Comment 1209382 by valuedate
- Upvotes: 2
key words: “weekly” “The solution must minimize usage of the computing resources of the ETL cluster.”
Answer:D
Comment 1207697 by d8945a1
- Upvotes: 4
Selected Answer: A Typical use case of data sharing in Redshift.
The question mentions that the ‘team needs to join data from the ETL cluster with data that is in the sales team’s BI cluster.’ This is possible with a datashare.
Comment 1184845 by jasango
- Upvotes: 3
Selected Answer: D The spectrum table is accessed from the sales cluster with zero impact on the ETL cluster.
Comment 1178917 by certplan
- Upvotes: 1
Options A, B, and C involve granting the sales team direct access to the ETL cluster, which could potentially impact the performance of the ETL cluster and interfere with its critical analysis tasks. Option D provides a more isolated and scalable approach by leveraging Amazon S3 and Redshift Spectrum for data sharing while minimizing the usage of the ETL cluster’s computing resources.
https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum-sharing-data.html https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-design-tables.html
Comment 1170851 by GiorgioGss
- Upvotes: 5
Selected Answer: A Initially I would have gone with B, but that would definitely use more resources.
Comment 1127576 by [Removed]
- Upvotes: 4
Selected Answer: A To share data between Redshift clusters and meet the requirements of sharing ETL cluster data with the sales team without interrupting critical analysis tasks and minimizing the usage of the ETL cluster’s computing resources, Redshift Data Sharing is the way to go
https://docs.aws.amazon.com/redshift/latest/dg/data_sharing_intro.html
“Supporting different kinds of business-critical workloads – Use a central extract, transform, and load (ETL) cluster that shares data with multiple business intelligence (BI) or analytic clusters. This approach provides read workload isolation and chargeback for individual workloads. You can size and scale your individual workload compute according to the workload-specific requirements of price and performance”
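To make the data-sharing flow concrete, here is a minimal sketch of the SQL involved, submitted through the Redshift Data API from Python; the cluster identifiers, database, user, datashare name, and namespace IDs are placeholder assumptions.
import boto3

rsd = boto3.client("redshift-data")

# On the producer (ETL) cluster: create the datashare and grant it to the BI cluster.
producer_sql = [
    "CREATE DATASHARE etl_share;",
    "ALTER DATASHARE etl_share ADD SCHEMA public;",
    "ALTER DATASHARE etl_share ADD ALL TABLES IN SCHEMA public;",
    "GRANT USAGE ON DATASHARE etl_share TO NAMESPACE 'consumer-namespace-id';",
]
for sql in producer_sql:
    rsd.execute_statement(
        ClusterIdentifier="etl-cluster",  # placeholder producer cluster
        Database="dev",
        DbUser="awsuser",                 # placeholder user
        Sql=sql,
    )

# On the consumer (BI) cluster: expose the share as a local database so the
# sales team can join shared tables with their own tables.
rsd.execute_statement(
    ClusterIdentifier="sales-bi-cluster",  # placeholder consumer cluster
    Database="dev",
    DbUser="awsuser",
    Sql="CREATE DATABASE etl_data FROM DATASHARE etl_share OF NAMESPACE 'producer-namespace-id';",
)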