Questions and Answers
Question POjliky7vY1cSOhizJln
Question
A company stores petabytes of data in thousands of Amazon S3 buckets in the S3 Standard storage class. The data supports analytics workloads that have unpredictable and variable data access patterns. The company does not access some data for months. However, the company must be able to retrieve all data within milliseconds. The company needs to optimize S3 storage costs. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Use S3 Storage Lens standard metrics to determine when to move objects to more cost-optimized storage classes. Create S3 Lifecycle policies for the S3 buckets to move objects to cost-optimized storage classes. Continue to refine the S3 Lifecycle policies in the future to optimize storage costs.
- B: Use S3 Storage Lens activity metrics to identify S3 buckets that the company accesses infrequently. Configure S3 Lifecycle rules to move objects from S3 Standard to the S3 Standard-Infrequent Access (S3 Standard-IA) and S3 Glacier storage classes based on the age of the data.
- C: Use S3 Intelligent-Tiering. Activate the Deep Archive Access tier.
- D: Use S3 Intelligent-Tiering. Use the default access tier.
Answer: D
Community answer: D (89%); other answers: 11%
Discussion
Comment 1177258 by GiorgioGss
- Upvotes: 12
Selected Answer: D Although C is more cost-effective, the requirement to "be able to retrieve all data within milliseconds" means I will go with D.
Comment 1247687 by andrologin
- Upvotes: 2
Selected Answer: D Based on these docs https://docs.aws.amazon.com/AmazonS3/latest/userguide/intelligent-tiering-overview.html, D is appropriate because it allows instant retrieval.
Comment 1235533 by rpwags
- Upvotes: 3
Selected Answer: D Staying with “D”… The Amazon S3 Glacier Deep Archive storage class is designed for long-term data archiving where data retrieval times are flexible. It does not offer millisecond retrieval times. Instead, data retrieval from S3 Glacier Deep Archive typically takes 12 hours or more. For millisecond retrieval times, you would use the S3 Standard, S3 Standard-IA, or S3 One Zone-IA storage classes, which are designed for frequent or infrequent access with low latency.
Comment 1200232 by raghumvj
- Upvotes: 2
Selected Answer: D I am torn between C and D.
Comment 1197928 by chris_spencer
- Upvotes: 1
Selected Answer: C C is correct.
“Amazon S3 Glacier Instant Retrieval is an archive storage class that delivers the lowest-cost storage for long-lived data that is rarely accessed and requires retrieval in milliseconds.” https://aws.amazon.com/s3/storage-classes/glacier/instant-retrieval/
Comment 1194774 by Christina666
- Upvotes: 2
Selected Answer: D Least operational overhead, so D.
Comment 1190327 by arvehisa
- Upvotes: 2
The correct answer may be D. Intelligent-Tiering's default access tiers are:
- accessed within the last 30 days: Frequent Access tier
- not accessed for 30-90 days: Infrequent Access tier
- not accessed for more than 90 days: Archive Instant Access tier
The other tiers require longer retrieval times and must be activated explicitly. https://docs.aws.amazon.com/AmazonS3/latest/userguide/intelligent-tiering-overview.html
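For context on C versus D: the archive tiers are opt-in, per-bucket Intelligent-Tiering configurations. A minimal boto3 sketch of what option C would involve is below; the bucket name, configuration ID, and 180-day threshold are illustrative assumptions. Option D simply skips this step, so every object stays in a tier with millisecond retrieval.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and configuration names, for illustration only.
# Activating DEEP_ARCHIVE_ACCESS is what option C proposes; objects moved
# there need a restore that takes hours, breaking the millisecond requirement.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="example-analytics-bucket",
    Id="archive-tiering",
    IntelligentTieringConfiguration={
        "Id": "archive-tiering",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```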
Comment 1174738 by helpaws
- Upvotes: 2
Selected Answer: C Amazon S3 Glacier Instant Retrieval is an archive storage class that delivers the lowest-cost storage for long-lived data that is rarely accessed and requires retrieval in milliseconds
Comment 1173314 by kj07
- Upvotes: 1
A few remarks: the data must be retrievable in milliseconds, which rules out the options that involve Glacier storage classes: B and C.
For D, how can you set S3 Intelligent-Tiering if the current storage class is Standard? I guess you need a lifecycle policy.
Which leaves only A as an option.
Thoughts?
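On the lifecycle question above: existing S3 Standard objects can be rolled into Intelligent-Tiering with a single lifecycle transition rule (new objects can be written with the INTELLIGENT_TIERING storage class directly). A rough boto3 sketch with a hypothetical bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name; the empty prefix filter applies the rule to all objects.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [
                    # Day-0 transition: existing Standard objects move to
                    # Intelligent-Tiering at the next lifecycle evaluation.
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"},
                ],
            }
        ]
    },
)
```

Because this is one set-and-forget rule rather than the ongoing policy tuning that option A implies, it does not add meaningful operational overhead.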
Comment 1168162 by damaldon
- Upvotes: 1
D is correct.
Comment 1165126 by Felix_G
- Upvotes: 1
Option C. Use S3 Intelligent-Tiering. Activate the Deep Archive Access tier.
By using S3 Intelligent-Tiering and activating the Deep Archive Access tier, the company can optimize S3 storage costs with minimal operational overhead. S3 Intelligent-Tiering automatically moves objects between four access tiers, including the Deep Archive Access tier, based on changing access patterns and cost optimization. This eliminates the need for manual lifecycle policies and constant refinement, as the storage class is adjusted automatically based on data access patterns, resulting in cost savings while ensuring quick access to all data when needed.
Comment 1138543 by rralucard_
- Upvotes: 3
Selected Answer: D Option D, using S3 Intelligent-Tiering with the default access tier, will meet the requirements best. It provides a hands-off approach to storage cost optimization while ensuring that data is available for analytics workloads within the required timeframe.
Question kF7JkGwSBIWPnFZRnYHI
Question
During a security review, a company identified a vulnerability in an AWS Glue job. The company discovered that credentials to access an Amazon Redshift cluster were hard coded in the job script. A data engineer must remediate the security vulnerability in the AWS Glue job. The solution must securely store the credentials. Which combination of steps should the data engineer take to meet these requirements? (Choose two.)
Choices
- A: Store the credentials in the AWS Glue job parameters.
- B: Store the credentials in a configuration file that is in an Amazon S3 bucket.
- C: Access the credentials from a configuration file that is in an Amazon S3 bucket by using the AWS Glue job.
- D: Store the credentials in AWS Secrets Manager.
- E: Grant the AWS Glue job IAM role access to the stored credentials.
Answer: DE
Community answer: DE (100%)
Discussion
Comment 1177260 by GiorgioGss
- Upvotes: 9
Selected Answer: DE D because it is the AWS best practice for securing credentials, and E because after you put the credentials in Secrets Manager, the job still needs permissions to access them.
Comment 1168164 by damaldon
- Upvotes: 1
Ans: DE
Comment 1138547 by rralucard_
- Upvotes: 3
Selected Answer: DE D. Store the credentials in AWS Secrets Manager: AWS Secrets Manager is a service that helps you protect access to your applications, services, and IT resources without the upfront investment and ongoing maintenance costs of operating your own infrastructure. It is specifically designed for storing and retrieving credentials securely, and therefore it is an appropriate choice for handling the Redshift cluster credentials.
E. Grant the AWS Glue job IAM role access to the stored credentials: IAM roles for AWS Glue will allow the job to assume a role with the necessary permissions to access the credentials in AWS Secrets Manager. This method avoids embedding credentials directly in the script or a configuration file and allows for centralized management of the credentials.
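As a concrete illustration of D and E combined, a Glue job script can fetch the Redshift credentials at runtime instead of hard coding them. A minimal sketch, assuming a hypothetical secret name and JSON keys; the job's IAM role must be granted secretsmanager:GetSecretValue on that secret (step E):

```python
import json

import boto3


def get_redshift_credentials(secret_id: str = "redshift/analytics-cluster") -> dict:
    """Fetch Redshift credentials from AWS Secrets Manager (secret name is hypothetical)."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    # The secret value is a JSON document of key/value pairs.
    return json.loads(response["SecretString"])


creds = get_redshift_credentials()
# The Glue connection or JDBC options would then use creds["username"] and
# creds["password"] instead of values embedded in the script.
```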
Question sMTE6fsGvzkykm9R4oKT
Question
A data engineer uses Amazon Redshift to run resource-intensive analytics processes once every month. Every month, the data engineer creates a new Redshift provisioned cluster. The data engineer deletes the Redshift provisioned cluster after the analytics processes are complete every month. Before the data engineer deletes the cluster each month, the data engineer unloads backup data from the cluster to an Amazon S3 bucket. The data engineer needs a solution to run the monthly analytics processes that does not require the data engineer to manage the infrastructure manually. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Use Amazon Step Functions to pause the Redshift cluster when the analytics processes are complete and to resume the cluster to run new processes every month.
- B: Use Amazon Redshift Serverless to automatically process the analytics workload.
- C: Use the AWS CLI to automatically process the analytics workload.
- D: Use AWS CloudFormation templates to automatically process the analytics workload.
Answer: B
Community answer: B (100%)
Discussion
Comment 1194785 by Christina666
- Upvotes: 5
Selected Answer: B
- Fully managed, serverless: Redshift Serverless eliminates the need to manually create, manage, or delete clusters. It automatically scales resources based on the workload, reducing operational overhead significantly.
- Cost-effective for infrequent workloads: since the analytics processes run only once a month, Redshift Serverless's pay-per-use model is ideal for minimizing costs during downtime.
- Seamless S3 integration: Redshift Serverless natively integrates with S3 for backup and restore operations, ensuring compatibility with the existing process.
Comment 1219522 by 4c78df0
- Upvotes: 1
Selected Answer: B B is correct
Comment 1177263 by GiorgioGss
- Upvotes: 2
Selected Answer: B “does not require to manage the infrastructure manually” = Serverless
Comment 1168183 by damaldon
- Upvotes: 1
Ans. B
Comment 1165128 by Felix_G
- Upvotes: 3
Selected Answer: B Options A, C, and D still involve manual tasks such as administering CloudFormation stacks, running AWS CLI commands, or configuring Step Functions state machines.
By leveraging Redshift Serverless, the data engineer avoids all cluster and infrastructure administration effort. This has the least operational overhead for running the monthly analytics processes.
Comment 1138549 by rralucard_
- Upvotes: 2
Selected Answer: B Use Amazon Redshift Serverless. This option allows the data engineer to focus on the analytics processes themselves without worrying about cluster provisioning, scaling, or management. It provides an on-demand, serverless solution that can handle variable workloads and is cost-effective for intermittent and irregular processing needs like those described.
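To sketch how little infrastructure handling is left with option B: the monthly job can simply call the Redshift Data API against a serverless workgroup, for example from a scheduled Lambda function or EventBridge target. The workgroup, database, and stored procedure names below are assumptions for illustration.

```python
import boto3

# Hypothetical workgroup, database, and procedure names. With Redshift
# Serverless there is no cluster to create, pause, or delete; compute is
# provisioned on demand when the statement runs.
client = boto3.client("redshift-data")

response = client.execute_statement(
    WorkgroupName="monthly-analytics",
    Database="analytics",
    # The monthly analytics SQL; the S3 backup UNLOAD statement could be
    # submitted the same way.
    Sql="CALL run_monthly_analytics();",
)
print("Statement submitted:", response["Id"])
```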
Question MlyJ2jo9DsR0Arvs0ms1
Question
A company receives a daily file that contains customer data in .xls format. The company stores the file in Amazon S3. The daily file is approximately 2 GB in size. A data engineer concatenates the column in the file that contains customer first names and the column that contains customer last names. The data engineer needs to determine the number of distinct customers in the file. Which solution will meet this requirement with the LEAST operational effort?
Choices
- A: Create and run an Apache Spark job in an AWS Glue notebook. Configure the job to read the S3 file and calculate the number of distinct customers.
- B: Create an AWS Glue crawler to create an AWS Glue Data Catalog of the S3 file. Run SQL queries from Amazon Athena to calculate the number of distinct customers.
- C: Create and run an Apache Spark job in Amazon EMR Serverless to calculate the number of distinct customers.
- D: Use AWS Glue DataBrew to create a recipe that uses the COUNT_DISTINCT aggregate function to calculate the number of distinct customers.
Answer: D
Community answer: D (100%)
Discussion
Comment 1138552 by rralucard_
- Upvotes: 9
Selected Answer: D AWS Glue DataBrew: AWS Glue DataBrew is a visual data preparation tool that allows data engineers and data analysts to clean and normalize data without writing code. Using DataBrew, a data engineer could create a recipe that includes the concatenation of the customer first and last names and then use the COUNT_DISTINCT function. This would not require complex code and could be performed through the DataBrew user interface, representing a lower operational effort.
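For reference, the computation the recipe performs is straightforward. The sketch below reproduces it in plain Python; it is not the DataBrew API, and the bucket, key, and column names are assumptions.

```python
import boto3
import pandas as pd

# Hypothetical bucket, key, and column names; this mirrors the DataBrew
# recipe logic (concatenate first and last names, then count distinct).
s3 = boto3.client("s3")
s3.download_file("example-customer-bucket", "daily/customers.xls", "/tmp/customers.xls")

# Reading legacy .xls files with pandas requires the xlrd engine (pip install xlrd).
df = pd.read_excel("/tmp/customers.xls")

df["full_name"] = df["first_name"].str.strip() + " " + df["last_name"].str.strip()
distinct_customers = df["full_name"].nunique()
print("Distinct customers:", distinct_customers)
```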
Comment 1228711 by pypelyncar
- Upvotes: 2
Selected Answer: D DataBrew supports various transformations, including the COUNT_DISTINCT function, which is ideal for calculating the number of unique values in a column (combined first and last names in this case).
Comment 1197192 by Ousseyni
- Upvotes: 2
Selected Answer: D I'd go with D.
Comment 1188250 by lucas_rfsb
- Upvotes: 2
Selected Answer: D Since it requires the least operational effort, I would go with D.
Question tyDYP36owfIY0DtoHaYv
Question
A healthcare company uses Amazon Kinesis Data Streams to stream real-time health data from wearable devices, hospital equipment, and patient records. A data engineer needs to find a solution to process the streaming data. The data engineer needs to store the data in an Amazon Redshift Serverless warehouse. The solution must support near real-time analytics of the streaming data and the previous day’s data. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Load data into Amazon Kinesis Data Firehose. Load the data into Amazon Redshift.
- B: Use the streaming ingestion feature of Amazon Redshift.
- C: Load the data into Amazon S3. Use the COPY command to load the data into Amazon Redshift.
- D: Use the Amazon Aurora zero-ETL integration with Amazon Redshift.
Answer: B
Community answer: B (100%)
Discussion
Comment 1139892 by rralucard_
- Upvotes: 7
Selected Answer: B https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html Use the Streaming Ingestion Feature of Amazon Redshift: Amazon Redshift recently introduced streaming data ingestion, allowing Redshift to consume data directly from Kinesis Data Streams in near real-time. This feature simplifies the architecture by eliminating the need for intermediate steps or services, and it is specifically designed to support near real-time analytics. The operational overhead is minimal since the feature is integrated within Redshift.
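A minimal sketch of what the streaming ingestion setup looks like, submitted here through the Redshift Data API so it runs against a serverless workgroup. The workgroup, database, IAM role, and stream names are assumptions; the two SQL statements follow the pattern in the linked documentation (an external schema mapping the Kinesis stream, plus an auto-refreshing materialized view).

```python
import boto3

# Hypothetical workgroup, database, IAM role, and stream names.
client = boto3.client("redshift-data")

statements = [
    """
    CREATE EXTERNAL SCHEMA kinesis_health
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole';
    """,
    """
    CREATE MATERIALIZED VIEW health_events_mv AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(kinesis_data) AS payload
    FROM kinesis_health."health-data-stream"
    WHERE CAN_JSON_PARSE(kinesis_data);
    """,
]

for sql in statements:
    client.execute_statement(
        WorkgroupName="health-analytics",
        Database="dev",
        Sql=sql,
    )
```

Near real-time queries then read the materialized view directly, and the previous day's data is just a time-based filter on the same view.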
Comment 1304366 by ssnei
- Upvotes: 1
option B
Comment 1219527 by 4c78df0
- Upvotes: 1
Selected Answer: B B is correct
Comment 1188254 by lucas_rfsb
- Upvotes: 1
Selected Answer: B I'd go with B.