Questions and Answers
Question STisj3OFt3RFwY0AQ2mf
Question
A company needs to load customer data that comes from a third party into an Amazon Redshift data warehouse. The company stores order data and product data in the same data warehouse. The company wants to use the combined dataset to identify potential new customers.
A data engineer notices that one of the fields in the source data includes values that are in JSON format.
How should the data engineer load the JSON data into the data warehouse with the LEAST effort?
Choices
- A: Use the SUPER data type to store the data in the Amazon Redshift table.
- B: Use AWS Glue to flatten the JSON data and ingest it into the Amazon Redshift table.
- C: Use Amazon S3 to store the JSON data. Use Amazon Athena to query the data.
- D: Use an AWS Lambda function to flatten the JSON data. Store the data in Amazon S3.
Answer: A (Answer_ET: A). Community answer: A (100%).
Discussion
Comment 1330756 by HagarTheHorrible
- Upvotes: 2
Selected Answer: A The SUPER data type in Amazon Redshift allows you to store semi-structured data such as JSON directly in a Redshift table without the need to flatten or transform the data first.
Comment 1307097 by kupo777
- Upvotes: 3
A is correct.
https://docs.aws.amazon.com/redshift/latest/dg/super-overview.html
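To make the SUPER approach concrete, here is a minimal sketch using the Redshift Data API via boto3. The cluster identifier, database, user, and table names are hypothetical; JSON_PARSE() stores the raw JSON string as SUPER data that PartiQL dot notation can query without any flattening step.

```python
import boto3

client = boto3.client("redshift-data")

statements = [
    # A SUPER column accepts the JSON field as-is -- no flattening needed.
    """CREATE TABLE customer_staging (
           customer_id BIGINT,
           raw_profile SUPER
       );""",
    # JSON_PARSE() converts the JSON text into SUPER on ingest.
    """INSERT INTO customer_staging
       VALUES (1, JSON_PARSE('{"name": "Ann", "segment": "retail"}'));""",
    # PartiQL dot notation navigates the JSON directly in SQL.
    "SELECT raw_profile.segment FROM customer_staging;",
]

for sql in statements:
    client.execute_statement(
        ClusterIdentifier="my-cluster",  # hypothetical cluster
        Database="dev",                  # hypothetical database
        DbUser="awsuser",                # hypothetical user
        Sql=sql,
    )
```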
Question p5Nx7wJQnTJDEeUmM0TW
Question
A company wants to analyze sales records that the company stores in a MySQL database. The company wants to correlate the records with sales opportunities identified by Salesforce.
The company receives 2 GB of sales records every day. The company has 100 GB of identified sales opportunities. A data engineer needs to develop a process that will analyze and correlate sales records and sales opportunities. The process must run once each night.
Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to fetch both datasets. Use AWS Lambda functions to correlate the datasets. Use AWS Step Functions to orchestrate the process.
- B: Use Amazon AppFlow to fetch sales opportunities from Salesforce. Use AWS Glue to fetch sales records from the MySQL database. Correlate the sales records with the sales opportunities. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the process.
- C: Use Amazon AppFlow to fetch sales opportunities from Salesforce. Use AWS Glue to fetch sales records from the MySQL database. Correlate the sales records with sales opportunities. Use AWS Step Functions to orchestrate the process.
- D: Use Amazon AppFlow to fetch sales opportunities from Salesforce. Use Amazon Kinesis Data Streams to fetch sales records from the MySQL database. Use Amazon Managed Service for Apache Flink to correlate the datasets. Use AWS Step Functions to orchestrate the process.
Answer: C (Answer_ET: C). Community answer: C (83%), B (17%).
Discussion
Comment 1330759 by HagarTheHorrible
- Upvotes: 5
Selected Answer: C AppFlow to get the data from Salesforce, Glue for ETL, and Step Functions for orchestration: all managed, all serverless, LEAST OVERHEAD!
Comment 1331807 by Eleftheriia
- Upvotes: 1
Selected Answer: B I assume Step Functions is more on the data-processing side, with less cost and overhead, but since the question is about workflow orchestration, and that is the textbook definition of MWAA, why would someone select C over B?
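For reference, a minimal sketch of what the nightly state machine in option C could look like, written as Amazon States Language built from a Python dict. The flow name and Glue job name are hypothetical; the AppFlow step uses the generic AWS SDK integration (which starts the flow but does not wait for it), while the Glue step uses the optimized .sync integration so the state machine waits for the job to finish.

```python
import json

definition = {
    "Comment": "Nightly Salesforce/MySQL correlation (sketch)",
    "StartAt": "StartOpportunityFlow",
    "States": {
        "StartOpportunityFlow": {
            "Type": "Task",
            # AWS SDK integration: kicks off the AppFlow flow (fire-and-forget;
            # a real workflow would poll the flow's execution status).
            "Resource": "arn:aws:states:::aws-sdk:appflow:startFlow",
            "Parameters": {"FlowName": "salesforce-opportunities"},  # hypothetical
            "Next": "CorrelateSales",
        },
        "CorrelateSales": {
            "Type": "Task",
            # Optimized integration: waits for the Glue job to complete.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "correlate-sales-records"},  # hypothetical
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```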
Question aXAUEh2ywppsV5443j9d
Question
A company stores server logs in an Amazon S3 bucket. The company needs to keep the logs for 1 year. The logs are not required after 1 year.
A data engineer needs a solution to automatically delete logs that are older than 1 year.
Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Define an S3 Lifecycle configuration to delete the logs after 1 year.
- B: Create an AWS Lambda function to delete the logs after 1 year.
- C: Schedule a cron job on an Amazon EC2 instance to delete the logs after 1 year.
- D: Configure an AWS Step Functions state machine to delete the logs after 1 year.
Answer: A (Answer_ET: A). Community answer: A (100%).
Discussion
Comment 1358487 by italiancloud2025
- Upvotes: 1
Selected Answer: A A: Yes, because an S3 Lifecycle configuration automatically deletes the logs after 1 year with no manual intervention. B: No, it requires custom code and extra maintenance. C: No, it involves managing an EC2 instance and cron jobs, which adds complexity. D: No, using Step Functions adds unnecessary operational overhead for a simple task.
Comment 1330740 by HagarTheHorrible
- Upvotes: 1
Selected Answer: A Amazon S3 provides Lifecycle policies, which allow you to automate the management of objects stored in a bucket. You can configure a rule to automatically delete objects older than a specified age.
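The lifecycle rule both comments describe is a one-time configuration. A minimal boto3 sketch, assuming a bucket named server-logs and a logs/ key prefix (both hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# One rule: expire (delete) matching objects 365 days after creation.
# S3 applies it automatically from then on -- no code or cron to maintain.
s3.put_bucket_lifecycle_configuration(
    Bucket="server-logs",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "delete-logs-after-1-year",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},  # hypothetical prefix
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```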
Question DFX3YMoq7MDaAYt1BUFI
Question
A company is designing a serverless data processing workflow in AWS Step Functions that involves multiple steps. The processing workflow ingests data from an external API, transforms the data by using multiple AWS Lambda functions, and loads the transformed data into Amazon DynamoDB.
The company needs the workflow to perform specific steps based on the content of the incoming data.
Which Step Functions state type should the company use to meet this requirement?
Choices
- A: Parallel
- B: Choice
- C: Task
- D: Map
Answer: B (Answer_ET: B). Community answer: B (100%).
Discussion
Comment 1341210 by MerryLew
- Upvotes: 1
Selected Answer: B Choice adds conditional logic, i.e., branching based on the content of the incoming data.
Comment 1330742 by HagarTheHorrible
- Upvotes: 1
Selected Answer: B If the next step depends on what came before it, that is the Choice state.
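As an illustration, here is what a Choice state might look like in Amazon States Language (built as a Python dict). The input field recordType and the target state names are hypothetical:

```python
import json

choice_state = {
    "Type": "Choice",
    # Each rule inspects the state input; the first match wins.
    "Choices": [
        {
            "Variable": "$.recordType",   # hypothetical input field
            "StringEquals": "order",
            "Next": "TransformOrder",     # hypothetical Lambda task state
        },
        {
            "Variable": "$.recordType",
            "StringEquals": "customer",
            "Next": "TransformCustomer",
        },
    ],
    # Fallback when no rule matches.
    "Default": "HandleUnknownRecord",
}

print(json.dumps(choice_state, indent=2))
```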
Question i8SAYg4VG6Qo2kb1d0wl
Question
A data engineer created a table named cloudtrail_logs in Amazon Athena to query AWS CloudTrail logs and prepare data for audits. The data engineer needs to write a query to display errors with error codes that have occurred since the beginning of 2024. The query must return the 10 most recent errors.
Which query will meet these requirements?
Choices
- A: select count(*) as TotalEvents, eventname, errorcode, errormessage from cloudtrail_logs where errorcode is not null and eventtime >= '2024-01-01T00:00:00Z' group by eventname, errorcode, errormessage order by TotalEvents desc limit 10;
- B: select count(*) as TotalEvents, eventname, errorcode, errormessage from cloudtrail_logs where eventtime >= '2024-01-01T00:00:00Z' group by eventname, errorcode, errormessage order by TotalEvents desc limit 10;
- C: select count(*) as TotalEvents, eventname, errorcode, errormessage from cloudtrail_logs where eventtime >= '2024-01-01T00:00:00Z' group by eventname, errorcode, errormessage order by eventname asc limit 10;
- D: select count(*) as TotalEvents, eventname, errorcode, errormessage from cloudtrail_logs where errorcode is not null and eventtime >= '2024-01-01T00:00:00Z' group by eventname, errorcode, errormessage limit 10;
Answer: A (Answer_ET: A). Community answer: A (67%), B (33%).
Discussion
Comment 1399257 by Ramdi1
- Upvotes: 2
Selected Answer: A Why option A is correct: WHERE errorcode IS NOT NULL filters out successful events, keeping only errors; AND eventtime >= '2024-01-01T00:00:00Z' restricts the scan to logs since the beginning of 2024; GROUP BY eventname, errorcode, errormessage aggregates error occurrences per event; ORDER BY TotalEvents DESC sorts by number of occurrences, so the most frequent errors appear first; LIMIT 10 caps the output at 10 rows.
Comment 1363725 by simon2133
- Upvotes: 2
Selected Answer: A Same as B, but including the filter requiring errorcode to be set.
Comment 1355077 by fnuuu
- Upvotes: 1
Selected Answer: B The query in option B is correct, with 'desc' order.
Comment 1344818 by A_E_M
- Upvotes: 1
Selected Answer: B This is not the same query, but it shows the important point: descending order is the correct answer. SELECT * FROM cloudtrail_logs WHERE eventTime >= '2024-01-01' AND errorCode IS NOT NULL ORDER BY eventTime DESC LIMIT 10;
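For readability, here is option A reformatted and submitted through the Athena API with boto3. The database name and results location are hypothetical; note that ORDER BY TotalEvents DESC ranks errors by how often they occurred, not by timestamp.

```python
import boto3

athena = boto3.client("athena")

# Option A, reformatted: only error events since the start of 2024,
# grouped and ranked by occurrence count, capped at 10 rows.
query = """
SELECT count(*) AS TotalEvents, eventname, errorcode, errormessage
FROM cloudtrail_logs
WHERE errorcode IS NOT NULL
  AND eventtime >= '2024-01-01T00:00:00Z'
GROUP BY eventname, errorcode, errormessage
ORDER BY TotalEvents DESC
LIMIT 10;
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "default"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)
```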