Questions and Answers
Question s9ZU6ZSRb3BwfdA3XaN5
Question
A retail company is expanding its operations globally. The company needs to use Amazon QuickSight to accurately calculate currency exchange rates for financial reports. The company has an existing dashboard that includes a visual that is based on an analysis of a dataset that contains global currency values and exchange rates.
A data engineer needs to ensure that exchange rates are calculated with a precision of four decimal places. The calculations must be precomputed. The data engineer must materialize the results in QuickSight's Super-fast, Parallel, In-memory Calculation Engine (SPICE).
Which solution will meet these requirements?
Choices
- A: Define and create the calculated field in the dataset.
- B: Define and create the calculated field in the analysis.
- C: Define and create the calculated field in the visual.
- D: Define and create the calculated field in the dashboard.
answer?
Answer: A Answer_ET: A Community answer A (100%) Discussion
Comment 1358485 by italiancloud2025
- Upvotes: 1
Selected Answer: A A: Yes, because creating the calculated field in the dataset precomputes the values and materializes them in SPICE, ensuring precision and speed. B: No, because calculated fields in the analysis are computed at query time, not precomputed. C: No, because calculated fields in the visual are generated at render time and are not stored in SPICE. D: No, because calculated fields in the dashboard are not precomputed in SPICE either.
Comment 1317293 by emupsx1
- Upvotes: 2
Selected Answer: A https://docs.aws.amazon.com/quicksight/latest/user/adding-a-calculated-field-analysis.html
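As an illustration of answer A, here is a minimal boto3 sketch that adds a dataset-level calculated field rounded to four decimal places, so the value is computed during SPICE ingestion rather than at query time. The account ID, dataset ID, data source ARN, and column names are placeholders, and the PhysicalTableMap is reduced to just enough to show the idea.

```python
# Hypothetical sketch: add a dataset-level calculated field so QuickSight
# precomputes the converted amount and materializes it in SPICE.
# Account ID, ARNs, IDs, and column names below are placeholders.
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

quicksight.update_data_set(
    AwsAccountId="111122223333",             # placeholder account
    DataSetId="global-currency-dataset",     # existing dataset ID (assumed)
    Name="global-currency-dataset",
    ImportMode="SPICE",                      # materialize results in SPICE
    PhysicalTableMap={
        "currency-table": {
            "RelationalTable": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:111122223333:datasource/currency-src",
                "Name": "currency_values",
                "InputColumns": [
                    {"Name": "amount", "Type": "DECIMAL"},
                    {"Name": "exchange_rate", "Type": "DECIMAL"},
                ],
            }
        }
    },
    LogicalTableMap={
        "currency-logical": {
            "Alias": "currency_values",
            "Source": {"PhysicalTableId": "currency-table"},
            "DataTransforms": [
                {
                    "CreateColumnsOperation": {
                        "Columns": [
                            {
                                "ColumnName": "converted_amount",
                                "ColumnId": "converted-amount-col",
                                # Round to four decimal places at the dataset level
                                "Expression": "round({amount} * {exchange_rate}, 4)",
                            }
                        ]
                    }
                }
            ],
        }
    },
)
```

By contrast, a calculated field defined in an analysis, visual, or dashboard is evaluated at query or render time and is not materialized in SPICE, which is why options B, C, and D do not satisfy the precomputation requirement.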
Question SlSnhzKaZhFQ9RtXZC7D
Question
A company has three subsidiaries. Each subsidiary uses a different data warehousing solution. The first subsidiary hosts its data warehouse in Amazon Redshift. The second subsidiary uses Teradata Vantage on AWS. The third subsidiary uses Google BigQuery.
The company wants to aggregate all the data into a central Amazon S3 data lake. The company wants to use Apache Iceberg as the table format.
A data engineer needs to build a new pipeline to connect to all the data sources, run transformations by using each source engine, join the data, and write the data to Iceberg.
Which solution will meet these requirements with the LEAST operational effort?
Choices
- A: Use native Amazon Redshift, Teradata, and BigQuery connectors to build the pipeline in AWS Glue. Use native AWS Glue transforms to join the data. Run a Merge operation on the data lake Iceberg table.
- B: Use the Amazon Athena federated query connectors for Amazon Redshift, Teradata, and BigQuery to build the pipeline in Athena. Write a SQL query to read from all the data sources, join the data, and run a Merge operation on the data lake Iceberg table.
- C: Use the native Amazon Redshift connector, the Java Database Connectivity (JDBC) connector for Teradata, and the open source Apache Spark BigQuery connector to build the pipeline in Amazon EMR. Write code in PySpark to join the data. Run a Merge operation on the data lake Iceberg table.
- D: Use the native Amazon Redshift, Teradata, and BigQuery connectors in Amazon AppFlow to write data to Amazon S3 and the AWS Glue Data Catalog. Use Amazon Athena to join the data. Run a Merge operation on the data lake Iceberg table.
answer?
Answer: B Answer_ET: B Community answer B (50%) A (42%) 8% Discussion
Comment 1426531 by bad1ccc
- Upvotes: 1
Selected Answer: B https://docs.aws.amazon.com/athena/latest/ug/federated-queries.html
Comment 1399527 by Palee
- Upvotes: 1
Selected Answer: D The requirement is to aggregate the data in S3. Only option D has explicitly called this out, so answer D is correct.
Comment 1341172 by MerryLew
- Upvotes: 1
Selected Answer: A Athena can be used to build certain types of data pipelines, particularly when the primary focus is ad-hoc analysis and querying large datasets stored in S3 without complex data transformations. For more intricate data processing and heavy ETL operations, other AWS services like Glue are often more suitable due to their dedicated data processing capabilities.
Comment 1339738 by Eeshav15
- Upvotes: 1
Selected Answer: A Glue is the right tool to build the pipeline.
Comment 1312689 by michele_scar
- Upvotes: 2
Selected Answer: B https://docs.aws.amazon.com/athena/latest/ug/connectors-available.html
Comment 1311830 by Eleftheriia
- Upvotes: 3
Selected Answer: B Would it be B “If you have data in sources other than Amazon S3, you can use Athena Federated Query to query the data in place or build pipelines that extract data from multiple data sources and store them in Amazon S3. With Athena Federated Query, you can run SQL queries across data stored in relational, non-relational, object, and custom data sources.”
https://docs.aws.amazon.com/athena/latest/ug/connect-to-a-data-source.html
Comment 1307370 by kupo777
- Upvotes: 3
Correct Answer: B
Use the Amazon Athena federated query connectors for Amazon Redshift, Teradata, and BigQuery to build the pipeline in Athena. Write a SQL query to read from all the data sources, join the data, and run a Merge operation on the data lake Iceberg table.
Comment 1303981 by ae35a02
- Upvotes: 3
Selected Answer: A AWS Glue has native connectors to Redshift, BigQuery, and Teradata, and integrates with the Iceberg format. Athena is not for building pipelines, and AppFlow is for transferring data from SaaS applications.
Comment 1303481 by Parandhaman_Margan
- Upvotes: 2
Answer: A
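To make option B concrete, the sketch below submits a single Athena query that reads from three federated connectors, joins the results, and runs a MERGE into an Iceberg table. The connector catalog names (redshift_cat, teradata_cat, bigquery_cat), table and column names, workgroup, and S3 output location are all assumptions; it presumes the federated data sources are already registered and the workgroup runs Athena engine version 3, which supports Iceberg DML.

```python
# Hypothetical sketch of option B: one Athena SQL statement that reads from the
# three federated connectors, joins the data, and MERGEs into an Iceberg table.
# Catalog names, table names, and columns are placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

merge_sql = """
MERGE INTO awsdatacatalog.lake.sales_iceberg AS t
USING (
    SELECT r.order_id, r.amount, td.region, bq.channel
    FROM redshift_cat.public.orders r
    JOIN teradata_cat.sales.order_regions td ON r.order_id = td.order_id
    JOIN bigquery_cat.marketing.order_channels bq ON r.order_id = bq.order_id
) AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET amount = s.amount, region = s.region, channel = s.channel
WHEN NOT MATCHED THEN INSERT (order_id, amount, region, channel)
    VALUES (s.order_id, s.amount, s.region, s.channel)
"""

athena.start_query_execution(
    QueryString=merge_sql,
    WorkGroup="primary",  # assumes a workgroup on Athena engine v3 for Iceberg DML
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},  # placeholder bucket
)
```

Because the federated connectors push the source-side work down to each engine and the join plus MERGE run as one SQL statement, this keeps the pipeline serverless and avoids writing and maintaining Spark code, which is the operational-effort argument for B over A and C.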
Question hNX977I3u2ABaqOSSfm9
Question
A company is building a data stream processing application. The application runs in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. The application stores processed data in an Amazon DynamoDB table.
The company needs the application containers in the EKS cluster to have secure access to the DynamoDB table. The company does not want to embed AWS credentials in the containers.
Which solution will meet these requirements?
Choices
- A: Store the AWS credentials in an Amazon S3 bucket. Grant the EKS containers access to the S3 bucket to retrieve the credentials.
- B: Attach an IAM role to the EKS worker nodes. Grant the IAM role access to DynamoDB. Use the IAM role to set up IAM Roles for Service Accounts (IRSA) functionality.
- C: Create an IAM user that has an access key to access the DynamoDB table. Use environment variables in the EKS containers to store the IAM user access key data.
- D: Create an IAM user that has an access key to access the DynamoDB table. Use Kubernetes secrets that are mounted in a volume of the EKS cluster nodes to store the user access key data.
answer?
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1317296 by emupsx1
- Upvotes: 1
Selected Answer: B https://docs.aws.amazon.com/eks/latest/userguide/create-node-role.html
Comment 1316514 by jacob_nz
- Upvotes: 1
Selected Answer: B https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
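The IRSA pattern behind answer B can be sketched as follows: create an IAM role whose trust policy lets the cluster's OIDC provider assume it only for a specific Kubernetes service account, attach a DynamoDB policy, and annotate the service account with the role ARN. The account ID, OIDC provider ID, namespace, service account name, and attached policy below are all placeholders.

```python
# Hypothetical IRSA sketch: create an IAM role that the EKS cluster's OIDC
# provider can assume on behalf of one Kubernetes service account, then grant
# it DynamoDB access. OIDC IDs, namespace, and names are placeholders.
import json
import boto3

iam = boto3.client("iam")

oidc_provider = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::111122223333:oidc-provider/{oidc_provider}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Only the stream-processor service account may assume the role
                    f"{oidc_provider}:sub": "system:serviceaccount:stream-app:stream-processor",
                    f"{oidc_provider}:aud": "sts.amazonaws.com",
                }
            },
        }
    ],
}

role = iam.create_role(
    RoleName="eks-stream-processor-dynamodb",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Placeholder managed policy shown for brevity; a least-privilege inline policy
# scoped to the single table would be preferable in practice.
iam.attach_role_policy(
    RoleName="eks-stream-processor-dynamodb",
    PolicyArn="arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
)

# The Kubernetes service account would then carry the annotation:
#   eks.amazonaws.com/role-arn: <role ARN returned by create_role>
print(role["Role"]["Arn"])
```

With IRSA, the application pods receive short-lived credentials via the web identity token, so no AWS access keys are embedded in the containers, which rules out options A, C, and D.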
Question jUDcrdWkegc5Gtu2XjoQ
Question
A data engineer needs to onboard a new data producer into AWS. The data producer needs to migrate data products to AWS.
The data producer maintains many data pipelines that support a business application. Each pipeline must have service accounts and their corresponding credentials. The data engineer must establish a secure connection from the data producer’s on-premises data center to AWS. The data engineer must not use the public internet to transfer data from an on-premises data center to AWS.
Which solution will meet these requirements?
Choices
- A: Instruct the new data producer to create Amazon Machine Images (AMIs) on Amazon Elastic Container Service (Amazon ECS) to store the code base of the application. Create security groups in a public subnet that allow connections only to the on-premises data center.
- B: Create an AWS Direct Connect connection to the on-premises data center. Store the service account credentials in AWS Secrets Manager.
- C: Create a security group in a public subnet. Configure the security group to allow only connections from the CIDR blocks that correspond to the data producer. Create Amazon S3 buckets that contain presigned URLs that have one-day expiration dates.
- D: Create an AWS Direct Connect connection to the on-premises data center. Store the application keys in AWS Secrets Manager. Create Amazon S3 buckets that contain presigned URLs that have one-day expiration dates.
answer?
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1362734 by Ell89
- Upvotes: 1
Selected Answer: B B. All the others contain partial nonsense.
Comment 1341179 by MerryLew
- Upvotes: 1
Selected Answer: B For secure connections without cost constraints, always think Direct Connect.
Comment 1317300 by emupsx1
- Upvotes: 2
Selected Answer: B Direct Connect + Secret Manager
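The Secrets Manager half of answer B might look like the sketch below, which stores one pipeline's service account credentials and shows how a pipeline retrieves them at runtime. The secret name and values are placeholders, and the Direct Connect link itself is a network-provisioning task outside the scope of a code snippet.

```python
# Hypothetical sketch: store one pipeline's service account credentials in
# AWS Secrets Manager (option B). The secret name and values are placeholders.
import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

secrets.create_secret(
    Name="data-producer/pipeline-orders/service-account",
    Description="Service account credentials for the orders pipeline",
    SecretString=json.dumps(
        {
            "username": "svc_orders_pipeline",       # placeholder
            "password": "REPLACE_WITH_REAL_SECRET",  # never hard-code real values
        }
    ),
)

# Pipelines retrieve the credentials at runtime instead of embedding them:
value = secrets.get_secret_value(SecretId="data-producer/pipeline-orders/service-account")
credentials = json.loads(value["SecretString"])
```

Direct Connect keeps the data transfer off the public internet, and Secrets Manager gives each pipeline's service account a managed, rotatable credential store, which is why B beats the presigned-URL and public-subnet variants.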
Question AIadvPaUTXGHkEUE739o
Question
A data engineer configured an AWS Glue Data Catalog for data that is stored in Amazon S3 buckets. The data engineer needs to configure the Data Catalog to receive incremental updates.
The data engineer sets up event notifications for the S3 bucket and creates an Amazon Simple Queue Service (Amazon SQS) queue to receive the S3 events.
Which combination of steps should the data engineer take to meet these requirements with the LEAST operational overhead? (Choose two.)
Choices
- A: Create an S3 event-based AWS Glue crawler to consume events from the SQS queue.
- B: Define a time-based schedule to run the AWS Glue crawler, and perform incremental updates to the Data Catalog.
- C: Use an AWS Lambda function to directly update the Data Catalog based on S3 events that the SQS queue receives.
- D: Manually initiate the AWS Glue crawler to perform updates to the Data Catalog when there is a change in the S3 bucket.
- E: Use AWS Step Functions to orchestrate the process of updating the Data Catalog based on S3 events that the SQS queue receives.
answer?
Answer: AC Answer_ET: AC Community answer AC (54%) AB (38%) 8% Discussion
Comment 1362736 by Ell89
- Upvotes: 1
Selected Answer: AC
- A leverages the event-driven capability of Glue crawlers.
- C uses AWS Lambda for direct and real-time updates to the Data Catalog.
- This combination ensures incremental updates are made only when changes occur, reducing costs and operational complexity.
Comment 1348841 by YUICH
- Upvotes: 1
Selected Answer: AB (A) S3 Event-Based Crawler: Automatically triggers incremental catalog updates whenever new data arrives in the S3 bucket, reducing the need for custom code and manual intervention.
(B) Time-Based Schedule: Periodically runs the crawler to catch any missed events and keep the data catalog accurate and up to date.
Using both methods minimizes operational overhead while ensuring comprehensive and reliable incremental updates.
Comment 1331957 by axantroff
- Upvotes: 1
Selected Answer: AB Check out the design pattern documentation for this case. There’s no need for Lambda here, so option C should be excluded. Option B seems viable, along with option A (A is the obvious choice for me).
https://aws.amazon.com/blogs/big-data/run-aws-glue-crawlers-using-amazon-s3-event-notifications/
Comment 1312692 by michele_scar
- Upvotes: 3
Selected Answer: AC B and D are wrong due to "scheduling" and "manually", respectively. E is too much for this use case.
Comment 1308007 by tucobbad
- Upvotes: 3
Selected Answer: AC
Option A suggests creating an S3 event-based AWS Glue crawler to consume events from the SQS queue. This option is appropriate as it allows the crawler to automatically respond to events, thereby reducing manual intervention and ensuring timely updates to the Data Catalog
Option C involves using an AWS Lambda function to directly update the Data Catalog based on S3 events received from the SQS queue. This is a strong candidate as it automates the update process without the need for manual scheduling or intervention, thus minimizing operational overhead. AWS Glue Crawlers can consume events from an SQS queue: https://docs.aws.amazon.com/glue/latest/dg/crawler-s3-event-notifications.html
Comment 1305400 by pikuantne
- Upvotes: 3
Selected Answer: AB Based on this article (Option 1 for the architecture) it should be AB:
- Run the crawler on a schedule.
- Crawler polls for object-create events in the SQS queue.
- If there are events, the crawler updates the Data Catalog; if not, the crawler stops.
Comment 1303992 by ae35a02
- Upvotes: 1
Selected Answer: BC AWS Glue crawlers cannot consume events from an SQS queue. D introduces a manual operation. E introduces more complexity. So BC.
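A minimal sketch of option A, assuming the S3 event notifications already flow into the SQS queue: the crawler is created with an EventQueueArn on its S3 target and an event-mode recrawl policy, so each run only processes the changed objects. The role ARN, queue ARN, database name, and paths are placeholders.

```python
# Hypothetical sketch of option A: an AWS Glue crawler that consumes S3 event
# notifications from an SQS queue and performs incremental Data Catalog updates.
# Role ARN, queue ARN, bucket path, and names are placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_crawler(
    Name="s3-event-incremental-crawler",
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",
    DatabaseName="data_lake",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://example-data-lake/raw/",
                # The crawler reads object-created events from this queue
                "EventQueueArn": "arn:aws:sqs:us-east-1:111122223333:s3-events-queue",
            }
        ]
    },
    # Crawl only the objects reported by S3 events instead of the full bucket
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    SchemaChangePolicy={"UpdateBehavior": "UPDATE_IN_DATABASE", "DeleteBehavior": "LOG"},
)

# The crawler can then be started on demand or on a schedule; in event mode it
# only processes the changed objects recorded in the queue.
glue.start_crawler(Name="s3-event-incremental-crawler")
```

This is the pattern the linked AWS documentation on S3 event notifications for crawlers describes; pairing it with a Lambda-based catalog update (option C) or a periodic schedule (option B) is where the community discussion diverges.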