Questions and Answers

Question Z0A01jDf4r42IOn8UQMF

Question

A company has a data lake in Amazon S3. The company uses AWS Glue to catalog data and AWS Glue Studio to implement data extract, transform, and load (ETL) pipelines.

The company needs to ensure that data quality issues are checked every time the pipelines run. A data engineer must enhance the existing pipelines to evaluate data quality rules based on predefined thresholds.

Which solution will meet these requirements with the LEAST implementation effort?

Choices

  • A: Add a new transform that is defined by a SQL query to each Glue ETL job. Use the SQL query to implement a ruleset that includes the data quality rules that need to be evaluated.
  • B: Add a new Evaluate Data Quality transform to each Glue ETL job. Use Data Quality Definition Language (DQDL) to implement a ruleset that includes the data quality rules that need to be evaluated.
  • C: Add a new custom transform to each Glue ETL job. Use the PyDeequ library to implement a ruleset that includes the data quality rules that need to be evaluated.
  • D: Add a new custom transform to each Glue ETL job. Use the Great Expectations library to implement a ruleset that includes the data quality rules that need to be evaluated.
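
To make option B concrete: in a Glue Studio-generated ETL script, the Evaluate Data Quality transform takes a DQDL ruleset as a plain string and evaluates it on every job run. The sketch below is a minimal, hypothetical example; the catalog database and table, the S3 results prefix, and the rules themselves are placeholder assumptions, and the exact `awsgluedq` import and `apply` signature can vary by Glue version.

```python
# Hypothetical sketch: a Glue ETL job step that applies the Evaluate Data
# Quality transform with a DQDL ruleset (option B). Database/table names,
# the S3 results prefix, and the rules are placeholders.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsgluedq.transforms import EvaluateDataQuality  # assumed available in the Glue job runtime
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the cataloged dataset that the existing pipeline already processes.
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="transactions"
)

# DQDL ruleset: each rule encodes a predefined threshold checked on every run.
ruleset = """
Rules = [
    RowCount > 1000,
    IsComplete "transaction_id",
    ColumnValues "amount" >= 0,
    Completeness "customer_id" > 0.95
]
"""

EvaluateDataQuality.apply(
    frame=source,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "transactions_quality_check",
        "enableDataQualityCloudWatchMetrics": True,
        "enableDataQualityResultsPublishing": True,
        "resultsS3Prefix": "s3://example-bucket/dq-results/",
    },
)

job.commit()
```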

Question 35clTfbd9Ssjg5a5kpxI

Question

A company has an application that uses a microservice architecture. The company hosts the application on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

The company wants to set up a robust monitoring system for the application. The company needs to analyze the logs from the EKS cluster and the application. The company needs to correlate the cluster’s logs with the application’s traces to identify points of failure in the whole application request flow.

Which combination of steps will meet these requirements with the LEAST development effort? (Choose two.)

Choices

  • A: Use Fluent Bit to collect logs. Use OpenTelemetry to collect traces.
  • B: Use Amazon CloudWatch to collect logs. Use Amazon Kinesis to collect traces.
  • C: Use Amazon CloudWatch to collect logs. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to collect traces.
  • D: Use Amazon OpenSearch Service to correlate the logs and traces.
  • E: Use AWS Glue to correlate the logs and traces.
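
To illustrate the trace-collection half of option A: a service running on the EKS cluster can emit OpenTelemetry spans to a collector, and the span and trace IDs can later be joined against Fluent Bit-collected logs to trace a failing request through the whole flow. The snippet below is a minimal, hypothetical sketch; the collector endpoint, service name, and span attributes are assumptions, not part of the question.

```python
# Hypothetical sketch of OpenTelemetry trace instrumentation in one microservice
# (the "collect traces" half of option A). The collector endpoint and all names
# are placeholders.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Send spans to a collector running in the cluster, which forwards them to the
# chosen trace backend.
provider = TracerProvider(resource=Resource.create({"service.name": "orders-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def handle_request(order_id: str) -> None:
    # Each request gets a span; its trace ID can be correlated with the logs
    # that Fluent Bit ships from the same pod.
    with tracer.start_as_current_span("handle_order") as span:
        span.set_attribute("order.id", order_id)
        # ... business logic ...

handle_request("example-123")
```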

Question jdTXQhxwg7OaYT0XidaW

Question

A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently.

The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database.

Which AWS service should the company use to meet these requirements?

Choices

  • A: AWS Lambda
  • B: AWS Database Migration Service (AWS DMS)
  • C: AWS Direct Connect
  • D: AWS DataSync
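
For context on option B: AWS DMS performs the data copy itself, and a full-load-plus-CDC task keeps the source database available while ongoing changes are replicated, which is what keeps application downtime minimal during cutover. The boto3 sketch below is hypothetical; all ARNs, identifiers, and the schema-selection rule are placeholders, and the DMS endpoints and replication instance are assumed to already exist.

```python
# Hypothetical sketch: creating an AWS DMS replication task with boto3 (option B).
# ARNs and identifiers are placeholders; source/target endpoints and the
# replication instance are assumed to be configured already.
import json
import boto3

dms = boto3.client("dms")

table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-finance-schema",
            "object-locator": {"schema-name": "finance", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

# "full-load-and-cdc" does the bulk copy and then replicates ongoing changes,
# minimizing downtime for the applications that use the source database.
response = dms.create_replication_task(
    ReplicationTaskIdentifier="monthly-finance-migration",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
print(response["ReplicationTask"]["Status"])
```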

Question XQSR4LhH1jh69gNTTyeK

Question

A company has a gaming application that stores data in Amazon DynamoDB tables. A data engineer needs to ingest the game data into an Amazon OpenSearch Service cluster. Data updates must occur in near real time.

Which solution will meet these requirements?

Choices

  • A: Use AWS Step Functions to periodically export data from the Amazon DynamoDB tables to an Amazon S3 bucket. Use an AWS Lambda function to load the data into Amazon OpenSearch Service.
  • B: Configure an AWS Glue job to have a source of Amazon DynamoDB and a destination of Amazon OpenSearch Service to transfer data in near real time.
  • C: Use Amazon DynamoDB Streams to capture table changes. Use an AWS Lambda function to process and update the data in Amazon OpenSearch Service.
  • D: Use a custom OpenSearch plugin to sync data from the Amazon DynamoDB tables.
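
To make option C concrete: a Lambda function subscribed to the table's DynamoDB stream receives INSERT, MODIFY, and REMOVE records and can mirror them into an OpenSearch Service index within seconds. The handler below is a simplified, hypothetical sketch; the domain endpoint and index name are placeholders, and request signing (e.g., SigV4) is omitted for brevity.

```python
# Hypothetical sketch of option C: a Lambda handler fed by a DynamoDB stream
# that mirrors item changes into an OpenSearch Service index in near real time.
# Endpoint, index name, and auth are placeholders; a real deployment would
# typically sign requests with SigV4.
from boto3.dynamodb.types import TypeDeserializer
from opensearchpy import OpenSearch

deserializer = TypeDeserializer()
client = OpenSearch(
    hosts=[{"host": "example-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

INDEX = "game-data"

def lambda_handler(event, context):
    for record in event["Records"]:
        # Build a stable document ID from the table's key attributes.
        keys = {k: deserializer.deserialize(v) for k, v in record["dynamodb"]["Keys"].items()}
        doc_id = "|".join(str(v) for v in keys.values())

        if record["eventName"] in ("INSERT", "MODIFY"):
            # Convert the DynamoDB-typed NewImage into a plain dict and upsert it.
            new_image = {
                k: deserializer.deserialize(v)
                for k, v in record["dynamodb"]["NewImage"].items()
            }
            client.index(index=INDEX, id=doc_id, body=new_image)
        elif record["eventName"] == "REMOVE":
            client.delete(index=INDEX, id=doc_id, ignore=[404])

    return {"processed": len(event["Records"])}
```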

Question rvmCQhbTEEENGDHeIXeR

Question

A company uses Amazon Redshift as its data warehouse service. A data engineer needs to design a physical data model.

The data engineer encounters a denormalized table that is growing in size. The table does not have a suitable column to use as the distribution key.

Which distribution style should the data engineer use to meet these requirements with the LEAST maintenance overhead?

Choices

  • A: ALL distribution
  • B: EVEN distribution
  • C: AUTO distribution
  • D: KEY distribution
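
For reference on option C: DISTSTYLE AUTO tells Amazon Redshift to choose, and later change, the distribution style as the table grows, so no manual redistribution is required. The DDL sketch below, issued through the redshift_connector Python driver, is hypothetical; the connection details, table name, and columns are placeholder assumptions.

```python
# Hypothetical sketch of option C: create the denormalized table with
# DISTSTYLE AUTO so Redshift manages (and can later change) distribution
# itself. Connection details, table name, and columns are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="analytics",
    user="admin",
    password="example-password",
)

ddl = """
CREATE TABLE IF NOT EXISTS sales_denormalized (
    sale_id      BIGINT,
    customer     VARCHAR(64),
    product      VARCHAR(64),
    amount       DECIMAL(12,2),
    sale_ts      TIMESTAMP
)
DISTSTYLE AUTO
SORTKEY AUTO;
"""

cursor = conn.cursor()
cursor.execute(ddl)
conn.commit()
cursor.close()
conn.close()
```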