Questions and Answers

Question jw5igyacjyPIFGDEjzXe

Question

A company wants to use machine learning (ML) to perform analytics on data that is in an Amazon S3 data lake. The company has two data transformation requirements that will give consumers within the company the ability to create reports.

The company must perform daily transformations on 300 GB of data that is in a variety of formats and that must arrive in Amazon S3 at a scheduled time. The company must also perform one-time transformations of terabytes of archived data that is in the S3 data lake. The company uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) Directed Acyclic Graphs (DAGs) to orchestrate processing.

Which combination of tasks should the company schedule in the Amazon MWAA DAGs to meet these requirements MOST cost-effectively? (Choose two.)

Choices

  • A: For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
  • B: For daily incoming data, use Amazon Athena to scan and identify the schema.
  • C: For daily incoming data, use Amazon Redshift to perform transformations.
  • D: For daily and archived data, use Amazon EMR to perform data transformations.
  • E: For archived data, use Amazon SageMaker to perform data transformations.

Question POeLfaoMm1IkmxZNH7ll

Question

A retail company uses AWS Glue for extract, transform, and load (ETL) operations on a dataset that contains information about customer orders. The company wants to implement specific validation rules to ensure data accuracy and consistency.

Which solution will meet these requirements?

Choices

  • A: Use AWS Glue job bookmarks to track the data for accuracy and consistency.
  • B: Create custom AWS Glue Data Quality rulesets to define specific data quality checks.
  • C: Use the built-in AWS Glue Data Quality transforms for standard data quality validations.
  • D: Use AWS Glue Data Catalog to maintain a centralized data schema and metadata repository.