Questions and Answers
Question FrYN3PqCHR4dXLX5LARA
Question
An online retailer uses multiple delivery partners to deliver products to customers. The delivery partners send order summaries to the retailer. The retailer stores the order summaries in Amazon S3.
Some of the order summaries contain personally identifiable information (PII) about customers. A data engineer needs to detect PII in the order summaries so the company can redact the PII.
Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Amazon Textract
- B: Amazon S3 Storage Lens
- C: Amazon Macie
- D: Amazon SageMaker Data Wrangler
Answer: C Answer_ET: C Community answer: C (100%)
Discussion
Comment 1341212 by MerryLew
- Upvotes: 1
Selected Answer: C Detection only (no redaction) = Macie
Comment 1330746 by HagarTheHorrible
- Upvotes: 1
Selected Answer: C PII in AWS ⇒ Macie
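For reference, a minimal boto3 sketch of what answer C looks like in practice: a one-time Macie classification job that scans the bucket with Macie's managed PII identifiers. The account ID and bucket name are hypothetical, and Macie must already be enabled in the account and Region.

```python
import boto3

# Hypothetical values -- substitute your own.
ACCOUNT_ID = "111122223333"
BUCKET_NAME = "order-summaries"

macie = boto3.client("macie2")

# One-time sensitive data discovery job that scans the bucket for PII
# using Macie's managed data identifiers.
response = macie.create_classification_job(
    name="detect-pii-in-order-summaries",
    jobType="ONE_TIME",
    managedDataIdentifierSelector="ALL",
    s3JobDefinition={
        "bucketDefinitions": [
            {"accountId": ACCOUNT_ID, "buckets": [BUCKET_NAME]}
        ]
    },
)
print("Started Macie job:", response["jobId"])
```

As MerryLew notes, Macie only detects PII; the redaction itself happens downstream, for example in a job that consumes Macie findings.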
Question gGTCp95FPEoQvnluWJfq
Question
A retail company has a customer data hub in an Amazon S3 bucket. Employees from many countries use the data hub to support company-wide analytics. A governance team must ensure that the company’s data analysts can access data only for customers who are within the same country as the analysts. Which solution will meet these requirements with the LEAST operational effort?
Choices
- A: Create a separate table for each country’s customer data. Provide access to each analyst based on the country that the analyst serves.
- B: Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake Formation row-level security features to enforce the company’s access policies.
- C: Move the data to AWS Regions that are close to the countries where the customers are. Provide access to each analyst based on the country that the analyst serves.
- D: Load the data into Amazon Redshift. Create a view for each country. Create separate IAM roles for each country to provide access to data from each country. Assign the appropriate roles to the analysts.
Answer: B Answer_ET: B Community answer: B (91%), other (9%)
Discussion
Comment 1208677 by k350Secops
- Upvotes: 12
Selected Answer: B
AWS Lake Formation: it's specifically designed for managing data lakes on AWS, providing capabilities for securing and controlling access to data.
Row-level security: with Lake Formation, you can define fine-grained access control policies, including row-level security. This means you can enforce policies that restrict access to data based on specific conditions, such as the country associated with each customer.
Least operational effort: once the policies are defined in Lake Formation, they can be centrally managed and applied to the data in the S3 bucket without creating separate tables or views for each country, as options A, C, and D would require. This reduces operational overhead and complexity.
Comment 1387212 by dried0extents
- Upvotes: 1
Selected Answer: A I agree that it is A
Comment 1271022 by gray2205
- Upvotes: 1
If the situation were not about least operational effort, D would make sense.
Comment 1250061 by lunachi4
- Upvotes: 1
Selected Answer: B Select B. The key phrase is “with the LEAST operational effort”.
Comment 1223302 by nanaw770
- Upvotes: 2
Selected Answer: B B is the correct answer.
Comment 1187919 by mattia_besharp
- Upvotes: 1
Selected Answer: B AWS really likes Lake Formation; plus, creating separate tables might require some refactoring, and the requirement is about the LEAST operational effort.
Comment 1184254 by rishadhb
- Upvotes: 1
Selected Answer: A Agreed with Bartosz. I think setting up a data lake and then integrating it with Lake Formation takes more effort than just separating the tables.
Comment 1167768 by GiorgioGss
- Upvotes: 1
Selected Answer: B Keyword “LEAST operational effort” - I will go with B
Comment 1144280 by BartoszGolebiowski24
- Upvotes: 2
Creating a data lake takes at least a few days to set up, and the solution should require the LEAST operational effort. I think B is not correct.
Comment 1127586 by [Removed]
- Upvotes: 3
Selected Answer: B https://docs.aws.amazon.com/lake-formation/latest/dg/register-data-lake.html https://docs.aws.amazon.com/lake-formation/latest/dg/registration-role.html
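To make option B concrete, here is a minimal sketch of Lake Formation row-level security using a data cells filter. The catalog ID, database, table, filter, and role names are hypothetical, and the bucket is assumed to be registered with Lake Formation already.

```python
import boto3

lf = boto3.client("lakeformation")

# Row-level security: analysts granted this filter see only rows whose
# country column matches the expression. One filter per country.
lf.create_data_cells_filter(
    TableData={
        "TableCatalogId": "111122223333",
        "DatabaseName": "customer_hub",
        "TableName": "customers",
        "Name": "us_analysts",
        "RowFilter": {"FilterExpression": "country = 'US'"},
        "ColumnWildcard": {},  # all columns; list ColumnNames instead for column-level control
    }
)

# Grant the filter to the analyst principal (role ARN is hypothetical).
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/us-analysts"},
    Resource={
        "DataCellsFilter": {
            "TableCatalogId": "111122223333",
            "DatabaseName": "customer_hub",
            "TableName": "customers",
            "Name": "us_analysts",
        }
    },
    Permissions=["SELECT"],
)
```

Once defined, the filter is enforced centrally for every integrated query engine, which is what makes B the least-effort option.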
Question 2V920AXUufLcGYllQpMI
Question
A company is migrating on-premises workloads to AWS. The company wants to reduce overall operational overhead. The company also wants to explore serverless options. The company’s current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache HBase, and Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must maintain similar or better performance after the migration to AWS. Which extract, transform, and load (ETL) service will meet these requirements?
Choices
- A: AWS Glue
- B: Amazon EMR
- C: AWS Lambda
- D: Amazon Redshift
Answer: B Answer_ET: B Community answer: B (82%), A (18%)
Discussion
Comment 1127234 by milofficial
- Upvotes: 18
Selected Answer: B Glue is like the better-looking but weaker brother of EMR. So when it’s about petabyte scale, let EMR do the work and keep Glue away from the action.
Comment 1361188 by Ell89
- Upvotes: 1
Selected Answer: B Glue doesn’t natively support Pig, HBase, or Flink.
Comment 1339176 by Udyan
- Upvotes: 1
Selected Answer: B Apache = EMR
Comment 1307701 by heavenlypearl
- Upvotes: 2
Selected Answer: B Amazon EMR Serverless is a deployment option for Amazon EMR that provides a serverless runtime environment. This simplifies the operation of analytics applications that use the latest open-source frameworks, such as Apache Spark and Apache Hive. With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks.
https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html
Comment 1303439 by 87ebc7d
- Upvotes: 2
Discarded, not ‘discarted’. ‘Discarted’ isn’t a word.
Comment 1281169 by leotoras
- Upvotes: 1
B. Amazon EMR Serverless is a deployment option for Amazon EMR that provides a serverless runtime environment. This simplifies the operation of analytics applications that use the latest open-source frameworks, such as Apache Spark and Apache Hive. With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks.
Comment 1273884 by Eleftheriia
- Upvotes: 2
Selected Answer: A I think it is A, Glue.
- Amazon EMR is used for petabyte-scale data collection and data processing.
- AWS Glue is used as a serverless, managed ETL service, and also for managing data quality with AWS Glue Data Quality.
Comment 1272408 by San_Juan
- Upvotes: 1
Selected Answer: A Glue. It talks about “serverless”, so EMR is discarted. The mention of Spark, HBase, etc. is there to confuse you, because it doesn’t say they want to keep using those frameworks. Glue can run Spark using “glueContext” (similar to a SparkContext) to read tables and files and create frames.
Comment 1264239 by sachin
- Upvotes: 1
The company also wants to explore serverless options. So Glue (A)? Or EMR Serverless?
Comment 1260935 by V0811
- Upvotes: 1
Selected Answer: A
Serverless: AWS Glue is a fully managed, serverless ETL service that automates data discovery, preparation, and transformation, helping minimize operational overhead.
Integration with big data tools: it integrates well with various AWS services and supports Spark jobs for ETL purposes, which aligns well with Apache Spark workloads.
Performance: AWS Glue can handle large-scale ETL workloads and is designed to manage petabytes of data efficiently, comparable to the performance of on-premises solutions.
While B (Amazon EMR) could also be considered for its flexibility in handling big data workloads using tools like Apache Spark, it requires more management and doesn’t fit the serverless requirement as closely as AWS Glue. Therefore, AWS Glue is the most suitable choice given the constraints and requirements.
Comment 1227001 by pypelyncar
- Upvotes: 3
Selected Answer: B EMR provides a managed Hadoop framework that natively supports Apache Pig, Oozie, Spark, and Flink. This allows the company to migrate their existing workloads with minimal code changes, reducing development effort.
Comment 1223026 by tgv
- Upvotes: 2
Selected Answer: B That’s exactly the purpose of EMR.
“Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto.”
Comment 1207947 by Just_Ninja
- Upvotes: 3
Selected Answer: A Glue is Serverless :)
Comment 1191856 by wa212
- Upvotes: 2
Selected Answer: B https://docs.aws.amazon.com/ja_jp/emr/latest/ManagementGuide/emr-what-is-emr.html
Comment 1178547 by certplan
- Upvotes: 2
While AWS Glue is a fully managed ETL service and offers serverless capabilities, it might not provide the same level of performance and flexibility as Amazon EMR for handling petabyte-scale workloads with complex processing requirements.
AWS Glue is optimized for data integration, cataloging, and ETL jobs but may not be as well-suited for heavy-duty processing tasks that require frameworks like Apache Spark, Apache Flink, etc., which are commonly used for large-scale data processing.
Documentation on AWS Glue can be found in the AWS Glue Developer Guide https://docs.aws.amazon.com/glue/index.html.
Comment 1178545 by certplan
- Upvotes: 2
A. AWS Glue: AWS Glue is a fully managed extract, transform, and load (ETL) service provided by Amazon Web Services (AWS). It allows users to prepare and load data for analytics purposes.
B. Amazon EMR: Amazon Elastic MapReduce (EMR) is a cloud-based big data platform provided by AWS. It allows users to process and analyze large amounts of data using popular frameworks such as Apache Hadoop, Apache Spark, Apache Hive, Apache HBase, and more.
https://docs.aws.amazon.com/emr/index.html https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-best-practices.html https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage.html https://docs.aws.amazon.com/emr/latest/DeveloperGuide/emr-developer-guide.html
As per the AWS docs, option B is specifically called out with the features and options that the question asks about directly.
Comment 1167991 by GiorgioGss
- Upvotes: 1
Selected Answer: B https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html
Comment 1137945 by TonyStark0122
- Upvotes: 1
A. AWS Glue
Comment 1127570 by [Removed]
- Upvotes: 1
Selected Answer: B https://aws.amazon.com/emr/features/
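As several commenters point out, EMR Serverless is what reconciles the serverless requirement with the Apache framework list (EMR Serverless itself runs Spark and Hive; HBase and Flink would run on cluster-based EMR). A minimal sketch, in which the release label, script path, and role ARN are assumptions:

```python
import boto3

emr = boto3.client("emr-serverless")

# Create a serverless Spark application. The release label is an
# assumption; use a current one for your Region.
app = emr.create_application(
    name="migrated-etl",
    releaseLabel="emr-7.1.0",
    type="SPARK",
)

# Submit a Spark job. The script path and role ARN are hypothetical.
run = emr.start_job_run(
    applicationId=app["applicationId"],
    executionRoleArn="arn:aws:iam::111122223333:role/EMRServerlessJobRole",
    jobDriver={
        "sparkSubmit": {"entryPoint": "s3://my-bucket/scripts/etl_job.py"}
    },
)
print("Job run started:", run["jobRunId"])
```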
Question hCILu73hTymVEyFTxAnm
Question
A company has an Amazon Redshift data warehouse that users access by using a variety of IAM roles. More than 100 users access the data warehouse every day.
The company wants to control user access to the objects based on each user’s job role, permissions, and how sensitive the data is.
Which solution will meet these requirements?
Choices
- A: Use the role-based access control (RBAC) feature of Amazon Redshift.
- B: Use the row-level security (RLS) feature of Amazon Redshift.
- C: Use the column-level security (CLS) feature of Amazon Redshift.
- D: Use dynamic data masking policies in Amazon Redshift.
Answer: A Answer_ET: A Community answer: A (100%)
Discussion
Comment 1330747 by HagarTheHorrible
- Upvotes: 1
Selected Answer: A The only possible answers are A and B, but B alone wouldn’t be enough.
Comment 1328410 by 7a1d491
- Upvotes: 2
Selected Answer: A Row-level or column-level security alone is not enough in this case.
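A minimal sketch of answer A: Redshift RBAC is plain SQL (create a role, grant it object privileges, then grant the role to users), issued here through the Redshift Data API. The cluster, database, schema, table, and user names are hypothetical.

```python
import boto3

rsd = boto3.client("redshift-data")

# Redshift RBAC: create a role, grant it object privileges,
# then grant the role to individual users.
statements = [
    "CREATE ROLE sales_analyst;",
    "GRANT SELECT ON TABLE sales.orders TO ROLE sales_analyst;",
    "GRANT ROLE sales_analyst TO alice;",
]

for sql in statements:
    rsd.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )
```

With 100+ daily users, granting each role once per job function scales far better than managing per-user grants, which is why RBAC fits the requirement.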
Question rrab1DCpzsdjD8jv4jv3
Question
A company uses Amazon DataZone as a data governance and business catalog solution. The company stores data in an Amazon S3 data lake. The company uses AWS Glue with an AWS Glue Data Catalog.
A data engineer needs to publish AWS Glue Data Quality scores to the Amazon DataZone portal.
Which solution will meet this requirement?
Choices
- A: Create a data quality ruleset with Data Quality Definition language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.
- B: Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.
- C: Create a data quality ruleset with Data Quality Definition language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.
- D: Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.
Answer: C Answer_ET: C Community answer: C (100%)
Discussion
Comment 1330729 by HagarTheHorrible
- Upvotes: 1
Selected Answer: C DataZone should be configured to work with AWS Glue as the data source.
Comment 1328396 by 7a1d491
- Upvotes: 1
Selected Answer: C Glue has to be the data source.
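A minimal sketch of the Glue side of answer C: a DQDL ruleset attached to a specific table, plus an evaluation run that produces the scores DataZone surfaces. The database, table, and ruleset names and the role ARN are hypothetical; scheduling the daily run and enabling the data quality configuration on the DataZone project's Glue data source are separate steps.

```python
import boto3

glue = boto3.client("glue")

# DQDL ruleset bound to a specific Glue table, as in answer C.
glue.create_data_quality_ruleset(
    Name="orders-quality-checks",
    Ruleset='Rules = [ IsComplete "order_id", ColumnValues "country" in ["US", "DE", "JP"] ]',
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
)

# Each evaluation run produces the quality scores that DataZone surfaces
# once the project's Glue data source has data quality enabled.
glue.start_data_quality_ruleset_evaluation_run(
    DataSource={"GlueTable": {"DatabaseName": "sales_db", "TableName": "orders"}},
    Role="arn:aws:iam::111122223333:role/GlueDataQualityRole",
    RulesetNames=["orders-quality-checks"],
)
```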