Questions and Answers
Question FrYN3PqCHR4dXLX5LARA
Question
An online retailer uses multiple delivery partners to deliver products to customers. The delivery partners send order summaries to the retailer. The retailer stores the order summaries in Amazon S3.
Some of the order summaries contain personally identifiable information (PII) about customers. A data engineer needs to detect PII in the order summaries so the company can redact the PII.
Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Amazon Textract
- B: Amazon S3 Storage Lens
- C: Amazon Macie
- D: Amazon SageMaker Data Wrangler
Answer: C Answer_ET: C Community answer: C (100%)
Discussion
Comment 1341212 by MerryLew
- Upvotes: 1
Selected Answer: C Detection only (no redaction) = Macie
Comment 1330746 by HagarTheHorrible
- Upvotes: 1
Selected Answer: C PII in AWS ⇒ Macie
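For reference, a minimal boto3 sketch of what answer C looks like in practice: a one-time Macie classification job that scans the bucket with Macie's managed PII identifiers. The account ID and bucket name are hypothetical, and Macie must already be enabled in the account and Region.

```python
import boto3

# Hypothetical values -- substitute your own.
ACCOUNT_ID = "111122223333"
BUCKET_NAME = "order-summaries"

macie = boto3.client("macie2")

# One-time sensitive data discovery job that scans the bucket for PII
# using Macie's managed data identifiers.
response = macie.create_classification_job(
    name="detect-pii-in-order-summaries",
    jobType="ONE_TIME",
    managedDataIdentifierSelector="ALL",
    s3JobDefinition={
        "bucketDefinitions": [
            {"accountId": ACCOUNT_ID, "buckets": [BUCKET_NAME]}
        ]
    },
)
print("Started Macie job:", response["jobId"])
```

As MerryLew notes, Macie only detects PII; the redaction itself happens downstream, for example in a job that consumes Macie findings.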
Question gGTCp95FPEoQvnluWJfq
Question
A retail company has a customer data hub in an Amazon S3 bucket. Employees from many countries use the data hub to support company-wide analytics. A governance team must ensure that the company’s data analysts can access data only for customers who are within the same country as the analysts. Which solution will meet these requirements with the LEAST operational effort?
Choices
- A: Create a separate table for each country’s customer data. Provide access to each analyst based on the country that the analyst serves.
- B: Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake Formation row-level security features to enforce the company’s access policies.
- C: Move the data to AWS Regions that are close to the countries where the customers are. Provide access to each analyst based on the country that the analyst serves.
- D: Load the data into Amazon Redshift. Create a view for each country. Create separate IAM roles for each country to provide access to data from each country. Assign the appropriate roles to the analysts.
Answer: B Answer_ET: B Community answer: B (91%), other (9%)
Discussion
Comment 1208677 by k350Secops
- Upvotes: 12
Selected Answer: B
AWS Lake Formation: it's specifically designed for managing data lakes on AWS, providing capabilities for securing and controlling access to data.
Row-level security: with Lake Formation, you can define fine-grained access control policies, including row-level security. This means you can enforce policies that restrict access to data based on specific conditions, such as the country associated with each customer.
Least operational effort: once the policies are defined in Lake Formation, they can be centrally managed and applied to the data in the S3 bucket without creating separate tables or views for each country, as options A, C, and D would require. This reduces operational overhead and complexity.
Comment 1387212 by dried0extents
- Upvotes: 1
Selected Answer: A I agree that it is A
Comment 1271022 by gray2205
- Upvotes: 1
If the situation were not about least operational effort, D would make sense.
Comment 1250061 by lunachi4
- Upvotes: 1
Selected Answer: B Select B. The key phrase is “with the LEAST operational effort”.
Comment 1223302 by nanaw770
- Upvotes: 2
Selected Answer: B B is the correct answer.
Comment 1187919 by mattia_besharp
- Upvotes: 1
Selected Answer: B AWS really likes Lake Formation; plus, creating separate tables might require some refactoring, and the requirement is about the LEAST operational effort.
Comment 1184254 by rishadhb
- Upvotes: 1
Selected Answer: A Agreed with Bartosz. I think setting up a data lake and then integrating it with Lake Formation takes more effort than just separating the tables.
Comment 1167768 by GiorgioGss
- Upvotes: 1
Selected Answer: B Keyword “LEAST operational effort” - I will go with B
Comment 1144280 by BartoszGolebiowski24
- Upvotes: 2
Creating a data lake takes at least a few days to set up, and the solution should require the LEAST operational effort. I think B is not correct.
Comment 1127586 by [Removed]
- Upvotes: 3
Selected Answer: B https://docs.aws.amazon.com/lake-formation/latest/dg/register-data-lake.html https://docs.aws.amazon.com/lake-formation/latest/dg/registration-role.html
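To make option B concrete, here is a minimal sketch of Lake Formation row-level security using a data cells filter. The catalog ID, database, table, filter, and role names are hypothetical, and the bucket is assumed to be registered with Lake Formation already.

```python
import boto3

lf = boto3.client("lakeformation")

# Row-level security: analysts granted this filter see only rows whose
# country column matches the expression. One filter per country.
lf.create_data_cells_filter(
    TableData={
        "TableCatalogId": "111122223333",
        "DatabaseName": "customer_hub",
        "TableName": "customers",
        "Name": "us_analysts",
        "RowFilter": {"FilterExpression": "country = 'US'"},
        "ColumnWildcard": {},  # all columns; list ColumnNames instead for column-level control
    }
)

# Grant the filter to the analyst principal (role ARN is hypothetical).
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/us-analysts"},
    Resource={
        "DataCellsFilter": {
            "TableCatalogId": "111122223333",
            "DatabaseName": "customer_hub",
            "TableName": "customers",
            "Name": "us_analysts",
        }
    },
    Permissions=["SELECT"],
)
```

Once defined, the filter is enforced centrally for every integrated query engine, which is what makes B the least-effort option.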
Question 2V920AXUufLcGYllQpMI
Question
A company is migrating on-premises workloads to AWS. The company wants to reduce overall operational overhead. The company also wants to explore serverless options. The company’s current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache HBase, and Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must maintain similar or better performance after the migration to AWS. Which extract, transform, and load (ETL) service will meet these requirements?
Choices
- A: AWS Glue
- B: Amazon EMR
- C: AWS Lambda
- D: Amazon Redshift
Answer: B Answer_ET: B Community answer: B (82%), A (18%)
Discussion
Comment 1127234 by milofficial
- Upvotes: 18
Selected Answer: B Glue is like the better-looking but weaker brother of EMR. So when it’s about petabyte scale, let EMR do the work and keep Glue away from the action.
Comment 1361188 by Ell89
- Upvotes: 1
Selected Answer: B Glue doesn’t natively support Pig, HBase, or Flink.
Comment 1339176 by Udyan
- Upvotes: 1
Selected Answer: B Apache = EMR
Comment 1307701 by heavenlypearl
- Upvotes: 2
Selected Answer: B Amazon EMR Serverless is a deployment option for Amazon EMR that provides a serverless runtime environment. This simplifies the operation of analytics applications that use the latest open-source frameworks, such as Apache Spark and Apache Hive. With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks.
https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html
Comment 1303439 by 87ebc7d
- Upvotes: 2
Discarded, not ‘discarted’. ‘Discarted’ isn’t a word.
Comment 1281169 by leotoras
- Upvotes: 1
B. Amazon EMR Serverless is a deployment option for Amazon EMR that provides a serverless runtime environment. This simplifies the operation of analytics applications that use the latest open-source frameworks, such as Apache Spark and Apache Hive. With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks.
Comment 1273884 by Eleftheriia
- Upvotes: 2
Selected Answer: A I think it is A, Glue.
- Amazon EMR is used for petabyte-scale data collection and data processing.
- AWS Glue is used as a serverless, managed ETL service, and also for managing data quality with AWS Glue Data Quality.
Comment 1272408 by San_Juan
- Upvotes: 1
Selected Answer: A Glue. It talks about “serverless”, so EMR is discarted. The mention of Spark, HBase, etc. is there to confuse you, because it doesn’t say they want to keep using those frameworks. Glue can run Spark using “glueContext” (similar to a SparkContext) to read tables and files and create frames.
Comment 1264239 by sachin
- Upvotes: 1
The company also wants to explore serverless options. So Glue (A)? Or EMR Serverless?
Comment 1260935 by V0811
- Upvotes: 1
Selected Answer: A
Serverless: AWS Glue is a fully managed, serverless ETL service that automates data discovery, preparation, and transformation, helping minimize operational overhead.
Integration with big data tools: it integrates well with various AWS services and supports Spark jobs for ETL purposes, which aligns well with Apache Spark workloads.
Performance: AWS Glue can handle large-scale ETL workloads and is designed to manage petabytes of data efficiently, comparable to the performance of on-premises solutions.
While B (Amazon EMR) could also be considered for its flexibility in handling big data workloads using tools like Apache Spark, it requires more management and doesn’t fit the serverless requirement as closely as AWS Glue. Therefore, AWS Glue is the most suitable choice given the constraints and requirements.
Comment 1227001 by pypelyncar
- Upvotes: 3
Selected Answer: B EMR provides a managed Hadoop framework that natively supports Apache Pig, Oozie, Spark, and Flink. This allows the company to migrate their existing workloads with minimal code changes, reducing development effort.
Comment 1223026 by tgv
- Upvotes: 2
Selected Answer: B That’s exactly the purpose of EMR.
“Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto.”
Comment 1207947 by Just_Ninja
- Upvotes: 3
Selected Answer: A Glue is Serverless :)
Comment 1191856 by wa212
- Upvotes: 2
Selected Answer: B https://docs.aws.amazon.com/ja_jp/emr/latest/ManagementGuide/emr-what-is-emr.html
Comment 1178547 by certplan
- Upvotes: 2
While AWS Glue is a fully managed ETL service and offers serverless capabilities, it might not provide the same level of performance and flexibility as Amazon EMR for handling petabyte-scale workloads with complex processing requirements.
AWS Glue is optimized for data integration, cataloging, and ETL jobs but may not be as well-suited for heavy-duty processing tasks that require frameworks like Apache Spark, Apache Flink, etc., which are commonly used for large-scale data processing.
Documentation on AWS Glue can be found in the AWS Glue Developer Guide https://docs.aws.amazon.com/glue/index.html.
Comment 1178545 by certplan
- Upvotes: 2
A. AWS Glue: AWS Glue is a fully managed extract, transform, and load (ETL) service provided by Amazon Web Services (AWS). It allows users to prepare and load data for analytics purposes.
B. Amazon EMR: Amazon Elastic MapReduce (EMR) is a cloud-based big data platform provided by AWS. It allows users to process and analyze large amounts of data using popular frameworks such as Apache Hadoop, Apache Spark, Apache Hive, Apache HBase, and more.
https://docs.aws.amazon.com/emr/index.html https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-best-practices.html https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage.html https://docs.aws.amazon.com/emr/latest/DeveloperGuide/emr-developer-guide.html
As per the AWS docs, option B is specifically called out with the features and options that the question asks about directly.
Comment 1167991 by GiorgioGss
- Upvotes: 1
Selected Answer: B https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html
Comment 1137945 by TonyStark0122
- Upvotes: 1
A. AWS Glue
Comment 1127570 by [Removed]
- Upvotes: 1
Selected Answer: B https://aws.amazon.com/emr/features/
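As several commenters point out, EMR Serverless is what reconciles the serverless requirement with the Apache framework list (EMR Serverless itself runs Spark and Hive; HBase and Flink would run on cluster-based EMR). A minimal sketch, in which the release label, script path, and role ARN are assumptions:

```python
import boto3

emr = boto3.client("emr-serverless")

# Create a serverless Spark application. The release label is an
# assumption; use a current one for your Region.
app = emr.create_application(
    name="migrated-etl",
    releaseLabel="emr-7.1.0",
    type="SPARK",
)

# Submit a Spark job. The script path and role ARN are hypothetical.
run = emr.start_job_run(
    applicationId=app["applicationId"],
    executionRoleArn="arn:aws:iam::111122223333:role/EMRServerlessJobRole",
    jobDriver={
        "sparkSubmit": {"entryPoint": "s3://my-bucket/scripts/etl_job.py"}
    },
)
print("Job run started:", run["jobRunId"])
```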
Question hCILu73hTymVEyFTxAnm
Question
A company has an Amazon Redshift data warehouse that users access by using a variety of IAM roles. More than 100 users access the data warehouse every day.
The company wants to control user access to the objects based on each user’s job role, permissions, and how sensitive the data is.
Which solution will meet these requirements?
Choices
- A: Use the role-based access control (RBAC) feature of Amazon Redshift.
- B: Use the row-level security (RLS) feature of Amazon Redshift.
- C: Use the column-level security (CLS) feature of Amazon Redshift.
- D: Use dynamic data masking policies in Amazon Redshift.
Answer: A Answer_ET: A Community answer: A (100%)
Discussion
Comment 1330747 by HagarTheHorrible
- Upvotes: 1
Selected Answer: A The only possible answers are A and B, but B alone wouldn’t be enough.
Comment 1328410 by 7a1d491
- Upvotes: 2
Selected Answer: A Row-level or column-level security alone is not enough in this case.
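A minimal sketch of answer A: Redshift RBAC is plain SQL (create a role, grant it object privileges, then grant the role to users), issued here through the Redshift Data API. The cluster, database, schema, table, and user names are hypothetical.

```python
import boto3

rsd = boto3.client("redshift-data")

# Redshift RBAC: create a role, grant it object privileges,
# then grant the role to individual users.
statements = [
    "CREATE ROLE sales_analyst;",
    "GRANT SELECT ON TABLE sales.orders TO ROLE sales_analyst;",
    "GRANT ROLE sales_analyst TO alice;",
]

for sql in statements:
    rsd.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )
```

With 100+ daily users, granting each role once per job function scales far better than managing per-user grants, which is why RBAC fits the requirement.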
Question rrab1DCpzsdjD8jv4jv3
Question
A company uses Amazon DataZone as a data governance and business catalog solution. The company stores data in an Amazon S3 data lake. The company uses AWS Glue with an AWS Glue Data Catalog.
A data engineer needs to publish AWS Glue Data Quality scores to the Amazon DataZone portal.
Which solution will meet this requirement?
Choices
- A: Create a data quality ruleset with Data Quality Definition language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.
- B: Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.
- C: Create a data quality ruleset with Data Quality Definition language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.
- D: Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.
Answer: C Answer_ET: C Community answer: C (100%)
Discussion
Comment 1330729 by HagarTheHorrible
- Upvotes: 1
Selected Answer: C DataZone should be configured to work with AWS Glue as the data source.
Comment 1328396 by 7a1d491
- Upvotes: 1
Selected Answer: C Glue has to be the data source.
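A minimal sketch of the Glue side of answer C: a DQDL ruleset attached to a specific table, plus an evaluation run that produces the scores DataZone surfaces. The database, table, and ruleset names and the role ARN are hypothetical; scheduling the daily run and enabling the data quality configuration on the DataZone project's Glue data source are separate steps.

```python
import boto3

glue = boto3.client("glue")

# DQDL ruleset bound to a specific Glue table, as in answer C.
glue.create_data_quality_ruleset(
    Name="orders-quality-checks",
    Ruleset='Rules = [ IsComplete "order_id", ColumnValues "country" in ["US", "DE", "JP"] ]',
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
)

# Each evaluation run produces the quality scores that DataZone surfaces
# once the project's Glue data source has data quality enabled.
glue.start_data_quality_ruleset_evaluation_run(
    DataSource={"GlueTable": {"DatabaseName": "sales_db", "TableName": "orders"}},
    Role="arn:aws:iam::111122223333:role/GlueDataQualityRole",
    RulesetNames=["orders-quality-checks"],
)
```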