Questions and Answers
Question LmO77wQjLuGBHtrjdhIX
Question
A data engineer needs to use an Amazon QuickSight dashboard that is based on Amazon Athena queries on data that is stored in an Amazon S3 bucket. When the data engineer connects to the QuickSight dashboard, the data engineer receives an error message that indicates insufficient permissions. Which factors could cause the permissions-related errors? (Choose two.)
Choices
- A: There is no connection between QuickSight and Athena.
- B: The Athena tables are not cataloged.
- C: QuickSight does not have access to the S3 bucket.
- D: QuickSight does not have access to decrypt S3 data.
- E: There is no IAM role assigned to QuickSight.
Answer: CD | Answer_ET: CD | Community answer: CD (83%), CE (17%)
Discussion
Comment 1181484 by fceb2c1
- Upvotes: 8
Selected Answer: CD C and D. See https://docs.aws.amazon.com/quicksight/latest/user/troubleshoot-athena-insufficient-permissions.html
E is incorrect because it would result in an authentication/authorization error, not an insufficient-permissions error.
Comment 1138557 by rralucard_
- Upvotes: 6
Selected Answer: CD C. QuickSight does not have access to the S3 bucket: Amazon QuickSight needs to have the necessary permissions to access the S3 bucket where the data resides. If QuickSight lacks the permissions to read the data from the S3 bucket, it would result in an error indicating insufficient permissions.
D. QuickSight does not have access to decrypt S3 data: If the data in S3 is encrypted, QuickSight needs permissions to use the necessary keys to decrypt the data. Without access to the decryption keys, typically managed by AWS Key Management Service (KMS), QuickSight cannot read the encrypted data and would give an error.
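To make the two accepted answers concrete, here is a minimal boto3 sketch of the kind of inline policy that covers C and D. The role name, bucket ARN, and KMS key ARN are placeholders for illustration, not values from the question.

```python
# Minimal sketch (assumed names: role "aws-quicksight-service-role-v0",
# example bucket and KMS key ARNs) of the permissions QuickSight typically
# needs to read source data and Athena results in S3 and to decrypt
# SSE-KMS-encrypted objects.
import json
import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Read access to the data / Athena results bucket (choice C)
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": [
                "arn:aws:s3:::example-data-bucket",
                "arn:aws:s3:::example-data-bucket/*",
            ],
        },
        {   # Permission to decrypt KMS-encrypted objects (choice D)
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
        },
    ],
}

iam.put_role_policy(
    RoleName="aws-quicksight-service-role-v0",  # QuickSight service role; name may differ per account
    PolicyName="QuickSightAthenaS3Access",
    PolicyDocument=json.dumps(policy_document),
)
```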
Comment 1240628 by bakarys
- Upvotes: 2
Selected Answer: CE C. QuickSight does not have access to the S3 bucket. Amazon QuickSight needs to have the necessary permissions to access the Amazon S3 bucket where the data is stored. If these permissions are not correctly configured, QuickSight will not be able to access the data, resulting in an error.
E. There is no IAM role assigned to QuickSight. Amazon QuickSight uses AWS Identity and Access Management (IAM) roles to access AWS resources. If QuickSight is not assigned an IAM role, or if the assigned role does not have the necessary permissions, QuickSight will not be able to access the resources it needs, leading to an error.
Comment 1197194 by Ousseyni
- Upvotes: 1
Selected Answer: CD C and D
Comment 1194786 by Christina666
- Upvotes: 4
Selected Answer: CD The two most likely factors causing the permissions-related errors are:
C. QuickSight does not have access to the S3 bucket. To access data from an S3 bucket, QuickSight needs explicit S3 permissions. This is typically handled through an IAM role associated with the QuickSight service.
D. QuickSight does not have access to decrypt S3 data. If the data in S3 is encrypted (e.g., using KMS), QuickSight must have the necessary permissions to decrypt the data using the relevant KMS key.
Let's analyze why the other options are less likely to be the primary culprits:
E. There is no IAM role assigned to QuickSight. QuickSight needs an IAM role for overall functionality. A missing role would likely cause broader service failures, not specific data access errors.
Comment 1176085 by taka5094
- Upvotes: 2
Selected Answer: CE I think the problem statement is under-specified. If the data is encrypted, then D can be the correct answer; if not, then E is the correct answer.
Comment 1168329 by damaldon
- Upvotes: 2
Ans. CD https://docs.aws.amazon.com/quicksight/latest/user/troubleshoot-athena-insufficient-permissions.html
Question rK4gny3N266EtRyByDWy
Question
A company stores datasets in JSON format and .csv format in an Amazon S3 bucket. The company has Amazon RDS for Microsoft SQL Server databases, Amazon DynamoDB tables that are in provisioned capacity mode, and an Amazon Redshift cluster. A data engineering team must develop a solution that will give data scientists the ability to query all data sources by using syntax similar to SQL. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Amazon Athena to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.
- B: Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Redshift Spectrum to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.
- C: Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use AWS Glue jobs to transform data that is in JSON format to Apache Parquet or .csv format. Store the transformed data in an S3 bucket. Use Amazon Athena to query the original and transformed data from the S3 bucket.
- D: Use AWS Lake Formation to create a data lake. Use Lake Formation jobs to transform the data from all data sources to Apache Parquet format. Store the transformed data in an S3 bucket. Use Amazon Athena or Redshift Spectrum to query the data.
Answer: A | Answer_ET: A | Community answer: A (67%), C (17%), B (17%)
Discussion
Comment 1177346 by GiorgioGss
- Upvotes: 7
Selected Answer: A LEAST operational overhead? Query straight with Athena, without any intermediate actions or services.
Comment 1228717 by pypelyncar
- Upvotes: 1
Selected Answer: A Athena natively supports querying JSON data stored in S3 using standard SQL functions. This eliminates the need for additional data transformation steps using Glue jobs (as required in Options C and D).
Comment 1220133 by tgv
- Upvotes: 1
As chris_spencer mentioned below, Athena now supports querying with PartiQL, which technically makes answer A correct.
Comment 1215791 by VerRi
- Upvotes: 1
Selected Answer: A B requires Redshift Spectrum, so A
Comment 1197931 by chris_spencer
- Upvotes: 2
Selected Answer: C Answer should be C.
Amazon Athena did not support querying with PartiQL until 16.04.2024: https://aws.amazon.com/about-aws/whats-new/2024/04/amazon-athena-federated-query-pass-through/
The DEA01 exam should not have included the latest feature.
Comment 1194793 by Christina666
- Upvotes: 4
Selected Answer: A A. Unified Querying with Athena: Athena provides a SQL-like interface for querying various data sources, including JSON and CSV in S3, as well as traditional databases.
PartiQL Support: Athena's PartiQL extension allows querying semi-structured JSON data directly, eliminating the need for a separate query engine.
Serverless and Managed: Both AWS Glue and Athena are serverless, minimizing infrastructure management for the data engineers.
No Unnecessary Transformations: Avoiding transformations for JSON data simplifies the pipeline and reduces operational overhead.
B. Redshift Spectrum: While Spectrum can query external data, it is primarily intended for Redshift data warehouse extensions. It adds complexity for the RDS and DynamoDB data sources.
Comment 1188875 by lucas_rfsb
- Upvotes: 4
Selected Answer: B I will go with B
Comment 1186419 by Luke97
- Upvotes: 4
The answer should be B. A is incorrect because Athena does NOT support PartiQL. C is NOT the least operational overhead (it has the additional step to convert JSON to Parquet or .csv). D is incorrect because DynamoDB exports data to S3 in DynamoDB JSON or Amazon Ion format only (https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/).
Comment 1184525 by halogi
- Upvotes: 2
Selected Answer: C AWS Athena can only query in SQL, not PartiQL, so both A and B are incorrect. Lake Formation cannot work directly with DynamoDB, so D is incorrect. The only acceptable answer is C.
Comment 1138558 by rralucard_
- Upvotes: 3
Selected Answer: A Option A, using AWS Glue and Amazon Athena, would meet the requirements with the least operational overhead. This solution allows data scientists to directly query data in its original format without the need for additional data transformation steps, making it easier to implement and manage.
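For reference, a minimal boto3 sketch of what option A looks like in practice. The crawler name, Glue database, table, and S3 results location are assumed placeholder names, not values from the question.

```python
# Minimal sketch of option A: crawl the S3 datasets into the Glue Data
# Catalog, then query the cataloged tables with standard SQL through Athena.
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# 1. Crawl the S3 datasets so their schemas land in the Glue Data Catalog.
glue.start_crawler(Name="s3-datasets-crawler")  # placeholder crawler name

# 2. Query a cataloged table with SQL via Athena.
response = athena.start_query_execution(
    QueryString="SELECT customer_id, order_total FROM orders_csv LIMIT 10",
    QueryExecutionContext={"Database": "analytics_db"},          # placeholder database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])
```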
Question gIzFbiSZbzKrsajvabGn
Question
A data engineer is configuring Amazon SageMaker Studio to use AWS Glue interactive sessions to prepare data for machine learning (ML) models. The data engineer receives an access denied error when the data engineer tries to prepare the data by using SageMaker Studio. Which change should the engineer make to gain access to SageMaker Studio?
Choices
- A: Add the AWSGlueServiceRole managed policy to the data engineer’s IAM user.
- B: Add a policy to the data engineer’s IAM user that includes the sts:AssumeRole action for the AWS Glue and SageMaker service principals in the trust policy.
- C: Add the AmazonSageMakerFullAccess managed policy to the data engineer’s IAM user.
- D: Add a policy to the data engineer’s IAM user that allows the sts:AddAssociation action for the AWS Glue and SageMaker service principals in the trust policy.
Answer: B | Answer_ET: B | Community answer: B (61%), C (39%)
Discussion
Comment 1220137 by tgv
- Upvotes: 6
Selected Answer: B I don’t believe you’re supposed to assign a FullAccess policy, so I will go with B.
Comment 1177365 by GiorgioGss
- Upvotes: 5
Selected Answer: B I will go with B since you can get access denied even with the AmazonSageMakerFullAccess. See here: https://stackoverflow.com/questions/64709871/aws-sagemaker-studio-createdomain-access-error
Comment 1294134 by mohamedTR
- Upvotes: 1
Selected Answer: B B. The engineer needs to assume specific roles to allow interaction between these services. The sts:AssumeRole action is necessary for this purpose.
Comment 1271855 by junrun3
- Upvotes: 2
Selected Answer: C As for B, that approach involves setting up the trust relationship for roles, which is not a typical requirement for resolving access issues with SageMaker Studio directly.
Comment 1248616 by LR2023
- Upvotes: 1
Option A: https://docs.aws.amazon.com/glue/latest/dg/glue-is-security.html
Comment 1194798 by Christina666
- Upvotes: 1
Selected Answer: C SageMaker Permissions: The AmazonSageMakerFullAccess managed policy provides broad permissions for using Amazon SageMaker features, including SageMaker Studio and the ability to interact with other AWS services like AWS Glue. Least Privilege: While this policy is quite permissive, it’s the most direct solution to the immediate access issue. After resolving the error, you can refine permissions for a more granular approach.
Comment 1188878 by lucas_rfsb
- Upvotes: 3
Selected Answer: C I will go with C
Comment 1181492 by fceb2c1
- Upvotes: 1
https://repost.aws/knowledge-center/sagemaker-featuregroup-troubleshooting
Comment 1168338 by damaldon
- Upvotes: 2
Ans. C https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerFullAccess.html
Comment 1152387 by atu1789
- Upvotes: 2
Selected Answer: B B. Add a policy to the data engineer’s IAM user that includes the sts:AssumeRole action for the AWS Glue and SageMaker service principals in the trust policy.
• This is the most appropriate solution. The sts:AssumeRole action allows the data engineer’s IAM user to assume a role that has the necessary permissions for both AWS Glue and SageMaker. This is a common approach for granting cross-service access in AWS.
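A minimal boto3 sketch of the trust relationship described in B. The role name, account ID, and user name are hypothetical placeholders.

```python
# Minimal sketch of answer B: an execution role that the SageMaker and AWS
# Glue service principals can assume, plus a user policy that lets the data
# engineer pass/assume that role. All names and the account ID are placeholders.
import json
import boto3

iam = boto3.client("iam")

# Execution role whose trust policy names both service principals.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": ["sagemaker.amazonaws.com", "glue.amazonaws.com"]},
            "Action": "sts:AssumeRole",
        }
    ],
}
iam.create_role(
    RoleName="SageMakerGlueInteractiveSessionRole",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Inline policy on the data engineer's IAM user to assume/pass the role.
user_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sts:AssumeRole", "iam:PassRole"],
            "Resource": "arn:aws:iam::111122223333:role/SageMakerGlueInteractiveSessionRole",
        }
    ],
}
iam.put_user_policy(
    UserName="data-engineer",  # placeholder user name
    PolicyName="AssumeGlueSageMakerRole",
    PolicyDocument=json.dumps(user_policy),
)
```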
Comment 1138563 by rralucard_
- Upvotes: 3
Selected Answer: C Amazon SageMaker requires permissions to perform actions on your behalf. By attaching the AmazonSageMakerFullAccess managed policy to the data engineer’s IAM user, you grant the necessary permissions for SageMaker Studio to access AWS Glue and other related services.
Question 8np86vXgZkg0o7yhevKn
Question
A company extracts approximately 1 TB of data every day from data sources such as SAP HANA, Microsoft SQL Server, MongoDB, Apache Kafka, and Amazon DynamoDB. Some of the data sources have undefined data schemas or data schemas that change. A data engineer must implement a solution that can detect the schema for these data sources. The solution must extract, transform, and load the data to an Amazon S3 bucket. The company has a service level agreement (SLA) to load the data into the S3 bucket within 15 minutes of data creation. Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Use Amazon EMR to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
- B: Use AWS Glue to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
- C: Create a PySpark program in AWS Lambda to extract, transform, and load the data into the S3 bucket.
- D: Create a stored procedure in Amazon Redshift to detect the schema and to extract, transform, and load the data into a Redshift Spectrum table. Access the table from Amazon S3.
Answer: B | Answer_ET: B | Community answer: B (100%)
Discussion
Comment 1177366 by GiorgioGss
- Upvotes: 6
Selected Answer: B Least effort = B
Comment 1138567 by rralucard_
- Upvotes: 5
Selected Answer: B B. Use AWS Glue to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
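For context, a minimal sketch of the Glue Spark job that option B implies, assuming a source table has already been cataloged by a Glue crawler. The database, table, and target bucket names are placeholders.

```python
# Minimal sketch of a Glue Spark (PySpark) job: read a cataloged source whose
# schema was detected by a crawler, then land the data in S3 as Parquet.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Schema comes from the Glue Data Catalog (populated by crawlers).
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="kafka_events"  # placeholder catalog names
)

# DynamicFrames tolerate undefined or evolving schemas when writing to S3.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/events/"},
    format="parquet",
)
job.commit()
```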
Comment 1194800 by Christina666
- Upvotes: 1
Selected Answer: B Glue ETL
Comment 1173328 by kj07
- Upvotes: 3
The option with the least operational overhead is B.
Question aEXuTStHXOfOPr6jLSfB
Question
A company has multiple applications that use datasets that are stored in an Amazon S3 bucket. The company has an ecommerce application that generates a dataset that contains personally identifiable information (PII). The company has an internal analytics application that does not require access to the PII. To comply with regulations, the company must not share PII unnecessarily. A data engineer needs to implement a solution that will redact PII dynamically, based on the needs of each application that accesses the dataset. Which solution will meet the requirements with the LEAST operational overhead?
Choices
- A: Create an S3 bucket policy to limit the access each application has. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.
- B: Create an S3 Object Lambda endpoint. Use the S3 Object Lambda endpoint to read data from the S3 bucket. Implement redaction logic within an S3 Object Lambda function to dynamically redact PII based on the needs of each application that accesses the data.
- C: Use AWS Glue to transform the data for each application. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.
- D: Create an API Gateway endpoint that has custom authorizers. Use the API Gateway endpoint to read data from the S3 bucket. Initiate a REST API call to dynamically redact PII based on the needs of each application that accesses the data.
Answer: B | Answer_ET: B | Community answer: B (100%)
Discussion
Comment 1268342 by teo2157
- Upvotes: 1
Selected Answer: B It’s B based on AWS documentation https://docs.aws.amazon.com/AmazonS3/latest/userguide/transforming-objects.html
Comment 1228730 by pypelyncar
- Upvotes: 3
Selected Answer: B S3 Object Lambda automatically triggers the Lambda function only when there’s a request to access data in the S3 bucket. This eliminates the need for pre-processing or creating multiple data copies with varying levels of redaction (Options A and C).
Comment 1219529 by 4c78df0
- Upvotes: 2
Selected Answer: B B is correct
Comment 1168344 by damaldon
- Upvotes: 4
Ans. B You can use an Amazon S3 Object Lambda Access Point to control access to documents with personally identifiable information (PII). https://docs.aws.amazon.com/comprehend/latest/dg/using-access-points.html
Comment 1152391 by atu1789
- Upvotes: 1
Selected Answer: B S3 Object Lambda allows you to add custom processing, such as redaction of PII, to data retrieved from S3. This is done dynamically, meaning you don’t need to store multiple copies of the data. It’s a more efficient and operationally simpler approach compared to managing multiple dataset versions.
Comment 1138568 by rralucard_
- Upvotes: 3
Selected Answer: B Amazon S3 Object Lambda allows you to add your own code to S3 GET requests to modify and process data as it is returned to an application. For example, you could use an S3 Object Lambda to dynamically redact personally identifiable information (PII) from data retrieved from S3. This would allow you to control access to sensitive information based on the needs of different applications, without having to create and manage multiple copies of your data.
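To illustrate how option B works, here is a minimal sketch of an S3 Object Lambda handler that redacts PII on the fly before the object reaches the caller. The redact() helper and the PII field names are hypothetical; a production setup might call Amazon Comprehend to detect PII instead.

```python
# Minimal sketch of an S3 Object Lambda function: fetch the original object
# via the presigned URL S3 supplies, redact assumed PII fields, and return
# the transformed object to the requesting application.
import json
import urllib3
import boto3

http = urllib3.PoolManager()
s3 = boto3.client("s3")

PII_FIELDS = {"email", "phone", "ssn"}  # hypothetical field names for illustration


def redact(record: dict) -> dict:
    return {k: ("***" if k in PII_FIELDS else v) for k, v in record.items()}


def handler(event, context):
    ctx = event["getObjectContext"]

    # Fetch the original object through the presigned URL provided by S3.
    original = http.request("GET", ctx["inputS3Url"]).data.decode("utf-8")

    # Assume one JSON record per line; redact PII fields in each record.
    redacted_lines = [
        json.dumps(redact(json.loads(line)))
        for line in original.splitlines()
        if line.strip()
    ]

    # Stream the redacted object back to the caller.
    s3.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body="\n".join(redacted_lines).encode("utf-8"),
    )
    return {"statusCode": 200}
```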