Questions and Answers
Question qBXUvi4uADyEChXMpeSJ
Question
A data engineer is using an AWS Glue crawler to catalog data that is in an Amazon S3 bucket. The S3 bucket contains both .csv and .json files. The data engineer configured the crawler to exclude the .json files from the catalog.
When the data engineer runs queries in Amazon Athena, the queries also process the excluded .json files. The data engineer wants to resolve this issue. The data engineer needs a solution that will not affect access requirements for the .csv files in the source S3 bucket.
Which solution will meet this requirement with the SHORTEST query times?
Choices
- A: Adjust the AWS Glue crawler settings to ensure that the AWS Glue crawler also excludes .json files.
- B: Use the Athena console to ensure the Athena queries also exclude the .json files.
- C: Relocate the .json files to a different path within the S3 bucket.
- D: Use S3 bucket policies to block access to the .json files.
answer?
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 1264605 by teo2157
- Upvotes: 8
Selected Answer: C Athena does not recognize exclude patterns that you specify for an AWS Glue crawler. For example, if you have an Amazon S3 bucket that contains both .csv and .json files and you exclude the .json files from the crawler, Athena queries both groups of files. To avoid this, place the files that you want to exclude in a different location. https://docs.aws.amazon.com/athena/latest/ug/troubleshooting-athena.html
Comment 1297793 by AdityaB
- Upvotes: 1
If the AWS Glue crawler is configured to exclude .json files, then the AWS Glue Data Catalog will not have any metadata related to those .json files. In this case, the Athena table that uses the Glue Data Catalog would not be aware of the .json files at all, and Athena queries would only process the files that are included in the Glue catalog (e.g., .csv files).
Comment 1282687 by BenLearningDE
- Upvotes: 1
Athena will scan both types of files.
Although it may be feasible to adjust the Athena queries to exclude .json files, the SHORTEST query times would come from relocating the .json files to a different path.
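For illustration, a minimal boto3 sketch of option C, assuming a hypothetical bucket name and target prefix (the crawler and the Athena table continue to point only at the original .csv location):

```python
# Hedged sketch: move .json objects under a prefix the crawler does not crawl.
# "my-data-bucket" and "excluded-json/" are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
bucket = "my-data-bucket"          # hypothetical bucket name
target_prefix = "excluded-json/"   # path outside the crawled/queried location

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith(".json") and not key.startswith(target_prefix):
            # Copy the .json object under the excluded prefix, then remove the original.
            s3.copy_object(
                Bucket=bucket,
                Key=target_prefix + key,
                CopySource={"Bucket": bucket, "Key": key},
            )
            s3.delete_object(Bucket=bucket, Key=key)
```

Because only the .json objects move, access requirements for the .csv files in the source bucket are unchanged.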
Question CEdPbL5xORbecPYTGQVc
Question
A data engineer set up an AWS Lambda function to read an object that is stored in an Amazon S3 bucket. The object is encrypted by an AWS KMS key.
The data engineer configured the Lambda function’s execution role to access the S3 bucket. However, the Lambda function encountered an error and failed to retrieve the content of the object.
What is the likely cause of the error?
Choices
- A: The data engineer misconfigured the permissions of the S3 bucket. The Lambda function could not access the object.
- B: The Lambda function is using an outdated SDK version, which caused the read failure.
- C: The S3 bucket is located in a different AWS Region than the Region where the data engineer works. Latency issues caused the Lambda function to encounter an error.
- D: The Lambda function’s execution role does not have the necessary permissions to access the KMS key that can decrypt the S3 object.
answer?
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1308877 by AgboolaKun
- Upvotes: 2
Selected Answer: D The correct answer is D.
Here is why:
The Lambda function is configured to access the S3 bucket: The data engineer has already set up the Lambda function’s execution role to access the S3 bucket. This means that basic S3 access permissions are likely in place.
The object is encrypted with a KMS key: This is a crucial detail. When an object in S3 is encrypted with a KMS key, any entity trying to read that object needs two sets of permissions: (a) permission to access the S3 bucket and object, and (b) permission to use the specific KMS key for decryption.
The error occurs when trying to retrieve the content: This suggests that the Lambda function can likely see the object (as it has S3 access) but fails when trying to read its contents.
To resolve this issue, the data engineer should grant the Lambda function’s execution role the required KMS permissions. Specifically, add the ‘kms:Decrypt’ permission for the KMS key used to encrypt the S3 object.
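As a hedged sketch of that fix, the snippet below attaches an inline policy with kms:Decrypt to the Lambda execution role using boto3; the role name, policy name, and key ARN are hypothetical placeholders:

```python
# Hedged sketch: grant the Lambda execution role permission to decrypt with the
# specific KMS key that encrypts the S3 object. All identifiers are hypothetical.
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
        }
    ],
}

iam.put_role_policy(
    RoleName="my-lambda-execution-role",       # hypothetical role name
    PolicyName="AllowKmsDecryptForS3Object",   # hypothetical policy name
    PolicyDocument=json.dumps(policy),
)
```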
Comment 1265759 by aragon_saa
- Upvotes: 1
Selected Answer: D Answer is D
Comment 1265694 by matt200
- Upvotes: 1
Selected Answer: D Option D: The Lambda function’s execution role does not have the necessary permissions to access the KMS key that can decrypt the S3 object.
Question WqruLrifFwoPia9DAT15
Question
A data engineer has implemented data quality rules in 1,000 AWS Glue Data Catalog tables. Because of a recent change in business requirements, the data engineer must edit the data quality rules.
How should the data engineer meet this requirement with the LEAST operational overhead?
Choices
- A: Create a pipeline in AWS Glue ETL to edit the rules for each of the 1,000 Data Catalog tables. Use an AWS Lambda function to call the corresponding AWS Glue job for each Data Catalog table.
- B: Create an AWS Lambda function that makes an API call to AWS Glue Data Quality to make the edits.
- C: Create an Amazon EMR cluster. Run a pipeline on Amazon EMR that edits the rules for each Data Catalog table. Use an AWS Lambda function to run the EMR pipeline.
- D: Use the AWS Management Console to edit the rules within the Data Catalog.
answer?
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1265760 by aragon_saa
- Upvotes: 1
Selected Answer: B Answer is B
Comment 1265697 by matt200
- Upvotes: 1
Selected Answer: B Option B: Create an AWS Lambda function that makes an API call to AWS Glue Data Quality to make the edits.
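A hedged sketch of option B, assuming the ruleset names are passed in the Lambda event and using the boto3 Glue Data Quality API; the DQDL rule string is purely illustrative:

```python
# Hedged sketch: a Lambda handler that edits AWS Glue Data Quality rulesets via
# the Glue API. Ruleset names and the DQDL rules below are hypothetical examples.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # New DQDL rules reflecting the changed business requirement (example only).
    new_rules = 'Rules = [ IsComplete "customer_id", ColumnValues "order_total" >= 0 ]'

    # Apply the same edit to each ruleset named in the event payload.
    ruleset_names = event.get("ruleset_names", [])
    for ruleset_name in ruleset_names:
        glue.update_data_quality_ruleset(
            Name=ruleset_name,
            Ruleset=new_rules,
        )
    return {"updated": len(ruleset_names)}
```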
Question 6aLpnSlk1kMn3iIUNBsP
Question
Two developers are working on separate application releases. The developers have created feature branches named Branch A and Branch B by using a GitHub repository’s master branch as the source.
The developer for Branch A deployed code to the production system. The code for Branch B will merge into the master branch in the following week’s scheduled application release.
Which command should the developer for Branch B run before the developer raises a pull request to the master branch?
Choices
- A: git diff branchB master git commit -m
- B: git pull master
- C: git rebase master
- D: git fetch -b master
answer?
Answer: C Answer_ET: C Community answer C (89%) 11% Discussion
Comment 1363449 by simon2133
- Upvotes: 1
Selected Answer: B B is considered the general default option for a few reasons: (1) it includes git fetch; (2) if there happens to be a branch C spawned off after A was merged, you won’t get a merge conflict and won’t need to use --force; (3) related to (2), you’re less likely to clobber commit history.
Comment 1308876 by AgboolaKun
- Upvotes: 4
Selected Answer: C The correct answer is C.
Here is why:
Rebasing Branch B onto the updated master branch ensures that Branch B incorporates all the recent changes from the master branch (including the changes from Branch A that were deployed to production).
It helps maintain a linear, clean history by placing Branch B’s commits on top of the latest master branch commits.
This approach reduces the likelihood of merge conflicts when the pull request is eventually merged into master.
It makes the code review process easier as all the changes in the pull request will be relevant and up-to-date.
By using git rebase master, the developer ensures that Branch B is up-to-date with all changes in the master branch, including those from Branch A, before creating the pull request. This approach helps maintain a clean, linear history and reduces the likelihood of conflicts during the merge process.
Comment 1268216 by mzansikiller
- Upvotes: 3
Rebasing In Git, there are two main ways to integrate changes from one branch into another: the merge and the rebase. In this section you’ll learn what rebasing is, how to do it, why it’s a pretty amazing tool, and in what cases you won’t want to use it.
The Basic Rebase If you go back to an earlier example from Basic Merging, you can see that you diverged your work and made commits on two different branches.
Answer C
Comment 1265761 by aragon_saa
- Upvotes: 2
Selected Answer: C Answer is C
Comment 1265699 by matt200
- Upvotes: 2
Selected Answer: C Option C: git rebase master
Question JjReFmb3fkL3YgBB8fPk
Question
A company stores employee data in Amazon Redshift. A table named Employee uses columns named Region ID, Department ID, and Role ID as a compound sort key.
Which queries will MOST increase the speed of the queries by using the compound sort key of the table? (Choose two.)
Choices
- A: Select * from Employee where Region ID=’North America’;
- B: Select * from Employee where Region ID=’North America’ and Department ID=20;
- C: Select * from Employee where Department ID=20 and Region ID=’North America’;
- D: Select * from Employee where Role ID=50;
- E: Select * from Employee where Region ID=’North America’ and Role ID=50;
answer?
Answer: BE Answer_ET: BE Community answer BE (35%) AB (32%) BC (32%) Discussion
Comment 1264994 by teo2157
- Upvotes: 7
Selected Answer: AB To maximize the speed of queries using a compound sort key in Amazon Redshift, you should structure your queries to take advantage of the order of the columns in the sort key. The most efficient queries will filter or join on the columns in the same order as the sort key. That said, the most efficient queries would be: SELECT * FROM Employee WHERE Region_ID = ‘region1’ AND Department_ID = ‘dept1’ AND Role_ID = ‘role1’; SELECT * FROM Employee WHERE Region_ID = ‘region1’ AND Department_ID = ‘dept1’; SELECT * FROM Employee WHERE Region_ID = ‘region1’;
Comment 1262608 by antun3ra
- Upvotes: 6
Selected Answer: BE To maximize the speed of queries by using the compound sort key (Region ID, Department ID, and Role ID) in the Employee table in Amazon Redshift, the queries should align with the order of the columns in the sort key.
Comment 1336645 by minhhnh
- Upvotes: 2
Selected Answer: BC The filter order in the query is irrelevant to performance because the sort key itself determines the storage order, so the execution plan is the same.
Comment 1328337 by HagarTheHorrible
- Upvotes: 1
Selected Answer: AB E is not optimal because it skips the second column of the sort key.
Comment 1324293 by altonh
- Upvotes: 2
Selected Answer: BC The execution plan of these 2 queries should be the same.
Comment 1318213 by RockyLeon
- Upvotes: 2
Selected Answer: BC A compound sort key works best when queries filter on the first column of the sort key and continue in sequential order.
Comment 1312623 by michele_scar
- Upvotes: 2
Selected Answer: AB The order is the key to speed up queries
Comment 1303470 by Parandhaman_Margan
- Upvotes: 4
Answer: AB
A: This query filters by Region ID, which is the first column in the compound sort key. Queries filtering on the leading sort key column(s) will benefit from optimized performance because the data can be quickly located.
B: This query filters by both Region ID (the first column) and Department ID (the second column) in the sort key. This further narrows down the search space, leading to even faster query performance.
Comment 1303034 by tucobbad
- Upvotes: 4
Selected Answer: BC I would vote for B and C. I’ve tested with a compound sort key (3 columns), and even when inverting the predicate order, the explain plan was the same.
Comment 1261491 by Shanmahi
- Upvotes: 5
Selected Answer: BE Based on the order of the compound sort key columns.
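For context, a hedged sketch of running a sort-key-aligned query through the Amazon Redshift Data API with boto3; the cluster identifier, database, secret ARN, and lowercase column names are hypothetical placeholders:

```python
# Hedged sketch: query the Employee table so the WHERE clause filters on a
# leading prefix of the compound sort key (region, then department).
# All identifiers below are hypothetical placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

# Filtering on the leading sort key columns in order lets Redshift use zone maps
# to skip blocks; filtering only on a later column (e.g., role_id) does not.
sql = """
    SELECT *
    FROM employee
    WHERE region_id = 'North America'
      AND department_id = 20;
"""

redshift_data.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # hypothetical cluster name
    Database="dev",                           # hypothetical database
    SecretArn="arn:aws:secretsmanager:us-east-1:111122223333:secret:redshift-creds",  # hypothetical
    Sql=sql,
)
```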