Questions and Answers
Question uVdTM49xE1hTJEPm2gBE
Question
A company has implemented a lake house architecture in Amazon Redshift. The company needs to give users the ability to authenticate into Redshift query editor by using a third-party identity provider (IdP).
A data engineer must set up the authentication mechanism.
What is the first step the data engineer should take to meet this requirement?
Choices
- A: Register the third-party IdP as an identity provider in the configuration settings of the Redshift cluster.
- B: Register the third-party IdP as an identity provider from within Amazon Redshift.
- C: Register the third-party IdP as an identity provider for AWS Secrets Manager. Configure Amazon Redshift to use Secrets Manager to manage user credentials.
- D: Register the third-party IdP as an identity provider for AWS Certificate Manager (ACM). Configure Amazon Redshift to use ACM to manage user credentials.
Answer: B | Answer_ET: B | Community answer: B (68%), A (32%)
Discussion
Comment 1285644 by PashoQ
- Upvotes: 7
Selected Answer: B https://docs.aws.amazon.com/redshift/latest/mgmt/redshift-iam-access-control-native-idp.html: "register the identity provider with Amazon Redshift, using SQL statements, which set authentication parameters that are unique to the identity provider."
Comment 1263251 by komorebi
- Upvotes: 6
Selected Answer: A Answer is A
Comment 1349639 by solopez_111
- Upvotes: 1
Selected Answer: A Since the question is asking for “The first step”, the correct answer is A. “First, you register Amazon Redshift as a third-party application with your identity provider, requesting the necessary API permissions” https://docs.aws.amazon.com/redshift/latest/mgmt/redshift-iam-access-control-native-idp.html
Comment 1348998 by YUICH
- Upvotes: 3
Selected Answer: B
Why Option (A) is correct: Redshift uses SAML at the cluster level. To enable single sign-on with a SAML 2.0–compatible IdP (for example, Okta or Azure AD) for Redshift Query Editor, you register the IdP by uploading its SAML metadata in the Amazon Redshift console. This is done at the cluster configuration or security level, not “within” the database engine itself.
Option (B), “within Amazon Redshift”: There is no direct command such as CREATE IDENTITY PROVIDER inside Redshift SQL. Federating a third-party IdP requires configuring the cluster to trust that IdP’s SAML metadata. That is done via the AWS console or CLI at the cluster level, not by running commands inside the database.
Comment 1336861 by BigMrT
- Upvotes: 1
Selected Answer: A Redshift does not support directly registering the IdP “within” the service. The registration must be done through the cluster configuration settings.
Comment 1327274 by paali
- Upvotes: 2
Selected Answer: B To complete the preliminary setup between the identity provider and Amazon Redshift, you perform a couple of steps: First, you register Amazon Redshift as a third-party application with your identity provider, requesting the necessary API permissions. Then you create users and groups in the identity provider. Last, you register the identity provider with Amazon Redshift, using SQL statements, which set authentication parameters that are unique to the identity provider. As part of registering the identity provider with Redshift, you assign a namespace to make sure users and roles are grouped correctly.
Comment 1318179 by RockyLeon
- Upvotes: 3
Selected Answer: B https://docs.aws.amazon.com/redshift/latest/mgmt/redshift-iam-access-control-native-idp.html
Comment 1268221 by mzansikiller
- Upvotes: 5
To enable users to authenticate into the Amazon Redshift query editor using a third-party identity provider (IdP), the data engineer must first register that IdP within the configuration settings of the Redshift cluster itself.
Amazon Redshift natively supports integrating with external identity providers to manage user authentication. By registering the third-party IdP directly in the Redshift cluster settings, it establishes the trust relationship needed for Redshift to rely on that IdP for authenticating users when they log into the query editor. Answer A
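For context, the registration step that the accepted answer and the linked native-IdP documentation refer to is performed with SQL run against the cluster. Below is a minimal sketch that issues a CREATE IDENTITY PROVIDER statement through the Redshift Data API; the cluster name, namespace, tenant, client ID, and secret are hypothetical placeholders, and the exact PARAMETERS keys depend on the identity provider.

```python
import boto3

# Minimal sketch, assuming an Azure AD tenant and a provisioned cluster.
# Every identifier below (cluster, database, tenant, client) is a placeholder.
client = boto3.client("redshift-data")

register_idp_sql = """
CREATE IDENTITY PROVIDER example_idp TYPE azure
NAMESPACE 'aad'
PARAMETERS '{"issuer": "https://login.microsoftonline.com/<tenant-id>/v2.0",
             "client_id": "<client-id>",
             "client_secret": "<client-secret>",
             "audience": ["https://analysis.windows.net/powerbi/connector/AmazonRedshift"]}'
"""

client.execute_statement(
    ClusterIdentifier="example-cluster",  # hypothetical cluster name
    Database="dev",
    DbUser="awsuser",  # a superuser runs the registration
    Sql=register_idp_sql,
)
```

Once the IdP is registered, roles whose names carry the configured namespace prefix (for example, aad:analysts) can be granted permissions and are matched to the IdP’s groups when users sign in to the query editor.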
Question 1a338l71iO2SKBFAXhNl
Question
A company currently uses a provisioned Amazon EMR cluster that includes general purpose Amazon EC2 instances. The EMR cluster uses EMR managed scaling between one and five task nodes for the company’s long-running Apache Spark extract, transform, and load (ETL) job. The company runs the ETL job every day.
When the company runs the ETL job, the EMR cluster quickly scales up to five nodes. The EMR cluster often reaches maximum CPU usage, but the memory usage remains under 30%.
The company wants to modify the EMR cluster configuration to reduce the EMR costs to run the daily ETL job.
Which solution will meet these requirements MOST cost-effectively?
Choices
- A: Increase the maximum number of task nodes for EMR managed scaling to 10.
- B: Change the task node type from general purpose EC2 instances to memory optimized EC2 instances.
- C: Switch the task node type from general purpose EC2 instances to compute optimized EC2 instances.
- D: Reduce the scaling cooldown period for the provisioned EMR cluster.
Answer: C | Answer_ET: C | Community answer: C (100%)
Discussion
Comment 1308862 by AgboolaKun
- Upvotes: 1
Selected Answer: C C is the correct answer.
Here is why: compute optimized Amazon EC2 instances prioritize CPU cores over memory and cost less per vCPU than general purpose instances, which makes them the better choice for a workload like this one that needs high processing power but relatively little memory.
Comment 1262188 by antun3ra
- Upvotes: 4
Selected Answer: C current situation shows that the EMR cluster is reaching maximum CPU usage, but memory usage remains low (under 30%). This indicates that the workload is CPU-bound rather than memory-bound.
Comment 1261467 by Shanmahi
- Upvotes: 4
Selected Answer: C Since the ETL job reaches maximum CPU usage but not memory usage, switching from general-purpose instances to compute-optimized instances (such as C5 or C6g instances) can provide better performance per dollar for CPU-bound workloads.
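As a rough sketch of the change the accepted answer describes, the snippet below adds a compute optimized task instance group to an existing provisioned cluster with boto3. The cluster ID and instance type are hypothetical, and in a real migration the existing general purpose task group would also be retired and the managed scaling limits reviewed.

```python
import boto3

emr = boto3.client("emr")

# Hypothetical cluster ID; in practice the old general purpose task group
# would be removed so that managed scaling only grows the new group.
emr.add_instance_groups(
    JobFlowId="j-XXXXXXXXXXXXX",
    InstanceGroups=[
        {
            "Name": "compute-optimized-task",
            "InstanceRole": "TASK",
            "InstanceType": "c5.2xlarge",  # compute optimized: more vCPU per dollar, less RAM
            "InstanceCount": 1,  # EMR managed scaling grows this toward the configured maximum
        }
    ],
)
```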
Question 49QDnPfaUjMeGru4qTz4
Question
A company uploads .csv files to an Amazon S3 bucket. The company’s data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas.
An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.
If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.
Which solution will meet these requirements?
Choices
- A: Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.
- B: Modify the AWS Glue job to load the previously inserted data into a MySQL database. Perform an upsert operation in the MySQL database. Copy the results to the Amazon Redshift tables.
- C: Use Apache Spark’s DataFrame dropDuplicates() API to eliminate duplicates. Write the data to the Redshift tables.
- D: Use the AWS Glue ResolveChoice built-in transform to select the value of the column from the most recent record.
Answer: A | Answer_ET: A | Community answer: A (100%)
Discussion
Comment 1261470 by Shanmahi
- Upvotes: 7
Selected Answer: A A staging-table approach: the Glue job copies the rows into a staging Redshift table, Redshift’s MERGE statement (or a delete-then-insert) updates the target table from the staging table, and finally the staging table is truncated for the next run.
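A minimal sketch of the staging-table pattern described above, using the long-documented delete-then-insert merge and the Redshift Data API. The cluster, database, table, and key names are hypothetical, and the Glue job is assumed to have been modified to load the processed rows into the staging table instead of the target.

```python
import boto3

client = boto3.client("redshift-data")

# Hypothetical cluster, database, and table/key names.
merge_sql = [
    # Drop target rows that the rerun is about to replace.
    "DELETE FROM sales_target USING sales_stage "
    "WHERE sales_target.order_id = sales_stage.order_id;",
    # Insert the freshly processed rows from the staging table.
    "INSERT INTO sales_target SELECT * FROM sales_stage;",
    # Housekeep the staging table for the next run.
    "TRUNCATE sales_stage;",
]

client.batch_execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="awsuser",
    Sqls=merge_sql,
)
```

Running this after every Glue job execution makes reruns idempotent: existing rows are replaced with the new values and no duplicates reach the target table.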
Question T7rfRHxj8NtTPis414k4
Question
A company is using Amazon Redshift to build a data warehouse solution. The company is loading hundreds of files into a fact table that is in a Redshift cluster.
The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the fact table.
Which solution will meet these requirements?
Choices
- A: Use multiple COPY commands to load the data into the Redshift cluster.
- B: Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster.
- C: Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node.
- D: Use a single COPY command to load the data into the Redshift cluster.
Answer: D | Answer_ET: D | Community answer: D (100%)
Discussion
Comment 1260824 by canace
- Upvotes: 5
D? https://docs.aws.amazon.com/redshift/latest/dg/t_Loading-data-from-S3.html
Comment 1265568 by cas_tori
- Upvotes: 1
Selected Answer: D this is D
Comment 1262189 by antun3ra
- Upvotes: 3
Selected Answer: D A single COPY command automatically parallelizes the load operation across all nodes in the Redshift cluster. This ensures optimal use of cluster resources.
Comment 1261871 by phkhadse
- Upvotes: 2
A - Using multiple COPY commands allows parallel loading of data, which maximizes throughput.
Comment 1261474 by Shanmahi
- Upvotes: 2
Selected Answer: D Agree with canace; Redshift’s COPY command uses the MPP architecture to read and load files in parallel into the data warehouse.
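To make the accepted answer concrete: one COPY command pointed at the common S3 prefix lets Redshift split the files across all node slices and load them in parallel. In the sketch below the table, bucket, IAM role, and file format are hypothetical placeholders.

```python
import boto3

client = boto3.client("redshift-data")

# A single COPY for the whole prefix; Redshift parallelizes the load across
# every slice in the cluster. All names below are placeholders.
copy_sql = """
COPY fact_sales
FROM 's3://example-bucket/fact-files/'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole'
FORMAT AS CSV;
"""

client.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
```

Splitting the input into multiple files of roughly equal size is what lets the single COPY use the whole cluster; issuing many separate COPY commands against the same table instead forces Redshift to serialize the loads.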
Question x4CMK1CDTJYduX0iE8pr
Question
A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3 based data lake. The company uses Amazon Athena to query the data that is in the data lake.
The company needs to identify matching records even when the records do not have a common unique identifier.
Which solution will meet this requirement?
Choices
- A: Use Amazon Macie pattern matching as part of the ETL job.
- B: Train and use the AWS Glue PySpark Filter class in the ETL job.
- C: Partition tables and use the ETL job to partition the data on a unique identifier.
- D: Train and use the AWS Lake Formation FindMatches transform in the ETL job.
Answer: D | Answer_ET: D | Community answer: D (100%)
Discussion
Comment 1261482 by Shanmahi
- Upvotes: 6
Selected Answer: D AWS Lake Formation provides machine learning capabilities to create custom transforms to cleanse your data. There is currently one available transform named FindMatches. The FindMatches transform enables you to identify duplicate or matching records in your dataset, even when the records do not have a common unique identifier and no fields match exactly. This will not require writing any code or knowing how machine learning works.
Comment 1318193 by RockyLeon
- Upvotes: 1
Selected Answer: D Correct answer is D
Comment 1289237 by LR2023
- Upvotes: 1
Selected Answer: D
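For reference, below is a minimal Glue ETL sketch that applies an already trained FindMatches ML transform before writing the result to the data lake. The transform ID, catalog database, table, and S3 path are hypothetical, and the transform must be created and trained (including labeling work to tune match quality) before the job can apply it.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsglueml.transforms import FindMatches
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the records cataloged by the crawler (hypothetical catalog names).
records = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_customers"
)

# Apply the trained FindMatches transform; it assigns a match_id that groups
# records referring to the same entity even without a common unique key.
matched = FindMatches.apply(frame=records, transformId="tfm-0123456789abcdef")

# Write the matched output back to the S3 data lake for Athena to query.
glue_context.write_dynamic_frame.from_options(
    frame=matched,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/matched/"},
    format="parquet",
)

job.commit()
```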