Questions and Answers

Question uVdTM49xE1hTJEPm2gBE

Question

A company has implemented a lake house architecture in Amazon Redshift. The company needs to give users the ability to authenticate into the Amazon Redshift query editor by using a third-party identity provider (IdP).

A data engineer must set up the authentication mechanism.

What is the first step the data engineer should take to meet this requirement?

Choices

  • A: Register the third-party IdP as an identity provider in the configuration settings of the Redshift cluster.
  • B: Register the third-party IdP as an identity provider from within Amazon Redshift.
  • C: Register the third-party IdP as an identity provider for AWS Secrets Manager. Configure Amazon Redshift to use Secrets Manager to manage user credentials.
  • D: Register the third-party IdP as an identity provider for AWS Certificate Manager (ACM). Configure Amazon Redshift to use ACM to manage user credentials.
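
For context on what registering a third-party IdP "from within Amazon Redshift" can look like in practice, below is a minimal, hypothetical sketch that issues Redshift's CREATE IDENTITY PROVIDER statement through the Redshift Data API with boto3. The cluster name, provider name, namespace, and all Azure AD parameter values are placeholders, not details taken from the question.

```python
import json

import boto3

# Hypothetical sketch only: register a third-party IdP (Azure AD here) from
# within Amazon Redshift by running CREATE IDENTITY PROVIDER via the Data API.
client = boto3.client("redshift-data")

idp_params = {
    "issuer": "https://login.microsoftonline.com/<tenant-id>/v2.0",  # placeholder tenant
    "client_id": "<azure-ad-application-id>",                        # placeholder
    "client_secret": "<azure-ad-client-secret>",                     # placeholder
    "audience": ["<azure-ad-audience>"],                             # placeholder
}

sql = (
    "CREATE IDENTITY PROVIDER example_idp TYPE azure "  # example provider name
    "NAMESPACE 'example_ns' "                           # example namespace
    f"PARAMETERS '{json.dumps(idp_params)}'"
)

client.execute_statement(
    ClusterIdentifier="example-cluster",  # placeholder cluster identifier
    Database="dev",                       # placeholder database
    DbUser="awsuser",                     # admin user that can run this DDL
    Sql=sql,
)
```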

Question 1a338l71iO2SKBFAXhNl

Question

A company currently uses a provisioned Amazon EMR cluster that includes general purpose Amazon EC2 instances. The EMR cluster uses EMR managed scaling between one and five task nodes for the company’s long-running Apache Spark extract, transform, and load (ETL) job. The company runs the ETL job every day.

When the company runs the ETL job, the EMR cluster quickly scales up to five nodes. The EMR cluster often reaches maximum CPU usage, but the memory usage remains under 30%.

The company wants to modify the EMR cluster configuration to reduce the cost of running the daily ETL job.

Which solution will meet these requirements MOST cost-effectively?

Choices

  • A: Increase the maximum number of task nodes for EMR managed scaling to 10.
  • B: Change the task node type from general purpose EC2 instances to memory optimized EC2 instances.
  • C: Switch the task node type from general purpose EC2 instances to compute optimized EC2 instances.
  • D: Reduce the scaling cooldown period for the provisioned EMR cluster.
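
To make the levers in these choices concrete, the sketch below uses boto3 to adjust an existing EMR cluster's managed scaling limits and to add a compute optimized task instance group. The cluster ID, instance type, and capacity numbers are illustrative assumptions, not values from the scenario.

```python
import boto3

emr = boto3.client("emr")

CLUSTER_ID = "j-EXAMPLEID12345"  # placeholder cluster ID

# Adjust the EMR managed scaling limits for the cluster (compare choice A,
# which raises the maximum).
emr.put_managed_scaling_policy(
    ClusterId=CLUSTER_ID,
    ManagedScalingPolicy={
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": 1,
            "MaximumCapacityUnits": 5,
        }
    },
)

# Add a compute optimized task instance group (compare choice C); c5.2xlarge is
# only an example of the compute optimized family.
emr.add_instance_groups(
    JobFlowId=CLUSTER_ID,
    InstanceGroups=[
        {
            "Name": "task-compute-optimized",
            "InstanceRole": "TASK",
            "InstanceType": "c5.2xlarge",
            "InstanceCount": 1,
        }
    ],
)
```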

Question 49QDnPfaUjMeGru4qTz4

Question

A company uploads .csv files to an Amazon S3 bucket. The company’s data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas.

An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.

If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.

Which solution will meet these requirements?

Choices

  • A: Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.
  • B: Modify the AWS Glue job to load the previously inserted data into a MySQL database. Perform an upsert operation in the MySQL database. Copy the results to the Amazon Redshift tables.
  • C: Use Apache Spark’s DataFrame dropDuplicates() API to eliminate duplicates. Write the data to the Redshift tables.
  • D: Use the AWS Glue ResolveChoice built-in transform to select the value of the column from the most recent record.
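
As an illustration of the staging-table pattern that choice A describes, here is a minimal AWS Glue sketch that loads the transformed data into a staging Redshift table and runs a delete-then-insert merge as a post-action. The Glue connection name, table names, key column (id), and S3 temp path are assumptions made for the example.

```python
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Toy DataFrame standing in for the job's transformed output.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
dyf = DynamicFrame.fromDF(df, glue_context, "dyf")

# Delete-then-insert merge that Redshift runs after the staging load finishes.
# Table names and the key column (id) are placeholders.
merge_sql = """
    BEGIN;
    DELETE FROM public.target_table
    USING public.staging_table
    WHERE public.target_table.id = public.staging_table.id;
    INSERT INTO public.target_table SELECT * FROM public.staging_table;
    DROP TABLE public.staging_table;
    END;
"""

glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="example-redshift-connection",      # placeholder Glue connection
    connection_options={
        "dbtable": "public.staging_table",                  # write to staging first
        "database": "dev",                                  # placeholder database
        "postactions": merge_sql,                           # merge into the target table
    },
    redshift_tmp_dir="s3://example-temp-bucket/redshift/",  # placeholder temp location
)
```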

Question T7rfRHxj8NtTPis414k4

Question

A company is using Amazon Redshift to build a data warehouse solution. The company is loading hundreds of files into a fact table that is in a Redshift cluster.

The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the fact table.

Which solution will meet these requirements?

Choices

  • A: Use multiple COPY commands to load the data into the Redshift cluster.
  • B: Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster.
  • C: Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node.
  • D: Use a single COPY command to load the data into the Redshift cluster.
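
For reference, a single COPY command pointed at a common Amazon S3 prefix lets Redshift split the files across node slices and load them in parallel. The sketch below runs such a command through the Redshift Data API; the cluster, table, bucket, and IAM role ARN are placeholders.

```python
import boto3

client = boto3.client("redshift-data")

# One COPY that points at the common prefix loads all of the files in a single
# parallel operation. Cluster, table, bucket, and role ARN are placeholders.
copy_sql = """
    COPY public.fact_sales
    FROM 's3://example-bucket/fact-files/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/example-redshift-copy-role'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

client.execute_statement(
    ClusterIdentifier="example-cluster",  # placeholder cluster identifier
    Database="dev",                       # placeholder database
    DbUser="awsuser",                     # placeholder database user
    Sql=copy_sql,
)
```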

Question x4CMK1CDTJYduX0iE8pr

Question

A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3 based data lake. The company uses Amazon Athena to query the data that is in the data lake.

The company needs to identify matching records even when the records do not have a common unique identifier.

Which solution will meet this requirement?

Choices

  • A: Use Amazon Macie pattern matching as part of the ETL job.
  • B: Train and use the AWS Glue PySpark Filter class in the ETL job.
  • C: Partition tables and use the ETL job to partition the data on a unique identifier.
  • D: Train and use the AWS Lake Formation FindMatches transform in the ETL job.
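
As background on the FindMatches transform named in choice D, here is a minimal sketch of applying a previously created and trained FindMatches ML transform inside an AWS Glue PySpark job. The catalog database, table name, transform ID, and output path are placeholders for the example.

```python
from awsglue.context import GlueContext
from awsglueml.transforms import FindMatches
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the crawled data-lake table from the Data Catalog.
# Database and table names are placeholders.
records = glue_context.create_dynamic_frame.from_catalog(
    database="example_datalake_db",
    table_name="example_records",
)

# Apply a FindMatches ML transform that was created and trained beforehand; it
# groups records that likely refer to the same entity even though they share
# no unique identifier. The transform ID is a placeholder.
matched = FindMatches.apply(
    frame=records,
    transformId="tfm-0123456789abcdef0123456789abcdef",
)

# Write the match results back to the S3 data lake for querying with Athena.
glue_context.write_dynamic_frame.from_options(
    frame=matched,
    connection_type="s3",
    connection_options={"path": "s3://example-datalake-bucket/matched/"},
    format="parquet",
)
```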