Questions and Answers
Question PsDxCWs9gzQ6dhFrYv3X
Question
A technology company currently uses Amazon Kinesis Data Streams to collect log data in real time. The company wants to use Amazon Redshift for downstream real-time queries and to enrich the log data.
Which solution will ingest data into Amazon Redshift with the LEAST operational overhead?
Choices
- A: Set up an Amazon Kinesis Data Firehose delivery stream to send data to a Redshift provisioned cluster table.
- B: Set up an Amazon Kinesis Data Firehose delivery stream to send data to Amazon S3. Configure a Redshift provisioned cluster to load data every minute.
- C: Configure Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to send data directly to a Redshift provisioned cluster table.
- D: Use Amazon Redshift streaming ingestion from Kinesis Data Streams to present data as a materialized view.
Answer
Answer: D (Answer_ET: D; community answer: D, 100%)
Discussion
Comment 1278716 by dashapetr
- Upvotes: 2
Selected Answer: D Amazon Redshift supports streaming ingestion from Amazon Kinesis Data Streams. The streaming ingestion feature provides low-latency, high-speed ingestion of streaming data from Kinesis Data Streams into an Amazon Redshift materialized view, and it removes the need to stage data in Amazon S3 before ingesting it into Amazon Redshift.
link: https://docs.aws.amazon.com/streams/latest/dev/using-other-services-redshift.html
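For context, here is a minimal sketch of what option D looks like in practice: streaming ingestion is set up with two SQL statements, which can be submitted to a provisioned cluster through the Redshift Data API. The stream, schema, role, and cluster names below are invented for illustration and are not from the question.

```python
# Minimal sketch: configure Redshift streaming ingestion from Kinesis Data Streams
# via the Redshift Data API. Stream, schema, role, and cluster names are hypothetical.
import boto3

redshift_data = boto3.client("redshift-data")

SQL_STATEMENTS = [
    # Map the Kinesis stream into Redshift through an external schema.
    """
    CREATE EXTERNAL SCHEMA kinesis_schema
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole';
    """,
    # Materialized view over the stream; refreshes read new records directly
    # from the shards, so no S3 staging step is needed.
    """
    CREATE MATERIALIZED VIEW log_events AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
    FROM kinesis_schema."application-logs";
    """,
]

for sql in SQL_STATEMENTS:
    redshift_data.execute_statement(
        ClusterIdentifier="analytics-cluster",  # provisioned cluster from the question
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )
```

By contrast, Firehose (options A and B) adds an extra hop with S3 staging and COPY, which is why D carries the least operational overhead for near-real-time queries.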
Comment 1278568 by EJGisME
- Upvotes: 1
Selected Answer: D D. Use Amazon Redshift streaming ingestion from Kinesis Data Streams to present data as a materialized view.
Question v5QOi5ue0qH9cz3eaZQQ
Question
A company maintains a data warehouse in an on-premises Oracle database. The company wants to build a data lake on AWS. The company wants to load data warehouse tables into Amazon S3 and synchronize the tables with incremental data that arrives from the data warehouse every day.
Each table has a column that contains monotonically increasing values. The size of each table is less than 50 GB. The data warehouse tables are refreshed every night between 1 AM and 2 AM. A business intelligence team queries the tables between 10 AM and 8 PM every day.
Which solution will meet these requirements in the MOST operationally efficient way?
Choices
- A: Use an AWS Database Migration Service (AWS DMS) full load plus CDC job to load tables that contain monotonically increasing data columns from the on-premises data warehouse to Amazon S3. Use custom logic in AWS Glue to append the daily incremental data to a full-load copy that is in Amazon S3.
- B: Use an AWS Glue Java Database Connectivity (JDBC) connection. Configure a job bookmark for a column that contains monotonically increasing values. Write custom logic to append the daily incremental data to a full-load copy that is in Amazon S3.
- C: Use an AWS Database Migration Service (AWS DMS) full load migration to load the data warehouse tables into Amazon S3 every day. Overwrite the previous day’s full-load copy every day.
- D: Use AWS Glue to load a full copy of the data warehouse tables into Amazon S3 every day. Overwrite the previous day’s full-load copy every day.
Answer
Answer: A (Answer_ET: A; community answer: A 80%, C 20%)
Discussion
Comment 1303473 by Parandhaman_Margan
- Upvotes: 1
Answer: A. Use an AWS Database Migration Service (AWS DMS) full load plus CDC job to load tables that contain monotonically increasing data columns from the on-premises data warehouse to Amazon S3.
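As a rough illustration of option A, a single DMS replication task can perform the initial full load and then apply the nightly incremental changes once the source and target endpoints exist. The ARNs and schema name below are placeholders, not values from the question.

```python
# Rough sketch of option A: one DMS task that does an initial full load of the
# Oracle tables and then applies ongoing changes (CDC) to Amazon S3.
# All ARNs and the schema name are placeholders.
import json
import boto3

dms = boto3.client("dms")

table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-warehouse-tables",
            "object-locator": {"schema-name": "DWH", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-s3-full-load-and-cdc",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:oracle-source",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:s3-target",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:replication-instance",
    MigrationType="full-load-and-cdc",  # full load first, then ongoing changes
    TableMappings=json.dumps(table_mappings),
)
```

DMS handles both the initial copy and the subsequent change capture without separate extraction code; options C and D instead re-copy the full tables every day.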
Comment 1290748 by LR2023
- Upvotes: 1
Selected Answer: C A seems to be overkill because it relies on custom logic
Comment 1286062 by Fawk
- Upvotes: 4
Selected Answer: A DMS is definitely the service, and C is obviously wrong
Question hgCWfMTVT0ilpNJSphI6
Question
A company is building a data lake for a new analytics team. The company is using Amazon S3 for storage and Amazon Athena for query analysis. All data that is in Amazon S3 is in Apache Parquet format.
The company is running a new Oracle database as a source system in the company’s data center. The company has 70 tables in the Oracle database. All the tables have primary keys. Data can occasionally change in the source system. The company wants to ingest the tables every day into the data lake.
Which solution will meet this requirement with the LEAST effort?
Choices
- A: Create an Apache Sqoop job in Amazon EMR to read the data from the Oracle database. Configure the Sqoop job to write the data to Amazon S3 in Parquet format.
- B: Create an AWS Glue connection to the Oracle database. Create an AWS Glue bookmark job to ingest the data incrementally and to write the data to Amazon S3 in Parquet format.
- C: Create an AWS Database Migration Service (AWS DMS) task for ongoing replication. Set the Oracle database as the source. Set Amazon S3 as the target. Configure the task to write the data in Parquet format.
- D: Create an Oracle database in Amazon RDS. Use AWS Database Migration Service (AWS DMS) to migrate the on-premises Oracle database to Amazon RDS. Configure triggers on the tables to invoke AWS Lambda functions to write changed records to Amazon S3 in Parquet format.
Answer
Answer: C (Answer_ET: C; community answer: C 90%, other 10%)
Discussion
Comment 1278727 by dashapetr
- Upvotes: 8
Selected Answer: C C: You can use S3 as a target and configure files to be in Parquet format https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html
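To make option C concrete, this is roughly how the S3 target side is configured to write Parquet. The bucket name, folder, role ARN, and identifier are made up for the sketch.

```python
# Sketch of option C's target side: an AWS DMS S3 endpoint that writes data
# in Apache Parquet format. Bucket name and role ARN are placeholders.
import boto3

dms = boto3.client("dms")

dms.create_endpoint(
    EndpointIdentifier="datalake-s3-target",
    EndpointType="target",
    EngineName="s3",
    S3Settings={
        "BucketName": "company-data-lake",
        "BucketFolder": "oracle",
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/DmsS3AccessRole",
        "DataFormat": "parquet",   # matches the Athena/Parquet requirement
        "ParquetVersion": "parquet-2-0",
    },
)
```

A replication task of type full-load-and-cdc pointed at this endpoint then keeps the 70 tables current as source rows occasionally change.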
Comment 1312632 by michele_scar
- Upvotes: 1
Selected Answer: C A and D are wrong. B is also wrong because job bookmarks are used to track data that you do not want to re-process when a Glue job is re-run.
Comment 1279578 by siheom
- Upvotes: 1
Selected Answer: B VOTE B
Question 56sTPw8xF2S7bPyXdOgu
Question
A transportation company wants to track vehicle movements by capturing geolocation records. The records are 10 bytes in size. The company receives up to 10,000 records every second. Data transmission delays of a few minutes are acceptable because of unreliable network conditions.
The transportation company wants to use Amazon Kinesis Data Streams to ingest the geolocation data. The company needs a reliable mechanism to send data to Kinesis Data Streams. The company needs to maximize the throughput efficiency of the Kinesis shards.
Which solution will meet these requirements in the MOST operationally efficient way?
Choices
- A: Kinesis Agent
- B: Kinesis Producer Library (KPL)
- C: Amazon Kinesis Data Firehose
- D: Kinesis SDK
Answer
Answer: B (Answer_ET: B; community answer: B 100%)
Discussion
Comment 1330817 by HagarTheHorrible
- Upvotes: 2
Selected Answer: B KPL automatically batches and aggregates multiple records into a single payload before sending them to Kinesis Data Streams. This reduces the number of records sent and optimizes shard throughput usage.
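The numbers in the question show why this matters: 10,000 records per second at 10 bytes each is only about 100 KB/s of data, but a shard accepts at most 1,000 records per second, so writing records one at a time would need at least 10 shards. The KPL itself is a Java/C++ library; the sketch below only approximates its batching idea with plain boto3 PutRecords calls, and the stream name is hypothetical.

```python
# Not the KPL itself (which is a Java/C++ library) -- just a boto3 sketch of the
# batching idea: group many small geolocation records into one PutRecords call
# instead of issuing one request per record.
import boto3

kinesis = boto3.client("kinesis")
BATCH_SIZE = 500  # PutRecords accepts up to 500 records per request

record_buffer = []

def send_record(vehicle_id: str, payload: bytes) -> None:
    """Buffer a 10-byte geolocation record and flush when the batch is full."""
    record_buffer.append({"Data": payload, "PartitionKey": vehicle_id})
    if len(record_buffer) >= BATCH_SIZE:
        flush()

def flush() -> None:
    if record_buffer:
        kinesis.put_records(StreamName="vehicle-geolocation", Records=list(record_buffer))
        record_buffer.clear()
```

The KPL goes further than this sketch: it aggregates multiple user records into a single Kinesis record (so the per-shard record limit is used efficiently), retries failures automatically, and buffers within a configurable delay, which the question's multi-minute tolerance allows.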
Comment 1280260 by EJGisME
- Upvotes: 3
Selected Answer: B B. Kinesis Producer Library (KPL)
Question iMNFkht8C1aqZAUngi6K
Question
An investment company needs to manage and extract insights from a volume of semi-structured data that grows continuously.
A data engineer needs to deduplicate the semi-structured data by removing records that are exact duplicates and records that differ only by common misspellings.
Which solution will meet these requirements with the LEAST operational overhead?
Choices
- A: Use the FindMatches feature of AWS Glue to remove duplicate records.
- B: Use non-window functions in Amazon Athena to remove duplicate records.
- C: Use Amazon Neptune ML and an Apache Gremlin script to remove duplicate records.
- D: Use the global tables feature of Amazon DynamoDB to prevent duplicate data.
Answer
Answer: A (Answer_ET: A; community answer: A 100%)
Discussion
Comment 1358479 by italiancloud2025
- Upvotes: 1
Selected Answer: A A: Yes, because AWS Glue FindMatches uses machine learning to deduplicate data and correct misspellings with minimal operational overhead. B: No, using Athena requires writing manual queries and does not handle spelling variations well. C: No, Neptune ML is aimed at graph analytics, not deduplication of semi-structured data. D: No, DynamoDB global tables are used for replication, not for removing duplicates.
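As a hedged sketch of what option A involves: FindMatches is created as a Glue ML transform over a catalog table, then trained with labeled examples and applied from a Glue job. The database, table, column, and role names below are invented for illustration.

```python
# Sketch of option A: create a Glue FindMatches ML transform that learns to
# match duplicate or misspelled records. Names and the role ARN are placeholders.
import boto3

glue = boto3.client("glue")

glue.create_ml_transform(
    Name="dedupe-investment-records",
    Role="arn:aws:iam::123456789012:role/GlueFindMatchesRole",
    GlueVersion="3.0",
    InputRecordTables=[
        {"DatabaseName": "investments_db", "TableName": "semi_structured_records"}
    ],
    Parameters={
        "TransformType": "FIND_MATCHES",
        "FindMatchesParameters": {
            "PrimaryKeyColumnName": "record_id",
            # Bias toward recall so near-duplicates (misspellings) still match.
            "PrecisionRecallTradeoff": 0.4,
            "EnforceProvidedLabels": False,
        },
    },
)
```

After labeling and training, the transform is applied from a Glue ETL job to emit a deduplicated dataset, which keeps operational overhead low compared with hand-written matching logic.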
Comment 1286065 by Fawk
- Upvotes: 2
Selected Answer: A A - The other options are dumb and hardly make sense