Questions and Answers
Question Ijuy4KKaYfmUlAAs6IzB
Question
A company is building an inventory management system and an inventory reordering system to automatically reorder products. Both systems use Amazon Kinesis Data Streams. The inventory management system uses the Amazon Kinesis Producer Library (KPL) to publish data to a stream. The inventory reordering system uses the Amazon Kinesis Client Library (KCL) to consume data from the stream. The company configures the stream to scale up and down as needed.
Before the company deploys the systems to production, the company discovers that the inventory reordering system received duplicated data.
Which factors could have caused the reordering system to receive duplicated data? (Choose two.)
Choices
- A: The producer experienced network-related timeouts.
- B: The stream’s value for the IteratorAgeMilliseconds metric was too high.
- C: There was a change in the number of shards, record processors, or both.
- D: The AggregationEnabled configuration property was set to true.
- E: The max_records configuration property was set to a number that was too high.
Answer: AC Answer_ET: AC Community answer AC (100%) Discussion
Comment 1312636 by michele_scar
- Upvotes: 4
Selected Answer: AC https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html
Comment 1303579 by 0c2d840
- Upvotes: 4
Answer is AC. https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html The producer can introduce duplicates through retries after network-related timeouts. The consumer can process records more than once after changes to the number of shards, record processors, or both.
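For context, the mitigation the linked page recommends is to make consumer processing idempotent, since Kinesis provides at-least-once delivery. Below is a minimal Python sketch using a conditional DynamoDB write as a dedup guard; the table name and the reorder_inventory helper are hypothetical, and the sketch assumes each record carries a unique ID:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def process_record_idempotently(record_id: str, payload: bytes) -> None:
    """Skip records already processed, e.g. after a KCL record-processor
    restart caused by a shard split/merge or a producer retry."""
    try:
        # Conditional write: fails if this record ID was seen before.
        dynamodb.put_item(
            TableName="processed-records",  # hypothetical dedup table
            Item={"record_id": {"S": record_id}},
            ConditionExpression="attribute_not_exists(record_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # duplicate delivery; already handled
        raise
    reorder_inventory(payload)  # hypothetical downstream business logic
```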
Question xi9dI2CJANuDzaVKXp3p
Question
A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be regularly propagated to the S3 bucket. The data includes files that are in multiple formats. The data engineer needs to automate the transfer process and must schedule the process to run periodically. Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?
Choices
- A: AWS DataSync
- B: AWS Glue
- C: AWS Direct Connect
- D: Amazon S3 Transfer Acceleration
Answer: A Answer_ET: A Community answer A (100%) Discussion
Comment 1137927 by TonyStark0122
- Upvotes: 11
A. AWS DataSync
Explanation: AWS DataSync is a managed data transfer service that simplifies and accelerates moving large amounts of data online between on-premises storage and Amazon S3, EFS, or FSx for Windows File Server. DataSync is optimized for efficient, incremental, and reliable transfers of large datasets, making it suitable for transferring 5 TB of data with daily updates.
Comment 1409704 by sam_pre
- Upvotes: 1
Selected Answer: A DataSync is a perfect fit for this requirement
Comment 1269247 by San_Juan
- Upvotes: 1
A seems correct.
AWS Direct Connect is a networking service; it has nothing to do with syncing data between on-premises and cloud storage services, which is what DataSync does ("an online service that automates and accelerates moving data between on premises and AWS Storage services").
Comment 1226984 by pypelyncar
- Upvotes: 2
Selected Answer: A DataSync, locations, and tasks are all you need.
Comment 1212618 by FunkyFresco
- Upvotes: 1
Selected Answer: A It's DataSync.
Comment 1202447 by augustino0890
- Upvotes: 2
A. AWS DataSync AWS DataSync is a data transfer service specifically designed to simplify and accelerate moving large volumes of data between on-premises storage systems and AWS storage services like S3.
Comment 1198383 by KelvinPun
- Upvotes: 1
Selected Answer: A That’s the job of DataSync
Comment 1194023 by Rafaaws
- Upvotes: 1
A - DataSync is built for this use case
Comment 1127211 by milofficial
- Upvotes: 2
Selected Answer: A Typical DataSync use case
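To illustrate the scheduling requirement, a DataSync task accepts a built-in schedule, so incremental transfers run periodically without extra orchestration. A minimal boto3 sketch (both location ARNs and the task name are hypothetical placeholders):

```python
import boto3

datasync = boto3.client("datasync")

# Create a task that runs daily. DataSync copies only changed files,
# which suits the ~5% daily change rate with no custom diff logic.
response = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-onprem",   # hypothetical
    DestinationLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-s3",  # hypothetical
    Name="onprem-to-s3-daily",
    Schedule={"ScheduleExpression": "rate(1 day)"},
)
print(response["TaskArn"])
```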
Question lYc77WFV6k4IUz7q6KHt
Question
An ecommerce company operates a complex order fulfillment process that spans several operational systems hosted in AWS. Each of the operational systems has a Java Database Connectivity (JDBC)-compliant relational database where the latest processing state is captured.
The company needs to give an operations team the ability to track orders on an hourly basis across the entire fulfillment process.
Which solution will meet these requirements with the LEAST development overhead?
Choices
- A: Use AWS Glue to build ingestion pipelines from the operational systems into Amazon Redshift. Build dashboards in Amazon QuickSight that track the orders.
- B: Use AWS Glue to build ingestion pipelines from the operational systems into Amazon DynamoDB. Build dashboards in Amazon QuickSight that track the orders.
- C: Use AWS Database Migration Service (AWS DMS) to capture changed records in the operational systems. Publish the changes to an Amazon DynamoDB table in a different AWS region from the source database. Build Grafana dashboards that track the orders.
- D: Use AWS Database Migration Service (AWS DMS) to capture changed records in the operational systems. Publish the changes to an Amazon DynamoDB table in a different AWS region from the source database. Build Amazon QuickSight dashboards that track the orders.
Answer: A Answer_ET: A Community answer A (70%) D (30%) Discussion
Comment 1341153 by MerryLew
- Upvotes: 2
Selected Answer: A DynamoDB is not designed to store relational data. Redshift, however, is.
Comment 1339657 by pepedaruiz999
- Upvotes: 3
Selected Answer: A DynamoDB is not a relational database
Comment 1331523 by axantroff
- Upvotes: 1
Selected Answer: D IDK, in terms of dev overhead it feels like D beats A
Comment 1330181 by kailu
- Upvotes: 2
Selected Answer: D Using AWS DMS for real-time change data capture (CDC) and publishing the changes to DynamoDB, followed by building QuickSight dashboards, is the most efficient solution with the least development overhead for this use case.
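To make option A concrete, a Glue job can read the latest state from a JDBC source table registered in the Data Catalog and load it into Redshift on an hourly trigger. A minimal PySpark sketch for a Glue job (the catalog database, table, connection name, and S3 staging path are hypothetical):

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the latest order state from a cataloged JDBC source table.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="fulfillment",      # hypothetical catalog database
    table_name="orders_status",  # hypothetical catalog table
)

# Load into Redshift through an existing Glue JDBC connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=orders,
    catalog_connection="redshift-conn",  # hypothetical Glue connection
    connection_options={"dbtable": "orders_status", "database": "analytics"},
    redshift_tmp_dir="s3://my-temp-bucket/redshift/",  # hypothetical staging path
)
job.commit()
```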
Question CUuGCxeHOl7aKo8miAEm
Question
A data engineer needs to use Amazon Neptune to develop graph applications.
Which programming languages should the engineer use to develop the graph applications? (Choose two.)
Choices
- A: Gremlin
- B: SQL
- C: ANSI SQL
- D: SPARQL
- E: Spark SQL
Answer: AD Answer_ET: AD Community answer AD (100%) Discussion
Comment 1317232 by emupsx1
- Upvotes: 4
Selected Answer: AD https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-queries.html
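For reference, Gremlin queries are submitted to Neptune over WebSocket; a minimal sketch with the gremlinpython driver (the cluster endpoint is a hypothetical placeholder):

```python
from gremlin_python.driver import client

# Connect to the Neptune cluster's Gremlin endpoint over WebSocket.
gremlin_client = client.Client(
    "wss://my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/gremlin",  # hypothetical
    "g",
)

# Count the vertices in the graph.
result = gremlin_client.submit("g.V().count()").all().result()
print(result)

gremlin_client.close()
```

SPARQL queries, by contrast, are sent over HTTPS to the cluster's /sparql endpoint.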
Question sk217wzPN9YJHjd5pp8r
Question
A mobile gaming company wants to capture data from its gaming app. The company wants to make the data available to three internal consumers of the data. The data records are approximately 20 KB in size.
The company wants to achieve optimal throughput from each device that runs the gaming app. Additionally, the company wants to develop an application to process data streams. The stream-processing application must have dedicated throughput for each internal consumer.
Which solution will meet these requirements?
Choices
- A: Configure the mobile app to call the PutRecords API operation to send data to Amazon Kinesis Data Streams. Use the enhanced fan-out feature with a stream for each internal consumer.
- B: Configure the mobile app to call the PutRecordBatch API operation to send data to Amazon Kinesis Data Firehose. Submit an AWS Support case to turn on dedicated throughput for the company’s AWS account. Allow each internal consumer to access the stream.
- C: Configure the mobile app to use the Amazon Kinesis Producer Library (KPL) to send data to Amazon Kinesis Data Firehose. Use the enhanced fan-out feature with a stream for each internal consumer.
- D: Configure the mobile app to call the PutRecords API operation to send data to Amazon Kinesis Data Streams. Host the stream-processing application for each internal consumer on Amazon EC2 instances. Configure auto scaling for the EC2 instances.
Answer: A Answer_ET: A Community answer A (86%) B (14%) Discussion
Comment 1341155 by MerryLew
- Upvotes: 1
Selected Answer: A The enhanced fan-out feature allows consumers to receive data from a stream with dedicated throughput
Comment 1309159 by AgboolaKun
- Upvotes: 2
Selected Answer: A The correct answer is A.
Here is why:
Amazon Kinesis Data Streams is designed for real-time streaming data.
The PutRecords API is efficient for sending multiple records to Kinesis in a single call, which is good for optimizing throughput from mobile devices.
The enhanced fan-out feature allows multiple consumers to read from the same stream with dedicated throughput for each consumer, which meets the requirement of dedicated throughput for each internal consumer.
This option uses Kinesis Data Streams for real-time processing, optimizes throughput from mobile devices with the PutRecords API, and provides dedicated throughput for each consumer using the enhanced fan-out feature.
Comment 1305382 by pikuantne
- Upvotes: 1
Selected Answer: A A is best, but I think it was supposed to be a shard for each consumer. B doesn't make any sense. C: Firehose does not have enhanced fan-out, AFAIK. D does not provide dedicated throughput because it doesn't use enhanced fan-out with KDS.
Comment 1289247 by LR2023
- Upvotes: 1
Selected Answer: B https://docs.aws.amazon.com/streams/latest/dev/kpl-with-firehose.html
KPL does work with Kinesis Data Firehose, but only indirectly, by writing to a Kinesis data stream that is configured as the Firehose source.
Comment 1286067 by Fawk
- Upvotes: 2
Selected Answer: A Seems to be A. The KPL writes only to Kinesis Data Streams, not directly to Firehose, and the dedicated-throughput requirement is solved by enhanced fan-out.
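To illustrate the two halves of option A: the producer batches the ~20 KB records with PutRecords, and each stream-processing application registers as an enhanced fan-out consumer, which gives it a dedicated 2 MB/s of read throughput per shard independent of the other consumers. A minimal boto3 sketch (the stream name, ARN, and consumer name are hypothetical):

```python
import boto3

kinesis = boto3.client("kinesis")

# Producer side: batch multiple ~20 KB records into one PutRecords call.
kinesis.put_records(
    StreamName="game-events",  # hypothetical stream
    Records=[
        {"Data": b'{"event": "level_up"}', "PartitionKey": "player-42"},
        {"Data": b'{"event": "purchase"}', "PartitionKey": "player-7"},
    ],
)

# Consumer side: register an enhanced fan-out consumer so this
# application gets dedicated read throughput per shard.
consumer = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/game-events",  # hypothetical
    ConsumerName="reporting-app",
)
print(consumer["Consumer"]["ConsumerARN"])
```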