Questions and Answers
Question gqMspiDbwO2HCt1xsAhC
Question
A company is creating near real-time dashboards to visualize time series data. The company ingests data into Amazon Managed Streaming for Apache Kafka (Amazon MSK). A customized data pipeline consumes the data. The pipeline then writes data to Amazon Keyspaces (for Apache Cassandra), Amazon OpenSearch Service, and Apache Avro objects in Amazon S3.
Which solution will make the data available for the data visualizations with the LEAST latency?
Choices
- A: Create OpenSearch Dashboards by using the data from OpenSearch Service.
- B: Use Amazon Athena with an Apache Hive metastore to query the Avro objects in Amazon S3. Use Amazon Managed Grafana to connect to Athena and to create the dashboards.
- C: Use Amazon Athena to query the data from the Avro objects in Amazon S3. Configure Amazon Keyspaces as the data catalog. Connect Amazon QuickSight to Athena to create the dashboards.
- D: Use AWS Glue to catalog the data. Use S3 Select to query the Avro objects in Amazon S3. Connect Amazon QuickSight to the S3 bucket to create the dashboards.
answer?
Answer: A Answer_ET: A Community answer A (80%) B (20%) Discussion
Comment 1316994 by jacob_nz
- Upvotes: 1
Selected Answer: B For time series / graph visualizations we use Grafana.
Comment 1265757 by aragon_saa
- Upvotes: 1
Selected Answer: A Answer is A
Comment 1265672 by matt200
- Upvotes: 3
Selected Answer: A Option A: Create OpenSearch Dashboards by using the data from OpenSearch Service is the best choice for achieving the least latency. OpenSearch is designed for low-latency data retrieval and visualization, making it ideal for near real-time dashboards
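To illustrate why option A keeps latency low: OpenSearch Dashboards panels query the indexed data directly, with no intermediate query engine. Below is a minimal sketch (not from the question) of the kind of time series aggregation such a panel runs against the domain; the endpoint, credentials, index name, and field names are assumptions for illustration.

```python
# Hypothetical example: the shape of query a near real-time dashboard panel
# issues directly against an OpenSearch Service domain.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "search-example-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("dashboard_user", "example-password"),  # hypothetical credentials
    use_ssl=True,
)

# Bucket the last 15 minutes of events into 30-second intervals.
response = client.search(
    index="events",  # hypothetical time series index
    body={
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": "now-15m"}}},
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "30s"}
            }
        },
    },
)

for bucket in response["aggregations"]["per_interval"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])
```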
Question SdOzTczlO5hNUefCufy6
Question
A data engineer maintains a materialized view that is based on an Amazon Redshift database. The view has a column named load_date that stores the date when each row was loaded.
The data engineer needs to reclaim database storage space by deleting all the rows from the materialized view.
Which command will reclaim the MOST database storage space?
Choices
- A: DELETE FROM materialized_view_name where 1=1
- B: TRUNCATE materialized_view_name
- C: VACUUM table_name where load_date<=current_date materializedview
- D: DELETE FROM materialized_view_name where load_date<=current_date
answer?
Answer: B Answer_ET: B Community answer B (50%) A (50%) Discussion
Comment 1457086 by bad1ccc
- Upvotes: 1
Selected Answer: B When you TRUNCATE a materialized view in Amazon Redshift, it removes all rows from the view and reclaims the most storage space because the operation does not log individual row deletions. This is far more efficient in terms of both time and space than a DELETE operation.
Comment 1364548 by JimOGrady
- Upvotes: 2
Selected Answer: B The key is “reclaim database storage space.” DELETE does not reclaim disk space.
Comment 1349579 by sravanscr
- Upvotes: 1
Selected Answer: B In Amazon Redshift, you can use the TRUNCATE command to delete all rows from a materialized view, effectively “truncating” it, especially when the materialized view is configured for streaming ingestion; this is a faster way to clear the data than a DELETE statement.
Comment 1347207 by YUICH
- Upvotes: 2
Selected Answer: A (B) TRUNCATE is invalid for materialized views, so it is excluded. In actual operations, to reclaim storage most effectively, you need to delete all rows with a DELETE statement and then run VACUUM, as in (A) or (D). If you want to delete everything, option (A) is the most straightforward approach.
Comment 1343745 by A_E_M
- Upvotes: 3
Selected Answer: A Why this is the best option:
Efficiency: By using “WHERE 1=1”, the database doesn’t need to iterate through each row individually to check a specific condition, resulting in faster deletion of all data.
Storage reclamation: Deleting all rows using this method will free up the most storage space within the materialized view.
Important considerations:
TRUNCATE vs DELETE: While “TRUNCATE” can also be used to remove all data from a table, it is not recommended for materialized views in Redshift as it might not always reclaim all the storage space effectively.
VACUUM command: “VACUUM” is used to reclaim space within a table after deletions, but it’s not necessary when deleting all rows using “DELETE FROM … WHERE 1=1;”, because the entire table will be emptied.
Comment 1308849 by AgboolaKun
- Upvotes: 1
Selected Answer: B B is the correct answer.
Here is why: TRUNCATE is the most efficient way to remove all rows from a table or materialized view in Amazon Redshift. It’s faster than DELETE and immediately reclaims disk space.
TRUNCATE removes all rows in a table without scanning them individually. This makes it much faster than DELETE operations, especially for large tables.
TRUNCATE automatically performs a VACUUM operation, which sorts the table and reclaims space.
TRUNCATE resets any auto-increment columns.
Comment 1303465 by Parandhaman_Margan
- Upvotes: 1
Answer:B TRUNCATE Command: The TRUNCATE command is the most efficient way to delete all rows from a table or materialized view. It does not scan the table, does not generate individual row delete actions, and effectively frees up space immediately by removing all data at once. It also resets any identity columns, if applicable.
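For reference, the accepted answer B boils down to issuing a single TRUNCATE statement. Below is a minimal sketch of running it through the Redshift Data API with boto3; the cluster identifier, database, secret ARN, and materialized view name are placeholders, not from the question.

```python
# Hypothetical example: issue the TRUNCATE from option B via the Redshift Data API.
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

response = redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",    # hypothetical cluster
    Database="analytics",                   # hypothetical database
    SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example-redshift-creds",
    Sql="TRUNCATE example_materialized_view;",  # removes all rows and frees the space
)

# The statement id can be polled with describe_statement to confirm completion.
print(response["Id"])
```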
Question 2M4MwFPvcrDWHD8EBW7p
Question
A media company wants to use Amazon OpenSearch Service to analyze real-time data about popular musical artists and songs. The company expects to ingest millions of new data events every day. The new data events will arrive through an Amazon Kinesis data stream. The company must transform the data and then ingest the data into the OpenSearch Service domain.
Which method should the company use to ingest the data with the LEAST operational overhead?
Choices
- A: Use Amazon Kinesis Data Firehose and an AWS Lambda function to transform the data and deliver the transformed data to OpenSearch Service.
- B: Use a Logstash pipeline that has prebuilt filters to transform the data and deliver the transformed data to OpenSearch Service.
- C: Use an AWS Lambda function to call the Amazon Kinesis Agent to transform the data and deliver the transformed data to OpenSearch Service.
- D: Use the Kinesis Client Library (KCL) to transform the data and deliver the transformed data to OpenSearch Service.
answer?
Answer: A Answer_ET: A Community answer A (50%) B (50%) Discussion
Comment 1359056 by Evan_Lin
- Upvotes: 2
Selected Answer: B Why not B? Logstash is an open-source data ingestion tool that allows you to collect data from various sources, transform it, and send it to your desired destination. With prebuilt filters and support for over 200 plugins, Logstash allows users to easily ingest data regardless of the data source or type.
Comment 1338999 by maddyr
- Upvotes: 2
Selected Answer: B Logstash is a lightweight, open-source, server-side data processing pipeline that allows you to collect data from various sources, transform it on the fly, and send it to your desired destination. It is most often used as a data pipeline for Elasticsearch, an open-source analytics and search engine https://aws.amazon.com/what-is/elk-stack/#seo-faq-pairs#what-is-the-elk-stack
Comment 1268222 by mzansikiller
- Upvotes: 1
Amazon Kinesis Data Firehose is a fully managed service that reliably loads streaming data into data lakes, data stores and analytics services like OpenSearch Service. It can automatically scale to match the throughput of your data and requires no ongoing administration.
Answer A
Comment 1265758 by aragon_saa
- Upvotes: 1
Selected Answer: A Answer is A
Comment 1265673 by matt200
- Upvotes: 3
Selected Answer: A Option A: Use Amazon Kinesis Data Firehose and an AWS Lambda function to transform the data and deliver the transformed data to OpenSearch Service is the best choice for achieving the least operational overhead. Kinesis Data Firehose is a managed service that automates the data ingestion process, scales seamlessly, and integrates directly with OpenSearch Service, minimizing the need for manual intervention and infrastructure management.
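To make option A concrete, here is a minimal sketch of the Lambda transformation function that Kinesis Data Firehose invokes before delivering records to OpenSearch Service. The record contract (recordId, result, base64 data) is the standard Firehose transformation format; the payload field names (artist, song, plays, timestamp) are assumptions for illustration.

```python
# Hypothetical example: a Firehose data-transformation Lambda for option A.
import base64
import json


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Keep only the fields the dashboards need, in a document shape
        # suitable for indexing into OpenSearch Service.
        transformed = {
            "artist": payload.get("artist"),
            "song": payload.get("song"),
            "plays": int(payload.get("plays", 0)),
            "event_time": payload.get("timestamp"),
        }

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # Firehose delivers "Ok" records to the destination
            "data": base64.b64encode(
                (json.dumps(transformed) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })

    return {"records": output}
```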
Question bo6XtQqyJxsnPbCPOhZQ
Question
A company stores customer data tables that include customer addresses in an AWS Lake Formation data lake. To comply with new regulations, the company must ensure that users cannot access data for customers who are in Canada.
The company needs a solution that will prevent user access to rows for customers who are in Canada.
Which solution will meet this requirement with the LEAST operational effort?
Choices
- A: Set a row-level filter to prevent user access to a row where the country is Canada.
- B: Create an IAM role that restricts user access to an address where the country is Canada.
- C: Set a column-level filter to prevent user access to a row where the country is Canada.
- D: Apply a tag to all rows where Canada is the country. Prevent user access where the tag is equal to “Canada”.
answer?
Answer: A Answer_ET: A Community answer A (100%) Discussion
Comment 1308854 by AgboolaKun
- Upvotes: 1
Selected Answer: A The solution that will meet the requirement with the least operational effort is A.
Here’s why:
Row-level security: AWS Lake Formation provides built-in row-level security, which allows you to control access to specific rows in a table based on conditions. This is precisely what’s needed in this scenario.
Least operational effort: Once set up, this filter will automatically apply to all queries without needing to modify the data or create complex IAM policies.
Scalability: As new data is added to the table, the filter will automatically apply, requiring no additional effort.
Precision: It directly addresses the requirement by preventing access to rows where the country is Canada, without affecting other data.
Comment 1263250 by komorebi
- Upvotes: 3
Selected Answer: A Answer is A
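As a concrete illustration of option A, the sketch below creates a Lake Formation data cells filter whose row filter expression hides rows where the country is Canada. The account ID, database, table, column, and filter names are placeholders, not from the question.

```python
# Hypothetical example: a Lake Formation row-level filter for option A.
import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")

lakeformation.create_data_cells_filter(
    TableData={
        "TableCatalogId": "123456789012",      # hypothetical AWS account id
        "DatabaseName": "customer_db",         # hypothetical database
        "TableName": "customer_addresses",     # hypothetical table
        "Name": "exclude_canada_rows",
        "RowFilter": {"FilterExpression": "country <> 'Canada'"},
        "ColumnWildcard": {},                  # filter rows only, keep all columns
    }
)

# The filter is then granted to principals with lakeformation.grant_permissions,
# so queries through integrated services never return the excluded rows.
```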
Question FFP8wbiFRXNlVNkx7q1U
Question
A company stores daily records of the financial performance of investment portfolios in .csv format in an Amazon S3 bucket. A data engineer uses AWS Glue crawlers to crawl the S3 data. The data engineer must make the S3 data accessible daily in the AWS Glue Data Catalog.
Which solution will meet these requirements?
Choices
- A: Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler’s data store. Create a daily schedule to run the crawler. Configure the output destination to a new path in the existing S3 bucket.
- B: Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler’s data store. Create a daily schedule to run the crawler. Specify a database name for the output.
- C: Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler’s data store. Allocate data processing units (DPUs) to run the crawler every day. Specify a database name for the output.
- D: Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler’s data store. Allocate data processing units (DPUs) to run the crawler every day. Configure the output destination to a new path in the existing S3 bucket.
answer?
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1137915 by TonyStark0122
- Upvotes: 8
B. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler’s data store. Create a daily schedule to run the crawler. Specify a database name for the output.
Explanation: Option B correctly sets up the IAM role with the necessary permissions using the AWSGlueServiceRole policy, which is designed for use with AWS Glue. It specifies the S3 bucket path of the source data as the crawler’s data store and creates a daily schedule to run the crawler. Additionally, it specifies a database name for the output, ensuring that the crawled data is properly cataloged in the AWS Glue Data Catalog.
Comment 1339450 by plutonash
- Upvotes: 2
Selected Answer: B Answer B is incomplete. Even if we include the AWSGlueServiceRole policy on the IAM role, S3 access is not guaranteed.
Comment 1303374 by LrdKanien
- Upvotes: 1
How does Glue get access to S3 if you don’t do B?
Comment 1209098 by k350Secops
- Upvotes: 4
Selected Answer: B Glue crawlers are serverless, so allocating DPUs makes no sense; that is what made me settle on option B.
Comment 1167945 by GiorgioGss
- Upvotes: 4
Selected Answer: B A and C are wrong because you don’t need full S3 access. D is wrong because you don’t need to provision DPUs, and the output destination should be a database, not an S3 bucket. So it’s B.
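For completeness, here is a minimal sketch of option B with boto3: a crawler that uses a role based on the AWSGlueServiceRole managed policy, reads the source S3 path, runs on a daily schedule, and writes tables into a named Data Catalog database. The role name, bucket path, database name, and schedule are placeholders.

```python
# Hypothetical example: create the scheduled crawler described in option B.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_crawler(
    Name="daily-portfolio-crawler",
    Role="GlueCrawlerServiceRole",       # role with AWSGlueServiceRole attached
    DatabaseName="portfolio_performance",  # Data Catalog database for the output
    Targets={"S3Targets": [{"Path": "s3://example-portfolio-bucket/daily/"}]},
    Schedule="cron(0 6 * * ? *)",          # run once a day (06:00 UTC)
)

# No DPU allocation is needed; crawlers are serverless. As noted in the
# discussion, the role also needs read access to the source bucket in
# addition to the AWSGlueServiceRole managed policy.
```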