Questions and Answers

Question S9crePGbIfGIhbaKyCBi

Question

A Delta Lake table was created with the query below: //IMG//

Realizing that the original query contained a typographical error in the table name, the following command was executed:

ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store

Which result will occur after running the second command?

Choices

  • A: The table reference in the metastore is updated and no data is changed.
  • B: The table name change is recorded in the Delta transaction log.
  • C: All related files and metadata are dropped and recreated in a single ACID transaction.
  • D: The table reference in the metastore is updated and all data files are moved.
  • E: A new Delta transaction log is created for the renamed table.
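
For reference, a minimal sketch of the rename operation described above (assuming a metastore-registered Delta table; the original CREATE statement is only shown in the image):

```sql
-- RENAME TO updates the table's entry in the metastore.
-- Note that the Delta transaction log (_delta_log) lives alongside the
-- table's data files in storage, not in the metastore.
ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store;

-- The table's storage location can be inspected before and after the rename:
DESCRIBE EXTENDED prod.sales_by_store;
```

DESCRIBE EXTENDED reports the table's location, which makes it easy to verify what, if anything, moved as a result of the rename.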

Question AEGUCcKZpEIGyDwGTOGW

Question

The security team is exploring whether the Databricks secrets module can be used for connecting to an external database. After testing the code with all Python variables defined as strings, they upload the password to the secrets module and configure the correct permissions for the currently active user. They then modify their code as follows, leaving all other variables unchanged. //IMG//

Which statement describes what will happen when the above code is executed?

Choices

  • A: The connection to the external table will fail; the string “REDACTED” will be printed.
  • B: An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the encoded password will be saved to DBFS.
  • C: An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the password will be printed in plain text.
  • D: The connection to the external table will succeed; the string value of password will be printed in plain text.
  • E: The connection to the external table will succeed; the string “REDACTED” will be printed.

Question UCAiGnwsC7FtrRe2PPSq

Question

The data engineering team maintains a table of aggregate statistics through nightly batch updates. This includes total sales for the previous day alongside totals and averages for a variety of time periods, including the previous 7 days, year-to-date, and quarter-to-date. This table is named store_sales_summary and the schema is as follows: //IMG//

The table daily_store_sales contains all the information needed to update store_sales_summary. The schema for this table is: store_id INT, sales_date DATE, total_sales FLOAT

If daily_store_sales is implemented as a Type 1 table and the total_sales column might be adjusted after manual data auditing, which approach is the safest to generate accurate reports in the store_sales_summary table?

Choices

  • A: Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and overwrite the store_sales_summary table with each update.
  • B: Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and append new rows nightly to the store_sales_summary table.
  • C: Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.
  • D: Implement the appropriate aggregate logic as a Structured Streaming read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.
  • E: Use Structured Streaming to subscribe to the change data feed for daily_store_sales and apply changes to the aggregates in the store_sales_summary table with each update.
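
As an illustration of the "upsert logic" mentioned in the choices above, a hedged MERGE sketch (the store_sales_summary columns appear only in the image, so the column names and aggregate expressions here are assumptions):

```sql
-- Hypothetical column names; the real summary schema is in the image above.
MERGE INTO store_sales_summary AS tgt
USING (
  SELECT store_id,
         SUM(total_sales) AS total_sales_ytd   -- assumed aggregate logic
  FROM daily_store_sales
  GROUP BY store_id
) AS src
ON tgt.store_id = src.store_id
WHEN MATCHED THEN
  UPDATE SET tgt.total_sales_ytd = src.total_sales_ytd
WHEN NOT MATCHED THEN
  INSERT (store_id, total_sales_ytd)
  VALUES (src.store_id, src.total_sales_ytd);
```

MERGE updates rows for matched keys in place and inserts rows for new keys, which is what "upsert logic" refers to in the choices above. Because daily_store_sales is a Type 1 table, audited rows are changed in place, so any aggregation over it reflects the corrected values.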

Question tmkarXIgodNeHCWFJwEv

Question

A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.

//IMG//

Which command should be removed from the notebook before scheduling it as a job?

Choices

  • A: Cmd 2
  • B: Cmd 3
  • C: Cmd 4
  • D: Cmd 5
  • E: Cmd 6

Question HLH8a1LIrgeotGnTdQ8N

Question

The business reporting team requires that data for their dashboards be updated every hour. The pipeline that extracts, transforms, and loads the data for their dashboards runs in 10 minutes.

Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?

Choices

  • A: Manually trigger a job anytime the business reporting team refreshes their dashboards
  • B: Schedule a job to execute the pipeline once an hour on a new job cluster
  • C: Schedule a Structured Streaming job with a trigger interval of 60 minutes
  • D: Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
  • E: Configure a job that executes every time new data lands in a given directory