Questions and Answers

Question lZOpomtDJfrMn9QUlWqN

Question

To support near real-time workloads, a data engineer is creating a helper function that leverages the schema detection and evolution functionality of Databricks Auto Loader. The desired function will automatically detect the schema of the source directory, incrementally process JSON files as they arrive in that directory, and automatically evolve the schema of the target table when new fields are detected.

The function is displayed below with a blank:

//IMG//

Which response correctly fills in the blank to meet the specified requirements?

Choices

  • A:
  • B:
  • C:
  • D:
  • E:
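For orientation, the three requirements map to three pieces of an Auto Loader pipeline: `format("cloudFiles")` for incremental file ingestion, a `cloudFiles.schemaLocation` for schema inference and tracking, and `mergeSchema` on the write for evolution. A sketch of what the completed function typically looks like, assuming a Databricks runtime where `spark` and the `cloudFiles` source are available; the function, path, and option values shown are illustrative, not the question's actual answer choices:

```python
def autoload_json_to_table(source_dir, table_name, checkpoint_path):
    # "cloudFiles" is the Auto Loader source; schemaLocation enables
    # schema inference and tracking across runs, and new columns are
    # added to the tracked schema as they are detected.
    (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", checkpoint_path)
          .load(source_dir)
          .writeStream
          .option("checkpointLocation", checkpoint_path)
          .option("mergeSchema", "true")      # evolve the target table schema
          .trigger(availableNow=True)         # incremental, near real-time batches
          .table(table_name))
```

Note that this only runs on a Databricks cluster; the `cloudFiles` format is not part of open-source Spark.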

Question TxiZ4DALxAD91qf40gNR

Question

The data engineering team maintains the following code:

//IMG//

Assuming that this code produces logically correct results and the data in the source table has been de-duplicated and validated, which statement describes what will occur when this code is executed?

Choices

  • A: The silver_customer_sales table will be overwritten by aggregated values calculated from all records in the gold_customer_lifetime_sales_summary table as a batch job.
  • B: A batch job will update the gold_customer_lifetime_sales_summary table, replacing only those rows that have different values than the current version of the table, using customer_id as the primary key.
  • C: The gold_customer_lifetime_sales_summary table will be overwritten by aggregated values calculated from all records in the silver_customer_sales table as a batch job.
  • D: An incremental job will leverage running information in the state store to update aggregate values in the gold_customer_lifetime_sales_summary table.
  • E: An incremental job will detect if new rows have been written to the silver_customer_sales table; if new rows are detected, all aggregates will be recalculated and used to overwrite the gold_customer_lifetime_sales_summary table.
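The behavior being tested here is overwrite-from-scratch semantics: a batch CTAS-style job recomputes the aggregate from every source row and fully replaces the target, keeping no incremental state. A minimal pure-Python sketch of that semantic, with in-memory structures standing in for the silver and gold Delta tables (names are illustrative):

```python
from collections import defaultdict

# Stand-in for the silver source table.
silver_customer_sales = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 75.0},
]

def rebuild_gold(silver_rows):
    """Recompute lifetime sales per customer from ALL silver rows,
    mirroring a batch job that overwrites the gold table."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer_id"]] += row["amount"]
    return {cid: {"customer_id": cid, "lifetime_sales": amt}
            for cid, amt in totals.items()}

# Each run replaces the previous gold table entirely -- no state store,
# no row-level comparison against the prior version.
gold_customer_lifetime_sales_summary = rebuild_gold(silver_customer_sales)
```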

Question rEbnCBpqJHz4ngwrdGXB

Question

The data architect has mandated that all tables in the Lakehouse should be configured as external (also known as “unmanaged”) Delta Lake tables.

Which approach will ensure that this requirement is met?

Choices

  • A: When a database is being created, make sure that the LOCATION keyword is used.
  • B: When configuring an external data warehouse for all table storage, leverage Databricks for all ELT.
  • C: When data is saved to a table, make sure that a full file path is specified alongside the Delta format.
  • D: When tables are created, make sure that the EXTERNAL keyword is used in the CREATE TABLE statement.
  • E: When the workspace is being configured, make sure that external cloud object storage has been mounted.
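For context, what makes a Delta table external (unmanaged) in Databricks is that it is created against an explicit storage path, so dropping the table leaves the underlying files in place. A DDL sketch with illustrative table, column, and path names:

```sql
-- External (unmanaged) table: the LOCATION clause binds the table to
-- an explicit path instead of the metastore-managed default location.
CREATE TABLE sales_external (
  id     BIGINT,
  amount DOUBLE
)
USING DELTA
LOCATION 'abfss://container@account.dfs.core.windows.net/tables/sales';
```

By contrast, setting a `LOCATION` when creating a database only changes where that database's managed tables are stored by default; it does not make them external.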

Question uctSwbSCqG0NFxyKqKjp

Question

An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable: //IMG//

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order. If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?

Choices

  • A: Each write to the orders table will only contain unique records, and only those records without duplicates in the target table will be written.
  • B: Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table.
  • C: Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, these records will be overwritten.
  • D: Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, the operation will fail.
  • E: Each write to the orders table will run deduplication over the union of new and existing records, ensuring no duplicate records are present.
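The subtlety being tested is that deduplication applied to the daily batch (e.g. `dropDuplicates` or `DISTINCT` on the day's read) only removes duplicates within that batch; an append-mode write never compares incoming rows against rows already in the target table. A small pure-Python sketch of that behavior, with a list standing in for the orders table and dict rows keyed on the composite key (names are illustrative):

```python
def dedup_batch(rows):
    """Deduplicate on the (customer_id, order_id) composite key within
    a single batch, keeping the first occurrence -- analogous to
    dropDuplicates on one day's read."""
    seen, out = set(), []
    for row in rows:
        key = (row["customer_id"], row["order_id"])
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

orders = []  # stand-in for the target table

# Day 1: the order appears once.
orders += dedup_batch([{"customer_id": 1, "order_id": "A"}])

# Day 2: the upstream source re-emits the same order hours later.
orders += dedup_batch([{"customer_id": 1, "order_id": "A"},
                       {"customer_id": 1, "order_id": "A"}])

# The day-2 write contained only unique records, yet the target table
# now holds a duplicate that arrived in an earlier batch.
```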

Question qKkhM5fANqR6MXxhBbkC

Question

The marketing team is looking to share data in an aggregate table with the sales organization, but the field names used by the teams do not match, and a number of marketing-specific fields have not been approved for the sales org.

Which of the following solutions addresses the situation while emphasizing simplicity?

Choices

  • A: Create a view on the marketing table selecting only those fields approved for the sales team; alias the names of any fields that should be standardized to the sales naming conventions.
  • B: Create a new table with the required schema and use Delta Lake’s DEEP CLONE functionality to sync up changes committed to one table to the corresponding table.
  • C: Use a CTAS statement to create a derivative table from the marketing table; configure a production job to propagate changes.
  • D: Add a parallel table write to the current production pipeline, updating a new sales table that varies as required from the marketing table.
  • E: Instruct the marketing team to download results as a CSV and email them to the sales organization.
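The view-based option relies only on standard column selection and aliasing, which is why it is the low-maintenance choice: the view is recomputed from the marketing table on every read, so no sync job is needed. A sketch of what such a view might look like; all table and column names here are illustrative:

```sql
-- Expose only the approved columns, renamed to the sales team's
-- conventions; marketing-only fields are simply omitted.
CREATE VIEW sales_campaign_summary AS
SELECT
  campaign_id,
  mkt_spend  AS marketing_spend,
  conv_count AS conversions
FROM marketing_campaign_summary;
```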