Questions and Answers

Question 9QfgldmXkHT1ECMg3vHD

Question

A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams.

The proposed directory structure is displayed below:

//IMG//

Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?

Choices

  • A: No; Delta Lake manages streaming checkpoints in the transaction log.
  • B: Yes; both of the streams can share a single checkpoint directory.
  • C: No; only one stream can write to a Delta Lake table.
  • D: No; each of the streams needs to have its own checkpoint directory.
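This question depends on how Structured Streaming tracks progress. Each query records the source offsets it has processed in the directory given by its `checkpointLocation` option, and each query numbers its micro-batches from 0. The pure-Python toy below is a hypothetical simplification: the JSON layout, `commit_batch`, and `recover` are invented for illustration and do not match Spark's on-disk checkpoint format. It shows how two streams that share one directory overwrite each other's progress, and how separate directories avoid that.

```python
import json
import os
import tempfile

# Toy model (hypothetical, NOT Spark's real format): a checkpoint is treated
# as an offset log with one JSON file per micro-batch id, recording the last
# Kafka offset the stream has processed for its topic.

def commit_batch(checkpoint_dir, batch_id, topic, offset):
    """Record the progress of one micro-batch in the checkpoint directory."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"{batch_id}.json")
    with open(path, "w") as f:
        json.dump({"topic": topic, "offset": offset}, f)

def recover(checkpoint_dir):
    """On restart, resume from the entry with the highest batch id."""
    batch_ids = sorted(int(name.split(".")[0]) for name in os.listdir(checkpoint_dir))
    with open(os.path.join(checkpoint_dir, f"{batch_ids[-1]}.json")) as f:
        return json.load(f)

root = tempfile.mkdtemp()

# Shared checkpoint: both streams start at batch 0, so the second commit
# overwrites the first stream's progress.
shared = os.path.join(root, "shared_checkpoint")
commit_batch(shared, 0, "topic_a", 100)
commit_batch(shared, 0, "topic_b", 42)
print(recover(shared))  # stream A would "recover" topic_b's progress

# Separate checkpoints: each stream recovers exactly its own progress.
ckpt_a = os.path.join(root, "checkpoint_topic_a")
ckpt_b = os.path.join(root, "checkpoint_topic_b")
commit_batch(ckpt_a, 0, "topic_a", 100)
commit_batch(ckpt_b, 0, "topic_b", 42)
print(recover(ckpt_a), recover(ckpt_b))
```

In a real PySpark job, the equivalent design choice is to give each `writeStream` query its own distinct `checkpointLocation` path, even when both queries write to the same Delta table.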

Question jHkr33IW5HmyW3zIKIv4

Question

Which statement describes the default execution mode for Databricks Auto Loader?

Choices

  • A: Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; new files are incrementally and idempotently loaded into the target Delta Lake table.
  • B: New files are identified by listing the input directory; the target table is materialized by directly querying all valid files in the source directory.
  • C: Webhooks trigger a Databricks job to run anytime new data arrives in a source directory; new data are automatically merged into target tables using rules inferred from the data.
  • D: New files are identified by listing the input directory; new files are incrementally and idempotently loaded into the target Delta Lake table.
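Auto Loader runs in directory listing mode unless file notification mode is explicitly enabled (the `cloudFiles.useNotifications` option defaults to `false`). The sketch below is a hypothetical pure-Python model of the listing approach, not Auto Loader's implementation. `ingest_new_files` is an invented helper, and the in-memory `processed` set stands in for the file state a stream keeps in its checkpoint. Each run lists the directory and loads only files it has not seen, so re-running with no new files changes nothing.

```python
import os
import tempfile

# Hypothetical sketch of directory-listing discovery (not Auto Loader's code).

def ingest_new_files(input_dir, processed, target):
    """List input_dir, append rows from unseen files to target, record them."""
    new_files = sorted(f for f in os.listdir(input_dir) if f not in processed)
    for name in new_files:
        with open(os.path.join(input_dir, name)) as fh:
            target.extend(line.strip() for line in fh if line.strip())
        processed.add(name)  # stands in for the stream's checkpointed state
    return new_files

input_dir = tempfile.mkdtemp()
processed, target = set(), []

for name, rows in [("part-0001.csv", "a\nb\n"), ("part-0002.csv", "c\n")]:
    with open(os.path.join(input_dir, name), "w") as fh:
        fh.write(rows)

first = ingest_new_files(input_dir, processed, target)   # loads both files
second = ingest_new_files(input_dir, processed, target)  # nothing new: no-op

with open(os.path.join(input_dir, "part-0003.csv"), "w") as fh:
    fh.write("d\n")
third = ingest_new_files(input_dir, processed, target)   # loads only the new file
print(first, second, third, target)
```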

Question 2pZj0DXg5ATNKEunCKko

Question

The Databricks workspace administrator has configured interactive clusters for each of the data engineering groups. To control costs, clusters are set to terminate after 30 minutes of inactivity. Each user should be able to execute workloads against their assigned clusters at any time of the day. Assuming users have been added to the workspace but not granted any permissions, which of the following describes the minimal permissions a user would need to start and attach to an already configured cluster?

Choices

  • A: “Can Manage” privileges on the required cluster
  • B: Workspace Admin privileges, cluster creation allowed, “Can Attach To” privileges on the required cluster
  • C: Cluster creation allowed, “Can Attach To” privileges on the required cluster
  • D: “Can Restart” privileges on the required cluster
  • E: Cluster creation allowed, “Can Restart” privileges on the required cluster
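Databricks cluster access control uses cumulative permission levels (Can Attach To, Can Restart, Can Manage), where each level includes the abilities of the ones below it. Attaching a notebook is allowed at every level, while starting a terminated cluster first appears at Can Restart. The toy model below encodes only a subset of abilities and does not model workspace-level entitlements such as cluster creation; `LEVELS`, `abilities`, and `minimal_level` are hypothetical names used for illustration.

```python
# Simplified, hypothetical model of Databricks cluster permission levels.
# Levels are ordered from least to most privileged; each one adds abilities
# on top of the levels before it. Only a subset of abilities is listed.
LEVELS = [
    ("Can Attach To", {"attach notebook", "view Spark UI"}),
    ("Can Restart", {"start", "restart", "terminate"}),
    ("Can Manage", {"edit cluster", "modify permissions"}),
]

def abilities(level_name):
    """Return every ability granted by level_name, including inherited ones."""
    granted = set()
    for name, extra in LEVELS:
        granted |= extra
        if name == level_name:
            return granted
    raise ValueError(f"unknown level: {level_name}")

def minimal_level(required):
    """Return the lowest level whose abilities cover the required set."""
    for name, _ in LEVELS:
        if required <= abilities(name):
            return name
    return None

print(minimal_level({"attach notebook", "start"}))
```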

Question F8eT4lrxXItrnsIPGmaM

Question

A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams.

The proposed directory structure is displayed below:

//IMG//

Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?

Choices

  • A: No; Delta Lake manages streaming checkpoints in the transaction log.
  • B: Yes; both of the streams can share a single checkpoint directory.
  • C: No; only one stream can write to a Delta Lake table.
  • D: Yes; Delta Lake supports infinite concurrent writers.
  • E: No; each of the streams needs to have its own checkpoint directory.

Question KlQ0Q7J7ndUFckkmgTp8

Question

An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:

//IMG//

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.

If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?

Choices

  • A: Each write to the orders table will only contain unique records, and only those records without duplicates in the target table will be written.
  • B: Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table.
  • C: Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, these records will be overwritten.
  • D: Each write to the orders table will run deduplication over the union of new and existing records, ensuring no duplicate records are present.
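The choices differ on whether duplicates are removed only within the incoming batch or also against rows already in the target table. Because the upstream duplicates can arrive hours apart, two copies of one order can fall into different daily directories (for example, on either side of midnight). The pure-Python sketch below contrasts the two behaviors. It does not reproduce the job's code, which appears only as an image; `append_write` and `insert_only_merge` are hypothetical stand-ins, the latter modeled loosely on a Delta Lake `MERGE INTO ... WHEN NOT MATCHED THEN INSERT *` pattern.

```python
from collections import Counter

# Hypothetical illustration: rows are (customer_id, order_id, payload) tuples,
# and (customer_id, order_id) is the composite key from the question.

def dedupe_batch(batch):
    """Keep the first row seen for each key within a single batch."""
    seen, unique = set(), []
    for row in batch:
        key = row[:2]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def append_write(target, batch):
    """Append-only write: the batch is deduplicated; the target is never consulted."""
    target.extend(dedupe_batch(batch))

def insert_only_merge(target, batch):
    """Insert-only merge: additionally skip keys already present in the target."""
    existing = {row[:2] for row in target}
    target.extend(row for row in dedupe_batch(batch) if row[:2] not in existing)

# Two copies of order (1, 10) land in different nightly batches.
day_1 = [(1, 10, "first copy"), (1, 10, "same-day duplicate")]
day_2 = [(1, 10, "late duplicate"), (2, 20, "new order")]

appended, merged = [], []
for batch in (day_1, day_2):
    append_write(appended, batch)
    insert_only_merge(merged, batch)

appended_keys = Counter(row[:2] for row in appended)
merged_keys = Counter(row[:2] for row in merged)
print(appended_keys, merged_keys)
```

In-batch deduplication removes the same-day copy in both cases; only the write that consults the target table also prevents the late duplicate from being stored.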