Questions and Answers

Question J4ES6Y9oBmN6AfxQugWe

Question

A CHECK constraint has been successfully added to the Delta table named activity_details using the following logic:

//IMG//

A batch job is attempting to insert new records into the table, including a record where latitude = 45.50 and longitude = 212.67.

Which statement describes the outcome of this batch insert?

Choices

  • A: The write will insert all records except those that violate the table constraints; the violating records will be reported in a warning log.
  • B: The write will fail completely because of the constraint violation and no records will be inserted into the target table.
  • C: The write will insert all records except those that violate the table constraints; the violating records will be recorded to a quarantine table.
  • D: The write will include all records in the target table; any violations will be indicated in the boolean column named valid_coordinates.
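
For context, the actual constraint logic is in the image above; the latitude/longitude range check below is only a plausible reconstruction, and spark is assumed to be a Databricks notebook session. Delta Lake enforces CHECK constraints transactionally, so the sketch shows the whole batch being rejected:

    # Hypothetical reconstruction of the CHECK constraint shown in the image.
    spark.sql("""
        ALTER TABLE activity_details ADD CONSTRAINT valid_coordinates
        CHECK (latitude BETWEEN -90 AND 90 AND longitude BETWEEN -180 AND 180)
    """)

    # If any row in a batch violates the constraint, the entire write fails
    # atomically and no records are committed to the table.
    try:
        spark.sql("INSERT INTO activity_details (latitude, longitude) VALUES (45.50, 212.67)")
    except Exception as e:
        print(f"Batch rejected atomically: {e}")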

Question Hr4Dz89YlXSHYdrmYbUi

Question

A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.

Which consideration will impact the decisions made by the engineer while migrating this workload?

Choices

  • A: Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly parallel writes.
  • B: Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake’s upsert functionality.
  • C: Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.
  • D: All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.
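
For context, Databricks (with Unity Catalog) can record primary and foreign key constraints, but they are informational only and are not enforced on write, and each Delta transaction commits against a single table. A minimal sketch, assuming a Unity Catalog-enabled workspace and hypothetical table and column names:

    # Informational constraints: declared for documentation and lineage,
    # NOT enforced by Databricks. Assumes a users table with a declared
    # primary key already exists.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS orders (
            order_id BIGINT NOT NULL,
            user_id  BIGINT,
            CONSTRAINT orders_pk PRIMARY KEY (order_id),
            CONSTRAINT orders_users_fk FOREIGN KEY (user_id) REFERENCES users
        )
    """)

    # An insert that violates the foreign key still succeeds, so referential
    # integrity must be validated in the pipeline itself (e.g., an anti-join
    # check before the write); multi-table inserts must become separate,
    # single-table transactions.
    spark.sql("INSERT INTO orders VALUES (1, 999999)")  # no matching users row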

Question CLxwEQov5ZRylfNY7np3

Question

A table is registered with the following code: //IMG//

Both users and orders are Delta Lake tables. Which statement describes the results of querying recent_orders?

Choices

  • A: All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query finishes.
  • B: All logic will execute when the table is defined and store the result of joining tables to the DBFS; this stored data will be returned when the table is queried.
  • C: Results will be computed and cached when the table is defined; these cached results will incrementally update as new records are inserted into source tables.
  • D: All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query began.
  • E: The versions of each source table will be stored in the table transaction log; query results will be saved to DBFS with each query.
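
For context, the registration code is in the image, but if recent_orders is defined as a view (the column names below are hypothetical), only the query text is stored at definition time and the join logic runs whenever the view is queried:

    # Hypothetical sketch of a view over the two Delta tables; nothing is
    # materialized when the view is defined, only the query text is saved.
    spark.sql("""
        CREATE OR REPLACE VIEW recent_orders AS
        SELECT u.user_id, o.order_id, o.order_ts
        FROM orders AS o
        JOIN users AS u ON o.user_id = u.user_id
        WHERE o.order_ts >= current_date() - INTERVAL 7 DAYS
    """)

    # Each query plans against the snapshot of users and orders that is
    # valid when the query begins, then executes the join from scratch.
    spark.sql("SELECT * FROM recent_orders").show()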

Question ijJAo4zKLjRIVcGS1dyq

Question

A data architect has heard about Delta Lake’s built-in versioning and time travel capabilities. For auditing purposes, they have a requirement to maintain a full record of all valid street addresses as they appear in the customers table.

The architect is interested in implementing a Type 1 table, overwriting existing records with new values and relying on Delta Lake time travel to support long-term auditing. A data engineer on the project feels that a Type 2 table will provide better performance and scalability.

Which piece of information is critical to this decision?

Choices

  • A: Data corruption can occur if a query fails in a partially completed state because Type 2 tables require setting multiple fields in a single update.
  • B: Shallow clones can be combined with Type 1 tables to accelerate historic queries for long-term versioning.
  • C: Delta Lake time travel cannot be used to query previous versions of these tables because Type 1 changes modify data files in place.
  • D: Delta Lake time travel does not scale well in cost or latency to provide a long-term versioning solution.
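
For context, the trade-off hinges on how time travel is implemented: old table versions stay queryable only while their underlying data files are retained, and VACUUM (with a default retention of 7 days) permanently removes them. A minimal sketch; the version and timestamp values are illustrative:

    # Time travel reads an older snapshot from retained data files.
    spark.sql("SELECT * FROM customers VERSION AS OF 42")             # illustrative version
    spark.sql("SELECT * FROM customers TIMESTAMP AS OF '2024-01-01'")

    # VACUUM deletes files outside the retention window; any version that
    # depends on them becomes unqueryable. Retaining years of rewritten files
    # grows storage cost and query latency, which is why time travel is not a
    # durable long-term audit mechanism the way an explicit Type 2 table is.
    spark.sql("VACUUM customers RETAIN 168 HOURS")  # 168 hours = the 7-day default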

Question oXEkWwwf3UP2Rllio0jH

Question

A data engineer wants to join a stream of advertisement impressions (when an ad was shown) with another stream of user clicks on advertisements to correlate when impressions led to monetizable clicks.

In the code below, Impressions is a streaming DataFrame with a watermark applied via withWatermark("event_time", "10 minutes"):

//IMG//

The data engineer notices the query slowing down significantly.

Which solution would improve the performance?

Choices

  • A: Joining on event time constraint: clickTime >= impressionTime AND clickTime <= impressionTime + interval 1 hour
  • B: Joining on event time constraint: clickTime + 3 hours < impressionTime - 2 hours
  • C: Joining on event time constraint: clickTime == impressionTime using a leftOuter join
  • D: Joining on event time constraint: clickTime >= impressionTime - interval 3 hours and removing watermarks
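
For context, this is the standard Spark Structured Streaming stream-stream join pattern: watermarks on both sides plus a time-range join condition let Spark expire buffered state instead of holding every impression indefinitely, which is the usual cause of this slowdown. A minimal sketch adapted from that pattern; the sources and column names are hypothetical, and spark is assumed to be a notebook session:

    from pyspark.sql.functions import expr

    # Hypothetical streaming sources; any event-time streams work here.
    impressions = (spark.readStream.format("rate").load()
                   .selectExpr("value AS adId", "timestamp AS impressionTime"))
    clicks = (spark.readStream.format("rate").load()
              .selectExpr("value AS adId", "timestamp AS clickTime"))

    # Watermarks bound how late data may arrive on each side.
    impressionsWm = impressions.withWatermark("impressionTime", "10 minutes")
    clicksWm = clicks.withWatermark("clickTime", "20 minutes")

    # The event-time range constraint lets Spark drop impressions more than
    # one hour old from state, keeping the join's memory footprint bounded.
    joined = impressionsWm.join(
        clicksWm.withColumnRenamed("adId", "clickAdId"),
        expr("""
            adId = clickAdId AND
            clickTime >= impressionTime AND
            clickTime <= impressionTime + interval 1 hour
        """)
    )

    query = joined.writeStream.format("console").start()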