Questions and Answers

Question EdGJq4N8UhPRHxSUeFbX

Question

A task orchestrator has been configured to run two hourly tasks. First, an outside system writes Parquet data to a directory mounted at /mnt/raw_orders/. After this data is written, a Databricks job containing the following code is executed:

//IMG//

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order, and that the time field indicates when the record was queued in the source system.

If the upstream system is known to occasionally enqueue duplicate entries for a single order hours apart, which statement is correct?

Choices

  • A: Duplicate records enqueued more than 2 hours apart may be retained and the orders table may contain duplicate records with the same customer_id and order_id.
  • B: All records will be held in the state store for 2 hours before being deduplicated and committed to the orders table.
  • C: The orders table will contain only the most recent 2 hours of records and no duplicates will be present.
  • D: Duplicate records arriving more than 2 hours apart will be dropped, but duplicates that arrive in the same batch may both be written to the orders table.
  • E: The orders table will not contain duplicates, but records arriving more than 2 hours late will be ignored and missing from the table.

Question AmykmnppUMNnrpjPxD8b

Question

A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.

Which consideration will impact the decisions made by the engineer while migrating this workload?

Choices

  • A: Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly-parallel writes.
  • B: Databricks supports Spark SQL and JDBC; all logic can be directly migrated from the source system without refactoring.
  • C: Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.
  • D: All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.
  • E: Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake’s upsert functionality.

Question iVvTjn6OSiwkTMSyZHVu

Question

A data architect has heard about Delta Lake’s built-in versioning and time travel capabilities. For auditing purposes, they have a requirement to maintain a full record of all valid street addresses as they appear in the customers table.

The architect is interested in implementing a Type 1 table, overwriting existing records with new values and relying on Delta Lake time travel to support long-term auditing. A data engineer on the project feels that a Type 2 table will provide better performance and scalability.

Which piece of information is critical to this decision?

Choices

  • A: Data corruption can occur if a query fails in a partially completed state because Type 2 tables require setting multiple fields in a single update.
  • B: Shallow clones can be combined with Type 1 tables to accelerate historic queries for long-term versioning.
  • C: Delta Lake time travel cannot be used to query previous versions of these tables because Type 1 changes modify data files in place.
  • D: Delta Lake time travel does not scale well in cost or latency to provide a long-term versioning solution.
  • E: Delta Lake only supports Type 0 tables; once records are inserted to a Delta Lake table, they cannot be modified.
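The design trade-off under discussion: a Type 2 table keeps every historical value as an ordinary row in the table itself, so audits never depend on how long old table versions are retained. A minimal pure-Python sketch of the Type 2 update pattern (the column names `valid_from`, `valid_to`, and `is_current` are illustrative conventions, not taken from the question):

```python
from datetime import datetime

def scd2_update(table, customer_id, street, now):
    """Type 2 change: close out the current row and append a new one, so
    every historical address remains a plain, queryable row."""
    for row in table:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["valid_to"] = now
    table.append({
        "customer_id": customer_id,
        "street": street,
        "valid_from": now,
        "valid_to": None,
        "is_current": True,
    })

addresses = []
scd2_update(addresses, 1, "12 Oak St", datetime(2023, 1, 1))
scd2_update(addresses, 1, "99 Elm Ave", datetime(2024, 6, 1))
# both address versions are retained; exactly one is flagged current
```

By contrast, a Type 1 overwrite keeps only the latest value in the live table and pushes all history onto the version log, which is where the cost and latency concerns in the choices come from.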

Question OLEAOQTPoyUFbuesNCTS

Question

A table named user_ltv is being used to create a view that will be used by data analysts on various teams. Users in the workspace are organized into groups, which are used to configure data access via ACLs.

The user_ltv table has the following schema:

email STRING, age INT, ltv INT

The following view definition is executed:

//IMG//

An analyst who is not a member of the auditing group executes the following query:

SELECT * FROM user_ltv_no_minors

Which statement describes the results returned by this query?

Choices

  • A: All columns will be displayed normally for those records that have an age greater than 17; records not meeting this condition will be omitted.
  • B: All age values less than 18 will be returned as null values, all other columns will be returned with the values in user_ltv.
  • C: All values for the age column will be returned as null values, all other columns will be returned with the values in user_ltv.
  • D: All records from all columns will be displayed with the values in user_ltv.
  • E: All columns will be displayed normally for those records that have an age greater than 18; records not meeting this condition will be omitted.
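The technique being tested is a dynamic view that changes its results based on the caller's group membership. The real view definition is in the image above; the sketch below is a pure-Python toy model of one common shape (auditors see every row, other callers only adult records), so the predicate shown is an assumption for illustration, not the question's actual WHERE clause:

```python
def user_ltv_no_minors(rows, caller_is_auditor):
    """Illustrative semantics of a group-conditional row filter: members of
    the auditing group see all rows; everyone else sees only adults."""
    return [r for r in rows if caller_is_auditor or r["age"] >= 18]

user_ltv = [
    {"email": "a@x.com", "age": 17, "ltv": 10},
    {"email": "b@x.com", "age": 34, "ltv": 250},
]
analyst_view = user_ltv_no_minors(user_ltv, caller_is_auditor=False)
auditor_view = user_ltv_no_minors(user_ltv, caller_is_auditor=True)
```

Under this shape, the non-auditing analyst's query returns whole rows with all columns intact, but rows failing the age predicate are omitted rather than masked.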

Question qBCT91l3ozzToP7ZwHHN

Question

The data governance team is reviewing record-deletion code for compliance with GDPR. The following logic has been implemented to propagate delete requests from the user_lookup table to the user_aggregates table.

//IMG//

Assuming that user_id is a unique identifying key and that all users who have requested deletion have been removed from the user_lookup table, which statement describes whether successfully executing the above logic guarantees that the deleted records are no longer accessible in the user_aggregates table, and why?

Choices

  • A: No; the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command.
  • B: No; files containing deleted records may still be accessible with time travel until a VACUUM command is used to remove invalidated data files.
  • C: Yes; the change data feed uses foreign keys to ensure delete consistency throughout the Lakehouse.
  • D: Yes; Delta Lake ACID guarantees provide assurance that the DELETE command succeeded fully and permanently purged these records.
  • E: No; the change data feed only tracks inserts and updates, not deleted records.
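The distinction the choices probe is Delta Lake's copy-on-write delete: a DELETE commits a new table version, but the data files backing earlier versions remain on storage and reachable via time travel until they are physically removed. A toy model of that lifecycle (not Delta's actual implementation; real cleanup is the VACUUM command with a retention threshold):

```python
class ToyDeltaTable:
    """Toy model of copy-on-write DELETE: removing rows produces a new
    snapshot, but earlier snapshots stay readable via time travel until a
    VACUUM-style cleanup discards them."""

    def __init__(self, rows):
        self.versions = [list(rows)]

    def delete(self, predicate):
        self.versions.append(
            [r for r in self.versions[-1] if not predicate(r)]
        )

    def time_travel(self, version):
        return self.versions[version]

    def vacuum(self):
        self.versions = [self.versions[-1]]  # purge superseded snapshots

user_aggregates = ToyDeltaTable([{"user_id": 42, "total": 9}])
user_aggregates.delete(lambda r: r["user_id"] == 42)
before_vacuum = user_aggregates.time_travel(0)  # deleted record still visible
user_aggregates.vacuum()                        # now the history is gone
```

The model captures why a successful DELETE alone is not a guarantee of inaccessibility: the old snapshot survives the commit and disappears only after the cleanup step runs.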