Questions and Answers

Question g8lN536cndw1XpBJjrxb

Question

A junior data engineer on your team has implemented the following code block. //IMG//

The view new_events contains a batch of records with the same schema as the events Delta table. The event_id field serves as a unique key for this table. When this query is executed, what will happen with new records that have the same event_id as an existing record?

Choices

  • A: They are merged.
  • B: They are ignored.
  • C: They are updated.
  • D: They are inserted.
  • E: They are deleted.
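The referenced code block is not reproduced here, but the outcome hinges on which MERGE clauses it contains. A minimal sketch, assuming the screenshot shows the common insert-only merge pattern (all names are from the question; the SQL itself is an illustrative reconstruction, not the actual screenshot):

```python
# Hypothetical insert-only MERGE against the events table. Because there is
# no WHEN MATCHED clause, rows whose event_id already exists in the target
# are left untouched (ignored); only genuinely new keys are inserted.
insert_only_merge = """
MERGE INTO events e
USING new_events n
ON e.event_id = n.event_id
WHEN NOT MATCHED THEN INSERT *
"""

# On a live cluster this would run as: spark.sql(insert_only_merge)
```

Adding a `WHEN MATCHED THEN UPDATE SET *` clause would turn this into an upsert, in which case matching records would be updated instead.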

Question 48efBnqEFsEgo5q6TEhT

Question

A junior data engineer seeks to leverage Delta Lake’s Change Data Feed functionality to create a Type 1 table representing all of the values that have ever been valid for all rows in a bronze table created with the property delta.enableChangeDataFeed = true. They plan to execute the following code as a daily job: //IMG//

Which statement describes the execution and results of running the above query multiple times?

Choices

  • A: Each time the job is executed, newly updated records will be merged into the target table, overwriting previous values with the same primary keys.
  • B: Each time the job is executed, the entire available history of inserted or updated records will be appended to the target table, resulting in many duplicate entries.
  • C: Each time the job is executed, the target table will be overwritten using the entire history of inserted or updated records, giving the desired result.
  • D: Each time the job is executed, the differences between the original and current versions are calculated; this may result in duplicate entries for some records.
  • E: Each time the job is executed, only those records that have been inserted or updated since the last execution will be appended to the target table, giving the desired result.
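For context, a batch read of a table's full Change Data Feed, which the question's daily job likely resembles, can be sketched as below. The table name and starting version are illustrative assumptions, not taken from the hidden code block:

```python
def read_full_change_feed(spark, table_name="bronze"):
    # Hypothetical sketch: a *batch* read with startingVersion 0 returns the
    # entire available history of inserted and updated records on every run,
    # so a job that appends this result daily would accumulate duplicates.
    # An incremental design would instead use spark.readStream with the same
    # readChangeFeed option, letting the checkpoint track progress.
    return (
        spark.read
        .option("readChangeFeed", "true")
        .option("startingVersion", 0)
        .table(table_name)
    )
```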

Question 9IZ0tsUD7ka1ifgNnTIp

Question

A new data engineer notices that a critical field was omitted from an application that writes its Kafka source to Delta Lake. The field was present in the Kafka source but is missing from the data written to dependent, long-term storage. The retention threshold on the Kafka service is seven days, and the pipeline has been in production for three months. Which statement describes how Delta Lake can help to avoid data loss of this nature in the future?

Choices

  • A: The Delta log and Structured Streaming checkpoints record the full history of the Kafka producer.
  • B: Delta Lake schema evolution can retroactively calculate the correct value for newly added fields, as long as the data was in the original source.
  • C: Delta Lake automatically checks that all fields present in the source data are included in the ingestion layer.
  • D: Data can never be permanently dropped or deleted from Delta Lake, so data loss is not possible under any circumstance.
  • E: Ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.

Question 4umAXcyONTdzTWREgmr2

Question

When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low?

Choices

  • A: Cluster: New Job Cluster; Retries: Unlimited; Maximum Concurrent Runs: Unlimited
  • B: Cluster: New Job Cluster; Retries: None; Maximum Concurrent Runs: 1
  • C: Cluster: Existing All-Purpose Cluster; Retries: Unlimited; Maximum Concurrent Runs: 1
  • D: Cluster: New Job Cluster; Retries: Unlimited; Maximum Concurrent Runs: 1
  • E: Cluster: Existing All-Purpose Cluster; Retries: None; Maximum Concurrent Runs: 1
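As a point of reference, the production pattern of a fresh job cluster, automatic retries, and a single run at a time might be expressed in Databricks Jobs API settings roughly as follows; the field values are illustrative, not a definitive configuration:

```python
# Hypothetical Jobs API settings sketch (values illustrative).
job_settings = {
    "new_cluster": {                          # job cluster created per run:
        "spark_version": "13.3.x-scala2.12",  # cheaper than keeping an
        "num_workers": 2,                     # all-purpose cluster running
    },
    "max_retries": -1,         # -1 means retry indefinitely on failure
    "max_concurrent_runs": 1,  # one run at a time, so retries never overlap
}
```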

Question orSGipt9wuE4e50JXHlo

Question

A nightly job ingests data into a Delta Lake table using the following code: //IMG//

The next step in the pipeline requires a function that returns an object that can be used to propagate new records that have not yet been processed to the next table in the pipeline. Which code snippet completes this function definition?

def new_records():

Choices

  • A: return spark.readStream.table("bronze")
  • B: return spark.readStream.load("bronze")
  • C:
  • D: return spark.read.option("readChangeFeed", "true").table("bronze")
  • E:
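For reference, an incremental read of a Delta table via Structured Streaming, the general mechanism this question probes, can be sketched as below; `spark` is assumed to be an active SparkSession and the table name is taken from the question:

```python
def new_records(spark, table_name="bronze"):
    # Hypothetical sketch: a streaming read of a Delta table yields only
    # records this stream has not yet processed, with progress tracked in
    # the stream's checkpoint rather than recomputed on each run.
    return spark.readStream.table(table_name)
```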