Questions and Answers

Question ZteZImx77lSHJs51Vigy

Question

A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data. They run the following command:

DROP TABLE IF EXISTS my_table

While the object no longer appears when they run SHOW TABLES, the data files still exist. Which of the following describes why the data files still exist while the metadata files were deleted?

Choices

  • A: The table’s data was larger than 10 GB
  • B: The table’s data was smaller than 10 GB
  • C: The table was external
  • D: The table did not have a location
  • E: The table was managed
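The distinction at issue can be sketched in Spark SQL. The table name, schema, and path below are hypothetical; the key point is that a table created with a LOCATION clause is external, so DROP TABLE removes only the metastore entry and leaves the files in place, while a managed table's files are owned (and deleted) by Spark:

```sql
-- Hypothetical illustration: table names, schema, and path are assumptions.

-- External table: LOCATION points at files Spark does not own.
CREATE TABLE my_table (id INT, name STRING)
USING DELTA
LOCATION '/mnt/external/my_table';

DROP TABLE IF EXISTS my_table;
-- The metastore entry is removed, but the data files at
-- /mnt/external/my_table remain on storage.

-- Managed table: no LOCATION clause, so Spark owns the storage.
CREATE TABLE my_managed_table (id INT, name STRING) USING DELTA;

DROP TABLE IF EXISTS my_managed_table;
-- Both the metadata and the underlying data files are deleted.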

Question 5i1wMxJ7eBiNt7BTjvnD

Question

A data engineer wants to create a data entity from a couple of tables. The data entity must be usable by other data engineers in other sessions, and it must be saved to a physical location. Which of the following data entities should the data engineer create?

Choices

  • A: Database
  • B: Function
  • C: View
  • D: Temporary view
  • E: Table
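The requirements can be contrasted in Spark SQL. A temporary view is session-scoped, a view stores only a query definition, but a table both persists data to storage and is visible across sessions. The names below are hypothetical:

```sql
-- Hypothetical names: sales_summary, orders, and customers are assumptions.
-- A table persists the result to physical storage and is visible to
-- other users in other sessions.
CREATE TABLE sales_summary
USING DELTA
AS SELECT o.order_id, c.region, o.amount
   FROM orders o
   JOIN customers c ON o.customer_id = c.customer_id;

-- By contrast, a temporary view disappears when the session ends:
CREATE TEMPORARY VIEW session_only_summary
AS SELECT * FROM sales_summary WHERE amount > 100;
```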

Question p9b7g6fqIxbPOw3ZGgwh

Question

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the quality of the source data is declining. The data engineer would like to automate the process of monitoring the data quality level. Which of the following tools can the data engineer use to solve this problem?

Choices

  • A: Unity Catalog
  • B: Data Explorer
  • C: Delta Lake
  • D: Delta Live Tables
  • E: Auto Loader
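Delta Live Tables supports declarative data-quality expectations, which is the mechanism this question points at. A minimal sketch, assuming hypothetical table, column, and path names:

```sql
-- Hypothetical DLT definition: names and path are assumptions.
-- Each CONSTRAINT ... EXPECT clause is tracked as a quality metric;
-- ON VIOLATION DROP ROW discards records that fail the check.
CREATE OR REFRESH STREAMING LIVE TABLE ingested_orders (
  CONSTRAINT valid_order_id  EXPECT (order_id IS NOT NULL),
  CONSTRAINT positive_amount EXPECT (amount > 0) ON VIOLATION DROP ROW
)
AS SELECT * FROM cloud_files('/mnt/raw/orders', 'json');
```

Expectation pass/fail counts surface in the pipeline's event log, giving the automated quality monitoring the question describes.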

Question 9EBLqDOP5oWGRtAo7Nqr

Question

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE and three datasets defined against Delta Lake table sources using LIVE TABLE. The pipeline is configured to run in Production mode using Continuous Pipeline Mode. Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Choices

  • A: All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.
  • B: All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.
  • C: All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.
  • D: All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.
  • E: All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.
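The two settings the question combines map onto fields in the pipeline configuration. A minimal sketch of the relevant fragment (the pipeline name is an assumption; Continuous mode keeps the pipeline updating until it is stopped, and Production mode, unlike Development mode, terminates compute when the pipeline stops):

```json
{
  "name": "example_pipeline",
  "continuous": true,
  "development": false
}
```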

Question x9SRErNiiTJ0su46Snx4

Question

Which two approaches does Spark use to record the offset range of the data being processed in each trigger, so that Structured Streaming can reliably track the exact progress of the processing and handle any kind of failure by restarting and/or reprocessing?

Choices

  • A: Checkpointing and Write-ahead Logs
  • B: Structured Streaming cannot record the offset range of the data being processed in each trigger.
  • C: Replayable Sources and Idempotent Sinks
  • D: Write-ahead Logs and Idempotent Sinks
  • E: Checkpointing and Idempotent Sinks
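Where the offset tracking surfaces in user code is the checkpoint location: the checkpoint directory holds the write-ahead log of offset ranges alongside any state. A sketch, not runnable standalone (it assumes a Spark cluster; the paths are assumptions):

```python
# Sketch: requires a Spark session; paths and names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("offsets-demo").getOrCreate()

# "rate" is a built-in replayable test source.
stream = (spark.readStream
          .format("rate")
          .option("rowsPerSecond", 10)
          .load())

query = (stream.writeStream
         .format("delta")
         # The checkpoint directory stores the write-ahead log of
         # offset ranges per trigger, enabling restart/reprocess recovery.
         .option("checkpointLocation", "/mnt/checkpoints/offsets_demo")
         .option("path", "/mnt/tables/offsets_demo")
         .start())
```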