Questions and Answers

Question H15MG7w2KSjqnl96FurD

Question

A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.

Which approach can the data engineer take to identify the table that is dropping the records?

Choices

  • A: They can set up separate expectations for each table when developing their DLT pipeline.
  • B: They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.
  • C: They can set up DLT to notify them via email when records are dropped.
  • D: They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
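For context, DLT expectations are declared per table, and each table's drop counts surface in that table's data quality metrics on the pipeline page. A minimal sketch in DLT SQL, with hypothetical table and constraint names:

```sql
-- Hypothetical DLT table with a per-table expectation that drops invalid rows.
-- Records failing the constraint are dropped at this table, and the drop counts
-- appear in this table's data quality statistics in the pipeline UI.
CREATE OR REFRESH STREAMING LIVE TABLE silver_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.bronze_orders)
```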

Question dj77z4lwOSYcRjblcFPJ

Question

In Structured Streaming, what does Spark use to record the offset range of the data processed in each trigger, so that it can reliably track the exact progress of processing and handle any kind of failure by restarting and/or reprocessing?

Choices

  • A: Checkpointing and Write-ahead Logs
  • B: Replayable Sources and Idempotent Sinks
  • C: Write-ahead Logs and Idempotent Sinks
  • D: Checkpointing and Idempotent Sinks

Question R75OKQYdzOoHvmDwhR7b

Question

What describes the relationship between Gold tables and Silver tables?

Choices

  • A: Gold tables are more likely to contain aggregations than Silver tables.
  • B: Gold tables are more likely to contain valuable data than Silver tables.
  • C: Gold tables are more likely to contain a less refined view of data than Silver tables.
  • D: Gold tables are more likely to contain truthful data than Silver tables.
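To make the Medallion relationship concrete, a Gold table typically holds a business-level aggregate computed from a Silver table. A hedged DLT SQL sketch with made-up table and column names:

```sql
-- Hypothetical Gold table: an aggregate built from a refined Silver table
CREATE LIVE TABLE gold_daily_sales AS
SELECT order_date,
       SUM(amount) AS total_amount
FROM LIVE.silver_sales
GROUP BY order_date
```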

Question 9Gm25ynrYzW2IyOJxUlu

Question

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL. Which of the following commands could the data engineering team use to access sales in PySpark?

Choices

  • A: SELECT * FROM sales
  • B: There is no way to share data between PySpark and SQL.
  • C: spark.sql("sales")
  • D: spark.delta.table("sales")
  • E: spark.table("sales")

Question JxCrj6gQstqtDXOGPCxg

Question

What describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

Choices

  • A: CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.
  • B: CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.
  • C: CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.
  • D: CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.
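To make the contrast concrete, a hedged DLT SQL sketch with hypothetical table names: a streaming live table processes new records from its append-only source incrementally on each update, while a plain live table is fully recomputed:

```sql
-- Incremental: reads only new records from an append-only source each update
CREATE STREAMING LIVE TABLE bronze_events
AS SELECT * FROM STREAM(LIVE.raw_events);

-- Full recompute: materialized from scratch on each update
CREATE LIVE TABLE events_summary
AS SELECT event_type, COUNT(*) AS n
FROM LIVE.bronze_events
GROUP BY event_type;
```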