Questions and Answers

Question rGkP3h3op3fQrXe2VzHm

Question

A Delta table of weather records is partitioned by date and has the following schema:

date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT

To find all the records from within the Arctic Circle, you execute a query with the following filter:

latitude > 66.3

Which statement describes how the Delta engine identifies which files to load?

Choices

  • A: All records are cached to an operational database and then the filter is applied
  • B: The Parquet file footers are scanned for min and max statistics for the latitude column
  • C: The Hive metastore is scanned for min and max statistics for the latitude column
  • D: The Delta log is scanned for min and max statistics for the latitude column
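For context on the mechanism this question probes: Delta Lake stores per-file min/max column statistics in the transaction log (`_delta_log`), and the engine skips any file whose statistics cannot satisfy the predicate. A minimal plain-Python sketch of that pruning logic, using hypothetical file names and statistics (real stats live in the Delta log's JSON/checkpoint entries, not in this shape):

```python
# Hypothetical per-file min/max stats, as Delta's engine sees them conceptually.
file_stats = [
    {"path": "part-000.parquet", "min_lat": -10.0, "max_lat": 45.2},
    {"path": "part-001.parquet", "min_lat": 50.1, "max_lat": 71.8},
    {"path": "part-002.parquet", "min_lat": 67.0, "max_lat": 89.9},
]

def files_to_scan(stats, threshold):
    """Keep only files whose max latitude can satisfy `latitude > threshold`.

    A file whose max_lat is below the threshold cannot contain a matching
    row, so it is skipped without ever being opened.
    """
    return [f["path"] for f in stats if f["max_lat"] > threshold]

print(files_to_scan(file_stats, 66.3))  # part-000.parquet is skipped entirely
```

Because `date` is the partition column and the filter is on `latitude`, partition pruning does not help here; only these file-level statistics do.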

Question B9NUAzbbo6gLHRxTmJHP

Question

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

Streaming DataFrame df has the following schema:

device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT

Code block:

//IMG//

Choose the response that correctly fills in the blank within the code block to complete this task.

Choices

  • A: to_interval("event_time", "5 minutes").alias("time")
  • B: window("event_time", "5 minutes").alias("time")
  • C: "event_time"
  • D: window("event_time", "10 minutes").alias("time")
  • E: lag("event_time", "10 minutes").alias("time")
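Background for this question: `window("event_time", "5 minutes")` assigns each event to a non-overlapping (tumbling) five-minute bucket, which is exactly the grouping the pipeline needs. The same bucketing can be sketched in plain Python, assuming a few hypothetical `(device_id, event_time, temp)` rows:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def tumbling_avg(events, minutes=5):
    """Group (device_id, event_time, temp) rows into non-overlapping
    `minutes`-wide windows and average temp per (device, window) --
    the same semantics as Spark's window("event_time", "5 minutes")."""
    buckets = defaultdict(list)
    for device_id, ts, temp in events:
        # Floor the timestamp to the start of its tumbling window.
        start = ts - timedelta(minutes=ts.minute % minutes,
                               seconds=ts.second,
                               microseconds=ts.microsecond)
        buckets[(device_id, start)].append(temp)
    return {key: sum(v) / len(v) for key, v in buckets.items()}

events = [
    (1, datetime(2024, 1, 1, 12, 0), 20.0),
    (1, datetime(2024, 1, 1, 12, 4), 22.0),
    (1, datetime(2024, 1, 1, 12, 5), 30.0),  # falls into the next window
]
print(tumbling_avg(events))
```

In the actual streaming query the completed blank would sit in a grouping such as `df.groupBy(window("event_time", "5 minutes").alias("time"), "device_id")` followed by `avg` aggregations on `temp` and `humidity` (a sketch of the likely shape; the full code block is in the elided image).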

Question pKSF1v56p4M4zw9t2dDz

Question

A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create.

//IMG//

Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?

Choices

  • A: The logic defined in the referenced notebook will be executed three times on the referenced existing all-purpose cluster.
  • B: The logic defined in the referenced notebook will be executed three times on new clusters with the configurations of the provided cluster ID.
  • C: Three new jobs named “Ingest new data” will be defined in the workspace, but no jobs will be executed.
  • D: One new job named “Ingest new data” will be defined in the workspace, but it will not be executed.
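The key distinction this question turns on is that `2.0/jobs/create` only registers a job definition and returns a new `job_id`; nothing runs until a trigger such as `jobs/run-now` or a schedule fires. A hedged sketch of such a payload (field values are illustrative assumptions, not taken from the elided image):

```python
import json

# Hypothetical payload in the shape 2.0/jobs/create expects; the name,
# cluster ID, and notebook path are placeholders for the elided image.
payload = {
    "name": "Ingest new data",
    "existing_cluster_id": "1234-567890-abcde123",
    "notebook_task": {"notebook_path": "/Repos/ingest/new_data"},
}

# POSTing this payload three times would register three separate jobs
# (three distinct job_ids), and none of them would execute.
print(json.dumps(payload, indent=2))
```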

Question u2MMMuckvN64axxtNdMh

Question

A view is registered with the following code:

//IMG//

Both users and orders are Delta Lake tables.

Which statement describes the results of querying recent_orders?

Choices

  • A: The versions of each source table will be stored in the table transaction log; query results will be saved to DBFS with each query.
  • B: All logic will execute when the table is defined and store the result of joining tables to the DBFS; this stored data will be returned when the table is queried.
  • C: All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query finishes.
  • D: All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query began.
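The distinction between the last two choices is Delta Lake's snapshot isolation: a view stores no data, its logic runs at query time, and the source-table versions are pinned when the query begins, so commits that land while the query runs are not visible to it. A plain-Python sketch of that pinning, with hypothetical version numbers:

```python
# Sketch of snapshot isolation for a view over Delta tables: the table
# versions are fixed when the query *begins*, not when it finishes.
table_versions = {"users": 4, "orders": 9}  # hypothetical current versions

def query_view(versions):
    """A view resolves its source-table snapshots at query start and joins
    those fixed versions, even if a concurrent write commits mid-query."""
    snapshot = dict(versions)   # versions pinned at query begin
    versions["orders"] += 1     # a concurrent commit during the query
    return snapshot             # the running query still reads version 9

print(query_view(table_versions))
```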

Question V0ARoNS3VX2DAeGQdG4t

Question

A data engineer is performing a join operation to combine values from a static userLookup table with a streaming DataFrame streamingDF.

Which code block attempts to perform an invalid stream-static join?

Choices

  • A: userLookup.join(streamingDF, ["user_id"], how="right")
  • B: streamingDF.join(userLookup, ["user_id"], how="inner")
  • C: userLookup.join(streamingDF, ["user_id"], how="inner")
  • D: userLookup.join(streamingDF, ["user_id"], how="left")
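For reference, Structured Streaming's stream-static join support matrix: inner joins are always supported, a left outer join requires the streaming side on the left, and a right outer join requires the streaming side on the right. A small plain-Python encoding of that matrix:

```python
def stream_static_join_supported(stream_side, how):
    """Stream-static join support in Structured Streaming.

    stream_side: 'left' or 'right' -- which side of the join holds the
    streaming DataFrame (the other side is the static table).
    """
    if how == "inner":
        return True
    if how == "left":
        return stream_side == "left"   # outer side must be the stream
    if how == "right":
        return stream_side == "right"  # outer side must be the stream
    return False  # e.g. full outer with a static side is unsupported

# userLookup.join(streamingDF, ..., how="left") puts the static table on
# the outer (left) side with the stream on the right -> invalid.
print(stream_static_join_supported(stream_side="right", how="left"))
```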