Questions and Answers

Question JzgN9me8Lptpl6ISTMsD

Question

Which statement describes Delta Lake Auto Compaction?

Choices

  • A: An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 1 GB.
  • B: Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.
  • C: Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
  • D: Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.
  • E: An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 128 MB.
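For context, a minimal sketch of how Auto Compaction is typically enabled on Databricks, either per session or per table. The property names shown (spark.databricks.delta.autoCompact.enabled, delta.autoOptimize.autoCompact, delta.autoOptimize.optimizeWrite) are the commonly documented ones, but exact names and defaults can vary by Databricks Runtime version, so treat this as an assumption rather than a definitive configuration.

```python
# Sketch: enabling Auto Compaction on Databricks (assumes an interactive
# Databricks `spark` session; property names may vary by DBR version).

# Enable Auto Compaction for all Delta writes in this session.
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Or enable it per table via table properties, often together with
# optimized writes, so small files are compacted after each write.
spark.sql("""
    ALTER TABLE sales_bronze
    SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```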

Question 5O19TS9IYkdPvmKA2Tq2

Question

Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate push-down?

Choices

  • A: In the Executor’s log file, by grepping for “predicate push-down”
  • B: In the Stage’s Detail screen, in the Completed Stages table, by noting the size of data read from the Input column
  • C: In the Query Detail screen, by interpreting the Physical Plan
  • D: In the Delta Lake transaction log, by noting the column statistics
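As an illustration of what to look for, the sketch below compares a filter that can be pushed down to the scan with one that cannot. The table path and column names are hypothetical; the formatted physical plan printed by explain() corresponds to what the Spark UI shows on the SQL / query detail pages (look for PushedFilters on the scan node and the amount of input read).

```python
# Sketch: inspecting the physical plan for predicate push-down.
# Assumes a `spark` session; the path and columns are placeholders.
from pyspark.sql import functions as F

df = spark.read.parquet("/mnt/hypothetical/events")

# Filter on a source column: the scan node should list it under PushedFilters,
# so far less input data is read.
pushed = df.filter(F.col("event_date") == "2024-01-01")
pushed.explain(mode="formatted")

# Filter on a derived column: the predicate cannot be pushed to the scan,
# so the Input size for that stage is much larger.
not_pushed = (
    df.withColumn("event_type_lc", F.lower("event_type"))
      .filter(F.col("event_type_lc") == "click")
)
not_pushed.explain(mode="formatted")
```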

Question 2s4qdbmVnfpoSUcYFnuw

Question

A data engineer needs to capture the pipeline settings from an existing pipeline in the workspace, and use them to create and version a JSON file that can be used to create a new pipeline.

Which command should the data engineer enter in a web terminal configured with the Databricks CLI?

Choices

  • A: Use list pipelines to get the specs for all pipelines; get the pipeline spec from the returned results; parse and use this to create a pipeline
  • B: Stop the existing pipeline; use the returned settings in a reset command
  • C: Use the get command to capture the settings for the existing pipeline; remove the pipeline_id and rename the pipeline; use this in a create command
  • D: Use the clone command to create a copy of an existing pipeline; use the get JSON command to get the pipeline definition; save this to git
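A rough sketch of the get → edit → create flow follows, driven from Python purely for illustration. The subcommands (databricks pipelines get / create) exist in the Databricks CLI, but the exact flags (--pipeline-id, --settings) and the shape of the returned JSON differ between CLI versions, so every flag and key below should be read as an assumption.

```python
# Sketch: capture an existing pipeline's settings, version them as JSON,
# and create a new pipeline from that file. CLI flags are assumptions.
import json
import subprocess

# Capture the settings of the existing pipeline.
raw = subprocess.run(
    ["databricks", "pipelines", "get", "--pipeline-id", "<existing-pipeline-id>"],
    capture_output=True, text=True, check=True,
).stdout
payload = json.loads(raw)
settings = payload.get("spec", payload)  # some CLI versions nest the spec

# Drop the old pipeline_id, rename the pipeline, and write a versionable file.
settings.pop("pipeline_id", None)
settings["name"] = "my_new_pipeline"
with open("new_pipeline.json", "w") as f:
    json.dump(settings, f, indent=2)

# Create the new pipeline from the versioned JSON file.
subprocess.run(
    ["databricks", "pipelines", "create", "--settings", "new_pipeline.json"],
    check=True,
)
```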

Question orB34JO9sP88f3dof2O1

Question

Which REST API call can be used to review the notebooks configured to run as tasks in a multi-task job?

Choices

  • A: /jobs/runs/list
  • B: /jobs/list
  • C: /jobs/runs/get
  • D: /jobs/get
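To make the shape of these endpoints concrete, here is a small sketch that calls the Jobs API 2.1 and prints the notebook path configured for each task of a multi-task job. The workspace URL, token, and job_id are placeholders; the settings.tasks[*].notebook_task.notebook_path structure follows the documented Jobs 2.1 response, but verify it against your workspace.

```python
# Sketch: reading the notebook tasks of a job via the Jobs REST API.
import requests

host = "https://<your-workspace>.cloud.databricks.com"
headers = {"Authorization": "Bearer <personal-access-token>"}

resp = requests.get(
    f"{host}/api/2.1/jobs/get",
    headers=headers,
    params={"job_id": 123456},  # placeholder job ID
)
resp.raise_for_status()

# Each task in the job's settings may carry a notebook_task with its path.
for task in resp.json()["settings"]["tasks"]:
    notebook = task.get("notebook_task", {}).get("notebook_path")
    print(task["task_key"], notebook)
```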

Question QLHg2dUpagOBFEL4qtbz

Question

A data engineer wants to run unit tests, using common Python testing frameworks, on Python functions defined across several Databricks notebooks currently used in production.

How can the data engineer run unit tests against functions that work with data in production?

Choices

  • A: Define and import unit test functions from a separate Databricks notebook
  • B: Define and unit test functions using Files in Repos
  • C: Run unit tests against non-production data that closely mirrors production
  • D: Define unit tests and functions within the same notebook
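For reference, a minimal pytest sketch of this pattern: the function under test lives in an importable module (e.g. a file in a Repo), and the test exercises it against a small, non-production DataFrame that mirrors the production schema. The module name transformations and the function add_revenue are hypothetical.

```python
# Sketch: unit-testing a DataFrame transformation with pytest on a local
# Spark session, using synthetic data that mirrors the production schema.
import pytest
from pyspark.sql import SparkSession

from transformations import add_revenue  # hypothetical module under test


@pytest.fixture(scope="session")
def spark():
    return (
        SparkSession.builder.master("local[2]")
        .appName("unit-tests")
        .getOrCreate()
    )


def test_add_revenue(spark):
    df = spark.createDataFrame(
        [("a", 2, 5.0), ("b", 3, 1.5)],
        ["sku", "quantity", "unit_price"],
    )
    result = add_revenue(df)
    rows = {r["sku"]: r["revenue"] for r in result.collect()}
    assert rows == {"a": 10.0, "b": 4.5}
```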