Questions and Answers

Question xaEQtKPQWV8nDENSCbXq

Question

A junior developer complains that the code in their notebook isn’t producing the correct results in the development environment. A shared screenshot reveals that while they’re using a notebook versioned with Databricks Repos, they’re using a personal branch that contains old logic. The desired branch named dev-2.3.9 is not available from the branch selection dropdown. Which approach will allow this developer to review the current logic for this notebook?

Choices

  • A: Use Repos to make a pull request, and use the Databricks REST API to update the current branch to dev-2.3.9.
  • B: Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.
  • C: Use Repos to check out the dev-2.3.9 branch and auto-resolve conflicts with the current branch.
  • D: Merge all changes back to the main branch in the remote Git repository and clone the repo again.
  • E: Use Repos to merge the current branch and the dev-2.3.9 branch, then make a pull request to sync with the remote repository.
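
For reference, switching a Databricks repo to another branch can also be triggered programmatically through the Repos REST API (PATCH /api/2.0/repos/{repo_id} with a JSON body naming the branch). The sketch below only builds the request and does not send it; the workspace URL, repo ID, token, and the helper name build_branch_update_request are hypothetical placeholders.

```python
import json
import urllib.request


def build_branch_update_request(host: str, repo_id: int, branch: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a Repos API request that switches a repo to `branch`.

    host, repo_id, and token are hypothetical values supplied by the caller.
    """
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_branch_update_request(
    "https://example-workspace.cloud.databricks.com",  # hypothetical workspace URL
    12345,                                             # hypothetical repo ID
    "dev-2.3.9",
    "dapi-PLACEHOLDER",                                # hypothetical token
)
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req)
```

This roughly mirrors what the Repos UI does when you pull from the remote and pick a branch: the remote branch must be fetched before it can be checked out.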

Question 20N1c7BqnikzdlqTkbxP

Question

A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executor. When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?

Choices

  • A: The five-minute load average remains consistent/flat
  • B: Bytes Received never exceeds 80 million bytes per second
  • C: Total Disk Space remains constant
  • D: Network I/O never spikes
  • E: Overall cluster CPU utilization is around 25%
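
The arithmetic behind the 25% figure: 3 executors plus 1 driver of the same VM type make 4 equally sized nodes. If code runs only on the driver and saturates it while the executors sit idle, the cluster-wide average comes out to 1/4. A minimal sketch, assuming equal core counts per node and fully idle executors (illustrative values, not measurements):

```python
def cluster_cpu_utilization(node_utilizations: list[float]) -> float:
    """Average CPU utilization across equally sized nodes (each value 0.0 to 1.0)."""
    return sum(node_utilizations) / len(node_utilizations)


# Assumed scenario: driver fully busy, three executors idle.
driver = [1.0]
executors = [0.0, 0.0, 0.0]
print(cluster_cpu_utilization(driver + executors))  # 0.25
```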

Question rruD0fJCk76pvMu50jx7

Question

Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate push-down?

Choices

  • A: In the Executor’s log file, by grepping for “predicate push-down”
  • B: In the Stage’s Detail screen, in the Completed Stages table, by noting the size of data read from the Input column
  • C: In the Storage Detail screen, by noting which RDDs are not stored on disk
  • D: In the Delta Lake transaction log, by noting the column statistics
  • E: In the Query Detail screen, by interpreting the Physical Plan
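
In a Spark SQL physical plan (shown on the Spark UI's SQL query detail page, or printed by DataFrame.explain()), a file scan lists the filters handed to the data source under PushedFilters. An empty list on a query that does filter is a hint that push-down did not happen. The helper pushed_filters below is a hypothetical sketch that extracts that field from plan text; the two sample plan lines are abbreviated, illustrative FileScan strings, not output captured from a real job.

```python
def pushed_filters(plan_text: str) -> list[str]:
    """Return the filters listed in the first 'PushedFilters: [...]' field of a plan."""
    marker = "PushedFilters: ["
    start = plan_text.find(marker)
    if start == -1:
        return []
    depth = 0  # nesting depth of () and [] inside the list
    current = ""
    items = []
    for ch in plan_text[start + len(marker):]:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            if depth == 0:  # closing bracket of the PushedFilters list itself
                break
            depth -= 1
        if ch == "," and depth == 0:  # top-level separator between filters
            items.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        items.append(current.strip())
    return items


# Illustrative, abbreviated scan lines (assumed format, not real job output).
with_pushdown = ("FileScan parquet [id#0,value#1] Batched: true, Format: Parquet, "
                 "PushedFilters: [IsNotNull(id), GreaterThan(id,10)], "
                 "ReadSchema: struct<id:int,value:string>")
without_pushdown = ("FileScan parquet [id#0,value#1] Batched: true, Format: Parquet, "
                    "PushedFilters: [], ReadSchema: struct<id:int,value:string>")

print(pushed_filters(with_pushdown))     # ['IsNotNull(id)', 'GreaterThan(id,10)']
print(pushed_filters(without_pushdown))  # []
```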

Question OMDtthj31Uvaf0fRLYnX

Question

Review the following error traceback: [traceback image not available]

Which statement describes the error being raised?

Choices

  • A: The code executed was PySpark but was executed in a Scala notebook.
  • B: There is no column in the table named heartrateheartrateheartrate
  • C: There is a type error because a column object cannot be multiplied.
  • D: There is a type error because a DataFrame object cannot be multiplied.
  • E: There is a syntax error because the heartrate column is not correctly identified as a column.
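
Choice B hinges on a Python detail: multiplying a str by an int repeats the string instead of raising an error. The traceback image is not available, so as an assumption, suppose the failing cell passed something like 3 * "heartrate" to select. Python evaluates that expression into a new column name before Spark sees it. The sketch shows the evaluation in plain Python; the PySpark lines in comments assume a hypothetical existing DataFrame df.

```python
# Python evaluates int * str as string repetition, before any Spark call happens.
column_name = 3 * "heartrate"
print(column_name)  # heartrateheartrateheartrate

# Hypothetically, df.select(3 * "heartrate") therefore asks Spark for a column
# literally named "heartrateheartrateheartrate", which does not exist.
# To multiply the column's values instead, build a Column expression first:
#   from pyspark.sql.functions import col
#   df.select(col("heartrate") * 3)
```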

Question 84GCbbeykbwleCQ4zbKb

Question

Which distribution does Databricks support for installing custom Python code packages?

Choices

  • A: sbt
  • B: CRAN
  • C: npm
  • D: Wheels
  • E: jars
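
A wheel (.whl) is Python's built-package format, specified in PEP 427, and its filename encodes compatibility metadata as {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. The sketch below splits a wheel filename into those fields using only the standard library, with no deep validation; the package name my_pkg is hypothetical. In practice, the packaging library's parse_wheel_filename does this with full validation.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 fields (no deep validation)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    # Hyphens only separate fields; names and versions use underscores instead.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename}")
    return WheelName(dist, version, build, py, abi, plat)


# Hypothetical pure-Python wheel that installs on any platform.
print(parse_wheel_filename("my_pkg-1.0.0-py3-none-any.whl"))
```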