Questions and Answers

Question slSue2y8VE49lfg0pryf

Question

Which statement regarding Spark configuration on the Databricks platform is true?

Choices

  • A: The Databricks REST API can be used to modify the Spark configuration properties for an interactive cluster without interrupting jobs currently running on the cluster.
  • B: Spark configurations set within a notebook will affect all SparkSessions attached to the same interactive cluster.
  • C: Spark configuration properties can only be set for an interactive cluster by creating a global init script.
  • D: Spark configuration properties set for an interactive cluster with the Clusters UI will impact all notebooks attached to that cluster.
  • E: When the same Spark configuration property is set for an interactive cluster and a notebook attached to that cluster, the notebook setting will always be ignored.
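As background for this question: a configuration set with spark.conf.set in a notebook applies only to that notebook's own SparkSession, while a property set in the Clusters UI applies cluster-wide to every attached notebook and requires a restart to change. A minimal sketch of the notebook-level behavior, assuming it runs in a Databricks notebook where spark is predefined (the property chosen is just an example):

```python
# `spark` is the SparkSession Databricks provides in every notebook.
# Read the value currently in effect (cluster-level unless overridden).
print(spark.conf.get("spark.sql.shuffle.partitions"))

# Override it for this notebook's SparkSession only; other notebooks
# attached to the same interactive cluster are unaffected.
spark.conf.set("spark.sql.shuffle.partitions", "64")
```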

Question tV7c3Y99bBZympTgKibZ

Question

A developer has successfully configured their credentials for Databricks Repos and cloned a remote Git repository. They do not have privileges to make changes to the main branch, which is the only branch currently visible in their workspace.

Which approach allows this user to share their code updates without the risk of overwriting the work of their teammates?

Choices

  • A: Use Repos to checkout all changes and send the git diff log to the team.
  • B: Use Repos to create a fork of the remote repository, commit all changes, and make a pull request on the source repository.
  • C: Use Repos to pull changes from the remote Git repository; commit and push changes to a branch that appeared as changes were pulled.
  • D: Use Repos to merge all differences and make a pull request back to the remote repository.
  • E: Use Repos to create a new branch, commit all changes, and push changes to the remote Git repository.
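For reference, the branch-based workflow these choices describe maps onto ordinary Git operations that Databricks Repos exposes through its UI. A sketch of the CLI equivalent, driven from Python for consistency with the other examples here; the branch name, remote name, and commit message are illustrative assumptions:

```python
import subprocess

def git(*args: str) -> None:
    """Run a git command, raising if it fails."""
    subprocess.run(["git", *args], check=True)

# Create a feature branch off the current branch, commit all local
# changes, and push the new branch to the remote. This shares the
# updates without touching teammates' work on main.
git("checkout", "-b", "feature/my-updates")
git("add", "-A")
git("commit", "-m", "Share code updates for review")
git("push", "-u", "origin", "feature/my-updates")
```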

Question 7tBseGd9qKWWrnsSJpYV

Question

To prevent accidental commits to production data, a senior data engineer has instituted a policy that all development work must reference clones of Delta Lake tables. After testing both DEEP and SHALLOW CLONE, the team creates its development tables using SHALLOW CLONE.

A few weeks after initial table creation, the cloned versions of several tables implemented as Type 1 Slowly Changing Dimensions (SCD) stop working. The transaction logs for the source tables show that VACUUM was run the day before.

Which statement describes why the cloned tables are no longer working?

Choices

  • A: Because Type 1 changes overwrite existing records, Delta Lake cannot guarantee data consistency for cloned tables.
  • B: Running VACUUM automatically invalidates any shallow clones of a table; DEEP CLONE should always be used when a cloned table will be repeatedly queried.
  • C: Tables created with SHALLOW CLONE are automatically deleted after their default retention threshold of 7 days.
  • D: The metadata created by the CLONE operation is referencing data files that were purged as invalid by the VACUUM command.
  • E: The data files compacted by VACUUM are not tracked by the cloned metadata; running REFRESH on the cloned table will pull in recent changes.
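To make the scenario concrete, here is a sketch of the sequence the question describes; the table names and retention window are illustrative assumptions. A shallow clone copies only the Delta transaction log, so its metadata keeps pointing at the source table's data files:

```python
# `spark` is the notebook-provided SparkSession.
# Shallow clone: copies metadata only, no data files.
spark.sql("CREATE TABLE dev.customers SHALLOW CLONE prod.customers")

# Type 1 SCD updates overwrite records in place, so new data files
# replace old ones in prod; the old files become unreferenced by the
# source table's current version, but the clone still points at them.

# Running VACUUM on the source purges those unreferenced files...
spark.sql("VACUUM prod.customers RETAIN 168 HOURS")

# ...after which queries against dev.customers fail with
# missing-file errors, because the clone's metadata references data
# files that no longer exist.
```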

Question kUZpLaiA7Htn8FA1jRtV

Question

You are performing a join operation to combine values from a static userLookup table with a streaming DataFrame streamingDF.

Which code block attempts to perform an invalid stream-static join?

Choices

  • A: userLookup.join(streamingDF, ["userid"], how="inner")
  • B: streamingDF.join(userLookup, ["user_id"], how="outer")
  • C: streamingDF.join(userLookup, ["user_id"], how="left")
  • D: streamingDF.join(userLookup, ["userid"], how="inner")
  • E: userLookup.join(streamingDF, ["user_id"], how="right")
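As background: Structured Streaming supports inner stream-static joins with the streaming DataFrame on either side, and outer joins only when the streaming side is the preserved side (stream on the left with how="left", or on the right with how="right"). A full outer join involving a streaming DataFrame is not supported. A sketch, where the paths and the user_id column are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative sources; paths and schemas are assumptions.
userLookup = spark.read.format("delta").load("/tmp/user_lookup")
streamingDF = spark.readStream.format("delta").load("/tmp/events")

# Valid: inner join on either side, or an outer join that preserves
# the streaming side, such as this left join with the stream on the
# left.
joined_ok = streamingDF.join(userLookup, ["user_id"], how="left")

# Invalid: a full outer join with a streaming DataFrame. Structured
# Streaming rejects this plan when the streaming query is started.
joined_bad = streamingDF.join(userLookup, ["user_id"], how="outer")
```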

Question 659pbR3NuntI8jfmEI8H

Question

Spill occurs as a result of executing various wide transformations. Diagnosing it, however, requires proactively looking for key indicators.

Where in the Spark UI are two of the primary indicators that a partition is spilling to disk?

Choices

  • A: Query’s detail screen and Job’s detail screen
  • B: Stage’s detail screen and Executor’s log files
  • C: Driver’s and Executor’s log files
  • D: Executor’s detail screen and Executor’s log files
  • E: Stage’s detail screen and Query’s detail screen
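For context, spill is reported as the paired "Spill (Memory)" and "Spill (Disk)" metrics. A sketch of a job whose shuffle can spill under constrained executor memory, making those indicators appear; the row count, partition count, and no-op sink are illustrative assumptions:

```python
# `spark` is the notebook-provided SparkSession.
df = spark.range(0, 500_000_000)

# A wide transformation: the shuffle behind this aggregation can
# spill when partitions exceed available execution memory.
result = (
    df.repartition(8)  # deliberately few, large shuffle partitions
      .groupBy((df.id % 1_000_000).alias("key"))
      .count()
)

# Write to the no-op sink just to force execution.
result.write.format("noop").mode("overwrite").save()

# After the job runs, the spill metrics surface in the Spark UI on
# the stage's detail page (task-level metrics table) and on the SQL
# query's detail page (per-operator metrics).
```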