Questions and Answers

Question APNewqvzdgkPndRXuDoF

Question

When evaluating the Ganglia Metrics for a given cluster with 3 executor nodes, which indicator would signal proper utilization of the VMs’ resources?

Choices

  • A: The Five Minute Load Average remains consistent/flat
  • B: CPU Utilization is around 75%
  • C: Network I/O never spikes
  • D: Total Disk Space remains constant

Question IzjBnSE4i7eJq7L832mS

Question

The data engineer is using Spark’s MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI’s Storage tab to signal that a cached table is not performing optimally?

Choices

  • A: On Heap Memory Usage is within 75% of Off Heap Memory Usage
  • B: The RDD Block Name includes the “*” annotation signaling a failure to cache
  • C: Size on Disk is > 0
  • D: The number of Cached Partitions > the number of Spark Partitions

Question ILMDoLROtbBiqkZ11jiE

Question

Review the following error traceback:

//IMG//

Which statement describes the error being raised?

Choices

  • A: There is a syntax error because the heartrate column is not correctly identified as a column.
  • B: There is no column in the table named heartrateheartrateheartrate.
  • C: There is a type error because a column object cannot be multiplied.
  • D: There is a type error because a DataFrame object cannot be multiplied.
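
The traceback itself is not reproduced above, so the following is only a sketch of one way a name like the one in choice B could arise. It assumes a column name written as a plain string was multiplied by an integer. In plain Python, that operation repeats the string rather than building a column expression. The variable names are hypothetical.

```python
# Hypothetical illustration: a column name held as a plain Python string.
column_name = "heartrate"

# Multiplying a str by an int repeats the string; no column arithmetic happens.
repeated = 3 * column_name

print(repeated)  # prints heartrateheartrateheartrate
```

Whether this matches the code behind the missing traceback cannot be confirmed from the text alone.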

Question 8gdvOFDAHUWPskUuL7GW

Question

What is a method of installing a Python package scoped at the notebook level to all nodes in the currently active cluster?

Choices

  • A: Run source env/bin/activate in a notebook setup script
  • B: Install libraries from PyPI using the cluster UI
  • C: Use %pip install in a notebook cell
  • D: Use %sh pip install in a notebook cell

Question Fo1XqOAEzyJJi4Esz2pJ

Question

What is the first line of a Databricks Python notebook when viewed in a text editor?

Choices

  • A: %python
  • B: // Databricks notebook source
  • C: # Databricks notebook source
  • D: -- Databricks notebook source
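
As a rough sketch of how such a header line could be inspected outside the workspace, the snippet below writes a small sample file and reads its first line back. The file name, its contents, and the helper `header_language` are made up for illustration. The prefix table simply lists the usual single-line comment syntax of each language and is an assumption about how source exports are marked.

```python
import tempfile
from pathlib import Path

# Single-line comment prefixes per language (assumed mapping for this sketch).
# R also uses '#'; it is left out to keep the example small.
COMMENT_PREFIXES = {"#": "python", "//": "scala", "--": "sql"}
MARKER = "Databricks notebook source"


def header_language(path):
    """Return the language implied by the first line's comment prefix, or None."""
    with open(path, encoding="utf-8") as f:
        first_line = f.readline().strip()
    for prefix, lang in COMMENT_PREFIXES.items():
        if first_line == f"{prefix} {MARKER}":
            return lang
    return None


# Hypothetical sample file standing in for a notebook opened in a text editor.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example_notebook.py"
    sample.write_text("# Databricks notebook source\nprint('hello')\n", encoding="utf-8")
    detected = header_language(sample)

print(detected)
```

A file whose first line does not match any prefix plus the marker yields `None`, which is a simple way to tell a plain script from a notebook source file.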