Questions and Answers

Question M9TBKjpsMuwBYBlFAOqc

Question

An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code:

df = spark.read.format("parquet").load(f"/mnt/source/{date}")

Which code block should be used to create the date Python variable used in the above code block?

Choices

  • A: date = spark.conf.get("date")
  • B: input_dict = input(); date = input_dict["date"]
  • C: import sys; date = sys.argv[1]
  • D: date = dbutils.notebooks.getParam("date")
  • E: dbutils.widgets.text("date", "null"); date = dbutils.widgets.get("date")
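
For reference, a minimal sketch of how the scheduled notebook could read such a job parameter via notebook widgets (the parameter name "date" and the /mnt/source path come from the question; the widget default is only used for interactive runs without a parameter):

    # Read the "date" parameter passed by the Databricks Jobs API;
    # the default ("null") applies only when no parameter is supplied.
    dbutils.widgets.text("date", "null")
    date = dbutils.widgets.get("date")

    # Load the batch of Parquet data for that date.
    df = spark.read.format("parquet").load(f"/mnt/source/{date}")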

Question sk2J5xKeNzrykM27zMEC

Question

A Delta table of weather records is partitioned by date and has the below schema:

date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT

To find all the records from within the Arctic Circle, you execute a query with the below filter:

latitude > 66.3

Which statement describes how the Delta engine identifies which files to load?

Choices

  • A: All records are cached to an operational database and then the filter is applied
  • B: The Parquet file footers are scanned for min and max statistics for the latitude column
  • C: All records are cached to attached storage and then the filter is applied
  • D: The Delta log is scanned for min and max statistics for the latitude column
  • E: The Hive metastore is scanned for min and max statistics for the latitude column
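
As an illustration, a minimal sketch of such a query (the table name weather is an assumption; Delta Lake keeps per-file min/max column statistics in its transaction log, which lets the engine skip files whose latitude range cannot satisfy the filter):

    # Filter on a non-partition column; data skipping relies on the
    # file-level statistics recorded in the Delta transaction log.
    arctic_df = spark.read.table("weather").filter("latitude > 66.3")

    # The scan metrics in the Spark UI report files read vs. files pruned.
    arctic_df.explain()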

Question 7wcwoe7HFa64NKQXGWGy

Question

The data engineering team has been tasked with configuring connections to an external database that does not have a supported native connector with Databricks. The external database already has data security configured by group membership. These groups map directly to user groups already created in Databricks that represent various teams within the company.

A new login credential has been created for each group in the external database. The Databricks Utilities Secrets module will be used to make these credentials available to Databricks users.

Assuming that all the credentials are configured correctly on the external database and group membership is properly configured on Databricks, which statement describes how teams can be granted the minimum necessary access to use these credentials?

Choices

  • A: “Manage” permissions should be set on a secret key mapped to those credentials that will be used by a given team.
  • B: “Read” permissions should be set on a secret key mapped to those credentials that will be used by a given team.
  • C: “Read” permissions should be set on a secret scope containing only those credentials that will be used by a given team.
  • D: “Manage” permissions should be set on a secret scope containing only those credentials that will be used by a given team.
  • E: No additional configuration is necessary as long as all users are configured as administrators in the workspace where secrets have been added.
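
For context, a minimal sketch of how a team member would consume such a credential from a notebook once their group has been granted access to the team's secret scope (the scope, key, and JDBC details below are placeholders):

    # Read the team's credential from its secret scope.
    user = dbutils.secrets.get(scope="team-a-db-creds", key="username")
    password = dbutils.secrets.get(scope="team-a-db-creds", key="password")

    # Use the credential for a JDBC read against the external database.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://external-db:5432/sales")  # placeholder URL
          .option("dbtable", "transactions")                          # placeholder table
          .option("user", user)
          .option("password", password)
          .load())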

Question VeABWHbIxbNNJR0z2ajA

Question

Which indicators would you look for in the Spark UI’s Storage tab to signal that a cached table is not performing optimally? Assume you are using Spark’s MEMORY_ONLY storage level.

Choices

  • A: Size on Disk is < Size in Memory
  • B: The RDD Block Name includes the “*” annotation signaling a failure to cache
  • C: Size on Disk is > 0
  • D: The number of Cached Partitions > the number of Spark Partitions
  • E: On Heap Memory Usage is within 75% of Off Heap Memory Usage
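
For reference, a minimal sketch of caching a table with MEMORY_ONLY before inspecting the Storage tab (the table name is a placeholder; with MEMORY_ONLY there is no disk fallback, so partitions that do not fit in memory are simply not cached):

    from pyspark import StorageLevel

    # Cache with MEMORY_ONLY (no spill to disk) and materialize the cache.
    df = spark.read.table("weather")      # placeholder table name
    df.persist(StorageLevel.MEMORY_ONLY)
    df.count()                            # an action forces the cache to populate

    # The Storage tab then shows Fraction Cached, Size in Memory, and Size on Disk.
    print(df.storageLevel)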

Question CNRCo07OuYSxwtCFTb2l

Question

What is the first line of a Databricks Python notebook when viewed in a text editor?

Choices

  • A: %python
  • B: // Databricks notebook source
  • C: # Databricks notebook source
  • D: -- Databricks notebook source
  • E: # MAGIC %python
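
For illustration, a Python notebook exported in source format starts roughly like this (the cell contents below are placeholders; cells are separated by COMMAND markers and non-Python cells are escaped with MAGIC comments):

    # Databricks notebook source
    print("first cell")   # placeholder cell contents

    # COMMAND ----------

    # MAGIC %sql
    # MAGIC SELECT 1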