Questions and Answers
Question M9TBKjpsMuwBYBlFAOqc
Question
An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code: df = spark.read.format("parquet").load(f"/mnt/source/{date}") Which code block should be used to create the date Python variable used in the above code block?
Choices
- A: date = spark.conf.get("date")
- B: input_dict = input() date = input_dict["date"]
- C: import sys date = sys.argv[1]
- D: date = dbutils.notebooks.getParam("date")
- E: dbutils.widgets.text("date", "null") date = dbutils.widgets.get("date")
Answer: E | Answer_ET: E | Community answer: E (96%)
Discussion
Comment 1173064 by hal2401me
- Upvotes: 10
Selected Answer: E dbutils.widgets. Just passed the exam with a score >80%. ExamTopics covers about 90% of the questions; there were 5 questions I didn't see here. But friends, you need to read the discussions and test yourself: many answers given here, even the most-voted ones, no longer exist in the exam - not the question, but the answer. Wish you all good luck, friends!
Comment 1410425 by ultimomassimo
- Upvotes: 1
Selected Answer: E E 100% There is no such thing as dbutils.notebooks.getParam, so no idea why some halfwits suggest D…
Comment 1361428 by shoaibmohammed3
- Upvotes: 1
Selected Answer: E dbutils.widgets is where all the params are stored; use .get to fetch them.
Comment 1358833 by johnserafim
- Upvotes: 1
Selected Answer: D D is correct!
The question states that the upstream system passes the date as a parameter to the Databricks Jobs API. In Databricks, when a parameter is passed to a notebook via the Jobs API, it can be retrieved using the dbutils.notebooks.getParam() method.
Option D directly retrieves the parameter value using this method, which is the correct approach for this scenario.
Comment 1351567 by EelkeV
- Upvotes: 1
Selected Answer: E It is the way to fill in a parameter in a notebook
Comment 1335691 by HairyTorso
- Upvotes: 2
Selected Answer: E Around question #130 they just repeat themselves. So it’s not 226 but around 130… Shame
Comment 1290627 by akashdesarda
- Upvotes: 1
Selected Answer: E The Jobs API allows sending parameters via job parameters. These must match the notebook's parameter names, and they can then be read using dbutils.widgets.get.
Comment 1265036 by HorskKos
- Upvotes: 4
E is correct because: A gets a configuration value from the Spark session; B reads a value from manual input, which is not relevant for a job run; C (sys.argv) gets the arguments used to run a Python script from the command line, which is unrelated; D is a function I couldn't find anywhere on the web, so I assume it doesn't exist. Therefore E is correct, though passing a date as a string parameter is bad practice; it's better to derive it with the datetime library and then use it in the code.
Comment 1247581 by Shailly
- Upvotes: 1
Answer is E. Even though the value is passed from an upstream system, you can create parameters using widgets inside the notebook and use the value as an input from the Databricks Jobs API.
Comment 1226795 by Isio05
- Upvotes: 1
Selected Answer: E Widgets are used to create parameters in notebook that can be then utilized by e.g. jobs
Comment 1222432 by imatheushenrique
- Upvotes: 1
E. dbutils.widgets.text("date", "null") date = dbutils.widgets.get("date")
Comment 1198680 by AziLa
- Upvotes: 1
correct ans is E
Comment 1195398 by Sosicha
- Upvotes: 1
Are you reading the question? It asks about an upstream system that has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. Upstream systems usually don't use widgets; widgets are made for humans. Only C and D are correct, but D is better, so D.
Comment 1159633 by hal2401me
- Upvotes: 1
Selected Answer: E Vote for E, dbutils.widgets.
Comment 1128049 by AziLa
- Upvotes: 1
Correct Ans is E
Comment 1121585 by Jay_98_11
- Upvotes: 2
Selected Answer: E E is correct
Comment 1113402 by RafaelCFC
- Upvotes: 2
Selected Answer: E In https://docs.databricks.com/en/notebooks/notebook-workflows.html#dbutilsnotebook-api the "run" example is an equivalent use case to E.
Comment 1102660 by kz_data
- Upvotes: 2
Selected Answer: E E is correct
Comment 1027115 by chokthewa
- Upvotes: 1
I think D is correct answer, refer to https://docs.databricks.com/en/notebooks/notebook-workflows.html#dbutilsnotebook-api
Comment 981132 by BrianNguyen95
- Upvotes: 3
E is correct answer
Comment 977584 by lokvamsi
- Upvotes: 1
Selected Answer: E Correct. Ans: E
Comment 969962 by Happy_Prince
- Upvotes: 2
Correct
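The consensus answer (E) can be sketched end to end. Below is a minimal, hedged example of how the notebook would read a job parameter named date; the mount path comes from the question, while the default value and sample date are illustrative only.

```python
# Minimal sketch of answer E: declare a "date" widget and read it.
# When the Jobs API passes a notebook parameter named "date", dbutils.widgets.get
# returns that value; the default ("null") is only used for interactive runs.
dbutils.widgets.text("date", "null")
date = dbutils.widgets.get("date")   # e.g. "2024-03-15" when supplied by the job

df = spark.read.format("parquet").load(f"/mnt/source/{date}")
```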
Question sk2J5xKeNzrykM27zMEC
Question
A Delta table of weather records is partitioned by date and has the below schema: date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT To find all the records from within the Arctic Circle, you execute a query with the below filter: latitude > 66.3 Which statement describes how the Delta engine identifies which files to load?
Choices
- A: All records are cached to an operational database and then the filter is applied
- B: The Parquet file footers are scanned for min and max statistics for the latitude column
- C: All records are cached to attached storage and then the filter is applied
- D: The Delta log is scanned for min and max statistics for the latitude column
- E: The Hive metastore is scanned for min and max statistics for the latitude column
Answer: D | Answer_ET: D | Community answer: D (90%)
Discussion
Comment 988204 by taif12340
- Upvotes: 22
Answer D:
In the Transaction log, Delta Lake captures statistics for each data file of the table. These statistics indicate per file:
- Total number of records
- Minimum value in each column of the first 32 columns of the table
- Maximum value in each column of the first 32 columns of the table
- Null value counts for each column of the first 32 columns of the table
When a query with a selective filter is executed against the table, the query optimizer uses these statistics to generate the query result. It leverages them to identify data files that may contain records matching the conditional filter. For the query in the question, the transaction log is scanned for min and max statistics for the latitude column.
Comment 1365581 by johnserafim
- Upvotes: 2
Selected Answer: B B is correct!
Delta Lake stores min/max statistics for each column in the Parquet file footers. The engine scans these footers to determine if a file contains any data that satisfies the latitude > 66.3 condition. If the minimum latitude in a file is greater than 66.3, the file is loaded. If the maximum latitude is less than or equal to 66.3, the file is skipped.
Comment 1290673 by akashdesarda
- Upvotes: 3
Selected Answer: D The points above are correct. If this were just a Parquet table, the Parquet file footers would be used. But since this is a Delta table, the Delta log is used to scan for and skip files, using the stats written in the transaction log.
Comment 1268752 by AndreFR
- Upvotes: 2
Answer D :
Delta data skipping automatically collects the stats (min, max, etc.) for the first 32 columns for each underlying Parquet file when you write data into a Delta table. Databricks takes advantage of this information (minimum and maximum values) at query time to skip unnecessary files in order to speed up the queries.
https://www.databricks.com/discover/pages/optimize-data-workloads-guide#delta-data
Comment 1267459 by saravanan289
- Upvotes: 2
Selected Answer: D Delta table stores file statistics in transaction log
Comment 1237632 by 03355a2
- Upvotes: 2
Selected Answer: D No explanation needed, this is where the information is stored.
Comment 1224441 by imatheushenrique
- Upvotes: 1
D. The Delta log is scanned for min and max statistics for the latitude column
Comment 1213855 by coercion
- Upvotes: 1
Selected Answer: D Delta log collects statistics like min value, max value, no of records, no of files for each transaction that happens on the table for the first 32 columns (default value)
Comment 1204721 by Tayari
- Upvotes: 1
Selected Answer: D D is the answer
Comment 1183436 by arik90
- Upvotes: 1
Selected Answer: D Based on the docs it's D; I don't know why B is showing here.
Comment 1172271 by alexvno
- Upvotes: 1
Selected Answer: D Delta log first
Comment 1170375 by DavidRou
- Upvotes: 1
Selected Answer: D Statistics on first 32 columns of a table are computed and written in the Delta Log by default.
Comment 1162470 by vikram12apr
- Upvotes: 1
Selected Answer: D D is the right answer
Comment 1161000 by Curious76
- Upvotes: 1
Selected Answer: D D is the answer
Comment 1152224 by kkravets
- Upvotes: 1
Selected Answer: D D is correct one
Comment 1145974 by RiktRikt007
- Upvotes: 2
I checked the delta log, and it does store stats: "stats":"{"numRecords":1,"minValues":{"id":1,"name":"one","age":11},"maxValues":{"id":1,"name":"one","age":11},"nullCount":{"id":0,"name":0,"age":0}}"
Comment 1128075 by AziLa
- Upvotes: 1
correct ans is D
Comment 1121654 by Jay_98_11
- Upvotes: 1
Selected Answer: D D for sure
Comment 1118554 by kz_data
- Upvotes: 1
Selected Answer: D I think the correct answer is D
Comment 1116057 by ranith
- Upvotes: 1
_delta_log contains the max and min of each column for the first 30 odd columns in a table for each partition. Also there is nothing called parquet file footers. Correct answer is D.
Comment 1102119 by chowchowchow
- Upvotes: 1
ChatGPT votes B as well
Comment 1086046 by hamzaKhribi
- Upvotes: 1
For me D is correct, as statistics for the first 32 columns are collected in the delta log
Comment 1060780 by BIKRAM063
- Upvotes: 1
D is correct, Transaction log will be scanned
Comment 1043874 by jms309
- Upvotes: 1
Selected Answer: D D is the correct answer
Comment 991545 by Eertyy
- Upvotes: 3
D is correct answer
Comment 970495 by tusharl
- Upvotes: 3
D is correct Answer
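The per-file statistics the voters refer to can be inspected directly. This is a minimal sketch, assuming a hypothetical table location /mnt/weather; the "add" actions in the _delta_log JSON files carry a stats string with minValues/maxValues that the engine uses to decide which files to load.

```python
# Minimal sketch: look at the min/max statistics Delta records per data file.
# /mnt/weather is a hypothetical path; point it at any Delta table you own.
import json

# Each line in a _delta_log commit file is an action; "add" actions describe data files.
actions = spark.read.json("/mnt/weather/_delta_log/*.json")
adds = actions.where("add IS NOT NULL").select("add.path", "add.stats").collect()

for row in adds:
    if row["stats"] is None:
        continue  # stats may be absent if they were never collected for this file
    stats = json.loads(row["stats"])
    lat_min = stats["minValues"]["latitude"]
    lat_max = stats["maxValues"]["latitude"]
    # For the filter latitude > 66.3, a file whose max latitude is <= 66.3 can be
    # skipped without ever opening its Parquet footer.
    print(row["path"], lat_min, lat_max)
```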
Question 7wcwoe7HFa64NKQXGWGy
Question
The data engineering team has been tasked with configuring connections to an external database that does not have a supported native connector with Databricks. The external database already has data security configured by group membership. These groups map directly to user groups already created in Databricks that represent various teams within the company.
A new login credential has been created for each group in the external database. The Databricks Utilities Secrets module will be used to make these credentials available to Databricks users.
Assuming that all the credentials are configured correctly on the external database and group membership is properly configured on Databricks, which statement describes how teams can be granted the minimum necessary access to use these credentials?
Choices
- A: “Manage” permissions should be set on a secret key mapped to those credentials that will be used by a given team.
- B: “Read” permissions should be set on a secret key mapped to those credentials that will be used by a given team.
- C: “Read” permissions should be set on a secret scope containing only those credentials that will be used by a given team.
- D: “Manage” permissions should be set on a secret scope containing only those credentials that will be used by a given team.
- E: No additional configuration is necessary as long as all users are configured as administrators in the workspace where secrets have been added.
Answer: C | Answer_ET: C | Community answer: C (100%)
Discussion
Comment 1141644 by vctrhugo
- Upvotes: 4
Selected Answer: C In Databricks, secret scopes are used to manage and organize secrets. By setting “Read” permissions on a secret scope containing the credentials, you allow the team to access the necessary credentials without granting unnecessary privileges. This approach ensures that the teams have the minimum necessary access to the credentials required for connecting to the external database. “Manage” permissions would provide more access than needed for just using the credentials.
Options A and B suggest setting permissions on individual secret keys, which might work, but using a secret scope for organizational purposes is a cleaner and more scalable solution.
Comment 1136731 by Somesh512
- Upvotes: 2
Selected Answer: C Access is at scope level and not key level
Comment 1086019 by petrv
- Upvotes: 1
Selected Answer: C In summary, while technically feasible, setting “Read” permissions on a secret key might not be the most efficient or scalable solution when dealing with multiple teams and their corresponding credentials. Using secret scopes provides a more organized and maintainable approach for managing secrets in Databricks.
Comment 1080934 by Enduresoul
- Upvotes: 3
Selected Answer: C Answer C is correct: https://docs.databricks.com/en/security/auth-authz/access-control/secret-acl.html#secret-access-control “Access control for secrets is managed at the secret scope level”
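As a concrete illustration of answer C, each team could get its own scope with READ permission. The sketch below is hypothetical: the scope, group, key names, and JDBC connection details are invented for the example, and it assumes the appropriate JDBC driver is installed on the cluster.

```python
# Minimal sketch of answer C: grant READ on a per-team scope, then read the
# credentials inside a notebook. Scope, group, key, and JDBC details are hypothetical.
#
# ACLs are set outside the notebook, e.g. with the (legacy) Databricks CLI:
#   databricks secrets put-acl --scope finance-team-scope --principal finance-team --permission READ

user = dbutils.secrets.get(scope="finance-team-scope", key="external-db-user")
pwd = dbutils.secrets.get(scope="finance-team-scope", key="external-db-password")

# Secret values are redacted in notebook output but can be used in a JDBC read:
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://external-db.example.com:5432/sales")
      .option("dbtable", "transactions")
      .option("user", user)
      .option("password", pwd)
      .load())
```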
Question VeABWHbIxbNNJR0z2ajA
Question
Which indicators would you look for in the Spark UI’s Storage tab to signal that a cached table is not performing optimally? Assume you are using Spark’s MEMORY_ONLY storage level.
Choices
- A: Size on Disk is < Size in Memory
- B: The RDD Block Name includes the “*” annotation signaling a failure to cache
- C: Size on Disk is > 0
- D: The number of Cached Partitions > the number of Spark Partitions
- E: On Heap Memory Usage is within 75% of Off Heap Memory Usage
Answer: C | Answer_ET: C | Community answer: C (100%)
Discussion
Comment 1141643 by vctrhugo
- Upvotes: 7
Selected Answer: C C. Size on Disk is > 0
When using Spark’s MEMORY_ONLY storage level, the ideal scenario is that the data is fully cached in memory, and the Size on Disk should be 0 (indicating that the data is not spilled to disk). If the Size on Disk is greater than 0, it suggests that some data has been spilled to disk, which can lead to degraded performance as reading from disk is slower than reading from memory.
Comment 1323543 by benni_ale
- Upvotes: 1
Selected Answer: C I think is C
Comment 1226802 by Isio05
- Upvotes: 2
Selected Answer: C In this case any data on disk means that cache is not performing optimally
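A quick way to reproduce what the Storage tab shows is to cache something with MEMORY_ONLY and materialize it; the DataFrame below is synthetic and only for illustration.

```python
# Minimal sketch: cache with MEMORY_ONLY, materialize it, then check the
# Spark UI's Storage tab. With MEMORY_ONLY, "Size on Disk" should stay 0;
# anything greater than 0 signals the cache is not behaving as intended.
from pyspark import StorageLevel

df = spark.range(1_000_000)           # synthetic data, purely for illustration
df.persist(StorageLevel.MEMORY_ONLY)
df.count()                            # an action is needed to populate the cache

print(df.storageLevel)                # confirms disk is not part of the storage level
```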
Question CNRCo07OuYSxwtCFTb2l
Question
What is the first line of a Databricks Python notebook when viewed in a text editor?
Choices
- A: %python
- B: // Databricks notebook source
- C: # Databricks notebook source
- D: -- Databricks notebook source
- E: # MAGIC %python
Answer: C | Answer_ET: C | Community answer: C (100%)
Discussion
Comment 1119503 by bacckom
- Upvotes: 8
Selected Answer: C Python: # Databricks notebook source; SQL: -- Databricks notebook source; Scala: // Databricks notebook source; R: # Databricks notebook source
Comment 1237014 by Ati1362
- Upvotes: 2
Selected Answer: C C is the answer
Comment 1111556 by divingbell17
- Upvotes: 2
Selected Answer: C https://docs.databricks.com/en/notebooks/notebook-export-import.html#import-a-file-and-convert-it-to-a-notebook
Comment 1076842 by aragorn_brego
- Upvotes: 2
Selected Answer: C This is the correct line that you would find at the top of a Databricks notebook when viewed in a text editor, especially for Python notebooks. The # symbol is used for comments in Python, and the comment # Databricks notebook source is used by Databricks to indicate the start of the notebook’s source code in the plain text file.
These lines are comments in the respective languages (Scala uses // and SQL uses -- for single-line comments) and indicate the beginning of the Databricks notebook content in the text file.
Comment 1075100 by AWSMaster69
- Upvotes: 1
Selected Answer: C The Answer is C, Just downloaded a notebook from Databricks and viewed it in a text editor.
Comment 1071617 by 60ties
- Upvotes: 1
Selected Answer: C Answer is C
Comment 1071615 by 60ties
- Upvotes: 1
// Databricks notebook source - Scala
# Databricks notebook source - Python
-- Databricks notebook source - SQL
Answer is C
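To see the marker in context, here is a small sketch of what an exported Python notebook source file looks like; the cell contents are invented, while the "# Databricks notebook source" header, "# COMMAND ----------" separators, and "# MAGIC" prefixes are the standard markers.

```python
# Minimal sketch: the text of an exported Databricks Python notebook (.py source file).
# The first line is exactly the marker from answer C; cells are separated by
# "# COMMAND ----------" and non-Python cells are wrapped with "# MAGIC".
exported = """\
# Databricks notebook source
print("first cell")

# COMMAND ----------

# MAGIC %sql
# MAGIC SELECT 1
"""

print(exported.splitlines()[0])   # -> # Databricks notebook source
```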