Questions and Answers
Question xaEQtKPQWV8nDENSCbXq
Question
A junior developer complains that the code in their notebook isn’t producing the correct results in the development environment. A shared screenshot reveals that while they’re using a notebook versioned with Databricks Repos, they’re using a personal branch that contains old logic. The desired branch named dev-2.3.9 is not available from the branch selection dropdown. Which approach will allow this developer to review the current logic for this notebook?
Choices
- A: Use Repos to make a pull request and use the Databricks REST API to update the current branch to dev-2.3.9
- B: Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.
- C: Use Repos to checkout the dev-2.3.9 branch and auto-resolve conflicts with the current branch
- D: Merge all changes back to the main branch in the remote Git repository and clone the repo again
- E: Use Repos to merge the current branch and the dev-2.3.9 branch, then make a pull request to sync with the remote repository
Answer: B (Answer_ET: B). Community answer: B (100%).
Discussion
Comment 1506718 by codebender
- Upvotes: 1
Selected Answer: B The first step is to pull the latest commits from the remote.
Comment 1294569 by benni_ale
- Upvotes: 1
Selected Answer: B I would also say B, but could anyone explain how to pick that branch if it is not available from the dropdown?
Comment 1286663 by benni_ale
- Upvotes: 1
Selected Answer: B I would say B
Comment 1224436 by imatheushenrique
- Upvotes: 2
B. Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.
Comment 1128060 by AziLa
- Upvotes: 1
The correct answer is B.
Comment 1121592 by Jay_98_11
- Upvotes: 2
Selected Answer: B I vote for B as well.
Comment 1044666 by sturcu
- Upvotes: 1
Selected Answer: B B is correct
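For reference, the pull-and-checkout that answer B performs through the Repos UI can also be done programmatically with the Databricks Repos REST API (PATCH /api/2.0/repos/{repo_id}). A minimal sketch in Python, assuming you supply your own workspace URL, personal access token, and repo ID:

```python
import requests

# Placeholders you must fill in for your own workspace.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
REPO_ID = "<repo-id>"  # discoverable via GET /api/2.0/repos

# Check out dev-2.3.9 and update the repo to its latest remote commit.
resp = requests.patch(
    f"{WORKSPACE_URL}/api/2.0/repos/{REPO_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"branch": "dev-2.3.9"},
)
resp.raise_for_status()
print(resp.json())  # includes the repo's current branch and head commit
```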
Question 20N1c7BqnikzdlqTkbxP
Question
A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executor. When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?
Choices
- A: The Five Minute Load Average remains consistent/flat
- B: Bytes Received never exceeds 80 million bytes per second
- C: Total Disk Space remains constant
- D: Network I/O never spikes
- E: Overall cluster CPU utilization is around 25%
Answer: E (Answer_ET: E). Community answer: E (48%), D (33%), A (19%).
Discussion
Comment 991509 by BrianNguyen95
- Upvotes: 19
Option E: In a Spark cluster, the driver node is responsible for managing the execution of the Spark application, including scheduling tasks, managing the execution plan, and interacting with the cluster manager. If the overall cluster CPU utilization is low (e.g., around 25%), it may indicate that the driver node is not utilizing the available resources effectively and might be a bottleneck.
Comment 1364294 by Tedet
- Upvotes: 2
Selected Answer: A When you see the “Five Minute Load Average” remain consistent or flat, it could indicate that the driver is under heavy load and is struggling to keep up with the workload. In the case of a Spark cluster, if the driver is handling too much work, it can become a bottleneck and prevent the overall job from progressing efficiently.
Comment 1332442 by srinivasa
- Upvotes: 3
Selected Answer: A Consistent/Flat Five Minute Load Average: If the load average on the driver node remains consistent and does not fluctuate, it suggests that the driver is under constant, significant load. This could be a sign that the driver is performing a lot of work, potentially leading to a bottleneck.
Comment 1326817 by AlejandroU
- Upvotes: 2
Selected Answer: E Answer E. A low CPU usage could indicate that the driver isn’t working as efficiently as expected, which can lead to underutilization of the cluster and slower processing times.
Comment 1318289 by JB90
- Upvotes: 1
Selected Answer: E Only when the driver does all or most of the work will the overall cluster CPU utilization be this low, since the driver's CPU is 25% of the cluster's total CPU.
Comment 1303250 by nedlo
- Upvotes: 2
Selected Answer: E A bottleneck here, much like data skew, means one node is doing the majority of the work while the others sit idle, so E is correct.
Comment 1299700 by m79590530
- Upvotes: 1
Selected Answer: E D would mean the driver never sends large chunks of data to the worker nodes, but since Network I/O is not said to be zero, there can still be a constant flow of data between the driver and the workers, so it is not a measure of a driver bottleneck. Answer E, however, means one of the 4 cluster nodes is always working at 100%, which can only be the driver node, as it is always coordinating work across the executors.
Comment 1270130 by fe3b2fc
- Upvotes: 2
Selected Answer: D Executors talk to each other and across nodes; if the code/driver were working as intended, you would see spikes in network I/O while transferring data. If the code/driver were the issue, you would see a spike in CPU usage and little network traffic between nodes. The correct answer is D.
Comment 1227690 by lophonos
- Upvotes: 1
Selected Answer: E E is correct
Comment 1143155 by guillesd
- Upvotes: 1
Selected Answer: D If there is no I/O between the driver and executor nodes, then the executor nodes are not working.
Comment 1108701 by Patito
- Upvotes: 2
Selected Answer: D D seems to be right
Comment 1091957 by rok21
- Upvotes: 1
Selected Answer: E E is correct
Comment 1091782 by azurelearn2020
- Upvotes: 2
Selected Answer: E 25% indicates the cluster CPU is under-utilized.
Comment 1052869 by sturcu
- Upvotes: 3
Selected Answer: E If the overall cluster CPU utilization is around 25%, it means that only one out of the four nodes (driver + 3 executors) is using its full CPU capacity, while the other three nodes are idle or underutilized
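To make the symptom in answer E concrete, here is a minimal PySpark sketch (table and column names are hypothetical) contrasting a driver-bound anti-pattern with its distributed equivalent. On a 4-node cluster (driver + 3 executors of the same VM type), the first version keeps only the driver busy, so overall cluster CPU utilization hovers near 25%:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("sensor_readings")  # hypothetical table

# Driver-bound anti-pattern: every row is collected to the driver and
# aggregated in a single-threaded Python loop, leaving the three
# executors idle while the driver's CPU is pegged.
rows = df.collect()
total = sum(r["value"] for r in rows)

# Distributed equivalent: the aggregation runs on the executors, and
# cluster-wide CPU utilization rises accordingly.
total = df.agg(F.sum("value")).first()[0]
```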
Question rruD0fJCk76pvMu50jx7
Question
Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate push-down?
Choices
- A: In the Executor’s log file, by grepping for “predicate push-down”
- B: In the Stage’s Detail screen, in the Completed Stages table, by noting the size of data read from the Input column
- C: In the Storage Detail screen, by noting which RDDs are not stored on disk
- D: In the Delta Lake transaction log, by noting the column statistics
- E: In the Query Detail screen, by interpreting the Physical Plan
Answer: E (Answer_ET: E). Community answer: E (83%), B (17%).
Discussion
Comment 1364299 by Tedet
- Upvotes: 1
Selected Answer: E Predicate push-down is an optimization where conditions (such as filters) are pushed as close to the data source as possible (often to the database or file system level), reducing the amount of data read and processed. If predicate push-down isn’t being leveraged, it can result in reading unnecessary data, leading to performance degradation. Execute a query ⇒ Click View and go to Spark UI ⇒ Navigate to SQL/DataFrame tab in SparkUI ⇒ Click on any stage ⇒ Navigate to details to find Physical Plan
Comment 1353918 by shaswat1404
- Upvotes: 1
Selected Answer: B When predicate push-down is working properly, the amount of data read should be much lower, because the data source can filter out rows at read time based on the query predicates. If predicate push-down is not leveraged, stages may read a much larger volume of data than necessary, which can be observed in the Input column of the Stage's Detail screen; therefore B is the correct option. Not A: executor logs might contain some information, but they are not the most direct way to assess predicate push-down. Not C: the Storage Detail screen is for checking RDD caching and persistence, not predicate push-down. Not D: the transaction log holds metadata and statistics but is not viewed via the Spark UI for diagnosing query performance. Not E: while the physical plan in the Query Detail screen might show filter push-down, interpreting it requires more expertise, and the input data size metric (option B) is a more straightforward indicator.
Comment 1306886 by benni_ale
- Upvotes: 1
Selected Answer: E E
Comment 1293856 by dd1192d
- Upvotes: 2
Selected Answer: E E is correct : https://docs.datastax.com/en/dse/6.9/spark/predicate-push-down.html
Comment 1143091 by P1314
- Upvotes: 1
Selected Answer: E Query plan. The correct answer is E.
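As a quick illustration of answer E: the physical plan shown in the Query Detail screen can also be printed with explain(). For a file source such as Parquet, a pushed predicate shows up under PushedFilters in the scan node; an empty PushedFilters list for a filtered scan is the tell-tale sign that push-down was lost. This is a minimal sketch, and the path and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/data/events")  # hypothetical path
filtered = df.filter(df["event_date"] == "2023-01-01")

# In a healthy plan the scan node lists the predicate, e.g.:
#   PushedFilters: [IsNotNull(event_date), EqualTo(event_date,2023-01-01)]
# If PushedFilters is empty (for example, because the filter wraps the
# column in a Python UDF), the whole input is read and filtered later.
filtered.explain(mode="formatted")
```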
Question OMDtthj31Uvaf0fRLYnX
Question
Review the following error traceback: //IMG//
Which statement describes the error being raised?
Choices
- A: The code executed was PySpark but was executed in a Scala notebook.
- B: There is no column in the table named heartrateheartrateheartrate
- C: There is a type error because a column object cannot be multiplied.
- D: There is a type error because a DataFrame object cannot be multiplied.
- E: There is a syntax error because the heartrate column is not correctly identified as a column.
Answer: B (Answer_ET: B). Community answer: B (75%), E (25%).
Discussion
Comment 1005470 by CertPeople
- Upvotes: 8
Selected Answer: B It's B; there is no column with that name.
Comment 1091960 by rok21
- Upvotes: 5
Selected Answer: E E is correct
Comment 1143185 by guillesd
- Upvotes: 2
Selected Answer: B It's B. Regarding E, a syntax error would mean the query is invalid due to an incorrectly written SQL statement. That is not the case here; the column simply does not exist.
Comment 1121989 by Jay_98_11
- Upvotes: 1
Selected Answer: B https://sparkbyexamples.com/spark/spark-cannot-resolve-given-input-columns/
Comment 1088715 by Gulenur_GS
- Upvotes: 2
The answer is E, because df.select(3*df['heartrate']).show() returns results without error.
Comment 1088603 by Gulenur
- Upvotes: 2
Answer is E: df.select(3*df['heartrate']) returns a correct result without error.
Comment 1066322 by npc0001
- Upvotes: 2
Selected Answer: B Answer B
Comment 1066034 by Dileepvikram
- Upvotes: 2
Answer is B
Comment 1044884 by sturcu
- Upvotes: 2
Selected Answer: B No such column found
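The distinction the top-voted answers draw can be reproduced directly: multiplying a column object is perfectly valid, while referencing a column that does not exist raises an AnalysisException ("cannot resolve ... given input columns"), which is answer B's scenario rather than a syntax or type error. A minimal sketch with hypothetical data:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(60,), (72,)], ["heartrate"])

# Multiplying a column object works fine, so C and E do not apply.
df.select(3 * df["heartrate"]).show()

# Selecting a column that is not in the schema raises an
# AnalysisException ("cannot resolve ... given input columns"), i.e. B.
try:
    df.select(3 * F.col("heartrateheartrateheartrate")).show()
except AnalysisException as e:
    print(e)
```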
Question 84GCbbeykbwleCQ4zbKb
Question
Which distribution does Databricks support for installing custom Python code packages?
Choices
- A: sbt
- B: CRAN
- C: npm
- D: Wheels
- E: jars
Answer: D (Answer_ET: D). Community answer: D (100%).
Discussion
Comment 1299558 by benni_ale
- Upvotes: 1
Selected Answer: D I think D is correct
Comment 1159649 by hal2401me
- Upvotes: 4
Selected Answer: D https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-python-wheels-in-workflows
Comment 1099944 by sodere
- Upvotes: 1
Selected Answer: D https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-python-wheels-in-workflows
Comment 1099478 by alexvno
- Upvotes: 2
Selected Answer: D Wheels should be ok
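In practice, a custom wheel can be attached to a cluster as a library or installed notebook-scoped with the %pip magic. A minimal notebook-cell sketch (the wheel path is a hypothetical placeholder):

```python
# Databricks notebook cell: install a custom wheel scoped to this
# notebook's Python environment (path is a hypothetical placeholder).
%pip install /dbfs/FileStore/wheels/my_package-0.1.0-py3-none-any.whl
```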