Questions and Answers
Question APNewqvzdgkPndRXuDoF
Question
When evaluating the Ganglia Metrics for a given cluster with 3 executor nodes, which indicator would signal proper utilization of the VM’s resources?
Choices
- A: The five Minute Load Average remains consistent/flat
- B: CPU Utilization is around 75%
- C: Network I/O never spikes
- D: Total Disk Space remains constant
answer?
Answer: B Answer_ET: B Discussion
Comment 1222410 by imatheushenrique
- Upvotes: 3
B. This level of CPU utilization indicates that the cluster is being used efficiently, without being over- or underutilized.
Question IzjBnSE4i7eJq7L832mS
Question
The data engineer is using Spark’s MEMORY_ONLY storage level.
Which indicators should the data engineer look for in the Spark UI’s Storage tab to signal that a cached table is not performing optimally?
Choices
- A: On Heap Memory Usage is within 75% of Off Heap Memory Usage
- B: The RDD Block Name includes the “*” annotation signaling a failure to cache
- C: Size on Disk is > 0
- D: The number of Cached Partitions > the number of Spark Partitions
answer?
Answer: C Answer_ET: C Community answer C (86%) 14% Discussion
Comment 1315727 by RuiCarvalhoDEV
- Upvotes: 1
Selected Answer: C The storage level is MEMORY_ONLY, so any Size on Disk indicates a problem.
Comment 1257426 by Hadiler
- Upvotes: 2
Selected Answer: C C is correct
Comment 1238184 by 03355a2
- Upvotes: 1
Selected Answer: C It’s simple, if MEMORY_ONLY is used, anything spilled to disk would indicate a problem.
Comment 1229350 by hpkr
- Upvotes: 2
Selected Answer: C C is correct here
Comment 1222919 by Freyr
- Upvotes: 1
Selected Answer: B Correct Answer: B. Option B is the most relevant choice for an indicator that a cached table is not performing optimally in a MEMORY_ONLY scenario. If an RDD block name includes the "*" annotation, it strongly suggests a caching issue, which would directly impact the performance and expected behavior of MEMORY_ONLY caching. This annotation points to a failure to cache the data entirely in memory, which is what MEMORY_ONLY is meant to do.
Option C could also be a relevant indicator in general caching scenarios (e.g., MEMORY_AND_DISK), but any size on disk contradicts the MEMORY_ONLY setting directly. Therefore, Option B is chosen based on the specific storage level described.
Comment 1222408 by imatheushenrique
- Upvotes: 1
B. This annotation says that some partitions of the cached data have been spilled to disk because there wasn’t enough memory to keep them.
Comment 1221178 by MDWPartners
- Upvotes: 2
I would say C
Question ILMDoLROtbBiqkZ11jiE
Question
Review the following error traceback:
//IMG//
Which statement describes the error being raised?
Choices
- A: There is a syntax error because the heartrate column is not correctly identified as a column.
- B: There is no column in the table named heartrateheartrateheartrate
- C: There is a type error because a column object cannot be multiplied.
- D: There is a type error because a DataFrame object cannot be multiplied.
answer?
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1300614 by m79590530
- Upvotes: 1
Selected Answer: B The final error clearly states that such a column name cannot be resolved against the source DataFrame's schema.
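The likely cause of such an error is plain Python string repetition rather than anything Spark-specific. A minimal sketch (the DataFrame `df` in the commented lines is hypothetical):

```python
# Multiplying a string by an integer repeats it, so an expression like
# df["heartrate" * 3] does not raise a type error -- it silently asks Spark
# for the nonexistent column "heartrateheartrateheartrate", which then fails
# at analysis time with a "column cannot be resolved" error.
name = "heartrate" * 3
print(name)  # heartrateheartrateheartrate

# The intended arithmetic should operate on the Column object instead, e.g.:
#   from pyspark.sql import functions as F
#   df.select((F.col("heartrate") * 3).alias("heartrate_x3"))
```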
Question 8gdvOFDAHUWPskUuL7GW
Question
What is a method of installing a Python package scoped at the notebook level to all nodes in the currently active cluster?
Choices
- A: Run source env/bin/activate in a notebook setup script
- B: Install libraries from PyPI using the cluster UI
- C: Use %pip install in a notebook cell
- D: Use %sh pip install in a notebook cell
answer?
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 1300623 by m79590530
- Upvotes: 1
Selected Answer: C C is correct: ‘%sh pip install …’ runs only on the driver node, and libraries installed from PyPI via the cluster UI are not scoped to a specific notebook but apply to all Spark sessions, in all notebooks, on all cluster nodes.
Comment 1222349 by imatheushenrique
- Upvotes: 1
C. It is only necessary to run %pip install some_library inside a notebook cell. Note: to upgrade a library to its latest version, run %pip install some_library -U.
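As the comments note, the notebook-scoped install is a single magic-command cell. A sketch of such a cell (the package name and version are illustrative):

```
%pip install beautifulsoup4==4.12.3
```

This installs the package on every node of the active cluster but keeps it scoped to the current notebook, unlike `%sh pip install`, which runs only on the driver.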
Question Fo1XqOAEzyJJi4Esz2pJ
Question
What is the first line of a Databricks Python notebook when viewed in a text editor?
Choices
- A: %python
- B: // Databricks notebook source
- C: # Databricks notebook source
- D: — Databricks notebook source
answer?
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 1273721 by minhhnh
- Upvotes: 2
Selected Answer: C The correct answer is:
C. # Databricks notebook source
This is the comment line that appears at the beginning of a Databricks Python notebook when viewed in a text editor.
Comment 1222348 by imatheushenrique
- Upvotes: 2
C. # Databricks notebook source The comment on the first line marks the file as a Databricks notebook source.
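The export format can be illustrated inline. A minimal sketch of what a two-cell Databricks Python notebook looks like when exported as a source file (cell contents are illustrative; `# COMMAND ----------` is the cell separator):

```python
# A Databricks Python notebook exported as source is plain Python whose first
# line is the marker comment "# Databricks notebook source".
notebook_source = """\
# Databricks notebook source
print("hello from cell 1")

# COMMAND ----------

print("hello from cell 2")
"""

first_line = notebook_source.splitlines()[0]
print(first_line)  # # Databricks notebook source
```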