Questions and Answers

Question APNewqvzdgkPndRXuDoF

Question

When evaluating the Ganglia Metrics for a given cluster with 3 executor nodes, which indicator would signal proper utilization of the VM’s resources?

Choices

A: The five Minute Load Average remains consistent/flat
B: CPU Utilization is around 75%
C: Network I/O never spikes
D: Total Disk Space remains constant

answer?

Answer: B

Discussion

Comment 1222410 by imatheushenrique

Upvotes: 3

B. CPU utilization of around 75% indicates that the cluster's resources are being put to work rather than sitting underutilized.

Question IzjBnSE4i7eJq7L832mS

Question

The data engineer is using Spark’s MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI’s Storage tab to signal that a cached table is not performing optimally?

Choices

A: On Heap Memory Usage is within 75% of Off Heap Memory Usage
B: The RDD Block Name includes the “*” annotation signaling a failure to cache
C: Size on Disk is > 0
D: The number of Cached Partitions > the number of Spark Partitions

answer?

Answer: C

Community answer: C (86%), other (14%)

Discussion

Comment 1315727 by RuiCarvalhoDEV

Upvotes: 1

Selected Answer: C is MEMORY_ONLY

Comment 1257426 by Hadiler

Upvotes: 2

Selected Answer: C

C is correct

Comment 1238184 by 03355a2

Upvotes: 1

Selected Answer: C

It’s simple: if MEMORY_ONLY is used, anything spilled to disk would indicate a problem.

Comment 1229350 by hpkr

Upvotes: 2

Selected Answer: C

C is correct here

Comment 1222919 by Freyr

Upvotes: 1

Selected Answer: B

Correct Answer: B. Option B is the most relevant indicator that a cached table is not performing optimally in a MEMORY_ONLY scenario. If an RDD block name includes the “*” annotation, it strongly suggests a caching problem, which would directly affect the performance and expected behavior of MEMORY_ONLY caching. It points to a failure to cache the data entirely in memory, which is what MEMORY_ONLY is meant to do.

Option C could be a relevant indicator in other caching scenarios (e.g., MEMORY_AND_DISK), but it directly contradicts the MEMORY_ONLY setting. Option B is therefore chosen based on the specific storage level described.

Comment 1222408 by imatheushenrique

Upvotes: 1

B. This annotation says that some partitions of the cached data have been spilled to disk because there wasn’t enough memory to keep them.

Comment 1221178 by MDWPartners

Upvotes: 2

I would say C
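The reasoning behind answer C can be sketched as a small check over Storage tab rows. This is only an illustration: the row dictionaries, their keys, and the function name `find_spilled_caches` are hypothetical stand-ins for the values a reader would see in the Spark UI, not part of any Spark API.

```python
# Hypothetical rows mirroring values read off the Spark UI Storage tab.
# Keys and sizes (in bytes) are illustrative assumptions, not a Spark API.

def find_spilled_caches(rows):
    """Return the names of cached tables whose Size on Disk is greater than 0."""
    return [row["name"] for row in rows if row["size_on_disk"] > 0]


storage_tab = [
    {"name": "events", "size_in_memory": 512_000_000, "size_on_disk": 0},
    {"name": "readings", "size_in_memory": 256_000_000, "size_on_disk": 64_000_000},
]

# Under the question's premise, any table listed here is the warning sign.
print(find_spilled_caches(storage_tab))
```

Under the question's premise, a table cached with MEMORY_ONLY is expected to report zero bytes on disk, so any name returned here would be the signal the question asks about.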

Question ILMDoLROtbBiqkZ11jiE

Question

Review the following error traceback:

//IMG//

Which statement describes the error being raised?

Choices

A: There is a syntax error because the heartrate column is not correctly identified as a column.
B: There is no column in the table named heartrateheartrateheartrate
C: There is a type error because a column object cannot be multiplied.
D: There is a type error because a DataFrame object cannot be multiplied.

answer?

Answer: B

Community answer: B (100%)

Discussion

Comment 1300614 by m79590530

Upvotes: 1

Selected Answer: B

The final error clearly states that no column with that name can be resolved in the source DataFrame's schema.
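The column name in option B is exactly what Python produces when an integer multiplies a plain string. The traceback image is not reproduced here, so the following is a sketch under the assumption that the failing expression multiplied the string "heartrate" by 3 instead of multiplying a column object; the variable names are illustrative.

```python
# Assumption: the failing code multiplied a plain string by an integer.
# In Python, int * str repeats the string; it does no arithmetic.
column_name = "heartrate"
repeated = 3 * column_name

print(repeated)  # heartrateheartrateheartrate

# A column lookup using `repeated` would search for a column literally
# named "heartrateheartrateheartrate". In PySpark, arithmetic on the values
# would need a column object instead, e.g. 3 * col("heartrate").
```

This would explain why the error is about an unresolved column name rather than a type error: the multiplication itself succeeds in Python and only the resulting name fails to resolve.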

Question 8gdvOFDAHUWPskUuL7GW

Question

What is a method of installing a Python package scoped at the notebook level to all nodes in the currently active cluster?

Choices

A: Run source env/bin/activate in a notebook setup script
B: Install libraries from PyPI using the cluster UI
C: Use %pip install in a notebook cell
D: Use %sh pip install in a notebook cell

answer?

Answer: C

Community answer: C (100%)

Discussion

Comment 1300623 by m79590530

Upvotes: 1

Selected Answer: C

C is correct: ‘%sh pip install …’ runs only on the driver node, and libraries installed from PyPI (or other sources) through the cluster UI are not scoped to a specific notebook; they are available to all Spark sessions in all notebooks on all cluster nodes.

Comment 1222349 by imatheushenrique

Upvotes: 1

C. It is only necessary to run %pip install some_library inside a notebook cell. Note: to upgrade a library to its latest version, run %pip install some_library -U

Question Fo1XqOAEzyJJi4Esz2pJ

Question

What is the first line of a Databricks Python notebook when viewed in a text editor?

Choices

A: %python
B: // Databricks notebook source
C: # Databricks notebook source
D: -- Databricks notebook source

answer?

Answer: C

Community answer: C (100%)

Discussion

Comment 1273721 by minhhnh

Upvotes: 2

Selected Answer: C

The correct answer is C: # Databricks notebook source

This is the comment line that appears at the beginning of a Databricks Python notebook when viewed in a text editor.

Comment 1222348 by imatheushenrique

Upvotes: 2

C. # Databricks notebook source

The comment on the first line marks the file as a Databricks notebook source.
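The answer choices differ only in the comment prefix, which follows the comment syntax of the notebook's default language. A minimal sketch of checking a file's first line is shown below; the function name `notebook_language` and the prefix table are illustrative assumptions, and other languages that also use `#` comments are ignored for simplicity.

```python
# Marker text that follows the comment prefix on the first line.
HEADER_SUFFIX = " Databricks notebook source"

# Simplified, assumed mapping from comment prefix to notebook language.
COMMENT_PREFIXES = {"#": "python", "//": "scala", "--": "sql"}


def notebook_language(first_line):
    """Return the language implied by a notebook header line, or None."""
    line = first_line.rstrip("\r\n")
    for prefix, language in COMMENT_PREFIXES.items():
        if line == prefix + HEADER_SUFFIX:
            return language
    return None


print(notebook_language("# Databricks notebook source\n"))
print(notebook_language("%python"))
```

A magic command such as %python (option A) marks the language of an individual cell, so it does not match the header check.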
