Questions and Answers
Question 4LLWPsJoV1fKhFEWcnpe
Question
When evaluating the Ganglia Metrics for a given cluster with 3 executor nodes, which indicator would signal proper utilization of the VM’s resources?
Choices
- A: The five Minute Load Average remains consistent/flat
- B: Bytes Received never exceeds 80 million bytes per second
- C: Network I/O never spikes
- D: Total Disk Space remains constant
- E: CPU Utilization is around 75%
Answer: E
Community answer: E (100%)
Discussion
Comment 1062817 by sturcu
- Upvotes: 7
Selected Answer: E I would look at max CPU utilization and max memory usage. Having 75% CPU usage signifies proper utilization of CPU resources.
Comment 1141671 by vctrhugo
- Upvotes: 2
Selected Answer: E Proper utilization of VM resources, especially in a distributed computing environment like Spark, often involves efficient usage of CPU resources. A CPU utilization around 75% indicates that the CPU is being utilized without being fully saturated, allowing room for additional processing without causing excessive contention.
Comment 1100376 by alexvno
- Upvotes: 1
Selected Answer: E 75% good
Comment 1076694 by aragorn_brego
- Upvotes: 3
Selected Answer: E An average CPU utilization around 75% is a good indicator of proper utilization of the VM’s resources in a distributed computing environment. It suggests that the CPUs are being actively used for computation without being maxed out, which could indicate a bottleneck. It leaves some headroom to handle additional load without causing excessive queuing or delays.
Question lahZOckwB9uht3pLJ1xF
Question
Which of the following technologies can be used to identify key areas of text when parsing Spark Driver log4j output?
Choices
- A: Regex
- B: Julia
- C: pyspark.ml.feature
- D: Scala Datasets
- E: C++
Answer: A
Community answer: A (89%), other (11%)
Discussion
Comment 1141668 by vctrhugo
- Upvotes: 3
Selected Answer: A It allows us to define patterns that match the structure of the log entries and capture relevant data.
Comment 1076699 by aragorn_brego
- Upvotes: 4
Selected Answer: A Regular expressions (regex) can be used to identify and extract patterns from text data, which makes them very useful for parsing log files like the Spark Driver’s log4j output. By defining specific regex patterns, you can search for error messages, timestamps, specific log levels, or any other text that follows a particular format within the log files.
Comment 1057429 by sturcu
- Upvotes: 3
Selected Answer: A Regex to extract text
Comment 1057124 by hm358
- Upvotes: 2
Selected Answer: A regex is for string identification
Comment 1056075 by mouad_attaqi
- Upvotes: 4
Selected Answer: A Using regex, we can identify key and value areas
Comment 1053521 by sturcu
- Upvotes: 1
Why C++, why not Python or Java? Plus there are tools for parsing log4j output, like Chainsaw and xmlstarlet.
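As the comments above note, regex patterns can pull log levels, timestamps, and messages out of driver logs. A minimal sketch with Python's `re` module, assuming a typical default log4j line format (the sample line and pattern below are illustrative, not taken from the question):

```python
import re

# Named groups capture the timestamp, level, logger name, and message
# from a log4j-style line such as "23/11/05 14:02:11 ERROR Source: msg".
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>INFO|WARN|ERROR) "
    r"(?P<source>\S+): (?P<msg>.*)$"
)

line = "23/11/05 14:02:11 ERROR TaskSchedulerImpl: Lost executor 2 on 10.0.0.5"
m = LOG_PATTERN.match(line)
if m:
    print(m.group("level"))   # ERROR
    print(m.group("source"))  # TaskSchedulerImpl
```

The same pattern can be applied line by line over a log file to filter for, say, only ERROR entries.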
Question 60cj5Jh2N1zHOhH7R4gy
Question
You are testing a collection of mathematical functions, one of which calculates the area under a curve as described by another function.
assert(myIntegrate(lambda x: x*x, 0, 3)[0] == 9)
Which kind of test would the above line exemplify?
Choices
- A: Unit
- B: Manual
- C: Functional
- D: Integration
- E: End-to-end
Answer: A
Community answer: A (75%), C (25%)
Discussion
Comment 1207354 by Nickff
- Upvotes: 3
Selected Answer: A Answer is A, unit test
Comment 1180359 by barnac1es
- Upvotes: 3
Selected Answer: C I think it should be Functional Test
Comment 1141667 by vctrhugo
- Upvotes: 3
Selected Answer: A A. Unit
Comment 1111480 by divingbell17
- Upvotes: 3
Selected Answer: A A is correct
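The assert in the question checks one function against one known input/output pair in isolation, which is the defining shape of a unit test. A sketch of what this could look like, assuming `myIntegrate` is a hypothetical routine returning a `(value, error)` tuple, which is why the question's assert indexes `[0]`:

```python
# Hypothetical stand-in for the myIntegrate under test: a midpoint-rule
# approximation returning (value, estimated_error).
def myIntegrate(f, a, b, n=100_000):
    h = (b - a) / n
    value = sum(f(a + (i + 0.5) * h) for i in range(n)) * h
    return value, h  # crude error proxy

def test_integrate_x_squared():
    # Unit test: one function, one known input, one expected output.
    # Integral of x^2 over [0, 3] is 27/3 = 9.
    value, _ = myIntegrate(lambda x: x * x, 0, 3)
    assert abs(value - 9) < 1e-3

test_integrate_x_squared()
```

A functional or integration test, by contrast, would exercise several components together or a user-visible behavior, not a single function in isolation.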
Question vMuyBdca92FBzYgQVb5D
Question
A Databricks job has been configured with 3 tasks, each of which is a Databricks notebook. Task A does not depend on other tasks. Tasks B and C run in parallel, with each having a serial dependency on Task A.
If task A fails during a scheduled run, which statement describes the results of this run?
Choices
- A: Because all tasks are managed as a dependency graph, no changes will be committed to the Lakehouse until all tasks have successfully been completed.
- B: Tasks B and C will attempt to run as configured; any changes made in task A will be rolled back due to task failure.
- C: Unless all tasks complete successfully, no changes will be committed to the Lakehouse; because task A failed, all commits will be rolled back automatically.
- D: Tasks B and C will be skipped; some logic expressed in task A may have been committed before task failure.
- E: Tasks B and C will be skipped; task A will not commit any changes because of stage failure.
Answer: D
Community answer: D (100%)
Discussion
Comment 1056076 by mouad_attaqi
- Upvotes: 6
Selected Answer: D D is correct, tasks B and C will definitely be skipped. Since Task A is a notebook, the ACID logic is at the cell level, so some logic may have been executed before the failing cell.
Comment 1076705 by aragorn_brego
- Upvotes: 4
Selected Answer: D In Databricks job execution, if a task that other tasks depend on fails, the dependent tasks will not be executed. Since Tasks B and C depend on the successful completion of Task A, they will be skipped if Task A fails. However, if Task A performs any operations that commit changes before the failure occurs (such as writing to a Delta table), those changes remain and are not automatically rolled back unless the logic within Task A specifically includes rollback mechanisms for partial failures.
Comment 1066315 by Dileepvikram
- Upvotes: 3
D is the answer
Comment 1053525 by sturcu
- Upvotes: 3
Selected Answer: D Some ops in task A may have finished before the failure
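The semantics the answer describes can be sketched as a tiny pure-Python job runner (all names here are hypothetical, for illustration only): a failed upstream task causes its dependents to be skipped, but side effects committed before the failure are not rolled back.

```python
# Minimal sketch of dependency-graph run semantics (answer D).
def run_job(tasks, deps):
    """tasks: {name: callable(effects)}; deps: {name: [upstream names]}.
    Assumes tasks are listed in topological order."""
    status, effects = {}, []
    for name in tasks:
        # Skip if any upstream task did not succeed.
        if any(status.get(up) != "success" for up in deps.get(name, [])):
            status[name] = "skipped"
            continue
        try:
            tasks[name](effects)
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status, effects

def task_a(effects):
    effects.append("A: wrote batch 1")  # committed before the failure
    raise RuntimeError("A fails mid-run")

status, effects = run_job(
    {"A": task_a, "B": lambda e: e.append("B ran"), "C": lambda e: e.append("C ran")},
    {"B": ["A"], "C": ["A"]},
)
# status -> A failed, B and C skipped; effects still contain A's earlier write.
```

In a real Databricks job the "effects" would be, for example, Delta table writes that completed in earlier notebook cells, which the platform does not undo when a later cell fails.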
Question yCkzkxWy84EEsEVpEYQv
Question
A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table. Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales. //IMG//
Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?
Choices
- A: Both commands will succeed. Executing show tables will show that countries_af and sales_af have been registered as views.
- B: Cmd 1 will succeed. Cmd 2 will search all accessible databases for a table or view named countries_af: if this entity exists, Cmd 2 will succeed.
- C: Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable representing a PySpark DataFrame.
- D: Both commands will fail. No new variables, tables, or views will be created.
- E: Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.
Answer: E
Community answer: E (91%), other (9%)
Discussion
Comment 1075925 by aragorn_brego
- Upvotes: 11
Selected Answer: E Cmd 1 is a PySpark command that collects the list of countries from the ‘geo_lookup’ table where the continent is Africa (‘AF’). This command will execute successfully, resulting in countries_af being a list of country names (strings) in Python’s local memory.
Cmd 2 is an SQL command intended to create a view named ‘sales_af’ from the ‘sales’ table, filtered by the cities in the countries_af list. However, this will fail because the countries_af variable exists in the Python environment and is not recognized in the SQL context. SQL does not have access to Python variables directly; they are two separate execution contexts within a Databricks notebook. There is no table or view named countries_af that SQL can reference; it is merely a Python list variable.
The other options are incorrect because they either assume cross-contextual operation between Python and SQL within a Databricks notebook (which is not possible in the way described in the commands), or they do not correctly interpret the outcome of running the commands.
Comment 1292755 by benni_ale
- Upvotes: 1
Selected Answer: E E, the collect method outputs strings, so the Python variable will be a list of strings, which cannot be referenced as a Spark table as in Cmd 2
Comment 1224440 by imatheushenrique
- Upvotes: 1
E. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.
Comment 1191739 by juliom6
- Upvotes: 3
Selected Answer: E E is correct.
%sql
create table geo_lookup (continent varchar(2), country varchar(15));
insert into geo_lookup (continent, country) values ('AF','Nigeria'), ('AF','Kenya');
create table sales (city varchar(15), continent varchar(2));
insert into sales (city, continent) values ('Nigeria','AF'), ('Kenya','AF');
%python
countries_af = [x[0] for x in spark.table('geo_lookup').filter("continent='AF'").select('country').collect()]
%sql
create view sales_af as select * from sales where city in countries_af and continent = "AF";
ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'in'.(line 4, pos 11)
i.e., countries_af is a Python list of strings and can't be used inside a SQL statement
Comment 1150223 by leopedroso1
- Upvotes: 1
By simulating this code in Databricks we can see an error being thrown in the SQL statement:
ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'.(line 1, pos 38)
SQL: SELECT * FROM backup.sales WHERE CITY IN countries_af AND CONTINENT = "AF"
Comment 1145943 by RiktRikt007
- Upvotes: 1
Selected Answer: B B describes the actual flow of Spark SQL, while E reflects the question's context. The question states the database contains no other tables, so Cmd 2's search for a table or view named countries_af will find nothing. B hedges with "if this entity exists, Cmd 2 will succeed", but since the question is really about language interoperability, most of us selected E.
Comment 1144403 by PrashantTiwari
- Upvotes: 2
E is correct
Comment 1121649 by Jay_98_11
- Upvotes: 2
Selected Answer: E vote for E
Comment 1118553 by kz_data
- Upvotes: 1
Selected Answer: E E is correct answer
Comment 1062069 by ismoshkov
- Upvotes: 1
Selected Answer: B https://docs.databricks.com/en/notebooks/notebooks-code.html#mix-languages Variables defined in one language (and hence in the REPL for that language) are not available in the REPL of another language
Comment 1040245 by sturcu
- Upvotes: 1
Selected Answer: E correct
Comment 1000703 by lucasasterio
- Upvotes: 2
Selected Answer: E correct
Comment 991538 by Eertyy
- Upvotes: 2
E is the right answer
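For completeness, the failure the commenters reproduce has standard workarounds: interpolate the list into the SQL string from Python, or register the intermediate result as a temp view so a later %sql cell can reference it. A sketch of the first approach (pure Python; the commented-out Spark calls assume a live session and use the question's table names):

```python
# What Cmd 1 leaves behind in the Python REPL: a plain list of strings.
countries_af = ["Nigeria", "Kenya"]

# Option 1: build the SQL IN clause from the Python list, then run it
# with spark.sql() from a %python cell.
in_clause = ", ".join(f"'{c}'" for c in countries_af)
query = (
    "CREATE OR REPLACE VIEW sales_af AS "
    f"SELECT * FROM sales WHERE city IN ({in_clause}) AND continent = 'AF'"
)
# spark.sql(query)

# Option 2: instead of collect()-ing in Cmd 1, register a temp view that
# a later %sql cell can join or filter against:
# spark.table("geo_lookup").filter("continent = 'AF'") \
#      .createOrReplaceTempView("countries_af")
```

Either way, the point of the question stands: a bare Python variable is invisible to the SQL context, so it must be turned into SQL text or a registered view before %sql can use it.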