Questions and Answers

Question 4LLWPsJoV1fKhFEWcnpe

Question

When evaluating the Ganglia Metrics for a given cluster with 3 executor nodes, which indicator would signal proper utilization of the VM’s resources?

Choices

  • A: The Five Minute Load Average remains consistent/flat
  • B: Bytes Received never exceeds 80 million bytes per second
  • C: Network I/O never spikes
  • D: Total Disk Space remains constant
  • E: CPU Utilization is around 75%

Question lahZOckwB9uht3pLJ1xF

Question

Which of the following technologies can be used to identify key areas of text when parsing Spark Driver log4j output?

Choices

  • A: Regex
  • B: Julia
  • C: pyspark.ml.feature
  • D: Scala Datasets
  • E: C++
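
For context, pulling fields out of driver log lines with a regular expression might look like the sketch below. It assumes the log lines follow a "yy/MM/dd HH:mm:ss LEVEL Logger: message" layout; the sample lines and the names LOG_LINE and parse_line are hypothetical, and a real cluster's log4j pattern may differ.

```python
import re

# Assumed layout: "yy/MM/dd HH:mm:ss LEVEL Logger: message".
# Adjust the pattern if the cluster's log4j configuration differs.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>[\w.$]+): "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of named fields, or None for lines that do not match
    (for example, stack-trace continuation lines)."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None


# Hypothetical sample lines, for illustration only.
sample = [
    "23/01/15 10:22:31 INFO SparkContext: Running Spark version 3.3.0",
    "23/01/15 10:22:40 ERROR TaskSetManager: Task 3 in stage 2.0 failed 4 times; aborting job",
    "    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala)",
]

parsed = [parse_line(line) for line in sample]
errors = [p for p in parsed if p and p["level"] == "ERROR"]
```

Named groups keep the extraction readable, and unmatched lines come back as None so they can be filtered out or attached to the preceding entry.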

Question 60cj5Jh2N1zHOhH7R4gy

Question

You are testing a collection of mathematical functions, one of which calculates the area under a curve as described by another function.

assert(myIntegrate(lambda x: x*x, 0, 3)[0] == 9)

Which kind of test would the above line exemplify?

Choices

  • A: Unit
  • B: Manual
  • C: Functional
  • D: Integration
  • E: End-to-end
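
To make the assertion concrete, here is a minimal sketch of what a function like myIntegrate could look like. The implementation is an assumption: the question does not show it. This version uses composite Simpson's rule and returns a tuple whose first element is the estimate, which matches the [0] indexing in the question. Because floating-point results are rarely exact, the check below uses math.isclose rather than ==.

```python
import math


def myIntegrate(f, a, b, n=100):
    """Hypothetical implementation: composite Simpson's rule on n intervals.

    Returns (estimate, n) so that result[0] is the area under f on [a, b].
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd interior points get weight 4, even interior points weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3, n


# The check exercises one function in isolation with a known input and output.
area = myIntegrate(lambda x: x * x, 0, 3)[0]
assert math.isclose(area, 9.0, rel_tol=1e-9)
```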

Question vMuyBdca92FBzYgQVb5D

Question

A Databricks job has been configured with 3 tasks, each of which is a Databricks notebook. Task A does not depend on other tasks. Tasks B and C run in parallel, with each having a serial dependency on Task A.

If task A fails during a scheduled run, which statement describes the results of this run?

Choices

  • A: Because all tasks are managed as a dependency graph, no changes will be committed to the Lakehouse until all tasks have successfully been completed.
  • B: Tasks B and C will attempt to run as configured; any changes made in task A will be rolled back due to task failure.
  • C: Unless all tasks complete successfully, no changes will be committed to the Lakehouse; because task A failed, all commits will be rolled back automatically.
  • D: Tasks B and C will be skipped; some logic expressed in task A may have been committed before task failure.
  • E: Tasks B and C will be skipped; task A will not commit any changes because of stage failure.
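
The dependency structure in this question can be modelled with a small toy scheduler. This is only a sketch for reasoning about the scenario and is not the Databricks Jobs API: run_job, the status strings, and the committed list are all hypothetical. It assumes a task runs only when every upstream task succeeded, and that side effects made before a failure are not undone.

```python
# Toy model only: this is not the Databricks Jobs API.
def run_job(tasks, deps):
    """tasks: name -> callable, listed in dependency order.
    deps: name -> list of upstream task names."""
    status = {}
    for name, fn in tasks.items():
        # Assumption: a task is skipped unless all upstream tasks succeeded.
        if any(status.get(up) != "SUCCESS" for up in deps.get(name, [])):
            status[name] = "SKIPPED"
            continue
        try:
            fn()
            status[name] = "SUCCESS"
        except Exception:
            status[name] = "FAILED"
    return status


committed = []  # stands in for writes that have already taken effect


def task_a():
    committed.append("a: first write")  # an earlier step that completed
    raise RuntimeError("task A failed partway through")


tasks = {
    "A": task_a,
    "B": lambda: committed.append("b"),
    "C": lambda: committed.append("c"),
}
deps = {"B": ["A"], "C": ["A"]}

status = run_job(tasks, deps)
```

In this model, the write made by task A before its failure stays in committed, because nothing in the scheduler rolls it back.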

Question yCkzkxWy84EEsEVpEYQv

Question

A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the code below is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table. Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales.

[Image: notebook command cells Cmd 1 and Cmd 2]

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?

Choices

  • A: Both commands will succeed. Executing SHOW TABLES will show that countries_af and sales_af have been registered as views.
  • B: Cmd 1 will succeed. Cmd 2 will search all accessible databases for a table or view named countries_af: if this entity exists, Cmd 2 will succeed.
  • C: Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable representing a PySpark DataFrame.
  • D: Both commands will fail. No new variables, tables, or views will be created.
  • E: Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.