Questions and Answers

Question mJflUaButPk6H4EANrPu

Question

Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?

Choices

  • A: SELECT * FROM my_table WHERE age > 25;
  • B: UPDATE my_table WHERE age > 25;
  • C: DELETE FROM my_table WHERE age > 25;
  • D: UPDATE my_table WHERE age <= 25;
  • E: DELETE FROM my_table WHERE age <= 25;
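
For illustration, a minimal sketch of how a DELETE with a WHERE predicate removes matching rows in place. It uses Python's built-in sqlite3 with an in-memory table as a stand-in for the Delta table (an assumption, so that the example runs without a Spark cluster); the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite table standing in for the Delta table my_table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a", 19), ("b", 25), ("c", 26), ("d", 40)],  # hypothetical sample rows
)

# DELETE removes the rows matching the predicate; commit persists the change.
conn.execute("DELETE FROM my_table WHERE age > 25")
conn.commit()

remaining_ages = [age for (age,) in conn.execute("SELECT age FROM my_table ORDER BY age")]
print(remaining_ages)
```

A SELECT with the same predicate would only read those rows, and an UPDATE statement needs a SET clause to be valid.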

Question jbRGBPb2V1JDh4FE6DDj

Question

A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?

Choices

  • A: SELECT * FROM sales
  • B: spark.delta.table
  • C: spark.sql
  • D: There is no way to share data between PySpark and SQL.
  • E: spark.table
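
As a hedged sketch of how a SQL query can be handed to Python tests: spark.sql takes the query text and returns a DataFrame, which the test code can then inspect. The helper name assert_no_null_ids, its parameters, and the check it performs are hypothetical; spark is assumed to be an existing SparkSession.

```python
def assert_no_null_ids(spark, query, id_column):
    """Run a SQL query through spark.sql and check the result in Python.

    Hypothetical helper: `spark` is assumed to be a SparkSession.
    """
    rows = spark.sql(query).collect()  # DataFrame -> list of Row objects
    bad = [row for row in rows if row[id_column] is None]
    assert not bad, f"{len(bad)} row(s) have a null {id_column}"
    return len(rows)

# Possible usage inside a Databricks notebook (table and column names are assumptions):
# assert_no_null_ids(spark, "SELECT * FROM sales", "customer_id")
```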

Question L6ZRNnpBjF0O0oa3hv9w

Question

Which of the following commands will return the number of null values in the member_id column?

Choices

  • A: SELECT count(member_id) FROM my_table;
  • B: SELECT count(member_id) - count_null(member_id) FROM my_table;
  • C: SELECT count_if(member_id IS NULL) FROM my_table;
  • D: SELECT null(member_id) FROM my_table;
  • E: SELECT count_null(member_id) FROM my_table;
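
A small sketch of the counting behavior these choices rely on: count(column) skips nulls, so counting nulls needs an explicit IS NULL predicate. It runs on Python's built-in sqlite3 as a stand-in (an assumption); SQLite has no count_if, so sum(member_id IS NULL) is swapped in for Spark SQL's count_if(member_id IS NULL). The sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (member_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [(1,), (None,), (3,), (None,), (5,)],  # hypothetical sample values
)

# count(column) counts only the non-null values.
non_null = conn.execute("SELECT count(member_id) FROM my_table").fetchone()[0]

# SQLite stand-in for count_if(member_id IS NULL): the comparison
# evaluates to 1 or 0, so summing it counts the null rows.
null_count = conn.execute("SELECT sum(member_id IS NULL) FROM my_table").fetchone()[0]

print(non_null, null_count)
```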

Question N03FznZNv8TYlQ4aW8xV

Question

A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in the array column employees of the table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.

Which of the following code blocks successfully completes this task?

Choices

  • A:
  • B:
  • C:
  • D:
  • E:
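
To make the FILTER semantics concrete, here is a rough sketch. The source does not name the field that holds years of experience, so years_exp is an assumption, as are the sample rows and the helper filter_array. The SQL string shows one way FILTER with a lambda could be written in Spark SQL; the Python below it mirrors the same per-row logic.

```python
# Spark SQL form of the idea; years_exp is a hypothetical field name.
FILTER_QUERY = """
SELECT *, FILTER(employees, e -> e.years_exp > 5) AS exp_employees
FROM stores
"""

def filter_array(values, predicate):
    """Plain-Python equivalent of FILTER: keep elements where predicate is true."""
    return [v for v in values if predicate(v)]

# Hypothetical rows of the stores table.
stores = [
    {"store_id": 1, "employees": [{"name": "Ana", "years_exp": 7},
                                  {"name": "Bo", "years_exp": 2}]},
    {"store_id": 2, "employees": [{"name": "Cy", "years_exp": 5}]},
]

# Each row gets a new array holding only the employees that pass the predicate.
for row in stores:
    row["exp_employees"] = filter_array(row["employees"], lambda e: e["years_exp"] > 5)

print([[e["name"] for e in row["exp_employees"]] for row in stores])
```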

Question NFQ87hcaQNJ4s5b4wiI0

Question

A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.

They have the following incomplete code block:

____(f"SELECT customer_id, spend FROM {table_name}")

Which of the following can be used to fill in the blank to successfully complete the task?

Choices

  • A: spark.delta.sql
  • B: spark.delta.table
  • C: spark.table
  • D: dbutils.sql
  • E: spark.sql
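
A minimal sketch of the pattern in this question, assuming spark is an existing SparkSession: the f-string substitutes table_name into the query text, and spark.sql runs that text and returns a DataFrame. The wrapper function select_spend is hypothetical.

```python
def select_spend(spark, table_name):
    """Hypothetical wrapper: build the query from a Python variable and run it."""
    # The f-string inserts table_name into the SQL text before
    # spark.sql parses and executes it; the result is a DataFrame.
    return spark.sql(f"SELECT customer_id, spend FROM {table_name}")

# Possible usage in a notebook (the table name is an assumption):
# df = select_spend(spark, "sales")
```

Because the name is interpolated directly into the SQL text without escaping, table_name should come from a trusted source.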