Questions and Answers

Question jThfRKaCLSBlQbdYPGtA

Question

A Delta Lake table representing metadata about content posts from users has the following schema: user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE. This table is partitioned by the date column. A query is run with the following filter: longitude < 20 & longitude > -20. Which statement describes how data will be filtered?

Choices

  • A: Statistics in the Delta Log will be used to identify partitions that might include files in the filtered range.
  • B: No file skipping will occur because the optimizer does not know the relationship between the partition column and the longitude.
  • C: The Delta Engine will use row-level statistics in the transaction log to identify the files that meet the filter criteria.
  • D: Statistics in the Delta Log will be used to identify data files that might include records in the filtered range.
  • E: The Delta Engine will scan the parquet file footers to identify each row that meets the filter criteria.
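The mechanism this question tests is file-level data skipping: the Delta transaction log records min/max statistics per data file, and a file can only be skipped when its entire value range falls outside the filter. A minimal sketch of that logic, with hypothetical file names and statistics:

```python
# Hedged sketch of Delta-style file skipping using per-file min/max statistics.
# The file names and longitude ranges below are hypothetical illustrations.
file_stats = [
    {"file": "part-000.parquet", "min_longitude": -75.0, "max_longitude": -30.0},
    {"file": "part-001.parquet", "min_longitude": -25.0, "max_longitude": 15.0},
    {"file": "part-002.parquet", "min_longitude": 30.0, "max_longitude": 90.0},
]

def might_match(stats, lower=-20.0, upper=20.0):
    # A file must be read if its [min, max] range overlaps the filter range
    # (lower, upper); it can be skipped only when the ranges do not overlap.
    return stats["min_longitude"] < upper and stats["max_longitude"] > lower

candidates = [s["file"] for s in file_stats if might_match(s)]
print(candidates)  # only part-001 overlaps the range (-20, 20)
```

Note the statistics identify files that *might* contain matching records; rows inside a candidate file still have to be filtered when it is read.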

Question 3agNAMhRPP8lTPWrJ4YV

Question

A small company based in the United States has recently contracted a consulting firm in India to implement several new data engineering pipelines to power artificial intelligence applications. All the company’s data is stored in regional cloud storage in the United States. The workspace administrator at the company is uncertain about where the Databricks workspace used by the contractors should be deployed. Assuming that all data governance considerations are accounted for, which statement accurately informs this decision?

Choices

  • A: Databricks runs HDFS on cloud volume storage; as such, cloud virtual machines must be deployed in the region where the data is stored.
  • B: Databricks workspaces do not rely on any regional infrastructure; as such, the decision should be made based upon what is most convenient for the workspace administrator.
  • C: Cross-region reads and writes can incur significant costs and latency; whenever possible, compute should be deployed in the same region the data is stored.
  • D: Databricks leverages user workstations as the driver during interactive development; as such, users should always use a workspace deployed in a region they are physically near.
  • E: Databricks notebooks send all executable code from the user’s browser to virtual machines over the open internet; whenever possible, choosing a workspace region near the end users is the most secure.

Question hD3f5kmERd4wSPoJFg5r

Question

The downstream consumers of a Delta Lake table have been complaining about data quality issues impacting performance in their applications. Specifically, they have complained that invalid latitude and longitude values in the activity_details table have been breaking their ability to use other geolocation processes. A junior engineer has written the following code to add CHECK constraints to the Delta Lake table: //IMG//

A senior engineer has confirmed the above logic is correct and the valid ranges for latitude and longitude are provided, but the code fails when executed. Which statement explains the cause of this failure?

Choices

  • A: Because another team uses this table to support a frequently running application, two-phase locking is preventing the operation from committing.
  • B: The activity_details table already exists; CHECK constraints can only be added during initial table creation.
  • C: The activity_details table already contains records that violate the constraints; all existing data must pass CHECK constraints in order to add them to an existing table.
  • D: The activity_details table already contains records; CHECK constraints can only be added prior to inserting values into a table.
  • E: The current table schema does not contain the field valid_coordinates; schema evolution will need to be enabled before altering the table to add a constraint.
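The behavior at issue here is that adding a CHECK constraint to an existing Delta table validates all existing data first, and the operation fails if any current row violates the predicate. A small sketch simulating that validation, with hypothetical rows and ranges:

```python
# Hedged sketch: adding a CHECK constraint succeeds only if every existing
# row already satisfies the predicate. The rows below are hypothetical.
existing_rows = [
    {"latitude": 40.7, "longitude": -74.0},
    {"latitude": 91.5, "longitude": 10.0},   # invalid: latitude outside [-90, 90]
]

def valid_coordinates(row):
    return -90.0 <= row["latitude"] <= 90.0 and -180.0 <= row["longitude"] <= 180.0

def add_check_constraint(rows, predicate):
    # Mirrors the engine's behavior: scan existing data, reject the
    # constraint if any row violates it, otherwise accept it.
    violations = [r for r in rows if not predicate(r)]
    if violations:
        raise ValueError(
            f"cannot add constraint: {len(violations)} existing row(s) violate it"
        )
    return True
```

Once the offending rows are corrected or removed, the same constraint can be added and will then be enforced on all future writes.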

Question YISFJT5AhX4JVUoFdbJR

Question

Which of the following is true of Delta Lake and the Lakehouse?

Choices

  • A: Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.
  • B: Delta Lake automatically collects statistics on the first 32 columns of each table, which are leveraged in data skipping based on query filters.
  • C: Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.
  • D: Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.
  • E: Z-order can only be applied to numeric values stored in Delta Lake tables.

Question 2O2ggTfzmxaKnrTXl0Nm

Question

The data engineering team has configured a Databricks SQL query and alert to monitor the values in a Delta Lake table. The recent_sensor_recordings table contains an identifying sensor_id alongside the timestamp and temperature for the most recent 5 minutes of recordings. The below query is used to create the alert: //IMG//

The query is set to refresh each minute and always completes in less than 10 seconds. The alert is set to trigger when mean(temperature) > 120. Notifications are triggered to be sent at most every 1 minute. If this alert raises notifications for 3 consecutive minutes and then stops, which statement must be true?

Choices

  • A: The total average temperature across all sensors exceeded 120 on three consecutive executions of the query
  • B: The recent_sensor_recordings table was unresponsive for three consecutive runs of the query
  • C: The source query failed to update properly for three consecutive minutes and then restarted
  • D: The maximum temperature recording for at least one sensor exceeded 120 on three consecutive executions of the query
  • E: The average temperature recordings for at least one sensor exceeded 120 on three consecutive executions of the query
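The distinction this question hinges on is per-sensor aggregation versus a global average: assuming, as the question implies, the query computes mean(temperature) grouped by sensor_id, the alert fires when any single sensor's average exceeds the threshold. A sketch with hypothetical readings:

```python
# Hedged sketch: an alert on mean(temperature) grouped by sensor_id fires
# when ANY one sensor's average exceeds the threshold, even if the overall
# average does not. The readings below are hypothetical.
readings = [
    ("sensor_a", 118.0), ("sensor_a", 119.0),
    ("sensor_b", 121.0), ("sensor_b", 125.0),  # this sensor averages 123.0
]

def sensor_means(rows):
    # Accumulate (sum, count) per sensor, then divide to get each mean.
    totals = {}
    for sensor_id, temp in rows:
        total, count = totals.get(sensor_id, (0.0, 0))
        totals[sensor_id] = (total + temp, count + 1)
    return {s: total / count for s, (total, count) in totals.items()}

def alert_fires(rows, threshold=120.0):
    return any(mean > threshold for mean in sensor_means(rows).values())

print(alert_fires(readings))  # True: sensor_b's average exceeds 120
```

Here the overall average across all readings is 120.75, but the alert condition is evaluated against each sensor's own mean, which is why three consecutive notifications only guarantee that at least one sensor exceeded the threshold each time.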