Questions and Answers

Question upyTGCa5ytq15hW7rnKN

Question

Which statement describes a key benefit of an end-to-end test?

Choices

  • A: Makes it easier to automate your test suite
  • B: Pinpoints errors in the building blocks of your application
  • C: Provides testing coverage for all code paths and branches
  • D: Closely simulates real-world usage of your application
  • E: Ensures code is optimized for a real-life workflow

Question vnGqnW4JNf2fsXR481t4

Question

The Databricks CLI is used to trigger a run of an existing job by passing the job_id parameter. The response indicating that the job run request has been submitted successfully includes a run_id field.

Which statement describes what the number alongside this field represents?

Choices

  • A: The job_id and number of times the job has been run are concatenated and returned.
  • B: The total number of jobs that have been run in the workspace.
  • C: The number of times the job definition has been run in this workspace.
  • D: The job_id is returned in this field.
  • E: The globally unique ID of the newly triggered run.
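A minimal sketch of handling such a response, assuming the Jobs API run-now response shape; the command and the numeric value below are illustrative, not from a real workspace:

```python
import json

# Hypothetical response body returned after triggering a run
# (e.g. via `databricks jobs run-now --job-id 123`); the run_id
# value shown here is illustrative only.
response_body = '{"run_id": 455644833}'

run = json.loads(response_body)
run_id = run["run_id"]  # globally unique ID of the newly triggered run
print(run_id)
```

The run_id can then be passed to subsequent CLI or API calls that poll or cancel that specific run.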

Question cudI4e05eqWozPIBqSdV

Question

The data science team has created and logged a production model using MLflow. The model accepts a list of column names and returns a new column of type DOUBLE.

The following code correctly imports the production model, loads the customers table containing the customer_id key column into a DataFrame, and defines the feature columns needed for the model.

//IMG//

Which code block will output a DataFrame with the schema "customer_id LONG, predictions DOUBLE"?

Choices

  • A: df.map(lambda x: model(x[columns])).select("customer_id, predictions")
  • B: df.select("customer_id", model(*columns).alias("predictions"))
  • C: model.predict(df, columns)
  • D: df.select("customer_id", pandas_udf(model, columns).alias("predictions"))
  • E: df.apply(model, columns).select("customer_id, predictions")
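For context, the pattern of wrapping a logged MLflow model as a Spark UDF and unpacking feature columns into it looks roughly like the sketch below. It assumes a live Spark session, a logged model, and illustrative table and column names, so it is not runnable standalone:

```python
import mlflow.pyfunc

# Hypothetical model URI; in practice this points at the logged production model.
model = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/churn_model/Production", result_type="double"
)

# Illustrative feature column names defined for the model.
columns = ["age", "tenure", "monthly_spend"]

df = spark.table("customers")  # contains the customer_id key column

# Unpacking the feature columns into the UDF yields a single DOUBLE
# prediction column alongside the customer_id key.
predictions_df = df.select("customer_id", model(*columns).alias("predictions"))
```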

Question yJ0rC7YlgKCEiTIJLPLH

Question

A nightly batch job is configured to ingest all data files from a cloud object storage container where records are stored in a nested directory structure YYYY/MM/DD. The data for each date represents all records that were processed by the source system on that date, noting that some records may be delayed as they await moderator approval. Each entry represents a user review of a product and has the following schema:

user_id STRING, review_id BIGINT, product_id BIGINT, review_timestamp TIMESTAMP, review_text STRING

The ingestion job is configured to append all data for the previous date to a target table reviews_raw with an identical schema to the source system. The next step in the pipeline is a batch write to propagate all new records inserted into reviews_raw to a table where data is fully deduplicated, validated, and enriched.

Which solution minimizes the compute costs to propagate this batch of data?

Choices

  • A: Perform a batch read on the reviews_raw table and perform an insert-only merge using the natural composite key user_id, review_id, product_id, review_timestamp.
  • B: Configure a Structured Streaming read against the reviews_raw table using the trigger once execution mode to process new records as a batch job.
  • C: Use Delta Lake version history to get the difference between the latest version of reviews_raw and one version prior, then write these records to the next table.
  • D: Filter all records in the reviews_raw table based on the review_timestamp; batch append those records produced in the last 48 hours.
  • E: Reprocess all records in reviews_raw and overwrite the next table in the pipeline.
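An incremental, trigger-once Structured Streaming read against a Delta table can be sketched as follows. This assumes a Spark session with Delta Lake configured; the target table, checkpoint path, and dedup key are illustrative, so the snippet is not runnable standalone:

```python
# Incrementally process only records appended to reviews_raw since the
# last checkpoint, as a batch-style job that stops when caught up.
(spark.readStream
      .table("reviews_raw")
      .dropDuplicates(["user_id", "review_id", "product_id", "review_timestamp"])
      .writeStream
      .trigger(once=True)  # newer runtimes also offer availableNow=True
      .option("checkpointLocation", "/checkpoints/reviews_clean")  # illustrative path
      .toTable("reviews_clean"))
```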

Question K2IQOnQrwhpeAaeJ4PC4

Question

Which statement describes Delta Lake optimized writes?

Choices

  • A: Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.
  • B: An asynchronous job runs after the write completes to detect whether files could be further compacted; if so, an OPTIMIZE job is executed toward a default file size of 1 GB.
  • C: Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.
  • D: Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
  • E: A shuffle occurs prior to writing to try to group similar data together resulting in fewer files instead of each executor writing multiple files based on directory partitions.
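For reference, optimized writes can be enabled at the session level or persisted as a table property; a sketch of both settings, with an illustrative table name:

```sql
-- Session-level: enable optimized writes for the current Spark session.
SET spark.databricks.delta.optimizeWrite.enabled = true;

-- Table-level: persist the setting on the table itself.
ALTER TABLE reviews_raw
SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true);
```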