Questions and Answers

Question GpmOmrRHzmL8f1JMFaQQ

Question

A data engineer, User A, has promoted a pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens.

A workspace admin, User C, inherits responsibility for managing this pipeline. User C uses the Databricks Jobs UI to take “Owner” privileges of each job. Jobs continue to be triggered using the credentials and tooling configured by User B.

An application has been configured to collect and parse run information returned by the REST API. Which statement describes the value returned in the creator_user_name field?

Choices

  • A: Once User C takes “Owner” privileges, their email address will appear in this field; prior to this, User A’s email address will appear in this field.
  • B: User B’s email address will always appear in this field, as their credentials are always used to trigger the run.
  • C: User A’s email address will always appear in this field, as they still own the underlying notebooks.
  • D: Once User C takes “Owner” privileges, their email address will appear in this field; prior to this, User B’s email address will appear in this field.
  • E: User C will only ever appear in this field if they manually trigger the job, otherwise it will indicate User B.
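For context, run metadata including the creator_user_name field is returned by the Jobs API (GET /api/2.1/jobs/runs/get). A minimal sketch of parsing such a response follows; the payload below is an illustrative shape with hypothetical values, not a real API response:

```python
import json

# Illustrative response shape for GET /api/2.1/jobs/runs/get;
# the field values here are hypothetical.
sample_response = json.dumps({
    "job_id": 11223344,
    "run_id": 455644833,
    "creator_user_name": "user_b@example.com",
    "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"},
})

run_info = json.loads(sample_response)
print(run_info["creator_user_name"])
```

An application collecting run information, as in the question, would parse this field out of each run record in the same way.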

Question hOe3tum4fOyACZdtA2SY

Question

A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.

//IMG//

Which command should be removed from the notebook before scheduling it as a job?

Choices

  • A: Cmd 2
  • B: Cmd 3
  • C: Cmd 4
  • D: Cmd 5

Question p9avsuaKbjCHlLgkLFte

Question

Which statement regarding Spark configuration on the Databricks platform is true?

Choices

  • A: The Databricks REST API can be used to modify the Spark configuration properties for an interactive cluster without interrupting jobs currently running on the cluster.
  • B: Spark configurations set within a notebook will affect all SparkSessions attached to the same interactive cluster.
  • C: When the same Spark configuration property is set for an interactive cluster and a notebook attached to that cluster, the notebook setting will always be ignored.
  • D: Spark configuration properties set for an interactive cluster with the Clusters UI will impact all notebooks attached to that cluster.
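As background, cluster-scoped Spark configuration can be set through the Clusters UI or via the spark_conf block of a Clusters API (2.0/clusters/create) payload. A sketch of such a payload, with hypothetical cluster name, node type, and property values:

```python
import json

# Hypothetical 2.0/clusters/create payload with cluster-scoped Spark
# configuration. Properties under spark_conf apply to notebooks attached
# to this cluster; a notebook can still override session-scoped properties
# for its own SparkSession with spark.conf.set.
cluster_spec = {
    "cluster_name": "shared-interactive",        # hypothetical name
    "spark_version": "13.3.x-scala2.12",         # hypothetical runtime
    "node_type_id": "i3.xlarge",                 # hypothetical node type
    "num_workers": 2,
    "spark_conf": {
        "spark.sql.shuffle.partitions": "64",
        "spark.databricks.delta.optimizeWrite.enabled": "true",
    },
}

payload = json.loads(json.dumps(cluster_spec))
print(payload["spark_conf"]["spark.sql.shuffle.partitions"])
```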

Question a98naUArnjZOxNxythsl

Question

A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create.

//IMG//


Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?

Choices

  • A: Three new jobs named “Ingest new data” will be defined in the workspace, and they will each run once daily.
  • B: The logic defined in the referenced notebook will be executed three times on new clusters with the configurations of the provided cluster ID.
  • C: Three new jobs named “Ingest new data” will be defined in the workspace, but no jobs will be executed.
  • D: One new job named “Ingest new data” will be defined in the workspace, but it will not be executed.
  • E: The logic defined in the referenced notebook will be executed three times on the referenced existing all-purpose cluster.
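For reference, each POST to 2.0/jobs/create registers a new job and returns a fresh job_id; the call does not itself start a run (runs are started by a schedule or by 2.0/jobs/run-now). A sketch of a payload like the one described, with hypothetical names, paths, and IDs since the question's image is not reproduced here:

```python
import json

# Hypothetical jobs/create payload resembling the question's request.
# POSTing this body N times registers N distinct jobs, each with its
# own job_id, without executing any of them.
create_payload = {
    "name": "Ingest new data",
    "existing_cluster_id": "1234-567890-abcde123",            # hypothetical ID
    "notebook_task": {"notebook_path": "/Repos/prod/ingest"}, # hypothetical path
}

body = json.loads(json.dumps(create_payload))
print(body["name"])
```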

Question UdmfC9UX1k3VQehKAkw8

Question

The business reporting team requires that data for their dashboards be updated every hour. The pipeline that extracts, transforms, and loads this data completes in 10 minutes.

Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?

Choices

  • A: Configure a job that executes every time new data lands in a given directory
  • B: Schedule a job to execute the pipeline once an hour on a new job cluster
  • C: Schedule a Structured Streaming job with a trigger interval of 60 minutes
  • D: Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
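An hourly job schedule is expressed with a Quartz cron expression in a jobs/create payload, paired here with a new_cluster block so an ephemeral job cluster is created per run and terminated afterward. A sketch, with all names and node types hypothetical:

```python
import json

# Hypothetical jobs/create payload: hourly schedule on a job cluster
# that is provisioned for each run and released when the run finishes.
job_spec = {
    "name": "hourly-dashboard-refresh",   # hypothetical job name
    "new_cluster": {                      # ephemeral job cluster
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",      # hypothetical node type
        "num_workers": 4,
    },
    "notebook_task": {"notebook_path": "/Repos/prod/etl_pipeline"},  # hypothetical
    "schedule": {
        "quartz_cron_expression": "0 0 * * * ?",  # top of every hour
        "timezone_id": "UTC",
    },
}

spec = json.loads(json.dumps(job_spec))
print(spec["schedule"]["quartz_cron_expression"])
```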