Questions and Answers

Question 2yryeWO1TPq06DOa3DtO

Question

An external object storage container has been mounted to the location /mnt/finance_eda_bucket. The following logic was executed to create a database for the finance team: //IMG//

After the database was successfully created and permissions configured, a member of the finance team runs the following code: //IMG//

If all users on the finance team are members of the finance group, which statement describes how the tx_sales table will be created?

Choices

  • A: A logical table will persist the query plan to the Hive Metastore in the Databricks control plane.
  • B: An external table will be created in the storage container mounted to /mnt/finance_eda_bucket.
  • C: A logical table will persist the physical plan to the Hive Metastore in the Databricks control plane.
  • D: A managed table will be created in the storage container mounted to /mnt/finance_eda_bucket.
  • E: A managed table will be created in the DBFS root storage container.

Question hRVAEwlGbuxFzN2LyDSz

Question

Although the Databricks Utilities Secrets module provides tools to store sensitive credentials and avoid accidentally displaying them in plain text, users should still be careful about which credentials are stored there and which users have access to those secrets. Which statement describes a limitation of Databricks Secrets?

Choices

  • A: Because the SHA256 hash is used to obfuscate stored secrets, reversing this hash will display the value in plain text.
  • B: Account administrators can see all secrets in plain text by logging on to the Databricks Accounts console.
  • C: Secrets are stored in an administrators-only table within the Hive Metastore; database administrators have permission to query this table by default.
  • D: Iterating through a stored secret and printing each character will display secret contents in plain text.
  • E: The Databricks REST API can be used to list secrets in plain text if the personal access token has proper credentials.
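Choice D alludes to a known behavior: Databricks redacts secret values in notebook output by matching the literal secret string, so emitting the value one character at a time evades the match. A minimal stand-alone sketch of the idea, where `redact` is a simplified stand-in for the notebook's output redaction, not the actual Databricks implementation:

```python
# Hypothetical stand-in for how notebook output redaction behaves:
# any literal occurrence of a secret value is replaced with "[REDACTED]".
SECRET = "s3cr3t-key"

def redact(output: str, secret: str = SECRET) -> str:
    # Simplified literal-match redaction (illustrative only)
    return output.replace(secret, "[REDACTED]")

# Printing the whole value is caught by the literal match...
assert redact(SECRET) == "[REDACTED]"

# ...but printing one character per line bypasses it, revealing the secret.
leaked = "\n".join(redact(ch) for ch in SECRET)
assert leaked.replace("\n", "") == SECRET
```

The same trick works in a real notebook with `for ch in dbutils.secrets.get(scope, key): print(ch)`, which is why access to secret scopes must be restricted even though values are redacted in output.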

Question RZIZqr70zVe4WkFKJNUy

Question

What statement is true regarding the retention of job run history?

Choices

  • A: It is retained until you export or delete job run logs
  • B: It is retained for 30 days, during which time you can deliver job run logs to DBFS or S3
  • C: It is retained for 60 days, during which you can export notebook run results to HTML
  • D: It is retained for 60 days, after which logs are archived
  • E: It is retained for 90 days or until the run-id is re-used through custom run configuration

Question E52ePZemIT2GpDhe22GF

Question

A data engineer, User A, has promoted a new pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens. Which statement describes the contents of the workspace audit logs concerning these events?

Choices

  • A: Because the REST API was used for job creation and triggering runs, a Service Principal will be automatically used to identify these events.
  • B: Because User B last configured the jobs, their identity will be associated with both the job creation events and the job run events.
  • C: Because these events are managed separately, User A will have their identity associated with the job creation events and User B will have their identity associated with the job run events.
  • D: Because the REST API was used for job creation and triggering runs, user identity will not be captured in the audit logs.
  • E: Because User A created the jobs, their identity will be associated with both the job creation events and the job run events.
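For context, each user's REST API call would resemble the following sketch of a Jobs API request (host, token, and job name are placeholders); the personal access token used to authorize the call determines which user identity the audit log records for that event:

```python
import json
import urllib.request

# Hypothetical sketch: building a Databricks Jobs API request authorized
# with a personal access token. The token's owner is the identity the
# workspace audit log associates with the resulting event.
def create_job_request(host: str, token: str, job_spec: dict) -> urllib.request.Request:
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/create",
        data=json.dumps(job_spec).encode(),
        headers={
            "Authorization": f"Bearer {token}",  # identifies the calling user
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder host/token; a caller would pass the request to urlopen().
req = create_job_request(
    "https://example.cloud.databricks.com", "dapiXXXX", {"name": "nightly_etl"}
)
assert req.get_method() == "POST"
assert req.get_header("Authorization") == "Bearer dapiXXXX"
```

If User A's token authorizes `jobs/create` calls and User B's token authorizes `jobs/run-now` calls, the two event types carry the two different identities.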

Question vAWZAo7E2GgI2wVQvHsJ

Question

A user new to Databricks is trying to troubleshoot long execution times for some pipeline logic they are working on. Presently, the user is executing code cell-by-cell, using display() calls to confirm code is producing the logically correct results as new transformations are added to an operation. To get a measure of average time to execute, the user is running each cell multiple times interactively. Which of the following adjustments will get a more accurate measure of how code is likely to perform in production?

Choices

  • A: Scala is the only language that can be accurately tested using interactive notebooks; because the best performance is achieved by using Scala code compiled to JARs, all PySpark and Spark SQL logic should be refactored.
  • B: The only way to meaningfully troubleshoot code execution times in development notebooks is to use production-sized data and production-sized clusters with Run All execution.
  • C: Production code development should only be done using an IDE; executing code against a local build of open source Spark and Delta Lake will provide the most accurate benchmarks for how code will perform in production.
  • D: Calling display() forces a job to trigger, while many transformations will only add to the logical query plan; because of caching, repeated execution of the same logic does not provide meaningful results.
  • E: The Jobs UI should be leveraged to occasionally run the notebook as a job and track execution time during incremental code development because Photon can only be enabled on clusters launched for scheduled jobs.
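Choice D hinges on Spark's lazy evaluation: transformations only extend the logical query plan, and no work happens until an action such as display() triggers it. The pattern can be sketched with plain Python generators (a loose analogy for lazy evaluation, not Spark itself):

```python
# Track when the "transformation" actually does work.
events = []

def transform(data):
    # Like a Spark transformation: defines work lazily, computes nothing yet.
    for x in data:
        events.append("computed")
        yield x * 2

plan = transform(range(3))   # builds the lazy "plan"
assert events == []          # nothing has executed so far

result = list(plan)          # like display(): an action forces execution
assert events == ["computed"] * 3
assert result == [0, 2, 4]
```

This is also why repeatedly re-running the same cell is misleading as a benchmark: once results (or source data) are cached, later runs no longer pay the cost a production job would.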