Questions and Answers

Question qlOV5Egbf5GobqM3iS2Q

Question

The data engineering team has been tasked with configuring connections to an external database that does not have a supported native connector with Databricks. The external database already has data security configured by group membership. These groups map directly to user groups already created in Databricks that represent various teams within the company.

A new login credential has been created for each group in the external database. The Databricks Utilities Secrets module will be used to make these credentials available to Databricks users.

Assuming that all the credentials are configured correctly on the external database and group membership is properly configured on Databricks, which statement describes how teams can be granted the minimum necessary access to use these credentials?

Choices

  • A: No additional configuration is necessary as long as all users are configured as administrators in the workspace where secrets have been added.
  • B: “Read” permissions should be set on a secret key mapped to those credentials that will be used by a given team.
  • C: “Read” permissions should be set on a secret scope containing only those credentials that will be used by a given team.
  • D: “Manage” permissions should be set on a secret scope containing only those credentials that will be used by a given team.
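
For context on the choices above, secret access control in Databricks is expressed as an ACL on a secret scope, set through the Secrets API endpoint `/api/2.0/secrets/acls/put`. The following is a minimal sketch that only builds the request body and does not call a workspace. The scope name `finance-db-creds` and the group name `finance-team` are hypothetical, as is the helper function.

```python
import json

# Permission levels accepted by the secret ACL endpoint.
VALID_PERMISSIONS = {"READ", "WRITE", "MANAGE"}


def build_put_acl_request(scope: str, principal: str, permission: str = "READ") -> dict:
    """Build the JSON body for POST /api/2.0/secrets/acls/put.

    `principal` is a user or group name; here it is assumed to be a
    Databricks group that mirrors a group in the external database.
    """
    if permission not in VALID_PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    return {"scope": scope, "principal": principal, "permission": permission}


# Hypothetical scope and group names, for illustration only.
payload = build_put_acl_request("finance-db-creds", "finance-team")
print(json.dumps(payload, sort_keys=True))
```

In a notebook, a member of that group would then read a credential with `dbutils.secrets.get(scope=..., key=...)`.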

Question fQWGrzJTwOcpJ8q6tSlP

Question

How long is job run history retained?

Choices

  • A: It is retained until you export or delete job run logs
  • B: It is retained for 30 days, during which time you can deliver job run logs to DBFS or S3
  • C: It is retained for 60 days, during which time you can export notebook run results to HTML
  • D: It is retained for 60 days, after which logs are archived

Question ZGMzTysKhRCXh2U2tBZ0

Question

A data engineer, User A, has promoted a new pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens.

Which statement describes the contents of the workspace audit logs concerning these events?

Choices

  • A: Because the REST API was used for job creation and triggering runs, a Service Principal will be automatically used to identify these events.
  • B: Because User A created the jobs, their identity will be associated with both the job creation events and the job run events.
  • C: Because these events are managed separately, User A will have their identity associated with the job creation events and User B will have their identity associated with the job run events.
  • D: Because the REST API was used for job creation and triggering runs, user identity will not be captured in the audit logs.
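
To make the scenario concrete, the sketch below shows how an external tool might construct a run-trigger call to the Jobs API (`/api/2.1/jobs/run-now`) authorized with a personal access token. It only builds the request object and sends nothing. The host, token, and job ID are placeholder assumptions, and the helper name is hypothetical.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request that triggers a job run."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/run-now",
        data=body,
        headers={
            # The personal access token identifies the calling user.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
req = build_run_now_request("https://example.cloud.databricks.com", "dapi-EXAMPLE-TOKEN", 123)
print(req.get_method(), req.full_url)
```

Each call carries the token of whoever issues it, so the job creation calls and the run-trigger calls are authenticated independently.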

Question qKh9sW3dFmBGiZk1bIpp

Question

A production workload incrementally applies updates from an external Change Data Capture feed to a Delta Lake table as an always-on Structured Streaming job. When data was initially migrated for this table, OPTIMIZE was executed and most data files were resized to 1 GB. Auto Optimize and Auto Compaction were both turned on for the streaming production job. A recent review of data files shows that most data files are under 64 MB, although each partition in the table contains at least 1 GB of data and the total table size is over 10 TB.

Which of the following likely explains these smaller file sizes?

Choices

  • A: Databricks has autotuned to a smaller target file size to reduce duration of MERGE operations
  • B: Z-order indices calculated on the table are preventing file compaction
  • C: Bloom filter indices calculated on the table are preventing file compaction
  • D: Databricks has autotuned to a smaller target file size based on the overall size of data in the table
  • E: Databricks has autotuned to a smaller target file size based on the amount of data in each partition
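
As background for the size-based choices, the sketch below models how a target file size could be derived from total table size. The thresholds (256 MB below 2.56 TB, linear growth up to 1 GB at 10 TB, 1 GB above that) are recalled from the Databricks file-size tuning documentation and should be treated as assumptions to verify. The function itself is hypothetical and is not a Databricks API.

```python
def size_based_target_file_mb(table_size_tb: float) -> float:
    """Approximate size-based target file size in MB.

    Assumed thresholds: 256 MB for tables under 2.56 TB, 1024 MB for
    tables over 10 TB, and linear interpolation in between.
    """
    low_tb, high_tb = 2.56, 10.0
    low_mb, high_mb = 256.0, 1024.0
    if table_size_tb <= low_tb:
        return low_mb
    if table_size_tb >= high_tb:
        return high_mb
    fraction = (table_size_tb - low_tb) / (high_tb - low_tb)
    return low_mb + fraction * (high_mb - low_mb)


for size_tb in (1.0, 6.28, 12.0):
    print(size_tb, round(size_based_target_file_mb(size_tb)))
```

Databricks documentation also describes a separate setting, the `delta.tuneFileSizesForRewrites` table property, which targets smaller files for tables that are rewritten frequently.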

Question 4S9uSaSZHs3IdoxyIszC

Question

A distributed team of data analysts shares computing resources on an interactive cluster with autoscaling configured. To better manage costs and query throughput, the workspace administrator wants to evaluate whether cluster upscaling is caused by many concurrent users or by resource-intensive queries.

In which location can one review the timeline for cluster resizing events?

Choices

  • A: Workspace audit logs
  • B: Driver’s log file
  • C: Ganglia
  • D: Cluster Event Log
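
Cluster lifecycle events can also be retrieved programmatically through the Clusters API endpoint `/api/2.0/clusters/events`. The sketch below builds a query body restricted to resize-related event types and extracts a timeline from a response-shaped list. It makes no network call. The cluster ID and the sample events are made up for illustration, the helper names are hypothetical, and the event type names `RESIZING` and `UPSIZE_COMPLETED` are assumed from the API documentation.

```python
# Event types assumed to describe autoscaling resizes.
RESIZE_EVENT_TYPES = ("RESIZING", "UPSIZE_COMPLETED")


def build_events_query(cluster_id: str) -> dict:
    """Build the JSON body for POST /api/2.0/clusters/events."""
    return {
        "cluster_id": cluster_id,
        "event_types": list(RESIZE_EVENT_TYPES),
        "order": "ASC",
    }


def resize_timeline(events: list) -> list:
    """Return (timestamp, type) pairs for resize events, oldest first."""
    return sorted(
        (e["timestamp"], e["type"]) for e in events if e["type"] in RESIZE_EVENT_TYPES
    )


# Made-up events shaped like the API's "events" array.
sample_events = [
    {"timestamp": 1700000300000, "type": "UPSIZE_COMPLETED"},
    {"timestamp": 1700000000000, "type": "RUNNING"},
    {"timestamp": 1700000100000, "type": "RESIZING"},
]
timeline = resize_timeline(sample_events)
print(timeline)
```

Comparing these timestamps with query history is one way to judge whether a resize coincided with many concurrent users or with a single heavy query.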