Questions and Answers
Question 9QfgldmXkHT1ECMg3vHD
Question
A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams.
The proposed directory structure is displayed below:
//IMG//
Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?
Choices
- A: No; Delta Lake manages streaming checkpoints in the transaction log.
- B: Yes; both of the streams can share a single checkpoint directory.
- C: No; only one stream can write to a Delta Lake table.
- D: No; each of the streams needs to have its own checkpoint directory.
answer?
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1322661 by divyapsingh
- Upvotes: 1
Selected Answer: D D is the answer. Two streams cannot share the same checkpoint directory.
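To illustrate answer D, here is a hedged PySpark sketch of the correct setup: two streaming queries writing to the same bronze Delta table, each with its own `checkpointLocation`. Topic names, the broker address, and all paths are hypothetical placeholders, and `spark` is the ambient SparkSession provided by a Databricks notebook.

```python
# Hypothetical sketch: two Structured Streaming jobs concurrently writing
# to one bronze Delta table. Each query gets its OWN checkpoint directory;
# sharing one would corrupt the streams' offset tracking.
# All names and paths below are placeholders.

def start_stream(spark, topic: str, checkpoint_dir: str):
    """Start one Kafka-to-Delta streaming query with a dedicated checkpoint."""
    return (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", topic)
            .load()
            .writeStream
            .format("delta")
            .option("checkpointLocation", checkpoint_dir)  # unique per query
            .start("/mnt/bronze/orders"))

# stream_a = start_stream(spark, "topic_a", "/mnt/bronze/orders/_checkpoints/stream_a")
# stream_b = start_stream(spark, "topic_b", "/mnt/bronze/orders/_checkpoints/stream_b")
```

The checkpoint directory stores per-query source offsets and state, which is why the mapping from checkpoint directory to streaming query must be one-to-one.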
Question jHkr33IW5HmyW3zIKIv4
Question
Which statement describes the default execution mode for Databricks Auto Loader?
Choices
- A: Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; new files are incrementally and idempotently loaded into the target Delta Lake table.
- B: New files are identified by listing the input directory; the target table is materialized by directly querying all valid files in the source directory.
- C: Webhooks trigger a Databricks job to run anytime new data arrives in a source directory; new data are automatically merged into target tables using rules inferred from the data.
- D: New files are identified by listing the input directory; new files are incrementally and idempotently loaded into the target Delta Lake table.
answer?
Answer: D Answer_ET: D Community answer D (71%) 14% 14% Discussion
Comment 1335570 by arekm
- Upvotes: 1
Selected Answer: D D. It comes down to A or D; the rest do not make sense. A describes file notification mode, which is NOT the default.
Comment 1331042 by OnlyPraveen
- Upvotes: 2
Selected Answer: D Also check answers to duplicate question 108.
Comment 1324190 by carlosmps
- Upvotes: 2
Selected Answer: D “Auto Loader uses directory listing mode by default. In directory listing mode, Auto Loader identifies new files by listing the input directory.”
https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/directory-listing-mode
Comment 1323117 by temple1305
- Upvotes: 1
Selected Answer: B Correct answer is B. However, listing the input directory is the default way of identifying new files for auto loader. Cloud Native Notification services can be used but this is not default setting for auto loader.
Comment 1322662 by divyapsingh
- Upvotes: 1
Selected Answer: A Solution A is the better approach as it is more efficient, but both vendor-specific notification services and directory file listing can be used for tracking new files. Check out the link below: https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/file-detection-modes.html
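To make the distinction in the comments concrete, here is a hedged Auto Loader sketch. Directory listing is the default detection mode; file notification mode (option A's description) must be opted into explicitly. All paths are placeholders and `spark` is the ambient Databricks SparkSession.

```python
# Hypothetical Auto Loader sketch. By default Auto Loader uses directory
# listing mode: new files are found by listing the input directory, then
# loaded incrementally and idempotently into the target Delta table.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      # .option("cloudFiles.useNotifications", "true")  # file notification
      #                                                 # mode; NOT the default
      .option("cloudFiles.schemaLocation", "/mnt/schemas/orders")
      .load("/mnt/landing/orders"))

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/orders")
   .start("/mnt/bronze/orders"))
```

The commented-out `cloudFiles.useNotifications` line is what would switch to the cloud-queue-based mode described in option A.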
Question 2pZj0DXg5ATNKEunCKko
Question
The Databricks workspace administrator has configured interactive clusters for each of the data engineering groups. To control costs, clusters are set to terminate after 30 minutes of inactivity. Each user should be able to execute workloads against their assigned clusters at any time of the day. Assuming users have been added to a workspace but not granted any permissions, which of the following describes the minimal permissions a user would need to start and attach to an already configured cluster?
Choices
- A: “Can Manage” privileges on the required cluster
- B: Workspace Admin privileges, cluster creation allowed, “Can Attach To” privileges on the required cluster
- C: Cluster creation allowed, “Can Attach To” privileges on the required cluster
- D: “Can Restart” privileges on the required cluster
- E: Cluster creation allowed, “Can Restart” privileges on the required cluster
answer?
Answer: D Answer_ET: D Community answer D (79%) C (18%) 3% Discussion
Comment 1473635 by codebender
- Upvotes: 1
Selected Answer: D Restart is necessary to start the cluster
Comment 1416626 by Ashok_Choudhary_CT
- Upvotes: 1
Selected Answer: A There are three types of permissions:
- Can Restart
- Can Attach To
- Can Manage
Can Manage provides both of the other permissions. Can Restart doesn't provide the "Can Attach To" permission, so the correct answer is "A".
Comment 1410431 by ultimomassimo
- Upvotes: 2
Selected Answer: D 100% D. The user has to be able to start the cluster, as it shuts down after 30 minutes of inactivity. Everything is explained clearly in the table here - Compute ACLs: https://docs.databricks.com/aws/en/security/auth/access-control/#clusters
Comment 1361474 by Tedet
- Upvotes: 1
Selected Answer: C To execute workloads against an already configured cluster in Databricks, users need the following minimum permissions:
- Cluster Creation Allowed: Users need the ability to create clusters, which ensures they can start a cluster if it’s not already running or if the cluster has been terminated after the inactivity period.
- “Can Attach To” Privileges on the Required Cluster: This permission allows users to attach their notebooks or jobs to the existing cluster. The “Can Attach To” permission is the key to allowing users to interact with the cluster for running jobs or notebooks.
Comment 1358846 by johnserafim
- Upvotes: 1
Selected Answer: C I should choose the C answer because the “Can Attach To” permission is the minimal requirement for a user to attach to an interactive cluster and execute workloads.
Comment 1353250 by shaswat1404
- Upvotes: 1
Selected Answer: D A: Can Manage grants every permission, which is more than needed. B: Workspace Admin is irrelevant. C: Can Attach To cannot start the cluster. D: Can Restart is correct; it can do both. E: Cluster creation permission is unnecessary since the cluster is already created.
Comment 1351568 by EelkeV
- Upvotes: 1
Selected Answer: C You do not need all the permissions, just the lowest
Comment 1347006 by Dhusanth
- Upvotes: 4
Selected Answer: D https://docs.azure.cn/en-us/databricks/security/auth-authz/access-control/cluster-acl#cluster-permissions
Comment 1340845 by nadegetiedjo
- Upvotes: 1
Selected Answer: D A user added to a workspace gets the default user settings of that workspace.
Comment 1339649 by sakis213
- Upvotes: 2
Selected Answer: C Can Restart only allows restarting the cluster and does not grant permission to attach workloads.
Comment 1336905 by yeyi97
- Upvotes: 1
Selected Answer: C Option C makes more sense since the question says also attach. Not just restart.
Comment 1326051 by rockreid
- Upvotes: 1
Selected Answer: C To execute workloads, users need to be able to attach their notebooks or jobs to the cluster. The “Can Attach To” privilege specifically allows users to attach to and use the cluster, which is essential for running their workloads.
Comment 1290633 by akashdesarda
- Upvotes: 1
Selected Answer: D The question says users need to start and use the cluster, so it will be Can Restart. Can Attach To cannot start compute.
Comment 1280981 by Robbyisok
- Upvotes: 2
D is the correct answer. Focus on this line “user would need to start and attach to an already configured cluster.”
Comment 1247367 by md_2000
- Upvotes: 1
Selected Answer: D Correct
Comment 1236070 by panya
- Upvotes: 1
D is the Correct Answer
Comment 1219859 by Freyr
- Upvotes: 1
D is correct. Not A.
Comment 1213847 by coercion
- Upvotes: 3
Selected Answer: D “Can restart” permission is only required. “Can Manage” permission gives the ability to edit the configurations of the cluster. https://docs.databricks.com/en/security/auth-authz/access-control/cluster-acl.html
Comment 1213834 by coercion
- Upvotes: 1
Answer should be D not A
Comment 1203749 by naveenballa2
- Upvotes: 1
D is correct
Comment 1198781 by sandeepgoyal1984
- Upvotes: 1
Correct Ans is D
Comment 1128054 by AziLa
- Upvotes: 2
Correct Ans is D
Comment 1121586 by Jay_98_11
- Upvotes: 2
Selected Answer: D D is correct
Comment 1102663 by kz_data
- Upvotes: 4
Selected Answer: D D is the correct answer
Comment 1075920 by aragorn_brego
- Upvotes: 2
Selected Answer: D https://docs.databricks.com/en/security/auth-authz/access-control/cluster-acl.html
Comment 1074801 by Eertyy
- Upvotes: 1
D. Can Restart is the minimum permission to attach to and start the cluster. For more information, read this page: https://docs.databricks.com/en/security/auth-authz/access-control/cluster-acl.html
Comment 1060737 by BIKRAM063
- Upvotes: 1
Option D is correct. ‘Can Restart’ privilege is required
Comment 1052627 by sagar21692
- Upvotes: 1
Option D is the right answer
Comment 1040231 by sturcu
- Upvotes: 1
Selected Answer: D https://learn.microsoft.com/en-us/azure/databricks/security/auth-authz/access-control/cluster-acl
Comment 1025106 by SRMV
- Upvotes: 2
D is the correct answer. https://docs.databricks.com/en/security/auth-authz/access-control/cluster-acl.html
Comment 1022725 by Deepaakash
- Upvotes: 3
Selected Answer: D Option D is the correct answer, refer databricks documentations
Comment 983356 by Enduresoul
- Upvotes: 4
Selected Answer: D “D” is the correct answer. “A” would be correct when it comes to editing the cluster itself.
Comment 979301 by Lungster
- Upvotes: 2
The correct answer is D; it allows users to start and restart an already existing cluster. The key phrase is "start and attach to an already configured cluster".
Comment 972710 by 8605246
- Upvotes: 4
the correct option is D; “Can Restart” privileges on the required cluster https://docs.databricks.com/en/security/auth-authz/access-control/cluster-acl.html
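Several comments link the cluster ACL docs; for context, cluster permissions can also be assigned programmatically via the Databricks Permissions REST API (`PATCH /api/2.0/permissions/clusters/{cluster_id}`). The sketch below only builds the request body for granting "Can Restart" to one user; the user name is a placeholder, and the actual HTTP call is shown as a comment.

```python
# Hypothetical sketch: build the request body that grants CAN_RESTART
# (answer D's permission level) to a single user via the Databricks
# Permissions API. The user name below is a placeholder.

def can_restart_payload(user_name: str) -> dict:
    """Return the Permissions API body granting CAN_RESTART to one user."""
    return {
        "access_control_list": [
            {"user_name": user_name, "permission_level": "CAN_RESTART"}
        ]
    }

payload = can_restart_payload("user@example.com")

# A real call would look something like:
# requests.patch(f"{host}/api/2.0/permissions/clusters/{cluster_id}",
#                headers={"Authorization": f"Bearer {token}"}, json=payload)
```

Other valid `permission_level` values for clusters are `CAN_ATTACH_TO` and `CAN_MANAGE`, matching the three levels discussed in the comments above.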
Question F8eT4lrxXItrnsIPGmaM
Question
A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams. The proposed directory structure is displayed below: //IMG//
Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?
Choices
- A: No; Delta Lake manages streaming checkpoints in the transaction log.
- B: Yes; both of the streams can share a single checkpoint directory.
- C: No; only one stream can write to a Delta Lake table.
- D: Yes; Delta Lake supports infinite concurrent writers.
- E: No; each of the streams needs to have its own checkpoint directory.
answer?
Answer: E Answer_ET: E Community answer E (88%) 13% Discussion
Comment 1001484 by thxsgod
- Upvotes: 11
Selected Answer: E Correct, E.
Comment 1298013 by benni_ale
- Upvotes: 1
Selected Answer: E E is correct.
Comment 1222469 by imatheushenrique
- Upvotes: 2
E. No; each of the streams needs to have its own checkpoint directory. Checkpoint directories map one-to-one to streaming queries.
Comment 1209513 by svik
- Upvotes: 2
Selected Answer: B It is not clear from the question that year_week=2020_01 and year_week=2020_02 are used by stream 1 and stream 2 respectively. If they use the common parent checkpoint directory with individual sub folders for checkpointing, that should work fine. In that case the answer should be B
Comment 1121948 by Jay_98_11
- Upvotes: 1
Selected Answer: E correct E
Comment 1118610 by kz_data
- Upvotes: 1
Selected Answer: E E is correct
Comment 1040330 by sturcu
- Upvotes: 2
E is correct. If the user wants a single checkpoint directory, the streams need to be unioned before writing.
Comment 993993 by Eertyy
- Upvotes: 3
answer is correct
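As sturcu's comment notes, a single checkpoint directory is only legitimate when the two sources are combined into one streaming query before writing. A hedged PySpark sketch (topic names, broker, and paths are placeholders; `spark` is the ambient Databricks SparkSession):

```python
# Hypothetical sketch: union two Kafka sources into ONE streaming query.
# Because there is now a single query, a single checkpoint directory is valid.
# All names and paths are placeholders.

def read_topic(spark, topic: str):
    """Read one Kafka topic as a streaming DataFrame."""
    return (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", topic)
            .load())

combined = read_topic(spark, "topic_a").union(read_topic(spark, "topic_b"))

(combined.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/bronze/orders/_checkpoint")  # one query,
 .start("/mnt/bronze/orders"))                                    # one checkpoint
```

Alternatively, a single Kafka reader can subscribe to both topics at once with `.option("subscribe", "topic_a,topic_b")`, which likewise yields one query and one checkpoint.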
Question KlQ0Q7J7ndUFckkmgTp8
Question
An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:
//IMG//
Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.
If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?
Choices
- A: Each write to the orders table will only contain unique records, and only those records without duplicates in the target table will be written.
- B: Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table.
- C: Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, these records will be overwritten.
- D: Each write to the orders table will run deduplication over the union of new and existing records, ensuring no duplicate records are present.
answer?
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1340359 by lene
- Upvotes: 1
Selected Answer: B No intra-batch duplicates, but there can be inter-batch duplicates.
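Answer B can be demonstrated without a Spark cluster. The pure-Python model below mimics a per-batch `dropDuplicates` on the composite key followed by an append-mode write: the batch itself is deduplicated, but the append never checks the target table, so the same order arriving in a later batch creates a duplicate. (The real job would be PySpark; this is only a model of the semantics.)

```python
# Pure-Python model of answer B: deduplication within each batch,
# plain append across batches (no check against the target table).

def ingest_batch(target, batch):
    """Dedupe the batch on (customer_id, order_id), then append to target."""
    seen = set()
    deduped = []
    for row in batch:
        key = (row["customer_id"], row["order_id"])
        if key not in seen:          # intra-batch dedup only
            seen.add(key)
            deduped.append(row)
    target.extend(deduped)           # append mode: target is never inspected
    return target

orders = []                          # the target "table"

day1 = [{"customer_id": 1, "order_id": "A"},
        {"customer_id": 1, "order_id": "A"}]   # duplicate within the batch
ingest_batch(orders, day1)           # writes one unique row

day2 = [{"customer_id": 1, "order_id": "A"}]   # same order, next day's batch
ingest_batch(orders, day2)           # writes it again

# orders now holds two copies of (1, "A"): each write was unique,
# but duplicates exist across writes.
```

Eliminating the inter-batch duplicates (answer D's behavior) would instead require something like a Delta `MERGE` on the composite key, which the nightly append job does not do.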