Questions and Answers
Question TkgrjU4LH7WTUo7kTbgX
Question
A CHECK constraint has been successfully added to the Delta table named activity_details using the following logic:
[Image: CHECK constraint definition not shown]
A batch job is attempting to insert new records into the table, including a record where latitude = 45.50 and longitude = 212.67.
Which statement describes the outcome of this batch insert?
Choices
- A: The write will fail when the violating record is reached; any records previously processed will be recorded to the target table.
- B: The write will fail completely because of the constraint violation and no records will be inserted into the target table.
- C: The write will insert all records except those that violate the table constraints; the violating records will be recorded to a quarantine table.
- D: The write will include all records in the target table; any violations will be indicated in the boolean column named valid_coordinates.
- E: The write will insert all records except those that violate the table constraints; the violating records will be reported in a warning log.
Answer: B (Answer_ET: B; community vote: 100% B)
Discussion
Comment 1076666 by aragorn_brego
- Upvotes: 5
Selected Answer: B In systems that support atomic transactions, such as Delta Lake, when a batch operation encounters a record that violates a CHECK constraint, the entire operation fails, and no records are inserted, including those that do not violate the constraint. This is to ensure the atomicity of the transaction, meaning that either all the changes are committed, or none are, maintaining data integrity. The record with a longitude of 212.67 violates the constraint because longitude values must be between -180 and 180 degrees.
Comment 1141684 by vctrhugo
- Upvotes: 5
Selected Answer: B In Delta Lake, when a batch job attempts to insert records into a table that has a CHECK constraint, if any record violates the constraint, the entire write operation fails. This is because Delta Lake enforces strong transactional guarantees, which means that either all changes in a transaction are saved, or none are.
Comment 1131751 by spaceexplorer
- Upvotes: 1
Selected Answer: B B is correct
Comment 1066288 by Dileepvikram
- Upvotes: 4
B is the answer
Comment 1053489 by sturcu
- Upvotes: 4
Selected Answer: B B is the answer
Comment 1049724 by PearApple
- Upvotes: 3
B is the ans
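The atomic-failure behavior behind option B can be reproduced in any transactional store that enforces CHECK constraints. Below is a minimal sketch using SQLite as a stand-in for Delta Lake; the table name and coordinate bounds mirror the question, but the exact Delta constraint expression is not shown in the original image, so the CHECK clauses here are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activity_details (
        latitude  REAL CHECK (latitude  BETWEEN -90  AND 90),
        longitude REAL CHECK (longitude BETWEEN -180 AND 180)
    )
""")

# The last record violates the longitude constraint (212.67 > 180).
batch = [(45.50, -122.68), (40.71, -74.01), (45.50, 212.67)]

try:
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO activity_details VALUES (?, ?)", batch)
except sqlite3.IntegrityError as e:
    print("batch failed:", e)

# The transaction was rolled back: no rows were inserted, not even the valid ones.
count = conn.execute("SELECT COUNT(*) FROM activity_details").fetchone()[0]
print(count)  # 0
```

The valid rows processed before the violation are rolled back along with it, which is exactly why option A is wrong and B is right.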
Question mEBacLIbj9unFhLMHvNr
Question
A junior data engineer has manually configured a series of jobs using the Databricks Jobs UI. Upon reviewing their work, the engineer realizes that they are listed as the “Owner” for each job. They attempt to transfer “Owner” privileges to the “DevOps” group, but cannot successfully accomplish this task.
Which statement explains what is preventing this privilege transfer?
Choices
- A: Databricks jobs must have exactly one owner; “Owner” privileges cannot be assigned to a group.
- B: The creator of a Databricks job will always have “Owner” privileges; this configuration cannot be changed.
- C: Other than the default “admins” group, only individual users can be granted privileges on jobs.
- D: A user can only transfer job ownership to a group if they are also a member of that group.
- E: Only workspace administrators can grant “Owner” privileges to a group.
Answer: A (Answer_ET: A; community vote: 100% A)
Discussion
Comment 1166171 by hal2401me
- Upvotes: 3
Selected Answer: A I did a test; the message “group cannot be owner” is displayed.
Comment 1141682 by vctrhugo
- Upvotes: 1
Selected Answer: A In Databricks, each job must have exactly one owner, which is typically the user who created the job. This “Owner” privilege allows the user to perform any action on the job, including modifying its settings or deleting it. However, this privilege cannot be assigned to a group. If you want to allow multiple users or a group of users to manage a job, you can use ACLs (Access Control Lists) to grant them the necessary permissions. But the “Owner” privilege will still remain with the individual user who created the job.
Comment 1053495 by sturcu
- Upvotes: 4
Selected Answer: A Correct A job cannot have more than one owner. A job cannot have a group as an owner
Question dsw2t6OuXp5324bZGsmK
Question
All records from an Apache Kafka producer are being ingested into a single Delta Lake table with the following schema:
key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG
There are 5 unique topics being ingested. Only the “registration” topic contains Personally Identifiable Information (PII). The company wishes to restrict access to PII. The company also wishes to retain records containing PII in this table for only 14 days after initial ingestion. Non-PII records, however, should be retained indefinitely.
Which of the following solutions meets the requirements?
Choices
- A: All data should be deleted biweekly; Delta Lake’s time travel functionality should be leveraged to maintain a history of non-PII information.
- B: Data should be partitioned by the registration field, allowing ACLs and delete statements to be set for the PII directory.
- C: Because the value field is stored as binary data, this information is not considered PII and no special precautions should be taken.
- D: Separate object storage containers should be specified based on the partition field, allowing isolation at the storage level.
- E: Data should be partitioned by the topic field, allowing ACLs and delete statements to leverage partition boundaries.
Answer: E (Answer_ET: E; community vote: E 88%, other 8%)
Discussion
Comment 1056080 by mouad_attaqi
- Upvotes: 13
Selected Answer: E I think answer E is correct: by default, partitioning by a column creates a separate folder for each subset of data belonging to that partition value
Comment 1322320 by benni_ale
- Upvotes: 2
Selected Answer: E Partitioning by the topic field lets delete queries leverage partition boundaries
Comment 1301420 by benni_ale
- Upvotes: 1
Selected Answer: E E E E E E
Comment 1150989 by ojudz08
- Upvotes: 1
Selected Answer: D I think it’s best to isolate the storage to avoid mistakenly deleting tables in the same storage, so I go with D
Comment 1131758 by spaceexplorer
- Upvotes: 1
Selected Answer: E E is correct
Comment 1105054 by ervinshang
- Upvotes: 2
Selected Answer: E E is correct
Comment 1076677 by aragorn_brego
- Upvotes: 2
Selected Answer: E Partitioning data by the topic field would allow the data engineering team to apply access control lists (ACLs) to restrict access to the partition containing the “registration” topic, which holds PII. Furthermore, the team can set up automated deletion policies that specifically target the partition with PII data to delete records after 14 days, without affecting the data in other partitions. This approach meets both the privacy requirements for PII and the data retention goals for non-PII information.
Comment 1066299 by Dileepvikram
- Upvotes: 3
I think answer is E
Comment 1056614 by [Removed]
- Upvotes: 1
Selected Answer: B The solution that meets the requirements is: B. Data should be partitioned by the registration field, allowing ACLs and delete statements to be set for the PII directory.
Partitioning the data by the registration field allows the directory containing PII records to be isolated and access restricted via ACLs. Additionally, the data retention requirements can be met by setting up a separate job or process to remove PII records that are 14 days old. For non-PII records, they can be retained indefinitely utilizing Delta Lake’s time travel functionality.
Comment 1053501 by sturcu
- Upvotes: 1
Selected Answer: D Correct
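The retention logic behind option E boils down to a delete that targets only the registration partition. In Delta this would be a `DELETE FROM ... WHERE topic = 'registration' AND timestamp < cutoff`, which prunes to that partition's files. Here is a plain-Python model of the same idea (the data, field names, and `apply_retention` helper are all hypothetical, for illustration only):

```python
from datetime import datetime, timedelta, timezone

# Toy model: records grouped by their partition value (the Kafka topic).
now = datetime(2024, 6, 15, tzinfo=timezone.utc)
partitions = {
    "registration": [  # PII: retain only 14 days
        {"offset": 1, "ingested": now - timedelta(days=30)},
        {"offset": 2, "ingested": now - timedelta(days=3)},
    ],
    "clickstream": [   # non-PII: retain indefinitely
        {"offset": 9, "ingested": now - timedelta(days=400)},
    ],
}

def apply_retention(parts, pii_topic="registration", days=14):
    """Delete along the partition boundary: only the PII partition is touched."""
    cutoff = now - timedelta(days=days)
    parts[pii_topic] = [r for r in parts[pii_topic] if r["ingested"] >= cutoff]
    return parts

apply_retention(partitions)
print(len(partitions["registration"]))  # 1: the 30-day-old PII record is gone
print(len(partitions["clickstream"]))   # 1: non-PII data is untouched
```

Because the delete predicate aligns with the partition boundary, non-PII partitions are never rewritten, and the same boundary is what ACLs can be scoped to.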
Question TOyKVMPJNKo6t2yskOZE
Question
The data architect has decided that once data has been ingested from external sources into the Databricks Lakehouse, table access controls will be leveraged to manage permissions for all production tables and views.
The following logic was executed to grant privileges for interactive queries on a production database to the core engineering group.
GRANT USAGE ON DATABASE prod TO eng;
GRANT SELECT ON DATABASE prod TO eng;
Assuming these are the only privileges that have been granted to the eng group and that these users are not workspace administrators, which statement describes their privileges?
Choices
- A: Group members have full permissions on the prod database and can also assign permissions to other users or groups.
- B: Group members are able to list all tables in the prod database but are not able to see the results of any queries on those tables.
- C: Group members are able to query and modify all tables and views in the prod database, but cannot create new tables or views.
- D: Group members are able to query all tables and views in the prod database, but cannot create or edit anything in the database.
- E: Group members are able to create, query, and modify all tables and views in the prod database, but cannot define custom functions.
Answer: D (Answer_ET: D; community vote: 100% D)
Discussion
Comment 1053503 by sturcu
- Upvotes: 6
Selected Answer: D Usage and Select: basically, they can only select
Comment 1306946 by benni_ale
- Upvotes: 1
Selected Answer: D D is ok
Comment 1160226 by Curious76
- Upvotes: 1
Selected Answer: D D is correct
Comment 1141673 by vctrhugo
- Upvotes: 1
Selected Answer: D The GRANT statements provided in the logic grant the USAGE privilege, allowing the group members to see the existence of the database, and the SELECT privilege, allowing them to query tables and views. However, they do not have permissions to create or edit anything in the database. Therefore, the correct description is that group members can query all tables and views in the prod database but cannot create or edit any objects in the database.
Comment 1111476 by divingbell17
- Upvotes: 1
Selected Answer: D D is correct assuming unity catalog is not enabled
Comment 1076686 by aragorn_brego
- Upvotes: 3
Selected Answer: D The GRANT USAGE ON DATABASE statement gives the eng group the ability to access the prod database. This means they can enter the database context and list the tables. The GRANT SELECT ON DATABASE statement additionally grants them permission to perform SELECT queries on all existing tables and views within the prod database. However, these privileges do not include creating new tables or views, modifying existing tables, or assigning permissions to other users or groups.
Comment 1066311 by Dileepvikram
- Upvotes: 4
D is answer
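The privilege logic the commenters describe can be summarized in a tiny model: USAGE gates all access to the database, and beyond that each action needs its own privilege. This is a hypothetical sketch of legacy table-ACL semantics, not Databricks' actual implementation; the `allowed` function and `GRANTS` table are invented for illustration:

```python
# Hypothetical model: privileges granted to the eng group on database prod,
# mirroring the two GRANT statements in the question.
GRANTS = {("eng", "prod"): {"USAGE", "SELECT"}}

def allowed(group, database, action):
    privs = GRANTS.get((group, database), set())
    if "USAGE" not in privs:      # USAGE is a prerequisite for any access
        return False
    required = {
        "query": "SELECT",        # reads on all tables/views in the database
        "modify": "MODIFY",       # not granted here
        "create": "CREATE",       # not granted here
    }[action]
    return required in privs

print(allowed("eng", "prod", "query"))   # True
print(allowed("eng", "prod", "modify"))  # False
print(allowed("eng", "prod", "create"))  # False
```

With only USAGE and SELECT, every query succeeds and every write or DDL attempt is denied, which is option D.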
Question 6fZfhPy9TtiCd3Y1AyoH
Question
A distributed team of data analysts share computing resources on an interactive cluster with autoscaling configured. In order to better manage costs and query throughput, the workspace administrator is hoping to evaluate whether cluster upscaling is caused by many concurrent users or resource-intensive queries.
In which location can one review the timeline for cluster resizing events?
Choices
- A: Workspace audit logs
- B: Driver’s log file
- C: Ganglia
- D: Cluster Event Log
- E: Executor’s log file
Answer: D (Answer_ET: D; community vote: 100% D)
Discussion
Comment 1160227 by Curious76
- Upvotes: 3
Selected Answer: D The Cluster Event Log provides detailed information about various events affecting the cluster throughout its lifecycle, including cluster creation, restarts, termination, and resizing events. It displays the timestamp, event type (e.g., “CLUSTER_RESIZED”), and relevant details for each event, allowing the administrator to review the timeline for cluster scaling behavior and identify potential patterns related to user activity or resource-intensive queries.
Comment 1141672 by vctrhugo
- Upvotes: 1
Selected Answer: D The timeline for cluster resizing events can be reviewed in the Cluster Event Log. This log provides information about cluster scaling events, including when the cluster is scaled up or down. You can access this information to understand the reasons behind autoscaling events and whether they are triggered by many concurrent users or resource-intensive queries.
Comment 1100375 by alexvno
- Upvotes: 2
Selected Answer: D Cluster event log
Comment 1076691 by aragorn_brego
- Upvotes: 2
Selected Answer: D The Cluster Event Log in Databricks will show the timeline for cluster resizing events, including details about when and why a cluster was resized (scaled up or down). This log would help the workspace administrator determine the causes of cluster scaling, whether due to many concurrent users submitting jobs or a few users running resource-intensive queries.
Less suitable: C. Ganglia provides system-level performance metrics, such as CPU and memory usage, but does not log specific cluster scaling events.
Comment 1063455 by PearApple
- Upvotes: 2
cluster event log. D
Comment 1053510 by sturcu
- Upvotes: 3
Selected Answer: D Cluster Event Log