Questions and Answers
Question TkgrjU4LH7WTUo7kTbgX
Question
A CHECK constraint has been successfully added to the Delta table named activity_details using the following logic:
[Image: CHECK constraint definition not shown]
A batch job is attempting to insert new records into the table, including a record where latitude = 45.50 and longitude = 212.67.
Which statement describes the outcome of this batch insert?
Choices
- A: The write will fail when the violating record is reached; any records previously processed will be recorded to the target table.
- B: The write will fail completely because of the constraint violation and no records will be inserted into the target table.
- C: The write will insert all records except those that violate the table constraints; the violating records will be recorded to a quarantine table.
- D: The write will include all records in the target table; any violations will be indicated in the boolean column named valid_coordinates.
- E: The write will insert all records except those that violate the table constraints; the violating records will be reported in a warning log.
Answer: B (Answer_ET: B; community vote: 100% B)
Discussion
Comment 1076666 by aragorn_brego
- Upvotes: 5
Selected Answer: B In systems that support atomic transactions, such as Delta Lake, when a batch operation encounters a record that violates a CHECK constraint, the entire operation fails, and no records are inserted, including those that do not violate the constraint. This is to ensure the atomicity of the transaction, meaning that either all the changes are committed, or none are, maintaining data integrity. The record with a longitude of 212.67 violates the constraint because longitude values must be between -180 and 180 degrees.
Comment 1141684 by vctrhugo
- Upvotes: 5
Selected Answer: B In Delta Lake, when a batch job attempts to insert records into a table that has a CHECK constraint, if any record violates the constraint, the entire write operation fails. This is because Delta Lake enforces strong transactional guarantees, which means that either all changes in a transaction are saved, or none are.
Comment 1131751 by spaceexplorer
- Upvotes: 1
Selected Answer: B B is correct
Comment 1066288 by Dileepvikram
- Upvotes: 4
B is the answer
Comment 1053489 by sturcu
- Upvotes: 4
Selected Answer: B B is the answer
Comment 1049724 by PearApple
- Upvotes: 3
B is the ans
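The atomic-failure behavior behind option B can be reproduced in any transactional store that enforces CHECK constraints. Below is a minimal sketch using SQLite as a stand-in for Delta Lake; the table name and coordinate bounds mirror the question, but the exact Delta constraint expression is not shown in the original image, so the CHECK clauses here are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activity_details (
        latitude  REAL CHECK (latitude  BETWEEN -90  AND 90),
        longitude REAL CHECK (longitude BETWEEN -180 AND 180)
    )
""")

# The last record violates the longitude constraint (212.67 > 180).
batch = [(45.50, -122.68), (40.71, -74.01), (45.50, 212.67)]

try:
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO activity_details VALUES (?, ?)", batch)
except sqlite3.IntegrityError as e:
    print("batch failed:", e)

# The transaction was rolled back: no rows were inserted, not even the valid ones.
count = conn.execute("SELECT COUNT(*) FROM activity_details").fetchone()[0]
print(count)  # 0
```

The valid rows processed before the violation are rolled back along with it, which is exactly why option A is wrong and B is right.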
Question mEBacLIbj9unFhLMHvNr
Question
A junior data engineer has manually configured a series of jobs using the Databricks Jobs UI. Upon reviewing their work, the engineer realizes that they are listed as the “Owner” for each job. They attempt to transfer “Owner” privileges to the “DevOps” group, but cannot successfully accomplish this task.
Which statement explains what is preventing this privilege transfer?
Choices
- A: Databricks jobs must have exactly one owner; “Owner” privileges cannot be assigned to a group.
- B: The creator of a Databricks job will always have “Owner” privileges; this configuration cannot be changed.
- C: Other than the default “admins” group, only individual users can be granted privileges on jobs.
- D: A user can only transfer job ownership to a group if they are also a member of that group.
- E: Only workspace administrators can grant “Owner” privileges to a group.
Answer: A (Answer_ET: A; community vote: 100% A)
Discussion
Comment 1166171 by hal2401me
- Upvotes: 3
Selected Answer: A I did a test; the message “group cannot be owner” is displayed.
Comment 1141682 by vctrhugo
- Upvotes: 1
Selected Answer: A In Databricks, each job must have exactly one owner, which is typically the user who created the job. This “Owner” privilege allows the user to perform any action on the job, including modifying its settings or deleting it. However, this privilege cannot be assigned to a group. If you want to allow multiple users or a group of users to manage a job, you can use ACLs (Access Control Lists) to grant them the necessary permissions. But the “Owner” privilege will still remain with the individual user who created the job.
Comment 1053495 by sturcu
- Upvotes: 4
Selected Answer: A Correct A job cannot have more than one owner. A job cannot have a group as an owner
Question dsw2t6OuXp5324bZGsmK
Question
All records from an Apache Kafka producer are being ingested into a single Delta Lake table with the following schema:
key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG
There are 5 unique topics being ingested. Only the “registration” topic contains Personally Identifiable Information (PII). The company wishes to restrict access to PII. The company also wishes to retain records containing PII in this table for only 14 days after initial ingestion. Non-PII records, however, should be retained indefinitely.
Which of the following solutions meets the requirements?
Choices
- A: All data should be deleted biweekly; Delta Lake’s time travel functionality should be leveraged to maintain a history of non-PII information.
- B: Data should be partitioned by the registration field, allowing ACLs and delete statements to be set for the PII directory.
- C: Because the value field is stored as binary data, this information is not considered PII and no special precautions should be taken.
- D: Separate object storage containers should be specified based on the partition field, allowing isolation at the storage level.
- E: Data should be partitioned by the topic field, allowing ACLs and delete statements to leverage partition boundaries.
Answer: E (Answer_ET: E; community vote: E 88%, other 8%)
Discussion
Comment 1056080 by mouad_attaqi
- Upvotes: 13
Selected Answer: E I think answer E is correct: by default, partitioning by a column creates a separate folder for each subset of data belonging to that partition value
Comment 1322320 by benni_ale
- Upvotes: 2
Selected Answer: E Partitioning by the topic field lets delete queries leverage partition boundaries
Comment 1301420 by benni_ale
- Upvotes: 1
Selected Answer: E E E E E E
Comment 1150989 by ojudz08
- Upvotes: 1
Selected Answer: D I think it’s best to isolate the storage to avoid mistakenly deleting tables in the same storage, so I go with D
Comment 1131758 by spaceexplorer
- Upvotes: 1
Selected Answer: E E is correct
Comment 1105054 by ervinshang
- Upvotes: 2
Selected Answer: E E is correct
Comment 1076677 by aragorn_brego
- Upvotes: 2
Selected Answer: E Partitioning data by the topic field would allow the data engineering team to apply access control lists (ACLs) to restrict access to the partition containing the “registration” topic, which holds PII. Furthermore, the team can set up automated deletion policies that specifically target the partition with PII data to delete records after 14 days, without affecting the data in other partitions. This approach meets both the privacy requirements for PII and the data retention goals for non-PII information.
Comment 1066299 by Dileepvikram
- Upvotes: 3
I think answer is E
Comment 1056614 by [Removed]
- Upvotes: 1
Selected Answer: B The solution that meets the requirements is: B. Data should be partitioned by the registration field, allowing ACLs and delete statements to be set for the PII directory.
Partitioning the data by the registration field allows the directory containing PII records to be isolated and access restricted via ACLs. Additionally, the data retention requirements can be met by setting up a separate job or process to remove PII records that are 14 days old. For non-PII records, they can be retained indefinitely utilizing Delta Lake’s time travel functionality.
Comment 1053501 by sturcu
- Upvotes: 1
Selected Answer: D Correct
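The retention logic behind option E boils down to a delete that targets only the registration partition. In Delta this would be a `DELETE FROM ... WHERE topic = 'registration' AND timestamp < cutoff`, which prunes to that partition's files. Here is a plain-Python model of the same idea (the data, field names, and `apply_retention` helper are all hypothetical, for illustration only):

```python
from datetime import datetime, timedelta, timezone

# Toy model: records grouped by their partition value (the Kafka topic).
now = datetime(2024, 6, 15, tzinfo=timezone.utc)
partitions = {
    "registration": [  # PII: retain only 14 days
        {"offset": 1, "ingested": now - timedelta(days=30)},
        {"offset": 2, "ingested": now - timedelta(days=3)},
    ],
    "clickstream": [   # non-PII: retain indefinitely
        {"offset": 9, "ingested": now - timedelta(days=400)},
    ],
}

def apply_retention(parts, pii_topic="registration", days=14):
    """Delete along the partition boundary: only the PII partition is touched."""
    cutoff = now - timedelta(days=days)
    parts[pii_topic] = [r for r in parts[pii_topic] if r["ingested"] >= cutoff]
    return parts

apply_retention(partitions)
print(len(partitions["registration"]))  # 1: the 30-day-old PII record is gone
print(len(partitions["clickstream"]))   # 1: non-PII data is untouched
```

Because the delete predicate aligns with the partition boundary, non-PII partitions are never rewritten, and the same boundary is what ACLs can be scoped to.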
Question TOyKVMPJNKo6t2yskOZE
Question
The data architect has decided that once data has been ingested from external sources into the Databricks Lakehouse, table access controls will be leveraged to manage permissions for all production tables and views.
The following logic was executed to grant privileges for interactive queries on a production database to the core engineering group.
GRANT USAGE ON DATABASE prod TO eng;
GRANT SELECT ON DATABASE prod TO eng;
Assuming these are the only privileges that have been granted to the eng group and that these users are not workspace administrators, which statement describes their privileges?
Choices
- A: Group members have full permissions on the prod database and can also assign permissions to other users or groups.
- B: Group members are able to list all tables in the prod database but are not able to see the results of any queries on those tables.
- C: Group members are able to query and modify all tables and views in the prod database, but cannot create new tables or views.
- D: Group members are able to query all tables and views in the prod database, but cannot create or edit anything in the database.
- E: Group members are able to create, query, and modify all tables and views in the prod database, but cannot define custom functions.
Answer: D (Answer_ET: D; community vote: 100% D)
Discussion
Comment 1053503 by sturcu
- Upvotes: 6
Selected Answer: D Usage and Select: basically, they can only select
Comment 1306946 by benni_ale
- Upvotes: 1
Selected Answer: D D is ok
Comment 1160226 by Curious76
- Upvotes: 1
Selected Answer: D D is correct
Comment 1141673 by vctrhugo
- Upvotes: 1
Selected Answer: D The GRANT statements provided in the logic grant the USAGE privilege, allowing the group members to see the existence of the database, and the SELECT privilege, allowing them to query tables and views. However, they do not have permissions to create or edit anything in the database. Therefore, the correct description is that group members can query all tables and views in the prod database but cannot create or edit any objects in the database.
Comment 1111476 by divingbell17
- Upvotes: 1
Selected Answer: D D is correct assuming unity catalog is not enabled
Comment 1076686 by aragorn_brego
- Upvotes: 3
Selected Answer: D The GRANT USAGE ON DATABASE statement gives the eng group the ability to access the prod database. This means they can enter the database context and list the tables. The GRANT SELECT ON DATABASE statement additionally grants them permission to perform SELECT queries on all existing tables and views within the prod database. However, these privileges do not include creating new tables or views, modifying existing tables, or assigning permissions to other users or groups.
Comment 1066311 by Dileepvikram
- Upvotes: 4
D is answer
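The privilege logic the commenters describe can be summarized in a tiny model: USAGE gates all access to the database, and beyond that each action needs its own privilege. This is a hypothetical sketch of legacy table-ACL semantics, not Databricks' actual implementation; the `allowed` function and `GRANTS` table are invented for illustration:

```python
# Hypothetical model: privileges granted to the eng group on database prod,
# mirroring the two GRANT statements in the question.
GRANTS = {("eng", "prod"): {"USAGE", "SELECT"}}

def allowed(group, database, action):
    privs = GRANTS.get((group, database), set())
    if "USAGE" not in privs:      # USAGE is a prerequisite for any access
        return False
    required = {
        "query": "SELECT",        # reads on all tables/views in the database
        "modify": "MODIFY",       # not granted here
        "create": "CREATE",       # not granted here
    }[action]
    return required in privs

print(allowed("eng", "prod", "query"))   # True
print(allowed("eng", "prod", "modify"))  # False
print(allowed("eng", "prod", "create"))  # False
```

With only USAGE and SELECT, every query succeeds and every write or DDL attempt is denied, which is option D.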
Question 6fZfhPy9TtiCd3Y1AyoH
Question
A distributed team of data analysts share computing resources on an interactive cluster with autoscaling configured. In order to better manage costs and query throughput, the workspace administrator is hoping to evaluate whether cluster upscaling is caused by many concurrent users or resource-intensive queries.
In which location can one review the timeline for cluster resizing events?
Choices
- A: Workspace audit logs
- B: Driver’s log file
- C: Ganglia
- D: Cluster Event Log
- E: Executor’s log file
Answer: D (Answer_ET: D; community vote: 100% D)
Discussion
Comment 1160227 by Curious76
- Upvotes: 3
Selected Answer: D The Cluster Event Log provides detailed information about various events affecting the cluster throughout its lifecycle, including cluster creation, restarts, termination, and resizing events. It displays the timestamp, event type (e.g., “CLUSTER_RESIZED”), and relevant details for each event, allowing the administrator to review the timeline for cluster scaling behavior and identify potential patterns related to user activity or resource-intensive queries.
Comment 1141672 by vctrhugo
- Upvotes: 1
Selected Answer: D The timeline for cluster resizing events can be reviewed in the Cluster Event Log. This log provides information about cluster scaling events, including when the cluster is scaled up or down. You can access this information to understand the reasons behind autoscaling events and whether they are triggered by many concurrent users or resource-intensive queries.
Comment 1100375 by alexvno
- Upvotes: 2
Selected Answer: D Cluster event log
Comment 1076691 by aragorn_brego
- Upvotes: 2
Selected Answer: D The Cluster Event Log in Databricks will show the timeline for cluster resizing events, including details about when and why a cluster was resized (scaled up or down). This log would help the workspace administrator determine the causes of cluster scaling, whether due to many concurrent users submitting jobs or a few users running resource-intensive queries.
Less suitable: C. Ganglia provides system-level performance metrics, such as CPU and memory usage, but does not log specific cluster scaling events.
Comment 1063455 by PearApple
- Upvotes: 2
cluster event log. D
Comment 1053510 by sturcu
- Upvotes: 3
Selected Answer: D Cluster Event Log