Questions and Answers

Question J4ES6Y9oBmN6AfxQugWe

Question

A CHECK constraint has been successfully added to the Delta table named activity_details using the following logic:

//IMG//

A batch job is attempting to insert new records into the table, including a record where latitude = 45.50 and longitude = 212.67.

Which statement describes the outcome of this batch insert?

Choices

  • A: The write will insert all records except those that violate the table constraints; the violating records will be reported in a warning log.
  • B: The write will fail completely because of the constraint violation and no records will be inserted into the target table.
  • C: The write will insert all records except those that violate the table constraints; the violating records will be recorded to a quarantine table.
  • D: The write will include all records in the target table; any violations will be indicated in the boolean column named valid_coordinates.
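
For context, the actual constraint logic is in the image above; the latitude/longitude range check below is only a plausible reconstruction, and spark is assumed to be a Databricks notebook session. Delta Lake enforces CHECK constraints transactionally, so the sketch shows the whole batch being rejected:

    # Hypothetical reconstruction of the CHECK constraint shown in the image.
    spark.sql("""
        ALTER TABLE activity_details ADD CONSTRAINT valid_coordinates
        CHECK (latitude BETWEEN -90 AND 90 AND longitude BETWEEN -180 AND 180)
    """)

    # If any row in a batch violates the constraint, the entire write fails
    # atomically and no records are committed to the table.
    try:
        spark.sql("INSERT INTO activity_details (latitude, longitude) VALUES (45.50, 212.67)")
    except Exception as e:
        print(f"Batch rejected atomically: {e}")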

Question Hr4Dz89YlXSHYdrmYbUi

Question

A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.

Which consideration will impact the decisions made by the engineer while migrating this workload?

Choices

  • A: Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly parallel writes.
  • B: Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake’s upsert functionality.
  • C: Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.
  • D: All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.
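
For context, Databricks (with Unity Catalog) can record primary and foreign key constraints, but they are informational only and are not enforced on write, and each Delta transaction commits against a single table. A minimal sketch, assuming a Unity Catalog-enabled workspace and hypothetical table and column names:

    # Informational constraints: declared for documentation and lineage,
    # NOT enforced by Databricks. Assumes a users table with a declared
    # primary key already exists.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS orders (
            order_id BIGINT NOT NULL,
            user_id  BIGINT,
            CONSTRAINT orders_pk PRIMARY KEY (order_id),
            CONSTRAINT orders_users_fk FOREIGN KEY (user_id) REFERENCES users
        )
    """)

    # An insert that violates the foreign key still succeeds, so referential
    # integrity must be validated in the pipeline itself (e.g., an anti-join
    # check before the write); multi-table inserts must become separate,
    # single-table transactions.
    spark.sql("INSERT INTO orders VALUES (1, 999999)")  # no matching users row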

Question CLxwEQov5ZRylfNY7np3

Question

A table is registered with the following code: //IMG//

Both users and orders are Delta Lake tables. Which statement describes the results of querying recent_orders?

Choices

  • A: All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query finishes.
  • B: All logic will execute when the table is defined and store the result of joining tables to the DBFS; this stored data will be returned when the table is queried.
  • C: Results will be computed and cached when the table is defined; these cached results will incrementally update as new records are inserted into source tables.
  • D: All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query began.
  • E: The versions of each source table will be stored in the table transaction log; query results will be saved to DBFS with each query.
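
For context, the registration code is in the image, but if recent_orders is defined as a view (the column names below are hypothetical), only the query text is stored at definition time and the join logic runs whenever the view is queried:

    # Hypothetical sketch of a view over the two Delta tables; nothing is
    # materialized when the view is defined, only the query text is saved.
    spark.sql("""
        CREATE OR REPLACE VIEW recent_orders AS
        SELECT u.user_id, o.order_id, o.order_ts
        FROM orders AS o
        JOIN users AS u ON o.user_id = u.user_id
        WHERE o.order_ts >= current_date() - INTERVAL 7 DAYS
    """)

    # Each query plans against the snapshot of users and orders that is
    # valid when the query begins, then executes the join from scratch.
    spark.sql("SELECT * FROM recent_orders").show()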

Question ijJAo4zKLjRIVcGS1dyq

Question

A data architect has heard about Delta Lake’s built-in versioning and time travel capabilities. For auditing purposes, they have a requirement to maintain a full record of all valid street addresses as they appear in the customers table.

The architect is interested in implementing a Type 1 table, overwriting existing records with new values and relying on Delta Lake time travel to support long-term auditing. A data engineer on the project feels that a Type 2 table will provide better performance and scalability.

Which piece of information is critical to this decision?

Choices

  • A: Data corruption can occur if a query fails in a partially completed state because Type 2 tables require setting multiple fields in a single update.
  • B: Shallow clones can be combined with Type 1 tables to accelerate historic queries for long-term versioning.
  • C: Delta Lake time travel cannot be used to query previous versions of these tables because Type 1 changes modify data files in place.
  • D: Delta Lake time travel does not scale well in cost or latency to provide a long-term versioning solution.
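
For context, the trade-off hinges on how time travel is implemented: old table versions stay queryable only while their underlying data files are retained, and VACUUM (with a default retention of 7 days) permanently removes them. A minimal sketch; the version and timestamp values are illustrative:

    # Time travel reads an older snapshot from retained data files.
    spark.sql("SELECT * FROM customers VERSION AS OF 42")             # illustrative version
    spark.sql("SELECT * FROM customers TIMESTAMP AS OF '2024-01-01'")

    # VACUUM deletes files outside the retention window; any version that
    # depends on them becomes unqueryable. Retaining years of rewritten files
    # grows storage cost and query latency, which is why time travel is not a
    # durable long-term audit mechanism the way an explicit Type 2 table is.
    spark.sql("VACUUM customers RETAIN 168 HOURS")  # 168 hours = the 7-day default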

Question oXEkWwwf3UP2Rllio0jH

Question

A data engineer wants to join a stream of advertisement impressions (when an ad was shown) with another stream of user clicks on advertisements to correlate when impressions led to monetizable clicks.

In the code below, Impressions is a streaming DataFrame with a watermark applied via withWatermark("event_time", "10 minutes"):

//IMG//

The data engineer notices the query slowing down significantly.

Which solution would improve the performance?

Choices

  • A: Joining on event time constraint: clickTime >= impressionTime AND clickTime <= impressionTime + interval 1 hour
  • B: Joining on event time constraint: clickTime + 3 hours < impressionTime - 2 hours
  • C: Joining on event time constraint: clickTime == impressionTime using a leftOuter join
  • D: Joining on event time constraint: clickTime >= impressionTime - interval 3 hours and removing watermarks
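
For context, this is the standard Spark Structured Streaming stream-stream join pattern: watermarks on both sides plus a time-range join condition let Spark expire buffered state instead of holding every impression indefinitely, which is the usual cause of this slowdown. A minimal sketch adapted from that pattern; the sources and column names are hypothetical, and spark is assumed to be a notebook session:

    from pyspark.sql.functions import expr

    # Hypothetical streaming sources; any event-time streams work here.
    impressions = (spark.readStream.format("rate").load()
                   .selectExpr("value AS adId", "timestamp AS impressionTime"))
    clicks = (spark.readStream.format("rate").load()
              .selectExpr("value AS adId", "timestamp AS clickTime"))

    # Watermarks bound how late data may arrive on each side.
    impressionsWm = impressions.withWatermark("impressionTime", "10 minutes")
    clicksWm = clicks.withWatermark("clickTime", "20 minutes")

    # The event-time range constraint lets Spark drop impressions more than
    # one hour old from state, keeping the join's memory footprint bounded.
    joined = impressionsWm.join(
        clicksWm.withColumnRenamed("adId", "clickAdId"),
        expr("""
            adId = clickAdId AND
            clickTime >= impressionTime AND
            clickTime <= impressionTime + interval 1 hour
        """)
    )

    query = joined.writeStream.format("console").start()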