Questions and Answers
Question J4ES6Y9oBmN6AfxQugWe
Question
A CHECK constraint has been successfully added to the Delta table named activity_details using the following logic:
//IMG//
A batch job is attempting to insert new records to the table, including a record where latitude = 45.50 and longitude = 212.67.
Which statement describes the outcome of this batch insert?
Choices
- A: The write will insert all records except those that violate the table constraints; the violating records will be reported in a warning log.
- B: The write will fail completely because of the constraint violation and no records will be inserted into the target table.
- C: The write will insert all records except those that violate the table constraints; the violating records will be recorded to a quarantine table.
- D: The write will include all records in the target table; any violations will be indicated in the boolean column named valid_coordinates.
answer?
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1251670 by vexor3
- Upvotes: 2
Selected Answer: B B is correct
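The all-or-nothing behaviour in answer B can be sketched locally. The snippet below is only an analogy: it uses Python's built-in sqlite3 module rather than Delta Lake, and because the real constraint is in the image above and is not reproduced here, the range check (latitude within -90..90, longitude within -180..180) and the constraint name are assumptions. The batch is written inside one transaction, so the single violating row causes the whole batch to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed constraint logic: the original is in an image and is not shown.
conn.execute("""
    CREATE TABLE activity_details (
        latitude  REAL,
        longitude REAL,
        CONSTRAINT valid_coordinates CHECK (
            latitude BETWEEN -90 AND 90 AND longitude BETWEEN -180 AND 180
        )
    )
""")
conn.commit()

# Hypothetical batch; the second row is the violating record from the question.
batch = [(10.0, 20.0), (45.50, 212.67), (-33.9, 151.2)]

try:
    # "with conn" commits on success and rolls back if an exception escapes,
    # which stands in for a single atomic Delta write.
    with conn:
        conn.executemany("INSERT INTO activity_details VALUES (?, ?)", batch)
    batch_failed = False
except sqlite3.IntegrityError:
    batch_failed = True

row_count = conn.execute("SELECT COUNT(*) FROM activity_details").fetchone()[0]
print(batch_failed, row_count)
```

Neither the valid rows nor the invalid one end up in the table, which matches the outcome described in B.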
Question Hr4Dz89YlXSHYdrmYbUi
Question
A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.
Which consideration will impact the decisions made by the engineer while migrating this workload?
Choices
- A: Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly-parallel writes.
- B: Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake’s upsert functionality.
- C: Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.
- D: All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.
answer?
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1366220 by lakime
- Upvotes: 1
Selected Answer: D Yes, D is correct, not enforced
Comment 1341657 by RandomForest
- Upvotes: 1
Selected Answer: D D is correct as primary and foreign keys are informational only and are not enforced in databricks https://learn.microsoft.com/en-us/azure/databricks/tables/constraints
Comment 1229315 by hpkr
- Upvotes: 3
Selected Answer: D D is correct
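Since foreign keys are informational only (answer D), referential checks have to be done by the pipeline itself. A minimal sketch of one way to do that, an anti-join that lists fact rows with no matching dimension row, is shown below. It runs on Python's built-in sqlite3 with foreign key enforcement switched off as a stand-in for unenforced keys; the table and column names (dim_customer, fact_orders) are hypothetical and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Turn enforcement off so the declared key is informational only,
# which is the situation described in answer D.
conn.execute("PRAGMA foreign_keys = OFF")

conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id)
    );
    INSERT INTO dim_customer VALUES (1, 'Ada');
    -- Order 2 points at a customer that does not exist, yet the insert succeeds.
    INSERT INTO fact_orders VALUES (1, 1), (2, 99);
""")

# Anti-join: fact rows whose key has no match in the dimension table.
orphans = conn.execute("""
    SELECT f.order_id, f.customer_id
    FROM fact_orders AS f
    LEFT JOIN dim_customer AS d ON f.customer_id = d.customer_id
    WHERE d.customer_id IS NULL
""").fetchall()
print(orphans)
```

A check like this would typically run before or after the write, since the engine will not reject the orphaned row on its own.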
Question CLxwEQov5ZRylfNY7np3
Question
A table is registered with the following code: //IMG//
Both users and orders are Delta Lake tables. Which statement describes the results of querying recent_orders?
Choices
- A: All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query finishes.
- B: All logic will execute when the table is defined and store the result of joining tables to the DBFS; this stored data will be returned when the table is queried.
- C: Results will be computed and cached when the table is defined; these cached results will incrementally update as new records are inserted into source tables.
- D: All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query began.
- E: The versions of each source table will be stored in the table transaction log; query results will be saved to DBFS with each query.
answer?
Answer: B Answer_ET: B Community answer B (55%) D (45%) Discussion
Comment 969950 by asmayassineg
- Upvotes: 18
Correct answer is B. table is created and data of join will be stored on DBFS and it will be returned on query time
Comment 1267921 by practicioner
- Upvotes: 18
Selected Answer: D The question says “Which statement describes the results of querying recent_orders?”
The question doesn’t ask about the code snippet itself. This question is about the logic of “select * from recent_orders” after the creation of recent_orders.
The answer is D.
Comment 1409598 by lakime
- Upvotes: 1
Selected Answer: D It’s CTAS, nothing is being stored in DBFS…
Comment 1362380 by Tedet
- Upvotes: 1
Selected Answer: B It’s CTAS. So snapshot at the time of creation will be returned.
Comment 1360165 by ptty
- Upvotes: 2
Selected Answer: B It’s B because this is different from creating a view (which would use CREATE VIEW instead), where the query logic would be executed each time the view is accessed.
Comment 1334761 by arekm
- Upvotes: 1
Selected Answer: B B - it is a table.
Query time answers assume we are talking about a view, which we aren’t. Table is not automatically updated whenever the tables used in CTAS change - it is a standalone entity.
Comment 1325428 by Sriramiyer92
- Upvotes: 1
Selected Answer: B The answer is B. Pls note it is CTAS statement and not a subquery.
Comment 1300787 by nedlo
- Upvotes: 1
Selected Answer: B The only logic is inside the create statement, and it executes once, when the “create table” statement runs. Later select queries only return the data that was inserted during the create table statement; the data won’t be updated automatically. So B
Comment 1298701 by benni_ale
- Upvotes: 1
i think B
Comment 1297480 by benni_ale
- Upvotes: 1
Selected Answer: B i picked b
Comment 1226602 by Isio05
- Upvotes: 2
Selected Answer: B CTAS statements persist their results, so B
Comment 1222461 by imatheushenrique
- Upvotes: 2
B. All logic will execute when the table is defined and store the result of joining tables to the DBFS; this stored data will be returned when the table is queried.
Comment 1213873 by coercion
- Upvotes: 2
Selected Answer: B “Create Table” is an action so “B”
Comment 1145238 by PrashantTiwari
- Upvotes: 1
B is correct
Comment 1118588 by kz_data
- Upvotes: 1
Selected Answer: B I think B is the correct answer
Comment 1117208 by IWantCerts
- Upvotes: 1
Selected Answer: B B is correct. Views compute when query is executed, not when defined. And vice versa for tables.
Comment 1114351 by cryptoflam
- Upvotes: 1
Selected Answer: B Key here is that option D says “returned”. The CTAS statement does not return results, thus option B is correct.
Comment 1075942 by aragorn_brego
- Upvotes: 2
Selected Answer: B The correct answer is:
B. All logic will execute when the table is defined and store the result of joining tables to the DBFS; this stored data will be returned when the table is queried.
When the CREATE TABLE AS statement is executed, it runs the enclosed SELECT statement immediately to pull the current data from the users and orders tables where the order_date is within the last 7 days. This result is then stored as a new table called recent_orders in the Delta Lake on the DBFS (Databricks File System). Subsequent queries against recent_orders will return this stored data, and not recompute the join unless the table is updated or refreshed.
Comment 1060806 by BIKRAM063
- Upvotes: 3
Selected Answer: B Correct is B . CTAS command
Comment 1040304 by sturcu
- Upvotes: 1
Selected Answer: B Creating a table will not display results. You need to run a select after it is created.
Comment 1014147 by Santitoxic
- Upvotes: 1
Based on typical Delta Lake behavior, option D is the most accurate description. Delta Lake queries generally execute at query time and return results based on the state of the source tables at the time the query began. Delta Lake provides features for managing data versions and transactions, but it doesn’t precompute and store results like option B or cache results like option C.
Comment 1004023 by lucasasterio
- Upvotes: 2
Selected Answer: B correct is B
Comment 988303 by robson90
- Upvotes: 2
Aa ok, I missed “logic will execute at query time” ignore my previous comment
Comment 988299 by robson90
- Upvotes: 1
Why not D? Table does not need to be stored on DBFS if using Unity Catalog. At least that’s my understanding https://docs.databricks.com/en/dbfs/unity-catalog.html
Comment 983570 by BrianNguyen95
- Upvotes: 4
correct answer is B
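The table-versus-view distinction that most of the comments rely on can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 as a stand-in, not Databricks, and the join is a simplified assumption because the registered code is in an image and is not reproduced here (the date filter mentioned in one comment is left out). recent_orders_view is a hypothetical name added for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER);
    INSERT INTO users  VALUES (1, 'Ada');
    INSERT INTO orders VALUES (100, 1);

    -- CTAS: the join runs once, now, and its result is stored.
    CREATE TABLE recent_orders AS
        SELECT o.order_id, u.name
        FROM orders AS o JOIN users AS u ON o.user_id = u.user_id;

    -- View: only the query text is stored; the join runs on every read.
    CREATE VIEW recent_orders_view AS
        SELECT o.order_id, u.name
        FROM orders AS o JOIN users AS u ON o.user_id = u.user_id;

    -- A new source row arrives after both objects were defined.
    INSERT INTO orders VALUES (101, 1);
""")

table_rows = conn.execute("SELECT COUNT(*) FROM recent_orders").fetchone()[0]
view_rows = conn.execute("SELECT COUNT(*) FROM recent_orders_view").fetchone()[0]
print(table_rows, view_rows)
```

The table still holds the snapshot taken at creation time, while the view picks up the new order, which is the behaviour the B voters describe.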
Question ijJAo4zKLjRIVcGS1dyq
Question
A data architect has heard about Delta Lake’s built-in versioning and time travel capabilities. For auditing purposes, they have a requirement to maintain a full record of all valid street addresses as they appear in the customers table.
The architect is interested in implementing a Type 1 table, overwriting existing records with new values and relying on Delta Lake time travel to support long-term auditing. A data engineer on the project feels that a Type 2 table will provide better performance and scalability.
Which piece of information is critical to this decision?
Choices
- A: Data corruption can occur if a query fails in a partially completed state because Type 2 tables require setting multiple fields in a single update.
- B: Shallow clones can be combined with Type 1 tables to accelerate historic queries for long-term versioning.
- C: Delta Lake time travel cannot be used to query previous versions of these tables because Type 1 changes modify data files in place.
- D: Delta Lake time travel does not scale well in cost or latency to provide a long-term versioning solution.
answer?
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1229317 by hpkr
- Upvotes: 3
Selected Answer: D correct answer - D
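For readers unfamiliar with the two table types being compared, here is a minimal sketch of the difference, again on Python's built-in sqlite3 rather than Delta Lake. The schema (valid_from, valid_to, is_current) and the change_address helper are hypothetical; they show one common way to lay out a Type 2 table, not the layout used in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers_type1 (customer_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE customers_type2 (
        customer_id INTEGER,
        address     TEXT,
        valid_from  TEXT,
        valid_to    TEXT,      -- NULL while the row is the current one
        is_current  INTEGER
    );
    INSERT INTO customers_type1 VALUES (1, '1 Old Street');
    INSERT INTO customers_type2 VALUES (1, '1 Old Street', '2024-01-01', NULL, 1);
""")


def change_address(customer_id, new_address, changed_on):
    """Apply the same address change to both tables (hypothetical helper)."""
    with conn:
        # Type 1: overwrite in place; the old address is gone from the table.
        conn.execute(
            "UPDATE customers_type1 SET address = ? WHERE customer_id = ?",
            (new_address, customer_id),
        )
        # Type 2: close the current row, then append a new current row.
        conn.execute(
            "UPDATE customers_type2 SET valid_to = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (changed_on, customer_id),
        )
        conn.execute(
            "INSERT INTO customers_type2 VALUES (?, ?, ?, NULL, 1)",
            (customer_id, new_address, changed_on),
        )


change_address(1, "2 New Avenue", "2024-06-01")

type1_addresses = [r[0] for r in conn.execute(
    "SELECT address FROM customers_type1 WHERE customer_id = 1")]
type2_addresses = [r[0] for r in conn.execute(
    "SELECT address FROM customers_type2 WHERE customer_id = 1 ORDER BY valid_from")]
print(type1_addresses, type2_addresses)
```

With Type 2 the address history is ordinary data in the current table version, so an audit query does not depend on older table versions being retained.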
Question oXEkWwwf3UP2Rllio0jH
Question
A data engineer wants to join a stream of advertisement impressions (when an ad was shown) with another stream of user clicks on advertisements to correlate when impressions led to monetizable clicks.
In the code below, Impressions is a streaming DataFrame with a watermark (“event_time”, “10 minutes”)
//IMG//
The data engineer notices the query slowing down significantly.
Which solution would improve the performance?
Choices
- A: Joining on event time constraint: clickTime >= impressionTime AND clickTime <= impressionTime + interval 1 hour
- B: Joining on event time constraint: clickTime + 3 hours < impressionTime - 2 hours
- C: Joining on event time constraint: clickTime == impressionTime using a leftOuter join
- D: Joining on event time constraint: clickTime >= impressionTime - interval 3 hours and removing watermarks
answer?
Answer: A Answer_ET: A Community answer A (100%) Discussion
Comment 1300588 by m79590530
- Upvotes: 2
Selected Answer: A Answer A is the only logically possible one. B requires clickTime to be earlier than impressionTime. C requires clickTime to equal impressionTime, with all clicks left-joined to impressions. D removes the watermarks, which lets the state Spark keeps for both streams grow without bound and eventually exhaust memory.
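Why the bounded time range in A helps can be sketched in plain Python. This is not Spark code and does not use the Structured Streaming API; matches and can_evict_impression are hypothetical helpers, and the eviction rule is a simplified assumption that ignores how Spark combines the watermark delays of the two streams. The idea is that once the click side has moved more than one hour past an impression, no future click can satisfy the condition, so that impression no longer needs to be kept in state.

```python
from datetime import datetime, timedelta

# Upper bound taken from option A: a click counts only within 1 hour.
MAX_CLICK_DELAY = timedelta(hours=1)


def matches(impression_time, click_time):
    """Option A's condition: impression <= click <= impression + 1 hour."""
    return impression_time <= click_time <= impression_time + MAX_CLICK_DELAY


def can_evict_impression(impression_time, click_watermark):
    """Simplified rule: no click at or after the watermark can still match."""
    return click_watermark > impression_time + MAX_CLICK_DELAY


impression = datetime(2024, 1, 1, 12, 0)
on_time_click = datetime(2024, 1, 1, 12, 30)
late_click = datetime(2024, 1, 1, 14, 0)

print(matches(impression, on_time_click))   # inside the 1-hour window
print(matches(impression, late_click))      # outside the window
print(can_evict_impression(impression, datetime(2024, 1, 1, 13, 30)))
```

Without an upper bound on the time difference, or without watermarks as in D, there is no point at which an old impression can safely be dropped.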