Questions and Answers
Question 4NKjPBmfrlqrakYe8vIj
Question
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:
//IMG//
The data engineer only wants the query to process all of the available data in as many batches as required.
Which line of code should the data engineer use to fill in the blank?
Choices
- A: trigger(availableNow=True)
- B: trigger(processingTime="once")
- C: trigger(continuous="once")
- D: trigger(once=True)
Answer: A Answer_ET: A Community answer A (100%) Discussion
Comment 1303510 by comoon
- Upvotes: 6
A is correct.
Comment 1315559 by hakimipous
- Upvotes: 1
Selected Answer: A A is correct
Comment 1312105 by lj114
- Upvotes: 1
Selected Answer: A A is correct. Similar to the one-time micro-batch trigger, the query will process all of the available data and then stop on its own. The difference is that it will process the data in (possibly) multiple micro-batches, based on the source options.
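To make this concrete, here is a minimal PySpark sketch of answer A. The table names, transformation, and checkpoint path are placeholders (the question's original code block is not shown above), and `spark` is the SparkSession that Databricks notebooks provide:

```python
from pyspark.sql import functions as F

# Hypothetical source/target tables and checkpoint path -- not from the
# question's hidden code block.
query = (
    spark.readStream
        .table("raw_events")                               # streaming read from a table
        .withColumn("ingested_at", F.current_timestamp())  # example manipulation
        .writeStream
        .option("checkpointLocation", "/tmp/checkpoints/raw_events")
        .trigger(availableNow=True)  # process everything currently available,
                                     # in as many micro-batches as needed, then stop
        .toTable("bronze_events")                          # streaming write to a new table
)
```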
Comment 1306269 by rsmf
- Upvotes: 2
Selected Answer: A Correct is A
Question lOCLvpRH78btFq3qylst
Question
A data engineer and a data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.
Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?
Choices
- A: The pipeline can have different notebook sources in SQL & Python
- B: The pipeline will need to be written entirely in SQL
- C: The pipeline will need to use a batch source in place of a streaming source
- D: The pipeline will need to be written entirely in Python
Answer: A Answer_ET: A Community answer A (83%) C (17%) Discussion
Comment 1327350 by MultiCloudIronMan
- Upvotes: 4
Selected Answer: A The correct answer is A. The pipeline can have different notebook sources in SQL & Python. Delta Live Tables supports both SQL and Python, as well as streaming and batch sources. Therefore, the existing pipeline can continue to use both SQL and Python for different layers without needing to be rewritten entirely in one language or switching from a streaming to a batch source.
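To illustrate answer A: a single DLT pipeline can reference several notebooks, so the engineer's Python notebook and the analyst's SQL notebook can both be added to the same pipeline's settings. A minimal sketch of the Python side is below (table names and the source path are hypothetical); the gold layer would simply live in a separate SQL notebook attached to the same pipeline.

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical Python notebook for the bronze/silver layers. The analyst's
# gold-layer SQL notebook is added to the same DLT pipeline; no rewrite into
# a single language (and no switch away from the streaming source) is needed.

@dlt.table(name="bronze_trips")
def bronze_trips():
    # Streaming sources are supported in DLT; the Auto Loader path is a placeholder.
    return (
        spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/trips")
    )

@dlt.table(name="silver_trips")
def silver_trips():
    # Read the bronze table as a stream and apply a simple cleanup rule.
    return dlt.read_stream("bronze_trips").where(F.col("trip_distance") > 0)
```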
Comment 1322474 by Worldmaster
- Upvotes: 1
Selected Answer: A Imho A is correct
Comment 1306270 by rsmf
- Upvotes: 1
Selected Answer: C C is correct
Comment 1306151 by comoon
- Upvotes: 1
C is correct
Question BU8j5CQ8tTibup3aX43b
Question
Identify a scenario to use an external table. A data engineer needs to create a Parquet bronze table and wants to ensure that it gets stored at a specific path in an external location.
Which table can be created in this scenario?
Choices
- A: An external table where the location is pointing to a specific path in an external location.
- B: An external table where the schema has a managed location pointing to a specific path in an external location.
- C: A managed table where the catalog has a managed location pointing to a specific path in an external location.
- D: A managed table where the location is pointing to a specific path in an external location.
Answer: A Answer_ET: A Community answer A (100%) Discussion
Comment 1327351 by MultiCloudIronMan
- Upvotes: 1
Selected Answer: A The correct answer is A. An external table where the location is pointing to a specific path in an external location. This allows the data engineer to specify the exact path in an external location where the Parquet bronze table will be stored.
Comment 1320584 by Worldmaster
- Upvotes: 1
Selected Answer: A A. It defines an external table where the data is stored at a specific path in an external location (such as cloud storage). The path is provided when the table is created. This matches the requirement to store the Parquet data in an external location, ensuring the data is managed externally, which is the core characteristic of an external table.
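As a concrete (hypothetical) example of answer A, the LOCATION clause at creation time is what makes the table external and pins its storage to a specific path; the catalog/schema/table names and the cloud path below are placeholders:

```python
# Hypothetical external Parquet bronze table; the names and the abfss:// path
# are placeholders. Specifying LOCATION makes the table external, so the data
# files live at exactly this path rather than in managed storage.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.bronze.trips_bronze (
        trip_id   STRING,
        pickup_ts TIMESTAMP,
        fare      DOUBLE
    )
    USING PARQUET
    LOCATION 'abfss://bronze@mystorage.dfs.core.windows.net/trips_bronze'
""")
```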
Question mXlJboKZlW30J7WL8q9J
Question
Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.
A data engineer has created an ETL pipeline using Delta Live Tables to manage their company's travel reimbursement details. They want to ensure that if the location details have not been provided by the employee, the pipeline is terminated.
How can the scenario be implemented?
Choices
- A: CONSTRAINT valid_location EXPECT (location = NULL)
- B: CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL UPDATE
- C: CONSTRAINT valid_location EXPECT (location != NULL) ON DROP ROW
- D: CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL
Answer: B Answer_ET: B Community answer B (71%) D (29%) Discussion
Comment 1328547 by san089
- Upvotes: 1
Selected Answer: B Correct Answer: B
From Databricks doc
CONSTRAINT valid_count EXPECT (count > 0) ON VIOLATION FAIL UPDATE
Comment 1327362 by MultiCloudIronMan
- Upvotes: 1
Selected Answer: D The correct answer is D. CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL. This constraint ensures that if the location details are not provided by the employee (i.e., location is null), the pipeline will be terminated.
Comment 1322610 by canada_2k1
- Upvotes: 1
Selected Answer: B The answer from Udemy course
Comment 1322056 by knightkkd
- Upvotes: 1
Selected Answer: B FAIL UPDATE: Immediately stop pipeline execution. https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/expectations#fail
Comment 1320586 by Worldmaster
- Upvotes: 2
Selected Answer: B B is correct https://docs.databricks.com/en/delta-live-tables/sql-ref.html
ON VIOLATION Optional action to take for failed rows: FAIL UPDATE: Immediately stop pipeline execution. DROP ROW: Drop the record and continue processing.
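For reference, the same behavior as answer B can be expressed with the DLT Python expectations API; the table and source names below are hypothetical:

```python
import dlt

# Hypothetical DLT Python notebook. expect_or_fail is the Python counterpart
# of "... ON VIOLATION FAIL UPDATE": if any row has a NULL location, the
# update fails and the pipeline stops.
@dlt.table(name="travel_reimbursements")
@dlt.expect_or_fail("valid_location", "location IS NOT NULL")
def travel_reimbursements():
    return dlt.read_stream("raw_reimbursements")  # hypothetical upstream table
```

By contrast, ON VIOLATION DROP ROW (expect_or_drop in Python) would drop the offending rows and let the pipeline continue processing.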
Comment 1306271 by rsmf
- Upvotes: 1
Selected Answer: D D is correct
Comment 1303137 by comoon
- Upvotes: 2
D is correct
Question XPHEibf89o3PR9Uw7eQg
Question
Which two conditions are applicable for governance in Databricks Unity Catalog? (Choose two.)
Choices
- A: You can have more than 1 metastore within a Databricks account console but only 1 per region.
- B: Both catalog and schema must have a managed location in Unity Catalog, provided the metastore is not associated with a location
- C: You can have multiple catalogs within a metastore, and 1 catalog can be associated with multiple metastores
- D: If a catalog is not associated with a location, it's mandatory to associate schemas with managed locations
- E: If the metastore is not associated with a location, it's mandatory to associate catalogs with managed locations
Answer: AE Answer_ET: AE Community answer AE (60%) AD (40%) Discussion
Comment 1334032 by CaoMengde09
- Upvotes: 2
Selected Answer: AE Databricks recommends that you assign managed storage at the catalog level for logical data isolation, with metastore-level and schema-level as options. New workspaces that are enabled for Unity Catalog automatically are created without a metastore-level managed storage location.
This means it is effectively a must to create a managed location for each catalog for optimal data isolation; otherwise you end up with data from many schemas/tables in the same managed location, which is a bad practice from a governance perspective.
Working with a non-isolated catalog in Unity Catalog is chaotic; I would rather stay on my workspace than work in such a badly governed Unity Catalog.
Ans : [“A”, “E”]
Comment 1330845 by grygi
- Upvotes: 3
Selected Answer: AD Had this on the exam. AD was correct, I maxed this area.
Comment 1327373 by MultiCloudIronMan
- Upvotes: 2
Selected Answer: AE The correct answers are A. You can have more than 1 metastore within a Databricks account console but only 1 per region and E. If metastore is not associated with location, it’s mandatory to associate catalog with managed locations. These conditions are applicable for governance in Databricks Unity Catalog
Comment 1317781 by sakis213
- Upvotes: 1
Selected Answer: AD D. If catalog is not associated with location, it’s mandatory to associate schema with managed locations
Comment 1316032 by 806e7d2
- Upvotes: 2
Selected Answer: AE A. You can have more than 1 metastore within a Databricks account console but only 1 per region. Unity Catalog allows multiple metastores within a Databricks account console. However, each region can only have one active metastore due to geographic restrictions and for managing data governance in a region-specific manner. E. If metastore is not associated with location, it’s mandatory to associate catalog with managed locations. When a metastore does not have a default storage location, you must configure managed locations for each catalog to ensure governance and compliance. Managed locations define where Unity Catalog manages and stores data.
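A small sketch of the situation in answer E (the catalog name and storage path are hypothetical): when the metastore was created without a storage root, each new catalog must declare its own managed location so that its managed tables have somewhere to live.

```python
# Hypothetical example for answer E: with no metastore-level storage root,
# a new catalog must be given its own MANAGED LOCATION (a path covered by an
# external location registered in Unity Catalog).
spark.sql("""
    CREATE CATALOG IF NOT EXISTS finance
    MANAGED LOCATION 'abfss://uc-managed@mystorage.dfs.core.windows.net/finance'
""")
```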
Comment 1303135 by comoon
- Upvotes: 2
Correct - A and D. A. Unity Catalog allows you to have multiple metastores within a single Databricks account, but each metastore is limited to a single region for data locality and compliance purposes.
D. When a catalog does not have an associated managed location, it becomes necessary to associate schemas within the catalog with managed locations, ensuring that data is stored in a defined path.