Questions and Answers
Question ftej0T8v21cl0vbZJlho
Question
Which of the following data workloads will utilize a Gold table as its source?
Choices
- A: A job that enriches data by parsing its timestamps into a human-readable format
- B: A job that aggregates uncleaned data to create standard summary statistics
- C: A job that cleans data by removing malformatted records
- D: A job that queries aggregated data designed to feed into a dashboard
- E: A job that ingests raw data from a streaming source into the Lakehouse
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1282225 by CommanderBigMac
- Upvotes: 1
Selected Answer: D A gold table is most likely to contain aggregated data. It is also the table where cleaned-up data is stored, so that is where a dashboard will draw its data from.
Comment 1084414 by 55f31c8
- Upvotes: 1
Selected Answer: D https://docs.databricks.com/en/lakehouse/medallion.html#power-analytics-with-the-gold-layer
Comment 1050172 by meow_akk
- Upvotes: 3
D is correct: standard medallion architecture.
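A minimal sketch of the kind of dashboard-feeding query option D describes, assuming a hypothetical gold-layer table sales_gold with region and total_revenue columns:
-- Gold tables hold aggregated, business-ready data; a dashboard query
-- reads from the gold layer rather than recomputing from bronze/silver.
SELECT region, total_revenue
FROM sales_gold -- hypothetical gold-layer aggregate table
ORDER BY total_revenue DESC;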
Question QMoHFSJDErgLvwwLIXCo
Question
Which of the following must be specified when creating a new Delta Live Tables pipeline?
Choices
- A: A key-value pair configuration
- B: The preferred DBU/hour cost
- C: A path to cloud storage location for the written data
- D: A location of a target database for the written data
- E: At least one notebook library to be executed
Answer: E Answer_ET: E Community answer E (69%) C (31%) Discussion
Comment 1132545 by Stemix
- Upvotes: 7
Selected Answer: E Correct answer is E. storage location is optional. “(Optional) Enter a Storage location for output data from the pipeline. The system uses a default location if you leave Storage location empty”
Comment 1344135 by fits08pistils
- Upvotes: 2
Selected Answer: E The answer is E, even if technically the Pipeline name is the only mandatory field. However, if you don’t provide a notebook path, the following message will be displayed: “You didn’t specify any source code. “Create pipeline” will create your pipeline along with a blank notebook, which you can edit later.”
Comment 1315528 by hakimipous
- Upvotes: 1
Selected Answer: C C is correct
Comment 1290093 by Colje
- Upvotes: 4
D. A location of a target database for the written data
Why this is correct: When creating a Delta Live Tables (DLT) pipeline, you must specify the target database where the resulting data will be written. This ensures that the output of the pipeline is stored properly.
Why the other options are incorrect:
A. A key-value pair configuration: While configurations are useful, they are not mandatory when setting up a DLT pipeline.
B. The preferred DBU/hour cost: You don’t specify a cost directly; the DBU is associated with the cluster used.
C. A path to cloud storage location for the written data: While storage paths may be specified, the target database location is required.
E. At least one notebook library: You specify the transformation logic (which could be in notebooks), but this is not a strict requirement for setting up the pipeline itself.
Comment 1263537 by 80370eb
- Upvotes: 1
Selected Answer: E This is a key requirement for creating a Delta Live Tables pipeline. You need to specify notebooks that contain the ETL logic to be executed by the pipeline.
Comment 1228292 by Shinigami76
- Upvotes: 1
C, just tested on Databricks DLT
Comment 1203845 by benni_ale
- Upvotes: 1
Selected Answer: E To be fair, C could be correct as well, but the question is probably hinting at E
Comment 1176696 by BigMF
- Upvotes: 1
Selected Answer: C Per Databricks documentation (see below), you need to select a destination for datasets published by the pipeline, either the Hive metastore or Unity Catalog. I think E is incorrect because it uses the term “Notebook Library” and not just “Notebook”. Databricks doc: https://docs.databricks.com/en/delta-live-tables/tutorial-pipelines.html
Comment 1127418 by azure_bimonster
- Upvotes: 1
Selected Answer: E As per Pipeline creating steps, choosing a Notebook is mandatory whereas specifying a location is optional. I would go with answer E
Comment 1124246 by Azure_2023
- Upvotes: 2
Selected Answer: E https://docs.databricks.com/en/delta-live-tables/tutorial-pipelines.html
E. The only non-optional selection is a notebook
Comment 1110201 by Garyn
- Upvotes: 2
Selected Answer: E E. At least one notebook library to be executed.
Explanation: https://docs.databricks.com/en/delta-live-tables/tutorial-pipelines.html
Delta Live Tables pipelines execute notebook libraries as part of their operations. These notebooks contain the logic, code, or instructions defining the data processing steps, transformations, or actions to be performed within the pipeline.
Specifying at least one notebook library to be executed is crucial when creating a new Delta Live Tables pipeline, as it defines the sequence of operations and the logic to be executed on the data within the pipeline, aligning with the documentation provided.
Comment 1100256 by saaaaaa
- Upvotes: 2
Selected Answer: E This should be E. As per the link https://docs.databricks.com/en/delta-live-tables/tutorial-pipelines.html
Create a pipeline
Click Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline.
Give the pipeline a name and click the file picker icon to select a notebook.
Select Triggered for Pipeline Mode.
(Optional) Enter a Storage location for output data from the pipeline. The system uses a default location if you leave Storage location empty.
(Optional) Specify a Target schema to publish your dataset to the Hive metastore or a Catalog and a Target schema to publish your dataset to Unity Catalog. See Publish datasets.
(Optional) Click Add notification to configure one or more email addresses to receive notifications for pipeline events. See Add email notifications for pipeline events.
Click Create.
Comment 1084461 by 55f31c8
- Upvotes: 1
Selected Answer: C https://docs.databricks.com/en/delta-live-tables/index.html#what-is-a-delta-live-tables-pipeline
Comment 1071980 by Huroye
- Upvotes: 3
The correct answer is E. A DLT pipeline needs a notebook where you specify the processing logic
Comment 1056071 by kishore1980
- Upvotes: 2
Selected Answer: C storage location is required to be specified to control the object storage location for data written by the pipeline.
Comment 1050175 by meow_akk
- Upvotes: 3
Ans E : i think it might be E - https://docs.databricks.com/en/delta-live-tables/settings.html - this doc says that target schema and storage may be optional so it leaves us with E
Comment 1048903 by kishanu
- Upvotes: 3
Selected Answer: C A path to a cloud storage location for the written data - considering this option is talking about the source data being stored in cloud storage and being ingested to DLT using an autoloader.
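To make the accepted answer concrete, here is a minimal sketch of what the required notebook library might contain; the table name raw_orders and the path /mnt/raw/orders are hypothetical:
-- A DLT pipeline must reference at least one source-code notebook;
-- the notebook declares the pipeline's datasets, for example:
CREATE OR REFRESH STREAMING LIVE TABLE raw_orders
COMMENT 'Hypothetical bronze table ingested with Auto Loader'
AS SELECT * FROM cloud_files('/mnt/raw/orders', 'json');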
Question OCBCr8xvRlaCLC2wa5JC
Question
A data engineer has joined an existing project and they see the following query in the project repository:
CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id FROM STREAM(LIVE.customers) WHERE loyalty_level = 'high';
Which of the following describes why the STREAM function is included in the query?
Choices
- A: The STREAM function is not needed and will cause an error.
- B: The table being created is a live table.
- C: The customers table is a streaming live table.
- D: The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.
- E: The data in the customers table has been updated since its last run.
Answer: C Answer_ET: C Community answer C (75%) D (25%) Discussion
Comment 1050180 by meow_akk
- Upvotes: 8
Ans C is correct: https://docs.databricks.com/en/sql/load-data-streaming-table.html ("Load data into a streaming table"). To create a streaming table from data in cloud object storage, paste the following into the query editor, and then click Run:
/* Load data from a volume */
CREATE OR REFRESH STREAMING TABLE <table_name>
AS SELECT * FROM STREAM read_files('/Volumes/<catalog>/<schema>/<volume>/<path>');
/* Load data from an external location */
CREATE OR REFRESH STREAMING TABLE <table_name>
AS SELECT * FROM STREAM read_files('s3://<bucket>/<path>');
Comment 1263541 by 80370eb
- Upvotes: 1
Selected Answer: C The STREAM function is used to indicate that LIVE.customers is a streaming live table. This allows the query to process real-time streaming data.
Comment 1203849 by benni_ale
- Upvotes: 1
C is correct. About D: it could be correct, but it is not a given that the source comes from PySpark; SQL (at least in Databricks) supports creating streaming live tables as well, so it is not necessarily from PySpark
Comment 1203848 by benni_ale
- Upvotes: 1
Selected Answer: C c is ok
Comment 1203100 by [Removed]
- Upvotes: 1
Selected Answer: D Option E, specifying “at least one notebook library to be executed,” is not a requirement for setting up a Delta Live Tables pipeline. Delta Live Tables are built on top of Databricks and use notebooks to define the pipeline’s logic, but the actual requirement when setting up the pipeline is typically the location where the data will be written to, like a target database or a path to cloud storage. While notebooks may contain the business logic for the transformations and actions within the pipeline, the fundamental requirement for setting up a pipeline is knowing where the data will reside after processing, hence why the location of the target database for the written data is crucial.
Comment 1127420 by azure_bimonster
- Upvotes: 1
Selected Answer: C C is correct
Comment 1112624 by cxw23
- Upvotes: 1
Ans is A. The CREATE STREAMING LIVE TABLE syntax does not exist. It should be CREATE LIVE TABLE AS SELECT * FROM STREAM.
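As a hedged illustration of the accepted answer C, a sketch of how the upstream customers table might itself be declared as a streaming live table (the source path is hypothetical), which is why downstream readers must use STREAM():
-- Upstream: a streaming live table ingesting from a hypothetical path
CREATE OR REFRESH STREAMING LIVE TABLE customers
AS SELECT * FROM cloud_files('/mnt/raw/customers', 'json');
-- Downstream: streaming live tables are consumed incrementally via STREAM()
CREATE OR REFRESH STREAMING LIVE TABLE loyal_customers
AS SELECT customer_id FROM STREAM(LIVE.customers) WHERE loyalty_level = 'high';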
Question odLBIcB5zh1f1qjn4hg6
Question
Which of the following describes the type of workloads that are always compatible with Auto Loader?
Choices
- A: Streaming workloads
- B: Machine learning workloads
- C: Serverless workloads
- D: Batch workloads
- E: Dashboard workloads
Answer: A Answer_ET: A Community answer A (100%) Discussion
Comment 1050181 by meow_akk
- Upvotes: 6
A is correct: Auto Loader is built on Structured Streaming
Comment 1263542 by 80370eb
- Upvotes: 1
Selected Answer: A Auto Loader is designed to handle streaming data ingestion. It continuously processes new data as it arrives, making it well-suited for streaming workloads.
Comment 1203851 by benni_ale
- Upvotes: 1
Selected Answer: A A is ok
Comment 1127421 by azure_bimonster
- Upvotes: 1
Selected Answer: A A is correct here
Comment 1101606 by AndreFR
- Upvotes: 3
Selected Answer: A https://docs.databricks.com/en/ingestion/auto-loader/unity-catalog.html#using-auto-loader-with-unity-catalog
Auto Loader relies on Structured Streaming for incremental processing
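Auto Loader's streaming nature is visible in the API itself; a minimal sketch using the documented STREAM read_files pattern, with a hypothetical bucket and table name:
-- Auto Loader always runs as a Structured Streaming source;
-- in SQL it is exposed through STREAM read_files (or cloud_files in DLT).
CREATE OR REFRESH STREAMING TABLE sensor_events
AS SELECT * FROM STREAM read_files('s3://my-bucket/sensor-data/', format => 'json');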
Question FtYkJMMCmPtaoWBSLOot
Question
A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.
Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?
Choices
- A: None of these changes will need to be made
- B: The pipeline will need to stop using the medallion-based multi-hop architecture
- C: The pipeline will need to be written entirely in SQL
- D: The pipeline will need to use a batch source in place of a streaming source
- E: The pipeline will need to be written entirely in Python
Answer: D Answer_ET: D Community answer D (51%) A (49%) Discussion
Comment 1228049 by hussamAlHunaiti
- Upvotes: 10
Selected Answer: D I had the exam today and options A & B didn't exist; the correct answer is D.
Comment 1233625 by vigaro
- Upvotes: 8
Selected Answer: D None is never a solution
Comment 1328321 by MultiCloudIronMan
- Upvotes: 3
Selected Answer: A The correct response is A. None of these changes will need to be made. Delta Live Tables supports both Python and SQL, as well as streaming and batch sources. This means that the existing medallion-based multi-hop architecture can be maintained, and the pipeline can continue to use both Python and SQL for different layers. Therefore, no changes are necessary when migrating to Delta Live Tables.
Comment 1315957 by 806e7d2
- Upvotes: 2
Selected Answer: A Delta Live Tables (DLT) is designed to support a medallion-based architecture (raw → bronze → silver → gold) and allows for a combination of Python and SQL in pipeline definitions. It also supports both batch and streaming sources.
Comment 1314234 by gul1016
- Upvotes: 1
The correct answer is:
A. None of these changes will need to be made.
Explanation:
- Delta Live Tables (DLT) supports both Python and SQL: the data engineer can continue writing transformations for the raw, bronze, and silver layers in Python, and the data analyst can work on the gold layer in SQL.
- Medallion-based architecture: Delta Live Tables is well-suited for the medallion architecture (raw → bronze → silver → gold) and is commonly used to build reliable and maintainable data pipelines.
- Streaming sources: Delta Live Tables fully supports streaming inputs and can handle both batch and streaming sources natively.
- Flexibility in implementation: Delta Live Tables does not impose restrictions that require pipelines to be written entirely in either SQL or Python. Both languages can coexist in the same pipeline as needed.
Thus, no major changes are required for the migration to Delta Live Tables.
Comment 1312070 by lj114
- Upvotes: 1
Selected Answer: A A is correct
Comment 1307379 by ajay1709
- Upvotes: 1
The right answer is not listed here. The right answer is “Different notebooks may be used for SQL and Python”
Comment 1287629 by CommanderBigMac
- Upvotes: 1
Selected Answer: D D is the answer
Comment 1273426 by 9d4d68a
- Upvotes: 1
A. None of these changes will need to be made.
You can continue using the medallion-based architecture, and you do not need to switch entirely to SQL or Python. Delta Live Tables will work with your existing streaming sources and support both SQL and Python.
Comment 1267939 by 80370eb
- Upvotes: 2
Selected Answer: A When migrating to Delta Live Tables, you can continue using the medallion-based architecture, work with streaming sources, and write the pipeline in either SQL or Python. Therefore, no major changes are required for the pipeline in this scenario.
Comment 1227869 by jaromarg
- Upvotes: 4
D: Delta Live Tables is primarily designed to work with batch processing rather than streaming. This means that when migrating a pipeline to Delta Live Tables, any streaming sources used in the original pipeline will need to be replaced with batch sources.
In the scenario described, where the raw source of the pipeline is a streaming input, the data engineer and data analyst will need to modify their pipeline to read data from a batch source instead. This could involve changing the way data is ingested and processed to align with batch processing paradigms rather than streaming.
Additionally, Delta Live Tables enables the integration of both SQL and Python code within a pipeline, so there’s no strict requirement to write the pipeline entirely in SQL or Python. Both the data engineer’s Python code for the raw, bronze, and silver layers and the data analyst’s SQL code for the gold layer can still be used within the Delta Live Tables environment.
Overall, the key change needed when migrating to Delta Live Tables in this scenario is transitioning from a streaming input source to a batch source to align with the batch processing nature of Delta Live Tables.
Comment 1203853 by benni_ale
- Upvotes: 2
Selected Answer: A A is correct
Comment 1198443 by Arunava05
- Upvotes: 3
Cleared the exam today. Options A and B were not available in the exam. There was a different option which was correct.
Comment 1101634 by AndreFR
- Upvotes: 4
Selected Answer: A
- B: DLT supports the medallion architecture (see the example in https://docs.databricks.com/en/delta-live-tables/transform.html#combine-streaming-tables-and-materialized-views-in-a-single-pipeline)
- C: DLT can mix Python and SQL across multiple notebooks (according to https://docs.databricks.com/en/delta-live-tables/tutorial-python.html, you cannot mix languages within a Delta Live Tables source code file, but you can use multiple notebooks or files with different languages in a pipeline)
- D: DLT manages streaming sources using streaming tables (e.g., https://docs.databricks.com/en/delta-live-tables/load.html#load-data-from-a-message-bus)
- E: DLT supports Python and SQL (https://docs.databricks.com/en/delta-live-tables/tutorial-python.html and https://docs.databricks.com/en/delta-live-tables/tutorial-sql.html)
Correct answer is A by elimination.
Comment 1089750 by kz_data
- Upvotes: 1
Selected Answer: A I think the answer is A
Comment 1089166 by nedlo
- Upvotes: 2
Selected Answer: A It should be A. Medallion architecture can be used in DLT pipeline https://www.databricks.com/glossary/medallion-architecture “Databricks provides tools like Delta Live Tables (DLT) that allow users to instantly build data pipelines with Bronze, Silver and Gold tables from just a few lines of code.”
Comment 1071984 by Huroye
- Upvotes: 3
the correct answer is A. DLT needs a notebook where you specify the processing
Comment 1064841 by mokrani
- Upvotes: 1
Selected Answer: A Response A: They would have to adapt their notebooks' code to declare the DLT pipeline. However, this option is not proposed in the answers, so I think it might be A
Comment 1055192 by hsks
- Upvotes: 2
Answer should be A.
Comment 1053009 by kbaba101
- Upvotes: 2
In my opinion, this should be A. Assuming they were working in the same notebook and weren't declaring the STREAMING or LIVE keywords during development, they would probably need to do so before adding to the DLT workflow, and that is not in the options.
Comment 1050183 by meow_akk
- Upvotes: 4
I think it's A.
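Under the community-A reading, the split of languages can simply be preserved across notebooks in one pipeline; a hedged sketch of the analyst's SQL gold layer, where orders_silver and daily_revenue_gold are hypothetical and the silver table is assumed to be defined in a separate Python notebook:
-- Gold layer in a SQL notebook; reads a silver table that a Python
-- notebook in the same DLT pipeline is assumed to define.
CREATE OR REFRESH LIVE TABLE daily_revenue_gold
AS SELECT order_date, SUM(amount) AS total_revenue
FROM LIVE.orders_silver -- hypothetical silver table defined in Python
GROUP BY order_date;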