Questions and Answers
Question ftej0T8v21cl0vbZJlho
Question
Which of the following data workloads will utilize a Gold table as its source?
Choices
- A: A job that enriches data by parsing its timestamps into a human-readable format
- B: A job that aggregates uncleaned data to create standard summary statistics
- C: A job that cleans data by removing malformatted records
- D: A job that queries aggregated data designed to feed into a dashboard
- E: A job that ingests raw data from a streaming source into the Lakehouse
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1282225 by CommanderBigMac
- Upvotes: 1
Selected Answer: D A gold table is most likely to contain aggregated data. It is also the table where cleaned-up data is stored, so that is where a dashboard will draw its data from.
Comment 1084414 by 55f31c8
- Upvotes: 1
Selected Answer: D https://docs.databricks.com/en/lakehouse/medallion.html#power-analytics-with-the-gold-layer
Comment 1050172 by meow_akk
- Upvotes: 3
D is correct: standard medallion architecture.
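A minimal sketch of the kind of dashboard-feeding query option D describes, assuming a hypothetical gold-layer table sales_gold with region and total_revenue columns:
-- Gold tables hold aggregated, business-ready data; a dashboard query
-- reads from the gold layer rather than recomputing from bronze/silver.
SELECT region, total_revenue
FROM sales_gold -- hypothetical gold-layer aggregate table
ORDER BY total_revenue DESC;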
Question QMoHFSJDErgLvwwLIXCo
Question
Which of the following must be specified when creating a new Delta Live Tables pipeline?
Choices
- A: A key-value pair configuration
- B: The preferred DBU/hour cost
- C: A path to cloud storage location for the written data
- D: A location of a target database for the written data
- E: At least one notebook library to be executed
Answer: E Answer_ET: E Community answer E (69%) C (31%) Discussion
Comment 1132545 by Stemix
- Upvotes: 7
Selected Answer: E Correct answer is E. storage location is optional. “(Optional) Enter a Storage location for output data from the pipeline. The system uses a default location if you leave Storage location empty”
Comment 1344135 by fits08pistils
- Upvotes: 2
Selected Answer: E The answer is E, even if technically the Pipeline name is the only mandatory field. However, if you don’t provide a notebook path, the following message will be displayed: “You didn’t specify any source code. “Create pipeline” will create your pipeline along with a blank notebook, which you can edit later.”
Comment 1315528 by hakimipous
- Upvotes: 1
Selected Answer: C C is correct
Comment 1290093 by Colje
- Upvotes: 4
D. A location of a target database for the written data
Why this is correct: When creating a Delta Live Tables (DLT) pipeline, you must specify the target database where the resulting data will be written. This ensures that the output of the pipeline is stored properly.
Why the other options are incorrect:
A. A key-value pair configuration: While configurations are useful, they are not mandatory when setting up a DLT pipeline.
B. The preferred DBU/hour cost: You don’t specify a cost directly; the DBU is associated with the cluster used.
C. A path to cloud storage location for the written data: While storage paths may be specified, the target database location is required.
E. At least one notebook library: You specify the transformation logic (which could be in notebooks), but this is not a strict requirement for setting up the pipeline itself.
Comment 1263537 by 80370eb
- Upvotes: 1
Selected Answer: E This is a key requirement for creating a Delta Live Tables pipeline. You need to specify notebooks that contain the ETL logic to be executed by the pipeline.
Comment 1228292 by Shinigami76
- Upvotes: 1
C, just tested on Databricks DLT
Comment 1203845 by benni_ale
- Upvotes: 1
Selected Answer: E To be fair, C could be correct as well, but the question is probably hinting at E
Comment 1176696 by BigMF
- Upvotes: 1
Selected Answer: C Per Databricks documentation (see below), you need to select a destination for datasets published by the pipeline, either the Hive metastore or Unity Catalog. I think E is incorrect because it uses the term “Notebook Library” and not just “Notebook”. Databricks doc: https://docs.databricks.com/en/delta-live-tables/tutorial-pipelines.html
Comment 1127418 by azure_bimonster
- Upvotes: 1
Selected Answer: E As per Pipeline creating steps, choosing a Notebook is mandatory whereas specifying a location is optional. I would go with answer E
Comment 1124246 by Azure_2023
- Upvotes: 2
Selected Answer: E https://docs.databricks.com/en/delta-live-tables/tutorial-pipelines.html
E. The only non-optional selection is a notebook
Comment 1110201 by Garyn
- Upvotes: 2
Selected Answer: E E. At least one notebook library to be executed.
Explanation: https://docs.databricks.com/en/delta-live-tables/tutorial-pipelines.html
Delta Live Tables pipelines execute notebook libraries as part of their operations. These notebooks contain the logic, code, or instructions defining the data processing steps, transformations, or actions to be performed within the pipeline.
Specifying at least one notebook library to be executed is crucial when creating a new Delta Live Tables pipeline, as it defines the sequence of operations and the logic to be executed on the data within the pipeline, aligning with the documentation provided.
Comment 1100256 by saaaaaa
- Upvotes: 2
Selected Answer: E This should be E. As per the link https://docs.databricks.com/en/delta-live-tables/tutorial-pipelines.html
Create a pipeline
Click Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline.
Give the pipeline a name and click the file picker icon to select a notebook.
Select Triggered for Pipeline Mode.
(Optional) Enter a Storage location for output data from the pipeline. The system uses a default location if you leave Storage location empty.
(Optional) Specify a Target schema to publish your dataset to the Hive metastore or a Catalog and a Target schema to publish your dataset to Unity Catalog. See Publish datasets.
(Optional) Click Add notification to configure one or more email addresses to receive notifications for pipeline events. See Add email notifications for pipeline events.
Click Create.
Comment 1084461 by 55f31c8
- Upvotes: 1
Selected Answer: C https://docs.databricks.com/en/delta-live-tables/index.html#what-is-a-delta-live-tables-pipeline
Comment 1071980 by Huroye
- Upvotes: 3
The correct answer is E. A DLT pipeline needs a notebook where you specify the processing logic
Comment 1056071 by kishore1980
- Upvotes: 2
Selected Answer: C storage location is required to be specified to control the object storage location for data written by the pipeline.
Comment 1050175 by meow_akk
- Upvotes: 3
Ans E : i think it might be E - https://docs.databricks.com/en/delta-live-tables/settings.html - this doc says that target schema and storage may be optional so it leaves us with E
Comment 1048903 by kishanu
- Upvotes: 3
Selected Answer: C A path to a cloud storage location for the written data - considering this option is talking about the source data being stored in cloud storage and being ingested to DLT using an autoloader.
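To make the accepted answer concrete, here is a minimal sketch of what the required notebook library might contain; the table name raw_orders and the path /mnt/raw/orders are hypothetical:
-- A DLT pipeline must reference at least one source-code notebook;
-- the notebook declares the pipeline's datasets, for example:
CREATE OR REFRESH STREAMING LIVE TABLE raw_orders
COMMENT 'Hypothetical bronze table ingested with Auto Loader'
AS SELECT * FROM cloud_files('/mnt/raw/orders', 'json');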
Question OCBCr8xvRlaCLC2wa5JC
Question
A data engineer has joined an existing project and they see the following query in the project repository:
CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id FROM STREAM(LIVE.customers) WHERE loyalty_level = 'high';
Which of the following describes why the STREAM function is included in the query?
Choices
- A: The STREAM function is not needed and will cause an error.
- B: The table being created is a live table.
- C: The customers table is a streaming live table.
- D: The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.
- E: The data in the customers table has been updated since its last run.
Answer: C Answer_ET: C Community answer C (75%) D (25%) Discussion
Comment 1050180 by meow_akk
- Upvotes: 8
Ans C is correct: https://docs.databricks.com/en/sql/load-data-streaming-table.html ("Load data into a streaming table"). To create a streaming table from data in cloud object storage, paste the following into the query editor, and then click Run:
/* Load data from a volume */
CREATE OR REFRESH STREAMING TABLE <table_name>
AS SELECT * FROM STREAM read_files('/Volumes/<catalog>/<schema>/<volume>/<path>');
/* Load data from an external location */
CREATE OR REFRESH STREAMING TABLE <table_name>
AS SELECT * FROM STREAM read_files('s3://<bucket>/<path>');
Comment 1263541 by 80370eb
- Upvotes: 1
Selected Answer: C The STREAM function is used to indicate that LIVE.customers is a streaming live table. This allows the query to process real-time streaming data.
Comment 1203849 by benni_ale
- Upvotes: 1
C is correct. About D: it could be correct, but it is not a given that the source comes from PySpark; SQL (at least in Databricks) supports creating streaming live tables as well, so it is not necessarily from PySpark
Comment 1203848 by benni_ale
- Upvotes: 1
Selected Answer: C c is ok
Comment 1203100 by [Removed]
- Upvotes: 1
Selected Answer: D Option E, specifying “at least one notebook library to be executed,” is not a requirement for setting up a Delta Live Tables pipeline. Delta Live Tables are built on top of Databricks and use notebooks to define the pipeline’s logic, but the actual requirement when setting up the pipeline is typically the location where the data will be written to, like a target database or a path to cloud storage. While notebooks may contain the business logic for the transformations and actions within the pipeline, the fundamental requirement for setting up a pipeline is knowing where the data will reside after processing, hence why the location of the target database for the written data is crucial.
Comment 1127420 by azure_bimonster
- Upvotes: 1
Selected Answer: C C is correct
Comment 1112624 by cxw23
- Upvotes: 1
Ans is A. The CREATE STREAMING LIVE TABLE syntax does not exist. It should be CREATE LIVE TABLE AS SELECT * FROM STREAM.
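As a hedged illustration of the accepted answer C, a sketch of how the upstream customers table might itself be declared as a streaming live table (the source path is hypothetical), which is why downstream readers must use STREAM():
-- Upstream: a streaming live table ingesting from a hypothetical path
CREATE OR REFRESH STREAMING LIVE TABLE customers
AS SELECT * FROM cloud_files('/mnt/raw/customers', 'json');
-- Downstream: streaming live tables are consumed incrementally via STREAM()
CREATE OR REFRESH STREAMING LIVE TABLE loyal_customers
AS SELECT customer_id FROM STREAM(LIVE.customers) WHERE loyalty_level = 'high';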
Question odLBIcB5zh1f1qjn4hg6
Question
Which of the following describes the type of workloads that are always compatible with Auto Loader?
Choices
- A: Streaming workloads
- B: Machine learning workloads
- C: Serverless workloads
- D: Batch workloads
- E: Dashboard workloads
Answer: A Answer_ET: A Community answer A (100%) Discussion
Comment 1050181 by meow_akk
- Upvotes: 6
A is correct: Auto Loader is built on Structured Streaming
Comment 1263542 by 80370eb
- Upvotes: 1
Selected Answer: A Auto Loader is designed to handle streaming data ingestion. It continuously processes new data as it arrives, making it well-suited for streaming workloads.
Comment 1203851 by benni_ale
- Upvotes: 1
Selected Answer: A A is ok
Comment 1127421 by azure_bimonster
- Upvotes: 1
Selected Answer: A A is correct here
Comment 1101606 by AndreFR
- Upvotes: 3
Selected Answer: A https://docs.databricks.com/en/ingestion/auto-loader/unity-catalog.html#using-auto-loader-with-unity-catalog
Auto Loader relies on Structured Streaming for incremental processing
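Auto Loader's streaming nature is visible in the API itself; a minimal sketch using the documented STREAM read_files pattern, with a hypothetical bucket and table name:
-- Auto Loader always runs as a Structured Streaming source;
-- in SQL it is exposed through STREAM read_files (or cloud_files in DLT).
CREATE OR REFRESH STREAMING TABLE sensor_events
AS SELECT * FROM STREAM read_files('s3://my-bucket/sensor-data/', format => 'json');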
Question FtYkJMMCmPtaoWBSLOot
Question
A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.
Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?
Choices
- A: None of these changes will need to be made
- B: The pipeline will need to stop using the medallion-based multi-hop architecture
- C: The pipeline will need to be written entirely in SQL
- D: The pipeline will need to use a batch source in place of a streaming source
- E: The pipeline will need to be written entirely in Python
Answer: D Answer_ET: D Community answer D (51%) A (49%) Discussion
Comment 1228049 by hussamAlHunaiti
- Upvotes: 10
Selected Answer: D I had the exam today and options A & B didn't exist; the correct answer is D.
Comment 1233625 by vigaro
- Upvotes: 8
Selected Answer: D None is never a solution
Comment 1328321 by MultiCloudIronMan
- Upvotes: 3
Selected Answer: A The correct response is A. None of these changes will need to be made. Delta Live Tables supports both Python and SQL, as well as streaming and batch sources. This means that the existing medallion-based multi-hop architecture can be maintained, and the pipeline can continue to use both Python and SQL for different layers. Therefore, no changes are necessary when migrating to Delta Live Tables.
Comment 1315957 by 806e7d2
- Upvotes: 2
Selected Answer: A Delta Live Tables (DLT) is designed to support a medallion-based architecture (raw → bronze → silver → gold) and allows for a combination of Python and SQL in pipeline definitions. It also supports both batch and streaming sources.
Comment 1314234 by gul1016
- Upvotes: 1
The correct answer is:
A. None of these changes will need to be made.
Explanation:
- Delta Live Tables (DLT) supports both Python and SQL: the data engineer can continue writing transformations for the raw, bronze, and silver layers in Python, and the data analyst can work on the gold layer in SQL.
- Medallion-based architecture: Delta Live Tables is well-suited for the medallion architecture (raw → bronze → silver → gold) and is commonly used to build reliable and maintainable data pipelines.
- Streaming sources: Delta Live Tables fully supports streaming inputs and can handle both batch and streaming sources natively.
- Flexibility in implementation: Delta Live Tables does not impose restrictions that require pipelines to be written entirely in either SQL or Python. Both languages can coexist in the same pipeline as needed.
Thus, no major changes are required for the migration to Delta Live Tables.
Comment 1312070 by lj114
- Upvotes: 1
Selected Answer: A A is correct
Comment 1307379 by ajay1709
- Upvotes: 1
The right answer is not listed here. The right answer is “Different notebooks may be used for SQL and Python”
Comment 1287629 by CommanderBigMac
- Upvotes: 1
Selected Answer: D D is the answer
Comment 1273426 by 9d4d68a
- Upvotes: 1
A. None of these changes will need to be made.
You can continue using the medallion-based architecture, and you do not need to switch entirely to SQL or Python. Delta Live Tables will work with your existing streaming sources and support both SQL and Python.
Comment 1267939 by 80370eb
- Upvotes: 2
Selected Answer: A When migrating to Delta Live Tables, you can continue using the medallion-based architecture, work with streaming sources, and write the pipeline in either SQL or Python. Therefore, no major changes are required for the pipeline in this scenario.
Comment 1227869 by jaromarg
- Upvotes: 4
D: Delta Live Tables is primarily designed to work with batch processing rather than streaming. This means that when migrating a pipeline to Delta Live Tables, any streaming sources used in the original pipeline will need to be replaced with batch sources.
In the scenario described, where the raw source of the pipeline is a streaming input, the data engineer and data analyst will need to modify their pipeline to read data from a batch source instead. This could involve changing the way data is ingested and processed to align with batch processing paradigms rather than streaming.
Additionally, Delta Live Tables enables the integration of both SQL and Python code within a pipeline, so there’s no strict requirement to write the pipeline entirely in SQL or Python. Both the data engineer’s Python code for the raw, bronze, and silver layers and the data analyst’s SQL code for the gold layer can still be used within the Delta Live Tables environment.
Overall, the key change needed when migrating to Delta Live Tables in this scenario is transitioning from a streaming input source to a batch source to align with the batch processing nature of Delta Live Tables.
Comment 1203853 by benni_ale
- Upvotes: 2
Selected Answer: A A is correct
Comment 1198443 by Arunava05
- Upvotes: 3
Cleared the exam today. Options A and B were not available in the exam. There was a different option which was correct.
Comment 1101634 by AndreFR
- Upvotes: 4
Selected Answer: A
- B: DLT supports the medallion architecture (see the example in https://docs.databricks.com/en/delta-live-tables/transform.html#combine-streaming-tables-and-materialized-views-in-a-single-pipeline)
- C: DLT can mix Python and SQL across multiple notebooks (according to https://docs.databricks.com/en/delta-live-tables/tutorial-python.html, you cannot mix languages within a Delta Live Tables source code file, but you can use multiple notebooks or files with different languages in a pipeline)
- D: DLT manages streaming sources using streaming tables (e.g., https://docs.databricks.com/en/delta-live-tables/load.html#load-data-from-a-message-bus)
- E: DLT supports Python and SQL (https://docs.databricks.com/en/delta-live-tables/tutorial-python.html and https://docs.databricks.com/en/delta-live-tables/tutorial-sql.html)
Correct answer is A by elimination.
Comment 1089750 by kz_data
- Upvotes: 1
Selected Answer: A I think the answer is A
Comment 1089166 by nedlo
- Upvotes: 2
Selected Answer: A It should be A. Medallion architecture can be used in DLT pipeline https://www.databricks.com/glossary/medallion-architecture “Databricks provides tools like Delta Live Tables (DLT) that allow users to instantly build data pipelines with Bronze, Silver and Gold tables from just a few lines of code.”
Comment 1071984 by Huroye
- Upvotes: 3
the correct answer is A. DLT needs a notebook where you specify the processing
Comment 1064841 by mokrani
- Upvotes: 1
Selected Answer: A Response A: They would have to adapt their notebooks' code to declare the DLT pipeline. However, this option is not proposed in the answers, so I think it might be A
Comment 1055192 by hsks
- Upvotes: 2
Answer should be A.
Comment 1053009 by kbaba101
- Upvotes: 2
In my opinion, this should be A. Assuming they were working in the same notebook and weren't declaring the STREAMING or LIVE keywords during development, they would probably need to do so before adding to the DLT workflow, and that is not in the options.
Comment 1050183 by meow_akk
- Upvotes: 4
I think it's A.
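Under the community-A reading, the split of languages can simply be preserved across notebooks in one pipeline; a hedged sketch of the analyst's SQL gold layer, where orders_silver and daily_revenue_gold are hypothetical and the silver table is assumed to be defined in a separate Python notebook:
-- Gold layer in a SQL notebook; reads a silver table that a Python
-- notebook in the same DLT pipeline is assumed to define.
CREATE OR REFRESH LIVE TABLE daily_revenue_gold
AS SELECT order_date, SUM(amount) AS total_revenue
FROM LIVE.orders_silver -- hypothetical silver table defined in Python
GROUP BY order_date;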