Questions and Answers
Question lU4e7bafmAstjStV9FrR
Question
In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?
Choices
- A: When another task needs to be replaced by the new task
- B: When another task needs to successfully complete before the new task begins
- C: When another task has the same dependency libraries as the new task
- D: When another task needs to use as little compute resources as possible
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1327395 by MultiCloudIronMan
- Upvotes: 1
Selected Answer: B The correct answer is B. When another task needs to successfully complete before the new task begins. Selecting a task in the “Depends On” field ensures that the new task will only start after the specified task has successfully completed, maintaining the correct sequence and dependencies in the workflow
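For illustration, a minimal Jobs JSON fragment (task keys are invented) showing how a dependency is declared: `build_report` lists `ingest_raw` in `depends_on`, so it only starts after `ingest_raw` succeeds.

```json
{
  "tasks": [
    { "task_key": "ingest_raw" },
    {
      "task_key": "build_report",
      "depends_on": [ { "task_key": "ingest_raw" } ]
    }
  ]
}
```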
Question sYl3h2Cr2ujyxxkkfN5s
Question
A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.
Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?
Choices
- A: CREATE TABLE all_transactions AS SELECT * FROM march_transactions INNER JOIN SELECT * FROM april_transactions;
- B: CREATE TABLE all_transactions AS SELECT * FROM march_transactions UNION SELECT * FROM april_transactions;
- C: CREATE TABLE all_transactions AS SELECT * FROM march_transactions OUTER JOIN SELECT * FROM april_transactions;
- D: CREATE TABLE all_transactions AS SELECT * FROM march_transactions INTERSECT SELECT * from april_transactions;
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1327435 by MultiCloudIronMan
- Upvotes: 2
Selected Answer: B The correct answer is B. CREATE TABLE all_transactions AS SELECT * FROM march_transactions UNION SELECT * FROM april_transactions. The UNION operator combines the results of two queries and removes duplicate records, ensuring that the new table all_transactions contains all unique records from both march_transactions and april_transactions.
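The deduplicating behavior of UNION can be checked locally with SQLite; the table names follow the question, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory stand-ins for the two monthly tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE march_transactions (id INTEGER, amount REAL)")
cur.execute("CREATE TABLE april_transactions (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO march_transactions VALUES (?, ?)",
                [(1, 9.99), (2, 4.50)])
cur.executemany("INSERT INTO april_transactions VALUES (?, ?)",
                [(3, 7.25), (3, 7.25)])  # duplicate row within April

# UNION removes duplicates (within and across inputs);
# UNION ALL would keep all four rows.
cur.execute("""
    CREATE TABLE all_transactions AS
    SELECT * FROM march_transactions
    UNION
    SELECT * FROM april_transactions
""")
rows = cur.execute("SELECT * FROM all_transactions ORDER BY id").fetchall()
print(rows)  # three unique rows remain
```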
Question uEuKrTAjpsA7mk0S7j50
Question
Which of the following Git operations must be performed outside of Databricks Repos?
Choices
- A: Commit
- B: Pull
- C: Merge
- D: Clone
Answer: C Answer_ET: C Community answer C (86%), D (14%) Discussion
Comment 1400837 by 45a1d55
- Upvotes: 1
Selected Answer: C C. Merge: Merging branches is not natively supported within Databricks Repos. While you can switch branches and commit changes, the act of merging two branches (e.g., resolving conflicts or combining histories) requires a Git client outside of Databricks or handling it in the remote Git hosting service (e.g., creating a pull request on GitHub). Databricks documentation indicates that merge operations, especially those involving conflicts, are outside its scope, pushing users to external tools. Verdict: Aligns with the question—merge operations must be performed outside Databricks
Comment 1347554 by IulianRo
- Upvotes: 1
Selected Answer: C It should be MERGE
Comment 1343073 by shinypriti23
- Upvotes: 1
Selected Answer: C Merge happens outside of DB
Comment 1337145 by CoolSmartDude
- Upvotes: 1
Selected Answer: C Merge needs to happen outside of DB
Comment 1335980 by duzi
- Upvotes: 2
Selected Answer: C See https://docs.databricks.com/en/repos/git-operations-with-repos.html “The article describes how to perform common Git operations in your Databricks workspace using Git folders, including cloning, branching, committing, and pushing.” See also Question 8.
Comment 1327437 by MultiCloudIronMan
- Upvotes: 1
Selected Answer: D The correct answers are A. Commit and D. Clone. These Git operations must be performed outside of Databricks Repos.
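A merge done locally with a Git client, outside Databricks Repos, looks like the following self-contained sketch (the repo, file, and branch names are all made up for illustration):

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo && cd demo
git config user.email "dev@example.com"
git config user.name "dev"

# Initial commit on the default branch
echo "v1" > notebook.py && git add notebook.py && git commit -qm "initial commit"

# Do some work on a feature branch
git checkout -qb feature
echo "v2" >> notebook.py && git commit -qam "feature work"

# Back on the default branch, perform the merge locally --
# the step Databricks Repos does not offer in its UI
git checkout -q -
git merge -q feature
cat notebook.py
```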
Question 5eD4YA2cjOIkGu15ValL
Question
A data engineer has joined an existing project and they see the following query in the project repository:
CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id FROM STREAM(LIVE.customers) WHERE loyalty_level = 'high';
Which of the following describes why the STREAM function is included in the query?
Choices
- A: The STREAM function is not needed and will cause an error.
- B: The data in the customers table has been updated since its last run.
- C: The customers table is a streaming live table.
- D: The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 1327441 by MultiCloudIronMan
- Upvotes: 3
Selected Answer: C The correct answer is C. The customers table is a streaming live table. The STREAM function is used to indicate that the customers table is a streaming live table, which means it is continuously updated with new data. This allows the loyal_customers table to be created as a streaming live table that processes data incrementally as it arrives.
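To make the dependency concrete, here is a hypothetical upstream definition (Databricks-only SQL, not runnable standalone; the landing path is an assumption) in which customers is itself a streaming live table, so downstream readers must wrap it in STREAM():

```sql
-- Hypothetical upstream table: customers is a streaming live table,
-- fed here by Auto Loader from an assumed landing path.
CREATE STREAMING LIVE TABLE customers AS
SELECT * FROM cloud_files('/landing/customers', 'json');

-- Downstream query from the project repository: STREAM() reads
-- the streaming live table incrementally as new records arrive.
CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id FROM STREAM(LIVE.customers) WHERE loyalty_level = 'high';
```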
Question LSkeCEdhvOTL6gRpOcKL
Question
Which Structured Streaming query is performing a hop from a Silver table to a Gold table?
Choices
- A:
- B:
- C:
- D:
Answer: D Answer_ET: D Community answer D (75%) B (25%) Discussion
Comment 1324349 by datareport_AZ
- Upvotes: 1
Selected Answer: B B performs aggregation
Comment 1322912 by b41de50
- Upvotes: 1
Selected Answer: D Silver table contains filtered, cleaned augmented data. Gold table contains aggregated data
Comment 1322551 by Manish_Kum
- Upvotes: 2
Selected Answer: D Aggregation is performed in the Silver-to-Gold hop; also, the output mode will be "complete"
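The shape of a Silver-to-Gold hop described above can be sketched in PySpark; this is an untested sketch that assumes an active Spark session and a silver Delta table, and the table and column names are invented:

```python
# Sketch only: assumes `spark` is a live SparkSession and that a cleaned
# Silver table named "sales_silver" exists (names are hypothetical).
gold_df = (spark.readStream
    .table("sales_silver")          # read the Silver table as a stream
    .groupBy("store_id")            # the Gold hop: aggregate the data
    .sum("amount"))

(gold_df.writeStream
    .outputMode("complete")         # streaming aggregations use complete/update mode
    .option("checkpointLocation", "/checkpoints/sales_gold")
    .toTable("sales_gold"))         # write the aggregated Gold table
```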