Questions and Answers
Question lU4e7bafmAstjStV9FrR
Question
In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?
Choices
- A: When another task needs to be replaced by the new task
- B: When another task needs to successfully complete before the new task begins
- C: When another task has the same dependency libraries as the new task
- D: When another task needs to use as little compute resources as possible
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1327395 by MultiCloudIronMan
- Upvotes: 1
Selected Answer: B The correct answer is B. When another task needs to successfully complete before the new task begins. Selecting a task in the “Depends On” field ensures that the new task will only start after the specified task has successfully completed, maintaining the correct sequence and dependencies in the workflow
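For illustration, a minimal Jobs JSON fragment (task keys are invented) showing how a dependency is declared: `build_report` lists `ingest_raw` in `depends_on`, so it only starts after `ingest_raw` succeeds.

```json
{
  "tasks": [
    { "task_key": "ingest_raw" },
    {
      "task_key": "build_report",
      "depends_on": [ { "task_key": "ingest_raw" } ]
    }
  ]
}
```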
Question sYl3h2Cr2ujyxxkkfN5s
Question
A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.
Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?
Choices
- A: CREATE TABLE all_transactions AS SELECT * FROM march_transactions INNER JOIN SELECT * FROM april_transactions;
- B: CREATE TABLE all_transactions AS SELECT * FROM march_transactions UNION SELECT * FROM april_transactions;
- C: CREATE TABLE all_transactions AS SELECT * FROM march_transactions OUTER JOIN SELECT * FROM april_transactions;
- D: CREATE TABLE all_transactions AS SELECT * FROM march_transactions INTERSECT SELECT * from april_transactions;
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1327435 by MultiCloudIronMan
- Upvotes: 2
Selected Answer: B The correct answer is B. CREATE TABLE all_transactions AS SELECT * FROM march_transactions UNION SELECT * FROM april_transactions. The UNION operator combines the results of two queries and removes duplicate records, ensuring that the new table all_transactions contains all unique records from both march_transactions and april_transactions.
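The deduplicating behavior of UNION can be checked locally with SQLite; the table names follow the question, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory stand-ins for the two monthly tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE march_transactions (id INTEGER, amount REAL)")
cur.execute("CREATE TABLE april_transactions (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO march_transactions VALUES (?, ?)",
                [(1, 9.99), (2, 4.50)])
cur.executemany("INSERT INTO april_transactions VALUES (?, ?)",
                [(3, 7.25), (3, 7.25)])  # duplicate row within April

# UNION removes duplicates (within and across inputs);
# UNION ALL would keep all four rows.
cur.execute("""
    CREATE TABLE all_transactions AS
    SELECT * FROM march_transactions
    UNION
    SELECT * FROM april_transactions
""")
rows = cur.execute("SELECT * FROM all_transactions ORDER BY id").fetchall()
print(rows)  # three unique rows remain
```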
Question uEuKrTAjpsA7mk0S7j50
Question
Which of the following Git operations must be performed outside of Databricks Repos?
Choices
- A: Commit
- B: Pull
- C: Merge
- D: Clone
Answer: C Answer_ET: C Community answer C (86%), D (14%) Discussion
Comment 1400837 by 45a1d55
- Upvotes: 1
Selected Answer: C C. Merge: Merging branches is not natively supported within Databricks Repos. While you can switch branches and commit changes, the act of merging two branches (e.g., resolving conflicts or combining histories) requires a Git client outside of Databricks or handling it in the remote Git hosting service (e.g., creating a pull request on GitHub). Databricks documentation indicates that merge operations, especially those involving conflicts, are outside its scope, pushing users to external tools. Verdict: Aligns with the question—merge operations must be performed outside Databricks
Comment 1347554 by IulianRo
- Upvotes: 1
Selected Answer: C It should be MERGE
Comment 1343073 by shinypriti23
- Upvotes: 1
Selected Answer: C Merge happens outside of DB
Comment 1337145 by CoolSmartDude
- Upvotes: 1
Selected Answer: C Merge needs to happen outside of DB
Comment 1335980 by duzi
- Upvotes: 2
Selected Answer: C See https://docs.databricks.com/en/repos/git-operations-with-repos.html “The article describes how to perform common Git operations in your Databricks workspace using Git folders, including cloning, branching, committing, and pushing.” See also Question 8.
Comment 1327437 by MultiCloudIronMan
- Upvotes: 1
Selected Answer: D The correct answers are A. Commit and D. Clone. These Git operations must be performed outside of Databricks Repos.
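A merge done locally with a Git client, outside Databricks Repos, looks like the following self-contained sketch (the repo, file, and branch names are all made up for illustration):

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo && cd demo
git config user.email "dev@example.com"
git config user.name "dev"

# Initial commit on the default branch
echo "v1" > notebook.py && git add notebook.py && git commit -qm "initial commit"

# Do some work on a feature branch
git checkout -qb feature
echo "v2" >> notebook.py && git commit -qam "feature work"

# Back on the default branch, perform the merge locally --
# the step Databricks Repos does not offer in its UI
git checkout -q -
git merge -q feature
cat notebook.py
```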
Question 5eD4YA2cjOIkGu15ValL
Question
A data engineer has joined an existing project and they see the following query in the project repository:
CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id FROM STREAM(LIVE.customers) WHERE loyalty_level = 'high';
Which of the following describes why the STREAM function is included in the query?
Choices
- A: The STREAM function is not needed and will cause an error.
- B: The data in the customers table has been updated since its last run.
- C: The customers table is a streaming live table.
- D: The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 1327441 by MultiCloudIronMan
- Upvotes: 3
Selected Answer: C The correct answer is C. The customers table is a streaming live table. The STREAM function is used to indicate that the customers table is a streaming live table, which means it is continuously updated with new data. This allows the loyal_customers table to be created as a streaming live table that processes data incrementally as it arrives.
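To make the dependency concrete, here is a hypothetical upstream definition (Databricks-only SQL, not runnable standalone; the landing path is an assumption) in which customers is itself a streaming live table, so downstream readers must wrap it in STREAM():

```sql
-- Hypothetical upstream table: customers is a streaming live table,
-- fed here by Auto Loader from an assumed landing path.
CREATE STREAMING LIVE TABLE customers AS
SELECT * FROM cloud_files('/landing/customers', 'json');

-- Downstream query from the project repository: STREAM() reads
-- the streaming live table incrementally as new records arrive.
CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id FROM STREAM(LIVE.customers) WHERE loyalty_level = 'high';
```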
Question LSkeCEdhvOTL6gRpOcKL
Question
Which Structured Streaming query is performing a hop from a Silver table to a Gold table?
Choices
- A:
- B:
- C:
- D:
Answer: D Answer_ET: D Community answer D (75%) B (25%) Discussion
Comment 1324349 by datareport_AZ
- Upvotes: 1
Selected Answer: B B performs aggregation
Comment 1322912 by b41de50
- Upvotes: 1
Selected Answer: D Silver table contains filtered, cleaned augmented data. Gold table contains aggregated data
Comment 1322551 by Manish_Kum
- Upvotes: 2
Selected Answer: D Aggregation is performed in the Silver-to-Gold hop; also, the output mode will be "complete"
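The shape of a Silver-to-Gold hop described above can be sketched in PySpark; this is an untested sketch that assumes an active Spark session and a silver Delta table, and the table and column names are invented:

```python
# Sketch only: assumes `spark` is a live SparkSession and that a cleaned
# Silver table named "sales_silver" exists (names are hypothetical).
gold_df = (spark.readStream
    .table("sales_silver")          # read the Silver table as a stream
    .groupBy("store_id")            # the Gold hop: aggregate the data
    .sum("amount"))

(gold_df.writeStream
    .outputMode("complete")         # streaming aggregations use complete/update mode
    .option("checkpointLocation", "/checkpoints/sales_gold")
    .toTable("sales_gold"))         # write the aggregated Gold table
```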