Questions and Answers
Question lXAaKfD2sYx87MBHTDF0
Question
A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed nature of their organization’s data engineering and data analysis architectures is to blame. Which of the following describes how a data lakehouse could alleviate this issue?
Choices
- A: Both teams would autoscale their work as data size evolves
- B: Both teams would use the same source of truth for their work
- C: Both teams would reorganize to report to the same department
- D: Both teams would be able to collaborate on projects in real-time
- E: Both teams would respond more quickly to ad-hoc requests
Answer: B (community vote: 96% B)
Discussion
Comment 895789 by prasioso
- Upvotes: 10
Databricks Lakehouse enables using data as the single source of truth. Duplicating data often results in data silos in organizations. Correct answer B.
Comment 1339003 by Tedet
- Upvotes: 1
Selected Answer: B Lakehouse - Single, unified platform for both analytical and data engineering workflows
Comment 1313098 by NzmD
- Upvotes: 1
Selected Answer: B Correct answer is B.
Comment 1312053 by 806e7d2
- Upvotes: 2
Selected Answer: B A data lakehouse is designed to integrate the benefits of data lakes and data warehouses by providing a single, unified platform for both analytical and data engineering workflows. By combining structured and unstructured data in one place, a lakehouse enables both data engineers and data analysts to access and work from the same source of truth. This eliminates data silos, reducing discrepancies in reports that can arise from each team working with different datasets or versions of data.
While options A, D, and E describe some advantages that a data lakehouse might offer, they don’t directly address the issue of inconsistent reports. Option C is more about organizational structure than technical architecture.
Comment 1305427 by Gusberg
- Upvotes: 1
Selected Answer: B Correct answer is: B. Both teams would use the same source of truth for their work
Comment 1289718 by gtriarhos
- Upvotes: 1
Selected Answer: B CLEAR ANSWER
Comment 1274178 by afzalmp40
- Upvotes: 1
Selected Answer: B B is correct
Comment 1227517 by mascarenhaslucas
- Upvotes: 1
Selected Answer: B The answer is B!
Comment 1215698 by poo_san
- Upvotes: 1
Selected Answer: A B is correct
Comment 1193427 by bettermakeme
- Upvotes: 1
B is the correct answer; I got 100%. All questions came from https://www.udemy.com/course/practice-exams-databricks-certified-data-engineer-associate-t/?couponCode=APR2024
Comment 1177150 by Itmma
- Upvotes: 1
Selected Answer: B B is correct
Comment 1177149 by Itmma
- Upvotes: 1
B is correct
Comment 1114388 by shyemko
- Upvotes: 1
Selected Answer: B B is correct
Comment 1104687 by SerGrey
- Upvotes: 1
Selected Answer: B Correct is B
Comment 1028729 by VijayKula
- Upvotes: 1
Selected Answer: B Correct is B
Comment 1023990 by oscar_nadie
- Upvotes: 1
Selected Answer: B Correct is B
Comment 1017336 by KalavathiP
- Upvotes: 1
Selected Answer: B Correct ans B
Comment 1016519 by d_b47
- Upvotes: 1
Selected Answer: B Both teams would use the same source of truth for their work
Comment 1000323 by vpraja03
- Upvotes: 4
There are two versions of the Databricks Certified Data Engineer Associate exam. Which version do we need to pick?
Comment 997855 by vctrhugo
- Upvotes: 3
B. Both teams would use the same source of truth for their work
A data lakehouse is designed to unify the data engineering and data analysis architectures by integrating features of both data lakes and data warehouses. One of the key benefits of a data lakehouse is that it provides a common, centralized data repository (the “lake”) that serves as a single source of truth for data storage and analysis. This allows both data engineering and data analysis teams to work with the same consistent data sets, reducing discrepancies and ensuring that the reports generated by both teams are based on the same underlying data.
Option B addresses the issue of data consistency and alignment between the two teams, which is a common challenge in organizations with separate data engineering and data analysis architectures. By using the same source of truth, the data lakehouse helps alleviate this issue and promotes better collaboration and data integrity.
Comment 928338 by james_donquixote
- Upvotes: 2
Selected Answer: B Correct letter B
Comment 863826 by Data_4ever
- Upvotes: 4
Selected Answer: B Unity Catalog in Databricks helps to eliminate Data Silos in an organization by having one single source of truth data.
Comment 862177 by XiltroX
- Upvotes: 1
Selected Answer: B Correct answer is B
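The "single source of truth" point in the comments above can be pictured with a toy sketch. This is plain Python with sqlite3 standing in for the shared lakehouse table (it is an analogy, not Databricks): two separate sessions, one per "team", read the same store and therefore produce identical reports.

```python
# Toy illustration (not Databricks): two "teams" reading the same store.
# sqlite3 stands in for the shared lakehouse table, purely as an analogy.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lakehouse.db")

# The data engineering team writes the table once.
eng = sqlite3.connect(path)
eng.execute("CREATE TABLE sales (region TEXT, amount REAL)")
eng.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100.0), ("west", 250.0)])
eng.commit()

# The data analysis team opens its own session against the SAME data.
analysis = sqlite3.connect(path)
eng_total = eng.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
ana_total = analysis.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
assert eng_total == ana_total  # identical reports, no silo
```

With duplicated, siloed copies each team would query its own snapshot, which is exactly how the divergent reports in the question arise.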
Question F7tlPMBJZ6GDtGuOpfb6
Question
A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos. Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?
Choices
- A: Databricks Repos automatically saves development progress
- B: Databricks Repos supports the use of multiple branches
- C: Databricks Repos allows users to revert to previous versions of a notebook
- D: Databricks Repos provides the ability to comment on specific changes
- E: Databricks Repos is wholly housed within the Databricks Lakehouse Platform
Answer: B (community vote: 100% B)
Discussion
Comment 889063 by Majjjj
- Upvotes: 12
Selected Answer: B While both Databricks Notebooks versioning and Databricks Repos allow for version control of code, Databricks Repos provides the additional benefit of supporting the use of multiple branches. This allows for multiple versions of a notebook or project to be developed in parallel, facilitating collaboration among team members and simplifying the process of merging changes into a single main branch.
Comment 1275698 by md_sultan
- Upvotes: 1
I read that legacy notebook Git integration support was removed on January 31, 2024. Does that mean notebook Git integration is no longer supported? Am I correct?
Comment 1262392 by 80370eb
- Upvotes: 1
Selected Answer: B B. Databricks Repos supports the use of multiple branches
This feature allows for more advanced version control and collaborative development workflows, enabling multiple branches for different features or experiments.
Comment 1203168 by benni_ale
- Upvotes: 1
Selected Answer: B Multiple branches are not supported at all without Git integration, and Databricks Repos has a built-in UI for managing exactly that.
Comment 1189111 by benni_ale
- Upvotes: 1
Selected Answer: B B is correct
Comment 1177185 by Itmma
- Upvotes: 1
Selected Answer: B B is correct
Comment 1113189 by SerGrey
- Upvotes: 1
Selected Answer: B Correct answer is B
Comment 1064778 by awofalus
- Upvotes: 1
Selected Answer: B Correct : B
Comment 1017348 by KalavathiP
- Upvotes: 1
Selected Answer: B B is correct
Comment 997870 by vctrhugo
- Upvotes: 3
Selected Answer: B B. Databricks Repos supports the use of multiple branches.
An advantage of using Databricks Repos over the built-in Databricks Notebooks versioning is the ability to work with multiple branches. Branching is a fundamental feature of version control systems like Git, which Databricks Repos is built upon. It allows you to create separate branches for different tasks, features, or experiments within your project. This separation helps in parallel development and experimentation without affecting the main branch or the work of other team members.
Branching provides a more organized and collaborative development environment, making it easier to merge changes and manage different development efforts. While Databricks Notebooks versioning also allows you to track versions of notebooks, it may not provide the same level of flexibility and collaboration as branching in Databricks Repos.
Comment 978872 by hany_ds
- Upvotes: 1
B. The built-in Databricks notebook versioning does not allow multiple branches.
Comment 946763 by Atnafu
- Upvotes: 2
B An advantage of using Databricks Repos over the Databricks Notebooks versioning is that Databricks Repos supports the use of multiple branches. With Databricks Repos, you can create and manage multiple branches of your codebase, enabling parallel development, collaboration, and the ability to work on different features or bug fixes simultaneously.
Comment 876197 by Varma_Saraswathula
- Upvotes: 1
B. Databricks Repos supports the use of multiple branches
Comment 860624 by sdas1
- Upvotes: 2
Option B
Comment 859628 by surrabhi_4
- Upvotes: 2
Selected Answer: B option B
Comment 857981 by XiltroX
- Upvotes: 2
Selected Answer: B Correct answer is B
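The branching workflow the comments describe is ordinary Git, which Databricks Repos is built on. A minimal sketch, driving the `git` CLI from Python (this assumes `git` is on PATH; the file name and commit messages are made up for illustration): a feature branch carries new work in parallel while the base branch stays untouched.

```python
# Sketch of why branches matter (plain git via subprocess, not a
# Databricks API): two lines of work proceed in parallel, and the
# base branch is unaffected until changes are deliberately merged.
import os
import subprocess
import tempfile

repo = tempfile.mkdtemp()

def git(*args):
    """Run a git command inside the sketch repository."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True)

git("init")
git("config", "user.email", "dev@example.com")  # hypothetical identity
git("config", "user.name", "dev")
with open(os.path.join(repo, "etl.py"), "w") as f:
    f.write("# baseline ETL\n")
git("add", "etl.py")
git("commit", "-m", "baseline")
base = git("symbolic-ref", "--short", "HEAD").stdout.strip()

# A feature branch: experiment without touching the base branch.
git("checkout", "-b", "feature/new-metric")
with open(os.path.join(repo, "etl.py"), "w") as f:
    f.write("# baseline ETL\n# new metric\n")
git("commit", "-am", "add metric")

# Back on the base branch, the file is still the baseline version.
git("checkout", base)
with open(os.path.join(repo, "etl.py")) as f:
    assert f.read() == "# baseline ETL\n"
```

Notebook versioning only records a linear history of one notebook, so nothing like this parallel-then-merge workflow is possible with it.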
Question Nivbbemf6xtNU80QBVKM
Question
A data engineer has been given a new record of data:
id STRING = 'a1', rank INTEGER = 6, rating FLOAT = 9.4
Which SQL commands can be used to append the new record to an existing Delta table my_table?
Choices
- A: INSERT INTO my_table VALUES ('a1', 6, 9.4)
- B: INSERT VALUES ('a1', 6, 9.4) INTO my_table
- C: UPDATE my_table VALUES ('a1', 6, 9.4)
- D: UPDATE VALUES ('a1', 6, 9.4) my_table
Answer: A
Discussion
Comment 1218488 by MDWPartners
- Upvotes: 4
Repeated, correct.
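Choice A uses the standard SQL `INSERT INTO <table> VALUES (...)` shape, which is also what Delta Lake accepts. A quick sanity check of the syntax outside Databricks (sqlite3 here, purely to show the statement shape; Delta-specific behavior such as the transaction log is of course different):

```python
# Demonstrate choice A's INSERT syntax on a throwaway in-memory table.
# sqlite3 is used only to show the standard-SQL shape, not Delta Lake.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, rank INTEGER, rating REAL)")

# Choice A: INSERT INTO <table> VALUES (<values>) appends the new record.
conn.execute("INSERT INTO my_table VALUES ('a1', 6, 9.4)")

row = conn.execute("SELECT id, rank, rating FROM my_table").fetchone()
assert row == ("a1", 6, 9.4)
```

Choices B and D put the clauses in an order no SQL dialect accepts, and `UPDATE` (choice C) modifies existing rows rather than appending a new one.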
Question ptrq1JUozHIE9tFNLYdW
Question
A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance.
Which keyword can be used to compact the small files?
Choices
- A: OPTIMIZE
- B: VACUUM
- C: COMPACTION
- D: REPARTITION
Answer: A (community vote: 100% A)
Discussion
Comment 1360190 by Soori567
- Upvotes: 1
Selected Answer: A OPTIMIZE to compact multiple small files into larger ones
Comment 1232538 by kim32
- Upvotes: 2
The OPTIMIZE command is used to compact small files into larger ones, which helps improve the performance of Delta Lake tables. It consolidates small files into fewer larger files to reduce the overhead associated with having many small files. This process is often referred to as “compaction” but the specific keyword in Databricks Delta Lake is OPTIMIZE.
Comment 1218490 by MDWPartners
- Upvotes: 1
Repeated, correct.
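What `OPTIMIZE` does can be pictured with a toy bin-packing sketch. This is plain Python, not Delta Lake's implementation (the target size and the greedy strategy here are made-up simplifications): many small "files" are merged into far fewer larger ones while every record is preserved.

```python
# Toy sketch of small-file compaction (NOT Delta Lake internals):
# greedily pack many small "files" into fewer files of up to
# TARGET_SIZE records each, preserving all content. OPTIMIZE does
# something analogous at the storage layer.
TARGET_SIZE = 4  # records per compacted file; an arbitrary choice here

def compact(small_files):
    """Merge small files into files holding at most TARGET_SIZE records."""
    compacted, current = [], []
    for f in small_files:
        for record in f:
            current.append(record)
            if len(current) == TARGET_SIZE:
                compacted.append(current)
                current = []
    if current:
        compacted.append(current)
    return compacted

small = [[1], [2], [3], [4], [5], [6], [7]]  # seven one-record files
big = compact(small)
assert len(big) == 2                                          # far fewer files
assert [r for f in big for r in f] == [1, 2, 3, 4, 5, 6, 7]   # nothing lost
```

Fewer, larger files mean less per-file open/list overhead at read time, which is the performance benefit the question is after; `VACUUM`, by contrast, only deletes files no longer referenced by the table.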
Question RUwDmePtTGCi3d7DRPat
Question
A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical location.
Which of the following data entities should the data engineer create?
Choices
- A: Table
- B: Function
- C: View
- D: Temporary view
Answer: A (community vote: 50% A, 50% C)
Discussion
Comment 1387325 by kowal02
- Upvotes: 1
Selected Answer: A You can create a table using CTAS: CREATE TABLE AS SELECT … FROM … and the results will be saved to a physical location.
Comment 1360856 by SrinivasR
- Upvotes: 1
Selected Answer: C The correct answer is C, View. The question says the engineer wants to create an entity from a couple of tables, and it needs to be used by others, so I think the answer is C, View.
Comment 1290980 by Yuvazz
- Upvotes: 3
A VIEW is not physically stored, unlike a materialized view. The answer is Table.
Comment 1287511 by MohdAltaf19
- Upvotes: 2
Correct answer is C. As views are persisted, they are physically stored and accessible across the cluster even when it is restarted or detached.
Comment 1236917 by Dip1994
- Upvotes: 3
A is the correct answer, as the question asks for the entity to be saved to a physical location.
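The distinction the voters are debating can be sketched generically (sqlite3 here, not Databricks, used only as an analogy): a table created with CTAS persists in the physical database file and is visible to a new session, while a temporary view exists only in the session that created it.

```python
# Sketch: CTAS table vs. temporary view (sqlite3 as a stand-in).
# A table persists to the file and survives into other sessions;
# a TEMP view vanishes with the session that created it.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

s1 = sqlite3.connect(path)
s1.execute("CREATE TABLE src (x INTEGER)")
s1.execute("INSERT INTO src VALUES (1), (2)")
# CTAS: a real table, saved to the physical database file.
s1.execute("CREATE TABLE combined AS SELECT x FROM src")
# A temporary view lives only in this session.
s1.execute("CREATE TEMP VIEW tv AS SELECT x FROM src")
s1.commit()
s1.close()

s2 = sqlite3.connect(path)  # a new "session", like another engineer
assert s2.execute("SELECT COUNT(*) FROM combined").fetchone()[0] == 2
try:
    s2.execute("SELECT * FROM tv")
    reachable = True
except sqlite3.OperationalError:
    reachable = False
assert reachable is False  # the temp view did not survive the session
```

This is why option A (Table) satisfies both requirements in the question: usable by other engineers in other sessions, and saved to a physical location. A plain view is shareable but stores no data itself, and a temporary view fails the cross-session requirement outright.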