Questions and Answers
Question Z1Cid99sZCVRHWF32GIf
Question
A data engineer is running code in a Databricks Repo that is cloned from a central Git repository. A colleague of the data engineer informs them that changes have been made and synced to the central Git repository. The data engineer now needs to sync their Databricks Repo to get the changes from the central Git repository.
Which of the following Git operations does the data engineer need to run to accomplish this task?
Choices
- A: Merge
- B: Push
- C: Pull
- D: Commit
- E: Clone
Answer: C (Answer_ET: C)
Community answer: C (100%)
Discussion
Comment 1203815 by benni_ale
- Upvotes: 2
Selected Answer: C
C is correct.
Comment 1171819 by [Removed]
- Upvotes: 1
Selected Answer: C
C is correct.
Comment 1057257 by god_father
- Upvotes: 1
Selected Answer: C
This is more of a Git question. From the docs, in Databricks Repos you can use Git functionality to:
- Clone, push to, and pull from a remote Git repository.
- Create and manage branches for development work, including merging, rebasing, and resolving conflicts.
- Create notebooks, including IPYNB notebooks, and edit them and other files.
- Visually compare differences upon commit and resolve merge conflicts.
Source: https://docs.databricks.com/en/repos/index.html
Comment 1048837 by kishanu
- Upvotes: 2
Selected Answer: C
A pull is required from the Databricks Repo to sync the changes between the local and central repositories.
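For readers who want to script this rather than click Pull in the UI, the same operation can be triggered through the Databricks Repos REST API (PATCH /api/2.0/repos/{repo_id}), which checks out the given branch and updates it to the latest remote commit. A minimal sketch; the workspace host, token, repo ID, and branch name below are placeholder assumptions.

```python
# Minimal sketch: trigger the equivalent of a "git pull" on a Databricks Repo
# via the Repos REST API. Host, token, repo ID, and branch are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                  # placeholder
REPO_ID = 123456789                                                # placeholder

# PATCH /api/2.0/repos/{repo_id} checks out the branch and pulls its
# latest state from the remote Git repository.
resp = requests.patch(
    f"{DATABRICKS_HOST}/api/2.0/repos/{REPO_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"branch": "main"},
)
resp.raise_for_status()
print(resp.json())  # the returned repo object includes the new head commit
```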
Question Ah1zfPeZa0h58h56PNy2
Question
Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?
Choices
- A: Cloud-specific integrations
- B: Simplified governance
- C: Ability to scale storage
- D: Ability to scale workloads
- E: Avoiding vendor lock-in
Answer: E (Answer_ET: E)
Community answer: E (100%)
Discussion
Comment 1263461 by 80370eb
- Upvotes: 2
Selected Answer: E
By embracing open-source technologies, the platform allows users to avoid being locked into a single vendor’s ecosystem, offering flexibility and the ability to integrate with a wide range of tools and systems.
Comment 1203816 by benni_ale
- Upvotes: 1
Selected Answer: E
E is correct.
Comment 1132177 by UGOTCOOKIES
- Upvotes: 4
Selected Answer: E
E is correct: open source is the opposite of proprietary technology, so a platform that is not proprietary is free of vendor lock-in.
Comment 1050925 by meow_akk
- Upvotes: 3
It's avoiding vendor lock-in: https://double.cloud/blog/posts/2023/01/break-free-from-vendor-lock-in-with-open-source-tech/
Comment 1048839 by kishanu
- Upvotes: 2
Selected Answer: E
E looks to be the correct one, as the Databricks Lakehouse Platform supports Delta tables, an open-source storage format.
Comment 1048187 by Rs1997
- Upvotes: 1
D is the correct answer
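To make the lock-in point concrete: because Delta Lake is an open format, a table written from Databricks can be read entirely outside Databricks. A minimal sketch using the open-source delta-rs Python bindings (pip install deltalake); the table path is a placeholder assumption.

```python
# Sketch: read a Delta table with open-source tooling, no Databricks runtime
# required. The table path below is a placeholder.
from deltalake import DeltaTable

dt = DeltaTable("s3://my-bucket/path/to/delta_table")  # placeholder path

print(dt.version())    # current table version from the transaction log
print(dt.files()[:5])  # the underlying Parquet data files
df = dt.to_pandas()    # materialize the data as a pandas DataFrame
print(df.head())
```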
Question FxlvXTITPDF3W5vgPALk
Question
A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.
In which of the following locations can the data engineer review their permissions on the table?
Choices
- A: Databricks Filesystem
- B: Jobs
- C: Dashboards
- D: Repos
- E: Data Explorer
Answer: E (Answer_ET: E)
Community answer: E (100%)
Discussion
Comment 1263462 by 80370eb
- Upvotes: 2
Selected Answer: E
Data Explorer in Databricks allows users to view and manage permissions for tables, schemas, and databases.
Comment 1203817 by benni_ale
- Upvotes: 1
Selected Answer: E
E is correct.
Comment 1089721 by kz_data
- Upvotes: 4
Selected Answer: E
E is the correct answer.
Comment 1050924 by meow_akk
- Upvotes: 2
E is correct: Data Explorer.
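Data Explorer is the UI answer, but the same table permissions can also be reviewed from a notebook with SQL. A minimal sketch, assuming a Unity Catalog (or table ACL-enabled) workspace where a SparkSession named `spark` is predefined; the three-level table name is a placeholder.

```python
# Sketch: list the grants on a table from a Databricks notebook.
# The table name is a placeholder.
grants = spark.sql("SHOW GRANTS ON TABLE main.default.my_delta_table")
grants.show(truncate=False)  # one row per principal/privilege pair
```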
Question uY79pTpvuKhv9hGPxo0d
Question
Which of the following describes a scenario in which a data engineer will want to use a single-node cluster?
Choices
- A: When they are working interactively with a small amount of data
- B: When they are running automated reports to be refreshed as quickly as possible
- C: When they are working with SQL within Databricks SQL
- D: When they are concerned about the ability to automatically scale with larger data
- E: When they are manually running reports with a large amount of data
Answer: A (Answer_ET: A)
Community answer: A (100%)
Discussion
Comment 1048841 by kishanu
- Upvotes: 5
Selected Answer: A
Single-node clusters can be used for interactive queries on small datasets.
Comment 1203819 by benni_ale
- Upvotes: 1
Selected Answer: A
A is correct.
Comment 1127370 by azure_bimonster
- Upvotes: 2
Selected Answer: A
A seems correct for this.
Comment 1050929 by meow_akk
- Upvotes: 4
Answer A: A Single Node cluster is a cluster consisting of an Apache Spark driver and no Spark workers. A Single Node cluster supports Spark jobs and all Spark data sources, including Delta Lake. A Standard cluster requires a minimum of one Spark worker to run Spark jobs. https://docs.databricks.com/en/clusters/single-node.html
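For reference, single-node behavior is set by the cluster spec itself. Below is a sketch of the spec you might submit to the Clusters API (POST /api/2.0/clusters/create); the cluster name, runtime version, and node type are placeholders, while num_workers, the spark_conf keys, and the ResourceClass tag are the documented single-node settings.

```python
# Sketch: a cluster spec for a single-node cluster (driver only, no workers).
# cluster_name, spark_version, and node_type_id are placeholders.
single_node_cluster = {
    "cluster_name": "single-node-dev",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 0,  # no Spark workers; the driver does all the work
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",  # run Spark locally on the driver
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```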
Question Lixh8bolLLxEXIYBiSV4
Question
Which of the following describes the storage organization of a Delta table?
Choices
- A: Delta tables are stored in a single file that contains data, history, metadata, and other attributes.
- B: Delta tables store their data in a single file and all metadata in a collection of files in a separate location.
- C: Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.
- D: Delta tables are stored in a collection of files that contain only the data stored within the table.
- E: Delta tables are stored in a single file that contains only the data stored within the table.
Answer: C (Answer_ET: C)
Community answer: C (100%)
Discussion
Comment 1339009 by Tedet
- Upvotes: 2
Selected Answer: C
Delta tables store data in a structured manner using Parquet files, and they also maintain metadata and transaction logs in separate directories. This organization allows for versioning, transactional capabilities, and metadata tracking in Delta Lake.
Comment 1312099 by 806e7d2
- Upvotes: 2
Selected Answer: C
Delta tables use a distributed storage format, where data, history, metadata, and other attributes are stored across multiple files. This includes data files (e.g., Parquet files) for the actual data and log files for transaction history and metadata, allowing Delta Lake to support version control, schema enforcement, and ACID properties.
Comment 997863 by vctrhugo
- Upvotes: 3
Selected Answer: C
C. Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.
Delta tables store data in a structured manner using Parquet files, and they also maintain metadata and transaction logs in separate directories. This organization allows for versioning, transactional capabilities, and metadata tracking in Delta Lake.
Comment 1262386 by 80370eb
- Upvotes: 1
Selected Answer: C
C. Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.
Comment 1227524 by mascarenhaslucas
- Upvotes: 1
Selected Answer: C
The answer is C!
Comment 1188491 by benni_ale
- Upvotes: 4
Selected Answer: C
GPT-4: Delta tables in Databricks use:
- Parquet format files for data storage.
- A _delta_log folder for JSON log files that track transactions.
- Schema enforcement in metadata to ensure consistency.
- Checkpoint files to speed up rebuilding of the table state.
Comment 1177168 by Itmma
- Upvotes: 1
Selected Answer: C
C is correct.
Comment 1104699 by SerGrey
- Upvotes: 1
Selected Answer: C
C is correct.
Comment 1028759 by VijayKula
- Upvotes: 1
Answer is C
Comment 1022440 by Sriramiyer92
- Upvotes: 2
Reading Material: 5 reasons to choose Delta format (on Databricks) https://medium.com/datalex/5-reasons-to-use-delta-lake-format-on-databricks-d9e76cf3e77d
Comment 1017340 by KalavathiP
- Upvotes: 1
Selected Answer: C
The correct answer is C.
Comment 982210 by andie123
- Upvotes: 2
Selected Answer: C
C is the right answer.
Comment 946757 by Atnafu
- Upvotes: 2
C. Delta tables in Databricks Delta Lake are stored in a collection of files organized in a directory structure. This directory structure includes data files, transaction log files, and metadata files. These files are stored in a specified location, typically in a distributed file system such as Hadoop Distributed File System (HDFS) or Amazon S3.
Comment 895823 by prasioso
- Upvotes: 3
First selected D, as I assumed the data to be stored in the Delta lake and the transaction log to be stored separately. However, the documentation states that when a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. The _delta_log contains multiple files, hence a collection of files. Answer C.
Comment 863842 by Data_4ever
- Upvotes: 3
Selected Answer: C
C is the right option.
Comment 860308 by knivesz
- Upvotes: 1
Selected Answer: C
C, correct answer.
Comment 857958 by XiltroX
- Upvotes: 2
C is the correct answer. https://docs.delta.io/latest/delta-faq.html#:~:text=Delta%20Lake%20uses%20versioned%20Parquet,directory%20to%20provide%20ACID%20transactions.
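A quick way to see this collection of files for yourself is to list the table's directory from a Databricks notebook, where `dbutils` is predefined; the table path below is a placeholder. Expect Parquet data files at the top level and JSON commit files (plus periodic checkpoints) under _delta_log/.

```python
# Sketch: list the files that make up a Delta table. The path is a placeholder.
table_path = "dbfs:/user/hive/warehouse/my_delta_table"  # placeholder

for f in dbutils.fs.ls(table_path):
    print(f.path)  # part-*.parquet data files plus the _delta_log/ directory

for f in dbutils.fs.ls(f"{table_path}/_delta_log"):
    print(f.path)  # 00000000000000000000.json commits, checkpoint files, ...
```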