Questions and Answers
Question Irf8QUGT0NLGS3dFBbHC
Question
Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?
Choices
- A: DROP
- B: IGNORE
- C: MERGE
- D: APPEND
- E: INSERT
Answer: C
Community answer: C (94%), other (6%)
Discussion
Comment 1262402 by 80370eb
- Upvotes: 3
Selected Answer: C
The MERGE command allows you to perform upserts (update and insert) into a Delta table, effectively avoiding duplicates by updating existing records and inserting new ones as needed.
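As an illustration of that upsert pattern, a minimal MERGE sketch (the table and column names here are hypothetical):

```sql
-- Hypothetical tables: upsert new_records into target on a key column,
-- updating matches and inserting the rest, so no duplicate keys are written
MERGE INTO target AS t
USING new_records AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
```

The ON condition is what prevents duplicates: rows already present are updated in place rather than inserted a second time.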
Comment 1213173 by BharaniRaj
- Upvotes: 1
Selected Answer: C
C is the right answer
Comment 1203174 by benni_ale
- Upvotes: 1
Selected Answer: C
MERGE
Comment 1113196 by SerGrey
- Upvotes: 1
Selected Answer: C
The correct answer is C
Comment 1064802 by awofalus
- Upvotes: 1
Selected Answer: C
C is correct
Comment 1047724 by J_1_2
- Upvotes: 1
Selected Answer: C
MERGE is correct
Comment 1028518 by DavidRou
- Upvotes: 2
MERGE INTO is the one to choose if you want to avoid duplicates.
Comment 1020507 by chris_mach
- Upvotes: 1
Selected Answer: C
MERGE is correct
Comment 1017354 by KalavathiP
- Upvotes: 1
Selected Answer: C
MERGE avoids duplicates by matching incoming records against existing ones on key columns
Comment 997923 by vctrhugo
- Upvotes: 3
Selected Answer: C
The MERGE command is used to write data into a Delta table while avoiding the writing of duplicate records. It allows you to perform an “upsert” operation, which means that it will insert new records and update existing records in the Delta table based on a specified condition. This helps maintain data integrity and avoid duplicates when adding new data to the table.
Comment 946000 by Atnafu
- Upvotes: 2
C. MERGE
To write data into a Delta table while avoiding the writing of duplicate records, you can use the MERGE command. The MERGE command in Delta Lake allows you to combine the ability to insert new records and update existing records in a single atomic operation.
The MERGE command compares the data being written with the existing data in the Delta table based on specified matching criteria, typically using a primary key or unique identifier. It then performs conditional actions, such as inserting new records or updating existing records, depending on the comparison results.
By using the MERGE command, you can handle the prevention of duplicate records in a more controlled and efficient manner. It allows you to synchronize and reconcile data from different sources while avoiding duplication and ensuring data integrity.
Therefore, option C, MERGE, is the correct command to use when writing data into a Delta table while avoiding the writing of duplicate records.
Comment 889315 by softthinkers
- Upvotes: 2
Answer is C. DROP is used to remove a table or database; IGNORE is used to skip errors while executing a query; INSERT will add new records but will not avoid duplication. So MERGE is the right answer.
Comment 876213 by Varma_Saraswathula
- Upvotes: 2
Answer: C. See https://docs.databricks.com/sql/language-manual/delta-merge-into.html
Comment 875872 by naxacod574
- Upvotes: 1
Option C
Comment 861295 by XiltroX
- Upvotes: 1
Selected Answer: D
Wrong answer. The correct answer is D.
Comment 858873 by knivesz
- Upvotes: 3
Selected Answer: C
The only possible option
Question xi85iH0ejppDMhZGUq77
Question
A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed nature of their organization’s data engineering and data analysis architectures is to blame.
Which of the following describes how a data lakehouse could alleviate this issue?
Choices
- A: Both teams would respond more quickly to ad-hoc requests
- B: Both teams would use the same source of truth for their work
- C: Both teams would reorganize to report to the same department
- D: Both teams would be able to collaborate on projects in real-time
Answer: B
Community answer: B (100%)
Discussion
Comment 1322554 by Manish_Kum
- Upvotes: 2
Selected Answer: B
B is correct
Question JoxonN9nMR9NV4NYHAbp
Question
A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?
Choices
- A: SELECT * FROM sales
- B: spark.delta.table
- C: spark.sql
- D: spark.table
Answer: C
Community answer: C (100%)
Discussion
Comment 1322555 by Manish_Kum
- Upvotes: 1
Selected Answer: C
C is correct
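As a sketch of the spark.sql approach (this assumes a Databricks notebook, where spark is the active SparkSession, and a hypothetical sales table with a price column):

```python
# Run the analyst's SQL as-is and get the result back as a PySpark DataFrame
df = spark.sql("SELECT * FROM sales")

# Tests can then be written in Python against the DataFrame,
# e.g. a simple cleanliness check for negative prices
assert df.filter(df.price < 0).count() == 0, "found negative prices"
```

This is why spark.sql fits the scenario: it runs the existing SQL unchanged and hands the result to Python, whereas spark.table only reads a table by name without running an arbitrary query.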
Question ORcvACP85uckLE1bLbvZ
Question
A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.
Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?
Choices
- A: pyspark.sql.types.DateType
- B: datetime
- C: pyspark.sql.types.TimestampType
- D: Cron syntax
Answer: D
Community answer: D (100%)
Discussion
Comment 1335986 by duzi
- Upvotes: 1
Selected Answer: D
This question is repeated. See details at https://learn.microsoft.com/en-us/azure/databricks/jobs/scheduled
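Databricks Jobs accept Quartz cron expressions, so a complex schedule can be captured as a single string and reused across Jobs. The expression below is a made-up example:

```
# Quartz cron fields: seconds minutes hours day-of-month month day-of-week
0 30 7 ? * MON-FRI    # hypothetical: run at 07:30 every weekday
```

When submitting a Job programmatically via the Jobs API, this string goes in the schedule's quartz_cron_expression field alongside a timezone_id, which is what makes the schedule easy to copy to other Jobs.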
Question XyQAuPXru07NhesFdreT
Question
A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.
Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?
Choices
- A: The pipeline will need to be written entirely in Python
- B: The pipeline will need to stop using the medallion-based multi-hop architecture
- C: The pipeline will need to be written entirely in SQL
- D: The pipeline will need to use a batch source in place of a streaming source
Answer: D
Community answer: D (56%), A (33%), other (11%)
Discussion
Comment 1409787 by Billybob0604
- Upvotes: 1
Selected Answer: C
Delta Live Tables (DLT) currently requires SQL or Python for defining data pipelines. However, for streaming data, SQL has become the primary language for defining Delta Live Tables pipelines in Databricks.
Comment 1388415 by e872ce8
- Upvotes: 1
Selected Answer: A
A & C. The pipeline will need to be written entirely in Python (if using the Python APIs for Delta Live Tables) or entirely in SQL (if using SQL-based Delta Live Tables). Delta Live Tables (DLT) is a declarative ETL framework built on Databricks, designed to simplify pipeline development and management. It supports Python-based pipelines using @dlt.table decorators and SQL-based pipelines using CREATE LIVE TABLE. Since the data engineer is using Python and the data analyst is using SQL, the pipeline will need to be rewritten in one of the two supported languages.
Comment 1366381 by Kayceetalks
- Upvotes: 2
Selected Answer: D
None of these options are correct
Comment 1328533 by MultiCloudIronMan
- Upvotes: 2
Selected Answer: A
The correct response is A. None of these changes will need to be made. Delta Live Tables supports both Python and SQL, as well as streaming and batch sources. This means that the existing medallion-based multi-hop architecture can be maintained, and the pipeline can continue to use both Python and SQL for different layers. Therefore, no changes are necessary when migrating to Delta Live Tables.
Comment 1322557 by Manish_Kum
- Upvotes: 3
Selected Answer: D
The best choice in this question is D
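For context on the language question debated above: a minimal sketch of a bronze streaming table defined with the DLT Python API (the source path and table name are hypothetical, and the code assumes it runs inside a Delta Live Tables pipeline, where the dlt module and spark session are provided):

```python
import dlt

@dlt.table(comment="Bronze layer: raw streaming events (hypothetical source)")
def bronze_events():
    # Auto Loader (cloudFiles) incrementally ingests the streaming input
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/raw/events"))
```

A gold-layer table could equally be defined in SQL with CREATE OR REFRESH LIVE TABLE, since, as the comments note, DLT pipelines support both languages.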