Questions and Answers
Question H15MG7w2KSjqnl96FurD
Question
A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.
Which approach can the data engineer take to identify the table that is dropping the records?
Choices
- A: They can set up separate expectations for each table when developing their DLT pipeline.
- B: They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.
- C: They can set up DLT to notify them via email when records are dropped.
- D: They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
Answer: D Answer_ET: D Discussion
Comment 1273332 by 9d4d68a
- Upvotes: 5
Repeated, Correct
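For context on option A and the per-table statistics in option D: DLT expectations are what feed those data quality statistics. Below is a minimal sketch of per-table expectations using the DLT Python API; the table names, the landing path, and the rules (id IS NOT NULL, amount >= 0) are hypothetical.

```python
# Minimal sketch of per-table expectations in a DLT pipeline (Python API).
# Table names, the landing path, and the expectation rules are hypothetical;
# `spark` is the session provided by the DLT runtime.
import dlt
from pyspark.sql import functions as F

@dlt.table
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")   # dropped rows are counted per table
def bronze_events():
    # Auto Loader ingest from a hypothetical landing path
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/events"))

@dlt.table
@dlt.expect_or_drop("valid_amount", "amount >= 0")
def silver_events():
    return dlt.read_stream("bronze_events").withColumn("ingested_at", F.current_timestamp())
```

With an expectation declared on each table, the pipeline page shows records written and records dropped per table, which is how option D identifies where data is being lost.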
Question dj77z4lwOSYcRjblcFPJ
Question
What is used by Spark to record the offset range of the data being processed in each trigger in order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing?
Choices
- A: Checkpointing and Write-ahead Logs
- B: Replayable Sources and Idempotent Sinks
- C: Write-ahead Logs and Idempotent Sinks
- D: Checkpointing and Idempotent Sinks
Answer: D Answer_ET: A Community answer D (60%) A (40%) Discussion
Comment 1401251 by Lili97
- Upvotes: 2
Selected Answer: D Hello, is it usual to have duplicated questions? What is the point of paying if some questions are repeated?
Comment 1330847 by grygi
- Upvotes: 1
Selected Answer: A A is correct. I had this on the exam and, judging from my results, it seems so: I chose D and did not max this area, even though I was sure of all the other answers.
Comment 1327314 by MultiCloudIronMan
- Upvotes: 1
Selected Answer: D The correct answer is D. Checkpointing and Idempotent Sinks. In Structured Streaming, Spark uses checkpointing to reliably track the progress of the data being processed. Checkpointing saves the state of the streaming query, including the offset ranges of the data processed in each trigger. Idempotent sinks ensure that even if the same data is processed multiple times due to a failure and restart, the results remain consistent and correct.
Comment 1313097 by NzmD
- Upvotes: 1
Selected Answer: A Repeated!
Comment 1273345 by 9d4d68a
- Upvotes: 2
Repeated, Correct
The correct answer is A. Checkpointing and Write-ahead Logs. Checkpointing records the progress of streaming queries, while write-ahead logs (WALs) capture the data before it is processed, allowing Spark to recover and process data reliably in case of failures.
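Whichever option the exam intends, the mechanism is easy to see in code. Below is a minimal sketch of a streaming query whose checkpoint location holds the offset log (a write-ahead log of the offset range for each trigger) and the commit log, which is what lets a restarted query resume exactly where it left off. The paths and the rate test source are hypothetical, and `spark` is assumed to be an active SparkSession (e.g. in a Databricks notebook).

```python
# Minimal sketch of a Structured Streaming query with checkpointing.
# All paths are hypothetical; `spark` is the active SparkSession.
from pyspark.sql import functions as F

events = (spark.readStream
          .format("rate")                  # built-in test source emitting rows per second
          .option("rowsPerSecond", 5)
          .load())

query = (events
         .withColumn("bucket", F.col("value") % 10)
         .writeStream
         .format("delta")
         .outputMode("append")
         # The checkpoint directory stores the offset log (a write-ahead log of the
         # offset range for each trigger) and the commit log, so a restarted query
         # resumes from the last completed trigger instead of reprocessing everything.
         .option("checkpointLocation", "/tmp/checkpoints/rate_demo")
         .start("/tmp/tables/rate_demo"))
```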
Question R75OKQYdzOoHvmDwhR7b
Question
What describes the relationship between Gold tables and Silver tables?
Choices
- A: Gold tables are more likely to contain aggregations than Silver tables.
- B: Gold tables are more likely to contain valuable data than Silver tables.
- C: Gold tables are more likely to contain a less refined view of data than Silver tables.
- D: Gold tables are more likely to contain truthful data than Silver tables.
Answer: A Answer_ET: A Community answer A (100%) Discussion
Comment 1327316 by MultiCloudIronMan
- Upvotes: 1
Selected Answer: A Gold is final stage to feed analytics platforms
Comment 1273343 by 9d4d68a
- Upvotes: 2
Repeated, Correct
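As an illustration of “more likely to contain aggregations”: a Gold table is typically a business-level rollup of a cleaned Silver table. A minimal sketch, assuming hypothetical silver_sales and gold_daily_revenue tables with hypothetical columns:

```python
# Minimal sketch of a Gold table built from a Silver table (all names hypothetical).
from pyspark.sql import functions as F

silver_sales = spark.table("silver_sales")   # cleaned, record-level data

gold_daily_revenue = (silver_sales
                      .groupBy("order_date", "region")
                      .agg(F.sum("amount").alias("total_revenue"),
                           F.countDistinct("order_id").alias("orders")))

# Business-level aggregate, ready for BI/analytics consumption.
gold_daily_revenue.write.mode("overwrite").saveAsTable("gold_daily_revenue")
```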
Question 9Gm25ynrYzW2IyOJxUlu
Question
A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL. Which of the following commands could the data engineering team use to access sales in PySpark?
Choices
- A: SELECT * FROM sales
- B: There is no way to share data between PySpark and SQL.
- C: spark.sql(“sales”)
- D: spark.delta.table(“sales”)
- E: spark.table(“sales”)
Answer: E Answer_ET: E Community answer E (95%), other (5%) Discussion
Comment 946766 by Atnafu
- Upvotes: 14
E The spark.table() function in PySpark allows you to access tables registered in the catalog, including Delta tables. By specifying the table name (“sales”), the data engineering team can read the Delta table and perform various operations on it using PySpark.
Option A, SELECT * FROM sales, is a SQL syntax and cannot be directly used in PySpark.
Option B, “There is no way to share data between PySpark and SQL,” is incorrect. PySpark provides the capability to interact with data using both SQL and DataFrame/DataSet APIs.
Option C: spark.sql() is a valid way to execute SQL queries on registered tables in PySpark; however, in this case, the “sales” argument alone is not a valid SQL query.
Option D, spark.delta.table(“sales”), is a specific method provided by Delta Lake to access Delta tables directly. While it can be used to access the “sales” table, it is not the most common approach in PySpark.
Comment 1344724 by dhohigh
- Upvotes: 1
Selected Answer: E This answer is pure Python and is a simple solution to the question.
Comment 1272813 by 9d4d68a
- Upvotes: 1
To access the Delta table sales using PySpark, the data engineering team can use the following command:
E. spark.table(“sales”)
This command allows them to load the table into a PySpark DataFrame, which they can then use for their tests and data processing in Python. Note that the command spark.delta.table(“table name”) does not exist in PySpark. To access a Delta table, you should use:
spark.table(“table name”)
Or, if you need to use Delta-specific functionality, you would typically use Delta’s APIs or spark.read.format(“delta”).table(“table name”) to read the table into a DataFrame.
Comment 1262396 by 80370eb
- Upvotes: 1
Selected Answer: E E. spark.table(“sales”)
This command allows the team to access the table using PySpark, enabling them to implement their tests in Python.
Comment 1252366 by souldiv
- Upvotes: 1
spark.table() . E is the correct one
Comment 1203171 by benni_ale
- Upvotes: 1
Selected Answer: E E is correct
Comment 1189113 by benni_ale
- Upvotes: 2
Selected Answer: E e is correct
Comment 1177189 by Itmma
- Upvotes: 1
Selected Answer: E E is correct
Comment 1113192 by SerGrey
- Upvotes: 1
Selected Answer: E Correct answer is E
Comment 1109089 by Garyn
- Upvotes: 4
Selected Answer: E E. spark.table(“sales”)
The spark.table() function in PySpark allows access to a registered table within the SparkSession. In this case, “sales” is the name of the Delta table created by the data analyst, and the spark.table() function enables access to this table for performing data engineering tests using Python (PySpark).
Comment 1106007 by csd
- Upvotes: 1
C is correct Answer
Comment 1064788 by awofalus
- Upvotes: 1
Selected Answer: E Correct is E
Comment 1017350 by KalavathiP
- Upvotes: 1
Selected Answer: E E is correct
Comment 1016561 by d_b47
- Upvotes: 1
Selected Answer: E delta is default.
Comment 921433 by ThomasReps
- Upvotes: 2
Selected Answer: E It’s E. As stated by others, the default format is Delta.
If you try to run D, you get an error that there is no “delta” attribute on SparkSession: “AttributeError: ‘SparkSession’ object has no attribute ‘delta’”. If you want to state explicitly that it should be Delta, you need an “.option(format=‘delta’)” instead.
Comment 914071 by Dwarakkrishna
- Upvotes: 1
You access data in Delta tables by the table name or the table path, as shown in the following examples: people_df = spark.read.table(table_name)
display(people_df)
Comment 895846 by prasioso
- Upvotes: 1
I believe the answer is E as in databricks the default tables are delta tables hence spark.table should be enough. Have not seen a spark.delta.table function before.
Comment 892904 by Tickxit
- Upvotes: 2
Selected Answer: E E: spark.table or spark.read.table
Comment 889298 by softthinkers
- Upvotes: 1
Correct Answer is D, spark.delta.table(“sales”). The reason: it’s asking for a Delta table, not a normal table; if it were a normal table, then it would be spark.table(“sales”).
Comment 889065 by Majjjj
- Upvotes: 1
The correct answer is D.
The data engineering team can access the Delta table sales in PySpark by using the spark.delta.table command. This command is used to create a DataFrame based on a Delta table. Therefore, the correct command is spark.delta.table(“sales”).
Comment 876207 by Varma_Saraswathula
- Upvotes: 1
Option E - https://spark.apache.org/docs/3.2.1/api/python/reference/api/pyspark.sql.SparkSession.table.html
Comment 875861 by naxacod574
- Upvotes: 1
Option E
Comment 868052 by azurearch
- Upvotes: 2
option E
Comment 867435 by SireeJ
- Upvotes: 1
Option: D
Comment 860627 by sdas1
- Upvotes: 2
Option E
Comment 860422 by knivesz
- Upvotes: 3
Selected Answer: E We create a table: create or replace table delta_su (id INT, nombre STRING). We insert data into the table and then retrieve the stored values with: spark.table(“delta_su”).show()
Comment 860276 by Retko
- Upvotes: 4
E is correct, spark.table(“sales”)
Comment 857990 by XiltroX
- Upvotes: 1
Selected Answer: C Correct answer is C
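Summing up the thread: spark.table() (or spark.read.table()) is the standard way to load a registered Delta table into a DataFrame, and SparkSession has no delta attribute. A minimal sketch, assuming the sales table is registered in the current schema and has a hypothetical order_id column for the example test:

```python
# Minimal sketch: accessing the registered Delta table "sales" from PySpark.
# Assumes an active SparkSession named `spark` (as in a Databricks notebook)
# and a hypothetical order_id column used for the example check.

sales_df = spark.table("sales")                        # equivalent to spark.read.table("sales")

# Explicitly going through the Delta reader also works for registered tables:
sales_df_delta = spark.read.format("delta").table("sales")

# Example "test" written in Python rather than SQL: no null order ids allowed.
assert sales_df.filter(sales_df["order_id"].isNull()).count() == 0
```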
Question JxCrj6gQstqtDXOGPCxg
Question
What describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?
Choices
- A: CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.
- B: CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.
- C: CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.
- D: CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1327318 by MultiCloudIronMan
- Upvotes: 1
Selected Answer: B Streaming data from source to destination
Comment 1273344 by 9d4d68a
- Upvotes: 2
Repeated, Correct
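For reference, the same distinction exists in the DLT Python API: reading the source incrementally versus recomputing it in full on each update. The sketch below is the Python analogue of the SQL syntax named in the question; the table names are hypothetical.

```python
# Minimal sketch of the streaming vs. non-streaming distinction in the DLT Python API.
# Table names are hypothetical.
import dlt

@dlt.table
def orders_incremental():
    # Analogue of CREATE STREAMING LIVE TABLE: reads the source incrementally,
    # processing only the data that arrived since the last update.
    return dlt.read_stream("orders_bronze")

@dlt.table
def orders_full_refresh():
    # Analogue of CREATE LIVE TABLE: recomputed from the full contents
    # of its source on each pipeline update.
    return dlt.read("orders_bronze")
```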