Questions and Answers
Question EGDkICKOhbEAEYwJx5Gd
Question
Incorporating unit tests into a PySpark application requires upfront attention to the design of your jobs, or a potentially significant refactoring of existing code.
Which benefit offsets this additional effort?
Choices
- A: Improves the quality of your data
- B: Validates a complete use case of your application
- C: Troubleshooting is easier since all steps are isolated and tested individually
- D: Ensures that all steps interact correctly to achieve the desired end result
Answer: C | Answer_ET: C | Community answer: C (100%)
Discussion
Comment 1300631 by m79590530
- Upvotes: 1
Selected Answer: C C is about testing each unit of the solution separately. It does not necessarily validate data quality, as A suggests. B describes business-case scenario testing, such as end-to-end testing with real-life data. D is more related to integration testing.
Comment 1222347 by imatheushenrique
- Upvotes: 1
C. Troubleshooting is easier since all steps are isolated and tested individually. The unit tests will ensure that specific functions and transformations work as intended.
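To make the discussion concrete, here is a minimal sketch of such a unit test, assuming a hypothetical transformation `add_revenue_column()` that the job exposes as an importable function; pytest and a local SparkSession stand in for a real test harness:

```python
import pytest
from pyspark.sql import SparkSession
import pyspark.sql.functions as F


def add_revenue_column(df):
    # Hypothetical transformation under test: revenue = price * quantity.
    return df.withColumn("revenue", F.col("price") * F.col("quantity"))


@pytest.fixture(scope="session")
def spark():
    # Local SparkSession so the test runs without a Databricks cluster.
    return SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()


def test_add_revenue_column(spark):
    input_df = spark.createDataFrame([(10.0, 2), (5.0, 3)], ["price", "quantity"])
    result = add_revenue_column(input_df)
    # Because the transformation is tested in isolation, a failure here
    # points directly at this one step (the benefit described in choice C).
    assert [r.revenue for r in result.collect()] == [20.0, 15.0]
```

Designing the job so that each transformation is a small, importable function like this is exactly the upfront design effort the question refers to.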
Question BVVA949iMldGOz9Fi96I
Question
What describes integration testing?
Choices
- A: It validates an application use case.
- B: It validates behavior of individual elements of an application.
- C: It requires an automated testing framework.
- D: It validates interactions between subsystems of your application.
Answer: D | Answer_ET: D | Community answer: D (100%)
Discussion
Comment 1300634 by m79590530
- Upvotes: 1
Selected Answer: D Validating the interactions and cooperation between the solution's components, subsystems, and interfaces (including interfacing systems and data consumers) is exactly what integration testing covers.
Comment 1222346 by imatheushenrique
- Upvotes: 2
D. It validates interactions between subsystems of your application. An integration test validates different software components, subsystems, or applications that have dependencies on one another.
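By way of contrast with the unit test above, here is a sketch of an integration test under assumed names: two hypothetical stages, `ingest()` (lands raw data) and `aggregate()` (reads it back), exercised together so the test covers their interaction rather than either stage in isolation:

```python
import pytest
import pyspark.sql.functions as F
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("integration-tests").getOrCreate()


def ingest(spark, rows, path):
    # Hypothetical subsystem 1: land raw records as Parquet files.
    spark.createDataFrame(rows, ["user_id", "amount"]).write.mode("overwrite").parquet(path)


def aggregate(spark, path):
    # Hypothetical subsystem 2: read the landed files and aggregate them.
    return spark.read.parquet(path).groupBy("user_id").agg(F.sum("amount").alias("total"))


def test_ingest_then_aggregate(spark, tmp_path):
    path = str(tmp_path / "landing")
    ingest(spark, [(1, 10.0), (1, 5.0), (2, 7.0)], path)
    result = {r.user_id: r.total for r in aggregate(spark, path).collect()}
    # This assertion only passes if the two subsystems cooperate end to end.
    assert result == {1: 15.0, 2: 7.0}
```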
Question tTFbNVzpPoerY5nqMHkt
Question
The Databricks CLI is used to trigger a run of an existing job by passing the job_id parameter. The response confirming that the job run request has been submitted successfully includes a run_id field.
Which statement describes what the number alongside this field represents?
Choices
- A: The job_id and number of times the job has been run are concatenated and returned.
- B: The globally unique ID of the newly triggered run.
- C: The number of times the job definition has been run in this workspace.
- D: The job_id is returned in this field.
Answer: B | Answer_ET: B | Community answer: B (100%)
Discussion
Comment 1300638 by m79590530
- Upvotes: 1
Selected Answer: B Every job run is assigned a globally unique run ID, and each task run inside the job also receives its own globally unique run ID.
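For context, here is a minimal sketch of such a trigger against the Jobs REST API 2.1 `run-now` endpoint, which the CLI wraps; the workspace host, token, and job_id values are placeholders:

```python
import requests

host = "https://<your-workspace>.cloud.databricks.com"  # placeholder
token = "<personal-access-token>"                       # placeholder

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123},  # ID of the existing job definition to trigger
)
resp.raise_for_status()

# run_id is the globally unique ID of the newly triggered run (choice B);
# it is distinct from job_id and from any count of prior runs.
print(resp.json()["run_id"])
```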
Question m0v8KDZYlM6jiSgNgAA8
Question
A Databricks job has been configured with three tasks, each of which is a Databricks notebook. Task A does not depend on other tasks. Tasks B and C run in parallel, with each having a serial dependency on task A.
What will be the resulting state if tasks A and B complete successfully but task C fails during a scheduled run?
Choices
- A: All logic expressed in the notebook associated with tasks A and B will have been successfully completed; some operations in task C may have completed successfully.
- B: Unless all tasks complete successfully, no changes will be committed to the Lakehouse; because task C failed, all commits will be rolled back automatically.
- C: Because all tasks are managed as a dependency graph, no changes will be committed to the Lakehouse until all tasks have successfully been completed.
- D: All logic expressed in the notebook associated with tasks A and B will have been successfully completed; any changes made in task C will be rolled back due to task failure.
Answer: A | Answer_ET: A | Community answer: A (100%)
Discussion
Comment 1300641 by m79590530
- Upvotes: 1
Selected Answer: A Each notebook or task consists of multiple commands and actions. Actions that write to Delta Lake are ACID transactions, so an individual failed write is fully rolled back, but the notebooks/tasks in the job that already completed do not fail or roll back. Answer A therefore correctly describes the result, given the dependencies configured between the notebooks/tasks as described in the question.
Comment 1222341 by imatheushenrique
- Upvotes: 1
A. All logic expressed in the notebook associated with tasks A and B will have been successfully completed; some operations in task C may have completed successfully.
This type of orchestration is a fan-out.
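As a concrete illustration, here is a sketch of the task graph from the question expressed in the Jobs API 2.1 `tasks` format (the job name and notebook paths are placeholders); B and C each declare a serial dependency on A, so they run in parallel once A succeeds:

```python
# Fan-out job definition: A runs first, then B and C run in parallel.
job_settings = {
    "name": "fan-out-example",
    "tasks": [
        {"task_key": "A", "notebook_task": {"notebook_path": "/Jobs/task_a"}},
        {
            "task_key": "B",
            "depends_on": [{"task_key": "A"}],
            "notebook_task": {"notebook_path": "/Jobs/task_b"},
        },
        {
            "task_key": "C",
            "depends_on": [{"task_key": "A"}],
            "notebook_task": {"notebook_path": "/Jobs/task_c"},
        },
    ],
}
# If C fails mid-run, the completed work of A and B stays committed; only the
# Delta transaction that was in flight inside C is rolled back, not the job.
```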
Question 21EnX3xmZksEeqkBmYJt
Question
Which statement regarding stream-static joins and static Delta tables is correct?
Choices
- A: Each microbatch of a stream-static join will use the most recent version of the static Delta table as of each microbatch.
- B: Each microbatch of a stream-static join will use the most recent version of the static Delta table as of the job’s initialization.
- C: The checkpoint directory will be used to track state information for the unique keys present in the join.
- D: Stream-static joins cannot use static Delta tables because of consistency issues.
- E: The checkpoint directory will be used to track updates to the static Delta table.
Answer: A | Answer_ET: A | Community answer: A (89%), other (11%)
Discussion
Comment 1013280 by Eertyy
- Upvotes: 12
B is the right answer: Option B is more typical for stream-static joins, as it provides a consistent static DataFrame snapshot for the entire job's duration. Option A might be suitable in specialized cases where you need real-time updates of the static DataFrame for each microbatch.
Comment 983582 by BrianNguyen95
- Upvotes: 6
correct answer is A
Comment 1334769 by arekm
- Upvotes: 3
Selected Answer: A A is correct, see: https://learn.microsoft.com/en-us/azure/databricks/transform/join#stream-static
Comment 1325453 by Sriramiyer92
- Upvotes: 1
Selected Answer: B A stream-static join joins the latest valid version of a Delta table (the static data) to a data stream using a stateless join.
When Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch. Because the join is stateless, you do not need to configure watermarking and can process results with low latency. The data in the static Delta table used in the join should be slowly-changing.
Comment 1325446 by Sriramiyer92
- Upvotes: 1
Selected Answer: A https://docs.databricks.com/en/transform/join.html#stream-static
Comment 1291403 by akashdesarda
- Upvotes: 1
Selected Answer: A This is straight from docs, “A stream-static join joins the latest valid version of a Delta table (the static data) to a data stream using a stateless join.
When Azure Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch. Because the join is stateless, you do not need to configure watermarking and can process results with low latency. The data in the static Delta table used in the join should be slowly-changing.” https://learn.microsoft.com/en-us/azure/databricks/transform/join#stream-static
Comment 1118607 by kz_data
- Upvotes: 1
Selected Answer: A correct answer is A
Comment 1086137 by hamzaKhribi
- Upvotes: 1
Selected Answer: A Correct Answer A
Comment 1040322 by sturcu
- Upvotes: 1
Selected Answer: A A is correct. When Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch.
Comment 1022082 by sagar21692
- Upvotes: 1
Correct answer is A. https://docs.databricks.com/en/structured-streaming/delta-lake.html
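To illustrate the behavior the docs describe, here is a minimal PySpark sketch of a stream-static join; the table names and checkpoint path are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

orders_stream = spark.readStream.table("orders")  # streaming source
customers_static = spark.read.table("customers")  # static Delta table

# Stateless join: no watermark is needed, and the static side is re-resolved
# to its latest valid version for every micro-batch (choice A). The
# checkpoint tracks stream progress, not the static table's state.
enriched = orders_stream.join(customers_static, on="customer_id", how="inner")

query = (
    enriched.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/enriched")
    .trigger(availableNow=True)
    .toTable("orders_enriched")
)
```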