Questions and Answers
Question BJulJpSW9796OyyrOz35
Question
A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.
Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?
Choices
- A: They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to “Reliability Optimized.”
- B: They can turn on the Auto Stop feature for the SQL endpoint.
- C: They can increase the cluster size of the SQL endpoint.
- D: They can turn on the Serverless feature for the SQL endpoint.
- E: They can increase the maximum bound of the SQL endpoint’s scaling range.
Answer: D | Answer_ET: D | Community answer: D (71%), C (21%), other (7%)
Discussion
Comment 1115262 by carpa_jo
- Upvotes: 18
Selected Answer: D The important point of this scenario is “when they are submitted to a non-running SQL endpoint”. So it’s not about increasing the instance size or the number of instances to improve query performance, but about reducing the start-up time.
A: Not possible; Serverless can’t be combined with spot instance policies, see https://docs.databricks.com/en/compute/sql-warehouse/serverless.html#limitations
B: Auto Stop is about terminating a SQL warehouse after x minutes of being idle.
C: Increasing the cluster size provides more capacity for running queries, but doesn’t reduce start-up time.
D: Serverless reduces start-up time from minutes to seconds. Jackpot!
E: Increasing the max bound of the SQL endpoint’s scaling range helps with many concurrent queries, which is not the case here.
Comment 1127427 by azure_bimonster
- Upvotes: 1
Selected Answer: D D is correct. The key phrase is “submitted to a non-running SQL endpoint”. Increasing the cluster size is not going to help if the endpoint is not running in the first place.
Comment 1118438 by bartfto
- Upvotes: 1
Selected Answer: D “when they are submitted to a non-running SQL endpoint” ANSWER D
Comment 1110217 by Garyn
- Upvotes: 2
Selected Answer: C C. They can increase the cluster size of the SQL endpoint.
Explanation:
Increasing the cluster size of the SQL endpoint can enhance query performance by providing more computational resources to execute queries. This can potentially speed up query processing by allowing more parallelism, handling larger workloads, and reducing the time taken for query execution.
Comment 1101905 by AndreFR
- Upvotes: 4
Key word: “non-running SQL endpoint”, which implies that the query is slow because the cluster needs time to start.
I suggest answer D because:
A: Serverless and spot instance policies cannot be mixed.
B: Auto Stop is what causes jobs to be submitted to non-running SQL endpoints in the first place.
C: increasing the cluster size cannot compensate for slow start-up time.
D: Serverless is able to start and scale faster than a classic SQL endpoint (seconds instead of minutes).
E: increasing the maximum bound will only help if there are simultaneous queries.
Comment 1097421 by olaru
- Upvotes: 2
Selected Answer: E maximum bound of the SQL endpoint’s scaling range
Comment 1089230 by nedlo
- Upvotes: 2
Selected Answer: C D is wrong: it’s already Serverless (non-running SQL endpoint), so how would turning Serverless on help? They also say C here https://community.databricks.com/t5/data-engineering/when-to-increase-maximum-bound-vs-when-to-increase-cluster-size/td-p/27880 . E is only true for autoscaling clusters.
Comment 1085672 by msengupta
- Upvotes: 2
Selected Answer: C https://community.databricks.com/t5/data-engineering/sql-query-takes-too-long-to-run/td-p/21884
Comment 1056585 by Syd
- Upvotes: 2
Answer E:
https://www.databricks.com/blog/2022/03/10/top-5-databricks-performance-tips.html
Comment 1050190 by meow_akk
- Upvotes: 1
Ans E : you re welcome :) https://community.databricks.com/t5/data-engineering/when-to-increase-maximum-bound-vs-when-to-increase-cluster-size/td-p/27880
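The distinction the discussion draws between start-up time (Serverless) and query capacity (cluster size, scaling range) maps onto the SQL Warehouses API. Below is a minimal sketch of a create-warehouse request body, assuming the field names of the public `POST /api/2.0/sql/warehouses` endpoint; verify them against your workspace's API version before use.

```python
# Hypothetical sketch: request body for the Databricks SQL Warehouses API.
# Field names are assumed from the public API docs and may differ by version.

def serverless_warehouse_payload(name: str) -> dict:
    """Build a create-warehouse payload with Serverless enabled,
    which reduces start-up time from minutes to seconds (answer D)."""
    return {
        "name": name,
        "cluster_size": "Small",            # answer C would change this instead
        "enable_serverless_compute": True,  # the fix for slow cold starts
        "auto_stop_mins": 10,               # Auto Stop (answer B) stops idle warehouses
        "min_num_clusters": 1,
        "max_num_clusters": 2,              # answer E would raise this bound
    }

payload = serverless_warehouse_payload("ingest-monitoring")
```

Each incorrect answer corresponds to a different field here, which makes the trade-offs in the discussion concrete: only `enable_serverless_compute` addresses cold-start latency.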
Question O7IFmDadbgrXMXoKzTgF
Question
A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.
Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?
Choices
- A: pyspark.sql.types.DateType
- B: datetime
- C: pyspark.sql.types.TimestampType
- D: Cron syntax
- E: There is no way to represent and submit this information programmatically
Answer: D | Answer_ET: D | Community answer: D (100%)
Discussion
Comment 1315969 by 806e7d2
- Upvotes: 1
Selected Answer: D Databricks allows for programmatic representation and submission of job schedules using Cron syntax, which is a standardized format for defining schedules. This approach is particularly useful for transferring complex schedules between different jobs or automating job scheduling.
Cron syntax specifies schedules with fields for minutes, hours, day of the month, month, day of the week, and optionally year, making it ideal for representing complex scheduling patterns programmatically.
Comment 1264139 by 80370eb
- Upvotes: 2
Selected Answer: D Cron syntax is a powerful way to define complex schedules programmatically. In Databricks, you can use Cron syntax to set up the scheduling of jobs, which allows for more flexibility and ease when transferring the schedule to other jobs without manually selecting each value in the scheduling form.
Comment 1084550 by 55f31c8
- Upvotes: 2
Selected Answer: D https://docs.databricks.com/en/sql/user/queries/schedule-query.html
Comment 1050191 by meow_akk
- Upvotes: 3
Ans D: with Cron syntax, you can easily copy the whole schedule between jobs as a single expression.
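Databricks job schedules use Quartz cron expressions (six or seven fields: seconds, minutes, hours, day-of-month, month, day-of-week, and an optional year). A minimal sketch of the schedule block in a Jobs API payload, assuming the Jobs API field names (`quartz_cron_expression`, `timezone_id`, `pause_status`); verify against your API version:

```python
# Sketch of a Jobs API schedule block using Quartz cron syntax.
# Field names are assumed from the Jobs API docs.

def job_schedule(cron: str, tz: str = "UTC") -> dict:
    """Wrap a Quartz cron expression in a Jobs API schedule block."""
    fields = cron.split()
    if not 6 <= len(fields) <= 7:
        raise ValueError("Quartz cron expressions have 6 or 7 fields")
    return {
        "quartz_cron_expression": cron,
        "timezone_id": tz,
        "pause_status": "UNPAUSED",
    }

# Every weekday at 07:30:00 -- one string captures the whole schedule,
# so it can be copied to other jobs instead of re-entering form values.
schedule = job_schedule("0 30 7 ? * MON-FRI")
```

This is exactly why answer D beats manual form entry: a complex schedule becomes one portable string.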
Question 5qyPnzCEZhkKbgaflWbN
Question
Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?
Choices
- A: Manually programming in an alert system in each cell of the Notebook
- B: Setting up an Alert in the Job page
- C: Setting up an Alert in the Notebook
- D: There is no way to notify the Job owner in the case of Job failure
- E: MLflow Model Registry Webhooks
Answer: B | Answer_ET: B | Community answer: B (100%)
Discussion
Comment 1264141 by 80370eb
- Upvotes: 2
Selected Answer: B In Databricks, you can configure job notifications directly from the Jobs page, where you can specify that an email should be sent to the Job owner or other specified individuals in the case of Job failure. This is the most straightforward and automated way to ensure notifications are sent.
Comment 1090954 by Lavpak
- Upvotes: 2
Selected Answer: B Setting up an alert in Jobs page
Comment 1050192 by meow_akk
- Upvotes: 3
Ans B : https://docs.databricks.com/en/workflows/jobs/job-notifications.html
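The alert configured on the Job page corresponds to the `email_notifications` block in the job's settings. A minimal sketch, assuming the Jobs API field names (`on_failure`, `no_alert_for_skipped_runs`); the email address is a placeholder:

```python
# Sketch of the email_notifications block in a Jobs API job-settings
# payload. Field names are assumed from the Jobs API docs; the address
# is a hypothetical placeholder.

def failure_notifications(owner_email: str) -> dict:
    """Email the given address whenever a job run fails (answer B)."""
    return {
        "email_notifications": {
            "on_failure": [owner_email],
            "no_alert_for_skipped_runs": True,
        }
    }

settings = failure_notifications("owner@example.com")
```

Because the notification lives in the job's settings rather than in notebook code, it fires on any failure, including failures before a single notebook cell runs, which is why answer A falls short.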
Question 3CcvaOdjq4IVfTmGHpBr
Question
An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.
Which of the following approaches can the manager use to ensure the results of the query are updated each day?
Choices
- A: They can schedule the query to refresh every 1 day from the SQL endpoint’s page in Databricks SQL.
- B: They can schedule the query to refresh every 12 hours from the SQL endpoint’s page in Databricks SQL.
- C: They can schedule the query to refresh every 1 day from the query’s page in Databricks SQL.
- D: They can schedule the query to run every 1 day from the Jobs UI.
- E: They can schedule the query to run every 12 hours from the Jobs UI.
Answer: C | Answer_ET: C | Community answer: C (100%)
Discussion
Comment 1110226 by Garyn
- Upvotes: 3
Selected Answer: C The manager can schedule the query to refresh every 1 day from the query’s page in Databricks SQL (Option C). Here are the steps to do this:
- In the Query Editor, click Schedule > Add schedule to open a menu with schedule settings.
- Choose when to run the query. Use the dropdown pickers to specify the frequency, period, starting time, and time zone.
- Click Create.
Comment 1101887 by AndreFR
- Upvotes: 1
Selected Answer: C It has to be “every 1 day” to run once a day. https://docs.databricks.com/en/sql/user/queries/schedule-query.html
Comment 1089765 by kz_data
- Upvotes: 1
Selected Answer: C Correct Answer is C
Comment 1056165 by kishore1980
- Upvotes: 2
Selected Answer: C From the query editor page we have option to schedule the queries
Comment 1050785 by meow_akk
- Upvotes: 1
Ans D: I think option A might not be right, since scheduling is not done from the SQL endpoint’s page.
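The UI path described above (Schedule > Add schedule on the query's page) is the documented route. As a hedged sketch only: the legacy Databricks SQL Queries API represented such a schedule as a refresh interval in seconds, so a daily refresh would be 86400. Field names here are assumptions and may differ by API version:

```python
# Hypothetical sketch: a daily refresh expressed as a schedule interval
# in seconds, as the legacy SQL Queries API represented it. Field names
# are assumptions; the query page UI is the documented path.

DAY_SECONDS = 24 * 60 * 60  # 86400

def daily_refresh_schedule() -> dict:
    """Refresh the query's results every 1 day (answer C)."""
    return {"schedule": {"interval": DAY_SECONDS}}
```

Whatever the transport, the key point of answer C survives: the schedule is attached to the query itself, not to the SQL endpoint or a separate job.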
Question qGqbIVp1VpGEP3MbD8Ua
Question
In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?
Choices
- A: When another task needs to be replaced by the new task
- B: When another task needs to fail before the new task begins
- C: When another task has the same dependency libraries as the new task
- D: When another task needs to use as little compute resources as possible
- E: When another task needs to successfully complete before the new task begins
Answer: E | Answer_ET: E | Community answer: E (100%)
Discussion
Comment 1284139 by CommanderBigMac
- Upvotes: 1
Selected Answer: E E is correct
Comment 1090952 by Lavpak
- Upvotes: 1
Selected Answer: E https://docs.databricks.com/en/workflows/jobs/conditional-tasks.html
Comment 1050786 by meow_akk
- Upvotes: 2
Ans E: E is correct, since a dependency means the task it depends on must complete successfully before the new task begins.
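In a Jobs API payload, the Depends On field becomes the `depends_on` list on a task. A minimal sketch of a two-task job, assuming the Jobs API task fields (`task_key`, `depends_on`); task names are hypothetical:

```python
# Sketch of a two-task Jobs API payload where "load" runs only after
# "extract" completes successfully. Field names are assumed from the
# Jobs API docs; task keys are hypothetical.

def two_task_job() -> dict:
    """Model answer E: the new task waits for its dependency to succeed."""
    return {
        "name": "etl-job",
        "tasks": [
            {"task_key": "extract"},
            {
                "task_key": "load",
                # "load" begins only after "extract" succeeds
                "depends_on": [{"task_key": "extract"}],
            },
        ],
    }

job = two_task_job()
```

If "extract" fails, "load" never starts, which rules out answer B.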