Questions and Answers
Question BJulJpSW9796OyyrOz35
Question
A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.
Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?
Choices
- A: They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to “Reliability Optimized.”
- B: They can turn on the Auto Stop feature for the SQL endpoint.
- C: They can increase the cluster size of the SQL endpoint.
- D: They can turn on the Serverless feature for the SQL endpoint.
- E: They can increase the maximum bound of the SQL endpoint’s scaling range.
Answer: D | Answer_ET: D | Community answer: D (71%), C (21%), other (7%)
Discussion
Comment 1115262 by carpa_jo
- Upvotes: 18
Selected Answer: D The important point of this scenario is “when they are submitted to a non-running SQL endpoint”. So it’s not about increasing the instance size or the number of instances to improve query performance, but about reducing the start-up time.
A: Not possible; Serverless can’t be combined with spot instance policies, see https://docs.databricks.com/en/compute/sql-warehouse/serverless.html#limitations
B: Auto Stop is about terminating a SQL warehouse after x minutes of being idle.
C: Increasing the cluster size provides more capacity for running queries, but doesn’t reduce start-up time.
D: Serverless reduces start-up time from minutes to seconds. Jackpot!
E: Increasing the max bound of the SQL endpoint’s scaling range helps with many concurrent queries, which is not the case here.
Comment 1127427 by azure_bimonster
- Upvotes: 1
Selected Answer: D D is correct. The key phrase is “submitted to a non-running SQL endpoint”. Increasing the cluster size is not going to help if the endpoint is not running in the first place.
Comment 1118438 by bartfto
- Upvotes: 1
Selected Answer: D “when they are submitted to a non-running SQL endpoint” ANSWER D
Comment 1110217 by Garyn
- Upvotes: 2
Selected Answer: C C. They can increase the cluster size of the SQL endpoint.
Explanation:
Increasing the cluster size of the SQL endpoint can enhance query performance by providing more computational resources to execute queries. This can potentially speed up query processing by allowing more parallelism, handling larger workloads, and reducing the time taken for query execution.
Comment 1101905 by AndreFR
- Upvotes: 4
Key word: “non-running SQL endpoint”, which implies that the query is slow because the cluster needs time to start.
I suggest answer D because:
A: Serverless and spot instance policies cannot be mixed.
B: Auto Stop is what causes jobs to be submitted to non-running SQL endpoints in the first place.
C: increasing the cluster size cannot compensate for slow start-up time.
D: Serverless is able to start and scale faster than a classic SQL endpoint (seconds instead of minutes).
E: increasing the maximum bound will only help if there are simultaneous queries.
Comment 1097421 by olaru
- Upvotes: 2
Selected Answer: E maximum bound of the SQL endpoint’s scaling range
Comment 1089230 by nedlo
- Upvotes: 2
Selected Answer: C D is wrong: it’s already Serverless (non-running SQL endpoint), so how would turning Serverless on help? They also say C here https://community.databricks.com/t5/data-engineering/when-to-increase-maximum-bound-vs-when-to-increase-cluster-size/td-p/27880 . E is only true for autoscaling clusters.
Comment 1085672 by msengupta
- Upvotes: 2
Selected Answer: C https://community.databricks.com/t5/data-engineering/sql-query-takes-too-long-to-run/td-p/21884
Comment 1056585 by Syd
- Upvotes: 2
Answer E:
https://www.databricks.com/blog/2022/03/10/top-5-databricks-performance-tips.html
Comment 1050190 by meow_akk
- Upvotes: 1
Ans E : you re welcome :) https://community.databricks.com/t5/data-engineering/when-to-increase-maximum-bound-vs-when-to-increase-cluster-size/td-p/27880
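The distinction the discussion draws between start-up time (Serverless) and query capacity (cluster size, scaling range) maps onto the SQL Warehouses API. Below is a minimal sketch of a create-warehouse request body, assuming the field names of the public `POST /api/2.0/sql/warehouses` endpoint; verify them against your workspace's API version before use.

```python
# Hypothetical sketch: request body for the Databricks SQL Warehouses API.
# Field names are assumed from the public API docs and may differ by version.

def serverless_warehouse_payload(name: str) -> dict:
    """Build a create-warehouse payload with Serverless enabled,
    which reduces start-up time from minutes to seconds (answer D)."""
    return {
        "name": name,
        "cluster_size": "Small",            # answer C would change this instead
        "enable_serverless_compute": True,  # the fix for slow cold starts
        "auto_stop_mins": 10,               # Auto Stop (answer B) stops idle warehouses
        "min_num_clusters": 1,
        "max_num_clusters": 2,              # answer E would raise this bound
    }

payload = serverless_warehouse_payload("ingest-monitoring")
```

Each incorrect answer corresponds to a different field here, which makes the trade-offs in the discussion concrete: only `enable_serverless_compute` addresses cold-start latency.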
Question O7IFmDadbgrXMXoKzTgF
Question
A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.
Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?
Choices
- A: pyspark.sql.types.DateType
- B: datetime
- C: pyspark.sql.types.TimestampType
- D: Cron syntax
- E: There is no way to represent and submit this information programmatically
Answer: D | Answer_ET: D | Community answer: D (100%)
Discussion
Comment 1315969 by 806e7d2
- Upvotes: 1
Selected Answer: D Databricks allows for programmatic representation and submission of job schedules using Cron syntax, which is a standardized format for defining schedules. This approach is particularly useful for transferring complex schedules between different jobs or automating job scheduling.
Cron syntax specifies schedules with fields for minutes, hours, day of the month, month, day of the week, and optionally year, making it ideal for representing complex scheduling patterns programmatically.
Comment 1264139 by 80370eb
- Upvotes: 2
Selected Answer: D Cron syntax is a powerful way to define complex schedules programmatically. In Databricks, you can use Cron syntax to set up the scheduling of jobs, which allows for more flexibility and ease when transferring the schedule to other jobs without manually selecting each value in the scheduling form.
Comment 1084550 by 55f31c8
- Upvotes: 2
Selected Answer: D https://docs.databricks.com/en/sql/user/queries/schedule-query.html
Comment 1050191 by meow_akk
- Upvotes: 3
Ans D: with Cron syntax, you can easily copy the whole schedule between jobs as a single expression.
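Databricks job schedules use Quartz cron expressions (six or seven fields: seconds, minutes, hours, day-of-month, month, day-of-week, and an optional year). A minimal sketch of the schedule block in a Jobs API payload, assuming the Jobs API field names (`quartz_cron_expression`, `timezone_id`, `pause_status`); verify against your API version:

```python
# Sketch of a Jobs API schedule block using Quartz cron syntax.
# Field names are assumed from the Jobs API docs.

def job_schedule(cron: str, tz: str = "UTC") -> dict:
    """Wrap a Quartz cron expression in a Jobs API schedule block."""
    fields = cron.split()
    if not 6 <= len(fields) <= 7:
        raise ValueError("Quartz cron expressions have 6 or 7 fields")
    return {
        "quartz_cron_expression": cron,
        "timezone_id": tz,
        "pause_status": "UNPAUSED",
    }

# Every weekday at 07:30:00 -- one string captures the whole schedule,
# so it can be copied to other jobs instead of re-entering form values.
schedule = job_schedule("0 30 7 ? * MON-FRI")
```

This is exactly why answer D beats manual form entry: a complex schedule becomes one portable string.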
Question 5qyPnzCEZhkKbgaflWbN
Question
Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?
Choices
- A: Manually programming in an alert system in each cell of the Notebook
- B: Setting up an Alert in the Job page
- C: Setting up an Alert in the Notebook
- D: There is no way to notify the Job owner in the case of Job failure
- E: MLflow Model Registry Webhooks
Answer: B | Answer_ET: B | Community answer: B (100%)
Discussion
Comment 1264141 by 80370eb
- Upvotes: 2
Selected Answer: B In Databricks, you can configure job notifications directly from the Jobs page, where you can specify that an email should be sent to the Job owner or other specified individuals in the case of Job failure. This is the most straightforward and automated way to ensure notifications are sent.
Comment 1090954 by Lavpak
- Upvotes: 2
Selected Answer: B Setting up an alert in Jobs page
Comment 1050192 by meow_akk
- Upvotes: 3
Ans B : https://docs.databricks.com/en/workflows/jobs/job-notifications.html
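The alert configured on the Job page corresponds to the `email_notifications` block in the job's settings. A minimal sketch, assuming the Jobs API field names (`on_failure`, `no_alert_for_skipped_runs`); the email address is a placeholder:

```python
# Sketch of the email_notifications block in a Jobs API job-settings
# payload. Field names are assumed from the Jobs API docs; the address
# is a hypothetical placeholder.

def failure_notifications(owner_email: str) -> dict:
    """Email the given address whenever a job run fails (answer B)."""
    return {
        "email_notifications": {
            "on_failure": [owner_email],
            "no_alert_for_skipped_runs": True,
        }
    }

settings = failure_notifications("owner@example.com")
```

Because the notification lives in the job's settings rather than in notebook code, it fires on any failure, including failures before a single notebook cell runs, which is why answer A falls short.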
Question 3CcvaOdjq4IVfTmGHpBr
Question
An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.
Which of the following approaches can the manager use to ensure the results of the query are updated each day?
Choices
- A: They can schedule the query to refresh every 1 day from the SQL endpoint’s page in Databricks SQL.
- B: They can schedule the query to refresh every 12 hours from the SQL endpoint’s page in Databricks SQL.
- C: They can schedule the query to refresh every 1 day from the query’s page in Databricks SQL.
- D: They can schedule the query to run every 1 day from the Jobs UI.
- E: They can schedule the query to run every 12 hours from the Jobs UI.
Answer: C | Answer_ET: C | Community answer: C (100%)
Discussion
Comment 1110226 by Garyn
- Upvotes: 3
Selected Answer: C The manager can schedule the query to refresh every 1 day from the query’s page in Databricks SQL (Option C). Here are the steps to do this:
- In the Query Editor, click Schedule > Add schedule to open a menu with schedule settings.
- Choose when to run the query. Use the dropdown pickers to specify the frequency, period, starting time, and time zone.
- Click Create.
Comment 1101887 by AndreFR
- Upvotes: 1
Selected Answer: C It has to be “every 1 day” to run once a day. https://docs.databricks.com/en/sql/user/queries/schedule-query.html
Comment 1089765 by kz_data
- Upvotes: 1
Selected Answer: C Correct Answer is C
Comment 1056165 by kishore1980
- Upvotes: 2
Selected Answer: C From the query editor page we have option to schedule the queries
Comment 1050785 by meow_akk
- Upvotes: 1
Ans D: I think option A might not be right, since scheduling is not done from the SQL endpoint’s page.
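The UI path described above (Schedule > Add schedule on the query's page) is the documented route. As a hedged sketch only: the legacy Databricks SQL Queries API represented such a schedule as a refresh interval in seconds, so a daily refresh would be 86400. Field names here are assumptions and may differ by API version:

```python
# Hypothetical sketch: a daily refresh expressed as a schedule interval
# in seconds, as the legacy SQL Queries API represented it. Field names
# are assumptions; the query page UI is the documented path.

DAY_SECONDS = 24 * 60 * 60  # 86400

def daily_refresh_schedule() -> dict:
    """Refresh the query's results every 1 day (answer C)."""
    return {"schedule": {"interval": DAY_SECONDS}}
```

Whatever the transport, the key point of answer C survives: the schedule is attached to the query itself, not to the SQL endpoint or a separate job.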
Question qGqbIVp1VpGEP3MbD8Ua
Question
In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?
Choices
- A: When another task needs to be replaced by the new task
- B: When another task needs to fail before the new task begins
- C: When another task has the same dependency libraries as the new task
- D: When another task needs to use as little compute resources as possible
- E: When another task needs to successfully complete before the new task begins
Answer: E | Answer_ET: E | Community answer: E (100%)
Discussion
Comment 1284139 by CommanderBigMac
- Upvotes: 1
Selected Answer: E E is correct
Comment 1090952 by Lavpak
- Upvotes: 1
Selected Answer: E https://docs.databricks.com/en/workflows/jobs/conditional-tasks.html
Comment 1050786 by meow_akk
- Upvotes: 2
Ans E: E is correct, since a dependency means the task it depends on must complete successfully before the new task begins.
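In a Jobs API payload, the Depends On field becomes the `depends_on` list on a task. A minimal sketch of a two-task job, assuming the Jobs API task fields (`task_key`, `depends_on`); task names are hypothetical:

```python
# Sketch of a two-task Jobs API payload where "load" runs only after
# "extract" completes successfully. Field names are assumed from the
# Jobs API docs; task keys are hypothetical.

def two_task_job() -> dict:
    """Model answer E: the new task waits for its dependency to succeed."""
    return {
        "name": "etl-job",
        "tasks": [
            {"task_key": "extract"},
            {
                "task_key": "load",
                # "load" begins only after "extract" succeeds
                "depends_on": [{"task_key": "extract"}],
            },
        ],
    }

job = two_task_job()
```

If "extract" fails, "load" never starts, which rules out answer B.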