Questions and Answers
Question P4gq703E7glyK9hJZhpC
Question
A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task. Which of the following approaches can the data engineer use to set up the new task?
Choices
- A: They can clone the existing task in the existing Job and update it to run the new notebook.
- B: They can create a new task in the existing Job and then add it as a dependency of the original task.
- C: They can create a new task in the existing Job and then add the original task as a dependency of the new task.
- D: They can create a new job from scratch and add both tasks to run concurrently.
- E: They can clone the existing task to a new Job and then edit it to run the new notebook.
answer?
Answer: B Answer_ET: B Community answer B (65%) C (35%) Discussion
Comment 870363 by Redwings538
- Upvotes: 24
Selected Answer: B It seems there is some confusion on what dependency means in this case. Option B is correct because adding the new task as a dependency of the original task means that the new task will run BEFORE the original task, which is the goal defined in the question.
Comment 864064 by Data_4ever
- Upvotes: 15
Selected Answer: B B is the right answer.
Comment 1402228 by Billybob0604
- Upvotes: 3
Selected Answer: C The new task should run before the original task, meaning the original task must depend on the new task.
Comment 1360336 by pint414
- Upvotes: 1
Selected Answer: B B as the new task runs first
Comment 1358131 by avidlearner
- Upvotes: 1
Selected Answer: C I think the confusion here is because it mentions “as a dependency,” which in my opinion means following. If we go by that wording, C is the correct answer because we want the original task to run after the new task.
Comment 1338556 by Usaha1
- Upvotes: 1
Selected Answer: B B, because when we add a task that is supposed to run after a previous task, the dependency (“depends on”) gets added to the second task, not the first.
Comment 1337888 by rohitrc8521
- Upvotes: 2
Selected Answer: C Answer is C, folks. Please pay close attention to the wording. They deliberately constructed the wording of options B and C to confuse the audience.
Comment 1337694 by danishanis
- Upvotes: 3
Selected Answer: C I think the correct answer should be C and not B. Adding the new task as a dependency of the original task would mean that the original task runs first and then the new task runs. This is the opposite of what is desired in the question.
Comment 1332147 by brconejeros
- Upvotes: 1
Selected Answer: C Basically because on the sentence we have a prior: “they need to set up another task to run a new notebook prior to the original task.”. So, the correct answer is C
Comment 1330632 by Rifrif
- Upvotes: 1
Selected Answer: B The answer is B, as the new task needs to run before the original task starts.
Comment 1329482 by sam_chalvet
- Upvotes: 1
Selected Answer: B B - Even without knowing anything about Databricks, answer B is how I would want to handle this scenario; it makes the most sense.
Comment 1314210 by 806e7d2
- Upvotes: 1
Selected Answer: B In Databricks Jobs, you can manage task dependencies within a single job. If you want to add a new task that needs to run before the original task due to an upstream issue, the appropriate approach would be to:
- Create a new task: this new task runs the notebook that addresses the upstream data issue.
- Add it as a dependency of the original task: by making the original task depend on the new task, you ensure that the new task runs first, and only after its successful completion will the original task run.
This approach ensures that the sequence of tasks is correctly managed in a single job, with dependencies explicitly defined.
Comment 1290029 by Colje
- Upvotes: 1
C. They can create a new task in the existing Job and then add the original task as a dependency of the new task.
Why this is correct: In Databricks, you can set up a task dependency chain by adding a new task and specifying that the original task depends on the new one. This ensures that the new task will run first, followed by the original task.
Comment 1288366 by tangerine141
- Upvotes: 1
Selected Answer: B Both B and C involve dependencies between tasks, but the difference is in how the dependencies are structured:
B: “They can create a new task in the existing Job and then add it as a dependency of the original task.”
In this case, the new task is added as a prerequisite (dependency) for the original task. This means the new task will run first, and once it’s completed, the original task will run.
C: “They can create a new task in the existing Job and then add the original task as a dependency of the new task.”
In this case, the original task is added as a dependency for the new task, meaning the new task will wait for the original task to finish before running.
The correct answer is B: You want the new task (the one handling the upstream issue) to run before the original task, so it should be set as a dependency of the original task.
Comment 1286700 by Stefan94
- Upvotes: 1
Selected Answer: B B is correct as Redwings538 says
Comment 1276108 by CID2024
- Upvotes: 2
I think the Correct answer is C. Because as per the statement in the question “they need to set up another task to run a new notebook prior to the original task.” i.e. original task should run AFTER the new task.
So, By creating a new task in the existing job and setting the original task as a dependency of the new task, the data engineer ensures that the new notebook runs first, followed by the original task. This approach maintains the sequence of execution required to address the upstream data issue.
Comment 1272902 by 9d4d68a
- Upvotes: 2
Below is the reasoning I was convinced by after checking with AI. Here’s the breakdown of the differences between options B and C:
Option B: Create a new task in the existing Job and then add it as a dependency of the original task: Result: The new task will run after the original task.
Option C: Create a new task in the existing Job and then add the original task as a dependency of the new task:
Result: The new task will run before the original task.
Summary:
- Option B: original task → new task
- Option C: new task → original task
In your case, Option C is the correct choice because you need the new task to run first to resolve the upstream data issue before the original task executes.
Comment 1266880 by 7a22144
- Upvotes: 1
C is correct because it correctly handles the sequence of execution. By creating a new task in the existing Job and adding the original task as a dependency of the new task, the new task will run first, and once it completes successfully, the original task will run. This ensures that the upstream data issue is addressed before the original task runs.
Comment 1216596 by kokosz
- Upvotes: 2
Selected Answer: B B is the right answer.
Comment 1203803 by benni_ale
- Upvotes: 1
Selected Answer: B original depends on new
Comment 1166276 by Mircuz
- Upvotes: 3
Selected Answer: C C because the new task has to run prior the original one
Comment 1133115 by Nika12
- Upvotes: 6
Selected Answer: B Just got 100% on the test. B was correct.
Comment 1125950 by Shaxxie
- Upvotes: 3
This has become more of an English grammar test, as the word “dependency” is confusing people. When the original task has a dependency on the new task, the original task depends on the new task. So it’s Option C.
Comment 1109414 by Garyn
- Upvotes: 4
Selected Answer: C The data engineer can create a new task in the existing Job and then add the original task as a dependency of the new task (Option C). This way, the new task will run first, and once it’s completed, the original task will run. Here are the steps to do this:
- Click Workflows in the sidebar, click New, and select Job. The Tasks tab appears with the create-task dialog.
- Replace Add a name for your job… with your job name.
- Enter a name for the task in the Task name field.
- In the Type drop-down menu, select the type of task to run.
- Configure the cluster where the task runs.
- To add dependent libraries, click + Add next to Dependent libraries.
- You can pass parameters for your task.
Please note that the exact process may vary depending on the specific configurations and permissions set up in your workspace. It’s always a good idea to consult with your organization’s IT or data governance team to ensure the correct procedures are followed.
Comment 1106573 by Tinendra
- Upvotes: 5
Answer is C
Comment 1094611 by nedlo
- Upvotes: 1
Selected Answer: B I am pretty sure it’s B - “they need to set up another task to run a new notebook prior to the original task” - so the NEW task needs to run BEFORE the ORIGINAL task. So the NEW TASK should be a DEPENDENCY of the ORIGINAL TASK (or in other words: the original task is dependent on the new task).
Comment 1065601 by ObeOne
- Upvotes: 5
“A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task.”
In the tasks UI of the Job:
- Create a new task
- Select original task
- In the original task’s “depends on” field, enter “new task” - as the new task needs to run prior to the original task, i.e., the original task has a dependency on the new task
From step 1: create the new task … from step 3: the original task has a dependency on the new task
Answer is C … They can create a new task in the existing Job and then add the original task as a dependency of the new task.
Comment 1064897 by awofalus
- Upvotes: 2
Selected Answer: C Correct is C, because the original task will run after the newer one and therefore depends on it
Comment 1059189 by ObeOne
- Upvotes: 2
C is correct
Comment 1058979 by DavidRou
- Upvotes: 1
Right answer: B We need to add the new task as a dependency of the original one because the question says that it needs to be run before the original task.
Comment 1049033 by kbaba101
- Upvotes: 1
This is a grammar issue, not a Databricks issue: adding A as a dependency of B means A must run before B.
Comment 998006 by vctrhugo
- Upvotes: 3
Selected Answer: C C. They can create a new task in the existing Job and then add the original task as a dependency of the new task.
To set up a new task that runs a new notebook prior to the original task in an existing Job, you can create a new task within the same Job and then set the original task as a dependency for the new task. This way, the new task will execute before the original task when the Job is triggered.
Comment 993228 by [Removed]
- Upvotes: 2
Selected Answer: B B is the right answer
Comment 990001 by poTEYtoe_poTAHtoe
- Upvotes: 2
Selected Answer: B B is correct. I misunderstood this question and initially thought it was C.
Comment 946522 by Atnafu
- Upvotes: 4
B To set up the new task to run a new notebook prior to the original task in a single-task Job, the data engineer can use the following approach:
In the existing Job, create a new task that corresponds to the new notebook that needs to be run.
Set up the new task with the appropriate configuration, specifying the notebook to be executed and any necessary parameters or dependencies.
Once the new task is created, designate it as a dependency of the original task in the Job configuration. This ensures that the new task is executed before the original task.
Comment 928369 by james_donquixote
- Upvotes: 2
Selected Answer: C C is the right answer.
Comment 896695 by prasioso
- Upvotes: 4
Selected Answer: C original task is dependent on the new task. So the new task must run before the original one. Hence C
Comment 892486 by austinoy
- Upvotes: 1
It is really confusing - according to the Oxford dictionary, a dependency is “a dependent or subordinate thing,” so the original task should be a dependency of the new task. So C?
Comment 889135 by Majjjj
- Upvotes: 3
Selected Answer: C The data engineer can create a new task in the existing Job and then add the original task as a dependency of the new task. This will ensure that the new task runs before the original task, and any upstream data issues are resolved before the original task begins. Option B suggests creating a new task and adding it as a dependency of the original task, which would not address the issue of running the new notebook before the original task.
Comment 869842 by HoangHuy
- Upvotes: 2
Selected Answer: B Definitely option B, I tested!
Comment 867550 by TC007
- Upvotes: 2
Selected Answer: C The approach that the data engineer can use to set up the new task is option C: create a new task in the existing Job and then add the original task as a dependency of the new task.
By creating a new task in the existing Job and adding the original task as a dependency, the new task will run before the original task, since the original task is dependent on the completion of the new task. This ensures that the new notebook is run prior to the original task, as required.
Comment 865304 by sdas1
- Upvotes: 1
Option C
Comment 862209 by XiltroX
- Upvotes: 3
Selected Answer: C C is the right answer.
Comment 862011 by 4be8126
- Upvotes: 4
Selected Answer: B B. They can create a new task in the existing Job and then add it as a dependency of the original task.
Adding a new task as a dependency to an existing task in the same Job allows the new task to run before the original task is executed. This ensures that the data engineer can run the new notebook prior to the original task without having to create a new Job from scratch. Cloning the existing task or creating a new Job would add unnecessary complexity to the pipeline.
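The dependency direction the thread debates can be pinned down in code. In a Databricks job, each task's `depends_on` field lists the tasks that must finish before it runs. Below is a minimal, hypothetical sketch of a Jobs API 2.1 `tasks` payload (the task keys and notebook paths are invented for illustration) showing answer B: the original task lists the new task in its `depends_on`, so the new notebook runs first.

```python
# Sketch of a Databricks Jobs API 2.1 "tasks" payload for answer B.
# Task keys ("ingest_fix", "daily_report") and notebook paths are
# hypothetical examples, not from the question.
job_tasks = [
    {
        # The new task: no dependencies, so it runs first.
        "task_key": "ingest_fix",
        "notebook_task": {"notebook_path": "/Repos/team/fix_upstream_issue"},
    },
    {
        # The original task: its "depends_on" now points at the new task,
        # so it only starts after "ingest_fix" completes successfully.
        "task_key": "daily_report",
        "notebook_task": {"notebook_path": "/Repos/team/daily_report"},
        "depends_on": [{"task_key": "ingest_fix"}],
    },
]

# Tasks with an empty/missing "depends_on" are eligible to run first.
first_to_run = [t["task_key"] for t in job_tasks if not t.get("depends_on")]
print(first_to_run)  # ['ingest_fix']
```

This is exactly the "original depends on new" structure answer B describes: the new task is a dependency *of* the original task, so it executes first.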
Question 6BSLaJ9rZfiTqHJnCqM9
Question
An engineering manager wants to monitor the performance of a recent project using a Databricks SQL query. For the first week following the project’s release, the manager wants the query results to be updated every minute. However, the manager is concerned that the compute resources used for the query will be left running and cost the organization a lot of money beyond the first week of the project’s release. Which of the following approaches can the engineering team use to ensure the query does not cost the organization any money beyond the first week of the project’s release?
Choices
- A: They can set a limit to the number of DBUs that are consumed by the SQL Endpoint.
- B: They can set the query’s refresh schedule to end after a certain number of refreshes.
- C: They cannot ensure the query does not cost the organization money beyond the first week of the project’s release.
- D: They can set a limit to the number of individuals that are able to manage the query’s refresh schedule.
- E: They can set the query’s refresh schedule to end on a certain date in the query scheduler.
answer?
Answer: E Answer_ET: E Community answer E (77%) C (20%) other (2%) Discussion
Comment 1133116 by Nika12
- Upvotes: 16
Selected Answer: E Just got 100% on the test. E was correct. C was not in the available options.
Comment 879855 by BigDaddyAus
- Upvotes: 10
The query scheduler only lets you set the interval at which the query runs. It does not provide a way to stop after x iterations or at a point in time. The question is confusing. From what I found, the only option is to limit users’ access to the query (and therefore the query scheduler). https://docs.databricks.com/security/auth-authz/access-control/query-acl.html I’m not convinced how this would help the organization save money if no one manually stops the schedule. Answer C seems most correct. Answer D can be achieved using ACLs, but how is that helpful in the use case described?
Comment 1339010 by Usaha1
- Upvotes: 1
Selected Answer: E Cron syntax can be used for scheduling
Comment 1337895 by rohitrc8521
- Upvotes: 1
Selected Answer: E Calm down folks, the answer is E!!
Comment 1312190 by UrcoIbz
- Upvotes: 1
Selected Answer: E Option E is correct. Although there is not a ‘direct’ option to select an end date, cron expressions allow schedules to run only during a specific time period (in this case, a specific week).
Comment 1286237 by tmz1
- Upvotes: 2
Answer is E. There is an option to specify the schedule with CRON syntax, which enables setting a schedule for a chosen week. For example, when you specify the CRON expression 0 0 0 23-29 SEP ? 2024, the query will run at 12:00 AM, between day 23 and 29 of the month, only in September 2024.
Comment 1266883 by 7a22144
- Upvotes: 1
E is correct because the engineering team can use the query scheduler in Databricks to set a specific end date for the query refresh schedule. This way, after the first week, the automatic refreshes will stop, and the associated compute costs will be avoided.
Comment 1244559 by 3fbc31b
- Upvotes: 1
Selected Answer: E The correct answer is E for this question.
Comment 1216201 by aspix82
- Upvotes: 1
Answer is E
Comment 1160895 by data_arch
- Upvotes: 4
Selected Answer: E Answer is E. It’s true that natively the query can’t be scheduled to stop, but the scheduler allows us to use cron syntax. So we can define the year, month, and days of the first week, and the trigger won’t run after that.
Comment 1118556 by Def21
- Upvotes: 2
Selected Answer: C The query scheduler does not give an option to set an end date (or a number of iterations). Dashboards might give one, but the question specifically mentions queries. https://learn.microsoft.com/en-gb/azure/databricks/sql/user/queries/schedule-query
Comment 1109421 by Garyn
- Upvotes: 3
Selected Answer: E E. They can set the query’s refresh schedule to end on a certain date in the query scheduler.
Explanation:
Query Scheduler: Databricks offers a Query Scheduler that allows users to schedule the execution of SQL queries at specific intervals or for specific durations.
Setting a Specific End Date: The team can configure the query’s refresh schedule to conclude or end on a certain date. By specifying an end date within the first week of the project’s release, the query will automatically stop refreshing after that date. This action ensures that compute resources aren’t continuously utilized beyond the specified timeframe, preventing unnecessary costs.
This approach allows the team to control and limit the execution of the query to the required duration without incurring additional costs beyond the first week of the project’s release.
Comment 1086862 by mokrani
- Upvotes: 1
C is the correct answer
Source : https://docs.databricks.com/en/sql/user/queries/schedule-query.html
Comment 1057245 by god_father
- Upvotes: 2
Selected Answer: E E is the correct answer.
From the docs:
If a dashboard is configured for automatic updates, it has a Scheduled button at the top, rather than a Schedule button. To stop automatically updating the dashboard and remove its subscriptions:
- Click Scheduled.
- In the Refresh every drop-down, select Never.
- Click Save. The Scheduled button label changes to Schedule.
Source: https://learn.microsoft.com/en-us/azure/databricks/sql/user/dashboards/
Comment 1055874 by kishore1980
- Upvotes: 1
Selected Answer: E Option E is correct answer
Comment 1055869 by kishore1980
- Upvotes: 1
Selected Answer: B The picker scrolls and allows you to choose: An interval: 1-30 minutes, 1-12 hours, 1 or 30 days, 1 or 2 weeks
Since the schedule picker allows you to choose an interval to refresh the query every 1 or 2 weeks, if we choose 1 week the schedule ends after a week. So the answer is B.
Comment 1000903 by damaldon
- Upvotes: 1
Correct Answer E.
Comment 993231 by [Removed]
- Upvotes: 1
Selected Answer: C agree with BigDaddyAus
Comment 981575 by Inhaler_boy
- Upvotes: 2
Selected Answer: C Answer is C. According to documentation it cant be scheduled up until a certain date. It has to be in intervals and then canceled manually. They don’t mention end date. Only start date and intervals. https://docs.databricks.com/en/workflows/jobs/schedule-jobs.html
Comment 946530 by Atnafu
- Upvotes: 1
E
- Option A: the query will still run, but it will be throttled if it exceeds the DBU limit.
- Option B: the query will still run, but it will only run a certain number of times before it stops.
- Option C: the engineering team can ensure
- Option D: the query will still run, but only the individuals who are authorized to manage the refresh schedule will be able to stop it.
- Option E: answer.
Therefore, the correct answer is that the engineering team can set the query’s refresh schedule to end on a certain date in the query scheduler to ensure the query does not cost the organization any money beyond the first week of the project’s release.
Comment 941418 by LANDIS
- Upvotes: 2
Answer is E https://docs.databricks.com/sql/user/queries/schedule-query.html#schedule-a-query
Comment 912077 by chays
- Upvotes: 2
Selected Answer: C agree with BigDaddyAus
Comment 882494 by Tickxit
- Upvotes: 2
Selected Answer: C I agree with BigDaddyAus, I don’t see any option to end the query scheduler.
Comment 862015 by 4be8126
- Upvotes: 4
Selected Answer: E The correct answer is E. They can set the query’s refresh schedule to end on a certain date in the query scheduler.
Databricks SQL supports a query scheduler that enables users to schedule SQL queries to run at defined intervals. By default, scheduled queries run indefinitely. However, users can configure the scheduler to stop running queries at a specific time or after a specific number of runs. In this scenario, the engineering team can set the query’s refresh schedule to end on a certain date, ensuring that the query does not run beyond the first week of the project’s release and potentially cost the organization more money.
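Several commenters above note that the scheduler accepts cron syntax, and a Quartz-style cron expression can restrict firing to one fixed week so the schedule simply stops triggering afterwards. A hypothetical helper sketching that idea (the function name is invented; Quartz field order is seconds, minutes, hours, day-of-month, month, day-of-week, year):

```python
# Sketch: build a Quartz-style cron expression that fires every minute,
# but only during one fixed week, so the schedule stops firing once the
# week is over. The helper name is hypothetical, for illustration only.
def weekly_refresh_cron(start_day: int, end_day: int, month: str, year: int) -> str:
    # Field order: sec min hour day-of-month month day-of-week year.
    # "0 *" = at second 0 of every minute; "?" means "no specific value"
    # for day-of-week (Quartz requires it when day-of-month is set).
    return f"0 * * {start_day}-{end_day} {month} ? {year}"

# Every minute, but only during 23-29 September 2024:
expr = weekly_refresh_cron(23, 29, "SEP", 2024)
print(expr)  # 0 * * 23-29 SEP ? 2024
```

Note this fires every minute (matching the manager's first-week requirement), whereas the daily example quoted in the discussion (`0 0 0 23-29 SEP ? 2024`) fires once per day at midnight during the same week.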
Question dgP24CtMews8UDvFtvEQ
Question
A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously. They ask the data engineering team for help. The data engineering team notices that each of the team’s queries uses the same SQL endpoint. Which of the following approaches can the data engineering team use to improve the latency of the team’s queries?
Choices
- A: They can increase the cluster size of the SQL endpoint.
- B: They can increase the maximum bound of the SQL endpoint’s scaling range.
- C: They can turn on the Auto Stop feature for the SQL endpoint.
- D: They can turn on the Serverless feature for the SQL endpoint.
- E: They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to “Reliability Optimized.”
answer?
Answer: B Answer_ET: B Community answer B (57%) A (36%) other (6%) Discussion
Comment 1000909 by damaldon
- Upvotes: 31
Answer is B. According to Databricks documentation:
- Sequential → increase cluster size
- Concurrent → scale out the cluster
Comment 1064700 by mokrani
- Upvotes: 16
Answer B is correct. For those who selected the same answer as question 40 in the Databricks exam training, be careful because it’s quite different:
- Here the question is about simultaneously runs → Scale Out clusters (involves adding more clusters)
- In the Databricks exam training, the question is about “sequentially run queries” → Scale Up (increasing the size of the nodes)
Please refer to this accepted answer: https://community.databricks.com/t5/data-engineering/sequential-vs-concurrency-optimization-questions-from-query/td-p/36696
Comment 1340745 by andie123
- Upvotes: 1
Selected Answer: A When many users are running small queries simultaneously on a SQL warehouse (formerly: endpoint), it can become overloaded, causing slow query execution times. By increasing the cluster size of the SQL warehouse, it can handle more simultaneous queries, resulting in faster query execution times. → A
Comment 1314215 by 806e7d2
- Upvotes: 1
Selected Answer: B The issue described is related to query latency when multiple users are running queries simultaneously, all using the same SQL endpoint. This often leads to contention for resources, causing delays in query processing. To address this, the maximum scaling range of the SQL endpoint can be increased, which allows the endpoint to dynamically scale and handle more concurrent queries by adding more resources (e.g., additional nodes) as needed.
In Databricks SQL, SQL endpoints can be scaled horizontally (adding more nodes) to better handle concurrency. By increasing the maximum scaling range, the endpoint will be able to scale more aggressively during periods of high load, improving query performance for concurrent users.
Comment 1287501 by MohdAltaf19
- Upvotes: 2
Correct answer: B
- Throughput > sequential > scale up
- Performance > concurrent > scale out
Comment 1266887 by 7a22144
- Upvotes: 2
B is correct because increasing the maximum bound of the SQL endpoint’s scaling range allows the endpoint to handle a larger number of queries by automatically scaling up the resources (e.g., adding more clusters). This approach addresses the issue of slow queries due to high concurrent usage, as more resources will become available to handle the increased load from simultaneous queries.
Comment 1203804 by benni_ale
- Upvotes: 1
Selected Answer: B simultaneously probably means concurrently so scaling out the cluster is better
Comment 1187388 by sakis213
- Upvotes: 1
Selected Answer: B B is correct
Comment 1145253 by niharam2021
- Upvotes: 2
A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously.
Comment 1137349 by agAshish
- Upvotes: 3
Answer is A , Q40 — https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf
Comment 1133119 by Nika12
- Upvotes: 5
Selected Answer: B Just got 100% on the exam. B was correct. Also, here is the link to good explanation: https://docs.databricks.com/en/compute/cluster-config-best-practices.html
Comment 1122783 by Ody__
- Upvotes: 1
Selected Answer: A A is correct
Comment 1122137 by Ody__
- Upvotes: 2
Selected Answer: A correct answer is A Question 40: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf
Comment 1117096 by SerGrey
- Upvotes: 2
Selected Answer: B B is correct
Comment 1094629 by nedlo
- Upvotes: 3
Selected Answer: B It’s B because it’s “simultaneously by many users,” so you have to scale it horizontally by increasing the number of nodes: https://community.databricks.com/t5/data-engineering/sequential-vs-concurrency-optimization-questions-from-query/td-p/36696
Comment 1069328 by pc1337xd
- Upvotes: 5
Selected Answer: B Issues occur when too many users are running queries at the same time → Increase scaling so more clusters handle the queries
Comment 1057251 by god_father
- Upvotes: 2
Selected Answer: B Increasing the cluster size gives vertical scalability of query execution, while scaling out the cluster gives horizontal scalability of query execution
Comment 1008949 by saikot
- Upvotes: 2
The correct answer is B (we can check this under the Databricks SQL warehouse tooltip; it is clearly mentioned that scaling is used to improve query LATENCY)
Comment 998009 by vctrhugo
- Upvotes: 1
Selected Answer: A A. They can increase the cluster size of the SQL endpoint.
To improve the latency of the team’s queries when many members are running small queries simultaneously, you can increase the cluster size of the SQL endpoint. Increasing the cluster size allocates more compute resources to handle query execution, which can help reduce query execution times and improve overall performance, especially during periods of high query concurrency.
Option B refers to adjusting scaling settings, which can also be beneficial, but increasing the cluster size (Option A) directly allocates more resources, making it a more direct approach to improving query performance.
Options C, D, and E relate to different features and configurations (Auto Stop, Serverless, and Spot Instance Policy), but they may not directly address the issue of improving query latency during high concurrency, which is the primary concern in this scenario.
Comment 993232 by [Removed]
- Upvotes: 1
Selected Answer: A agree with @AndreFR
Comment 985481 by AndreFR
- Upvotes: 5
Selected Answer: A question 40 in the official databricks training exam : https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf
Comment 967498 by miraFlores
- Upvotes: 3
Selected Answer: B https://community.databricks.com/t5/data-engineering/when-to-increase-maximum-bound-vs-when-to-increase-cluster-size/m-p/27880
Comment 948438 by mehroosali
- Upvotes: 4
Selected Answer: A similar question on official practice questions (Q40), based on that answer its A.
Comment 946539 by Atnafu
- Upvotes: 2
E. Here are the reasons why: Serverless feature allows Databricks to automatically scale the cluster up and down based on the workload. This can help to improve the latency of queries, especially when many small queries are running simultaneously. Spot Instance Policy determines how Databricks uses Spot Instances for serverless SQL endpoints. The “Reliability Optimized” Spot Instance Policy is a good choice for SQL endpoints that require high availability.
Comment 904468 by NavalYemul
- Upvotes: 3
If the queries are running sequentially, then scale up (increase the size of the cluster, e.g., from 2X-Small to 4X-Large). If the queries are running concurrently or with many users, then scale out (add more clusters, i.e., increase the SQL endpoint’s scaling range).
Comment 889138 by Majjjj
- Upvotes: 4
Selected Answer: B Option B is the correct answer. The engineering team can set the query’s refresh schedule to end after a certain number of refreshes to ensure that it does not run and cost the organization any money beyond the first week of the project’s release. By setting a limit on the number of refreshes, the query will stop running automatically once the limit is reached. This approach allows the team to monitor the performance of the project for the first week with frequent updates, but also ensures that the query does not consume resources unnecessarily after that period. Options A, C, D, and E are incorrect as they do not provide a solution to the problem of controlling the query’s runtime cost.
Comment 887680 by pargit35
- Upvotes: 2
i think b
Comment 867553 by TC007
- Upvotes: 2
Selected Answer: A A: increase the cluster size of the SQL endpoint.
When many users are running small queries simultaneously on a SQL endpoint, the database can become overloaded, causing slow query execution times. By increasing the cluster size of the SQL endpoint, the database can handle more simultaneous queries, resulting in faster query execution times.
Comment 865312 by sdas1
- Upvotes: 1
Option B
Comment 862019 by 4be8126
- Upvotes: 3
Selected Answer: D D. They can turn on the Serverless feature for the SQL endpoint.
The issue with the always-on SQL endpoint is that it may not be optimized for handling many small queries simultaneously, which can lead to slow query performance. By turning on the Serverless feature for the SQL endpoint, the team can take advantage of a serverless compute model that scales automatically to meet the team’s query demands, providing them with more compute resources when they need it and only paying for what they use. This feature can help improve the latency of the team’s queries without increasing the cluster size or maximum bound of the SQL endpoint.
Comment 860621 by knivesz
- Upvotes: 1
The answer is B: since the queries run simultaneously, you need to increase the scale-out (add more clusters).
Comment 859164 by XiltroX
- Upvotes: 1
B is the wrong answer. The only way to solve this issue in the “always-on” SQL endpoint is to increase the cluster size. So the right choice is A.
Question uAAwwhxaKBTI8TAsD6ni
Question
Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?
Choices
- A: The ability to manipulate the same data using a variety of languages
- B: The ability to collaborate in real time on a single notebook
- C: The ability to set up alerts for query failures
- D: The ability to support batch and streaming workloads
- E: The ability to distribute complex data operations
answer?
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 997865 by vctrhugo
- Upvotes: 13
Selected Answer: D D. The ability to support batch and streaming workloads
Delta Lake is a key component of the Databricks Lakehouse Platform that provides several benefits, and one of the most significant benefits is its ability to support both batch and streaming workloads seamlessly. Delta Lake allows you to process and analyze data in real-time (streaming) as well as in batch, making it a versatile choice for various data processing needs.
While the other options may be benefits or capabilities of Databricks or the Lakehouse Platform in general, they are not specifically associated with Delta Lake.
Comment 1362817 by Basha1996
- Upvotes: 1
Selected Answer: D D. The ability to support batch and streaming workloads
Delta Lake adds features such as ACID properties, eliminating the drawbacks of both data warehouses and data lakes.
Comment 1339008 by Tedet
- Upvotes: 1
Selected Answer: D The ability to support batch and streaming workloads - Key feature of lakehouse
Comment 1274180 by afzalmp40
- Upvotes: 1
Selected Answer: D D is correct
Comment 1262385 by 80370eb
- Upvotes: 1
Selected Answer: D D. The ability to support batch and streaming workloads
Comment 1227523 by mascarenhaslucas
- Upvotes: 1
Selected Answer: D The answer is D!
Comment 1177167 by Itmma
- Upvotes: 2
Selected Answer: D D is correct
Comment 1028756 by VijayKula
- Upvotes: 1
Selected Answer: D Answer is D
Comment 1017339 by KalavathiP
- Upvotes: 1
Selected Answer: D Correct, the answer is D.
Comment 941033 by nb1000
- Upvotes: 1
D is correct
Comment 863838 by Data_4ever
- Upvotes: 4
Selected Answer: D Delta Lake supports both Batch & Stream workloads
Comment 860277 by knivesz
- Upvotes: 4
Selected Answer: D The correct answer is D.
Comment 859602 by surrabhi_4
- Upvotes: 3
Selected Answer: D option D
Comment 857955 by XiltroX
- Upvotes: 3
D is the right answer https://learn.microsoft.com/en-us/azure/databricks/delta/
Question T5CAvKlX0QEteNkjSQKF
Question
A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary. Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?
Choices
- A: They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.
- B: They can set up the dashboard’s SQL endpoint to be serverless.
- C: They can turn on the Auto Stop feature for the SQL endpoint.
- D: They can reduce the cluster size of the SQL endpoint.
- E: They can ensure the dashboard’s SQL endpoint is not one of the included queries’ SQL endpoints.
answer?
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 862022 by 4be8126
- Upvotes: 11
Selected Answer: C The data engineer can use the Auto Stop feature to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard. The Auto Stop feature allows the SQL endpoint to automatically shut down when there are no active connections, which will minimize the total running time of the SQL endpoint. By scheduling the dashboard to refresh once per day, the SQL endpoint will only be running for a short period of time each day, which will minimize the total running time and reduce costs.
Comment 1084835 by mokrani
- Upvotes: 5
Why can’t it be B (they can set up the dashboard’s SQL endpoint to be serverless)? A serverless endpoint would only be active when required.
Comment 1410442 by Khaled999
- Upvotes: 1
Selected Answer: C C is correct.
Comment 1262774 by 80370eb
- Upvotes: 2
Selected Answer: C The Auto Stop feature ensures that the SQL endpoint will automatically shut down when not in use, which helps in reducing unnecessary running time and associated costs. The endpoint will only be running when it’s needed for refreshing the dashboard.
Comment 1117099 by SerGrey
- Upvotes: 1
Selected Answer: C C is correct
Comment 1065493 by awofalus
- Upvotes: 1
Selected Answer: C Correct: C.
Comment 1057426 by vikas555
- Upvotes: 1
C. They can turn on the Auto Stop feature for the SQL endpoint.
Comment 998011 by vctrhugo
- Upvotes: 2
Selected Answer: C C. They can turn on the Auto Stop feature for the SQL endpoint.
To minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard while ensuring that it only runs when necessary, the data engineer can turn on the Auto Stop feature for the SQL endpoint. This feature will automatically stop the SQL endpoint when it is idle for a specified period, reducing costs by avoiding unnecessary running time.
Option C allows you to efficiently manage the SQL endpoint’s lifecycle, ensuring it’s active only when needed, which aligns with the goal of minimizing running time and associated costs.
Option B (setting the dashboard’s SQL endpoint to be serverless) can also be a valid approach, as it allows the SQL endpoint to be provisioned on-demand and incurs costs only when queries are executed. However, it depends on the specific requirements of your dashboard and queries.
Options A, D, and E do not directly address the goal of minimizing the SQL endpoint’s running time while ensuring it runs when necessary.
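To make the Auto Stop behavior discussed above concrete, here is a minimal sketch of building a SQL warehouse configuration with an idle timeout. The endpoint path and field names (such as `auto_stop_mins`) are assumptions based on the Databricks SQL Warehouses REST API; verify them against your workspace's API documentation before use.

```python
# Sketch: a request payload that enables Auto Stop on a SQL warehouse.
# Field names (e.g. auto_stop_mins) are assumed from the Databricks
# SQL Warehouses API; check your workspace docs before relying on them.

def warehouse_config(name: str, cluster_size: str, auto_stop_mins: int) -> dict:
    """Return a payload enabling Auto Stop after `auto_stop_mins` idle
    minutes. A value of 0 would keep the warehouse always on."""
    return {
        "name": name,
        "cluster_size": cluster_size,
        "auto_stop_mins": auto_stop_mins,  # idle minutes before auto-termination
    }

payload = warehouse_config("dashboard-refresh", "Small", 10)
# Hypothetical usage: POST the payload to the warehouses endpoint, e.g.
# requests.post(f"{host}/api/2.0/sql/warehouses", headers=auth, json=payload)
```

With `auto_stop_mins` set to a small value, the warehouse spins up for the daily dashboard refresh and terminates shortly after the queries finish, which is exactly the cost behavior the question asks for.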
Comment 992321 by ArindamNath
- Upvotes: 1
C is correct.
Comment 985700 by AndreFR
- Upvotes: 1
Selected Answer: C https://docs.databricks.com/en/clusters/clusters-manage.html#automatic-termination
Comment 946547 by Atnafu
- Upvotes: 2
C The Auto Stop feature automatically terminates the compute resources (cluster) associated with the SQL endpoint after a specified period of inactivity. By enabling this feature, the SQL endpoint will be automatically stopped when it is no longer needed, reducing the total running time and associated costs.
Comment 859167 by XiltroX
- Upvotes: 4
Selected Answer: C Correct answer