Questions and Answers
Question 2yryeWO1TPq06DOa3DtO
Question
An external object storage container has been mounted to the location /mnt/finance_eda_bucket. The following logic was executed to create a database for the finance team: //IMG//
After the database was successfully created and permissions configured, a member of the finance team runs the following code: //IMG//
If all users on the finance team are members of the finance group, which statement describes how the tx_sales table will be created?
Choices
- A: A logical table will persist the query plan to the Hive Metastore in the Databricks control plane.
- B: An external table will be created in the storage container mounted to /mnt/finance_eda_bucket.
- C: A logical table will persist the physical plan to the Hive Metastore in the Databricks control plane.
- D: A managed table will be created in the storage container mounted to /mnt/finance_eda_bucket.
- E: A managed table will be created in the DBFS root storage container.
answer?
Answer: D Answer_ET: D Community answer D (68%) E (24%) 8% Discussion
Comment 988785 by tkg13
- Upvotes: 11
Correct Answer D
https://docs.databricks.com/en/data-governance/unity-catalog/create-schemas.html#language-SQL
Comment 1015370 by MarceloManhaes
- Upvotes: 7
Every unmanaged (external) table creation needs the LOCATION keyword, even if the database the table resides in was created with a LOCATION clause. So B is incorrect. D is correct because the statement that creates the table creates a managed table.
https://docs.databricks.com/en/lakehouse/data-objects.html
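A minimal sketch of the scenario being discussed (the question's actual code is shown only as an image, so the schema name and source table here are hypothetical; the table name tx_sales comes from the question):
spark.sql("""
    CREATE SCHEMA IF NOT EXISTS finance_eda_db      -- hypothetical schema name
    LOCATION '/mnt/finance_eda_bucket'
""")
spark.sql("""
    -- No LOCATION clause on the table itself, so in the Hive metastore this
    -- creates a MANAGED table whose data lands under the schema's location.
    CREATE TABLE finance_eda_db.tx_sales AS
    SELECT * FROM raw_sales                          -- hypothetical source table
""")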
Comment 1336937 by ASRCA
- Upvotes: 2
Selected Answer: B This is because the storage container is mounted to /mnt/finance_eda_bucket, and the code executed by the finance team member would create an external table in that location.
Comment 1326466 by AlejandroU
- Upvotes: 1
Selected Answer: D Answer D. The use of the LOCATION clause with a DBFS path (/mnt/finance_eda_bucket) suggests that Hive Metastore and DBFS location are being used. The answer is correct in the context of Hive Metastore and DBFS location, but if Unity Catalog (UC) is in use, the result would be an external table, not a managed one.
Comment 1299156 by benni_ale
- Upvotes: 1
Selected Answer: D No USE DATABASE statement; otherwise it would have been external.
Comment 1214016 by coercion
- Upvotes: 2
Selected Answer: D D, as the keyword LOCATION is not specified. Although the data will be stored in an external location, the table will still be a managed table.
Comment 1159465 by Curious76
- Upvotes: 2
Selected Answer: E E is correct because this table is managed on top of the external source file. Managed tables are stored on DBFS.
Comment 1158824 by hal2401me
- Upvotes: 5
Selected Answer: D Correct answer D. Just did a test: as of DBR 12.2, UC databases do not support a location on DBFS, only s3/abfss. However, hive_metastore databases do support a location on DBFS. A table created in such a database IS a managed table, as verified with the DESCRIBE EXTENDED command.
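As a quick check of the claim above, a hedged sketch (the schema name is hypothetical, tx_sales comes from the question, and the expected output is abridged):
spark.sql("DESCRIBE EXTENDED finance_eda_db.tx_sales").show(truncate=False)
# Expected rows (abridged) in the hive_metastore case discussed here:
#   Type        MANAGED
#   Location    dbfs:/mnt/finance_eda_bucket/tx_sales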
Comment 1133290 by s_villahermosa91
- Upvotes: 2
Selected Answer: E Correct Answer E
Comment 1119874 by kz_data
- Upvotes: 1
Selected Answer: D D is the correct answer, the table created is a managed table and not external, and it will be located under the location defined in the database’s creation DDL.
Comment 1091773 by azurelearn2020
- Upvotes: 2
Selected Answer: D It will be a managed table created under specified database. Location keyword used for database will make sure all the managed tables are stored in database location.
Comment 1080143 by Enduresoul
- Upvotes: 4
Selected Answer: D D is correct. The table will be created as managed, because no LOCATION is specified on table creation. The table will be created in the location specified with database creation
Comment 1066023 by Dileepvikram
- Upvotes: 1
I think the answer is D
Comment 1063207 by PearApple
- Upvotes: 2
I followed the steps to create schema and table, the answer is D
Comment 1058018 by jerborder
- Upvotes: 1
Correct answer is D. "Data for a managed table resides in the location of the database it is registered to."
Comment 1044856 by sturcu
- Upvotes: 2
Selected Answer: E A managed table will be created on DBFS.
Question hRVAEwlGbuxFzN2LyDSz
Question
Although the Databricks Utilities Secrets module provides tools to store sensitive credentials and avoid accidentally displaying them in plain text, users should still be careful about which credentials are stored here and which users have access to use these secrets. Which statement describes a limitation of Databricks Secrets?
Choices
- A: Because the SHA256 hash is used to obfuscate stored secrets, reversing this hash will display the value in plain text.
- B: Account administrators can see all secrets in plain text by logging on to the Databricks Accounts console.
- C: Secrets are stored in an administrators-only table within the Hive Metastore; database administrators have permission to query this table by default.
- D: Iterating through a stored secret and printing each character will display secret contents in plain text.
- E: The Databricks REST API can be used to list secrets in plain text if the personal access token has proper credentials.
answer?
Answer: D Answer_ET: D Community answer D (67%) E (30%) 4% Discussion
Comment 1364182 by Tedet
- Upvotes: 1
Selected Answer: D Secret redaction: storing credentials as Databricks secrets makes it easy to protect your credentials when you run notebooks and jobs. However, it is easy to accidentally print a secret to standard output buffers or display the value during variable assignment.
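A minimal sketch of the behaviour described above, runnable in a Databricks notebook (the scope and key names are hypothetical):
# Literal references to a secret are redacted in notebook cell output...
password = dbutils.secrets.get(scope="my_scope", key="db_password")
print(password)             # prints [REDACTED]

# ...but redaction applies only to literals, so a trivial transformation
# such as iterating character by character exposes the value in plain text.
print(" ".join(password))   # prints e.g. s 3 c r 3 t _ v a l u e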
Comment 1326751 by AlejandroU
- Upvotes: 1
Selected Answer: D Answer D. dbutils.secrets.get(scope="myScope", key="myKey") retrieves the plain text value of a secret, which is then available for use in code. Limitation: once the secret is retrieved, if improperly handled (e.g., logged or iterated), its plain text value can be exposed. Option E: the REST API can list secrets in plain text if proper credentials (e.g., a personal access token) are provided. This is unrelated to dbutils.secrets.get but is a valid limitation of the overall secrets management framework in Databricks. Note that the difference between options D and E is whether the limitation relates to Databricks Utilities Secrets (dbutils.secrets); in that case, option D is the correct option.
Comment 1325696 by Sriramiyer92
- Upvotes: 1
Selected Answer: D Cannot be option E, as it just lists the secrets; it does not print the content therein.
Comment 1270074 by fe3b2fc
- Upvotes: 4
Selected Answer: D
value = dbutils.secrets.get(scope="myScope", key="myKey")
for char in value:
    print(char, end=" ")
# Out: y o u r _ v a l u e
Comment 1214025 by coercion
- Upvotes: 2
Selected Answer: E Only through the REST API or CLI can you fetch the secret, if you have a valid token.
Comment 1194051 by Er5
- Upvotes: 2
E: https://docs.databricks.com/api/azure/workspace/secrets/listsecrets GET /api/2.0/secrets/list won't list secrets in plain text. D: if you print it without iterating in a for loop, the output is redacted, showing [REDACTED]. But if I iterate as shown in the screenshot, I'm able to see the value of the secret key. https://community.databricks.com/t5/data-engineering/how-to-avoid-databricks-secret-scope-from-exposing-the-value-of/td-p/12254 https://docs.databricks.com/en/security/secrets/redaction.html "Secret redaction for notebook cell output applies only to literals. The secret redaction functionality does not prevent deliberate and arbitrary transformations of a secret literal."
Comment 1158087 by Lucario95
- Upvotes: 2
Selected Answer: E Both D and E seem correct. They are poorly written though, because for D just printing the characters (not separated by spaces, newlines, or similar) would not work, while E, if launched inside the Databricks workspace, would not work either.
Comment 1145328 by PrashantTiwari
- Upvotes: 2
D is correct
Comment 1143095 by guillesd
- Upvotes: 2
Selected Answer: D D is for sure correct (tried it several times on a Databricks environment).
Comment 1138286 by DAN_H
- Upvotes: 3
Selected Answer: D D is correct
Comment 1132816 by spaceexplorer
- Upvotes: 2
Selected Answer: D D is correct
Comment 1130380 by Def21
- Upvotes: 1
Selected Answer: E At least E is a correct answer.
B: You can't see secrets in the Admin console, only via the REST API, CLI, etc. C: Secrets are not stored in the Hive Metastore. D: I am not sure if iterating through a secret character by character would work. E: This is at least correct; I am using this.
Comment 1122534 by ranith
- Upvotes: 1
B and E both seem to be correct.
Comment 1121984 by Jay_98_11
- Upvotes: 2
Selected Answer: D For sure it’s D
Comment 1108959 by hkay
- Upvotes: 3
Answer is E: /api/2.0/secrets/get returns { "key": "string", "value": "string" }. The REST API can potentially expose secrets in plain text if a user with appropriate permissions (including access to both secrets/list and secrets/get) uses a personal access token.
Comment 1108695 by Patito
- Upvotes: 2
Selected Answer: D Iterating through the secrets provides a way to see the secret’s password.
Comment 1080146 by Enduresoul
- Upvotes: 1
D is correct, see https://community.databricks.com/t5/data-engineering/how-to-avoid-databricks-secret-scope-from-exposing-the-value-of/td-p/12254/page/2
Comment 1076403 by aragorn_brego
- Upvotes: 3
Selected Answer: E While Databricks Secrets are designed to secure sensitive information such as passwords and tokens, one limitation is that if a user’s personal access token is compromised, and that token has the necessary permissions, the REST API could potentially be used to retrieve secrets. This means that the security of secrets is also dependent on the security of personal access tokens and the permissions assigned to them.
Comment 1062530 by AzureDE2522
- Upvotes: 2
E is the correct answer because it describes a limitation of Databricks Secrets. Databricks Secrets is a module that provides tools to store sensitive credentials and avoid accidentally displaying them in plain text. Databricks Secrets allows creating secret scopes, which are collections of secrets that can be accessed by users or groups. Databricks Secrets also allows creating and managing secrets using the Databricks CLI or the Databricks REST API. However, a limitation of Databricks Secrets is that the Databricks REST API can be used to list secrets in plain text if the personal access token has proper credentials. Therefore, users should still be careful with which credentials are stored in Databricks Secrets and which users have access to using these secrets.
Comment 1060637 by Hannah_13
- Upvotes: 2
Answer is D based on Udemy practice test
Comment 1050505 by Crocjun
- Upvotes: 2
could be E reference: https://docs.databricks.com/api/workspace/secrets
Comment 1044858 by sturcu
- Upvotes: 1
Selected Answer: B B is the correct answer
Comment 1025961 by TheGhost21
- Upvotes: 1
B is the correct answer
Question RZIZqr70zVe4WkFKJNUy
Question
What statement is true regarding the retention of job run history?
Choices
- A: It is retained until you export or delete job run logs
- B: It is retained for 30 days, during which time you can deliver job run logs to DBFS or S3
- C: It is retained for 60 days, during which you can export notebook run results to HTML
- D: It is retained for 60 days, after which logs are archived
- E: It is retained for 90 days or until the run-id is re-used through custom run configuration
answer?
Answer: C Answer_ET: C Community answer C (94%) 6% Discussion
Comment 975873 by stuart_gta1
- Upvotes: 9
B is wrong; should be C.
Comment 1105665 by Yogi05
- Upvotes: 7
C is correct answer. https://docs.databricks.com/en/workflows/jobs/monitor-job-runs.html
Comment 1364272 by Tedet
- Upvotes: 1
Selected Answer: C To export notebook run results for a job with a single task: on the job detail page, click the View Details link for the run in the Run column of the Completed Runs (past 60 days) table, then click Export to HTML.
To export notebook run results for a job with multiple tasks: on the job detail page, click the View Details link for the run in the Run column of the Completed Runs (past 60 days) table, click the notebook task to export, then click Export to HTML.
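For readers who want to preserve results programmatically before the 60-day window expires, here is a hedged sketch; it assumes the Jobs runs/export REST endpoint, and the workspace URL, token, and run_id are placeholders:
import requests

HOST = "https://<workspace-url>"          # placeholder
TOKEN = "<personal-access-token>"         # placeholder
RUN_ID = 123456                           # placeholder run_id

# Assumed endpoint: GET /api/2.0/jobs/runs/export returns the run's rendered
# notebook views as HTML so they can be stored outside the retention window.
resp = requests.get(
    f"{HOST}/api/2.0/jobs/runs/export",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"run_id": RUN_ID, "views_to_export": "CODE"},
)
resp.raise_for_status()
for i, view in enumerate(resp.json().get("views", [])):
    with open(f"run_{RUN_ID}_view_{i}.html", "w") as f:
        f.write(view["content"])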
Comment 1364271 by Tedet
- Upvotes: 1
Selected Answer: C Databricks maintains a history of your job runs for up to 60 days. If you need to preserve job runs, Databricks recommends exporting results before they expire.
Comment 1329781 by janeZ
- Upvotes: 1
Selected Answer: C https://learn.microsoft.com/en-us/azure/databricks/jobs/monitor
Comment 1158847 by hal2401me
- Upvotes: 2
Selected Answer: C https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/monitor-job-runs Azure Databricks maintains a history of your job runs for up to 60 days. If you need to preserve job runs, Databricks recommends exporting results before they expire. For more information, see Export job run results.
Comment 1114980 by ATLTennis
- Upvotes: 3
Selected Answer: C C is the correct answer
Comment 1108697 by Patito
- Upvotes: 2
Selected Answer: C c is correct
Comment 1101870 by SwastikaM
- Upvotes: 2
Option C is correct
Comment 1101341 by f728f7f
- Upvotes: 1
Selected Answer: D A secret CAN be printed character-by-character, so it's not really that secure.
Comment 1091950 by rok21
- Upvotes: 3
Selected Answer: C C is correct
Comment 1091776 by azurelearn2020
- Upvotes: 1
Selected Answer: C Correct Answer is C
Comment 1044859 by sturcu
- Upvotes: 1
Selected Answer: C C is correct: retention is 60 days and export to html
Comment 973900 by 8605246
- Upvotes: 3
This is incorrect; Databricks maintains a history of job runs for 60 days. https://docs.databricks.com/en/workflows/jobs/monitor-job-runs.html#:~:text=Databricks%20maintains%20a%20history%20of,see%20Export%20job%20run%20results.
Question E52ePZemIT2GpDhe22GF
Question
A data engineer, User A, has promoted a new pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens. Which statement describes the contents of the workspace audit logs concerning these events?
Choices
- A: Because the REST API was used for job creation and triggering runs, a Service Principal will be automatically used to identify these events.
- B: Because User B last configured the jobs, their identity will be associated with both the job creation events and the job run events.
- C: Because these events are managed separately, User A will have their identity associated with the job creation events and User B will have their identity associated with the job run events.
- D: Because the REST API was used for job creation and triggering runs, user identity will not be captured in the audit logs.
- E: Because User A created the jobs, their identity will be associated with both the job creation events and the job run events.
answer?
Answer: C Answer_ET: C Community answer C (55%) E (45%) Discussion
Comment 1158866 by hal2401me
- Upvotes: 9
Selected Answer: E https://docs.databricks.com/api/azure/workspace/jobs/create API/jobs/create: run_as object is a write-only setting, available only in Create/Update/Reset and Submit calls. It specifies the user or service principal that the job runs as. If not specified, the job runs as the user who created the job. In the question, it's not stated that User A creates a service principal, so run_as can only be User A themselves.
Comment 1410177 by capt2101akash
- Upvotes: 1
Selected Answer: C Both uses their own credential for specific tasks
Comment 1366198 by Nate_
- Upvotes: 1
Selected Answer: C User A created the jobs via the REST API using their personal access token, so the workspace audit logs will record these job creation events with User A’s identity. Conversely, when User B triggers job runs through the REST API (again, using their own personal access token) via an external orchestration tool, those events will be logged with User B’s identity.
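A hedged sketch of the two calls being discussed, using the Jobs 2.1 create and run-now endpoints; the workspace URL, tokens, notebook path, and cluster id are placeholders:
import requests

HOST = "https://<workspace-url>"                       # placeholder

# User A creates the job with their own personal access token, so the
# job-creation audit event is attributed to User A.
create_resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <user_a_pat>"},  # User A's PAT
    json={
        "name": "finance_pipeline",
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Repos/prod/pipeline"},
            "existing_cluster_id": "<cluster-id>",
        }],
    },
)
job_id = create_resp.json()["job_id"]

# User B triggers runs from the orchestration tool with their own token, so
# the run-triggering audit events are attributed to User B.
requests.post(
    f"{HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": "Bearer <user_b_pat>"},  # User B's PAT
    json={"job_id": job_id},
)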
Comment 1335295 by arekm
- Upvotes: 2
Selected Answer: C Answer C - the run_as property is not said to be configured, so the job will run with the permissions of the creator, User A. However, User B will still be the one that triggered the run, which is what the question is about.
Comment 1330908 by LuminaBerry
- Upvotes: 2
Selected Answer: C C should be the correct answer. Although User A is associated with the creation, and is by default the run-as user if omitted when the job is created, the question specifically asks about the audit logs (run event logs) associated with the run. I've tried this out: for a job that was created by someone else and has a run-as user different from mine, when I go to the run event logs there are entries stating that my user triggered a "started" event type.
Comment 1329782 by janeZ
- Upvotes: 1
Selected Answer: C based on the standard understanding of how personal access tokens typically work, each user’s actions should be logged separately with their respective identities. Therefore, “C” would be the standard answer unless there is a specific behavior or configuration in Databricks that causes the job run events to be attributed back to User A.
Comment 1326800 by AlejandroU
- Upvotes: 1
Selected Answer: C Answer C. The audit logs distinguish between actions like job creation and job execution, so User A and User B will be identified separately for these actions.
Comment 1320887 by benni_ale
- Upvotes: 1
Selected Answer: E I tried it myself and E seems correct
Comment 1318281 by JB90
- Upvotes: 1
Selected Answer: C When you use the API to create the jobs, the creation is logged using the PAT info; the same happens when you start a run using a different PAT.
Comment 1316225 by benni_ale
- Upvotes: 1
Selected Answer: E Specifies the user, service principal or group that the job/pipeline runs as. If not specified, the job/pipeline runs as the user who created the job/pipeline. Either user_name or service_principal_name should be specified. If not, an error is thrown.
Comment 1309134 by rsmf
- Upvotes: 1
Selected Answer: C C is the right answer
Comment 1303026 by Carkeys
- Upvotes: 1
Selected Answer: C In Databricks, audit logs capture the identity of the user associated with each distinct event, whether it’s creating or running a job. Since User A used their personal access token to create the jobs and User B used theirs to trigger job runs, the audit logs will reflect User A’s identity for job creation events and User B’s identity for job run events.
Comment 1264693 by quaternion
- Upvotes: 2
Selected Answer: E By default, jobs run as the identity of the job owner. This means that the job assumes the permissions of the job owner. You can change the identity that the job is running as to a service principal. Then, the job assumes the permissions of that service principal instead of the owner. https://docs.databricks.com/en/jobs/create-run-jobs.html#run-a-job-as-a-service-principal
Comment 1146382 by spudteo
- Upvotes: 1
Selected Answer: E When you create a job, your role is IS OWNER and RUN AS. So when you trigger a job, it will run as the RUN AS entity, and that should be User A if no one has changed it.
Comment 1131100 by spaceexplorer
- Upvotes: 3
Selected Answer: C C is correct
Comment 1091952 by rok21
- Upvotes: 3
Selected Answer: C C is correct
Question vAWZAo7E2GgI2wVQvHsJ
Question
A user new to Databricks is trying to troubleshoot long execution times for some pipeline logic they are working on. Presently, the user is executing code cell-by-cell, using display() calls to confirm code is producing the logically correct results as new transformations are added to an operation. To get a measure of average time to execute, the user is running each cell multiple times interactively. Which of the following adjustments will get a more accurate measure of how code is likely to perform in production?
Choices
- A: Scala is the only language that can be accurately tested using interactive notebooks; because the best performance is achieved by using Scala code compiled to JARs, all PySpark and Spark SQL logic should be refactored.
- B: The only way to meaningfully troubleshoot code execution times in development notebooks is to use production-sized data and production-sized clusters with Run All execution.
- C: Production code development should only be done using an IDE; executing code against a local build of open source Spark and Delta Lake will provide the most accurate benchmarks for how code will perform in production.
- D: Calling display() forces a job to trigger, while many transformations will only add to the logical query plan; because of caching, repeated execution of the same logic does not provide meaningful results.
- E: The Jobs UI should be leveraged to occasionally run the notebook as a job and track execution time during incremental code development because Photon can only be enabled on clusters launched for scheduled jobs.
answer?
Answer: B Answer_ET: B Community answer B (62%) D (38%) Discussion
Comment 1143126 by guillesd
- Upvotes: 7
Selected Answer: B Both B and D are correct statements. However, D is not an adjustment (see the question); it is just an affirmation which happens to be correct. B, however, is an adjustment, and it will definitely help with profiling.
Comment 1364290 by Tedet
- Upvotes: 1
Selected Answer: D Explanation: Using display() in Databricks forces a job to trigger and display the output, which can lead to an inaccurate measure of performance when benchmarking code. This is because display() triggers the job and materializes the result, which does not accurately reflect how the code will perform in production when the job is run without the display output.
Additionally, repeated execution of the same logic (with caching) may not give you meaningful performance results since the results are cached in memory and not representative of fresh computations, as they would occur in a production environment.
To get a more accurate measure of execution time, the user should focus on using appropriate job execution techniques, such as running the notebook with “Run All” and avoiding reliance on display() calls, which are not representative of how the pipeline would behave in production.
Comment 1335300 by arekm
- Upvotes: 1
Selected Answer: B Answer B, see discussion under benni_ale.
Comment 1326813 by AlejandroU
- Upvotes: 3
Selected Answer: D Answer D. While Option D doesn’t directly provide an alternative adjustment, it points out a critical issue in the way interactive notebooks might give misleading results. It would be advisable to avoid using display() as a benchmark for performance in production-like environments.
Comment 1323815 by carlosmps
- Upvotes: 1
Selected Answer: B Without much thought, I would vote for option B, but since it says ‘the ONLY,’ it makes me hesitate. While option D only points out the issues with the data engineer’s executions, it doesn’t really provide the adjustments that need to be made. On the other hand, option B at least gives you a way to simulate production behavior. I’ll vote for B, but as I said, the word ‘only’ makes me doubt, because it’s not the only way.
Comment 1316229 by benni_ale
- Upvotes: 1
Selected Answer: D Answer: D.
Explanation:
Lazy Evaluation: Spark employs lazy evaluation, meaning transformations are not executed until an action (e.g., display(), count(), collect()) is called. Using display() triggers the execution of the transformations up to that point.
Caching Effects: Repeatedly executing the same cell can lead to caching, where Spark stores intermediate results. This caching can cause subsequent executions to be faster, not reflecting the true performance of the code.
Why not B: Production-Sized Data and Clusters: While using production-sized data and clusters (as mentioned in option B) can provide insights into performance, it’s not the only way to troubleshoot execution times. Proper testing can often be conducted on smaller datasets and clusters, especially during the development phase.
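A small sketch of the two effects described above (lazy evaluation and caching); the table and column names are hypothetical:
import time
from pyspark.sql.functions import col

# Transformations only build the logical plan; nothing executes yet.
df = spark.table("sales").withColumn("total", col("amount") * col("qty"))

# An action (display(), count(), a write, ...) is what actually triggers a job.
start = time.time()
df.count()
print(f"first run:  {time.time() - start:.2f}s")

# Re-running the same cell interactively is often much faster because of
# caching (disk cache, warmed executors), so repeated interactive timings
# are not a reliable proxy for production performance.
start = time.time()
df.count()
print(f"second run: {time.time() - start:.2f}s")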
Comment 1267418 by practicioner
- Upvotes: 2
Selected Answer: B B and D are correct. The question says "which statements", which suggests that this is a question with multiple choices.
Comment 1255528 by HelixAbdu
- Upvotes: 4
Both D and B are correct. But in real life, clients sometimes do not agree to give you their production data to test with. Also, B says it is "the only way", and this is not true for me.
So I will go with D.
Comment 1172816 by ffsdfdsfdsfdsfdsf
- Upvotes: 4
Selected Answer: B These people voting D have no reading comprehension.
Comment 1172596 by alexvno
- Upvotes: 2
Selected Answer: B Keep the environment size and data volumes as close to production as possible so the results make sense.
Comment 1167519 by halleysg
- Upvotes: 3
Selected Answer: D D is correct
Comment 1160418 by Curious76
- Upvotes: 1
Selected Answer: D I will go with D
Comment 1155755 by agreddy
- Upvotes: 4
D is the correct answer
A. Scala is the only language accurately tested using notebooks: not true. Spark SQL and PySpark can be accurately tested in notebooks, and production performance doesn't solely depend on language choice.
B. Production-sized data and clusters are necessary: while ideal, it's not always feasible for development. Smaller datasets and clusters can provide indicative insights.
C. IDE and local Spark/Delta Lake: local environments won't fully replicate production's scale and configuration.
E. Jobs UI and Photon: it is true that Photon benefits scheduled jobs, but the Jobs UI can track execution times regardless of Photon usage. However, Jobs UI runs might involve additional overhead compared to notebook cells.
Option D addresses the specific limitations of using display() for performance measurement.
Comment 1138931 by DAN_H
- Upvotes: 3
Selected Answer: D B does not talk about how to deal with the display() function. We know that to test performance for the whole notebook you need to avoid using display(), as it is only a way to test the code and view the data.
Comment 1136357 by zzzzx
- Upvotes: 1
B is correct
Comment 1132817 by spaceexplorer
- Upvotes: 1
Selected Answer: D D is correct
Comment 1110832 by divingbell17
- Upvotes: 2
Selected Answer: B "Calling display() forces a job to trigger" doesn't make sense; display is used to show a df/table in tabular format and has nothing to do with triggering a job.
Comment 1101332 by ervinshang
- Upvotes: 1
D is correct
Comment 1091956 by rok21
- Upvotes: 1
Selected Answer: B B is correct
Comment 1044866 by sturcu
- Upvotes: 1
Selected Answer: B Yes, D is a true statement, but it does not answer the question. The ask is for "which adjustments will get a more accurate measure of how code is likely to perform in production". Answer D just describes why the chosen approach is not correct; it does not provide a solution.
Comment 988786 by tkg13
- Upvotes: 1
Is it not B?