Questions and Answers
Question GpmOmrRHzmL8f1JMFaQQ
Question
A data engineer, User A, has promoted a pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens.
A workspace admin, User C, inherits responsibility for managing this pipeline. User C uses the Databricks Jobs UI to take “Owner” privileges of each job. Jobs continue to be triggered using the credentials and tooling configured by User B.
An application has been configured to collect and parse run information returned by the REST API. Which statement describes the value returned in the creator_user_name field?
Choices
- A: Once User C takes “Owner” privileges, their email address will appear in this field; prior to this, User A’s email address will appear in this field.
- B: User B’s email address will always appear in this field, as their credentials are always used to trigger the run.
- C: User A’s email address will always appear in this field, as they still own the underlying notebooks.
- D: Once User C takes “Owner” privileges, their email address will appear in this field; prior to this, User B’s email address will appear in this field.
- E: User C will only ever appear in this field if they manually trigger the job, otherwise it will indicate User B.
answer?
Answer: B Answer_ET: B Community answer B (37%) C (30%) A (26%) 7% Discussion
Comment 1558795 by kishanu
- Upvotes: 1
Selected Answer: C It should be User A. Go to any existing workflow which has a few runs and see the JSON. It will have the creator_user_name as the creator of the workflow. For the user who has triggered the job, the field name is “run_as_user_name”
Comment 1335511 by arekm
- Upvotes: 1
Selected Answer: B B - the person that triggered the job. Since we are using User B personal access token, it is going to say User B’s email address.
Comment 1326592 by temple1305
- Upvotes: 2
Selected Answer: A User A was the owner; User B just ran the job and did not become the owner; then User C became the owner - so case A.
Comment 1322705 by Thameur01
- Upvotes: 2
Selected Answer: E creator_user_name reflects the user who triggered the job run, not the job owner or the notebook creator. For programmatically triggered runs, the field will reflect the user whose personal access token was used (User B in this case). If User C manually triggers a job, the field will show User C’s email address for that specific run.
Comment 1310027 by cf56faf
- Upvotes: 2
Selected Answer: B It’s B.
Comment 1308865 by smashit
- Upvotes: 1
Ownership of a job can be assigned to a user, not to a group. Option A is correct.
Comment 1303883 by Jugiboss
- Upvotes: 2
Selected Answer: A Seems correct.
Comment 1303649 by Ananth4Sap
- Upvotes: 2
Selected Answer: C The creator_user_name field in the run information returned by the REST API reflects the user who originally created the job. Since User A created the jobs using their personal access token, their email address will always appear in this field, regardless of who triggers the job runs or who takes ownership of the job later.
Comment 1288733 by pk07
- Upvotes: 2
Selected Answer: C C based on previous comment
Comment 1287015 by shaojunni
- Upvotes: 1
Selected Answer: C Job creator can’t be changed. Owner can be changed. So the creator_user_name field always returns who created the job: User A. But User A no longer owns the job. So Answer C is partially correct.
Comment 1287013 by shaojunni
- Upvotes: 1
Selected Answer: C Job creator can’t be changed. Owner can be changed. So the creator_user_name field always returns the owner: User A. But User A no longer owns the job. So Answer C is partially correct.
Comment 1268069 by jlocke
- Upvotes: 3
Selected Answer: A When you create a job, your role is IS OWNER and RUN AS. So when you trigger the job, it runs as the RUN AS entity, and that should be User A unless someone has changed it.
Comment 1264142 by 946a1af
- Upvotes: 2
Selected Answer: B Answer B
Comment 1244663 by c00ccb7
- Upvotes: 2
Selected Answer: B Should be the DevOps email address
Comment 1244387 by c00ccb7
- Upvotes: 3
Selected Answer: B the creator_user_name field reflects the user who triggered the job run
Comment 1231438 by fcfb11c
- Upvotes: 2
Answer: E
Comment 1224897 by Deb9753
- Upvotes: 1
Answer: C
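As a hedged illustration of the discussion above, the sketch below queries the Jobs Runs API for a single run and prints the two fields the commenters distinguish (creator_user_name vs. run_as_user_name). The workspace URL, token, and run ID are placeholders, not values from the question.

```python
import requests

# Placeholders - substitute a real workspace URL, token, and run ID.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
RUN_ID = 123456789

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"run_id": RUN_ID},
)
resp.raise_for_status()
run = resp.json()

# creator_user_name identifies who created the job; run_as_user_name is the
# identity the run executes as - the distinction several comments rely on.
print("creator_user_name:", run.get("creator_user_name"))
print("run_as_user_name:", run.get("run_as_user_name"))
```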
Question hOe3tum4fOyACZdtA2SY
Question
A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.
//IMG//
Which command should be removed from the notebook before scheduling it as a job?
Choices
- A: Cmd 2
- B: Cmd 3
- C: Cmd 4
- D: Cmd 5
answer?
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1335514 by arekm
- Upvotes: 1
Selected Answer: D D - Cmd 5, which is display(finalDF), is costly: it produces a nicely formatted HTML table that is unnecessary during a production run. On top of that, there is a limit on how much output a notebook can generate. This caused me some problems in the past with legacy code.
Comment 1308866 by smashit
- Upvotes: 1
There is a limitation when you use the display command inside a job: if the output exceeds 20 MB it will throw an error. Maybe that was the reason.
Comment 1277913 by csrazdan
- Upvotes: 2
Selected Answer: D The display function is built specifically for the notebook environment and is not part of Spark itself. If you want to print the contents of a DataFrame in plain Spark, replace it with df.show().
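To make the comments concrete, here is a minimal sketch of the change being described, assuming the finalDF DataFrame and the notebook-provided spark session from the question's screenshot; the table name is illustrative only.

```python
# Notebook-only rendering: fine interactively, but unnecessary in a scheduled job
# and subject to the notebook output limit mentioned above.
# display(finalDF)

# If the job needs the result persisted, write it out instead (illustrative table name):
finalDF.write.mode("overwrite").saveAsTable("reporting.final_table")

# For a lightweight sanity check in the job's output, a plain Spark action works everywhere:
finalDF.show(5)
```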
Question p9avsuaKbjCHlLgkLFte
Question
Which statement regarding Spark configuration on the Databricks platform is true?
Choices
- A: The Databricks REST API can be used to modify the Spark configuration properties for an interactive cluster without interrupting jobs currently running on the cluster.
- B: Spark configurations set within a notebook will affect all SparkSessions attached to the same interactive cluster.
- C: When the same Spark configuration property is set for an interactive cluster and a notebook attached to that cluster, the notebook setting will always be ignored.
- D: Spark configuration properties set for an interactive cluster with the Clusters UI will impact all notebooks attached to that cluster.
answer?
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1303555 by nedlo
- Upvotes: 1
Selected Answer: D this question repeats afaik
Comment 1281031 by Melik3
- Upvotes: 2
Selected Answer: D this looks correct
Comment 1251581 by vexor3
- Upvotes: 3
Selected Answer: D Should be D
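A small sketch of the behaviour answer D describes, assuming a notebook attached to an interactive cluster; the property name is illustrative. A value set in the Clusters UI ("Spark config") is visible to every attached notebook, while spark.conf.set in a notebook only changes that notebook's own SparkSession.

```python
# Read a property set cluster-wide via the Clusters UI ("Spark config" section);
# every notebook attached to this interactive cluster sees the same value.
print(spark.conf.get("spark.sql.shuffle.partitions"))

# A session-scoped override in this notebook affects only this notebook's SparkSession;
# other notebooks attached to the same cluster keep the cluster-level value.
spark.conf.set("spark.sql.shuffle.partitions", "64")
print(spark.conf.get("spark.sql.shuffle.partitions"))
```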
Question a98naUArnjZOxNxythsl
Question
A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create. //IMG//
Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?
Choices
- A: Three new jobs named “Ingest new data” will be defined in the workspace, and they will each run once daily.
- B: The logic defined in the referenced notebook will be executed three times on new clusters with the configurations of the provided cluster ID.
- C: Three new jobs named “Ingest new data” will be defined in the workspace, but no jobs will be executed.
- D: One new job named “Ingest new data” will be defined in the workspace, but it will not be executed.
- E: The logic defined in the referenced notebook will be executed three times on the referenced existing all purpose cluster.
answer?
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 1334747 by arekm
- Upvotes: 4
Selected Answer: C You can create 3 jobs with the same name using the UI, and the REST API is no different. Since no schedule information is in the JSON, they will not run.
Comment 1322348 by AlejandroU
- Upvotes: 1
Selected Answer: C Answer C. 3 new jobs with the same name will be created with each API call, but they won’t be executed unless further configuration is made for scheduling or triggering.
Comment 1290686 by akashdesarda
- Upvotes: 1
Selected Answer: C Databricks will keep creating new jobs if you keep calling the create REST API. Each will have the same name but a different ID. Also, no trigger/schedule is specified, so they won't run.
Comment 1224443 by imatheushenrique
- Upvotes: 1
C. Three new jobs named “Ingest new data” will be defined in the workspace, but no jobs will be executed.
Comment 1213860 by coercion
- Upvotes: 3
Selected Answer: C Learnt a new thing: Databricks can have duplicate job names (the job IDs will differ). So three jobs will be created with three job IDs, but they will not run as no schedule is specified.
Comment 1167343 by franciscodm
- Upvotes: 1
C for sure
Comment 1130892 by spaceexplorer
- Upvotes: 1
Selected Answer: C Correct answer is C
Comment 1121659 by Jay_98_11
- Upvotes: 1
Selected Answer: C C is correct
Comment 1118561 by kz_data
- Upvotes: 1
Selected Answer: C Correct answer is C
Comment 1040267 by sturcu
- Upvotes: 3
Selected Answer: C databricks jobs create will create a new job with the same name each time it is run. In order to overwrite the existing job you need to run databricks jobs reset.
Comment 1016751 by bob_
- Upvotes: 3
Answer is correct. The 3 API calls create 3 jobs with the same name but different job ids. There is no schedule defined so will not execute.
Comment 1013199 by Eertyy
- Upvotes: 2
Correct answer is A, because the API can create the same job with the same name if executed three times.
Comment 1009629 by mwyopme
- Upvotes: 2
therefore answer: D
Comment 1009958 by vsydoriak99
- Upvotes: 3
The create command was run 3 times, and Databricks can have several jobs with the same name.
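A hedged sketch of the behaviour the comments describe: each POST to 2.0/jobs/create returns a fresh job_id even with an identical name, and nothing runs because the payload carries no schedule or trigger. Host, token, cluster ID, and notebook path are placeholders.

```python
import requests

# Placeholders - substitute real workspace details.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

payload = {
    "name": "Ingest new data",
    "existing_cluster_id": "<cluster-id>",
    "notebook_task": {"notebook_path": "/Repos/etl/ingest"},  # illustrative path
}

# Submitting the same definition three times yields three distinct job IDs,
# and none of them run because the payload has no schedule or trigger.
for _ in range(3):
    resp = requests.post(
        f"{HOST}/api/2.0/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json()["job_id"])
```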
Question UdmfC9UX1k3VQehKAkw8
Question
The business reporting team requires that data for their dashboards be updated every hour. The total processing time for the pipeline that extracts, transforms, and loads the data for their dashboards is 10 minutes.
Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?
Choices
- A: Configure a job that executes every time new data lands in a given directory
- B: Schedule a job to execute the pipeline once an hour on a new job cluster
- C: Schedule a Structured Streaming job with a trigger interval of 60 minutes
- D: Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
answer?
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1326398 by Sriramiyer92
- Upvotes: 1
Selected Answer: B Key words: updated every hour, pipeline runs in 10 minutes - Simple job cluster should do the job.
Comment 1297917 by Colje
- Upvotes: 1
Selected Answer: B The correct answer is B. Schedule a job to execute the pipeline once an hour on a new job cluster.
Explanation: In this scenario, the business reporting team needs the data to be updated every hour, and the processing time for the pipeline takes 10 minutes. To meet this requirement with the lowest cost, the best option is to schedule the job to run once per hour using a new job cluster.
A job cluster is created specifically for the duration of the job, and once the job finishes, the cluster is terminated. This is cost-efficient because resources are only consumed while the job is running, and the cluster does not stay active when it is not needed.
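A minimal sketch of option B as a jobs/create payload, assuming a Quartz cron expression that fires at the top of every hour and an ephemeral job cluster that exists only for the roughly 10-minute run; Spark version, node type, and notebook path are illustrative placeholders.

```python
# Hedged sketch of option B: hourly schedule on a new (ephemeral) job cluster.
hourly_job = {
    "name": "Refresh reporting data",
    "schedule": {
        "quartz_cron_expression": "0 0 * * * ?",  # top of every hour
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "new_cluster": {                               # created per run, terminated when the run ends
        "spark_version": "13.3.x-scala2.12",       # illustrative
        "node_type_id": "i3.xlarge",               # illustrative
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Repos/etl/refresh_dashboards"},  # illustrative
}
# POST this to /api/2.0/jobs/create (as in the earlier sketch) to register the job.
```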