Questions and Answers
Question GpmOmrRHzmL8f1JMFaQQ
Question
A data engineer, User A, has promoted a pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens.
A workspace admin, User C, inherits responsibility for managing this pipeline. User C uses the Databricks Jobs UI to take “Owner” privileges of each job. Jobs continue to be triggered using the credentials and tooling configured by User B.
An application has been configured to collect and parse run information returned by the REST API. Which statement describes the value returned in the creator_user_name field?
Choices
- A: Once User C takes “Owner” privileges, their email address will appear in this field; prior to this, User A’s email address will appear in this field.
- B: User B’s email address will always appear in this field, as their credentials are always used to trigger the run.
- C: User A’s email address will always appear in this field, as they still own the underlying notebooks.
- D: Once User C takes “Owner” privileges, their email address will appear in this field; prior to this, User B’s email address will appear in this field.
- E: User C will only ever appear in this field if they manually trigger the job, otherwise it will indicate User B.
answer?
Answer: B Answer_ET: B Community answer B (37%) C (30%) A (26%) 7% Discussion
Comment 1558795 by kishanu
- Upvotes: 1
Selected Answer: C It should be User A. Go to any existing workflow which has a few runs and see the JSON. It will have the creator_user_name as the creator of the workflow. For the user who has triggered the job, the field name is “run_as_user_name”
Comment 1335511 by arekm
- Upvotes: 1
Selected Answer: B B - the person that triggered the job. Since we are using User B personal access token, it is going to say User B’s email address.
Comment 1326592 by temple1305
- Upvotes: 2
Selected Answer: A User A was the owner; User B just ran the job and did not become the owner; then User C became the owner - so case A.
Comment 1322705 by Thameur01
- Upvotes: 2
Selected Answer: E creator_user_name reflects the user who triggered the job run, not the job owner or the notebook creator. For programmatically triggered runs, the field will reflect the user whose personal access token was used (User B in this case). If User C manually triggers a job, the field will show User C’s email address for that specific run.
Comment 1310027 by cf56faf
- Upvotes: 2
Selected Answer: B It’s B.
Comment 1308865 by smashit
- Upvotes: 1
Ownership of a job can be assigned to a user, not to a group. Option A is correct.
Comment 1303883 by Jugiboss
- Upvotes: 2
Selected Answer: A Seems correct.
Comment 1303649 by Ananth4Sap
- Upvotes: 2
Selected Answer: C The creator_user_name field in the run information returned by the REST API reflects the user who originally created the job. Since User A created the jobs using their personal access token, their email address will always appear in this field, regardless of who triggers the job runs or who takes ownership of the job later.
Comment 1288733 by pk07
- Upvotes: 2
Selected Answer: C C based on previous comment
Comment 1287015 by shaojunni
- Upvotes: 1
Selected Answer: C Job creator can’t be changed. Owner can be changed. So the creator_user_name field always returns who created the job: User A. But User A no longer owns the job. So Answer C is partially correct.
Comment 1287013 by shaojunni
- Upvotes: 1
Selected Answer: C Job creator can’t be changed. Owner can be changed. So the creator_user_name field always returns the owner: User A. But User A no longer owns the job. So Answer C is partially correct.
Comment 1268069 by jlocke
- Upvotes: 3
Selected Answer: A When you create a job, your role is IS OWNER and RUN AS. So when you trigger the job, it runs as the RUN AS entity, and that should be User A unless someone has changed it.
Comment 1264142 by 946a1af
- Upvotes: 2
Selected Answer: B Answer B
Comment 1244663 by c00ccb7
- Upvotes: 2
Selected Answer: B Should be the DevOps email address
Comment 1244387 by c00ccb7
- Upvotes: 3
Selected Answer: B the creator_user_name field reflects the user who triggered the job run
Comment 1231438 by fcfb11c
- Upvotes: 2
Answer: E
Comment 1224897 by Deb9753
- Upvotes: 1
Answer: C
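As a hedged illustration of the discussion above, the sketch below queries the Jobs Runs API for a single run and prints the two fields the commenters distinguish (creator_user_name vs. run_as_user_name). The workspace URL, token, and run ID are placeholders, not values from the question.

```python
import requests

# Placeholders - substitute a real workspace URL, token, and run ID.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
RUN_ID = 123456789

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"run_id": RUN_ID},
)
resp.raise_for_status()
run = resp.json()

# creator_user_name identifies who created the job; run_as_user_name is the
# identity the run executes as - the distinction several comments rely on.
print("creator_user_name:", run.get("creator_user_name"))
print("run_as_user_name:", run.get("run_as_user_name"))
```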
Question hOe3tum4fOyACZdtA2SY
Question
A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.
//IMG//
Which command should be removed from the notebook before scheduling it as a job?
Choices
- A: Cmd 2
- B: Cmd 3
- C: Cmd 4
- D: Cmd 5
answer?
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1335514 by arekm
- Upvotes: 1
Selected Answer: D D - Cmd 5, which is display(finalDF), is costly: it produces a nicely formatted HTML table that is unnecessary during a production run. On top of that, there is a limit on how much output a notebook can generate. This caused me some problems in the past with legacy code.
Comment 1308866 by smashit
- Upvotes: 1
There is a limitation when you use the display command inside a job: if the output exceeds 20 MB it will throw an error. Maybe that was the reason.
Comment 1277913 by csrazdan
- Upvotes: 2
Selected Answer: D The display function is built specifically for the notebook environment and is not part of Spark itself. If you want to print the contents of a DataFrame in plain Spark, replace it with df.show().
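To make the comments concrete, here is a minimal sketch of the change being described, assuming the finalDF DataFrame and the notebook-provided spark session from the question's screenshot; the table name is illustrative only.

```python
# Notebook-only rendering: fine interactively, but unnecessary in a scheduled job
# and subject to the notebook output limit mentioned above.
# display(finalDF)

# If the job needs the result persisted, write it out instead (illustrative table name):
finalDF.write.mode("overwrite").saveAsTable("reporting.final_table")

# For a lightweight sanity check in the job's output, a plain Spark action works everywhere:
finalDF.show(5)
```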
Question p9avsuaKbjCHlLgkLFte
Question
Which statement regarding Spark configuration on the Databricks platform is true?
Choices
- A: The Databricks REST API can be used to modify the Spark configuration properties for an interactive cluster without interrupting jobs currently running on the cluster.
- B: Spark configurations set within a notebook will affect all SparkSessions attached to the same interactive cluster.
- C: When the same Spark configuration property is set for an interactive cluster and a notebook attached to that cluster, the notebook setting will always be ignored.
- D: Spark configuration properties set for an interactive cluster with the Clusters UI will impact all notebooks attached to that cluster.
answer?
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1303555 by nedlo
- Upvotes: 1
Selected Answer: D this question repeats afaik
Comment 1281031 by Melik3
- Upvotes: 2
Selected Answer: D this looks correct
Comment 1251581 by vexor3
- Upvotes: 3
Selected Answer: D Should be D
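A small sketch of the behaviour answer D describes, assuming a notebook attached to an interactive cluster; the property name is illustrative. A value set in the Clusters UI ("Spark config") is visible to every attached notebook, while spark.conf.set in a notebook only changes that notebook's own SparkSession.

```python
# Read a property set cluster-wide via the Clusters UI ("Spark config" section);
# every notebook attached to this interactive cluster sees the same value.
print(spark.conf.get("spark.sql.shuffle.partitions"))

# A session-scoped override in this notebook affects only this notebook's SparkSession;
# other notebooks attached to the same cluster keep the cluster-level value.
spark.conf.set("spark.sql.shuffle.partitions", "64")
print(spark.conf.get("spark.sql.shuffle.partitions"))
```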
Question a98naUArnjZOxNxythsl
Question
A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create. //IMG//
Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?
Choices
- A: Three new jobs named “Ingest new data” will be defined in the workspace, and they will each run once daily.
- B: The logic defined in the referenced notebook will be executed three times on new clusters with the configurations of the provided cluster ID.
- C: Three new jobs named “Ingest new data” will be defined in the workspace, but no jobs will be executed.
- D: One new job named “Ingest new data” will be defined in the workspace, but it will not be executed.
- E: The logic defined in the referenced notebook will be executed three times on the referenced existing all purpose cluster.
answer?
Answer: C Answer_ET: C Community answer C (100%) Discussion
Comment 1334747 by arekm
- Upvotes: 4
Selected Answer: C You can create 3 jobs with the same name using the UI, and the REST API is no different. Since no schedule information is in the JSON, they will not run.
Comment 1322348 by AlejandroU
- Upvotes: 1
Selected Answer: C Answer C. 3 new jobs with the same name will be created with each API call, but they won’t be executed unless further configuration is made for scheduling or triggering.
Comment 1290686 by akashdesarda
- Upvotes: 1
Selected Answer: C Databricks will keep creating new jobs if you keep calling the create REST API. Each will have the same name but a different ID. Also, no trigger/schedule is specified, so they won't run.
Comment 1224443 by imatheushenrique
- Upvotes: 1
C. Three new jobs named “Ingest new data” will be defined in the workspace, but no jobs will be executed.
Comment 1213860 by coercion
- Upvotes: 3
Selected Answer: C Learnt a new thing: Databricks can have duplicate job names (the job IDs will differ). So three jobs will be created with three job IDs, but they will not run as no schedule is specified.
Comment 1167343 by franciscodm
- Upvotes: 1
C for sure
Comment 1130892 by spaceexplorer
- Upvotes: 1
Selected Answer: C Correct answer is C
Comment 1121659 by Jay_98_11
- Upvotes: 1
Selected Answer: C C is correct
Comment 1118561 by kz_data
- Upvotes: 1
Selected Answer: C Correct answer is C
Comment 1040267 by sturcu
- Upvotes: 3
Selected Answer: C databricks jobs create will create a new job with the same name each time it is run. In order to overwrite the existing job you need to run databricks jobs reset.
Comment 1016751 by bob_
- Upvotes: 3
Answer is correct. The 3 API calls create 3 jobs with the same name but different job ids. There is no schedule defined so will not execute.
Comment 1013199 by Eertyy
- Upvotes: 2
Correct answer is A, because the API can create the same job with the same name if executed three times.
Comment 1009629 by mwyopme
- Upvotes: 2
therefore answer: D
Comment 1009958 by vsydoriak99
- Upvotes: 3
The create command was run 3 times, and Databricks can have several jobs with the same name.
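A hedged sketch of the behaviour the comments describe: each POST to 2.0/jobs/create returns a fresh job_id even with an identical name, and nothing runs because the payload carries no schedule or trigger. Host, token, cluster ID, and notebook path are placeholders.

```python
import requests

# Placeholders - substitute real workspace details.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

payload = {
    "name": "Ingest new data",
    "existing_cluster_id": "<cluster-id>",
    "notebook_task": {"notebook_path": "/Repos/etl/ingest"},  # illustrative path
}

# Submitting the same definition three times yields three distinct job IDs,
# and none of them run because the payload has no schedule or trigger.
for _ in range(3):
    resp = requests.post(
        f"{HOST}/api/2.0/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json()["job_id"])
```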
Question UdmfC9UX1k3VQehKAkw8
Question
The business reporting team requires that data for their dashboards be updated every hour. The total processing time for the pipeline that extracts, transforms, and loads the data for their dashboards is 10 minutes.
Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?
Choices
- A: Configure a job that executes every time new data lands in a given directory
- B: Schedule a job to execute the pipeline once an hour on a new job cluster
- C: Schedule a Structured Streaming job with a trigger interval of 60 minutes
- D: Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
answer?
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1326398 by Sriramiyer92
- Upvotes: 1
Selected Answer: B Key words: updated every hour, pipeline runs in 10 minutes - Simple job cluster should do the job.
Comment 1297917 by Colje
- Upvotes: 1
Selected Answer: B The correct answer is B. Schedule a job to execute the pipeline once an hour on a new job cluster.
Explanation: In this scenario, the business reporting team needs the data to be updated every hour, and the processing time for the pipeline takes 10 minutes. To meet this requirement with the lowest cost, the best option is to schedule the job to run once per hour using a new job cluster.
A job cluster is created specifically for the duration of the job, and once the job finishes, the cluster is terminated. This is cost-efficient because resources are only consumed while the job is running, and the cluster does not stay active when it is not needed.
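A minimal sketch of option B as a jobs/create payload, assuming a Quartz cron expression that fires at the top of every hour and an ephemeral job cluster that exists only for the roughly 10-minute run; Spark version, node type, and notebook path are illustrative placeholders.

```python
# Hedged sketch of option B: hourly schedule on a new (ephemeral) job cluster.
hourly_job = {
    "name": "Refresh reporting data",
    "schedule": {
        "quartz_cron_expression": "0 0 * * * ?",  # top of every hour
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "new_cluster": {                               # created per run, terminated when the run ends
        "spark_version": "13.3.x-scala2.12",       # illustrative
        "node_type_id": "i3.xlarge",               # illustrative
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Repos/etl/refresh_dashboards"},  # illustrative
}
# POST this to /api/2.0/jobs/create (as in the earlier sketch) to register the job.
```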