Questions and Answers
Question YF4WqeajaJpoNXeCM6g2
Question
A Databricks SQL dashboard has been configured to monitor the total number of records present in a collection of Delta Lake tables using the following query pattern:
SELECT COUNT(*) FROM table
Which of the following describes how results are generated each time the dashboard is updated?
Choices
- A: The total count of rows is calculated by scanning all data files
- B: The total count of rows will be returned from cached results unless REFRESH is run
- C: The total count of records is calculated from the Delta transaction logs
- D: The total count of records is calculated from the parquet file metadata
Answer: C (community: 100% C)
Discussion
Comment 1332189 by AlejandroU
- Upvotes: 1
Answer is D. This is the same as question #63. Delta Lake stores its data in Parquet format, and Parquet files include metadata (like row counts) that can be queried efficiently. For a COUNT(*) query, Delta Lake does not need to scan all the data files; instead, it reads the row-count information stored in the Parquet file metadata, making the operation faster.
Comment 1251589 by vexor3
- Upvotes: 1
Selected Answer: C C is correct
Comment 1229872 by hpkr
- Upvotes: 1
Selected Answer: C C is correct
Comment 1224737 by BrianNguyen95
- Upvotes: 1
Selected Answer: C Delta Lake optimizes COUNT(*) queries by reading the row counts stored in the Delta transaction log. This eliminates the need for a full table scan, resulting in significantly faster query performance.
Comment 1222813 by Freyr
- Upvotes: 1
Selected Answer: C Correct Answer: C
Comment 1221429 by MDWPartners
- Upvotes: 2
Selected Answer: C I would’ve said C
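For reference, a minimal sketch of the behavior under discussion, assuming a Databricks (or other Delta-enabled) Spark session; the table name demo_counts is hypothetical. Delta records per-file statistics, including numRecords, in the transaction log at write time, which is what lets a bare COUNT(*) be answered without a file scan:

# Minimal sketch; assumes an active SparkSession with Delta Lake support.
spark.range(1000).write.format("delta").saveAsTable("demo_counts")  # hypothetical table

# Each file added to the table carries statistics (including numRecords) in
# the Delta transaction log, so this count is served from log metadata
# rather than by scanning the underlying Parquet data files.
spark.sql("SELECT COUNT(*) FROM demo_counts").show()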
Question G8gcPs6KJSuMozW7eCCo
Question
A Delta Lake table was created with the below query:
//IMG//
Consider the following query:
DROP TABLE prod.sales_by_store
If this statement is executed by a workspace admin, which result will occur?
Choices
- A: Data will be marked as deleted but still recoverable with Time Travel.
- B: The table will be removed from the catalog but the data will remain in storage.
- C: The table will be removed from the catalog and the data will be deleted.
- D: An error will occur because Delta Lake prevents the deletion of production data.
Answer: C (community: 100% C)
Discussion
Comment 1270633 by robodog
- Upvotes: 1
Selected Answer: C No LOCATION keyword, so it's a managed table
Comment 1230608 by Isio05
- Upvotes: 3
Selected Answer: C It's a managed table, so the data will also be removed
Comment 1229873 by hpkr
- Upvotes: 2
Selected Answer: C C is correct
Comment 1228724 by hpkr
- Upvotes: 1
Selected Answer: C Option C
Comment 1224425 by imatheushenrique
- Upvotes: 1
C because its a managed table
Comment 1222826 by Freyr
- Upvotes: 1
Selected Answer: C Correct Answer: C. No location is provided for the table, so it is a managed table. Dropping it deletes the table metadata as well as the table data.
Comment 1221430 by MDWPartners
- Upvotes: 1
Selected Answer: C Seems C, it’s a managed table
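To illustrate the managed-vs-external distinction the commenters rely on, here is a minimal sketch; the table names and storage path are hypothetical:

# No LOCATION clause -> managed table: DROP removes the catalog entry
# and deletes the underlying data files.
spark.sql("CREATE TABLE prod.sales_by_store (store_id INT, revenue DOUBLE)")
spark.sql("DROP TABLE prod.sales_by_store")

# An explicit LOCATION makes the table external: DROP removes only the
# catalog entry, and the files at the path remain in storage.
spark.sql("""
    CREATE TABLE prod.sales_external (store_id INT, revenue DOUBLE)
    LOCATION 's3://some-bucket/sales_external'
""")
spark.sql("DROP TABLE prod.sales_external")  # data at the LOCATION survives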
Question 9fSEQNbJIEMf5YTWX6lB
Question
A developer has successfully configured their credentials for Databricks Repos and cloned a remote Git repository. They do not have privileges to make changes to the main branch, which is the only branch currently visible in their workspace.
Which approach allows this user to share their code updates without the risk of overwriting the work of their teammates?
Choices
- A: Use Repos to create a new branch, commit all changes, and push changes to the remote Git repository.
- B: Use Repos to create a fork of the remote repository, commit all changes, and make a pull request on the source repository.
- C: Use Repos to pull changes from the remote Git repository; commit and push changes to a branch that appeared as changes were pulled.
- D: Use Repos to merge all differences and make a pull request back to the remote repository.
Answer: A (community: 100% A)
Discussion
Comment 1288813 by RyanAck24
- Upvotes: 2
Selected Answer: A A seems correct
Comment 1236234 by Ati1362
- Upvotes: 3
answer B
Question ealYjXDtXnbxXgTn81BN
Question
The security team is exploring whether or not the Databricks secrets module can be leveraged for connecting to an external database.
After testing the code with all Python variables being defined with strings, they upload the password to the secrets module and configure the correct permissions for the currently active user. They then modify their code to the following (leaving all other variables unchanged).
//IMG//
Which statement describes what will happen when the above code is executed?
Choices
- A: The connection to the external table will succeed; the string “REDACTED” will be printed.
- B: An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the encoded password will be saved to DBFS.
- C: An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the password will be printed in plain text.
- D: The connection to the external table will succeed; the string value of password will be printed in plain text.
Answer: A (community: 100% A)
Discussion
Comment 1255629 by Hadiler
- Upvotes: 3
Selected Answer: A A is the correct answer
Comment 1251590 by vexor3
- Upvotes: 3
Selected Answer: A A is correct
Comment 1224944 by Deb9753
- Upvotes: 1
Answer A: When using Databricks secrets, the actual value of a secret is protected from being displayed in plain text: Databricks automatically redacts secret values when they are printed in a notebook. So when you use print(password), the output will not show the actual password but will instead show [REDACTED].
Comment 1224426 by imatheushenrique
- Upvotes: 1
A. The connection to the external table will succeed; the string “REDACTED” will be printed.
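A minimal sketch of the pattern being tested, assuming a Databricks notebook with a secret scope named jdbc containing a key password (both names hypothetical):

# Fetch the secret; dbutils.secrets.get returns the real string value.
password = dbutils.secrets.get(scope="jdbc", key="password")

# Notebook output redacts secret values, so this prints [REDACTED],
# not the actual password.
print(password)

# The variable still holds the real value, so the connection works:
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/prod")  # hypothetical URL
      .option("dbtable", "customers")
      .option("user", "svc_user")
      .option("password", password)
      .load())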
Question ag7Vcqvi3My9j2sKwPOy
Question
The data science team has created and logged a production model using MLflow. The model accepts a list of column names and returns a new column of type DOUBLE.
The following code correctly imports the production model, loads the customers table containing the customer_id key column into a DataFrame, and defines the feature columns needed for the model.
//IMG//
Which code block will output a DataFrame with the schema “customer_id LONG, predictions DOUBLE”?
Choices
- A: df.map(lambda x: model(x[columns])).select("customer_id, predictions")
- B: df.select("customer_id", model(*columns).alias("predictions"))
- C: model.predict(df, columns)
- D: df.apply(model, columns).select("customer_id, predictions")
Answer: B (community: 100% B)
Discussion
Comment 1255630 by Hadiler
- Upvotes: 1
Selected Answer: B B is the correct answer
Comment 1251591 by vexor3
- Upvotes: 1
Selected Answer: B B is correct
Comment 1222809 by Freyr
- Upvotes: 3
Selected Answer: B Correct Answer: B This option uses select to pick columns from the DataFrame and applies the model to the specified feature columns. The output of the model is aliased as “predictions”, which ensures the output DataFrame has the columns “customer_id” and “predictions” with the appropriate data types, assuming the model returns a double. This syntax aligns with PySpark’s DataFrame transformations and is a typical way to apply a machine learning model to specific columns in Databricks.
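A minimal sketch of option B end to end, assuming MLflow and PySpark on Databricks; the model URI, table, and feature column names are hypothetical stand-ins for the values defined in the question's setup code:

import mlflow.pyfunc

# Load the production model as a Spark UDF that returns a DOUBLE.
model = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/prod_model/Production", result_type="double"
)

df = spark.table("customers")                  # contains customer_id LONG
columns = ["age", "tenure", "total_spend"]     # hypothetical feature columns

# Option B: keep the key column and apply the model UDF to the features.
preds = df.select("customer_id", model(*columns).alias("predictions"))
preds.printSchema()  # customer_id: long, predictions: double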