Questions and Answers
Question K8DWLXBn7vvwMZSuEz8N
Question
A data architect has determined that a table of the following format is necessary:
//IMG//
Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?
Choices
- A:
- B:
- C:
- D:
- E:
Answer: E Answer_ET: E Community answer E (86%) 14% Discussion
Comment 1050141 by meow_akk
- Upvotes: 6
E is correct. You don't need to specify Delta, as it's the default storage format for tables.
Comment 1132510 by Stemix
- Upvotes: 5
Both A and E have correct syntax, but the question says “regardless of whether a table already exists with this name”. Hence the correct answer is E.
Comment 1301418 by RandomForest
- Upvotes: 1
Selected Answer: E E is correct because you don't need to specify Delta (it's the default storage format for tables) and the question states “regardless of whether the table already exists”. As a result, A is wrong, as it will NOT recreate the table if it already exists.
Comment 1273390 by 9d4d68a
- Upvotes: 1
The closest correct option would be C, but it shouldn't have the WITH COLUMN clause. To create an empty Delta table with the specified schema in SQL, you can use the CREATE TABLE statement with the USING DELTA clause. The IF NOT EXISTS option ensures that the table is created only if it does not already exist. Here's the SQL code block that accomplishes this:
CREATE TABLE IF NOT EXISTS your_table_name (
  employeeId STRING,
  startDate DATE,
  avgRating DOUBLE
) USING DELTA;
Comment 1263475 by 80370eb
- Upvotes: 1
Selected Answer: A A is the correct answer. CREATE TABLE IF NOT EXISTS checks whether the table already exists in the database; if not, it creates the new table.
Comment 1203829 by benni_ale
- Upvotes: 1
Selected Answer: E E is correct
Comment 1127384 by azure_bimonster
- Upvotes: 2
Selected Answer: E E is the correct option
Comment 1117397 by bartfto
- Upvotes: 2
Selected Answer: E E. correct
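The distinction the commenters draw can be sketched as follows. This is an illustrative sketch: the table name is hypothetical, and the column names (employeeId, startDate, avgRating) are taken from the schema quoted in the discussion, not from the missing screenshot itself.

```sql
-- Option A style: creates the table only if it does not already exist.
-- An existing table with this name is left untouched.
CREATE TABLE IF NOT EXISTS table_name (
  employeeId STRING,
  startDate  DATE,
  avgRating  DOUBLE
);

-- Option E style: creates the table "regardless of whether a table
-- already exists with this name" -- an existing table is replaced.
CREATE OR REPLACE TABLE table_name (
  employeeId STRING,
  startDate  DATE,
  avgRating  DOUBLE
);
```

In both cases USING DELTA may be omitted, since Delta is the default table format in Databricks.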
Question K4c2qewORCPg6klLcuze
Question
A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.
Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?
Choices
- A: It is not possible to use SQL in a Python notebook
- B: They can attach the cell to a SQL endpoint rather than a Databricks cluster
- C: They can simply write SQL syntax in the cell
- D: They can add %sql to the first line of the cell
- E: They can change the default language of the notebook to SQL
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1263480 by 80370eb
- Upvotes: 1
Selected Answer: D We can use a magic command in the notebook to indicate the language a cell should run in, e.g. %sql.
Comment 1127386 by azure_bimonster
- Upvotes: 1
Selected Answer: D The % magic command can be used to switch a cell's language, so D is correct
Comment 1117398 by bartfto
- Upvotes: 1
Selected Answer: D D is correct. Use the %sql magic in the first line.
Comment 1078561 by Lavpak
- Upvotes: 3
Selected Answer: D Use magic command %sql
Comment 1057563 by MFEST
- Upvotes: 3
correct answer D
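The pattern the comments describe can be sketched as a single cell in an otherwise-Python notebook; the table name below is illustrative:

```sql
%sql
-- This cell runs as SQL even though the notebook's default language is
-- Python; the %sql magic on the first line applies only to this cell.
SELECT id, name
FROM some_table
LIMIT 10
```

Equivalently, a Python cell can run SQL without any magic by calling spark.sql("SELECT ..."), though the question's answer is the %sql magic.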
Question OmuUP1e2bQReHd3lMgId
Question
Which of the following SQL keywords can be used to convert a table from a long format to a wide format?
Choices
- A: TRANSFORM
- B: PIVOT
- C: SUM
- D: CONVERT
- E: WHERE
Answer: B Answer_ET: B Community answer B (100%) Discussion
Comment 1263485 by 80370eb
- Upvotes: 1
Selected Answer: B The PIVOT operation is used to rotate data from rows to columns, which effectively converts a table from a long format (where each row represents a single observation or measurement) to a wide format (where each row represents a single entity with multiple observations or measurements as columns).
Comment 1203830 by benni_ale
- Upvotes: 1
Selected Answer: B B is correct
Comment 1127390 by azure_bimonster
- Upvotes: 1
Selected Answer: B Answer is B
Comment 1101148 by AndreFR
- Upvotes: 2
Selected Answer: B https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-pivot.html
“Pivot” transforms the rows of the table_reference by rotating unique values of a specified column list into separate columns.
SYNTAX:
table_reference PIVOT ( { aggregate_expression [ [ AS ] agg_column_alias ] } [ , ... ]
  FOR column_list IN ( expression_list ) )
column_list: { column_name | ( column_name [ , ... ] ) }
expression_list: { expression [ [ AS ] column_alias ] | ( expression [ , ... ] ) [ [ AS ] column_alias ] } [ , ... ]
Comment 1052939 by athu07
- Upvotes: 2
Selected Answer: B PIVOT is correct.
Comment 1050139 by meow_akk
- Upvotes: 3
PIVOT is correct.
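A hedged sketch of PIVOT turning a long table wide, under the assumption of a hypothetical table sales_long(employeeId STRING, quarter STRING, amount DOUBLE); names and quarter values are illustrative:

```sql
-- Long format: one row per (employee, quarter) observation.
SELECT *
FROM sales_long
  PIVOT (
    SUM(amount)
    FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')
  );
-- Wide format result: one row per employee,
-- with the aggregated amounts spread across columns Q1..Q4.
```

Note that PIVOT requires an aggregate expression (SUM here), which is why C alone is not the answer: SUM aggregates, but PIVOT is the keyword that rotates rows into columns.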
Question JJoRkcfWTPzHbGJYAyL7
Question
Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
Choices
- A: Parquet files can be partitioned
- B: CREATE TABLE AS SELECT statements cannot be used on files
- C: Parquet files have a well-defined schema
- D: Parquet files have the ability to be optimized
- E: Parquet files will become Delta tables
Answer: C Answer_ET: C Community answer C (88%) 6% Discussion
Comment 1247403 by 1a44567
- Upvotes: 1
Vote for D. Parquet is a columnar storage file format that allows efficient data compression and encoding schemes, enabling optimization and faster query performance compared to CSV. It supports efficient reading and writing of large datasets, making it a preferred choice for big data applications.
Comment 1219456 by MDWPartners
- Upvotes: 1
Selected Answer: C The key words are “CREATE TABLE AS SELECT”.
Comment 1203832 by benni_ale
- Upvotes: 1
Selected Answer: C C is correct
Comment 1132190 by UGOTCOOKIES
- Upvotes: 2
Selected Answer: C CREATE TABLE AS SELECT adopts the schema details from the source. Parquet files have a defined schema.
Comment 1117399 by bartfto
- Upvotes: 1
Selected Answer: C C. Parquet has a well-defined schema, unlike CSV.
Comment 1109975 by Garyn
- Upvotes: 1
Selected Answer: C C. Parquet files have a well-defined schema.
Explanation:
Parquet files inherently store metadata about the schema within the files themselves, allowing for a well-defined schema. This schema information includes data types, column names, and other structural information. When creating an external table from Parquet, this schema is retained, providing a structured and well-defined format for the data. It ensures consistency and enables more efficient processing, query optimization, and compatibility across various systems or tools that work with the Parquet format. This structured schema within Parquet files offers advantages in terms of data integrity, ease of data processing, and compatibility, making it a beneficial choice over CSV, which lacks inherent schema information and might need additional handling or inference of schema during data ingestion.
Comment 1101159 by AndreFR
- Upvotes: 1
Selected Answer: B The key word here is : CREATE TABLE AS SELECT
not A : partitioning is not relevant in a create table as statement because the data will be created in a delta table not C : Parquet schema is not well defined and there can be parquet files with multiple schema in a folder not D : Parquet are already optimized and are not relevant in a create table as statement because the data will be created in a delta table not E : both CSV & Parquet will become delta tables in a create table as statement B : correct answer by elimination
Comment 1094895 by nedlo
- Upvotes: 1
Selected Answer: D I disagree; I think it's D. A schema can be inferred from CSV as well, but CSV cannot provide the same optimizations as Parquet.
Comment 1058405 by FastEddie
- Upvotes: 4
Selected Answer: C CTAS statements automatically infer schema information from the query results and do not support manual schema declaration. This means that CTAS statements are useful for external data ingestion from sources with a well-defined schema, such as Parquet files and tables. CTAS statements also do not support specifying additional file options.
Comment 1057667 by kishore1980
- Upvotes: 2
Selected Answer: C C is the correct option
Comment 1053321 by anandpsg101
- Upvotes: 2
Selected Answer: C c is correct
Comment 1050150 by meow_akk
- Upvotes: 4
Columnar storage like Apache Parquet is designed to bring efficiency compared to row-based files like CSV. When querying columnar storage, you can skip over the non-relevant data very quickly. As a result, aggregation queries are less time-consuming compared to row-oriented databases.
Comment 1049562 by kbaba101
- Upvotes: 4
C. CTAS works with sources that have a well-defined schema, such as Parquet files and tables, and it does not support specifying additional file options such as a delimiter, which you would need if you used CSV.
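The CTAS point several commenters make can be sketched like this; the table names and paths are illustrative, not from the question:

```sql
-- Parquet embeds its schema in the files, so CTAS can infer
-- column names and types directly from the source:
CREATE TABLE sales AS
SELECT * FROM parquet.`/mnt/raw/sales/`;

-- CSV carries no schema, and CTAS does not accept file options
-- (header, delimiter, ...), so a CSV source is usually ingested
-- with an explicit schema and options instead, e.g.:
CREATE TABLE sales_csv (employeeId STRING, amount DOUBLE)
USING CSV
OPTIONS (header = 'true', delimiter = ',')
LOCATION '/mnt/raw/sales_csv/';
```

Both statements produce Delta tables; the benefit of Parquet as the source is precisely that the schema comes along for free.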
Question uS5NDHw9HyR980QFix1C
Question
A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.
Which of the following relational objects should the data engineer create?
Choices
- A: Spark SQL Table
- B: View
- C: Database
- D: Temporary view
- E: Delta Table
Answer: D Answer_ET: D Community answer D (100%) Discussion
Comment 1050152 by meow_akk
- Upvotes: 8
D is correct. A temp view is session-based: CREATE TEMP VIEW view_name AS query. The session is considered ended by any of the following:
- Opening a new notebook
- Detaching and reattaching a cluster
- Installing a Python package
- Restarting a cluster
Comment 1263495 by 80370eb
- Upvotes: 1
Selected Answer: D A temporary view is created in memory and does not persist beyond the session. It does not require physical storage, making it ideal for avoiding storage costs.
Comment 1109989 by Garyn
- Upvotes: 2
Selected Answer: D D. Temporary view
Explanation:
Temporary View: A temporary view in database systems like Apache Spark provides a temporary and ephemeral representation of data based on an SQL query’s result set. It exists for the duration of a Spark session and is not persisted to storage. Similar to a regular view, a temporary view allows the data engineer to define a logical schema by pulling and combining data from multiple tables using SQL queries, but it does not store any physical data on disk. Temporary views are suitable when there’s no need for long-term storage of the combined data and are helpful for immediate analysis or processing within the current session without incurring storage costs.
Comment 1101164 by AndreFR
- Upvotes: 1
Selected Answer: D https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-view.html
Should be a view or temporary view to avoid copying and storing data.
Does not need to be used by other data engineers in other sessions. So this view should be temporary. TEMPORARY views are visible only to the session that created them and are dropped when the session ends.
Comment 1071964 by Huroye
- Upvotes: 2
Answer is D. key phrase is “…does not need to be used by other data engineers in other sessions…”
Comment 1057676 by kishore1980
- Upvotes: 1
Selected Answer: D D is the right option
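A sketch of the temporary-view pattern the answer describes; the table, view, and column names are illustrative:

```sql
-- Visible only to the current session and dropped when it ends;
-- no physical data is copied or stored.
CREATE TEMPORARY VIEW employee_ratings AS
SELECT e.employeeId, e.startDate, r.avgRating
FROM employees e
JOIN ratings r
  ON e.employeeId = r.employeeId;

-- Within this session it can be queried like a table:
SELECT * FROM employee_ratings WHERE avgRating > 4.0;
```

Dropping TEMPORARY from the statement would give option B (a plain view), which also avoids copying data but persists in the metastore and is visible to other sessions.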