Postgres Query Optimizer

Set up sample data

Install postgres

sudo apt install -y postgresql-common  
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt -y install postgresql

Check postgres is up sudo systemctl status postgresql

sudo -i -u postgres
psql

Change default user password: \password postgres

Login with postgres directly from psql will face peer authentication error, we need to change auth method from peer to md5 in sudo nano /etc/postgresql/<version>/main/pg_hba.conf change local all postgres peer to local all postgres md5 Then restart sudo systemctl restart postgresql

Enable remote access

sudo nano /etc/postgresql/<version>/main/postgresql.conf
Change 
#listen_addresses = 'localhost'
to
listen_addresses = '*'

add host all all 0.0.0.0/0 md5 to pg_hba.conf

Get backup data here download .backup folder wget "https://drive.usercontent.google.com/download?id=1nNXcTjhJK6EzJdzaZvdoJDrYj_V61rq8&export=download&authuser=0&confirm=t&uuid=1e50409c-949d-4791-b598-69cf20eb5df2&at=ALoNOgm0E9MrgEry4Sd72cZ-q4_4:1747273891823" -O /tmp/backup.dump then restore with pg_restore pg_restore --dbname=postgres --username=postgres --no-owner --verbose /tmp/backup.dump

(page=26&selection=12,42,38,12&color=yellow) Why is it different in SQL, and why is it that two queries that yield the same result may drastically differ in execution time? The underlying source of the problem is that SQL is a declarative language. That means that when we write a SQL statement, we describe the result we want to get, but we do not specify how that result should be obtained. By contrast, in an imperative language, we specify what to do to obtain a desired result—that is, the sequence of steps that should be executed.

(page=36&selection=26,0,36,35&color=yellow) For former Oracle users, the most important distinction they should be aware of is that PostgreSQL does not cache queries; it only caches the data in the shared buffers. When Oracle would produce a generic execution plan, which will be available for multiple connections, PostgreSQL will generate a new execution plan each time the query is executed. Even though all the necessary data is already in the main memory, the query itself must be executed.

(page=38&selection=39,0,53,48&color=yellow) In order to produce query results, PostgreSQL performs the following steps: • Compile and transform a SQL statement into an expression consisting of high-level logical operations, known as a logical plan. • Optimize the logical plan and convert it into an execution plan. • Execute (interpret) the plan and return results.

(page=39&selection=12,7,16,85&color=yellow) the output of an imperative language compiler is usually (nearly) executable code, such as byte code for a Java virtual machine. In contrast, the output of a query compiler is an expression consisting of high-level operations that remain declarative

(page=39&selection=26,26,32,9&color=yellow) The optimizer performs two kinds of transformations: it replaces logical operations with their execution algorithms and possibly changes the logical expression structure by changing the order in which logical operations will be executed.

(page=46&selection=39,0,47,28&color=yellow) The database engine interprets SQL queries by parsing them into a logical plan, transforming the results, choosing algorithms to implement the logical plan, and finally executing the chosen algorithms. The logical operations used by the database engine are based on operations derived from relational theory, and understanding these is crucial to thinking like a database.

Data Access Algorithms

Full scan

In a full scan, at times referred to as a sequential scan, the database engine consecutively reads all of the rows in a table and checks the filtering condition for each row.

Index-based table access

Index-only scan

Index-only scan algorithm is always preferable if it is applicable (i.e., all needed columns are in the index).

Example

page=55&selection=18,0,18,55

Index Structures

(page=68&selection=35,0,46,26&color=yellow) The output of the PostgreSQL optimizer is an execution plan. While a SELECT defines what needs to be done, an execution plan defines how to execute SQL operations.

How Are Execution Costs Calculated?

(page=77&selection=47,0,67,20&color=yellow) The cost of each execution plan depends on • Cost formulas of algorithms used in the plan • Statistical data on tables and indexes, including distribution of values • System settings (parameters and preferences), such as join_ collapse_limit or cpu_index_tuple_cost

Index Selectivity

(page=85&selection=76,12,87,29&color=yellow) the smaller the number of records that correspond to one value of the index, the lower the index’s selectivity value. We do not want to create indexes with high selectivity; as we saw in Chapter 3, index-based data retrieval in this case will take more time than a sequential scan.

Example: page=86&selection=15,0,16,0&color=yellow

🧾 Query:

sql

SELECT * FROM flight WHERE scheduled_departure BETWEEN '2024-01-17' AND '2024-08-18';

📋 Execution Plan:

pgsql

Bitmap Heap Scan on flight (cost=4510.19..17780.28 rows=310806 width=71) Recheck Cond: (scheduled_departure BETWEEN '2024-01-17 00:00:00+07' AND '2024-08-18 00:00:00+07') -> Bitmap Index Scan on flight_scheduled_departure (cost=0.00..4432.48 rows=310806 width=0) Index Cond: (scheduled_departure BETWEEN '2024-01-17 00:00:00+07' AND '2024-08-18 00:00:00+07')

✅ Breakdown of How This Works:

Step 1: Bitmap Index Scan on `flight_scheduled_departure`

PostgreSQL uses the B-tree index on scheduled_departure to find the positions (TIDs) of all rows where the timestamp falls within the range '2024-01-17' to '2024-08-18'.
It doesn’t go to the table (flight) yet. It just collects pointers to matching rows (TIDs = tuple IDs) and stores them in a bitmap (in memory).

For example, suppose it finds rows at:

makefile

CopyEdit

TID: (1,5), (1,7), (2,1), (2,2), ..., (4500,10)

The bitmap efficiently maps which pages (blocks of disk) and offsets (rows) need to be read.
This phase is sequential in the index (fast) and avoids jumping back and forth to the table.

Step 2: Bitmap Heap Scan on `flight`

Now PostgreSQL takes the bitmap and groups the TIDs by disk page.
Instead of randomly fetching each row individually (which would happen in an index scan), it fetches entire pages in optimal I/O order.

So if 500 rows are on the same page, PostgreSQL reads the page once.
This is way faster when many rows match, because it reduces random access to disk and improves cache utilization.

Step 3: Recheck Condition

Sometimes PostgreSQL has to recheck the condition on the actual row, especially if the index doesn’t fully satisfy the query (e.g., with expression indexes or multi-condition filters).
In your case, it does recheck the scheduled_departure BETWEEN ... condition during the heap scan, even though it used it in the index.

This is a safety measure in case of row changes or partial index matches.

🧠 Visual Analogy:

Imagine the index is like a library catalog:

Bitmap Index Scan is like writing down all the book locations you need from the catalog.
Bitmap Heap Scan is like walking through the library aisle by aisle, picking books in the most efficient order instead of running back and forth every time you look something up.

📈 When Bitmap Index Scan is Better

Condition	Index Scan	Bitmap Index Scan
Few rows matched	✅ Better	❌ Overhead too high
Thousands to hundreds of thousands matched	❌ Random access heavy	✅ Efficient bulk access
Multiple indexes used	❌ Not supported	✅ Can combine multiple bitmaps

vuthanhdatt's Second Brain

Explorer

Postgres Query Optimizer

Set up sample data

Install postgres

Data Access Algorithms

Full scan

Index-based table access

Index-only scan

Example

Index Structures

How Are Execution Costs Calculated?

Index Selectivity

🧾 Query:

📋 Execution Plan:

✅ Breakdown of How This Works:

Step 1: Bitmap Index Scan on `flight_scheduled_departure`

Step 2: Bitmap Heap Scan on `flight`

Step 3: Recheck Condition

🧠 Visual Analogy:

📈 When Bitmap Index Scan is Better

Graph View

Table of Contents

vuthanhdatt's Second Brain

Explorer

Postgres Query Optimizer

Set up sample data

Install postgres

Data Access Algorithms

Full scan

Index-based table access

Index-only scan

Example

Index Structures

How Are Execution Costs Calculated?

Index Selectivity

🧾 Query:

📋 Execution Plan:

✅ Breakdown of How This Works:

Step 1: Bitmap Index Scan on flight_scheduled_departure

Step 2: Bitmap Heap Scan on flight

Step 3: Recheck Condition

🧠 Visual Analogy:

📈 When Bitmap Index Scan is Better

Graph View

Table of Contents

Step 1: Bitmap Index Scan on `flight_scheduled_departure`

Step 2: Bitmap Heap Scan on `flight`