Set up sample data
Install postgres
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt -y install postgresqlCheck postgres is up
sudo systemctl status postgresql
Login with default user
sudo -i -u postgres
psqlChange default user password: \password postgres
Login with postgres directly from psql will face peer authentication error, we need to change auth method from peer to md5 in sudo nano /etc/postgresql/<version>/main/pg_hba.conf
change
local all postgres peer
to
local all postgres md5
Then restart sudo systemctl restart postgresql
Enable remote access
sudo nano /etc/postgresql/<version>/main/postgresql.conf
Change
#listen_addresses = 'localhost'
to
listen_addresses = '*'
add host all all 0.0.0.0/0 md5
to pg_hba.conf
Get backup data here
download .backup folder
wget "https://drive.usercontent.google.com/download?id=1nNXcTjhJK6EzJdzaZvdoJDrYj_V61rq8&export=download&authuser=0&confirm=t&uuid=1e50409c-949d-4791-b598-69cf20eb5df2&at=ALoNOgm0E9MrgEry4Sd72cZ-q4_4:1747273891823" -O /tmp/backup.dump
then restore with pg_restore
pg_restore --dbname=postgres --username=postgres --no-owner --verbose /tmp/backup.dump

(page=26&selection=12,42,38,12&color=yellow) Why is it different in SQL, and why is it that two queries that yield the same result may drastically differ in execution time? The underlying source of the problem is that SQL is a declarative language. That means that when we write a SQL statement, we describe the result we want to get, but we do not specify how that result should be obtained. By contrast, in an imperative language, we specify what to do to obtain a desired result—that is, the sequence of steps that should be executed.
(page=36&selection=26,0,36,35&color=yellow) For former Oracle users, the most important distinction they should be aware of is that PostgreSQL does not cache queries; it only caches the data in the shared buffers. When Oracle would produce a generic execution plan, which will be available for multiple connections, PostgreSQL will generate a new execution plan each time the query is executed. Even though all the necessary data is already in the main memory, the query itself must be executed.
(page=38&selection=39,0,53,48&color=yellow) In order to produce query results, PostgreSQL performs the following steps: • Compile and transform a SQL statement into an expression consisting of high-level logical operations, known as a logical plan. • Optimize the logical plan and convert it into an execution plan. • Execute (interpret) the plan and return results.
(page=39&selection=12,7,16,85&color=yellow) the output of an imperative language compiler is usually (nearly) executable code, such as byte code for a Java virtual machine. In contrast, the output of a query compiler is an expression consisting of high-level operations that remain declarative
(page=39&selection=26,26,32,9&color=yellow) The optimizer performs two kinds of transformations: it replaces logical operations with their execution algorithms and possibly changes the logical expression structure by changing the order in which logical operations will be executed.
(page=46&selection=39,0,47,28&color=yellow) The database engine interprets SQL queries by parsing them into a logical plan, transforming the results, choosing algorithms to implement the logical plan, and finally executing the chosen algorithms. The logical operations used by the database engine are based on operations derived from relational theory, and understanding these is crucial to thinking like a database.
Data Access Algorithms

Full scan
In a full scan, at times referred to as a sequential scan, the database engine consecutively reads all of the rows in a table and checks the filtering condition for each row.
Index-based table access
Index-only scan
Index-only scan algorithm is always preferable if it is applicable (i.e., all needed columns are in the index).
Example
Index Structures
(page=68&selection=35,0,46,26&color=yellow) The output of the PostgreSQL optimizer is an execution plan. While a SELECT defines what needs to be done, an execution plan defines how to execute SQL operations.
How Are Execution Costs Calculated?
(page=77&selection=47,0,67,20&color=yellow) The cost of each execution plan depends on • Cost formulas of algorithms used in the plan • Statistical data on tables and indexes, including distribution of values • System settings (parameters and preferences), such as join_ collapse_limit or cpu_index_tuple_cost
Index Selectivity
(page=85&selection=76,12,87,29&color=yellow) the smaller the number of records that correspond to one value of the index, the lower the index’s selectivity value. We do not want to create indexes with high selectivity; as we saw in Chapter 3, index-based data retrieval in this case will take more time than a sequential scan.
Example: page=86&selection=15,0,16,0&color=yellow
🧾 Query:
sql
SELECT * FROM flight WHERE scheduled_departure BETWEEN '2024-01-17' AND '2024-08-18';
📋 Execution Plan:
pgsql
Bitmap Heap Scan on flight (cost=4510.19..17780.28 rows=310806 width=71) Recheck Cond: (scheduled_departure BETWEEN '2024-01-17 00:00:00+07' AND '2024-08-18 00:00:00+07') -> Bitmap Index Scan on flight_scheduled_departure (cost=0.00..4432.48 rows=310806 width=0) Index Cond: (scheduled_departure BETWEEN '2024-01-17 00:00:00+07' AND '2024-08-18 00:00:00+07')
✅ Breakdown of How This Works:
Step 1: Bitmap Index Scan on flight_scheduled_departure
-
PostgreSQL uses the B-tree index on
scheduled_departureto find the positions (TIDs) of all rows where the timestamp falls within the range'2024-01-17'to'2024-08-18'. -
It doesn’t go to the table (
flight) yet. It just collects pointers to matching rows (TIDs = tuple IDs) and stores them in a bitmap (in memory).For example, suppose it finds rows at:
makefile
CopyEdit
TID: (1,5), (1,7), (2,1), (2,2), ..., (4500,10)The bitmap efficiently maps which pages (blocks of disk) and offsets (rows) need to be read.
-
This phase is sequential in the index (fast) and avoids jumping back and forth to the table.
Step 2: Bitmap Heap Scan on flight
-
Now PostgreSQL takes the bitmap and groups the TIDs by disk page.
-
Instead of randomly fetching each row individually (which would happen in an index scan), it fetches entire pages in optimal I/O order.
So if 500 rows are on the same page, PostgreSQL reads the page once.
-
This is way faster when many rows match, because it reduces random access to disk and improves cache utilization.
Step 3: Recheck Condition
-
Sometimes PostgreSQL has to recheck the condition on the actual row, especially if the index doesn’t fully satisfy the query (e.g., with expression indexes or multi-condition filters).
-
In your case, it does recheck the
scheduled_departure BETWEEN ...condition during the heap scan, even though it used it in the index.This is a safety measure in case of row changes or partial index matches.
🧠 Visual Analogy:
Imagine the index is like a library catalog:
-
Bitmap Index Scan is like writing down all the book locations you need from the catalog.
-
Bitmap Heap Scan is like walking through the library aisle by aisle, picking books in the most efficient order instead of running back and forth every time you look something up.
📈 When Bitmap Index Scan is Better
| Condition | Index Scan | Bitmap Index Scan |
|---|---|---|
| Few rows matched | ✅ Better | ❌ Overhead too high |
| Thousands to hundreds of thousands matched | ❌ Random access heavy | ✅ Efficient bulk access |
| Multiple indexes used | ❌ Not supported | ✅ Can combine multiple bitmaps |