2026-06-10

Partial indexes in PostgreSQL: EXPLAIN ANALYZE proof and a negative control

Deterministic dataset, same EXPLAIN in phase A and B: what to look for in EXPLAIN (ANALYZE, BUFFERS) before and after the partial index, plus a sample plan for the negative control on rows with status closed.

Goal

On a synthetic but reproducible workload, check whether a partial index aligned with the predicate moves the plan from a heap sequential scan to an index-backed access path, and whether BUFFERS shifts reads from the whole table toward the index (plus a small negative test: a query whose predicate is not covered by the partial index should not use that index as the main filter path).

Version anchor: node names and EXPLAIN options follow PostgreSQL 16 as in the official docs. Timings and buffer counts vary by hardware and cache; the shape of the comparison is what this note keeps.

Hypothesis

With ~500k rows and a small open fraction, a query like WHERE tenant_id = constant AND status = 'open' should, after CREATE INDEX … WHERE (status = 'open'), tend toward a plan that hits the indexed subset instead of scanning almost the entire heap.

Dataset (deterministic)

random() makes every reload differ. Here hashtextextended on strings built from n fixes the row mix for a given Postgres major.

DROP TABLE IF EXISTS orders;

CREATE TABLE orders (
  id           bigserial PRIMARY KEY,
  tenant_id    int NOT NULL,
  status       text NOT NULL CHECK (status IN ('open', 'closed', 'archived')),
  amount_cents int NOT NULL,
  created_at   timestamptz NOT NULL DEFAULT now()
);

INSERT INTO orders (tenant_id, status, amount_cents)
SELECT
  (abs(hashtextextended('t' || n::text, 0)) % 5) + 1 AS tenant_id,
  CASE
    WHEN (abs(hashtextextended('o' || n::text, 0)) % 1000) < 12 THEN 'open'
    WHEN (abs(hashtextextended('c' || n::text, 0)) % 1000) < 12 THEN 'closed'
    ELSE 'archived'
  END AS status,
  abs(hashtextextended('a' || n::text, 0)) % 100000 AS amount_cents
FROM generate_series(1, 500000) AS s(n);

ANALYZE orders;

Sanity checks right after load:

SELECT status, count(*) FROM orders GROUP BY 1 ORDER BY 1;
SELECT count(*) FROM orders WHERE tenant_id = 3 AND status = 'open';

You should see few open rows vs archived, and > 0 open rows for tenant_id = 3 (tenant and status are decoupled unlike a naive n % 100 / n % 5 pattern that can park all open rows on one tenant).

Run the same EXPLAIN below in phase A (primary key only) and again in phase B (after the partial index).

Phase A — primary key only

With the table filled and no secondary index on (tenant_id, …) restricted to open, the expected plan is a Seq Scan on orders: scan the heap (or most of it), filter on tenant_id and status, satisfy ORDER BY id, then LIMIT.

Run:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, amount_cents
FROM orders
WHERE tenant_id = 3 AND status = 'open'
ORDER BY id
LIMIT 50;

Sample plan (abbreviated, PG 16 — costs/times/buffers are illustrative; on your host they differ; node hierarchy is what matters):

Limit  (cost=... rows=50 width=...) (actual time=... rows=50 loops=1)
  ->  Sort  (...)
        Sort Key: id
        Sort Method: top-N heapsort
        ->  Seq Scan on orders  (...)
              Filter: ((tenant_id = 3) AND (status = 'open'::text))
              Rows Removed by Filter: 499xxx   -- almost the whole table dropped here
              Buffers: shared hit=... read=...  -- lots of heap, tiny result

What to scan for:

Seq Scan on orders (no index satisfying the open predicate);
Rows Removed by Filter huge vs the final 50 rows;
Buffers on the heap relation large relative to the result.

Note: do not recreate the table between phase A and phase B. The index is built on the same populated heap; if you run DROP/INSERT in between, the plans are no longer comparable to the baseline you just captured.

Phase B — partial index aligned with the predicate

CREATE INDEX orders_open_by_tenant_partial
  ON orders (tenant_id, id)
  WHERE status = 'open';

ANALYZE orders;

Re-run the identical EXPLAIN from phase A:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, amount_cents
FROM orders
WHERE tenant_id = 3 AND status = 'open'
ORDER BY id
LIMIT 50;

Sample plan after the index (abbreviated — Postgres may pick Index Scan or a Bitmap Index Scan → Bitmap Heap Scan chain; the goal is to see the partial index name in the plan):

Limit  (...)
  ->  Sort  (...)
        ->  Bitmap Heap Scan on orders  (...)
              Recheck Cond: ((tenant_id = 3) AND (status = 'open'::text))
              ->  Bitmap Index Scan on orders_open_by_tenant_partial  (...)
                    Index Cond: (tenant_id = 3)
              Buffers: shared hit=... read=...   -- index reads show up; heap much lighter

What to scan for:

the name orders_open_by_tenant_partial (or an equivalent access path that names it);
in Buffers, index blocks plus a much lighter heap footprint than phase A for the same SELECT.

The A vs B comparison is not theoretical: it is the same query twice, with only the index DDL in between.

Negative control

EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM orders
WHERE tenant_id = 3 AND status = 'closed'
LIMIT 50;

The predicate does not match WHERE status = 'open' on the partial index. orders_open_by_tenant_partial must not appear as the access path that filters closed.

Sample (abbreviated) — expect seq scan or another path, but not the partial “open” index:

Limit  (...)
  ->  Seq Scan on orders  (...)
        Filter: ((tenant_id = 3) AND (status = 'closed'::text))
        Rows Removed by Filter: ...
        Buffers: ...
-- no line "Bitmap Index Scan on orders_open_by_tenant_partial"

If that index name showed up to answer closed, that would be a signal to dig (stats, predicate shape, wrong assumption), not a silent win.

Out of scope

No parallel workers, no INCLUDE, no partitioning—only heap, one partial index, and two query families (open vs closed) to isolate the effect.

Closing

The boxed plans are visual anchors: they show where to look in the plan text (Seq Scan vs index name, Rows Removed by Filter, Buffers). Exact numbers (timings, buffer hits) come only from your run of the scripts above. If Postgres still prefers a seq scan after the index, the same production checklist applies: predicate/index alignment, implicit casts, ANALYZE freshness, work_mem, estimated vs actual cardinalities.

References

CREATE INDEX — WHERE clause
EXPLAIN — ANALYZE, BUFFERS

Want to ship ideas like these into your product?