2026-06-13

Advisory locks and connection pools: the silent leak (and how to avoid it)

Why session-scoped pg_advisory_lock survives COMMIT and breaks transaction-pooled connections, how to reproduce with psql and pg_locks, why SKIP LOCKED is the wrong tool to serialize one known job key, and a production-ready blocking query.

Who this is for

If workers sometimes run the same job twice, or you are designing a Postgres-backed queue, you probably already know what advisory locks and SELECT … FOR UPDATE are. What is hard to find is when the mechanism you picked quietly fails in production and how to notice before the symptom is duplicate side effects or “stuck” traffic.

This post focuses on three concrete gaps:

Session-scoped advisory locks + a pool that reuses the same Postgres backend across transactions (classic PgBouncer transaction pooling).
Why SKIP LOCKED is the wrong primitive when you need mutual exclusion on one known key (you want wait, not skip).
A ready-to-paste monitoring query for long advisory waits, with column-by-column intent for alerting.

Version note: examples match PostgreSQL 16; on other majors, double-check pg_locks columns and your pooler docs.

Minimal ledger setup

You still want a stable row when we contrast with FOR UPDATE:

DROP TABLE IF EXISTS job_ledger;

CREATE TABLE job_ledger (
  job_key     text PRIMARY KEY,
  claim_count bigint NOT NULL DEFAULT 0,
  updated_at  timestamptz NOT NULL DEFAULT now()
);

INSERT INTO job_ledger (job_key) VALUES ('report-daily')
ON CONFLICT DO NOTHING;

Shared advisory class id for your app namespace:

SELECT 7742 AS app_advisory_class;

Sanity check: `pg_advisory_xact_lock` behaves

Two psql windows.

Session A

BEGIN;
SELECT pg_advisory_xact_lock(7742, hashtext('report-daily'));
-- … work …
COMMIT;

Session B (same BEGIN + same pg_advisory_xact_lock while A has not committed): blocks. After A commits, B proceeds. The lock is transaction-scoped — nothing remains on the backend after the transaction ends.

This is textbook Postgres — it only sets the baseline before the painful case.

The painful case: `pg_advisory_lock` and transaction-mode pooling

pg_advisory_lock (non-xact) is session-scoped: it survives COMMIT until pg_advisory_unlock or disconnect.

Reproduce without PgBouncer (one backend, one `psql`)

In a single session:

SELECT pg_advisory_lock(7742, hashtext('report-daily'));
COMMIT;

Then ask Postgres whether the lock is still there:

SELECT l.pid, l.mode, l.granted, l.classid, l.objid
FROM pg_locks l
WHERE l.locktype = 'advisory'
  AND l.pid = pg_backend_pid();

Expected: at least one row with granted = true. COMMIT did not release the lock — that is correct for the session-scoped variant.

Now picture PgBouncer transaction pooling (or an equivalent pool): after COMMIT your app returns the connection to the pool. The server backend (pid above) often stays allocated to the next client that gets that pooled server slot. The advisory lock is still attached to that pid: the next worker may inherit a lock it never requested. Symptoms: “ghost” serialization, starvation, or cross-tenant bleed that never reproduces on a single local connection.

Why pg_advisory_xact_lock fixes the pooling story: it releases at transaction end, so the backend is clean for the next pool renter after every COMMIT/ROLLBACK — the lifecycle transaction pooling assumes.

Note: in session pooling, a client usually owns one backend for its whole life, so cross-client inheritance is rarer, but you still leak until disconnect if you forget pg_advisory_unlock. In transaction mode, backend reuse across clients is the default.

Useful reading: PgBouncer features — pooling mode.

`FOR UPDATE` on the ledger: when lock and state must be the same transaction

If serialization must be the same transaction that updates claim_count / status, a ledger row is a common pattern:

BEGIN;

SELECT job_key, claim_count
FROM job_ledger
WHERE job_key = 'report-daily'
FOR UPDATE;

UPDATE job_ledger
SET claim_count = claim_count + 1, updated_at = now()
WHERE job_key = 'report-daily';

COMMIT;

Here the lock is the row you mutate — no “advisory first, update later” gap.

`SKIP LOCKED`: why it is not the mutex you want for one key

SKIP LOCKED means do not wait for rows already locked — the reader moves on immediately without those rows. For one known job key that must be processed by at most one worker at a time, that is often the opposite of what you want: you want the second worker to wait, not to “not see” the work.

Session A

BEGIN;
SELECT job_key FROM job_ledger WHERE job_key = 'report-daily' FOR UPDATE;
-- hold; do not COMMIT yet

Session B — first without skip:

BEGIN;
SELECT job_key FROM job_ledger WHERE job_key = 'report-daily' FOR UPDATE;

Expected: blocks until A commits. Classic serialization.

ROLLBACK on B, then retry B with:

BEGIN;
SELECT job_key FROM job_ledger WHERE job_key = 'report-daily' FOR UPDATE SKIP LOCKED;

Expected: the statement returns immediately with zero rows (there is no alternate unlocked row): the worker may read “no job” while the job exists and is merely locked. That is the footgun for single-key idempotency.

Outbox / claim-pool designs want to skip rows others already claimed across a set of jobs. This article is about one key that must be mutually exclusive, not “fetch the next free row”.

Production-ready monitoring: advisory waits longer than N seconds

Goal: feed an alert (“a backend has waited too long on an advisory”). Join ungranted pg_locks to pg_stat_activity, then to the granted holder using the lock identity columns.

Example threshold: 30 seconds — tune in prod.

SELECT
  blocked_a.pid AS blocked_pid,
  now() - blocked_a.query_start AS blocked_since,
  blocked_a.state AS blocked_state,
  blocked_a.wait_event_type,
  blocked_a.wait_event,
  left(blocked_a.query, 200) AS blocked_query_snippet,
  blocking_a.pid AS blocking_pid,
  now() - blocking_a.query_start AS blocking_query_age,
  left(blocking_a.query, 200) AS blocking_query_snippet
FROM pg_catalog.pg_locks blocked_l
JOIN pg_catalog.pg_stat_activity blocked_a ON blocked_a.pid = blocked_l.pid
JOIN pg_catalog.pg_locks blocking_l
  ON blocking_l.locktype = blocked_l.locktype
 AND blocking_l.database IS NOT DISTINCT FROM blocked_l.database
 AND blocking_l.relation IS NOT DISTINCT FROM blocked_l.relation
 AND blocking_l.page IS NOT DISTINCT FROM blocked_l.page
 AND blocking_l.tuple IS NOT DISTINCT FROM blocked_l.tuple
 AND blocking_l.virtualxid IS NOT DISTINCT FROM blocked_l.virtualxid
 AND blocking_l.transactionid IS NOT DISTINCT FROM blocked_l.transactionid
 AND blocking_l.classid IS NOT DISTINCT FROM blocked_l.classid
 AND blocking_l.objid IS NOT DISTINCT FROM blocked_l.objid
 AND blocking_l.objsubid IS NOT DISTINCT FROM blocked_l.objsubid
 AND blocking_l.pid <> blocked_l.pid
JOIN pg_catalog.pg_stat_activity blocking_a ON blocking_a.pid = blocking_l.pid
WHERE NOT blocked_l.granted
  AND blocking_l.granted
  AND blocked_l.locktype = 'advisory'
  AND now() - blocked_a.query_start > interval '30 seconds';

How to read each column for alerting:

blocked_pid: Postgres backend of the waiting session.
blocked_since: how long that statement has been running — a practical proxy for lock wait duration.
blocked_state / wait_event_type / wait_event: lock waits are usually active + Lock with an advisory-related wait event; helps separate lock waits from I/O waits.
blocked_query_snippet: truncated SQL text (watch PII in logs).
blocking_pid: the session that holds the granted advisory lock.
blocking_query_snippet: what the holder is doing — often the long transaction to optimize or cap with a timeout.

In noisy systems, add AND blocked_l.classid = 7742 if that class id is reserved for your job locks.

Out of scope

Patroni, failover, ORMs, LISTEN/NOTIFY, outbox claim tables with SKIP LOCKED across many rows — valid topics, different stories.

Closing

The useful question is not a generic “advisory vs FOR UPDATE” matrix, but: does my lock survive COMMIT in a way the pool hands to the next client? and does my worker treat “zero rows” as “no work” when the work row is merely locked? Shipping pg_advisory_lock under transaction pooling without knowing this is a silent failure mode until you measure contention or accidental cross-job reuse.