pgrac is a shared-disk cluster extension for PostgreSQL 16. Its goal is to bring the core mechanisms of Oracle RAC — Cache Fusion, GES, SCN, three-way heartbeat, and merged recovery — to the PG ecosystem with minimal invasiveness. This chapter does not cover implementation details; it establishes the concept map: which PG single-instance assumptions must be broken, which new concepts are introduced, and how the cross-node protocols fit together.
A shared-disk cluster is fundamentally different from a typical shared-nothing distributed database (Citus, CockroachDB, etc.): every node mounts the same storage over the interconnect, rather than each node owning a disjoint shard of the data.
Takeaway: the shared-disk model can achieve near-linear scaling (≥ 90% scaling efficiency) under OLTP and mixed workloads. This premise is shared by Oracle RAC and pgrac.
```
+--------+      +--------+      +--------+
| Node 1 |      | Node 2 |      | Node 3 |
+--------+      +--------+      +--------+
      \             |             /
       \   Interconnect (RDMA)   /
        \___________|___________/
                    |
           +----------------+
           | Shared Storage |
           +----------------+
```
| Oracle RAC Mechanism | pgrac Equivalent | Chapter |
|---|---|---|
| Cache Fusion | Cross-node block shipping via PCM + current/CR/PI buffers | Ch 2 |
| GES (Global Enqueue Service) | GES cross-instance enqueue locks | Ch 3 |
| SCN (System Change Number) | SCN as a Lamport timestamp | Later chapters |
| Reconfiguration | Reconfiguration (freeze / rebuild GRD / thaw) + merged recovery | Later chapters |
PG 16 is a single-instance design. Introducing a cluster requires breaking at least five core assumptions.
PG keeps exactly one in-memory copy of each block. pgrac introduces three buffer classes (current / CR / PI) to support cross-node reads, consistent reads, and retention of dirty pages after the current copy has been shipped to another node.
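As a rough sketch of the invariant behind the three classes (all names here are illustrative, not pgrac's actual symbols): only the exclusive current copy of a block may accept writes, while CR clones and past images (PI) are read-only artifacts.

```c
#include <stdbool.h>

/* Hypothetical buffer classes, mirroring the chapter's terminology:
 * XCUR: exclusive current copy, the only one that may be modified.
 * SCUR: shared current copy, readable on this node but not writable.
 * CR:   consistent-read clone built for a reader at a given SCN.
 * PI:   past image retained after the dirty current copy was shipped
 *       away, kept as a known-good base for merged recovery. */
typedef enum BufferClass { BUF_XCUR, BUF_SCUR, BUF_CR, BUF_PI } BufferClass;

/* Only the exclusive current copy may take new modifications. */
static bool buf_is_writable(BufferClass c) { return c == BUF_XCUR; }

/* Current copies (exclusive or shared) reflect the latest block
 * version; CR and PI are historical versions. */
static bool buf_is_current(BufferClass c)
{
    return c == BUF_XCUR || c == BUF_SCUR;
}
```

The point of the sketch is the asymmetry: CR and PI buffers are never written, so a node can hold them without coordinating further with the lock master.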
PG's LWLocks and its heavyweight locks (eight lock modes) live in local shared memory and never cross node boundaries. pgrac uses GES to manage cross-instance enqueue locks uniformly, and PCM to manage cross-instance buffer-block locks.
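The heart of any enqueue service is a mode-compatibility check. The sketch below uses the conventional six-mode ladder (NL through EX) that Oracle-style enqueues are usually described with; pgrac's actual GES mode set and grant logic may differ.

```c
#include <stdbool.h>

/* Hypothetical enqueue modes, weakest to strongest:
 * NL (null), CR (concurrent read), CW (concurrent write),
 * PR (protected read), PW (protected write), EX (exclusive). */
typedef enum GesMode { GES_NL, GES_CR, GES_CW, GES_PR, GES_PW, GES_EX } GesMode;

/* Classic symmetric compatibility matrix for the six modes. */
static const bool ges_compat[6][6] = {
    /*           NL     CR     CW     PR     PW     EX  */
    /* NL */ { true,  true,  true,  true,  true,  true  },
    /* CR */ { true,  true,  true,  true,  true,  false },
    /* CW */ { true,  true,  true,  false, false, false },
    /* PR */ { true,  true,  false, true,  false, false },
    /* PW */ { true,  true,  false, false, false, false },
    /* EX */ { true,  false, false, false, false, false },
};

/* A request can be granted immediately only if it is compatible with
 * every mode already held on the same resource; otherwise the master
 * queues it (queuing is omitted from this sketch). */
static bool ges_can_grant(GesMode held, GesMode requested)
{
    return ges_compat[held][requested];
}
```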
In a single PG instance, row visibility is resolved by inspecting the heap tuple's xmin/xmax combined with a CLOG lookup. In a cluster, the xid spaces of different nodes cannot be compared directly. pgrac introduces SCN (Lamport timestamp) as the cross-instance ordering baseline.
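The Lamport rules behind an SCN are small enough to show inline. This is a generic Lamport-clock sketch with hypothetical names, not pgrac's actual API: a node advances its SCN on every local event, and on receiving any message (e.g. a shipped block stamped with the sender's SCN) it first takes the maximum of both clocks.

```c
#include <stdint.h>

/* Hypothetical per-node SCN state. */
typedef struct ScnClock { uint64_t scn; } ScnClock;

/* Local event (e.g. a commit): tick the clock and stamp the event. */
static uint64_t scn_local_event(ScnClock *c)
{
    return ++c->scn;
}

/* Message receipt: adopt the larger of the two clocks, then tick.
 * This guarantees that anything causally after the message carries a
 * larger SCN than the message itself, even across nodes whose local
 * xid spaces are incomparable. */
static uint64_t scn_on_receive(ScnClock *c, uint64_t remote_scn)
{
    if (remote_scn > c->scn)
        c->scn = remote_scn;
    return ++c->scn;
}
```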
A single PG instance has one pg_wal/ directory; all transactions write serially into it. pgrac switches to a per-instance independent redo stream (pg_wal_node_N/); after a crash, redo apply is performed by merging streams in SCN order.
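Merging per-instance redo streams in SCN order is a k-way merge. A minimal sketch (data layout and names are this sketch's own, not pgrac's on-disk format): the apply loop repeatedly picks the stream whose next unread record carries the lowest SCN.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical redo record: only the SCN matters for ordering here. */
typedef struct RedoRecord { uint64_t scn; /* payload omitted */ } RedoRecord;

/* One per-instance stream (pg_wal_node_N/), already sorted by SCN
 * because each node stamps records in local SCN order. */
typedef struct RedoStream {
    const RedoRecord *recs;
    size_t            len;
    size_t            pos;   /* index of the next unread record */
} RedoStream;

/* Return the index of the stream whose next record has the smallest
 * SCN, or -1 when every stream is exhausted. Applying records in this
 * order yields one globally SCN-ordered replay. A real implementation
 * would use a min-heap; linear scan keeps the sketch short. */
static int merge_pick_next(RedoStream *streams, int nstreams)
{
    int best = -1;
    for (int i = 0; i < nstreams; i++) {
        if (streams[i].pos >= streams[i].len)
            continue;
        if (best < 0 ||
            streams[i].recs[streams[i].pos].scn <
            streams[best].recs[streams[best].pos].scn)
            best = i;
    }
    return best;
}
```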
Crash recovery in a single PG instance is performed at startup by a single process that replays the redo log. pgrac introduces reconfiguration (freeze / rebuild GRD / thaw) and merged recovery (merging multi-stream redo) to handle node failures.
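The freeze / rebuild GRD / thaw sequence can be read as a small state machine. The transition table below is this sketch's own simplification, not pgrac's actual reconfiguration logic; in particular, a further membership change mid-reconfiguration restarts the whole sequence.

```c
#include <stdbool.h>

/* Hypothetical reconfiguration phases, in order. */
typedef enum ReconfigState {
    RC_NORMAL,       /* steady state, full lock traffic */
    RC_FREEZE,       /* lock traffic paused on membership change */
    RC_REBUILD_GRD,  /* GRD remastered across surviving nodes */
    RC_THAW          /* lock traffic resumes, then back to normal */
} ReconfigState;

/* Advance one phase. Any membership change, at any point, forces the
 * cluster back to FREEZE so the rebuild starts from a stable view. */
static ReconfigState reconfig_next(ReconfigState s, bool membership_changed)
{
    if (membership_changed)
        return RC_FREEZE;
    switch (s) {
        case RC_FREEZE:      return RC_REBUILD_GRD;
        case RC_REBUILD_GRD: return RC_THAW;
        case RC_THAW:        return RC_NORMAL;
        default:             return RC_NORMAL;
    }
}
```

Merged recovery (previous sketch) runs during the rebuild phase, before surviving nodes are allowed to touch blocks the failed node may have dirtied.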
| PG single instance | pgrac cluster |
|---|---|
| Single-version buffer | XCUR/SCUR + CR + PI (three classes) |
| Instance-local locks | GES (enqueues) + PCM (buffer blocks) |
| xmin/xmax + CLOG | xmin/xmax + CLOG + SCN (Lamport) |
| Single WAL stream | Per-instance redo streams + merged recovery |
| Single-process startup recovery | Reconfiguration + LMS workers |
The cross-node protocols in pgrac are not independent; they cooperate through a shared metadata infrastructure (the GRD, Global Resource Directory) and a shared ordering baseline (SCN):
The cross-node subsystems are divided by responsibility:
Core terms used in this chapter:
Subsequent chapters dive into the concepts, operational flows, configuration, and monitoring of each protocol. Each chapter closes with a pointer to the corresponding curated deep page (detailed mechanism + performance budget) and the original design document.