This appendix consolidates the core terms and abbreviations that appear across all 12 chapters and 14 feature deep pages of the pgrac manual. Entries are grouped into 6 thematic sections. Each entry provides the expanded abbreviation (where applicable), a 1–2 sentence definition, and cross-references to the relevant deep page or chapter. This appendix does not repeat the conceptual derivations found in individual chapters; it serves as a quick-reference vocabulary baseline.
Core terms for cluster topology and the infrastructure layer, covering nodes, processes, heartbeat, and fencing mechanisms.
postmaster — PostgreSQL's supervisor process; in pgrac cluster mode it additionally registers with the GRD, forks the cluster daemons (LMS / LMD / LCK / LMON, etc.), and decides whether to restart the process or crash the instance when a child process dies. See Chapter 8.
instance — The collective name for the group of pgrac processes running on a single node. Each instance has its own buffer pool, its own undo tablespace (undo_node_N), and its own WAL stream (pg_wal_node_N/).
node — The physical or virtual host running an instance. Nodes communicate with each other over a high-speed interconnect (RDMA / RoCE / TCP) and share the underlying storage.
voting disk — A quorum file stored on shared storage, used to determine which side continues serving data when a network partition (split-brain) occurs. Voting is based on majority-node reachability; a failed node is fenced via STONITH (shoot the other node in the head).
fencing — The mechanism for isolating a failed node so that it cannot keep writing to shared storage after a split-brain. pgrac supports two modes: STONITH hard fencing (IPMI / BMC power-off) and I/O fencing (revoking storage access credentials).
CSSD (Cluster Synchronization Services Daemon) — Manages node membership, maintains the voting disk lease, and triggers fence decisions when a node's heartbeat times out (default heartbeat timeout: 6 seconds).
three-way heartbeat — pgrac's three-path heartbeat mechanism: inter-node interconnect heartbeat (1 s) + shared-storage I/O heartbeat (3 s) + CSS voting disk lease (6 s). Each path has a different failure threshold; the SUSPECT → DEAD state machine is coordinated by LMON. See Chapter 7.
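The three thresholds and the node-state progression can be summarized in a small sketch; the constant and type names below are illustrative, not taken from the pgrac source.

```c
/* Illustrative sketch of the three heartbeat thresholds and the node-state
 * progression coordinated by LMON; names are hypothetical. */
#define HB_INTERCONNECT_TIMEOUT_MS   1000   /* inter-node network heartbeat */
#define HB_STORAGE_IO_TIMEOUT_MS     3000   /* shared-storage I/O heartbeat */
#define HB_VOTING_LEASE_TIMEOUT_MS   6000   /* CSS voting disk lease        */

typedef enum NodeHealthState
{
    NODE_ALIVE,     /* all three heartbeat paths within their thresholds */
    NODE_SUSPECT,   /* at least one path has exceeded its threshold      */
    NODE_DEAD       /* LMON has confirmed the failure; fencing proceeds  */
} NodeHealthState;
```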
DRM (Dynamic Resource Mastering) — The GRD's hotspot migration mechanism: mastership of a frequently accessed resource is migrated to its most active requester, reducing cross-node message round-trips. Both GCS and GES can trigger DRM. See the corresponding feature deep page.
Core terms for the Cache Fusion and cross-node lock protocol layer, covering GCS, GES, GRD, and the PCM lock state machine.
Cache Fusion — pgrac's cross-node buffer transfer protocol: when node B requests a block that node A already holds in its buffer pool, the block is shipped directly from A's memory to B's memory over the interconnect, bypassing disk (Tier 1 RDMA target latency ~5 μs). See the corresponding feature deep page.
GCS (Global Cache Service) — The upper-layer service for Cache Fusion. Manages access rights, transfer coordination, and PI lifecycle for all cross-node buffer blocks. GCS resources are registered in the GRD, keyed by BufferTag (file number + block number).
GES (Global Enqueue Service) — The global enqueue lock service. Manages all cross-node logical locks outside the buffer cache: row locks (TX), table locks (TM), sequence locks (SEQ), advisory locks (UL), and others. GES shares GRD master routing with PCM but has a completely isolated keyspace. See the corresponding feature deep page.
GRD (Global Resource Directory) — The cross-node resource metadata dictionary, sharded across nodes by hash(resource_id) % N; each resource has a unique master node. Both GCS (buffer block resources) and GES (enqueue lock resources) store their metadata in the GRD. See the corresponding feature deep page.
master — The authoritative node for a given resource in the GRD. Responsible for maintaining the resource's grant_list (holder queue) and convert_queue (waiter queue), and for coordinating lock-conversion message exchange. The master is initially determined by hash(key) % N and can be migrated dynamically via DRM.
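A minimal sketch of this static routing, assuming a generic 64-bit resource key; the hash mix and function name are illustrative, not pgrac's.

```c
#include <stdint.h>

/* Minimal sketch of hash(key) % N master routing; the mixing constant is a
 * generic 64-bit multiplier, not the hash pgrac actually uses. DRM can later
 * move mastership away from the statically computed node. */
static inline int
grd_static_master(uint64_t resource_key, int n_nodes)
{
    uint64_t h = resource_key * UINT64_C(0x9E3779B97F4A7C15);
    return (int) (h % (uint64_t) n_nodes);
}
```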
BAST (Blocking ASynchronous Trap) — GES asynchronous blocked-lock notification: when the master detects that a lock request conflicts with the current holder, it sends a BAST message to the holder, requesting that the holder proactively release or downgrade its lock. BAST is a cooperative release mechanism, distinct from the synchronous forced downgrade in PCM (a PCM downgrade is an immediate response). See the corresponding feature deep page.
PCM lock (Parallel Cache Management lock) — A three-state lock (N / S / X) on a buffer block. N = not held; S = shared read (multiple nodes may hold it simultaneously); X = exclusive write (unique across the entire cluster). An orthogonal has_pi flag, independent of the lock state, indicates that the node additionally retains a PI (Past Image) copy. See the corresponding feature deep page.
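The three modes and their pairwise compatibility can be sketched as follows; the enum and function names are illustrative.

```c
#include <stdbool.h>

/* Sketch of the N / S / X compatibility rule described above: N conflicts
 * with nothing, S is compatible only with S, and X is compatible with
 * nothing except N. Names are illustrative. */
typedef enum PcmLockMode { PCM_NULL, PCM_SHARED, PCM_EXCLUSIVE } PcmLockMode;

static bool
pcm_modes_compatible(PcmLockMode held, PcmLockMode requested)
{
    if (held == PCM_NULL || requested == PCM_NULL)
        return true;
    return held == PCM_SHARED && requested == PCM_SHARED;
}
```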
PI (Past Image) — When a node holding an X lock ships the block out (X → S downgrade) and the block is dirty, the node retains the old dirty copy as a PI. The PI provides a fallback for subsequent consistent reads and is deregistered from the GRD after the corresponding WAL has been fsync'd or after reconfiguration completes. See Chapter 2 and the corresponding feature deep page.
XCUR (Exclusive Current) — A buffer copy type: the current block on the node holding the PCM X lock; unique across the entire cluster. All reads and writes go to the XCUR node; requests from other nodes trigger a Cache Fusion transfer.
SCUR (Shared Current) — A buffer copy type: the current block on a node holding a PCM S lock; multiple nodes may hold it simultaneously (read-only shared).
CR (Consistent Read) — A consistent-read view. After AD-006 PIVOT B, CR no longer occupies a dedicated buffer slot; instead, historical row-level versions are constructed on-demand from the undo chain against the current block. CR is entirely local — it does not enter the GRD and is not transferred across nodes. See Chapter 11.
Core terms for the SCN timestamp protocol and crash recovery, covering the Lamport clock, WAL merging, and the reconfiguration (freeze / rebuild / thaw) process.
SCN (System Change Number) — pgrac's distributed Lamport clock, 64-bit encoded (high 8 bits: node_id; low 56 bits: local_scn). Incremented on every transaction commit. Serves as the MVCC visibility baseline, the global timestamp on WAL records, and the causal ordering reference for Cache Fusion and GES messages. See the corresponding feature deep page.
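The 8-bit / 56-bit split can be expressed with a few packing macros; the macro names are illustrative, not pgrac's.

```c
#include <stdint.h>

/* Sketch of the SCN packing described above: high 8 bits carry node_id,
 * low 56 bits carry local_scn. */
#define SCN_LOCAL_BITS  56
#define SCN_LOCAL_MASK  ((UINT64_C(1) << SCN_LOCAL_BITS) - 1)

#define SCN_MAKE(node_id, local_scn) \
    (((uint64_t) (node_id) << SCN_LOCAL_BITS) | ((uint64_t) (local_scn) & SCN_LOCAL_MASK))
#define SCN_NODE_ID(scn)  ((uint8_t) ((uint64_t) (scn) >> SCN_LOCAL_BITS))
#define SCN_LOCAL(scn)    ((uint64_t) (scn) & SCN_LOCAL_MASK)
```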
BOC (Broadcast on Commit) — The commit-time broadcast mechanism: the BOC module embedded in the walwriter broadcasts the latest commit_scn to all nodes on a 100 μs cadence, driving continuous cluster-wide SCN convergence. BOC is the active push path for SCN propagation, supplementing the passive piggyback path.
piggyback — The mechanism of attaching the current SCN to any cross-node message (PCM grant, CF block transfer, GES grant, BOC broadcast). Upon receiving a piggyback, the receiver applies local_scn = max(local_scn, remote.local_scn) + 1, ensuring causal monotonicity.
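The receive-side rule is the classic Lamport update; a minimal sketch, assuming a single-threaded clock variable (real code would use an atomic compare-and-swap):

```c
#include <stdint.h>

/* Sketch of the rule quoted above: local_scn = max(local_scn, remote) + 1.
 * A shared clock would need atomics; a plain variable is shown for clarity. */
static uint64_t local_scn;

static void
scn_absorb_piggyback(uint64_t remote_local_scn)
{
    if (remote_local_scn > local_scn)
        local_scn = remote_local_scn;
    local_scn += 1;     /* strictly advance past everything observed so far */
}
```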
Lamport timestamp — The theoretical basis for pgrac's SCN: the Lamport logical clock. Three advancement paths: local commit increment, receive-external-SCN max+1, and snapshot-current-value on WAL write. See Chapter 4.
merged recovery — The mechanism for crash recovery that merges the per-instance WAL streams from multiple nodes in ascending xl_scn order (K-way priority queue merge). Each WAL record carries xl_scn; the merged order is consistent with the original write causal order. See Chapter 5 and Chapter 12.
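A minimal sketch of the merge step, using a linear scan over the K stream cursors in place of the priority queue; types and field names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of merged recovery's ordering rule: always replay the record with
 * the smallest xl_scn among the per-instance streams. A real implementation
 * would keep the cursors in a K-way priority queue; a linear scan is shown
 * here for clarity. */
typedef struct WalStreamCursor
{
    bool     exhausted;      /* no more records in this instance's stream */
    uint64_t next_xl_scn;    /* xl_scn of the record under the cursor     */
} WalStreamCursor;

static int
pick_next_stream(const WalStreamCursor *cursors, int k)
{
    int      best = -1;
    uint64_t best_scn = UINT64_MAX;

    for (int i = 0; i < k; i++)
    {
        if (!cursors[i].exhausted && cursors[i].next_xl_scn < best_scn)
        {
            best_scn = cursors[i].next_xl_scn;
            best = i;
        }
    }
    return best;             /* -1 once every stream is exhausted */
}
```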
freeze / rebuild / thaw — The three-phase reconfiguration protocol. Freeze suspends DML on all nodes, blocking new resource access. Rebuild reconstructs the GRD (clears GCS/GES records for the dead node, elects new masters, consolidates PI information). Thaw resumes normal cluster service. See the corresponding feature deep page.
xl_scn — An extended field in the WAL record header (8 B) that stores the SCN at the moment the record was written (commit records store commit_scn; all other records store local_scn). Used for global ordering during merged recovery and for per-thread WAL stream SCN monotonicity verification after a crash. See Chapter 12.
thread_id — A WAL page header field (xlp_thread_id, 2 B) that identifies the per-instance redo stream to which the WAL page belongs. Read from the page header during recovery; does not need to be redundantly stored in every record header. See Chapter 12.
Core terms for block format and the undo subsystem, covering ITL, UBA, TT slot, per-instance undo, and delayed cleanout.
ITL (Interested Transaction List) — The transaction-marker slot array at the end of a block (special area). Each slot is 48 B: xid (4 B) + wrap (2 B) + flags (1 B) + lock_count (1 B) + UBA (16 B) + commit_scn (8 B) + write_scn (8 B) + first_change_lsn (8 B). Default INITRANS = 8 (384 B total), identified by the PD_HAS_ITL flag. See the corresponding feature deep page.
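A sketch of the 48 B slot layout enumerated above; field and type names are illustrative, and the UBA field is shown as an opaque 16-byte array (its internal layout appears under the UBA entry below).

```c
#include <stdint.h>

/* Sketch of the 48 B ITL slot layout listed above; names are illustrative.
 * With this field ordering, natural alignment adds no padding. */
typedef struct ItlSlot
{
    uint32_t xid;              /*  4 B  owning transaction id                  */
    uint16_t wrap;             /*  2 B  reuse counter (ABA protection)         */
    uint8_t  flags;            /*  1 B  committed / cleaned-out markers        */
    uint8_t  lock_count;       /*  1 B  rows locked under this slot            */
    uint8_t  uba[16];          /* 16 B  UBA (see the UBA entry below)          */
    uint64_t commit_scn;       /*  8 B  written at (possibly delayed) cleanout */
    uint64_t write_scn;        /*  8 B  SCN of the modification                */
    uint64_t first_change_lsn; /*  8 B  first WAL LSN of the transaction       */
} ItlSlot;                     /* 48 B; INITRANS = 8 gives 384 B per block     */
```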
INITRANS — A per-table DDL parameter controlling the initial number of ITL slots in the block special area (default: 8). High-concurrency hot tables may be set to 16–32; OLAP read-heavy tables may use 4. INITRANS = 8 results in approximately 4.7% capacity loss per block (384 B / 8192 B). See Chapter 9.
UBA (Undo Block Address) — The type of the undo_segment_head field in an ITL slot; a 16 B precise addressing structure: segment_id (4 B) + block_no (4 B) + tt_slot_offset (2 B) + row_offset (2 B) + reserved (4 B). Serves two query paths: TT slot lookup (to obtain commit_scn) and undo record lookup (for CR block construction). See the corresponding feature deep page.
TT slot (Transaction Table slot) — A transaction-status table entry in the per-instance undo segment header, 32 B: xid (4 B) + wrap (2 B) + status (1 B) + flags (1 B) + commit_scn (8 B) + first_undo_block (16 B UBA). The authoritative data source for SCN-path visibility decisions. Each segment header holds 48 TT slots (1.5 KB). See the corresponding feature deep page.
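The two layouts above can be sketched together; as before, field and type names are illustrative and natural alignment happens to introduce no padding.

```c
#include <stdint.h>

/* Sketches of the 16 B UBA and 32 B TT slot layouts from the two entries
 * above; names are illustrative, not the real definitions. */
typedef struct Uba
{
    uint32_t segment_id;       /* 4 B  undo segment number                 */
    uint32_t block_no;         /* 4 B  block within the undo segment       */
    uint16_t tt_slot_offset;   /* 2 B  TT slot within the segment header   */
    uint16_t row_offset;       /* 2 B  undo record offset inside the block */
    uint32_t reserved;         /* 4 B  reserved for future use             */
} Uba;                         /* 16 B total                               */

typedef struct TtSlot
{
    uint32_t xid;              /* 4 B  transaction id                      */
    uint16_t wrap;             /* 2 B  reuse counter                       */
    uint8_t  status;           /* 1 B  e.g. ACTIVE / COMMITTED             */
    uint8_t  flags;            /* 1 B                                      */
    uint64_t commit_scn;       /* 8 B  authoritative visibility value      */
    Uba      first_undo_block; /* 16 B first undo record of the transaction*/
} TtSlot;                      /* 32 B; 48 slots per segment header = 1.5 KB */
```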
per-instance undo tablespace — The independent undo storage area owned by each pgrac node (undo_node_N); default 16 segments × 64 MB = 1 GB. Each instance writes only to its own undo; cross-node undo reads are served via Cache Fusion (S mode, read-only), achieving zero cross-node undo write contention. See the corresponding feature deep page.
delayed cleanout — An optimization that avoids traversing all modified rows to update their ITL slot flags at commit time. On commit, only the TT slot is updated (status → COMMITTED) and the block's PD_DELAYED_CLEANOUT flag is set; the actual cleanout, writing commit_scn back into the ITL slot, is deferred until the next transaction that upgrades the block S → X, or is performed opportunistically by a later reader resolving the block's SCN path. Under heavy concurrent write workloads this can reduce commit-path I/O by 70–90%. See Chapter 9.
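A minimal sketch of the commit fast path, following the description above; the flag value, types, and helper names are hypothetical (the real flag semantics are covered in the block-format deep page).

```c
#include <stdint.h>

/* Sketch of the commit fast path: mark the TT slot COMMITTED and flag the
 * block, but do NOT write commit_scn into the ITL slots yet. */
#define PD_DELAYED_CLEANOUT  0x0004     /* illustrative bit, not the real value */
#define TT_STATUS_COMMITTED  1

typedef struct { uint8_t status; uint64_t commit_scn; } TtSlotStub;

static void
commit_fast_path(TtSlotStub *tt, uint16_t *pd_flags, uint64_t commit_scn)
{
    tt->status = TT_STATUS_COMMITTED;   /* authoritative visibility switch */
    tt->commit_scn = commit_scn;        /* readers reach this via the UBA  */
    *pd_flags |= PD_DELAYED_CLEANOUT;   /* ITL cleanout is deferred        */
}
```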
pd_block_scn — An 8 B field appended by pgrac at the end of PageHeaderData, recording the SCN at the time of the block's most recent modification. The read path compares block_scn against snapshot.read_scn to determine whether a CR view must be constructed; if block_scn ≤ read_scn, the current version is directly visible and no undo traversal is needed. See Chapter 9.
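The read-path decision reduces to a single comparison; a sketch with illustrative names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the read-path check described above: a block last modified at or
 * before the snapshot's read_scn is visible as-is; otherwise a CR view must
 * be built from the undo chain. */
static bool
block_needs_cr_view(uint64_t pd_block_scn, uint64_t snapshot_read_scn)
{
    return pd_block_scn > snapshot_read_scn;
}
```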
undo retention — The minimum retention period for undo data (default: cluster_undo_retention_sec = 900 seconds, i.e. 15 minutes). Within this window, undo records may not be reclaimed even if the owning segment has already reached the COMMITTED state, preventing "snapshot too old" (STO) errors for long-running queries. The background undo_vacuum worker scans for and reclaims expired data every 60 seconds. See Chapter 10.
Core terms for the lock protocol and concurrency control layer, covering grant_list, convert queue, deadlock detection, and the write-ahead rule.
grant_list — The linked list of granted holders maintained by the GES / GRD master; each entry records a holder's node ID and lock mode. When a new request arrives, the master checks the grant_list for conflicts; if there are none, the request is granted and appended immediately, otherwise it is placed in the convert_queue.
convert queue — The wait queue maintained by the GES master, holding lock requests that cannot be granted immediately because they conflict with an existing holder in the grant_list. The queue is FIFO-ordered and supports three priority levels (DDL > DML > Advisory). See the corresponding feature deep page.
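A sketch of the master-side decision covered by the two entries above; the types, modes, and compatibility rule are simplified placeholders.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the grant-or-enqueue decision: scan the grant_list for a
 * conflicting holder; grant immediately if none is found, otherwise the
 * request waits in the convert_queue. The compatibility rule is reduced to
 * "shared with shared only" for illustration. */
typedef enum { MODE_N, MODE_S, MODE_X } LockMode;
typedef enum { GRANT_NOW, ENQUEUE_CONVERT } GrantDecision;

typedef struct LockHolder
{
    int               node_id;
    LockMode          mode;
    struct LockHolder *next;
} LockHolder;

static bool
modes_compatible(LockMode held, LockMode requested)
{
    if (held == MODE_N || requested == MODE_N)
        return true;
    return held == MODE_S && requested == MODE_S;
}

static GrantDecision
master_check_request(const LockHolder *grant_list, LockMode requested)
{
    for (const LockHolder *h = grant_list; h != NULL; h = h->next)
        if (!modes_compatible(h->mode, requested))
            return ENQUEUE_CONVERT;   /* conflict: wait FIFO in convert_queue */
    return GRANT_NOW;                 /* no conflict: append to grant_list    */
}
```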
deadlock probe — pgrac's deadlock detection mechanism: when LMD detects a cycle in the wait-for graph, it selects the cheapest transaction (usually the youngest) as the victim and aborts it to break the cycle. Each LMD node builds a local fragment of the wait-for graph; cross-node merging is coordinated by LMON. See Chapter 6.
write-ahead rule (Cache Fusion) — The constraint (feature-019) that the corresponding WAL records must have been fsync'd before a dirty block is shipped via Cache Fusion. This ensures that even if the receiver crashes, it can recover the block from the log without relying on the sender (which may also have crashed). Violating this rule breaks the cluster durability guarantee. See Chapter 12.
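The rule reduces to comparing the block's newest change LSN against the WAL flush pointer; a sketch with illustrative names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the feature-019 constraint described above: a dirty block may be
 * shipped only after WAL covering its newest change has been fsync'd. If the
 * check fails, the sender must first flush (or wait for) WAL. */
static bool
dirty_block_may_ship(uint64_t newest_change_lsn, uint64_t wal_flushed_lsn)
{
    return newest_change_lsn <= wal_flushed_lsn;
}
```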
WRAP counter — The wrap field (2 B) in a TT slot or ITL slot, incremented whenever the slot is reused. Prevents ABA false positives: a reader confirms the slot has not been reused by comparing wrap values; if they do not match, the reader falls back to the commit_scn cached in the ITL slot.
convert queue priority — The priority mechanism of the GES wait queue: DDL operations (AccessExclusive) have the highest priority, ordinary DML (RowExclusive) comes next, and advisory locks are lowest. Higher-priority requests may jump ahead of lower-priority waiters, without violating the FIFO guarantee among requests of equal priority.
S → X upgrade (cleanout trigger) — During delayed cleanout, a reader that finds the block's PD_DELAYED_CLEANOUT flag set must write commit_scn back into the ITL slot. Writing the block requires briefly upgrading the PCM lock (S → X); once cleanout completes the lock is immediately downgraded back to S, during which other readers wait. The upgrade must also produce the corresponding WAL (ItlCleanoutRecord). See Chapter 9.
Core terms for background processes and wait-event monitoring, covering the LMS, LMD, LMHB, DIAG, and other cluster daemons.
LMS (Lock Master Service worker) — pgrac's core cluster daemon. Handles remote PCM/GES requests from other nodes, responds to buffer ship requests (the Cache Fusion send path), executes lock grant/revoke decisions, and attaches an SCN piggyback to every response message. Default: 4 workers (GUC cluster.lms_workers, tunable 1–16). See the corresponding feature deep page.
LMD (Lock Manager Daemon) — Receives enqueue requests from the local node, maintains the wait queue (FIFO + 3 priority levels), builds the local fragment of the wait-for graph for deadlock detection, and merges the global graph with other nodes under LMON coordination. One per node. See the corresponding feature deep page.
LMHB (Lock Manager HeartBeat) — The heartbeat guardian that monitors the lock-service processes (LMS, LMD, etc.): if a lock-service process fails to respond within the timeout, LMHB triggers a panic and postmaster decides whether to restart the process or crash the instance (preserving cluster consistency takes precedence over preserving single-node availability).
DIAG — The cross-node diagnostic snapshot process. Detects long-wait conditions (default threshold: 60 seconds) and triggers hang dumps, receives diagnostic requests from other nodes, and aggregates the cluster log. The first entry point for cluster-hang investigation. See the corresponding feature deep page.
RECO (Distributed Recovery process) — Handles reclamation of in-doubt transactions left over from cross-node distributed transactions (2PC). In pgrac its responsibilities are extended to cleaning up orphaned lock records and PI residuals left behind during reconfiguration. One per node; persistent but normally idle.
GRD0 — The primary process for the GRD (Global Resource Directory) shard on the local node. Responsible for persisting GRD shard metadata and serving as the coordination endpoint for the cross-node GRD synchronization protocol. GRD0 carries the bulk of the Rebuild work during the Freeze / Rebuild / Thaw three-phase protocol.
walwriter + BOC — In pgrac, the native PostgreSQL walwriter process embeds the BOC (Broadcast on Commit) module: on a 100 μs cadence it flushes WAL and broadcasts the latest commit_scn to all nodes. Because BOC and WAL flush timing are tightly coupled, embedding BOC in the walwriter preserves their timing consistency (mirroring Oracle's decision to have LGWR carry broadcast-on-commit). See Chapter 4 and Chapter 8.
LCK (Lock Process) — A dedicated process that holds instance-level locks (dictionary locks, cluster catalog locks), preventing LMS workers from being blocked for extended periods by instance-level locks while handling high-frequency block requests. One per node.
LMON (Lock Monitor) — Monitors cluster node health, coordinates reconfiguration, triggers GRD rebuild and fence decisions, and launches the Recovery Coordinator when merged recovery is required. The global coordinator of cluster health state. See the corresponding feature deep page.
This appendix is closely related to the following feature deep pages and chapters; consulting them in parallel is recommended.
Tier S (Core Protocols)
- xl_scn monotonicity invariant, persistence anti-rollback
- hash(key) % N sharding, DRM hotspot migration, Freeze/Rebuild/Thaw three-phase GRD reconstruction

Tier A (Storage & Lock Services)
- pcm_lock acquisition order constraints
- ClusterPageHeader C struct, ITL slot 48 B field annotations, PD_HAS_ITL / PD_DELAYED_CLEANOUT flag semantics
- UndoSegmentHeader struct, TT slot 32 B encoding, 5-state segment lifecycle, undo_vacuum bgworker
- ClusterBufferDesc 128 B layout, three-pool differentiated eviction, PIVOT B cache-line alignment strategy, PI TTL

Tier B (Operations & Observability)
- ClusterXLogRecord 32 B header, 7–8 new RMGRs, K-way SCN merge replay algorithm, WAL capacity budget (5.5× baseline)
- pg_stat_cluster_activity view, long-wait thresholds and DIAG triggers
- BackendType enum extensions, startup order and timeouts

The following design documents are the normative sources for the chapters in this manual. They are recommended for readers who need in-depth protocol details, formal proofs, or implementation constraints.