Skip to main content

Streaming Replication

Set up a high available PostgreSQL setup using streaming replication and Pgpool-II for high availability and fail over.

End Goal

The diagram below shows the set-up that will be configured in this guide.

In a first step streaming replication will be setup up, in a second step Pgpool-II will be used for both load balancing and high availability.

//ToDo: Add Diagram

Setup

For the steps describe here the following environment will be set up:

NameIP AddressInitial Role
db01.nomadesk.org192.168.122.10primary
db02.nomadesk.org192.168.122.11standby
db03.nomadesk.org192.168.122.12standby
db.nomadesk.org192.168.122.20high available virtual IP

The roles describe above are not fixed except for the virtual IP The primary and standby role can change when a fail over event occurs.
The role only matters for the initial setup.

It is recommended to set up three identical machines.
When a fail over occurs each needs to be able to become the primary in the setup.

Additionally when setting up Pgpool-II load balancing will be configured so that read queries will be done on the standby nodes and write actions on the primary.

The virtual IP is what will make sure that even if the primary node goes down, the cluster will stay up and running.

In the sections below next to the title it will be mentioned on what server to run the commands.

Configuring the Base OS [All]

For this guide AlmaLinux 9 will be used but can easily be modified for any other distribution.

The steps assume that the starting point is a AlmaLinux 9 minimal installation.

SELinux [All]

SELinux is out of scope for this guide where we want to focus on setting up a high available PostgreSQL setup.

To disable SELinux:

sudo sed -i 's/enforcing/disabled/g' /etc/selinux/config
sudo setenforce 0

Firewall [All]

Configure the firewall:

sudo firewall-cmd --zone=public --add-port=5432/tcp --permanent
sudo firewall-cmd --reload

In later steps the ports for Pgpool-II will be opened, the first step is to have the streaming replication up and running.

Install PostgreSQL [All]

Add the EPEL and PostgreSQL repositories:

sudo dnf install -y elrepo-release

Most EPEL packages require packages from PowerTools/CRB repository as dependencies. Run the suitable command to enable the PowerTools/CRB repository:

sudo dnf config-manager --set-enabled crb

Add the PostgreSQL repository:

sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm

Make sure PostgreSQL does not update automatically:

sudo dnf -qy module disable postgresql

Now install PostgreSQL, Pgpool-II will also be installed as this is used in the next section to set up high availability:

sudo dnf install -y postgresql16-server postgresql16 pgpool-II-pg16-*

#install some tools used in this guide
sudo dnf install -y vim rsync

Create the directory /var/lib/pgsql/archivedir to store WAL segments on all servers, this will be used in the postgres.conf in a later step.

mkdir -p /var/lib/pgsql/archivedir
chown postgres:postgres /var/lib/pgsql/archivedir

Setting up the Primary PostgreSQL server [db01]

Initialize the database on the primary (db01) node:

sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
sudo systemctl enable --now postgresql-16

It is not needed to initialize the databases on the other nodes as these will be copied from the primary node with pg_basebackup.

Setting up Replication [db01]

On the primary server (db01), modify the postgres.conf file in /var/lib/pgsql/16/data:

vim /var/lib/pgsql/16/data/postgresql.conf

At the bottom, add the following:

listen_addresses = '*'

archive_mode = on
archive_command = 'cp "%p" "/var/lib/pgsql/archivedir/%f"'

max_wal_senders = 10
wal_keep_size=256
wal_level = replica
wal_log_hints = on

max_replication_slots = 10

hot_standby = on

Create the replication user, by switching to the postgres user and running the CREATE USER query:

su - postgres

Log into the database:

psql

The query to create the user is:

SET password_encryption = 'scram-sha-256';
CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD '<my password>';

Note*: Make sure to update the password in the query.

Allow access from the secondary servers by modifying the pg_hba.conf file:

vim /var/lib/pgsql/16/data/pg_hba.conf

At the bottom, add the following:

host    replication     replicator      192.168.122.10/24       trust   #trust db01 without password
host replication replicator 192.168.122.11/24 trust #trust db02 without password
host replication replicator 192.168.122.12/24 trust #trust db03 without password
host all all 0.0.0.0/0 scram-sha-256 #enable password authentication to this host from anywhere else

Make sure to update the IP addresses to match your own setup.

Restart the PostgreSQL server to apply all the new settings:

sudo systemctl restart postgresql-16

Configuring the Standby PostgreSQL Servers [db02, db03]

Mirror the state of the primary server it's database:

Switch to the postgres user:

su - postgres
pg_basebackup -h 192.168.122.10 -U replicator -Fp -Xs -P -R -D /var/lib/pgsql/16/data/

The pg_basebackup command uses streaming replication to mirror the databases.

Exit the postgres user and start the databases and enable it on boot:

sudo systemctl enable --now postgresql-16

Testing the Replication

Insert Data [db01]

Log into the database on the primary database server and create a table:

su - postgres
psql
CREATE TABLE cars (
brand VARCHAR(255),
model VARCHAR(255),
year INT
);

Insert some data:

INSERT INTO cars (brand, model, year) VALUES ('Ford', 'Mustang', 1964);

Read the Data [db02,db03]

On the standby servers also log into the database and run a select query.

su - postgres

Now run the query:

psql -c "select * from cars;"

Both of the secondary servers should return:

[postgres@secondary01 data]$ psql -c "select * from cars;"
brand | model | year
-------+---------+------
Ford | Mustang | 1964

Conclusion

Streaming replication is now set up and you have a backup in case anything goes wrong with the primary and/or standby nodes.

This does not mean the setup is high available yet as the connection string used in the application would be that of the primary only.

In the next step Pgpool-II will be used to set up the high availability.