In the world of running critical applications, where we cannot afford downtime caused by hardware failures, it becomes necessary to run a cluster of machines, so that the application can be automatically migrated to another running machine in case the original one fails for whatever reason.
Pacemaker is a high-availability cluster resource manager that enables the following:
- Constantly check the status of all the systems running in the cluster.
- Start the configured application (e.g. httpd) on one of the running nodes, based on the preference mentioned in the cluster configuration file.
- Move the application to another node in case the present node becomes unavailable due to reasons such as hardware failure, network failure, or over-consumption of resources (memory, CPU, etc.).
- Enable the administrator to move the application to a different node in a seamless manner.
Pacemaker can be considered a newer version of the older HA software that was available in earlier Linux releases.
Pacemaker - installation procedure to create a two-node cluster
Install the application in both nodes
[root@star ~]# yum install pacemaker pcs fence-agents -y
Verify the hacluster user is created in both nodes
[root@star ~]# cat /etc/passwd | grep hacluster
Set a password for the hacluster user (e.g. mypass) in both nodes
[root@star ~]# passwd hacluster
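If you prefer to set the password non-interactively (for example from a provisioning script), the --stdin option of passwd on RHEL/CentOS can be used; the password below is just the example value mentioned above.
[root@star ~]# echo "mypass" | passwd --stdin hacluster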
Start the pacemaker applications in both nodes
[root@star ~]# systemctl start pcsd
[root@star ~]# systemctl enable pcsd
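If firewalld is running on the nodes, the cluster traffic also needs to be allowed through the firewall. Assuming the standard high-availability firewalld service definition is available, the following can be run in both nodes.
[root@star ~]# firewall-cmd --permanent --add-service=high-availability
[root@star ~]# firewall-cmd --reload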
As this cluster will have two nodes, which will be addressed as ha_node_1 and ha_node_2 in this example, we need to add their entries to the local hosts file in both nodes.
[root@star ~]# vi /etc/hosts
192.168.1.22 ha_node_1
192.168.1.23 ha_node_2
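To confirm the entries work, each node should be able to ping the other by hostname, for example:
[root@star ~]# ping -c 2 ha_node_2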
Disable NetworkManager from starting at boot in both nodes
[root@star ~]# systemctl disable NetworkManager
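With NetworkManager disabled, something else must bring the interfaces up at boot. On CentOS/RHEL 7 this is typically the legacy network service (assuming the initscripts package is installed):
[root@star ~]# systemctl enable network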
Run the following commands in ha_node_1
[root@star ~]# pcs cluster auth ha_node_1 ha_node_2 -u hacluster
[root@star ~]# pcs cluster setup --name Cluster ha_node_1 ha_node_2
================
Destroying cluster on nodes: ha_node_1, ha_node_2…
ha_node_1: Stopping Cluster (pacemaker)…
ha_node_2: Stopping Cluster (pacemaker)…
ha_node_2: Successfully destroyed cluster
ha_node_1: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'ha_node_1', 'ha_node_2'
ha_node_1: successful distribution of the file 'pacemaker_remote authkey'
ha_node_2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes…
ha_node_1: Succeeded
ha_node_2: Succeeded
Synchronizing pcsd certificates on nodes ha_node_1, ha_node_2…
ha_node_1: Success
ha_node_2: Success
Restarting pcsd on the nodes in order to reload the certificates…
ha_node_1: Success
ha_node_2: Success
[root@localhost ~]#
=================================
Verify in both nodes:
[root@star ~]# cat /etc/corosync/corosync.conf
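For a two-node cluster created with the options above, the generated file typically looks something like the abridged sketch below; the exact contents may differ between pcs versions.
totem {
    version: 2
    cluster_name: Cluster
    transport: udpu
}
nodelist {
    node {
        ring0_addr: ha_node_1
        nodeid: 1
    }
    node {
        ring0_addr: ha_node_2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}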
Run the command in ha_node_1
[root@star ~]# pcs cluster start --all
ha_node_1: Starting Cluster (corosync)…
ha_node_2: Starting Cluster (corosync)…
ha_node_1: Starting Cluster (pacemaker)…
ha_node_2: Starting Cluster (pacemaker)…
Check the status in both nodes
[root@star ~]# pcs status
================
Cluster name: Cluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: ha_node_2 (version 1.1.21-4.el7-f14e36fd43) – partition with quorum
Last updated: Mon Aug 24 13:37:59 2020
Last change: Mon Aug 24 13:37:48 2020 by hacluster via crmd on ha_node_2
2 nodes configured
0 resources configured
Online: [ ha_node_1 ha_node_2 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@localhost ~]#
================
Since the corosync and pacemaker daemons show as active/disabled above, enable the cluster services to start at boot on all nodes:
[root@star ~]# pcs cluster enable --all
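The WARNING about stonith in the status output above appears because no fence devices are configured. For a lab or test setup (not recommended for production) you may choose to disable STONITH:
[root@star ~]# pcs property set stonith-enabled=false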
Now configure the cluster with Apache and a virtual IP. Run the following commands on the first node (ha_node_1).
[root@star ~]# pcs resource create VirtIP IPAddr ip=192.168.1.25 cidr_netmask=24 op monitor interval=30
Assumed agent name 'ocf:heartbeat:IPaddr' (deduced from 'IPAddr')
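To see which node currently holds the virtual IP, you can check the interfaces directly, for example:
[root@star ~]# ip -4 addr show | grep 192.168.1.25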
In case you want to delete the VirtIP resource, execute the following command
[root@star ~]# pcs resource delete VirtIP
[root@star ~]# pcs resource create Httpd apache configfile="/etc/httpd/conf/httpd.conf" op monitor interval=30
If the above does not work, we can retry the same command with the --force option
[root@star ~]# pcs resource create Httpd apache configfile="/etc/httpd/conf/httpd.conf" op monitor interval=30 --force
If the Httpd resource was already created, you can update it instead
[root@star ~]# pcs resource update Httpd configfile="/etc/httpd/conf/httpd.conf" op monitor interval=30 --force
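Note that the ocf:heartbeat:apache agent monitors the web server through its server-status URL, so the status handler should be reachable locally on both nodes. A minimal sketch, assuming a file such as /etc/httpd/conf.d/status.conf:
<Location /server-status>
    SetHandler server-status
    Require local
</Location>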
Verify the status of the virtual IP and Httpd resources
[root@star ~]# pcs status resources
Now try accessing the web server through the URL with the virtual IP. You may want to try switching off each node one at a time while the other node is running; the website should remain accessible irrespective of which node is down.
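Since the website is reached through the virtual IP, the VirtIP and Httpd resources should normally run on the same node. One way to enforce this is to put them in a resource group (a colocation constraint would also work); the group name WebGroup below is just an example:
[root@star ~]# pcs resource group add WebGroup VirtIP Httpd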
Troubleshooting
================
In case the command below shows the resources as Stopped, you may need to start the cluster on one of the nodes:
[root@drbd1 ~]# pcs status resources
======
Httpd (ocf::heartbeat:apache): Stopped
VirtIP (ocf::heartbeat:IPaddr): Stopped
===========
[root@drbd1 ~]# pcs cluster start --all
======
ha_node_1: Starting Cluster (corosync)…
ha_node_2: Starting Cluster (corosync)…
ha_node_1: Starting Cluster (pacemaker)…
ha_node_2: Starting Cluster (pacemaker)…
=========
[root@drbd1 ~]# pcs status resources
======
Httpd (ocf::heartbeat:apache): Started ha_node_1
VirtIP (ocf::heartbeat:IPaddr): Started ha_node_2
=======
At times you may see the issue below. In this case, verify that the nodes can ping each other using the IPs configured in the /etc/hosts file.
[root@drbd1 ~]# pcs status resources
====
Error: unable to get cluster status from crm_mon
Error: cluster is not available on this node
======
A common reason for the network issue is a restart of the node: with NetworkManager disabled, the network interfaces may not come up automatically after a reboot unless another network service brings them up.
Once the above is taken care of, we need to run the following command to start the cluster on the affected node:
[root@drbd1 ~]# pcs cluster start ha_node_1
ha_node_1: Starting Cluster (corosync)…
ha_node_1: Starting Cluster (pacemaker)…
The command below moves a resource from one node to another
[root@drbd1 ~]# pcs resource move Httpd ha_node_2
[root@drbd1 ~]#
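Note that pcs resource move works by adding a location constraint. Once the resource is where you want it, the constraint can be removed so that Pacemaker is free to place the resource again:
[root@drbd1 ~]# pcs resource clear Httpd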