How To Configure High-Availability Cluster on CentOS 7

A High-Availability cluster, or failover cluster (active-passive cluster), is one of the most widely used cluster types in production environments. This type of cluster provides continued availability of services even if one of the cluster nodes fails. If the server running the application fails for some reason (for example, a hardware failure), the cluster software (Pacemaker) will restart the application on another node.

High-Availability is mainly used for databases, custom applications, and file sharing. Fail-over is not just starting an application; it involves a series of operations such as mounting filesystems, configuring networks, and starting dependent applications.

Environment
CentOS 7 supports fail-over clustering using Pacemaker. Here, we will be looking at configuring the Apache (web) server as a highly available application.

As I said, fail-over is a series of operations, so we need to configure the filesystem and the network as cluster resources. For the filesystem, we will use shared storage from an iSCSI server.

Configure High-Availability Cluster on CentOS 7 – infrastructure overview. All machines run on VMware Workstation.

 

Host Name             IP Address      OS         Purpose
node01.darole.org     192.168.2.201   CentOS 7   Cluster Node 1
node02.darole.org     192.168.2.202   CentOS 7   Cluster Node 2
storage01.darole.org  192.168.2.200   CentOS 7   iSCSI Shared Storage
-                     192.168.2.215   -          Virtual Cluster IP (Apache)

Shared Storage
Shared storage is one of the critical resources in a high-availability cluster, as it holds the data of the running application. All the nodes in the cluster have access to the shared storage so that the application always sees the most recent data. SAN storage is the most widely used shared storage in production environments; for this demo, we will use iSCSI storage.

Install the iSCSI target package on the storage01 server.

[root@storage01 ~]# yum install targetcli -y

Install the iSCSI initiator utilities on both nodes and note down each node's initiator name.

[root@node01 ~]# yum install iscsi-initiator-utils -y
[root@node01 ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:829c3e8d196
[root@node01 ~]#

[root@node02 ~]# yum install iscsi-initiator-utils -y
[root@node02 ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:174663d8b7e9
[root@node02 ~]#

Here, we will create a 20GB LVM logical volume on the iSCSI server to use as shared storage for our cluster nodes. First, list the available disks attached to the target server.
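A quick check is lsblk; in this environment the spare disk is assumed to show up as /dev/sdb, but the device name may differ on your system.

[root@storage01 ~]# lsblk

Then create the physical volume, volume group, and logical volume on it.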

[root@storage01 ~]# pvcreate /dev/sdb
[root@storage01 ~]# vgcreate vg_iscsi /dev/sdb
[root@storage01 ~]# lvcreate -l 100%FREE -n lv_iscsi vg_iscsi

Create Shared Storage

Enter the below command to get an interactive targetcli prompt.

[root@storage01 ~]# targetcli

targetcli shell version 2.1.fb46
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> cd /backstores/block
/backstores/block> create iscsi_shared_storage /dev/vg_iscsi/lv_iscsi
Created block storage object iscsi_shared_storage using /dev/vg_iscsi/lv_iscsi.
/backstores/block> cd /iscsi
/iscsi> create
Created target iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.
/iscsi> cd iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2/tpg1/acls
/iscsi/iqn.20...1d2/tpg1/acls> create iqn.1994-05.com.redhat:829c3e8d196
Created Node ACL for iqn.1994-05.com.redhat:829c3e8d196
/iscsi/iqn.20...1d2/tpg1/acls> create iqn.1994-05.com.redhat:174663d8b7e9
Created Node ACL for iqn.1994-05.com.redhat:174663d8b7e9
/iscsi/iqn.20...1d2/tpg1/acls> cd /iscsi/iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2/tpg1/luns
/iscsi/iqn.20...1d2/tpg1/luns> create /backstores/block/iscsi_shared_storage
Created LUN 0.
Created LUN 0->0 mapping in node ACL iqn.1994-05.com.redhat:174663d8b7e9
Created LUN 0->0 mapping in node ACL iqn.1994-05.com.redhat:829c3e8d196
/iscsi/iqn.20...1d2/tpg1/luns> ls
o- luns .................................................................................................................. [LUNs: 1]
o- lun0 ................................................. [block/iscsi_shared_storage (/dev/vg_iscsi/lv_iscsi) (default_tg_pt_gp)]
/iscsi/iqn.20...1d2/tpg1/luns> cd /
/> ls
o- / ......................................................................................................................... [...]
o- backstores .............................................................................................................. [...]
| o- block .................................................................................................. [Storage Objects: 1]
| | o- iscsi_shared_storage .............................................. [/dev/vg_iscsi/lv_iscsi (20.0GiB) write-thru activated]
| | o- alua ................................................................................................... [ALUA Groups: 1]
| | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| o- fileio ................................................................................................. [Storage Objects: 0]
| o- pscsi .................................................................................................. [Storage Objects: 0]
| o- ramdisk ................................................................................................ [Storage Objects: 0]
o- iscsi ............................................................................................................ [Targets: 1]
| o- iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2 ....................................................... [TPGs: 1]
| o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
| o- acls .......................................................................................................... [ACLs: 2]
| | o- iqn.1994-05.com.redhat:174663d8b7e9 .................................................................. [Mapped LUNs: 1]
| | | o- mapped_lun0 .................................................................. [lun0 block/iscsi_shared_storage (rw)]
| | o- iqn.1994-05.com.redhat:829c3e8d196 ................................................................... [Mapped LUNs: 1]
| | o- mapped_lun0 .................................................................. [lun0 block/iscsi_shared_storage (rw)]
| o- luns .......................................................................................................... [LUNs: 1]
| | o- lun0 ......................................... [block/iscsi_shared_storage (/dev/vg_iscsi/lv_iscsi) (default_tg_pt_gp)]
| o- portals .................................................................................................... [Portals: 1]
| o- 0.0.0.0:3260 ..................................................................................................... [OK]
o- loopback ......................................................................................................... [Targets: 0]
/> saveconfig
Configuration saved to /etc/target/saveconfig.json
/> exit
Global pref auto_save_on_exit=true
Last 10 configs saved in /etc/target/backup/.
Configuration saved to /etc/target/saveconfig.json
[root@storage01 ~]#

Enable and start the target service.

[root@storage01 ~]# systemctl enable target
[root@storage01 ~]# systemctl start target
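
Optionally, confirm that the target is listening on the default iSCSI port 3260; this is just a sanity check and not required for the cluster setup.

[root@storage01 ~]# ss -tnlp | grep 3260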

Discover the shared storage on node01 using the below command.

[root@node01 ~]# iscsiadm -m discovery -t st -p 192.168.2.200
192.168.2.200:3260,1 iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2
[root@node01 ~]#

Now, log in to the target with the below command.

[root@node01 ~]# iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2 -p 192.168.2.200 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2, portal: 192.168.2.200,3260] (multiple)
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2, portal: 192.168.2.200,3260] successful.
[root@node01 ~]# systemctl restart iscsid
[root@node01 ~]# systemctl enable iscsid

Set Up LVM on the Shared Storage
Check whether the new disk is visible on the node. On node01, /dev/sdb is the disk coming from our iSCSI storage; create an LVM volume group and logical volume on it and format it with ext4.

[root@node01 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 39G 0 part
├─centos-root 253:0 0 37G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm [SWAP]
sdb 8:16 0 20G 0 disk
sr0 11:0 1 1024M 0 rom
[root@node01 ~]# pvcreate /dev/sdb
[root@node01 ~]# vgcreate vg_apache /dev/sdb
[root@node01 ~]# lvcreate -n lv_apache -l 100%FREE vg_apache
[root@node01 ~]# mkfs.ext4 /dev/vg_apache/lv_apache
[root@node01 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 39G 0 part
├─centos-root 253:0 0 37G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm [SWAP]
sdb 8:16 0 20G 0 disk
└─vg_apache-lv_apache 253:2 0 20G 0 lvm
sr0 11:0 1 1024M 0 rom
[root@node01 ~]#

Now, go to the other node (node02) and discover the shared storage using the below command; after logging in, node02 will detect the new disk and the LVM volume we created on node01.

[root@node02 ~]# iscsiadm -m discovery -t st -p 192.168.2.200
192.168.2.200:3260,1 iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2
[root@node02 ~]#

Now, log in to the target with the below command.

[root@node02 ~]# iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2 -p 192.168.2.200 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2, portal: 192.168.2.200,3260] (multiple)
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.storage01.x8664:sn.8a376366c1d2, portal: 192.168.2.200,3260] successful.
[root@node02 ~]# systemctl restart iscsid
[root@node02 ~]# systemctl enable iscsid
[root@node02 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 39G 0 part
├─centos-root 253:0 0 37G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm [SWAP]
sdb 8:16 0 20G 0 disk
└─vg_apache-lv_apache 253:2 0 20G 0 lvm
sr0 11:0 1 1024M 0 rom
[root@node02 ~]#

Finally, verify that the LVM volume we created on node01 (vg_apache-lv_apache) is visible on the other node (node02), as shown in the lsblk output above and in the commands below.
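
For example, the standard LVM listing commands should now show the vg_apache volume group and the lv_apache logical volume on node02 as well:

[root@node02 ~]# pvs
[root@node02 ~]# vgs
[root@node02 ~]# lvs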

Install and Configure Cluster packages

Install the cluster packages (pcs and the fence agents, which pull in Pacemaker) on both nodes using the below commands, and set a password for the hacluster user. This user account is the cluster administration account; we suggest you set the same password on all nodes. Then start the pcsd service and enable it to start automatically at system startup.

[root@node01 ~]# yum install pcs fence-agents-all -y
[root@node01 ~]# passwd hacluster
[root@node01 ~]# systemctl start pcsd
[root@node01 ~]# systemctl enable pcsd

[root@node02 ~]# yum install pcs fence-agents-all -y
[root@node02 ~]# passwd hacluster
[root@node02 ~]# systemctl start pcsd
[root@node02 ~]# systemctl enable pcsd

Remember to run the above commands on all of your cluster nodes.

Create a High Availability Cluster
Authorize the nodes by running the below command on any one of the nodes.

[root@node01 ~]# pcs cluster auth node01 node02
Username: hacluster
Password:redhat
node02: Authorized
node01: Authorized
[root@node01 ~]#

Create a cluster.

[root@node01 ~]# pcs cluster setup --start --name web_cluster node01 node02
Destroying cluster on nodes: node01, node02...
node01: Stopping Cluster (pacemaker)...
node02: Stopping Cluster (pacemaker)...
node01: Successfully destroyed cluster
node02: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node01', 'node02'
node01: successful distribution of the file 'pacemaker_remote authkey'
node02: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node01: Succeeded
node02: Succeeded

Starting cluster on nodes: node01, node02...
node01: Starting Cluster (corosync)...
node02: Starting Cluster (corosync)...
node01: Starting Cluster (pacemaker)...
node02: Starting Cluster (pacemaker)...

Synchronizing pcsd certificates on nodes node01, node02...
node02: Success
node01: Success
Restarting pcsd on the nodes in order to reload the certificates...
node02: Success
node01: Success
[root@node01 ~]#

Enable the cluster to start at the system startup.

[root@node01 ~]# pcs cluster enable --all
node01: Cluster Enabled
node02: Cluster Enabled
[root@node01 ~]#

Start the cluster on all nodes if it is not already running.

[root@node01 ~]# pcs cluster start --all

Use the below commands to get the status of the cluster.

[root@node01 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node01 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Oct 4 09:37:14 2022
Last change: Tue Oct 4 07:14:09 2022 by hacluster via crmd on node01
2 nodes configured
0 resource instances configured

PCSD Status:
node02: Online
node01: Online
[root@node01 ~]#

[root@node01 ~]# pcs status
Cluster name: web_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: node01 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Oct 4 10:03:01 2022
Last change: Tue Oct 4 07:14:09 2022 by hacluster via crmd on node01

2 nodes configured
0 resource instances configured

Online: [ node01 node02 ]

No resources


Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node01 ~]#

Fencing Devices

A fencing device is a hardware or software device that isolates a problem node, either by power-resetting it or by cutting off its access to the shared storage. My demo cluster is running on VMware virtual machines, so I am not showing a full fencing device setup here, but you can follow this guide to set up a fencing device.
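
For reference only, on a VMware-based cluster a fencing resource is commonly created with the fence_vmware_soap agent. The command below is an illustrative sketch: the vCenter/ESXi address, credentials, and pcmk_host_map values are placeholders you would replace with your own, and the exact option names may vary with your fence-agents version.

[root@node01 ~]# pcs stonith create vmfence fence_vmware_soap \
    ipaddr=vcenter.example.com login=administrator passwd=secret \
    ssl=1 ssl_insecure=1 \
    pcmk_host_map="node01:node01_vm;node02:node02_vm"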

Cluster Resources

Prepare resources

Install Apache web server on node01.

[root@node01 ~]# yum install -y httpd wget

Edit the configuration file.

[root@node01 ~]# vi /etc/httpd/conf/httpd.conf
Add the below content at the end of the file on node01.

<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 192.168.2.215
</Location>

Install Apache web server on node02.

[root@node02 ~]# yum install -y httpd wget

Edit the configuration file.

[root@node02 ~]# vi /etc/httpd/conf/httpd.conf
Add the below content at the end of the file on node02.

<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 192.168.2.215
</Location>

Now we need to use the shared storage for storing the web content (HTML) files. Perform the below operations on any one of the nodes.

[root@node01 ~]# mount /dev/vg_apache/lv_apache /var/www/
[root@node01 ~]# mkdir /var/www/html
[root@node01 ~]# mkdir /var/www/cgi-bin
[root@node01 ~]# mkdir /var/www/error

[root@node01 ~]# cat <<-END >/var/www/html/index.html
<html>
<body>Hello, Welcome! This Page Is Served By Red Hat High Availability Cluster</body>
</html>
END

[root@node01 ~]# umount /var/www

[root@node01 ~]# systemctl start httpd
[root@node01 ~]# systemctl enable httpd

[root@node02 ~]# systemctl start httpd
[root@node02 ~]# systemctl enable httpd

Create Resources

Create a filesystem resource for the Apache server, using the storage coming from the iSCSI server.

[root@node01 ~]# pcs resource create httpd_fs Filesystem device="/dev/mapper/vg_apache-lv_apache" directory="/var/www" fstype="ext4" --group apache
Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')
[root@node01 ~]#

Create an IP address resource. This IP address will act as a virtual IP address for Apache, and clients will use this IP address to access the web content instead of an individual node's IP.

[root@node01 ~]# pcs resource create httpd_vip IPaddr2 ip=192.168.2.215 cidr_netmask=24 --group apache
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')
[root@node01 ~]#

Bug 

The standard path for the Apache PID file on CentOS/RHEL 7 is /var/run/httpd/httpd.pid. However, the Pacemaker apache resource agent keeps looking for the PID file in /var/run/httpd.pid. We can fix the resource agent helper script (run this on both nodes) using the following command.

/bin/sed -i 's/RUNDIR\/${httpd_basename}.pid/RUNDIR\/${httpd_basename}\/${httpd_basename}.pid/g' /usr/lib/ocf/lib/heartbeat/apache-conf.sh
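
To confirm the substitution took effect, you can grep the helper script for the patched path (the file path matches the sed command above):

[root@node01 ~]# grep "httpd_basename" /usr/lib/ocf/lib/heartbeat/apache-conf.sh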

Create an Apache resource which will monitor the status of the Apache server and move the resource to another node in case of any failure.

[root@node01 ~]# pcs resource create httpd_ser apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://192.168.2.215/" --group apache
Assumed agent name 'ocf:heartbeat:apache' (deduced from 'apache')
[root@node01 ~]#

Since we are not using fencing, disable it (STONITH). You must disable it to start the cluster resources, but disabling STONITH in a production environment is not recommended.

[root@node01 ~]# pcs property set stonith-enabled=false

Check the status of the cluster.

[root@node01 ~]# pcs status
Cluster name: web_cluster
Stack: corosync
Current DC: node02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Oct 4 12:38:10 2022
Last change: Tue Oct 4 11:28:24 2022 by root via cibadmin on node01

2 nodes configured
3 resource instances configured

Online: [ node01 node02 ]

Full list of resources:

Resource Group: apache
httpd_fs (ocf::heartbeat:Filesystem): Started node02
httpd_vip (ocf::heartbeat:IPaddr2): Started node02
httpd_ser (ocf::heartbeat:apache): Started node02

Failed Resource Actions:

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node01 ~]#


Verify High Availability Cluster

Once the cluster is up and running, point a web browser to the Apache virtual IP address. You should get a web page like below.
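
You can also check from the command line with curl; the virtual IP 192.168.2.215 below is the address assigned to the httpd_vip resource created earlier.

[root@node01 ~]# curl http://192.168.2.215/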

Configure High-Availability Cluster on CentOS 7 – Apache Web Server

Test High Availability Cluster
Let's check the failover of the resources by taking the active node out of service. First, make sure the cluster is running on node02.

[root@node01 ~]# pcs cluster start node02

Put node01 into standby mode so that the resources fail over to node02, then bring it back online with unstandby.

[root@node01 ~]# pcs cluster standby node01
[root@node01 ~]# pcs cluster unstandby node01
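
While node01 is in standby, the pcs status output should show the apache resource group running on node02; after unstandby, node01 rejoins the cluster but the resources stay where they are. If you want to move the group back, you can use pcs resource move (this adds a location constraint, which you can review with pcs constraint list).

[root@node01 ~]# pcs status
[root@node01 ~]# pcs resource move apache node01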

Pacemaker Cluster Important Configuration Files and Directories

/var/lib/pacemaker/cib/cib.xml: Cluster configuration file
/var/log/cluster/corosync.log: Corosync log file
/etc/sysconfig/pacemaker: Pacemaker configuration file
/var/log/pacemaker.log: Pacemaker log file
/usr/lib/ocf/resource.d/heartbeat: Directory where all the cluster resource scripts are available

Pacemaker Cluster Administration Commands

Pacemaker cluster command to view cluster nodes status.
# pcs cluster status
Pacemaker cluster command to view detailed status of the cluster nodes and resources.
# pcs status --full
Pacemaker cluster command to view a one-time snapshot of the status of the cluster nodes and resources.
# crm_mon -r1
Pacemaker cluster command to view real-time status of the cluster nodes and resources.
# crm_mon -r
Pacemaker cluster command to view the status of all cluster resources and resource groups.
# pcs resource show
Pacemaker cluster command to put a cluster node into standby mode.
# pcs cluster standby <Cluster node name>
Pacemaker cluster command to remove a cluster node from standby mode.
# pcs cluster unstandby <Cluster node name>
Pacemaker cluster command to move a cluster resource from one node to another node.
# pcs resource move <resource name> <node name>
Pacemaker cluster command to restart a cluster resource on the running node.
# pcs resource restart <resource name>
Pacemaker cluster command to start a cluster resource on the current node.
# pcs resource enable <resource name>
Pacemaker cluster command to stop a cluster resource on the running node.
# pcs resource disable <resource name>
Pacemaker cluster command to debug starting a cluster resource. You can use the --full switch for more verbose output.
# pcs resource debug-start <Resource Name>
Pacemaker cluster command to debug stopping a cluster resource. You can use the --full switch for even more verbose output.
# pcs resource debug-stop <Resource Name>
Pacemaker cluster command to debug monitoring a cluster resource. You can use the --full switch for even more verbose output.
# pcs resource debug-monitor <Resource Name>
Pacemaker cluster command to list available cluster resource agents.
# pcs resource agents
Pacemaker cluster command to list available cluster resource agents with more information.
# pcs resource list
Pacemaker cluster command to view detailed information about a cluster resource agent and its configuration or settings.
# pcs resource describe <Resource Agent Name>
Pacemaker cluster command to create a cluster resource.
# pcs resource create <Resource Name> <Resource Agent Name> options
Pacemaker cluster command to view the configuration settings of a particular resource.
# pcs resource show <Resource Name>
Pacemaker cluster command to update a specific cluster resource configuration.
# pcs resource update <Resource Name> options
Pacemaker cluster command to delete a specific cluster resource.
# pcs resource delete <Resource Name>
Pacemaker cluster command to clean up a specific cluster resource.
# pcs resource cleanup <Resource Name>
Pacemaker cluster command to list available cluster fence agents.
# pcs stonith list
Pacemaker cluster command to view detailed configuration settings for a fence agent.
# pcs stonith describe <Fence Agent Name>
Pacemaker cluster command to create a stonith (fencing) resource.
# pcs stonith create <Stonith Name> <Stonith Agent Name> options
Pacemaker cluster command to display the configured settings of a stonith agent.
# pcs stonith show <Stonith Name>
Pacemaker cluster command to update a stonith agent configuration.
# pcs stonith update <Stonith Name> options
Pacemaker cluster command to delete a stonith agent.
# pcs stonith delete <Stonith Name>
Pacemaker cluster command to clean up stonith agent failures.
# pcs stonith cleanup <Stonith Name>
Pacemaker cluster command to check the cluster configuration.
# pcs config
Pacemaker cluster command to check cluster properties.
# pcs property list
Pacemaker cluster command to get more details about cluster properties.
# pcs property list --all
Pacemaker cluster command to view the cluster configuration in XML format.
# pcs cluster cib
Pacemaker cluster command to check cluster node status.
# pcs status nodes
Pacemaker cluster command to start the cluster service on the current node.
# pcs cluster start
Pacemaker cluster command to start the cluster service on all nodes.
# pcs cluster start --all
Pacemaker cluster command to stop the cluster service on the current node.
# pcs cluster stop
Pacemaker cluster command to stop the cluster service on all nodes.
# pcs cluster stop --all
Pacemaker cluster command to sync the corosync.conf file.
# pcs cluster sync
Pacemaker cluster command to destroy the cluster configuration on a node (add --all to destroy it on all nodes).
# pcs cluster destroy
Pacemaker cluster command to create a new cluster configuration file. The file is created in the current directory; you can add multiple cluster resources to this configuration file and apply them using the cib-push command.
# pcs cluster cib <new config name>
Pacemaker cluster command to apply the resources created in the configuration file to the cluster.
# pcs cluster cib-push <new config name>
Pacemaker cluster command to view the cluster resource group list.
# pcs resource group list
Pacemaker cluster command to view the corosync configuration output.
# pcs cluster corosync
Pacemaker cluster command to check the cluster resource constraints (ordering, location, colocation).
# pcs constraint list
Pacemaker cluster command to ignore the quorum policy.
# pcs property set no-quorum-policy=ignore
Pacemaker cluster command to disable stonith.
# pcs property set stonith-enabled=false
Pacemaker cluster command to set the cluster default resource-stickiness value.
# pcs resource defaults resource-stickiness=100

 
