If you've clicked on this link, the odds are everything has fallen to pieces and you don't know why.

The two most common causes are non-FreePBX RPMs breaking symlinks and people running commands they shouldn't.

As it's unfortunately common for people to accidentally upgrade Asterisk or MySQL to a version that ISN'T aware of HA, we have a 'fixcluster' script built into FreePBX-HA.

This script should be at /usr/local/asterisk/fixcluster on both machines. If, for any reason, the fixcluster script is missing from a machine, it can be downloaded as an attachment from this page or from GitHub.

Before running this script, ensure that only one HA machine is on. The machine that is on will become the master, and all services will be started on that node.

Running the script is simple:

[root@freepbx-a freepbx_ha]# /usr/local/asterisk/fixcluster
Checking symlinks .......
Warning: /var/lib/mysql is NOT a symlink, and it should be. Run with --fixlinks to repair
 Done
Clearing Errors .................... Done
Removing Restraints .................... Done
[root@freepbx-a freepbx_ha]# /usr/local/asterisk/fixcluster --fixlinks
Checking symlinks .......! Done
Clearing Errors .................... Done
Removing Restraints .................... Done
[root@freepbx-a freepbx_ha]# /usr/local/asterisk/fixcluster
Checking symlinks ........ Done
Clearing Errors .................... Done
Removing Restraints .................... Done
[root@freepbx-a freepbx_ha]#
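
If you want to see what a path looks like before letting --fixlinks touch it, a plain shell test is enough. This is only a sketch and not how fixcluster itself works; check_link is a hypothetical helper name, and /var/lib/mysql is the only path taken from the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of fixcluster): report whether a path is a symlink.
# Read-only: nothing on disk is modified.
check_link() {
    if [ -L "$1" ]; then
        echo "OK: $1 -> $(readlink "$1")"
    else
        echo "WARNING: $1 is not a symlink"
        return 1
    fi
}

# /var/lib/mysql is the path named in the warning above.
check_link /var/lib/mysql || true
```

Any other path you suspect can be passed in the same way, for example check_link /some/other/path.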

All errored services should now be started, and running the 'pcs status' command should show no errors:

[root@freepbx-a ~]# pcs status
Cluster name:
Last updated: Wed Oct  8 07:09:44 2014
Last change: Tue Oct  7 14:46:32 2014 via crmd on freepbx-b
Stack: cman
Current DC: freepbx-a - partition WITHOUT quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured
20 Resources configured

Online: [ freepbx-a ]
OFFLINE: [ freepbx-b ]
Full list of resources:
 spare_ip       (ocf::heartbeat:IPaddr2):       Started freepbx-a
 floating_ip    (ocf::heartbeat:IPaddr2):       Started freepbx-a
 Master/Slave Set: ms-asterisk [drbd_asterisk]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 Master/Slave Set: ms-mysql [drbd_mysql]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 Master/Slave Set: ms-httpd [drbd_httpd]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 Master/Slave Set: ms-spare [drbd_spare]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 spare_fs       (ocf::heartbeat:Filesystem):    Started freepbx-a
 Resource Group: mysql
     mysql_fs   (ocf::heartbeat:Filesystem):    Started freepbx-a
     mysql_ip   (ocf::heartbeat:IPaddr2):       Started freepbx-a
     mysql_service      (ocf::heartbeat:mysql): Started freepbx-a
 Resource Group: asterisk
     asterisk_fs        (ocf::heartbeat:Filesystem):    Started freepbx-a
     asterisk_ip        (ocf::heartbeat:IPaddr2):       Started freepbx-a
     asterisk_service   (ocf::heartbeat:freepbx):       Started freepbx-a
 Resource Group: httpd
     httpd_fs   (ocf::heartbeat:Filesystem):    Started freepbx-a
     httpd_ip   (ocf::heartbeat:IPaddr2):       Started freepbx-a
     httpd_service      (ocf::heartbeat:apache):        Started freepbx-a

PCSD Status:
Error: no nodes found in corosync.conf
[root@freepbx-a ~]#

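Note that the 'Error: no nodes found in corosync.conf' message is expected, and isn't actually an error. Likewise, freepbx-b showing as OFFLINE with its resources Stopped is expected here, because only one machine is on at this point.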
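
With 20 resources in the list it's easy to miss one that didn't come up. As a rough sketch, the filter below prints any resource line from 'pcs status' that isn't marked Started. It assumes the output layout shown above (resource lines containing '(ocf::'); unstarted_resources is a hypothetical name, not a pcs feature.

```shell
#!/bin/sh
# Hypothetical filter: read 'pcs status' output on stdin and print resource
# lines that are not in the Started state. Prints nothing if all are Started.
# Assumes resource lines contain "(ocf::", as in the output above.
unstarted_resources() {
    grep '(ocf::' | grep -v 'Started' || true
}

# Only call pcs if it is installed on this machine.
if command -v pcs >/dev/null 2>&1; then
    pcs status | unstarted_resources
fi
```

No output means every listed resource reported Started.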

At this point, go into FreePBX, open the HA Manage page, and run the full check there. That will fix any other errors that may have occurred on the local machine.

Only after you have run those checks, turn the other machine back on.

When the other machine boots, it will do a DRBD synchronization (visible via the web interface). Once the synchronization is complete, run another check (via the FreePBX HA web interface), which will validate the other machine and fix anything it can. When that check finishes without any errors, the cluster is fully repaired.
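
If you'd rather watch the synchronization from a shell than from the web interface, the sketch below may help. It assumes a DRBD 8.x install that exposes its state in /proc/drbd, with one 'ds:' disk-state field per device; drbd_out_of_sync is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print DRBD device lines whose disk state is not
# UpToDate on both nodes. Assumes DRBD 8.x, which reports state in /proc/drbd.
# An optional first argument names a different file to read.
drbd_out_of_sync() {
    grep 'ds:' "${1:-/proc/drbd}" 2>/dev/null | grep -v 'ds:UpToDate/UpToDate' || true
}

drbd_out_of_sync
```

No output means either that every device reports UpToDate/UpToDate or that /proc/drbd couldn't be read, so confirm the file exists before relying on the result.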