Clearing Errors from the CLI

If a service has failed often enough that the cluster can no longer start it on either node, you'll need to clear the errors manually. The failures are listed under "Failed actions" in the output of pcs status:

[root@freepbx-a /]# pcs status
Cluster name:
Last updated: Wed Mar 12 04:37:50 2014
Last change: Tue Mar 11 13:49:29 2014 via cibadmin on freepbx-a
Stack: cman
Current DC: freepbx-b - partition with quorum
Version: 1.1.10-14.el6-368c726

[...]

 Resource Group: asterisk
     asterisk_fs        (ocf::heartbeat:Filesystem):    Stopped
     asterisk_ip        (ocf::heartbeat:IPaddr2):       Stopped
     asterisk_service   (ocf::heartbeat:freepbx):       Stopped
 Resource Group: httpd
     httpd_fs   (ocf::heartbeat:Filesystem):    Stopped
     httpd_ip   (ocf::heartbeat:IPaddr2):       Stopped
     httpd_service      (ocf::heartbeat:apache):        Stopped
Failed actions:
    asterisk_service_monitor_30000 on freepbx-a 'not running' (7): call=186, status=complete, last-rc-change='Wed Mar 12 04:38:10 2014', queued=0ms, exec=0ms
    asterisk_service_monitor_30000 on freepbx-b 'not running' (7): call=240, status=complete, last-rc-change='Wed Mar 12 04:26:05 2014', queued=0ms, exec=0ms

Because Asterisk was found 'not running' on both nodes, the cluster has decided that it's impossible to start and has marked it as unusable on both nodes. If the cause was, for example, a hardware error that has since been resolved, you can clear the errors with the following commands:

crm_resource --resource asterisk_service -C --node freepbx-a
crm_resource --resource asterisk_service -C --node freepbx-b

Note that the resource to clear is whatever appears before _monitor in the failed action line. For example, something may have appeared on your network with the same IP address as your floating IP, causing it to fail. In that case the failed actions would read 'asterisk_ip_monitor_...' and you would need to clear the errors on asterisk_ip instead.
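
The naming rule above can be sketched as a small shell helper. This is only an illustration, not part of FreePBX or Pacemaker: failed_cleanup_commands is a hypothetical function name, and it assumes the failed action lines have the same layout as the pcs status output shown earlier (action ID, the word "on", then the node name). It prints the cleanup commands rather than running them, so they can be reviewed first.

```shell
# Hypothetical helper (not shipped with pcs or Pacemaker).
# Reads "pcs status" output on stdin and, for every failed monitor action,
# prints the matching crm_resource cleanup command.
failed_cleanup_commands() {
    awk '
        # Example input line:
        #   asterisk_service_monitor_30000 on freepbx-a ...
        $1 ~ /_monitor_[0-9]+$/ && $2 == "on" {
            resource = $1
            # The resource name is everything before "_monitor_<interval>"
            sub(/_monitor_[0-9]+$/, "", resource)
            printf "crm_resource --resource %s -C --node %s\n", resource, $3
        }
    '
}
```

It could be used as `pcs status | failed_cleanup_commands`, and the printed commands run by hand once they look right.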

Setting a node online or unstandby

If the only reachable node is in standby, you'll need to bring it out of standby manually via the command line:

pcs cluster unstandby freepbx-b

Note that the machine may have a buggy version of pcs installed (due to a Red Hat issue), which fails with "Error: node 'freepbx-b' does not appear to exist in configuration" or similar.

If so, use the alternative command below, which deletes (-D) the standby attribute from the node:

crm_standby -D -N freepbx-b

Setting a node offline or standby

If you need to take a node out of service, you can put it into standby manually via the command line:

pcs cluster standby freepbx-b

Note that the machine may have a buggy version of pcs installed (due to a Red Hat issue), which fails with "Error: node 'freepbx-b' does not appear to exist in configuration" or similar.

If so, use the alternative command below, which sets the node's standby attribute to on:

crm_standby -v on -N freepbx-b

To check the status of each node after running the commands above, look at /proc/drbd. The results below are from node A of a healthy system:

cat /proc/drbd 
 
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:896 nr:0 dw:1044 dr:3049 al:7 bm:10 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:480 nr:0 dw:580 dr:2105 al:5 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:4 nr:0 dw:4 dr:3633 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 4: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:4 nr:0 dw:4 dr:685 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
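
As a rough sketch of what "healthy" means in the output above, the check can be automated. drbd_healthy is a hypothetical helper name, and the rule it applies (every device line shows cs:Connected and ds:UpToDate/UpToDate) is an assumption based only on the sample shown here, not an official health test.

```shell
# Hypothetical helper: reads /proc/drbd-style text on stdin and exits 0 only
# if at least one device is listed and every device line reports
# cs:Connected and ds:UpToDate/UpToDate, as in the healthy sample above.
drbd_healthy() {
    awk '
        # Device lines look like " 1: cs:Connected ro:... ds:... C r-----"
        /^ *[0-9]+: cs:/ {
            devices++
            if ($0 !~ /cs:Connected / || $0 !~ /ds:UpToDate\/UpToDate /)
                bad++
        }
        END { exit (devices > 0 && bad == 0) ? 0 : 1 }
    '
}
```

For example, `drbd_healthy < /proc/drbd && echo "DRBD looks healthy"` prints the message only when all devices match the pattern.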

How to determine the floating IP

pcs resource show floating_ip