


Overview

The Advanced Recovery commercial module provides an easy-to-configure replication engine along with the ability to fail over automatically to a secondary server. This mechanism protects voice services when the primary server fails.

This module is applicable to PBX 15+ systems only.

If the backup server cannot be active at the same time as the primary, this module helps ensure that the Secondary server comes up quickly when the Primary server goes down, which reduces downtime.

The key features of this module are:

  1. Easy PBX GUI configuration: in just a few steps, your current (primary) system configuration is ready to replicate to the secondary server.
  2. Optional automatic switchover: as soon as a primary service dies, the module switches over to the secondary system.
  3. Active monitoring of primary server services such as the network interface, Asterisk, MySQL, and the PBX stack. As soon as any of these services dies, switchover to the secondary server happens.
  4. Built-in notification mechanism that notifies the admin by both call and email during the fail over.
  5. Control over which trunks are activated automatically after switchover.
  6. A configuration option to control primary server services after the switchover. This covers the scenario where the secondary becomes active because it lost its connection to the primary during a brief disruption, such as a power or network fluctuation or quick maintenance. In that situation the primary comes back after some time, so it is better to stop the primary server services to avoid endpoint conflicts, where some endpoints register to the primary and some to the secondary. With this option enabled, the secondary stops the primary server services as soon as the primary comes up.
  7. Auto switchover custom hooks: an option to execute custom hooks or a third-party script during switchover. This is useful for triggering custom or third-party application logic during the switchover.
  8. Integrated support for the Endpoint Manager module to rebuild the Sangoma S and D series phone templates so that the phones receive the secondary (fail over) server IP.


Each feature and its configuration are described in detail below.



Prerequisites

To deploy the Advanced Recovery module successfully, the following requirements must be met:

  1. You have an existing PBX system that will be your Primary server.

  2. You have an identical PBX system that will be your Secondary server.
  3. The two servers can communicate on an IP level.
    1. Both systems are configured with their own IP addresses.
    2. The two systems can be on the same local network or in geographically separate locations.
    3. SSH and HTTP(S) ports must be open between the two servers.
  4. Both servers are running the Advanced Recovery module, and it is "licensed" on each system.
  5. Backup & Restore, API, and Filestore are dependent modules that must be installed on both systems.
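
The network prerequisites above can be sanity-checked before starting the configuration. The following is a minimal Python sketch and is not part of the module: it only tests whether a TCP port on the peer accepts connections. The function names are illustrative, and the port numbers (22, 80, 443) are the usual defaults for SSH, HTTP, and HTTPS, which is an assumption; adjust them if your servers listen elsewhere.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_peer(host, ports=(22, 80, 443)):
    """Map each port to True/False depending on whether it is reachable.

    The default ports are assumed defaults for SSH, HTTP, and HTTPS.
    """
    return {port: port_open(host, port) for port in ports}
```

Calling check_peer with the Secondary server IP on the primary, and with the Primary server IP on the secondary, should return True for every port you actually use. A False value usually points to a firewall rule or routing problem between the two servers.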

Setup Configuration

The Advanced Recovery module can be configured with the following steps.

  1. Establish an SSH connection between the two servers.
    1. SSH to the secondary server from the primary without needing a password.
    2. SSH to the primary server from the secondary without needing a password.

  2. Configure the Advanced Recovery module using the "Quick configuration" wizard.

Establish SSH connection between both the servers

An SSH connection must be established between the two servers.

The following instructions copy the primary server SSH key to the secondary so that the primary can SSH to the secondary server without needing a password.

  1. Log in to your Primary server with an SSH client such as PuTTY, SecureCRT, or another SSH client.


  2. At the primary server Linux CLI prompt type: sudo -u asterisk ssh-copy-id -i /home/asterisk/.ssh/id_rsa.pub root@SecondaryServerIP and enter the password when prompted.

    APPLICATION NOTES

    Make sure you replace SecondaryServerIP with the IP address of your Secondary PBX. Use an IP address and not a hostname that may be common to both the primary and the warm spare. If FQDNs are desired, create three records: the common name, a specific name for the primary, and a specific name for the warm spare (for example mypbx.company.com, mypbx1.company.com, and mypbx2.company.com).

    If the firewall is configured, make sure to create a rule that allows the two servers to talk to each other.


  3. If the above command completes without error, you are ready to test:

    At the prompt type: ssh -i /home/asterisk/.ssh/id_rsa root@SecondaryServerIP

    If all went well, you should now be logged in to the Secondary server.


Repeat the same steps in the other direction to copy the secondary server SSH key to the primary, so that the secondary can SSH to the primary server without needing a password.
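
The manual login test above can also be run non-interactively, which is handy when re-checking both directions later. The sketch below is a hedged illustration, not something the module ships: the function names are hypothetical, the key path is the one used in the steps above, and it relies on the standard OpenSSH client options BatchMode and ConnectTimeout so that the check fails instead of prompting for a password.

```python
import subprocess


def build_ssh_check_cmd(host, key="/home/asterisk/.ssh/id_rsa", user="root"):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-i", key,
        "-o", "BatchMode=yes",      # never fall back to a password prompt
        "-o", "ConnectTimeout=5",   # give up quickly if the peer is unreachable
        f"{user}@{host}",
        "true",                     # harmless remote command
    ]


def passwordless_ssh_works(host, **kwargs):
    """Return True if key-based login to the peer succeeds non-interactively."""
    result = subprocess.run(
        build_ssh_check_cmd(host, **kwargs),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0
```

Run the check as the same user that performed the manual test, so that the peer's host key is already known. Because BatchMode disables all prompts, an unknown host key also makes the check return False.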

Install Advanced Recovery module

Download and install the "Advanced Recovery" module by using "Check Online" and then following the download and install steps described in the "Checking for Available Upgrades" section of the Module Admin User Guide wiki.

Advanced Recovery Module Configuration


How to Open the Advanced Recovery Module Settings 

Within the PBX GUI, navigate to Admin > Advanced Recovery


How to Configure the Advanced Recovery Module  


The main landing page of the Advanced Recovery module has options to view system status, perform configuration changes, and adjust global settings (like SSH keys).

The "Quick Configuration" option is displayed only while the system is not yet configured.

Once the system is configured, this option is no longer visible, and any further configuration changes must be made through the "Configuration → Primary (or Secondary) Server" option.



  

Quick Configuration Wizard 


The Quick Configuration wizard provides an easy GUI interface to configure the Advanced Recovery module.

The wizard configures both the primary and the secondary server by itself, so no further configuration is needed afterwards.

Clicking the "Quick configuration" button opens the wizard.


   



Step-1 Server Configuration - 

Here, specify the "Secondary" server IP.

After entering the "Secondary Server" instance, click "Next".


If there is any problem connecting to the secondary server over SSH, the wizard raises an alert.


  

  

If the SSH connection to the secondary server is good, the wizard checks whether a properly licensed "Advanced Recovery" module is installed on the secondary server.

If the module is not installed on the secondary system, the wizard reports an error.

   


If the module is installed but not "licensed" on the secondary system, the wizard reports an error.

   


If the secondary has an active, licensed Advanced Recovery module, the wizard proceeds to Step-2, Sync.

Step-2 Sync

This step defines the syncing frequency.

Syncing can take from minutes to hours, depending on system size (capacity).

The syncing process can be CPU intensive, depending on your system capacity, so it is recommended to sync during off hours. Choose the syncing frequency with this in mind.



Step-3 Settings -


This section configures the Advanced Recovery module settings required to replicate the configuration to the secondary server.


   


Each configuration option in this step is described below.

Auto Switch services 

Automatically switches the services to the secondary server, for example by activating the registered trunks on the secondary server.

Disable Remote Trunks 
Should the trunks be disabled on the secondary server after replicating/restoring the trunk configuration? Set this to NO only if you want trunks to register from both the primary and secondary servers at the same time. Generally trunks are kept active on only one server, so this option should be set to YES. Default is YES.

Exclude NAT Settings 
Should NAT settings from the Primary Server be restored to the Secondary Server?

Exclude Bind Address
Should Bind Address settings on the Primary Server be restored to the Secondary Server?


Exclude DNS 
Should DNS settings on the Primary Server be restored to the Secondary Server?


Apply Config 
Should we run "Apply Configs" on the Secondary Server after a restore is completed?


Once the above configuration is done, move on to the next step to configure notifications.

Step-4 Notification -

This section configures notifications.

The Advanced Recovery module can notify the admin either via a call to the admin extension or via email.


  


By default, Call Notification is disabled. Enabling it by selecting "Yes" reveals a few more options to configure.





The parameters to configure in the Notification section are:

  1. Notification Extension - The extension to call during a fail over event. On a system failure event, the active system initiates a call to the configured extension and plays the configured announcement. The purpose of this call is to inform the admin about the failure. If you do not want call notification, leave this option disabled.
  2. Recording when primary fails - Select the recording to play when the Primary server fails. The list shows the recordings configured in the System Recording module.
  3. Recording when standby fails - Select the recording to play when the standby/Warm Spare server fails. The list shows the recordings configured in the System Recording module.
  4. Notification Email - The email address to which notifications are sent.


Server Failure Notification Frequency

The Advanced Recovery module generates a notification as soon as a failure event is detected.


Once the configuration is done, press "Configure" to finish configuring the Advanced Recovery module.

This completes the "Quick configuration" part of the Advanced Recovery module. To modify the configuration further, refer to the Advanced Recovery Expert Configuration wiki.

As soon as the "Quick configuration" process is done, the "Advanced Recovery Service" daemon must be started, as described in the next section.

Advanced Recovery service daemon


This service daemon is responsible for monitoring the health of the primary system. In the event of a failure, it executes the steps necessary to switch over to the secondary server.
After the "Quick configuration" wizard completes, the dashboard shows the status of the Primary and the Secondary.

At this point the dashboard shows that configuration is done but the service is not yet started. The next step is to "Start" the service from the Primary.


The Advanced Recovery service needs to be started only on the Primary server. On the Secondary, the service starts automatically.


Primary Server - 


Secondary Server - 


Advanced Recovery Dashboard

The dashboard shows the service status and the last sync time.

The "Sync now" option can also be used to force a sync of the configuration to the secondary system.




Advanced Recovery Sync Now 

The "Sync now" option lets the user manually sync the configuration to the secondary server.

This is useful for confirming that syncing works right after the initial configuration, and for finding out how long a sync takes on a given PBX system.

After clicking "Sync now", the status of the sync process is displayed.


Syncing might take minutes to hours, depending on system capacity. Refresh the page periodically (every few minutes) to get the latest status of the process.



The dashboard displays "Time since last sync" to show when the last sync from the primary to the secondary server happened.

As soon as syncing finishes, the dashboard displays "Time taken to finish last sync" in hh:mm:ss format, which gives a rough estimate of how long your system takes to sync the configuration.

If required, change the syncing schedule frequency using the "Advanced configuration" option.
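
One way to use the "Time taken to finish last sync" value when choosing a schedule is to compare it against the planned interval. The sketch below is only an illustration under two assumptions that are not stated by the module: that the interval should comfortably exceed the sync duration, and that a safety factor of 2 is a reasonable margin. The function names are hypothetical.

```python
def hms_to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def interval_is_safe(last_sync_hms, interval_seconds, margin=2.0):
    """Return True if the interval is at least `margin` times the last sync duration.

    The default margin of 2.0 is an arbitrary illustrative safety factor.
    """
    return interval_seconds >= margin * hms_to_seconds(last_sync_hms)
```

For example, a last sync of 00:20:00 fits an hourly schedule with this margin, while a last sync of 01:00:00 does not and suggests a longer interval or an off-hours schedule.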




Switchover Configuration

The Advanced Recovery module provides the following configuration options to control the actions taken during switchover.

All switchover-related configuration is part of Advanced Configuration.

Advanced configuration is reached from "Advanced Recovery Module → Configuration → Primary Server".




Trunk Selection Configuration option

As soon as "Auto switch services" is enabled, the list of trunks currently configured in the system is shown.

For each trunk, select the desired behaviour after switchover, i.e. whether it should be enabled or disabled.


Bring down Primary server after switchover configuration option

This option is useful when a partial outage, such as a network or power fluctuation, causes the primary server to lose communication with the secondary. In that situation the secondary server becomes active, but after some time the primary also comes back up, which can lead to some phones registering to the primary server and others to the secondary server.

To avoid this situation, the admin can choose whether to bring down the primary server after switchover.

If this option is set to YES, the Advanced Recovery module keeps checking the configured Primary server and, if it comes back up, brings down all services on the primary.


Post Switchover Hook

This option is for advanced users who want to perform special steps after switchover.

Specify the path of the custom script to execute after switchover.


APPLICATION NOTE

Make sure the script has execute permissions for the Asterisk user.
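
As a starting point, a post switchover hook might look like the minimal Python sketch below. It relies on the "START" argument that the module passes to the hook during switchover (see the Switchover section); everything else is an assumption for illustration, including the function name, the log path, and the choice to reject any other argument.

```python
#!/usr/bin/env python3
import sys
from datetime import datetime, timezone

# Hypothetical log location; pick a path the Asterisk user can write to.
LOG_PATH = "/tmp/advanced_recovery_hook.log"


def run_hook(action, log_path=LOG_PATH):
    """Handle one hook invocation and return a process exit code."""
    if action != "START":
        # This sketch only knows the "START" argument.
        return 1
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp} switchover hook invoked with {action}\n")
    # Site-specific logic (for example notifying a third-party system) goes here.
    return 0


# Do nothing when started without an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_hook(sys.argv[1]))
```

Keep the hook short and make it safe to run more than once, since it runs during an outage when other systems may also be unavailable.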



Advanced Configuration 

Once the Quick configuration wizard is finished, any further modification or advanced configuration change, such as changing GraphQL API tokens or binding to another Filestore, must be made through "Advanced configuration", as described in Advanced Recovery Expert Configuration.

Switchover 

The Advanced Recovery module decides that the Primary is down when it detects any of the following conditions:

  1. The network interface is down on the Primary server, so the Secondary server can no longer communicate with the primary server.
  2. Asterisk is not running on the Primary server.
  3. The FreePBX stack is not running on the Primary server.
  4. The database is not running on the Primary server.


Switchover to the secondary server happens as soon as any of the failure conditions above is detected.
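
The "any single failure triggers switchover" rule above can be expressed as a small decision function. This is a sketch of the rule as described, not the module's actual implementation; the class and field names are hypothetical, and how each status is collected is left out.

```python
from dataclasses import dataclass


@dataclass
class PrimaryHealth:
    """Result of the four checks described above (True means healthy)."""
    network_reachable: bool
    asterisk_running: bool
    freepbx_running: bool
    database_running: bool


def failed_checks(health):
    """Return the names of the checks that currently fail."""
    return [name for name, ok in vars(health).items() if not ok]


def should_switch_over(health):
    """Switch over as soon as any single check fails."""
    return bool(failed_checks(health))
```

Returning the list of failed checks, rather than only a yes/no answer, makes it easy to include the reason for the switchover in a notification.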

The Advanced Recovery module performs the following actions during switchover:

  1. Switchover-related actions as configured in Switchover Configuration
    1. Enable the trunks on the secondary as configured in the Trunk Selection Configuration option.
    2. Execute the post switchover hook to run the custom third-party script with a "START" argument.
  2. Notify the admin via a call to the admin extension if Call Notification is enabled.
  3. Notify the admin via email.



APPLICATION NOTE

The admin's next step should be to repair the Primary server and switch back to it, so that production is not left at risk by running on a single server.


Fail-over recommendation
 

The Advanced Recovery module will be beneficial during outages by automatically switching services over to a secondary server when a failure is detected on primary server.  However, it is critical to make sure other network elements such as IP/SIP Phones, SIP Trunking, and routers, etc. are configured properly to ensure they start working smoothly after services are switched over. 

SIP Phones Recommendation

Regenerate existing Sangoma's phone configuration 

The Advanced Recovery module has an option to regenerate the configuration of already connected/configured Sangoma S and D series phones via Endpoint Manager.

"Advanced Recovery → Endpoint → Regenerate EPM config for S and D series phones" 

Note that this option only regenerates or updates the configuration of phones that are already configured. Use "PBX GUI → Settings → Endpoint Manager" to configure any new phone.



Endpoint Manager Template for Sangoma's S series phones

Sangoma S and D series phones support the configuration of a "Fail Over" IP along with the Primary IP. 

The Endpoint Manager module, which is free to use for Sangoma S and D series phones, can be used to help configure this setup.

Refer to Connecting Sangoma Phone to FreePBX or PBXact Indepth for a detailed guide to using Endpoint Manager with Sangoma S series phones.

Enable the Backup destination field and enter the Secondary server information in the template to achieve fail over when the primary server fails.

Application Note

If the Advanced Recovery module is already configured, the template is pre-populated with the secondary server address.






Endpoint Manager Template for Sangoma's D series phones

For D series phones, add the Backup Destination address to the "Digium" template.

Refer to the "Template Creation and Editing (Example with Sangoma Brand)" section of the EPM Admin User Guide for an example of how to edit templates via EPM.



SIP Trunk recommendation

It is recommended to ensure that the SIP Trunk provider allows registration requests from both the Primary and the Secondary server IPs.

When the secondary server becomes active during a fail over, the SIP Trunk provider must accept the registration request from the secondary server so that SIP traffic can resume.

IT admin recommendation

The IT person or admin of the PBX setup should take care of the following responsibilities:

  1. Make any networking changes required to bring up the secondary server, such as router port forwarding in a NAT environment, to make sure that:
    1. Secondary server registration messages reach the SIP Trunking providers.
    2. Phones are able to send registration messages to the secondary server.

  2. Make sure the Primary and Secondary server IPs do not change. If they do change, update the GraphQL configuration (server URL only) accordingly, because the two servers talk to each other using the GraphQL API. IP changes might result in a false "server down" event.

  3. Make sure the Primary and Secondary servers are accessible to each other. If the "firewall" module is running, whitelist both server IPs accordingly.

  4. Make sure SSH connectivity between the Primary and Secondary servers is properly configured.

  5. With Recording Report module v15.0.4.28 and later, call recording files are not part of the "full system backup", so they must be copied manually.

Switchback to Primary server 

The Advanced Recovery module is mainly designed for easy fail over to the secondary system in the event of a primary server failure.

Once the secondary is up and running, it is recommended to bring the primary server back up and switch back to it, so that any future disruption does not affect production.

Bringing up the primary server can mean one of the following scenarios:

  1. The Primary server went down due to a temporary failure and the same server comes back after some time, so the switch back is to the same server.
  2. The Primary server died completely and cannot be recovered, so a fresh new system must be created and made the primary server.


To switch back, follow these steps:

  1. Log in to the GUI of the Secondary server, which is currently active.
    1. Stop the Advanced Recovery daemon from FreePBX GUI → Advanced Recovery to avoid getting notifications.
  2. Repair the Primary server (if possible) or bring up a new Primary server with a fresh installation of FreePBX.
  3. Once the Primary server is ready, follow the steps in "Sync back to Primary" to sync the data from the secondary to the primary.
  4. Once syncing is over, switch back to the primary server so that the primary becomes the active node and the secondary becomes standby.

Sync back to Primary 

This option is useful for bringing up the Primary server, which can be either the old server or a new one.

When the secondary server is running as active, the dashboard on the secondary server shows that the Primary server is down and offers the option to Sync back to Primary.



Application Note

Ensure that SSH connectivity to the primary server is configured properly. Refer to "Establish SSH connection between both the servers".


The "Sync back to Primary" option opens a wizard that asks for the Primary server IP.



After the Primary server IP is entered, the sync process starts.





As part of syncing, data from the secondary server is pushed to the primary server IP.



Once syncing to the Primary server is finished, the third step of the wizard offers the option to do the "Switchover".





High Level use case scenario using Advanced Recovery module


The Advanced Recovery module supports the fail over scenario in which the Secondary server takes over production when the primary server fails.



Frequently Asked Questions 


Q: Do we need a floating IP now, like in the old HA setup?
A: No. A floating IP is not a requirement with this module. Each server has its own IP address. SSH communication must be open between the two servers.

Q: Do both servers have identical configurations, except that trunks are disabled on the standby server to avoid registrations coming from two machines simultaneously?
A: Yes. This module provides granular control over the trunks, both during syncing and after switchover. The "Disable Remote Trunks" option controls the trunk status on the secondary during normal operation, and the "Switchover Trunk selection" option controls whether each trunk is enabled or disabled after switchover.

Q: Do non-Sangoma phones need to be configured to register to the backup server address if registration to the primary is unsuccessful?
A: Yes. The fallback destination or SIP server address must be set manually on the SIP phones so that the phones can register to the secondary system when the primary system fails.

Q: Are services like Asterisk, FreePBX, etc. all running on both machines, whether in standby or active?
A: Yes.

Q: How does the monitoring happen?
A: The Primary server and the Secondary server each monitor the other and send a notification as soon as the peer node fails. If the Primary goes down, switchover happens. If the Secondary goes down, only a notification is sent.

Q: What does the "Bring down Primary server after switchover" option mean?
A: After switchover, if this option is enabled, the Secondary keeps monitoring the primary server IP. If the Primary server comes back, the Secondary performs 'fwconsole stop' on the primary, which stops all running services such as Asterisk and any other FreePBX processes. This is required when only one node should be active, to avoid:

  • Split registration scenarios where some phones register to the primary and some to the secondary.
  • SIP trunk registration requests being sent from both servers.

This situation can arise when a network or power fluctuation causes the servers to lose communication with each other, which triggers a switchover.

Q: I have two FreePBX servers, each with two NICs. One NIC provides regular access to the network that SIP binds to, and the other NIC connects directly to the other FreePBX server. How can I configure the Advanced Recovery module to use the dedicated NIC or LAN interface for monitoring and syncing between the servers?
A: As long as the two servers can communicate with each other at the IP level, the Advanced Recovery module will work. Each NIC should have its own IP, so the "direct" link between the servers only needs to be configured so that communication between the two servers over the dedicated NIC works reliably.

For example, Server A has NIC IP x.y.z.w, Server B has NIC IP a.b.c.d, and there is a direct link between the two servers. The following preconditions must be met during configuration:

  • Routing or communication between x.y.z.w and a.b.c.d works at the IP level.
  • SSH and HTTP(S) ports are open on these IPs.
  • These IPs are whitelisted on both servers so that the firewall does not block access to them.
  • SSH keys are set up between the two servers so that they can SSH to each other using these IPs.
  • This NIC IP is used in the Advanced Recovery module configuration.
  • When Endpoint Manager is used to rebuild the fail over IP configuration, choose the "public" SIP server IP and not this private NIC IP. The same applies when manually configuring non-Sangoma brands.






