Much has been said and written about how important it is for networks to be highly available, and how critical that is for a business given the pace at which enterprise networks grow and change. As networks grow to accommodate an expanding business, enterprise monitoring needs become more and more complex. Delivering high availability and disaster recovery is the key to successful, uninterrupted enterprise network monitoring. In this post, let's see how the high availability of an enterprise network can be ensured.

Ensuring high availability

To ensure uninterrupted enterprise network monitoring, it is essential to have a contingency plan detailing what must be done when there is a system failure, a site failure, or some other mishap. Before we proceed, it helps to understand that a thin line differentiates ‘failover’ from ‘disaster recovery’. Failover is a method most enterprises employ to ensure that system availability is restored within an acceptable time frame, whereas disaster recovery is the fallback strategy when all the failover strategies break. Failover strategies can be broadly categorized into cold, warm, or hot standbys, and enterprises choose among them based on what is acceptable to their business.

As enterprise businesses (and SMBs too) depend largely on the availability of various services, there is no alternative to continuous, uninterrupted network monitoring. As an administrator, you would look at the network monitoring software’s ability to fail over quickly in the event of, say, a server crash, which is just one of the myriad possibilities that can lead to interrupted and incomplete monitoring. The worst scenario is one where the entire site goes down due to a power outage or a natural disaster such as an earthquake or a tsunami (fortunately such events are few and far between, and we hope no one has to suffer such a nightmare!). Whatever the case may be, disaster preparedness is the only sure-shot way for a business to stay alive.

Failover for Enterprises

In the case of large enterprises, a cold or warm standby will not cut it. A cold standby warrants manual intervention, and a warm standby involves a backup running in the background with the data being mirrored to a secondary server at specified intervals, so the data on the two servers may not be synchronized at all times. A hot standby therefore becomes a clear gating factor for software that manages the fault and network performance of critical systems and services.
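To make the synchronization gap of a warm standby concrete, here is a minimal sketch in Python. It assumes the mirror job runs at fixed multiples of the sync interval and that anything recorded after the last completed run is missing on the secondary. The function name, the event timestamps, and the 300-second interval are invented for illustration and are not taken from any product.

```python
def events_lost_on_failover(event_times, sync_interval, crash_time):
    """Return the events the secondary never received before the crash.

    Assumption: mirroring runs at t = 0, sync_interval, 2 * sync_interval, ...
    and an event recorded exactly at a mirror run is copied by that run.
    All times are whole seconds since the primary started.
    """
    # Time of the last mirror run that completed at or before the crash.
    last_sync = (crash_time // sync_interval) * sync_interval
    # Everything recorded after that run exists only on the failed primary.
    return [t for t in event_times if last_sync < t <= crash_time]


# Hypothetical timestamps (in seconds) at which the primary stored new data.
events = [10, 290, 310, 550, 590]

# Warm standby mirrored every 300 s, primary crashes at t = 595 s.
print(events_lost_on_failover(events, 300, 595))  # [310, 550, 590]

# Near-continuous mirroring, approximated here by a 1 s interval.
print(events_lost_on_failover(events, 1, 595))    # []
```

The shorter the interval, the smaller the window of data that can be lost, which is why continuous synchronization matters for critical monitoring.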

A hot standby failover is preferred because:

 i) the redundant systems run in parallel with 100% data synchronization, and
ii) the users do not experience a glitch, as the failover is smooth and almost instant.
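One common way to get an almost instant switch is for the standby to watch for heartbeats from the primary and take over when they stop. The sketch below illustrates that general idea only. It is not how OpManager implements its hot standby, and the class name, method names, and the 5-second timeout are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical standby-side watcher for primary heartbeats."""

    heartbeat_timeout: float = 5.0  # assumed: seconds of silence before takeover
    last_heartbeat: float = 0.0     # time the last heartbeat was seen
    active: bool = False            # True once the standby has taken over

    def record_heartbeat(self, now: float) -> None:
        """Note that the primary was alive at time `now`."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Promote the standby if the primary has been silent for too long.

        Returns True when the standby is (or has just become) active.
        """
        if not self.active and now - self.last_heartbeat > self.heartbeat_timeout:
            self.active = True
        return self.active


monitor = StandbyMonitor(heartbeat_timeout=5.0)
monitor.record_heartbeat(100.0)
print(monitor.check(103.0))  # False: only 3 s of silence
print(monitor.check(106.0))  # True: 6 s of silence, standby takes over
```

Times are passed in explicitly so the logic can be exercised without a real clock or network. This sketch deliberately leaves out fail-back to the primary, which a real deployment would also need to handle.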

Hot-standby in OpManager

More on setting up single-site and multi-site redundancy in another post.

  1. Reto König

    I am planning a high available OpManager installation. All the documentation found on the ME site describes the basic mechanism of the redundancy, but information about some details is missing:
    • Is the standby server offering the same IP address as the primary server after a failover?
    • If yes, what’s about multi-site redundancy? Have the sites to run in a common VLAN?
    • If the two servers are not having the same IP address for the web server port, I assume that the web clients have to reconnect after a failover
    • If the two servers are not having the same IP address for the trap receiver port, I assume that all the traps destinations on the monitored systems have to point to both of the servers

    Can anyone answer my questions and comment my assumptions? Thanks!

    • vidya

      Thanks for posting your queries here Reto!

      Find the answers inline to your queries:

      • Is the standby server offering the same IP address as the primary server after a failover?

      The Primary and Secondary will have different IP addresses. You can consider setting up a cluster and using the cluster IP to access.

      • If yes, what’s about multi-site redundancy? Have the sites to run in a common VLAN?

      -So, this question is not applicable.

      • If the two servers are not having the same IP address for the web server port, I assume that the web clients have to reconnect after a failover

      The webclient re-direction does not happen automatically in the event of a failover or fail-back. It is manual. Again, if you are setting up a cluster and if it is capable of handling a redirection, you can use it.

      • If the two servers are not having the same IP address for the trap receiver port, I assume that all the traps destinations on the monitored systems have to point to both of the servers

      Yes, both the servers have to be specified as trap destinations.

      Let me know if you need any further details.

      Cheers
      Vidya

  2. Vidya

    Thanks on behalf of OpManager team:)

  3. You really do have a lot to offer when it comes to network monitoring. Kudos to a job well done. 🙂