Aug 23

The Root Cause Analysis (RCA) message for an attribute changes only when the severity of that attribute changes. It is not rewritten at every poll.

For example, let us assume that you have associated a threshold (Critical > 90, Warning > 80 and Clear <= 80) with the CPU Utilization of a server. If the current CPU utilization value is 92%, then the RCA message for the Health of the monitor will say

Quote:
Health is critical. Root Cause :

1. CPU Utilization 92 > 90 % (threshold).

The RCA message for the CPU Utilization attribute will say,

Quote:
CPU Utilization - Critical. CPU Utilization of app-w2k1.india.adventnet.com is critical because its value 92 > 90 %.

[Threshold Details : Critical if value > 90, Warning if value > 80, Clear if value <= 80]
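The threshold in this example can be read as a simple mapping from a polled value to a severity. Here is a minimal Python sketch of that mapping; the function name `classify` and its parameters are made up for illustration and are not part of the product.

```python
def classify(value, critical_above=90, warning_above=80):
    """Map a polled value to a severity using the example threshold.

    Hypothetical helper: Critical if value > 90, Warning if value > 80,
    Clear if value <= 80.
    """
    if value > critical_above:
        return "Critical"
    if value > warning_above:
        return "Warning"
    return "Clear"  # value <= warning_above


for reading in (75, 80, 85, 90, 92):
    print(reading, classify(reading))
```

Note that a value of exactly 90 is Warning, not Critical, because the critical condition is strictly "greater than 90".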

If the CPU utilization rises to 96% in the next poll, the RCA messages for Health and for the CPU Utilization attribute will not be updated, because the severity is still critical. They will be updated only when the CPU utilization drops to 90 or below (that is, when the attribute becomes either warning or clear). The RCA message for the Health of the monitor will change, however, if another attribute becomes critical. So if the Physical Memory Utilization (%) also becomes critical, the RCA message for the Health of the monitor will say,

Quote:
Health is critical. Root Cause :

1. Physical Memory Utilization 93 > 90 % (threshold).

2. CPU Utilization 92 > 90 % (threshold).
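The behaviour described above (the message is kept as long as the severity stays the same) can be sketched roughly as follows. This is only an illustration of the idea, not the product's code; the class `AttributeRca`, the helper `classify` and the message text are all assumptions.

```python
def classify(value, critical_above=90, warning_above=80):
    # Hypothetical helper using the example threshold from this post
    if value > critical_above:
        return "Critical"
    if value > warning_above:
        return "Warning"
    return "Clear"


class AttributeRca:
    """Holds the RCA message of one attribute (illustrative only).

    The message is rewritten only when the severity changes,
    not on every poll.
    """

    def __init__(self, name):
        self.name = name
        self.severity = "Unknown"  # default state before the first poll
        self.message = ""

    def poll(self, value):
        new_severity = classify(value)
        if new_severity != self.severity:
            self.severity = new_severity
            # Made-up message format, not the product's wording
            self.message = f"{self.name} - {new_severity}. Value at change: {value}"
        return self.message


cpu = AttributeRca("CPU Utilization")
first = cpu.poll(92)   # Unknown -> Critical, message written
second = cpu.poll(96)  # still Critical, message left untouched
third = cpu.poll(85)   # Critical -> Warning, message rewritten
print(first)
print(second)
print(third)
```

The second poll returns the same text as the first, still mentioning 92, which matches the Health message above continuing to show "CPU Utilization 92 > 90 %".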

- Arun

Aug 17

To start with, there are four levels of severity for every attribute (CPU, disk space, JVM size etc. are known as attributes) except availability: Unknown, Critical, Warning and Clear. Availability has only two states: Up and Down.

By default, any attribute will be in the unknown state; once you associate a threshold, it will become critical, warning or clear depending on the threshold. Also note that, by default, actions (notifications like email, SMS etc.) are executed only when there is a change in the level of severity. For example, an action is executed when an attribute changes from clear to critical, but it is not executed again while that attribute remains critical.

By default, the Health attribute of a monitor or a monitor group depends on the severity levels of all the attributes or monitors under it. Health will be clear only when all the attributes/monitors are clear. If any attribute becomes critical, or if the availability of the monitor goes down, the health will automatically become critical.
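As a rough illustration of this default rule, here is a small Python sketch. The function `health` is hypothetical. It assumes the list contains only attributes that have a threshold associated, and it assumes that a mix without any critical attribute is reported as Warning, which this post does not spell out.

```python
def health(attribute_severities, available=True):
    """Roll attribute severities up into a Health value (illustrative only)."""
    # Critical if the monitor is down or any attribute is critical
    if not available or "Critical" in attribute_severities:
        return "Critical"
    # Clear only when every attribute is clear
    if all(s == "Clear" for s in attribute_severities):
        return "Clear"
    # Assumption: everything else is reported as Warning
    return "Warning"


print(health(["Clear", "Clear"]))
print(health(["Clear", "Critical"]))
print(health(["Clear", "Clear"], available=False))
```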

Pros and Cons of configuring an alert at the Monitor Group level

Let's say that you have configured an availability alert for a monitor group. When any one monitor in this group goes down, you will get an alert, and configuring the alert at the group level is easy.

The problem with this approach is that when the first monitor goes down, the availability of the group also goes down and you get an email notification. If another monitor goes down some 5 minutes later (Monitor 1 is still down, so the availability of the group is already down), you will not get any email notification, as there is no change in the severity of the group's availability.

The best approach to this problem is to configure an availability alert for each monitor individually.
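To make the difference concrete, the scenario can be simulated in a few lines of Python. This is a sketch under assumptions: the class `EdgeAlert`, the function `group_state` and the notification text are invented for the example, and every monitor is assumed to start in the Up state.

```python
class EdgeAlert:
    """Sends a notification only when the watched state changes."""

    def __init__(self, label):
        self.label = label
        self.state = "Up"  # assumption: everything starts Up
        self.sent = []

    def update(self, new_state):
        if new_state != self.state:
            self.state = new_state
            self.sent.append(f"{self.label} is {new_state}")


def group_state(monitor_states):
    # The group is Down as soon as any member monitor is Down
    return "Down" if "Down" in monitor_states.values() else "Up"


# Monitor 1 goes down first, Monitor 2 follows about 5 minutes later.
timeline = [
    {"Monitor 1": "Down", "Monitor 2": "Up"},
    {"Monitor 1": "Down", "Monitor 2": "Down"},
]

group_alert = EdgeAlert("Group")
monitor_alerts = {name: EdgeAlert(name) for name in timeline[0]}

for states in timeline:
    group_alert.update(group_state(states))
    for name, state in states.items():
        monitor_alerts[name].update(state)

per_monitor_sent = [msg for alert in monitor_alerts.values() for msg in alert.sent]
print(group_alert.sent)   # one notification for the whole incident
print(per_monitor_sent)   # one notification per monitor
```

The group-level alert sends a single notification, while the per-monitor alerts send one for each monitor that goes down.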

Pros and Cons of configuring an alert for the Health of a monitor

When you configure an alert for the health of a monitor, you will be alerted if any attribute under that monitor becomes critical. But this approach has the same problem as configuring an alert for monitor group availability. You will be notified when the first attribute becomes critical or warning; however, you will not be notified if another attribute also becomes critical in the next poll (while the first attribute is still critical).

The best approach to this problem is, again, to configure alerts for each required attribute individually.

- Arun

Aug 7

Our last blog post highlighted our plans for SAP support in our next major update. However, a nice enhancement in Release 7.4 was the support for hierarchical monitor groups. This should help our users to better group and monitor IT resources. Here is how.

The traditional approach to monitoring resources is siloed by technology component. High-level availability and health are known for individual components, and it is hard to see which resources need more attention based on business priority. With support for sub-groups, this situation is changing, and visibility will improve even for the Head of Operations. For example, you would now be able to see that the overall business application had 99.985% availability, and then check the reports to see how the downtimes add up.

The accuracy of reports will be improved by the ability to group relevant resources and by defining proper “dependency rules”. These dependency rules will also help handle clustered setups.

The hierarchical grouping, along with the support for dependencies with rules like “Critical if any ‘N’ Monitors are critical”, should enable managing business services better.
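Here is one way such a rule could behave on nested groups, sketched in Python. This only illustrates the idea and is not how the product implements it; the classes `Monitor` and `Group`, the field `critical_if_any_n` and the sample names are assumptions, and Warning is left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    name: str
    severity: str = "Clear"


@dataclass
class Group:
    name: str
    members: list = field(default_factory=list)  # Monitors or sub-Groups
    critical_if_any_n: int = 1  # hypothetical rule parameter

    @property
    def severity(self):
        # Count members (monitors or sub-groups) that are critical
        critical = sum(1 for m in self.members if m.severity == "Critical")
        return "Critical" if critical >= self.critical_if_any_n else "Clear"


# A clustered web tier tolerates one failed node; the database does not.
web = Group(
    "Web cluster",
    [Monitor("web1", "Critical"), Monitor("web2"), Monitor("web3")],
    critical_if_any_n=2,
)
db = Group("Database", [Monitor("db1")])
app = Group("Order application", [web, db])

print(web.severity, app.severity)  # one web node down: still Clear
web.members[1].severity = "Critical"
print(web.severity, app.severity)  # two web nodes down: Critical
```

With N set to 2 on the web cluster, a single failed node does not mark the business application as critical, which is the kind of clustered setup the dependency rules are meant to handle.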

Hope the graphic below is self-explanatory.

Cheers !

ps :

A previous blog highlight, just in case you missed it :

Response Time across multiple locations
