Alarms from Applications Manager can have any of the following severities:

  1. Critical (for Health) or Down (for Availability) – Amber
  2. Warning – Orange
  3. Clear (for Health) or Up (for Availability) – Green

A few examples of alarms:

  1. Service is down
  2. Server is down
  3. Process ‘java.exe’ is down
  4. CPU Utilization has violated the threshold value, 90% > 80% (can be Critical or Warning, as configured)
  5. Response time is greater than the threshold value, 200 ms > 180 ms
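The threshold examples in items 4 and 5 can be sketched in a few lines of Python. This is only an illustration of the idea, not Applications Manager code: the `Severity` enum, the `evaluate` function, and the limit values are hypothetical names and numbers chosen for the example, and the sketch assumes a metric where higher values are worse.

```python
from enum import Enum


class Severity(Enum):
    """The three severities described above (hypothetical representation)."""
    CLEAR = "Clear"
    WARNING = "Warning"
    CRITICAL = "Critical"


def evaluate(value, warning_limit, critical_limit):
    """Return the severity for a metric where higher values are worse.

    Assumes warning_limit <= critical_limit.
    """
    if value > critical_limit:
        return Severity.CRITICAL
    if value > warning_limit:
        return Severity.WARNING
    return Severity.CLEAR


# CPU at 90% against a Warning limit of 80% and a Critical limit of 95%
cpu_severity = evaluate(90, warning_limit=80, critical_limit=95)

# Response time of 200 ms against limits of 150 ms and 180 ms
response_severity = evaluate(200, warning_limit=150, critical_limit=180)
```

Whether "90% > 80%" counts as Warning or Critical depends only on which limit the 80% was configured against, which is what the note in item 4 refers to.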

Alarm notifications can be sent through Email or SMS. Corrective actions, such as restarting the service or server, can be executed through the ‘Execute Script’ option when alarms are generated.

Let us now design an alarm workflow in Applications Manager for the scenario shown in the diagram below:

The points below outline the solution for this use case:

a)    So that the first poll does not generate an alarm, you can configure Applications Manager to generate an alarm only after ‘n’ consecutive polls.

Ex: Poll 2 times consecutively before reporting that the monitor is down.

You can configure this option:

  1. Globally: A Critical or Warning alert is generated for all monitor types (Server, Tomcat, Apache) and for all attributes (CPU, Memory, Java Heap Size) only if the attribute has crossed the threshold in 2 consecutive polls.
  2. At the specific attribute level: This is set while configuring the Threshold. The same threshold policy can then be applied to monitors with similar attributes, which saves time and keeps escalation policies manageable.

The status of the Monitor will not change in the first poll. If the fault (or event) is still active in the second consecutive poll, the status of the Monitor changes to Critical or Warning, as the case may be.
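The consecutive-poll behaviour described above can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the product's implementation: `ConsecutivePollGate` and its parameter names are hypothetical, and it assumes that a single healthy poll resets the count and returns the monitor to Clear.

```python
class ConsecutivePollGate:
    """Change status only after N consecutive violating polls (hypothetical)."""

    def __init__(self, polls_before_alarm=2):
        self.polls_before_alarm = polls_before_alarm
        self.violations = 0      # consecutive violating polls seen so far
        self.status = "Clear"

    def record_poll(self, violated, severity="Critical"):
        """Record one poll result and return the resulting monitor status."""
        if violated:
            self.violations += 1
            if self.violations >= self.polls_before_alarm:
                self.status = severity
        else:
            # Assumption: one healthy poll resets the count and clears the alarm.
            self.violations = 0
            self.status = "Clear"
        return self.status


gate = ConsecutivePollGate(polls_before_alarm=2)
first = gate.record_poll(violated=True)    # status stays "Clear"
second = gate.record_poll(violated=True)   # status becomes "Critical"
```

With `polls_before_alarm=2`, a transient fault that disappears before the second poll never changes the status, which is the point of the setting.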

b)    You can configure Applications Manager to send an Email or SMS, and also to execute a script for corrective action or ‘Log a Ticket’ in ServiceDesk Plus, when the alarm is generated in the 3rd poll. To have this alert raised as a ticket in your helpdesk, configure the Helpdesk Email address in the Email Actions.

The second step in the use case is thus covered. The third is to raise a ticket when the alarm is not acknowledged for 20 minutes.

c)    To act on an alarm that is not cleared automatically or manually within the next 20 minutes, you can configure Alarm Escalation Rules.

A screenshot of the Alarm Escalation Rule is shown below:

Using these rules, you can create a ticket in your Helpdesk if the alarm is not cleared within 20 minutes. You have to configure your helpdesk email address as the To Address while configuring Email Actions, so that the alarm is raised as a ticket in your helpdesk. [If you want to raise a ticket with ServiceDesk Plus, you can use the ‘Log a Ticket’ Action or use the Email Commands Template to generate a ticket.]
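The escalation step can be pictured as follows. This is a rough sketch using only the Python standard library, not how Applications Manager implements escalation rules: `needs_escalation`, `build_ticket_email`, and the `example.com` addresses are hypothetical, and the 20-minute delay is taken from the use case above.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage

ESCALATION_DELAY = timedelta(minutes=20)  # from the use case above


def needs_escalation(raised_at, now, cleared):
    """True if the alarm is still open once the escalation delay has passed."""
    return (not cleared) and (now - raised_at) >= ESCALATION_DELAY


def build_ticket_email(alarm_text, helpdesk_address, sender):
    """Build the mail that a helpdesk mailbox would turn into a ticket."""
    msg = EmailMessage()
    msg["To"] = helpdesk_address          # the helpdesk 'To Address'
    msg["From"] = sender
    msg["Subject"] = f"Escalated alarm: {alarm_text}"
    msg.set_content(f"The alarm '{alarm_text}' was not cleared within 20 minutes.")
    return msg


raised = datetime(2024, 1, 1, 10, 0)
checked = datetime(2024, 1, 1, 10, 25)

if needs_escalation(raised, checked, cleared=False):
    # Hypothetical addresses, for illustration only
    ticket = build_ticket_email(
        "Server is down", "helpdesk@example.com", "appmanager@example.com"
    )
```

The sketch only builds the message; sending it (for example through an SMTP server) is left out because the mail server details are site-specific.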

The Technician can log in to the Helpdesk System to add notes or a work log recording the steps taken to resolve the problem. Once the problem is resolved, Applications Manager automatically changes the status of the monitor to Clear (Green) in the next poll.

Let me know if you have a different alarm workflow that needs to be integrated into Applications Manager.

Kevin


  1. Anna Ewing

    Do you know what an activity diagram for an alarm will look like?