Alert (Email and SMS)

Overview

In GermainAPM, alerting is a core feature that lets your teams react to SLA breaches in monitored environments, whether they are Technology SLAs, Business Process SLAs, or User Experience SLAs. Alerts can be triggered by basic ingested data, correlated data, query mechanisms (see Automation - SQL execution), and HTTP actions (see Automation - Http(s) execution).

A few key concepts are needed to fully understand how alerting works. In a nutshell: once a data point is ingested and an SLA is evaluated for it, an action can be configured to trigger. One of these action types is an AlertAction, which notifies a target group of users (a Distribution) by sending a message defined by a template.



To configure alerting, follow these steps:

  1. Define a distribution list

  2. Create the AlertAction

  3. Associate the AlertAction with an SLA

  4. Create or customize a Template
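The relationship between the four steps above can be sketched in a few lines of Python. This is only a conceptual model: the class names, the `evaluate` function, the "value greater than threshold" breach rule, and the example addresses are all assumptions made for illustration and are not GermainAPM APIs.

```python
from dataclasses import dataclass, field

# Hypothetical model of the concepts above; none of these classes are GermainAPM APIs.

@dataclass
class Distribution:
    name: str
    recipients: list[str]

@dataclass
class AlertAction:
    name: str
    distribution: Distribution

@dataclass
class Sla:
    name: str
    kpi: str
    threshold: float          # assumption: a breach means value > threshold
    template_name: str
    actions: list[AlertAction] = field(default_factory=list)

def evaluate(sla, kpi, value, templates):
    """Return (recipient, message) pairs if the data point breaches the SLA."""
    if kpi != sla.kpi or value <= sla.threshold:
        return []
    message = templates[sla.template_name].format(sla=sla.name, kpi=kpi, value=value)
    return [(recipient, message)
            for action in sla.actions
            for recipient in action.distribution.recipients]

# 1. Define a distribution list
ops = Distribution("Ops Team", ["ops@example.com", "oncall@example.com"])
# 2. Create the AlertAction
notify_ops = AlertAction("Notify Ops", ops)
# 3. Associate the AlertAction with an SLA
cpu_sla = Sla("CPU Usage SLA", "CPU Usage", 90.0, "CPU Usage", [notify_ops])
# 4. Create or customize a Template
templates = {"CPU Usage": "{sla} breached: {kpi} = {value}"}

for recipient, message in evaluate(cpu_sla, "CPU Usage", 97.5, templates):
    print(f"{recipient}: {message}")
```

In this sketch a data point below the threshold produces no notifications, while a breaching data point produces one message per recipient in the distribution.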

Alert Configuration Reference

Workspace Field Label

Configuration Key

Description

Possible Values

Default Value

Name

name

Unique Action Name

String



Distribution List

alertGroupName

Name of the alert distribution group for this alert.

String, matching an existing Distribution List



Run on Schedule



*Calculated Field* If set to true, allows a schedule cron expression to be defined.

Boolean, true/false

false

Triggered by SLAs



*Calculated Field* Collection of SLAs this action is configured for





Quiet Time Used

quietTimeUsed

If set, alerts of this type will not fire again during the quiet time period after the initial occurrence.

For more details, please see our dedicated documentation page: Quiet Time

Boolean, true/false

true

Quiet Time Period

quietTimePeriod

Quiet time period (in seconds) to use for this alert. If set to 0, the default quiet time period is used.

Integer (seconds)



Logging Enabled

loggingEnabled



Boolean, true/false



Notify On Success

notifyOnSuccess



Boolean, true/false



Notify On Failure

notifyOnFailure



Boolean, true/false



Execution Count

limitCount

Used in combination with limitInterval to define an upper limit on how many times this action is executed.

Integer



Execution Interval

limitInterval

Used in combination with limitCount to define the interval during which the upper limit applies.

String, one of the following values:
YEAR, MONTH, WEEK, DAY, HOUR, MINUTE





alertTypeName

*Advanced Field* Default alert template to be used by this action. Alert templates are usually defined at the SLA level, but when a template is not defined there, or the defined template does not exist, the alert template defined here is used instead.

String, matching an existing Template Name

SLA



assignToPrimaryMember

*Advanced Field* Flag that determines if alerts are assigned to the associated distribution group or its primary member. If checked, any alert logs created for this alert will be assigned to the email address of the primary user of the associated distribution group.

Boolean, true/false





references

*Advanced Field* References to configuration entries associated with this action.

Allows other configuration items, such as credentials, to be referenced as a key/item pair that can then be used within the Alert Template, e.g. §{Credentials.username}. *option 1*

Also allows direct key/value references to be defined that do not point to other configuration items. *option 2*

name - unique reference name
e.g. Credentials
key *option 1* - GermainAPM config collection key
e.g. germain.apm.monitoringConfig.credentials
itemName *option 1* - GermainAPM config item key
e.g. SSHProdCredentials
Value *option 2* - hard-coded value for the reference
e.g. Password1
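
To make the two reference options more concrete, the sketch below models how references could be resolved into values for a template placeholder such as §{Credentials.username}. The in-memory `CONFIG_STORE`, the `Environment` reference, the example username, and the rule that a direct value is addressed as §{Name} are all assumptions made for illustration; the actual resolution is performed by GermainAPM and may differ.

```python
import re

# Hypothetical in-memory stand-in for GermainAPM configuration collections.
CONFIG_STORE = {
    "germain.apm.monitoringConfig.credentials": {
        "SSHProdCredentials": {"username": "svc_monitor"},
    }
}

# Option 1 points at a configuration item; option 2 carries a hard-coded value.
references = [
    {"name": "Credentials",
     "key": "germain.apm.monitoringConfig.credentials",
     "itemName": "SSHProdCredentials"},
    {"name": "Environment", "value": "Production"},
]

def build_context(refs, store):
    """Map each reference name to its config item (option 1) or value (option 2)."""
    context = {}
    for ref in refs:
        if "value" in ref:
            context[ref["name"]] = ref["value"]
        else:
            context[ref["name"]] = store[ref["key"]][ref["itemName"]]
    return context

# Matches §{Name} and §{Name.attribute}.
PLACEHOLDER = re.compile(r"§\{(\w+)(?:\.(\w+))?\}")

def render(template, context):
    """Replace known placeholders; leave unknown ones untouched."""
    def substitute(match):
        name, attribute = match.groups()
        target = context.get(name)
        if attribute is None and isinstance(target, str):
            return target
        if attribute is not None and isinstance(target, dict) and attribute in target:
            return str(target[attribute])
        return match.group(0)
    return PLACEHOLDER.sub(substitute, template)

context = build_context(references, CONFIG_STORE)
print(render("Checked as §{Credentials.username} on §{Environment}", context))
```

Leaving unknown placeholders untouched is a design choice of this sketch: it makes a misnamed reference visible in the rendered message instead of silently dropping it.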




Example - Email Alert on CPU Usage SLA breach

Below, we walk through the steps needed to set up alerting for when the CPU Usage KPI on a target monitored server breaches a defined SLA.

KPI and SLA

I'll start by confirming that I have a CPU Usage KPI by navigating to Analytics > KPIs.


Here I have confirmed that the KPI is configured and that an SLA is also defined for it.

For more details on KPI and SLA concepts and configuration please see our dedicated documentation:

SLA - Server

KPIs (preconfigured)

Alerting Action Configuration

3 SLA Step

SLA: Select the CPU Usage SLA that was previously configured or create a new SLA

Click Finish


We now have a fully configured alert that will be sent out once a CPU Usage SLA breach is identified.
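
Whether a repeated breach produces another message depends on the Quiet Time and Execution Count / Execution Interval settings described in the reference above. The following is a minimal sketch of how those settings could interact, not GermainAPM's implementation: the `AlertThrottle` class, the sliding-window interpretation of the limit, measuring quiet time from the last alert sent, and the 600-second default quiet period are all assumptions. YEAR and MONTH are omitted because they are calendar-based.

```python
from datetime import datetime, timedelta

# Hypothetical model of the throttling settings; not GermainAPM code.
INTERVALS = {
    "MINUTE": timedelta(minutes=1),
    "HOUR": timedelta(hours=1),
    "DAY": timedelta(days=1),
    "WEEK": timedelta(weeks=1),
}

class AlertThrottle:
    def __init__(self, quiet_time_used, quiet_time_period,
                 limit_count, limit_interval, default_quiet_period=600):
        self.quiet_time_used = quiet_time_used
        # A period of 0 falls back to the (assumed) default quiet period.
        self.quiet_period = timedelta(seconds=quiet_time_period or default_quiet_period)
        self.limit_count = limit_count
        self.window = INTERVALS[limit_interval]
        self.sent = []  # timestamps of alerts that were actually sent

    def should_send(self, now):
        # Quiet time: suppress repeats that arrive too soon after the last sent alert.
        if self.quiet_time_used and self.sent and now - self.sent[-1] < self.quiet_period:
            return False
        # Execution limit: count alerts sent within the trailing interval.
        recent = [t for t in self.sent if now - t < self.window]
        if len(recent) >= self.limit_count:
            return False
        self.sent.append(now)
        return True

# quietTimePeriod = 300 seconds, at most 2 executions per HOUR.
throttle = AlertThrottle(True, 300, limit_count=2, limit_interval="HOUR")
start = datetime(2024, 1, 1, 9, 0)
for minutes in (0, 1, 10, 20, 61):
    sent = throttle.should_send(start + timedelta(minutes=minutes))
    print(f"breach at +{minutes} min: {'sent' if sent else 'suppressed'}")
```

In this run the breach one minute after the first alert is suppressed by the quiet time, and the breach at 20 minutes is suppressed because two alerts have already gone out within the hour.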



Note that this AlertAction is not restricted to the CPU Usage SLA and can be reused across other SLAs. For example, to also notify the same distribution of Memory Usage SLA breaches, click the + sign next to Triggered by SLAs and select as many SLAs as you would like to link with this Action in the displayed wizard:


Template Configuration

Configure your Alert Template under System > Alert Templates and select CPU Usage.




For more details on Template Configuration please see our dedicated documentation page: Alert Template Reference