Alert (email and SMS)
Overview
In GermainAPM, alerting is a core feature that enables your teams to react to SLA breaches in monitored environments, whether they are Technology SLAs, Business Process SLAs or User Experience SLAs. Alerts can be triggered by basic ingested data, correlated data, query mechanisms (see Automation - SQL execution) and HTTP actions (see Automation - Http(s) execution).
A few key concepts explain how alerting works. In a nutshell, once a data point is ingested and an SLA is evaluated for that data point, an action can be configured to trigger. One of these action types is an AlertAction, which notifies a target group of users (a Distribution) by sending a message defined by a template.
To configure alerting, follow these steps:
Define a distribution list
Create the AlertAction
Associate the AlertAction with an SLA
Create or customize a Template
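The relationship between these four pieces can be sketched in a few lines of Python. This is only an illustrative model of the flow described above, not GermainAPM code: the class names, the `ingest` function, the 90% threshold and the `ops@example.com` address are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Distribution:
    """Hypothetical stand-in for a distribution list."""
    name: str
    recipients: list[str]


@dataclass
class AlertAction:
    """Hypothetical stand-in for an AlertAction: a distribution plus a template."""
    name: str
    distribution: Distribution
    template: str  # message template using {placeholders}

    def fire(self, context: dict) -> list[tuple[str, str]]:
        # Render the message once, then address it to every recipient.
        message = self.template.format(**context)
        return [(recipient, message) for recipient in self.distribution.recipients]


@dataclass
class Sla:
    """Hypothetical stand-in for an SLA with the actions linked to it."""
    name: str
    is_breached: Callable[[float], bool]
    actions: list[AlertAction] = field(default_factory=list)


def ingest(value: float, sla: Sla) -> list[tuple[str, str]]:
    """Evaluate the SLA for one data point and fire its actions on a breach."""
    if not sla.is_breached(value):
        return []
    sent = []
    for action in sla.actions:
        sent.extend(action.fire({"sla": sla.name, "value": value}))
    return sent


ops = Distribution("ops-team", ["ops@example.com"])
action = AlertAction("infrastructure-alert-action", ops, "{sla} breached: value={value}")
cpu_sla = Sla("CPU Usage", lambda v: v > 90.0, [action])  # assumed threshold

print(ingest(95.0, cpu_sla))  # [('ops@example.com', 'CPU Usage breached: value=95.0')]
print(ingest(40.0, cpu_sla))  # []
```

The same `AlertAction` object could be appended to the `actions` list of several SLAs, which mirrors how one action can be associated with more than one SLA.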
Alert Configuration Reference
Workspace Field Label | Configuration Key | Description | Possible Values | Default Value |
---|---|---|---|---|
Name | name | Unique action name. | String | |
Distribution List | alertGroupName | Name of the alert distribution group for this alert. | String, matching an existing Distribution List | |
Run on Schedule | | *Calculated Field* If set to true, allows a schedule cron expression to be defined. | Boolean, true/false | false |
Triggered by SLAs | | *Calculated Field* Collection of SLAs this action is configured for. | | |
Quiet Time Used | quietTimeUsed | If set, alerts of this type will not fire again during the quiet time period after the initial occurrence. For more details, see our dedicated documentation page: Quiet Time | Boolean, true/false | true |
Quiet Time Period | quietTimePeriod | Quiet time period (in seconds) to use for this alert. If set to 0, the default quiet time period is used. | Integer (seconds) | |
Logging Enabled | loggingEnabled | | Boolean, true/false | |
Notify On Success | notifyOnSuccess | | Boolean, true/false | |
Notify On Failure | notifyOnFailure | | Boolean, true/false | |
Execution Count | limitCount | Used in combination with limitInterval to define an upper limit on how many times this action is executed. | Integer | |
Execution Interval | limitInterval | Used in combination with limitCount to define the interval during which the upper limit applies. | String, one of the following values: | |
| | alertTypeName | *Advanced Field* Default alert template to be used by this action. Alert templates are usually defined at the SLA level, but when a template is not defined or the defined template does not exist, the action falls back to the alert template defined here. | String, matching an existing Template Name | SLA |
| | assignToPrimaryMember | *Advanced Field* Flag that determines whether alerts are assigned to the associated distribution group or to its primary member. If checked, any alert logs created for this alert will be assigned to the email address of the primary user of the associated distribution group. | Boolean, true/false | |
| | references | *Advanced Field* References to configuration entries associated with this action. Allows other configuration items, such as credentials, to be exposed as a key/item pair that can then be used within the Alert Template, e.g. §{Credentials.username} (*option1*). Also allows direct key/value references to be defined that do not reference other configuration items (*option2*). | name - unique reference name | |
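To make the quiet time behaviour concrete, here is a minimal Python sketch of how `quietTimeUsed` and `quietTimePeriod` might interact. It is an illustration under stated assumptions, not the product's implementation: the `QuietTimeGate` class is hypothetical, the 300-second default is a placeholder (this page does not state the real default), and treating the end of the window as "allowed to fire again" is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: placeholder value; the real default quiet time is not given here.
DEFAULT_QUIET_TIME_SECONDS = 300


@dataclass
class QuietTimeGate:
    """Hypothetical model of the quietTimeUsed / quietTimePeriod settings."""
    quiet_time_used: bool = True
    quiet_time_period: int = 0  # seconds; 0 means "use the default period"
    _last_fired: Optional[float] = None

    def effective_period(self) -> int:
        # A period of 0 falls back to the default quiet time period.
        return self.quiet_time_period or DEFAULT_QUIET_TIME_SECONDS

    def should_fire(self, now: float) -> bool:
        if not self.quiet_time_used:
            return True
        within_quiet_time = (
            self._last_fired is not None
            and now - self._last_fired < self.effective_period()
        )
        if within_quiet_time:
            return False  # suppressed; the window is not extended
        self._last_fired = now
        return True


gate = QuietTimeGate(quiet_time_used=True, quiet_time_period=60)
print([gate.should_fire(t) for t in (0, 30, 59, 60, 90, 120)])
# [True, False, False, True, False, True]
```

In this model the quiet window is measured from the alert that actually fired, so suppressed occurrences do not push the window further out.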
Example - Email Alert on CPU Usage SLA breach
The steps below set up alerting for when the CPU Usage KPI on a monitored server breaches a defined SLA.
KPI and SLA
I'll start by confirming that I have a CPU Usage KPI by navigating to Analytics > KPIs.
Here I have confirmed I have the KPI configured and that there is also an SLA defined.
For more details on KPI and SLA concepts and configuration please see our dedicated documentation:
Alerting Action Configuration
Navigate to the Automation > Alert page to access the alert wizard, which takes care of most of the configuration. Click the + sign to open it.
Using the Configuration reference above, let's go through each step of the wizard.
1 Alert Step
Name: infrastructure-alert-action
Distribution:
Either select an existing distribution or create a new one by clicking the + sign and following the distribution creation wizard.
Click Next
2 Schedule Step
Run on Schedule: leave unselected
Click Next
3 SLA Step
SLA: Select the CPU Usage SLA that was previously configured or create a new SLA
Click Finish
We now have a fully configured alert that will be sent out once a CPU Usage SLA breach is identified.
Note that this AlertAction is not restricted to the CPU Usage SLA and can be reused across other SLAs. For example, to also notify the same distribution of Memory Usage SLA breaches, click the + sign next to Triggered by SLAs and, in the displayed wizard, select as many SLAs as you would like to link with this action.
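For reference, the values entered in the wizard can be written out against the configuration keys from the reference table. The Python sketch below is only a way of summarising and sanity-checking those values: the dictionary layout, the `validate` helper, the `ops-team` distribution name and the `triggered_by_slas` list are assumptions for illustration, not a GermainAPM file format or API.

```python
# Wizard values expressed with the configuration keys from the reference table.
alert_action = {
    "name": "infrastructure-alert-action",
    "alertGroupName": "ops-team",  # hypothetical distribution list name
    "quietTimeUsed": True,         # default value per the reference table
}

# Hypothetical representation of the "Triggered by SLAs" calculated field.
triggered_by_slas = ["CPU Usage", "Memory Usage"]


def validate(action: dict, known_distributions: set[str]) -> list[str]:
    """Return a list of problems found in an action definition (empty if none)."""
    problems = []
    name = action.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")
    if action.get("alertGroupName") not in known_distributions:
        problems.append("alertGroupName must match an existing Distribution List")
    return problems


print(validate(alert_action, {"ops-team"}))      # []
print(validate(alert_action, {"another-team"}))
# ['alertGroupName must match an existing Distribution List']
```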
Template Configuration
Configure your Alert Template under System > Alert Templates and select CPU Usage.
For more details on Template Configuration please see our dedicated documentation page: Alert Template Reference
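As a rough illustration of how a key/item reference such as §{Credentials.username} (see the references field in the configuration reference) could be substituted into a template, consider the Python sketch below. How GermainAPM actually resolves placeholders is not described on this page, so the `render` function, the regular expression, the sample template text and the `svc-monitor` value are all assumptions.

```python
import re

# Matches placeholders of the form §{ReferenceName.key}.
PLACEHOLDER = re.compile(r"§\{([A-Za-z_]\w*)\.([A-Za-z_]\w*)\}")


def render(template: str, references: dict[str, dict[str, str]]) -> str:
    """Replace each §{Name.key} with the matching value from `references`."""

    def resolve(match: re.Match) -> str:
        ref_name, key = match.group(1), match.group(2)
        try:
            return references[ref_name][key]
        except KeyError:
            # Leave unknown placeholders untouched so they are easy to spot.
            return match.group(0)

    return PLACEHOLDER.sub(resolve, template)


references = {"Credentials": {"username": "svc-monitor"}}  # hypothetical values
print(render("CPU Usage SLA breached; checked as §{Credentials.username}", references))
# CPU Usage SLA breached; checked as svc-monitor
```

Leaving unresolved placeholders in the output, rather than replacing them with an empty string, is a design choice of this sketch that makes a misspelled reference name visible in the rendered message.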