Test Alerting Flow with the API

Pre-requisites

The alerting integration must be configured correctly. The default installation provides the required ConfigMaps; if you encounter errors, refer to the production installation guide.

Firing a test alert

The Seldon Enterprise Platform alerting integration is built with a flexible architecture that lets users define and monitor SLAs, SLOs, and SLIs for their models and for the operation of the platform. Prometheus metrics exposed by Enterprise Platform and the models form the basis of the SLIs, and the defined alerts form the SLOs.

In this example, we’ll push a test alert manually through the API to show what happens when an SLO is breached.

  1. Using the API to fire a test alert

    You can use the API to fire a test alert. If you have configured Alertmanager correctly, the alert will then show up in the Enterprise Platform frontend.

    Obtain a token by following the API auth guide, then make an authorized curl request as shown below.

    curl http://<ENTERPRISE_PLATFORM_IP>/seldon-deploy/api/v1alpha1/alerting/test -X POST -H "Authorization: Bearer $TOKEN"

  2. Alert shows in notifications drawer

    The test alert will send a notification to the Enterprise Platform frontend within seconds, and will also inform any other receivers (PagerDuty/Opsgenie/Slack/Email) that you have configured.

    Unresolved notification

  3. Alert shows on alerts page

    Click on View All Firing Alerts in the alerts tray, and you will see the test alert along with any other currently firing alerts. This allows you to diagnose and fix any issues you may have missed while away from the Enterprise Platform UI.

    Alerts page

  4. Alert removed from alerts page

    The test alert will resolve after 1 minute and will no longer be visible on the alerts page once the page is refreshed.

  5. Resolution notification

    Once the alert is resolved, the frontend is notified of the resolution after a delay. With the default configuration this delay is 5 minutes; otherwise it depends on Alertmanager’s resolve_timeout.

    This test alert relies only on Alertmanager, whereas real alerts send resolution notifications as soon as Prometheus reports them as resolved.

    Resolved notification
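
The curl request in step 1 can also be issued from a script, which may be convenient when checking the alerting flow as part of an installation smoke test. The following is a minimal sketch using only the Python standard library. The endpoint path and the bearer-token header are taken from the curl command above; the function names `build_test_alert_request` and `fire_test_alert`, the `timeout` parameter, and the choice to return the HTTP status code are assumptions made for illustration and are not part of the Enterprise Platform API.

```python
import urllib.request


def build_test_alert_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that fires the test alert.

    `host` plays the role of <ENTERPRISE_PLATFORM_IP> in the curl command and
    `token` the role of $TOKEN. Both names are illustrative.
    """
    # Path and Authorization header mirror the curl command in step 1.
    url = f"http://{host}/seldon-deploy/api/v1alpha1/alerting/test"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def fire_test_alert(host: str, token: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code of the response.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses, so an
    invalid or expired token surfaces as an exception rather than a return value.
    """
    request = build_test_alert_request(host, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `fire_test_alert` with your platform address and a valid token should have the same effect as the curl command; the notification and alerts page behaviour described in steps 2 to 5 is unchanged. Splitting request construction from sending keeps the URL and header logic checkable without network access.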