Test Alerting Flow with the API

Prerequisites

The alerting integration must be configured. The default installation provides the ConfigMaps and setup required for this to work out of the box; otherwise, refer to the production installation guide, which covers the required setup.

Note: If your OIDC provider uses TLS with a self-signed certificate, Alertmanager cannot send notifications to the frontend. See the related documentation for details.
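Before firing a test alert, you may want to confirm that Alertmanager itself is running. The sketch below calls Alertmanager's `/-/ready` endpoint. `ALERTMANAGER_URL` and its default of `http://localhost:9093` (Alertmanager's default port, reachable for example through `kubectl port-forward`) are assumptions about how you expose the service, not part of the Seldon Deploy installation, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that Alertmanager reports itself ready before testing alerts.
# ALERTMANAGER_URL is a hypothetical variable; point it at wherever your
# Alertmanager is exposed (9093 is Alertmanager's default port).
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"

check_alertmanager_ready() {
  local code
  # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status.
  # If the connection fails, curl prints 000 as the status code.
  code="$(curl -s -o /dev/null -w '%{http_code}' "${ALERTMANAGER_URL}/-/ready")"
  if [ "$code" = "200" ]; then
    echo "Alertmanager is ready"
    return 0
  fi
  echo "Alertmanager not ready (HTTP status: ${code})" >&2
  return 1
}
```

Running `check_alertmanager_ready` prints a short status line and returns a non-zero exit code if Alertmanager is unreachable or not yet ready.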

Firing a test alert

The Seldon Deploy alerting integration has a flexible architecture that allows Seldon users to define and monitor SLAs, SLOs, and SLIs for their models and for the operation of the platform. Prometheus metrics exposed by Deploy and by the models form the basis of the SLIs, and the defined alerts form the SLOs.
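Because real alerts are evaluated by Prometheus over these metrics, one way to see which alerting rules are currently firing is Prometheus's HTTP query API together with its built-in `ALERTS` series. This is a minimal sketch: `PROMETHEUS_URL` and its default of `http://localhost:9090` (Prometheus's default port) are assumptions about how Prometheus is exposed in your cluster, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list Prometheus alerting rules that are currently firing.
# PROMETHEUS_URL is a hypothetical variable; 9090 is Prometheus's default port,
# but how Prometheus is exposed in your installation may differ.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

query_firing_alerts() {
  # ALERTS is the synthetic series Prometheus maintains for alerting rules.
  # -G sends the --data-urlencode value as a URL query parameter on a GET.
  curl -s -G "${PROMETHEUS_URL}/api/v1/query" \
    --data-urlencode 'query=ALERTS{alertstate="firing"}'
}
```

With `jq` installed, `query_firing_alerts | jq '.data.result[].metric.alertname'` prints the name of each firing rule. The test alert used in this guide is pushed straight to Alertmanager rather than produced by a Prometheus rule, so it is not expected to appear in this output.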

This example shows a very basic scenario: manually pushing a test alert through the API to build an intuition for what normally happens when an SLO is breached.

  1. Using the API to fire a test alert

    You can use the API to fire a test alert. If Alertmanager is configured correctly, the alert then shows up in the Deploy frontend.

    Make an authorized curl request as shown below, using a token obtained as described in the API authentication guide.

    curl http://<DEPLOY_IP>/seldon-deploy/api/v1alpha1/alerting/test -X POST -H "Authorization: Bearer $TOKEN"

  2. Alert shows in notifications drawer

    The test alert sends a notification to the Deploy frontend within seconds and also notifies any other receivers you have configured (PagerDuty, Opsgenie, Slack, email).

    Unresolved notification

  3. Alert shows in alerts page

    Click View All Firing Alerts in the alerts tray to see the test alert along with any other currently firing alerts. This page helps you diagnose and fix issues you may have missed while away from the Deploy UI.

    Alerts page

  4. Alert removed from alerts page

    The test alert resolves after 1 minute and no longer appears on the alerts page once the page is refreshed.

  5. Resolution notification

    After the alert resolves, the frontend is notified of the resolution after a delay. With the default configuration this delay is 5 minutes; otherwise it depends on Alertmanager's resolve_timeout setting.

    This delay applies because the test alert relies only on Alertmanager. Real alerts send resolution notifications as soon as Prometheus reports them as resolved.

    Resolved notification
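The request from step 1 can also be wrapped in a small script that reports the HTTP status and then lists the alerts Alertmanager currently holds, which is one way to confirm that the test alert arrived before it resolves. This is a minimal sketch under stated assumptions: `DEPLOY_IP` and `TOKEN` are assumed to be set as in step 1, `ALERTMANAGER_URL` is a hypothetical variable for an Alertmanager instance you have exposed yourself (for example via port-forwarding), and the function names are illustrative. The exact success code Deploy returns is not documented here, so any 2xx status is treated as success.

```shell
#!/usr/bin/env bash
# Sketch: fire the Seldon Deploy test alert and inspect it in Alertmanager.
# DEPLOY_IP and TOKEN are assumed to be set (TOKEN obtained as described in
# the API authentication guide). ALERTMANAGER_URL is a hypothetical variable
# pointing at an exposed Alertmanager (9093 is its default port).
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"

fire_test_alert() {
  local code
  # Same request as step 1, but print only the HTTP status code so that
  # authentication or configuration failures are easy to spot.
  code="$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    -H "Authorization: Bearer ${TOKEN}" \
    "http://${DEPLOY_IP}/seldon-deploy/api/v1alpha1/alerting/test")"
  case "$code" in
    2??) echo "test alert sent (HTTP ${code})" ;;
    *) echo "request failed (HTTP ${code})" >&2; return 1 ;;
  esac
}

list_active_alerts() {
  # Alertmanager's v2 API returns the alerts it currently holds as JSON.
  curl -s "${ALERTMANAGER_URL}/api/v2/alerts"
}
```

For example, run `fire_test_alert && list_active_alerts` within a minute of firing. If `jq` is installed, `list_active_alerts | jq '.[].labels.alertname'` prints only the alert names.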