Distributions Monitoring

Distributions monitoring lets you view the statistics and distributions of the features and predictions of your model over any given time range. It also lets you compare model predictions across different feature combinations, cohorts and/or time slices. It is a vital part of the model monitoring cycle, because it shows whether the deployed model has the desired prediction characteristics at different times and for different cohorts.

This demo uses a model trained to predict high or low income based on demographic features from a 1996 US census. In this demo we observe the prediction and feature distributions of live predictions made with this model, in the following steps:

  • Register an income classifier model with the relevant predictions schema

  • Launch a Seldon Core deployment with the income classifier model

  • Make predictions using REST requests to the model deployment

  • Observe the feature distributions of the live predictions

  • Filter distributions by time, or by prediction- and feature-level filters

Note

This demo requires the request logger to connect to Seldon Deploy in order to fetch the model-level predictions schema, which in turn requires specific request logger configuration. The feature is supported with several of the protocols available for deployments, such as the Seldon, TensorFlow and KFServing V2 protocols. It is not yet supported for JSON data, string data, bytes payloads or multi-node graph use cases.

Register an income classifier model

Register the income classifier SKLearn model using the URI below.

gs://seldon-models/sklearn/income/model-0.23.2

register-model

Configure predictions schema

Edit the model metadata to update the predictions schema for the model. The predictions schema is a generic schema structure for machine learning model predictions. It defines the feature inputs and output targets of the model's predictions. Use the income classifier model predictions schema to edit and save the model-level metadata. Learn more about the predictions schema at the ML Predictions Schema open source repository.

configure-predictions-schema

Launch a Seldon Core deployment

Deploy the income classifier model from the catalogue into an appropriate namespace.

launch-deployment

Make predictions using the model deployment

Model predictions can be made in the appropriate protocol. This demo uses the Seldon protocol. A single prediction payload looks like this:

{
  "data": {
    "names": [
      "Age",
      "Workclass",
      "Education",
      "Marital Status",
      "Occupation",
      "Relationship",
      "Race",
      "Sex",
      "Capital Gain",
      "Capital Loss",
      "Hours per week",
      "Country"
    ],
    "ndarray": [[53, 4, 0, 2, 8, 4, 2, 0, 0, 0, 60, 9]]
  }
}
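As a minimal sketch, a payload like the one above can be assembled from one row of feature values. The helper name `build_payload` is hypothetical and not part of any Seldon tooling, and the row is assumed to be a space-separated list of the 12 encoded feature values in the same order as the names.

```shell
# Feature names copied from the example payload above.
NAMES='["Age","Workclass","Education","Marital Status","Occupation","Relationship","Race","Sex","Capital Gain","Capital Loss","Hours per week","Country"]'

# build_payload (hypothetical helper): takes one row of space-separated
# feature values and prints a Seldon protocol request body for it.
build_payload() {
    # Turn the space-separated row into a comma-separated list.
    values=$(printf '%s' "$1" | tr ' ' ',')
    printf '{"data":{"names":%s,"ndarray":[[%s]]}}\n' "$NAMES" "$values"
}

# Reproduces the example payload on a single line.
build_payload "53 4 0 2 8 4 2 0 0 0 60 9"
```

Printing the body before sending it makes it easy to check that the JSON is well formed, for example by piping it through a JSON validator.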

Distributions monitoring is especially useful for keeping track of predictions when a model makes thousands of predictions in a real-world scenario. To simulate such a use case, make multiple predictions over time in the Seldon protocol request format using the predictions data CSV file and the following shell script, which makes around 32560 predictions with an interval of 5 seconds between requests. Note that this requires sufficient payload logging infrastructure.

# Look up the external IP of the Istio ingress gateway.
CLUSTER_IP=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
DEPLOYMENT_NAME=income-classifier
DEPLOYMENT_NAMESPACE=seldon
INTERVAL_SECONDS=5

# Send one prediction request per line of the data file.
# Any spaces in the line are replaced with commas to form the ndarray row.
while IFS= read -r line; do
    curl -k -H "Content-Type: application/json" \
        "http://${CLUSTER_IP}/seldon/${DEPLOYMENT_NAMESPACE}/${DEPLOYMENT_NAME}/api/v0.1/predictions" \
        -d '{"data":{"names":["Age","Workclass","Education","Marital Status","Occupation","Relationship","Race","Sex","Capital Gain","Capital Loss","Hours per week","Country"],"ndarray":[['"${line// /,}"']]}}'
    sleep "${INTERVAL_SECONDS}"
done <prediction-data.csv
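Before starting the loop it helps to know roughly how long it will run. The sketch below only multiplies the request count by the sleep interval, so it ignores the time each request itself takes and gives a lower bound. The helper name `estimate_runtime` is hypothetical, and the request count is the approximate figure quoted above.

```shell
REQUESTS=32560        # approximate number of predictions quoted above
INTERVAL_SECONDS=5    # same interval as in the script above

# estimate_runtime (hypothetical helper): prints the total sleep time
# for COUNT requests spaced INTERVAL seconds apart, as hours and minutes.
estimate_runtime() {
    total=$(( $1 * $2 ))
    printf '%dh %dm\n' $(( total / 3600 )) $(( (total % 3600) / 60 ))
}

estimate_runtime "$REQUESTS" "$INTERVAL_SECONDS"   # prints "45h 13m"
```

With the default values the simulation therefore runs for almost two days. Lowering `INTERVAL_SECONDS` shortens the run but increases the load on the payload logging infrastructure.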

Observe predictions and feature distributions

Select the income classifier deployment and go to the monitor section to view the predictions and feature distributions.

observe-distributions

Filter distributions by time or feature level filters

Filter distributions by time, or by prediction- and feature-level filters, to compare different cohorts and for further analysis. For example, let’s look at the predictions for all individuals in the Age group 25-50, filter additionally by Education (High-School Grads and Dropouts only), and see how the average prediction frequency changes for this cohort.

filter-distributions
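As a rough local sanity check on the Age filter, the number of simulated requests that fall into the 25-50 cohort can be counted directly from the input file. This is only a sketch under assumptions: `count_age_cohort` is a hypothetical helper, the file is assumed to be space-separated with Age as the first column and no header row, and both bounds are assumed to be inclusive, which may differ from how the UI applies the filter.

```shell
# count_age_cohort (hypothetical helper) for a local sanity check.
# Usage: count_age_cohort FILE MIN_AGE MAX_AGE
count_age_cohort() {
    awk -v lo="$2" -v hi="$3" '
        NF > 0 { total++ }                                   # count non-empty rows
        NF > 0 && $1 + 0 >= lo + 0 && $1 + 0 <= hi + 0 { in_cohort++ }
        END { printf "%d of %d rows\n", in_cohort, total }
    ' "$1"
}

# Only run against the demo file when it is present.
if [ -f prediction-data.csv ]; then
    count_age_cohort prediction-data.csv 25 50
fi
```

The Education filter is left out here because the encoded values for each education level are not given in this demo.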

Configuring parameters

Distribution parameters let you configure your charts for further analysis. For example, let’s look at the charts in the Age group and change the Histogram interval to 11 and the Number of time buckets to 30 to see how the charts change.

configure-parameters
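To get a feel for what a Histogram interval of 11 means for the Age feature, the sketch below maps a few ages to fixed-width buckets. It assumes that buckets start at multiples of the interval, which is an assumption about the chart and not documented behaviour, and `histogram_bucket` is a hypothetical helper.

```shell
# histogram_bucket (hypothetical helper): prints the lower edge of the
# bucket that a non-negative integer VALUE falls into, assuming buckets
# start at multiples of INTERVAL.
# Usage: histogram_bucket VALUE INTERVAL
histogram_bucket() {
    echo $(( $1 / $2 * $2 ))
}

INTERVAL=11
for age in 25 33 44 53; do
    lower=$(histogram_bucket "$age" "$INTERVAL")
    printf 'age %d -> bucket [%d, %d)\n' "$age" "$lower" $(( lower + INTERVAL ))
done
```

Under this assumption, ages 44 and 53 land in the same bucket, so a wider interval gives a smoother but coarser histogram.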