Model Drift Detection

When ML models run in production, even minor changes in the input data distribution can degrade prediction quality, so it is important to track this drift. This demo is based on the mixed-type tabular data drift detection method from the Alibi Detect project.

In this demo we will:

  • Launch an income classifier model based on demographic features from a 1996 US census. Each data instance contains a person’s characteristics, such as age, marital status or education, while the label represents whether the person makes more or less than $50k per year.

  • Set up a mixed-type tabular data drift detector for this particular model.

  • Make a batch of predictions over time.

  • Track the drift metrics in the Monitoring dashboard.

Important

This demo requires Knative to be installed on the cluster, as the drift detector is deployed as a Knative service (kservice). See the Knative installation instructions for the required setup.

Register an income classifier model

Register the income classifier SKLearn model using the URI below.

gs://seldon-models/sklearn/income/model-0.23.2


Configure predictions schema

Edit the model metadata to update the predictions schema for the model. The predictions schema is a generic schema structure for machine learning model predictions. It is a definition of feature inputs and output targets from the model prediction. Use the income classifier model predictions schema to edit and save the model level metadata. Learn more about the predictions schema at the ML Predictions Schema open source repository.


Launch a Seldon Core deployment

Deploy the income classifier model from the catalogue into an appropriate namespace.


  1. In the deployment creation wizard, enter a name for your new deployment.

  2. Select the namespace you would like the deployment to reside in (e.g. seldon).

  3. From the protocol dropdown menu, select Seldon and click Next.

  4. For the deployment details, enter the following values, then click Next:

  5. Skip the remaining steps, then click Launch.

Add A Drift Detector

From the deployment overview page, select your deployment to enter the deployment dashboard. Inside the deployment dashboard, add a drift detector by clicking the Create button within the Drift Detection widget.

Set up a detector

Enter the following parameters in the modal popup that appears to configure the detector:

  • Model Name: income-drift-detector.

  • Model URI: (for public Google buckets, the secret field is optional)

    gs://seldon-models/alibi-detect/cd/tabular/income-0_7_0/
    
  • Reply URL: (By default, the Reply URL is set as seldon-request-logger in the logger’s default namespace. If you are using a custom installation, please change this parameter according to your installation.)

    http://seldon-request-logger.seldon-logs
    
  • Batch Size: 200.

  • Protocol: Seldon Inference.

  • HTTP Port: 8080.

Then, click Create Drift-Detector to complete the setup.
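
To make the Batch Size parameter concrete, the sketch below shows one way a detector could accumulate incoming rows and only run a drift test once 200 of them are available. It is a minimal illustration under that assumption, not the detector server's actual code; `BatchBuffer` and `on_batch` are hypothetical names introduced here.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


class BatchBuffer:
    """Hypothetical helper: collects rows and emits them in fixed-size batches."""

    def __init__(self, batch_size: int, on_batch: Callable[[List[Row]], None]) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size
        self.on_batch = on_batch  # called once per full batch, e.g. to run a drift test
        self._rows: List[Row] = []

    def add(self, rows: Sequence[Row]) -> int:
        """Add rows and return how many full batches were emitted."""
        self._rows.extend(rows)
        emitted = 0
        while len(self._rows) >= self.batch_size:
            batch = self._rows[:self.batch_size]
            self._rows = self._rows[self.batch_size:]
            self.on_batch(batch)
            emitted += 1
        return emitted

    @property
    def pending(self) -> int:
        """Rows received so far that do not yet fill a batch."""
        return len(self._rows)


# Usage: with a batch size of 200, 150 rows trigger nothing,
# and the next 250 rows complete two batches.
sizes: List[int] = []
buffer = BatchBuffer(200, lambda batch: sizes.append(len(batch)))
buffer.add([[0.0]] * 150)
buffer.add([[0.0]] * 250)
print(sizes, buffer.pending)
```

Under this assumption, one drift result is produced per 200 logged instances, regardless of how the prediction requests themselves are grouped.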

Run Batch Predictions

  1. From the deployment dashboard, click on Batch Jobs. Run a batch prediction job using the text predictions data file in ndarray payload format. This file contains 4000 individual data points and, given the drift detector's batch size, drift is evaluated once for every batch of 200 points. The first half of the data follows the same distribution as the reference data the drift detector was configured with, while the second half follows a different distribution, so drift should be observed there.

  2. Upload the data to a bucket store of your choice. This demo uses MinIO and stores the data at the bucket path s3://detect/income-batch-data/data.txt. Do not forget to configure your storage access credentials secret; this demo uses seldon-rclone-secret. Refer to the batch request demo for an example of how this can be done via the MinIO browser.

  3. Run a batch job with the configuration below. This runs an offline job that makes a prediction request for a batch of 200 rows from the file at s3://detect/income-batch-data/data.txt every 5 seconds:

    Input Data Location: s3://detect/income-batch-data/data.txt
    Output Data Location: s3://detect/income-batch-data/output.txt
    Number of Workers: 1
    Number of Retries: 3
    Batch Size: 200
    Minimum Batch Wait Interval (sec): 5
    Method: Predict
    Transport Protocol: REST
    Input Data Type: ndarray
    Object Store Secret Name: seldon-rclone-secret
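
As a quick check of what to expect from this configuration, the batch arithmetic can be sketched in Python. The helper `batch_ranges` is a hypothetical name, and the sketch assumes the drifted rows begin exactly at the midpoint of the 4000-row file.

```python
from typing import List, Tuple


def batch_ranges(total_rows: int, batch_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) row indices, end exclusive, for each batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, min(start + batch_size, total_rows))
        for start in range(0, total_rows, batch_size)
    ]


TOTAL_ROWS = 4000              # rows in the data file, as stated in step 1
BATCH_SIZE = 200               # batch size used by the job and the detector
DRIFT_START = TOTAL_ROWS // 2  # assumption: drifted rows begin at the midpoint

ranges = batch_ranges(TOTAL_ROWS, BATCH_SIZE)
# "O" = batch expected to match the reference data, "X" = batch expected to drift.
expected = ["X" if start >= DRIFT_START else "O" for start, _ in ranges]

print(len(ranges))        # 20
print("".join(expected))  # OOOOOOOOOOXXXXXXXXXX
```

So the job should yield 20 drift results, with roughly the first 10 clean and the last 10 drifting.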
    


Monitor Drift Detection Metrics

Under the Monitor section of your deployment navigation, on the Drift Detection tab, you can see a timeline of drift detection metrics. The dashboard shows the main metrics (p-values, thresholds and distance scores) at the feature level. Other drift detection techniques can also show these metrics at the batch level. You will notice that the early batches are not drifting, marked by the O symbol, while the later half of the batches start to drift, marked by the X symbol.
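
To illustrate how a per-feature distance, p-value and threshold relate, here is a minimal pure-Python sketch of a two-sample Kolmogorov-Smirnov test on one numerical feature, with a Bonferroni-corrected threshold. It is a simplified stand-in, not the Alibi Detect implementation: it covers numerical features only, uses an asymptotic p-value approximation, and the names `ks_statistic`, `ks_p_value` and `feature_drift` as well as the value `n_features=12` are assumptions made for the example.

```python
import math
import random
from typing import Dict, Sequence


def ks_statistic(ref: Sequence[float], test: Sequence[float]) -> float:
    """Largest gap between the two empirical CDFs (the distance score)."""
    a, b = sorted(ref), sorted(test)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:  # step both CDFs past x, handling ties
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_p_value(d: float, n: int, m: int) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov series."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:  # the p-value is indistinguishable from 1 in this range
        return 1.0
    total = 0.0
    for k in range(1, 101):
        term = (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) < 1e-12:
            break
    return min(1.0, max(0.0, 2.0 * total))


def feature_drift(ref: Sequence[float], test: Sequence[float],
                  p_val: float = 0.05, n_features: int = 1) -> Dict[str, object]:
    """Flag drift when the p-value falls below the Bonferroni-corrected threshold."""
    distance = ks_statistic(ref, test)
    p = ks_p_value(distance, len(ref), len(test))
    threshold = p_val / n_features
    return {"distance": distance, "p_val": p,
            "threshold": threshold, "is_drift": p < threshold}


# Usage: a batch of 200 values whose mean has shifted by one standard deviation.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(200)]
print(feature_drift(reference, shifted, n_features=12))
```

A feature whose p-value drops below the threshold corresponds to an X on the timeline, and a larger distance score indicates a bigger gap between the batch and the reference distribution.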


Troubleshooting

If you experience issues with this demo, see the troubleshooting docs as well as the Knative and Elasticsearch sections.