NVIDIA Triton Server and Alibi Explanations

In this demo we will deploy an image classification model on NVIDIA Triton with GPUs and run explanations using Seldon Alibi. The demo uses the KFServing V2 protocol for both the prediction and explanation payloads. Learn more about the V2 protocol in the Predict Protocol - Version 2 git repository.
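For reference, a V2 protocol inference request wraps each input tensor in a JSON body with a name, shape, datatype, and flattened data. The sketch below illustrates that structure in Python; the tensor name `input_1` and the NHWC shape are placeholders and must match the signature of the model you deploy.

```python
import numpy as np

# A single CIFAR10-sized image; random values stand in for real pixel data.
image = np.random.rand(1, 32, 32, 3).astype(np.float32)

# Minimal V2 protocol request body. The tensor name and shape are assumed
# placeholders; they must match the deployed model's input signature.
v2_request = {
    "inputs": [
        {
            "name": "input_1",           # assumed input tensor name
            "shape": list(image.shape),  # [1, 32, 32, 3]
            "datatype": "FP32",
            "data": image.flatten().tolist(),
        }
    ]
}
```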

Deploy an image classifier model

For this example, choose tfcifar10 as the deployment name and select the KFServing protocol option.

(Screenshot: deployment wizard, name and protocol step)

For this demo we have created several image classification models trained on the CIFAR10 dataset:

  • TensorFlow ResNet32 model: gs://seldon-models/triton/tf_cifar10

  • ONNX model: gs://seldon-models/triton/onnx_cifar10

  • PyTorch Torchscript model: gs://seldon-models/triton/pytorch_cifar10

Choose one of these and select Triton as the server. Set the model name to match the name under which the model is saved in the bucket so that Triton can load it.

(Screenshot: deployment wizard, model artifact and server step)

Configure NVIDIA GPU resources

Next, on the resources screen add a GPU request/limit of 1, assuming GPUs are available on your cluster, and ensure you have provided enough memory for the model. To determine these settings we recommend using the NVIDIA Model Analyzer.
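For reference, these wizard settings translate into a Kubernetes container resources block, sketched below as a Python dict for illustration; the memory value is an assumed placeholder, not a tuned figure.

```python
# Illustrative container resources for the Triton server pod.
# "nvidia.com/gpu" is the standard Kubernetes GPU resource name; the
# memory value below is an assumed placeholder, not a measured requirement.
resources = {
    "requests": {"memory": "4Gi", "nvidia.com/gpu": "1"},
    "limits": {"memory": "4Gi", "nvidia.com/gpu": "1"},
}
```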

(Screenshot: deployment wizard, resource configuration step)

Make model predictions

When the deployment is ready you can test it with images. The request payload will depend on which of the models above you launched.
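As a sketch, the snippet below sends one CIFAR10 test image to the TensorFlow model over the V2 protocol. The ingress host, namespace, internal model name (cifar10), input tensor name, and the [0, 1] pixel scaling are all assumptions; check the deployment's endpoint details and model signature before using it.

```python
import numpy as np
import requests
from tensorflow.keras.datasets import cifar10

# Load one CIFAR10 test image; scaling to [0, 1] is an assumption about
# the model's expected preprocessing.
(_, _), (x_test, _) = cifar10.load_data()
image = (x_test[0:1] / 255.0).astype(np.float32)

# Assumed Seldon V2 endpoint pattern; confirm the exact URL in Seldon Deploy.
url = "http://<ingress-host>/seldon/<namespace>/tfcifar10/v2/models/cifar10/infer"

payload = {
    "inputs": [
        {
            "name": "input_1",           # assumed input tensor name
            "shape": list(image.shape),  # [1, 32, 32, 3]
            "datatype": "FP32",
            "data": image.flatten().tolist(),
        }
    ]
}

response = requests.post(url, json=payload)
print(response.json()["outputs"][0]["data"])  # class scores for the image
```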

Configure an Alibi Anchor Images Explainer

The explanation will offer insight into why an image was assigned its predicted class. It uses the Anchors technique, which identifies the parts of the input (for images, groups of superpixels) that are sufficient for the model to keep making the same prediction. Create a model explainer using the URI below for the saved explainer.

gs://seldon-models/tfserving/cifar10/explainer-py36-0.5.2

(Screenshot: create explainer)
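For context, the saved artifact is an Alibi AnchorImage explainer. The sketch below shows how such an explainer is constructed and run locally; the predictor is a stand-in function and the segmentation parameters are illustrative defaults, not the values used to build the saved explainer.

```python
import numpy as np
from alibi.explainers import AnchorImage

# Stand-in predictor: any function mapping a batch of images to class
# probabilities will do, e.g. a wrapper around the deployed model endpoint.
def predict_fn(images: np.ndarray) -> np.ndarray:
    return np.random.rand(images.shape[0], 10)  # placeholder probabilities

explainer = AnchorImage(
    predict_fn,
    image_shape=(32, 32, 3),
    segmentation_fn="slic",  # superpixel segmentation of the image
    segmentation_kwargs={"n_segments": 15, "compactness": 20, "sigma": 0.5},
)

image = np.random.rand(32, 32, 3).astype(np.float32)  # placeholder image
explanation = explainer.explain(image, threshold=0.95)
print(explanation.anchor.shape)  # superpixels that anchor the prediction
```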

Get an explanation for a single prediction

View all requests and then click the Alibi icon to run an explanation request. Note that the explanation request uses the same KFServing V2 protocol payload format.

(Screenshot: explanation request)
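An explanation can also be requested directly over HTTP with the same V2 payload used for predictions. The explainer endpoint pattern below is an assumption based on Seldon's routing conventions, so verify it against your deployment.

```python
import numpy as np
import requests

# Same V2 payload structure as the prediction request; a random image
# stands in for real pixel data here.
image = np.random.rand(1, 32, 32, 3).astype(np.float32)
payload = {
    "inputs": [
        {
            "name": "input_1",           # assumed input tensor name
            "shape": list(image.shape),
            "datatype": "FP32",
            "data": image.flatten().tolist(),
        }
    ]
}

# Assumed explainer endpoint pattern; confirm the exact URL in Seldon Deploy.
explain_url = (
    "http://<ingress-host>/seldon/<namespace>/tfcifar10-explainer"
    "/default/v2/models/cifar10/explain"
)

response = requests.post(explain_url, json=payload)
print(response.json())  # explanation payload, e.g. anchor metadata
```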