Batch Prediction Requests¶
Pre-requisites¶
MinIO should already be installed with Seldon Deploy. The MinIO browser should be exposed on /minio/ (note the trailing forward slash).
For trials, the credentials will by default be the same as the Deploy login, with MinIO using the email as its Access Key and the password as its Secret Key.
Note that other cloud storage services, such as S3 and GCS, can be used instead, provided the corresponding secret files are configured.
On a production cluster, the namespace needs to have been set up with a service account; see the Argo install documentation for details.
We will:
Deploy a pipeline with a pretrained sklearn iris model
Run a batch job to get predictions
Check the output
Deploy Model¶
From the Overview page, open the Deployment Creation Wizard by clicking on + Create new deployment near the top right of the window.
Deployment Details¶
Choose a name for the deployment and which namespace you want it to be in, e.g. seldon.
Set the Type as shown below:
Name: batch-demo
Namespace: seldon
Type: Seldon ML Pipeline
Default Predictor¶
Set SciKit Learn as the Runtime and use the following model URI:
gs://seldon-models/mlserver/iris
The Model Project can be left as default, and the Storage Secret field can be left blank in this setup.
Additional Creation Wizard Steps¶
Complete the remaining steps in the Deployment Creation Wizard by clicking Next with default values.
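Once the wizard finishes and the deployment reports as available, you can optionally send a single request to confirm the model responds before running a batch job. The sketch below uses Python's requests library; the host and path prefix are placeholders, since the exact inference URL depends on your ingress configuration, while /v2/models/<model-name>/infer is the standard V2 inference protocol path.
import requests

# Placeholder URL: replace the host and prefix with the inference endpoint
# exposed by your cluster's ingress for the batch-demo deployment.
URL = "http://<deploy-host>/<ingress-prefix>/v2/models/<model-name>/infer"

payload = {
    "inputs": [{
        "name": "predict",
        "data": [0.38, 0.006, 0.61, 0.39],  # one row of 4 iris features
        "datatype": "FP64",
        "shape": [1, 4],
    }]
}

resp = requests.post(URL, json=payload)
resp.raise_for_status()
print(resp.json())  # expect a single "predict" output with an INT64 class label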
Setup Input Data¶
Download the input data file. Note that this input file differs from the input data used for seldon-core-v1 batch prediction requests.
The first few lines of the input file ‘input-data-v2.txt’ should show the following format:
{"inputs":[{"name":"predict","data":[0.38606369295833043,0.006894049558299753,0.6104082981607108,0.3958954239450676],"datatype":"FP64","shape":[1,4]}]}
{"inputs":[{"name":"predict","data":[0.7223678219956075,0.608521741883582,0.8596266157372878,0.20041864827775757],"datatype":"FP64","shape":[1,4]}]}
{"inputs":[{"name":"predict","data":[0.8659159480026418,0.2383384971368594,0.7743518759043038,0.8748919374334038],"datatype":"FP64","shape":[1,4]}]}
Go to the MinIO browser and use the button in the bottom-right to create a bucket. Call it data.
Again from the bottom-right, choose to upload the input-data-v2.txt file to the data bucket.
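If you prefer to do this programmatically rather than through the browser, a minimal sketch using the minio Python client is shown below; the endpoint and credentials are assumptions and should match your MinIO installation (for trials, the Deploy login email and password).
from minio import Minio

# Assumed endpoint and credentials; adjust for your MinIO installation.
client = Minio("minio.example.com:9000",
               access_key="you@example.com",
               secret_key="your-password",
               secure=False)

# Create the bucket if it does not already exist, then upload the input file.
if not client.bucket_exists("data"):
    client.make_bucket("data")
client.fput_object("data", "input-data-v2.txt", "input-data-v2.txt")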
Run a Batch Job¶
Click on the tile for your new pipeline, batch-demo, in the Overview page of the Deploy UI.
Go to the Batch Jobs section for this deployment by clicking on the ‘Batch Jobs’ button in the sidebar on the left.
Click on the Create your first job button, enter the following details, and click Submit:
Input Data Location: minio://data/input-data-v2.txt
Output Data Location: minio://data/output-data-{{workflow.name}}.txt
Number of Workers: 5
Number of Retries: 3
Batch Size: 10
Minimum Batch Wait Interval (sec): 0
Method: Predict
Transport Protocol: REST
Input Data Type: V2 Raw
Object Store Secret Name: minio-bucket-envvars

Note
Here minio-bucket-envvars is a pre-created secret in the same namespace as the model, containing environment variables.
Give the job a couple of minutes to complete, then refresh the page to see the status.
In MinIO you should now see an output file in the data bucket.
If you open that file you should see contents such as:
{"model_name":"","outputs":[{"data":[0],"name":"predict","shape":[1],"datatype":"INT64"}],"parameters":{"batch_index":0}}
{"model_name":"","outputs":[{"data":[0],"name":"predict","shape":[1],"datatype":"INT64"}],"parameters":{"batch_index":2}}
{"model_name":"","outputs":[{"data":[1],"name":"predict","shape":[1],"datatype":"INT64"}],"parameters":{"batch_index":4}}
{"model_name":"","outputs":[{"data":[0],"name":"predict","shape":[1],"datatype":"INT64"}],"parameters":{"batch_index":1}}
If not, see the Argo section for troubleshooting.
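Because multiple workers process the input concurrently, the output lines are not guaranteed to appear in input order; the batch_index parameter records each response's original position. The Python sketch below reads the output file downloaded from MinIO and restores the input order; the local file name is a placeholder.
import json

# Load every response line and sort by the batch_index recorded by the processor.
with open("output-data.txt") as f:  # use the actual output file name from MinIO
    responses = [json.loads(line) for line in f if line.strip()]

responses.sort(key=lambda r: r["parameters"]["batch_index"])

for r in responses:
    prediction = r["outputs"][0]["data"][0]
    print(r["parameters"]["batch_index"], prediction)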
Micro batching¶
You can specify a batch-size parameter, which groups multiple predictions into a single request.
This allows you to take advantage of the higher performance that batching provides for some models, and reduces networking overhead.
The response will be split back into multiple single-prediction responses, so that the output file looks identical to the one produced with a batch size of 1.
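To illustrate the idea, the sketch below combines several single-row V2 requests into one batched request by concatenating along the first dimension of the shape, and splits a batched response back into single-prediction responses. This is only a conceptual illustration of what the batch processor does internally, not its actual implementation.
import json

def combine(requests):
    # Merge N single-row requests into one request with shape [N, 4].
    data = []
    for req in requests:
        data.extend(req["inputs"][0]["data"])
    n = len(requests)
    return {"inputs": [{"name": "predict", "data": data,
                        "datatype": "FP64", "shape": [n, 4]}]}

def split(response, n):
    # Split a batched response back into N single-prediction responses.
    out = response["outputs"][0]
    return [{"model_name": response.get("model_name", ""),
             "outputs": [{"name": out["name"], "data": [out["data"][i]],
                          "shape": [1], "datatype": out["datatype"]}]}
            for i in range(n)]

# Example: batch the first two requests from the input file shown above.
reqs = [json.loads(line) for line in open("input-data-v2.txt")][:2]
print(json.dumps(combine(reqs)))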