PostgreSQL Persistence for Model Metadata

Important

Before starting the installation procedure, please download installation resources as explained here and make sure that all prerequisites are satisfied.

This page also assumes that the main Seldon Core and Seldon Deploy components are installed.

Warning

PostgreSQL is an external component outside of the main Seldon stack. Therefore, it is the cluster administrator’s responsibility to administer and manage the PostgreSQL instance used by Seldon.

Seldon Deploy uses PostgreSQL to persist model metadata.

Seldon Deploy Configuration

The PostgreSQL dependency in Seldon Deploy is enabled or disabled with the Helm variable metadata.pg.enabled. If it is set to false, Seldon Deploy will not attempt to connect to a PostgreSQL database, and all model metadata functionality will be unavailable. If metadata.pg.enabled is true, Seldon Deploy expects a Kubernetes secret named metadata-postgres to be present in the namespace where Seldon Deploy is running. This secret must contain the information needed to connect to a PostgreSQL database. The structure of the secret is shown below. The values are descriptive placeholders; in an actual manifest, values under data must be base64-encoded:

kind: Secret
apiVersion: v1
data:
  dbname: the_name_of_the_database_to_use_for_model_metadata
  host: the_database_host
  user: the_database_user_to_use_to_authenticate
  password: the_database_password_to_use_to_authenticate
  port: the_port_the_database_is_exposed_on
  sslmode: the_sslmode
  ca.crt: the_ca_certificate_to_verify_identity_of_the_server # optional, based on sslmode
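Because values under data must be base64-encoded, writing such a manifest by hand is error-prone. The following is a minimal sketch, not part of the Seldon tooling, that renders the structure above from plain values. The function names b64 and render_metadata_secret are hypothetical, and the argument values are the same example placeholders used elsewhere on this page.

```shell
# Encode a plain value the way the data field of a Secret expects it.
# tr strips the line wrapping that some base64 implementations add.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Print a metadata-postgres Secret manifest to stdout.
# Arguments: dbname host user password port sslmode (example values only).
render_metadata_secret() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: metadata-postgres
type: Opaque
data:
  dbname: $(b64 "$1")
  host: $(b64 "$2")
  user: $(b64 "$3")
  password: $(b64 "$4")
  port: $(b64 "$5")
  sslmode: $(b64 "$6")
EOF
}

render_metadata_secret metadata your.postgres.host your_user your_password 5432 require
```

The output can be piped to kubectl apply -n seldon-system -f - if that is the namespace where Seldon Deploy runs. The kubectl create secret commands shown later on this page perform the encoding for you and are usually the simpler route.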

Installation

PostgreSQL can be provisioned in several ways: as a managed service from a cloud provider, or by running it in Kubernetes.

Bringing your own PostgreSQL

One option is to use PostgreSQL outside of the Kubernetes cluster that runs Seldon Deploy. If you already have a database running on-premises or in the cloud that you want to use with Seldon Deploy, add its connection information to the metadata-postgres secret in the namespace where Seldon Deploy is running. Substitute the values below with those of your database:

kubectl create secret generic -n seldon-system metadata-postgres \
--from-literal=user=your_user \
--from-literal=password=your_password \
--from-literal=host=your.postgres.host \
--from-literal=port=5432 \
--from-literal=dbname=metadata \
--from-literal=sslmode=require \
--dry-run=client -o yaml \
| kubectl apply -n seldon-system -f -
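Before or after creating the secret, it can help to confirm that the same values actually reach the database. The sketch below builds a standard libpq connection URI from the fields stored in the secret. The function name pg_uri is hypothetical, and the values are the placeholders from the command above.

```shell
# Build a libpq connection URI from the fields stored in the secret.
# Arguments: user password host port dbname sslmode
# Note: a password containing URI-reserved characters (such as @ or /)
# must be percent-encoded before it is passed in.
pg_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s?sslmode=%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

pg_uri your_user your_password your.postgres.host 5432 metadata require
```

Assuming the psql client is installed on a machine with network access to the database, the URI can be passed to it directly, for example psql "$(pg_uri your_user your_password your.postgres.host 5432 metadata require)" -c 'SELECT 1'. A successful check from your workstation does not guarantee reachability from inside the cluster, so running it from a pod in the Seldon Deploy namespace is the more reliable test.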

In the next sections we explore how you can start using a managed PostgreSQL in AWS and GCP and connect it with Seldon Deploy.

Amazon RDS

Amazon RDS provides a managed PostgreSQL solution that can be used for Seldon Deploy’s Model Metadata Storage. For setting up RDS for the first time you can follow the docs here.

Some important points to remember while setting up RDS:

  • Make sure the instance is accessible from Seldon Deploy. If Seldon Deploy is not on the same VPC, make sure the VPC used by RDS has a public subnet as discussed here.

  • Make sure the security group used for accessing the RDS instance allows inbound and outbound traffic from and to Seldon Deploy. Setting up security groups for RDS is discussed here.

Once you have a running PostgreSQL instance, with a database and a user created, you can configure Seldon Deploy by adding the metadata-postgres secret as discussed in the previous section.

To manage backups see the official documentation. Here is more documentation on other best practices around RDS.

Google Cloud SQL

GCP provides a managed PostgreSQL solution, Cloud SQL, that can be used for Seldon Deploy’s Model Metadata Storage. For setting up Cloud SQL for the first time you can follow the docs here.

For connection instructions follow the official documentation. Make sure that the instance is accessible from Seldon Deploy. If you use the public IP generated for the instance, make sure the network that runs Seldon Deploy is added to the Cloud SQL authorized networks by following this guide.

Once you have a running PostgreSQL instance, with a database and a user created, you can configure Seldon Deploy by adding the metadata-postgres secret as discussed in the previous section.

SSL Support

By default, Seldon Deploy will not perform any verification of the PostgreSQL server certificate. To enable server certificate verification, change the SSL mode to verify-ca or verify-full as needed and place one or more root certificates in the ca.crt key of the Kubernetes secret. Intermediate certificates should also be added to the file if they are needed to link the certificate chain sent by the server to the root certificates stored on the client.

kubectl create secret generic -n seldon-system metadata-postgres \
--from-literal=user=your_user \
--from-literal=password=your_password \
--from-literal=host=your.postgres.host \
--from-literal=port=5432 \
--from-literal=dbname=metadata \
--from-literal=sslmode=verify-ca \
--from-file=ca.crt=/path/to/caFile \
--dry-run=client -o yaml \
| kubectl apply -n seldon-system -f -
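Since the ca.crt file may hold several root and intermediate certificates, a quick sanity check before creating the secret is to count the PEM blocks in the bundle. This is a minimal sketch; the function name count_pem_certs is hypothetical and the path is the placeholder from the command above.

```shell
# Count the PEM certificate blocks in a bundle file.
# Prints 0 when the file contains no PEM certificate at all, which usually
# means the file is in another format (for example DER) or is the wrong file.
count_pem_certs() {
  grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Example (placeholder path):
# count_pem_certs /path/to/caFile
```

A count that is lower than the number of certificates you expect in the chain suggests that a root or intermediate certificate is missing from the file. If openssl is available, openssl x509 -in /path/to/caFile -noout -subject -issuer shows the subject and issuer of the first certificate in the bundle.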

Further, if the server verifies the identity of the client by requesting the client’s leaf certificate, create a separate Kubernetes TLS secret containing the client certificate and key for the connection. Here, we create a secret named postgres-client-certs for this purpose. See the Configuring Seldon Deploy section for details on how these secrets are used in the Helm chart.

kubectl create secret tls -n seldon-system postgres-client-certs \
--cert=/path/to/cert \
--key=/path/to/key \
--dry-run=client -o yaml \
| kubectl apply -n seldon-system -f -

Running PostgreSQL in Kubernetes

You can also run PostgreSQL in the Kubernetes cluster that runs Seldon Deploy. We recommend using the Zalando PostgreSQL operator to manage the PostgreSQL installation and maintenance. The official documentation can be seen here.

Warning

If your cluster is using Kubernetes version 1.25 or higher, you should install version 1.9.0+ of Zalando’s PostgreSQL operator. You can also consult their installation matrix.

The instructions that follow will help you to quickly spin up a PostgreSQL instance. However, we don’t recommend this setup for production; it should be treated as development only.

Below we show an example deployment of a PostgreSQL cluster. To install the Zalando operator, run:

git clone https://github.com/zalando/postgres-operator.git
cd postgres-operator
git checkout v1.8.2 # Pin a release tag. On Kubernetes 1.25 or higher use v1.9.0 or later, per the warning above.
kubectl create namespace postgres || echo "namespace postgres exists"
helm install postgres-operator ./charts/postgres-operator --namespace postgres

If you want to install the operator UI you can do it by following this doc.

To install a minimal PostgreSQL setup you can run:

cat << EOF | kubectl apply -f -
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: seldon-metadata-storage
  namespace: postgres
spec:
  teamId: "seldon"
  volume:
    size: 5Gi
  numberOfInstances: 2
  users:
    seldon:  # database owner
    - superuser
    - createdb
  databases:
    metadata: seldon  # dbname: owner
  postgresql:
    version: "13"
EOF

For a more complex setup consisting of more users, databases, replicas, etc. please refer to the official documentation of the operator here.

Once the database instances have been created by the Zalando operator, you can create the expected secret using the auto-generated password:

kubectl get secret seldon.seldon-metadata-storage.credentials.postgresql.acid.zalan.do -n postgres -o 'jsonpath={.data.password}' | base64 -d > db_pass
kubectl create secret generic -n seldon-system metadata-postgres \
  --from-literal=user=seldon \
  --from-file=password=./db_pass \
  --from-literal=host=seldon-metadata-storage.postgres.svc.cluster.local \
  --from-literal=port=5432 \
  --from-literal=dbname=metadata \
  --from-literal=sslmode=require \
  --dry-run=client -o yaml \
  | kubectl apply -n seldon-system -f -
rm db_pass
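The credentials secret name and the host used above both follow from the names chosen in the postgresql manifest. The sketch below makes that derivation explicit so that it can be adapted to a different user, cluster name, or namespace. The function names are hypothetical, and the cluster.local suffix assumes the default Kubernetes cluster domain.

```shell
# Name of the credentials secret the operator generates for a database user:
# <user>.<cluster-name>.credentials.postgresql.acid.zalan.do
# Arguments: user cluster-name
zalando_secret_name() {
  printf '%s.%s.credentials.postgresql.acid.zalan.do\n' "$1" "$2"
}

# In-cluster DNS name of the service in front of the PostgreSQL cluster:
# <cluster-name>.<namespace>.svc.cluster.local
# Arguments: cluster-name namespace
zalando_service_host() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# Values from the minimal setup on this page.
zalando_secret_name seldon seldon-metadata-storage
zalando_service_host seldon-metadata-storage postgres
```

With the values from the manifest above, the two functions print the secret name passed to kubectl get secret and the host stored in the metadata-postgres secret.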

Configuring Seldon Deploy

Once your PostgreSQL database and the secrets holding its credentials are ready, add the following to deploy-values.yaml. See the SSL Support section for configuring client certificates for mutual TLS verification.

metadata:
  pg:
    enabled: true
    secret: metadata-postgres
    clientTLSSecret: "postgres-client-certs" # Optional, only needed when the server requests client certificates (mutual TLS)
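Before enabling the setting, it can be worth checking that the referenced secret carries every key Seldon Deploy will look for. The following is a minimal sketch that treats every key in the secret structure except ca.crt as required, which is an assumption based on the structure shown earlier on this page. The function name missing_secret_keys is hypothetical.

```shell
# Print each required key that is absent from the keys given as arguments.
# Prints nothing when all required keys are present.
# The body runs in a subshell so that its variables do not leak.
missing_secret_keys() (
  for required in dbname host user password port sslmode; do
    found=no
    for key in "$@"; do
      if [ "$key" = "$required" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      printf '%s\n' "$required"
    fi
  done
)

# Example: a secret created without dbname and sslmode.
missing_secret_keys user password host port
```

Assuming jq is installed, the keys of the live secret can be listed with kubectl get secret metadata-postgres -n seldon-system -o json | jq -r '.data | keys[]' and passed to the function as arguments.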

Warning

Setting metadata.pg.enabled to true will cause the request logger to automatically try to retrieve metadata from Seldon Deploy. Ensure you have the correct configuration for this to work properly.

Production operations on self-managed PostgreSQL

One of the drawbacks of using self-hosted PostgreSQL rather than a managed solution is that you are responsible for operating the PostgreSQL cluster yourself. Here is a list of some resources for best practices and how to handle some operations: