PostgreSQL Persistence for Model Metadata

Important

Before starting the installation procedure, please download the installation resources as explained here and make sure that all prerequisites are satisfied.

This page also assumes that the main Seldon components are installed.

We use PostgreSQL for persisting model metadata information.

Seldon Deploy Configuration

The PostgreSQL dependency in Seldon Deploy is enabled or disabled with the Helm variable metadata.pg.enabled. If it is set to false, Seldon Deploy will not attempt to connect to a PostgreSQL database, and all model metadata functionality will be unavailable. If it is set to true, Seldon Deploy expects a Kubernetes secret named metadata-postgres to be present in the namespace where Seldon Deploy is running. This secret must contain the information needed to connect to a PostgreSQL database. The structure of the secret is:

kind: Secret
apiVersion: v1
metadata:
  name: metadata-postgres
# Values under data must be base64-encoded; use stringData instead for plain-text values.
data:
  dbname: the_name_of_the_database_to_use_for_model_metadata
  host: the_database_host
  user: the_database_user_to_use_to_authenticate
  password: the_database_password_to_use_to_authenticate
  port: the_port_the_database_is_exposed_on
  sslmode: the_sslmode
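
Kubernetes expects the values under a secret's data field to be base64-encoded, so a hand-written manifest needs encoded values. The following is a minimal sketch of producing one; the helper name b64 is hypothetical and is not part of Seldon Deploy or kubectl.

```shell
# Hypothetical helper: base64-encode a value for the data field of a Secret.
# printf avoids the trailing newline that echo would add to the encoded value.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

b64 metadata   # prints bWV0YWRhdGE=
```

When the secret is created with kubectl create secret generic and --from-literal, as in the sections that follow, kubectl performs this encoding for you.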

Installation

PostgreSQL can be provided in several ways: as a managed service from a cloud provider, or by running it in Kubernetes.

Bringing your own PostgreSQL

One option is to use a PostgreSQL instance outside of the Kubernetes cluster that runs Seldon Deploy. If you already have a database running on-premises or in the cloud that you want to use with Seldon Deploy, add its connection information to the metadata-postgres secret in the namespace where Seldon Deploy is running. Substitute the values below with those of your database:

kubectl create secret generic -n seldon-system metadata-postgres \
--from-literal=user=your_user \
--from-literal=password=your_password \
--from-literal=host=your.postgres.host \
--from-literal=port=5432 \
--from-literal=dbname=metadata \
--from-literal=sslmode=require \
--dry-run=client -o yaml \
| kubectl apply -n seldon-system -f -
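
Before enabling the integration, it can help to confirm that the secret holds all six keys listed in the structure above. The following is a rough sketch of such a check; the function name check_metadata_secret is hypothetical, and the default namespace seldon-system is taken from the command above.

```shell
# Hypothetical helper: report which of the expected keys are absent
# from the metadata-postgres secret in the given namespace.
check_metadata_secret() {
  namespace=${1:-seldon-system}
  missing=""
  for key in dbname host user password port sslmode; do
    # An absent key (or an absent secret) yields an empty string here.
    value=$(kubectl get secret metadata-postgres -n "$namespace" \
      -o "jsonpath={.data.$key}" 2>/dev/null)
    if [ -z "$value" ]; then
      missing="$missing $key"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing keys:$missing"
    return 1
  fi
  echo "all keys present"
}
```

Calling check_metadata_secret seldon-system returns a non-zero status and lists the missing keys if any are absent. It only checks that each key has a value, not that the values are correct.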

In the next sections we explore how you can start using a managed PostgreSQL in AWS and GCP and connect it with Seldon Deploy.

Amazon RDS

Amazon RDS provides a managed PostgreSQL solution that can be used for Seldon Deploy’s Model Metadata Storage. For setting up RDS for the first time you can follow the docs here.

Some important points to remember while setting up RDS:

  • Make sure the instance is accessible from Seldon Deploy. If Seldon Deploy is not in the same VPC, make sure the VPC used by RDS has a public subnet as discussed here.

  • Make sure the security group used for accessing the RDS instance allows inbound and outbound traffic from and to Seldon Deploy. Setting up security groups for RDS is discussed here.

Once you have a running PostgreSQL instance with a database and a user created, you can configure Seldon Deploy by adding the metadata-postgres secret as described in the "Bringing your own PostgreSQL" section.
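
One way to check that the instance is reachable from the cluster is to run pg_isready from a short-lived pod in the namespace where Seldon Deploy runs. This is only a sketch under assumptions: the function name check_pg_reachable, the pod name pg-check, and the postgres:13 image are illustrative choices, not something Seldon Deploy requires.

```shell
# Hypothetical helper: start a temporary pod and ask the PostgreSQL server
# at host:port whether it is accepting connections. No credentials are needed.
check_pg_reachable() {
  host=$1
  port=${2:-5432}
  namespace=${3:-seldon-system}
  # --rm deletes the pod once pg_isready exits; -i keeps it attached so the
  # result is printed in the terminal.
  kubectl run pg-check --rm -i --restart=Never \
    --image=postgres:13 -n "$namespace" \
    --command -- pg_isready -h "$host" -p "$port"
}
```

For example, check_pg_reachable your.postgres.host 5432 runs the check against the host used in the secret. A timeout or "no response" result usually points to a network or security group issue rather than a credentials problem.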

To manage backups see the official documentation. Here is more documentation on other best practices around RDS.

Google Cloud SQL

GCP provides a managed PostgreSQL solution, Cloud SQL, that can be used for Seldon Deploy’s Model Metadata Storage. For setting up Cloud SQL for the first time you can follow the docs here.

For connection instructions, follow the official documentation. Make sure that the instance is accessible from Seldon Deploy. If you use the public IP generated for the instance, make sure the network that runs Seldon Deploy is listed in the Cloud SQL authorized networks by following this guide.

Once you have a running PostgreSQL instance with a database and a user created, you can configure Seldon Deploy by adding the metadata-postgres secret as described in the "Bringing your own PostgreSQL" section.

Running PostgreSQL in Kubernetes

You can also run PostgreSQL in the Kubernetes cluster that runs Seldon Deploy. We recommend using the Zalando PostgreSQL operator to manage the PostgreSQL installation and maintenance. The official documentation can be seen here. Below we show an example deployment of a PostgreSQL cluster.

To install the Zalando operator you can run:

git clone https://github.com/zalando/postgres-operator.git
cd postgres-operator
git checkout v1.6.1 # Use a tag to pin what we are using.
kubectl create namespace postgres || echo "namespace postgres exists"
helm install postgres-operator ./charts/postgres-operator --namespace postgres

If you want to install the operator UI you can do it by following this doc.

To install a minimal PostgreSQL setup you can run:

cat <<EOF | kubectl apply -f -
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: seldon-metadata-storage
  namespace: postgres
spec:
  teamId: "seldon"
  volume:
    size: 5Gi
  numberOfInstances: 2
  users:
    seldon:  # database owner
    - superuser
    - createdb
  databases:
    metadata: seldon  # dbname: owner
  postgresql:
    version: "13"
EOF

For a more complex setup consisting of more users, databases, replicas, etc. please refer to the official documentation of the operator here.

Once the database instances have been created by the Zalando operator, you can create the expected secret using the auto-generated password:

kubectl get secret seldon.seldon-metadata-storage.credentials.postgresql.acid.zalan.do -n postgres -o 'jsonpath={.data.password}' | base64 -d > db_pass
kubectl create secret generic -n seldon-system metadata-postgres \
  --from-literal=user=seldon \
  --from-file=password=./db_pass \
  --from-literal=host=seldon-metadata-storage.postgres.svc.cluster.local \
  --from-literal=port=5432 \
  --from-literal=dbname=metadata \
  --from-literal=sslmode=require \
  --dry-run=client -o yaml \
  | kubectl apply -n seldon-system -f -
rm db_pass
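
The operator creates the credentials secret read above asynchronously, so the first kubectl get secret command can fail if it is run immediately after applying the postgresql resource. A minimal sketch of waiting for the secret follows; the function name wait_for_secret and its default retry settings are hypothetical.

```shell
# Hypothetical helper: poll until a secret exists, or give up after a
# fixed number of attempts. Arguments: name, namespace, attempts, delay (s).
wait_for_secret() {
  name=$1
  namespace=$2
  attempts=${3:-30}
  delay=${4:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get secret "$name" -n "$namespace" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "secret $name not found in namespace $namespace" >&2
  return 1
}
```

For example, wait_for_secret seldon.seldon-metadata-storage.credentials.postgresql.acid.zalan.do postgres can be run before the commands above so that they only start once the password is available.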

Configuring Seldon Deploy

Once your PostgreSQL database and the secret with its credentials are ready, add the following to deploy-values.yaml:

metadata:
  pg:
    enabled: true
    secret: metadata-postgres

Production operations on self-managed PostgreSQL

One of the drawbacks of using self-hosted PostgreSQL rather than a managed solution is that you will need to operate the PostgreSQL cluster yourself. Here is a list of some resources for best practices and how to handle some operations: