PostgreSQL Persistence for Model Metadata
Important
Before starting the installation procedure, please download installation resources as explained here and make sure that all pre-requisites are satisfied.
This page also assumes that main Seldon Core and Seldon Enterprise Platform components are installed.
Warning
PostgreSQL is an external component outside of the main Seldon stack. Therefore, it is the cluster administrator’s responsibility to administer and manage the PostgreSQL instance used by Seldon.
Seldon Enterprise Platform uses PostgreSQL to persist model metadata.
Seldon Enterprise Platform Configuration
The PostgreSQL dependency in Seldon Enterprise Platform is enabled or disabled with the Helm value metadata.pg.enabled. If it is set to false, Seldon Enterprise Platform will not attempt to connect to a PostgreSQL database, and all model metadata functionality will be unavailable. If metadata.pg.enabled is true, Seldon Enterprise Platform expects a metadata-postgres Kubernetes secret to be present in the namespace where Seldon Enterprise Platform is running.
This secret needs to contain the information for connecting to a PostgreSQL database. The structure of the secret is:
kind: Secret
apiVersion: v1
metadata:
  name: metadata-postgres
  namespace: seldon-system  # the namespace where Seldon Enterprise Platform runs
data:  # values under data must be base64-encoded
  dbname: the_name_of_the_database_to_use_for_model_metadata
  host: the_database_host
  user: the_database_user_to_use_to_authenticate
  password: the_database_password_to_use_to_authenticate
  port: the_port_the_database_is_exposed_on
  sslmode: the_sslmode
  ca.crt: the_ca_certificate_to_verify_identity_of_the_server  # optional, based on sslmode
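To check an existing secret against this structure, here is a minimal sketch of a hypothetical bash helper, check_metadata_secret_keys, which is not part of Seldon Enterprise Platform. It assumes the key names listed above and, following the comment on ca.crt, treats ca.crt as required only when sslmode is verify-ca or verify-full. Key names are read one per line on standard input, so the helper can be tried without a cluster.

```bash
# Hypothetical helper (not shipped with Seldon Enterprise Platform): report which
# of the keys described above are missing from a metadata-postgres secret.
# $1 is the sslmode the secret uses; key names are read one per line on stdin.
check_metadata_secret_keys() {
  local sslmode=$1 key
  local -a required=(dbname host user password port sslmode) missing=()
  local -A present=()

  # "|| [ -n "$key" ]" also keeps a last line that has no trailing newline.
  while IFS= read -r key || [ -n "$key" ]; do
    [ -n "$key" ] && present[$key]=1
  done

  # Assumption: ca.crt is only needed when the server certificate is verified.
  case $sslmode in
    verify-ca | verify-full) required+=(ca.crt) ;;
  esac

  for key in "${required[@]}"; do
    [ -n "${present[$key]:-}" ] || missing+=("$key")
  done

  if (( ${#missing[@]} > 0 )); then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "ok"
}

# Example usage against a live cluster (requires kubectl access):
#   ns=seldon-system
#   mode=$(kubectl get secret metadata-postgres -n "$ns" -o jsonpath='{.data.sslmode}' | base64 -d)
#   kubectl get secret metadata-postgres -n "$ns" \
#     -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
#     | check_metadata_secret_keys "$mode"
```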
Installation
PostgreSQL can be installed in many different ways: as a managed service from a cloud provider, or running inside Kubernetes.
Bringing your own PostgreSQL
One option is to use PostgreSQL outside of the Kubernetes cluster that runs Seldon Enterprise Platform. If you already have a database, on-premises or in the cloud, that you want to use with Seldon Enterprise Platform, add its connection information to the metadata-postgres secret in the namespace where Seldon Enterprise Platform is running, substituting the values below with those of your database:
kubectl create secret generic -n seldon-system metadata-postgres \
--from-literal=user=your_user \
--from-literal=password=your_password \
--from-literal=host=your.postgres.host \
--from-literal=port=5432 \
--from-literal=dbname=metadata \
--from-literal=sslmode=require \
--dry-run=client -o yaml \
| kubectl apply -n seldon-system -f -
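Before creating the secret, you may want to confirm that the same values work from a PostgreSQL client. The sketch below assumes psql is installed and that none of the values contain spaces, single quotes or backslashes (libpq would require those to be quoted or escaped); pg_conninfo is a hypothetical helper that formats the values as a libpq keyword/value connection string.

```bash
# Hypothetical helper: build a libpq keyword/value connection string from the
# values used for the metadata-postgres secret. Assumes no value contains spaces,
# single quotes or backslashes, which libpq would require to be quoted or escaped.
pg_conninfo() {
  local host=$1 port=$2 dbname=$3 user=$4 sslmode=$5
  printf 'host=%s port=%s dbname=%s user=%s sslmode=%s\n' \
    "$host" "$port" "$dbname" "$user" "$sslmode"
}

# Example usage (requires psql and network access to the database). psql treats an
# argument containing "=" as a connection string; PGPASSWORD supplies the password.
#   PGPASSWORD=your_password psql \
#     "$(pg_conninfo your.postgres.host 5432 metadata your_user require)" -c 'SELECT 1'
```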
In the next sections we explore how you can start using a managed PostgreSQL in AWS and GCP and connect it with Seldon Enterprise Platform.
Amazon RDS
Amazon RDS provides a managed PostgreSQL solution that can be used for Seldon Enterprise Platform’s Model Metadata Storage. To set up RDS for the first time, you can follow the official AWS documentation.
Some important points to remember while setting up RDS:
Make sure the instance is accessible from Seldon Enterprise Platform. If Seldon Enterprise Platform is not in the same VPC, make sure the VPC used by RDS has a public subnet as discussed here.
Make sure the security group used for accessing the RDS instance allows inbound and outbound traffic from and to Seldon Enterprise Platform. Setting up security groups for RDS is discussed here.
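A quick way to confirm both points is to test the connection from inside the cluster. As a minimal sketch, the hypothetical port_reachable helper below only checks that a TCP connection to the database port can be opened, which exercises VPC routing and security group rules but not PostgreSQL authentication. It relies on bash's /dev/tcp redirection and the coreutils timeout command.

```bash
# Hypothetical helper: succeed only if a TCP connection to host $1, port $2 can be
# opened within $3 seconds (default 5). Uses bash's /dev/tcp redirection, so it
# checks network reachability (routing, security groups), not PostgreSQL auth.
port_reachable() {
  local host=$1 port=$2 secs=${3:-5}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example usage from inside the cluster, using the pg_isready client that ships
# in the official postgres image (exit status 0 means the server accepts connections):
#   kubectl run pg-check -n seldon-system --rm -i --restart=Never --image=postgres:13 -- \
#     pg_isready -h your.postgres.host -p 5432
```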
Once you have a running PostgreSQL instance, with a database and a user created, you can configure Seldon Enterprise Platform by adding the metadata-postgres secret as discussed in the previous section.
To manage backups, see the official documentation. Other RDS best practices are covered here.
Google Cloud SQL
GCP provides a managed PostgreSQL solution, Cloud SQL, that can be used for Seldon Enterprise Platform’s Model Metadata Storage. To set up Cloud SQL for the first time, you can follow the docs here.
For connection instructions, follow the official documentation. Make sure that the instance is accessible from Seldon Enterprise Platform. If you connect through the public IP generated for the instance, make sure the network that runs Seldon Enterprise Platform is one of the Cloud SQL authorized networks by following this guide.
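Cloud SQL only accepts public-IP connections whose source address falls inside one of the instance's authorized networks. As a minimal sketch, the hypothetical bash helpers below check whether an IPv4 address, for example the address your cluster's traffic leaves Google Cloud from (node external IPs or a Cloud NAT address, depending on your setup), is covered by a list of CIDR blocks. They handle IPv4 only.

```bash
# Hypothetical helpers (IPv4 only) for checking an address against
# Cloud SQL authorized networks expressed as CIDR blocks.

# Convert a dotted-quad address such as 203.0.113.7 to a 32-bit integer.
ipv4_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  # 10# forces base 10 so octets such as "08" are not parsed as octal.
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Succeed if address $1 lies inside CIDR block $2 (a bare address counts as /32).
ipv4_in_cidr() {
  local cidr=$2 ip net bits=32 mask
  [[ $cidr == */* ]] && bits=${cidr#*/}
  ip=$(ipv4_to_int "$1")
  net=$(ipv4_to_int "${cidr%/*}")
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Succeed if address $1 is covered by any of the remaining CIDR arguments.
ip_authorized() {
  local ip=$1 cidr
  shift
  for cidr in "$@"; do
    ipv4_in_cidr "$ip" "$cidr" && return 0
  done
  return 1
}
```

For example, ip_authorized 203.0.113.7 198.51.100.0/24 203.0.113.0/24 succeeds because the address falls in the second block.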
Once you have a running PostgreSQL instance, with a database and a user created, you can configure Seldon Enterprise Platform by adding the metadata-postgres secret as discussed in the previous section.
SSL Support
By default, Seldon Enterprise Platform does not verify the PostgreSQL server certificate. To enable server certificate verification, change the SSL mode to verify-ca (the server certificate must chain to a trusted root certificate) or verify-full (additionally, the database host name must match the name in the server certificate), and place one or more root certificates in the ca.crt key of the Kubernetes secret. If intermediate certificates are needed to link the certificate chain sent by the server to the root certificates stored on the client, add them to the same file.
kubectl create secret generic -n seldon-system metadata-postgres \
--from-literal=user=your_user \
--from-literal=password=your_password \
--from-literal=host=your.postgres.host \
--from-literal=port=5432 \
--from-literal=dbname=metadata \
--from-literal=sslmode=verify-ca \
--from-file=ca.crt=/path/to/caFile \
--dry-run=client -o yaml \
| kubectl apply -n seldon-system -f -
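When the server's certificate is issued by an intermediate CA, the file passed as ca.crt has to hold the intermediate and root certificates together. Here is a minimal sketch of a hypothetical helper that concatenates PEM files into one bundle. It adds a newline after any file that lacks a trailing one, because otherwise an END CERTIFICATE line and the next BEGIN CERTIFICATE line end up on the same line and the bundle may fail to parse.

```bash
# Hypothetical helper: concatenate PEM certificate files into one CA bundle and
# print how many certificates it contains. Usage: build_ca_bundle OUT FILE...
build_ca_bundle() {
  local out=$1 f
  shift
  : > "$out"
  for f in "$@"; do
    cat "$f" >> "$out"
    # tail -c1 prints the last byte; command substitution strips a trailing newline,
    # so a non-empty result means the file did not end with a newline.
    if [ -n "$(tail -c1 "$f")" ]; then
      printf '\n' >> "$out"
    fi
  done
  grep -c -- '-----BEGIN CERTIFICATE-----' "$out"
}

# Example usage, then pass the result to kubectl as --from-file=ca.crt=./ca.crt:
#   build_ca_bundle ca.crt /path/to/intermediate.pem /path/to/root.pem
```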
Further, if the server verifies the identity of the client by requesting a client certificate, create an additional Kubernetes TLS secret containing the client certificate and its private key. Here, we create a secret named postgres-client-certs for this purpose. See the Configuring Seldon Enterprise Platform section for how the secrets created here are used.
kubectl create secret tls -n seldon-system postgres-client-certs \
--cert=/path/to/cert \
--key=/path/to/key \
--dry-run=client -o yaml \
| kubectl apply -n seldon-system -f -
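If pg_hba.conf on the server uses the cert authentication method or clientcert=verify-full, PostgreSQL also compares the client certificate's common name (CN) with the requested database user name (unless a user name map is configured), so a CN mismatch is a common cause of rejected connections. Here is a minimal sketch of a hypothetical helper that extracts the CN from the subject line OpenSSL prints, assuming RFC 2253 formatting and no escaped commas in the subject.

```bash
# Hypothetical helper: print the CN from a certificate subject in RFC 2253 form,
# as printed by: openssl x509 -in /path/to/cert -noout -subject -nameopt RFC2253
# (for example "subject=CN=seldon,O=Example"). Escaped commas in values are not handled.
subject_cn() {
  local rdn
  local -a rdns
  IFS=, read -ra rdns <<< "${1#subject=}"
  for rdn in "${rdns[@]}"; do
    case $rdn in
      CN=*) echo "${rdn#CN=}"; return 0 ;;
    esac
  done
  return 1
}

# Example usage: check the client certificate CN against the database user.
#   cn=$(subject_cn "$(openssl x509 -in /path/to/cert -noout -subject -nameopt RFC2253)")
#   [ "$cn" = "your_user" ] || echo "CN '$cn' does not match the database user" >&2
```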
Running PostgreSQL in Kubernetes
You can also run PostgreSQL in the Kubernetes cluster that runs Seldon Enterprise Platform. We recommend using the Zalando PostgreSQL operator to install and maintain PostgreSQL. The operator’s official documentation is available here.
Warning
If your cluster is using Kubernetes version 1.25 or higher, you should install version 1.9.0+ of Zalando’s PostgreSQL operator. You can also consult their installation matrix.
The instructions that follow will help you quickly spin up a PostgreSQL instance. However, we don’t recommend this setup for production; treat it as suitable for development only.
Below is an example deployment of a PostgreSQL cluster. First, install the Zalando operator:
git clone https://github.com/zalando/postgres-operator.git
cd postgres-operator
git checkout v1.8.2 # Use a tag to pin what we are using. On Kubernetes 1.25+, check out v1.9.0 or later instead.
kubectl create namespace postgres || echo "namespace postgres exists"
helm install postgres-operator ./charts/postgres-operator --namespace postgres
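The v1.8.2 tag checked out above is older than the 1.9.0 minimum the warning gives for Kubernetes 1.25+. As a minimal sketch, assuming v1.9.0 as the lowest tag that satisfies that warning, a hypothetical helper can pick a tag from the cluster's server version string. It strips vendor suffixes such as the "-eks-..." part that some managed clusters report.

```bash
# Hypothetical helper: choose a zalando/postgres-operator tag for a Kubernetes
# server version such as "v1.27.3" or "v1.27.3-eks-a5565ad".
# Assumes v1.9.0 is the minimum for Kubernetes 1.25+ (per the warning above)
# and keeps this guide's v1.8.2 tag for older clusters.
operator_tag_for() {
  local version=${1#v} major minor
  major=${version%%.*}
  minor=${version#*.}
  minor=${minor%%.*}
  minor=${minor//[!0-9]/}   # drop suffixes such as "+" or "-eks-..."
  if (( 10#$major > 1 || 10#$minor >= 25 )); then
    echo "v1.9.0"
  else
    echo "v1.8.2"
  fi
}

# Example usage (requires kubectl and jq):
#   tag=$(operator_tag_for "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')")
#   git checkout "$tag"
```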
If you want to install the operator UI you can do it by following this doc.
To install a minimal PostgreSQL setup you can run:
cat << EOF | kubectl apply -f -
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: seldon-metadata-storage
  namespace: postgres
spec:
  teamId: "seldon"
  volume:
    size: 5Gi
  numberOfInstances: 2
  users:
    seldon:  # database owner
    - superuser
    - createdb
  databases:
    metadata: seldon  # dbname: owner
  postgresql:
    version: "13"
EOF
For a more complex setup consisting of more users, databases, replicas, etc. please refer to the official documentation of the operator here.
Once the Zalando operator has created the database instances, you can create the expected secret using the auto-generated password:
kubectl get secret seldon.seldon-metadata-storage.credentials.postgresql.acid.zalan.do -n postgres -o 'jsonpath={.data.password}' | base64 -d > db_pass
kubectl create secret generic -n seldon-system metadata-postgres \
--from-literal=user=seldon \
--from-file=password=./db_pass \
--from-literal=host=seldon-metadata-storage.postgres.svc.cluster.local \
--from-literal=port=5432 \
--from-literal=dbname=metadata \
--from-literal=sslmode=require \
--dry-run=client -o yaml \
| kubectl apply -n seldon-system -f -
rm db_pass
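The secret name and host used above follow from the names in the postgresql manifest. Here is a minimal sketch of hypothetical helpers that derive both values, which can help if you change the user, cluster, or namespace names. It assumes the operator's default secret name template ({username}.{cluster}.credentials.{tprkind}.{tprgroup}), a primary Service named after the cluster, and the default cluster.local cluster domain.

```bash
# Hypothetical helpers; assume the Zalando operator's default secret name template
# {username}.{cluster}.credentials.{tprkind}.{tprgroup} and the default
# Kubernetes cluster domain (cluster.local).
zalando_secret_name() {
  local user=$1 cluster=$2
  echo "${user}.${cluster}.credentials.postgresql.acid.zalan.do"
}

# Assumes the primary instance is exposed through a Service named after the cluster.
zalando_service_host() {
  local cluster=$1 namespace=$2
  echo "${cluster}.${namespace}.svc.cluster.local"
}

# Example usage with the manifest above:
#   kubectl get secret "$(zalando_secret_name seldon seldon-metadata-storage)" -n postgres \
#     -o 'jsonpath={.data.password}' | base64 -d
```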
Configuring Seldon Enterprise Platform
Once your PostgreSQL database and the secrets with credentials are ready, add the following to deploy-values.yaml. See the SSL Support section for configuring client certificates for mutual TLS verification.
metadata:
  pg:
    enabled: true
    secret: metadata-postgres
    clientTLSSecret: "postgres-client-certs"  # Optional, only needed when the server verifies client certificates
Warning
Setting metadata.pg.enabled to true will cause the request logger to automatically try to retrieve metadata from Seldon Enterprise Platform. Ensure you have the correct configuration for this to work properly.
Production operations on self-managed PostgreSQL
One of the drawbacks of self-hosting PostgreSQL rather than using a managed solution is that you need to operate the PostgreSQL cluster yourself. Here are some resources on best practices and common operations:
Monitoring - deploying the Postgres exporter and hooking it up to your Prometheus monitoring solution is a common way to monitor the instances continuously.
Backups - the Zalando operator supports periodic backups to S3-compatible storage - https://postgres-operator.readthedocs.io/en/latest/administrator/#wal-archiving-and-physical-basebackups. It also documents restoring state from backups - https://postgres-operator.readthedocs.io/en/latest/administrator/#restoring-physical-backups. We strongly recommend setting up backups if you self-host PostgreSQL.
Version upgrades - the Zalando operator supports both clone-based and in-place version upgrades - https://postgres-operator.readthedocs.io/en/latest/administrator/#minor-and-major-version-upgrade
Increase storage size - https://postgres-operator.readthedocs.io/en/latest/user/#increase-volume-size