Upgrading to 1.6.0¶
Licence¶
Licences issued to activate Seldon Deploy versions 1.5.x and below are not compatible with 1.6.x versions. Obtain a licence for version 1.6.0 by contacting Seldon.
Prometheus Operator¶
Seldon Deploy 1.6.0 has changed from using the seldon-core-analytics Helm charts to using the Bitnami Helm charts as the default installation method for Prometheus. Users running a Prometheus installed by the older seldon-core-analytics chart will therefore need to take note of the following:

1. Active models metrics namespace label name change
2. Prometheus URL change
3. kube-state-metrics metric change
1. Active models metrics namespace label name change¶
Deploy uses Prometheus namespace labels to form queries to retrieve and filter model/deployment metrics (such as CPU/memory limits/requests and active models usage). These are displayed in the Usage Monitor and Resource Monitor dashboards.
There are two namespace labels that are used: the Deploy server namespace and the active model namespace. In Deploy 1.6.0, using the Bitnami default configuration, these labels are `namespace` and `exported_namespace` respectively. However, using the old default seldon-core-analytics configuration, they would be `kubernetes_namespace` and `namespace` respectively. To allow for backward compatibility, the Deploy Helm chart allows users to specify these namespace label names:
Current defaults:

```yaml
prometheus:
  seldon:
    namespaceMetricName: "namespace"
    activeModelsNamespaceMetricName: "exported_namespace"
```

Changes required for backward compatibility with seldon-core-analytics Prometheus:

```yaml
prometheus:
  seldon:
    namespaceMetricName: "kubernetes_namespace"
    activeModelsNamespaceMetricName: "namespace"
```
2. Prometheus URL change¶
The default Prometheus URL in the Helm chart now points to the Bitnami Prometheus default endpoint. From Deploy 1.6.0 onwards, users will need to specify the Prometheus URL for older installations of Prometheus:
Current defaults:

```yaml
prometheus:
  seldon:
    url: "http://seldon-monitoring-prometheus.seldon-system:9090/api/v1/"
  knative:
    url: "http://seldon-monitoring-prometheus.seldon-system:9090/api/v1/"
env:
  ALERTMANAGER_URL: http://seldon-monitoring-alertmanager.seldon-system:9093/api/v1/alerts
```

Changes required for backward compatibility with seldon-core-analytics Prometheus:

```yaml
prometheus:
  seldon:
    url: "http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/"
  knative:
    url: "http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/"
env:
  ALERTMANAGER_URL: http://seldon-core-analytics-prometheus-alertmanager.seldon-system:80/api/v1/alerts
```
3. kube-state-metrics metric change¶
The Bitnami Prometheus chart defaults to installing a much later version of the kube-state-metrics agent. Unfortunately, this has led to a breaking change in the metrics for CPU/memory requests/limits, as the metrics we were previously using are no longer present. From version 1.6, Seldon Deploy uses these metrics instead:
- `kube_pod_container_resource_requests_cpu_cores` -> `kube_pod_container_resource_requests{resource="cpu",unit="core"}`
- `kube_pod_container_resource_limits_cpu_cores` -> `kube_pod_container_resource_limits{resource="cpu",unit="core"}`
- `kube_pod_container_resource_limits_memory_bytes` -> `kube_pod_container_resource_limits{resource="memory",unit="byte"}`
- `kube_pod_container_resource_requests_memory_bytes` -> `kube_pod_container_resource_requests{resource="memory",unit="byte"}`
If your Prometheus instance does not expose these metrics (Seldon Core Analytics should still be compatible), this may be a breaking change: you may not be able to view data for the CPU limits, CPU requests, Memory limits, and Memory requests pages on the Usage Monitor dashboard.
If these dashboards are required, we recommend updating your Prometheus installation.
Upgrading on OpenShift¶
Seldon Deploy 1.6 installation has been tested on OpenShift 4.10. The full documentation for the installation process is available here. Below, we briefly discuss the differences from the previous OpenShift installation that must be taken into account during the upgrade process.
New Network Policies¶
Two new NetworkPolicy resources, `seldon-detectors` and `seldon-detectors-serving`, must be created as discussed in the Add NetworkPolicy Resources section.
Following the provided documentation, create the `networkpolicy-detectors.yaml` manifest file and apply it to all your model namespaces:
oc apply -f networkpolicy-detectors.yaml -n <model-namespace>
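For orientation, the sketch below shows the general shape of a NetworkPolicy manifest such as those defined in networkpolicy-detectors.yaml. It is illustrative only: the pod selector label and port are assumptions rather than values from the Seldon documentation, so use the manifests from the Add NetworkPolicy Resources section as the source of truth.

```yaml
# Illustrative sketch only: the selector label and port below are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: seldon-detectors
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: detector   # assumed label; use the selector from the official manifest
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 8080                          # assumed detector port
```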
Cluster Log Forwarder¶
The section on Installing ClusterLogForwarder has been updated with pointers on how to restrict log forwarding to specific namespaces. This limits disk usage on the Elasticsearch instance by forwarding container logs only from the namespaces hosting Seldon models.
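As an illustration of such namespace scoping, the sketch below shows a ClusterLogForwarder that forwards application logs only from selected namespaces. The input and pipeline names, the namespace placeholder, and the default output are assumptions for this sketch; follow the Installing ClusterLogForwarder section for the exact manifest to use with Seldon Deploy.

```yaml
# Illustrative sketch: forward container logs only from namespaces hosting Seldon models.
# Input/pipeline names and the output reference are assumptions.
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  inputs:
    - name: seldon-model-logs
      application:
        namespaces:
          - <model-namespace>            # list only the namespaces hosting Seldon models
  pipelines:
    - name: forward-seldon-model-logs
      inputRefs:
        - seldon-model-logs
      outputRefs:
        - default                        # the cluster's default Elasticsearch log store
```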
New Monitoring Resources¶
The section on OpenShift Monitoring has been updated to include new PodMonitor resources that need to be created; a general shape of such a resource is sketched after the list below. Following the provided documentation:

- create the `deploy-podmonitor.yaml` manifest file and apply it to the Seldon Deploy namespace: `oc apply -f deploy-podmonitor.yaml -n seldon-system`
- update the `seldon-podmonitor.yaml` manifest file to include the `seldon-podmonitor-metrics-server` resource and apply it again to all your model namespaces: `oc apply -f seldon-podmonitor.yaml -n <model-namespace>`
- update the `model-usage-prometheus-rules.yaml` manifest file to include the `seldon-podmonitor-metrics-server` resource and apply it again to all your model namespaces: `oc apply -f model-usage-prometheus-rules.yaml -n <model-namespace>`
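For orientation, the sketch below shows the general shape of a PodMonitor such as seldon-podmonitor-metrics-server. The label selector, port name, and metrics path are assumptions for illustration; the manifests in the OpenShift Monitoring section define the actual values.

```yaml
# Illustrative sketch only: selector, port name and path are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: seldon-podmonitor-metrics-server
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: seldon-core   # assumed label selector
  podMetricsEndpoints:
    - port: metrics                               # assumed port name exposed by the metrics server
      path: /metrics                              # assumed metrics path
```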
New Alerting Subsection¶
A new Alerting subsection has been added. Follow it to configure alerting in your cluster.
Seldon Core and Deploy¶
Follow the Seldon Core configuration for Seldon Core v1 and make the following YAML change to make use of the RClone storage initializer:

```yaml
- name: RELATED_IMAGE_STORAGE_INITIALIZER
  value: "seldonio/rclone-storage-initializer:1.13.1"
```
Change the following values in the `values-openshift.yaml` file:

```yaml
image: seldonio/seldon-deploy-server:1.6.0
env:
  ALERTMANAGER_URL: https://alertmanager-main.openshift-monitoring:9094/api/v1/alerts
```
Obtain the new Seldon Deploy Helm charts as described here and execute the `helm upgrade ...` command as described in the documentation to upgrade Seldon Deploy.
```bash
curl --request POST "${ES_ADDR}/_reindex" \
  --header 'Content-Type: application/json' \
  --data-raw "{\"source\": {\"index\": \"${OLD_INDEX}\"}, \"dest\": {\"index\": \"${NEW_INDEX}\"}}"
```

7. Delete the old index to avoid duplicates in the Requests Dashboard

```bash
curl --request DELETE "${ES_ADDR}/${OLD_INDEX}"
```
A similar set of steps is required for reference data, except for a few key differences:

- In step (c), note that the old index pattern for reference data did not include the endpoint. This information can be found in the deployment spec, as with the inference logs. The pattern that is followed will be:

  Old index pattern: `reference-log-<serving engine>-<deployment namespace>-<deployment name>`

  New index pattern: `reference-log-<serving engine>-<deployment namespace>-<deployment name>-<deployment endpoint>-<deployment node>`

- Between steps (d) and (e), run the following to add the `Ce-Modelid` field to the mapping:

  ```bash
  MAPPINGS=$(echo $MAPPINGS | jq ".\"properties\" += {\"Ce-Modelid\": {\"type\": \"keyword\"}}")
  ```

- After step (f), add the model id to the `Ce-Modelid` field in the new index:

  ```bash
  export MODEL_ID=income-container
  curl --request POST "${ES_ADDR}/${NEW_INDEX}/_update_by_query" \
    --header 'Content-Type: application/json' \
    --data-raw "{\"script\": {\"source\": \"ctx._source['Ce-Modelid'] = params.modelId\", \"lang\": \"painless\", \"params\": {\"modelId\": \"${MODEL_ID}\"}}, \"query\": {\"match_all\": {}}}"
  ```
Helm values¶
If you have customised your Deploy installation, please be aware that the following Helm values have changed since v1.4.0:
| Name | Previous value | Current value |
|---|---|---|