Pin a Kubernetes pod to the current node to avoid (hostPath) data loss


Here's a handy one-liner to pin a running pod to the node it's currently on:

kubectl patch deployment -n $NAMESPACE $DEPLOYMENT -p '{"spec": {"template": {"spec": {"nodeSelector": {"": "'$(kubectl get pods -n $NAMESPACE -o jsonpath='{ ..nodeName }')'"}}}}}' || (echo Failed to identify current node of $DEPLOYMENT pod; exit 1)

The long version

I've been supporting the Portainer team with a Helm chart for their new v2 release, which adds Kubernetes support. Recently the boss told me:

"Sometimes, when using one of these small/development, multi-node Kubernetes clusters like k3s or microk8s, Kubernetes will schedule the pod to a particular node, but when the pod moves to a different node, the data is lost. Find a way to ensure that the pod always remains on the same node"!

"Nonsense", I replied. "The Kubernetes storage provisioner will be smart enough to ensure that an allocated PV doesn't just move to a different node". And to prove how smart I was, I illustrated by creating a multi-node KinD cluster:

❯ cat kind.yaml
kind: Cluster
- role: control-plane
- role: worker
- role: worker

❯ kind create cluster --config kind.yaml
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.19.1)
 ✓ Preparing nodes
 ✓ Writing configuration
 ✓ Starting control-plane
 ✓ Installing CNI
 ✓ Installing StorageClass
 ✓ Joining worker nodes
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Thanks for using kind!

I created the namespace, added the helm repo, and deployed the chart:

❯ kubectl create namespace portainer
namespace/portainer created
❯ helm repo add portainer
❯ helm repo update
❯ helm upgrade --install -n portainer portainer portainer/portainer
Release "portainer" does not exist. Installing it now.
NAME: portainer
LAST DEPLOYED: Wed Dec  9 21:08:09 2020
NAMESPACE: portainer
STATUS: deployed
1. Get the application URL by running these commands:
  export NODE_PORT=$(kubectl get --namespace portainer -o jsonpath="{.spec.ports[0].nodePort}" services portainer)
  export NODE_IP=$(kubectl get nodes --namespace portainer -o jsonpath="{.items[0].status.addresses[0].address}")
  echo http://$NODE_IP:$NODE_PORT

I examined the PV created by the deployment and saw, as expected, a nodeSelector:

❯ kubectl get pv -o yaml
      - matchExpressions:
        - key:
          operator: In
          - kind-worker

"Boom!", I said. "There's no problem, because Kubernetes won't let the pod run on a different node, due to the nodeSelector".

Not so fast!

"Try microk8s", the boss said, "it happens all the time..."

So I did. Grumbling about how much harder it is to set up a multi-node microk8s environment, I used Multipass to create 2 Ubuntu 20.04 VMs, and then followed the instructions for setting up a microk8s cluster.

Sure enough, when I examined the microk8s PV, there was no nodeSelector. It turns out that microk8s uses a simple hostPath-type provisioner!
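
If you want to run the same check on your own cluster, something like the following might do. It's only a rough sketch: the function names are made up for illustration, and I'm assuming the provisioner records any pinning under the PV's spec.nodeAffinity field (which is where the matchExpressions in the KinD output above live).

```shell
# Print the node names a PV is restricted to, or nothing if it has none.
# Assumption: pinning is stored under spec.nodeAffinity.required.nodeSelectorTerms.
pv_pinned_nodes() {
  kubectl get pv "$1" \
    -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[*].matchExpressions[*].values[*]}'
}

# Report in plain words whether a PV is tied to a node.
describe_pv_pinning() {
  nodes=$(pv_pinned_nodes "$1") || return 1
  if [ -n "$nodes" ]; then
    echo "$1 is pinned to: $nodes"
  else
    echo "$1 has no node affinity"
  fi
}
```

Call it as describe_pv_pinning <pvname>, using a name taken from kubectl get pv.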

Where's my data?

So this presents a problem for any application deployed on a multi-node microk8s cluster, as well as on any other cluster using a hostPath-based storage provisioner. We came up with what I think is an elegant solution, though.

This command will return the current node of a pod (provided that pod has been scheduled):

kubectl get pods <podname> -o jsonpath='{ ..nodeName }'
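
The "provided that pod has been scheduled" caveat is easy to forget, because an unscheduled pod simply yields an empty string. Here's a small sketch of a wrapper that makes that case an explicit failure. The function name pod_node is my own invention, and I've used the explicit .spec.nodeName path rather than the recursive ..nodeName, which should be equivalent for a single pod.

```shell
# Print the node a pod is running on; fail if it has not been scheduled yet.
# Arguments: namespace, pod name.
pod_node() {
  node=$(kubectl get pod -n "$1" "$2" -o jsonpath='{.spec.nodeName}') || return 1
  if [ -z "$node" ]; then
    echo "pod $2 has not been scheduled to a node yet" >&2
    return 1
  fi
  echo "$node"
}
```

Usage would look like pod_node <namespace> <podname>, with the node name printed on stdout and a non-zero exit status if there's nothing to print.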

And this command will patch a deployment, adding a nodeSelector:

kubectl patch deployments <deploymentname> -p '{"spec": {"template": {"spec": {"nodeSelector": {"": "<nodename>"}}}}}'
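
Quoting JSON inside a shell command gets fiddly fast, so it can help to build the patch body separately. This is a sketch under one loud assumption: that the selector should match the well-known kubernetes.io/hostname node label. That key is my assumption here, and the helper name node_selector_patch is hypothetical.

```shell
# Build the patch body that pins a deployment's pod template to one node.
# Assumption: the selector key is the well-known kubernetes.io/hostname label.
# Argument: the label value to match (usually the node name).
node_selector_patch() {
  printf '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "%s"}}}}}\n' "$1"
}
```

You'd then run kubectl patch deployments <deploymentname> -p "$(node_selector_patch <nodename>)". Bear in mind that the kubernetes.io/hostname label value usually matches the node name, but it can differ in some environments, so check kubectl get nodes --show-labels if the pod ends up unschedulable.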

Combined, we get this neat little command, a variation of which is now featured in the Portainer install docs:

kubectl patch deployment -n $NAMESPACE $DEPLOYMENT -p '{"spec": {"template": {"spec": {"nodeSelector": {"": "'$(kubectl get pods -n $NAMESPACE -o jsonpath='{ ..nodeName }')'"}}}}}' || (echo Failed to identify current node of $DEPLOYMENT pod; exit 1)
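
Two things to be aware of with the one-liner: the || branch only fires if kubectl patch itself fails, and the exit 1 inside the parentheses only exits that subshell. If you'd rather have a version for a script that checks for a node before patching, here's a sketch. The function name, the label-selector argument and the kubernetes.io/hostname key are all my assumptions, so adjust them to your deployment.

```shell
# Pin a deployment to the node its pod currently runs on.
# Arguments: namespace, deployment name, pod label selector (hypothetical,
# e.g. whatever labels your chart puts on the pods).
pin_deployment() {
  nodes=$(kubectl get pods -n "$1" -l "$3" \
    -o jsonpath='{.items[*].spec.nodeName}') || return 1
  # If several pods match, use the first node reported.
  node=${nodes%% *}
  if [ -z "$node" ]; then
    echo "Failed to identify current node of $2 pod" >&2
    return 1
  fi
  # Assumption: kubernetes.io/hostname is the label to select on.
  kubectl patch deployment -n "$1" "$2" \
    -p "{\"spec\": {\"template\": {\"spec\": {\"nodeSelector\": {\"kubernetes.io/hostname\": \"$node\"}}}}}"
}
```

Called as pin_deployment $NAMESPACE $DEPLOYMENT <selector>, it returns non-zero without touching the deployment when no scheduled pod is found.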

It should be noted that pinning a pod to a node obviously reduces resiliency in the event that a node fails, and something like this shouldn't be attempted seriously in production. If you're using microk8s though, you're probably not in serious production, so go wild!

