I recently got to proctor an OpenHack event on modern containerization. It ended up being an excuse to dig deep into one of those corner cases that we all encounter, but no one likes to talk about.
Kubernetes is one of the greatest orchestration and scaling tools ever built, designed for modern decoupled, stateless architectures. Kubernetes tutorials abound to show you these strong use cases. But in the real world where you don’t get to build “green field” every time, there are a lot of applications that don’t fit that model.
Lots of people out there are still writing tightly coupled monoliths, in many cases for good reason. In some use cases microservices-style scalability isn’t even useful – you actually prefer stateful applications with tight coupling. For example, a game server: you don’t want to scale player capacity per game, you want to add more games (server instances).
So today I’m writing about stateful, non-scalable applications in Kubernetes.
There are a few different approaches to coupling application components:
Level 0 is to simply specify multiple components (containers) in your deployment.
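Roughly, a Deployment like this (the image names and the sidecar’s job are illustrative, not prescriptive):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14
        ports:
        - containerPort: 80
      - name: sidecar                  # second, tightly coupled component
        image: busybox:1.28
        command: ["sh", "-c", "while true; do sleep 3600; done"]
```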
This specifies 3 copies of the same application, with the same two containers in each replica. This is a coupled application, but it’s still stateless. Let’s add a volume – that’s where we get into trouble.
The problem: if you add a Volume the normal way (a persistentVolumeClaim), each of your replicas will try to connect to the same volume. It’ll act like a network shared drive. Maybe that’s OK for your application, but not for our super-stateful example! And depending on your volume class, the volume may reject multiple connections outright (as Azure Disk does, for example).
So how do we get around this limitation? I want a separate volume for each instance of the application.
Kubernetes supports a different object type for this use case, called a StatefulSet. This is exactly what it sounds like: a set of objects that define a stateful application. It’s a template for creating multiple copies of all resources defined therein.
A StatefulSet will create replicas similar to a Deployment, but it will set up a separate Volume and VolumeClaim for each one. The replicas will be identical except for an ordinal index appended to the name: the first might be called nginx-deployment-0, the second nginx-deployment-1, and so on. The result is a set of tightly coupled components, which can be individually addressed, and scaled using normal Kubernetes tools.
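A sketch of what that looks like – a StatefulSet plus its headless Service (storage size and image are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  serviceName: nginx            # ties the pods to the headless Service below
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:         # one PVC (and thus one PV) per replica
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  clusterIP: None               # "naked" headless Service for per-pod DNS
  selector:
    app: nginx
  ports:
  - port: 80
    name: web
```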
There are a few details to notice here.
Yes, we’ve replaced Deployment with StatefulSet. You get a shiny gold star if you noticed that one.
The interesting part is the volumeClaimTemplates section, below the containers definition. This field only exists inside a StatefulSet, and it’s just what it sounds like: a template for creating PersistentVolumeClaims.
If you apply this config, you’ll see three PVs created, with three PVCs, attached to three Pods. You can apply HPA rules to scale these up and down just like you would with deployments.
There’s also that weird Service at the bottom. A naked Service with no clusterIP? What’s the point? The point is to act as a helper for Kubernetes’ internal DNS – it’s called a headless Service. All of those nice StatefulSet pods will come under a neat subdomain, e.g. nginx-0.nginx, nginx-1.nginx, etc. Additionally, you can discover the active members of the StatefulSet by using that nginx domain component: a DNS lookup on it returns A records for all of the active members.
“But what about external access?” I hear you cry. Yes, we’ve built a great stateful application that can scale instances, but it’s only internally addressable! Good luck hosting those games…
External access and Metacontroller
Normally you would put a LoadBalancer service in front of your application. But a Kubernetes LoadBalancer will grab all of these StatefulSet members and spread traffic across them – so you can’t address them externally one by one. What you really want is to create an external IP address for each StatefulSet member.
One solution is to use a reverse proxy like nginx or HAProxy, configured to differentiate based on hostnames. But this is a blog post about Kubernetes, so we’re going to do this the Kubernetes way!
Kubernetes is very extensible. If Pods, Services, etc. don’t make sense for your application or domain, you can define custom object types and behaviors through custom resources and controllers. That’s a pretty rare need, but as we’ve seen, some Kubernetes edge cases are mainstream cases in the real world.
In our super-stateful application, we don’t need a custom resource type. But we do want to attach custom behaviors to our StatefulSet: every time we start up a pod we should create a LoadBalancer for it. We should be nice and tear them down when the pods are scaled down, of course.
We’ll use the Metacontroller add-on to make our lives easier. Metacontroller makes it “easy” to add custom behaviors. Just write a short script, stick it into a ConfigMap or FaaS, and let Metacontroller work its magic!
The Metacontroller project comes with several well-documented examples, including one that’s very close to our requirement: service-per-pod.
Step 1 is to install Metacontroller, of course:
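Something along these lines, following the Metacontroller docs of the time (the manifest URLs and the cluster-admin binding are assumptions – newer releases live under the metacontroller GitHub org, so check the current install instructions):

```shell
# Grant yourself cluster-admin so you can install Metacontroller's RBAC rules
kubectl create clusterrolebinding $(whoami)-cluster-admin-binding \
  --clusterrole=cluster-admin --user=$(whoami)
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/metacontroller/master/manifests/metacontroller-rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/metacontroller/master/manifests/metacontroller.yaml
```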
Then we’ll add some new metadata to our existing StatefulSet. The metacontroller script will use these values to configure the load balancers.
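Based on the annotation keys used by Metacontroller’s service-per-pod example, that metadata looks roughly like this (the port mapping is illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
  annotations:
    # which pod label the generated Services should select on
    service-per-pod-label: "statefulset.kubernetes.io/pod-name"
    # "port:targetPort" pairs for the generated LoadBalancer Services
    service-per-pod-ports: "80:80"
```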
We also need each StatefulSet pod to carry a label with its own pod name, so that each generated Service can select exactly one pod.
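On recent Kubernetes versions the StatefulSet controller stamps this label on each pod automatically, so the resulting pod metadata looks something like this (the app label is from our example above):

```yaml
# Metadata of the first generated pod, "nginx-0":
metadata:
  name: nginx-0
  labels:
    app: nginx
    # unique per-pod label that a per-pod Service can select on
    statefulset.kubernetes.io/pod-name: nginx-0
```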
Note: this only works in k8s 1.9+ – if you’re stuck on a lower version, you can script this action with Metacontroller, too. :)
Now you’re going to need two hooks. Put them in a directory together so they’re easy to apply at once. These are written in jsonnet, but you could write them in whatever language you like.
The first hook actually creates the LoadBalancer for each Pod.
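A sketch modeled on Metacontroller’s service-per-pod example – it reads the two annotations from above and emits one LoadBalancer Service per replica:

```jsonnet
// Sync hook: for a StatefulSet with N replicas, return N LoadBalancer
// Services as attachments, each selecting a single pod by its pod-name label.
function(request) {
  local statefulset = request.object,
  local labelKey = statefulset.metadata.annotations["service-per-pod-label"],
  local ports = statefulset.metadata.annotations["service-per-pod-ports"],

  attachments: [
    {
      apiVersion: "v1",
      kind: "Service",
      metadata: {
        name: statefulset.metadata.name + "-" + index,
        labels: { app: "service-per-pod" },
      },
      spec: {
        type: "LoadBalancer",
        // e.g. {"statefulset.kubernetes.io/pod-name": "nginx-0"}
        selector: { [labelKey]: statefulset.metadata.name + "-" + index },
        ports: [
          {
            local parts = std.split(portnums, ":"),
            port: std.parseInt(parts[0]),
            targetPort: std.parseInt(parts[1]),
          }
          for portnums in std.split(ports, ",")
        ],
      },
    }
    for index in std.range(0, statefulset.spec.replicas - 1)
  ],
}
```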
The other hook is the “finalizer” – it responds to changes or deletions in pods by tearing down the corresponding LoadBalancers.
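Again following the shape of the service-per-pod example, the finalize hook can be very short:

```jsonnet
// Finalize hook: return no attachments so the generated Services get
// deleted, and report "finalized" only once they are all actually gone.
function(request) {
  attachments: [],
  finalized: std.length(request.attachments["Service.v1"]) == 0,
}
```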
Put both hooks into a single ConfigMap together. Metacontroller will run them from there.
Now apply the actual DecoratorController, which will run those functions. Note that you have to identify your hook jsonnet files by (file) name! Get the name wrong, and the finalizer will hang forever, preventing you from deleting your StatefulSet.
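The core of it looks something like the following, modeled on the service-per-pod example. The hook file names and webhook URLs here are hypothetical – the URL path has to match the file names you actually put in your ConfigMap, and the full manifest also includes a small Deployment and Service running a jsonnet webhook server that mounts that ConfigMap:

```yaml
apiVersion: metacontroller.k8s.io/v1alpha1
kind: DecoratorController
metadata:
  name: service-per-pod
spec:
  resources:
  # Only decorate StatefulSets that carry both of our annotations
  - apiVersion: apps/v1
    resource: statefulsets
    annotationSelector:
      matchExpressions:
      - {key: service-per-pod-label, operator: Exists}
      - {key: service-per-pod-ports, operator: Exists}
  attachments:
  - apiVersion: v1
    resource: services
  hooks:
    sync:
      webhook:
        # URL path must match the sync hook's file name in the ConfigMap
        url: http://service-per-pod.metacontroller/sync-service-per-pod
    finalize:
      webhook:
        # ...and likewise for the finalize hook
        url: http://service-per-pod.metacontroller/finalize-service-per-pod
```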
That’s it! Now you can scale complete replicas of a very-stateful application with a simple
kubectl scale sts nginx --replicas=900.
Enjoy bragging to your friends about your “macroservices architecture”, pushing the limits of Kubernetes to run and replicate a stateful monolith!
Everyone hates writing YAML. Check out the sample code for this post on GitHub.