Microservices deployment strategies on Kubernetes
Kubernetes gives software development teams a solid platform for making continuous deployment a reality. Continuous deployment and delivery take away the dread and crunch associated with shipping new code to users. The ability to deploy code reliably and, more importantly, roll back when problems occur is invaluable to fast-moving development teams. It gives engineering teams the flexibility to experiment while iterating quickly.
With the advent of microservices and the rise of containers to run them in, several deployment strategies are available to engineering teams pursuing continuous delivery. In this post, we will look at a few of them and discuss their advantages, disadvantages, and gotchas.
Before we look at the different deployment strategies, let’s get some terms straight.
Kubernetes has a high-level object called ‘Deployment’ that cluster users interact with. The Deployment object and its associated Deployment controller let Kubernetes users declare the desired state of the ReplicaSets and Pods that make up the Deployment. The Deployment object also lets us set a strategy for updating the pods within the deployment. The default for this property is RollingUpdate.
When we talk about deployment strategies in this post, we are not talking about the strategy property of the Kubernetes Deployment object but about general deployment patterns. We will also see how to implement these patterns within the Kubernetes context.
These are the deployment strategies that we will be looking at:
- Recreate
- Rolling Upgrades
- Blue/Green Deployments
- Canary Releases
The recreate strategy is the traditional installation/upgrade process that most of us are familiar with: schedule a maintenance window for the application, stop all running instances, and then either create new instances with the new version or upgrade the stopped instances to the latest version and bring them back online. This isn’t a true continuous deployment strategy.
Kubernetes lets us use this approach by setting the Deployment’s strategy type to Recreate, which kills all existing pods before creating new ones.
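As a rough sketch, the Recreate strategy is selected through the Deployment's `spec.strategy.type` field. The Python below builds such a manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The name `web`, the image `example.com/web:2.0`, and the replica count are placeholders chosen for illustration, not values from a real system.

```python
import json


def recreate_deployment(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment manifest that uses the Recreate strategy."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Recreate: all old pods are terminated before new ones are created.
            "strategy": {"type": "Recreate"},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    manifest = recreate_deployment("web", "example.com/web:2.0", 3)
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it would be one way to try this out on a test cluster.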
An obvious downside of this strategy is the downtime between killing the old version and spinning up the new one.
The beauty of this approach is its simplicity. There is very little complexity to manage. We only ever deal with one version of the application at a time, which takes complexities like API versioning and data schema collisions off the table.
Best usage scenario:
A recreate deployment strategy is best used for non-critical systems where downtime is acceptable and the cost of the downtime is not very significant.
Rolling updates are perfect for services that can be easily scaled horizontally. With this strategy, instances of the old version are retired as instances of the new version come up. This should be accompanied by a health check to ensure that the new instances are able to serve traffic.
Within Kubernetes, this health check can be defined as a readiness probe on the Pod’s containers. Kubernetes won’t send Service traffic to a new pod until its readiness probe succeeds.
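To make the rolling update and readiness probe concrete, here is a minimal sketch of the relevant parts of a Deployment spec, again built as a Python dictionary. It is a fragment meant to be merged into a full Deployment, not a complete manifest. The health endpoint `/healthz`, port `8080`, the probe timings, and the choice of `maxSurge: 1` with `maxUnavailable: 0` are assumptions for illustration.

```python
import json


def rolling_update_spec(image: str, health_path: str = "/healthz", port: int = 8080) -> dict:
    """Return the parts of a Deployment spec that matter for a rolling update."""
    return {
        "strategy": {
            "type": "RollingUpdate",
            # Add at most one extra pod at a time and never drop below
            # the desired replica count (assumed values).
            "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
        },
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "app",
                        "image": image,
                        # The pod only receives Service traffic once this
                        # HTTP check succeeds.
                        "readinessProbe": {
                            "httpGet": {"path": health_path, "port": port},
                            "initialDelaySeconds": 5,
                            "periodSeconds": 10,
                        },
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(rolling_update_spec("example.com/web:2.0"), indent=2))
```

With `maxUnavailable` set to 0, the rollout only removes an old pod after a new one has passed its readiness probe, at the price of temporarily running one extra pod.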
Since this is a phased deployment approach, unlike the previous one we discussed, rollback is smooth. If the new instances are found to be unhealthy, the old version can be scaled back up to its original capacity while the new instances are killed.
One thing to keep in mind with this strategy is that, unlike the previous one, we are running two versions of our service in parallel. This requires the new version to be backwards compatible with the old one. Schema changes also need to be managed carefully so they do not break the currently deployed version. API versioning needs to be in place for the inevitable backwards-incompatible changes in the new version of the service.
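One common way to keep two versions compatible while they run side by side is to add fields only as optional, and to have readers ignore fields they do not know. The sketch below illustrates that idea with a hypothetical order payload; the `OrderV1`/`OrderV2` types and the `currency` field are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class OrderV1:
    order_id: str
    amount_cents: int


@dataclass
class OrderV2:
    order_id: str
    amount_cents: int
    currency: str = "USD"  # new field, optional so that old writers stay valid


def parse_v1(payload: dict) -> OrderV1:
    # Old version: read only the fields it knows and ignore everything else.
    return OrderV1(order_id=payload["order_id"], amount_cents=payload["amount_cents"])


def parse_v2(payload: dict) -> OrderV2:
    # New version: fall back to a default when an old writer omitted the field.
    return OrderV2(
        order_id=payload["order_id"],
        amount_cents=payload["amount_cents"],
        currency=payload.get("currency", "USD"),
    )


if __name__ == "__main__":
    written_by_old = {"order_id": "o-1", "amount_cents": 500}
    written_by_new = {"order_id": "o-2", "amount_cents": 900, "currency": "EUR"}
    print(parse_v2(written_by_old))  # new code reading old data
    print(parse_v1(written_by_new))  # old code reading new data
```

Changes that cannot be expressed this way, such as renaming or removing a field, are the ones that usually call for explicit API versioning.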
Disadvantages:
- Multiple version support: This strategy requires careful engineering when designing and building a service, which adds complexity. Managing backwards compatibility and versioning APIs requires a rigorous (ideally automated) test suite and has to be handled carefully and mindfully between versions of the service.
- System opaqueness: Since we run multiple versions of the service in parallel for a period of time, any unforeseen interactions or side effects are significantly harder to reproduce and debug later.
- Resource cost: The rolling upgrade strategy lets us deploy new versions with little or no impact on the quality of service, because new instances are brought up side by side as the old ones are wound down. This does mean paying for more instances than are strictly necessary to maintain the required QoS. This has become easier in the age of the cloud, but it’s still an increased cost to take into consideration.
Advantages:
- No Downtime: This strategy allows a controlled rollout across all instances of a service, with health checks that give us confidence that the new version can handle requests.
- Rollback: Let’s face it, we’ve all deployed bad, buggy code. The ability to roll back to a previous good version with little to no impact on end users is very powerful when a deployment goes bad.
Best usage scenario:
Rolling updates are great when the deployed service is easily scalable horizontally. We do have to spend extra effort to ensure that new versions are backwards compatible with the deployed old version on both the data and API sides.
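The extra resource cost of a rolling update can be estimated up front. In Kubernetes, `maxSurge` and `maxUnavailable` bound how many pods exist and how many are available during a rollout. Both default to 25%, with percentages rounded up for `maxSurge` and down for `maxUnavailable`. The helper below is a small sketch of that arithmetic; the function names are made up for this example.

```python
from typing import Tuple, Union


def _resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string like '25%' into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceiling or floor of scaled / 100, avoiding float rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(
    replicas: int,
    max_surge: Union[int, str] = "25%",
    max_unavailable: Union[int, str] = "25%",
) -> Tuple[int, int]:
    """Return (maximum total pods, minimum available pods) during a rollout."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    print(rollout_bounds(10))        # defaults: (13, 8)
    print(rollout_bounds(4, 1, 0))   # one extra pod, no lost capacity: (5, 4)
```

For a 10-replica service with the defaults, that means capacity for up to 13 pods during the rollout, with at least 8 of them available at any time.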
Blue/Green (aka Red/Black) Deployment
This approach requires us to manage two full copies of our production environment. One of them is the live (blue) environment that is serving requests at the moment. New versions of services are deployed to the second (green) environment. After validation in the green environment gives us confidence that it can serve traffic, traffic is switched over to it, and green becomes the new blue environment.
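Within a single cluster, one common way to approximate the traffic switch is to run the blue and green versions as separate Deployments and change the label selector of the Service in front of them. This is a lighter variant than two full environments, shown only to illustrate the cutover itself. The label key `version`, its values, and the names and ports below are assumptions.

```python
import copy


def make_service(name: str, app: str, color: str) -> dict:
    """Build a Service that routes to the pods labelled with the given color."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": app, "version": color},
            "ports": [{"port": 80, "targetPort": 8080}],
        },
    }


def switch_traffic(service: dict, new_color: str) -> dict:
    """Return a copy of the Service whose selector points at new_color."""
    updated = copy.deepcopy(service)
    updated["spec"]["selector"]["version"] = new_color
    return updated


if __name__ == "__main__":
    live = make_service("web", "web", "blue")
    cutover = switch_traffic(live, "green")      # go live with the new version
    rollback = switch_traffic(cutover, "blue")   # rollback is the same operation
    print(cutover["spec"]["selector"], rollback["spec"]["selector"])
```

Because cutover and rollback are the same selector change in opposite directions, reverting stays cheap as long as the old environment is kept running.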
This strategy, for the first time, introduces managing not only multiple versions but also multiple deployment environments. This does add a new layer of complexity to our deployment pipeline.
Disadvantages:
- Duplication: We need to maintain two full-sized production environments for our service.
- Complicated management: With a large set of services, managing this strategy becomes very resource intensive.
- Data: Data needs to be kept in sync between both environments to enable a switchover whenever we need one.
Advantages:
- No fear of breaking production, because we never deploy to the live blue environment.
- Truly no-downtime deployments, because traffic redirection happens on the fly.
- Since the green environment is a clone of the currently running blue environment, we can test the new version on a true production environment, not a facsimile of production.
- Rollback is simple and involves just a quick switch back to the old ‘blue’ environment.
Best usage scenario:
This kind of deployment strategy is great when there are only a few services in the system and there is provision to maintain two copies of the full production environment (in Kubernetes, separate namespaces can be used). This approach also works well with autonomous services, since we can test each service in isolation and flip traffic to the new version after it is certified.
Canary releases are named after a practice coal miners used to follow to survive mines with toxic gases. Miners would carry a caged canary down into the mine with them. If there were toxic gases in the mine, the canary would be killed before the miners, giving them an early warning to leave. Canary releases allow us to test new versions of the service/application for potential problems with a small section of the user base without affecting the rest.
How does a canary release work?
Here, new versions are deployed into the current live production environment (unlike the previous approach, where we deployed into a clone of the production environment). After the new version is up, we route a small amount of traffic to it. This could be a straight percentage of the traffic, or we can use certain traffic criteria (client, user location, etc.) to decide which requests go to the new version. The new version is monitored very carefully for performance and error rates. If these measurements satisfy us, we can gradually spin up more instances of the new version and transfer more traffic to it.
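The two moving parts described above, splitting traffic and deciding whether to widen the rollout, can be sketched in a few lines. The code below is an illustration under stated assumptions, not how any particular ingress or service mesh implements it: users are bucketed by a hash of a hypothetical `user_id` so that each user consistently sees one version, and the step size and error-rate tolerance are made-up thresholds.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically send roughly canary_percent of users to the canary."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in the range 0..99
    return bucket < canary_percent


def next_canary_percent(
    current: int,
    canary_error_rate: float,
    baseline_error_rate: float,
    step: int = 10,
    tolerance: float = 0.01,
) -> int:
    """Widen the canary if it looks healthy, otherwise drop it to 0 (roll back)."""
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0
    return min(100, current + step)


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    share = sum(routes_to_canary(u, 10) for u in users) / len(users)
    print(f"share of users on canary: {share:.1%}")
    print(next_canary_percent(10, canary_error_rate=0.02, baseline_error_rate=0.015))
```

Hashing on a stable attribute keeps a given user on the same version between requests, which makes error reports easier to attribute to the canary or the stable version.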
Canary deployments are an evolution of blue/green deployments with added controls around traffic splitting and quality control (performance and error rate) measurement. This means all the complexities of blue/green deployments are present and in some c