Introduction to Heptio Ark
An Ark is a vessel or sanctuary meant to serve as protection against extinction. The most famous Ark in history is Noah’s Ark, which the Bible says was built with instructions from God to protect Noah’s family and the pairs of animals he was instructed to bring on board. A more modern Ark would be the Svalbard Global Seed Vault, which stores seeds in case of a global crisis.
Why are we talking about Arks on a technology blog? Because last week the folks at Heptio announced the release of Ark. Heptio Ark is a tool that eases the pain of backing up and restoring clusters for Kubernetes admins.
As production use of Kubernetes grows, many organizations struggle to back up and restore their Kubernetes clusters. At Opcito, we’ve helped clients restore their clusters by dumping the cluster state from etcd. This method is hit-and-miss, and persistent volumes and stateful workloads make it far more complex. Working out the relationship between a volume’s snapshot and the pods that were using it at a given point in time is tricky, because pods are dynamic and can be rescheduled to other nodes transparently.
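The association is recoverable because a pod reaches its data through a PersistentVolumeClaim bound to a PersistentVolume, not through the node it happens to run on. The Python sketch below is a hypothetical model (the Pod class, associate_snapshots function, and snapshot IDs are illustrative, not Ark’s code): it records pod to claim to volume to snapshot at backup time, so moving the pod to another node does not change the answer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    name: str
    node: str                                        # changes when the pod is rescheduled
    claims: List[str] = field(default_factory=list)  # PVC names the pod mounts


def associate_snapshots(pods, pvc_to_pv, pv_to_snapshot):
    """Map each pod name to the snapshot IDs of the volumes it mounts.

    The lookup follows pod -> claim -> volume -> snapshot and never
    consults pod.node, so rescheduling does not change the result.
    """
    result: Dict[str, List[str]] = {}
    for pod in pods:
        result[pod.name] = [pv_to_snapshot[pvc_to_pv[claim]] for claim in pod.claims]
    return result


pods = [Pod("db-0", node="node-a", claims=["data-db-0"])]
pvc_to_pv = {"data-db-0": "pv-123"}      # PVC -> bound PersistentVolume
pv_to_snapshot = {"pv-123": "snap-abc"}  # PV -> cloud snapshot ID (made up)

print(associate_snapshots(pods, pvc_to_pv, pv_to_snapshot))  # {'db-0': ['snap-abc']}
pods[0].node = "node-b"  # the scheduler moves the pod
print(associate_snapshots(pods, pvc_to_pv, pv_to_snapshot))  # {'db-0': ['snap-abc']}
```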
Heptio Ark aims to solve this problem by backing up Kubernetes clusters and making those backups easy to restore. Ark can back up all cluster components, and it manages volume snapshots in a way that preserves their association with pods. Restoring is a single-command affair and can even be partial (e.g., scoped to a namespace).
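A partial restore comes down to choosing which backed-up objects to recreate. Here is a minimal Python sketch of namespace scoping, assuming backed-up objects are plain dicts shaped like Kubernetes manifests; in_scope and select_for_restore are hypothetical helpers, not part of Ark’s CLI or API. Namespace objects are matched by name, namespaced objects by metadata.namespace, and other cluster-scoped objects are skipped, which is a simplification of this sketch.

```python
def in_scope(obj, namespaces):
    """True if a backed-up object belongs to one of the requested namespaces."""
    meta = obj.get("metadata", {})
    if obj.get("kind") == "Namespace":
        # The Namespace object itself is cluster-scoped; match it by name.
        return meta.get("name") in namespaces
    # Namespaced objects carry metadata.namespace; other cluster-scoped
    # objects have none and are skipped in this simplified sketch.
    return meta.get("namespace") in namespaces


def select_for_restore(objects, namespaces):
    wanted = set(namespaces)
    return [obj for obj in objects if in_scope(obj, wanted)]


backup = [
    {"kind": "Deployment", "metadata": {"name": "web", "namespace": "shop"}},
    {"kind": "Service", "metadata": {"name": "web", "namespace": "shop"}},
    {"kind": "ConfigMap", "metadata": {"name": "cfg", "namespace": "billing"}},
    {"kind": "Namespace", "metadata": {"name": "shop"}},
    {"kind": "ClusterRole", "metadata": {"name": "viewer"}},
]
restored = select_for_restore(backup, ["shop"])
print([(o["kind"], o["metadata"]["name"]) for o in restored])
# [('Deployment', 'web'), ('Service', 'web'), ('Namespace', 'shop')]
```

A real restore would also need to order creation so that the Namespace exists before the objects inside it; this sketch only does the selection.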
Ark also opens up some interesting testing use cases: you can snapshot your production cluster and stand up a replica of it in your test and staging environments with a single command. Ark also lets you move between environments, breaking any cloud lock-in you may have with a particular platform. It’s launching in alpha with support for AWS, Azure, and Google Cloud Platform. Heptio says the platform is extensible and that more environments will be added in the future.
Your cluster needs to be running at least Kubernetes 1.7 to use Ark as your backup and restore solution.
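The API server reports its version as a string such as v1.7.5 (the GitVersion field printed by kubectl version). Comparing those strings as text is a trap, since "v1.10" sorts before "v1.7". A minimal Python sketch of a numeric check, assuming a GitVersion-style string as input (meets_minimum is a hypothetical helper):

```python
import re


def meets_minimum(git_version, minimum=(1, 7)):
    """Return True if a Kubernetes version string is at least `minimum`.

    Accepts strings such as "v1.7.5", "v1.10.0", or "v1.8.4-gke.1";
    only the major and minor numbers are compared.
    """
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized version: {git_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison is numeric, so (1, 10) >= (1, 7) as intended.
    return (major, minor) >= minimum


print(meets_minimum("v1.10.2"))  # True
print(meets_minimum("v1.6.13"))  # False
```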
Ark under the hood
Ark leverages Custom Resource Definitions (CRDs), which let you extend the Kubernetes API with your own resource types. CRDs replace the deprecated Third Party Resources (TPRs) and are available starting with Kubernetes 1.7. Custom controllers complete the loop, letting developers define behavior based on the state stored in those custom resources.
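The controller pattern can be illustrated with a toy, in-memory model. FakeAPIServer and BackupLikeController are hypothetical stand-ins for the Kubernetes API server and a custom controller, and the phase names are illustrative; real controllers consume watch streams from the API server (in Go, typically through client-go informers) rather than a local queue.

```python
from collections import deque


class FakeAPIServer:
    """Hypothetical stand-in for the API server: stores objects and emits watch events."""

    def __init__(self):
        self.objects = {}      # name -> {"spec": ..., "status": ...}
        self.events = deque()  # (event_type, name) pairs for watchers

    def create(self, name, spec):
        self.objects[name] = {"spec": spec, "status": {"phase": "New"}}
        self.events.append(("ADDED", name))

    def set_phase(self, name, phase):
        self.objects[name]["status"]["phase"] = phase
        self.events.append(("MODIFIED", name))


class BackupLikeController:
    """Hypothetical controller: reacts to events by reconciling stored state."""

    def __init__(self, api, do_work):
        self.api = api
        self.do_work = do_work  # callable that performs the actual operation

    def reconcile(self, name):
        obj = self.api.objects[name]
        # Only act on objects nobody has processed yet; later events are no-ops.
        if obj["status"]["phase"] != "New":
            return
        self.api.set_phase(name, "InProgress")
        self.do_work(obj["spec"])
        self.api.set_phase(name, "Completed")

    def drain(self):
        """Process queued events until none remain (a real controller runs forever)."""
        while self.api.events:
            _event_type, name = self.api.events.popleft()
            self.reconcile(name)


api = FakeAPIServer()
handled = []
controller = BackupLikeController(api, handled.append)
api.create("test-backup", {"snapshotVolumes": True})
controller.drain()
print(api.objects["test-backup"]["status"]["phase"])  # Completed
print(handled)  # [{'snapshotVolumes': True}]
```

The check on the phase is what keeps the loop safe: the controller’s own status updates generate new events, and reconciling an already-completed object does nothing.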
Each of Ark’s operations (Backup, Restore, and Schedule) is defined as a CRD and has a custom controller that acts on the data stored in its resources.
Ark runs in two modes:
- Ark client: Lets you query, create, and delete Ark resources.
- Ark server: Runs all of the Ark controllers. Each controller watches its respective custom resource for API operations, performs validation, and handles the majority of the cloud API logic (e.g., interfacing with object storage and persistent volumes).
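In this split, the client’s job is essentially to write a custom resource; the work happens in the server’s controllers. A minimal Python sketch of the object a client might submit for a backup: the group/version ark.heptio.com/v1, the heptio-ark namespace, the "*" wildcard, and the spec field names are modeled on Ark’s v1 Backup resource as we understand it and should be treated as assumptions, and build_backup itself is hypothetical.

```python
import json


def build_backup(name, snapshot_volumes=False, included_namespaces=None):
    """Build a Backup custom resource as a plain dict (hypothetical helper).

    The apiVersion, namespace, and spec field names are assumptions modeled
    on Ark's v1 Backup resource; check the project's docs before relying on them.
    """
    return {
        "apiVersion": "ark.heptio.com/v1",
        "kind": "Backup",
        "metadata": {"name": name, "namespace": "heptio-ark"},
        "spec": {
            # "*" is assumed here to mean "all namespaces".
            "includedNamespaces": list(included_namespaces or ["*"]),
            "snapshotVolumes": snapshot_volumes,
        },
    }


# Roughly what "ark backup create test-backup --snapshot-volumes" asks the server to do.
print(json.dumps(build_backup("test-backup", snapshot_volumes=True), indent=2))
```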
Let’s look at a specific example of an Ark command and the actions it triggers.
Running the command ark backup create test-backup --snapshot-volumes triggers the following operations:
- The ark client makes a call to the Kubernetes API server, creating a Backup custom resource (which is stored in etcd).
- The BackupController sees that a new Backup has been created and validates it.
- Once validation passes, the BackupController begins the backup process. It collects data by querying the Kubernetes API server for resources.
- Once the data has been aggregated, the BackupController makes a call to the object storage service (e.g., Amazon S3) to upload the backup file.
- If the --snapshot-volumes flag is specified, Ark also makes disk snapshots of any persistent volumes using the appropriate cloud service API.
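The steps above can be sketched as a single Python function. Everything here is a hypothetical stand-in: resources is a list of manifest dicts in place of API server queries, object_store is a dict in place of a bucket such as S3, and snapshot_volume is a callable in place of a cloud provider’s snapshot API. Packaging the backup as a gzipped tarball of JSON files is an assumption of the sketch, as are the phase names.

```python
import io
import json
import tarfile


def validate(backup):
    """Reject obviously malformed requests (illustrative checks only)."""
    errors = []
    if not backup.get("name"):
        errors.append("backup name is required")
    return errors


def run_backup(backup, resources, object_store, snapshot_volume):
    """Hypothetical BackupController flow: validate, collect, upload, snapshot."""
    errors = validate(backup)
    if errors:
        return {"phase": "FailedValidation", "errors": errors}

    # Collect each resource as a JSON file inside a gzipped tarball.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for obj in resources:
            meta = obj["metadata"]
            path = f"{meta.get('namespace', '_cluster')}/{obj['kind']}/{meta['name']}.json"
            data = json.dumps(obj).encode()
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

    # Upload the finished archive to (stand-in) object storage.
    object_store[f"{backup['name']}.tar.gz"] = buf.getvalue()

    # Snapshot persistent volumes only when the flag was given.
    snapshots = {}
    if backup.get("snapshot_volumes"):
        for obj in resources:
            if obj["kind"] == "PersistentVolume":
                pv = obj["metadata"]["name"]
                snapshots[pv] = snapshot_volume(pv)

    return {"phase": "Completed", "snapshots": snapshots}


store = {}
resources = [
    {"kind": "Deployment", "metadata": {"name": "web", "namespace": "shop"}},
    {"kind": "PersistentVolume", "metadata": {"name": "pv-123"}},
]
result = run_backup({"name": "test-backup", "snapshot_volumes": True},
                    resources, store, lambda pv: f"snap-of-{pv}")
print(result)         # {'phase': 'Completed', 'snapshots': {'pv-123': 'snap-of-pv-123'}}
print(sorted(store))  # ['test-backup.tar.gz']
```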
This new project from the folks at Heptio solves a long-standing pain point with Kubernetes and makes Kubernetes itself more reliable by providing easy backup and restoration of running clusters. We are excited to leverage it and make the lives of Kubernetes admins easier. :)
You can follow the development of this exciting new tool on GitHub here: https://github.com/heptio/ark