Entering the Docker World: A Hitchhiker's Guide to Clustering

Posted by Chaitanya Deshpande on March 28, 2016

Docker, a relatively new container technology, is making waves the size of a tsunami these days. Why? Simply because it makes packaging and shipping your apps much faster and less complicated. Along with the added benefit of being a leaner, lighter, more portable alternative to virtual machines, Docker makes it easy to run a number of apps on the same old servers. With all this wrapped up in a pretty bow of open source software, Docker has every reason to be the popular kid in school. Linux container technologies like LXC (LinuX Containers) have been around for years; however, due to the expertise and training required to deploy and maintain them, they never got much preference over virtual machines (VMs). This article explains the need for containerized applications as well as the roles of the different tools and supporting frameworks for Docker containers.

Clustering using Docker

What difference does Docker make?

The kaleidoscope of changes that transformed the architecture, infrastructure and usage of applications since the beginning of the World Wide Web can be described as nothing less than radical. We drastically moved from thick clients to thin clients, and all the way to mobile apps. In a similar way, we progressed from dedicated hosted servers to Virtual Private Servers (VPS) and finally towards cloud infrastructure. The simple rule of cause and effect would dictate that our applications will, therefore, need a lot of flexibility and reliability to accommodate any further changes.

These changes also mean that an application must scale well and stay highly available in order to remain relevant and cutting edge. To meet such needs, we need a complete ecosystem of infrastructure and resources that ensures smooth deployment of our application. Enabling interaction between the components of such an ecosystem requires a lot of configuration. Furthermore, applications also change dynamically to keep up with market requirements and trends. Hence, there is a need for a continuous pipeline of stages such as development, QA and production. Now imagine the complexity of making those configurations at each stage of the pipeline and migrating the application through these stages.

All this makes developing a sleek, intuitive application seem like an unconquerable mountain. However, there does happen to be an easy solution to this complication: Docker, of course!

Just as shipping containers are standardised so that goods can travel by multiple transport methods, in software we have Docker containers, which allow any payload and its dependencies to be encapsulated as a lightweight, portable, self-sufficient container. This container can be manipulated using standard operations and run consistently on practically any hardware platform.
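
To make the idea of "standard operations" a little more concrete, here is a minimal Python sketch that assembles a `docker run` command line for a container. The image name, container name, port numbers and environment variable are made-up examples; only the `-d`, `--name`, `-p` and `-e` flags come from the Docker CLI itself. The sketch only builds the command, it does not start anything.

```python
import shlex


def docker_run_command(image, name, ports=None, env=None):
    """Build the argument list for a detached `docker run` call."""
    cmd = ["docker", "run", "-d", "--name", name]
    # -p publishes a container port on a host port (host:container).
    for host_port, container_port in (ports or {}).items():
        cmd += ["-p", f"{host_port}:{container_port}"]
    # -e sets an environment variable inside the container.
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(image)
    return cmd


# Hypothetical example values, chosen only for illustration.
cmd = docker_run_command("nginx:latest", "web", ports={8080: 80}, env={"MODE": "prod"})
print(shlex.join(cmd))
```

The same command works unchanged on a laptop, a QA server or a production host, which is the point of the shipping-container analogy.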

Following are some popular use cases for Docker:

Docker Use Cases

In the DevOps world, there are two ways to look at Docker: the developer view (Build Once, Run Anywhere) and the operations view (Configure Once, Run Anything).


How Docker can be used as a Cluster:

The ability to package, transfer and run application code across multiple environments is all well and good, but it raises the question of how to manage the applications as a whole. Moreover, as the use of containers grows, new challenges come up: deciding which containers run where, dealing with large numbers of containers, and facilitating communication between containers across hosts. This is where clustering tools come into the picture. Some fairly popular tools are:
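
As a rough illustration of the "which containers run where" problem, the following Python sketch places containers on hosts using a simple "spread" rule: pick the host with the most free memory that still fits the request. It is a toy model under assumed names (`Host`, `place`, the node names and sizes), not the algorithm used by any of the tools below.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_cpus: float
    free_mem_mb: int
    containers: list = field(default_factory=list)


def place(container, cpus, mem_mb, hosts):
    """Place a container on the fitting host with the most free memory."""
    candidates = [h for h in hosts if h.free_cpus >= cpus and h.free_mem_mb >= mem_mb]
    if not candidates:
        return None  # no host has enough room for this container
    target = max(candidates, key=lambda h: h.free_mem_mb)
    target.free_cpus -= cpus
    target.free_mem_mb -= mem_mb
    target.containers.append(container)
    return target.name


# Hypothetical two-node cluster and four identical containers.
hosts = [Host("node-1", 2.0, 2048), Host("node-2", 4.0, 4096)]
placements = {c: place(c, 1.0, 1024, hosts) for c in ["web", "api", "db", "cache"]}
print(placements)
```

Real clustering tools layer constraints, health checks and networking on top of this basic placement decision.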

Fleet is a low-level and fairly simple orchestration layer that can be used as a base for running higher level orchestration tools, such as Kubernetes or custom systems.

Swarm uses the standard Docker interface, which makes it very simple to use and to integrate into existing workflows. However, this may make it harder to support the more complex scheduling that custom interfaces can define.

Kubernetes is an opinionated orchestration tool that comes with service discovery and replication baked-in. It may require some re-designing of existing applications, but used correctly, it will result in a fault-tolerant and scalable system.

Mesos is a low-level, battle-hardened scheduler that supports several frameworks for container orchestration, including Marathon, Kubernetes and Swarm. At the time of writing, Kubernetes and Mesos are more developed and stable than Swarm. In terms of scale, only Mesos has been proven to support large-scale systems consisting of hundreds or thousands of nodes. However, for small clusters of, say, fewer than a dozen nodes, Mesos may be an overly complex solution.


On its own, Mesos only provides the basic “kernel” layer of your cluster. It lets other applications request resources in the cluster to perform tasks, but does nothing itself. Frameworks bridge the gap between the Mesos layer and your applications. They are higher level abstractions which simplify the process of launching tasks on the cluster.
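
The division of labour described above can be sketched as a toy model in Python: a "master" hands resource offers to a framework, and the framework decides which of its pending tasks to launch on each offer. The class and function names (`Offer`, `ToyFramework`, `toy_master`) and all numbers are assumptions made for illustration; this is not the Mesos API.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


class ToyFramework:
    """Hypothetical framework that launches queued tasks on offers that fit."""

    def __init__(self, pending):
        self.pending = list(pending)  # tuples of (task_name, cpus, mem_mb)
        self.launched = []            # tuples of (task_name, agent)

    def resource_offer(self, offer):
        cpus, mem = offer.cpus, offer.mem_mb
        remaining = []
        for name, need_cpus, need_mem in self.pending:
            if need_cpus <= cpus and need_mem <= mem:
                # Use part of the offer for this task.
                cpus -= need_cpus
                mem -= need_mem
                self.launched.append((name, offer.agent))
            else:
                remaining.append((name, need_cpus, need_mem))
        self.pending = remaining


def toy_master(offers, framework):
    """The "kernel" layer: it only hands out offers, it runs nothing itself."""
    for offer in offers:
        framework.resource_offer(offer)


framework = ToyFramework([("web", 1.0, 512), ("worker", 2.0, 2048), ("cron", 0.5, 256)])
toy_master([Offer("agent-1", 1.5, 1024), Offer("agent-2", 4.0, 4096)], framework)
print(framework.launched)
```

Note that all the scheduling policy lives in the framework, which is why different frameworks can share one cluster.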

In a cluster configured this way, the resources for your application are handled at the kernel level by Mesos. At the same time, because parts of the application are dockerized, they are portable and relatively hardware independent.

Following are some tools built on Mesos that are well liked:

Long Running Services

  • Aurora is a service scheduler that runs on top of Mesos, enabling you to run long-running services that take advantage of Mesos' scalability, fault-tolerance, and resource isolation.
  • Marathon is a private PaaS built on Mesos. It automatically handles hardware or software failures and ensures that an app is "always on."
  • Singularity is a scheduler (HTTP API and web interface) for running Mesos tasks: long running processes, one-off tasks, and scheduled jobs.
  • is a simple web application that provides a white-label "Megaupload" for storing and sharing files in S3.
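
The "always on" behaviour of a long-running service scheduler can be pictured as a reconciliation step: compare the desired number of instances with what is actually running, then start or stop the difference. The Python sketch below is a simplified illustration with a made-up `reconcile` function and app names; it is not how Marathon, Aurora or Singularity are implemented.

```python
def reconcile(desired, running):
    """Return the actions needed to move `running` toward `desired`.

    Both arguments map an app name to an instance count.
    """
    actions = []
    for app, want in sorted(desired.items()):
        have = running.get(app, 0)
        if have < want:
            actions.append(("start", app, want - have))
        elif have > want:
            actions.append(("stop", app, have - want))
    return actions


# Hypothetical state after a host failure took down two "web" instances.
desired = {"web": 3, "api": 2}
running = {"web": 1, "api": 2}
actions = reconcile(desired, running)
print(actions)
```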


Big Data Processing

  • Cray Chapel is a productive parallel programming language. The Chapel Mesos scheduler lets you run Chapel programs on Mesos.
  • Dpark is a Python clone of Spark, a MapReduce-like framework written in Python, running on Mesos.
  • Exelixi is a distributed framework for running genetic algorithms at scale.
  • Hadoop: Running Hadoop on Mesos distributes MapReduce jobs efficiently across an entire cluster.
  • Hama is a distributed computing framework based on Bulk Synchronous Parallel computing techniques for massive scientific computations, e.g., matrix, graph and network algorithms.
  • MPI is a message-passing system designed to function on a wide variety of parallel computers.
  • Spark is a fast and general-purpose cluster computing system which makes parallel jobs easy to write.
  • Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data. It implements for realtime processing what Hadoop did for batch processing.


Batch Scheduling

  • Chronos is a distributed job scheduler that supports complex job topologies. It can be used as a more fault-tolerant replacement for cron.
  • Jenkins is a continuous integration server. The Mesos-Jenkins plugin allows it to dynamically launch workers on a Mesos cluster depending on the workload.
  • JobServer is a distributed job scheduler and processor which allows developers to build custom batch processing Tasklets using a point-and-click web UI.
  • Torque is a distributed resource manager providing control over batch jobs and distributed compute nodes.
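
To show what "a replacement for cron" can look like in practice: Chronos describes job schedules as ISO 8601 repeating intervals of the form R[count]/start/period. The Python sketch below builds such a string and wraps it in a job description. The helper names, the job name and the command are hypothetical, and the `name`, `command` and `schedule` fields are an assumption about the job format that should be checked against the Chronos documentation for your version.

```python
from datetime import datetime, timedelta, timezone


def iso8601_duration(delta):
    """Format a non-negative timedelta as an ISO 8601 duration such as PT24H."""
    total = int(delta.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    text = "PT"
    if hours:
        text += f"{hours}H"
    if minutes:
        text += f"{minutes}M"
    if seconds or text == "PT":
        text += f"{seconds}S"
    return text


def repeating_schedule(start, every, repeats=None):
    """Build R[n]/start/period; leaving out n means "repeat forever"."""
    count = "" if repeats is None else str(repeats)
    stamp = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"R{count}/{stamp}/{iso8601_duration(every)}"


# Hypothetical nightly job; the field names are an assumption, see above.
job = {
    "name": "nightly-report",
    "command": "python /opt/reports/run.py",
    "schedule": repeating_schedule(
        datetime(2016, 3, 28, 2, 0, tzinfo=timezone.utc), timedelta(hours=24)
    ),
}
print(job["schedule"])
```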


Data Storage

  • Cassandra is a highly available distributed database. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.
  • ElasticSearch is a distributed search engine. Mesos makes it easy to run and scale.
  • Hypertable is a high performance, scalable, distributed storage and processing system for structured and unstructured data.


Nowadays, applications built on a microservice architecture consist of multiple distributed systems, each dedicated to a specific task. Docker and Mesos together, with Mesos running and managing Docker containers alongside frameworks such as Marathon, can be a viable solution for the microservice scenario. Docker containers provide a consistent, compact and flexible means of packaging application builds. Delivering applications with Docker on Mesos promises a truly elastic, efficient and consistent platform for delivering a range of applications on premises or in the cloud. Hence, depending on how your application makes use of Docker containers (see the Docker use cases above), an appropriate framework should be chosen to work with the Docker-Mesos configuration.
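
As a sketch of what running a Docker container through Marathon might involve, the Python snippet below builds an app definition as JSON. The field layout follows the Marathon app format in use around the time of writing; field names have changed between Marathon versions, so treat it as an assumption to verify against the documentation for your release. The `marathon_docker_app` helper, the app id, the image and the sizes are made up for illustration.

```python
import json


def marathon_docker_app(app_id, image, container_port, instances=1, cpus=0.5, mem=256):
    """Build a Marathon-style app definition that runs a Docker image."""
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem,              # megabytes
        "instances": instances,  # how many copies should be kept running
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks for a dynamically assigned host port.
                "portMappings": [{"containerPort": container_port, "hostPort": 0}],
            },
        },
    }


# Hypothetical microservice: three nginx containers.
app = marathon_docker_app("/web", "nginx:latest", 80, instances=3)
payload = json.dumps(app, indent=2)
print(payload)
```

A definition like this would typically be submitted to Marathon's REST API, which then asks Mesos for resources and keeps the requested number of containers running.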

To know more about how Docker can help your application, contact our experts!

Topics: Cloud, DevOps, & Containers, Clustering Tools, Docker, Mesos
