ES Index migration simplified!

Posted by Sachin Gade on June 20, 2019

Data migration sounds simple in concept, but in practice it rarely is. A lack of integration and synchronization between system components can lead to the loss or corruption of critical data. It can also cause data quality issues and security challenges, which carry heavy financial and reputational risks. Recovering lost data is tedious, time-consuming, and costly, and even after recovery the data has to be checked and validated, which extends downtime. Given the effort, security challenges, and data quality issues involved, most organizations treat data migration as a distinct project.

This blog describes how we addressed this exact problem for one of our clients with the help of ES Index Migration. Before getting to the solution, here is a brief introduction to ES Index and ES Index Migration.

What is ES Index?

An Index is a set of one or more Documents, and a Document consists of one or more Fields. In relational database terms, a Document corresponds to a table row and a Field corresponds to a table column. An Index is comparable to a database in a relational system, and it has a mapping that defines how its Documents and Fields are stored and indexed (older Elasticsearch versions also allowed multiple mapping types per Index). It is a logical namespace that maps to one or more primary shards and can have zero or more replica shards. Elasticsearch returns search results quickly because it searches an index instead of scanning the text directly. This is like finding the pages in a book related to a keyword by looking in the index at the back, instead of reading every word on every page.
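To make the book-index analogy concrete, here is a toy inverted index in Python. It is only an illustration of the idea and not how Elasticsearch is implemented internally; the function names and sample documents are made up for this example.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for field_value in doc.values():
            for term in str(field_value).lower().split():
                index[term].add(doc_id)
    return index

def search(index, term):
    """Look the term up in the index instead of scanning every document."""
    return sorted(index.get(term.lower(), set()))

# Each document is a set of fields, like a row with columns.
documents = {
    1: {"title": "Index migration", "body": "Restore from a snapshot"},
    2: {"title": "Cluster upgrade", "body": "Reindex from a remote cluster"},
}

inverted = build_inverted_index(documents)
print(search(inverted, "snapshot"))  # [1]
print(search(inverted, "from"))      # [1, 2]
```

The lookup cost depends on the number of distinct terms rather than on the total amount of text, which is the same reason the index at the back of a book is faster than reading every page.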

What is ES Index Migration?

Imagine that you want to migrate production Elasticsearch data to staging for QA purposes. Many other scenarios call for a similar migration, such as:

  • Upgrading underlying cloud infrastructure/instance types for ES deployment
  • Migrating from one cloud to another cloud
  • Migrating from on-premises to cloud or vice versa
  • Creating a pre-prod/staging environment from production data clusters

In each of these cases, you need to migrate Indices and their data from one Elasticsearch cluster to another.

You can migrate ES Indices in the following ways:

  • Index your data from the original source
  • Reindex from a remote cluster
  • Restore from a snapshot
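For comparison with the snapshot approach, the second option can be sketched as follows. Reindex from a remote cluster is driven by a `_reindex` request sent to the destination cluster, and the source host has to be allowed through the `reindex.remote.whitelist` setting on the destination. The helper name and the host below are placeholders for illustration only.

```python
import json

def remote_reindex_body(remote_host, index, username=None, password=None):
    """Build the JSON body for POST /_reindex on the destination cluster."""
    remote = {"host": remote_host}
    # Credentials are optional and only needed if the source requires them.
    if username and password:
        remote["username"] = username
        remote["password"] = password
    return {
        "source": {"remote": remote, "index": index},
        "dest": {"index": index},
    }

# Placeholder host and index name, not values from a real environment.
body = remote_reindex_body("http://source-es.example.com:9200", "indices1")
print(json.dumps(body, indent=2))
```

This route needs no intermediate storage, but the destination must be able to reach the source over the network, which is one reason a snapshot can be the more practical choice across clouds or environments.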

To simplify this operation, we created a simple Python-based automation tool that migrates data from one Elasticsearch cluster to another using an S3-based snapshot.

To migrate using an S3-based snapshot, the repository-s3 plugin must be installed on every node of both clusters, and each node must be restarted after installation:

sudo bin/elasticsearch-plugin install repository-s3

Here is how it works:

  • Clone the following Repo on your machine:
    $> git clone
    $> cd elasticsearch_migrator
    $> pip install -r requirements.txt
  • Update migration.yml with required config values:
    source_es_port: 9200
    dest_es_port: 9200
    indices_list: indices1,indices2
  • Run migration utility:
    $ python
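As a rough illustration of how the three config values shown above might be consumed, the sketch below parses them and splits `indices_list` into individual index names. This is an assumption about usage, not the tool's actual code: the tool most likely relies on a YAML library, and the real `migration.yml` may contain more keys than the ones listed here. The function names are hypothetical.

```python
def parse_simple_config(text):
    """Parse flat 'key: value' lines; a stand-in for a real YAML parser."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config

def normalize(config):
    """Convert ports to integers and the comma-separated indices to a list."""
    return {
        "source_es_port": int(config["source_es_port"]),
        "dest_es_port": int(config["dest_es_port"]),
        "indices": [
            name.strip()
            for name in config["indices_list"].split(",")
            if name.strip()
        ],
    }

# Only the three keys shown in the post; other keys are not assumed here.
sample = """\
source_es_port: 9200
dest_es_port: 9200
indices_list: indices1,indices2
"""

settings = normalize(parse_simple_config(sample))
print(settings)
```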

How does it migrate?
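A snapshot-based migration through S3 generally comes down to four REST calls: register the S3 repository on the source, take a snapshot of the chosen indices, register the same repository on the destination, and restore from it. The following is a minimal sketch of that sequence using the standard Elasticsearch snapshot and restore endpoints. It is not the tool's actual implementation; the repository name, snapshot name, bucket, and URLs are placeholders.

```python
import json
import urllib.request

def plan_snapshot_migration(source_url, dest_url, bucket, indices,
                            repo="migration_repo",
                            snapshot="migration_snapshot"):
    """Return the ordered (method, url, body) steps for the migration."""
    indices_csv = ",".join(indices)
    source_repo = {"type": "s3", "settings": {"bucket": bucket}}
    # The destination only reads the snapshot, so register it read-only.
    dest_repo = {"type": "s3", "settings": {"bucket": bucket, "readonly": True}}
    return [
        ("PUT", f"{source_url}/_snapshot/{repo}", source_repo),
        ("PUT",
         f"{source_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": indices_csv}),
        ("PUT", f"{dest_url}/_snapshot/{repo}", dest_repo),
        ("POST",
         f"{dest_url}/_snapshot/{repo}/{snapshot}/_restore?wait_for_completion=true",
         {"indices": indices_csv}),
    ]

def execute(step):
    """Send one step to Elasticsearch and return the decoded JSON response."""
    method, url, body = step
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# Placeholder endpoints and bucket; printing the plan makes no network calls.
steps = plan_snapshot_migration(
    "http://source-es.example.com:9200",
    "http://dest-es.example.com:9200",
    bucket="my-migration-bucket",
    indices=["indices1", "indices2"],
)
for method, url, body in steps:
    print(method, url, json.dumps(body))
```

Calling `execute(step)` for each step in order would run the migration against real clusters. Both clusters need the repository-s3 plugin and credentials with access to the bucket, and the restore fails if an open index with the same name already exists on the destination.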

That was a brief overview of how the tool works and how it handles a data migration. The tool interacts with three components in the environment: the source cluster, the destination cluster, and the S3 storage that acts as the medium between the two. The whole purpose is to make data migration as simple and efficient as possible. I hope this simplifies the Index migration process for your Elasticsearch deployment.

Topics: Cloud, DevOps, & Containers, AWS, Python, ES Index migration, ES Index, Elasticsearch
