Backup and restore Neo4j Graph Database using Ansible

Posted by Akash Surwase on November 22, 2019

Growing data volumes are a double-edged sword. On the one hand, more data means more information and more value to extract from it. On the other hand, it also demands more technology to store and analyze that information. Thankfully, the graph database, with its agility, performance, and flexibility, is here to help us through this.

A graph database uses graph structures for semantic queries, built from two major elements: nodes and relationships. Nodes are the entities in the graph, and relationships represent the connections between two nodes. In conventional databases, the data itself is the most important entity, but a graph database treats the relationships between data as equally important. Instead of forcing the data into a predefined model, it analyzes the data based on how the nodes relate to each other. Let’s take an example that will help you understand what I am talking about. Just to make sure everyone gets it, I will select everyone’s favorite topic.

[Figure: graph structures]

In this particular example, New England Patriots, Tom Brady, and Greater Boston are the nodes. Each relationship between two nodes is labeled on the arrow that connects them, and the direction of the arrow shows that relationships are directional. The additional attributes associated with a node are its properties, shown below the node.

Neo4j is open-source and provides an ACID-compliant transactional backend. It is referred to as a native graph database because it implements the property graph model efficiently all the way down to the storage level. This means that the data is stored exactly as you whiteboard it, and the database uses pointers to navigate and traverse the graph. For production scenarios, Neo4j offers cluster support and runtime failover. Apart from these and its efficient storage and query processing, here are some of the features and advantages that make Neo4j a popular choice among graph databases -

  • It follows the Property Graph Data Model.
  • It supports UNIQUE constraints, which ensure the uniqueness of the stored data.
  • It includes a UI (Neo4j Data Browser) for executing CQL (Cypher Query Language) commands that create and alter the different storage units for data. CQL commands are human-readable and very easy to learn.
  • Its support for ACID (Atomicity, Consistency, Isolation, and Durability) rules ensures the validity of the data.
  • Support for the Cypher API and the native Java API makes it easy to develop Java applications.
  • It makes it very easy to represent, retrieve, and navigate connected data, and it represents semi-structured data just as easily.

Recently, in one of the projects I was working on, we wanted to automate the Neo4j database backup and restore process. We used Ansible’s configuration management capabilities to do so. Ansible is open-source, and I love the way it enhances the scalability, consistency, and reliability of any IT environment. You can use Ansible to automate tasks such as provisioning the servers you need in your infrastructure, configuration management, and application deployment.

Now, let us see what the actual backup and restore process looks like.

Backup Role

In unwanted scenarios such as accidental deletion of data or a database instance going down, we need the latest backup file of the running database to revert to the previous database state. To save the existing Neo4j database, I have created this Backup role, which takes a backup of the existing Neo4j instance and pushes the backup file to an S3 bucket.

Here is the tasks file of the Ansible role that takes the backup of the Neo4j database -

# tasks file for neo4j_backup
- name: Create a neo4j backup file with proper permissions
  file:
    path: /var/lib/neo4j/import/
    state: touch
    owner: neo4j
    group: adm
    mode: '0644'

- name: Copy cypher file on neo4j instance
  template:
    src: backup.cypher.j2
    dest: /tmp/backup.cypher

- name: Taking a backup
  shell: cat /tmp/backup.cypher | cypher-shell -u neo4j -p neo4j123

- name: install boto packages
  pip:
    name: [boto, botocore, boto3]
    executable: pip-3.3

- name: Copy file to S3 bucket
  vars:
    ansible_python_interpreter: /usr/bin/python3
  aws_s3:
    aws_access_key: ""
    aws_secret_key: ""
    bucket: neo
    src: /var/lib/neo4j/import/
    mode: put
    object: ""
  environment:
    PATH: /usr/bin/python3

To execute the backup command in the cypher-shell, I have created a backup.cypher.j2 file in the role’s templates folder -

CALL apoc.export.graphml.all('', {useTypes:true, storeNodeIds:false});

To back up the existing Neo4j database, I have created the neo4j-backup role. The backup.cypher.j2 file holds the backup command. The task written in the tasks/main.yml file pipes this cypher file, with the help of the cat command, into cypher-shell, which logs in to the Neo4j database using the database credentials. Once the process is completed, the backup file is pushed to the specified S3 bucket in AWS.
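The file name used for the backup is not visible in the tasks above, so the following is only a minimal sketch of how one name could be shared between the file, the export, and the upload. The `backup_file` fact and its naming scheme are hypothetical; `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the extra vars passed on the command line, and the `neo4j-backup` bucket name is borrowed from the restore tasks. The `ansible_date_time` facts need fact gathering, which the backup playbook enables with gather_facts: true.

```yaml
# Sketch only: backup_file is a hypothetical name, not taken from the original role.
- name: Build a timestamped backup file name
  set_fact:
    backup_file: "neo4j_backup_{{ ansible_date_time.date }}_{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}.graphml"

- name: Create a neo4j backup file with proper permissions
  file:
    path: "/var/lib/neo4j/import/{{ backup_file }}"
    state: touch
    owner: neo4j
    group: adm
    mode: '0644'

- name: Copy file to S3 bucket under the same name
  aws_s3:
    aws_access_key: "{{ AWS_ACCESS_KEY_ID }}"
    aws_secret_key: "{{ AWS_SECRET_ACCESS_KEY }}"
    bucket: neo4j-backup   # assumed; taken from the restore tasks
    src: "/var/lib/neo4j/import/{{ backup_file }}"
    object: "{{ backup_file }}"
    mode: put
```

Under the same assumption, backup.cypher.j2 could pass the variable as the first (file) argument of apoc.export.graphml.all, so that the export writes to the file created here.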

To execute the Neo4j Backup role I have created a neo4j-backup.yml playbook -

- name: Neo4j-Backup
  hosts: neo4j
  gather_facts: true
  remote_user: ubuntu
  become: true
  roles:
    - neo4j_backup

To run this neo4j-backup role, use the following command -

ansible-playbook -i inventory neo4j-backup.yml -e AWS_ACCESS_KEY_ID=****************** -e AWS_SECRET_ACCESS_KEY=********************** -e environment_name=poc -vvv
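
The command above expects an inventory that defines the `neo4j` group targeted by the playbook. The post does not show it, so here is a minimal sketch in Ansible's YAML inventory format. The host alias and IP address are placeholders, and the file is assumed to be saved as inventory/hosts.yml so that `-i inventory` picks up the directory.

```yaml
# inventory/hosts.yml (assumed location; host alias and address are placeholders)
all:
  children:
    neo4j:                 # group name matched by "hosts: neo4j" in the playbook
      hosts:
        neo4j-poc-1:
          ansible_host: 203.0.113.10
```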

Restore Role

Now that we have a backup file with us, let’s see how to restore the database to its previous state. To restore the Neo4j database, we download the latest backup file from the S3 bucket and restore it on the newly created database instance.

Here is the tasks file of the Ansible role that restores the Neo4j database -

# tasks file for neo4j_restore
- name: install boto packages
  pip:
    name: [boto, botocore, boto3]
    executable: pip-3.3

- name: Download file from S3 bucket
  vars:
    ansible_python_interpreter: /usr/bin/python3
  aws_s3:
    aws_access_key: ""
    aws_secret_key: ""
    bucket: neo4j-backup
    object: ""
    dest: /home/ubuntu/
    mode: get
  environment:
    PATH: /usr/bin/python3

- name: Copy cypher file to restore neo4j database
  template:
    src: restore.cypher.j2
    dest: /tmp/restore.cypher

- name: Restoring Backup
  shell: cat /tmp/restore.cypher | cypher-shell -u neo4j -p neo4j123

To execute the restore command in the cypher-shell, I have created a restore.cypher.j2 file in the role’s templates folder -

CALL apoc.import.graphml('./', {batchSize: 10000, readLabels: true, storeNodeIds: false});

To restore the backed-up Neo4j database on a new instance, I have created the neo4j-restore role. In this role, I have created the restore.cypher.j2 file, which holds the restore command. The task written in the tasks/main.yml file pipes this cypher file, with the help of the cat command, into cypher-shell, which logs in to the Neo4j database using the database credentials. To restore the database, you need to pass the name of the backup file stored in the S3 bucket, which includes the date and time.
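
As a minimal sketch of how the backup file name might reach the download task, the tasks below check that the `BACKUP_FILE` extra var was passed and then use it as the S3 object key. The wiring is an assumption, since the variable references are not visible in the tasks above; only the variable names (taken from the run command) and the bucket name come from the post.

```yaml
# Sketch only: assumes BACKUP_FILE and the AWS keys are passed with -e.
- name: Fail early if no backup file name was passed
  assert:
    that:
      - BACKUP_FILE is defined
      - BACKUP_FILE | length > 0
    fail_msg: "Pass the backup file name with -e BACKUP_FILE=<name>.graphml"

- name: Download file from S3 bucket
  aws_s3:
    aws_access_key: "{{ AWS_ACCESS_KEY_ID }}"
    aws_secret_key: "{{ AWS_SECRET_ACCESS_KEY }}"
    bucket: neo4j-backup
    object: "{{ BACKUP_FILE }}"
    dest: "/home/ubuntu/{{ BACKUP_FILE }}"
    mode: get
```

Checking the variable first means a forgotten -e BACKUP_FILE=... stops the run with a clear message before anything is downloaded or imported.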

To execute the Neo4j Restore role, I have created a neo4j-restore.yml playbook -

- name: Neo4j-Restore
  hosts: neo4j
  gather_facts: False
  remote_user: ubuntu
  become: true
  roles:
    - neo4j_restore

To run this neo4j-restore role, use the following command -

ansible-playbook -i inventory neo4j-restore.yml -e AWS_ACCESS_KEY_ID=************************ -e AWS_SECRET_ACCESS_KEY=******************** -e BACKUP_FILE=neo4j_backup_file.graphml -vvv

Using this backup role, you will be able to take a backup of the Neo4j instance and push the backup file to the S3 bucket. Similarly, using the restore role, you can download the backup file from the S3 bucket onto the Neo4j instance and restore the Neo4j database. This was a quick rundown of how you can back up and restore a Neo4j database using Ansible. If you have any queries or want to share suggestions to ease the restore and backup process, feel free to comment.

Topics: AWS, Product & Test engineering, DataOps, Graph database, Ansible, Neo4j
