Unleashing a Docker Swarm orchestrator is a great (and relatively easy) way to deploy a container cluster. Yes, you could go with Kubernetes for more management features, but when you need the bare bones of a simple container cluster, [Docker Swarm](https://docs.docker.com/engine/swarm/) is a pretty good way to go.
The one thing you might find yourself needing is persistent storage for your cluster. What is persistent storage? I’m glad you asked. To put it simply, persistent storage is any type of data storage that retains data even after power to the device is cut off. With regard to containers, persistent storage is storage that remains even when the container isn’t running: the data lives on the hosting server, so when the container is spun down, the data is still accessible. And if the container is part of a swarm, that persistent storage can be shared between nodes.
For most container developers, persistent storage is a must-have. Docker does provide volumes, but they are local to the host they were created on, so a container rescheduled onto another node loses access to its data. To share storage across a swarm, you need third-party software such as NFS or GlusterFS. The big downfall with NFS is that it isn’t encrypted, so for many businesses and developers, GlusterFS is the way to go.
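To see why a plain Docker volume doesn’t cut it here, consider a quick (hypothetical) example: the named volume below lives only on the Docker host that created it, so a container scheduled onto another node in the swarm would not see this data.
docker volume create demo-data
docker run --rm -v demo-data:/data alpine sh -c "echo hello > /data/test.txt"
# The file now exists only on this host; a service replica scheduled on another
# node that mounts demo-data gets a brand-new, empty volume instead.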
I want to walk you through the process of using GlusterFS to share persistent storage in a Docker Swarm.
I’ll be demonstrating on a small cluster with one master and two nodes, each of which will be running Ubuntu Server 18.04. So for that, you’ll need:
- Three instances of Ubuntu Server 18.04 (one to act as the master, two as nodes)
- A user with sudo privileges on each machine
That’s all you need to make this work.
Before you get going, it’s always best to update and upgrade your server OS. To do this on Ubuntu (or any Debian-based platform), open a terminal and issue the commands:
sudo apt-get update
sudo apt-get upgrade -y
Should your kernel upgrade in the process, make sure to reboot the server so the changes will take effect.
We now need to map our IP addresses in /etc/hosts. Do this on each machine. Issue the command:
sudo nano /etc/hosts
In that file (on each machine), you’ll add something like this to the bottom of the file:
192.168.100.101 docker-master
192.168.100.102 docker-node1
192.168.100.103 docker-node2
Make sure to edit the above to match your IP addresses and hostnames.
Save and close the file.
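If you want to confirm the names resolve before moving on, a quick test (using my example hostnames) is:
ping -c 1 docker-node1
ping -c 1 docker-node2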
If you haven’t already done so, you need to install and deploy the Docker Swarm. On each machine install Docker with the command:
sudo apt-get install docker.io -y
Start and enable Docker with the commands:
sudo systemctl start docker
sudo systemctl enable docker
Add your user to the docker group (on all machines) with the command:
sudo usermod -aG docker $USER
Issue the following command (on all machines) so the changes take effect:
sudo newgrp docker
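You can confirm that Docker is running and that your user can reach it without sudo by issuing, for example:
docker info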
Next, we need to initialize the swarm. On the master issue the command:
docker swarm init --advertise-addr MASTER_IP
Where MASTER_IP is the IP address of the master.
Once the swarm has been initialized, it’ll display the command you need to run on each node. That command will look like:
docker swarm join --token SWMTKN-1-09c0p3304ookcnibhg3lp5ovkjnylmxwjac9j5puvsj2wjzhn1-2vw4t2474ww1mbq4xzqpg0cru 192.168.1.67:2377
Copy that command and paste it into the terminal window of the nodes to join them to the master.
And that’s all there is to deploying the swarm.
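If you misplace the join command, you can reprint it on the master at any time with:
docker swarm join-token worker
And to confirm all three machines are part of the swarm, run the following on the master:
docker node ls
You should see the master and both nodes listed, with the master shown as Leader under MANAGER STATUS.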
You now need to install GlusterFS on each server within the swarm. First, install the necessary dependencies with the command:
sudo apt-get install software-properties-common -y
Next, add the necessary repository with the command:
sudo add-apt-repository ppa:gluster/glusterfs-3.12
Update apt with the command:
sudo apt-get update
Install the GlusterFS server with the command:
sudo apt-get install glusterfs-server -y
Finally, start and enable GlusterFS with the commands:
sudo systemctl start glusterd
sudo systemctl enable glusterd
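If you’d like to confirm the daemon actually came up on each machine before continuing, one quick check is:
sudo systemctl status glusterd --no-pager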
If you haven’t already done so, you should generate an SSH key for each machine. To do this, issue the command:
ssh-keygen -t rsa
Once you’ve taken care of that, it’s time to continue on.
Now we’re going to have Gluster probe all of the nodes. This will be done from the master. I’m going to stick with my example of two nodes, which are docker-node1 and docker-node2. Before you issue the command, you’ll need to change to the superuser with:
sudo -s
If you don’t issue the Gluster probe command from root, you’ll get an error that it cannot write to the logs. The probe command looks like:
gluster peer probe docker-node1; gluster peer probe docker-node2;
Make sure to edit the command to fit your configuration (for hostnames).
Once the command completes, you can check to make sure your nodes are connected with the command:
gluster pool list
You should see all nodes listed as connected (Figure 1).
Exit out of the root user with the exit command.
Let’s create a directory to be used for the Gluster volume. This same command will be run on all machines:
sudo mkdir -p /gluster/volume1
Use whatever name you want in place of volume1.
Now we’ll create the volume across the cluster with the command (run only on the master):
sudo gluster volume create staging-gfs replica 3 docker-master:/gluster/volume1 docker-node1:/gluster/volume1 docker-node2:/gluster/volume1 force
Start the volume with the command:
sudo gluster volume start staging-gfs
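You can verify that the volume was created and started, and that all three bricks are online, with the commands:
sudo gluster volume info staging-gfs
sudo gluster volume status staging-gfs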
The volume is now up and running, but we need to make sure the volume will mount on a reboot (or other circumstances). We’ll mount the volume to the /mnt directory. To do this, issue the following commands on all machines:
sudo -s
echo 'localhost:/staging-gfs /mnt glusterfs defaults,_netdev,backupvolfile-server=localhost 0 0' >> /etc/fstab
mount.glusterfs localhost:/staging-gfs /mnt
chown -R root:docker /mnt
exit
To make sure the Gluster volume is mounted, issue the command:
df -h
You should see it listed at the bottom (Figure 2).
You can now create new files in the /mnt directory and they’ll show up in the /gluster/volume1 directories on every machine.
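A quick way to test the replication is to create a file on the master and look for it on a node (the file name here is arbitrary):
# On the master
sudo touch /mnt/sync-test.txt
# On either node
ls /mnt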
At this point, you are ready to integrate your persistent storage volume with Docker. Say, for instance, you need persistent storage for a MySQL database. In your Docker YAML files, you could add a section like so:
volumes:
  - type: bind
    source: /mnt/staging_mysql
    target: /opt/mysql/data
Since we’ve mounted our persistent storage in /mnt, everything saved there on one Docker node will sync to all of the other nodes.
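For context, here’s a minimal sketch of how that bind mount might fit into a full stack file. The service name, image, and password below are placeholders, and note that the official mysql image keeps its data in /var/lib/mysql, so that’s where this sketch points the target:
# docker-stack.yml (sketch) -- deploy with: docker stack deploy -c docker-stack.yml mystack
version: "3.4"
services:
  db:
    image: mysql:5.7                 # placeholder image
    environment:
      MYSQL_ROOT_PASSWORD: example   # placeholder credential
    volumes:
      - type: bind
        source: /mnt/staging_mysql   # directory on the GlusterFS mount (create it first)
        target: /var/lib/mysql       # where the official mysql image stores its data
Because the source sits on the GlusterFS mount, the database files land on every machine in the cluster, so the service can be rescheduled onto any node without losing its data.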
And that’s how you can create persistent storage and then use it within a Docker Swarm cluster. Of course, this isn’t the only way to make persistent storage work, but it is one of the easiest (and cheapest). Give GlusterFS a try as your persistent storage option and see if it doesn’t work out for you.
=====================
The following gives a quick overview of the main GlusterFS administration commands.
# Add peer
gluster peer probe <host name>
# Remove peer
gluster peer detach <host name>
You can list the status of all known peers by running:
gluster peer status
To inspect volumes, use:
gluster volume info all
gluster volume status <volume> detail
You can mount a Gluster volume with standard Unix mounting:
mount -t glusterfs server1:/volume /mnt/volume
which has the disadvantage of specifying a single server. If that server is down, you can’t mount the volume even though it is still available from the other peers. What’s important to know is that the given server is only used to fetch a volume info file, which itself lists all of the servers providing the volume, so the server you mount from doesn’t even need to be one of the volume’s own servers. Also, as with NFS, consider the noatime mount option when you have many small files that are accessed often.
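If you don’t want the mount to depend on a single server being reachable, you can hand the mount helper a fallback using the same backupvolfile-server option used in the fstab entry earlier (server names here are placeholders):
mount -t glusterfs -o backupvolfile-server=server2,noatime server1:/volume /mnt/volume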