Context
Today, a post about the temporary backup system for my cluster. Wait, « temporary » you say? Why the hell don’t you talk about your final, non-temporary solution?
Well, first off, I talk about what I want, thanks for asking, self… (Yes, I might be going mad lately…) The main reason behind this temporary solution is the lockdown due to covid-19: I couldn’t (or did not want to) get delivered what I needed to build a NAS. I’ll talk more about the NAS when I get it, but after an interesting conversation with nah on IRC, I am now waiting for the Helios64 by Kobol to be available… So the “real” backup system I want to set up will not happen until then.
For me, waiting that long to save my cluster data was not an option. Backups are a critical part of a hosting architecture and should be put in place from the beginning.
Desired architecture
My goal is to set up borgbackup properly as the backup system. I’m already using it for my DigitalOcean droplets and other cloud servers. I think it is a great tool that does everything I want from a backup system: it saves disk space (with deduplication, compression and differential backups) while still providing an incremental backup mechanism to keep a history of files… I will talk more about it when I set up this desired architecture :)
The NAS will be the borg server in this architecture, hence the waiting… I could use another machine for this, but I’m too lazy to redo it later, so instead I took 30 minutes to create a minimal temporary solution in the meantime.
Temporary architecture
As I wanted something temporary, and because I’m clearly lazy, I thought of the MVP like this:
- From my cluster, rsync the /mnt/cluster-data directory[^1] (where all the containers’ definitions and data sit) to another server (outside of the cluster) every 4 hours (I could do it more often, but the data don’t change that much :)). For the remote server, I use the raspberry pi in my bedroom called ryosaeba[^2] (only used for the multiroom sound system);
- On ryosaeba, create an archive of the rsync-ed directory every day. It means that on that server, I have a “live” version (the rsync-ed folder) as well as daily archives of that folder;
- Archives are deleted from ryosaeba after 15 days (to save space on my rpi);
- On ryosaeba, rsync the archives folder to another raspberry pi (in this case brook, the mopidy and snapcast servers for my multiroom sound system). The whole flow is recapped in the sketch right after this list.
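To recap the moving pieces, here is the whole schedule at a glance; the scripts and exact cron entries are the ones detailed in the Installation section below (paths assume the default pi user):
# On ptitcell3 (a cluster node), every 4 hours: mirror the cluster data to ryosaeba
0 */4 * * * /home/pi/backup.sh
# On ryosaeba, every day at 1:30am: archive the mirrored folder, prune archives older than 15 days
30 1 * * * /home/pi/create_archive.sh
# On ryosaeba, every day at 2:30am: copy the archives to brook for redundancy
30 2 * * * /home/pi/backup_archives.sh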
The main reason I rsync the archives directory from one server to another is that I don’t trust the storage used in the raspberry pi (a microSD card). When I switch to borg with a configured NAS, I will not need this. I will still need to put remote storage in place for the backups, though, if I want to follow the 3-2-1 backup methodology (3 copies of the data, on 2 different media, 1 of them off-site)[^4].
Installation
Nota: This post is about backing up directories and creating archives; it will not cover how to back up the containers’ databases (I’ll write a dedicated post for that). My solution is to create database dumps inside the cluster-data folder, so they are saved by the backup solution (this temporary one or the one based on borgbackup). But more on that later.
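Just to give an idea of what that will look like (the container and database names below are purely hypothetical, and the dedicated post will cover this properly), a dump simply has to land somewhere under /mnt/cluster-data so it rides along with the rest:
docker exec my-postgres pg_dump -U myuser mydb > /mnt/cluster-data/dumps/mydb.sql # hypothetical names, adapt to your containers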
Rsync cluster data to storage server every 4 hours
Installation is easy: if you are using raspbian like me, everything you need is already installed (rsync, ssh and bash) :).
First, ssh to the server you want to back up. In my case, using a docker swarm cluster with a shared GlusterFS volume means that I can do it from any of the swarm nodes. I chose to do it from one of the non-manager nodes (as the manager is already the most used one); more precisely, I do it from the node called ptitcell3.
First, you need an ssh key to be able to send data from ptitcell3 to ryosaeba. For that, on ptitcell3:
ssh-keygen
Answer the questions and leave the passphrase empty.
Then do a cat ~/.ssh/id_rsa.pub and copy the content.
On the backup server (ryosaeba), open the ~/.ssh/authorized_keys file and add a new line with the content of ~/.ssh/id_rsa.pub from ptitcell3.
(If your server accepts ssh connections with a password - which you shouldn’t allow - you can use the ssh-copy-id user@server command instead of manually copy/pasting it.)
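For reference, here is what that looks like as shell commands; the pi user is an assumption, and the key content is obviously a placeholder:
# On ptitcell3: print the public key, then copy it
cat ~/.ssh/id_rsa.pub
# On ryosaeba: append that key to the authorized keys (create the file if needed)
mkdir -p ~/.ssh && chmod 700 ~/.ssh
echo '<paste the id_rsa.pub content from ptitcell3 here>' >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Or, if password authentication is still enabled on ryosaeba (it shouldn't be):
ssh-copy-id pi@ryosaeba   # use the IP directly if the hostname is not resolvable yet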
Then, to make sure ptitcell3 can connect to ryosaeba, I’m adding a line in the /etc/hosts file on ptitcell3:
10.0.0.xx ryosaeba # Adapt the IP address!
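As an alternative to /etc/hosts (not what I did, just an option), you could declare the host in ~/.ssh/config on ptitcell3 instead; both ssh and rsync will pick it up. The pi user here is an assumption:
Host ryosaeba
    HostName 10.0.0.xx   # Adapt the IP address!
    User pi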
Normally, you can now ssh from the backup server to the storage one; in my case, on ptitcell3:
ssh ryosaeba
If you can connect correctly without a passphrase, you can move on :).
It’s now time to create the simple bash script that will do the rsync to the remote server. Create a backup.sh file with the following content:
#!/usr/bin/env bash
set -euo pipefail
rsync -azvhP --delete /mnt/cluster-data/ ryosaeba:backups/cluster-data/ # Mirror the cluster data to ryosaeba; --delete keeps an exact replica.
This rsync command will synchronise the /mnt/cluster-data directory to backups/cluster-data on the remote server. The --delete option will remove files on the remote server that are no longer present on the cluster. I’m using it because I want an exact replica of the cluster-data folder; you may want to change it depending on your needs.
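If you are unsure about --delete, you can preview what a run would do without changing anything, using rsync’s standard dry-run flag:
rsync -azvhP --delete --dry-run /mnt/cluster-data/ ryosaeba:backups/cluster-data/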
Then, add execution rights to it: chmod +x backup.sh.
Before running it, make sure the remote directory exists, so on ryosaeba: mkdir -p ~/backups/cluster-data.
Run it once to make sure it works correctly and, if it does, add it as a cron job. Open the crontab editor (crontab -e) and add the following line:
0 */4 * * * /home/pi/backup.sh
This will run the backup.sh script every 4 hours and synchronise the /mnt/cluster-data directory (on ptitcell3) to backups/cluster-data (on ryosaeba).
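Cron runs silently by default; if you want to keep a trace of each run (optional, not part of my setup), you can redirect the script’s output to a log file, for example:
0 */4 * * * /home/pi/backup.sh >> /home/pi/backup.log 2>&1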
Create a daily backup and keep the last 15 archives
Create, on ryosaeba, the create_archive.sh script:
#!/usr/bin/env bash
set -euo pipefail
tar -czf /home/pi/archives/daily_$(date +%Y%m%d).tgz /home/pi/backups/ # Creates the daily archive.
find /home/pi/archives/ -type f -mtime +15 -delete # Remove archives older than 15 days.
This script will handle the creation of the archive (tar) and the cleaning of archives older than 15 days (find … -delete).
Make it executable: chmod +x create_archive.sh.
Make sure that the archives directory exists, so mkdir ~/archives.
Run it to ensure it works correctly (./create_archive.sh).
If it does (meaning you see a new archive in the archives folder), you can add it to the cron tasks to be executed every day. Open the crontab editor (crontab -e) and add the line:
30 1 * * * /home/pi/create_archive.sh
In this example, the archive will be created every day at 1:30am.
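To double-check that the archive is actually readable (and not just present), you can list its contents with tar, reusing the script’s naming scheme for today’s archive:
tar -tzf /home/pi/archives/daily_$(date +%Y%m%d).tgz | head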
Rsync archives to 2nd storage server
Now, the last thing to do is to synchronise the archives folder to the 2nd storage server. First, you need to make sure that the 1st storage server (ryosaeba) can ssh to the 2nd storage server (brook). For this, follow the same steps as above (ssh-keygen, …).
Once ssh is working, create the script that will send the archives to the 2nd storage server (brook). So, on ryosaeba, create the backup_archives.sh script:
#!/usr/bin/env bash
set -euo pipefail
rsync -azvhP --delete /home/pi/archives/ brook:backups/ # Copy the archives to brook for redundancy.
Make the script executable: chmod +x backup_archives.sh.
Make sure the remote directory exists, so on brook: mkdir ~/backups/.
Then you can run the script to test it (./backup_archives.sh). If it works and you see your tar.gz files being copied to the 2nd server, you can add it as a cron task. On ryosaeba, add the following line in your crontab editor (crontab -e):
30 2 * * * /home/pi/backup_archives.sh
Meaning that every day at 2:30am (1h after the archive creation), the archives directory will be synchronised to brook.
Nota: If you want to keep all archives on the 2nd storage server, and not only the last 15 like on the first storage server, you just need to remove --delete from the rsync command in backup_archives.sh :).
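Concretely, the rsync line in backup_archives.sh would then become:
rsync -azvhP /home/pi/archives/ brook:backups/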
Conclusion
And now, the cluster data are synced every 4 hours to my ryosaeba server, where every day a snapshot (tar.gz archive) is taken and put in the archives folder. This folder only contains the archives from the last 15 days, and they are copied for redundancy to brook every day.
I’m planning to write soon about backing up the databases inside the containers, to complement this post.
There will be a second version of this part 6 using borgbackup and my future NAS server, but that will have to wait for the needed hardware :).
You can follow all posts about my Home Lab setup on the dedicated page.
*[MVP]: Minimum Viable Product
*[NAS]: Network Attached Storage
[^1]: Remember my glusterFS setup :)?
[^2]: From the manga City Hunter.