Introduction
As mentioned previously on this blog, I recently rebuilt my “home lab” from scratch, based on a Docker Swarm architecture and GlusterFS. The goal of this post is to go through the storage setup of the cluster :-)1
Assumptions
You have your Raspberry Pi (or other servers) installed, connected to your network and the internet, up to date, and you know your way around a Linux shell at least a little :)
If you don’t, I suggest reading up on that first before following the rest of this post; there is plenty of great documentation online :-).
Nota: I always name my machines after manga characters, for all my devices (laptop, desktop, phone, servers, …)2. So don’t be surprised when reading the machine names later on :).
Simple Tip
If, like me, you use tmux, you can easily launch a command on all your servers at once :).
For this, all you need to do is open tmux, create a window pane for each of your servers and ssh into them. Once you are connected to all the servers in all panes, press ctrl (keep it pressed) a, then :. You’ll be shown a prompt to enter a command. Enter setw synchronize-panes and validate with enter.
At this point, any key pressed will be sent to all servers, so you can type a command once and it will be fired on all systems. To stop it, just do the same again (ctrl a : then setw synchronize-panes).
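If you prefer to toggle this from a shell instead of the tmux command prompt, something along these lines should also work (the session name homelab is only an example, use your own):
tmux set-window-option -t homelab synchronize-panes on   # keystrokes now go to every pane
# ... run your commands once, they reach all servers ...
tmux set-window-option -t homelab synchronize-panes off  # back to normal, per-pane input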
GlusterFS
Why
To be able to have shared storage across your Docker Swarm cluster. This allows all servers to share the same files. Having this is mandatory so that you no longer care on which of these servers your containers are launched, because they always have access to their data :). Just think of it as a shared drive mechanism, but built for cloud usage.
There are other possible solutions to this, but GlusterFS seems like a pretty robust one for cluster storage :).
Some definitions
Basic Concepts of GlusterFS from their documentation:
Distributed File System: > A file system that allows multiple clients to concurrently access data which is spread across servers/bricks in a trusted storage pool. Data sharing among multiple locations is fundamental to all distributed file systems.
Brick: > A Brick is the basic unit of storage in GlusterFS, represented by an export directory on a server in the trusted storage pool. A brick is expressed by combining a server with an export directory.
Node: > A server or computer that hosts one or more bricks.
Volume: > A volume is a logical collection of bricks.
Read the documentation for more.
GlusterFS architecture choice
With GlusterFS you can create the following types of Gluster Volumes:
- Distributed Volumes (default option): this is for scalable storage with no data redundancy - «files are distributed across various bricks in the volume. So file1 may be stored only in brick1 or brick2 but not on both. Hence there is no data redundancy.»
- Replicated Volumes (better reliability and data redundancy): «Here exact copies of the data are maintained on all bricks.»
- Distributed-Replicated Volumes (HA of data due to redundancy, and scaling storage): «In this volume files are distributed across replicated sets of bricks.»
More detail on GlusterFS Architecture
I decided to use a “Replicated Volumes” configuration, making sure I have all the files on all the nodes (aka on all my Raspberry Pis).
Set up a Gluster Replicated Volume on 4 nodes
On all nodes3:
Install, enable and start the Gluster daemon:
sudo apt install glusterfs-server
sudo systemctl enable glusterd.service
sudo systemctl start glusterd.service
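To make sure the daemon is really up on every node (the tmux tip comes in handy here), a quick check could be:
sudo systemctl status glusterd.service --no-pager   # should report "active (running)"
glusterfs --version                                 # confirms the installed Gluster version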
Then, create the needed directories (still on all nodes):
sudo mkdir -p /glusterfs/bricks
sudo mkdir -p /mnt/cluster-data
Then, you need to make sure the nodes can contact each other by name. Either you have a local DNS server that manages this (like dnsmasq) or you simply edit your /etc/hosts file.
For the rest of this post, the machine names4 will be: cell, ptitcell1, ptitcell2, ptitcell3. Cell is my “master” (Docker Swarm manager) node in the cluster, but this has no impact on GlusterFS.
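If you go the /etc/hosts route, the entries would look something like this on every node (the IP addresses below are made up for the example, use the ones from your own network):
# /etc/hosts (example addresses)
192.168.1.10  cell
192.168.1.11  ptitcell1
192.168.1.12  ptitcell2
192.168.1.13  ptitcell3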
So now, we have to “link” all the nodes together into a trusted storage pool.
On your first node (cell in my case):
sudo gluster peer probe ptitcell1
sudo gluster peer probe ptitcell2
sudo gluster peer probe ptitcell3
Now all nodes should be in the storage pool.
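You can check this from any node; the exact output varies a bit between Gluster versions, but every peer should be listed as connected:
sudo gluster peer status   # the three other nodes should show "Peer in Cluster (Connected)"
sudo gluster pool list     # lists all members of the trusted storage pool, including localhost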
Let’s now create the different bricks on all nodes.
On your “master” or first server:
sudo mkdir -p /glusterfs/bricks/1/brick
On all other nodes (ptitcell{1,2,3}):
sudo mkdir -p /glusterfs/bricks/2/brick # on ptitcell1
sudo mkdir -p /glusterfs/bricks/3/brick # on ptitcell2
sudo mkdir -p /glusterfs/bricks/4/brick # on ptitcell3
Then, back on my first node (cell):
sudo gluster volume create cluster-data replica 4 \
cell:/glusterfs/bricks/1/brick \
ptitcell1:/glusterfs/bricks/2/brick \
ptitcell2:/glusterfs/bricks/3/brick \
ptitcell3:/glusterfs/bricks/4/brick
(If you are just testing and your bricks are on the root partition, you have to add force at the end of the command.)
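Before starting it, you can verify from any node that the volume was created the way we wanted (a quick sanity check, the output layout may differ slightly between versions):
sudo gluster volume info cluster-data   # should show Type: Replicate and the 4 brick paths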
Now we can start the volume :)
sudo gluster volume start cluster-data
Lastly, you need to update your /etc/fstab on all your nodes:
localhost:cluster-data /mnt/cluster-data glusterfs defaults,_netdev,backupvolfile-server=localhost 0 0
To try it out, you can mount the volume on all nodes:
sudo mount.glusterfs localhost:cluster-data /mnt/cluster-data/
If everything works, you can now create a file in /mnt/cluster-data/ and see it on all nodes :-).
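For example, something like this (the file name is arbitrary):
# On one node:
echo "hello from cell" | sudo tee /mnt/cluster-data/test.txt
# On any other node, the same file should already be there:
cat /mnt/cluster-data/test.txt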
With the line in /etc/fstab, you can now reboot your nodes and the volume should be automatically mounted :)
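After a reboot, a quick way to confirm the volume came back by itself:
df -hT /mnt/cluster-data   # the filesystem type should show up as fuse.glusterfs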
To be Continued… :-)
That’s it for now. It doesn’t do much yet, we just have a shared folder between our nodes, but this post is already long enough… So we’ll start playing with Docker and Swarm in the next one.
1. I originally planned to write about the GlusterFS and Docker Swarm setup in this post, but the Gluster setup was long enough for one post :). ↩︎
2. Being a huge fan since my childhood, I started doing this with my first laptop when I was ±15 and never stopped since, almost 20 years later… ↩︎
3. Remember the tip above^^ ↩︎
4. As said before… I’m a huge fan of manga, so in this case I was inspired by Dragon Ball Z and the villain Cell. “Ptitcell” means “Cell Junior” in French^^. ↩︎