Storage Clustering Part 2: GlusterFS

GlusterFS is an interesting solution for providing redundant network attached storage clusters. Some features include...

  • trivially easy for basic setups
  • supports virtually any file-system
  • no need to mess with kernels
  • can work across commodity network hardware
  • suitable for enterprise deployment
  • has commercial support available

GlusterFS also scales well: its simple, configuration-based storage is easy to upgrade.

As of May 2011 the cluster with the most nodes in production has ~250 servers participating in a single volume. The largest volume in production is 2.5PB after hardware RAID and 1.25PB usable after Gluster replication.
GlusterFS FAQ: How big can I build a Gluster Cluster
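
The two figures in that quote are consistent with every file being stored twice: usable capacity is the post-RAID capacity divided by the number of replicas. A minimal sketch of the arithmetic, assuming a replica count of 2 (the FAQ quote does not state it) and using a hypothetical helper name, `usable_capacity`:

```shell
# usable_capacity RAW REPLICAS
# Prints the usable capacity when every file is stored REPLICAS times.
usable_capacity() {
    awk -v raw="$1" -v replicas="$2" 'BEGIN { printf "%.2f\n", raw / replicas }'
}

usable_capacity 2.5 2   # 2.5PB after RAID, two copies of everything: prints 1.25
```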

Much of the guide below is inspired by this HowtoForge article. You may also want to read our Introduction to Storage Clusters first.

The scenario

We will be using three nodes to test GlusterFS. I've installed them with Debian Squeeze, but you should be able to use whatever distro you prefer. In Squeeze, GlusterFS is available as a standard package, which makes setup especially easy.

GlusterFS documentation recommends using 64-bit OS installs.

Each node in our test has a private network uplink to a single switch, which carries the cluster traffic.

Two nodes will act as raw storage for the cluster, and the third will be our 'application' or client node.

Create the storage nodes

Each storage node should have the appropriate physical storage space set aside. I recommend configuring that as a physical partition, but a simple folder location can work fine as well for small/testing setups. For our test I've configured a 10G partition on each, formatted with ext3, and mounted at /backups.
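
If you have not prepared the partition yet, the steps look roughly like the sketch below. It only prints the commands rather than running them, since formatting is destructive. The device name /dev/sdb1 and the helper name `prepare_brick` are placeholders of mine, not something from this setup.

```shell
# prepare_brick DEVICE MOUNTPOINT
# Prints the commands needed to format DEVICE as ext3 and mount it at MOUNTPOINT.
# Nothing is executed; review the output, then run it by hand as root.
prepare_brick() {
    device="$1"
    mountpoint="$2"
    echo "mkfs.ext3 $device"
    echo "mkdir -p $mountpoint"
    echo "mount -t ext3 $device $mountpoint"
}

# /dev/sdb1 is a placeholder; substitute the partition you set aside.
prepare_brick /dev/sdb1 /backups
```

You would normally also add a matching line to /etc/fstab so the partition is mounted again after a reboot.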

Both nodes will get a complete copy of the data we want to save, which is where the redundancy comes in. It is important that storage nodes use the same filesystem type (eg ext3, ext4) to ensure they respond predictably to data requests. They should also be the same size to avoid synchronization issues when a node runs low on space.

Install the GlusterFS server software, configure it, and start it as follows. Set the bind address to the IP address you are serving the storage from.
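
One way to check the "same filesystem type, same size" rule is to print a short signature for the storage location on each node and compare the two. This is a sketch only: the helper names are hypothetical, and it assumes a `df` that accepts the -P and -T options (GNU coreutils does).

```shell
# brick_signature PATH
# Prints "<filesystem type> <size in 1K blocks>" for the filesystem holding PATH.
brick_signature() {
    df -PT "$1" | awk 'NR == 2 { print $2, $3 }'
}

# same_brick_layout SIG1 SIG2
# Succeeds only if two signatures (as printed above) are identical.
same_brick_layout() {
    [ "$1" = "$2" ]
}
```

Run `brick_signature /backups` on each storage node, then feed both results to `same_brick_layout`, for example `same_brick_layout "ext3 10321208" "ext3 10321208"` (the block counts here are made-up values).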

root@store1:~# apt-get install -y glusterfs-server
root@store1:~# cat /etc/glusterfs/glusterfsd.vol | grep -v ^# | grep -v ^$
volume posix
  type storage/posix
  option directory /backups
end-volume
volume locks
  type features/locks
  subvolumes posix
end-volume
volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume
volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.bind-address 10.0.0.250       # Default is to listen on all interfaces
  subvolumes brick
  option auth.addr.brick.allow * # Allow access to "brick" volume
end-volume
root@store1:~# service glusterfs-server start

Note that the configuration above makes no mention of the other storage node. GlusterFS does most of the synchronization work at the client end. This can be a problem for some types of workload, which is why testing any NAS solution you use is especially important.

Configure the client

Make sure a DNS alias for each storage node is set on your client. Ideally, configure DNS on your network so those names are available to all servers. Otherwise, you can add something like the following to /etc/hosts...

10.0.0.250 remote1
10.0.0.251 remote2
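
If you script your client setup, the entries can be added idempotently so that repeated runs do not duplicate them. A minimal sketch follows; `add_host_alias` is a hypothetical helper, and the demo writes to a scratch file rather than the real /etc/hosts.

```shell
# add_host_alias FILE IP NAME
# Appends "IP NAME" to FILE unless NAME already appears there as a whole word.
add_host_alias() {
    file="$1"; ip="$2"; name="$3"
    if ! grep -qw -- "$name" "$file" 2>/dev/null; then
        echo "$ip $name" >> "$file"
    fi
}

# Demonstration against a scratch file; use /etc/hosts (as root) for real.
hosts_file="$(mktemp)"
add_host_alias "$hosts_file" 10.0.0.250 remote1
add_host_alias "$hosts_file" 10.0.0.251 remote2
add_host_alias "$hosts_file" 10.0.0.250 remote1   # already present, so nothing is added
cat "$hosts_file"
```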

On the front-end or client side (where the files will be read/written) do the following...

root@fw2:~# apt-get install -y glusterfs-client
root@fw2:~# cat /etc/glusterfs/glusterfs.vol | grep -v ^# | grep -v ^$
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host remote1
  option remote-subvolume brick
end-volume
volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host remote2
  option remote-subvolume brick
end-volume
volume replicate
  type cluster/replicate
  subvolumes remote1 remote2
end-volume
root@fw2:~# mkdir /data
root@fw2:~# mount -t glusterfs /etc/glusterfs/glusterfs.vol /data

You should now be able to write to the cluster storage on the client (eg anything under /data).
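
A quick way to confirm that is to write a file, read it back, and compare. The sketch below does just that. `smoke_test` is a hypothetical helper name, and it works on any directory, so you can try it on a local path first.

```shell
# smoke_test DIR
# Writes a small file under DIR, reads it back, and compares the contents.
# Succeeds only if the data survives the round trip.
smoke_test() {
    dir="$1"
    probe="$dir/.gluster-smoke-test.$$"
    expected="written by $(uname -n) at $(date +%s)"
    echo "$expected" > "$probe" || return 1
    actual="$(cat "$probe")"
    rm -f "$probe"
    [ "$actual" = "$expected" ]
}
```

On the client you would run something like `smoke_test /data && echo "cluster storage is writable"`.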

One of the nice things about this solution is that each storage node gets a full copy of your files. If one node goes bang then the other is still there. In some ways this is a bit like using RAID1 with your hard-drive storage.
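
To see the mirroring for yourself, you can compare checksums of the files held on each storage node. This only reads from the storage directories, which is safe; writing to them directly is not. A sketch with hypothetical helper names, assuming `md5sum` is installed:

```shell
# tree_checksums DIR
# Prints "<md5>  <relative path>" for every regular file under DIR, sorted by path.
tree_checksums() {
    ( cd "$1" && find . -type f -exec md5sum {} + | sort -k 2 )
}

# same_contents DIR1 DIR2
# Succeeds if both directories hold the same files with the same checksums.
same_contents() {
    [ "$(tree_checksums "$1")" = "$(tree_checksums "$2")" ]
}
```

In this setup the two copies live on different machines, so in practice you would run `tree_checksums /backups` on each storage node and diff the two listings. `same_contents` is handy when both trees are reachable from one host.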

Some considerations

There are a few things to keep in mind when deciding how to use something like this.

  • The biggest is that writing files across a network is pretty much always slower than writing to local disks. This kind of setup makes the most sense when large storage for lots of small files is more important than fast retrieval.
  • Do not be tempted to write directly to individual storage nodes. This can break synchronization, usually in unpleasant ways.
  • If you have a firewall running and have trouble getting GlusterFS working, refer to the official documentation on the network ports that GlusterFS needs.
  • Using GlusterFS means you may have multiple copies of your data in storage (eg one per node).
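
To put a rough number on the first point for your own workload, you can time the same batch of small writes against a local directory and against the cluster mount. This is a crude sketch, not a proper benchmark. `time_small_writes` is a hypothetical helper, and the 4KB file size is an arbitrary choice.

```shell
# time_small_writes DIR COUNT
# Creates COUNT 4KB files under DIR (which must exist) and prints the
# elapsed time in whole seconds.
time_small_writes() {
    dir="$1"; count="$2"
    start="$(date +%s)"
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if=/dev/zero of="$dir/smallfile.$i" bs=4k count=1 2>/dev/null
        i=$((i + 1))
    done
    sync
    end="$(date +%s)"
    echo $((end - start))
}
```

For example, compare `time_small_writes /tmp/local-test 1000` with `time_small_writes /data/gluster-test 1000`, then remove the test files afterwards.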

There is a lot more to configuring a real-world cluster, including striping and other methods for distributing data across nodes, so do look deeper into the other GlusterFS options if this is something that concerns you.

Additional documentation on GlusterFS is available at

The current implementation of GlusterFS appears to be based on NFSv3 and FUSE technologies, so things may look familiar from those aspects. However, it's important to realize that they are not the same thing.

A use case for GlusterFS

So what can you actually do with this? One example scenario is storing seldom-changed files, such as user-uploaded images for a website. A suitable caching method in front of the client node would allow often-used static content to be served relatively quickly, while old files could be safely stored and forgotten about.

There are several prominent users of large scale GlusterFS installations, including the music service Pandora. If you are interested in working with us to build your own Gluster Cluster, pop in a ticket with details on what you need and we can take a look.

About Glenn Enright

Linux Systems Administrator at RimuHosting.com. I focus mainly on dedicated server provisioning with a sprinkling of network administration.


3 Responses to Storage Clustering Part 2: GlusterFS

  1. Anand Avati says:

    Thanks for your writeup Glenn. Just wanted to mention that since version 3.1 glusterfs has really taken away the need to deal with volfiles and you can work completely from the command line. The suggested way to mount is 'mount -t glusterfs SERVER:/volname /mnt'

    Thanks again!
    Avati

  2. newbie says:

    Thanks for the post. But after following the instructions described above, the client throws this error:

    0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (aaa.bbb.ccc.ddd:port)

    How to fix this?

    glusterfs version is 3.2.5

  3. Glenn Enright says:

    @newbie: check that the IPs you are using are correct, the ones used above are for demonstration purposes only. If you need us to take a look please raise a support ticket and I can check that out for you.