GlusterFS is an interesting solution for providing redundant network-attached storage clusters. Some of its features include...
- trivially easy for basic setups
- supports virtually any file-system
- no need to mess with kernels
- can work across commodity network hardware
- suitable for enterprise deployment
- has commercial support available
GlusterFS also scales well; its simple configuration-based storage is easy to upgrade.
As of May 2011 the cluster with the most nodes in production has ~250 servers participating in a single volume. The largest volume in production is 2.5PB after hardware RAID and 1.25PB usable after Gluster replication.
GlusterFS FAQ: How big can I build a Gluster Cluster
We will be using three nodes to test GlusterFS. I've installed them with Debian Squeeze, but you should be able to use whatever distro you prefer. In Squeeze GlusterFS is available as a standard package, which makes setup especially easy.
GlusterFS documentation recommends using 64bit OS installs.
Each node in our test has a private network uplink to a single switch, which carries the cluster traffic.
Two nodes will act as raw storage for the cluster, and the third will be our 'application' or client node.
Create the storage nodes
Each storage node should have the appropriate physical storage space set aside. I recommend configuring this as a dedicated partition, but a simple folder location can also work for small or testing setups. For our test I've configured a 10GB partition on each, formatted with ext3, and mounted at /backups.
Both nodes will get a complete copy of the data we want to save, which is where the redundancy comes in. It is important that the storage nodes use the same filesystem type (eg ext3, ext4) to ensure they respond predictably to data requests. They should also be the same size, to avoid synchronization issues when one node runs low on space.
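The partition setup on each storage node might look like the following sketch. The device name /dev/sdb1 is an assumption here; substitute whatever partition you actually set aside.

```
# Assumed device name; check yours with fdisk -l first
root@store1:~# mkfs.ext3 /dev/sdb1
root@store1:~# mkdir -p /backups
root@store1:~# mount /dev/sdb1 /backups
# Make the mount survive reboots
root@store1:~# echo '/dev/sdb1 /backups ext3 defaults 0 0' >> /etc/fstab
```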
We will install the GlusterFS software, then configure and start it as follows. Set the bind address to the IP you are serving this over.
root@store1:~# apt-get install -y glusterfs-server
root@store1:~# cat /etc/glusterfs/glusterfsd.vol | grep -v ^# | grep -v ^$
volume posix
  type storage/posix
  option directory /backups
end-volume
volume locks
  type features/locks
  subvolumes posix
end-volume
volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume
volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.bind-address 10.0.0.250 # Default is to listen on all interfaces
  subvolumes brick
  option auth.addr.brick.allow * # Allow access to "brick" volume
end-volume
root@store1:~# service glusterfs-server start
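The second storage node gets the same configuration with only the bind address changed to its own IP (10.0.0.251 in our setup). Assuming you copy store1's glusterfsd.vol across, a quick sketch:

```
root@store2:~# apt-get install -y glusterfs-server
# Same volfile as store1, with the bind address swapped for this node's IP
root@store2:~# sed -i 's/10.0.0.250/10.0.0.251/' /etc/glusterfs/glusterfsd.vol
root@store2:~# service glusterfs-server start
root@store2:~# netstat -ltnp | grep gluster   # confirm the daemon is listening
```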
Note that the configuration above makes no mention of the other storage node; GlusterFS does most of the synchronization work at the client end. This can be a problem for some workloads, which is why testing any NAS solution you plan to use is especially important.
Configure the client
Make sure DNS aliases for each storage node are set on your client. Ideally, configure DNS on your network so those names are available to all servers. For example, in /etc/hosts you can add something like...
10.0.0.250 remote1
10.0.0.251 remote2
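Before mounting anything it is worth confirming the client can resolve and reach both storage nodes:

```
root@fw2:~# getent hosts remote1 remote2
root@fw2:~# ping -c 1 remote1
root@fw2:~# ping -c 1 remote2
```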
On the front-end or client side (where the files will be read/written) do the following...
root@fw2:~# apt-get install -y glusterfs-client
root@fw2:~# cat /etc/glusterfs/glusterfs.vol | grep -v ^# | grep -v ^$
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host remote1
  option remote-subvolume brick
end-volume
volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host remote2
  option remote-subvolume brick
end-volume
volume replicate
  type cluster/replicate
  subvolumes remote1 remote2
end-volume
root@fw2:~# mkdir /data
root@fw2:~# mount -t glusterfs /etc/glusterfs/glusterfs.vol /data
You should now be able to write to the cluster storage on the client (eg anything under /data).
One of the nice things about this solution is that each node gets a full copy of your files. If one node goes bang then the other is still there. In some ways this is a bit like using RAID 1 with your hard-drive storage.
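You can see the replication for yourself: write a file through the client mount and check that a copy lands in each node's brick directory.

```
root@fw2:~# echo "replication test" > /data/hello.txt
# On each storage node, the same file should now exist in the brick:
root@store1:~# cat /backups/hello.txt
root@store2:~# cat /backups/hello.txt
```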
There are a few things to keep in mind when deciding how to use something like this.
- The biggest is that writing files across a network is almost always slower than writing to local disks. This approach suits workloads where large storage for lots of small files matters more than fast retrieval.
- Do not be tempted to write directly to individual storage nodes. This can break synchronization, usually in unpleasant ways.
- If you have a firewall running and trouble getting GlusterFS working, refer to the official documentation on the network ports that GlusterFS needs.
- Using GlusterFS means you may have multiple copies of your data in storage (eg one per node), so plan capacity accordingly.
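Related to the synchronization point above: if one storage node was offline for a while, replicate setups of this vintage repair stale copies when files are read through the client mount. A common trick (hedged here, check the docs for your version) is to walk the whole tree from the client:

```
# Reading every file through the mount lets the replicate translator
# compare the copies and heal the out-of-date node.
root@fw2:~# ls -lR /data > /dev/null
```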
There is a lot more to configuring a real-world cluster, which can include striping and other methods for distributing data across nodes, so do look deeper into GlusterFS's options if this concerns you.
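As a sketch of what an alternative layout looks like, a cluster/distribute translator on the client spreads whole files across the bricks instead of mirroring them, trading redundancy for double the usable space. It would replace the replicate volume in the client volfile above; consult the GlusterFS documentation before relying on this.

```
volume distribute
  type cluster/distribute
  subvolumes remote1 remote2
end-volume
```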
Additional documentation on GlusterFS is available at
The current implementation of GlusterFS appears to be based on NFSv3 and FUSE technologies, so things may look familiar from those angles. However, it's important to realize that they are not the same thing.
A use case for GlusterFS
So what can you actually do with this? One example scenario might be seldom-changed files, such as user-uploaded images on a website. A suitable caching method in front of the client node would allow frequently used static content to be served quickly, while old files could be safely stored and forgotten about.
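As an illustration only (nginx is an assumption here, any caching proxy would do), the front-end might serve uploaded images straight off the GlusterFS mount with long cache lifetimes, so most reads never touch the cluster:

```
server {
    listen 80;
    root /data;              # the GlusterFS client mount
    location /images/ {
        expires 7d;          # let browsers and proxies cache uploads
    }
}
```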
There are several prominent users of large scale GlusterFS installations, including the music service Pandora. If you are interested in working with us to build your own Gluster Cluster, pop in a ticket with details on what you need and we can take a look.