Storage Clustering Part 1: An Introduction

John and I have recently been looking at server clusters, and how they can be provisioned to best fit our customers' needs. This post is the first in (hopefully) a series on the results of that research.

To start with, in this post I would like to highlight some common questions about big storage options. If you have any queries, or suggestions for further options, please do let us know. Keep in mind we are focused on Linux-based servers, so I will be approaching solutions from that angle. Let's start…

What are storage clusters?

In the most abstract sense, a storage cluster is a uniform storage service that you can access over a network, commonly by mounting it as a virtual file-system inside your own server. Features usually include a high level of both redundancy and availability. For example, if a single server in your storage cluster fails, you can still access the data from other servers in that cluster. Also desirable are simple ways of adding more capacity.

Simple Storage Cluster
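
To make the failover idea above a bit more concrete, here is a minimal Python sketch. It is not how any particular product works; the StorageNode class, the node names and the file path are all made up for illustration, and each "server" is just an in-memory dictionary holding the same replicated files.

```python
class StorageNode:
    """Hypothetical in-memory stand-in for one server in a storage cluster."""

    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # each node holds its own copy of the data
        self.online = True

    def read(self, path):
        # An offline node behaves like an unreachable server.
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.files[path]


def read_with_failover(nodes, path):
    """Return the file from the first node that answers."""
    for node in nodes:
        try:
            return node.read(path)
        except ConnectionError:
            continue  # try the next replica
    raise ConnectionError("no storage node available")


# Two nodes holding the same replicated file (illustrative data only).
replica = {"/images/logo.png": b"logo-bytes"}
nodes = [StorageNode("storage1", replica), StorageNode("storage2", replica)]

nodes[0].online = False  # simulate a single server failure
data = read_with_failover(nodes, "/images/logo.png")
```

In a real cluster this retry logic usually lives inside the storage client or the mounted file-system, so your application simply sees a normal directory.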

Who wants storage clustering?

Those operating multi-tiered applications (e.g. LAMP setups) that may require better storage redundancy. Redundant storage improves uptime by reducing single points of failure, mainly through improved service architecture.

One important feature is that far larger redundant storage capacity can be achieved in a cluster than is typically available on a single machine.
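
As a rough back-of-the-envelope illustration of that capacity point, the sketch below estimates usable space when every file is stored on more than one server. It assumes simple whole-file replication across equally sized nodes; the function name and the figures are my own examples, not measurements from any real cluster.

```python
def usable_capacity_gb(node_count, node_capacity_gb, replica_count):
    """Estimate usable space when each file is kept replica_count times.

    Assumes equally sized nodes and plain replication (no striping,
    no file-system overhead), which is a simplification.
    """
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    if node_count < replica_count:
        raise ValueError("need at least as many nodes as replicas")
    return node_count * node_capacity_gb / replica_count


# Four 500 GB servers keeping two copies of everything.
print(usable_capacity_gb(4, 500, 2))  # prints 1000.0
```

Adding two more servers of the same size to this example would raise the usable figure to 1500 GB, which is the sort of growth that is hard to get from a single machine.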

When are storage clusters great?

Storage clusters can be useful for all sorts of things. Examples include…

  • website images and templates
  • backup files
  • some types of archival storage
  • downloadable content
  • personal files (/home)
  • email storage
  • …much more.

What sort of file-systems can be used with a storage cluster?

This will usually depend a bit on which technology is used to build it. However, most solutions can be configured to not care particularly what is used. For example, GlusterFS can be configured to use the file-system on each underlying physical device, such as ext3, which is used in all our VPS servers now.

What technologies are good for building storage clusters?

One example that seems quite nice is GlusterFS. You can see many other options at Wikipedia.

When not to use storage clustering?

For performance reasons, database clustering is normally better done using built-in replication where possible. According to a number of reviews I've read (sorry, no hard numbers yet), using a storage cluster can be relatively slow, even with some high-end SAN or iSCSI solutions. Allowing individual database engines to get closer to the storage platform (i.e. with direct disk access) is also probably better for data integrity.

MySQL replication/clustering is generally pretty simple to set up. For examples of what we can help set up see…

Some things to think about (applying the KISS principle)…

  • Is connection security important, and if so, how will you provide it? Remember it's your data being sent off over the network.
  • What load will using a storage cluster add to existing servers?
  • Ideally your storage cluster should be on a physically separate private network
  • Is the network able to physically cope with traffic demands?
  • Will it have enough storage, and can it be scaled easily?
  • How much will the storage cluster cost to run and to maintain?
  • How easy will the storage cluster be to maintain?
  • Is standalone storage enough for now? Would it be OK to move to a larger solution later?
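
For the network question in the list above, a quick estimate can help before you build anything. The Python sketch below assumes the client sends every write to each replica itself, so write traffic is multiplied by the number of copies; whether that holds depends on the technology you choose. The function name and the numbers are illustrative assumptions only.

```python
def link_utilisation(write_mb_per_s, replica_count, link_mbit_per_s):
    """Fraction of a network link used by replicated writes.

    Assumes the client sends each write to every replica and ignores
    protocol overhead, so treat the result as a lower bound.
    """
    traffic_mbit_per_s = write_mb_per_s * 8 * replica_count
    return traffic_mbit_per_s / link_mbit_per_s


# 20 MB/s of writes, two replicas, over a 1000 Mbit/s private network.
usage = link_utilisation(20, 2, 1000)
print(f"{usage:.0%}")  # prints 32%
```

If the figure gets anywhere near 100%, that is a good hint the storage traffic needs its own network or a faster link.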

I hope that this post has helped you start thinking about storage options in general. Next up will be a post about GlusterFS with a simple redundant configuration that anyone can implement on their own servers across our private network.
