Gluster on Rackspace Cloud with High Availability

RackerHacker has a nice blog post/tutorial on deploying Gluster on Rackspace Cloud Servers or Slicehost (it’s been up for a couple of months, but I only just came across it). It’s a well-written tutorial on setting up a simple HA Gluster configuration, with WordPress hosted on a pair of VMs. Take it away, RackerHacker:

“High availability is certainly not a new concept, but if there’s one thing that frustrates me with high availability VM setups, it’s storage. If you don’t mind going active-passive, you can set up DRBD, toss your favorite filesystem on it, and you’re all set.

If you want to go active-active, or if you want multiple nodes active at the same time, you need to use a clustered filesystem like GFS2, OCFS2 or Lustre. These are certainly good options to consider but they’re not trivial to implement. They usually rely on additional systems and scripts to provide reliable fencing and STONITH capabilities.

What about the rest of us who want multiple active VM’s with simple replicated storage that doesn’t require any additional elaborate systems? This is where GlusterFS really shines. GlusterFS can ride on top of whichever filesystem you prefer, and that’s a huge win for those who want a simple solution. However, that means that it has to use fuse, and that will limit your performance.”

Quick comment on FUSE and performance… Gluster runs entirely in userspace and uses FUSE to interface with the kernel. The conventional wisdom holds that a kernel-based file system will always perform better than a userspace implementation; the theory basically boils down to the overhead of context switching. In reality we don’t see this, because CPUs are now so powerful that context switching is tiny compared to the latency of the network and other parts of the system. [The team didn’t just blindly go down this path: AB and the rest of the initial team had extensive experience with the Mach microkernel and GNU HURD and applied principles from there to the Gluster architecture. They then performed extensive testing to validate the hypothesis. This topic is probably worth its own post.] Another way to think about the issue: VMware would not be as wildly successful as it is if context switching were really a barrier. Running in userspace brings a host of advantages:

  • Easy to install and portable
  • Eliminates the complexity of kernel dependencies, patches, and the kernel release process
  • Faster time to market for new features
  • Enables the modular design of GlusterFS
  • From a development standpoint, one does not need to be a kernel engineer to develop a file system; if you are a good C programmer, you can create GlusterFS modules

We think the last point, that any good C programmer can build GlusterFS modules, has had a big impact on the growth of our community. The universe of kernel engineers is much smaller than the universe of C programmers. As the saying goes, innovation happens everywhere, especially outside the company, and this allows us to tap a broader pool of creativity. Happy Hacking.
