Why Thomas Jefferson could predict the next big thing in storage

In 1815, Thomas Jefferson sold the contents of his vast personal library of books and correspondence to the United States government, forming the foundation of the rebuilt Library of Congress. Some 200 years later, that library is one of the largest in the world. Yet the text of all of its contents could fit on a stack of DVDs that would reach the top of a two-story building.


Gluster Community Profile: Louis ‘Semiosis’ Zuckerman

Our 3rd and final community profile features Louis ‘Semiosis’ Zuckerman. Semiosis maintains a repository of GlusterFS binaries for Ubuntu on Launchpad.net. While he came in 2nd in the contest based on his contributions on our Community Q&A forums, many of you may know him from his participation on #gluster on Freenode.

Louis 'Semiosis' Zuckerman

The following is an exchange I had with him a few weeks ago.

Q&A with Louis Zuckerman

How long have you used GlusterFS?

Since November 2010, glusterfs version 3.1.1.

What was it that led you to try it?

I wanted to move my production network from dedicated hosting to EC2.  The biggest hurdle was figuring out how to replace a hardware RAID array NFS server with EBS storage in the AWS cloud.  I needed more capacity than a single EBS volume, and more performance & reliability than any single-server solution could provide.  I was also constrained by my application’s requirement for a POSIX-compliant mounted filesystem.  After reviewing every distributed filesystem I could find, GlusterFS turned out to be a perfect match for my needs across the board, from its technical capabilities and open source license, to the availability of both community and commercial support.

What should everyone know about your participation in the GlusterFS community (if they don’t know already)?

I’m co-maintainer, with Patrick Matthaei, of the Debian project’s GlusterFS packages.  I also independently maintain Ubuntu packages of the latest 3.1, 3.2 & 3.3 series versions of GlusterFS.  I publish these packages to Launchpad PPAs, making both client/server and pure-client binary packages publicly available for Ubuntu i386 and amd64 architectures.

I discovered & solved a bug which prevented mounting local glusterfs volumes at boot on Ubuntu, and contributed the solution (an upstart job) to Gluster for inclusion in future versions.  I also provide Ubuntu packages that include the new upstart job (instead of an initscript) for glusterd in separate PPAs.

https://launchpad.net/~semiosis

I participated in the first Ubuntu Cloud Days event, giving a presentation called “Scaling shared-storage web apps in the cloud with Ubuntu & GlusterFS.”  In this tutorial I introduced glusterfs and outlined the concepts and techniques involved in managing a glusterfs storage cluster, paying special attention to the opportunities & issues which arise specifically from running glusterfs in EC2.

I’m one of the few regulars in the gluster IRC channel with experience running glusterfs in production on either EC2 or Ubuntu, and maybe the only one using them all together.

I’m semiosis on Freenode IRC #gluster, on community.gluster.org, and on launchpad.net; I’m @pragmaticism on Twitter.

Why do you participate in the Gluster community?

When I started using glusterfs there was minimal support for Debian & Ubuntu from Gluster or from the community.  The Debian project’s glusterfs packages were broken and the Ubuntu project’s packages were very outdated.  The official packages for Debian & Ubuntu provided by Gluster were only for amd64 and included the full client/server installation; no i386 or pure-client packages were available.  As a result of this situation, and my need for such packages, I set out to build my own to solve the problem.  Contributing my work back to the community seemed like the right thing to do because if I needed these packages, then others surely did as well.

I participate in the gluster IRC channel because when I was starting out with glusterfs I learned a ton by asking questions there, and also from reading the chat logs, and like to give back in kind.  Last but not least, I enjoy sharing my knowledge & experience, and learn a lot by helping & observing others troubleshoot their glusterfs installations.

What was a Gluster Community Moment ™ that you’ll never forget?

The earliest was when Debian & Ubuntu developer Al Stone (ahs3) appeared in the gluster IRC channel to thank me for my packaging work bringing current versions of glusterfs to the Ubuntu i386 architecture.

Bonus: Other than yourself, who’s your favorite Gluster community member, and why?

I appreciate all of the community members I’ve had the pleasure of speaking with on IRC and would have a hard time choosing a favorite.  Each of us has a unique background & skill set we bring to the community and I enjoy that diversity.  However having said that, if I had to pick one, I would say glusterbot because he’s never wrong.

[editor: glusterbot!]

Clouds Cannot Be Contained In A Box

A big part of the value proposition of cloud is continuous access to your data, free of the physical limitations of a single box, a single data center, or a single geography. Moving to the cloud lets you get more leverage out of compute and storage, move away from aging, monolithic servers and storage, and reach your data regardless of where you are or what technical issues may be happening behind the scenes. Cloud is supposed to be always on, with resources available on demand, 24×7. So how do you deliver all of that with a cloud that’s been imprisoned in a box? Clouds cannot be contained in a box.

Next Gluster Meetup: The Future of Gluster.org, a Roadmap

It’s been a few weeks since our acquisition by Red Hat. Things have been moving forward fairly rapidly since then, and you may have questions about what has changed regarding the direction of GlusterFS and associated projects. Come to our meetup and learn about all the exciting things that are happening, and how you can benefit.

RSVP at meetup.com

Here is just a sampling of what will be included in the discussion:

  • HekaFS – the multi-tenancy extension for cloud providers
  • Feature list for 3.3 – Simultaneous file and object access, HDFS compatibility, and more
  • Tentative release schedule
  • Project structure – changes are afoot for gluster.org
  • How you can get involved

We’re looking to take Gluster to the next level, and we want you to be a part of that. Come to the meetup and set the direction for Gluster and GlusterFS.

RSVP at meetup.com

Quorum Enforcement

As of yesterday, my most significant patch yet became a real part of GlusterFS. It’s not a big patch, but it’s significant because what it adds is enforcement of quorum for writes. In operational terms, what this means is that – if you turn quorum enforcement on – the probability of “split brain” problems is greatly reduced. It’s not eliminated entirely, because clients don’t see failures at the same time and might take actions that lead to split brain during that immediate post-failure interval. There are also some failure conditions that can cause clients to have persistently inconsistent models of who has quorum and who doesn’t. Still, for 99% of failures this will significantly reduce the number of files affected by split brain – often down to zero. What will happen instead is that clients attempting writes (actually any modifying operation) without quorum will get EROFS. That might cause the application to blow up; if that’s worse for you than split brain would be, then just don’t enable quorum enforcement. Otherwise, you have the option to avoid or reduce one of the more pernicious problems that affect GlusterFS deployments with replication.
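
To make that behavior concrete, here’s a minimal sketch of the kind of check being described. This is not the actual patch; the struct, the function name, and the simple-majority rule are assumptions made purely for illustration.

/* Hypothetical sketch only -- not the real replication code.  With
 * quorum enforcement on, a modifying operation is allowed only while
 * this client can reach a strict majority of the replica set's bricks;
 * otherwise it fails with EROFS, as described above. */
#include <errno.h>

struct replica_view {
        int child_count;      /* bricks in the replica set              */
        int up_count;         /* bricks this client can reach right now */
        int enforce_quorum;   /* the new option, off by default         */
};

/* Returns 0 if the write may proceed, -EROFS if quorum is not met. */
static int
check_write_quorum (const struct replica_view *rv)
{
        if (!rv->enforce_quorum)
                return 0;                        /* old behavior   */
        if (2 * rv->up_count > rv->child_count)
                return 0;                        /* majority is up */
        return -EROFS;                           /* refuse write   */
}

With replica 3, for example, a client that can still reach two of the three bricks keeps writing, while a client that can reach only one gets EROFS rather than quietly creating a divergent copy.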

There’s another significant implication that might be of interest to those who follow my other blog. As such readers would know, I’m an active participant in the endless debates about Brewer’s CAP Conjecture (I’ve decided that Gilbert and Lynch’s later Theorem is actively harmful to understanding of the issues involved). In the past, GlusterFS has been a bit of a mess in CAP terms. It’s basically AP, in that it preserves availability and partition tolerance as I apply those terms, but with very weak conflict resolution. If only one side wrote to a file, there’s not really a conflict. When there is a conflict within a file, GlusterFS doesn’t really have the information it needs to reconstruct a consistent sequence of events, so it has to fall back on things like sizes and modification times (it does a lot better for directory changes). In a word, ick. What quorum enforcement does is turn GlusterFS into a CP system. That’s not to say I like CP better than AP – on the contrary, my long-term plan is to implement the infrastructure needed for AP replication with proper conflict resolution – but I think many will prefer the predictable and well understood CP behavior with quorum enforcement to the AP behavior that’s there now. Since it was easy enough to implement, why not give people the choice?

GlusterFS Community Profile: Jeff Darcy

(This is the 2nd in a series highlighting our community contest winners)

When you contribute code to an open source project, it’s not unusual to find yourself with a hefty job offer from the company sponsoring the project.  Jeff Darcy, author of HekaFS and long-time GlusterFS contributor, turned that on its head: his employer acquired *us*. We’ve had great fun with jdarcy over the years, so it was a natural evolution from external contributor to co-worker. After the acquisition, nothing really changed – jdarcy was still hanging out on #gluster, just as he always was. Below is a nice picture of Jeff with his daughter, and an email Q&A with him. Many, many thanks to Jeff – and check out HekaFS (fka CloudFS), his extensions to GlusterFS for the cloud, including multi-tenancy and SSL encryption.

Jeff Darcy
jdarcy in the wild

Jeff Darcy Q&A

How long have you used GlusterFS?

First tried it approximately three years ago, have used it constantly for the last two.

What was it that led you to try it?

I’m a distributed filesystem developer/researcher, so it had been on my radar for a long time.  Having worked with alternatives that had a single metadata server, and seen in all too painful detail the problems with that approach, I heartily approved of GlusterFS’s more fully distributed approach.  When I started working on CloudFS/HekaFS, the extremely modular architecture was what really set it apart. That’s what made it possible for me to do what I wanted to do, without having to understand and interfere with the *entire* code base.

What should everyone know about your participation in the GlusterFS community (if they don’t know already)?

Perhaps that I’m not only a developer working with GlusterFS by day, but also a user by night.  I keep copies of my own personal data on a server at Rackspace, which I access from many machines using GlusterFS and some parts of HekaFS.

Why do you participate in the Gluster community?

I can’t deny that it’s part of my job to do so, but it’s also just a good community to be part of.  The people who hang out in the IRC channel are not only helpful but funny as well, so it’s my “home channel” where I’m most likely to share my own jokes and random observations.  It’s like sitting in a coffee shop with a bunch of my friends while I work.

What was a Gluster Community Moment ™ that you’ll never forget?

How about the time I’d arranged to meet AB, and he sent me directions to the wrong hotel, and I – despite this occurring in my own town – just blindly followed them?  We had a good laugh about that.

Other than yourself, who’s your favorite Gluster community member, and why?

It just has to be Joe [Julian].  He’s the soul of the community, setting an example and a tone that makes the community what it is.  Also, to be quite honest, he’s better at answering users’ questions than I am. He’s actually in the same boat with the other users; I’m like some weird fish on the bottom of the pond looking up at them through a whole lot of code.

The beauty of little boxes, and their role in the future of storage

If you’re into 1960s songs about middle-class conformity, you may not have a positive association with lots of interchangeable “little boxes.” In storage, however, those little boxes are not only beautiful but the wave of the future. (Full story on Insider; free registration required.)


GlusterFS Community Profile: Joe Julian

We are publishing a series of profiles on the winners of our recent community contest, and today we’re starting with the grand prize winner, Joe Julian. Joe has transformed the #gluster IRC channel in the two years since he started participating, and he also maintains his own repository of RPMs for 32-bit builds of GlusterFS: we don’t support 32-bit builds, and he stepped in to fill that gap. You may have seen Joe’s picture before, such as when he won the Gluster award for ‘Hacker of the Year’ – an honor we bestowed on him last year. Included in this post is a picture he took upon receiving his prize, a Motorola Xoom. I’ll post the picture of him with his new award as soon as he receives it 🙂

Joe Julian
Joe Julian, with his 'Hacker of the Year' award

What follows are excerpts from an email interview I had with Joe:

How long have you used GlusterFS? What was it that led you to try it?

I implemented GlusterFS at Ed Wyse Beauty Supply in March of 2009 as a result of a total failure of DRBD to meet our system’s needs. When DRBD became corrupted, the failure was at the block level, so if not for our backup routines it would have resulted in massive data loss. As it was, it resulted in about 18 hours of critical system downtime as I restored and brought systems back online. I looked for solutions that would store whole files, so that if there was a problem it wouldn’t affect everything, and I wanted a solution with no single points of failure. GlusterFS was the only solution I found that met all my criteria.

What should everyone know about your participation in the GlusterFS community (if they don’t know already)?

I’m a chronic fixer. I can’t help myself. Even when I know I should just walk away shaking my head, I throw together a test structure to see if I can figure a way to make it work. I’m also kinda nice, I guess. It’s very important to me that I help people in the way I’d like to be helped, not like some other channels that treat you as if you don’t deserve to be using their solution if you don’t already know everything <cough>#centos</cough>.

Why do you participate in the Gluster community?

I participate in the Gluster community because I was frustrated that when I needed help, there was nobody there. It’s more of an aggressive assault on the general lack of help I got. You’ve heard of passive aggressive, well I take it to a whole new level… Besides that, by helping other people learn about GlusterFS, and by helping identify problems, I learn more. The more I learn, the better I do my job. The more bugs I report, the better the system works for Ed Wyse Beauty Supply.

What was a Gluster Community Moment (tm) that you’ll never forget?

I went to OSCon and met with a bunch of people whose names I’d only heard, or that I’ve talked to on IRC. I felt, for the first time, that I was now a part of the open source community that has made everything I do possible. It was all because of my participation with Gluster. People actually knowing who I was was pretty unexpected. As huge as my ego sounds on IRC, I’m a fairly humble guy. I don’t really feel like I do anything for this inconceivably huge open source movement, even though I’ve been a consumer of it since kernel-0.96. I’m just a guy that hangs out and BSes with people on company time. To be included felt pretty cool.

Bonus: Other than yourself, who’s your favorite Gluster community member, and why?

I can’t say there is a favorite community member. Everybody that helps makes me happy. This silent channel with 16 people is now a thriving community with over 100. There are so many people giving back, it’s incredible. Every single person that answers someone else’s question adds themselves to my favorites.

Editor’s note: I’ll say it now – Joe Julian is my favorite community member, because he laid the groundwork for all future community participation. What he’s been able to accomplish in #gluster is nothing short of phenomenal. Hats off to Joe and the rest of our community members who make this one of the most friendly open source communities around.

Announcing the Scale-out Community Contest Winners

Some time ago, we held a bit of a contest that we dubbed the International GlusterFS Scale-out Community Contest. We tabulated results and even selected a winner. And then something happened. A big something – like an acquisition by Red Hat. Not that they stopped it; rather, there were suddenly a lot of things to deal with, and the community contest went to the back burner, where it stayed – until now! As I was gearing up for the next contest iteration, I realized that we kind of needed to do the big announcement of the last contest’s winners before really proceeding with the next one. So, without further ado – here goes.

The results of the community contest were extraordinary. We ended up with a series of fantastic technical blog posts that I dubbed “The Straight Tech”, which you can find on this blog under the tag “thestraighttech”. There were three individuals who distinguished themselves, all of whom are well-known in the GlusterFS user and developer community: Joe Julian, Jeff Darcy and Louis Zuckerman (aka “semiosis”). Joe Julian quickly established himself in the pole position, which was no surprise to anyone paying attention to our community. We’re sending all of them fabulous prizes, and I’m going to post a series of profiles on them in this blog, starting today.

In the meantime, if you catch any of them on #gluster on Freenode, gluster-users or gluster-devel mailing lists, or community.gluster.org – please thank them for their time and effort that make the GlusterFS community the friendly place that it is.

License Change

As of a few minutes ago, the license on HekaFS changed from AGPLv3+ to GPLv3 – not AGPL, not LGPL, not later versions. This only affects the git repository so far; packages with the change still need to be built, then those need to be pushed into yum repositories, and all of that will take some time.

Why the change? It actually had very little to do with the acquisition; Gluster themselves had already moved to GPLv3+ and the plan has always been for the HekaFS license to track that of GlusterFS. What the acquisition did was spur a general conversation about what license should apply to both GlusterFS and HekaFS (as long as it remains separate). After several rounds of this, I was told it should be GPLv3, and so it is. While I’ve personally gone from favoring BSD/MIT to favoring A/GPL, I actually believe they’re all fine. Even though I’ve argued on my own blog about why AGPL is what GPL should be, I’ve also seen actual cases where AGPL-aversion has threatened to kill projects. It doesn’t matter what I think about the AGPL’s effect on others’ code, or even what the legal outcome will be if/when there’s a proper test case to set precedent. The fact is that the engineers who are trying to use the code can’t change a no-AGPL policy, and the people who make such policies have their reasons. As far as I know, that’s why Gluster had already abandoned AGPL. As to why it’s GPL instead of LGPL, or v3 instead of v2 or v3+… well, I don’t know. The differences at that point are below my threshold of caring, so I didn’t even ask.

Translator 101 Class 1: Setting the Stage

This is the first post in a series that will explain some of the details of writing a GlusterFS translator, using some actual code to illustrate.

Before we begin, a word about environments. GlusterFS is over 300K lines of code spread across a few hundred files. That’s no Linux kernel or anything, but you’re still going to be navigating through a lot of code in every code-editing session, so some kind of cross-referencing is essential. I use cscope with the vim bindings, and if I couldn’t do “ctrl-\ g” and such to jump between definitions all the time, my productivity would be cut in half. You may prefer different tools, but as I go through these examples you’ll need something functionally similar to follow along. OK, on with the show.

The first thing you need to know is that translators are not just bags of functions and variables. They need to have a very definite internal structure so that the translator-loading code can figure out where all the pieces are. The way it does this is to use dlsym to look for specific names within your shared-object file, as follows (from xlator.c):

        if (!(xl->fops = dlsym (handle, "fops"))) {
                gf_log ("xlator", GF_LOG_WARNING, "dlsym(fops) on %s",
                        dlerror ());
                goto out;
        }
 
        if (!(xl->cbks = dlsym (handle, "cbks"))) {
                gf_log ("xlator", GF_LOG_WARNING, "dlsym(cbks) on %s",
                        dlerror ());
                goto out;
        }
 
        if (!(xl->init = dlsym (handle, "init"))) {
                gf_log ("xlator", GF_LOG_WARNING, "dlsym(init) on %s",
                        dlerror ());
                goto out;
        }
 
        if (!(xl->fini = dlsym (handle, "fini"))) {
                gf_log ("xlator", GF_LOG_WARNING, "dlsym(fini) on %s",
                        dlerror ());
                goto out;
        }

In this example, xl is a pointer to the in-memory object for the translator we’re loading. As you can see, it’s looking up various symbols by name in the shared object it just loaded, and storing pointers to those symbols. Some of them (e.g. init) are functions, while others (e.g. fops) are dispatch tables containing pointers to many functions. Together, these make up the translator’s public interface.

Most of this glue or boilerplate can easily be found at the bottom of one of the source files that make up each translator. We’re going to use the rot-13 translator just for fun, so in this case you’d look in rot-13.c to see this:

struct xlator_fops fops = {
        .readv        = rot13_readv,
        .writev       = rot13_writev
};
 
struct xlator_cbks cbks = {
};
 
struct volume_options options[] = {
        { .key  = {"encrypt-write"},
          .type = GF_OPTION_TYPE_BOOL
        },
        { .key  = {"decrypt-read"},
          .type = GF_OPTION_TYPE_BOOL
        },
        { .key  = {NULL} },
};

The fops table, defined in xlator.h, is one of the most important pieces. This table contains a pointer to each of the filesystem functions that your translator might implement – open, read, stat, chmod, and so on. There are 82 such functions in all, but don’t worry; any that you don’t specify here will be seen as NULL and filled in with defaults from defaults.c when your translator is loaded. In this particular example, since rot-13 is an exceptionally simple translator, we only fill in two entries, for readv and writev.
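
As a rough illustration of that filling-in step: the real logic lives in the translator-loading code and defaults.c, and the structure and helper below are invented for the sketch, but the idea is simply to point every empty slot at a stock implementation.

#include <stddef.h>

/* Toy stand-ins for the real fops table and the default_* functions. */
struct sketch_fops {
        int (*readv)  (void);
        int (*writev) (void);
        int (*open)   (void);
        /* ...the real table has 82 such entries... */
};

static int default_readv  (void) { return 0; }
static int default_writev (void) { return 0; }
static int default_open   (void) { return 0; }

/* Any slot the translator left NULL gets pointed at a default. */
static void
fill_missing_fops (struct sketch_fops *fops)
{
        if (!fops->readv)  fops->readv  = default_readv;
        if (!fops->writev) fops->writev = default_writev;
        if (!fops->open)   fops->open   = default_open;
}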

There are actually two other tables, also required to have predefined names, that are used to find translator functions: cbks (which is empty in this snippet) and dumpops (which is missing entirely). The first of these specifies entry points for when inodes are forgotten or file descriptors are released. In other words, they’re destructors for objects in which your translator might have an interest. Mostly you can ignore them, because the default behavior handles even the simpler cases of translator-specific inode/fd context automatically. However, if the context you attach is a complex structure requiring complex cleanup, you’ll need to supply these functions. As for dumpops, that’s just used if you want to provide functions to pretty-print various structures in logs. I’ve never used it myself, though I probably should. What’s noteworthy here is that we don’t even define dumpops. That’s because any code that might use it checks for xl->dumpops being NULL before calling through it. This is in sharp contrast to the behavior for fops and cbks, which must be present: if either is missing, translator loading fails, precisely because those pointers are not checked on every call, and calling through a NULL table would mean a segfault. That’s why we provide an empty definition for cbks; it’s OK for the individual function pointers to be NULL, but not for the whole table to be absent.
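
The difference in how the two kinds of table are treated looks roughly like this at the call sites. Again, this is a toy, self-contained sketch with invented type names, not the real calling code:

#include <stdio.h>

/* Toy types standing in for the real xlator tables. */
struct toy_fops    { void (*readv) (void); };
struct toy_dumpops { void (*priv)  (void); };

struct toy_xlator {
        struct toy_fops    *fops;      /* must exist: called without a check */
        struct toy_dumpops *dumpops;   /* optional: checked before use       */
};

static void toy_readv (void) { puts ("readv"); }
static struct toy_fops toy_fops_table = { .readv = toy_readv };

int
main (void)
{
        struct toy_xlator xl = { .fops = &toy_fops_table, .dumpops = NULL };

        if (xl.dumpops && xl.dumpops->priv)    /* dumpops may legitimately be NULL */
                xl.dumpops->priv ();

        xl.fops->readv ();                     /* fops is assumed to be present    */
        return 0;
}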

The last piece I’ll cover today is options. As you can see, this is a table of translator-specific option names and some information about their types. GlusterFS actually provides a pretty rich set of types (volume_option_type_t in options.h) which includes paths, translator names, percentages, and times in addition to the obvious integers and strings. Also, the volume_option_t structure can include information about alternate names, min/max/default values, enumerated string values, and descriptions. We don’t see any of these here, so let’s take a quick look at some more complex examples from afr.c and then come back to rot-13.

        { .key  = {"data-self-heal-algorithm"},
          .type = GF_OPTION_TYPE_STR,
          .default_value = "",
          .description   = "Select between \"full\", \"diff\". The "
                           "\"full\" algorithm copies the entire file from "
                           "source to sink. The \"diff\" algorithm copies to "
                           "sink only those blocks whose checksums don't match "
                           "with those of source.",
          .value = { "diff", "full", "" }
        },
        { .key  = {"data-self-heal-window-size"},
          .type = GF_OPTION_TYPE_INT,
          .min  = 1,
          .max  = 1024,
          .default_value = "1",
          .description = "Maximum number blocks per file for which self-heal "
                         "process would be applied simultaneously."
        },

When your translator is loaded, all of this information is used to parse the options actually provided in the volfile, and then the result is turned into a dictionary and stored as xl->options. This dictionary is then processed by your init function, which you can see being looked up in the first code fragment above. We’re only going to look at a small part of rot-13’s init for now.

        priv->decrypt_read = 1;
        priv->encrypt_write = 1;
 
        data = dict_get (this->options, "encrypt-write");
        if (data) {
                if (gf_string2boolean (data->data, &priv->encrypt_write) == -1) {
                        gf_log (this->name, GF_LOG_ERROR,
                                "encrypt-write takes only boolean options");
                        return -1;
                }
        }

What we can see here is that we’re setting some defaults in our priv structure, then looking to see if an “encrypt-write” option was actually provided. If so, we convert and store it. This is a pretty classic use of dict_get to fetch a field from a dictionary, and of using one of many conversion functions in common-utils.c to convert data->data into something we can use.
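
The companion “decrypt-read” option is presumably handled the same way; a fragment in the same style as the excerpt above (paraphrased here, not quoted from the source) would look like this:

        /* Same dict_get / gf_string2boolean pattern as encrypt-write above. */
        data = dict_get (this->options, "decrypt-read");
        if (data) {
                if (gf_string2boolean (data->data, &priv->decrypt_read) == -1) {
                        gf_log (this->name, GF_LOG_ERROR,
                                "decrypt-read takes only boolean options");
                        return -1;
                }
        }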

So far we’ve covered the basics of how a translator gets loaded, how we find its various parts, and how we process its options. In my next Translator 101 post, we’ll go a little deeper into other things that init and its companion fini might do, and how some other fields in our xlator_t structure (commonly referred to as this) are used.

Time for Cloud Expo West

Going to Cloud Expo?

Visit Red Hat in booth #408 to see demos, talk to experts, and find out how we can be your partner in the cloud.

Also, the Red Hat Storage / Gluster folks are in booth #317 to show how to effectively scale out your storage systems for the cloud. To celebrate Red Hat’s cloudiness, we’re also hosting a meetup:

Join us at the Red Hat Cloud Meet-up

Want to hear about our major open source cloud projects all in one location? Feeling a little thirsty? Come meet our cloud computing experts and hear all about our cloud initiatives:

  • The Fedora project – open source community distribution
  • OpenShift PaaS – our developer on-ramp to the cloud
  • Aeolus/DeltaCloud cloud management suite – hybrid cloud resource management on EC2, Rackspace, Azure and many more
  • oVirt virtualization management – management automation of your virtualized KVM environments
  • GlusterFS storage system – a scale-out storage solution for the cloud

The cloud begins with open source and Red Hat enables your move to the cloud.

When: Wednesday, November 9, 7:30 – 9pm

Where: Winchester/Stevens Creek room

Why: Come for the talks, stay for the drinks!

Cloud Expo Conference Sessions

If you’re attending any conference sessions, we have a stable of speakers presenting this year. Here’s what’s coming up:

Ben Golub – Red Hat
Track 5 Wrap-Up: The Future of Cloud Storage
Track: Cloud Storage Virtualization APIs

Vijay Sarathy – Red Hat
General Session | Unlock the Value of the Cloud
Cut Through the Cloud Clutter

David Blado – Red Hat
Deploying Apps to RedHat’s OpenShift: Get Your Code in the Cloud and Your PaaS Over Here!
Track: Hot Topics 2
