Quick Look at XtreemFS

I like checking out other projects in my space, so yesterday I had a quick look at XtreemFS. This is one of the more interesting projects out there IMO, with a focus on distribution/replication beyond a single data center. The quick blurb says:

Clients and servers can be distributed world-wide. XtreemFS allows you to mount and access your files via the internet from anywhere.

With XtreemFS you can easily replicate your files across data-centers to reduce network consumption, latency and increase data availability.

open source
XtreemFS is fully open source and licensed under the GPLv2.

OK, sounds good. Since I was working from home, I decided to run some simple tests between Lexington and Westford, which might not seem like a long distance except that I was going through a VPN server in Phoenix. Installation was fairly straightforward using their RHEL6 repository, and initial setup just meant following their Quick Start instructions. I was able to create a volume, mount it, and read/write data from both of my machines in practically no time. Well done, guys.

The next step was to do some actual replication, and that’s where things started to get a bit ragged around the edges. First, I just have to say that replication that works only for files explicitly marked read-only impresses me as little as I expected it would. Also, the process to perform the actual replication seems both cumbersome and error-prone. The instructions for this require at least two steps:

xtfs_repl --set_readonly ~/xtreemfs/movie.avi
xtfs_repl --add_auto --full ~/xtreemfs/movie.avi
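
To make that two-step dance a little less error-prone, here is a minimal sketch of a wrapper. It uses only the two xtfs_repl invocations shown above; the function names are my own invention, and I'm assuming xtfs_repl reports failure through a non-zero exit status, which I haven't verified.

```python
import subprocess
from typing import Callable, List


def replication_commands(path: str) -> List[List[str]]:
    """Return the two xtfs_repl invocations from the instructions, in order."""
    return [
        ["xtfs_repl", "--set_readonly", path],
        ["xtfs_repl", "--add_auto", "--full", path],
    ]


def replicate(path: str, runner: Callable = subprocess.run) -> None:
    """Run both steps, stopping at the first one that fails.

    ASSUMPTION: xtfs_repl exits non-zero on failure. With check=True,
    subprocess.run raises CalledProcessError in that case, so the second
    step never runs against a file that was not marked read-only.
    """
    for cmd in replication_commands(path):
        runner(cmd, check=True)
```

Because no shell is involved, a path like ~/xtreemfs/movie.avi has to be expanded first, for example with replicate(os.path.expanduser("~/xtreemfs/movie.avi")).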

Then those two commands didn’t even seem to work. It turned out that the problem was my own fault (insufficient iptables magic on my two machines), but the way the error presented itself was problematic. The actual commands just paused for a long time and then threw a generic I/O error. The logs had big Java tracebacks ending with “Set Replication Strategy not known” messages. This led me down a big blind alley trying to set the strategy on the second xtfs_repl command before I figured out the real problem; I suspect many users who haven’t been thinking about replication strategies for years might have felt even more lost.
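
In hindsight, a quick reachability check would have pointed at the firewall long before the Java tracebacks did. The following is a rough sketch, not anything that ships with XtreemFS. The port numbers are what I believe the defaults to be and should be checked against your own configuration files, and the function names are hypothetical.

```python
import socket
from typing import Dict

# ASSUMPTION: these are the default XtreemFS service ports as I understand
# them. Verify against your own DIR/MRC/OSD config before trusting them.
ASSUMED_PORTS = {"DIR": 32638, "MRC": 32636, "OSD": 32640}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str,
                   ports: Dict[str, int] = ASSUMED_PORTS,
                   timeout: float = 3.0) -> Dict[str, bool]:
    """Map each service name to whether its port is reachable on host."""
    return {name: can_connect(host, port, timeout)
            for name, port in ports.items()}
```

Running check_services against the other machine from each side should show right away whether iptables is swallowing the traffic. A successful TCP connect only proves the port is open, not that the service behind it is healthy.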

I ran into the other problem this morning. My machine at home is no longer accessible, but the DIR and MRC processes plus one OSD are running here at work, so I thought that I should be able to operate normally except for not replicating across sites. Wrong. When I tried to build in the iozone tree I had unpacked yesterday, I again saw long pauses followed by the thoroughly misleading “Set Replication Strategy not known” message in the OSD log. Further investigation suggests that the real problem is the iozone build process trying to modify old files that are marked read-only, but that should yield a pretty obvious EPERM/EROFS sort of error. Creating a separate volume and unpacking/building there seemed to work.

This did make me wonder, though, about how well availability across sites really works. The site says that DIR and MRC replication are supposed to be features in version 1.3 (scheduled for Q1/10), but I don’t see any signs of 1.3 having been released yet. I looked around a bit for instructions on how to set up a redundant DIR/MRC with manual failover, but didn’t find any. As far as I can tell, XtreemFS still requires that remote sites be able to contact a primary DIR/MRC site even though their data might reside locally. That’s OK considering that most other distributed filesystems are exactly the same way, but since distribution across sites was (in my mind) XtreemFS’s main distinguishing feature, it was a bit of a disappointment. If the situation is actually better than what I’ve presented here, then I hope one of the XtreemFS developers (with whom I’ve corresponded in the past) will stop by and point me in the right direction.

I know all of that seems like a bit of a downer, but I’d like to end on a high note. Once I had fixed my own configuration issues, and as long as I stayed within the limitations I’ve mentioned, XtreemFS was the only distributed filesystem besides GlusterFS that could get through my “smoke test” without crashes, hangs, or data corruption. That might not seem like a very high standard considering that the test is just iozone reading and writing files sequentially, but four out of six distributed filesystems that I’ve tested (or tried to test) couldn’t even get that far. I wasn’t testing on systems where performance results would be really meaningful, so all I can say is that I test GlusterFS this way all the time and XtreemFS performance didn’t seem radically different. The fact that XtreemFS can handle even that much, along with the relative ease of installation and setup, already puts it at #2 on my list. I expect that when 1.3 does come out it will address at least some of the issues I’ve mentioned and offer a worthwhile choice for those who are interested in its unique feature set. I highly recommend that anyone interested in this area give it a look.
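
For anyone who wants to try the same kind of smoke test without installing iozone, here is a bare-bones stand-in. It is not what iozone does internally and it measures nothing; it just writes a file sequentially, reads it back sequentially, and compares checksums. The function name and default sizes are arbitrary choices of mine.

```python
import hashlib
import os


def sequential_smoke_test(path: str,
                          total_bytes: int = 8 * 1024 * 1024,
                          block_size: int = 64 * 1024) -> bool:
    """Write random data sequentially, read it back, and compare hashes."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            block = os.urandom(min(block_size, remaining))
            f.write(block)
            written.update(block)
            remaining -= len(block)
        # Push the data out of Python and kernel buffers before reading back.
        f.flush()
        os.fsync(f.fileno())

    read_back = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            read_back.update(block)
            size += len(block)

    # Both the length and the content must match what was written.
    return size == total_bytes and read_back.digest() == written.digest()
```

Point it at a file on the mounted volume and it returns True if the data survived the round trip. The read may be served from the client's cache rather than the servers, so reading the file back from a second machine is a stronger check.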