GlusterFS is at the core of Gluster’s scale-out storage solutions. It is an open source, distributed file system capable of scaling to several petabytes and handling thousands of clients. GlusterFS clusters storage building blocks together over InfiniBand RDMA or TCP/IP interconnects, aggregating disk and memory resources and managing data in a single global namespace. It is based on a stackable user-space design and can deliver exceptional performance for diverse workloads.

Figure 1. GlusterFS – One Common Mount Point

GlusterFS supports standard clients running standard applications over any standard IP network. Figure 1, above, illustrates how users can access application data and files in a global namespace using a variety of standard protocols.

No longer are users locked into costly, monolithic, legacy storage platforms.  GlusterFS gives users the ability to deploy scale-out, virtualized storage – scaling from terabytes to petabytes in a centrally managed and commoditized pool of storage.

Attributes of GlusterFS include:

  • Scalability and Performance
  • GlusterFS combines several features to deliver a solution that scales from a few terabytes to multiple petabytes. The scale-out architecture allows resources to be added as required for capacity and performance. Disk, compute, and I/O resources can be added independently, and higher-performance interconnects such as 10GbE and InfiniBand are supported. The Gluster Elastic Hash removes the need for a metadata server, eliminating it as a bottleneck and truly parallelizing data access.

  • High Availability
  • Files can be replicated (mirrored) two or more times to ensure data is always available, even in the event of hardware failure. Self-healing restores data to the correct state following recovery and is performed incrementally in the background with almost no overhead. GlusterFS does not use a proprietary format to store files on disk; instead, it uses the underlying disk file system of the operating system (e.g., ext3, ZFS), so the data is always accessible with standard tools.

  • Global Namespace
  • The unified global namespace aggregates disk and memory resources into a single pool, virtualizing the underlying hardware. Storage resources can scale elastically within the storage pool, growing or shrinking as necessary. For virtual machine (VM) images, there is no limit on the number of images that can be stored, and a single mount point can be shared by thousands of VMs. Virtual machine I/O is automatically load-balanced across the servers in the namespace, eliminating the hotspots and bottlenecks that often occur in SAN environments.

  • Elastic Hash Algorithm
  • Rather than using a centralized or distributed metadata server index, GlusterFS employs an elastic hash algorithm to locate data in the storage pool. The metadata server is a common source of I/O bottlenecks and vulnerability to failure in other scale-out storage systems. All storage systems in the scale-out storage configuration have the intelligence to locate any piece of data without looking it up in an index or querying another server. This fully parallelizes data access and ensures linear performance scaling.

  • Elastic Volume Manager
  • Data is stored in logical volumes that are abstracted from the hardware and logically partitioned from each other. Storage servers can be added or removed while data remains online, with no application interruption. Volumes can grow or shrink across machines in the scale-out storage configuration and can be migrated within it to rebalance capacity or to add or remove systems on the fly. File system configuration changes can be made at run time and are applied immediately, either to adapt to changing workload conditions or for live performance tuning.

  • Gluster Console Manager
  • The command line interface (CLI), application programming interface (API), and shell are merged into a single powerful interface, enabling automation by giving the CLI higher-level APIs and scripting capabilities. Languages such as Python, Ruby, or PHP can be used to script a series of commands that are invoked through the command line. This tool requires no new APIs and can script and rapidly automate anything entered in the CLI, so cloud administrators can automate large-scale operations with little effort.

  • Standards-based
  • Gluster storage servers support NFS and the Gluster protocol natively, as well as CIFS, HTTP, and FTP. Gluster is fully POSIX-compliant and does not require any unique APIs for data access, so existing applications do not need to be modified to be supported. This is especially useful when deploying Gluster in public cloud environments, as Gluster abstracts the cloud vendor-specific APIs and presents a standard POSIX interface.
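
The idea behind the Elastic Hash Algorithm item above, locating data by computation rather than by lookup, can be illustrated with a minimal Python sketch. This is not GlusterFS's actual implementation: the SHA-256 stand-in hash, the equal-sized ranges, the server names, and the helper names `name_hash`, `build_layout`, and `locate` are all assumptions made for illustration, and the sketch leaves out the "elastic" part (reassigning ranges as servers are added or removed).

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2**32  # illustrative 32-bit hash space (an assumption)


def name_hash(filename: str) -> int:
    """Map a file name to an integer in [0, HASH_SPACE).

    SHA-256 is only a stand-in; the real hash function is not specified here.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_layout(bricks):
    """Split the hash space into one contiguous range per storage unit.

    Returns the sorted range start values, aligned index-for-index with bricks.
    """
    step = HASH_SPACE // len(bricks)
    return [i * step for i in range(len(bricks))]


def locate(filename, bricks, starts):
    """Return the storage unit whose range contains the file's hash.

    No index or metadata server is consulted: the answer is computed
    from the file name and the (shared) layout alone.
    """
    idx = bisect_right(starts, name_hash(filename)) - 1
    return bricks[idx]


if __name__ == "__main__":
    # Hypothetical server and export names.
    bricks = ["server1:/export/brick", "server2:/export/brick", "server3:/export/brick"]
    starts = build_layout(bricks)
    for name in ("vm-image-01.img", "report.pdf", "notes.txt"):
        print(name, "->", locate(name, bricks, starts))
```

Because every client that knows the same layout computes the same answer independently, no server has to be queried to find a file, which is the property the text credits with removing the metadata bottleneck.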

    Learn more about how to deploy GlusterFS in Amazon EC2 and VMware environments.
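
As a rough illustration of the scripting described under Gluster Console Manager above, the following Python sketch assembles a series of `gluster` CLI invocations and, by default, only prints them. The volume name, host names, brick paths, and the helpers `build_commands` and `run_all` are hypothetical; the subcommands shown (`peer probe`, `volume create ... replica`, `volume start`) should be checked against the CLI version actually installed. The sketch assumes the script runs on the first listed server, which therefore does not probe itself.

```python
import subprocess


def build_commands(volume, bricks, replica=2):
    """Build the CLI invocations to create and start a replicated volume.

    bricks are "host:/path" strings; hosts are de-duplicated in order.
    """
    hosts = list(dict.fromkeys(b.split(":", 1)[0] for b in bricks))
    # Assumption: the script runs on hosts[0], so only the others are probed.
    commands = [["gluster", "peer", "probe", host] for host in hosts[1:]]
    commands.append(
        ["gluster", "volume", "create", volume, "replica", str(replica)] + list(bricks)
    )
    commands.append(["gluster", "volume", "start", volume])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for cmd in commands:
        line = " ".join(cmd)
        lines.append(line)
        print(line)
        if not dry_run:
            # check=True stops the series at the first failing command.
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    # Hypothetical volume and brick names; dry run only.
    run_all(build_commands("vol0", ["server1:/export/brick1", "server2:/export/brick1"]))
```

Keeping command construction separate from execution means the series can be reviewed or logged before anything touches the storage pool.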