Today marks an important milestone in the evolution of Gluster as the storage operating system for public and private clouds. As part of our release of Gluster 3.2, we now have the ability to support Continuous Data Replication (CDR).
As I discussed in my previous post, failures of one form or another are endemic to data center operations. Most frequently, the failures are limited to systems within a data center: disks or servers fail, administrators trip over power cords, networks get segmented, etc. Gluster has long had a series of internal features to minimize the frequency and impact of those failures, including our no-metadata architecture, self-healing features, support for RAID 5 and 6 within a node, and, of course, the ability to do n-way synchronous replication between nodes.
On occasion, however, failures affect an entire data center. Power lines can get cut, blizzards can keep employees from the data center, and fires, floods, or other natural disasters can take out the entire data center. Of course, the recent issues at AWS illustrated this point.
To prepare for such data-center-level failures, it is necessary to support data replication across geographic regions. Gluster’s recently announced Continuous Data Replication allows our private cloud customers to replicate asynchronously a) between data centers across the WAN or b) between their data centers and a public cloud, such as Amazon Web Services. Similarly, Gluster’s public cloud customers can now replicate not only between availability zones in a geographic region, but also between regions.
Generally speaking, the connectivity between geographic regions is slow, expensive, and subject to interruption. If you are storing hundreds of terabytes or petabytes, you need to be sure that replication is both continuous and efficient. Replication should be continuous because a) it is impractical to snapshot and replicate terabytes of data on a regular basis, and b) the time required to either back up or restore in the event of an incident is often too long for the real-time availability requirements of most enterprises. Similarly, replication should be efficient, because snapshotting and replicating large data sets can be extremely expensive in terms of both storage and I/O.
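To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The figures are illustrative assumptions, not measurements: a 100 TB data set, a dedicated 1 Gbit/s WAN link running at full line rate, and 1 TB of changed data per day. The helper name `transfer_hours` is made up for this example.

```python
def transfer_hours(size_bytes: float, link_bits_per_sec: float) -> float:
    """Hours needed to push size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_sec / 3600


TB = 10**12            # decimal terabyte
LINK = 10**9           # assumed 1 Gbit/s WAN link, fully utilized

full_copy = transfer_hours(100 * TB, LINK)   # re-sending the whole data set
daily_delta = transfer_hours(1 * TB, LINK)   # sending only the assumed daily changes

print(f"full copy:   {full_copy:.1f} hours")
print(f"daily delta: {daily_delta:.1f} hours")
```

Under these assumptions a full copy takes a little over nine days of uninterrupted transfer, while the daily changes fit into a couple of hours. Real links are shared and lossy, so actual times would be longer in both cases.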
Gluster’s Continuous Data Replication meets both requirements. Gluster’s CDR provides a continuous, asynchronous, and incremental replication service from one site to another over local area networks (LANs), wide area networks (WANs), and across the Internet. Only incremental changes are replicated, eliminating the need for snapshot-like copies of whole files or volumes. In addition, Gluster 3.2 features intelligent asynchronous replication in which GlusterFS tracks changes to the primary data and replicates them in real time across a WAN. Changes are tracked and queued to ensure data stays synchronized regardless of latency or potential network interruptions.
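The track-and-queue idea can be illustrated with a small Python sketch. This is not GlusterFS code and says nothing about how Gluster implements CDR internally; the names `ChangeLog`, `apply_change`, and the `send` callback are hypothetical, and the "replica" is just an in-memory dictionary. It only shows the general pattern: record each incremental change in order, ship changes when the link is up, and keep whatever could not be sent for the next attempt.

```python
from collections import deque


class ChangeLog:
    """Ordered queue of incremental changes made on the primary (illustrative only)."""

    def __init__(self):
        self._queue = deque()

    def record(self, path, offset, data):
        # Store only the changed byte range, not the whole file.
        self._queue.append((path, offset, bytes(data)))

    def drain(self, send):
        """Send queued changes in order; stop at the first network failure.

        `send` is a hypothetical callable(path, offset, data) that raises
        ConnectionError when the link is down. Returns the number sent.
        """
        sent = 0
        while self._queue:
            change = self._queue[0]
            try:
                send(*change)
            except ConnectionError:
                break  # keep the change queued and retry on the next drain
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._queue)


def apply_change(replica, path, offset, data):
    """Apply one byte-range change to an in-memory stand-in for the remote site."""
    buf = replica.setdefault(path, bytearray())
    if len(buf) < offset:
        buf.extend(b"\0" * (offset - len(buf)))  # pad a gap with zero bytes
    buf[offset:offset + len(data)] = data
```

If `send` fails partway through, the unsent changes stay at the head of the queue in their original order, so a later call to `drain` picks up where the last one stopped. A production system would also need to persist the queue and handle conflicting or reordered writes, which this sketch leaves out.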
Ultimately, the significance of CDR goes beyond availability. For the vision of hybrid clouds to fully develop, we must not only support the ability to migrate applications from one data center to another (e.g. VM migration), but also make sure that application data is available. While it may be practical to migrate virtual machine images at a moment’s notice, it is not practical to migrate terabytes or petabytes of application data at a moment’s notice. Therefore, we need to make sure that the application data is available and waiting in multiple data centers, and that the costs of doing so are not prohibitive. More information on that in a subsequent post.