By Sayandeb Saha, Director, Product Management, Storage Business Unit
In our last blog post in this series, we talked about how the Container-Native Storage (CNS) offering for OpenShift Container Platform from Red Hat has seen increased customer adoption in on-premise environments by offering a peaceful coexistence approach with classic storage arrays that are not deeply integrated with OpenShift. In this post, we’ll explore why many customers are deploying our CNS offering in the three big public clouds—AWS, Microsoft Azure, and the Google Cloud Platform—on top of native public cloud offerings from the public clouds—despite good integration of Kubernetes with native storage offerings in the cloud. Let’s examine some of these problems and constraints in a bit more detail and describe how CNS addresses them.
Slow attach/detach—poor availability
The first issue stems from the fact that the native block storage offerings (EBS in AWS, Data Disk in Azure, Persistent Disk in Google Cloud) in the public cloud were designed and engineered to support virtual machine (VM) workloads. In such workloads, attaching and consequently detaching a block device to a machine image/instance is an infrequent occurrence at best, as these workloads are less dynamic compared to Platform-as-a-Service (PaaS) and DevOps workloads, which frequently run on OpenShift powering dynamic build and deploy CI/CD pipelines and other similar workloads and workflows.
Some of our customers found that attach and detach times for these block devices, when directly accessed from OpenShift workloads using the native kubernetes storage provisioners, are unacceptable because they led to poor startup times for pods (slow attach) and limited or no high availability on a failover, which usually triggers a sequence that includes a detach operation, an attach operation, and a subsequent mount operation.
Each of these operations usually triggers a variety of API calls specific for the public cloud provider. Any or all of these intermediate steps can fail, causing users to lose access to storage persistent volumes (PVs) for their compute pods for an extended period. Overlaying Red Hat’s CNS offering as a storage management fabric to aggregate, pool, and serve out PVs expediently without worrying about the status of individual cloud native block storage (a.k.a EBS or Azure Data Disk) can provide major relief, because it effectively isolates the lifecycle of cloud-native block storage devices from that of the application pods allocating and deallocating PVs dynamically as application teams work on OpenShift. This isolation effectively addresses this issue.
Block device limits per compute instance
The second issue some of our customers run into is the fact that there is a limit to the number of block devices that one can attach to the machine images or instances in various public cloud environments.
OpenShift supports a maximum of 250 containers per host. The maximum number of block devices that are supported to be attached to machine instances per account is far fewer (for example, max 40 EBS devices per EC2 instance). Even though it is unusual to have a 1:1 mapping between containers and storage devices, this low maximum can lead to a lot of unintended behavior, notwithstanding the fact that it leads to a higher total cost of ownership (need more hosts than necessary).
For example, in a failover scenario during the detach, attach, and mount sequence, the API call to attach might fail, because there are already a maximum number of devices attached to the EC2 instance where this attempt is being made, which can cause a glitch/outage. Overlaying Red Hat’s CNS offering as a storage management fabric on cloud-based block devices mitigates the impact of hitting the maximum number of devices that can be attached to a machine image or instance, because storage is served out from a pool that is unencumbered by individual max device per instance/host limit. Storage can continue to be served out until the entire pool is exhausted which, at that time, can be expanded by adding new hosts and devices.
Cross-AZ storage availability
The third issue arises from the fact that cloud block storage devices are usually accessible within a specific Availability Zone (AZ) in AWS or Availability Sets in Azure. AZs are like failure domains in public clouds.
Most customers who deploy OpenShift in the public cloud do so to span more than one AZ for high availability. This is done so that when one AZ dies or goes offline, the OpenShift cluster remains operational. Using block devices constrained to an AZ for providing storage services to OpenShift workloads can defeat the purpose, because then containers must be scheduled within hosts that belong to the same AZ, and customers can not leverage the full power of Kubernetes orchestration. This configuration could also lead to an outage when an AZ goes offline.
Our customers use CNS to mitigate this problem so that even when there is an AZ failure, a three-way replicated cross-AZ storage service (CNS) is available for containerized applications to avoid downtimes. This also enables Kubernetes to schedule pods across AZs (instead of within an AZ), thereby preserving the spirit of the original fault-tolerant OpenShift deployment architecture that spans multiple AZs.
Cost-effective storage consolidation
Storage provided by CNS is efficiently allocated and offers performance with the first gigabyte provisioned, thereby enabling storage consolidation. For example, consider six MySQL database instances, each in need of 25 GiB of storage capacity and up to 1500 IOPS at peak load. With EBS in AWS, one would create six EBS volumes, each with at least 500 GiB capacity out of the gp2 (General Purpose SSD) EBS tier, in order to get 1500 IOPS. The level of performance is tied to provisioned capacity with EBS.
With CNS, one can achieve the same level using only 3 EBS volumes at 500 GiB capacity from the gp2 tier and run these with GlusterFS. One would create six 25 GiB volumes and provide storage to many databases with high IOPS performance, provided they don’t peak all at the same time. Doing that, one would halve EBS cost and still have capacity to spare for other services. Read IOPS performance is likely even higher, because in CNS with three-way replication as data is read from distributed across 3×1500 IOPS gp2 EBS volumes.
Check us out for more
As you can see, there’s a good case to be made for using CNS in various public clouds for a multitude of technical reasons our customers care about, besides the fact that Red Hat CNS provides a consistent storage consumption and management experience across hybrid and multi clouds (see the following figure).
In addition to the application portability that OpenShift already provides across hybrid and multi clouds, we’re working on multi cloud replication features that would enable CNS to effectively become the data fabric that enables data portability—another good reason to select and stay with CNS. Stay tuned for more information on that!
For hands-on experience now combining OpenShift and CNS, check out our test drive, a free, in-browser lab experience that walks you through using both.