Why traditional storage doesn’t cut it in the new world of containers

By Steve Bohac, Red Hat Storage

Persistent storage for containers is a hot topic these days. While containers do a great job of storing application logic, they do not offer a built-in solution for storing application data across the lifecycle of containers. Ephemeral (or local) storage is not enough: stateful applications require that the container data be available beyond the life of the containers. They also require that the underlying storage layer provide all the enterprise features available to applications that are deployed in, say, virtualized environments.

Another important consideration is that because many view containers as the next step in the evolution of server virtualization, it’s critical to provide persistent storage options to administrators because hypervisors have always allowed for persistent storage in one form or the other.

One approach is to use traditional storage appliances that support legacy applications. This is a natural inclination and assumption, but… the wrong one.

Traditional storage appliances are based on decades-old architectures at this point and were not made for a container-based application world. These approaches also fail to offer the portability you need for your apps in today’s hybrid cloud world. Some of these traditional storage vendors offer additional software for your containers, which can be used as a go-between for these storage appliances and your container orchestration, but this approach still falls short as it is undermined by those same storage appliance limitations. This approach would also mean that storage for the container is provisioned separately from your container orchestration layer.

There’s a better way! Storage containers containing storage software co-reside with compute containers and serve storage to the compute containers from hosts that have local or direct-attached storage. Storage containers are deployed and provisioned using the same orchestration layer you’ve adopted in house (like Red Hat OpenShift Container Platform, which is Kubernetes based), just like compute containers. In this deployment scenario, storage services are provided by containerized storage software (like Red Hat Container-Native Storage based on Red Hat Gluster Storage) to pool and expose storage from local hosts or direct-attached storage to containerized applications.

Red Hat Container-Native Storage for Red Hat OpenShift Container Platform is built with Red Hat Gluster Storage and is flexible, cost-effective, and developer-friendly storage for containers. It helps organizations standardize storage across multiple environments and easily integrates with Red Hat OpenShift to deliver a persistent storage layer for containerized applications that require long-term storage. Enterprises can benefit from a simple, integrated solution including the container platform, registry, application development environment, and storage–all in one, supported by a single vendor.

To hear from one customer who implemented a Red Hat Container-Native Storage solution, please check out our Brinker International case study. Also, take our solution for a free test drive and see for yourself.

If you, like us, are attending KubeCon and CloudNativeCon in Austin, Texas, this week, we’d love to take a minute to meet with you and talk about Red Hat Container-Native Storage. Stop by the Red Hat booth (D1, near the Hub Lounge) or attend one of our sessions devoted to container storage to learn more about running Red Hat Container-Native Storage for your container-based application platform. Also, our own Steve Watt from the Red Hat Office of the CTO will be speaking from the show on theCube tomorrow, December 7. If you’re not able to make it to Austin, please find us at a roadshow event coming to a city near you.

Red Hat Ceph Storage 3: Featuring CephFS and iSCSI support and containerized storage daemons

By Douglas Fuller, Red Hat Ceph Storage Engineering

If you missed last week’s huge announcement about Red Hat Ceph Storage 3, you can find details here. To quickly get you up to speed, though, the big news in this release is around enabling a large variety of storage needs in OpenStack, easing migration from legacy storage platforms, and deploying enterprise storage in Linux containers.

CephFS is here!

One of the highlights of the Red Hat Ceph Storage 3 announcement was production support for CephFS. This delivers a POSIX-compliant shared file system layered on top of massively scalable object storage. Client support is available in the Red Hat Enterprise Linux 7.4 kernel and via FUSE. CephFS leverages Ceph’s RADOS object store for data scalability as well as a natively clustered metadata server (MDS) for metadata scalability, high availability, and performance.

One cluster to do it all

Red Hat Ceph Storage uses the CRUSH structured data distribution scheme, enabling users to deploy a highly scalable and reliable file system using industry-standard, commodity hardware. Expensive, custom-engineered RAID controllers are no longer necessary. Expanding a CephFS deployment is as easy as expanding a Ceph cluster—CRUSH smoothly manages cluster changes, including expansions with new or different hardware.

Have a hybrid storage cluster with SSDs, HDDs, and NVMe devices? CRUSH can divide your storage workload across any and all devices for maximum performance where you need it and maximum capacity at commodity cost where you don’t. This allows disparate workloads—such as scratch, home, or archive data—to coexist in the same cluster using different or overlapping hardware as needed.

In addition, CephFS’s MDS may be dynamically provisioned and resized online to maximize performance and scalability. For metadata-intensive workloads, the Ceph MDS cluster can repartition its workload, either statically or dynamically, online in response to demand. It’s also fault-tolerant by design, with no need for passive standby or expensive and complex “Shoot the Other Node in the Head” (STONITH) configurations to maintain constant availability.

Take the “cluster” out of cluster management

Red Hat Ceph Storage 3 deploys with Red Hat Ansible Automation, integrating smoothly into existing cluster management environments. Now you can deploy and manage compute and storage both using Ansible playbooks.

New in Red Hat Ceph Storage 3 is a REST API for cluster data and management tools. Monitoring tools are available out of the box to provide detailed health and performance data across your Ceph cluster.

A million uses and counting

Red Hat Ceph Storage offers great flexibility to customers. It can be deployed across a wide variety of storage applications, allowing enterprises to manage one unified system supporting block, file, and object interfaces. With the added flexibility of iSCSI support, users from heterogeneous environments—such as VMware and Windows—can leverage the power of the storage platform.

This flexibility is extremely attractive to organizations such as academic research institutions, many of which are participating in the SuperComputing17 conference in Denver this week. Their IT departments have the onerous task of supporting complicated workflows and yet have to work with shoestring budgets in many cases.

To learn more, check out this additional blog post, and join us at the Red Hat SC17 booth (1763) for presentations, swag, and more.

Gluster linear scaling: How to choose wisely

We talk a lot about the linear scalability of Red Hat Gluster Storage, and we can generally back that up with empirical data. Indeed, homogeneously scaling out the storage nodes and network infrastructure can result in both capacity and throughput capabilities that are directly proportional. But it’s important to note that this is potential scalability, and how you use the volumes plays a vital role in the experience you have.

We architect optimal solution recommendations based on a few expectations:

  1. Most of the workload falls into a particular category—high throughput, small file, or latency sensitive, for example.
  2. When your capacity needs grow, so do your concurrent client demands.
  3. You’re using the glusterfs native client.

Let’s take a look at these points and how they affect your real scalability.

Architecting for workload

We know through thousands of test cycle results that there is a generally optimal server configuration that will apply broadly to a majority of workloads. This compiled knowledge is a huge benefit to you, the user, and it can greatly reduce your own time commitment in designing and testing fundamental system architectures. However, just up the stack from the server and network components are low-level configuration choices that you will make for every deployment. These choices are the big knobs: for your particular workload, there is likely one best choice for peak performance. And it’s important to note that these aren’t choices you can easily change later. Changes at these layers likely require moving data, potentially more than once, and data has inertia.

When you understand your majority workload, and preferably you isolate dissimilar workloads entirely, you will be positioned to make choices about server density (12, 24, or higher drive capacity), block-level configurations (e.g., HDD vs. SSD, RAID vs. JBOD, caching vs. not, block and stripe sizes), and Gluster volume geometry (e.g., replicated vs. dispersed, failure resiliency, arbiter bricks, tiering). Locked into these choices and the related workload, you’ll find it reasonably simple to integrate new nodes and bricks into the volume for predictable capacity and performance expansion.

Client concurrency

So you’ve built to your workload and everything is great. That is, unless your expectations aren’t aligned with a scale-out solution. Any single connection to the storage pool is bound by physics. One client communicates over one network link to one server to one file system and block stack. Sure, some design options allow for single-client concurrency to multiple stacks, but those come with a trade-off, and each connection is still bound by physics and bottlenecked somewhere along the line. So if your need is to provide expanded throughput capabilities to a single client or a small number of clients, you will likely find that horizontal scale-out won’t give you much performance benefit. There are some tricks we can use to architect for such a need, but it will never be an efficient solution.

To that end, an optimal design assumes you are operating at an appropriate client:server concurrency ratio. The best ratio will vary with your workload and the architecture decisions you make per the preceding discussion, but you can expect for most cases a ratio range of 12:1 to 48:1 to be appropriate for peak or plateau storage throughput capabilities. So if you build out a 12-node storage pool based on your capacity needs and then expect 4 client systems to use that storage concurrently, you’ll bottleneck on the server node I/O stack long before you saturate the aggregate system capabilities. But with an appropriate concurrent client count of say 150+ for your 12 server nodes, you may be operating at the peak capabilities of the system.
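As a quick sanity check on the numbers above, here is a minimal sketch that turns the 12:1 to 48:1 range into client counts for a given pool size. The function names and the wording of the verdicts are hypothetical helpers for illustration, not part of any Gluster tooling, and the right ratio for your workload may differ.

```python
# Illustrative only: the 12:1 to 48:1 range comes from the text above.
# The function names are hypothetical, not a Gluster utility.
LOW_RATIO = 12   # clients per server at the low end of the suggested range
HIGH_RATIO = 48  # clients per server at the high end of the suggested range


def suggested_client_range(server_nodes):
    """Return (min_clients, max_clients) for the suggested ratio range."""
    return server_nodes * LOW_RATIO, server_nodes * HIGH_RATIO


def assess(clients, server_nodes):
    """Classify a deployment relative to the suggested client:server range."""
    low, high = suggested_client_range(server_nodes)
    if clients < low:
        return "below range: each client bottlenecks on a single server I/O stack"
    if clients > high:
        return "above range: more clients than the suggested ratio covers"
    return "within range: aggregate throughput can approach its peak"


if __name__ == "__main__":
    print(suggested_client_range(12))
    print(assess(4, 12))
    print(assess(150, 12))
```

For a 12-node pool, the range works out to between 144 and 576 concurrent clients, which is why 4 clients fall far short and 150 is a reasonable starting point.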

Client choice

Great! So you’re heeding all the advice here, and you’re going to deploy 12 Red Hat Gluster Storage nodes in an optimal architecture for 150 NFS clients. Well, hold on there a minute, buckaroo. We’re more than happy to support the NFS client, but you should know what you’re getting into.

When using the Gluster native client, data placement calculations are made on the client side. This means that each client is fully aware of the volume geometry and all server nodes participating, allowing it to determine how the data protection scheme is applied and which nodes and backend filesystems (bricks) each file will be written to. All client-to-server connections are then made efficiently based on this client-side intelligence. And because data placement among the distributed system is done pseudo-randomly, there is a statistically even distribution of work between the clients and servers and therefore predictable performance scalability.
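To make the idea of client-side placement concrete, here is a toy sketch in which every client computes a file’s target brick from the file name alone, with no central lookup. It assumes a simple MD5-modulo scheme and made-up brick names; this is a stand-in for illustration and not the hashing algorithm Gluster actually uses.

```python
import hashlib
from collections import Counter


def pick_brick(file_name, bricks):
    """Map a file name to a brick using a hash of the name.

    Assumption: MD5 modulo the brick count is a simplified stand-in,
    not Gluster's real placement algorithm.
    """
    digest = hashlib.md5(file_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(bricks)
    return bricks[index]


# Hypothetical brick names for a 12-node pool.
bricks = [f"server{n:02d}:/bricks/brick1" for n in range(1, 13)]

# Every client runs the same calculation, so they all agree on placement.
placement = Counter(pick_brick(f"file-{i}.dat", bricks) for i in range(12000))

if __name__ == "__main__":
    for brick in bricks:
        print(brick, placement[brick])
```

Because the hash spreads names pseudo-randomly, each brick ends up with roughly the same share of files, which is the property that makes performance scale predictably as clients and servers are added.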

When choosing NFS (or SMB), a client will make its connections to a single Gluster server. That server then has to apply the client-side intelligence for data resilience, conversion, and placement, and it will then make secondary network calls out to each participating server node for the file transaction. This inefficiency leads to a concurrency bottleneck far below the capabilities of the native client. You’ll still hit peak throughput at about the same client:server ratio, but that throughput will be well below what can be achieved on the same systems with the native client.

The one surprise that can come up with the NFS client is that if you do indeed require a lower client:server ratio, NFS can in some conditions outperform the native client at that concurrency level. YMMV on this, and you’ll still be far below the peak capabilities of the system, but it’s worth testing out if you’re absolutely determined to connect your 4 clients to your 12 Gluster nodes (but don’t say I didn’t warn you not to).

Oh yeah? Prove it.

Lucky for you, I did that already. Take a look at our published reference architectures and, in particular, our most recent Gluster Performance and Sizing Guide. And keep an eye out here for future publications as we continue to expand and refine our data.

Gartner pegs Red Hat as storage visionary. Two years in a row.

Red Hat Storage strengthens position relative to key competitors

It’s finally here! The Gartner 2017 Magic Quadrant for Distributed File Systems and Object Storage.

We’re extremely excited to announce that Gartner has once again positioned Red Hat in the “Visionaries” quadrant. More important, Red Hat is the furthest to the right and top in the visionary quadrant since last year’s Magic Quadrant. Gartner’s Magic Quadrant judges vendors on completeness of vision and ability to execute.

We are humbled and excited with this new development. It corroborates key investments and strategic product decisions taken over the years by our leadership to deliver tangible and substantial value to customers. The relative movement of Red Hat vis-à-vis established storage vendors tells a compelling story about the rapidly changing enterprise storage landscape.

Some of the key highlights from the past year include:

  • Continued leadership with container-native storage to enable a unified storage platform for cloud-native applications and container infrastructure.
  • Customer traction across geographies and industry verticals that brings to bear Red Hat’s vision of storage for the open hybrid cloud.
  • Strong leadership in upstream open source communities for private cloud infrastructure and Infrastructure as a Service (IaaS), as well as enterprise adoption of highly elastic object storage.
  • Breakthrough innovation around open source hyperconverged infrastructure for remote-office/branch-office and IoT use cases.

You can download a complimentary copy of the Gartner 2017 Magic Quadrant for Distributed File Systems and Object Storage here.

Container-native storage for the OpenShift masses

By Daniel Messer, Red Hat Storage

 

Red Hat Container-Native Storage 3.6, released today, reaches a new level of storage capabilities on the OpenShift Container Platform. Container-native storage can now be used for all the key infrastructure pieces of OpenShift: the registry, logging, and metrics services. The latter two services come courtesy of the new block storage implementation. Object storage is now also available directly to developers in the form of the well-known S3 API. Administrators will enjoy a more robust cns-deploy utility, support for online volume expansion, and more choice in deployment topologies in the OpenShift Advanced Installer. Last, but just as important, it now supports more concurrent workloads serving over 1,000 persistent volumes with just 3 nodes.

________________________________________

You know you must be doing something right when some of your users are looking to use your technology in different ways than expected. Initially, the idea of running GlusterFS alongside Kubernetes and OpenShift promised the ability to use a distributed storage system with a framework for distributed applications. They go nicely together because both approaches are entirely based on scale-out software, hence independent of the underlying platform, and they are driven by a declarative API-driven design. On the GlusterFS side, that API is available in the form of an additional software daemon, called heketi. Things soon took a new direction when the first experiments of running the GlusterFS/heketi combination as an OpenShift workload were conducted.

A lot of engineering cycles later, the idea of hacking GlusterFS onto OpenShift has evolved into a fully supported product offering: container-native storage. Today, we are happy to announce container-native storage 3.6.

For the impatient: In essence, we have taken container-native storage from being an optional supplement in OpenShift to being a storage solution that now serves file, block, and object storage to applications on top of OpenShift and to the entire OpenShift internal infrastructure, as well.

For the curious reader, let’s go see how we did that….

Increase density

The first thing we had to do was ensure that container-native storage was a robust, scalable, long-term solution for the different possible OpenShift cluster sizes. When we launched container-native storage with OpenShift 3.2 last summer, the container images were based on Red Hat Gluster Storage 3.1.3 and, on average, each brick process on a GlusterFS host/pod consumed about 300 MB of RAM.

That may not sound like much, but you have to be aware that every PersistentVolume served by container-native storage results in a GlusterFS volume being created. Bricks are local directories on GlusterFS pods that make up volumes. The consistency of volumes across all its bricks (by default, 3 in container-native storage) is handled by the glusterfsd process, which is what consumes the memory.

In older releases of Red Hat Gluster Storage, there was one such process per brick on each host. It’s easy to see that with potentially hundreds of application pods in OpenShift requiring their own PersistentVolumes, the resulting number of brick processes in each GlusterFS pod would easily consume gigabytes of RAM and require significant coordination effort in each pod.

That many processes in a pod are an anti-pattern for Kubernetes and, even if we had broken those out into separate containers, the memory overhead would still be huge.

Fortunately, Red Hat Gluster Storage 3.3 came to the rescue. Released just a little over 2 weeks ago, it introduced a new feature called brick-multiplexing. It’s easier to depict how this feature changes the structure of a GlusterFS pod in a diagram than a lengthy explanation:

With brick-multiplexing, only one glusterfsd process is governing the bricks such that the amount of memory consumption of GlusterFS pods is drastically reduced and the scalability is significantly improved.

By introducing brick-multiplexing in version 3.6, we are able to support over 1,000 PersistentVolumes in a single container-native storage cluster. The amount of memory consumed increases linearly, so that 32GB of RAM are only needed at the high end of that. The rule of thumb is roughly 30-35 MB RAM per volume on each of the participating GlusterFS pods.
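The arithmetic behind these figures is easy to check. The sketch below compares per-pod memory before and after brick-multiplexing, using only the numbers quoted in this post (about 300 MB per brick process previously, roughly 30-35 MB per volume now). It assumes each GlusterFS pod hosts one brick of every volume, as in a 3-way replicated setup on 3 pods, and it treats 1 GB as 1,000 MB for readability. The function names are hypothetical.

```python
# Figures quoted in the text; everything else here is an assumption.
MB_PER_BRICK_PROCESS = 300          # one glusterfsd per brick, older releases
MB_PER_VOLUME_MULTIPLEXED = (30, 35)  # rule of thumb with brick-multiplexing


def old_memory_mb(volumes):
    """Per-pod memory when every brick gets its own glusterfsd process.

    Assumes one brick of each volume lives on the pod.
    """
    return volumes * MB_PER_BRICK_PROCESS


def multiplexed_memory_mb(volumes):
    """Per-pod (low, high) memory estimate with brick-multiplexing."""
    low, high = MB_PER_VOLUME_MULTIPLEXED
    return volumes * low, volumes * high


if __name__ == "__main__":
    for volumes in (100, 500, 1000):
        before = old_memory_mb(volumes) / 1000
        low, high = (mb / 1000 for mb in multiplexed_memory_mb(volumes))
        print(f"{volumes:>5} volumes: {before:.1f} GB before, "
              f"{low:.1f}-{high:.1f} GB with brick-multiplexing")
```

At 1,000 volumes the estimate lands at 30 to 35 GB per pod, in line with the 32GB figure above, versus roughly 300 GB under the old one-process-per-brick model.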

Container-native storage can probably support an even greater number of volumes, and we hope to confirm that soon. Until then, you always have the option to either run more GlusterFS pods in your OpenShift cluster or deploy a second container-native storage cluster, governed by the same Heketi API service.

Optimized storage for logging/metrics

File storage is what containers on OpenShift (and in general) deal with today. It’s a ubiquitous, well-understood concept. There are also proposals for native access to block devices in pods, but they are still in design or planning phases.

That is—at least for now—storage (including block) in Kubernetes and OpenShift always ends up being a mounted file system on the host running the pod, which is then bind-mounted to the target container’s file system namespace. Block storage provisioners in OpenShift eventually format the device with XFS too, before handing it over to the container.

GlusterFS is a distributed, networked file system which, in contrast to local filesystems like XFS, allows shared access from multiple hosts and stores the data in the backend distributed across multiple nodes. This big advantage does not come without cost, however: Some types of operations that are fast and cheap on a local file system are quite expensive in a distributed file system.

For some workloads (e.g., OpenShift Logging and Metrics), this can be a show-stopper. To properly support those, we designed something that might seem counter-intuitive at first: gluster-block. Take a look at the implementation scheme below:

Yes, you see that right: We are using TCM (the Linux kernel’s iSCSI stack, also called LIO) managed by targetcli to create iSCSI LUNs from files on a GlusterFS volume and present those as block devices to pods. The TCM stack allows local storage of a Linux system to be made available on the network via the iSCSI protocol. In our specific case, the local storage is a large raw file on a GlusterFS volume. On the client side, the iSCSI block device will be formatted with XFS and then bind-mounted to the target container’s file system namespace.

But why go through all the trouble? In distributed file systems, and here GlusterFS is no exception, metadata-intensive operations like file create, file open, or extended attribute updates are particularly expensive and slow compared to a local file system. In particular, indexing solutions like ElasticSearch (part of OpenShift Logging) and scale-out NoSQL databases like Cassandra (part of OpenShift Metrics) generate such workloads. Other database software might also make heavy use of locking and byte-range locking, which are costly compared to simple reads and writes.

In order to qualify OpenShift Metrics and Logging Services to run well on a container-native storage backend, a significant speed up was needed for a lot of special file system operations like these.

You can probably guess what we were thinking: In software, many problems can be solved by adding an additional layer of indirection.

The indirection in accessing data on GlusterFS via iSCSI instead of a normal GlusterFS mount converts otherwise expensive file system operations to a single stream of continuous reads and writes to a single raw file on GlusterFS. The TCM stack delivers this IO stream over the network via iSCSI. On the receiving end, the file in GlusterFS backing the iSCSI LUN is accessed via libgfapi, a userspace library to access files in GlusterFS without the need to mount a volume.

The clients, in our case containers in pods on OpenShift, still write to the XFS file system that the iSCSI LUN is formatted with. As a result, simple client-level read and write requests remain virtually as fast as accessing the file directly on GlusterFS, and all the other file system operations are converted into much faster reads and writes to the file backing the block volume because they are not distributed. From the perspective of GlusterFS, it’s a constant stream of basic read and write requests, which GlusterFS is efficient at. Of course, this comes with a trade-off: gluster-block is not shared storage.

Container-Native Storage version 3.6 now provides backend storage for OpenShift Logging and OpenShift Metrics with gluster-block. For the moment, the use of gluster-block in production is only supported for OpenShift Logging and Metrics services, but use of gluster-block beyond that is under qualification, and support is expected to be extended soon.

The Logging and Metrics services have strict performance and latency requirements and are important for any OpenShift cluster in production. They provide vital information and debugging capabilities for administrators. By design, they are scale-out services, because their storage backend (ElasticSearch for Logging, Cassandra for Metrics) supports a shared-nothing approach. However, in production you do not want additional shards of ElasticSearch and Cassandra running side-by-side with your application pods. That’s why there is a concept of infrastructure nodes in OpenShift that do not run business applications but are dedicated to OpenShift infrastructure components like these. Typically, these kinds of servers only have storage locally available, which is limited in capacity and performance. Thus, it might quickly become insufficient to store the logs and metrics of hundreds of pods. With container-native storage, you now have a scalable, robust, and long-term storage solution for logging and metrics that utilizes the entire cluster’s storage capacity.

Support a scale-out registry

There is one additional component in OpenShift that’s crucial for operations: the container image registry. This is where all the resulting images from source-to-image builds will be pushed to and where developers can upload their custom images. If it’s unavailable, those operations will fail, and users will be unable to launch new or update existing applications.

The default configuration for the OpenShift registry is to use `emptyDir` storage, that is, a local file system on the container host that depends on the registry pod’s lifetime. In this setup, the registry, of course, cannot be scaled out, updated, or restarted on another host.

Fortunately, as of version 3.5, container-native storage allows for a scale-out registry using shared storage on a PersistentVolume served by GlusterFS. This has several advantages:

  1. No external storage is required, like NFS, which can cause problems with metadata consistency with a busy registry.
  2. There is no dependency on provider storage (e.g., AWS S3 being unavailable in a VMware environment) for shared data access.
  3. The registry can now be scaled out, ideally across all infra nodes.
  4. The registry storage backend can grow dynamically with the platform.

The beauty of this is that it can be installed like this right away. As we covered during the announcement of OpenShift Container Platform 3.6 earlier this year, the OpenShift Advanced Installer now supports deploying container-native storage and the registry on container-native storage out of the box. See this video here for details.

All you have to do since OpenShift Container Platform 3.6 is add a few lines to your Ansible inventory file.

To deploy an OpenShift registry backed by container-native storage, first add the following variable definition in the [OSEv3:vars] section:

openshift_hosted_registry_storage_kind=glusterfs

And then add a new host group defining the container-native storage nodes to the inventory, for example:

[glusterfs_registry]
infra-1.lab glusterfs_devices='[ "/dev/sdd" ]'
infra-2.lab glusterfs_devices='[ "/dev/sdd" ]'
infra-3.lab glusterfs_devices='[ "/dev/sdd" ]'

This is enough to tell the OpenShift Advanced Installer that it should create a basic 3-node container-native storage cluster, in this case on the infrastructure nodes, using the supplied devices to create bricks. From this cluster a PersistentVolume will be created and supplied to the registry DeploymentConfig.

That way the registry will be launched with shared storage, provided by container-native storage, and scaled to 3 instances across the infrastructure nodes. You get a highly available and robust registry out of the box with no additional configuration needed.
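If you maintain inventories for several clusters, the host group shown earlier is simple enough to generate. The following is a minimal sketch that emits the same [glusterfs_registry] lines from a list of hosts and devices. The function name is hypothetical, and the check for at least 3 hosts is an assumption based on the 3-node example here rather than a rule enforced by the installer.

```python
def registry_host_group(hosts, devices):
    """Render a [glusterfs_registry] inventory group as text.

    Assumption: at least 3 hosts are required, mirroring the basic
    3-node cluster described in the text.
    """
    if len(hosts) < 3:
        raise ValueError("expected at least 3 hosts for the storage cluster")
    # Same quoting style as the hand-written example: '[ "/dev/sdd" ]'
    device_list = "[ " + ", ".join(f'"{device}"' for device in devices) + " ]"
    lines = ["[glusterfs_registry]"]
    for host in hosts:
        lines.append(f"{host} glusterfs_devices='{device_list}'")
    return "\n".join(lines)


if __name__ == "__main__":
    print(registry_host_group(
        ["infra-1.lab", "infra-2.lab", "infra-3.lab"],
        ["/dev/sdd"],
    ))
```

Run as is, this prints the same four lines as the inventory snippet above, ready to paste alongside the openshift_hosted_registry_storage_kind variable.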

S3 object storage for applications

In addition to providing block and file storage services, Container-Native Storage 3.6 now provides an S3 object storage interface as a TechPreview. Application developers have a ready-to-use REST API at hand to provide object storage to workloads on OpenShift, just an HTTP PUT or GET request away.

Object storage in Red Hat Container-Native Storage 3.6 provides a simple yet scalable storage layer for distributed applications that were previously tied to specific cloud provider S3 object storage. These applications now run with little or no modification on OpenShift.

In this implementation, a gluster-s3 service is deployed as a pod in your OpenShift cluster, and an OpenShift Route is generated for it. The Route’s URL is provided to applications as their S3 endpoint. The service receives the S3 requests and translates those to file system operations on GlusterFS volumes. The S3 buckets and objects are stored as directories and files on that volume, respectively.
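The bucket-to-directory and object-to-file mapping can be pictured with a small toy model. The sketch below is not the gluster-s3 code and does not speak the S3 protocol; the class and method names are hypothetical, and a temporary directory stands in for a mounted GlusterFS volume.

```python
import tempfile
from pathlib import Path


class DirectoryBackedBuckets:
    """Toy model: buckets are directories, objects are files under a root."""

    def __init__(self, root):
        self.root = Path(root)

    def put_object(self, bucket, key, data):
        """Store bytes at <root>/<bucket>/<key>, like a PUT on an object."""
        bucket_dir = self.root / bucket
        bucket_dir.mkdir(parents=True, exist_ok=True)
        path = bucket_dir / key
        path.write_bytes(data)
        return path

    def get_object(self, bucket, key):
        """Read the bytes back, like a GET on an object."""
        return (self.root / bucket / key).read_bytes()


# A temporary directory stands in for the GlusterFS volume mount.
with tempfile.TemporaryDirectory() as volume_mount:
    store = DirectoryBackedBuckets(volume_mount)
    stored_at = store.put_object("photos", "cat.jpg", b"example bytes")
    round_trip = store.get_object("photos", "cat.jpg")
    print(stored_at.relative_to(volume_mount), len(round_trip))
```

The point of the design is that the objects are still ordinary files on a replicated GlusterFS volume, so the S3 interface inherits the volume’s resilience and capacity.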

For now, this service can be deployed with the cns-deploy utility. There are some new command switches available for this purpose:

cns-deploy topology.json --namespace gluster-storage --log-file=cns-deploy.log --object-account dmesser --object-user dmesser --object-password redhat

The new parameters allow you to specify a name for the S3 account (object-account, an aggregate of multiple S3 buckets, one per CNS cluster), a named user (object-user), and the authentication password for that user in that account (object-password). Once all 3 of these switches are supplied, cns-deploy will create the glusterfs-s3 infrastructure.
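If you wrap cns-deploy in your own automation, it can help to assemble the command line in one place and catch a half-specified set of object switches early. The sketch below uses only the switches shown in the command above. The helper name is hypothetical, and the all-or-none check is an assumption drawn from the description here, not documented cns-deploy behavior.

```python
import shlex


def cns_deploy_command(topology, namespace, log_file,
                       object_account=None, object_user=None,
                       object_password=None):
    """Build the cns-deploy argument list used in this post.

    Assumption: the three --object-* switches are supplied together or
    not at all; cns-deploy itself may behave differently.
    """
    argv = ["cns-deploy", topology,
            "--namespace", namespace,
            f"--log-file={log_file}"]
    object_options = {
        "--object-account": object_account,
        "--object-user": object_user,
        "--object-password": object_password,
    }
    supplied = [value for value in object_options.values() if value is not None]
    if supplied and len(supplied) != len(object_options):
        raise ValueError("supply all three --object-* switches or none")
    for flag, value in object_options.items():
        if value is not None:
            argv += [flag, value]
    return argv


if __name__ == "__main__":
    print(shlex.join(cns_deploy_command(
        "topology.json", "gluster-storage", "cns-deploy.log",
        object_account="dmesser", object_user="dmesser",
        object_password="redhat",
    )))
```

Passing the resulting list to subprocess.run avoids shell quoting issues; the printed form is only for review.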

Support for doing this with the OpenShift Advanced Installer is expected to follow soon. The design foresees exactly one S3 domain/account per CNS cluster, although multiple CNS clusters can be deployed easily.

Improvements for deployment and operations

Besides a whole bunch of new features, we’ve also introduced improvements in usability to make the container-native storage experience better.

In Container-Native Storage 3.6, the cns-deploy tool has been improved in a number of ways. It is now more idempotent, allowing the administrator to run the installer multiple times without having to start from scratch. There will still be error scenarios that may require manual intervention, but it should be much easier to recover from such errors. It will also deploy the required resources to use gluster-block and gluster-s3. Combined with the idempotency improvements, administrators will be able to run cns-deploy to deploy those features into an environment that’s already running container-native storage.

Container-Native Storage 3.6 also provides improved integration with container-ready storage. All of our new features will work just as well on container-ready storage as on container-native storage. In addition, we have introduced support for a configuration we’re calling Container-Ready Storage without Heketi. Heketi is the volume management API service for GlusterFS. In this configuration, container-ready storage runs with the usual Red Hat Gluster Storage nodes outside the OpenShift cluster, but heketi resides as a pod within OpenShift. This has the advantage of making the heketi service highly available rather than residing on a single machine. For new deployments, cns-deploy can be used to initialize a container-ready storage cluster in this configuration.

Another common scenario that is likely to occur over time, even with the short-lived nature of some workloads, is PersistentVolumes filling to capacity. This can happen when a user under-estimates the required capacity for a workload or the pod simply runs way longer than expected. In any case, heketi now allows for online volume expansion.

To take advantage of this, simply use the heketi-client on the CLI to expand the size of any given volume:

heketi-cli volume expand --volume=0e8a8adc936cd40c2df3698b2f06bba9 --expand-size=2

In the background, heketi changes the GlusterFS volume layout from 3-way replicated to distributed-replicated. See below for a comparison from the GlusterFS perspective.

Before volume expansion:

sh-4.2# gluster vol info vol_0e8a8adc936cd40c2df3698b2f06bba9

Volume Name: vol_0e8a8adc936cd40c2df3698b2f06bba9
Type: Replicate
Volume ID: 841bd097-659b-4b5d-b3ec-56bb8cc51c2f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.20.5.232:/var/lib/heketi/mounts/vg_c05319c8a95eaa083adbedb7d43913fa/brick_4bf9ae183dacceccf4bf525186850bdd/brick
Brick2: 10.20.6.239:/var/lib/heketi/mounts/vg_bd7fbf9053d6340771f7b75ce2872339/brick_e1175aaaa8596aedc18bf8c56b42fe8d/brick
Brick3: 10.20.4.184:/var/lib/heketi/mounts/vg_0797a1d458309eec3e5e818a9b87f6c6/brick_2b5255cc2c0297e4e34eb6f1b4319fb9/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on

After volume expansion:

sh-4.2# gluster vol info vol_0e8a8adc936cd40c2df3698b2f06bba9

Volume Name: vol_0e8a8adc936cd40c2df3698b2f06bba9 
Type: Distributed-Replicate 
Volume ID: 841bd097-659b-4b5d-b3ec-56bb8cc51c2f 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 2 x 3 = 6 
Transport-type: tcp 
Bricks: 
Brick1: 10.20.5.232:/var/lib/heketi/mounts/vg_c05319c8a95eaa083adbedb7d43913fa/brick_4bf9ae183dacceccf4bf525186850bdd/brick 
Brick2: 10.20.6.239:/var/lib/heketi/mounts/vg_bd7fbf9053d6340771f7b75ce2872339/brick_e1175aaaa8596aedc18bf8c56b42fe8d/brick 
Brick3: 10.20.4.184:/var/lib/heketi/mounts/vg_0797a1d458309eec3e5e818a9b87f6c6/brick_2b5255cc2c0297e4e34eb6f1b4319fb9/brick 
Brick4: 10.20.6.239:/var/lib/heketi/mounts/vg_bd7fbf9053d6340771f7b75ce2872339/brick_c48d4ea4b43635f62c464ddf0259d733/brick 
Brick5: 10.20.4.184:/var/lib/heketi/mounts/vg_0797a1d458309eec3e5e818a9b87f6c6/brick_121fbc266c905311d8a8810f221fbdca/brick 
Brick6: 10.20.5.232:/var/lib/heketi/mounts/vg_c05319c8a95eaa083adbedb7d43913fa/brick_5f208c680444b4820f53c923aa079614/brick 
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on
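To compare the two outputs programmatically, a small sketch like the following could pull the volume type and brick layout out of the `gluster vol info` text. The function names are hypothetical, and the parser only handles the simple "N x R = total" form shown above; other volume types print that line differently.

```python
import re


def parse_volume_layout(vol_info: str) -> dict:
    """Extract volume type and brick layout from `gluster vol info` text.

    Assumption: the "Number of Bricks" line has the plain
    "<subvolumes> x <replicas> = <total>" form shown in this article.
    """
    vol_type = re.search(r"^Type:\s*(\S+)", vol_info, re.MULTILINE)
    bricks = re.search(
        r"^Number of Bricks:\s*(\d+)\s*x\s*(\d+)\s*=\s*(\d+)",
        vol_info,
        re.MULTILINE,
    )
    if not vol_type or not bricks:
        raise ValueError("unrecognized `gluster vol info` output")
    subvolumes, replicas, total = (int(n) for n in bricks.groups())
    if subvolumes * replicas != total:
        raise ValueError("brick counts are inconsistent")
    return {
        "type": vol_type.group(1),
        "subvolumes": subvolumes,
        "replicas": replicas,
        "bricks": total,
    }


def added_bricks(before: str, after: str) -> int:
    """Return how many bricks an expansion added, keeping the replica count."""
    old, new = parse_volume_layout(before), parse_volume_layout(after)
    if old["replicas"] != new["replicas"]:
        raise ValueError("replica count changed during expansion")
    return new["bricks"] - old["bricks"]


# Trimmed copies of the two outputs above.
BEFORE = "Type: Replicate\nNumber of Bricks: 1 x 3 = 3\n"
AFTER = "Type: Distributed-Replicate\nNumber of Bricks: 2 x 3 = 6\n"

if __name__ == "__main__":
    print(parse_volume_layout(BEFORE))
    print(parse_volume_layout(AFTER))
    print("bricks added:", added_bricks(BEFORE, AFTER))
```

For the volume in this example, the expansion adds one more replica set of three bricks, which is why the type changes to Distributed-Replicate.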

Finally, with Container-Native Storage 3.6, we have expanded the amount of technical documentation available. We provide more examples of things both new and pre-existing that you can do with container-native storage, as well as detailed upgrade procedures from a variety of configurations to make sure you can get the latest set of features.

Verdict

The storage play for containers is an exciting space at the moment. There are many options available for customers, and Red Hat container-native storage is unique in the way it runs natively on OpenShift and provides scalable shared file, block, and object storage to business applications and container platform infrastructure.

OpenShift Container Platform 3.6: Streamlined installation and configuration of Red Hat Gluster Storage for containers

By Erin Boyd, Jose Rivera, and Scott Creeley, Red Hat

Did you ever get a new, exciting toy only to have that excitement squashed by the phrase “Batteries not included”?

With the introduction of Red Hat OpenShift Container Platform 3.6, customers no longer have to wait or clear extra hurdles to get resilient, persistent storage with their new installations. Now they can more easily deploy Red Hat Gluster Storage ready for use by their containerized applications. This is PaaS with batteries included!

With the release of Red Hat OpenShift Container Platform 3.6, users will have the convenience of using a single tool to use Red Hat Gluster Storage as either container-native storage (CNS) or container-ready storage (CRS) alongside the rest of their OpenShift installations. As part of the OpenShift Advanced Installation, users can specify two new storage options: Red Hat Gluster Storage for (1) hosted registry storage or (2) general application storage. To facilitate evaluation of these, an OpenShift Container Platform evaluation subscription now includes Red Hat Gluster Storage evaluation binaries and subscriptions.

Following is a sample inventory file that would be used with an OpenShift Container Platform Advanced Installation that deploys two CNS clusters for both hosted registry storage and general application storage.

[OSEv3:children]
 masters
 nodes
 glusterfs_registry
 glusterfs

[OSEv3:vars]
 ansible_ssh_user=root
 openshift_master_default_subdomain=cloudapps.example.com
 openshift_deployment_type=openshift-enterprise
 openshift_hosted_registry_storage_kind=glusterfs
 openshift_disable_check=disk_availability,memory_availability

[nodes]
 master1 node=True storage=True master=True openshift_schedulable=False
 node1 node=True storage=True openshift_node_labels="{'region': 'infra'}" openshift_schedulable=True
 node2 node=True storage=True openshift_node_labels="{'region': 'infra'}" openshift_schedulable=True
 node3 node=True storage=True openshift_node_labels="{'region': 'infra'}" openshift_schedulable=True
 node4 node=True storage=True openshift_schedulable=True
 node5 node=True storage=True openshift_schedulable=True
 node6 node=True storage=True openshift_schedulable=True

[glusterfs_registry]
 node1 glusterfs_devices="[ '/dev/xvdc' ]"
 node2 glusterfs_devices="[ '/dev/xvdc' ]"
 node3 glusterfs_devices="[ '/dev/xvdc' ]"

[glusterfs]
 node4 glusterfs_devices="[ '/dev/xvdc' ]"
 node5 glusterfs_devices="[ '/dev/xvdc' ]"
 node6 glusterfs_devices="[ '/dev/xvdc' ]"

[masters]
 master1 node=True storage=True master=True openshift_schedulable=False

Let’s go over each section in detail.

The first section defines the host groups the installation will be using. We’ve defined two new groups: (1) glusterfs_registry and (2) glusterfs. The first specifies a cluster that will host a single volume for use exclusively by a hosted registry. The second specifies a cluster for general application storage and will, by default, come with a Storage Class to enable dynamic provisioning.

[OSEv3:children]
 masters
 nodes
 glusterfs_registry
 glusterfs

In the following section, we indicate that we want the hosted registry to use Red Hat Gluster Storage for its storage needs.

[OSEv3:vars]
 ansible_ssh_user=root
 openshift_master_default_subdomain=cloudapps.example.com
 openshift_deployment_type=openshift-enterprise
 openshift_hosted_registry_storage_kind=glusterfs
 openshift_disable_check=disk_availability,memory_availability

In the [nodes] section, we need to specify all nodes in the OpenShift Container Platform cluster. For our installation, we also need to specify which nodes will run pods for the hosted registry. This is done by specifying openshift_node_labels="{'region': 'infra'}" for each such node. It is recommended to have at least three nodes running your hosted registry.

[nodes]
 master1 node=True storage=True master=True openshift_schedulable=False
 node1 node=True storage=True openshift_node_labels="{'region': 'infra'}" openshift_schedulable=True
 node2 node=True storage=True openshift_node_labels="{'region': 'infra'}" openshift_schedulable=True
 node3 node=True storage=True openshift_node_labels="{'region': 'infra'}" openshift_schedulable=True
 node4 node=True storage=True openshift_schedulable=True
 node5 node=True storage=True openshift_schedulable=True
 node6 node=True storage=True openshift_schedulable=True

Now we get to our new sections where we specify the nodes that will be used for storage. CNS and CRS require that each cluster have a minimum of three nodes. Multiple clusters cannot share a given node. Because we are deploying two clusters, we need to specify six nodes total. It is also required that each node have at least one dedicated, bare storage device (no data or formatting of any kind) for exclusive use by Red Hat Gluster Storage.

Our first new section is [glusterfs_registry]. Here we specify the nodes of the Red Hat Gluster Storage cluster and the storage devices on those nodes that will be used for a hosted registry’s storage. It is not required that these nodes be the same as the ones running the hosted registry.

[glusterfs_registry]
 node1 glusterfs_devices="[ '/dev/xvdc' ]"
 node2 glusterfs_devices="[ '/dev/xvdc' ]"
 node3 glusterfs_devices="[ '/dev/xvdc' ]"

Our second new section, [glusterfs], is used for specifying the Red Hat Gluster Storage cluster and storage devices that will be used for general application storage. These storage devices must also be for exclusive use by Red Hat Gluster Storage. As mentioned, these nodes may not also be part of the cluster used by [glusterfs_registry]. In the case of CNS, it is not required that these nodes be dedicated exclusively to serving storage; CNS pods can coexist with other application pods.

[glusterfs]
 node4 glusterfs_devices="[ '/dev/xvdc' ]"
 node5 glusterfs_devices="[ '/dev/xvdc' ]"
 node6 glusterfs_devices="[ '/dev/xvdc' ]"
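The constraints described above (at least three nodes per cluster, no node shared between clusters, at least one device per node) can be checked before running the installer. The following is only a sketch with a deliberately simple inventory parser; `parse_inventory` and `check_storage_groups` are hypothetical helpers and are not part of the OpenShift installer.

```python
def parse_inventory(text):
    """Return {group: [(host, rest_of_line), ...]} for an INI-style inventory.

    Assumption: one host per line and no line continuations. This is not a
    full Ansible inventory parser.
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups.setdefault(current, [])
            continue
        if current is not None:
            host, _, rest = line.partition(" ")
            groups[current].append((host, rest.strip()))
    return groups


def check_storage_groups(groups, storage_groups=("glusterfs_registry", "glusterfs")):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    seen = {}  # host -> first storage group it appeared in
    for name in storage_groups:
        hosts = groups.get(name, [])
        if len(hosts) < 3:
            problems.append(f"[{name}] has {len(hosts)} node(s); at least 3 are required")
        for host, rest in hosts:
            if "glusterfs_devices=" not in rest:
                problems.append(f"{host} in [{name}] lists no glusterfs_devices")
            if host in seen:
                problems.append(f"{host} appears in both [{seen[host]}] and [{name}]")
            else:
                seen[host] = name
    return problems


SAMPLE = """
[glusterfs_registry]
node1 glusterfs_devices="[ '/dev/xvdc' ]"
node2 glusterfs_devices="[ '/dev/xvdc' ]"
node3 glusterfs_devices="[ '/dev/xvdc' ]"

[glusterfs]
node4 glusterfs_devices="[ '/dev/xvdc' ]"
node5 glusterfs_devices="[ '/dev/xvdc' ]"
node6 glusterfs_devices="[ '/dev/xvdc' ]"
"""

if __name__ == "__main__":
    for problem in check_storage_groups(parse_inventory(SAMPLE)) or ["inventory looks consistent"]:
        print(problem)
```

The check cannot tell whether a device is actually bare; that still has to be verified on each node.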

Once the installer is complete, the user can see the pre-defined Storage Class by executing:

# oc get storageclasses
 NAME TYPE
 glusterfs-storage kubernetes.io/glusterfs

This Storage Class can be used for applications by specifying a Persistent Volume Claim to dynamically provision the required storage volume:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
  namespace: glusterfs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: glusterfs-storage
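If many claims of different sizes are needed, the same manifest can be generated rather than written by hand. This is a minimal sketch; `make_pvc` is a hypothetical helper, and it emits JSON, which `oc create -f` accepts alongside YAML.

```python
import json


def make_pvc(name, namespace, size_gi, storage_class="glusterfs-storage"):
    """Build a PersistentVolumeClaim manifest equivalent to the YAML above."""
    if size_gi <= 0:
        raise ValueError("size_gi must be positive")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
            "storageClassName": storage_class,
        },
    }


if __name__ == "__main__":
    # Same values as the example claim in this article.
    print(json.dumps(make_pvc("mypvc", "glusterfs", 100), indent=2))
```

The output could be saved to a file or piped into `oc create -f -`.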

And that’s it, your PaaS solution with built-in storage is ready to go! If you want to tune the installation further, more options are available in the Advanced Installation, and a demo video is available here.

Red Hat Ceph Storage: Object storage performance and sizing guide

Red Hat Ceph Storage is a proven, petabyte-scale, object storage solution designed to meet the scalability, cost, performance, and reliability challenges of large-scale, media-serving, savvy organizations. Designed for web-scale object storage and cloud infrastructures, Red Hat Ceph Storage delivers the scalable performance necessary for rich media and content-distribution workloads.

While most of us are familiar with deploying block or file storage, object storage expertise is less common. Object storage is an effective way to provision flexible and massively scalable data storage without the arbitrary limitations of traditional proprietary or scale-up storage solutions. Before building object storage infrastructure at scale, organizations need to understand how to best configure and deploy software, hardware, and network components to serve a range of diverse workloads. They also need to understand the performance and scalability they can expect from given hardware, software, and network configurations.

This reference architecture/performance and sizing guide describes Red Hat Ceph Storage coupled with QCT (Quanta Cloud Technology) storage servers and networking as object storage infrastructure. Testing, tuning, and performance are described for both large-object and small-object workloads. This guide also presents the results of the tests conducted to evaluate the ability of configurations to scale to host hundreds of millions of objects.

After hundreds of hours of [Test ⇒ Tune ⇒ Repeat] exercises, this reference architecture provides empirical answers to a range of performance questions surrounding Ceph object storage, such as (but not limited to):

  • What are the architectural considerations before designing object storage?
  • What networking is most performant for Ceph object storage?
  • What does performance look like with dedicated vs. co-located Ceph RGWs?
  • How many Ceph RGW nodes do I need?
  • How do I tune object storage performance?
  • What are the recommendations for small/large object workloads?
  • What should I do? I’ve got millions of objects to store.

And the list of questions goes on. You can unlock the performance secrets of Ceph object storage for your organization with the help of the Red Hat Ceph Storage/QCT performance and sizing guide.

Storage for RHV and OCP: Two Glusters on one platform

Architecture is an interesting discipline. There are whitepapers and best practices and reference architectures to offer pristine views of what your perfect deployment should look like. And then there are budgets and timelines and business requirements to derail all of that. It’s what makes this job so interesting and challenging—hacking together the best pieces of disparate and often seemingly unrelated systems to meet goals driven by six leaders whose bonuses are met by completely different metrics.

A recent project has involved combining OpenShift Container Platform (OCP), Red Hat Virtualization (RHV), and Red Hat Gluster Storage (Gluster) into a unified system with common lifecycle operations, minimized management points, and the lowest overall footprint in terms of both capital cost and TCO. The primary storage challenge here is in creating a Gluster environment to support both RHV and its VMs as well as OCP container persistent volume requirements.

Our architectural goals include:

  • Purchase a single flexible hardware platform to serve all the storage needs
  • Segregate Gluster for RHV and Gluster for OCP into separate pools for resource allocation and to avoid possible administration snafus (such as we experienced in early testing)
  • Maintain a single-point and single-method of management—one Heketi server to rule them all
  • Containerize as much as possible to keep lifecycle maintenance atomic

Our early version of the architecture had Gluster running as container-native storage (CNS) for OCP on top of RHV while also serving storage to RHV, but this proved to introduce a chicken-and-egg problem where a single failure (such as an etcd crash) could cause a cascading outage. So our redesign involved splitting Gluster off from OCP as a stand-alone system while still being a unified storage provider and leveraging container atomicity.

The approach we wanted involved containerized Gluster running on bare-metal container hosts. Fundamentally, this is actually pretty straightforward today with pre-built Gluster containers available from the Red Hat registry. What complicated this was our desire to run two separate containerized Gluster pools on the same hardware nodes.

Disclaimer

There’s a pretty good chance that this architecture is not explicitly supported by Red Hat. While all the components we use here are definitely supported, this particular combination is untested by our engineering, QE, and performance teams. Don’t consider anything here a recommendation for how you should run your environment, only an academic study of a possible approach to solving an interesting challenge. If you have any questions, please reach out to your Red Hat sales and support teams.

The platform

We initially wanted to build this on top of Red Hat Enterprise Linux Atomic Host, but our lab environment wasn’t set up to provision this build on our systems, so we had to go forward with RHEL plus the docker packages. For a production build, we would return to using Atomic.

Networking

Gluster containers are usually configured with host networking because they need to communicate freely with each other and need to serve storage out to other systems and containers. However, with host networking, the Gluster ports are bound to all interfaces, so it is not possible to run two Gluster containers in this mode due to port conflicts. To solve this, the networks for each Gluster pool had to be segregated.

First, a VLAN sub-interface with VLAN ID 199 was created on the storage network interface of each Gluster node, with ifcfg files to make the configuration persistent. Each node therefore has a 192.168.99.0/24 IP on the primary interface and a 192.168.199.0/24 IP on the VLAN sub-interface. The switch ports for the storage network interfaces were configured for tagged VLAN ID 199. The 802.1q kernel module (for VLANs) was set to load at boot time on each node with an /etc/modules-load.d/8021q.conf file.

Containerized Gluster

Networks

Each Gluster container needs to exist on its own interface and subnet. Building on the system-level network configuration above, the two interfaces were each attached to a docker macvlan network on each node.

docker network create -d macvlan --subnet=192.168.99.0/24 \
    -o parent=eth1 gluster-rhv-net
docker network create -d macvlan --subnet=192.168.199.0/24 \
    -o parent=eth1.199 gluster-ocp-net
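Because each pool depends on its own subnet, it can help to sanity-check an addressing plan before creating the networks. The sketch below uses only Python's standard `ipaddress` module; the `NETWORKS` table and both function names are illustrative, and the check assumes each macvlan network uses its own parent interface, as in the commands above.

```python
import ipaddress

# Mirrors the two `docker network create` commands above.
NETWORKS = {
    "gluster-rhv-net": {"subnet": "192.168.99.0/24", "parent": "eth1"},
    "gluster-ocp-net": {"subnet": "192.168.199.0/24", "parent": "eth1.199"},
}


def check_networks(networks):
    """Report overlapping subnets or reused parent interfaces."""
    problems = []
    items = list(networks.items())
    for i, (name_a, a) in enumerate(items):
        for name_b, b in items[i + 1:]:
            net_a = ipaddress.ip_network(a["subnet"])
            net_b = ipaddress.ip_network(b["subnet"])
            if net_a.overlaps(net_b):
                problems.append(f"{name_a} and {name_b} have overlapping subnets")
            if a["parent"] == b["parent"]:
                problems.append(f"{name_a} and {name_b} share parent {a['parent']}")
    return problems


def network_for_ip(ip, networks):
    """Return the name of the network whose subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, net in networks.items():
        if addr in ipaddress.ip_network(net["subnet"]):
            return name
    return None


if __name__ == "__main__":
    print(check_networks(NETWORKS) or "no conflicts found")
    for ip in ("192.168.99.28", "192.168.199.28"):
        print(ip, "->", network_for_ip(ip, NETWORKS))
```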

Containers

The containers were pulled down from the Red Hat registry.

docker pull registry.access.redhat.com/rhgs3/rhgs-server-rhel7
docker pull registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7

The Gluster containers need to be privileged in order to access the /dev/sdX block devices. They also need a number of local persistent volume stores in order to ensure they start up properly each time.

The container fstab file needs a persistent mount. So first we should touch these files, otherwise the gluster-startup command in the container will fail.

touch /var/lib/heketi-{rhv,ocp}/fstab

Then we can run the containers.

docker run -d --privileged=true --net=gluster-rhv-net \
    --ip=192.168.99.28 --name=gluster-rhv-1 -v /run \
    -v /home/gluster-rhv-1-root:/root:z \
    -v /etc/glusterfs-rhv:/etc/glusterfs:z \
    -v /var/lib/glusterd-rhv:/var/lib/glusterd:z \
    -v /var/log/glusterfs-rhv:/var/log/glusterfs:z \
    -v /var/lib/heketi-rhv:/var/lib/heketi:z \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /dev:/dev rhgs3/rhgs-server-rhel7
docker run -d --privileged=true --net=gluster-ocp-net \
    --ip=192.168.199.28 --name=gluster-ocp-1 -v /run \
    -v /home/gluster-ocp-1-root:/root:z \
    -v /etc/glusterfs-ocp:/etc/glusterfs:z \
    -v /var/lib/glusterd-ocp:/var/lib/glusterd:z \
    -v /var/log/glusterfs-ocp:/var/log/glusterfs:z \
    -v /var/lib/heketi-ocp:/var/lib/heketi:z \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /dev:/dev rhgs3/rhgs-server-rhel7
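The two commands differ only in the pool name, the network, and the IP address, so they can be derived from one template. The following sketch builds the argument list for a given pool; `gluster_run_args` is a hypothetical helper, and the naming pattern (gluster-POOL-N, gluster-POOL-net, /etc/glusterfs-POOL and so on) is inferred from the two commands above.

```python
import shlex


def gluster_run_args(pool, index, ip, image="rhgs3/rhgs-server-rhel7"):
    """Build the `docker run` argument list for one Gluster container.

    Assumption: host paths and names follow the per-pool pattern used in
    the commands above.
    """
    name = f"gluster-{pool}-{index}"
    volumes = [
        "/run",
        f"/home/{name}-root:/root:z",
        f"/etc/glusterfs-{pool}:/etc/glusterfs:z",
        f"/var/lib/glusterd-{pool}:/var/lib/glusterd:z",
        f"/var/log/glusterfs-{pool}:/var/log/glusterfs:z",
        f"/var/lib/heketi-{pool}:/var/lib/heketi:z",
        "/sys/fs/cgroup:/sys/fs/cgroup:ro",
        "/dev:/dev",
    ]
    args = [
        "docker", "run", "-d", "--privileged=true",
        f"--net=gluster-{pool}-net", f"--ip={ip}", f"--name={name}",
    ]
    for volume in volumes:
        args += ["-v", volume]
    args.append(image)
    return args


if __name__ == "__main__":
    for pool, ip in (("rhv", "192.168.99.28"), ("ocp", "192.168.199.28")):
        # Print a shell-safe command line instead of executing it.
        print(" ".join(shlex.quote(arg) for arg in gluster_run_args(pool, 1, ip)))
```

The sketch only prints the commands; passing the list to `subprocess.run` would execute them on a host where docker is installed.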

Block device assignments

Running the containers in privileged mode allows them to access all system block devices. For our particular architectural needs, we intend to use from each node only one SSD for the gluster-rhv pool and the remaining five SSDs for the gluster-ocp pool.

 Gluster Pool   Block Devices
 gluster-rhv    sdb
 gluster-ocp    sdc, sdd, sde, sdf, sdg

Heketi

Config

The persistent Heketi config is being stored in the /etc/heketi directory on one of the nodes (we’ll call it node1). First, an ssh keypair is created and placed there.

ssh-keygen -f /etc/heketi/heketi_key -t rsa -N ''

Next, the heketi.json file is created. Right now, no auth is being used — obviously don’t do this in production. Note the ssh port is 2222, which is what the Gluster containers are configured to listen on.

{
  "_port_comment": "Heketi Server Port Number",
  "port": "8080",

  "_use_auth": "Enable JWT authorization. Please enable for deployment",
  "use_auth": false,

  "_jwt": "Private keys for access",
  "jwt": {
    "_admin": "Admin has access to all APIs",
    "admin": {
      "key": "My Secret"
    },
    "_user": "User only has access to /volumes endpoint",
    "user": {
      "key": "My Secret"
    }
  },

  "_glusterfs_comment": "GlusterFS Configuration",
  "glusterfs": {
    "_executor_comment": [
      "Execute plugin. Possible choices: mock, ssh",
      "mock: This setting is used for testing and development.",
      "      It will not send commands to any node.",
      "ssh:  This setting will notify Heketi to ssh to the nodes.",
      "      It will need the values in sshexec to be configured.",
      "kubernetes: Communicate with GlusterFS containers over",
      "            Kubernetes exec api."
    ],
    "executor": "ssh",

    "_sshexec_comment": "SSH username and private key file information",
    "sshexec": {
      "keyfile": "/etc/heketi/heketi_key",
      "user": "root",
      "port": "2222"
    },

    "_db_comment": "Database file name",
    "db": "/var/lib/heketi/heketi.db",

    "_loglevel_comment": [
      "Set log level. Choices are:",
      "  none, critical, error, warning, info, debug",
      "Default is warning"
    ],
    "loglevel" : "debug"
  }
}
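Since this configuration is intentionally insecure, a small check can flag the settings that should change before production use. The sketch below is only illustrative; `audit_heketi_config` is a hypothetical helper that inspects the keys shown in the file above and nothing else.

```python
import json


def audit_heketi_config(config):
    """Return warnings for settings in heketi.json that are unsafe for production."""
    warnings = []
    if not config.get("use_auth", False):
        warnings.append("use_auth is false: the API accepts unauthenticated requests")
    for role, entry in config.get("jwt", {}).items():
        if role.startswith("_"):
            continue  # keys starting with "_" are comments in this file
        if isinstance(entry, dict) and entry.get("key") == "My Secret":
            warnings.append(f"jwt.{role}.key still has the sample value")
    if config.get("glusterfs", {}).get("executor") == "mock":
        warnings.append("executor is mock: no commands are sent to any node")
    return warnings


if __name__ == "__main__":
    # Path taken from this article; adjust if the config lives elsewhere.
    with open("/etc/heketi/heketi.json") as handle:
        for warning in audit_heketi_config(json.load(handle)):
            print("WARNING:", warning)
```

Run against the file above, the check would report the disabled authentication and both sample keys.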

SSH access

The Heketi server needs passwordless SSH access to all Gluster containers on port 2222. The public key generated above needs to be added to the authorized_keys for all of the Gluster containers. Note that we have a local persistent volume (PV) for each Gluster container’s /root directory, so this authorized_key entry was simply added to each one of those.

cat /etc/heketi/heketi_key.pub >> \
    /home/gluster-rhv-1-root/.ssh/authorized_keys

NOTE: This needs to be done for each of the root home directories for each Gluster container

Container

The single Heketi container will run on node1. It needs access to both of the subnets, so the best thing to do is run the container in host networking mode. It also needs a few persistent volumes.

docker run -d --net=host --name=gluster-heketi \
    -v /etc/heketi:/etc/heketi:z -v /var/lib/heketi:/var/lib/heketi:z \
    rhgs3/rhgs-volmanager-rhel7

Network

Since we are running heketi-cli on the same node as the Heketi container, there is an isolation issue we have to work through. By default, a container host cannot directly reach its local containers via the IPs assigned to their macvlan network interfaces. So on the container host node1 we need to create local macvlan interfaces for each of the subnets. Run these commands at runtime and also add them to the /etc/rc.d/rc.local file so they persist across reboots:

/usr/sbin/ip link add macvlan0 link eth1 type macvlan mode bridge
/usr/sbin/ip addr add 192.168.99.228/24 dev macvlan0
/usr/sbin/ifconfig macvlan0 up

/usr/sbin/ip link add macvlan1 link eth1.199 type macvlan mode bridge
/usr/sbin/ip addr add 192.168.199.228/24 dev macvlan1
/usr/sbin/ifconfig macvlan1 up

The rc.local file in RHEL is for legacy support, so it has to be made executable and its systemd service has to be enabled.

chmod 755 /etc/rc.d/rc.local
systemctl enable rc-local.service

Heketi CLI

The heketi-cli needs to run $somewhere. For simplicity, the RPM is installed on node1. With the container running with networking in host mode, heketi is listening on localhost port 8080. Export the environment variable in order to be able to run heketi-cli commands.

export HEKETI_CLI_SERVER=http://localhost:8080

Setting up the Heketi clusters

A JSON file is populated at /root/heketi-rhv-plus-ocp-topology.json on node1. This file defines two separate Heketi clusters with their respective Gluster nodes (containers) and block devices.

{
    "clusters": [
        {
            "nodes": [
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "192.168.99.28"
                            ],
                            "storage": [
                                "192.168.99.28"
                            ]
                        },
                        "zone": 1
                    },
                    "devices": [
                        "/dev/sdb"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "192.168.99.29"
                            ],
                            "storage": [
                                "192.168.99.29"
                            ]
                        },
                        "zone": 2
                    },
                    "devices": [
                        "/dev/sdb"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "192.168.99.30"
                            ],
                            "storage": [
                                "192.168.99.30"
                            ]
                        },
                        "zone": 3
                    },
                    "devices": [
                        "/dev/sdb"
                    ]
                }
            ]
        },

        {
            "nodes": [
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "192.168.199.28"
                            ],
                            "storage": [
                                "192.168.199.28"
                            ]
                        },
                        "zone": 1
                    },
                    "devices": [
                        "/dev/sdc",
                        "/dev/sdd",
                        "/dev/sde",
                        "/dev/sdf",
                        "/dev/sdg"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "192.168.199.29"
                            ],
                            "storage": [
                                "192.168.199.29"
                            ]
                        },
                        "zone": 2
                    },
                    "devices": [
                        "/dev/sdc",
                        "/dev/sdd",
                        "/dev/sde",
                        "/dev/sdf",
                        "/dev/sdg"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "192.168.199.30"
                            ],
                            "storage": [
                                "192.168.199.30"
                            ]
                        },
                        "zone": 3
                    },
                    "devices": [
                        "/dev/sdc",
                        "/dev/sdd",
                        "/dev/sde",
                        "/dev/sdf",
                        "/dev/sdg"
                    ]
                }
            ]
        }
    ]
}
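The topology file is highly repetitive, so it can also be generated. The sketch below rebuilds the same structure from the subnet prefixes, host octets, and device lists used above; the helper names are hypothetical and the output is plain JSON in the layout shown in this article.

```python
import json


def make_node(ip, zone, devices):
    """One node entry: the same IP is used for management and storage."""
    return {
        "node": {"hostnames": {"manage": [ip], "storage": [ip]}, "zone": zone},
        "devices": list(devices),
    }


def make_cluster(subnet_prefix, host_octets, devices):
    """One cluster: each host gets its own zone, numbered from 1."""
    return {
        "nodes": [
            make_node(f"{subnet_prefix}.{octet}", zone, devices)
            for zone, octet in enumerate(host_octets, start=1)
        ]
    }


def make_topology():
    hosts = (28, 29, 30)
    return {
        "clusters": [
            # gluster-rhv pool: one SSD per node
            make_cluster("192.168.99", hosts, ["/dev/sdb"]),
            # gluster-ocp pool: the remaining five SSDs per node
            make_cluster(
                "192.168.199",
                hosts,
                ["/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg"],
            ),
        ]
    }


if __name__ == "__main__":
    print(json.dumps(make_topology(), indent=4))
```

Redirecting the output to /root/heketi-rhv-plus-ocp-topology.json would produce a file with the same content as the one listed above.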

This file is passed (once) to Heketi to set up the two clusters.

heketi-cli topology load --json=heketi-rhv-plus-ocp-topology.json

It’s important to note the two different clusters. It’s not (AFAIK) possible to “name” the clusters, so we have to reference them by their UUIDs. The Gluster volumes for RHV will be created on one cluster, and those orchestrated for OCP PVs will be created on a different cluster.

RHV Gluster volumes

For the purposes of RHV, two volumes were requested—one for the Hosted Engine and one for the VM storage. These were created via heketi-cli. Note the cluster ID passed to the commands.

heketi-cli volume create --size 100 --name rhv-hosted-engine \
    --clusters ae2a309d02781816adfed567693221a9
heketi-cli volume create --size 1024 --name rhv-virtual-machines \
    --clusters ae2a309d02781816adfed567693221a9

These can be mounted to the RHV nodes via the 192.168.99.0/24 subnet using the Gluster native client. Example fstab entries:

192.168.99.28:rhv-hosted-engine      /100g   glusterfs       backupvolfile-server=192.168.99.29:192.168.99.30 0 0
192.168.99.28:rhv-virtual-machines      /1t   glusterfs       backupvolfile-server=192.168.99.29:192.168.99.30 0 0

OCP PV Gluster volumes

Our OCP pods are attached to the 192.168.199.0/24 subnet to communicate with the storage. First, on node1, the Heketi API port (8080) needs to be opened in the firewall.

firewall-cmd --add-port 8080/tcp
firewall-cmd --add-port 8080/tcp --permanent

Then the storage class for OCP is defined with the below YAML. Note that we aren’t currently doing any authentication (but obviously we should). You see here that we explicitly define the Heketi cluster ID for this class in order to ensure that all volumes for PVCs are created only on the Gluster pool we have identified for OCP use.

kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
 name: gluster-dyn
provisioner: kubernetes.io/glusterfs
parameters:
 resturl: "http://192.168.199.128:8080"
 restauthenabled: "false"
 clusterid: "74edade536c80f14486edfbabd204151"

Then the storage class is added to OCP on the master.

oc create -f glusterfs-storageclass.yaml

From this point, PVCs (persistent volume claims) made against this storage class will interface with Heketi to dynamically provision Gluster volumes to match the claim.

Miscellaneous

Auto-start containers

Docker container systemd init scripts are tricky. I’ve found that every example on the internet is either wrong, outdated, or uses an approach I don’t like.

Below is an example systemd service file for the Heketi container, which is simple and works the way we expect it to with the docker run command in the ExecStart (/etc/systemd/system/docker-container-gluster-heketi.service). NOTE: Do not daemonize (-d) the docker run command in the init script. Also, the SuccessExitStatus is important here.

[Unit]
Description=Gluster Heketi Container
Requires=docker.service
After=docker.service

[Service]
TimeoutStartSec=60
Restart=on-abnormal
SuccessExitStatus=0 137
ExecStartPre=-/usr/bin/docker stop gluster-heketi
ExecStartPre=-/usr/bin/docker rm gluster-heketi
ExecStart=/usr/bin/docker run --net=host --name=gluster-heketi -v /etc/heketi:/etc/heketi:z -v /var/lib/heketi:/var/lib/heketi:z rhgs3/rhgs-volmanager-rhel7
ExecStop=/usr/bin/docker stop gluster-heketi

[Install]
WantedBy=multi-user.target
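The Gluster containers need equivalent unit files, which differ only in the container name, description, and docker run arguments. As a sketch, the unit above can be turned into a template; `render_unit` is a hypothetical helper, and it refuses a daemonized (-d) run for the reason given in the note above.

```python
UNIT_TEMPLATE = """\
[Unit]
Description={description}
Requires=docker.service
After=docker.service

[Service]
TimeoutStartSec=60
Restart=on-abnormal
SuccessExitStatus=0 137
ExecStartPre=-/usr/bin/docker stop {name}
ExecStartPre=-/usr/bin/docker rm {name}
ExecStart=/usr/bin/docker run {run_args}
ExecStop=/usr/bin/docker stop {name}

[Install]
WantedBy=multi-user.target
"""


def render_unit(name, description, run_args):
    """Render a systemd unit for one container from its docker run arguments."""
    tokens = run_args.split()
    if "-d" in tokens or "--detach" in tokens:
        raise ValueError("do not daemonize docker run inside a systemd unit")
    if f"--name={name}" not in tokens:
        raise ValueError(f"run_args must include --name={name}")
    return UNIT_TEMPLATE.format(name=name, description=description, run_args=run_args)


if __name__ == "__main__":
    print(render_unit(
        "gluster-heketi",
        "Gluster Heketi Container",
        "--net=host --name=gluster-heketi -v /etc/heketi:/etc/heketi:z "
        "-v /var/lib/heketi:/var/lib/heketi:z rhgs3/rhgs-volmanager-rhel7",
    ))
```

The rendered text would be written to /etc/systemd/system/docker-container-NAME.service, following the file naming used above.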

Reload the systemd daemon:

systemctl daemon-reload

Enable and start the service:

systemctl enable docker-container-gluster-heketi

systemctl start docker-container-gluster-heketi

Known issues and TODOs

  • Security needs to be taken into account. We’ll set up appropriate key-based authentication and JWT for Heketi. We’d also like to use role-based auth. Hopefully we’ll cover this in a future blog post.
  • Likely $other_things I haven’t realized yet, or better ways of approaching this. I’d love to hear your comments.

Introducing Red Hat Hyperconverged Infrastructure 1.0

By Steve Bohac, Red Hat Storage

Today we’re proud to announce Red Hat Hyperconverged Infrastructure 1.0! By combining Red Hat virtualization and storage technologies with a stable, proven operating platform, Red Hat Hyperconverged Infrastructure is designed to help enterprises bring datacenter capabilities into locations with limited space, such as branch offices and other remote facilities.

Built on Red Hat Virtualization and Red Hat Gluster Storage, Red Hat Hyperconverged Infrastructure provides simplified planning and procurement, streamlined deployment and management, and a single support stack for virtual compute and virtual storage resources. Red Hat Hyperconverged Infrastructure is an ideal solution for remote/branch office or edge computing needs. Deployment is enabled by Ansible by Red Hat, and Red Hat CloudForms can be used to manage all the Red Hat Hyperconverged Infrastructure installations in your enterprise via a single application.

Customers have been asking us for this type of an integrated solution, so we’re happy to offer this hyperconverged combination in a single SKU to satisfy that request.

Organizations with distributed operations, such as those in the banking, energy, or retail industries, can benefit from offering the same infrastructure services in remote and branch offices as they run in their datacenters. However, remote and branch offices can have unique challenges: less space and power/cooling and fewer (or no) technical staff on site. Organizations in this situation need powerful services, integrated on a single server, that allow them to keep their key applications local to the remote site.

Red Hat Hyperconverged Infrastructure addresses these challenges for remote installations. The following figure depicts the benefits that consolidation with a hyperconverged infrastructure provides:

  • Eliminate storage as a discrete tier
  • Easily virtualize business applications, maximizing resource utilization
  • Single budget for compute and storage
  • Single team managing infrastructure
  • Simplified planning and procurement
  • Streamlined deployment and management
  • Single support stack for compute and storage

Removing the storage tier by consolidating compute and storage onto a single server platform/tier offers streamlined deployment and management (enabled by Ansible by Red Hat and Red Hat CloudForms), a single support stack (one vendor to call now instead of two), and simplified planning and procurement (reducing the number of vendors to source from).

For more information on Red Hat Hyperconverged Infrastructure, click here.

For an on-demand webinar discussing Red Hat Hyperconverged Infrastructure in more detail, click here.

Storage can make your digital transformation—or break it

By Ross Turk, Red Hat Storage

The following chart might look familiar, especially if you’ve ever studied patterns of online behavior.

Like all the best charts, this one has a glorious up-and-to-the-right shape. But each year at the end of December, when much of the world goes offline for a few quiet days, there’s a characteristic drop. This chart—from Google Trends—represents a phrase that’s growing in prominence: “digital transformation.”

Digital transformation is everywhere

When I noticed—with distinct déjà vu—the industry using this phrase, I admit I was somewhat taken aback. Many of us live in a world dominated by technology. I can’t remember the last time I paid for fast-food tacos with actual money, but I do know I stopped carrying cash completely when the taco shops began accepting credit cards. Every time I need to mail a letter now, it’s a huge production! I’m just not prepared for that kind of task anymore.

Imagine a world without digital technology…. See?! You can’t do it.

Not all digital transformation is equal

Another case in point. I renewed my driver’s license recently and found myself wondering: Now that the DMV is doing business using modern technology, who’s left to transform? If you live your life in an ivory tower made of wifi and capacitive touch screens—like me—a phrase like “digital transformation” can seem obsolete. It can throw you off the scent a bit. And, indeed, I was missing the point. Sure, even taxi companies embrace digital technologies these days…but are they any good at it? Do they enjoy the same efficiencies as a digitally native service like Uber?

Technology is now serious business

Digital transformation isn’t about using technology—or even offering digital services. It’s about redefining a business in technology terms, putting the modern technology experience first. It’s about businesses coming to terms with the truth: Technology can’t be a hobby for them anymore. They’re going to have a ton of applications and data, and they need to get really good at managing all of it. That means having solid priorities; agility and elasticity are a great place to start.

Modern storage can transform your business

Speaking of great places to start, there’s no better example of the challenges of digital transformation than storage. The amount of data that enterprises need to maintain is growing at a steady clip, and their customers expect all that data to be available instantly. Access patterns change as frequently as customer behaviors. Data is getting bigger, and analytics are getting even more sophisticated. The traditional storage appliances that do a lot of the heavy lifting today are convenient, but at petabyte scale they show their inflexibility and limitations.

That’s where Ceph and Gluster come in. They’re flexible, scale-out, software-defined storage technologies built for those who don’t think storage is a hobby.

Learn how storage can make your digital transformation

If you want to learn more about modern approaches to storage—and Red Hat Storage, of course!—join me on June 22 for a 45-minute webinar. Register here.

Going public with Red Hat Ceph Storage 2.3

By Daniel Gilfix, Red Hat Storage

You may not have heard a lot about Red Hat Ceph Storage lately, but that doesn’t mean the product hasn’t been active. News in conjunction with Red Hat OpenStack Platform 11 and 10, technology partners such as Rackspace and Micron, and customer adoption at places like Massachusetts Open Cloud, UKCloud, and CLIMB have reinforced the product’s role as a cornerstone of the Red Hat portfolio. But the advances of the product itself have been relatively under wraps, with versions 2.2 and 2.1 carefully monitored by existing fans and loyal software-defined storage blog readers. We don’t expect the announcement of Red Hat Ceph Storage 2.3 to shake mountains with seismic impact, but we do expect it to inspire our user community with the doors the product opens today and what might be possible long term.

Source: Sheri Terris, from June 1, 2007, https://www.flickr.com/photos/crestedcrazy/534647428

Greater versatility

Red Hat Ceph Storage 2.3 takes aim at extending the versatility of object storage so that users can connect more successfully to traditional workloads and link them effectively to modern ones. One way is through our new Network File System (NFS) gateway into the product’s S3-compatible object interface. The gateway facilitates the adoption of Ceph Storage as a target for file-based data without requiring changes in data access protocols or the management of data caching semantics. It means Red Hat Ceph Storage users can access the same data set from both object and file interfaces and gradually transition between them based on business need. It also means they can extend the multi-site capabilities of Red Hat Ceph Storage to enable global clusters and data access with the NFS protocol.
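To make the "same data set from both object and file interfaces" idea concrete, here is a minimal sketch of how an object's bucket and key might correspond to a file path on an NFS mount. It assumes the gateway presents each bucket as a top-level directory and treats "/" in object keys as directory separators; the mount point, bucket name, and the helper `nfs_path_for_object` are hypothetical and are not part of any Red Hat Ceph Storage API.

```python
from pathlib import PurePosixPath


def nfs_path_for_object(mount_point: str, bucket: str, key: str) -> str:
    """Return the file path where an object would appear on the NFS mount.

    Assumption (not taken from the announcement): buckets show up as
    top-level directories and "/" in the key separates subdirectories.
    """
    parts = key.split("/")
    # Reject keys that cannot be expressed as a clean relative file path.
    if any(part in ("", ".", "..") for part in parts):
        raise ValueError(f"key {key!r} does not map to a clean file path")
    return str(PurePosixPath(mount_point, bucket, *parts))


if __name__ == "__main__":
    # Hypothetical mount point and bucket, for illustration only.
    print(nfs_path_for_object("/mnt/rgw", "reports", "2017/q2/summary.csv"))
```

Under these assumptions, an application that writes `2017/q2/summary.csv` to the `reports` bucket over S3 and a legacy tool that reads the file over NFS are looking at the same data, which is what allows a gradual transition between the two interfaces.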

Improved connectivity

Red Hat Ceph Storage 2.3 is laying the groundwork for improved connectivity with analytics engines. By conducting validation tests for compatibility with the Hadoop S3A plugin, Red Hat is extending financial and operational benefits of a scale-out, software-defined object storage platform to analytics workloads and data-driven applications leveraging tools like Hadoop, Hive, and Spark. With analytics data stored in an S3-compatible object store, developers have access to a broader ecosystem of tools and language bindings and are no longer forced to use a bulky HDFS client. Concentrating data sets in a common object store simplifies data duplication and data lineage challenges. Multiple ephemeral instances of elastic analytics clusters can reference a single source of truth.
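As a rough illustration of what pointing Hadoop's S3A connector at an S3-compatible endpoint could involve, the sketch below assembles a handful of standard `fs.s3a.*` properties and renders them as a `core-site.xml` fragment. The endpoint host, port, and credentials are placeholders, and the helper names `s3a_properties` and `to_core_site_xml` are made up for this example; the exact settings a validated deployment needs may differ.

```python
import xml.etree.ElementTree as ET


def s3a_properties(endpoint: str, access_key: str, secret_key: str,
                   use_ssl: bool = True) -> dict:
    """Collect Hadoop S3A settings for a non-AWS, S3-compatible endpoint."""
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Path-style requests avoid needing per-bucket DNS names,
        # which is a common choice for on-premises object stores.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true" if use_ssl else "false",
    }


def to_core_site_xml(props: dict) -> str:
    """Render the properties in Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder endpoint and credentials, for illustration only.
    props = s3a_properties("rgw.example.com:8080",
                           "ACCESS_KEY_PLACEHOLDER",
                           "SECRET_KEY_PLACEHOLDER",
                           use_ssl=False)
    print(to_core_site_xml(props))
```

With settings along these lines in place, jobs would address data with `s3a://bucket/path` URIs instead of `hdfs://` paths, so several short-lived analytics clusters can read the same objects.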

Deployment flexibility

A final capability targets the highly coveted customer requirement of running Ceph in a containerized format. Red Hat Ceph Storage 2.3 includes a single container image that delivers the same capabilities as in the traditional package format. With a Red Hat Ansible-based deployment tool, users can perform installations, upgrades, and updates with atomicity for reduced complexity, easier management, and faster deployment. This supports customers in areas like telecommunications seeking standardized orchestration and deployment of infrastructure software in containers with Kubernetes by adding a cloud-native storage service to this orchestration methodology.

A new chapter

These new capabilities demonstrate the new heights Red Hat Ceph Storage aims to scale, supported by a company firmly committed to real-world deployment of the future of storage. By combining massive scalability with multi-protocol access to highly available clusters, Red Hat Ceph Storage is moving object storage up the mountain to help unleash its power for modern workloads. For more information, check out this short video:

Red Hat Ceph Storage 2.3 is expected to be available later this month.

Storage for the modern enterprise

In early May, Red Hat put on its annual customer and community conference in Boston—Red Hat Summit—centered around a common theme: the power of the individual. Now more than ever, open source is seen as the most viable option as we enter the next phase of IT delivery, planning, and deployment.

The power of the individual on display

Red Hat has assumed the role of de facto open source leader, driving and nurturing hundreds of communities across the world. One could argue that, at the core, Red Hat isn’t a software company at all. In fact, our best asset is our ability to curate open source communities, bringing to bear the efforts of thousands of contributors, committers, and testers to enterprises in a reliable, secure package that can solve some of the most demanding IT challenges.

The end of planning as we know it

Red Hat CEO Jim Whitehurst made a pertinent point in his keynote about the changing face of IT planning. “Planning harder” in an environment full of unknowns is complex and fraught with error. CIOs struggle to balance predictability with the inherent flexibility needed to maintain smooth IT operations.

Building IT infrastructure with open source and industry-standard constructs helps alleviate some of the planning challenges while addressing today’s pain points, much like interchangeable Lego® pieces can be used to build everything from two-story houses to skyscrapers, still withstanding the shock of an earthquake when needed.

Storage for the modern enterprise

With each passing year, it becomes apparent that traditional storage just isn’t going to cut it as enterprises look to create flexible, scalable, and cost-effective IT platforms for cloud-native applications. Simply put, legacy storage systems have failed to keep up with the way customers want to consume storage.

A key piece of the value proposition of software-defined storage is the hardware choice available to customers. For the second year in a row, a Storage Ecosystem Showcase in the partner pavilion of Red Hat Summit featured seven Technology Partners that complete or enhance Red Hat’s software-defined storage offering. Cisco, NGD Systems, Permabit, QCT, Seagate, Storage Made Easy, and Supermicro all demonstrated their wares.

In addition, several other storage partners, such as Dell EMC, Mellanox, and Penguin Computing, chose to sponsor their own booths. The solid upstream ecosystem combines with a growing downstream array of partners to truly differentiate Red Hat Storage.

Learn more

Storage was featured prominently both on the expo floor and in the news from the event. In addition to breakout sessions, Red Hat Storage engineers and consultants held a number of hands-on labs that were very well received. You can access similar self-paced material on the online AWS test drives (Gluster test drive and Ceph test drive) or at an upcoming webinar.

For a review of all the 2017 Red Hat Summit videos, click here. For a video recap with some of the Red Hat Storage team, watch this: