Part 2: Connecting an OCP application to a MySQL instance

By Marko Karg and Annette Clewett

At the end of the first post in this blog series, we deployed a MySQL instance using StatefulSets (STS). Today, we want to use that same deployment method and connect a WordPress instance to the MySQL database and see what happens if the database fails. Remember, we’re using a MySQL pod created from a StatefulSet.

OpenShift on AWS test environment

All the posts in this series use an OCP-on-AWS setup that includes 8 EC2 instances deployed as 1 master node, 1 infra node, and 6 worker nodes that also run OCS gluster and heketi pods. The 6 worker nodes are both the storage provider (OpenShift Container Storage [OCS]) and the persistent storage consumers (MySQL). As the following figure shows, the OCS worker nodes are of instance type m5.2xlarge with 8 vCPUs, 32 GB of memory, and 3x100GB gp2 volumes attached to each node for OCP and a single 1TB gp2 volume for the OCS storage cluster. The AWS region us-west-2 has availability zones (AZs) us-west-2a, us-west-2b, and us-west-2c, and the 6 worker nodes are spread across the 3 AZs, two nodes in each AZ. This means the OCS storage cluster is “stretched” across these 3 AZs.

MySQL setup

We’ve created a headless MySQL service already, using an OCS-based persistent volume claim (PVC):

oc get services
NAME    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
mysql   ClusterIP                <none>        3306/TCP   16h

The STS we need was also created as described in the first post of this series. Once the STS has been created and the container has started, we have a running MySQL instance:

oc get pods
NAME                READY   STATUS    RESTARTS    AGE
mysql-ocs-0         1/1     Running   0           21h

With a service and a database up, we can move forward to get our application deployed.

Although there are templates available that would set up WordPress with a preconfigured database in one shot, we want to take the long way and start from scratch to illustrate the required steps.

WordPress setup

Let’s create a new php application to run WordPress:

# oc new-app php~https://github.com/wordpress/wordpress

After a few seconds, we have the required pods in our project:

# oc get pods
NAME                READY     STATUS      RESTARTS   AGE
mysql-ocs-0         1/1       Running     0          21h
wordpress-1-build   0/1       Completed   0          22h
wordpress-1-q5jts   1/1       Running     0          22h

To make our WordPress instance available to the world, we need to expose it:

# oc expose service wordpress

Now two services are available, one for MySQL and one for WordPress:

# oc get service
NAME                                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
glusterfs-dynamic-31a07eb1-3a72-11e9-96fc-02e7350e98d2   ClusterIP   172.30.165.245   <none>        1/TCP               1m
mysql-ocs                                                ClusterIP   172.30.210.183   <none>        3306/TCP            1m
wordpress                                                ClusterIP   172.30.1.139     <none>        8080/TCP,8443/TCP   28s

Let’s connect to the web interface of WordPress now. To do so, we take the HOST / PORT portion from the following command:

oc get route wordpress
NAME        HOST/PORT                                       PATH   SERVICES    PORT       TERMINATION   WILDCARD
wordpress   wordpress-marko.apps.ocpocs311.ocpgluster.com          wordpress   8080-tcp   None          None

In our case, that’s wordpress-marko.apps.ocpocs311.ocpgluster.com. Take that string and put it into a browser to go to the WordPress interface. The WordPress web interface will guide us through setting up the database connection now.

Most of the values the setup form asks for are already defined in our MySQL STS: the database name is “wordpress”, the username is “admin”, and the password is “secret”.

cat mysql-sts.yaml
….omitted….
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mysql-ocs
        image: mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: password
        - name: MYSQL_DATABASE
          value: wordpress
        - name: MYSQL_USER
          value: admin
        - name: MYSQL_PASSWORD
          value: secret
….omitted….

Our database host can be found by running this command:

oc get services
NAME                                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
glusterfs-dynamic-31a07eb1-3a72-11e9-96fc-02e7350e98d2   ClusterIP   172.30.165.245   <none>        1/TCP               1m
mysql-ocs                                                ClusterIP                    <none>        3306/TCP            1m
wordpress                                                ClusterIP   172.30.1.139     <none>        8080/TCP,8443/TCP   28s

So we will use mysql-ocs for the database host.

If the information entered is correct, WordPress will guide us through the rest of the installation process:

We now need to enter some information for the web front end:

The installation takes some time and finally presents this screen:

So now our deployment is done. Next, we’ll log into WordPress and create some test content:

Failure scenario

So now that we have everything in place, we want to see what happens when our MySQL container fails. To simulate that, we’ve set up a client that checks the website using the “curl” command.

Because we’re only interested in the HTTP response over a longer time, we run it in a loop, trimming the output to what we’re interested in:

while true; do date; curl -I http://wordpress-marko.apps.ocpocs311.ocpgluster.com/2019/02/26/lorem-ipsum/ 2>&1 | grep HTTP; sleep 1; done

As a first test we simply kill the MySQL container, watching the preceding loop closely:

# oc get pods
NAME                READY STATUS     RESTARTS  AGE
mysql-ocs-0         1/1   Running    0         21h
wordpress-1-build   0/1   Completed  0         22h
wordpress-1-q5jts   1/1   Running    0         22h

Delete the MySQL pod:

oc delete pod mysql-ocs-0

Here’s the output from the preceding curl loop:

Mi 27. Feb 09:40:44 UTC 2019
HTTP/1.1 200 OK
Mi 27. Feb 09:40:45 UTC 2019
HTTP/1.1 500 Internal Server Error

Mi 27. Feb 09:40:59 UTC 2019
HTTP/1.1 500 Internal Server Error
Mi 27. Feb 09:41:03 UTC 2019
HTTP/1.1 200 OK

So the pod failure caused our WordPress instance to be unavailable for 18 seconds (09:40:45 to 09:41:03), give or take a few seconds for the curl command. We ran the same test many more times and ended up with an average value of 12 seconds. This is the time that the WordPress application is unavailable after the MySQL pod is deleted. Once the pod is re-created and the OCS storage is mounted in the pod, WordPress is available again.
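Reading the outage window off the timestamps by eye gets tedious over many runs. The following is a minimal sketch of one way to automate it, assuming the probe prints epoch seconds and the bare HTTP status code instead of the date string and header line used above. The helper names probe_loop and outage_seconds are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: measure an outage from probe output instead of reading timestamps by eye.
# probe_loop and outage_seconds are hypothetical helper names.

# Print "<epoch-seconds> <http-status>" once per second for the given URL.
probe_loop() {
  local url=$1
  while true; do
    printf '%s %s\n' "$(date +%s)" \
      "$(curl -s -o /dev/null -I -w '%{http_code}' "$url")"
    sleep 1
  done
}

# Read probe lines on stdin; for each outage, print the seconds between the
# first non-200 response and the next 200 response.
outage_seconds() {
  awk '
    $2 != 200 && start == "" { start = $1 }
    $2 == 200 && start != "" { print $1 - start; start = "" }
  '
}
```

Piping probe_loop with the WordPress URL into outage_seconds prints one number per outage. Like the manual reading, the result is only accurate to within the one-second probe interval plus the time curl takes to return.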

This test only deletes the pod that is running the database. What we cannot be sure of so far is that the MySQL pod actually moves from one node to another. To have that happen, we have to cordon the node on which the pod currently runs and then delete the pod. Cordoning the node means that it will take no new containers and, as a consequence, the new incarnation of our database pod will have to be started on a different node.

As a first step, we need to find the node on which mysql-ocs is currently running:

# oc get pod mysql-ocs-0 -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP            NODE                                          NOMINATED NODE
mysql-ocs-0   1/1     Running   0          5m    10.129.2.44   ip-172-16-27-161.us-west-2.compute.internal   <none>

Now we cordon the node ip-172-16-27-161.us-west-2.compute.internal and then delete the mysql-ocs-0 pod:

# oc adm cordon ip-172-16-27-161.us-west-2.compute.internal
node/ip-172-16-27-161.us-west-2.compute.internal cordoned
# oc delete pod mysql-ocs-0
pod "mysql-ocs-0" deleted

Again, we’ve run a series of the same test and ended up with an average value of 12 seconds. Therefore, it does not matter if the pod must be relocated to another node or is re-created on the same node again.
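For repeated runs, the find-cordon-delete sequence can be wrapped in a small function. This is a sketch rather than the script we used: relocate_pod is a hypothetical name, and reading the node from the pod's spec.nodeName field with jsonpath is an alternative to reading it from the “-o wide” output.

```shell
#!/usr/bin/env bash
# Sketch: force a pod onto a different node by cordoning its current node.
# relocate_pod is a hypothetical helper name; cordoning needs cluster-admin rights.
relocate_pod() {
  local pod=$1 node
  # Ask the API for the node the pod is currently scheduled on.
  node=$(oc get pod "$pod" -o jsonpath='{.spec.nodeName}') || return 1
  # Stop the scheduler from placing new pods on that node.
  oc adm cordon "$node" >&2 || return 1
  # Delete the pod; the StatefulSet re-creates it on another node.
  oc delete pod "$pod" >&2 || return 1
  # Report the cordoned node so it can be uncordoned later.
  echo "$node"
}
```

The function prints the name of the cordoned node. Once the new pod is running, “oc adm uncordon” followed by that node name makes the node schedulable again; uncordoning earlier could let the pod land back on the same node.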

Conclusion

The goal behind this post was to show how an application like WordPress in OCP can be connected to a database, as well as how fast a pod can fail over to another node when it is using Red Hat OCS as a storage platform. An average time of 12 seconds is what we can reproduce consistently for one MySQL pod. The exact time is, of course, specific to every setup, as it depends on many different factors, but the reproducibility and the deterministic time are common to OpenShift deployments.

How to run a MySQL pod on OCP using OCS and StatefulSets

By Sagy Volkov and Annette Clewett

Greetings from Red Hat’s storage architect team! With this post, we’re kicking off a series in which we’ll demonstrate a step-by-step deployment of a stateful application on OpenShift Container Platform (OCP) using OpenShift Container Storage (OCS). This series, based on the 3.11 version of both OCP and OCS, will not cover how to install OCP or OCS.

We’ll start with creating one MySQL pod (using OCP StatefulSets and OCS), and then add the application that uses the MySQL database on persistent storage. As we progress in this series, we’ll show more advanced topics, such as OCP multi-tenant scenarios, MySQL performance on OCS, failover scenarios, and more.

OpenShift on AWS test environment

All the posts in this series use an OCP-on-AWS setup that includes 8 EC2 instances deployed as 1 master node, 1 infra node, and 6 worker nodes that also run OCS gluster and heketi pods. The 6 worker nodes are basically the storage provider (OCS) and persistent storage consumers (MySQL). As shown in the following, the OCS worker nodes are of instance type m5.2xlarge with 8 vCPUs, 32 GB Mem, and 3x100GB gp2 volumes attached to each node for OCP and a single 1TB gp2 volume for OCS storage cluster. The AWS region us-west-2 has Availability Zones (AZs) us-west-2a, us-west-2b, us-west-2c, and the 6 worker nodes are spread across the 3 AZs, two nodes in each AZ. This means the OCS storage cluster is “stretched” across these 3 AZs.

MySQL deployment with StatefulSets

This post revolves around deploying a MySQL pod using OCS and StatefulSets (STS), so let’s get started.

Stateful applications need persistent volumes (PVs) to support failover scenarios: when a pod moves to a different worker node, the data it uses must still be available after the move.

STS became generally available in Kubernetes 1.9 and have a few advantages over “simple” deployments:

  1. Pods can be created in order (and removed in reverse order when scaling down). This is especially important in master/slave scenarios and/or distributed databases.
  2. Pods have a simple, predictable naming convention and retain their names when migrating from one node to another after a failover.
  3. The persistent volume claims (PVCs) are not deleted when the STS is deleted, which keeps the data intact for future use.
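The naming convention in item 2 is deterministic: each pod is named after its StatefulSet plus an ordinal, and each claim created from a volume claim template is named after the template plus the pod name. A minimal sketch of those two rules, using the names that appear later in this post (the helper function names are hypothetical and not part of oc or Kubernetes):

```shell
#!/usr/bin/env bash
# Hypothetical helpers that reproduce the StatefulSet naming rules.

# Pod name: <statefulset-name>-<ordinal>
sts_pod_name() {
  printf '%s-%s\n' "$1" "$2"
}

# PVC name: <volumeClaimTemplate-name>-<pod-name>
sts_pvc_name() {
  printf '%s-%s\n' "$1" "$(sts_pod_name "$2" "$3")"
}

sts_pod_name mysql-ocs 0                  # prints mysql-ocs-0
sts_pvc_name mysql-ocs-data mysql-ocs 0   # prints mysql-ocs-data-mysql-ocs-0
```

Because the names are predictable, a re-created pod finds its existing claim again by name, which is what keeps the data attached to the same pod identity across failovers.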

The first step in creating a PVC is making sure we have a storage class we can use to dynamically create the volume in OCP:

oc get sc
NAME                PROVISIONER               AGE
glusterfs-storage   kubernetes.io/glusterfs   9d
gp2 (default)       kubernetes.io/aws-ebs     21d
gp2-xfs             kubernetes.io/aws-ebs     18d

As you can see, we have 3 storage classes in our OCP cluster. For the MySQL deployment, we will be using the glusterfs-storage class, which is created with the installation of OCS when deploying OCP using Ansible playbooks and specific OCS inventory file options. This means that every time a claim is made for storage it will be the glusterfs-storage class that will provide it because it is configured into our STS definition file. If you want to see the content of any of the storageclass (SC) resources, run “oc get sc <storageclass_name> -o yaml”.

Because we are going to use STS, one of the requirements is to create a headless service for our MySQL application. We’re going to use the following yaml file:

cat headless-service-mysql.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-ocs
  labels:
    app: mysql-ocs
spec:
  ports:
  - port: 3306
    name: mysql-ocs
  clusterIP: None
  selector:
    app: mysql-ocs

And then create the service.

oc create -f headless-service-mysql.yaml
service/mysql-ocs created
$ oc get svc
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP    PORT(S)    AGE
mysql-ocs   ClusterIP   None         <none>         3306/TCP   6s

Now that we have a storageclass and a headless service, let’s look at our STS yaml. This is a simple example, and as we progress in this series, we’ll update and add to this file.

Note: It is neither secure nor recommended to put plain-text passwords in yaml files. Instead, use secrets. To keep our example simple, we’ll use plain text.
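For reference, here is a minimal sketch of the secrets-based alternative, which this series does not use. The secret name mysql-ocs-secret and the function name are assumptions; the keys mirror the environment variable names in the STS.

```shell
#!/usr/bin/env bash
# Sketch: keep the MySQL credentials in a Secret instead of the STS yaml.
# The secret name mysql-ocs-secret and the helper name are assumptions.
# Arguments: root password, user name, user password.
create_mysql_secret() {
  oc create secret generic mysql-ocs-secret \
    --from-literal=MYSQL_ROOT_PASSWORD="$1" \
    --from-literal=MYSQL_USER="$2" \
    --from-literal=MYSQL_PASSWORD="$3"
}
```

Each “value:” entry in the container's env list would then be replaced by a “valueFrom” entry with a “secretKeyRef” that names the secret and the key. Passing passwords on the command line leaves them in the shell history, so reading them from a protected file may be preferable.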

cat mysql-sts.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-ocs
spec:
  selector:
    matchLabels:
      app: mysql-ocs
  serviceName: "mysql-ocs"
  podManagementPolicy: Parallel
  replicas: 1
  template:
    metadata:
      labels:
        app: mysql-ocs
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mysql-ocs
        image: mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: password
        - name: MYSQL_DATABASE
          value: wordpress
        - name: MYSQL_USER
          value: admin
        - name: MYSQL_PASSWORD
          value: secret
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-ocs-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-ocs-data
    spec:
      storageClassName: glusterfs-storage
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 8Gi

Most of the container definitions are similar to those of a “DeploymentConfig” type. We’re using the headless service “mysql-ocs” that we previously created, and we specified MySQL 5.7 as the image to be used. The interesting part is at the bottom of the preceding file: the “volumeClaimTemplates” definition is how we create a persistent volume claim (PVC), have a persistent volume (PV) dynamically provisioned for it, and attach it to the newly created MySQL pod. As you can also see, we’re using the storage class we have from the OCP/OCS installation (glusterfs-storage), and we request a volume of 8 GiB to be created and used in “ReadWriteOnce” mode.

To create our STS, we run the following command:

oc create -f mysql-sts.yaml
statefulset.apps/mysql-ocs created

Deployment validation

Let’s check that the pod is running. Please note that, depending on the hardware used, the MySQL container image download speed, the size of the volume requested, and the availability of existing PVCs, this action can take from a few seconds to around a minute.

oc get pods
NAME          READY     STATUS      RESTARTS     AGE
mysql-ocs-0   1/1       Running     0            31s
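Because the startup time varies, a script that depends on the pod can poll for it instead of sleeping for a fixed time. A minimal sketch, assuming a check on the pod's status.phase field is good enough; wait_for_running is a hypothetical helper name, and the 120-second default timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: poll until a pod reports phase Running or a timeout (seconds) expires.
# wait_for_running is a hypothetical helper name.
wait_for_running() {
  local pod=$1 timeout=${2:-120} waited=0 phase
  while [ "$waited" -lt "$timeout" ]; do
    # An empty phase (pod not created yet) simply keeps the loop going.
    phase=$(oc get pod "$pod" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "Running" ]; then
      echo "$pod is Running after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$pod not Running after ${timeout}s" >&2
  return 1
}
```

Note that phase Running only means the containers have started. It is not the same as the READY column showing 1/1, so a stricter check may be needed before sending traffic to the database.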

Let’s look at the PVC we created with this STS.

oc get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
mysql-ocs-data-mysql-ocs-0   Bound    pvc-cb25b2c0-3a12-11e9-96fc-02e7350e98d2   8Gi        RWO            glusterfs-storage   1m

And the PV that is associated with the PVC:

oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                             STORAGECLASS        REASON   AGE
pvc-cb25b2c0-3a12-11e9-96fc-02e7350e98d2   8Gi        RWO            Delete           Bound    sagy/mysql-ocs-data-mysql-ocs-0   glusterfs-storage            3m

To see the relationship between Kubernetes, gluster, heketi, and our persistent storage volume, we can run a few commands. We know the PV name from the “oc get pvc” command we ran previously, so we’ll use “oc describe” and search for Path.

oc describe pv pvc-d1fc687c-3a14-11e9-96fc-02e7350e98d2|grep Path
    Path:           vol_82f64c461e4796213160f30519f318f8

In our case, the volume name is vol_82f64c461e4796213160f30519f318f8, and this is the same volume name in gluster. If we log in to the container inside the MySQL pod, we can see the same volume and the directory it is mounted on.

oc rsh mysql-ocs-0
$ df -h|grep vol_82f64c461e4796213160f30519f318f8
172.16.26.120:vol_82f64c461e4796213160f30519f318f8  8.0G 325M 7.7G 4% /var/lib/mysql

We can see that the volume is mounted on /var/lib/mysql (what we specified in our STS yaml file) and size is 8.0G.

If we want to check heketi for more information, we must first make sure that the heketi-client package is installed on the server we’re running the commands from. The following file must be sourced to export the environment before using heketi-client commands.

cat heketi-export-app-storage
export HEKETI_POD=$(oc get pods -l glusterfs=heketi-storage-pod -n app-storage -o jsonpath='{.items[0].metadata.name}')
export HEKETI_CLI_SERVER=http://$(oc get route/heketi-storage -n app-storage -o jsonpath='{.spec.host}')
export HEKETI_CLI_USER=admin
export HEKETI_CLI_KEY=$(oc get pod/$HEKETI_POD -n app-storage -o jsonpath='{.spec.containers[0].env[?(@.name=="HEKETI_ADMIN_KEY")].value}')
export HEKETI_ADMIN_KEY_SECRET=$(echo -n ${HEKETI_CLI_KEY} | base64)

The heketi volume name is the gluster volume name without the “vol_” prefix, which can be found using the following command:

oc describe pv pvc-d1fc687c-3a14-11e9-96fc-02e7350e98d2|grep Path|awk '{print $2}'|awk -F 'vol_' '{print $2}'
82f64c461e4796213160f30519f318f8

And now, after we made sure heketi-cli is installed and sourced the environment variables, the heketi-cli command can be used to get more information about this gluster volume.

heketi-cli volume info 82f64c461e4796213160f30519f318f8
Name: vol_82f64c461e4796213160f30519f318f8
Size: 8
Volume Id: 82f64c461e4796213160f30519f318f8
Cluster Id: f05418936dc63638041af2831914c37d
Mount: 172.16.26.120:vol_82f64c461e4796213160f30519f318f8
Mount Options: backup-volfile-servers=172.16.53.212,172.16.39.190,172.16.56.45,172.16.27.161,172.16.44.7
Block: false
Free Size: 0
Reserved Size: 0
Block Hosting Restriction: (none)
Block Volumes: []
Durability Type: replicate
Distributed+Replica: 3
Snapshot Factor: 1.00

Deleting StatefulSet and persistent storage

So far, we’ve seen how to create a MySQL pod using STS and OCS storage, but what happens when we want to delete a pod or the storage? First, let’s look at our PVC.

oc get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
mysql-ocs-data-mysql-ocs-0   Bound    pvc-d1fc687c-3a14-11e9-96fc-02e7350e98d2   8Gi        RWO            glusterfs-storage   20h

Now let’s delete our STS for MySQL.

$ oc delete -f mysql-sts.yaml
statefulset.apps "mysql-ocs" deleted

And let’s check the PVC again after MySQL STS is deleted.

oc get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
mysql-ocs-data-mysql-ocs-0   Bound    pvc-d1fc687c-3a14-11e9-96fc-02e7350e98d2   8Gi        RWO            glusterfs-storage   20h

As you can see, the PVC remains with the data intact and will be used again if we redeploy the same STS.

If you want to delete the PVC, run the following command:

$ oc delete pvc mysql-ocs-data-mysql-ocs-0
persistentvolumeclaim "mysql-ocs-data-mysql-ocs-0" deleted

You can then monitor the PV and watch it get deleted as well (the PV is first released).

oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                             STORAGECLASS        REASON   AGE
pvc-d1fc687c-3a14-11e9-96fc-02e7350e98d2   8Gi        RWO            Delete           Released   sagy/mysql-ocs-data-mysql-ocs-0   glusterfs-storage            20h

And if we query again, the claim is gone, and so is its PV.

oc get pvc
No resources found.

Conclusion

In this post, we’ve shown the first step toward running an application that needs persistent data on OCP. We used the glusterfs-storage storageclass that is provided by OCS to create a PVC and attached the volume to a MySQL pod. We automated the process using an STS. We also explained the relationship between OCS, heketi, the PV, PVC, and the MySQL pod.

In our next post we’ll show how to connect a WordPress pod to our database pod.

Infrastructure monitoring as a service

A SaaS solution to monitor your Ceph storage infrastructure

By Ilan Rabinovitch (Datadog) and Federico Lucifredi (Red Hat)

Monitoring a distributed system

Red Hat Ceph Storage is a highly scalable, fault-tolerant platform for object, block, and file storage that delivers excellent data resiliency (we default to keeping three copies of a customer’s data at all times), with service availability capable of enduring the loss of a single drive, of a cluster node, or even of an entire rack of storage without users experiencing any interruption. Like its resiliency, Ceph’s ability to scale is another outcome of its distributed architecture.

Distributed systems’ architectures break with the common assumptions made by most traditional monitoring tools in defining the health of an individual device or service. In a somewhat obvious example, Nagios’ Red/Green host (or drive) health status tracking becomes inadequate, as the loss of a drive either generates unnecessary alerts or fails to sufficiently highlight what is likely to be a more urgent condition, like the loss of a MON container. The system can withstand the loss of multiple drives or storage nodes without needing immediate action from an operator, as long as free storage capacity remains available on other nodes. More urgent events, like the loss of a MON bringing the cluster from HA+1 to HA status, would, however, be mixed in with all the other “red” false alarms and lost in the noise.

Distributed systems need monitoring tools that are aware of their distributed nature to ensure that pagers go off only for alerts truly critical in nature. Most hardware failure reports naturally found in a large enough system are managed weekly or monthly as part of recurring maintenance activity. An under-marketed advantage of distributed systems is that the swapping of failed drives or the replacement of PSUs becomes a scheduled activity, not an emergency one.

Red Hat Ceph Storage’s built-in monitoring tools are designed to help you keep tabs on your clusters’ health, performance, and resource usage, while avoiding the shortcomings of tools that are not aware of distributed systems. Red Hat has a long history of customer choice, and for that reason the Red Hat Ceph Storage documentation includes information on how to use external monitoring tools. Nagios may be sub-optimal for the details, but it is still the closest thing we have to a standard in the open source community’s fragmented monitoring space.

Datadog is much more interesting.

Monitoring is a service

Unlike traditional monitoring solutions, Datadog’s software-as-a-service (SaaS) platform was built specifically for dynamic distributed systems like Red Hat Ceph Storage. Datadog automatically aggregates data from ephemeral infrastructure components, so you can maintain constant visibility even as your infrastructure scales up or down. And because Datadog is fully hosted, it self-updates with features so you only get alerted when it matters most. Field-tested machine learning algorithms distinguish between normal and abnormal trends, and you can configure alerts to trigger only on truly urgent occurrences (e.g., the loss of a Ceph monitor rather than the loss of a single storage node). To enable data-driven collaboration and troubleshooting, Datadog automatically retains your monitoring data for more than a year and makes that data easily accessible from one central platform.

Datadog provides an out-of-the-box integration with Red Hat Ceph Storage to help you get more real-time visibility into the health and performance of your clusters with near-zero setup delay. Datadog also delivers template integrations with more than 250 other technologies, including services that are commonly used with Ceph storage, like OpenStack and Amazon S3, so you can get more comprehensive insights into every layer of your stack in one place.

We will explore a few ways in which you can use Datadog to monitor Red Hat Ceph Storage in full context with the rest of your stack. Then we’ll explain how to set up Datadog to start getting clearer insights into your Ceph deployment in three easy steps.

Key Ceph metrics at a glance

Datadog automatically collects metrics from your Red Hat Ceph Storage clusters and makes it easy to explore, visualize, and alert on this data at the cluster, pool, and node levels. The integration includes a template Ceph dashboard that displays an overview of health and performance data from your monitor and storage nodes. You can use the template variables at the top of the dashboard to filter metrics by individual clusters, pools, and Object Storage Daemons (OSDs) to get more granular insights.

Red Hat Ceph Storage includes robust features for high availability and performance and, by monitoring its built-in health checks, you can help ensure everything is running smoothly.

Datadog automatically queries Ceph for the status of these health checks, along with other key information about your nodes, including:

  • Object Storage Daemon (OSD) status: Quickly find out if an OSD is down, so you can try restarting the node or troubleshooting potential issues (e.g., networking, disk capacity).
  • Monitor status: If you’re running more than one monitor in your cluster (as recommended for high availability), Ceph requires a quorum of monitor nodes to reach a consensus about the latest version of the cluster map. Monitor nodes fall out of the quorum when they become unavailable, or when they fall behind and cannot access the latest version of the map. If your cluster cannot maintain a quorum, clients will be unable to read or write data from the cluster. With Datadog, you can track the number of available monitor nodes in your cluster, as well as the real-time quorum status (the number of monitors in the quorum). You can also set up an alert that notifies you when the number of monitors in the quorum decreases, so you can have enough time to troubleshoot the issue or deploy more nodes if needed.
  • Storage capacity: Datadog’s Red Hat Ceph Storage integration reports OSD storage capacity metrics so you can take action before any OSD runs out of disk space (at which point Ceph will stop writing data to the OSD to guard against data loss). You can set up an alert to detect when any OSD reaches a “NEARFULL” state (85 percent capacity, by default), which gives you enough time to add more OSDs, as recommended in the documentation. You can also use Datadog’s forecasting algorithms to get notified a certain amount of time before any OSD, pool, or cluster is predicted to run out of disk space.
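The arithmetic behind the monitor and capacity alerts above is simple enough to sketch. The helper names below are hypothetical, and the functions do not query Ceph or Datadog; they only illustrate that a quorum is a strict majority of the monitors and that the default “NEARFULL” threshold is 85 percent of capacity.

```shell
#!/usr/bin/env bash
# Sketch of the arithmetic behind the two alerts; helper names are hypothetical.

# Monitors needed for a quorum: a strict majority of the monitors in the cluster.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) when used/total reaches the threshold in percent (default 85).
# used and total must be integers in the same unit.
is_nearfull() {
  local used=$1 total=$2 threshold=${3:-85}
  [ $(( used * 100 )) -ge $(( total * threshold )) ]
}
```

For example, a cluster with 3 monitors needs 2 of them in the quorum, so it tolerates the loss of one monitor, and an OSD with 850 GB used out of 1000 GB has just reached the default threshold.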

Increased visibility across Ceph metrics

Datadog’s integration also reports other metrics from Ceph, including the rate of I/O operations and commit latency. See the full list of metrics collected as part of this integration in our documentation.

Although it’s important to monitor these metrics, they provide only part of the picture. In the next section, we’ll explore a few of the other ways you can use Datadog to monitor Red Hat Ceph Storage alongside all the other services in your environment.

Monitoring Ceph in context

Your infrastructure depends on Ceph for storage, but it also relies on a range of other systems, services, and applications. To help you monitor Red Hat Ceph Storage in context with other components of your stack, Datadog also integrates with more than 250 technologies, including Amazon S3 and OpenStack Nova.

Monitoring OpenStack + Ceph

If you’ve deployed Ceph on OpenStack, Datadog can help you get clearer insights into your infrastructure across multiple dimensions. Datadog’s OpenStack integration includes a default dashboard that provides a high-level overview of metrics from the hypervisors, Nova servers, tenants, and other components of your OpenStack Compute cluster.

To learn more about integrating Datadog with OpenStack, consult the documentation.

Monitoring Amazon S3 + Ceph

If you’re using Amazon S3 alongside Red Hat Ceph Storage, it’s important to track your S3 activity in real time. Datadog’s AWS S3 integration automatically collects metrics related to request throughput, HTTP errors, and latency. Upon setting up the AWS S3 integration, you’ll see all these key metrics displayed in an out-of-the-box dashboard.

To learn more about integrating Datadog with AWS S3, consult the documentation.

More visibility with APM and logs

Datadog’s distributed tracing and APM can help you monitor the performance of applications and services that use Ceph. Datadog APM is fully integrated with the rest of Datadog, so you can easily navigate from inspecting a distributed request trace to viewing system-level metrics from the specific host that executed that unit of work.

You can also use log processing and analytics to collect and monitor Ceph logs in the same place as your metrics and distributed request traces. Simply follow the configuration steps described here.

Setup guide

It only takes a few minutes to set up Datadog’s Ceph integration. The open source Datadog Agent collects data (including system-level metrics like CPU and memory usage) from all your nodes, as well as the services running on those nodes, so that you can view, correlate, and alert on this data in real time.

Installing the Agent on a node usually only takes a single command—see the instructions for your platform here. You can also deploy the Agent across your entire Ceph cluster with configuration management tools like Chef and Ansible, if desired.

Configure metric collection

To configure the Datadog Agent to collect Ceph metrics, you’ll need to create a configuration file for the integration on your Ceph nodes. The Agent comes with an example config that you can use as a template. Navigate to the “ceph.d” directory within your Agent’s configuration directory, and locate the example configuration file: “conf.yaml.example”.

Copy the example to a new “conf.yaml” file, and edit the new file to include the correct path to your Ceph executable. The Agent check expects the executable to be located at “/usr/bin/ceph”, but you can specify a different path if needed:

init_config:

instances:
  - ceph_cmd: /path/to/your/ceph # default is /usr/bin/ceph
    use_sudo: true               # only if the ceph binary needs sudo on your nodes

As the preceding example shows, you can also enable “sudo” access if it’s required to execute “ceph” commands on your nodes. If you enable the “use_sudo” option, you must also add the Datadog Agent to your sudoers file, as described in the documentation. For example:

dd-agent ALL=(ALL) NOPASSWD:/usr/bin/ceph

Restart the Agent

Save and exit the configuration file. Restart the Agent using the command for your platform (as specified here) to pick up the Agent configuration change. Then run the Agent status command to ensure that the Agent can successfully connect to Ceph and retrieve data from your cluster. When the integration is working properly, you should see a “ceph” section in the terminal output, similar to the following snippet:

  Running Checks
  ==============
    ceph (unversioned)
    ------------------
      Total Runs: 124
      Metric Samples: 27, Total: 3348
      Events: 0, Total: 0
      Service Checks: 19, Total: 2356
      Average Execution Time : 2025ms

In the Datadog platform, navigate to the Ceph integration tile of your Datadog account and click the “Install Integration” button.

Now that the Datadog Agent is collecting metrics from Ceph, you should start to see data flowing into the built-in Ceph dashboard in your Datadog account.

After you deploy Datadog, your Ceph data is available for visualization, alerting, and correlation with monitoring data from the rest of your infrastructure and applications. The template Ceph dashboard provides a high-level overview of your cluster at a glance, but you can easily customize it to highlight the information that matters most. And dashboards are just the tip of the iceberg: you can use features like the Host Map to visualize how resources are distributed across availability zones. Visit Datadog’s web site to learn more about how these and other features can help you ensure the availability and performance of Ceph and the rest of your systems.

For more detailed setup instructions and a full list of metrics collected as part of the Red Hat Ceph Storage integration, consult Datadog’s documentation.

For more information on Red Hat Ceph Storage, please visit this product page.

BlueStore: Improved performance with Red Hat Ceph Storage 3.2

Red Hat Ceph Storage 3.2 is now available! The big news with this release is full support for the BlueStore Ceph backend, offering significantly increased performance for both object and block applications.

BlueStore was first available as a Technology Preview in Red Hat Ceph Storage 3.1. Since then, Red Hat has conducted extensive performance tuning and testing to verify that BlueStore is ready for use in production environments. With the 3.2 release, Red Hat Ceph Storage has attributes that make it suitable for a wide range of use cases and workloads, including:

  • Data analytics: As a data lake, Red Hat Ceph Storage uses object storage to deliver massive scalability and high availability to support demanding multitenant analytics workloads. Disparate analytics clusters can be consolidated to reduce cost of ownership, lower administrative burden, and increase service levels. BlueStore helps improve performance, while support for erasure coding helps reduce overall storage costs for data protection over simple replication.
  • Hybrid cloud applications: Red Hat Ceph Storage is ideal for on-premises storage clouds. Because Red Hat Ceph Storage supports the Amazon Web Services (AWS) Simple Storage Service (S3) interface, applications can access their storage with the same API, whether in public or private clouds.
  • OpenStack applications: Red Hat Ceph Storage is very popular for OpenStack applications. Red Hat Ceph Storage 3.2 can offer improved performance for OpenStack deployments, including Red Hat OpenStack Platform. Erasure coding for RADOS Block Device (RBD) is available as a Technology Preview in this release.
  • Backup target: A growing list of software vendors have certified their backup applications with Red Hat Ceph Storage as a backup storage target:
    • Veritas NetBackup for Symantec OpenStorage (OST) cloud backup – versions 7.7 and 8.0  
    • Rubrik Cloud Data Management (CDM) – versions 3.2 and later  
    • NetApp AltaVault – versions 4.3.2 and 4.4  
    • Trilio TrilioVault – version 3.0
    • Veeam Backup & Replication – version 9.x

BlueStore performance

BlueStore is all about performance. For hard disk drive (HDD) based clusters, BlueStore architecturally removes the double-write penalty incurred by the traditional FileStore backend. Additionally, BlueStore provides significant performance enhancements in configurations that use all solid-state drives (SSDs) or Non-Volatile Memory Express (NVM Express, or NVMe) drives.

The architectural shift to a BlueStore backend has already shown performance improvements on community Ceph distributions. Testing by Micron in 2018 demonstrated up to 2x increases in performance with BlueStore over the traditional FileStore backend.

Micron conducted BlueStore vs. FileStore object testing and reported significant performance improvements in terms of both improved throughput and reduced latency.

4MB objects

100% writes

  • 88% increase in throughput
  • 47% decrease in average latency

70%/30% reads/writes

  • 64% increase in throughput
  • 40% decrease in average latency

Micron also conducted BlueStore vs. FileStore block testing and reported higher IOPS and lower latency.

4K random blocks

100% writes

  • 18% higher I/O operations (IOPS)
  • 5% lower average latency
  • Up to 70%+ lower 99.999th-percentile latency

70%/30% reads/writes

  • 14% higher IOPS
  • 80%+ lower read tail latency
  • 70%+ lower write tail latency
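To relate these percentages to the "up to 2x" headline figure, it can help to convert them into multipliers. The following is only an arithmetic sketch using the numbers reported above; the function names are made up for illustration.

```python
def speedup_from_increase(percent_increase: float) -> float:
    """An N% increase in throughput or IOPS is a factor of 1 + N/100."""
    return 1 + percent_increase / 100


def remaining_after_decrease(percent_decrease: float) -> float:
    """An N% decrease in latency leaves 1 - N/100 of the original value."""
    return 1 - percent_decrease / 100


if __name__ == "__main__":
    # 4MB objects, 100% writes: 88% more throughput, 47% less average latency.
    print(f"{speedup_from_increase(88):.2f}x throughput")           # 1.88x throughput
    print(f"{remaining_after_decrease(47):.2f}x average latency")   # 0.53x average latency
```

In other words, the 88% throughput gain for 4MB object writes is a factor of 1.88, which is where a figure approaching 2x comes from.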

Upgrades and new installs

Importantly, both the BlueStore and FileStore backends coexist in Red Hat Ceph Storage 3.2. Existing Red Hat Ceph Storage 2.5 and 3.1 clusters retain the FileStore backend when upgrading to version 3.2. Newly created Red Hat Ceph Storage clusters default to the BlueStore backend. Those wishing to upgrade existing clusters to the BlueStore backend should contact Red Hat Support.

For more information on how Red Hat Ceph can tackle your toughest data storage challenges, please visit our Ceph product page.

How to back up and recover Red Hat OpenShift Container Storage

By Annette Clewett and Luis Rico

The snapshot capability in Kubernetes is in tech preview at present and, as such, backup/recovery solution providers have not yet developed an end-to-end Kubernetes volume backup solution. Fortunately, GlusterFS, an underlying technology behind Red Hat OpenShift Container Storage (RHOCS), does have a mature snapshot capability. When combined with enterprise-grade backup and recovery software, a robust solution can be provided.

This blog post details how backup and restore can be done when using RHOCS via GlusterFS. As of the Red Hat OpenShift Container Platform (OCP) 3.11 release, there are a limited number of storage technologies (EBS, Google Compute Engine Persistent Disk, and hostPath) that support creating and restoring application data snapshots via Kubernetes snapshots. This Kubernetes snapshot feature is in tech preview, and the implementation is expected to change in concert with upcoming Container Storage Interface (CSI) changes. CSI, a universal storage interface (effectively an API) between container orchestrators and storage providers, is ultimately where backup and restore for OCS will be integrated in the future using the volume snapshot capability.

Traditionally, backup and restore operations involve two different layers. One is the application layer. For example, databases like PostgreSQL have their own procedures to do an application-consistent backup. The other is the storage layer. Most storage platforms provide a way for backup software like Commvault or Veritas NetBackup to integrate, obtain storage-level snapshots, and perform backups and restores accordingly. An application-layer backup is driven by application developers and is application specific. This study will focus on traditional storage-layer backup and restore using Commvault Complete™ Backup and Recovery Software for this purpose. Other backup software tools can be used in a similar manner if they supply the same capabilities as used with Commvault.

RHOCS can be deployed in either converged mode or independent mode, and both are supported by the process described in this article. Converged mode, formerly known as Container Native Storage (CNS), means that Red Hat Gluster Storage is deployed in containers and uses the OCP host storage and networking. Independent mode, formerly known as Container Ready Storage (CRS), is deployed as a stand-alone Red Hat Gluster Storage cluster that provides persistent storage to OCP containers. Both modes of RHOCS deployment use heketi in a container on OCP for provisioning and managing GlusterFS volumes.

Storage-level backup and restore for RHOCS

If a backup is performed at the Persistent Volume (PV) level, then it will not capture the OCP Persistent Volume Claim (PVC) information. OCP PVC to PV mapping is required to identify which backups belong to which application. This leaves a gap: Which GlusterFS PV goes with which OCP PVC?

In a traditional environment, this is solved by naming physical volumes such that the administrator has a way of identifying which volumes belong to which application. This naming method can now be used in OCP (as of OCP 3.9) by using custom volume naming in the StorageClass resource. Before OCP 3.9, the names of the dynamically provisioned GlusterFS volumes were auto-generated with random vol_UUID naming. Now, by adding a custom volume name prefix in the StorageClass, the GlusterFS volume name will include the OCP namespace or project as well as the PVC name, thereby making it possible to map the volume to a particular workload.

OCS custom volume naming

Custom volume naming requires a change to the StorageClass definition. Any new RHOCS persistent volumes claimed using this StorageClass will be created with a custom volume name. The custom volume name consists of the prefix, the project or namespace, the PVC name, and a UUID (<myPrefix>_<namespace>_<claimname>_UUID).

The following glusterfs-storage StorageClass has custom volume naming enabled by adding the volumenameprefix parameter.

# oc get sc glusterfs-storage -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage
parameters:
  resturl: http://heketi-storage-app-storage.apps.ocpgluster.com
  restuser: admin
  secretName: heketi-storage-admin-secret
  secretNamespace: app-storage
  volumenameprefix: gf 
provisioner: kubernetes.io/glusterfs
reclaimPolicy: Delete

The volumenameprefix parameter enables custom volume name support: <volumenameprefixstring>_<namespace>_<claimname>_UUID

As an example, using this StorageClass for a namespace of mysql1 and PVC name of mysql the volume name would be gf_mysql1_mysql_043e08fc-f728-11e8-8cfd-028a65460540 (the UUID portion of the name will be unique for each volume name).
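As a rough illustration of how such a name can be mapped back to its workload, the sketch below splits a custom volume name into its four parts. It assumes that the namespace and PVC name contain no underscores (Kubernetes naming rules do not allow them), so splitting from the right is safe even if the prefix contains one. The function and type names are hypothetical.

```python
import uuid
from typing import NamedTuple


class VolumeName(NamedTuple):
    prefix: str
    namespace: str
    claim: str
    volume_uuid: str


def parse_volume_name(name: str) -> VolumeName:
    """Split <prefix>_<namespace>_<claimname>_<UUID> into its parts."""
    parts = name.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"not a custom volume name: {name!r}")
    prefix, namespace, claim, volume_uuid = parts
    uuid.UUID(volume_uuid)  # raises ValueError if the last part is not a UUID
    return VolumeName(prefix, namespace, claim, volume_uuid)


if __name__ == "__main__":
    print(parse_volume_name("gf_mysql1_mysql_043e08fc-f728-11e8-8cfd-028a65460540"))
```

A volume that still uses the older auto-generated vol_UUID naming does not have four parts and is rejected with a ValueError.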

Note: If custom volume naming cannot be used, then it is important to collect information about all workloads using PVCs, their associated OCP PVs, and the GlusterFS volume names (contained in the Path field in the description of each OCP PV).
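One way to collect that mapping is to process the JSON printed by "oc get pv -o json". The sketch below is a minimal example, not part of the rhocs-backup scripts; it assumes each GlusterFS-backed PV carries the claim under spec.claimRef and the volume name under spec.glusterfs.path, and the sample data is invented for illustration.

```python
def pvc_to_gluster_volume(pv_list: dict) -> dict:
    """Map (namespace, PVC name) to the PV name and GlusterFS volume name."""
    mapping = {}
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        gluster = spec.get("glusterfs")
        claim = spec.get("claimRef")
        if not gluster or not claim:
            continue  # not a GlusterFS PV, or not bound to a claim
        mapping[(claim["namespace"], claim["name"])] = {
            "pv": pv["metadata"]["name"],
            "gluster_volume": gluster["path"],
        }
    return mapping


if __name__ == "__main__":
    # Invented sample in the shape of `oc get pv -o json` output.
    sample = {
        "items": [
            {
                "metadata": {"name": "pvc-1234"},
                "spec": {
                    "claimRef": {"namespace": "mysql1", "name": "mysql"},
                    "glusterfs": {"endpoints": "glusterfs-dynamic-mysql",
                                  "path": "vol_0123456789abcdef"},
                },
            }
        ]
    }
    print(pvc_to_gluster_volume(sample))
```

In practice you would load the real output with json.load and save the resulting table alongside the backups.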

RHOCS backup process

The goal of this blog post is to provide a generic method to back up and restore OCS persistent volumes used with OCP workloads. The example scripts and .ini files have no specific dependency on a particular backup and restore product. As such, they can be used with a product such as Commvault, where the scripts can be embedded in the backup configuration. Or, they can be used standalone, assuming that basic backup/recovery of the mounted gluster snapshots will be done via standard RHEL commands.

Note: The methods described here apply only to gluster-file volumes and currently will not work for gluster-block volumes.

For this approach, a “bastion host” is needed for executing the scripts, mounting GlusterFS snapshot volumes, and providing a place to install the agent if using backup and restore software. The bastion host should be a standalone RHEL7 machine separate from the OCP nodes and storage nodes in your deployment.

Requirements for the bastion host

The bastion host must have network connectivity to both the backup and restore server (if used), as well as the OCP nodes with the gluster pods (RHOCS converged mode) or the storage nodes (RHOCS independent mode). The following must be installed or downloaded to the bastion host:

  • backup and restore agent, if used
  • heketi-client package
  • glusterfs-fuse client package
  • atomic-openshift-clients package
  • rhocs-backup scripts and .ini files

RHOCS backup scripts

The GitHub repository rhocs-backup contains unsupported example code that can be used with backup and restore software products. The two scripts, rhocs-pre-backup.sh and rhocs-post-backup.sh, have been tested with Commvault Complete™ Backup and Recovery Software. The rhocs-pre-backup.sh script will do the following:

  • Find all gluster-file volumes using heketi-client
  • Create a gluster snapshot for each volume
  • Mount the snapshot volumes on the bastion host that has the backup agent installed
  • Protect the heketi configuration database by creating a json file for the database in the backup directory where all gluster snapshots are going to be mounted

Once the mounted snapshot volumes have been backed up, the rhocs-post-backup.sh script will do the following:

  • Unmount the snapshot volumes
  • Delete the gluster snapshot volumes

The two .ini files, independent_vars.ini and converged_vars.ini, are used to specify parameters specific to your RHOCS mode of deployment. Following are example parameters for converged_vars.ini.

## Environment variables for RHOCS Backup:
## Deployment mode for RHOCS cluster: converged (CNS) or independent (CRS)
export RHOCSMODE="converged"

## Authentication variables for accessing OpenShift cluster or
## Gluster nodes depending on deployment mode
export OCADDRESS="https://master.refarch311.ocpgluster.com:443"
export OCUSER="openshift"
export OCPASS="redhat"
export OCPROJECT="app-storage" ## OpenShift project where gluster cluster lives

## Any of the Gluster servers from RHOCS converged cluster
## used for mounting gluster snapshots
export GLUSTERSERVER=172.16.31.173

## Directory for temporary files to put the list of
## Gluster volumes/snaps to back up
export VOLDIR=/root
export SNAPDIR=/root

## Destination directory for mounting snapshots of Gluster volumes:
export PARENTDIR=/mnt/source

## Heketi Route and Credentials
export USERHEKETI=admin ## User with admin permissions to dialog with Heketi
export SECRETHEKETI="xzAqO62qTPlacNjk3oIX53n2+Z0Z6R1Gfr0wC+z+sGk=" ## Heketi user key
export HEKETI_CLI_SERVER=http://heketi-storage-app-storage.apps.refarch311.ocpgluster.com ## Route where Heketi pod is listening

## Provides Logging of this script in the dir specified below:
export LOGDIR="/root"
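Before running the scripts, it can be useful to sanity-check the .ini file. The sketch below reads "export NAME=value" lines into a dictionary and reports missing variables. It is an illustration only: the list of required names is an assumption based on the example above, not something the scripts document, and the parser handles only simple export lines.

```python
import shlex

# Assumed set of required variables, taken from the example file above.
REQUIRED = ("RHOCSMODE", "GLUSTERSERVER", "PARENTDIR", "HEKETI_CLI_SERVER")

SAMPLE_INI = '''\
## Deployment mode for RHOCS cluster: converged (CNS) or independent (CRS)
export RHOCSMODE="converged"
export OCPROJECT="app-storage" ## OpenShift project where gluster cluster lives
export GLUSTERSERVER=172.16.31.173
export PARENTDIR=/mnt/source
'''


def parse_export_vars(text: str) -> dict:
    """Collect NAME=value pairs from lines of the form 'export NAME=value'."""
    variables = {}
    for raw_line in text.splitlines():
        # comments=True makes shlex drop everything from '#' onward.
        tokens = shlex.split(raw_line, comments=True)
        if len(tokens) >= 2 and tokens[0] == "export" and "=" in tokens[1]:
            name, _, value = tokens[1].partition("=")
            variables[name] = value
    return variables


def missing_vars(variables: dict, required=REQUIRED) -> list:
    """Return the required names that are absent or empty."""
    return [name for name in required if not variables.get(name)]


if __name__ == "__main__":
    parsed = parse_export_vars(SAMPLE_INI)
    print(parsed)
    print("missing:", missing_vars(parsed))
```

A comment that wraps onto its own line without a leading "##" is simply ignored by this parser, but the shell would try to execute it when the file is sourced, so such lines are worth fixing in the file itself.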

The pre-backup script, when executed, uses the heketi-client to get the list of current gluster-file volumes. Because of this, the heketi container must be online and reachable from the bastion host for the script to work properly. Additionally, all GlusterFS nodes or peers of the RHOCS cluster must be online, because a GlusterFS snapshot operation requires that all bricks of a GlusterFS volume be available.

Manual execution of pre- and post-backup scripts

This section assumes that the bastion host has been created and that the necessary packages, scripts, and .ini files are installed on it. Currently, the pre- and post-backup scripts run as the root user. Because of this, backing up volumes for RHOCS independent mode requires that the bastion host can SSH as the root user with passwordless access to one of the GlusterFS storage nodes. This access should be verified before attempting to run the following scripts.

The scripts can be manually executed in the following manner for RHOCS converged mode:

sudo ./rhocs-pre-backup.sh /<path_to_file>/converged_vars.ini

Followed by this script to unmount the snapshot volumes and to remove the snapshot volumes from the RHOCS Heketi database and GlusterFS converged cluster:

sudo ./rhocs-post-backup.sh /<path_to_file>/converged_vars.ini

A variation of these scripts for RHOCS independent mode can be manually executed in the following manner:

sudo ./rhocs-pre-backup.sh /<path_to_file>/independent_vars.ini

Followed by this script to unmount the snapshot volumes and to remove the snapshot volumes from the RHOCS Heketi database and GlusterFS independent cluster:

sudo ./rhocs-post-backup.sh /<path_to_file>/independent_vars.ini

For each execution of the pre- or post-backup script a log file will be generated and placed in the directory specified in the .ini file (default is /root).

Note: Pre-backup scripts can be modified as needed for specific scenarios to achieve application-level consistency, like quiescing a database before taking a backup. Also, if special features are used with RHOCS, like SSL encryption or geo-replication, scripts will have to be customized and adjusted to be compatible with those features.

Commvault backup process

Note that this blog post does not cover the tasks to install and configure Commvault to back up and restore data. In addition to having the Commvault Console and Agent in working order, this section also assumes that the bastion host has been created and has the necessary packages, scripts, and .ini files installed.

The use of these scripts to back up OCP PVs is compatible with any backup frequency or retention configured in the Commvault backup policy. However, because we are mounting gluster snapshots in newly created folders with date and time information, the backup application will always consider the contents new, so even if the backup policy is incremental, it will effectively do a full backup. Also, the backup content will consist of dozens or even hundreds of very small filesystems (1-10 GB), which could back up faster under an “always do full backup” strategy.
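To see why every run looks new to the backup software, consider how the mount path changes between runs. The sketch below reproduces the directory layout that appears in the restore example later in this post (backup-<date>-<time>/<volume>-snap-<date>-<time>); the layout is inferred from that example, and the function is illustrative rather than taken from the scripts.

```python
import posixpath
from datetime import datetime


def snapshot_mount_path(parent_dir: str, volume_name: str, taken_at: datetime) -> str:
    """Build the per-run mount directory for one gluster snapshot volume."""
    stamp = taken_at.strftime("%Y%m%d-%H%M")
    return posixpath.join(parent_dir, f"backup-{stamp}", f"{volume_name}-snap-{stamp}")


if __name__ == "__main__":
    volume = "ocscon_mysql3_mysql_f3610b3f-f122-11e8-b862-028a65460540"
    first = snapshot_mount_path("/mnt/source", volume, datetime(2018, 12, 3, 17, 32))
    second = snapshot_mount_path("/mnt/source", volume, datetime(2018, 12, 4, 17, 32))
    print(first)
    print(second)
    print("same path on both runs:", first == second)
```

Because the path differs on every run, a file-level incremental backup has nothing from the previous run to compare against.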

Detailed process for backup using Commvault

Once the scripts and .ini files are on the bastion host and a Commvault Agent is installed, a backup can be done using the Commvault Commcell Console or the Commvault Admin Console. The following views show how to do the backup using the Commcell Console and validate the backup using the Admin Console.

A Subclient must be created before a backup can be done and a unique name must be specified.

When creating a Subclient, you must specify the directory on the bastion host (where the Commvault Agent is installed) that you want backed up (e.g., /mnt/source).

Choose what schedule you want the backup done on (or Do Not Schedule; start backup manually using Console instead).

Last, for the Subclient configuration, add the path to the pre- and post-backup scripts, as well as the path to the appropriate .ini file, converged_vars.ini or independent_vars.ini. Once this is done and the subclient has been saved, you are ready to take a backup of the gluster-file snapshot volumes.

The backup can then be done by selecting the desired subclient and issuing an immediate backup or letting the selected schedule do the backups when configured (e.g., daily).

For an immediate backup, you can choose full or incremental. As already stated, a full backup will be done every time, because the pre-backup script always creates a new directory to mount the gluster-file snapshot volume.

You can track backup progress using the Job Controller tab.

Once the Job Controller tab of the Commvault Commcell Console shows that the backup is complete, you can verify it by logging into the Commvault Admin Console, selecting the correct subclient (ocsbackups), and viewing the backup content for the GlusterFS volumes and the heketi database.

RHOCS restore and recovery process

Now that there are backups for RHOCS snapshot volumes, it is very important to have a process for restoring the snapshot from any particular date and time. Once data is restored, it can be copied back into the OCP PV for a target workload in a way that avoids conflicts (e.g., copying files into a running workload at the same time that updates are being made to the same files).

To do this, we will use the CLI command “oc rsync” and an OCP “sleeper” deployment. You can use the command “oc rsync” to copy local files to or from a remote directory in a container. The basic syntax is “oc rsync <source> <destination>”. The source can be a local directory on the bastion host or a directory in a running container, and the same is true for the destination.

In the case where the data in a PV must be completely replaced, it is useful to use a “sleeper” deployment as a place to restore the data so that the workload can essentially be turned off while the data is being restored to its volume, thereby avoiding any conflicts. The sleeper deploymentconfig will create a CentOS container and mount the workload PVC at a configured mount point in the CentOS container. This allows the backup gluster snapshot for the PV to then be copied from the directory where the snapshot was restored into the sleeper container (e.g., oc rsync <path to directory with restored data>/ sleeper-1-cxncv:/mnt --delete=true).

Simple restore or file-level recovery

This section details how to restore files or folders from a backup of a particular volume to a local working directory on the bastion host.

  1. Identify the desired backup of the gluster snapshot volume by date and time and volume name (see the following in Commvault subclient Restore view). Find the folder or files you want to restore, and check the appropriate boxes.

2. Restore the files to the same directory where the backup was taken or to any other directory on the bastion host.

3. Verify that the files are in the specified directory for the Commvault Restore. They can now be copied into the destination pod using “oc rsync” or the method described in the next section using a “sleeper” deployment.

$ pwd
/home/ec2-user/annette
$ ls -ltr test*
-rw-r-----. 1 1000470000 2002  8590 Dec 3 17:19 test.frm
-rw-r-----. 1 1000470000 2002 98304 Dec  3 17:19 test.ibd

Complete restore and recovery

This section details how to restore an entire volume. The process tested here will work for operational recovery (volumes with corrupted data), instances where volumes were inadvertently deleted, or to recover from infrastructure failures. The following example is for a MySQL database deployed in OCP.

  1. Identify the desired backup by date, time, and volume name (see the backup directory that is checked in Commvault Subclient Restore view).

2. Restore the backup to the original directory where the backup of the mounted gluster snapshot volume was taken, or to any other directory on the bastion host.

# ls /mnt/source/backup-20181203-1732/ocscon_mysql3_mysql_f3610b3f-f122-11e8-b862-028a65460540-snap-20181203-1732
auto.cnf    ca.pem           client-key.pem  ibdata1      ib_logfile1  mysql              mysql_upgrade_info  private_key.pem  sampledb         server-key.pem
ca-key.pem  client-cert.pem  ib_buffer_pool  ib_logfile0  ibtmp1       mysql-1-gs92m.pid  performance_schema  public_key.pem   server-cert.pem  sys

3. Change to the correct namespace or project (oc project mysql3).

4. Scale the mysql deploymentconfig to zero to temporarily stop the database service; this deletes the mysql pod (oc scale --replicas=0 dc mysql).

5. Create a sleeper deployment/pod (oc create -f dc-sleeper.yaml). The YAML file to create the “sleeper deployment/pod” can be found in the next section.

6. Copy the backup to the PVC mounted in the sleeper pod (oc rsync <path to directory with restored data>/ sleeper-1-cxncv:/mnt --delete=true).

# oc rsync /mnt/source/backup-20181203-1732/ocscon_mysql3_mysql_f3610b3f-f122-11e8-b862-028a65460540-snap-20181203-1732/ sleeper-1-f9lgh:/mnt --delete=true

Note: Disregard this message: WARNING: cannot use rsync: rsync not available in container.

7. Delete sleeper deploymentconfig (oc delete dc/sleeper)

8. Scale up the mysql deploymentconfig to recreate the mysql pod and start the service again. This will mount the mysql volume with the restored data (oc scale --replicas=1 dc mysql).

9. Log in to the mysql pod and confirm the correct operation of the database with the restored data.
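The command sequence in steps 3 through 8 can be summarized as a small planning helper. The sketch below only builds and prints the oc commands; it does not run them. The function name is hypothetical, the sleeper pod name must be looked up after the pod starts (for example with oc get pods), and the default file name assumes the sleeper YAML was saved as dc-sleeper.yaml.

```python
import shlex


def restore_plan(project: str, dc_name: str, restored_dir: str,
                 sleeper_pod: str, sleeper_yaml: str = "dc-sleeper.yaml") -> list:
    """Return the oc commands for a complete volume restore, in order."""
    # A trailing slash tells rsync to copy the directory's contents, not the directory.
    source = restored_dir.rstrip("/") + "/"
    return [
        ["oc", "project", project],
        ["oc", "scale", "--replicas=0", "dc", dc_name],       # stop the workload
        ["oc", "create", "-f", sleeper_yaml],                 # start the sleeper pod
        ["oc", "rsync", source, f"{sleeper_pod}:/mnt", "--delete=true"],
        ["oc", "delete", "dc/sleeper"],                       # release the PVC
        ["oc", "scale", "--replicas=1", "dc", dc_name],       # restart the workload
    ]


if __name__ == "__main__":
    plan = restore_plan(
        project="mysql3",
        dc_name="mysql",
        restored_dir="/mnt/source/backup-20181203-1732/"
                     "ocscon_mysql3_mysql_f3610b3f-f122-11e8-b862-028a65460540"
                     "-snap-20181203-1732",
        sleeper_pod="sleeper-1-f9lgh",
    )
    for command in plan:
        print(shlex.join(command))
```

Keeping the plan as data makes it easy to review the exact commands before running them by hand on the bastion host.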

Creating the Sleeper DeploymentConfig

Following is the YAML file to create the sleeper deployment (oc create -f dc-sleeper.yaml). This deployment must be created in the same namespace or project as the workload you are trying to restore data to (e.g., the mysql deployment).

Note: The only modification needed for this YAML file is to specify the correct <pvc_name> below (e.g., mysql).

$ cat dc-sleeper.yaml
---
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  annotations:
    template.alpha.openshift.io/wait-for-ready: "true"
  name: sleeper
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    name: sleeper
  strategy:
    activeDeadlineSeconds: 21600
    recreateParams:
      timeoutSeconds: 600
    resources: {}
    type: Recreate
  template:
    metadata:
      labels:
        name: sleeper
    spec:
      containers:
        - image: centos:7
          imagePullPolicy: IfNotPresent
          name: sleeper
          command: ["/bin/bash", "-c"]
          args: ["sleep infinity"]
          volumeMounts:
            - mountPath: /mnt
              name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 5
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: <pvc_name>
  test: false
  triggers:
    - type: ConfigChange

MySQL database point-in-time recovery example

To make sure there are no updates during the gluster-file snapshot, the following must be done for a mysql volume so that the backup is consistent.

  1. Log in to the mysql pod.
  2. Log in to mysql (mysql -u root).
  3. mysql> USE sampledb;
  4. mysql> FLUSH TABLES WITH READ LOCK;
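  5. Keep this mysql session open (the read lock is released when the session ends) and, from the bastion host, take a gluster snapshot of the mysql volume by executing the rhocs-pre-backup.sh script manually for the correct RHOCS mode.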
  6. mysql> UNLOCK TABLES;
  7. Remove the pre- and post-backup scripts in the Advanced tab for the Commvault subclient.
  8. Take a backup of the snapshot volume.
  9. Execute the rhocs-post-backup.sh script manually for the correct RHOCS mode (unmount and delete the gluster snapshot volumes).
  10. Continue with step 2 in the “Complete restore and recovery” section.
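The lock and unlock in steps 4 through 6 must happen in the same database session, with the snapshot taken in between. A minimal sketch of that pattern follows. It assumes a DB-API style cursor whose connection stays open for the duration of the block (the choice of MySQL driver is left open), and the RecordingCursor class is only a stand-in so that the example runs without a database.

```python
from contextlib import contextmanager


@contextmanager
def global_read_lock(cursor):
    """Hold FLUSH TABLES WITH READ LOCK while the body of the with-block runs."""
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        yield
    finally:
        # Always release the lock, even if taking the snapshot failed.
        cursor.execute("UNLOCK TABLES")


class RecordingCursor:
    """Stand-in for a real cursor: records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, statement):
        self.statements.append(statement)


if __name__ == "__main__":
    cursor = RecordingCursor()
    with global_read_lock(cursor):
        # This is where rhocs-pre-backup.sh would be run to take the snapshot.
        cursor.statements.append("<snapshot taken here>")
    print(cursor.statements)
```

With a real connection, the with-block guarantees that UNLOCK TABLES is issued as soon as the snapshot step finishes, which keeps the window during which writes are blocked as short as possible.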

Backup scripts on GitHub

Scripts and .ini files can be found here. You are more than welcome to participate in this effort and improve the scripts and process.

Want to learn more about Red Hat OpenShift Container Storage?

Get a more intimate understanding of how Red Hat OpenShift and OCS work together with a hands-on test drive, and see for yourself.

Still want to learn more? Check out the Red Hat OpenShift Container Storage datasheet.

KubeCon Seattle, here we come!

Our top 3 storage-for-containers things to look forward to at KubeCon

By Steve Bohac, OpenShift Storage Product Marketing

Season’s greetings!

As always, there is much going on with Red Hat OpenShift Container Storage!

Of course, Kubernetes 1.13 was released this week, Container Journal recently published an article I authored, and KubeCon Seattle is coming up next week… By the way, did you see the latest Forrester Wave Enterprise Container Platform Software Suites where Red Hat OpenShift was named a Leader? Good stuff!

Red Hat OpenShift Container Storage helps organizations standardize storage across multiple environments and easily integrates with Red Hat OpenShift to deliver a persistent storage layer for containerized applications that require long-term, stateful storage. Enterprises can benefit from a simple, integrated solution including the container platform, registry, application development environment, and storage, all in one and supported by a single vendor.

December is always a busy month with industry conferences (not to mention holiday planning!), so as I finalized my own KubeCon plans, I wanted to pause and take a quick breath and outline my top 3 things I’m looking forward to at KubeCon Seattle 2018 next week:

  1. Assorted Kubernetes announcements (whatever they are!). Yes, who knows what kind of interesting things will be announced next week… but they’ll likely be exciting! The Kubernetes ecosystem has gotten so large now, there is always a plethora of interesting products and technologies announced at KubeCon. It’s always interesting to see how these new announcements dictate where things are going with Kubernetes and cloud native technologies in general. (By the way, for a great overview of the “third era” of Kubernetes, check out PodCTL #54 with our own Brian Gracely and Tyler Britten.)
  2. For the first time ever, there will be a Cloud Native Storage Day as one of the co-located events at KubeCon. Like the other co-located events, it takes place next Monday before the KubeCon show officially kicks off. The day’s agenda includes customers and industry leaders like Red Hat (I’ll be there with a few colleagues presenting) discussing current implementations and future directions of container storage. This should be very educational and interactive for everyone! And the sessions will be recorded (check back here after the show for a post-KubeCon blog with links to the recordings!).
  3. Catching up on the status of the Rook project. What is Rook? Rook is a persistent storage orchestrator that is designed to run as a native Kubernetes service. Consider it the glue between storage and the container, the thing that makes automation work. This is an interesting development around storage for containers, and I’m looking forward to meeting up with colleagues and “fellow travelers” to understand more.

Anyway, it should be a good one at KubeCon next week (did I mention it is sold out!?). In between sessions, make sure to visit us in Booth D1 in the Expo Hall for product demonstrations, to speak with Red Hat OpenShift Container Storage experts and other community leaders about upstream projects, and to snag some of our giveaways (while supplies last!).

We hope to see you there! If we don’t catch you in person, we’ll be tweeting (and re-tweeting) all week! If you don’t already, make sure to follow us on Twitter at @RedHatStorage.

Not attending KubeCon? No sweat! You can still learn more and get hands on with a more intimate understanding of how Red Hat OpenShift and OpenShift Container Storage work together with a test drive.

Still want to learn more? Check out the Red Hat OpenShift Container Storage datasheet.

Red Hat Hyperconverged Infrastructure for Virtualization delivers increased efficiencies for storage and compute at the edge

Customers can realize more value and greater simplicity with cost-effective, open source, integrated compute and storage delivered in a compact footprint

By Daniel Gilfix, Red Hat Cloud Storage and Hyperconverged Infrastructure

Hyperconverged Infrastructure (HCI) emerged as an infrastructure category about a decade ago aimed at a few specific use cases and has been dominated by proprietary software vendors offering appliances built on their hardware, or rigid configurations delivered with OEM hardware partners.

What’s new?

Today we announced the next iteration of our enterprise-grade, open source approach in this space: Red Hat Hyperconverged Infrastructure for Virtualization 1.5, which benefits from the combined strength of Red Hat Enterprise Linux, Red Hat Virtualization, Red Hat Gluster Storage, and Red Hat Ansible Automation.

Where’s the beef?

Red Hat Hyperconverged Infrastructure for Virtualization (RHHI-V) is an optimized, hyperconverged infrastructure (HCI) that has helped organizations across industries like energy, retail, banking, telco, and the public sector make the most of business-critical applications that must be deployed with limited space, budget, and IT staff, including departmental and lines of business ops, remote sites, and development and test environments. Integration with Red Hat Ansible Automation helps reduce manual errors normally associated with downtime while enabling a more streamlined and speedy deployment. Simplified administration via a single user interface means you can consolidate your infrastructure and adopt a software-defined datacenter more efficiently. Such adoption includes using RHHI-V in lieu of a more expensive VMware “lock-in” environment or transitioning from it under professional guidance with the Red Hat infrastructure migration solution.

What’s inside?

Red Hat Hyperconverged Infrastructure for Virtualization 1.5 now features advanced data reduction capabilities for even greater efficiencies, as well as a series of validated server configurations for optimized workloads to reduce or eliminate the guesswork in infrastructure deployment. Details follow:

  • Data reduction via deduplication and compression. Made possible through embedded Virtual Data Optimizer (VDO) code in Red Hat Enterprise Linux, you can now efficiently eliminate duplicate instances of repeating data and compress the reduced data set. This results in improved storage utilization and enables more affordable high-performance storage options.
  • Virtual graphics processing unit (vGPU). With the vGPU capability, you can assign GPU slices to VMs to accelerate 3D graphics and to offload computationally heavy jobs, including applications in computational science, workloads in oil and gas and manufacturing, as well as emerging AI and machine learning applications processing.
  • Open Virtual Network support. Support for software-defined networking via Open Virtual Network (OVN) helps improve scalability while enabling live migration of virtual networking components in a hyperconverged Linux environment.
  • Deep Ansible integration. Red Hat Ansible Automation enables true “ops value” at deploy and runtime, thereby paving the way toward your broader automation goals. We also deliver Ansible playbooks to enable remote replication and recovery of RHHI-V environments.
  • Validated hardware configurations. To help ensure RHHI-V users deploy sound infrastructure configurations, Red Hat has tested a number of use cases with our hardware partners and documented configuration guidelines for optimized workloads. These configurations, along with our new RHHI-V sizing tool, can help you anticipate platform requirements based on your usage patterns, taking the guesswork out of deploying a software-defined HCI platform and reducing time to value. You can choose among industry-standard hardware and enjoy more predictable performance for your desired deployment patterns.

Who benefits?

While RHHI-V was initially targeted at remote office/branch office deployment, we’ve experienced steadily increasing demand to support more mission-critical applications, such as remote tactical operations for public sector, field analysis and oil rig operations in the energy sector, and managing data from a myriad of sensors in factories across both process and discrete manufacturing. Now integrated even more broadly across the Red Hat software stack, RHHI-V is a powerful, general purpose platform for anyone seeking to jumpstart edge computing or modernize their existing data center to accommodate new workloads with greater degrees of efficiency. 

How can you learn more?

For more information on Red Hat Hyperconverged Infrastructure for Virtualization, check out this article by Storage Switzerland. Feel free to also attend our upcoming webinar on December 11. You can always simply access us on the web.

 

Five reasons you need to change your data storage—now

By Terry L. Smith, Senior Director, Penguin Computing’s Advanced Solutions Group

Transformation of the data storage industry in recent years has been dramatic. We’ve seen the development of new component technologies, yielding higher capacities and performance. But more profound is the general acceptance that the old, proprietary, monolithic approach to storage simply cannot keep up with business needs. Open, software-defined storage delivers a flexible, cost-efficient alternative to traditional storage appliances while being better able to handle the demands of modern workloads.

Penguin Computing and Red Hat together deliver comprehensive, open, software-defined storage solutions, expertly architected and configured to meet your business requirements.

But why should you consider complementing your existing monolithic appliance storage with a software-defined approach?  I see five key reasons:

  1. Your data storage requirements keep growing, but traditional storage appliances are not built to handle them.
    There is only so much scaling up you can do with a traditional storage appliance. To keep up with your growing data storage needs, you find yourself in a cycle of “upgrades by replacement.” This is a huge capital burden, exacerbated by additional costs to license and support both the old and new systems during the upgrade migration. Worse still, you may even need to “upgrade” before the appliance’s expected end-of-life, when it would be fully amortized. With an open, scale-out, software-defined storage solution, you can take control. Built with industry-standard server technologies, you can scale out your open storage in manageable units and replace hardware only when needed. You can control your storage growth in a way that cost-effectively meets your needs, not the needs of the vendor.
  2. Your data storage solution should be feature-rich and flexible.
    With traditional storage appliances, your options for capacity and performance may be severely limited. And, other features, like advanced data protection and access protocol support, may be unavailable or require additional licensing. You may even be required to purchase a completely new appliance. But open, software-defined storage solutions empower you with features and flexibility out-of-the-box, often with all-inclusive software pricing. Hardware, software, and support can be decoupled, giving you the ability to work with vendors of your choice and sculpt the cost-effective solution that fits your business needs.
  3. You should have control of your storage support costs.
    Most traditional storage vendors have a business model based on volume of units sold. A “next-generation” box comes out on a regular schedule, and customers are expected to purchase the “upgrade.” To encourage this, traditional storage vendors often keep raising the cost of support for the older appliance. And, if you stop paying for support, the appliance may even stop working. So, you end up buying the new box, even if the older appliance is still capable of meeting your needs. Open storage solutions let you decide how and when to handle hardware and software upgrades. In fact, you could use a rolling upgrade, where old industry standard server equipment is replaced by new equipment as needed and the software subscription is rolled over from the old equipment. This helps eliminate the traditional “migration” concept and its  associated costs. And enterprise-level software support is typically at a flat, predictable rate, which is often lower than the average support cost for proprietary, traditional storage appliances.
  4. You can avoid vendor lock-in and keep your options open.
    Most traditional storage vendors count on locking you into their ecosystems, limiting your upgrade and support options. You can even be legally restricted from making any changes to the storage appliance, such as buying disks directly from disk vendors, to better meet your business requirements or keep using it past the expiration of the support contract. Open technology solutions free you from these limitations and restrictions. If you’re comfortable working directly with the open source and can support it, you may even replace or modify the software layer to meet your own, specific requirements without getting permission from anyone. The message here is clear: You are in control of your options.
  5. You can be ready for the next industry shift, like hybrid-cloud computing with open, software-defined storage.
    Most traditional storage vendors rely on costly and often small development teams who lack the scale to keep up with changing business needs. Open technology solutions, however, generally are created by some of the largest development communities in the world with guidance, vetting, and end-customer support delivered by world-class solution providers who understand business. The result is that open technologies can deliver reliable, feature-rich solutions capable of meeting your business needs now and in the future.

Penguin Computing and Red Hat have been bringing open technology solutions to enterprises for over two decades. With Penguin Computing’s FrostByte family of software-defined storage solutions, featuring Red Hat Ceph Storage and Red Hat Gluster Storage, businesses can break free of the traditional storage appliance without giving up enterprise-quality hardware, software, and services.

You can learn more about Penguin FrostByte with Red Hat Gluster Storage here and Penguin FrostByte with Red Hat Ceph Storage here.

About Terry
Terry L. Smith is senior director of Penguin Computing’s Advanced Solutions Group (ASG). Terry came to Penguin Computing in 2014 with a history of entrepreneurship and deep technical expertise. Launched in 2017, Terry’s group has opened new markets with solutions featuring advanced technologies and designed with world-class partnerships. This includes the FrostByte family of software-defined storage solutions featuring Red Hat Storage. One of ASG’s successes features FrostByte with Red Hat Gluster Storage delivered as an ongoing service for a Fortune 500 financial services institution.

Running OpenShift Container Storage 3.10 with Red Hat OpenShift Container Platform 3.10

By Annette Clewett and Jose A. Rivera

With the release of Red Hat OpenShift Container Platform 3.10, we’ve officially rebranded what used to be referred to as Red Hat Container-Native Storage (CNS) as Red Hat OpenShift Container Storage (OCS). Versioning remains sequential (i.e., OCS version 3.10 is the follow-on to CNS 3.9). You’ll continue to have the convenience of deploying OCS 3.10 as part of the normal OpenShift deployment process in a single step, and an OpenShift Container Platform (OCP) evaluation subscription has access to OCS evaluation binaries and subscriptions.

OCS 3.10 introduces an important feature for container-based storage with OpenShift. Arbiter volume support allows for there to be only two replica copies of the data, while still providing split-brain protection and ~30% savings in storage infrastructure versus a replica-3 volume. This release also hardens block support for backing OpenShift infrastructure services. Detailed information on the value and use of OCS 3.10 features can be found here.

OCS 3.10 installation with OCP 3.10 Advanced Installer

Let’s now take a look at the installation of OCS with the OCP Advanced Installer. OCS can provide persistent storage for both OCP’s infrastructure applications (e.g., integrated registry, logging, and metrics) and general application data consumption. Typically, both options are used in parallel, resulting in two separate OCS clusters being deployed in a single OCP environment. It’s also possible to use a single OCS cluster for both purposes.

Following is an example of a partial inventory file with selected options concerning deployment of OCS for applications and an additional OCS cluster for infrastructure workloads like registry, logging, and metrics storage. When using these options for your deployment, values with specific sizes (e.g., openshift_hosted_registry_storage_volume_size=10Gi) or node selectors  (e.g., node-role.kubernetes.io/infra=true) should be adjusted for your particular deployment needs.

If you’re planning to use gluster-block volumes for logging and metrics, they can now be installed when OCP is installed. (Of course, they can also be installed later.)

[OSEv3:children]
...
nodes
glusterfs
glusterfs_registry

[OSEv3:vars]
...      
# registry
openshift_hosted_registry_storage_kind=glusterfs       
openshift_hosted_registry_storage_volume_size=10Gi   
openshift_hosted_registry_selector="node-role.kubernetes.io/infra=true"

# logging
openshift_logging_install_logging=true
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=50Gi
openshift_logging_es_cluster_size=3
openshift_logging_es_pvc_storage_class_name='glusterfs-registry-block'
openshift_logging_kibana_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_curator_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

# metrics
openshift_metrics_install_metrics=true
openshift_metrics_storage_kind=dynamic
openshift_metrics_storage_volume_size=20Gi
openshift_metrics_cassandra_pvc_storage_class_name='glusterfs-registry-block'
openshift_metrics_hawkular_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_cassandra_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_heapster_nodeselector={"node-role.kubernetes.io/infra": "true"}

# Container image to use for glusterfs pods
openshift_storage_glusterfs_image="registry.access.redhat.com/rhgs3/rhgs-server-rhel7:v3.10"

# Container image to use for gluster-block-provisioner pod
openshift_storage_glusterfs_block_image="registry.access.redhat.com/rhgs3/rhgs-gluster-block-prov-rhel7:v3.10"

# Container image to use for heketi pods
openshift_storage_glusterfs_heketi_image="registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7:v3.10"
 
# OCS storage cluster for applications
openshift_storage_glusterfs_namespace=app-storage
openshift_storage_glusterfs_storageclass=true
openshift_storage_glusterfs_storageclass_default=false
openshift_storage_glusterfs_block_deploy=false   

# OCS storage cluster for OpenShift infrastructure
openshift_storage_glusterfs_registry_namespace=infra-storage  
openshift_storage_glusterfs_registry_storageclass=false       
openshift_storage_glusterfs_registry_block_deploy=true   
openshift_storage_glusterfs_registry_block_host_vol_create=true    
openshift_storage_glusterfs_registry_block_host_vol_size=200   
openshift_storage_glusterfs_registry_block_storageclass=true
openshift_storage_glusterfs_registry_block_storageclass_default=false

...
[nodes]
ose-app-node01.ocpgluster.com openshift_node_group_name="node-config-compute"
ose-app-node02.ocpgluster.com openshift_node_group_name="node-config-compute"
ose-app-node03.ocpgluster.com openshift_node_group_name="node-config-compute"
ose-app-node04.ocpgluster.com openshift_node_group_name="node-config-compute"
ose-infra-node01.ocpgluster.com openshift_node_group_name="node-config-infra"
ose-infra-node02.ocpgluster.com openshift_node_group_name="node-config-infra"
ose-infra-node03.ocpgluster.com openshift_node_group_name="node-config-infra"

[glusterfs]
ose-app-node01.ocpgluster.com glusterfs_zone=1 glusterfs_devices='[ "/dev/xvdf" ]'   
ose-app-node02.ocpgluster.com glusterfs_zone=2 glusterfs_devices='[ "/dev/xvdf" ]'
ose-app-node03.ocpgluster.com glusterfs_zone=3 glusterfs_devices='[ "/dev/xvdf" ]'
ose-app-node04.ocpgluster.com glusterfs_zone=1 glusterfs_devices='[ "/dev/xvdf" ]'

[glusterfs_registry]
ose-infra-node01.ocpgluster.com glusterfs_zone=1 glusterfs_devices='[ "/dev/xvdf" ]'
ose-infra-node02.ocpgluster.com glusterfs_zone=2 glusterfs_devices='[ "/dev/xvdf" ]'
ose-infra-node03.ocpgluster.com glusterfs_zone=3 glusterfs_devices='[ "/dev/xvdf" ]'

Inventory file options explained

The first section of the inventory file defines the host groups the installation will be using. We’ve defined two new groups: (1) glusterfs and (2) glusterfs_registry. The settings for these groups start with openshift_storage_glusterfs_ and openshift_storage_glusterfs_registry_, respectively. In each group, the nodes that will make up the OCS cluster are listed, and the devices ready for exclusive use by OCS are specified (glusterfs_devices=).

The first group of hosts in glusterfs specifies a cluster for general-purpose application storage and will, by default, come with the StorageClass glusterfs-storage to enable dynamic provisioning. For high availability of storage, it’s very important to have four nodes for the general-purpose application cluster, glusterfs.
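As a quick sanity check before running the installer, the node count and zone spread of a host group can be read straight from the inventory text. The following is a minimal sketch, not part of openshift-ansible: the helper names hosts_in_group and summarize are hypothetical, and the parser only understands the simple "host key=value" layout used in this post.

```python
from collections import Counter

# Excerpt of the inventory shown in this post (registry group shortened).
INVENTORY = """\
[glusterfs]
ose-app-node01.ocpgluster.com glusterfs_zone=1 glusterfs_devices='[ "/dev/xvdf" ]'
ose-app-node02.ocpgluster.com glusterfs_zone=2 glusterfs_devices='[ "/dev/xvdf" ]'
ose-app-node03.ocpgluster.com glusterfs_zone=3 glusterfs_devices='[ "/dev/xvdf" ]'
ose-app-node04.ocpgluster.com glusterfs_zone=1 glusterfs_devices='[ "/dev/xvdf" ]'

[glusterfs_registry]
ose-infra-node01.ocpgluster.com glusterfs_zone=1 glusterfs_devices='[ "/dev/xvdf" ]'
"""


def hosts_in_group(inventory_text, group):
    """Return {hostname: zone} for the hosts listed under [group]."""
    hosts = {}
    current = None
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # entering a new group section
            continue
        if current != group:
            continue
        hostname, _, rest = line.partition(" ")
        zone = None
        for token in rest.split():
            if token.startswith("glusterfs_zone="):
                zone = int(token.split("=", 1)[1])
        hosts[hostname] = zone
    return hosts


def summarize(hosts):
    """Return (number of nodes, {zone: nodes in that zone})."""
    return len(hosts), dict(Counter(hosts.values()))


app_hosts = hosts_in_group(INVENTORY, "glusterfs")
node_count, zone_spread = summarize(app_hosts)
print(node_count, zone_spread)
```

For the application cluster above, this reports four nodes spread over three zones, which matches the four-node recommendation.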

The second group, glusterfs_registry, specifies a cluster that will host a single, statically deployed PersistentVolume for use exclusively by a hosted registry that can scale. This cluster will not offer a StorageClass for file-based PersistentVolumes with the options and values as they are currently configured (openshift_storage_glusterfs_registry_storageclass=false). This cluster will also support gluster-block (openshift_storage_glusterfs_registry_block_deploy=true). PersistentVolume creation can be done via StorageClass glusterfs-registry-block (openshift_storage_glusterfs_registry_block_storageclass=true). Special attention should be given to choosing the size for openshift_storage_glusterfs_registry_block_host_vol_size. This is the hosting volume for gluster-block devices that will be created for logging and metrics. Make sure that the size can accommodate all these block volumes and that you have sufficient storage if another hosting volume must be created.
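The sizing check for the block hosting volume can be sketched as simple arithmetic using the values from the example inventory. This is a rough estimate under stated assumptions: one PVC per Elasticsearch node, a single Cassandra PVC for metrics, and no allowance for any space the hosting volume may reserve for itself. The function name is hypothetical.

```python
def required_block_capacity_gib(es_cluster_size, es_pvc_size_gib, metrics_pvc_size_gib):
    """Total GiB of gluster-block volumes requested by logging and metrics."""
    return es_cluster_size * es_pvc_size_gib + metrics_pvc_size_gib


# Values taken from the example inventory above.
needed = required_block_capacity_gib(
    es_cluster_size=3,        # openshift_logging_es_cluster_size
    es_pvc_size_gib=50,       # openshift_logging_es_pvc_size
    metrics_pvc_size_gib=20,  # openshift_metrics_storage_volume_size
)
host_vol_size = 200           # openshift_storage_glusterfs_registry_block_host_vol_size
headroom = host_vol_size - needed

print(f"needed={needed}GiB host_vol={host_vol_size}GiB headroom={headroom}GiB")
```

With these values the requested block volumes add up to 170 GiB, leaving 30 GiB of headroom in the 200 GiB hosting volume.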

If you want to tune the installation, more options are available in the Advanced Installation. To automate the generation of required inventory file options as shown previously, check out this newly available red-hat-storage tool called “CNS Inventory file Creator” or CIC. The CIC tool creates CNS or OCS inventory file options for OCP 3.9 and OCP 3.10, respectively. CIC will ask a series of questions about the OpenShift hosts, the storage devices, and the sizes of PersistentVolumes for registry, logging, and metrics, and has baked-in checks to make sure the OCP installation will be successful. This tool is currently in an alpha state, and we’re looking for feedback. Download it from the github repository openshift-cic.

Single OCS cluster installation

Again, it is possible to support both general-application storage and infrastructure storage in a single OCS cluster. To do this, the inventory file options will change slightly for logging and metrics. This is because when there is only one cluster, the gluster-block StorageClass would be glusterfs-storage-block. The registry PV will be created on this single cluster if the second cluster, [glusterfs_registry], does not exist. For high availability, it’s very important to have four nodes for this cluster.  Also, special attention should be given to choosing the size for openshift_storage_glusterfs_block_host_vol_size. This is the hosting volume for gluster-block devices that will be created for logging and metrics. Make sure that the size can accommodate all these block volumes and that you have sufficient storage if another hosting volume must be created.

[OSEv3:children]
...
nodes
glusterfs

[OSEv3:vars]
...      
# registry
...

# logging
openshift_logging_install_logging=true
...
openshift_logging_es_pvc_storage_class_name='glusterfs-storage-block'
... 

# metrics
openshift_metrics_install_metrics=true
...
openshift_metrics_cassandra_pvc_storage_class_name='glusterfs-storage-block'

...

# OCS storage cluster for applications
openshift_storage_glusterfs_namespace=app-storage
openshift_storage_glusterfs_storageclass=true
openshift_storage_glusterfs_storageclass_default=false
openshift_storage_glusterfs_block_deploy=true
openshift_storage_glusterfs_block_host_vol_create=true
openshift_storage_glusterfs_block_host_vol_size=100
openshift_storage_glusterfs_block_storageclass=true
openshift_storage_glusterfs_block_storageclass_default=false
...

[nodes]

ose-app-node01.ocpgluster.com openshift_node_group_name="node-config-compute"   
ose-app-node02.ocpgluster.com openshift_node_group_name="node-config-compute" 
ose-app-node03.ocpgluster.com openshift_node_group_name="node-config-compute" 
ose-app-node04.ocpgluster.com openshift_node_group_name="node-config-compute" 

[glusterfs]
ose-app-node01.ocpgluster.com glusterfs_zone=1 glusterfs_devices='[ "/dev/xvdf" ]'   
ose-app-node02.ocpgluster.com glusterfs_zone=2 glusterfs_devices='[ "/dev/xvdf" ]'
ose-app-node03.ocpgluster.com glusterfs_zone=3 glusterfs_devices='[ "/dev/xvdf" ]'
ose-app-node04.ocpgluster.com glusterfs_zone=1 glusterfs_devices='[ "/dev/xvdf" ]'

OCS 3.10 uninstall

With the OCS 3.10 release, the uninstall.yml playbook can be used to remove all gluster and heketi resources. This might come in handy when there are errors in inventory file options that cause the gluster cluster to deploy incorrectly.

If you’re removing an OCS installation that is currently being used by any applications, you should remove those applications before removing OCS, because they will lose access to storage. This includes infrastructure applications like registry, logging, and metrics that have PV claims created using the glusterfs-storage and glusterfs-storage-block StorageClass resources.

You can remove logging and metrics resources by re-running the deployment playbooks like this:

ansible-playbook -i <path_to_inventory_file> -e \
"openshift_logging_install_logging=false" \
/usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

ansible-playbook -i <path_to_inventory_file> -e \
"openshift_metrics_install_metrics=false" \
/usr/share/ansible/openshift-ansible/playbooks/openshift-metrics/config.yml

Make sure to manually remove any logging or metrics PersistentVolumeClaims. The associated PersistentVolumes will be deleted automatically.

If the registry is using a glusterfs PersistentVolume, remove it with the following commands:

oc delete deploymentconfig docker-registry
oc delete pvc registry-claim
oc delete pv registry-volume
oc delete service glusterfs-registry-endpoints

If you’re running uninstall.yml because a deployment failed, run the playbook with the following variables to wipe the storage devices for both glusterfs and glusterfs_registry before trying the OCS installation again.

ansible-playbook -i <path_to_inventory_file> -e \
"openshift_storage_glusterfs_wipe=true" -e \
"openshift_storage_glusterfs_registry_wipe=true" \
/usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/uninstall.yml

OCS 3.10 post installation for applications, registry, logging and metrics

You can add OCS clusters and resources to an existing OCP install using the following command. This same process can be used if OCS has been uninstalled due to errors.

ansible-playbook -i <path_to_inventory_file> \
/usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.yml

After the new cluster(s) are created and validated, you can deploy the registry using a newly created glusterfs ReadWriteMany volume. Run this playbook to create the registry resources:

ansible-playbook -i <path_to_inventory_file> \
/usr/share/ansible/openshift-ansible/playbooks/openshift-hosted/config.yml

You can now deploy logging and metrics resources by re-running these deployment playbooks:

ansible-playbook -i <path_to_inventory_file> \
/usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

ansible-playbook -i <path_to_inventory_file> \
/usr/share/ansible/openshift-ansible/playbooks/openshift-metrics/config.yml

Want to learn more?

For hands-on experience combining OpenShift and OCS, check out our test drive, a free, in-browser lab experience that walks you through using both. Also, watch this short video explaining why to use OCS with OCP. Detailed information on the value and use of OCS 3.10 features can be found here.

Improved volume management for Red Hat OpenShift Container Storage 3.10

By Annette Clewett and Husnain Bustam

Hopefully by now you’ve seen that with the release of Red Hat OpenShift Container Platform 3.10 we’ve rebranded our container-native storage (CNS) offering to be called Red Hat OpenShift Container Storage (OCS). Versioning remains sequential (i.e., OCS 3.10 is the follow-on to CNS 3.9).

OCS 3.10 introduces important features for container-based storage with OpenShift. Arbiter volume support allows for there to be only two replica copies of the data, while still providing split-brain protection and ~30% savings in storage infrastructure versus a replica-3 volume. This release also hardens block support for backing OpenShift infrastructure services. In addition to supporting arbiter volumes, major improvements to ease operations are available to give you the ability to monitor provisioned storage consumption, expand persistent volume (PV) capacity without downtime to the application, and use a more intuitive naming convention for PVs.

For easy evaluation of these features, an OpenShift Container Platform evaluation subscription now includes access to OCS evaluation binaries and subscriptions.

New features

Now let’s dive deeper into the new features of the OCS 3.10 release:

  • Prometheus OCS volume metrics: Volume consumption metrics data (e.g., volume capacity, available space, number of inodes in use, number of inodes free) is now available in Prometheus for OCS. These metrics let you monitor storage capacity and consumption trends and take timely action so that applications are not impacted.
  • Heketi topology and configuration metrics: Available from the Heketi HTTP metrics service endpoint, these metrics can be viewed using Prometheus or curl http://<heketi_service_route>/metrics. These metrics can be used to query heketi health, number of nodes, number of devices, device usage, and cluster count.
  • Online expansion of provisioned storage: You can now expand the OCS-backed PVs within OpenShift by editing the corresponding claim (oc edit pvc <claim_name>) with the new desired capacity (spec → resources → requests → storage: new value).
  • Custom volume naming: Before this release, the names of the dynamically provisioned GlusterFS volumes were auto-generated with a random UUID. Now, by adding a custom volume name prefix, the GlusterFS volume name will include the namespace or project as well as the claim name, thereby making it much easier to map to a particular workload.
  • Arbiter volumes: Arbiter volumes allow for reduced storage consumption and better performance across the cluster while still providing the redundancy and reliability expected of GlusterFS.

Volume and Heketi metrics

As of OCP 3.10 and OCS 3.10, the following metrics are available in Prometheus (and by executing curl http://<heketi_service_route>/metrics):

kubelet_volume_stats_available_bytes: Number of available bytes in the volume
kubelet_volume_stats_capacity_bytes: Capacity in bytes of the volume
kubelet_volume_stats_inodes: Maximum number of inodes in the volume
kubelet_volume_stats_inodes_free: Number of free inodes in the volume
kubelet_volume_stats_inodes_used: Number of used inodes in the volume
kubelet_volume_stats_used_bytes: Number of used bytes in the volume
heketi_cluster_count: Number of clusters
heketi_device_brick_count: Number of bricks on device
heketi_device_count: Number of devices on host
heketi_device_free: Amount of free space available on the device
heketi_device_size: Total size of the device
heketi_device_used: Amount of space used on the device
heketi_nodes_count: Number of nodes on the cluster
heketi_up: Verifies if heketi is running
heketi_volumes_count: Number of volumes on cluster
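To illustrate how these metrics might be consumed outside Prometheus, here is a minimal sketch that parses text in the Prometheus exposition format and derives per-device usage from heketi_device_used and heketi_device_size. The sample payload, its label names, and its values are made up for illustration; real output from the Heketi metrics endpoint may carry different labels, and this simplified parser assumes no timestamps and no spaces inside label values.

```python
# Hypothetical sample of what a /metrics response could look like.
SAMPLE = """\
# HELP heketi_device_size Total size of the device
# TYPE heketi_device_size gauge
heketi_device_size{cluster="c1",device="/dev/xvdf",hostname="node01"} 1000
heketi_device_used{cluster="c1",device="/dev/xvdf",hostname="node01"} 250
heketi_up 1
"""


def parse_metrics(text):
    """Return {(metric_name, label_string): value} for each sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        name, _, labels = series.partition("{")
        samples[(name, labels.rstrip("}"))] = float(value)
    return samples


def device_usage(samples):
    """Return {label_string: used/size} for every device with a known size."""
    usage = {}
    for (name, labels), size in samples.items():
        if name == "heketi_device_size" and size > 0:
            used = samples.get(("heketi_device_used", labels), 0.0)
            usage[labels] = used / size
    return usage


samples = parse_metrics(SAMPLE)
usage = device_usage(samples)
for labels, ratio in usage.items():
    print(f"{labels}: {ratio:.0%} used")
```

In practice the same ratio would more likely be expressed as a Prometheus query; the sketch only shows how the individual metrics relate to each other.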

Populating Heketi metrics in Prometheus requires additional configuration of the Heketi service. You must add the two prometheus.io annotations using the following commands, then verify them in the service description:

# oc annotate svc heketi-storage prometheus.io/scheme=http
# oc annotate svc heketi-storage prometheus.io/scrape=true
# oc describe svc heketi-storage
Name:           heketi-storage
Namespace:      app-storage
Labels:         glusterfs=heketi-storage-service
                heketi=storage-service
Annotations:    description=Exposes Heketi service
                prometheus.io/scheme=http
                prometheus.io/scrape=true
Selector:       glusterfs=heketi-storage-pod
Type:           ClusterIP
IP:             172.30.90.87
Port:           heketi  8080/TCP
TargetPort:     8080/TCP

Populating Heketi metrics in Prometheus also requires additional configuration of the Prometheus configmap. As shown in the following, you must add the namespace of the Heketi service to the Prometheus configmap and restart the prometheus-0 pod:

# oc get svc --all-namespaces | grep heketi
app-storage      heketi-storage       ClusterIP 172.30.90.87  <none>  8080/TCP
# oc get cm prometheus -o yaml -n openshift-metrics
....
- job_name: 'kubernetes-service-endpoints'
   ...
   relabel_configs:
     # only scrape infrastructure components
     - source_labels: [__meta_kubernetes_namespace]
       action: keep
       regex: 'default|logging|metrics|kube-.+|openshift|openshift-.+|app-storage'
# oc scale --replicas=0 statefulset.apps/prometheus
# oc scale --replicas=1 statefulset.apps/prometheus
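
The effect of the keep rule above can be checked offline. The sketch below assumes only that Prometheus anchors relabel regular expressions at both ends, which re.fullmatch mirrors; the list of namespaces is illustrative.

```python
import re

# The regex from the relabel_configs keep rule shown above.
KEEP_REGEX = r"default|logging|metrics|kube-.+|openshift|openshift-.+|app-storage"


def is_scraped(namespace):
    """True if services in this namespace survive the 'keep' relabel action."""
    # Prometheus treats relabel regexes as fully anchored, hence fullmatch.
    return re.fullmatch(KEEP_REGEX, namespace) is not None


for ns in ["app-storage", "appstorage", "openshift-metrics", "myproject"]:
    print(f"{ns}: {'kept' if is_scraped(ns) else 'dropped'}")
```

Because the match is anchored, the namespace in the regex has to be spelled exactly as it appears in the oc get svc output; a near miss is silently dropped.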

Online expansion of GlusterFS volumes and custom naming

First, let’s discuss what’s needed to allow expansion of GlusterFS volumes. This opt-in feature is enabled by configuring the StorageClass for OCS with the parameter allowVolumeExpansion set to “true” and by enabling the feature gate ExpandPersistentVolumes. You can then dynamically resize storage volumes attached to containerized applications without needing to first detach and then attach a storage volume with increased capacity, which enhances application availability and uptime.

Enable the ExpandPersistentVolumes feature gate on all master nodes:

# vim /etc/origin/master/master-config.yaml
kubernetesMasterConfig:
  apiServerArguments:
    feature-gates:
    - ExpandPersistentVolumes=true
# /usr/local/bin/master-restart api
# /usr/local/bin/master-restart controllers

This release also supports a custom volume name prefix. Parameterizing the StorageClass (`volumenameprefix: myPrefix`) causes new OCS PVs to be created with names made up of the volume name prefix, project name/namespace, claim name, and UUID (<myPrefix>_<namespace>_<claimname>_UUID).

This makes volumes easier to identify in the GlusterFS backend and makes it easier to automate day-2 admin tasks like backup and recovery or applying policies based on a predefined volume naming scheme.
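
The naming scheme lends itself to simple automation. As a minimal sketch, the following builds and splits a name of the form <myPrefix>_<namespace>_<claimname>_UUID. The function names and sample values are hypothetical, and the split assumes that the prefix itself contains no underscore (namespace and claim names cannot contain one under Kubernetes naming rules).

```python
import uuid


def build_volume_name(prefix, namespace, claim, vol_uuid=None):
    """Compose <prefix>_<namespace>_<claimname>_<UUID>."""
    vol_uuid = vol_uuid or str(uuid.uuid4())  # random UUID if none is given
    return "_".join([prefix, namespace, claim, vol_uuid])


def parse_volume_name(name):
    """Split a volume name back into its four components."""
    prefix, namespace, claim, vol_uuid = name.split("_", 3)
    return {"prefix": prefix, "namespace": namespace,
            "claim": claim, "uuid": vol_uuid}


name = build_volume_name("gf", "myproject", "mysql-data",
                         "8d1d3a50-0000-0000-0000-000000000000")
parts = parse_volume_name(name)
print(name)
print(parts["namespace"], parts["claim"])
```

A backup script could, for example, use the parsed namespace to select only the volumes that belong to one project.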

In this StorageClass, support for both online expansion of OCS/GlusterFS PVs and custom volume naming has been added.

# oc get sc glusterfs-storage -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage
parameters:
  resturl: http://heketi-storage-storage.apps.ose-master.example.com
  restuser: admin
  secretName: heketi-storage-admin-secret
  secretNamespace: storage
  volumenameprefix: gf ❶
allowVolumeExpansion: true ❷
provisioner: kubernetes.io/glusterfs
reclaimPolicy: Delete

❶ Custom volume name support: <volumenameprefixstring>_<namespace>_<claimname>_UUID
❷ Parameter needed for online expansion or resize of GlusterFS PVs

Be aware that PV expansion is not supported for block volumes, only for file volumes.

Expanding a volume starts with editing the PVC field spec → resources → requests → storage with the new size for the PersistentVolume. For example, to expand a 1GiB PV to 2GiB, set that field to 2Gi. The PV will be automatically resized to 2GiB, and the new size will be reflected in OCP, heketi-cli, and gluster commands. The expansion process creates another replica set and converts the 3-way replicated volume to a distributed-replicated volume, with 2×3 instead of 1×3 bricks.
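
For scripted resizes, the same change can be expressed as a patch document instead of an interactive edit. The sketch below only builds the JSON; the function names are hypothetical. The brick arithmetic assumes, as in the example above, that each expansion adds one more 3-way replica set.

```python
import json


def pvc_resize_patch(new_size):
    """Return a JSON patch body that sets spec.resources.requests.storage."""
    return json.dumps({"spec": {"resources": {"requests": {"storage": new_size}}}})


def brick_count(replica_sets, replica=3):
    """Bricks in a distributed-replicated volume: replica sets x replica count."""
    return replica_sets * replica


patch = pvc_resize_patch("2Gi")
print(patch)
print(brick_count(1), "->", brick_count(2))  # bricks before and after one expansion
```

The resulting string could be passed to something like oc patch pvc <claim_name> -p '<patch>' as a non-interactive alternative to oc edit.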

GlusterFS arbiter volumes

Arbiter volume support is new to OCS 3.10 and has the following advantages:

  • An arbiter volume is still a 3-way replicated volume for highly available storage.
  • Arbiter bricks do not store file data; they only store file names, structure, and metadata.
  • Arbiter uses client quorum to compare this metadata with metadata of other nodes to ensure consistency of the volume and prevent split brain conditions.
  • Using Heketi commands, it is possible to control arbiter brick placement using tagging so that all arbiter bricks are on the same node.
  • With control of arbiter brick placement, the ‘arbiter’ node can have limited storage compared to other nodes in the cluster.
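
The storage savings quoted for arbiter volumes can be reasoned about with a little arithmetic: a replica-3 volume stores three full copies of the data, while an arbiter volume stores two full copies plus a metadata-only arbiter brick. In the sketch below, arbiter_fraction (the arbiter brick size relative to the data size) is a hypothetical placeholder; the real value depends on the number and size of files.

```python
def raw_capacity_replica3(usable_gib):
    """Raw storage needed for three full copies of the data."""
    return 3 * usable_gib


def raw_capacity_arbiter(usable_gib, arbiter_fraction):
    """Raw storage for two full copies plus a metadata-only arbiter brick."""
    return 2 * usable_gib + arbiter_fraction * usable_gib


def arbiter_savings(usable_gib, arbiter_fraction):
    """Fraction of raw storage saved versus a replica-3 volume."""
    full = raw_capacity_replica3(usable_gib)
    return (full - raw_capacity_arbiter(usable_gib, arbiter_fraction)) / full


# Upper bound (arbiter brick of negligible size) and an assumed 10% arbiter brick.
print(f"{arbiter_savings(100, 0.0):.1%}")
print(f"{arbiter_savings(100, 0.1):.1%}")
```

The upper bound is one third; any space taken by the arbiter bricks lowers the savings from there.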

For example, two gluster volumes can be configured across 5 nodes to create two 3-way arbitrated replicated volumes, with the arbiter bricks on a dedicated arbiter node.

To use arbiter volumes with OCP workloads, an additional parameter must be added to the GlusterFS StorageClass: volumeoptions: user.heketi.arbiter true. In this StorageClass, support for online expansion of GlusterFS PVs, custom volume naming, and arbiter volumes has been added.

# oc get sc glusterfs-storage -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage
parameters:
  resturl: http://heketi-storage-storage.apps.ose-master.example.com
  restuser: admin
  secretName: heketi-storage-admin-secret
  secretNamespace: storage
  volumenameprefix: gf ❶
  volumeoptions: user.heketi.arbiter true ❸
allowVolumeExpansion: true ❷
provisioner: kubernetes.io/glusterfs
reclaimPolicy: Delete

❶ Custom volume name support: <volumenameprefixstring>_<namespace>_<claimname>_UUID
❷ Parameter needed for online expansion or resize of GlusterFS volumes
❸ Enable arbiter volume support in the StorageClass. All PVs created from this StorageClass will be 3-way arbitrated replicated volumes.

Want to learn more?

For hands-on experience combining OpenShift and OCS, check out our test drive, a free, in-browser lab experience that walks you through using both. Also, check out this short video explaining why using OCS with OpenShift is the right choice for the container storage infrastructure. For details on running OCS 3.10 with OCP 3.10, click here.

Breaking down data silos with Red Hat infrastructure

By Brent Compton, Senior Director, Technical Marketing, Red Hat Cloud Storage and HCI

Breaking down barriers to innovation.
Breaking down data silos.

These are arguably two of the top items on many enterprises’ wish lists. In the world of analytics infrastructure, people have described a solution to these needs as “multi-tenant workload isolation with shared storage.” Several public-cloud-based analytics solutions exist to provide this. However, many large Red Hat customers are doing large-scale analytics in their own data centers and were unable to solve these problems with their on-premises analytics infrastructure solutions. They turned to Red Hat private cloud platforms as their analytics infrastructure and achieved just this: multi-tenant workload isolation with shared storage. To be clear, Red Hat is not providing these customers with analytics tools. Instead, it is welcoming these analytics tools onto the same Red Hat infrastructure platforms that run much of the rest of their enterprise workloads.

Traditional on-premises analytics infrastructures do not provide on-demand provisioning for short-running analytics workloads, frequently needed by data scientists. In addition, traditional HDFS-based infrastructures do not share storage between analytics clusters. As such, traditional analytics infrastructures often don’t meet the competing needs of multiple teams needing different types of clusters, all with access to common data sets. Individual teams can end up competing for the same set of cluster resources, causing congestion in busy analytics clusters, leading to frustration and delays in getting insights from their data.

As a result, a team may demand their own separate analytics cluster so their jobs aren’t competing for resources with other teams, and so they can tailor their cluster to their own workload needs. Without a shared storage repository, this can lead to multiple analytic cluster silos, each with its own copy of data. Net result? Cost duplication and the burden of maintaining and tracking multiple data set copies.

An answer to these challenges? Bring your analytics workloads onto a common, scalable infrastructure.

Red Hat has seen customers solve these challenges by breaking down traditional Hadoop silos and bringing analytics workloads onto a common, private cloud infrastructure running in today’s enterprise datacenters. At its core is Red Hat Ceph Storage, our massively scalable, software-defined object storage platform, which enables organizations to more easily share large-scale data sets between analytics clusters. The on-demand provisioning of virtualized analytics clusters is enabled through Red Hat OpenStack Platform. Additionally, early adopters are deploying Apache Spark in Kubernetes-orchestrated, container-based clusters via Red Hat OpenShift Container Platform. Delivery and support are provided by the IT experts at Red Hat Consulting based on documented leading practices to help establish an optimal architecture for our clients’ unique requirements.

Key benefits to customers

Agility

  • Get answers faster. By enabling teams to elastically provision their own dedicated analytics compute resources via Red Hat OpenStack Platform, teams have avoided cluster resource competition in order to better meet service-level agreements (SLAs). And teams can spin up these new analytics clusters without lengthy data-hydration delays (made possible by accessing shared data sets on Red Hat Ceph Storage).
  • Remove roadblocks. Empower teams of data scientists to use the analytics tools/versions they need through dynamically provisioned data labs and workload clusters (while still accessing shared data sets).
  • Hybrid cloud versatility. Enable your query authors to use the same S3 syntax in their queries, whether running on a private cloud or public cloud. Spark and other popular analytics tools can use the Hadoop S3A client to access data in S3-compatible object storage, in place of native HDFS. Ceph is the most popular S3-compatible open-source object storage backend for OpenStack.
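
The hybrid cloud point can be sketched in a few lines. The snippet below is a hedged illustration, not a configuration from the original post: it builds the Spark properties that point the Hadoop S3A client at an endpoint and shows that only the endpoint changes between a private and a public cloud. The endpoints, bucket name, and `job.py` are hypothetical. Credentials (for example `fs.s3a.access.key` and `fs.s3a.secret.key`) are deliberately left out.

```python
def s3a_spark_conf(endpoint: str, path_style: bool) -> dict:
    """Spark properties that point the Hadoop S3A client at an S3 endpoint."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.path.style.access": str(path_style).lower(),
    }


# Hypothetical endpoints: a private Ceph object gateway and a public-cloud S3 region.
private_conf = s3a_spark_conf("http://ceph-rgw.example.com:8080", path_style=True)
public_conf = s3a_spark_conf("s3.us-west-2.amazonaws.com", path_style=False)

# The data set path used by the job keeps the same s3a:// syntax in both cases.
dataset = "s3a://analytics-data/events/"

for name, conf in (("private", private_conf), ("public", public_conf)):
    flags = " ".join(f"--conf {key}={value}" for key, value in conf.items())
    print(f"{name}: spark-submit {flags} job.py {dataset}")
```

Because the job itself only sees the `s3a://` path, the same query code can be pointed at either environment by swapping the configuration.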

Cost/risk reduction

  • Cut costs associated with data set duplication. In traditional Hadoop/Spark HDFS clusters, data is not shared. If a data scientist wants to analyze data sets that exist in two different clusters, they may need to copy data sets from one cluster to the other. This can result in duplicate costs for multi-PB data sets that must be copied among many analytics clusters.
  • Reduce risks of maintaining duplicate data sets. Maintaining duplicate data sets is time-consuming and prone to error, and it can also result in incomplete or inaccurate insights being derived from stale data.
  • Scale costs based on requirements. In traditional Hadoop/Spark HDFS clusters, capacity is added by procuring more HDFS nodes with a fixed ratio of CPU and storage capacity. With Red Hat data analytics infrastructure, customers can provision compute servers separately from a common storage pool and thus can scale each resource according to need. Because storage capacity is no longer locked to compute cores, companies can scale storage costs independently of compute costs.

Innovation for today’s data needs

As data continues to grow, organizations should have a supporting infrastructure that can break down data silos and enable teams to access and use information in more agile ways. Red Hat platforms can foster greater agility, efficiency, and savings–a nice combination for today’s data-driven organizations looking to build analytics applications across the open hybrid cloud.

You can also find our blog post that covers other news from the Strata conference and upstream community projects here. For more details on empirical test results, see here. For a video whiteboard of these topics, see here. Finally, to learn more, visit www.redhat.com/bigdata.

 

Introducing Red Hat Gluster Storage 3.4: Feature overview

By Anand Paladugu, Principal Product Manager

We’re pleased to announce that Red Hat Gluster Storage 3.4 is now Generally Available!

Since this release is a full rebase with the upstream, it consolidates many bug fixes, thus giving you a greater degree of overall stability for both container storage and traditional file serving use cases. Given that Red Hat OpenShift Container Storage is based on Red Hat Gluster Storage, these fixes will also be embedded in the 3.10 release of OpenShift Container Storage. To enable you to refresh your Red Hat Enterprise Linux (RHEL) 6-based Red Hat Gluster Storage installations, this release supports upgrading your Red Hat Gluster Storage servers from RHEL 6 to RHEL 7. Last, you can now deploy Red Hat Gluster Storage Web Administrator with minimal resources, which also offers robust and feature-rich monitoring capabilities.

Here is an overview of the new features delivered in Red Hat Gluster Storage 3.4:

Support for upgrading Red Hat Gluster Storage from RHEL 6 to RHEL 7

Many customers like to ensure they’re on the latest and greatest RHEL in their infrastructures. Two scenarios are now supported for upgrading RHEL servers in a Red Hat Gluster Storage deployment from RHEL 6 to RHEL 7:

  1. Red Hat Gluster Storage version is <= 3.3.x and the underlying RHEL version is <= latest version of 6.x. The upgrade process updates Red Hat Gluster Storage to version 3.4 and the underlying RHEL version to the latest version of RHEL 7.
  2. Red Hat Gluster Storage version is 3.4 and the underlying RHEL version is the latest version of 6.x. The upgrade process keeps the Red Hat Gluster Storage version at 3.4 and upgrades the underlying RHEL version to the latest version of RHEL 7.
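
The two scenarios amount to a small decision rule, sketched below. This is an illustrative simplification rather than an official upgrade checker: the function name is hypothetical, and it looks only at the RHEL major version instead of checking for the latest 6.x minor release.

```python
def upgrade_plan(rhgs_version: str, rhel_major: int) -> str:
    """Map a current RHGS/RHEL combination to one of the two supported scenarios."""
    # Compare only major.minor, so "3.3.1" is treated as (3, 3).
    rhgs = tuple(int(part) for part in rhgs_version.split(".")[:2])
    if rhel_major != 6:
        return "not a RHEL 6 server; the scenarios do not apply"
    if rhgs <= (3, 3):
        return "scenario 1: update RHGS to 3.4 and RHEL to the latest 7.x"
    if rhgs == (3, 4):
        return "scenario 2: keep RHGS at 3.4 and upgrade RHEL to the latest 7.x"
    return "RHGS version not covered by the two scenarios"


print(upgrade_plan("3.3.1", 6))
print(upgrade_plan("3.4", 6))
```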

macOS client support

Mac workstations continue to make inroads into corporate infrastructures. Red Hat Gluster Storage 3.4 supports macOS as a Server Message Block (SMB) client and thereby allows customers to map SMB shares backed by Red Hat Gluster Storage in the Mac Finder.

Punch hole support for third-party applications

The “punch hole” feature frees up physical disk space when portions of a file are de-referenced. For example, suppose you’ve used 20 GB of disk space to back up a file, and some portions of the file are de-referenced due to data duplication. Without punch hole support, the full 20 GB remains occupied on the underlying physical disk. With punch hole support, however, third-party applications can “punch a hole” corresponding to the de-referenced portions of the file, thereby freeing up physical disk space. This further helps to reduce storage costs associated with backing up and archiving virtual machines (VMs).
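
For readers who want to see what punching a hole looks like from an application, the sketch below uses the generic Linux fallocate(2) call through ctypes. It is an illustration under stated assumptions, not Red Hat Gluster Storage-specific code: it assumes 64-bit Linux and a file system that supports hole punching, and it reports `False` instead of failing when either assumption does not hold. The helper name `punch_hole` is hypothetical.

```python
import ctypes
import os
import tempfile

# Linux flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd: int, offset: int, length: int) -> bool:
    """Deallocate a byte range; return False if the platform or file system refuses."""
    libc = ctypes.CDLL(None, use_errno=True)
    fallocate = getattr(libc, "fallocate", None)
    if fallocate is None:  # not a Linux libc
        return False
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    return fallocate(fd, mode, offset, length) == 0


with tempfile.NamedTemporaryFile() as f:
    f.write(b"x" * (1 << 20))  # 1 MiB of data
    f.flush()
    os.fsync(f.fileno())
    before = os.stat(f.name).st_blocks
    punched = punch_hole(f.fileno(), 0, 512 * 1024)  # free the first 512 KiB
    head = os.pread(f.fileno(), 16, 0)  # a punched range reads back as zeros
    after = os.stat(f.name)
    print("punched:", punched)
    print("size unchanged:", after.st_size == 1 << 20)
    print("512-byte blocks before/after:", before, after.st_blocks)
```

The file keeps its logical size. What changes, where hole punching is supported, is the number of blocks actually allocated on disk.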

Subdirectory exports using the Gluster FUSE protocol now fully supported

Beginning with Red Hat Gluster Storage 3.4, subdirectory export using FUSE is fully supported. This feature provides namespace isolation: a single Gluster volume can be shared with many clients, each mounting only a subset of the volume’s namespace (i.e., a subdirectory). You can also export a subdirectory of an already exported volume to use the space left in the volume for a different project.
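
As a rough sketch of what this looks like on a client, the snippet below builds the mount command for a subdirectory of a volume, assuming the usual `server:/volume/subdir` source syntax of the Gluster native (FUSE) client. The server, volume, and project names are hypothetical, and the commands are printed rather than executed.

```python
def subdir_mount_argv(server: str, volume: str, subdir: str, mountpoint: str) -> list:
    """Build the mount command for a subdirectory of a Gluster volume over FUSE."""
    source = f"{server}:/{volume}/{subdir.strip('/')}"
    return ["mount", "-t", "glusterfs", source, mountpoint]


# Hypothetical server, volume, and project names: two projects share one volume,
# and each client mounts only its own subdirectory.
for project in ("project-a", "project-b"):
    argv = subdir_mount_argv("gluster1.example.com", "sharedvol", project, f"/mnt/{project}")
    print(" ".join(argv))
```

Each returned list could be handed to `subprocess.run` on a client that has the Gluster native client installed.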

Red Hat Gluster Storage web admin enhancements

The Web Administration tool delivers browser-based graphing, trending, monitoring, and alerting for Red Hat Gluster Storage in the enterprise. This latest Red Hat Gluster Storage release optimizes this web admin tool to consume fewer resources and allow greater scaling to monitor larger clusters than in the past.

Faster directory lookups using the Gluster NFS-Ganesha server

In Red Hat Gluster Storage 3.4, the Readdirp API is extended and enhanced to return handles along with directory stats as part of its reply, thereby reducing NFS operations latency.

In internal testing, performance gains were observed for all directory operations when compared to Red Hat Gluster Storage 3.3.1. For example, make directory operations improved by up to 31%, file create operations by up to 42%, and file read operations by up to 150%.

Want to learn more?

For hands-on experience with Red Hat Gluster Storage, check out our test drive.