The Red Hat Ceph Storage/Supermicro reference architecture has arrived

If you’re building public or private clouds—or simply need massively scalable, flexible storage designed for the cloud—you’ve probably heard about Ceph. But what you may not have heard is that there’s now a reference architecture covering configurations and benchmark results for Red Hat Ceph Storage on Supermicro storage servers. For real!


Ceph has become popular with public and private cloud builders who see the benefits of a unified storage platform over silos of data spread across multiple storage systems. To build today’s cloud infrastructures, though, businesses must address the needs of varied workload demands with their storage systems, including:

  • Throughput-optimized IO
  • Cost/capacity-optimized IO

One of Ceph’s core strengths is the ability to provision different storage pools for different IO categories. Each of these storage pools can be deployed on hardware infrastructure optimized for that type of IO. To identify optimal hardware configurations for Ceph pools serving several common workload IO categories, the Red Hat Ceph Storage team collaborated with Supermicro, Mellanox, Intel, and Seagate to create this newly published reference architecture.
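
By way of a rough sketch (pool names, placement-group counts, and CRUSH rule IDs below are hypothetical, and option names match Ceph releases of that era), separate pools for the two IO categories can be created and pointed at CRUSH rules that target the matching class of server:

    # Pool for throughput-optimized workloads (name and PG count are examples)
    ceph osd pool create throughput-pool 2048
    # Pool for cost/capacity-optimized workloads
    ceph osd pool create capacity-pool 4096
    # Point each pool at a CRUSH rule defined for the matching hardware
    # (assumes rules 1 and 2 already describe those server classes)
    ceph osd pool set throughput-pool crush_ruleset 1
    ceph osd pool set capacity-pool crush_ruleset 2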

So if you need performance, capacity, and sizing guidance for Red Hat Ceph Storage on Supermicro storage servers, read on. And if you want to buy the lab-validated configurations you read about, learn more here.

Learn about Capital One’s Analytics Garage, built on Docker and Red Hat


Containers, Big Data, and Software Defined Storage are three of the hottest trends in the IT landscape today, and a new project at Capital One combines all three. The Analytics Garage, seen in this DockerCon 2015 presentation by Santosh Bardwaj, Sr. Director of Big Data Platform and Engineering at Capital One, is a truly groundbreaking concept that leverages the agility of containers and software defined storage to offer a big data analytics application development platform that is self-service, scalable, and fast.

Over the last decade, Capital One has grown into one of the giants of the financial markets. With the completion of the ING acquisition, they are now one of the top ten largest banks in the US. One of the reasons for their growth and success is their focus on analytics from the early days. Industry analyst Timothy Prickett Morgan wrote a great article on the history of analytics at Capital One with details on the new Analytics Garage. In particular, he writes about how an open, agile infrastructure can enable better business outcomes for big data projects, a view central to the Red Hat big data philosophy.

The need for the Analytics Garage arose from the fact that Capital One associates needed to evaluate a number of data analysis tools for their day-to-day roles, and developers needed to build, test, and iterate on applications using a modular approach. Capital One determined that the most cost-effective and flexible solution was to provision containerized microservices enabling continuous, self-service evaluation of the many tools available to users.

There are a couple of interesting elements of Capital One’s Analytics Garage. First, it epitomizes the new model of provisioning a buffet of microservices, served out via containers that can be used by applications or users to build more complex constructs. Rather than create individual microservice containers for each of the 30-odd analytics tools, Capital One decided to build an uber container to cut down the complexity of container management – all without adversely affecting performance. In fact, some of their published benchmarks show performance comparable to bare metal and much better than virtualized environments!

Second, the Analytics Garage is a poster child for a cutting edge analytics platform built on an open source software stack, including Red Hat Gluster Storage and Red Hat Enterprise Linux as the underlying file storage layer and operating system environment, respectively. Docker containers run on a Marathon framework that allows for resource allocation across analytic jobs on Hadoop and Spark clusters that are in turn connected via high speed networking, as shown in the slides from Santosh’s DockerCon presentation here.

Containerized applications are still in their infancy in terms of full-fledged enterprise adoption. Most interesting applications of the technology in production today are limited to leaders like Capital One. However, it’s a space that is moving rapidly and is of great interest to us in Red Hat Storage, since persistent storage is a critical checkbox that will need to be addressed in the bid for enterprise status. Read our blog on how Red Hat Storage offers two compelling options for persistent storage for containerized applications.

We will continue to watch the evolution of the Capital One Analytics Garage over the coming months, as I’m sure they will keep us updated through sessions at container conferences. I guess an apt Capital One-inspired ice breaker at those events might be: “What’s in your container?”

Visit our Slideshare page for presentations from Red Hat Summit 2015 and everything you’ve ever wanted to know about storage

For easy access to a wealth of Red Hat Storage information, from product updates and use case insights to industry best practices, check out the Red Hat Storage Slideshare page. With presentations going back a few years you’ll be able to find just about anything – including the presentations used during select sessions at Red Hat Summit 2015. Below is just a sampling of some of the presentations you’ll find.

This presentation will teach you how RADOS Block Devices (RBD) work, including how RBD uses RADOS classes to make access easier from user space and within the Linux kernel, how it implements thin provisioning, and how it builds on RADOS self-managed snapshots for cloning and differential backups.
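
As a taste of the mechanics covered there, here is a hedged sketch using the standard rbd CLI (pool, image, and snapshot names are made up):

    # Create a 10 GB image; space is thin-provisioned, consumed only as data is written
    rbd create rbd/demo-image --size 10240
    # Take a snapshot and protect it so it can serve as a clone parent
    rbd snap create rbd/demo-image@snap1
    rbd snap protect rbd/demo-image@snap1
    # Create a copy-on-write clone from the protected snapshot
    rbd clone rbd/demo-image@snap1 rbd/demo-clone
    # Later, take a second snapshot and export only the changes for a differential backup
    rbd snap create rbd/demo-image@snap2
    rbd export-diff --from-snap snap1 rbd/demo-image@snap2 demo-image.diff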

Here’s an overview of performance-related developments in Red Hat Gluster Storage 3 and best practices for testing, sizing, configuration, and tuning.

If you want a thorough introduction to Red Hat Storage, check out this presentation to learn everything from how to install Red Hat Gluster Storage to how to configure disks, link storage nodes, and more.

View this presentation to learn about erasure coding in Ceph — understand erasure code logic and plug-ins, and learn how to estimate the trade-off between erasure coding and replication.
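
To make the trade-off concrete: a k=4, m=2 profile stores roughly 1.5x the raw data and survives two simultaneous failures, versus 3x for triple replication. Here’s a hedged example of setting this up (profile and pool names are illustrative, and option names have shifted between Ceph releases):

    # Define a profile with 4 data chunks and 2 coding chunks, spread across hosts
    ceph osd erasure-code-profile set ec42profile k=4 m=2 ruleset-failure-domain=host
    # Create an erasure-coded pool that uses the profile
    ceph osd pool create ecpool 128 128 erasure ec42profile
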
Do you have a favorite presentation? We’d love to know which, and why. Leave a note in the comments!

Learn how Ansible can reduce the time to roll out Ceph to new clusters from hours to minutes


When you put a time-consuming challenge with a lot of steps in front of an engineer, their mind invariably strays to “How can I make this faster, easier, better?” We recently spoke with Sébastien Han, senior cloud architect at Red Hat in France, who, like any good engineer, thinks along those lines.

In this case, Sébastien was thinking about the challenges inherent in deploying Red Hat Ceph Storage across numerous nodes, typically a task that requires a commitment of time and concentration for every node that needs to be configured. To expedite — and automate — the process, he uses Ansible (made by Ansible), an open source tool for orchestration and configuration management that happens to be included in Red Hat Enterprise Linux.

Sébastien recently recorded a video (above) that explains how to set up Ansible to deploy Red Hat Ceph Storage, but we also caught up with him to find out why he recorded it.

Q: Why did you record this video?

A: I deploy Ceph for customers or myself on a daily basis, and because Ansible can make the process faster I wanted to learn it. I’ve found videos are a great way to learn. So I record demos because people tend to be lazy and don’t read documents. If you have a video to listen to, sometimes it is easier. I’m part of this group of people who don’t like to read the documents, too. But I see this every day, people open [support] issues because they don’t read the docs, even if it is four lines. Videos are more lively. I like watching videos. And also to show people it is easy to use and manage and implement. They think, “Ceph is a beast. I don’t have the right tool. I don’t know much about Ansible.” But it is easy.

Q: Why a demo about Ansible and not tools like Puppet or Chef?

A: Usually Puppet and Chef are good for giving you the state of a specific machine, but when you have a set of machines that must interact with each other and you need to do a first action on the first node, a second action on the second node, then go back to the first node, it gets complicated for Puppet to do this. Or inefficient. Puppet lacks orchestration. Ansible features orchestration, which gives you the flexibility to orchestrate installation across multiple architectures. Ansible runs commands on many nodes at the same time, is easy to learn, powerful, and written in Python, which has become a very popular programming language.

Big companies and startups are moving to Ansible because it is easy to learn and manage, and it has built-in functionality that is easy to configure compared to other configuration management systems.

Q: What else should we know about Ansible?

A: Ansible is a good tool to use when deploying Ceph. You can run Ceph in containers and even use Ansible to bootstrap them. It can perform live rolling upgrades, or individual host maintenance, without downtime. If you aren’t working under IT restrictions you can also use Ansible to deploy and maintain your Ceph cluster. For anyone who needs to bootstrap and manage the lifecycle of storage clusters or tweak configuration flags or options, Ansible is an invaluable tool because everything happens automatically, without manual adjustment.

The time savings are hard to estimate, but let’s keep it simple: say we have to deploy Ceph on six nodes. Done manually, installing packages, editing configuration files, and bootstrapping daemons will probably take hours. But I’ve seen instances when Ansible has reduced this time to less than 20 minutes.
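
For a sense of how little is involved, here is a minimal sketch, assuming the community ceph-ansible playbooks and hypothetical host names; the video walks through the real details:

    # Describe the six nodes in an inventory file (names are made up)
    cat > hosts <<EOF
    [mons]
    ceph-mon1
    ceph-mon2
    ceph-mon3
    [osds]
    ceph-osd1
    ceph-osd2
    ceph-osd3
    EOF
    # One playbook run configures every node in the right order
    ansible-playbook -i hosts site.yml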

Don’t miss the video — or the Ceph test drive!

For a walkthrough on how to use Ansible to deploy Ceph, straight from Sébastien himself, check out the video embedded above. And, if you’d like to take Red Hat Ceph Storage for a free test drive, click here.


It’s here: Red Hat Gluster Storage 3.1


So remember last month when we announced updates to Red Hat Gluster Storage? Well, we made good: Those updates are here in v3.1.

New features to leverage

Red Hat Gluster Storage 3.1 offers a host of new features you can leverage, including:

  • Erasure coding
  • Tiering
  • Bit-rot detection
  • Active/active NFSv4
  • Enhanced security

This version also includes enhancements to address the data-protection and storage-management challenges confronting users of unstructured and big data storage.

Erasure coding: Delivering data protection

Erasure coding is an advanced data-protection mechanism that can lower total cost of ownership (TCO) by reducing the need for RAID. An alternative to RAID, erasure coding reconstructs corrupted or lost data by using information about the data that’s stored elsewhere in the system. It provides failure protection beyond single or double component failure and consumes less space than replication.
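
As a hedged example (server and brick paths are hypothetical), a dispersed volume with four data bricks and two redundancy bricks survives two brick failures at roughly 1.5x space overhead, versus 3x for three-way replication:

    # 6 bricks: 4 carry data, 2 carry erasure-coded redundancy
    gluster volume create ecvol disperse 6 redundancy 2 \
        server{1..6}:/bricks/brick1/ecvol
    gluster volume start ecvol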

Tiering: Streamlining data management

Tiering, now in Tech Preview, offsets the computational expense of moving data between hot and cold storage tiers. Red Hat Gluster Storage 3.1 automatically assigns or reassigns a “temperature” to data based on frequency of access and promotes or demotes data in a volume so that different sub-volume types act as hot and cold tiers. An “attach” operation converts an existing volume to a “cold” tier in a volume and creates a new “hot” tier in the same volume. The result is one “tiered” volume—thus the feature’s name.
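
A hedged sketch of the attach operation follows; the exact CLI syntax has varied across releases, and the SSD servers and brick paths here are hypothetical:

    # Attach a replicated SSD-backed hot tier to an existing volume 'coldvol'
    gluster volume tier coldvol attach replica 2 \
        ssd-server1:/bricks/ssd/coldvol ssd-server2:/bricks/ssd/coldvol
    # Watch promotion and demotion activity between the tiers
    gluster volume tier coldvol status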

Bit-rot detection: Enhancing data integrity

Bit-rot detection enhances end-to-end data integrity by periodically scanning data to detect the corruption that arises from silent failures in underlying storage media. Without this feature, silent corruption can accumulate undetected, slowly eroding the integrity of stored data.
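
Enabling it is a per-volume operation; here is a hedged example with a hypothetical volume name:

    # Turn on checksumming and background scrubbing for the volume
    gluster volume bitrot myvol enable
    # Optionally tune how often the scrubber verifies data
    gluster volume bitrot myvol scrub-frequency weekly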

Active/active NFSv4: Boosting security and resilience

By supporting active/active NFSv4 via the NFS Ganesha project, Red Hat Gluster Storage 3.1 allows users to export Gluster volumes through NFSv4.0 and NFSv3 via NFS Ganesha. A user-space implementation of the NFS protocol, NFS Ganesha is very flexible, with simplified failover and failback in case of a node or network failure. The high-availability implementation, which supports up to 16 active-active NFS heads, uses the corosync and pacemaker infrastructure. Each node has a floating IP address that fails over to a configured surviving node in case of failure; failback happens when the failed node comes back online.
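
A hedged sketch of the workflow, leaving aside the HA cluster prerequisites and using a hypothetical volume name and virtual IP:

    # Enable the clustered NFS-Ganesha service across the trusted pool
    gluster nfs-ganesha enable
    # Export a Gluster volume through NFS-Ganesha instead of gluster-nfs
    gluster volume set myvol ganesha.enable on
    # From a client, mount over NFSv4 via one of the floating IPs
    mount -t nfs -o vers=4 192.0.2.10:/myvol /mnt/myvol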

Enhanced security: Spanning deployments

Red Hat Gluster Storage 3.1 enhances security via:

  • Support for SELinux in enforcing mode and SSL-based network encryption, increasing security across the deployment
  • Support for active/active NFSv4, based on the NFS Ganesha project, to provide performant and secure data access through clustered NFSv4 endpoints
  • SMB 3 capabilities, adding protocol negotiation, copy-data offload, and in-flight data encryption to allow for efficient file transfer and secure access in Microsoft Windows environments

More features to anticipate

In addition to these features, there’s another thing to note now that Red Hat Gluster Storage 3.1 is generally available. In August, we plan to announce the availability of the product’s RHEL 7-based features. Right now, Red Hat Gluster Storage 3.1 is based on RHEL 6.

The chance to see for yourself

So there’s more to come all the time with Red Hat Gluster Storage. But yearning for more in the meantime? Click here for product details.
