What is the most valuable asset of an organization? Hint: think Big Data.


Sometimes it pays to go back to the basics and ask the fundamental questions, such as, what, exactly, is Big Data?

In this latest Storage Tutorial, Brian Chang of Red Hat and Syed Rasheed, JBoss Middleware Solutions Marketing Manager, look at these basic questions. Some are easy to answer, such as our approach to Big Data: databases, warehouses, and data from social media and mobile devices are all sources for ongoing discovery and insight. Others are much trickier, such as: when did Small Data become Big Data?

Come listen to this friendly chat and get an easy introduction to Big Data and its invaluable place – and use – in any organization.

Big news: Introducing Red Hat Ceph Storage and Red Hat Gluster Storage

We’re pleased to announce our unified open software-defined storage portfolio, which brings together Red Hat Ceph Storage, formerly known as Inktank Ceph Enterprise, and Red Hat Gluster Storage, formerly known as Red Hat Storage Server. This unified Red Hat Storage portfolio helps enterprises manage their current and emerging data storage workloads using open source software and standard hardware.




Today’s announcement is an important milestone in the continued momentum of Red Hat’s charter to bring open software-defined storage to enterprises, a charter that began with the acquisition of Gluster, Inc., in October 2011, and continued with the acquisition of Inktank, Inc., provider of Ceph, in May 2014. The product developed by Inktank has gone through Red Hat’s quality engineering processes and is now a fully supported Red Hat solution, rebranded as Red Hat Ceph Storage.

Read the full press release, which has details about each of the Red Hat Storage offerings, here.

  • For more information about Red Hat Ceph Storage, click here
  • For more information about Red Hat Gluster Storage, click here
  • Visit us on Facebook here
  • Follow us on Twitter here
  • Watch our videos on YouTube here

Storage Tutorial: Live from Spark Summit East with Continuum

This Storage Tutorial was filmed live at Spark Summit East.


Our host, Brian Chang, is joined by Peter Wang, president of Continuum, along with show regulars Irshad Raihan and Greg Kleiman of Red Hat Big Data. Peter fills the group in on the buzz he is hearing at the conference and on the kinds of big data use cases he sees best supported on Spark. Read on for an excerpt of the conversation, but check out the video for the full discussion.

What is Continuum Analytics?
Continuum Analytics supports the use of open source data science tools, primarily around the Python programming language. Many of the core libraries in Python for data and scientific computing were written by principals at Continuum, and we’ve been heavily involved in PyData and in promoting the use of Python for data and analytics.

Tell our viewers what Spark is, and what are you hearing about Spark at this conference?
It’s very exciting; this is my first time at the summit! I’m really excited to see the energy around the technology stack and around the things happening with Spark. The most interesting thing for me is that the Python world has been involved in high-end, very large data science and data analytics workloads for a long time, but the rise of Hadoop was a separate sort of thing. Python and R were outsiders in the Hadoop ecosystem. What’s interesting with Spark is that they are working really hard to ensure Python and R are native in the technology stack. It goes down to the design of the underlying components in Spark: whether it is the scheduler or the resilient data structure, all these things in Spark are exposed nicely in Python.

There’s great energy, great buzz here. The show floor is certainly smaller than Strata+Hadoop, so you feel like this is an event that will grow as time goes on. A lot of the energy behind Spark is because it has taken the storage efficiencies of Hadoop and made them more accessible to a wider audience. A lot of people were not thrilled about doing MapReduce jobs in Java; they’d rather do them in Python, but that connection was tenuous. Now, with Python being a first-class citizen in the Spark ecosystem, a lot of people, at least the Python folks I’ve talked to with Hadoop workloads, are excited about that.

Tell us about those high-end workloads you talked about?
There are a lot of people doing traditional cluster-level workloads, using Red Hat in the cluster and Python to drive the computation. As Hadoop has emerged, and Spark has emerged on top of Hadoop, we’re seeing a lot of these people doing exploratory data science and analytics with Python on a workstation, but then they have to port to larger-scale equipment. There’s a workflow impediment, a mismatch, between the work they can do on their own machine, which doesn’t have a petabyte of storage attached to it, and the work they do at scale. They do the work on a subset and then do the work at scale, moving back and forth. We’ve built tools like Anaconda Cluster that ease those transitions, but the actual storage of the bits… at the end of the day, we all know that when you do computation at scale you have to move code to data.

So where the data sits, that’s an important place. How the data is formatted, what file systems, what walled gardens are built around it, those limit what you can do. It’s unfortunate. Your storage should be flexible, it should give you scale and resiliency without limiting what you can do.
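
To make the "move code to data" idea concrete, here is a minimal Python sketch. It is not Spark or Anaconda Cluster code: the `Node` class and the `run_on_nodes` function are hypothetical names invented for this illustration, and the "nodes" are just in-memory lists. The point is only that a small function travels to each partition and a small result travels back, while the bulk data stays where it is stored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical storage node holding one partition of a larger dataset."""
    name: str
    partition: List[int]

    def run(self, func: Callable[[List[int]], int]) -> int:
        # The function is sent to the node; the partition never leaves it.
        return func(self.partition)


def run_on_nodes(nodes: List[Node], func: Callable[[List[int]], int]) -> List[int]:
    """Ship `func` to every node and collect only the small per-node results."""
    return [node.run(func) for node in nodes]


# Three made-up partitions standing in for data spread across a cluster.
nodes = [
    Node("node-a", [1, 2, 3]),
    Node("node-b", [4, 5]),
    Node("node-c", [6, 7, 8, 9]),
]

partial_sums = run_on_nodes(nodes, sum)  # one small number per node
total = sum(partial_sums)                # combine the small results centrally
print(partial_sums, total)
```

In a real cluster the partitions would be far too large to copy to a workstation, which is why the location and format of the stored data constrain what computation is practical.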

Watch the video for more of the conversation!

Architects must choose: Will they fill their datacenters with pets or cattle?

Today we’ve got an analogy for you. It has to do with your IT infrastructure, and whether you aim to fill your datacenter with expensive, proprietary equipment and software or a better, more efficient alternative. It comes to you via a Q&A with Red Hat’s Brent Compton, director of storage and big data.

So sit back, grab some lemonade, don your 10-gallon hat and spurs, and read on, pardner!


What are pets and cattle and how do they relate to storage?
Pets and cattle are a data center metaphor popularized in the OpenStack community. Pets equate to old-school data, while cattle equate to new-school data.

What are pets?
Think about how much coddling you give your pets. This is analogous to big proprietary traditional data centers with their big, scale-up storage servers that consume a lot of resources, require a lot of administrative time and effort, and need constant coddling, much as you do with a pet.

What are the cattle?
Hyperscale companies – Google, Apple, Facebook, and so on – couldn’t scale that way, so they fundamentally altered the way data centers work so that they function like cattle. While traditional data centers offer custom treatment, much as you would give a pet, these organizations expect individual nodes to fail, making them, much like cattle, replaceable.

How does this relate to Red Hat?
Red Hat is a driving force behind the switch from high-cost pets to commodity cattle in the storage industry right now.

Is this structured data or unstructured data?
Both are growing. Structured data, which is based on relational databases, continues to grow. But unstructured, or semi-structured data – such as when someone takes a selfie or records a video of their toddler – is exploding.

Where is Red Hat Storage in this?
Red Hat Storage is at the nexus of a lot of this. An x86 server plus data services software is a compelling commodity data storage combination.

And the Cricket World Cup Champion is…

Don’t sweat it if your initial reaction was “The Cricket World Cup is on?”

You’re not alone.

"CWC Aus v Eng at the MCG", Creative Commons Copyright Tourism Victoria.


Yes, baseball’s poor cousin is throwing its once-in-four-years party Down Under, in Australia and New Zealand. Cricket may not be a popular sport in North America, but get this: the game between archrivals India and Pakistan last month was watched by a billion people worldwide!

So what is cricket, you ask. Well, you take baseball and slow it down – if that’s even possible. Then you slow it down some more. Take away two bases and replace the diamond with a somewhat circular ground, and you get a sport that started out as a pastime of the English upper crust. How else do you explain a tea break in the middle of an inning?

These days, though, the game has evolved from its white flannel roots and has spread through many of the original British colonies. Money has poured in. Player contracts run into the millions. Flashy team names and uniforms are common. The game itself has been shortened considerably to accommodate work schedules and television rights. The original five-day format, considered by many to be the only true form of the game, still exists, but the shorter four- and eight-hour formats have gained favor, especially among the smartphone generation.

The cricket ball – harder and heavier than a baseball – contains a cork core and is covered by tightly stitched leather with a pronounced seam that acts like a rudder, allowing for curved balls through the air and off the bounce. Yes, bounce!

Pitchers (bowlers) are not only allowed to bounce the ball before reaching the batter (batsman), they are encouraged to do so. Bowlers have a particularly unhygienic (and rather gross) habit of rubbing sweat and saliva on one side of the ball and shining it on their trousers. Flu season, anyone? In any other sport this practice would likely dominate most of the discussion but in cricket it goes unnoticed.

Much of cricket is spent getting the ball back from the catcher (wicketkeeper) to the pitcher. You’d think that the catcher could simply toss it back to the pitcher but, as with all things English, there is an unnecessarily convoluted way. The ball makes its way back to the pitcher in three to five stops, while each fielder joins in the polishing ritual with the aim to add moisture (read germs) and shine to one side of the ball allowing it to swerve more viciously.

As you’d imagine, this Japanese high tea-like routine gives commentators and statisticians a chance to mull over detailed, and sometimes obscure, facts and trends. In fact, when I was growing up in India, knowledge of cricket statistics was valuable currency during school recess. This obsession with measuring anything that can be measured makes cricket fertile ground for big data scientists.

In a previous post, we wrote about how big data has permeated the NFL, from predicting winners to tracking hydration. Much of that science has crept into cricket, especially with the more affluent teams whose entourage can be up to sixty people at a time, only a fourth of them actual players. Team coaches and stats jockeys track everything – every pitch, every hit, every catch, and perhaps every scratch. (Hey, a lot of cricket is played in the humid tropics!)

Coaches even track how pitchers perform in the “nets” (practice areas to simulate a match experience, fenced off by fishing nets). They mine this data for insights on which pitcher to bring on in a particular match situation given their propensity to pitch certain types of curved balls.

As in most sports, every aspect of cricket has gone digital. This influx of data has helped organizers plan more effectively, players perform more consistently, and fans enjoy the game more thoroughly. Read previous posts on this blog that dive deep into the challenges posed by growing volumes of data and real-time analysis, as well as Red Hat solutions to address those challenges in an increasingly data-hungry sports world.

So who will be crowned world champions in 2015? Going into the tournament, South Africa looked like favorites, but losses against India and Pakistan have pegged them back. The final four are likely going to be India, Australia, New Zealand, and South Africa. Our money is on an Australia-South Africa final, with a solid edge to the home team. Either way, we suggest you tune in for a great spectacle in the Melbourne twilight on March 29. Howzat?

