Welcome to the final installment of our three-part series of Q&As following the launch of Red Hat Storage Server 3. In this part, we’ll be looking at workloads.
What kind of growth in big data do you feel we will see over the next five years?
Analyst estimates for the big data market range from 30% to 45% CAGR, but there are a couple of trends we hear from our customers that may be more significant than the growth figures. First, unstructured data is already about 80% of all enterprise data, and it’s growing faster than traditional structured data. Second, patch fixes to traditional systems do not address the performance, scale, and portability demands of big data workloads. To address big data challenges cost-effectively, enterprises are looking toward agile, software-defined infrastructures that are purpose-built for big data workloads.
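To put those CAGR figures in five-year terms, here is a minimal sketch of the compounding arithmetic. It assumes the growth rate holds constant every year, which real markets rarely do, and the function name is ours for illustration only. It is a back-of-the-envelope calculation, not a forecast.

```python
def projected_multiple(cagr: float, years: int) -> float:
    """Return how many times larger a market is after `years`
    of compounding at a constant annual growth rate `cagr`.

    Assumption: the rate is the same every year (illustrative only).
    """
    return (1.0 + cagr) ** years


if __name__ == "__main__":
    # The low and high ends of the analyst range quoted above.
    for rate in (0.30, 0.45):
        multiple = projected_multiple(rate, 5)
        print(f"{rate:.0%} CAGR over 5 years -> {multiple:.2f}x today's size")
```

Under that constant-rate assumption, the quoted range works out to a market roughly 3.7 to 6.4 times its current size after five years.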
How would software-defined storage benefit Hadoop infrastructure?
Red Hat Storage Server provides additional choice and flexibility for Hadoop workloads. Hadoop programmers and administrators who are forced to work within the constraints of HDFS often complain that HDFS is not POSIX compatible, that it has a single point of failure, and that it requires cumbersome data movement from their storage platform into HDFS for Hadoop analytics. The Hadoop plugin for Red Hat Storage addresses each of those concerns by enabling customers to keep data in place and run not only MapReduce directly on top of Red Hat Storage Server, but also a number of Hadoop management and orchestration tools such as Ambari, Oozie, ZooKeeper, Sqoop, and Flume.
Beyond capacity, how do you see the new workloads changing the performance requirements of storage, and how does SDS evolve to address key metrics?
We see the workloads changing the performance requirements of storage in two ways. The first is the focus on unstructured data: customers demand that their storage platform be optimized to store and retrieve any form of unstructured data at scale. The second is the approach customers take to building a hybrid storage stack to meet their SLAs. For instance, we find that cybersecurity analytics customers resort to a hybrid storage model in which software-defined storage is used to build a federated cold storage layer across index servers running direct-attached storage. Much of the focus in the open software-defined storage community is on providing advanced data tiering and storage optimization features (such as erasure coding, bit rot detection, and support for NFSv4) to better enable these workloads by addressing metrics around datacenter utilization and security.