Containers, big data, and software-defined storage are three of the hottest trends in the IT landscape today, and a new project at Capital One combines all three. The Analytics Garage, seen in this DockerCon 2015 presentation by Santosh Bardwaj, Sr. Director Big Data Platform and Engineering at Capital One, is a truly groundbreaking concept that leverages the agility of containers and software-defined storage to offer a platform for developing big data analytics applications that is self-service, scalable, and fast.
Over the last decade, Capital One has grown into one of the giants of the financial markets. With the completion of the ING acquisition, it is now one of the ten largest banks in the US. One of the reasons for its growth and success is its focus on analytics from the early days. Industry analyst Timothy Prickett Morgan wrote a great article on the history of analytics at Capital One with details on the new Analytics Garage. In particular, he writes about how an open, agile infrastructure can enable better business outcomes for big data projects, a view central to the Red Hat big data philosophy.
The need for the Analytics Garage arose because Capital One associates needed to evaluate a number of data analysis tools for their day-to-day roles, and developers needed to build, test, and iterate on applications using a modular approach. Capital One determined that the most cost-effective and flexible solution was to provision containerized microservices, so that users could continuously test and evaluate the many available tools on a self-service basis.
There are a couple of interesting elements of Capital One’s Analytics Garage. First, it epitomizes the new model of provisioning a buffet of microservices, served out via containers, that applications or users can combine to build more complex constructs. Rather than create an individual microservice container for each of the 30-odd analytics tools, Capital One decided to build a single uber container to cut down the complexity of container management, all without adversely affecting performance. In fact, some of their published benchmarks show performance comparable to bare metal and much better than virtualized environments!
Second, the Analytics Garage is a poster child for a cutting-edge analytics platform built on an open source software stack, including Red Hat Gluster Storage as the underlying file storage layer and Red Hat Enterprise Linux as the operating system environment. Docker containers run on a Marathon framework that allocates resources across analytic jobs on Hadoop and Spark clusters, which are in turn connected via high-speed networking, as shown in the slides from Santosh’s DockerCon presentation.
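To make that stack a little more concrete, here is a minimal sketch of what a Marathon app definition for one containerized workspace might look like, built as a Python dictionary and printed as JSON. This is not Capital One's actual configuration. The app ID, image name, resource sizes, and paths are all hypothetical, and the sketch assumes the Gluster volume is already mounted on the host so that it can be passed into the container as an ordinary host path.

```python
import json


def build_marathon_app(app_id, image, cpus, mem_mb, host_data_path):
    """Return a Marathon-style app definition for a single Docker container.

    All argument values are illustrative; nothing here comes from the
    Analytics Garage itself.
    """
    return {
        "id": app_id,
        "cpus": cpus,          # CPU shares requested from the cluster
        "mem": mem_mb,         # memory in MB
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
            "volumes": [
                {
                    # Assumption: host_data_path is a Gluster mount on the host.
                    "containerPath": "/data",
                    "hostPath": host_data_path,
                    "mode": "RW",
                }
            ],
        },
    }


app = build_marathon_app(
    app_id="/analytics-garage/workspace-demo",      # hypothetical
    image="example/analytics-tools:latest",         # hypothetical uber container
    cpus=2.0,
    mem_mb=4096,
    host_data_path="/mnt/glusterfs/workspaces/demo",  # hypothetical mount point
)
print(json.dumps(app, indent=2))
```

A definition like this would typically be submitted to Marathon's REST API; the point of the sketch is only that the container image, its resource request, and its persistent storage are declared together in one place.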
Containerized applications are still in their infancy in terms of full-fledged enterprise adoption. Most of the interesting production uses of the technology today are limited to leaders like Capital One. However, it’s a space that is moving rapidly and is of great interest to us in Red Hat Storage, since persistent storage is a critical checkbox that will need to be addressed in the bid for enterprise status. Read our blog on how Red Hat Storage offers two compelling options for persistent storage for containerized applications.
We will continue to watch the evolution of the Capital One Analytics Garage over the coming months, as I’m sure they will update us through sessions at conferences on containers. I guess an apt Capital One-inspired ice breaker at those events might be: “What’s in your container?”