Many of you, my good Data Science fellows, have been hearing about Real-Time for several years now. But we are in the Era of Information, in the years of Big Data, and changes happen so quickly that you need to adapt very fast to ride the big wave of information. The same thing is happening in Analytics: if you can answer smarter questions in seconds, you will be able to react more quickly to these changes, and that really matters in these rushed times, my dear friends.
I was reading yesterday a great blog post from Derrick Harris, the well-known technology journalist at GigaOM, in which he made some good points about Spark, the great technology being developed by the AMPLab at the University of California, Berkeley. But it’s not just Spark: several good pieces of technology are disrupting the Analytics field for good. I will cover some of my favorite platforms in this post, but I don’t want to repeat information, so I will write just a few notes and share the best quotes about each platform. Let’s begin.
Spark: A shining light in the Big Wave of Data
From its official site:
Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly much quicker than with disk-based systems like Hadoop MapReduce. To make programming faster, Spark integrates into the Scala programming language, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala interpreter.
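To make the "load data into memory and query it repeatedly" idea concrete, here is a minimal sketch in plain Python. It is not Spark's API and it runs on a single machine. The log lines and the names `RAW_LINES`, `load_dataset` and `cached` are my own assumptions for illustration. It only shows the pattern: pay the loading cost once, then run several queries against the in-memory copy.

```python
# Plain-Python illustration only; this is NOT Spark's API.
from collections import Counter

# Hypothetical log lines standing in for a dataset that would normally live on disk.
RAW_LINES = [
    "2013-03-01 ERROR disk full",
    "2013-03-01 INFO job started",
    "2013-03-02 ERROR timeout",
    "2013-03-02 INFO job finished",
]


def load_dataset():
    """Simulate the expensive step: parse every record once into (day, level, message)."""
    return [tuple(line.split(" ", 2)) for line in RAW_LINES]


# Loaded once and kept in memory; every query below reuses it.
cached = load_dataset()

# Query 1: how many ERROR records are there?
error_count = sum(1 for _, level, _ in cached if level == "ERROR")

# Query 2: how many records per day?
per_day = Counter(day for day, _, _ in cached)

print(error_count)
print(dict(per_day))
```

In a disk-based system each of those two queries would read the input again. Here the second query costs almost nothing because the parsed data is already in memory.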
Druid: A wizard of Data
Here at Metamarkets we have developed a web-based analytics console that supports drill-downs and roll-ups of high dimensional data sets — comprising billions of events — in real-time. This is the first of two blog posts introducing Druid, the data store that powers our console. Over the last twelve months, we tried and failed to achieve scale and speed with relational databases (Greenplum, InfoBright, MySQL) and NoSQL offerings (HBase). So instead we did something crazy: we rolled our own database. Druid is the distributed, in-memory OLAP data store that resulted.
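If "roll-ups" and "drill-downs" sound abstract, this small plain-Python sketch may help. It is not Druid's API or query language. The events, the dimension names (`country`, `device`) and the helper `roll_up` are hypothetical, chosen only to show the two operations on a tiny in-memory list.

```python
# Plain-Python illustration only; this is NOT Druid's API.
from collections import defaultdict

# Hypothetical events: two dimensions (country, device) and one metric (clicks).
EVENTS = [
    {"country": "US", "device": "mobile", "clicks": 3},
    {"country": "US", "device": "desktop", "clicks": 5},
    {"country": "PE", "device": "mobile", "clicks": 2},
    {"country": "US", "device": "mobile", "clicks": 4},
]


def roll_up(events, dimensions, metric="clicks"):
    """Sum the metric for every distinct combination of the given dimensions."""
    totals = defaultdict(int)
    for event in events:
        key = tuple(event[d] for d in dimensions)
        totals[key] += event[metric]
    return dict(totals)


# Roll-up: collapse everything to one total per country.
by_country = roll_up(EVENTS, ["country"])

# Drill-down: keep only the US events, then break them out by device.
us_only = [e for e in EVENTS if e["country"] == "US"]
us_by_device = roll_up(us_only, ["device"])

print(by_country)
print(us_by_device)
```

A real OLAP store does this over billions of events spread across many machines, but the shape of the question is the same: group by some dimensions, aggregate a metric, then filter and regroup.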
If you want to learn more, watch Michael E. Driscoll, Metamarkets’ CEO, and Eric Tschetter, Chief Architect for Druid, at StrataConf + Hadoop World 2012:
Pivotal HD: the “Red Bull RB8” of Big Data
Pivotal HD is one of the best pieces of engineering that I’ve ever seen in my life. For that reason I call it the “RB8” of this era:
Powered by new data fabrics, Pivotal One is a complete, next generation Enterprise Platform-as-a-Service that makes it possible, for the first time, for the employees of the enterprise to rapidly create consumer-grade applications. To create powerful experiences that serve a consumer in the context of who they are, where they are, and what they are doing in the moment. To store, manage and deliver value from fast, massive data sets. To build, deploy and scale at an unprecedented pace.
You can watch the Pivotal HD launch video, but I think you have to look at the key feature of this platform, HAWQ:
HAWQ is the most complete, performant, and capable SQL engine available on the market right now. It delivers a fully-functional, high performance “True SQL” database and makes SQL a first class citizen in the Hadoop cluster. HAWQ utilizes the same technology that made Greenplum Database the dominant force in MPP database market. It brings in the innovations bought forth by Greenplum Database over the past 10+ years to optimize the MPP database to fully utilize distributed computing and query processing.
SAP HANA One:
Do you want to know more about SAP HANA One? See this video:
Apache Storm
Interested in Storm? Just see this talk about the platform from its creator, Nathan Marz, at Slideshare. (It seems that he is working at a stealth startup. Perhaps a battle-tested Storm for the Enterprise? We will see what happens in the coming months.)
MapR’s M7: The Fast and Furious HBase platform
Just watch MapR’s CEO John Schroeder talking about the benefits of the platform in this video:
Conclusions
I think you have to draw your own conclusions about this WOD (War of Data) and use the best platform for your needs. If I missed any platform, just leave a comment and I will add it here. Best wishes, my friends, and stay tuned for this WOD; it’s just beginning. Believe me.