We are in the era of Real-Time Analytics

Data comes to us in so many forms that we sometimes fear its constant growth. But what matters more is what we can do with this data. A vast quantity of data is worth little if we don’t know how to turn it into revenue for the company, and this is where Analytics plays a key role. And not just Analytics, but Real-Time Analytics.

Time matters so much these days that many of us carry a simple formula in our heads: “Time is equal to money”, and for better or worse, that’s the truth. Real-Time Analytics is becoming a trend in almost every industry because the sooner you detect a particular problem, the more time you have to solve it.

It doesn’t matter if you work in stock trading, wealth management, cloud infrastructure, SCADA systems, Real-Time Bidding systems, marketing analytics, or social gaming: every industry has market exposure to Real-Time.

I will give you some examples of how important Real-Time Analytics is today, using three companies in different businesses:

  • Cloudera (Big Data services provider focused on Hadoop technologies)
  • Boundary (Real-Time Big Data agent-based Monitoring)
  • MarketShare (Provides cross-media analytics solutions for global marketers)

Cloudera enters the Real-Time market with Impala

If you work with Apache Hadoop or any project in its rich ecosystem (Sqoop, HBase, Zookeeper, etc.), you have probably heard of Cloudera. At its HQ in Palo Alto, CA, an incredible team of Hadoop committers (Todd and Harsh, for example) works on the hardest problems in the development of the platform. For this and many other reasons, Cloudera is well known as one of the leaders in the Big Data market, with its products Cloudera Distribution for Hadoop (CDH, as everybody knows it) and Cloudera Enterprise, and its rich set of services.

But you may be wondering: if Apache Hadoop is a batch-based processing platform, why do I put Cloudera in the Real-Time trend? Cloudera Impala is the answer. Read the news that arrived in my mailbox today:

We are pleased to inform you that today Cloudera has announced Cloudera Impala, the industry’s first real time query framework for Hadoop. This major evolution makes Cloudera Enterprise, the platform for Big Data, the first data management solution that allows both batch and real-time operations to be performed on any type of data — unstructured and structured — within one massively scalable system.
To learn more about Cloudera Impala and Cloudera Enterprise, the Platform for Big Data, attend our webinar on Tuesday November 6, 2012 at 10AM PST. Click here to register.
Benefits of Impala:
  • Speed-to-Insight — perform interactive, real-time analysis directly on source data stored in Hadoop
  • Simplicity — interact with data in HDFS and HBase at the “speed of thought” using SQL or existing BI tools
  • Cost Savings — reduce data movement between systems for the purposes of interactive analysis; eliminate double storage between Hadoop, data warehouses and analytical databases
Cloudera Impala is now available for anyone to try as part of our public beta program. For more information on how to participate, click here.

So, tell me: what do you think about this announcement from the Cloudera team? Isn’t it awesome? Cloudera Impala will allow companies and organizations to quickly answer the critical questions they need to run their business, so I think it will disrupt the Big Data industry.

Stay tuned for new announcements, because the Strata + Hadoop World Conference 2012, the premier Hadoop event of the year, is taking place right now. If you want a guide to these days at the conference, read the post on the company’s blog.


Boundary is a great startup focused on the IT and network monitoring business. It provides a Software-as-a-Service (SaaS) platform with remarkable data visualization techniques where you can see, in Real-Time, the current status of your big data services, whether that is a Hadoop cluster, a Cassandra data store, a Zookeeper distributed system, or a Riak cluster. But how does it work?

A key component of Boundary’s software stack is the Meter, an agent-based utility that gathers all the relevant information about the server and its services, providing great visibility and predictability.

Then come the collectors, a highly distributed platform of nodes with global server load balancing (GSLB) capabilities that receives all incoming data directly from the Meters.

But the great thing about Boundary’s approach is that the platform graphically shows the current state of your systems in Real-Time; you can see an example in this screenshot:

The full software stack used by Boundary is very diverse. Cliff Moon (the company’s current Chief Technology Officer (CTO) and one of its co-founders) calls it the Fauna, and believe me when I say that it is “full of great animals”:

  • Erlang: the team uses this great language heavily for its amazing concurrent programming capabilities and outstanding performance (it would be nice to see how the company currently uses the language at upcoming Erlang-focused events, like the past San Francisco Erlang Factory Conference 2012)
  • Zookeeper: the distributed coordination service from the Apache Foundation. Zookeeper is a high-performance component that helps you write highly distributed applications, which fits the large scale of Boundary’s infrastructure. If you want to see more about the use of Zookeeper at Boundary, look at this post
  • Apache Kafka: the highly scalable message queue, which is used for a key component of the streaming system called Phloem, the first point of interaction with the collector nodes.
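To make the Meter to collector flow a bit more concrete, here is a minimal Python sketch of the general idea. It is only an illustration under my own assumptions, not Boundary’s code: the `Sample`, `Meter` and `Collector` names are hypothetical, an in-memory `queue.Queue` stands in for the real message queue, and the one-second sliding window is just one simple way to get a near Real-Time number.

```python
from collections import deque
from dataclasses import dataclass
from queue import Queue


@dataclass
class Sample:
    """One measurement reported by a (hypothetical) meter."""
    host: str
    timestamp: float  # seconds
    bytes_sent: int


class Meter:
    """Hypothetical agent: pushes samples for one host into a queue."""

    def __init__(self, host, out_queue):
        self.host = host
        self.out_queue = out_queue

    def report(self, timestamp, bytes_sent):
        self.out_queue.put(Sample(self.host, timestamp, bytes_sent))


class Collector:
    """Hypothetical collector: keeps a sliding window of samples per host."""

    def __init__(self, in_queue, window_seconds=1.0):
        self.in_queue = in_queue
        self.window_seconds = window_seconds
        self.windows = {}  # host -> deque of recent samples

    def drain(self):
        """Consume every queued sample and evict those outside the window."""
        while not self.in_queue.empty():
            sample = self.in_queue.get()
            window = self.windows.setdefault(sample.host, deque())
            window.append(sample)
            # Drop samples at least window_seconds older than the newest one.
            while sample.timestamp - window[0].timestamp >= self.window_seconds:
                window.popleft()

    def throughput(self, host):
        """Bytes per second over the current window for one host."""
        window = self.windows.get(host, ())
        return sum(s.bytes_sent for s in window) / self.window_seconds


# Usage: one meter, one collector, three samples.
queue = Queue()
meter = Meter("web-1", queue)
collector = Collector(queue, window_seconds=1.0)

meter.report(0.0, 100)
meter.report(0.5, 200)
meter.report(1.2, 300)  # pushes the sample at t=0.0 out of the window

collector.drain()
print(collector.throughput("web-1"))  # prints 500.0
```

The point of the sketch is the shape of the pipeline: agents only emit small samples, and the collector side keeps just enough recent state to answer “what is happening right now?” without a batch job.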

If you want to see more about the infrastructure, you can follow the company’s GitHub account, or Cliff’s profile.

If you have read this far, you can see that Boundary is a great example of this trend: with this large infrastructure, the complete cycle to see real statistics about your services is very short (less than a second), so you can imagine the incredible number of challenges that Boundary’s team faces every day. If you want to work on this kind of problem, you like Erlang and distributed systems, and you are obsessed with performance and monitoring, look here.


You are a seasoned marketer, and your CFO (Chief Financial Officer) is asking you to put on the table how the budget dedicated to marketing efforts is helping the organization. You don’t want to bring justifications; you want to show him valuable results and fast, strong answers to his questions. But can you do that? MarketShare can help you on this journey.

The amazing team behind MarketShare is focused on three kinds of data:

“Where a client is investing its market dollars, what the business outcomes are, and literally hundreds of other variables (e.g., time, weather and price) that could affect those outcomes. And it goes deep in order to determine outcomes. If all you do is track clickthroughs, you might miss that an ad campaign actually resulted in someone opening a piece of mail four months later”

said Wes Nichols (current co-CEO and co-founder of the company) to Derrick Harris of GigaOM for his article “5 companies turning your data into dollars”, where he explained how MarketShare and other companies, like Acxiom, Applied Predictive Technologies, BloomReach and InsightsOne, can help your business generate more revenue using your own data.

So, what is the core of MarketShare’s business? This great team of experienced business managers, recognized assistant marketing professors, data scientists and technologists can help organizations measure the effectiveness of their marketing strategy in almost Real-Time, offering industry-leading cross-media analytics solutions for global marketers.

Do you think this is cool? More than half of the Fortune 50 companies think so too, trusting the solutions delivered by MarketShare to measure Return On Investment (ROI), a very important variable today for every Chief Marketing Officer (CMO). So you can see that MarketShare is a leader in the marketing mix modeling market, using great pieces of technology on top of the AWS platform. AWS, in turn, selected MarketShare as an Advanced Consulting Partner, making it part of the select group of companies in the Amazon Web Services Partner Network (APN).

But what does this mean? It means that you can use MarketShare’s solutions on top of AWS, which provide a big data analytics platform that gives marketers highly accurate attribution analysis. These solutions are:

If you want to read more about MarketShare, you can download the Forrester Research whitepaper “The Forrester Wave™: Marketing Mix Modeling, Q3 2011”, in which Luca S. Paderni describes why MarketShare is a leader in the marketing mix modeling market.

So, if you are a Data Scientist and you want to work with some of the most brilliant minds in the marketing industry, take a close look at the careers section of the site and find a position that matches your knowledge. They make heavy use of Hadoop, R, MATLAB, and more cool things for Data Science professionals, so it should be challenging, interesting and a lot of fun all the time. Its HQ is in Los Angeles, but the company has offices in San Francisco, New York, London, Tokyo and Bangalore.

Final Thoughts

So, you should have noticed that Real-Time Analytics is a key component in almost every industry, and that Cloudera, Boundary and MarketShare have found great ways to place their products in this hard economy. My advice is to take note of these great teams, think again about the phrase “Time is equal to money in almost every industry”, and build your platforms with a close eye on real-time architectures. Best wishes and happy hacking!

Marcos Luis Ortíz Valmaseda, about.me/marcosortiz, @marcosluis2186
