After several very busy weeks at work, I've finally had a little time to write a post. Over the past few days I have been working with Apache Solr, the de facto open source platform for enterprise search applications, because I'm leading a new team that is building a search application for a client, and of course the first reference is this incredible piece of software. I want to thank Andy Wibbels (CMO) and Max Bunag (Sales Director for Southwest US, APAC and LATAM) from the Lucidworks team, who helped me find the right content about Solr's scalability and performance tuning. In this "search", I found the amazing talks from the last Lucene/Solr Revolution event, which is focused entirely on Lucene and Solr, and this post is about my favorite talks from that event.
Search Analytics Component, Steve Bower from Bloomberg L.P.
You should know that Bloomberg is one of the largest providers of financial news and information in the world, so this is one of the best use cases to see Apache Solr working amazingly well. In this talk, Steve covered the Search Analytics component released by Bloomberg Labs and how it is used in several places in the company's search architecture. One of the platforms inside Bloomberg is Bloomberg Vault, a hosted communication archive: financial firms are required to keep data for a certain number of years, so they send all of it to a private cloud at Bloomberg, and Bloomberg provides search and analytics platforms on top of that. We are talking about 80 billion documents with an average size of 50 KB, and across this incredible group of datasets the Bloomberg search team has used the Search Analytics component in every possible way, so I think you should take a serious look at it. Steve explained the benefits of using the component, its features, and how it differs from the StatsComponent. The relevant JIRA issue for the component is SOLR-5302. You can see the talk here: [youtube=http://www.youtube.com/watch?v=d5eGPi4KjiM]
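To get a feel for the scale mentioned in the talk, here is a quick back-of-the-envelope calculation. It only multiplies the two figures quoted above; the decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes) are my assumption, and the result is the raw document volume, not the size of the Solr index.

```python
# Figures quoted in the talk.
DOCS = 80 * 10**9           # 80 billion documents
AVG_DOC_BYTES = 50 * 10**3  # 50 KB average size (assumption: 1 KB = 1,000 bytes)

# Raw volume of the documents themselves; the index size will differ.
total_bytes = DOCS * AVG_DOC_BYTES
total_pb = total_bytes / 10**15  # assumption: 1 PB = 10**15 bytes

print(f"Raw document volume: about {total_pb:.1f} PB")
```

So the archive holds roughly 4 PB of raw documents before any indexing overhead or compression.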
Search Architecture at Evernote, Christian Kohlschütter
This talk from Chris is full of insights about how Evernote is innovating in the search space using Apache Lucene. It is one of my favorites, because I use Evernote for almost every task at work, so it's incredible to see that Chris and the Augmented Intelligence team are working hard to delight us. But first, he talked about the numbers behind Evernote:
- Serving more than 100 million users
- 559 shards (200K users per shard) using Linux, Tomcat and MySQL
- 3.2 PB WebDAV-based Storage
- 224 TB SSD Capacity for System, MySQL and Lucene
- 3.1 Billion Notes stored, 3.8 Billion Notes ever created
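As a quick sanity check, the numbers above can be broken down per shard. This is only a rough sketch: it assumes users, notes and storage are spread evenly across the 559 shards (which is surely not exactly true), and it uses decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB).

```python
# Figures from the talk.
USERS = 100_000_000           # "more than 100 million", so this is a lower bound
SHARDS = 559
NOTES_STORED = 3_100_000_000
SSD_TB = 224
WEBDAV_PB = 3.2

# Assumption: everything is distributed evenly across shards.
users_per_shard = USERS / SHARDS
notes_per_shard = NOTES_STORED / SHARDS
ssd_gb_per_shard = SSD_TB * 1000 / SHARDS
webdav_tb_per_shard = WEBDAV_PB * 1000 / SHARDS

print(f"Users per shard:  {users_per_shard:,.0f}")
print(f"Notes per shard:  {notes_per_shard:,.0f}")
print(f"SSD per shard:    {ssd_gb_per_shard:,.0f} GB")
print(f"WebDAV per shard: {webdav_tb_per_shard:.1f} TB")
```

The users-per-shard result comes out a bit under 180K, which fits the roughly 200K per shard quoted in the talk, given that the user count is "more than" 100 million.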
Chris also talked about the experience of migrating from Lucene 2.9 to 4.x, so if you are immersed in that task, you must see this talk, or you could ask Chris for tips. Then he talked about the benefits of index compression across the architecture, and why everyone should use it.
BTW, they are looking for a Senior Software Engineer for the Augmented Intelligence team, focused on the search platform. You can see the talk here: [youtube=http://www.youtube.com/watch?v=drOmahIie6c]
Real-Time Search at Twitter, Michael Busch
This talk is amazing, and the numbers that Michael shared here are insane:
- Tweets-per-second record: one-second peak of 143,199 TPS
- More than 2 Billion search queries per day
Twitter uses two different indexes: EarlyBird for real-time search and an archive index. The Blender service sits in front of them and sends each request to both. Michael then explained the whole architecture and the key role that the new Lucene Extensions package plays in it. This package contains several libraries: an abstraction layer for Lucene index segments, a real-time writer for in-memory index segments, a schema-based Lucene document factory and real-time faceting. You can see the talk here: [youtube=http://www.youtube.com/watch?v=F2CHE4VyB3c]
So, again: if you are working with search, you must watch all of these talks. They are packed with great tips.
Finally, I want to leave you with an invitation to the next Lucene/Solr Revolution event, which will be held in Austin, TX on October 13–16. Registration opens this spring, so stay tuned to the site or follow the @LuceneSolrRev or @Lucidworks accounts on Twitter for the most recent news.
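To picture the fan-out that the Blender does, here is a minimal sketch of the idea: send the same query to a real-time index and an archive index in parallel, then merge the hits. This is not Twitter's code. The functions `search_realtime`, `search_archive` and `blend`, the hit format and the scores are all hypothetical, and the real service certainly does much more than this.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the two backends; in a real system these
# would be remote calls to the real-time and archive search services.
def search_realtime(query):
    return [{"id": 3, "score": 0.9}, {"id": 2, "score": 0.4}]


def search_archive(query):
    return [{"id": 1, "score": 0.7}, {"id": 2, "score": 0.5}]


def blend(query, limit=10):
    """Query both indexes in parallel and merge the hits by score."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(search_realtime, query),
            pool.submit(search_archive, query),
        ]
        hits = [hit for future in futures for hit in future.result()]

    # A document may come back from both indexes: keep its best-scoring hit.
    best = {}
    for hit in hits:
        current = best.get(hit["id"])
        if current is None or hit["score"] > current["score"]:
            best[hit["id"]] = hit

    # Highest score first, truncated to the requested number of results.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:limit]


results = blend("solr")
print([hit["id"] for hit in results])
```

Merging by a single score is the simplest possible choice here; a production blender would also have to deal with timeouts, partial failures and scores that are not directly comparable between indexes.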