The HBase development team has been doing a great job lately, adding rock-solid features to this amazing data store. The next release will be 0.96, and it brings great things that I want to discuss with you right now. These are the best features in my own opinion; I'm open to discussion, so leave a comment to enrich the post if you like. OK, let's start the engine.
HBASE-5313: New low-level file storage format (HFile v3)
The first time I heard about HFile v3 was while reading the release notes for 0.94.0, which featured a new low-level file format called HFile v2 that improves block storage for space optimization and better block access and caching. Then I found a great blog post by Matteo Bertozzi, a Software Engineer at Cloudera, where he does a great job explaining how HBase I/O works, what an HFile is, the new features in HFile v2 (speed improvements, data block encoding, and memory optimization, among others), and the upcoming HFile v3. HFile v3 is being developed to improve compression: one of its main ideas is to group all the keys together at the beginning of the block and all the values together at the end. That layout makes it possible to use different compression algorithms for keys and for values, much as Vertica compresses its data with different algorithms depending on the type of the field.
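To make the keys-first, values-last idea concrete, here is a minimal Java sketch. It is not the HFile v3 format: the class SplitBlockSketch, its length-prefixed packing, and the use of two Deflater levels as a stand-in for two different compression algorithms are all assumptions of mine, for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; this is NOT the HFile v3 on-disk format.
final class SplitBlockSketch {

    // Concatenates the entries, each prefixed by a 4-byte big-endian length.
    static byte[] pack(String[] entries) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String entry : entries) {
            byte[] b = entry.getBytes(StandardCharsets.UTF_8);
            out.write(b.length >>> 24);
            out.write(b.length >>> 16);
            out.write(b.length >>> 8);
            out.write(b.length);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Compresses one section with zlib at the given level.
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses one section; malformed or truncated input raises an unchecked exception.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated section");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Section 0 holds all keys, section 1 holds all values. Each section is
    // compressed on its own, so each could use its own codec; the two Deflater
    // levels here merely stand in for "two different algorithms".
    static byte[][] encodeBlock(String[] keys, String[] values) {
        byte[] keySection = deflate(pack(keys), Deflater.BEST_SPEED);
        byte[] valueSection = deflate(pack(values), Deflater.BEST_COMPRESSION);
        return new byte[][] { keySection, valueSection };
    }

    public static void main(String[] args) {
        String[] keys = { "row1/cf:a", "row1/cf:b", "row2/cf:a" };
        String[] values = { "42", "hello", "world" };
        byte[][] block = encodeBlock(keys, values);
        System.out.println("key section bytes:   " + block[0].length);
        System.out.println("value section bytes: " + block[1].length);
        System.out.println("keys round-trip ok:  " + Arrays.equals(inflate(block[0]), pack(keys)));
    }
}
```

Because the two sections are compressed independently, a reader that only needs the keys can decompress the first section and skip the second, and each section could be tuned separately.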
HBASE-5521: New columnar encoder/decoder
This JIRA issue is closely tied to HBASE-5313: the goal here is to add a new columnar encoder/decoder and to make compression part of that component.
HBASE-5347: GC free memory management in Level-1 Block Cache
As Prakash Khemani put it in the JIRA: “On eviction of a block from the block-cache, instead of waiting for the garbage collector to reuse its memory, reuse the block right away. This will require us to keep reference counts on the HFile blocks. Once we have the reference counts in place we can do our own simple blocks-out-of-slab allocation for the block-cache.
This will help us with:
- reducing gc pressure, especially in the old generation
- making it possible to have non-java-heap memory backing the HFile blocks”
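The mechanism in that quote can be sketched in a few lines of Java. This is my own toy illustration under stated assumptions, not the patch attached to the JIRA: the class SlabCacheSketch and its methods (put, acquire, release, evict) are hypothetical names, blocks are fixed-size slots carved out of a single direct ByteBuffer, and a coarse synchronized lock stands in for whatever concurrency scheme a real implementation would use.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a reference-counted, slab-backed block cache.
final class SlabCacheSketch {

    private static final class Entry {
        final int slot;       // index of the fixed-size slot inside the slab
        final int length;     // number of valid bytes in the slot
        int refCount;         // readers currently holding this block
        boolean evicted;      // eviction requested while readers still hold it

        Entry(int slot, int length) {
            this.slot = slot;
            this.length = length;
        }
    }

    private final ByteBuffer slab;   // one allocation outside the Java heap
    private final int blockSize;
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();
    private final Map<String, Entry> entries = new HashMap<>();

    SlabCacheSketch(int blockSize, int blockCount) {
        this.blockSize = blockSize;
        this.slab = ByteBuffer.allocateDirect(blockSize * blockCount);
        for (int i = 0; i < blockCount; i++) {
            freeSlots.add(i);
        }
    }

    // Copies the block into a free slot; returns false if it cannot be cached.
    synchronized boolean put(String key, byte[] data) {
        if (data.length > blockSize || freeSlots.isEmpty() || entries.containsKey(key)) {
            return false;
        }
        int slot = freeSlots.poll();
        ByteBuffer view = slab.duplicate();
        view.position(slot * blockSize);
        view.put(data);
        entries.put(key, new Entry(slot, data.length));
        return true;
    }

    // Returns a view of the cached block and takes a reference, or null on a miss.
    synchronized ByteBuffer acquire(String key) {
        Entry e = entries.get(key);
        if (e == null || e.evicted) {
            return null;
        }
        e.refCount++;
        ByteBuffer view = slab.duplicate();
        view.position(e.slot * blockSize);
        view.limit(e.slot * blockSize + e.length);
        return view.slice();
    }

    // Drops one reference; the last release of an evicted block frees its slot.
    synchronized void release(String key) {
        Entry e = entries.get(key);
        if (e == null || e.refCount == 0) {
            throw new IllegalStateException("block not acquired: " + key);
        }
        e.refCount--;
        if (e.evicted && e.refCount == 0) {
            free(key, e);
        }
    }

    // Evicts the block; its slot is reusable at once if nobody is reading it.
    synchronized void evict(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return;
        }
        e.evicted = true;
        if (e.refCount == 0) {
            free(key, e);
        }
    }

    private void free(String key, Entry e) {
        entries.remove(key);
        freeSlots.add(e.slot);   // slot is reused without waiting for the garbage collector
    }

    synchronized int freeSlotCount() {
        return freeSlots.size();
    }

    public static void main(String[] args) {
        SlabCacheSketch cache = new SlabCacheSketch(16, 1);
        cache.put("block-a", new byte[] { 1, 2, 3 });
        cache.acquire("block-a");
        cache.evict("block-a");      // still referenced, so the slot stays busy
        System.out.println("free slots while in use: " + cache.freeSlotCount());
        cache.release("block-a");    // last reference gone, slot returns to the free list
        System.out.println("free slots after release: " + cache.freeSlotCount());
    }
}
```

The reference count is what makes immediate reuse safe: a slot only goes back to the free list once no reader holds a view of it, and because the slab is a direct buffer, the cached bytes never live in the old generation.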
HBASE-5674: add support in HBase to overwrite hbase timestamp to a version number during major compaction
I have just one recommendation here: read the comments on this JIRA; they are very interesting.
As I said before, HBase 0.96 promises to be a very good release, so keep a close eye on these JIRA issues and you will see the evolution of the platform first hand.