Turning Bitcask into a Caching Solution

Haven’t updated the blog for a while, as something interesting has kept me busy. We were trying to find or create a better caching solution for application development, one that would be fast, large, and simple.

Whether or not you’ve used Riak, or know the Erlang language, Bitcask is something you might appreciate as a candidate key/value persistent storage solution. Caching is to some extent very similar to key/value persistent storage. Not the persistence part, you might argue, but the truth is, the longer cached entries stay in the system, the better the caching result. Bitcask is amazing because it keeps both writes and reads fast even as the key/value data grows onto hard disk. When we were trying to create a large caching solution, it provided lots of reference points: the sequential writes and the lookup flow all fit into a caching system nicely.
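To make the sequential-write/lookup flow concrete, here is a minimal sketch of the Bitcask idea in Java (class and method names are mine, not from the actual project): every put appends the value to a log file, and an in-memory "keydir" maps each key to the offset and length of its latest value, so a get is one map lookup plus one positioned read.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

// Minimal Bitcask-style store: sequential appends, in-RAM keydir.
public class BitcaskSketch {
    private final RandomAccessFile log;
    private final Map<String, long[]> keydir = new HashMap<>(); // key -> {offset, length}

    public BitcaskSketch(File file) throws IOException {
        this.log = new RandomAccessFile(file, "rw");
    }

    // Put is a sequential append; only the keydir entry changes in place.
    public void put(String key, byte[] value) throws IOException {
        long offset = log.length();
        log.seek(offset);
        log.write(value);
        keydir.put(key, new long[]{offset, value.length});
    }

    // Get is one keydir lookup plus one positioned read from the log.
    public byte[] get(String key) throws IOException {
        long[] entry = keydir.get(key);
        if (entry == null) return null;
        byte[] buf = new byte[(int) entry[1]];
        log.seek(entry[0]);
        log.readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        File f = Files.createTempFile("bitcask", ".log").toFile();
        f.deleteOnExit();
        BitcaskSketch store = new BitcaskSketch(f);
        store.put("a", "hello".getBytes(StandardCharsets.UTF_8));
        // An update is just another append; the keydir points at the latest copy.
        store.put("a", "world".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get("a"), StandardCharsets.UTF_8)); // world
    }
}
```

Note that updates leave the old value behind in the log; the real Bitcask reclaims that space with a separate merge/compaction pass.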

The drawback was obvious too: the metadata must fit into RAM, which is difficult for a Java application that might cache millions of entries or more. After all, if that doesn’t cause GC pauses and performance issues, you might be happy with Guava caching already. Therefore we made some major modifications: we use memory-mapped files to hold all the metadata, and we simplified each entry down to a single long-typed token.
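The post doesn’t spell out the token layout, so the field widths below are my own illustration of how metadata can be squeezed into one 64-bit long: here, 16 bits of file id, 34 bits of file offset, and 14 bits of value size. Because the tokens are plain longs, the whole keydir can live in a memory-mapped slot array that the garbage collector never scans.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical layout: [16-bit file id][34-bit offset][14-bit size].
public class MetaToken {
    static long pack(int fileId, long offset, int size) {
        return ((long) (fileId & 0xFFFF) << 48)
             | ((offset & 0x3_FFFF_FFFFL) << 14)
             | (size & 0x3FFF);
    }
    static int fileId(long token)  { return (int) (token >>> 48) & 0xFFFF; }
    static long offset(long token) { return (token >>> 14) & 0x3_FFFF_FFFFL; }
    static int size(long token)    { return (int) (token & 0x3FFF); }

    public static void main(String[] args) throws IOException {
        long token = pack(3, 123_456L, 512);

        // Store the token off-heap in a memory-mapped file, out of GC's reach.
        Path p = Files.createTempFile("keydir", ".map");
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer slots = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8 * 1024);
            slots.putLong(0, token);      // write slot 0
            long back = slots.getLong(0); // read it back
            System.out.println(fileId(back) + " " + offset(back) + " " + size(back)); // 3 123456 512
        }
        Files.deleteIfExists(p);
    }
}
```

With 34 bits of offset this particular split caps each data file at 16 GB and each value at 16 KB; a real system would tune the widths to its own limits.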

While we strived to provide non-blocking reads and consistent writes, another key/value storage solution provided even more value than Bitcask: LevelDB. Although young and very much targeted at database-style workloads, one of its ideas just couldn’t be ignored: background optimization. LevelDB runs Snappy compression and multi-way merge sort to compact its large data set entirely in the background. It’s inspiring to us, because a caching system essentially needs only two operations, Get and Put; every update, delete, and read is a form of those two. To optimize them, we decided to extract everything that could be done asynchronously out of those two operations, keeping them as minimal as possible. A good number of optimizations, even whole functions, were successfully extracted as background tasks, which are run and monitored but never, ever block the read/write requests.
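The shape of that design can be sketched as follows (a simplified illustration, not the project’s actual code): Get and Put touch only a concurrent map, while maintenance work runs on a background executor that the foreground operations never wait for. Here the background task is a simple TTL sweep, standing in for the compaction-style tasks described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: minimal get/put, with maintenance pushed to a background thread.
public class AsyncMaintainedCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long writtenAt;
        Entry(V value, long writtenAt) { this.value = value; this.writtenAt = writtenAt; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final ScheduledExecutorService maintenance =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-maintenance");
                t.setDaemon(true);
                return t;
            });

    public AsyncMaintainedCache(long ttlMillis, long sweepIntervalMillis) {
        this.ttlMillis = ttlMillis;
        // Expired-entry sweeping is monitored background work;
        // get/put never block on it.
        maintenance.scheduleWithFixedDelay(this::sweep,
                sweepIntervalMillis, sweepIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Put: one concurrent-map write, nothing else on the hot path.
    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis()));
    }

    // Get: one concurrent-map read; stale entries linger until the next sweep.
    public V get(K key) {
        Entry<V> e = map.get(key);
        return e == null ? null : e.value;
    }

    private void sweep() {
        long cutoff = System.currentTimeMillis() - ttlMillis;
        map.entrySet().removeIf(en -> en.getValue().writtenAt < cutoff);
    }
}
```

The trade-off is visible even in this toy: the sweep may briefly leave expired entries readable, which a cache (unlike a database) can usually tolerate in exchange for never blocking the hot path.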

To name a few: a segment-split post-remedy task, an invalidation task, an update-compaction task, a file-compression task, a byte-buffer predicted-allocation task, a file-read block-size anticipation task, and so on. More can be experimented with, as we made the architecture robust and open to such async tasks.

This is still at the proof-of-concept phase, but we already see promising numbers. I will post more when the results are consolidated.
