Cassandra’s Europe Summit – The Keynote – Extra Scaling

cassandraeyeAt the opening of the conference day at Cassandra Summit Europe 2013, Johnathan Ellis, Datastax CTO, made a point of positioning Apache Cassandra as an enterprise scalable database and one that scales in a linear fashion to massive scales. Datastax is the leading developer of, and commercial vendor of Apache Cassandra in the form of DataStax enterprise.

MongoDB was very much in the company’s sights as it showed benchmarks with Cassandra running 20 times faster than MongoDB – the reason was simple though the dataset for the benchmark was bigger than the available memory on the nodes. While MongoDB performs well with the dataset in memory, Ellis says most customers want their hot-data in memory and their cold-data on disk and thats where Cassandra has the advantage with a balanced approach to memory and disk.

Away from the benchmarking, Ellis described this years focus for Cassandra as having been on was of use. That meant enhanced CQL, the Cassandra Query Language, a new CQL protocol for language drivers, more emphasis on features like tracing, lightweight transactions for the 1% of cases that need it and cursors to reduce query complexity.

Internal enhancements were equally important though. For example, 2.0 took back control of a lot of memory management in Cassandra, from the JVM and over to a more traditionally manually handled memory manager tuned for Cassandra’s needs. This has allowed lots of data structures to reside more efficiently in memory improving performance.

Next week will see the release of Cassandra 2.0.2 which will add what the DataStax people call “rapid read protection”. This means that when a query goes out to a cluster, rather than waiting until a node times out to return an error, the system will look for return times that are out of the ordinary (in the 99th percentile) and return an error on them early. This should make the ability to respond to nodes over-paused in GC or suffering some other performance hit.

Ellis also talked about Cassandra 2.1 which is pencilled in for January 2014. This will see nesting and collection indexing added to the database. The filtering inside the Cassandra software should also be improved with a new combination of pessimistic allocation and smarter estimates of required space using HyperLogLog to work out what data overlaps between sets. Ellis described his slides in this though as “hand wavy” as there was no code written yet and asked “Don’t send me hate mail…” if it didn’t make 2.1.

DataStax’s own certified DataStax Enterprise is set to move to a Cassandra 2.0 base by the end of the year.