Just released: Socket.IO 1.0, Git 2.0 and OrientDB 1.7 – Snippets

Socket.IO 1.0: Socket.IO has hit version 1.0 – the Node.js and browser library which started life as an implementation of the WebSockets interface and has gone on to “become the EventEmitter of the web”. The 1.0 release and changes are broken down in a blog posting, the first on a newly redesigned, and much more useful, Socket.IO website. In brief, the release brings modularisation, tighter code, binary support (so you can emit blobs and buffers), automated testing, better scalability using Redis, more integration (including PHP support), better debugging support (and silence by default), sleeker APIs and CDN delivery. Future plans include handling Node.js streams, Socket.IO support in Web Inspector and Firefox Dev Tools and more language and framework support. A splendid tool to have in your arsenal.

Git 2.0: The distributed version control system which made distributed version control systems cool before even version control could be cool, Git, has reached version 2.0. In the announcement of the new release, there’s a long list of all the changes and notes on backward compatibility. The 2.0 release has been anticipated by the developers for a while, so a lot of groundwork had already been done in previous 1.x versions. That makes the 2.0 release look more like a minor release than a major version bump, but there are still plenty of changes and a foundation prepared for future changes. On that subject, there’s a promise of a shorter release cycle for the next release, as delays have meant a number of features ‘cooking’ for longer in the ‘next’ branch.

OrientDB 1.7: Version 1.7 of the Document/Graph/SQL/NoSQL database OrientDB is available. The announcement for 1.7 notes better performance, new clustering options, support for SSL and sharding, simplified configuration, new SQL commands including parallel queries, plugins for Lucene-based full text searching and more. There’s an Apache 2 licensed community edition of the database and commercially sold and supported professional and enterprise editions.

ElasticSearch 1.0, TokuMX 1.4, Plan 9 GPLv2’d and Python 3.4RC1 – Snippets


ElasticSearch 1.0 springs out: The search-oriented NoSQL database, built upon Lucene, ElasticSearch has hit version 1.0. It’s a big release with a lot of changes and a lot of new features – an API for selective snapshot/restore, federated search, aggregation, distributed percolation and software “circuit breakers” to stop some more dangerous actions from overwhelming the system. An interesting post from Found.no on ElasticSearch sums up the pros and cons (like no authentication or authorisation) and places ElasticSearch in the domain of “secondary store” to be used alongside a primary database.

TokuMX 1.4: Tokutek’s “MongoDB-with-Toku-engine”, TokuMX, has hit version 1.4 and is addressing the performance of sharding and replication. Toku’s engine is reputed to be very good for particular use cases and it’s interesting to see alternative storage engines under the MongoDB infrastructure.

Plan 9 goes GPL2: It’s been a long time under an open source (but unblessed-by-the-FSF) licence, but the venerable and inspiring Plan 9 has now been relicensed (mostly) under the GPLv2, according to an announcement. It can be downloaded from the Akaros project (or cloned from the GitHub repo) which seems to be breeding Inferno/Plan 9 with their own many-core large-SMP research.

Python 3.4 on final approach: Out of beta and hitting release candidate a few days ago, Python 3.4 is now imminent. It’s expected to land just over a month from now on March 16. Check back to our coverage of the last beta for more details of what’s coming.

Python 3.4 beta, Neo4J 2.0 RC1 and Redis 2.8.0 released – Snippets


  • Python 3.4’s beta days: The first beta of Python 3.4 has arrived and it has got the good stuff. Pathlib lets coders work with pure paths or filesystem dependent paths with the selection of the latter taken care of for them. There’s a standardised enum module along with new statistics, asyncio and tracemalloc modules. Throw in a new pickling protocol, new string and binary hashing algorithms, a C API for custom memory allocators and standardisation on pip as the package installer and you are talking a tasty new Python due to land at the end of February 2014.

  • Neo4J 2.0 goes RC: The Neo4J graph database is heading into the home straight with a 2.0.0 release candidate and a warning that if you’ve been tracking their version 2.0 milestones you will need to perform a manual update on your database before using 2.0.0RC1. Now tagged as feature complete, the new RC will be bringing matching with properties, optional matches, relationship merges and more simplified syntax to Neo4J’s Cypher query language. That’s in addition to the Neo4J browser and other changes made over the five other milestones (5, 4, 3, 2, 1).

  • Redis 2.8.0: Salvatore Sanfilippo has announced that, after almost a year of development, Redis 2.8.0 is done. If you don’t know it, Redis is a key/value store which can also handle hashes, lists and sets. The new version includes a partial resync option for slaves, iterable collections, a rewritten config system, IPv6 support, pub/sub keyspace notifications and better consistency support and key expiration. Actually 2.8.1 is out for download too – see the release notes for more on the BSD licensed key/value store.
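For a feel of the Python 3.4 additions mentioned in the first item above, here is a minimal sketch that touches three of the new standard library modules: enum, pathlib and statistics. The `Stage` class, the example path and the sample numbers are made up for illustration; only the module and function names come from the standard library.

```python
from enum import Enum
from pathlib import PurePosixPath
import statistics


class Stage(Enum):
    """Hypothetical release stages, just to show the enum module."""
    BETA = 1
    RC = 2
    FINAL = 3


# Pure paths manipulate path strings without touching the filesystem.
config = PurePosixPath("/etc") / "myapp" / "settings.ini"

# Illustrative sample data for the statistics module.
samples = [2.5, 3.5, 4.0, 6.0]

summary = {
    "stage": Stage.BETA.name,
    "parent": str(config.parent),
    "suffix": config.suffix,
    "mean": statistics.mean(samples),
    "median": statistics.median(samples),
}
print(summary)
```

A pure path is used here so the sketch behaves the same on any platform; swapping in `pathlib.Path` would pick the concrete flavour for the current operating system.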

Facebook Rocks, Open Source Managers and Funner Fonts – Snippets


  • Facebook Rocks: Another database open sourced by Facebook? Yup, and demonstrating that the term “database” covers a lot of ground, Facebook’s latest is RocksDB, an embedded key-value store for those user-facing situations where you need a lot of woosh, little latency. Lead developer, Dhruba Borthakur, explains in a blog posting that RocksDB is based on Google’s LevelDB and is tuned to run on many-core servers while making efficient use of storage to cut down on write wear. It’s implemented as a C++ library with arbitrary byte streams for keys and values and all the major components are pluggable and replaceable. It’s published under a BSD licence and comes with an additional patent licence.

  • OSI gets a GM: The Open Source Initiative has long been a purely volunteer organisation and that has limited what it has been able to do. But that’s changing with the appointment of the first employee, Patrick Masson, who’s taken on the post of General Manager at the OSI. Masson has introduced himself to the membership and is setting out on his tasks of running working groups, expanding membership and updating the OSI’s communications. It’ll be interesting to see what a difference it makes.

  • Cosmic Sans Neue: Who doesn’t like programmer fonts with their mono-spaced elegance? But maybe you want something a tiny bit quirkier. Check out Cosmic Sans Neue Mono, which has a tiny bit of quirkiness, not only in its name but in some of the character shapes. You can also find it on GitHub and it’s available under the SIL Open Font Licence.

Hey! Presto – Facebook’s latest open source code

Facebook, in their now traditional goal of taking on big data problems, solving them and then open sourcing the result, have open-sourced Presto, a distributed SQL query engine “optimized for ad-hoc analysis at interactive speed”. This type of app is designed for the folks who need to work out what people who like chips and cheese and rock but don’t like bagels or opera also have, statistically, in common. It’s a simple enough question, but when you get up to Facebook scale, it’s a hard question to answer. This is the land of Hadoop and Hadoop has its own SQL-like query engine, Hive.

But unlike Hive, which converts queries into MapReduce tasks and saves intermediate results to disk, Presto has a query and execution engine which runs in memory and is pipelined through the network. Presto is implemented in Java for easy integration with other parts of Facebook that are also built in Java and compiles parts of queries down to bytecode, letting the JVM JIT compile to machine code to get the best out of the Java environment. Although it doesn’t need Hive, Presto does need a datasource for its queries and it includes a plugin for Hive, though it only uses the Hive metastore service, presumably to obtain structural information, and then accesses the data over HDFS.
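The difference between the two execution styles can be sketched in a few lines of Python. This is only an illustration of the idea, not how Hive or Presto are implemented: the function names, the toy rows and the temporary file are all assumptions made for the example. The staged version writes the full output of its first step to disk before the second step starts, while the pipelined version streams rows through a generator in memory.

```python
import json
import os
import tempfile


def staged(rows, tmpdir):
    """Staged style: the filter step writes its whole result to disk,
    and the aggregation step reads it back in afterwards."""
    path = os.path.join(tmpdir, "stage1.json")
    with open(path, "w") as fh:
        json.dump([r for r in rows if r["likes"] == "cheese"], fh)
    with open(path) as fh:
        filtered = json.load(fh)
    return sum(r["age"] for r in filtered) / len(filtered)


def pipelined(rows):
    """Pipelined style: rows flow from the filter to the aggregation
    one at a time, with no intermediate result stored anywhere."""
    filtered = (r for r in rows if r["likes"] == "cheese")
    total = count = 0
    for row in filtered:
        total += row["age"]
        count += 1
    return total / count


# Toy data, invented for the example.
ROWS = [
    {"likes": "cheese", "age": 30},
    {"likes": "bagels", "age": 41},
    {"likes": "cheese", "age": 40},
]

with tempfile.TemporaryDirectory() as tmp:
    staged_result = staged(ROWS, tmp)
pipelined_result = pipelined(ROWS)
print(staged_result, pipelined_result)
```

Both functions give the same answer; the point is that the pipelined one never pays for the write and re-read of the intermediate data.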

The Facebook announcement says “Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook” and has been in use internally since Spring of this year with multiple deployments and one cluster scaled to a thousand nodes. A thousand users actively use it, running 30,000 queries and processing a petabyte a day. That’s a good workout for any big data offering.

There’s plenty missing from Presto; various joins and aggregations are restricted and there’s no way to write results back into tables – they go straight to the client. Those issues, plus improved performance, query accelerators, hot cached data subsets and a high performance HBase connector are all on the roadmap for Presto.

Presto is licensed under the Apache License 2.0 but does not appear to be heading to the foundation with active development taking place around Facebook’s GitHub repository.

Slackware 14.1, MariaDB 10.0.5, Glassfish and Android Crypto – Snippets


  • Slackware updated: The venerable Slackware Linux has had its annual update for 2013 announced by Patrick Volkerding and a fine update it appears to be. A 3.10.17 Linux kernel, X11R7.7 X Windows, 64-bit UEFI installation support and updates across the board for dev tools, applications, desktops (Xfce 4.10.1 and KDE 4.10.5) and more. And Slackware ARM 14.1 is also available.
  • MariaDB 10.0 goes Beta: As MariaDB, the community-supported and developed MySQL fork, branches away from MySQL with version 10.0, the first 10.0 Beta has been released with enhanced replication, more storage engines supported, engine independent query statistics, regexps with PCRE, admin improvements with roles and more. Google sponsored one enhancement (parallel replication) and blogged about the release noting it is already deploying 10.0 into non-production MySQL instances to aid the MariaDB debugging and development process. In beta, the focus should be on stabilising the 10.x feature set, so if you are considering MariaDB 10.x for future use, now is a good time to check it out.
  • Glassfish goes open only: Oracle have pulled commercial support from the Glassfish server for future releases and are pointing users over at their commercial WebLogic Server. They are carrying on development of the server as the reference implementation of future Java EE platforms, but the fear is the quality of the RI will suffer with no commercial imperative to keep quality and performance high. Oracle may well have backed the wrong Java EE web server from a community point of view – I know no one who goes “Hey, let’s do that on WebLogic” – but now the competitive field is wide open. The X-EE Factor auditions for series… One other takeaway comes from Tomitribe – Open source isn’t free and if we want it to be industrially healthy, then the industry needs to make sure some money ends up in the open source communities.
  • Android Crypto Misuse: Develop for Android (or Java in general)? Write code that uses cryptography? Then read this paper – An Empirical Study of Cryptographic Misuse in Android Applications (pdf). From the abstract, “We develop program analysis techniques to automatically check programs on the Google Play marketplace, and find that 10,327 out of 11,748 applications that use cryptographic APIs – 88% overall – make at least one mistake”. Scary eh. Very worth a read though.

FreeBSD 10.0beta3, SQL Injections, Rust stacks, InfluxDB and Circus renewal – Snippets


Catching up on Codescaling with some of the less mentioned things worth noting…

  • FreeBSD 10.0’s latest beta: It’s into the home/RC straight for FreeBSD 10 with the release of the third and hopefully last beta of the development cycle. The original schedule would have seen RC2 available around now, but with a focus on a quality release, there’s been a bit of slippage. Check out this FreeBSD News item from September for a feel of what’s going in. I’m looking forward to the switch to LLVM/Clang and seeing how the tickless kernel works out.
  • SQL injection attacks by Google?: Sucuri have come across an odd thing, Google doing SQL Injection attacks. Basically, Google’s bots crawl a site with links which would carry out an SQLi attack if followed… and then follow them like the bots they are, which carries out the attack. Google may want to add at least some filtering to their bots in future, but it’s a reminder to anyone whose application ingests URLs from the web and follows them that URLs are not necessarily passive.
  • Rust reworks stack plan: For those interested in the implementation of languages, the Rust developers have decided to drop segmented stacks. Segmented stacks were stacks that were allocated small and expanded as needed. This would have allowed threads to have a much smaller footprint, but it didn’t quite work out that way. Followups on the thread discuss the cost of memory, both having it and accessing it, and alternative strategies.
  • InfluxDB: Databases for time series data are in and the latest open source addition to the game is InfluxDB which prides itself on having no external dependencies. The Go-based MIT-licensed code has a JSONic HTTP API, an SQLish query language and a playground server to get running with. It’s early days for InfluxDB, but it’s off to a good start.
  • Mozilla’s Circus Renewed: Mozilla’s Services project has announced a new version of its process/socket manager called Circus. Built using Python and ZeroMQ and recently redeveloped to be Python 3 compatible and fully asynchronous, the software lets an administrator manage processes and sockets on servers through a command line, Python API or web console. You can find the code on mozilla-services github.
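On the SQL injection item above: the attack only works when the target site pastes URL parameters straight into its SQL. The sketch below, using Python's built-in sqlite3 module, shows a crafted link of the kind a crawler might follow, a handler that falls for it and one that does not. The table, the URL and the function names are hypothetical and exist only for this example.

```python
import sqlite3
from urllib.parse import parse_qs, urlparse


def make_db():
    """Build a tiny in-memory database with two posts."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany("INSERT INTO posts (title) VALUES (?)",
                   [("first",), ("second",)])
    return db


def vulnerable_lookup(db, url):
    post_id = parse_qs(urlparse(url).query)["id"][0]
    # Unsafe: the query string value becomes part of the SQL text.
    return db.execute(
        "SELECT title FROM posts WHERE id = " + post_id).fetchall()


def safe_lookup(db, url):
    post_id = parse_qs(urlparse(url).query)["id"][0]
    # Safe: the value is bound as a parameter and never parsed as SQL.
    return db.execute(
        "SELECT title FROM posts WHERE id = ?", (post_id,)).fetchall()


# A link that decodes to "id=1 OR 1=1"; anything that follows it,
# crawler or human, delivers the payload.
crafted = "http://example.com/post?id=1%20OR%201%3D1"

db = make_db()
leaked = vulnerable_lookup(db, crafted)
contained = safe_lookup(db, crafted)
print(len(leaked), len(contained))
```

The fix belongs on the site being crawled: with bound parameters it no longer matters who, or what, follows the link.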

Lime editor, HBase 96, Font Awesome and MOON LASERS – Snippets


  • Lime text editor: People love the Sublime Text editor. But being closed source does set some folks worrying. Some of them do something about it though, such as “quarnster” who has been creating Lime as an open source version of Sublime Text. Built with a combination of Go 1.1, Python3, Oniguruma and optional Qt5, Lime still has plenty to implement, including compatibility with Sublime’s Python API, keybindings and snippets, TextMate snippets and getting solid cross platform support. But if you are looking for a project to work on…
  • HBase 96 arrives: The Hadoop-based “big data” database, HBase has been updated to HBase 0.96 with around 2000 issues closed and lots of contributed work. This included getting the MTTR (Mean-time-to-recovery) down to under a minute, support for snapshotting tables then moving and restoring snapshots, Cygwin-free native support on Windows, more efficient compacting, a switch to Google’s Protocol Buffers (in part for futureproofing) and much more. There’s also a bunch of incompatible changes so do check the notes. Find the release and the release notes on the Apache Software Foundation’s pages.
  • Font Awesome 4.0: A font of icons? Yes, the rather spiffy Font Awesome is back with an even more awesome version 4.0, now with 370 icons in a single collection. Designed for Bootstrap 3.0.0, styled with CSS and free for commercial use. Check out the sample page and examples. And yes, you can use it without Bootstrap too.
  • And finally: NASA just announced they have got a 622Mbps download rate from the Lunar Laser Communication Demonstration. It’s asymmetric though… 20Mbps upload, but hey, to the Moon.

Cassandra’s Europe Summit – The Keynote – Extra Scaling

At the opening of the conference day at Cassandra Summit Europe 2013, Jonathan Ellis, DataStax CTO, made a point of positioning Apache Cassandra as an enterprise scalable database and one that scales in a linear fashion to massive scales. DataStax is the leading developer and commercial vendor of Apache Cassandra, in the form of DataStax Enterprise.

MongoDB was very much in the company’s sights as it showed benchmarks with Cassandra running 20 times faster than MongoDB. The reason was simple, though: the dataset for the benchmark was bigger than the available memory on the nodes. While MongoDB performs well with the dataset in memory, Ellis says most customers want their hot data in memory and their cold data on disk, and that’s where Cassandra has the advantage with a balanced approach to memory and disk.

Away from the benchmarking, Ellis described this year’s focus for Cassandra as having been on ease of use. That meant enhanced CQL, the Cassandra Query Language, a new CQL protocol for language drivers, more emphasis on features like tracing, lightweight transactions for the 1% of cases that need them and cursors to reduce query complexity.

Internal enhancements were equally important though. For example, 2.0 took back control of a lot of memory management in Cassandra, from the JVM and over to a more traditionally manually handled memory manager tuned for Cassandra’s needs. This has allowed lots of data structures to reside more efficiently in memory improving performance.

Next week will see the release of Cassandra 2.0.2 which will add what the DataStax people call “rapid read protection”. This means that when a query goes out to a cluster, rather than waiting until a node times out to return an error, the system will look for return times that are out of the ordinary (in the 99th percentile) and return an error on them early. This should improve the ability to respond when nodes are over-paused in GC or suffering some other performance hit.
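The percentile check described above can be sketched roughly as follows. This is not Cassandra code, just an illustration of the idea using a nearest-rank percentile over recent response times; the function names and the latency figures are assumptions made for the example.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct is a whole number)."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def is_straggler(history_ms, elapsed_ms, pct=99):
    """True if a request has already taken longer than the given
    percentile of recently observed response times."""
    return elapsed_ms > percentile(history_ms, pct)


# Made-up history: one hundred requests taking 1..100 ms.
history = list(range(1, 101))
threshold = percentile(history, 99)

print(threshold, is_straggler(history, 250), is_straggler(history, 40))
```

A coordinator using a check like this can give up on a slow node long before a fixed timeout would fire, which is the behaviour the talk described.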

Ellis also talked about Cassandra 2.1 which is pencilled in for January 2014. This will see nesting and collection indexing added to the database. The filtering inside the Cassandra software should also be improved with a new combination of pessimistic allocation and smarter estimates of required space using HyperLogLog to work out what data overlaps between sets. Ellis described his slides on this, though, as “hand wavy” as there was no code written yet and asked “Don’t send me hate mail…” if it didn’t make 2.1.

DataStax’s own certified DataStax Enterprise is set to move to a Cassandra 2.0 base by the end of the year.

Updates for RethinkDB and FreeBSD and a 64-bit .NET JIT boost – Snippets


  • RethinkDB gets multi-indexing: The developers of the open source, NoSQL database RethinkDB have announced version 1.10 which comes with the ability to index rows with fields of multiple values, like say a list of tags for a blog entry. Looking for all records with a particular tag previously required slow iteration, but now with the multi-index it is possible to index the set of values within the field and then to “get_all” for a particular tag value using that index. RethinkDB server is written in C++ and AGPL licensed with Apache licensed client drivers.
  • FreeBSD 9.2 released: In the latest FreeBSD release ZFS gets added TRIM support for solid state drives and lz4 compression and there are updates for OpenSSL (to 0.9.8y), DTrace (to 1.9.0), Sendmail (to 8.14.7) and OpenSSH (to 6.2p2). There are also virtio drivers and DTrace is enabled in the “GENERIC” kernel. Read more in the FreeBSD 9.2 release announcement.
  • RyuJIT for .NET: Over in the world of .NET, interesting things are afoot with a new 64-bit just-in-time compiler, RyuJIT, making its debut as a CTP (Community Technology Preview). .NET’s had a 64-bit JIT for some time, though the JIT has apparently been quite slow. RyuJIT runs twice as fast and overall gives a 30% speed up to start up. One benchmark with regular expressions went off the scale, going from a 1.4GB working set and 60 seconds run time to 199MB and 1.8 seconds run time – yes the older compiler is particularly bad at regular expressions.
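To make the RethinkDB multi-index item above a little more concrete, here is a toy version of the idea in plain Python. It is not RethinkDB's implementation or its driver API: the `MultiIndex` class and the blog post documents are invented for the example, and only the “get_all” name echoes the announcement. Each value in a document's list field gets its own index entry, so a lookup by tag no longer has to scan every document.

```python
from collections import defaultdict


class MultiIndex:
    """Toy multi-index: one index entry per value in a list-valued field."""

    def __init__(self, field):
        self.field = field
        self.entries = defaultdict(list)

    def add(self, doc):
        # Index the set of values so a repeated tag is only entered once.
        for value in set(doc.get(self.field, [])):
            self.entries[value].append(doc)

    def get_all(self, value):
        """Return every document whose field contains the value."""
        return list(self.entries.get(value, []))


# Invented documents for the example.
posts = [
    {"id": 1, "title": "Git 2.0", "tags": ["git", "release"]},
    {"id": 2, "title": "Redis 2.8", "tags": ["redis", "release"]},
    {"id": 3, "title": "Fonts", "tags": ["fonts"]},
]

by_tag = MultiIndex("tags")
for post in posts:
    by_tag.add(post)

release_ids = sorted(p["id"] for p in by_tag.get_all("release"))
print(release_ids)
```

The trade-off is the usual one for an index: a little more work and memory on every insert in exchange for a direct lookup instead of iterating over all the records.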