Just released: Socket.IO 1.0, Git 2.0 and OrientDB 1.7 – Snippets

Socket.IO 1.0: Socket.IO, the Node.js and browser library which started life as an implementation of the WebSockets interface and has gone on to “become the EventEmitter of the web”, has hit version 1.0. The 1.0 release and its changes are broken down in a blog posting, the first on a newly redesigned, and much more useful, Socket.IO website. In brief, the release brings modularisation, tighter code, binary support (so you can emit blobs and buffers), automated testing, better scalability using Redis, more integration (including PHP support), better debugging support (and silence by default), sleeker APIs and CDN delivery. Future plans include handling Node.js streams, Socket.IO support in Web Inspector and Firefox Dev Tools, and more language and framework support. A splendid tool to have in your arsenal.

Git 2.0: Git, the distributed version control system which made distributed version control systems cool before even version control could be cool, has reached version 2.0. In the announcement of the new release, there’s a long list of all the changes and notes on backward compatibility. The developers have been anticipating 2.0 for a while, so a lot of groundwork had already been done in previous 1.x versions. That makes 2.0 look more like a minor release than a major version bump, but there are still plenty of changes and a foundation prepared for future ones. On that subject, there’s a promise of a shorter release cycle for the next release, as delays have left a number of features ‘cooking’ for longer in the ‘next’ branch.

OrientDB 1.7: Version 1.7 of the Document/Graph/SQL/NoSQL database OrientDB is available. The announcement for 1.7 notes better performance, new clustering options, support for SSL and sharding, simplified configuration, new SQL commands including parallel queries, plugins for Lucene-based full text searching and more. There’s an Apache 2 licensed community edition of the database, plus commercially sold and supported professional and enterprise editions.

Hey! Presto – Facebook’s latest open source code

Facebook, pursuing their now traditional goal of taking on big data problems, solving them and then open sourcing the result, have open-sourced Presto, a distributed SQL query engine “optimized for ad-hoc analysis at interactive speed”. This type of app is designed for the folks who need to work out what people who like chips and cheese and rock but don’t like bagels or opera also have, statistically, in common. It’s a simple enough question, but when you get up to Facebook scale, it’s a hard question to answer. This is the land of Hadoop, and Hadoop has its own SQL-like query engine, Hive.

But unlike Hive, which converts queries into MapReduce tasks and saves intermediate results to disk, Presto has a query and execution engine which runs in memory and is pipelined through the network. Presto is implemented in Java for easy integration with other parts of Facebook that are also built in Java. It compiles parts of queries down to bytecode, letting the JVM’s JIT compile them to machine code to get the best out of the Java environment. Although it doesn’t need Hive, Presto does need a datasource for its queries, and it includes a plugin for Hive. That plugin only uses the Hive metastore service, presumably to obtain structural information, and then accesses the data over HDFS.
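The difference between the two execution styles can be illustrated with a toy sketch in Python. This is not Presto or Hive code: the function names (`staged_query`, `pipelined_query`) and the sample rows are hypothetical, and the temporary file merely stands in for the intermediate results a MapReduce-style job would write between stages.

```python
import json
import os
import tempfile

# Hypothetical sample data, invented purely for illustration.
ROWS = [
    {"user": "ann", "likes": "chips", "age": 31},
    {"user": "bob", "likes": "opera", "age": 45},
    {"user": "cat", "likes": "chips", "age": 27},
    {"user": "dan", "likes": "chips", "age": 52},
]


def staged_query(rows):
    """Run filter then projection as separate stages, writing the
    intermediate result to disk between them (MapReduce-like)."""
    # Stage 1: filter, then materialise the whole intermediate result.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        json.dump([r for r in rows if r["likes"] == "chips"], tmp)
        path = tmp.name
    try:
        # Stage 2: read the intermediate result back and project a column.
        with open(path) as fh:
            return [r["age"] for r in json.load(fh)]
    finally:
        os.remove(path)


def pipelined_query(rows):
    """Chain the same operators as generators so each row flows through
    the whole pipeline in memory, with nothing written in between."""
    filtered = (r for r in rows if r["likes"] == "chips")
    projected = (r["age"] for r in filtered)
    return list(projected)


if __name__ == "__main__":
    print(staged_query(ROWS))
    print(pipelined_query(ROWS))
```

Both functions return the same answer; the pipelined version simply avoids the round trip through storage, which is the kind of saving the in-memory design is aiming for.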

The Facebook announcement says “Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook”. It has been in use internally since Spring of this year, with multiple deployments and one cluster scaled to a thousand nodes. A thousand users actively use it, running 30,000 queries and processing a petabyte a day. That’s a good workout for any big data offering.

There’s plenty missing from Presto; various joins and aggregations are restricted and there’s no way to write results back into tables – they go straight to the client. Those issues, plus improved performance, query accelerators, hot cached data subsets and a high performance HBase connector are all on the roadmap for Presto.

Presto is licensed under the Apache License 2.0 but does not appear to be heading to the Apache Software Foundation, with active development taking place around Facebook’s GitHub repository.

FreeBSD 10.0beta3, SQL Injections, Rust stacks, InfluxDB and Circus renewal – Snippets


Catching up on Codescaling with some of the less mentioned things worth noting…

  • FreeBSD 10.0’s latest beta: It’s into the home/RC straight for FreeBSD 10 with the release of the third and hopefully last beta of the development cycle. The original schedule would have seen RC2 available around now, but with a focus on a quality release, there’s been a bit of slippage. Check out this FreeBSD News item from September for a feel of what’s going in. I’m looking forward to the switch to LLVM/Clang and seeing how the tickless kernel works out.
  • SQL injection attacks by Google?: Sucuri have come across an odd thing: Google doing SQL injection attacks. Basically, Google’s bots crawl a site containing links which would carry out an SQLi attack if followed… and then follow them like the bots they are, which carries out the attack. Google may want to add at least some filtering to their bots in future, but it’s also a reminder for any application that ingests URLs from the web and follows them that URLs are not necessarily passive.
  • Rust reworks stack plan: For those interested in the implementation of languages, the Rust developers have decided to drop segmented stacks. Segmented stacks were stacks that were allocated small and expanded as needed. This would have allowed threads to have a much smaller footprint, but it didn’t quite work out that way. Followups on the thread discuss the cost of memory, both having it and accessing it, and alternative strategies.
  • InfluxDB: Databases for time series data are in, and the latest open source addition to the game is InfluxDB, which prides itself on having no external dependencies. The Go-based, MIT-licensed code has a JSONic HTTP API, an SQLish query language and a playground server to get running with. It’s early days for InfluxDB, but it’s off to a good start.
  • Mozilla’s Circus Renewed: Mozilla’s Services project has announced a new version of its process/socket manager, Circus. Built using Python and ZeroMQ and recently redeveloped to be Python 3 compatible and fully asynchronous, the software lets an administrator manage processes and sockets on servers through a command line, Python API or web console. You can find the code on the mozilla-services GitHub.
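The SQL injection item above comes down to treating anything taken from a URL as untrusted input. Here is a minimal sketch of the point in Python, using the standard library’s `sqlite3` and `urllib.parse`. The `pages` table, the `id` parameter and the example URL are hypothetical and are not taken from the Sucuri report.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit


def make_db():
    """Create a throwaway in-memory database with a hypothetical table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (id TEXT, title TEXT)")
    db.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [("1", "Home"), ("2", "Secret drafts")],
    )
    return db


def page_id_from(url):
    """Pull the raw, attacker-controllable `id` query parameter out of a URL."""
    return parse_qs(urlsplit(url).query)["id"][0]


def titles_unsafe(db, url):
    # Vulnerable: the URL value is pasted straight into the SQL text.
    sql = "SELECT title FROM pages WHERE id = '%s'" % page_id_from(url)
    return [row[0] for row in db.execute(sql)]


def titles_safe(db, url):
    # Parameterised: the value is bound as data and never parsed as SQL.
    sql = "SELECT title FROM pages WHERE id = ?"
    return [row[0] for row in db.execute(sql, (page_id_from(url),))]


if __name__ == "__main__":
    # A crawlable link whose query string decodes to: 1' OR '1'='1
    crawled = "http://example.com/page?id=1%27%20OR%20%271%27%3D%271"
    db = make_db()
    print(titles_unsafe(db, crawled))
    print(titles_safe(db, crawled))
```

With the string-built query, simply fetching the link is enough to return every row; with the bound parameter, the same link matches nothing. It makes no difference whether the visitor is a person or a well-behaved crawler.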