Node.js synchronously: Node.js is sweet if you can adapt to the asynchronous model of start a thing, say what you want to do when it’s done, and do everything else anyway. Good for web request handling but bleh for trying to emulate a shell script. Turns out that in Node.js 0.12 (coming soon? anyone? Bueller?) we get synchronous child processes, so now you can run that curl or find or whatever and just wait till it’s returned with its results. The folks at StrongLoop have written about these synchronous child process methods and how they make writing command line utilities in Node easier. Check it out, Noders.
Serviced Polyfills: Polyfills fill gaps in browser functionality and standards compliance. The older the browser, the more polyfill you need to fill the gaps, and the newer, the less. But it gets hard working out how much polyfill you are going to need. Fear not, as Samuel Giles at FT Labs has an answer, “Polyfills as a Service”. Add a simple script tag pointing at a source from the polyfill.io content delivery network to your pages and whatever browser views your page gets the polyfills it needs. This is because the system sniffs the browser’s user agent and works out the best set of polyfills based on that. Neat idea, potentially very handy, and you can run your own private version if you need to.
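To make the user agent sniffing idea a bit more concrete, here is a minimal Python sketch of how a server might pick polyfills from a User-Agent header. It is not how polyfill.io is implemented: the function name `polyfills_for`, the small lookup table and the Internet Explorer-only detection are all assumptions made for illustration.

```python
import re
from typing import List

# Illustrative lookup table (an assumption of this sketch): feature -> first
# Internet Explorer major version that ships it natively. A real service
# would track many more browsers and features.
FIRST_IE_VERSION_WITH = {
    "Array.prototype.forEach": 9,
    "Function.prototype.bind": 9,
    "Object.keys": 9,
    "Element.prototype.classList": 10,
}


def polyfills_for(user_agent: str) -> List[str]:
    """Return the names of the polyfills this browser appears to need."""
    match = re.search(r"MSIE (\d+)", user_agent)
    if match is None:
        # Anything that is not an old-style IE user agent gets nothing extra
        # in this sketch.
        return []
    version = int(match.group(1))
    return sorted(
        feature
        for feature, first_version in FIRST_IE_VERSION_WITH.items()
        if version < first_version
    )
```

The point of doing this on the server is that a modern browser downloads next to nothing, while an older one gets only the gaps it actually has.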
Spark sparks: Apache Spark just got a 1.1 release. Spark is a Hadoop data processing engine which can run on YARN-based Hadoop clusters or in standalone mode. Spark 1.1 improves performance (and they already say they are up to 100 times faster than Hadoop MapReduce) and has SQL layer enhancements. 1.1 also adds more statistical functions, can take streaming data from Amazon Kinesis, pull data from Apache Flume and more. If you’re into clusters and data crunching and haven’t looked at Spark, you might want to look into it.
Tangram Mapping: Do you want to render cool 2D and 3D maps? Then check out Tangram, a mapping library which is building out from a WebGL implementation to other OpenGL platforms to make oodly cool dynamic map renders. Very slick.
SHAaaaaaa!: We mentioned Google’s sunsetting of SHA-1 the other week. If you were unsure why this was important, can we send you off to Why Google is Hurrying the Web to Kill SHA-1, which explains it all and includes a brief history of collision attacks in the wild.
Tails 1.0: The developers of Tails, the Linux distro built for anonymity and privacy, have declared the latest version Tails 1.0. Tails wires all its networking through Tor and leaves no traces on machines where it’s been live-booted. It’s ideal in situations where you want your digital footprint minimised. Version 1.0 sees browser updates, Tor patches including a Heartbleed-related blacklist, bug fixes and a new logo for the project. The announcement also lays out plans for 1.1 (a switch to Debian 7), 2.0 (better building for a longer life) and 3.0 (sandboxing and isolation) and invites developers to contribute. It is a project which has got some great reviews.
Debian 7.5: Talking about Debian, the latest bugfix and patch rollup release, Debian 7.5, has just arrived. If you keep your Debian system up to date, you’re already good, but if you install a lot of systems from spinning or stickish media then you may want to take this opportunity to update your images. Full details of the fixes, both bug and security, are in the announcement.
Apache OpenOffice 4.1: The Apache OpenOffice project has announced AOo 4.1, the latest iteration in the direct descendant of the original OpenOffice. The release notes highlight the Windows version’s IAccessible2 support for better screen reader integration and the addition of comments and annotations for text ranges. There is also in-place field editing, interactive cropping, unified import/drag/drop for images, better vectors, new translations (Bulgarian, Danish, Hebrew, Hindi, Thai and Norwegian Bokmål) and other updated translations and dictionaries. Also, behind the scenes, AOo now uses NSS libraries rather than the older Mozilla networking code so that it is a bit more secure and a lot easier to build.
LXC goes 1.0: Linux Containers, LXC, is now at version 1.0, a major milestone which also brings together and completes a lot of things that have been working their way through the Linux kernel, like support for unprivileged containers. There is long-term stuff too, like a stable API which will be supported for five years, bindings for Lua and Python 3 (and out-of-tree support for Go and Ruby), backing storage support for directories, btrfs, zfs and more, plus cloning and snapshotting. You may wonder “Hey, doesn’t Docker do many of these things?” and yes it does, so it’ll be interesting to watch how things all work out. More details are at the news post, and check out Stephane Graber’s 10 part blog series on LXC 1.0 which is packed full of useful stuff.
Thrift double opened: Facebook brought Thrift (PDF) to the world in 2007 via Apache Thrift and many people found the network/data serialisation framework well handy. Thing is though that Facebook went and forked their own internal version of Thrift as they filled out the features and ramped up performance, something that took major re-engineering over time. Now the company has announced fbthrift, available on Facebook’s GitHub repo and open sourced under the same Apache 2.0 licence as Apache Thrift.
Worth reading: WhatsApp’s Serving: From 2012, here’s a presentation on how WhatsApp does scale (PDF) with a combination of FreeBSD and Erlang – A New York Times profile of security reporter Brian Krebs, who’s more like an entire security intel op in one person – Enjoy Stephen Colebourne on video presenting Java 8’s Date and Time API at JAX 2013.
At the opening of the conference day at Cassandra Summit Europe 2013, Jonathan Ellis, DataStax CTO, made a point of positioning Apache Cassandra as a scalable enterprise database, and one that scales in a linear fashion to massive sizes. DataStax is the leading developer and commercial vendor of Apache Cassandra, in the form of DataStax Enterprise.
MongoDB was very much in the company’s sights as it showed benchmarks with Cassandra running 20 times faster than MongoDB. The reason was simple though: the dataset for the benchmark was bigger than the available memory on the nodes. While MongoDB performs well with the dataset in memory, Ellis says most customers want their hot data in memory and their cold data on disk, and that’s where Cassandra has the advantage with a balanced approach to memory and disk.
Away from the benchmarking, Ellis described this year’s focus for Cassandra as having been on ease of use. That meant enhanced CQL, the Cassandra Query Language, a new CQL protocol for language drivers, more emphasis on features like tracing, lightweight transactions for the 1% of cases that need them and cursors to reduce query complexity.
Internal enhancements were equally important though. For example, 2.0 took back control of a lot of memory management in Cassandra, moving it from the JVM over to a more traditional, manually handled memory manager tuned for Cassandra’s needs. This has allowed lots of data structures to reside more efficiently in memory, improving performance.
Next week will see the release of Cassandra 2.0.2, which will add what the DataStax people call “rapid read protection”. This means that when a query goes out to a cluster, rather than waiting until a node times out to return an error, the system will look for return times that are out of the ordinary (in the 99th percentile) and return an error on them early. This should make it easier to respond to nodes that are over-paused in GC or suffering some other performance hit.
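The “out of the ordinary” check can be sketched in a few lines of Python. This is not Cassandra code, only an illustration of the idea of comparing a pending read against the 99th percentile of recent read latencies; the function names and the nearest-rank percentile method are assumptions, and what the system does once a read is flagged is deliberately left out.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def is_out_of_ordinary(elapsed_ms: float, recent_ms: List[float],
                       pct: float = 99.0) -> bool:
    """True when a pending read has already taken longer than the
    pct-th percentile of recently observed read latencies."""
    return elapsed_ms > percentile(recent_ms, pct)
```

The attraction of a percentile over a fixed timeout is that the threshold follows what the cluster is actually doing, so a read can be flagged long before a conservative timeout would fire.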
Ellis also talked about Cassandra 2.1, which is pencilled in for January 2014. This will see nesting and collection indexing added to the database. The filtering inside the Cassandra software should also be improved with a new combination of pessimistic allocation and smarter estimates of required space, using HyperLogLog to work out what data overlaps between sets. Ellis described his slides on this, though, as “hand wavy” as there was no code written yet, and asked “Don’t send me hate mail…” if it didn’t make 2.1.
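For anyone who hasn’t met HyperLogLog, here is a minimal Python sketch of how it can estimate the overlap between two sets without keeping the sets themselves. It is a textbook-style illustration, not anything from Cassandra: the class, the choice of SHA-1 as the hash, the register count and the inclusion-exclusion helper `estimated_overlap` are all assumptions of this example.

```python
import hashlib
import math


class HyperLogLog:
    """Small HyperLogLog cardinality estimator with 2**p registers."""

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Use the first 64 bits of a SHA-1 digest as the hash value.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)               # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Sketch of the union (both sides must use the same p)."""
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)    # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


def estimated_overlap(a: HyperLogLog, b: HyperLogLog) -> float:
    """Inclusion-exclusion: |A and B| is roughly |A| + |B| - |A or B|."""
    return a.count() + b.count() - a.merge(b).count()
```

Feeding two sketches with partly shared keys and calling `estimated_overlap` gives a figure close to the true number of shared keys, within the usual HyperLogLog error, while each sketch stays a fixed-size list of registers however many keys go in.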
DataStax’s own certified DataStax Enterprise is set to move to a Cassandra 2.0 base by the end of the year.
The text-search library Lucene and Solr, the search platform built on top of it, have both been updated to version 4.5. Version 4.4 came out in July so what’s changed in this version bump?
Well, first of all, for Lucene, the DocValues mechanism, which allows typed storage to be associated with documents, has been updated to allow for missing values, and there’s now an in-memory DocIdSet which is more efficient for carrying around smaller lists of documents. Other changes can be found in the Lucene 4.5 release notes.
Solr 4.5, as usual, benefits from and supports these changes as it is built on Lucene, but the search platform has also had its own set of improvements. For example, when running a sharded cluster, it’s now possible to set up custom routing to the various shards, including routing based on field values. Faceted searches are now multi-threaded, the solr.xml configuration file is now storable in ZooKeeper and the CloudSolrServer has the ability to send updates directly to shard leaders. Again, more details are available in the Solr 4.5 release notes and the PDF of the updated Solr reference guide is available through the Apache mirrors. Both Lucene and Solr also have various bugfixes and performance improvements.
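As a rough illustration of what routing based on field values means, here is a minimal Python sketch that sends every document with the same value in a chosen field to the same shard. It is not Solr’s routing code, and Solr’s own routers and hash function work differently; the function name `shard_for`, the field and shard names and the use of MD5 are assumptions of this example.

```python
import hashlib
from typing import Dict, List


def shard_for(document: Dict[str, str], route_field: str,
              shards: List[str]) -> str:
    """Pick a shard from a stable hash of the routing field's value, so
    documents sharing that value always land on the same shard."""
    value = document[route_field]
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:4], "big") % len(shards)]
```

Keeping related documents together like this means a query that filters on the routing field can be answered by a single shard rather than being fanned out to all of them.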