Node.js synchronously: Node.js is sweet if you can adapt to the asynchronous model: start a thing, say what you want to do when it's done, and get on with everything else in the meantime. Good for web request handling but bleh for trying to emulate a shell script. Turns out that in Node.js 0.12 (coming soon? anyone? Bueller?) we get synchronous child processes, so now you can run that curl or find or whatever and just wait until it returns with its results. The folks at Strongloop have written about these synchronous child process methods and how they make writing command line utilities in Node easier. Check it out, Noders.
Serviced Polyfills: Polyfills fill gaps in browser functionality and standards compliance. The older the browser, the more polyfill you need to fill the gaps; the newer, the less. But it gets hard working out just how much polyfill you are going to need. Fear not, as Samuel Giles at FT Labs has an answer: “Polyfills as a Service”. Add a simple script tag pointing at a source from the polyfill.io content delivery network to your pages, and whatever browser views your page gets the polyfills it needs. This works because the service sniffs the user agent and works out the best set of polyfills based on that. Neat idea, potentially very handy – and you can run your own private version if you need to.
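In practice it's a one-liner – something along these lines, though treat the exact CDN path as illustrative and take the current one from the polyfill.io docs:

```html
<!-- One script tag; the polyfill.io CDN inspects the user agent and
     serves only the polyfills this particular browser needs. -->
<script src="//cdn.polyfill.io/v1/polyfill.min.js"></script>
```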
Spark sparks: Apache Spark just got a 1.1 release. Spark is a data processing engine which can run on YARN-based Hadoop clusters or in standalone mode. Spark 1.1 improves performance (the project already claims to be up to 100 times faster than Hadoop MapReduce) and has SQL layer enhancements. 1.1 also adds more statistical functions, can take streaming data from Amazon Kinesis, can pull data from Apache Flume and more. If you're into clusters and data crunching and haven't looked at Spark, you might want to look into it.
Tangram Mapping: Do you want to render cool 2D and 3D maps? Then check out Tangram, a mapping library which is building out from a WebGL implementation to other OpenGL platforms to make oddly cool dynamic map renders. Very slick.
SHAaaaaaa!: We mentioned Google's sunsetting of SHA-1 the other week. If you were unsure why this was important, can we send you off to Why Google is Hurrying the Web to Kill SHA-1, which explains it all and includes a brief history of collision attacks in the wild.
Node-RED updated: The most excellent graphical UI for connecting the Internet of Things (or just things in general), Node-RED, has been updated to version 0.6. The announcement notes that the process of separating the admin and server authentication, to make deployment more robust, has begun. Node-RED has nodes that accept HTTP connections and an HTTP admin front end, and previously these were all under one HTTP authentication mechanism – now the UI and nodes are more separate, with the option to set a user/password for each. There are some UI changes, like a search filter for the palette of available nodes and easier flow importing by just dragging and dropping JSON onto the UI. In the node-red-nodes library, they've added Postgres, Amazon DynamoDB and Emoncms nodes for more connections. There are also fixes for the MQTT keepalive handling, added socket timeout settings for TCP sockets and support for all 17 pins of WiringPi. More generally, there's a range-generating node now, and the inject node can send empty payloads if needed. Finally, the MongoDB node can now send a user name and password – something I found I needed when writing this for MongoHQ.
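To give a flavour of that drag-and-drop JSON import, a flow is just an array of node objects. Here's an illustrative, hand-written (not exported from a real install, so field names may vary slightly between versions) two-node flow wiring an inject node to a debug node:

```json
[
  { "id": "n1", "type": "inject", "name": "tick", "payload": "",
    "repeat": "5", "once": false, "x": 120, "y": 80, "wires": [["n2"]] },
  { "id": "n2", "type": "debug", "name": "", "active": true,
    "x": 320, "y": 80, "wires": [] }
]
```

The "wires" arrays are what connect node outputs to node inputs; drop JSON like this onto the canvas and the flow appears.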
Hadoop 2.3.0 released: In case you missed it, version 2.3.0 of the Apache Hadoop project got a release. The release notes list all the details. The short version: this is mostly about HDFS, the distributed file system. The changes include the ability to class the storage under HDFS, so you can make tradeoffs between, say, spinning media, SSDs and memory; the ability to explicitly cache files or directories in HDFS (with local zero-copy reading from the cache); and the use of HDFS and YARN to simplify deploying MapReduce code. Hortonworks has a good writeup which also looks forward to Hadoop 2.4.0 with HDFS ACLs and rolling upgrades.
NetBeans 8 gets an RC: The NetBeans IDE has hit a release candidate for 8.0. This is the version that will include JDK 8 support in the editor, Java SE Embedded and Java ME Embedded support, PrimeFaces code generators, AngularJS navigation and code completion, PHP 5.5 support and much more. There's a summary in the announcement, a lot more detail in the New and Noteworthy wiki page and a pencilled-in release date of mid-April.
Skrollr scrolls in: Recently spotted – Skrollr, a compact parallax scrolling and scrolling animation library for all your HTML5-styled sites, including the ability to “scale, skew and rotate the sh** out of any element”.
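Skrollr's trick is keyframes expressed as data attributes, with CSS properties keyed to scroll offsets in pixels. A minimal sketch – the data-0/data-500 attribute style is Skrollr's own, the markup around it is just illustration:

```html
<!-- This element animates between its data-0 and data-500 keyframes
     as the page scrolls from 0px to 500px. -->
<div data-0="transform:rotate(0deg);opacity:1"
     data-500="transform:rotate(360deg);opacity:0">
  I spin and fade as you scroll
</div>
<script src="skrollr.min.js"></script>
<script>
  // skrollr interpolates the CSS values between the keyframes above.
  skrollr.init();
</script>
```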
Facebook, in their now traditional fashion of taking on big data problems, solving them and then open sourcing the result, have open-sourced Presto, a distributed SQL query engine “optimized for ad-hoc analysis at interactive speed”. This kind of system is designed for the folks who need to work out what people who like chips and cheese and rock, but don't like bagels or opera, also have, statistically, in common. It's a simple enough question, but when you get up to Facebook scale, it's a hard question to answer. This is the land of Hadoop, and Hadoop has its own SQL-like query engine, Hive.
But unlike Hive, which converts queries into MapReduce tasks and saves intermediate results to disk, Presto has a query and execution engine which runs in memory and is pipelined through the network. Presto is implemented in Java for easy integration with the other parts of Facebook that are also built in Java, and it compiles parts of queries down to bytecode, letting the JVM JIT-compile them to machine code to get the best out of the Java environment. Although it doesn't need Hive, Presto does need a data source for its queries, and it includes a plugin for Hive – though it only uses the Hive metastore service, presumably to obtain structural information, and then accesses the data over HDFS.
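To make that concrete, here's a hypothetical ad-hoc query of the kind Presto is built for – the table and column names are invented, but the shape is plain SQL run against tables registered in the Hive metastore:

```sql
-- Which things do the non-bagel, non-opera crowd like most?
SELECT liked_item, COUNT(*) AS fans
FROM user_likes
WHERE liked_item NOT IN ('bagels', 'opera')
GROUP BY liked_item
ORDER BY fans DESC
LIMIT 10;
```

The point is interactivity: you type a query like this at a prompt and get an answer in seconds, rather than waiting for a batch MapReduce job.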
The Facebook announcement says “Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook”. It has been in use internally since spring of this year, with multiple deployments and one cluster scaled to a thousand nodes. A thousand users actively use it, running 30,000 queries and processing a petabyte of data a day. That's a good workout for any big data offering.
There's plenty missing from Presto: various joins and aggregations are restricted, and there's no way to write results back into tables – they go straight to the client. Those issues, plus improved performance, query accelerators, hot cached data subsets and a high-performance HBase connector, are all on the roadmap for Presto.
Presto is licensed under the Apache License 2.0 but does not appear to be heading to the Apache Foundation, with active development taking place around Facebook's GitHub repository.