The text-search library Lucene and Solr, the search platform built on top of it, have both been updated to version 4.5. Version 4.4 came out in July so what’s changed in this version bump?
Well, first of all, for Lucene, the DocValues mechanism, which allows typed storage to be associated with documents, has been updated to allow for missing values, and there’s now an in-memory DocIDSet implementation which is more efficient for carrying around smaller lists of documents. Other changes can be found in the Lucene 4.5 release notes.
Solr 4.5, as usual, benefits from and supports these changes as it is built on Lucene, but the search platform has also had its own set of improvements. For example, when running a sharded cluster, it’s now possible to set up custom routing to the various shards, including routing based on field values. Faceted searches are now multi-threaded, the solr.xml configuration file can now be stored in ZooKeeper and the CloudSolrServer has the ability to send updates directly to shard leaders. Again, more details are available in the Solr 4.5 release notes and the PDF of the updated Solr reference guide is available through the Apache mirrors. Both Lucene and Solr also have various bugfixes and performance improvements.
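To make the idea of routing on a field value concrete, here is a minimal sketch in Python. It is not Solr's routing code: the function name `route_to_shard`, the `region` field and the shard names are all hypothetical, and hashing the field value with MD5 is simply one assumed way of mapping a value to a shard.

```python
import hashlib


def route_to_shard(doc: dict, field: str, shards: list) -> str:
    """Pick a shard for a document from the value of one field.

    Illustrative only: Solr's own routers are configured on the
    collection and do not use this function or this hash.
    """
    key = str(doc[field]).encode("utf-8")
    # A stable hash, so the same field value always maps to the same shard.
    digest = hashlib.md5(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(shards)
    return shards[index]


shards = ["shard1", "shard2", "shard3"]  # hypothetical shard names
first = route_to_shard({"id": 1, "region": "eu"}, "region", shards)
second = route_to_shard({"id": 2, "region": "eu"}, "region", shards)
```

The point of the sketch is that documents sharing a field value land on the same shard, so a query restricted to that value would only need to visit one shard.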
The other other open source IaaS Cloud, CloudStack, has had an update with the release of CloudStack 4.2. “What’s new?” reveals a lot of work, which the announcement summarises as 57 new features and 29 improved features, such as the ability to plug in external or internal S3-compatible storage services and support for Cisco’s UCS compute chassis and SolidFire storage arrays.
A trawl through the release notes shows that there is far more than the headline items, though. There’s a whole set of features to support regions, zone-wide primary storage and a plug-in framework for writing UI extensions.
Networking has had a lot of work done to it too, with initial support for IPv6 (as a technical preview), portable elastic IPs which can be transferred between zones, the ability to assign a VLAN to an isolated network and persistent networks which can exist without VMs assigned to them. There’s also Cisco VNMC and VMware VDS support, enhanced support for Juniper gear and global server load balancing with health checks for load balanced instances.
Host support has not been left out. Windows 8 and Windows Server 2012 can now be VM guest OSes and ownership of VMs can now be changed by an administrator. There are also resizable data disk volumes, storage migration (for XenServer and VMware), the ability to scale CPU and memory on running VMs (VMware and XenServer again), over-provisioning of memory and CPU (VMware, XenServer and KVM), a bare-metal provisioning kickstarter, VM resetting on reboot and VMware VM snapshots.
Finally, there’s a whole set of enhancements to the monitoring, maintenance and operations end of CloudStack, with support for auto-purging alerts, API request throttling, forwarding of alerts to external SNMP and Syslog systems, a log collection tool, the ability to change the default password encryption and new VM snapshot and backup capabilities.
You can download the source or binaries (in deb and rpm packages) from cloudstack.apache.org where there is also documentation including installation and admin guides.
The PostgreSQL team have released PostgreSQL 9.3, ending the beta cycle which started in May. 9.3’s headline feature is the newly writable Foreign Data Wrappers (fdw). In 9.1 and 9.2, foreign data wrappers were read-only, allowing the database only to ingest information made available through an “fdw” driver, taking it from a legacy source or other database and presenting it as a table. In 9.3, though, these “fdw” drivers can be enhanced to support changes to the fdw tables being reflected back in the source. There’s also a PostgreSQL “fdw” driver for federating PostgreSQL instances.
Also new in 9.3 are twelve JSON functions (rather than 9.2’s two) for working with JSON data within the database, including path-based extraction so JSON fields need not be extracted externally to be tested. Materialised views are also new and cache the output of a view as a physical table rather than repeatedly issuing the query that the view is based on, and there’s support for recursive views as well. Simple views can now be updated directly too, rather than going back to the view’s source and updating that.
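The difference between an ordinary view and a materialised one can be sketched with Python's built-in sqlite3 module. SQLite has no materialised views, so this only emulates the idea with a plain table that is filled from the query and refreshed by hand; in PostgreSQL 9.3 the equivalent statements are CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW. The table, view and function names here are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("eu", 10), ("eu", 5), ("us", 7)],
)

QUERY = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# An ordinary view: the query runs again every time the view is read.
conn.execute(f"CREATE VIEW sales_totals AS {QUERY}")

# Emulated materialised view: the query result is stored as a real table.
conn.execute(f"CREATE TABLE sales_totals_cached AS {QUERY}")


def refresh_cached(connection):
    """Rebuild the stored result, standing in for a manual refresh."""
    connection.execute("DELETE FROM sales_totals_cached")
    connection.execute(f"INSERT INTO sales_totals_cached {QUERY}")


# New data arrives after the cached copy was built.
conn.execute("INSERT INTO sales VALUES ('us', 3)")

live = dict(conn.execute("SELECT region, total FROM sales_totals"))
cached = dict(conn.execute("SELECT region, total FROM sales_totals_cached"))
```

Here `live` reflects the new row straight away while `cached` stays as it was until `refresh_cached(conn)` is called, which is the trade-off a materialised view makes: cheaper reads in exchange for possibly stale data.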
Other enhancements include a parallel version of pg_dump to speed up backups, a switch to POSIX shared memory which will make configuration easier, new trigger events from CREATE, ALTER and DROP and an implementation of lateral joins. Modules can now opt to work in the background as custom server processes and there’s support for data checksums and corruption reports.
Basically, it’s oodles of improvements in one big update – full details in the what’s new in 9.3 notes and release notes. You’ll find the open source licensed PostgreSQL ready for download on the PostgreSQL Download page.
Version 2.0 of the Apache Cassandra database has just been released. The Apache Software Foundation are leading on the addition of lightweight transactions and triggers to the database. Cassandra originated at Facebook who donated it to Apache in 2008. It is designed to work with massive data sets and mixes Google’s Bigtable data model with a distributed architecture modelled on Amazon’s Dynamo.
DataStax, who produce a commercial version of Cassandra, have detailed blog entries on lightweight transactions, which can ensure an update is committed to all replicas through a prepare/promise/propose/accept process, on triggers, which can start processing tasks as changes in tables are detected, and on the enhancements made to CQL, Cassandra’s SQLish query language. There’s also a roundup of all the other changes in Cassandra 2.0, such as the requirement to use Java 7, various spring cleaning and performance optimisations. The DataStax documentation has also been updated for 2.0 and is also available as a PDF.
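The prepare/promise/propose/accept sequence can be illustrated with a toy, single-process model in Python. This is a sketch under several assumptions and is not Cassandra's implementation: the `Replica` class and `compare_and_set` function are hypothetical, replicas are plain in-memory objects, success is judged on a simple majority quorum, and the accept and commit steps are folded into one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """One in-memory replica holding a single value (hypothetical model)."""
    value: Optional[str] = None
    promised: int = 0  # highest ballot this replica has promised to honour

    def prepare(self, ballot: int) -> bool:
        # Promise only if the ballot is newer than any seen so far.
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def accept(self, ballot: int, new_value: str) -> bool:
        # Accept the proposal unless a newer ballot has been promised since.
        if ballot >= self.promised:
            self.promised = ballot
            self.value = new_value
            return True
        return False


def compare_and_set(replicas, ballot, expected, new_value) -> bool:
    """Write new_value only if a quorum promises and the value matches expected."""
    quorum = len(replicas) // 2 + 1

    # Phase 1: prepare / promise.
    promised = [r for r in replicas if r.prepare(ballot)]
    if len(promised) < quorum:
        return False

    # Check the condition against the replicas that promised.
    if any(r.value != expected for r in promised):
        return False

    # Phase 2: propose / accept (commit is folded in for brevity).
    accepted = sum(1 for r in promised if r.accept(ballot, new_value))
    return accepted >= quorum


replicas = [Replica(), Replica(), Replica()]
first_write = compare_and_set(replicas, 1, None, "alice")
```

A second caller that still expects the value to be unset, or that arrives with an older ballot, is turned away, which is the "insert only if absent" style of guarantee that lightweight transactions are aiming at.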
Usually with Apache project releases (and other events), the gap between the decision to release and the actual release can be a matter of some weeks, but this time round it was less than a week between those two events. Could this be a sign that the ASF will synchronise their announcements more with events than with an artificially paced schedule? We shall see.
In the aftermath of Google’s bone-headed-but-determined execution of Google Reader, there has been some great work done developing alternatives to Google’s service. One open source implementation was NewsBlur, but at least from our experience at codescaling.com, it was a bit tetchy and the user interface was idiosyncratic. Among the other services we tried was Feedbin, with its clean stripped down user interface, growing app support and good RSS pickup speed. But it wasn’t open source, at least until now when Ben Ubois announced Feedbin was being opened.
While we at codescaling.com are still happy with Ubois’s hosted version of Feedbin at feedbin.me (currently priced at $3 a month or $30 per year), it’s really good news to see him open up the code under an MIT licence and host it on a GitHub repository. It means that users of the Feedbin service know they have an alternative they can host themselves, that they can get involved in development and help take the cause of better RSS aggregation forward. “It’s because Feedbin is making money that I felt comfortable doing this” said Ubois on a Hacker News thread.
The code itself is a Rails 4.0 application, running on Ruby 2.0 and using both Postgres 9.2 and Redis 2.6 for data storage duties. Instructions for getting the system running on Mac OS X are available in the GitHub readme; partial instructions for Ubuntu 12.04 are also present. “Install a local Feedbin server” is now on our to-do list (though that is a very long list).