Adapting VuFind as a Front-end to a Commercial Discovery System

Graham Seaman

Graham Seaman describes the adaptation of an open source discovery tool, VuFind, to local needs, discusses the decisions which needed to be made in the process, and considers the implications of this process for future library discovery systems.

VuFind is an open source discovery system originally created by Villanova University near Philadelphia [1] and now supported by Villanova with the participation in development of libraries around the world. It was one of the first next-generation library discovery systems in the world, made possible by the open source Solr/Lucene text indexing and search system which lies at the heart of VuFind (Solr also underlies several of the current commercial offerings, including Serials Solutions' Summon and ExLibris' Primo). The first wave of adopters of VuFind (the National Library of Australia, Minnesota State Colleges [2], Western Michigan [3] and Villanova itself in 2008) were therefore among the first users of next-generation discovery systems anywhere. For library patrons the prominent features were: the modern interface, contrasted with the older proprietary OPAC interfaces that had often tended to stagnate; the use of faceted search to apply post-filtering to a large number of results ranked by relevancy, in comparison with the pre-filtered exact matches of the OPAC [4][5], and consequently a general privileging of exploratory search over lookup search [6]; and access to data from multiple sources merged within the VuFind index.

Between 2008 and early 2012, the use of next-generation discovery systems became more commonplace. Proprietary offerings such as Summon, Primo, EBSCO Discovery, and WorldCat Local all have broadly similar interfaces (and in fact the Summon development team includes the original primary developer of VuFind). VuFind has been joined by Blacklight as a second open-source system providing similar features but based on different technology (Ruby on Rails, instead of PHP). VuFind is therefore no longer distinguished by its interface or by its ability to pool data from multiple sources, and many users are already familiar with such features. In addition, the proprietary systems may have access to data beyond that under library control, allowing them to add metadata from journal suppliers to their pool of indexed data and so obviate the need for federated search.

VuFind and Blacklight, however, are still distinguished by the ease of adaptation to local needs which separates free software from proprietary software. The primary aspect of this is the ability to control what is indexed and how. But it also includes the ability to act as front end for a range of services and so provide a single interface for users which is otherwise only possible if all software is purchased from a single supplier. These services may even include the new proprietary discovery systems: VuFind has provided a Summon interface from the start. As time has gone by the typical contributions to VuFind from developers in other libraries - later to include Royal Holloway, University of London - have been interfaces to new services. It was this secondary aspect of VuFind that drove the interest of Royal Holloway Library in its potential use.

Royal Holloway Library Software

In 2009 Royal Holloway had installed the open source application Xerxes as a front end to the ExLibris Metalib federated search tool. This had been a success, giving the Library an improved user interface, the flexibility to easily add extra pages and minor functions, and above all positive user feedback. The Metalib API provided by ExLibris was stable and the system needed virtually no maintenance. Xerxes was upgraded from one minor version to the next with minimal effort [7].

However, by early April of 2010 the decision had been taken to move away from federated search and to combine catalogue and electronic resource discovery using a single proprietary discovery service. Summon from Serials Solutions was adopted, with the decision made from the start to upload local catalogue data into Summon.

Shortly afterwards Summon also began to harvest both the Royal Holloway Institutional Repository and the Archives catalogue using OAI-PMH. At the start of the 2010 academic year Summon was launched on the library front page as the main discovery service for catalogue, repository and archive items as well as journal articles. In spite of problems with linking to search results from some suppliers, the article discovery itself worked well enough to end use of Metalib by the following summer - on the assumption that the linking problems would eventually be resolved.

However, article discovery was not the whole story. Rather than reducing the number of interfaces library users had to learn - which was part of the intention in aggregating the data into a single discovery tool - we had actually increased them.

Firstly, Summon had replaced the Metalib search function, but not the resource details and subject lists previously maintained in Xerxes/Metalib. A quick solution to this problem was to set up a template-based wiki (using Semantic Mediawiki) which would allow the automatic import of resource and subject lists from Metalib while they were still available. The wiki has performed well and has been reasonably straightforward to maintain, but has a noticeably different interface from the other Library applications.

Secondly, Summon had not replaced all the functions of the Library OPAC (Aleph, from ExLibris). Summon does not include patron management features such as loans renewal or hold requests. For these, users still had to log in to the Library Catalogue. Summon is also not configurable enough to provide all the searches available in the OPAC: browse by Dewey number, or search by content type for CDs, for example.

Thirdly, there was still a strong dependence on SFX for OpenURL resolution. Many results in Summon would lead not directly to the item, but to SFX. SFX was also the only Library tool for alphabetic listing of journals or search by journal title combined with subscription dates and embargoes for each source.

And finally, we had some problems entirely of our own making. Initially, for example, we had our exam papers entered in the catalogue but stored in the Institutional Repository. A search in Summon for an exam paper would return a link to the catalogue. This would in turn present a link to SFX, and from SFX the user would arrive at the Repository - where the fact that they were not previously logged in would cause their search details to be lost, and they would be left unceremoniously at the Repository front page. The obvious answer to problems of this kind was to change the catalogue data: but the work involved and the long lead time in uploading changes to Summon made it urgent for us to find other quick, if temporary, ways round such problems.

We had previously considered using VuFind as a modern front-end to catalogue, repository and archive, importing data into a Solr index and using VuFind for patron management through the Aleph API. Now we began to wonder if we could use it instead as a front-end to Summon. That would allow us to make our own local changes to functions, as well as potentially providing a single point for users to access all our applications. Work began on this approach in February 2011.

Requirements

When Royal Holloway began its work with VuFind, we identified a list of areas on which we would need to concentrate:

The ILS Driver

VuFind includes drivers for a range of Library Management Systems, including Aleph. However, the drivers did not at that time support some of the patron management features we needed, including holds, and the Aleph driver did not support either ExLibris PDS authentication for users or the recent ExLibris RESTful API.

The Summon Interface

The most common use of VuFind with Summon is to use the Solr index for local catalogue material and the Summon interface for e-resource discovery. In the Villanova instance of VuFind, the local search results are referred to as “Books and More”, and the Summon-based search results as “Articles and More”; the two searches can be run in parallel, but the results cannot be merged. We had almost no need for the Solr interface at all, and needed to convert the Summon interface from an optional secondary search to the default search. This meant that local records returned by Summon had to be linked back to their original sources in our catalogue, archive, or repository.

No Integration with SFX

The lack of integration with our OpenURL resolver, SFX, was another area for attention. Summon results would typically return a link to SFX which would need to be followed before the user could select a source for an article. In some cases, where electronic resources had been entered in our catalogue, Summon would provide a link to the catalogue, which would in turn provide a link to SFX. The number of steps was excessive.

A-Z Journal Listing

The lack of any built-in integration with the SFX journal listing service represented another aspect for consideration. SFX does allow its journal A to Z list to be incorporated into other products as a frame (as had previously been done with Xerxes), but without the ability to alter its appearance or functionality in any way.

No Integration with e-Resources Listing Wiki

The lack of any integration with the new E-resources listing wiki, the local replacement for the E-resource listings from Metalib, also required attention.

Additional Search Options

We also needed to compensate for the loss of the search options provided by our OPAC which were missing in Summon. Searches for DVDs or Audio materials, for example, had been replaced by a material-type search in Summon, whereas the browse by Dewey subject classification had no effective replacement.

Branding

Finally, we also needed to bring the branding of the discovery system in line with general University and Library web branding.

Figure 1: Data sources to be integrated in VuFind

Addressing the Requirements

Some of these requirements were much easier to satisfy than others. VuFind provides a selection of ‘themes’ which determine its appearance and which can be easily overridden and extended in part or whole to match local needs. The branding requirement could therefore be easily met by the stock VuFind implementation. Some of the requirements needed changes to the core functions of VuFind, while others required adding extra but self-contained functionality.

To minimise future support problems, we needed to make sure that where we made additions these could be fed back into the main VuFind development path, and thus become supported by the whole pool of VuFind developers. Like many other open source projects, VuFind is available in three forms. Development work is done through a version control system, and the current main-line development version of the application (commonly known as the ‘trunk’) can be downloaded and used directly, though this often implies that the users are planning to be involved with development work themselves. Efforts are made to ensure that the trunk is always fully functional, but this may not be guaranteed.

At specific times - usually after important new functions have been added to the trunk - further additions to the trunk are suspended, and after testing a copy of the trunk is packaged as a formal numbered ‘release’. VuFind was at release 1.0 when we began and is now nearing release 1.3. This is the closest to the proprietary model of distribution, and most users will start by installing a numbered release, and periodically updating their software as each release becomes available. The biggest difference here is that there is no pressure applied on users to update as there is with proprietary software, so the temptation is to let updates slide until the point is reached where an update becomes a major piece of work.

Finally, many projects have a second version controlled strand which includes experimental work on the next major version - VuFind 2.0 in this case. This version is unlikely to be fully functional, especially in the early stages. However, it may be possible to include the flexibility to deal with locally required features more easily in an experimental branch than in the main trunk. In fact, one of the main goals of VuFind 2.0 is to make it easier to maintain local variations across future minor releases.

As we needed to make changes to the core VuFind system, using a fixed release was not an option in our case. We had a deadline of four months to have our new system ready for testing, making it impractical to work on a completely new, speculative, major version of the system. The remaining choice was to work with the live trunk. Some other libraries which have done this have simply diverged from the standard VuFind, treating it purely as a starting point for their own systems (most notably MnPals+ [2]). Following this approach depends on either being willing to freeze the system after initial development without adding new features, or on having a pool of developers to continue work. Freezing the system implies that the locally developed version will fall behind standard VuFind, which is being steadily improved. This may not be a problem if the software is being used purely to fill a short term gap.

In the case of Royal Holloway, the only developer available to support VuFind was on a fixed-term contract. The ideal situation would therefore be to feed local developments back into the main development trunk, where they could be supported and improved on by the entire VuFind team, and at the end of the process move back to a stable release which incorporated the local changes but did not need further local development work. This was what was attempted, with varying degrees of success, as documented below.

The Creation of LibrarySearch

VuFind is a relatively modular system, so in many cases changes can be contained in small areas, whether by extending single classes or by adding self-contained new modules. Our problems arose where we needed changes which were not self-contained and which conflicted with the requirements of other VuFind users. We have branded our local instance of VuFind as ‘LibrarySearch’, and in what follows that name is used to distinguish our local instance from the generic VuFind.

The LMS Driver

VuFind communicates with Library Management Systems through drivers. There is a different driver for each LMS, implementing a common interface; each driver is typically one class, in a single file. We started to work with VuFind before holds and recalls were fully supported by the system, and so although there was already an Aleph driver it did not have all the features we needed. It also communicated with Ex Libris’s older ‘X-server’ API rather than the newer RESTful interface. We added the features we needed, later rewriting them to match the standard VuFind interface for holds and recalls as those were defined. The original driver class was converted into a utility class with two child classes: one for the X-Server API, and one for the RESTful interface (which not all Aleph customers have access to). Unfortunately this work was done in parallel with development of another Aleph driver at a Czech library which took a completely different approach, loading Aleph configuration files to allow more flexible configuration. The differing gaps in coverage meant that neither version was an obvious automatic replacement for the original driver, while neither library had the time available to fully merge the two. The unfortunate result was that the original version remains the default one. However, as the driver is quite self-contained this is not a maintenance problem for us, though it implies extra work for future potential VuFind/Aleph users.
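
The split described above, a shared base class with one child per API flavour behind a common driver interface, can be sketched roughly as follows. This is an illustration in Python rather than VuFind's own PHP, and every class, method and operation name in it is hypothetical; it does not reproduce the real VuFind driver interface or the real Aleph APIs.

```python
from abc import ABC, abstractmethod


class ILSDriver(ABC):
    """Common interface every LMS driver implements (hypothetical names)."""

    @abstractmethod
    def get_loans(self, patron_id: str) -> list:
        """Return the items currently on loan to a patron."""

    @abstractmethod
    def place_hold(self, patron_id: str, item_id: str) -> bool:
        """Request a hold; return True if the LMS accepted it."""


class AlephBase(ILSDriver):
    """Utility class holding logic shared by both API flavours."""

    def __init__(self, transport):
        # 'transport' is any callable taking (operation, params) and
        # returning a dict. A real driver would make HTTP requests here.
        self._transport = transport

    @staticmethod
    def normalise_item_id(item_id: str) -> str:
        # Example of a helper both child classes can reuse.
        return item_id.strip().upper()


class AlephXServer(AlephBase):
    """Child class for the older API (operation names are invented)."""

    def get_loans(self, patron_id):
        return self._transport("x-loans", {"patron": patron_id}).get("loans", [])

    def place_hold(self, patron_id, item_id):
        params = {"patron": patron_id, "item": self.normalise_item_id(item_id)}
        return bool(self._transport("x-hold", params).get("ok", False))


class AlephRest(AlephBase):
    """Child class for the RESTful API (operation names are invented)."""

    def get_loans(self, patron_id):
        return self._transport("rest-loans", {"patron": patron_id}).get("loans", [])

    def place_hold(self, patron_id, item_id):
        params = {"patron": patron_id, "item": self.normalise_item_id(item_id)}
        return bool(self._transport("rest-hold", params).get("ok", False))
```

Because the calling code only sees the `ILSDriver` interface, a site without access to the RESTful API could configure the other child class without changes elsewhere.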

Figure 2: Circulation data from the LMS Driver

The end result of the driver work is that our patrons can log in to LibrarySearch using the ExLibris PDS system, and once logged in can use VuFind to see their loans, request holds on items out on loan, and book short loans. Logged in users also have the standard VuFind functionality of viewing their saved items and searches.

OpenURLs

Search results which returned an OpenURL would originally lead the user back to our OpenURL resolver, SFX. In addition to the two extra steps needed to reach a resource (opening SFX and then selecting a link from SFX) this meant that users were also presented with an additional interface with a very different appearance in the middle of their search.

We overcame this by using AJAX to bring the SFX results into VuFind itself, creating a 'resolver driver' modelled on the LMS drivers. As we had access to 360Link for evaluation we added a 360Link driver to test our ability to swap between resolver drivers. The process was very straightforward and the code was submitted back to VuFind. It is now part of the standard distribution, and yet another driver has since been added by a German library. The process worked well, removing the need for us to maintain this particular part of the code.
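
As a rough illustration of the swappable resolver-driver idea, the sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal and passes it to whichever resolver driver is configured. It is written in Python rather than VuFind's PHP. The class names, the registry and above all the shapes of the resolver replies are assumptions made for the example; the real SFX and 360Link responses are not reproduced here.

```python
from urllib.parse import urlencode


def build_openurl(base_url: str, issn: str, title: str) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.issn": issn,
        "rft.jtitle": title,
    }
    return base_url + "?" + urlencode(params)


class ResolverDriver:
    """Turns an OpenURL into a uniform list of {'label', 'href'} links."""

    def __init__(self, fetch):
        # 'fetch' is a callable(openurl) returning the resolver's reply,
        # already decoded. A real driver would perform the HTTP request.
        self._fetch = fetch

    def parse(self, raw):
        raise NotImplementedError

    def fetch_links(self, openurl: str) -> list:
        return self.parse(self._fetch(openurl))


class SfxResolver(ResolverDriver):
    def parse(self, raw):
        # Hypothetical reply shape: a list of (label, url) pairs.
        return [{"label": label, "href": url} for label, url in raw]


class Link360Resolver(ResolverDriver):
    def parse(self, raw):
        # Hypothetical reply shape: a list of dicts with 'name' and 'link'.
        return [{"label": r["name"], "href": r["link"]} for r in raw]


# Swapping resolvers becomes a configuration choice, not a code change.
RESOLVERS = {"sfx": SfxResolver, "360link": Link360Resolver}


def make_resolver(name: str, fetch) -> ResolverDriver:
    return RESOLVERS[name](fetch)
```

The page code that expands a link in place only needs the normalised list of labels and addresses, so it does not need to know which resolver produced them.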

Figure 3: In-place expansion of SFX link

E-journals, Recent Acquisitions and E-resources

One feature missing from both Summon and VuFind was indices of E-Resources and E-journals, for which we had previously used ExLibris’ Metalib (now replaced by our Eresources wiki) and SFX respectively. SFX does allow its A-Z listing to be pulled into external applications, but as a frame with no control over look-and-feel. We decided to replicate the SFX function within VuFind, but using data taken directly from SFX. A task was set up to copy the relevant journal data from SFX and e-resource data from the wiki into the LibrarySearch MySQL database nightly. An additional request was for LibrarySearch to display recent library acquisitions. Rather than pulling this information live from Aleph using the LMS driver, we decided to import this data into the local database in the same way.
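
A nightly import of this kind is easiest to live with if each run replaces the previous snapshot inside a single transaction, so that a failed import leaves the old listing in place. The sketch below shows that pattern in Python, with the standard-library `sqlite3` module standing in for MySQL; the table layout and function names are invented for the example and are not taken from LibrarySearch.

```python
import sqlite3


def refresh_journals(conn, rows):
    """Replace the local journal table with a fresh snapshot.

    'rows' is an iterable of (issn, title) pairs exported from the
    upstream system. The delete and the inserts share one transaction,
    so readers never see a half-loaded table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS journal ("
        "issn TEXT PRIMARY KEY, title TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM journal")
        conn.executemany(
            "INSERT INTO journal (issn, title) VALUES (?, ?)", rows
        )


def titles_starting_with(conn, letter):
    """Return journal titles for one page of an A-Z listing."""
    cur = conn.execute(
        "SELECT title FROM journal WHERE title LIKE ? ORDER BY title",
        (letter + "%",),
    )
    return [title for (title,) in cur]
```

A scheduled job would call `refresh_journals` once a night with the latest export, while the A-Z pages read only from the local table.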

The internal structure of VuFind allows new self-contained modules to be added in a very straightforward way. We added three such modules, one for the E-journal index, one for recent acquisitions, and one for the E-resources index. These were not fed back to the VuFind team as they were felt to be of local interest only and raised no maintenance problems.

Search results from the E-journal index return an OpenURL. This meant that the AJAX resolver driver we had already created could be used to expand the result via SFX and so provide a direct link to the journal. Search results from the recent acquisitions module return an author name and catalogue number; these are used to generate a Summon author name search or a search for the exact item. In both these cases the user stays within LibrarySearch until accessing the item itself. The E-resources index was more complex: the results of the search could not be a direct link to the resource, as part of the function of the wiki was to provide information on login conditions for each resource and to manipulate the login address in various ways. Search results therefore provided a link to the relevant wiki page, which in turn linked to the actual resource. The Library Liaison team felt that the additional step required negated any advantages from having the search incorporated within LibrarySearch, and the module was removed to be replaced by a simple link to the wiki.

Catalogue Browsing

VuFind includes a ‘browse’ module which allows users to browse the catalogue using catalogue data fields (loaded into Solr) as facets: we experimented with Author, Title, Topic, Classmark, Location and Period, but on the whole found this to be one of the less successful parts of LibrarySearch.

The first problem was with the derivation of the facet values from the catalogue MARC data. In testing, students commented on the absurdity and overlapping nature of both location and period subdivisions. These were dropped.

The second problem was in connecting the catalogue-derived browse data in VuFind with the corresponding items found by a Summon search. Where there were few items in a category, the connection was at first made by requesting a set of items from Summon using the boolean ‘OR’ of a series of catalogue IDs. Shortly before the launch of LibrarySearch in June 2011 the ability to logically OR IDs disappeared from the Summon API, and thereafter this method could only be used for single items. Instead, the search was repeated using the corresponding terms in Summon. This gave a generally reasonable result, though the number of results returned could be quite different.
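
The choice between the two ways of connecting a browse category to a search can be written as a small decision function. The Python sketch below is illustrative only: the `id:` query syntax, the `max_ids` threshold and the function name are assumptions, and none of it reflects the actual Summon API query language.

```python
def browse_query(ids, terms, or_supported, max_ids=10):
    """Choose how to look up the items behind one browse category.

    ids          -- catalogue IDs belonging to the category
    terms        -- the category's descriptive terms, used as a fallback
    or_supported -- whether the back-end accepts a boolean OR of IDs
    max_ids      -- hypothetical cut-off for 'few items in a category'

    Returns a (kind, query) pair, where kind is 'id' or 'terms'.
    """
    if len(ids) == 1:
        # A single ID needs no boolean operator.
        return ("id", f"id:{ids[0]}")
    if or_supported and 1 < len(ids) <= max_ids:
        # Small categories: ask for exactly these records.
        return ("id", " OR ".join(f"id:{i}" for i in ids))
    # Otherwise repeat the search using the category's own terms, which
    # may return a different number of results from the ID lookup.
    return ("terms", " ".join(terms))
```

Keeping `or_supported` as a parameter means that losing the OR facility changes one setting rather than the calling code.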

The one option for which this approach failed was (Dewey) classmark. Broad classes such as 500 would work well; narrow subdivisions such as 500.2094021 would (correctly) return few items, which Summon would then attempt to bulk out by adding items with similar contents. Sorting the resulting list would often give a result page without a single item with the correct classification. With some regret, we removed the option to browse by classmark. The browse options which remained, however, were almost completely standard VuFind functions and raised no issues of maintenance.

Summon as Primary Search

By far the biggest changes needed were to persuade VuFind to work with Summon as its primary search engine. The VuFind 1.x series was built around Solr as its default engine, and when we started to work with it the Summon section was simply a separate module. Continuing with this approach for us would have meant duplicating many of the features in the Solr-based code: we wanted to treat records derived from Summon in the same way as the mainstream system treated records derived from Solr, and preferred reusing existing code to writing our own from scratch. Luckily, shortly after we had started with this approach the VuFind trunk began to include ‘RecordDrivers’ tailored to records from different sources: the idea being that the system could treat Records from all sources in the same way, as long as it knew the correct RecordDriver to use to handle the particular idiosyncrasies of each source of Records.
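
The RecordDriver idea can be illustrated with a minimal sketch in Python (VuFind itself is written in PHP). The class names echo the concept only, and all field names and record shapes are hypothetical rather than the real Solr or Summon record formats.

```python
class RecordDriver:
    """Uniform view of a record, whatever back-end it came from."""

    def __init__(self, raw):
        self.raw = raw

    def title(self):
        raise NotImplementedError

    def link(self):
        raise NotImplementedError


class SolrRecord(RecordDriver):
    # Hypothetical shape: flat fields, local records linked by ID.
    def title(self):
        return self.raw["title"]

    def link(self):
        return "/Record/" + self.raw["id"]


class SummonRecord(RecordDriver):
    # Hypothetical shape: multi-valued fields, links supplied remotely.
    def title(self):
        return self.raw["Title"][0]

    def link(self):
        return self.raw["link"]


DRIVERS = {"solr": SolrRecord, "summon": SummonRecord}


def render(source, raw):
    """Display code sees only the RecordDriver interface."""
    record = DRIVERS[source](raw)
    return f"{record.title()} <{record.link()}>"
```

The benefit depends on the rest of the code calling only `title()` and `link()`; any code that reaches into `raw` directly reintroduces source-specific assumptions.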

Figure 4: Summon Search results displayed in VuFind

While a good match in theory, the RecordDriver approach was too new to have been fully worked out. In practice, we found that assumptions that related only to Solr had been built in almost universally to the code, and that instead of being able to make changes in one self-contained area of the code, or to override code using inheritance methods, we were making minor changes almost everywhere in the system.

As the project proceeded, it was subject to changes occurring in the data it was using. In June 2011 modifications were made to the format of Summon record identifiers which we had mistakenly assumed were stable. We wrote a quick series of patches to allow for the change.

In November 2011 Summon began to replace OpenURLs with encrypted ‘direct links’, with links to all items (including our own catalogue records) routed back through a Serials Solutions computer. This threatened to stop our customised link handling for local resources from working, as well as blocking our OpenURL module. Again, we incorporated a quick workaround for this in our Summon driver.

The end result was that we had a system with scattered code changes (some developed in a reactive mode without forward planning) which could not easily be reintegrated with mainstream VuFind. This is not a fatal problem, but does make it harder than it would otherwise be to incorporate desirable new features as they are added to the VuFind trunk. Since development was not done in isolation, other VuFind developers have been well aware of the problems, as well as perceiving a need to allow VuFind to work with a range of discovery back-ends other than Solr and Summon. The forthcoming VuFind 2.0 will allow for this.

Testing and Launch

Intensive testing was done with a small group of undergraduate and postgraduate library users, walking through a defined set of tasks and commenting on aspects they found straightforward, difficult, or confusing. All were positive about the overall application; all had suggestions for improvement. This testing approach meant that the suggestions for improvement were concrete enough to be implemented before or shortly after launch. LibrarySearch was launched as our new discovery tool, replacing the native Summon front-end on our library front page, at the beginning of the quiet period at the end of the Summer term. The next month was used to make minor corrections, and when the new cohort of students arrived for the next academic year, LibrarySearch was already established.

Results and Conclusions

As with any project of this size, the results have been mixed.

On the positive side, after only a few months' work we have created a system which successfully integrates several applications, looks good, and works well. We have been able to integrate improvements over the standard Summon interface which would not have been possible for us to make using Summon alone (though in some cases Summon has now caught up with these!). LibrarySearch is used by large numbers of students without problems. Changes we have needed to make since launch have mainly been minor and all have been made without any disruption to the service. Reactions from outside the university - both from Serials Solutions staff and from other libraries - have been highly positive; reactions from inside the university have been more muted. We hope to carry out further user testing later, and already have a list of additional features we could add without great difficulty. The largest problem we have been aware of has had nothing to do with LibrarySearch itself, being caused by difficulties resolving links coming from Summon, especially those referring to newspaper articles. Serials Solutions have been working to ameliorate these linking problems.

On the negative side, the major issue has been and continues to be the tension caused by the rate of change in the Summon interface, given the absence of permanent developer support to cope with adapting VuFind. In practice, this has not so far resulted in great difficulties, but it is clearly an ongoing source of risk. Some of the changes resulting from the general direction of travel of Summon have not meshed well with the general approach behind our own VuFind adaptations, and as this direction seems set to continue more difficulties can be expected to arise: ‘direct linking’ and FRBRisation, given the way in which they are being tackled, seem likely to be two such future problem areas.

Two general lessons could be drawn from these results, one regarding adoption of open-source applications, the other regarding uncritical adoption of ‘next generation’ discovery.

Evaluating Open-source Applications

When creating a local instance of Xerxes using the Metalib API we laid out a set of nine practical criteria for evaluating open source applications [7]. One outcome of our work with VuFind would be to add a tenth criterion: if the software depends on an API, ensure that the API is both well-documented and guaranteed to be stable. The Summon API is well documented, and in fact Serials Solutions themselves provide the core PHP code used by the VuFind Summon driver. However, Summon’s adoption in mid-2011 of a more rapid fortnightly update cycle may create complications for external users of the API: this kind of speed is not easily compatible with external use. An API is not only a set of procedures that can be called externally, but also the data that are returned by those calls, and both are important. Even if the procedure calls stay unchanged, altering the structure and semantics of the data that are returned is a major source of instability for external users.
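
One modest way for an external user to notice changes in returned data early is to check each response against the shape the local code relies on, and to report mismatches instead of failing further downstream. The Python sketch below assumes a made-up record layout; the field names are illustrative and are not those of the Summon API.

```python
# The fields and types the local code depends on (hypothetical).
EXPECTED = {"id": str, "title": str, "links": list}


def shape_problems(record, expected=EXPECTED):
    """List the ways a returned record departs from the expected shape.

    An empty list means the record still looks the way the calling code
    assumes. Only structure is checked; a change in what a field means
    would still need human attention.
    """
    problems = []
    for field, kind in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], kind):
            found = type(record[field]).__name__
            problems.append(f"{field}: expected {kind.__name__}, got {found}")
    return problems
```

For example, `shape_problems({"id": 42, "title": "x"})` returns `["id: expected str, got int", "missing field: links"]`, which could be logged or sent to whoever maintains the integration.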

Next-generation Is Not a Single Thing

‘Next-generation’ discovery has never been clearly defined; it is a general direction of travel, rather than a specification (the initial definition was made, I believe, by Eric Lease Morgan in 2006; an elaborated version can be found in Next Generation Library Catalogs in Fifteen Minutes [8]). That direction may be inflected differently by different parties. ‘Web-scale management’, ‘central provisioning’, and ‘transaction transparency’ are part of a widespread trend that began with the move to be ‘more like Google’. They can be seen as part of a logical progression from that starting point, which starts with an emphasis on discovery and exploration and a downgrading of ‘known item’ search. From system suppliers' point of view they are also steps in a process which removes part of the burden of serving the library patron from the library and shoulders it externally. This is in the immediate interest of libraries inasmuch as it reduces costs (principally in cataloguing) and solves common patron problems (principally link resolution) that libraries have been unable to solve completely themselves. Conversely, it is against the immediate interest of libraries insofar as it removes their control in cases where the supplier may not get things quite right, or where common use cases which do not fit the next-generation pattern continue to be important. On the other hand, if the library retains control over (say) links, the external supplier loses its ability to keep full track of all usage of the items linked to, which might affect its ability to provide detailed usage statistics.

These differences have manifested themselves in our VuFind adaptation by making it more difficult to help library users find local print copies of items; making it harder to manage local links differently from remote ones; and providing a poorer representation of items on shelves than is available from the OPAC. In the overall picture they are minor points, and for now they can be bridged in part by open-source applications which restore some of that element of local control - but the tension is clearly growing.

References

[1] John Houser, "The VuFind implementation at Villanova University", Library Hi Tech, 27(1):93-105, 2009 http://dx.doi.org/10.1108/07378830910942955
[2] Todd Digby and Stephen Elfstrand, "Open Source Discovery: Using VuFind to create MnPALS Plus", Computers in Libraries, March 2011 http://www.infotoday.com/cilmag/
[3] Birong Ho, Keith Kelley, and Scott Garrison, "Implementing VuFind as an alternative to Voyager’s WebVoyage interface", Library Hi Tech, 27(1):82-92, 2009 http://dx.doi.org/10.1108/07378830910942946
[4] William Denton and Sarah J. Coysh, "Usability testing of VuFind at an academic library", Library Hi Tech, 29(2):301-319, 2011 http://dx.doi.org/10.1108/07378831111138189
[5] Jennifer Emanuel, "Usability of the VuFind Next-Generation Online Catalog", Information Technology and Libraries, 30(1), March 2011, ISSN 0730-9295 http://www.ala.org/ala/mgrps/divs/lita/publications/ital/30/1/index.cfm
[6] Gary Marchionini, "Exploratory search: from finding to understanding", Communications of the ACM, 49:41-46, April 2006, ISSN 0001-0782 http://doi.acm.org/10.1145/1121949.1121979
[7] Anna Grigson, Peter Kiely, Graham Seaman, and Tim Wales, "Xerxes at Royal Holloway, University of London", January 2010, Ariadne Issue 62 http://www.ariadne.ac.uk/issue62/grigson-et-al/
[8] Eric Lease Morgan, "Next Generation Library Catalogs in Fifteen Minutes", November 2007 http://infomotions.com/musings/ngc-in-fifteen-minutes/

Author Details

Graham Seaman
Library Systems Officer
Royal Holloway, University of London

Email: graham.seaman@rhul.ac.uk
Web site: http://grahamseaman.info/

Graham Seaman is a freelance developer who works under direct contract or through his company, Libretech Ltd. His primary experience is with free and open source software, which he combines with a particular interest in library systems. Graham is now working on a federated search system for the M25 Library Consortium.