A full-scale implementation of the Journal Usage Statistics Portal (JUSP) would not be possible without the automated data harvesting afforded by the Standardized Usage Statistics Harvesting Initiative (SUSHI) protocol. Estimated time savings in excess of 97% compared with manual file handling have allowed JUSP to expand its service to more than 35 publishers and 140 institutions by September 2012. An in-house SUSHI server also allows libraries to download quality-checked data from many publishers via JUSP, removing the need to visit numerous Web sites. The protocol thus affords enormous cost and time benefits for the centralised JUSP service and for all participating institutions. JUSP has also worked closely with many publishers to develop and implement SUSHI services, pioneering work to benefit both the publishers and the UK HE community.
JUSP: Background to the Service
The management of journal usage statistics can be an onerous task at the best of times. The introduction of the COUNTER Code of Practice in 2002 was a major step forward, allowing libraries to collect consistent, audited statistics from publishers. By July 2012, 125 publishers offered the JR1 report, providing the number of successful full-text downloads. In the decade since COUNTER reports became available, analysis of the reports has become increasingly important, with library managers, staff and administrators forced more and more often to examine journal usage to inform and rationalise purchasing and renewal decisions.
In 2004, JISC Collections commissioned a report which concluded that there was a definite demand for a usage statistics portal for the UK HE community; with some sites subscribing to more than 100 publishers, just keeping track of access details and downloading reports was becoming a significant task in itself, let alone analysing the figures therein. There followed a report into the feasibility of establishing a ‘Usage Statistics Service’, carried out by Key Perspectives Limited, and in 2008 JISC issued an ITT (Invitation To Tender). By early 2009 a prototype service, known as the Journal Usage Statistics Portal (JUSP), had been developed by a consortium including Evidence Base at Birmingham City University, Cranfield University, JISC Collections and Mimas at The University of Manchester; the prototype featured a handful of publishers and three institutions. However, despite a centralised service appearing feasible, the requirement to download and process data in spreadsheet format, and the attendant time taken, still precluded a full-scale implementation across UK HE.
Release 3 of the COUNTER Code of Practice in 2009, however, mandated the use of the newly introduced Standardized Usage Statistics Harvesting Initiative (SUSHI) protocol, a mechanism for the machine-to-machine transfer of COUNTER-compliant reports; this produced dramatic efficiencies of time and cost in the gathering of data from publishers. The JUSP team began work to implement SUSHI for a range of publishers and expanded the number of institutions. By September 2012, the service had grown significantly, whilst remaining free at point of use, and encompassed 148 participating institutions and 35 publishers. To date more than 100 million individual points of data have been collected by JUSP, all via SUSHI, a scale that would have been impossible without such a mechanism in place or without massive additional staff costs.
JUSP offers much more than basic access to publisher statistics, however; the JUSP Web site details the numerous reports and analytical tools on offer, together with detailed user guides and support materials. The cornerstone of the service, though, is undeniably its SUSHI implementation, both in terms of gathering the COUNTER JR1 and JR1a data and - as developed more recently - its own SUSHI server, enabling institutions to re-harvest data into their own library management tools for local analysis.
JUSP Approach to SUSHI Development and Implementation
Once the decision was made to scale JUSP into a full service, the development of SUSHI capability became of paramount importance. The team had been able to handle spreadsheets of data on a small scale, but the expected scale-up to 100+ institutions and multiple publishers within a short time frame meant that this would very quickly become unmanageable and costly in staff time and effort. These constraints were proving to be a source of worry at many institutions too: while some sites could employ staff whose role revolved around gathering and analysing usage statistics, this was not possible at every institution, nor especially straightforward for institutions juggling dozens, if not hundreds, of publisher agreements and deals.
Two main issues were immediately apparent in the development of the SUSHI software. Firstly, there was no standard SUSHI client software that we could use or adapt. Secondly, and more worryingly, a number of major publishers lacked SUSHI support. While many publishers use an external company or platform such as Atypon, MetaPress or HighWire to collect and provide usage statistics, others had made little or no progress in implementing SUSHI support by late 2009; where SUSHI servers were in place, these were often untested or unused by consumers.
An ultimate aim for JUSP was to develop a single piece of software that would seamlessly interact with any available SUSHI repository and download data for checking and loading into JUSP. However, the only client software available by 2009 was either designed to work in the Windows environment or written in Java, which can be very complex to work with and in which the JUSP team had limited expertise. The challenge therefore became to develop a much simpler set of code using Perl and/or PHP, common and simple programming languages which were much more familiar to the JUSP team.
We started the process of SUSHI implementation by first developing client software that would work with the Oxford Journals server. You would imagine that with a specified standard in place and with guidelines to work from, it would be plain sailing to develop such a tool. However, the lack of any recognised software to adapt inevitably meant there would be a great deal of trial and error involved, and so we scoured the NISO SUSHI guidelines and trawled the Internet to find bits of code to work from.
Access to the Oxford Journals server is relatively straightforward, requiring only one authentication identifier (the "Customer Reference ID") and a registered IP address pertaining to the institution in question. Within just a few days, we were successfully retrieving our first JR1 reports using a SUSHI client written in Perl; then it was a case of working with the publisher to allow sites to authorise JUSP to gather their data directly. This can now be done by institutional administrators, who need merely to authorise JUSP to collect data on their behalf using a Web-based administration interface, and then supply us with their username (which doubles as the aforementioned Customer Reference ID in the request we send).
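The kind of request described above can be sketched as follows. This is an illustrative Python fragment, not the Perl client used by JUSP. The element names and namespace URIs are written from memory of the SUSHI 1.6 schema and should be verified against the published schema before use; the function name build_report_request and all example identifiers are hypothetical, and the SOAP envelope and HTTP transport are omitted.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as recalled from the SUSHI 1.6 schema (assumption: verify before use).
SUSHI_NS = "http://www.niso.org/schemas/sushi"
COUNTER_NS = "http://www.niso.org/schemas/sushi/counter"


def build_report_request(requestor_id, customer_id, report, release, begin, end):
    """Return the XML text of a ReportRequest body (SOAP envelope omitted)."""
    ET.register_namespace("sushi", SUSHI_NS)
    ET.register_namespace("counter", COUNTER_NS)

    def s(tag):
        # Qualify a tag name with the SUSHI namespace.
        return "{%s}%s" % (SUSHI_NS, tag)

    request = ET.Element("{%s}ReportRequest" % COUNTER_NS)

    # Who is asking for the data (the harvester).
    requestor = ET.SubElement(request, s("Requestor"))
    ET.SubElement(requestor, s("ID")).text = requestor_id

    # Whose data is being asked for (the institution).
    customer = ET.SubElement(request, s("CustomerReference"))
    ET.SubElement(customer, s("ID")).text = customer_id

    # Which report, which COUNTER release, and which month.
    definition = ET.SubElement(
        request, s("ReportDefinition"), {"Name": report, "Release": release}
    )
    filters = ET.SubElement(definition, s("Filters"))
    date_range = ET.SubElement(filters, s("UsageDateRange"))
    ET.SubElement(date_range, s("Begin")).text = begin
    ET.SubElement(date_range, s("End")).text = end

    return ET.tostring(request, encoding="unicode")


# Hypothetical identifiers: a real request uses the values agreed with the publisher.
xml_text = build_report_request(
    "example-harvester", "example-library", "JR1", "3", "2012-05-01", "2012-05-31"
)
print(xml_text)
```

Changing the report name from "JR1" to "JR1a" is the only difference needed to ask for the archive report in this sketch, which is in the spirit of the "minor tweak" mentioned in the next paragraph.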
At a stroke we now had a working piece of software which could be used to gather Oxford Journals JR1 usage data for any participating institution. A minor tweak then allowed us to use the same software to collect the supporting JR1a data, the number of successful full-text article requests from an archive by month and journal. We were thus able to collect a full range of JR1 and JR1a files for all JUSP sites within a very short period of time dating as far back as the Oxford Journals server would allow (January 2009).
Buoyed by this triumph, we then began work on developing clients which would talk to SUSHI servers belonging to other JUSP publishers: some fell into place very rapidly as various publishers had established usage statistics platforms, for example Springer was using MetaPress and Publishing Technology had their own system in place. However, several publishers, including some of the larger providers, did not offer a SUSHI service prior to joining JUSP, or were inexperienced in supporting the standard. This meant that JUSP often pioneered SUSHI development, notably with Elsevier, with whom we became the first organisation to use such a method of gathering usage data.
Publishing Technology is delighted to be involved in this important industry initiative. By making usage statistics for http://www.ingentaconnect.com available to the Journal Usage Statistics Portal via SUSHI librarians will have the latest and most comprehensive usage data for their collections, helping to inform strategic decisions for digital content purchasing.
Rose Robinson, Product Manager, Publishing Technology 
The files we retrieve are fairly basic XML files, containing set fields for each journal. These include a journal title, a publisher, one or more electronic identifiers (Print and Online ISSN) and several "count" fields such as the number of full text downloads, and the breakdown of that figure by HTML, PDF and - occasionally - other formats such as postscript (PS).
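To make the shape of such a record concrete, the following Python sketch parses a deliberately simplified XML layout into one dictionary per journal. The element and attribute names (report, journal, print_issn, count and so on) and the sample values are invented for illustration and are not the real COUNTER XML schema, which is namespaced and considerably more detailed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified report: NOT the real COUNTER schema.
SAMPLE = """
<report>
  <journal>
    <title>Journal of Examples</title>
    <publisher>Example Press</publisher>
    <print_issn>0378-5955</print_issn>
    <online_issn>0317-8471</online_issn>
    <count format="html">12</count>
    <count format="pdf">30</count>
    <count format="total">42</count>
  </journal>
</report>
"""


def parse_report(xml_text):
    """Return one dict per journal with title, publisher, ISSNs and counts."""
    records = []
    root = ET.fromstring(xml_text.strip())
    for journal in root.findall("journal"):
        # Map each count's format (html, pdf, total, ...) to its integer value.
        counts = {c.get("format"): int(c.text) for c in journal.findall("count")}
        records.append({
            "title": journal.findtext("title", default="").strip(),
            "publisher": journal.findtext("publisher", default="").strip(),
            "print_issn": journal.findtext("print_issn", default="").strip(),
            "online_issn": journal.findtext("online_issn", default="").strip(),
            "counts": counts,
        })
    return records


records = parse_report(SAMPLE)
print(records[0]["title"], records[0]["counts"]["total"])
```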
Note that while most SUSHI servers are near-identical in terms of the COUNTER-compliant output fields they supply, they can vary widely in the precise details of their implementation. It was thus far simpler in the early days to develop a new SUSHI client instance for each publisher or platform from which we wished to retrieve data. For example, different providers have differing requirements in terms of authorisation parameters, and this was a significant barrier to overcome. By Autumn 2011 we had around a dozen functioning SUSHI clients, and that number has continued to expand into 2012; by September 2012, we were using 27 different Perl-based SUSHI clients and one PHP client to collect monthly data for 33 publishers, with software developed and ready for when other vendors join JUSP. Creation of the client code base has been relatively simple after the initial learning curve, and has enabled us to generate new clients very swiftly as new publishers join JUSP.
In terms of what this means for libraries, JUSP is thus already saving manual download of thousands of reports on a monthly basis, and also provides a suite of tools to check and process the data prior to loading into the portal. Institutions are reassured that the data we collect has been audited according to the COUNTER guidelines in addition to undergoing a rigorous set of tests at Mimas before the data are loaded into the system. By September 2012, over 100 million individual data entries had been collected; this represents three and a half years' worth of data for 140 institutions - some sites have access to more than 30 sets of data within JUSP, and with some institutions not yet providing us with access to all their resources, the scale of usage data that can be collected and analysed is clear!
Data Collection in a Typical Month
The benefit of SUSHI becomes apparent when looking at a typical monthly data collection. In May 2012 we gathered usage data from 26 active publishers and host intermediaries for approximately 140 institutions. This comprised a grand total of 1833 JR1 and 976 JR1a files: a number of publishers do not offer a JR1a report, preferring instead to provide a JR5. JUSP does not handle JR5 reports at present because the report is optional and lacks a standardised implementation.
Before SUSHI, the concept of handling 2809 files in a single month would have been unthinkable; even at a highly conservative estimate of 15 minutes per file to download and an equivalent time to check and process, that corresponds to around 180 working days a month, in effect eight full-time staff just to handle data collection and processing! By stark contrast, the entire download time required to gather those 2809 files by SUSHI was approximately 24 hours; with an automated collection procedure running overnight and out of working hours, the time and effort saving that SUSHI affords is immediately apparent!
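The manual estimate above can be reproduced with a few lines of arithmetic. The file counts and the 15-minute figures come from the text; the eight-hour working day and the 21 working days per month are assumptions made only for this sketch, so the results are approximate.

```python
FILES = 1833 + 976            # JR1 plus JR1a files gathered in May 2012
MINUTES_PER_FILE = 15 + 15    # download, plus check and process
HOURS_PER_DAY = 8             # assumption: length of a working day
WORKING_DAYS_PER_MONTH = 21   # assumption: working days in a month

manual_hours = FILES * MINUTES_PER_FILE / 60
working_days = manual_hours / HOURS_PER_DAY
staff_needed = working_days / WORKING_DAYS_PER_MONTH

print(FILES)                      # 2809
print(round(manual_hours, 1))     # 1404.5
print(round(working_days))        # 176
print(round(staff_needed, 1))     # 8.4
```

Under these assumptions the estimate comes to roughly 176 working days and a little over eight full-time staff, in line with the rounded figures quoted above.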
Note of course that the data are released by providers at different times during the month: some data become available within the first few days, whereas other datasets appear in the final few days. It is a COUNTER requirement for data to be provided within 28 days of the end of the reporting period in question (i.e. May 2012 data should appear no later than June 28) and this is almost always achieved. Very occasionally a publisher may experience difficulty in providing the usage data in this timeframe, but we are generally notified if this is the case.
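The 28-day rule lends itself to a small date calculation. The sketch below, with a hypothetical function name, adds 28 days to the last day of the reporting month and reproduces the June 28 example given above.

```python
import calendar
from datetime import date, timedelta


def counter_deadline(year, month, days_allowed=28):
    """Latest date by which a month's usage data should be available."""
    last_day = calendar.monthrange(year, month)[1]   # number of days in the month
    return date(year, month, last_day) + timedelta(days=days_allowed)


print(counter_deadline(2012, 5))   # 2012-06-28
```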
Once the files have been collected, it is necessary to check that they have transferred correctly: we visually inspect file sizes to determine their validity before we run the files through our processing tools. The SUSHI collection is not always flawless, as evidenced by May's manual checks, which discovered a number of minor download errors. For one publisher, 22 files out of 250 timed out during the data collection, while 41 out of 133 for another failed for the same reason. These files were subsequently re-gathered. Even so, a successful first-time collection of 97.8% of files can be considered a decent return. In some months we have achieved 100% accurate data collection first time for every publisher, while occasionally we do run into these timeout issues for one or two platforms.
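A collect-then-re-gather loop of the kind described above might look like the following sketch. It is not the JUSP collection code: the harvest function, the fetch callable and the flaky_fetch stand-in are hypothetical names, and a real client would raise or map its own timeout errors. The final lines recompute the May first-pass rate from the 63 timed-out files out of 2809.

```python
def harvest(requests, fetch, max_passes=3):
    """Fetch every request, re-gathering any that time out.

    Returns (results, still_failing, first_pass_ok).
    """
    results = {}
    pending = list(requests)
    first_pass_ok = 0
    for attempt in range(max_passes):
        failed = []
        for req in pending:
            try:
                results[req] = fetch(req)
            except TimeoutError:
                failed.append(req)        # keep for the next pass
        if attempt == 0:
            first_pass_ok = len(pending) - len(failed)
        pending = failed
        if not pending:
            break
    return results, pending, first_pass_ok


# Stand-in fetcher: requests ending in "-slow" time out on their first attempt only.
_already_tried = set()


def flaky_fetch(req):
    if req.endswith("-slow") and req not in _already_tried:
        _already_tried.add(req)
        raise TimeoutError(req)
    return "<report for %s>" % req


results, still_failing, first_pass_ok = harvest(["a", "b-slow", "c"], flaky_fetch)
print(len(results), still_failing, first_pass_ok)

# First-pass rate for May 2012: 22 + 41 timeouts out of 2809 files.
may_rate = round(100 * (2809 - (22 + 41)) / 2809, 1)
print(may_rate)   # 97.8
```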
The average time taken to download a single file varies widely from publisher to publisher and platform to platform. Various factors can play a part, such as the file size to be collected (dependent on the number of included journal titles), differences in server configurations and the way reports are generated prior to transmission, and even the time of day. The average throughout May was for a single file to transfer from a publisher/platform to JUSP in 31 seconds, with times on individual platforms ranging from 1 second per file to 4 minutes per file. More than a dozen datasets, comprising thousands of files, transferred at an average rate of under 3 seconds per file with 100% accuracy.
Processing of these 2809 files was then achieved by a combination of automated checking tools and manual error corrections. The files were individually run through a Perl script which performed a wide range of checks, looking for things such as data integrity, correct formatting, the presence of journal titles and at least one identifier per record - matching these against title and identifier records held in the JUSP database tables - the presence of full text total counts, and various other criteria.
If a file failed to meet all the checks, the ensuing warning messages were analysed and we then undertook further investigation to remedy the problem(s). By far the most common issue that occurs when analysing datasets is the inclusion of new journal titles within publisher reports; titles are obviously added to collections or switch publisher year-round, and this is immediately identified by our checking software. In May 2012, we identified over 40 new journal titles across a handful of publishers; these were added to the journals table and then the relevant files were reprocessed.
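The checks described in the two preceding paragraphs can be illustrated with a short Python sketch. It is not the Perl script used by JUSP; the record field names (title, print_issn, online_issn, ft_total) and the known_issns set standing in for the journals table are assumptions made for the example.

```python
def check_record(record, known_issns):
    """Return a list of warnings for one journal record (empty if it passes)."""
    warnings = []
    if not record.get("title"):
        warnings.append("missing journal title")

    # Collect whichever identifiers are present and non-empty.
    issns = [record.get(k) for k in ("print_issn", "online_issn") if record.get(k)]
    if not issns:
        warnings.append("no identifier")
    elif not any(issn in known_issns for issn in issns):
        # Identifiers are present but unknown: probably a title new to the publisher.
        warnings.append("new journal title")

    if record.get("ft_total") is None:
        warnings.append("missing full-text total")
    return warnings


known = {"0378-5955"}   # hypothetical stand-in for the journals table
ok = {"title": "Journal of Examples", "print_issn": "0378-5955", "ft_total": 42}
new = {"title": "New Journal", "online_issn": "0317-8471", "ft_total": 7}
bad = {"title": "", "ft_total": None}

for rec in (ok, new, bad):
    print(check_record(rec, known))
```

In this sketch a "new journal title" warning would be the cue to add the title to the set of known journals and then run the file through the checks again.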
May's data processing runs threw up an interesting series of issues in addition to the expected ‘new journal’ collection. You might expect that for a large dataset an error would be reproduced across all files, but this is very rarely the case. For example, one file out of 54 for a publisher had a missing pair of identifiers, but these were present in the remainder. The failed file was manually edited, the identifiers added, and the checking proceeded with no further hitch. The next publisher dataset showed one failure in processing 266 files: in this case a badly-encoded special character was the issue, and replacing it fixed the problem. One slightly bigger issue affected around 15% of files from one publisher: missing identifiers had been replaced with random text entries or whitespace characters; again, globally removing these solved the problem.
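One way to catch identifiers that have been replaced by stray text or whitespace is to validate each value as an ISSN. The sketch below uses the standard ISSN check-digit rule; this is a technique chosen for illustration and is not described in the article as part of the JUSP tools. The function name clean_issn is hypothetical.

```python
import re

# Four digits, optional hyphen, three digits, then a check character (digit or X).
ISSN_RE = re.compile(r"^(\d{4})-?(\d{3})([\dXx])$")


def clean_issn(raw):
    """Return a normalised ISSN (NNNN-NNNC), or None if raw is not a valid ISSN."""
    if raw is None:
        return None
    match = ISSN_RE.match(raw.strip())
    if not match:
        return None                      # whitespace, random text, wrong length
    digits = match.group(1) + match.group(2)
    # Weight the first seven digits 8, 7, ..., 2 and derive the check character.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if match.group(3).upper() != expected:
        return None                      # well-formed but fails the check digit
    return "%s-%s%s" % (match.group(1), match.group(2), expected)


for value in (" 0378-5955 ", "03178471", "   ", "n/a", "0378-5954"):
    print(repr(value), "->", clean_issn(value))
```

Values that come back as None can then be treated as missing identifiers and handled in the same way as any other record without an ISSN.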
When a systematic error is spotted across a dataset, we notify the publisher and/or data provider in question; JUSP has had great success working closely with the publishers, and the solution of issues which might otherwise have been reported by 100 or more individual libraries is evidence of the value we can bring in this area. Although SUSHI is a recognised standard, errors in the production of files can - and clearly do - still occur periodically. The combination of initial manual checks, a sophisticated file checking tool and then visual inspection of the loaded data both by our team and by the institutions results in increasingly efficient error reporting and the (almost!) complete elimination of any systematic problems.
Taking into account all the above issues, and the sheer scale of files to be processed, May's entire checking procedure for 2809 corrected files took just 244 minutes, at an average of around five seconds per file. An additional five hours were required to manually correct all the data issues outlined above, but even with this manual work the entire checking procedure for 26 publishers in May 2012 took the equivalent of a single working day. When comparing this to the (conservative) 180 working days to collect and handle spreadsheets, the benefits of SUSHI in saving time and staff costs are immediately obvious.
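Putting the May 2012 figures together gives a rough sense of the overall saving. The sketch below combines numbers quoted in this section (2809 files, 31 seconds average transfer, 244 minutes of automated checking, five hours of manual correction, 15 + 15 minutes per file by hand). It is one possible way of combining them, counting unattended download time as if it were effort, and is not necessarily how the savings reported elsewhere in the article were calculated.

```python
FILES = 2809

manual_minutes = FILES * (15 + 15)        # download plus processing by hand
download_minutes = FILES * 31 / 60        # 31 s average transfer per file
checking_minutes = 244                    # automated checking of all files
correction_minutes = 5 * 60               # manual correction of data issues

sushi_minutes = download_minutes + checking_minutes + correction_minutes
saving_percent = 100 * (1 - sushi_minutes / manual_minutes)

print(round(checking_minutes * 60 / FILES, 1))                  # 5.2 seconds per file
print(round((checking_minutes + correction_minutes) / 60, 1))   # 9.1 hours of checking
print(round(saving_percent, 1))                                 # 97.6
```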
Building and Implementing a SUSHI Server for JUSP
Having the ability to gather COUNTER reports from publishers on behalf of libraries, using our various SUSHI clients, is all well and good - but it is only half the story. As an intermediary between publishers and libraries, it is also our responsibility to maintain the chain of COUNTER-compliance by being able to re-expose those reports via a SUSHI server, as mandated by the COUNTER Code of Practice.
The work we had carried out developing the clients and consuming COUNTER reports, as described above, proved to be invaluable when it came to creating our own SUSHI server. We had learnt a lot about the SUSHI protocol, both in theory by referencing the SUSHI standard, and in practice as we wrestled with the XML reports we were ingesting from publishers. Forewarned is forearmed, so the task of creating the JUSP SUSHI server was not difficult, although it was exacting: the standard is very detailed and XML is quite unforgiving about mark-up errors.
Our server implementation, written in PHP, conforms to version 1.6 of the SUSHI protocol and release 3 of the COUNTER Code of Practice for Journals and Databases. During the coding process we referred extensively to the standard, following it through, step by step, to make sure we implemented everything exactly as it should be. The whole process, from start to finish, took a little under three weeks to complete.
One vital aspect that we had to consider was that of authentication. It is extremely important that confidentiality be maintained and that access to data is restricted to the relevant parties. For the most part the SUSHI specification gives extensive guidance on its implementation, but in the key area of authentication it actually has very little to say.
Access to statistics within the secure JUSP portal requires users to log in using Shibboleth/OpenAthens in order to identify themselves. For machine-to-machine interaction via SUSHI we need a different method of authentication and authorisation. The SUSHI specification includes a field – the Requestor ID – which is used to identify who is making a request for data, but that is not sufficient in itself to guarantee the identity of the requestor: an institution’s Requestor ID could potentially be hijacked and used by an unauthorised party. So an additional layer of authentication is required.
Having seen for ourselves the difficulties caused to harvesters by the idiosyncratic use of extensions to SUSHI, we decided to keep things simple and straightforward: following the lead of Oxford Journals, we adopted IP address authentication to verify requestor identities. Participating libraries are required to register the IP address(es) of the machine(s) that will send SUSHI requests to our server. The combination of IP address and requestor ID ensures that only those truly authorised can access their usage data.
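The combination of requestor ID and registered IP address can be sketched as a simple lookup. This is an illustration in Python rather than the PHP of the JUSP server; the REGISTERED table, the function name is_authorised and the documentation-range addresses are all hypothetical, and allowing CIDR ranges as well as single addresses is an assumption of the sketch.

```python
import ipaddress

# Hypothetical registry: requestor ID -> addresses (or CIDR ranges) it may call from.
REGISTERED = {
    "example-library": ["192.0.2.10", "198.51.100.0/28"],
}


def is_authorised(requestor_id, remote_addr, registry=REGISTERED):
    """True only if remote_addr is registered for this requestor ID."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False                     # not a valid IP address at all
    for entry in registry.get(requestor_id, []):
        # A bare address is treated as a single-host network.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False


print(is_authorised("example-library", "192.0.2.10"))     # True
print(is_authorised("example-library", "203.0.113.1"))    # False
print(is_authorised("someone-else", "192.0.2.10"))        # False
```

A correct requestor ID sent from an unregistered machine is refused, which is the point of the extra layer: knowing the ID alone is not enough.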
However, not all of our participating libraries want to harvest directly from us. Some of them use a third-party software package, such as Ex Libris’ UStat software, to analyse their usage statistics. So, in addition to setting up authentication for direct consumers of our SUSHI service, we have also put in place procedures to allow access to data by authorised third-parties. In 2011, we worked closely with Ex Libris over a number of months to get procedures in place to allow JUSP and UStat to interoperate. Now, for a library to authorise UStat to harvest on its behalf requires nothing more than the checking of a box on their JUSP SUSHI administration page. More recently we have also worked with Innovative Interfaces to ensure that JUSP functions with their Millennium statistical package.
At the moment, we have a stable, reliable, canonical SUSHI service conforming to release 3 of the COUNTER Code of Practice in place. But we cannot rest on our laurels: looming on the horizon is the new release 4 of the COUNTER Code of Practice.
Release 4 is ‘a single, integrated Code of Practice covering journals, databases and books, as well as multimedia content’ which has a deadline date of 31 December 2013 for its implementation. To supplement the new release, the NISO SUSHI Standing Committee has created a COUNTER-SUSHI Implementation Profile which sets out ‘detailed expectations for both the server and the client of how the SUSHI protocol and COUNTER XML reports are to be implemented to ensure interoperability’. By harmonising implementations, adoption of the new profile should make life easier for producers and consumers of SUSHI alike.
So, over the coming months, along with other vendors and services, we will be working on developing a new SUSHI server meeting the requirements of release 4.
Review of Findings
Table 1 highlights the enormous time savings that SUSHI affords across a range of publishers. We have anonymised the publishers, but these represent typical low, medium and high-use datasets for the UK HE community. Figures included are for May 2012 but are typical of any given month.
We have estimated that it would take each institution an average of 15 minutes to log onto a publisher Web site, collect the relevant month's file(s) and transfer them to an in-house package for analysis. This does not take into account any time spent looking up passwords, any connection problems, or time spent moving the files onto the JUSP server for processing.
We have also included figures showing the time taken to process and check files using our automated checking software. Before SUSHI became available and we began collecting files in that way, it took one member of staff an average of 15 minutes hands-on time to process manually one spreadsheet of data for one institution per month; the actual staff time savings indicated below are in reality even greater, as much of the processing is done automatically and only requires manual intervention when errors arise. This figure is as valuable a metric as the download time, as an individual institution would need to perform visual inspections manually on each JR1/1a file before reporting figures - files can contain thousands of individual journal titles, so a figure of 15 minutes is reasonable to assume as the average checking time.
Table 1 reports the following measures for each publisher: files to download; average file size (kB); total time for all sites to gather files and pass to JUSP (mins, estimated) (a); total SUSHI download time for all files (mins); SUSHI download time per file (seconds); files collected first time (%); SUSHI time saving (%); manual processing time (b) (all files, mins); JUSP software processing time (all files, mins); JUSP error processing (all files, mins); processing time saving (%).
Table 1: Actual and estimated download and processing time savings for four publishers using SUSHI and in-house checking software
Notes to Table 1:
(a) - estimated 15 minutes per institution to log onto the publisher site, download that month's file(s) and email them to JUSP
(b) - estimated 15 minutes manual processing time per file in CSV/spreadsheet format
The time savings across the board are thus clearly huge. Factor in the ability for JUSP to gather and replace data very quickly when a publisher indicates that data have been restated or altered and the time and effort saved becomes even starker. One JUSP publisher recently restated nine months’ worth of usage data, and using SUSHI we were able to completely replace these data for more than 130 institutions in under two hours - sites dealing manually with usage data simply could not do this, even if they were aware that the statistics had been regenerated! JUSP thus affords great efficiencies for its participating institutions and the UK HE community.
Expanding the table above to encompass all JUSP publishers sees similar economies of time and effort; in May one dataset proved more problematic than the rest, requiring numerous attempts to gather some files due to time-out errors and other errors in data processing. Even so, the time saved compared to manual gathering and processing was still more than 88% using the above criteria. Across the board in May, 97.2% of files were gathered correctly on a first pass - this figure can be 100% some months, but is generally around the 97-99% mark.
Lessons Learned and Recommendations
The key lesson that can be drawn from our experiences is that a centralised, reliable harvesting service based on using SUSHI to collect data is practical, cheap to run, and provides enormous economies where staff time is concerned. JUSP simply could not operate at its present budget without SUSHI, and anyone contemplating setting up a similar service would do well to follow a similar course. The SUSHI protocol, allied with COUNTER compliance, has been a real godsend for the provision of standardised journal usage statistics.
Meanwhile, work has been progressing to develop a stand-alone single instance of the code which any institution can access and repurpose for its own use. This ‘SUSHI Starters’ code is a Web-based client written in PHP and is available from the project Web site; technical assistance is available to help install this locally if required.
We would also recommend that sites wishing to gather usage data for loading into their own statistics management software packages make use of the JUSP SUSHI server. A handful of JUSP institutions have successfully downloaded hundreds of datasets from us, knowing that we have already checked the data from the publisher. Ex Libris and Innovative Interfaces have also added JUSP as a preferred vendor within their statistics packages (UStat and Millennium, respectively), enabling downloads of data from the JUSP SUSHI server directly into their software.
A survey of JUSP users carried out in Spring 2012 supports these findings. Some 74.1% of users reported time savings as a result of using JUSP; participants who took a majority of deals with publishers not yet included in JUSP mainly accounted for the rest. Over 60% indicated that removing duplicated effort was a major advantage, and 70% reported that the speedy collection and reliability of data within JUSP afforded better decision making in terms of journal renewals and budgeting. In addition, 81.7% of librarians felt that if they did not have access to JUSP it would have an adverse effect on their service, with one commenting:
We [would] have to go back to doing all the harvesting ourselves when we have used the staff time freed to enhance other services/resources and those services would have to go/be reduced.
The benefit of the JUSP SUSHI server, with the ability to extract data directly from JUSP into library management and statistics analysis tools, has also been reported. A librarian from Leeds commented:
Going to JUSP to download 20 spreadsheets is much faster than going to 20 vendor Web sites, and having the data harvested automatically to the Innovative ERM is even faster. The data acquisition involves little effort once it has been set up; and I can have the harvest happen on a schedule.
SUSHI allows JUSP to collect (and provide access to) audited, COUNTER-compliant usage data on an unprecedented scale and to do so with accuracy and a high degree of reliability. Without it, a service such as this would either be unfeasible or would require a lot of additional staff to handle the data processing alone. Not only is there a significant time and cost saving within JUSP itself, but the time saved at each of the 140 participating institutions is also evident; no longer is there a requirement for a member of staff to connect separately to dozens of publisher platforms, download spreadsheets, and then load them into whatever package that site uses to analyse its data. Moreover, the additional checks that JUSP performs on the data ensure the highest possible chance that the figures will be presented to the end-user without any problems or errors. JUSP is also providing publishers with an additional level of quality assurance.
The use of SUSHI has demonstrably saved JUSP and the UK HE community hundreds of thousands of pounds of staff costs since its inception; add in an estimated saving of more than 97% of the time spent on data collection and processing every month and the dual benefits are enormous; as more publishers join, this efficiency will continue to increase. In an age of funding cuts and budget restrictions, the combination of JUSP and SUSHI thus affords an economical, high-quality alternative to the previously onerous and unending task of journal statistics gathering and management.
References
- The COUNTER Code of Practice, Journals and Databases, Release 3 (2008)
- Conyers, A., Dalton, P. "NESLi2 analysis of usage statistics: summary report", March 2005 http://www.jisc.ac.uk/uploaded_documents/nesli2_usstudy.pdf
- JISC Usage Statistics Portal scoping study Phase 2: summary report, January 2010
- NISO SUSHI Web site http://www.niso.org/workrooms/sushi
- JUSP Web site http://jusp.mimas.ac.uk/
- Cradock, C., Meehan, P. R., Needham, P., "JUSP in time: a partnership approach to developing a journal usage statistics portal", Learned Publishing, 2011, 24(2), pp 109-114.
- The Journals Usage Statistics Portal (JUSP). "AiP, Nature and Publishing Technology now participating in the JISC Journal Usage Statistics Portal". Press release, January 2011
- The Standardized Usage Statistics Harvesting Initiative (SUSHI) Protocol
- COUNTER Codes of Practice http://www.projectcounter.org/code_practice.html
- COUNTER-SUSHI Implementation Profile
- SUSHI Starters project Web site
- JUSP Community Survey, June 2012 http://jusp.mimas.ac.uk/docs/JUSPSurveyPublic_june12.pdf
Web site: http://mimas.ac.uk
Paul Meehan is a Senior Development Officer, working at Mimas, the UK National Data Centre at The University of Manchester. Paul's main role is as developer and administrator of the JUSP service, as well as involvement in a range of other Mimas projects and services including IRUS UK, the e-Tekkatho Project to deliver academic resources to Burma and work with Arthritis Research UK. He previously held key roles in the Intute and CrossFire services.
Web site: http://www.cranfield.ac.uk
Paul Needham is the Research and Innovation Manager at Kings Norton Library, Cranfield University. Paul is responsible for Application Development within JUSP. Recent projects he has worked on for JISC include ETAS (Enhanced Transfer Alerting Service), SUSHI Starters, PIRUS2, IRUS and RAPTOR-JUse. Paul is a member of the NISO SUSHI Standing Committee and the Technical Advisory Boards of COUNTER and the KnowledgeBase+ Project.
Web site: http://mimas.ac.uk/
Ross MacIntyre currently works within Mimas, the UK National Data Centre at The University of Manchester. Ross is the Service Manager for the ‘Web of Knowledge Service for UK Education’, ‘JUSP’ (JISC’s Journal Usage Statistics Portal), ‘EuropePMC+’ and ‘Zetoc’. He is also responsible for Digital Library-related R&D services and had formal involvement with Dublin Core and OpenURL standards development. Recent projects for JISC include PIRUS2 (extending COUNTER to article-level) and IRUS (applying PIRUS2 findings to UK institutional repositories). Ross is Chair of UKSG and a member of the Technical Advisory Boards of COUNTER and the UK Access Management Federation.