Launching a New Community-owned Content Service
Caren Milloy describes some of the challenges overcome and lessons learned by JISC Collections during the development of JISC eCollections.
JISC eCollections is a set of e-resource platforms launched in November 2011 by JISC Collections, in partnership with the JISC data centres EDINA and Mimas. The platforms (Figure 1) are JISC MediaHub, JISC Historic Books and JISC Journal Archives; together, they are intended to provide a sustainable, value-for-money alternative to accessing licensed content on publisher platforms, by consolidating and hosting the broad range of historical book, journal archive and multimedia content purchased by JISC Collections on behalf of the UK education community. The vision is to provide a world-class collection that ensures users’ broadest information needs are well met, and to work in partnership with the community to improve and develop the platforms around evolving student and researcher expectations.
The primary role of JISC Collections is the licensing of content on behalf of its UK Higher Education (HE) and Further Education (FE) member organisations. Over the last 10 years, JISC Collections has invested over £40 million in centralised licensing of digital content, in perpetuity, on behalf of all its members. The first agreement was signed in 2002 for ProQuest’s Early English Books Online (EEBO). Since then, national licences have been negotiated for historic books, journal archives and multimedia content (Figure 1), such as documentaries and educational films. In 2010, JISC Collections invested a further £2.5 million in film and image content, representing UK and world history since 1987, specially selected for teaching and learning. The majority of JISC Collections’ member organisations would be unable to afford per-institution subscriptions to these book, journal and multimedia collections, so centralised licensing is critical to broadening access.
Figure 1: The three platforms that make up the JISC eCollections service
Why Develop JISC eCollections?
The platforms contain more than 4.5 million resources from over 20 providers. JISC Collections members were previously required to access this content via a range of separate services, each with different user interfaces and administrative requirements, and with a complex funding set-up including both JISC subsidies and publisher access fees payable by each institution. JISC Collections felt that its existing – and future – investments in content would best be protected and preserved by developing an independent service, as an affordable alternative to relying on content providers for access to perpetually licensed content. Such a service would allow the education community to take ownership of its acquisitions and assure it of future control. In 2011 each group of resources was consolidated into one platform to increase discoverability, simplify the user experience (making it more inclusive to users at all academic levels), reduce the administrative burden, and thereby enable maximum value to be derived from the initial content investments.
Figure 2: Home page montage from JISC MediaHub, giving a flavour of the kinds of resources available. The main image, of a hot spring in Yellowstone National Park, is from the Getty (Still Images) Collection.
It was envisaged that this approach would help expose the content to a wider range of institutions, particularly in Further Education, and help their users feel more confident in exploring and exploiting the content in teaching and learning.
What Principles Guided the Development?
The development of the three platforms was guided by a range of studies of the behaviour, information-seeking strategies and digital and information literacy of students and academics (Figure 3). These studies were undertaken in the UK and US; an overview is provided in the Digital Information Seeker Report. Many were funded by JISC, with a focus on providing institutions with practical recommendations on how to improve library services to support the needs of users, and how to keep such services simple for end-users (‘the “simple” philosophy’). One of the studies, the JISC national e-books observatory project, was managed by JISC Collections and provided a great deal of insight into the frustrations and issues faced by users in accessing and using e-book platforms.
The most influential study, however, in terms of the development of the JISC eCollections platforms, was the User Behaviour in Resource Discovery (UBiRD) study undertaken by Wong et al. at the Middlesex University Interaction Design Centre. The findings and recommendations of this study corroborated the user behaviour observed and the feedback gathered during the e-books observatory project, as well as the frustrations expressed by librarian members of JISC Collections at various advisory board meetings. For JISC, as the funder of the studies, and JISC Collections, as a shared service for its members, it was important to act on the studies' recommendations so that design and development decisions were based on the evidence gathered.
Figure 3: The development of the JISC eCollections platforms was guided by several studies into user behaviour, information-seeking strategies and digital / information literacy
By the Community, for the Community: Pre- and Post-launch Challenges
Developing and launching content delivery platforms is a new venture for JISC Collections and its partners. We have learned some valuable lessons and continue to identify problems and implement recommendations to help the service improve. In the interests of sharing our experiences with our members and the wider education community, this paper describes the pre- and post-launch challenges we have encountered and explores some of the ways we have addressed them to date.
Challenge 1: Many Publishers, Two Licences
JISC Collections uses a model licence for all its agreements. The model licence helps to standardise the many terms and conditions of use, acts as a stamp of approval and helps librarians communicate these terms to users. In applying the 'simple' philosophy for end-users and librarians, JISC Collections negotiated variations with all 42 publishers to ensure that librarians only had to sign two sub-licences: one for the JISC MediaHub platform and one covering both JISC Historic Books and JISC Journal Archives. This was a major undertaking, involving some hard negotiations to include text and data mining, open metadata and the creation of new metadata to supplement that provided by the publishers.
In developing JISC MediaHub, metadata and thumbnails for all the content were made fully open and discoverable on the Web. Agreeing these clauses meant giving detailed explanations to providers, extolling the benefits to educational users – and to the content owners themselves – of openness.
Challenge 2: Many Resources, Seamless Access
Research by Head and Eisenberg suggests that students apply a ‘consistent and predictable information-seeking strategy’ and therefore that a 'less-is-more' approach may be more suitable in guiding students to resources. Meanwhile, the 2006 Research Information Network (RIN) report, Researchers and discovery services: Behaviour, perceptions, needs, highlighted that for researchers, access remains an issue. Connaway and Dickey state that this is especially the case for journal backfiles, which are ‘particularly problematic in terms of access’. Their report, which summarised the findings across 12 studies, notes that a common finding is that ‘library systems must do better at providing seamless access to resources’. This aligns with the feedback that JISC Collections hears directly through its close relationships with UK librarians.
Providing seamless access is not a simple task when dealing with a plethora of resources and different providers. While linking to multiple platforms may not be a major issue for some institutions (for example, those that use link resolvers to support their users’ discovery of resources), for others, especially those that have not previously been able to afford access fees (typically in FE), Head and Eisenberg’s ‘less is more’ approach simplifies and supports seamless access. By grouping the content formats onto three platforms, the aim was to help libraries simplify the user journey. Instead of linking to more than 14 platforms, librarians need link to only three in order to direct their users to the content. In addition, users need to authenticate only once on entering each platform, using their institutional username and password (Figure 4).
Figure 4: Logging into JISC MediaHub
Challenge 3: Complex Capabilities, Clean Interfaces
Keeping the interfaces of the platforms clean and simple was a requirement based on Connaway and Dickey’s recommendations, which they summarise by saying that the ‘evidence provided by the results of the studies supports the centrality of Google and other search engines’. The clear message is that users value familiarity and convenience, and that ‘library systems and interfaces need to look familiar to people by resembling popular Web interfaces, and library services need to be easily accessible and require little or no training to use’. Simple, clean interfaces make users feel comfortable, and a familiar architecture helps reduce confusion when moving between services. These principles were taken forward in the development of the three interfaces, which feature plenty of white space and simple search boxes to help users feel comfortable and confident in the environment.
Challenge 4: Different Search Behaviours
Students and academics employ different search strategies across subject areas and at various stages of their academic careers. Hampton-Reeves et al. found that ‘students predominantly use keyword searches on a mixture of tools including internet search engines, library catalogues and specialist databases’. The RIN study of researchers found that the most common search strategy was ‘refining down from a large list of results’. However, across all the studies, the Google-type approach of entering keywords was a common strategy. Consequently, each platform aimed to offer search functionality that reflected these findings, while also taking into account how best to display the results for each content format and how to indicate the provenance of the content (i.e. which collection it comes from). In its first iteration, the JISC Historic Books landing page offered users a single ‘Google-like’ search box (Figure 5).
Figure 5: First iteration of JISC Historic Books home page
All three platforms support filtering of search results by common parameters such as date. However, as aggregations of several resources, the results also needed to be filtered by each collection. For JISC Historic Books, which contains three collections – Eighteenth Century Collections Online, Early English Books Online and the British Library 19th Century collections – filtering was applied using colours and tabs as shown in Figure 6.
Figure 6: Search results display for JISC Historic Books, showing option to filter by colour-coded tabs (see detail) representing each collection within the platform
For JISC MediaHub and JISC Journal Archives, which aggregate too many collections for the tabbed approach to work, the filtering techniques and iconography used to indicate the source of the displayed results needed much more thought and testing to ensure they remained intuitive (Figures 7 and 8).
Figure 7: Search results display for JISC Journal Archives, showing ability to filter by graphic icons (right) representing each collection within the platform
Figure 8: Search results display for JISC MediaHub, showing ability to filter by graphic icons (see detail below) representing different location and / or access rights for content
Challenge 5: Consistent Interfaces
The studies discussed above show that the efficiency and effectiveness of the user experience are driven by familiarity; when faced with an unfamiliar platform, users have to spend much more time thinking about where things are and how to get to them. This was evident in the deep log analysis of user behaviour and in feedback from the JISC national e-books observatory project focus groups, where students struggled with interfaces, were frustrated at having to search for function buttons and often gave up. The UBiRD study sums this up well: ‘navigating from one system to another – all of which have different functionalities and different bells and whistles with respect to searching, limiting / refining, indexing, saving and storage or exportation – is confusing to users’. In moving between different publishers’ platforms, Wong et al. suggest that users waste valuable time because they have to ‘re-frame’ their minds each time to work out where the log-in is, where the print button is and so forth.
The need for consistency to support familiarity is another key principle behind the aggregation of the historic book, journal archive and multimedia collections on their respective platforms; for example, there are over 50 collections on JISC MediaHub, searchable and viewable in one central location. Users of JISC eCollections need only become familiar with three platforms, rather than a plethora of different provider platforms.
Challenge 6: Information Literacy
The UBiRD study found that users often do not understand the structure, organisation or contents of e-resource platforms, and therefore make assumptions about what a resource contains or how it functions. This finding had particular resonance in the development of JISC Historic Books: the quality and contents of search results on this platform depend heavily on the quality of the metadata associated with each book and on the existence and quality of full text created either by the Text Creation Partnership (TCP) or by Optical Character Recognition (OCR). Because neither process is completely accurate (Figure 9), it was decided to make the OCR / TCP full text and metadata available alongside the page images so that users could see why a book had appeared in their search results; where a book had no OCR / TCP full text, this would be made clear to the user. The implications of this approach are discussed further below.
Figure 9: Example of original image and associated OCR text from JISC Historic Books, intended to help users understand how content might have appeared in their search results, but with evident weaknesses that led to some user misconceptions about quality.
Mistakes and Solutions: The Case of JISC Historic Books
In developing JISC eCollections and the platforms in accordance with the principles outlined above, issues have arisen, mistakes have been made, and solutions to challenges have been found. The remainder of this article focuses on JISC Historic Books (JHB) and shares some of the complications that arose and the lessons learned in developing this platform.
Lesson 1: Poor Information Literacy Can Distort Judgement
Displaying the OCR / TCP text alongside each digital image was intended to help users evaluate the content and understand the limitations of the search, which depends on the quality of the OCR / TCP text and metadata. Users who had previously used EEBO on the ProQuest platform were familiar with this approach, as the ProQuest platform also displays the TCP full text where it is available; however, some users had previously accessed ECCO content on the Cengage platform, which does not display the OCR alongside the images.
Soon after launch, we began to hear comments from researchers at institutions that had moved to JHB from the Cengage platform, suggesting that our OCR was of poorer quality than Cengage's. It is natural for users to compare and contrast JHB with the platforms they previously used; this was expected and has led to the development of a great deal of new functionality since launch. In this case, however, we had in fact licensed the content, with the same OCR, from Cengage. What was not expected was that researchers, typically seen as adept users of historical book content, would not be fully aware of the limitations of OCR (because they had not seen it before) and of its impact on the accuracy of their search results.
It is evident, therefore, that information literacy levels with regard to evaluating and assessing historic book content need to be improved. Making the OCR visible to all users of JHB is a crucial first step in this awareness-raising exercise. We will continue to support the UBiRD principles by working with the JHB Advisory Board to develop users’ information literacy skills further with regard to the content available on the platform and its limitations.
Lesson 2: One Size Cannot Fit All
The original design of JHB, with a simple Google-style interface (as in Figure 5), was intended to support the most common student search strategy (keyword searching) and help students see JHB as a familiar interface. It was thought that this approach would also align with the aim of JISC eCollections to be more inclusive of all users, regardless of their academic level, and to make the content more approachable to users in FE colleges.
However, upon release of the beta version in August 2011, the simple search and consequent filtering was rejected by researchers as inadequate for their search strategies. Those researchers who had in the past used Cengage and ProQuest to access the JHB content had developed search strategies in line with these platforms; in other cases, researchers were using their comprehensive knowledge of the content to undertake very specific searches, for example using the English Short Title Catalogue (ESTC) number. The feedback from researchers at this early stage led to a complete overhaul and implementation of the advanced search functionality (Figure 10), with the assistance of the JHB Advisory Board.
With hindsight, it became clear that we had focussed too closely on findings around keyword search, and could have paid more attention to the UBiRD finding that different user groups use ‘different combinations of search components … depend[ing] on their level of literacy and the domain knowledge’.
Figure 10: Revised home page for JISC Historic Books following community consultation and redevelopment
Lesson 3: Do Not Undervalue Metadata
The agreement JISC Collections made with ProQuest to license the EEBO content in perpetuity on behalf of all UK HE and FE institutions did not include the MARC records; at the time, there were insufficient funds to cover the costs of licensing the metadata at a national level, so institutions were only able to obtain the MARC records by purchasing them directly from the publisher. In some cases, digital transcriptions of the images were available (created by TCP), but in the case of EEBO only 20% of the content (25,000 of 125,000 volumes) has TCP text.
For the remaining 100,000 volumes, only basic citation metadata (author, title and image ID number) was available to JISC Collections as part of the licensed work. This meant that 80% of EEBO content could only be surfaced in searches if the user knew and entered the title or author of the book. To rectify this situation, considerable time was spent exploring options to license high-quality metadata to support the EEBO content. For example, the British Library and ESTC North America jointly own the English Short Title Catalogue (ESTC), which includes detailed, high-quality metadata for the titles within EEBO. The viability of each option was assessed in light of the approaching launch date, and it was decided to purchase the MARC records directly from ProQuest for use within the platform only since, again, licensing the records at a national level for all institutions was not affordable.
Once purchased, the MARC records had to be integrated with the existing schema applied to the ECCO and BL metadata. This schema had been devised by developers rather than by experts in MARC records or historic book metadata, and it became clear that some fields of potential value to researchers had not been included; for example, the metadata for the British Library’s nineteenth-century books included tags for visual elements such as images and portraits, which had not been recognised in the schema. JISC Collections is now working to include such elements so that they can be surfaced in searching. In retrospect, an analysis of the metadata from all three providers would have been a useful first step in the development process for JHB.
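The metadata analysis recommended above can be illustrated with a minimal sketch of a field audit. It assumes, hypothetically, that each provider's records can be loaded as Python dictionaries keyed by MARC tag and that the platform schema can be expressed as a set of mapped tags. The tag choices are illustrative assumptions, not the actual JHB schema: 100 (main entry, personal name), 245 (title statement) and 260 (imprint) stand in for the mapped fields, and 300 (physical description, where notes such as illustrations and portraits are commonly recorded) stands in for a field the schema failed to recognise. The sample records are invented.

```python
from collections import Counter

# Hypothetical schema: the MARC tags mapped to searchable fields.
# Illustrative only; this is not the actual JISC Historic Books schema.
SCHEMA_TAGS = {"100", "245", "260"}  # main entry (author), title, imprint


def audit_unmapped_tags(records, schema_tags=SCHEMA_TAGS):
    """Count how many records carry each tag that the schema does not map.

    Each record is assumed to be a dict of {MARC tag: value}. Any tag
    counted here would be silently dropped on import into the schema.
    """
    unmapped = Counter()
    for record in records:
        for tag in record:
            if tag not in schema_tags:
                unmapped[tag] += 1
    return unmapped


# Invented sample records standing in for two providers' metadata.
sample_records = [
    {"100": "Anonymous", "245": "A sermon preached at Paul's Cross", "260": "London, 1620"},
    {"100": "Smith, John", "245": "Travels in the East", "260": "London, 1851",
     "300": "xii, 320 p. : ill., ports."},  # illustration note in a 300 field
]

if __name__ == "__main__":
    for tag, count in audit_unmapped_tags(sample_records).most_common():
        print(f"Tag {tag} is not in the schema ({count} record(s))")
```

Run against each provider's full record set before the schema is fixed, a report of this kind would flag fields such as illustration notes for a deliberate decision about whether to map them, rather than leaving the gap to be discovered after launch.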
Who Will Lead eCollections Development?
The core vision of JISC eCollections is that of a ‘community-owned content service’ – developed by the community, for the community, to protect and preserve existing investments. The service enables the education community to take ownership of the content licensed on its behalf and drive forward developments to the service. Advisory boards, consisting of librarians, teaching staff and researchers – all experts in their fields – have been set up for each platform and are in charge of the service. The remit of these boards is to discuss new opportunities and to make sure that future developments and content licensing support use in education and research, and contribute to the ongoing sustainability of the service. The advisory boards have control over how the service fee revenue is reinvested into the service, to ensure its development, expansion and preservation in line with the long-term expectations and needs of members.
For example, the JISC Historic Books Advisory Board has set itself an ambitious Terms of Reference, focussing on pioneering new technologies, partnerships and ease of use. The board discusses, agrees and prioritises developments to the platform against the budget, provides guidance to JISC Collections on content acquisition and is the driving force behind new experiments. One particular project that the board is exploring is crowdsourcing corrections to OCR text in partnership with Eighteenth Century Connect, ProQuest and Cengage. By crowdsourcing and sharing corrections at an international level, the scholarly community would be working collaboratively to improve the quality and use of historic book collections. This is an extremely complex and ambitious project, but the Advisory Board members believe in harnessing the power of the digital humanities community and see this as an important development for the future of JISC Historic Books and its users.
In the last two years, JISC Collections has faced some serious challenges and made some errors of judgement in its attempt to develop an independent service that protects and preserves the community’s content investments, simplifies access and use in line with information-seeking research, and cedes ownership to the community. Throughout the process we have welcomed feedback from the community and have benefited from its support and recommendations. Sitting firmly within the education community it serves, and with the advisory boards in charge, I believe JISC eCollections is now well placed to become a valued community-owned content service that reflects evolving user behaviour and the changing scholarly environment.
- Connaway, L., and Dickey, T. “The digital information seeker: Findings from selected OCLC, RIN and JISC user behaviour projects.” JISC Report. March 2010 http://www.jisc.ac.uk/publications/reports/2010/digitalinformationseekers.aspx
- Rowlands, I., Nicholas, D., Huntington, P., Clark, D., Jamali, H., and Nicholas, T. “JISC national e-books observatory project: key findings and recommendations.” JISC Collections Final Report. November 2009 http://observatory.jiscebooks.org/reports/
- Wong, W., Stelmaszewska, H., Bhimani, N., Barn, S., and Barn, B. “User Behaviour in Resource Discovery.” JISC Report. November 2009 http://www.jisc-collections.ac.uk/Reports/UBiRD/
- Head, A. and Eisenberg, M. “Lessons Learned: How College Students Seek Information in the Digital Age.” 2009 http://projectinfolit.org/pdfs/PIL_Fall2009_finalv_YR1_12_2009v2.pdf
- Research Information Network. “Researchers and discovery services: Behaviour, perceptions, needs.” November 2006 http://www.rin.ac.uk/our-work/using-and-accessing-information-resources/researchers-and-discovery-services-behaviour-perc
- Prabha, C., Connaway, L., and Dickey, T. “Sense-making the information confluence: The whys and hows of college and university user satisficing of information needs.” 2006 http://www.oclc.org/research/activities/past/orprojects/imls/default.htm
- Connaway, L., Dickey, T., and Radford, M. “If It Is Too Inconvenient, I’m Not Going After It: Convenience as a Critical Factor in Information-seeking Behaviors.” Library & Information Science Research, v33n3, pp179-90. July 2011 http://dx.doi.org/10.1016/j.lisr.2010.12.002
- Hampton-Reeves, S., Mashiter, C., Westaway, J., Lumsden, P., Day, H., Hewertson, H., et al. “Students’ use of research content in teaching and learning.” JISC Report. 2009 http://www.jisc.ac.uk/media/documents/aboutus/workinggroups/studentsuseresearchcontent.pdf
- The Text Creation Partnership is funded by more than 150 libraries around the world to create electronic text editions of early print books http://www.textcreationpartnership.org/
- Institutions pay a single service fee to support the cost of hosting and maintaining the JISC eCollections platforms. The service fees are transparent, kept as low as possible, and profits are ring-fenced for reinvestment in the long-term maintenance and development of the service.
- The JISC Historic Books Advisory Board is detailed at http://www.jiscecollections.ac.uk/advisory-board/jhbadvisoryboard/
Head of Projects
Web site: http://www.jiscecollections.ac.uk
Caren Milloy joined JISC Collections in 2003. She has since negotiated a wide range of agreements for digital content, and managed several projects including the JISC national e-books observatory. As Head of Projects, she manages an extensive portfolio of projects that research changes in user behaviours, pilot new business models, create new tools and develop new consortia licensing models. An essential part of this role is to ensure that the projects are delivered in a professional manner and that the recommendations and findings are embedded into the core licensing work of JISC Collections and are communicated to the scholarly sector. She is currently managing OAPEN-UK, a project that will pilot open access business models for scholarly monographs.