Institutional Repositories and Their 'Other' Users: Usability Beyond Authors
If institutional repositories (IRs) were all that their proponents could have hoped, they would be providing researchers with better access to research, improving institutional prestige, and assisting with formal research assessment. The reality, though, is that IRs are less frequently implemented, harder to fill, and less visible than their advocates would hope or expect.
While technical platforms for IRs, such as DSpace and ePrints, have been researched extensively, little is known about the users of IRs: neither how they use IR software nor how usable it is for them. IR users can be divided into three main groups: authors, information seekers, and data creators/maintainers. While authors are reasonably well understood, the latter two groups are particularly under-studied.
Authors are better studied than any other users of IRs, perhaps because the first barrier to IR use is content recruitment, and authors are vital to content recruitment, even in IRs where deposit is performed by a third party. Several strategies to improve author involvement in IRs have been identified in the literature, and are summarised in Mark and Shearer's excellent review . The literature on authors will not be further reviewed here.
Information seekers are the end-users of any IR, and while there may be authors in this group (indeed, they may constitute the majority of this group), the goals and concerns of this group are very different from the goals and concerns of authors in their authorship role. Research shows that information seekers generally want to find information quickly and with a minimum of fuss (though authors are differentiated by placing high value on peer-reviewed work). It is clear that where information is freely available, information seekers are willing to use it, and trust it just as much as for-fee information. Even authors are visibly willing to use 'free' published work; 88% report having used self-archived materials. While IRs clearly have potential users, even researchers at a given institution are unlikely to know whether their institution has an IR; hence IRs are likely to be largely unknown to researchers outside their host institutions (and must certainly be unknown to the general public, one of the purported beneficiaries of IRs).
The first usability problem for IRs, then, is visibility; for IRs to be useful they must be seen, and it would appear at present that authors are quite right in perceiving them as 'islands' of information, set apart from the people who might be interested in them . This is a problem that can be addressed by search-engine harvesting of IRs, not just by Google Scholar (which attracts some use by academics, but is not usually the first information source they consult ) but by Google's main service, which is the first stop for information for academics and the public alike . This is not to say that other commercial search engines should be excluded; Google is mentioned by name here simply because it is the most popular . Reinforcing the importance of search-engine harvesting is Nicholas' surprising conclusion that search-engine indexing of a journal is at least as important as making that journal open access in terms of improving the journal's visitor numbers .
Other than this visibility problem, little is known about the usability of IRs for information seekers. A limited number of usability studies of IRs has been conducted (more on this below), but at the time of writing there are no known reports of actual usage of any IR. This dearth of usage data means we do not know: whether typical IR users are local or from outside the hosting institution; whether they find the IR via the institutional homepage or via search engine referrals; we do not know what kind of information they look for and use; nor how they use the functionality offered by IRs. While studies of IR usage would also be valuable, we can certainly learn from the usability studies of IRs and from the wealth of research about information seeking in other contexts. (In particular, this would be likely to advance our knowledge of how best to design IRs .)
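To make the open questions above concrete, the following is a minimal sketch of how referrer entries from an IR's web server log might be sorted into rough arrival routes. It is an illustration only: the institutional domain, the list of search-engine host fragments, and the function names are all assumptions made for the example, not features of any particular IR platform.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical values for illustration; a real study would substitute
# the hosting institution's domain and a fuller list of search engines.
INSTITUTION_DOMAIN = "example.edu"
SEARCH_ENGINE_FRAGMENTS = ("google.", "yahoo.", "bing.")


def classify_referrer(referrer):
    """Return a coarse arrival route for one referrer string."""
    if not referrer or referrer == "-":
        return "direct"
    host = (urlparse(referrer).hostname or "").lower()
    if host == INSTITUTION_DOMAIN or host.endswith("." + INSTITUTION_DOMAIN):
        return "institution"
    for fragment in SEARCH_ENGINE_FRAGMENTS:
        # Match "google.com" as well as "www.google.com" or "scholar.google.com".
        if host.startswith(fragment) or ("." + fragment) in host:
            return "search_engine"
    return "other"


def summarise_referrers(referrers):
    """Count how many requests arrived by each route."""
    return Counter(classify_referrer(r) for r in referrers)
```

Counts of this kind would only show how visitors arrive; they say nothing about who those visitors are or whether they found what they wanted.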
Usability Studies of IR Software
The work in this field is very limited; at the time of writing, only one complete report of a usability study of any IR with a focus on users could be found. That report is a comparative analysis of two of the big software players in the IR field, ePrints and DSpace. In this study, Kim performed a heuristic analysis of ePrints and DSpace for a number of tasks (most of which involved searching for a known item), and then ran a between-groups study of users performing the same tasks. Kim predicted from the heuristic analysis that users would be faster and error rates would be lower for most tasks using DSpace (the reasons for this are analysed in depth in the paper); these predictions proved accurate in the user studies. Despite the consistency of Kim's results, they contrast with Ottaviani's findings, which show problems with interface terminology and context indicators in DSpace in real-world use. Kim's findings also differ from Atkinson's experience, in which researchers found it very difficult to perform one of their common real-life tasks using DSpace.
The implications of this work are not software-specific; we can see that heuristic analysis can give good usability predictions, and that usability studies of specific tasks can tell us about software performance for those tasks. However, when the results are contradicted by studies of users attempting real-world tasks, we see that understanding what users would like to do with software, and ensuring that these tasks are both possible and simple to do, should be a priority when developing IRs.
Usage Studies of Online Research Resources
Despite the lack of IR usage studies, we can gain some understanding of how users are likely to use IRs for information seeking by looking to usage studies of journal databases and open access research repositories.
Recent studies of large journal databases show that users read and download an unexpectedly wide range of material in comparison with the range of papers actually cited. Obsolescence is not nearly as pronounced in downloads as it is in citations; while there is some recency effect (particularly in the sciences), older articles are downloaded much more frequently than they are cited. It is suggested that this may be a result of search engine use. Equally, article popularity is not as clear-cut as might be expected. In a one-month study of a 'big deal', nearly all available journals were accessed at least once, although the top half of the journals accounted for over 90% of the usage and three quarters of articles were viewed only once. Fewer than 1% of articles were downloaded more than ten times.
Studies of what people actually do with journal databases show that the typical user visits infrequently, views articles from just a single subject area, and views only a small number of articles . There is a correlation between the number of items viewed and the frequency with which users returned, suggesting that there is a small but significant minority of 'dedicated researchers'. The statistics also indicate a significant level of browsing, particularly of journals' tables of contents .
Usage studies of open access research collections in computing confirm the pattern of infrequent visitors who view and download only a small number of articles  . Moreover, users typically type in short queries (1-3 words), do not change default search options, and view only the first page of search results. Spelling errors are not often observed, but a number of queries returning few or no results are a result of less popular local spelling variants (for example 'optimisation' versus 'optimization') .
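The spelling-variant problem described above lends itself to a small illustration. The sketch below expands each query term with a regional spelling variant before the query is sent to a search back end. The suffix pairs and function names are assumptions chosen for the example, and the rule set is deliberately tiny; it is not drawn from any of the systems studied.

```python
from typing import List, Set

# Illustrative suffix pairs only; a production list would need to be
# longer and checked against false matches.
SUFFIX_PAIRS = [("isation", "ization"), ("ising", "izing")]


def spelling_variants(term: str) -> Set[str]:
    """Return the term plus any variant produced by swapping a known suffix."""
    term = term.lower()
    variants = {term}
    for left, right in SUFFIX_PAIRS:
        for source, target in ((left, right), (right, left)):
            if term.endswith(source):
                variants.add(term[: -len(source)] + target)
    return variants


def expand_query(query: str) -> List[Set[str]]:
    """Expand each whitespace-separated term of a short query."""
    return [spelling_variants(term) for term in query.split()]
```

For instance, `expand_query("query optimisation")` yields one set containing only "query" and a second containing both "optimisation" and "optimization", so a search using either spelling could match documents using the other.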
While these studies do not investigate IR usage, the systems they describe are similar in purpose to IRs and we can reasonably expect information seekers to use them in similar ways. Given that assumption, we can infer that typical IR users will visit infrequently, download only a few articles at a time, perform very simple searches, and use results from the top of the results list (though they will browse widely in other ways if offered the chance). Conversely, as a group, IR users are likely to use a wide range of articles, not just those that are new or popular, because their searches will return a wide range of articles, not just the most recent or popular articles. This picture of users suggests search mechanisms should be easy to use, that search defaults should produce a wide range of results, and that results should be displayed with the best possible relevance rankings. IRs should also facilitate browsing (preferably of the whole collection, as well as search results), and provide the widest possible range of articles.
When looking for information, users do more than just use a search box, particularly if it is not clear to them exactly what they are looking for. Instead, they engage in a process that continues until they find what they want, find something close enough (otherwise known as 'satisficing'), or just give up. This process has been described in a number of models (see for example ), but the models are broadly similar and can be abstracted to six steps (though the process is often iterative). Those six steps are:
- Perceiving a need for information
- Investigating the ways in which the information need might be met, including assessing available sources of information, and possible searching and browsing for preliminary results
- Clarifying the information need to a small number of specific questions, based on the available resources and personal interest
- Querying information sources to meet the need
- Browsing and assessing results
- Assimilating results and refining queries if the information need remains unmet.
It is important to recognise this process when designing an information system, and to support it as much as possible. An example of an 'information system' that supports this process well is a library reference desk, where a librarian facilitates information seeking . In terms of IR usability, this process suggests we should include browsing functionality (to help users with assessing the information source, and clarifying their information need), and that we should allow users to interleave searching and browsing (this reflects the iterative nature of this process, and is supported by Nicholas  who shows that users of a journal database do interleave searching and browsing).
An IR's data creators/maintainers (henceforth referred to as 'data maintainers') are those who create metadata, upload documents, and generally contribute to or oversee the IR's document collection. Data maintainers may be librarians; the group may also include authors at institutions where author self-deposit is used. Very little research attention has been paid to this group, particularly in the usability field, yet they are vitally important to the creation and maintenance of any IR. Moreover, data maintainers are engaged in an entirely new role. This role is likely to require some combination of technical expertise, an understanding of metadata and metadata standards, copyright knowledge and the inclination to collate research publications . There is no comparable role within any other information system, particularly when it comes to author self-submission (authors who deposit in subject archives such as arXiv.org are sufficiently highly motivated to ignore usability problems , while it is difficult to motivate authors to submit to IRs at all ); thus there is no other research we can draw on to bolster our limited understanding of this role.
Librarians have demonstrated leadership in the IR field , and creating IRs  and encouraging OA mandates  is seen as a way forward for libraries in an age of digital information. A number of reports describe how well-suited librarians are to IR involvement  , and how they provide a wide range of necessary functions, including overcoming publisher and academic resistance , providing good metadata standards, and pushing for inclusion in external search services . Not only do librarians possess the necessary skills to provide leadership in the establishment of IRs, Carver  posits that libraries are best placed to use these skills, being at the nexus between published work, academics, and information access.
The benefits of librarians' involvement in IRs do not all flow one way, however; documented benefits to libraries include greater visibility within their research communities , opportunities to provide more tailored services for their patrons , and improved research collaboration with other libraries .
Despite all the potential benefits, it would be foolish to suggest that IR leadership never has any negative impact on libraries. Those with experience do caution about the amount of staff time that may be absorbed by IRs; moreover, Piorun warns of the high level of technical expertise required by some systems (expertise not readily available in all libraries). There are reports about the inertia of academics and their resistance to involving themselves in IRs, meaning that librarians must provide leadership, because they are the only people suited and available to do so. Finally, despite resolute predictions that self-archiving would make research literature free for the taking, not one of the library publications about IRs mentions a reduction in journal subscriptions (and hence cost) as a benefit.
Usability for Data Creators and Maintainers
Unfortunately, as with information seekers, there is only a limited number of reports of IR software usability for data maintainers. Despite assertions that authors can be easily trained to submit their own work , usability reports about IR software are predominantly negative, both in terms of what users can do with the software and how the software appears and behaves (though it should be noted that the research to date only covers DSpace and ePrints).
The main problems for data maintainers reported in the literature are:
- Terminology: The terminology used in the deposit and management interfaces of DSpace and ePrints was confusing and inappropriate for both authors and librarians (though in different ways) .
- Process: to deposit an item or to update its metadata, users were required to click through a number of screens. Often users needed to see only one or two of these screens, but they were nonetheless forced to follow the same linear progression through the screens. Even in early trials, users saw this as tedious and frustrating .
- Metadata requirements: In one study, the detailed metadata input fields displayed by ePrints and DSpace in their document deposit interfaces were daunting to both academic staff and librarians. Both groups complained that they often did not have all the metadata, and that it was not clear from either system which elements were required and which were optional. It is worth noting that the requirement of detailed metadata at the time of deposit is also raised as a problem in the design stage of an image repository system.
- Detail suppression: One group studied reported having difficulty making sections of a record private when using DSpace. This limited what that group was able to put in its repository .
- Formatting and authority control: Examples of formatting and authority control problems include: whether record titles were entered in title case or lower case; and whether author names were entered first name or last name first . Authority control for author names in particular has also been mentioned by bloggers working in the IR field , though requiring depositors to create authority versions of names during the deposit process has been shown to be unwieldy (at least for an image repository ). This issue is not only frustrating for librarians or authors who must enter details of a work, but may also cause problems for information seekers who type in the 'wrong' variant of a name. (It should be noted that ePrints version 3 has an 'auto-complete feature' that will help automate authority control . This version of ePrints was not the subject of any of the reports reviewed here.)
- Research reporting: A group of librarians who wished to use DSpace for research reporting (not an uncommon task for IRs  ) found the search and browse functionalities 'weak', and the search results display awkward .
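The formatting and authority control problem listed above can be illustrated with a small sketch that groups author names entered in different orders under a single key. This is a simplified illustration under stated assumptions: the key format, the function names, and the example names are all hypothetical; it is not how DSpace or ePrints handle names, and it will mis-handle multi-word family names entered without a comma.

```python
from collections import defaultdict


def name_key(raw):
    """Reduce 'Given Family' or 'Family, Given' to a 'family, initial' key."""
    cleaned = " ".join(raw.replace(".", " ").split())
    if "," in cleaned:
        family, _, given = cleaned.partition(",")
    else:
        # Assumption: without a comma, the last word is the family name.
        parts = cleaned.split(" ")
        family, given = parts[-1], " ".join(parts[:-1])
    family = family.strip().lower()
    initial = given.strip()[:1].lower()
    return f"{family}, {initial}" if initial else family


def group_name_variants(names):
    """Map each key to the list of entered names that share it."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)
```

Under these assumptions, "Smith, J.", "Jane Smith" and "J. Smith" all reduce to the key "smith, j". Two different authors who share a family name and first initial would also collide, which is one reason a real authority file would need human review.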
It should be noted that while DSpace is more heavily criticised than ePrints, it is also more widely tested and thus may not be any less usable for data maintainers (though in a comparative test between DSpace and ePrints, a slight preference for ePrints emerged among both librarians and author-depositors ).
This literature reflects serious usability issues that may engender resentment among librarians taking time out of other duties to maintain IRs, and may also discourage authors who are ambivalent about self-deposit at best. However, at present it is impossible to calculate the real impact of these usability problems because we do not know how data creation and maintenance fits into the work practices of the authors and librarians involved (despite claims that self-archiving should take authors less than one hour per year each , and criticism of authors for not doing it ).
Libraries and librarians currently display a high level of commitment to IR data creation and maintenance; if this level of commitment is to be retained, it is necessary to pay attention to the needs and experiences of librarians. Conversely, if the commitment of authors is to be increased, it is necessary to ensure at least that their initial experience of the process is neither frustrating nor daunting. This means not only improving the usability of data creator and maintainer interfaces, but also understanding how the work involved in data creation and maintenance fits into the way the people involved do their jobs, and making this fit as streamlined as possible. (For an example of how work practices of users were taken into account during the design of an image repository to great benefit, see Roda.)
Conclusions and Future Directions
In reviewing the literature about IR use and usability, we see that authors are well studied, and that there are a number of proven methods of engaging them. However, there are two other user groups for IRs that have not attracted nearly so much attention thus far, namely information seekers (or 'end-users') and data creators and maintainers.
Information seekers, while they are not closely studied with respect to IRs, are well studied in general, and by understanding both the information-seeking process and the behaviour of this group in similar systems, we can make predictions about how they may use IRs. Their visits are likely to be short, with short searches, and they are likely to view only a few articles. They will make use of browsing features, if they are provided, and this could lead to better information seeking. Collectively, they will use a wide range of articles spread over a long period of time. Even academic information seekers use Google (or other commercial search engines) first; IRs that are harvested by search engines will see a higher level of use than those that are not. All that we know about information seekers should be incorporated into our design of information seeker interfaces within IRs, but the few usability studies available suggest this has not been the case.
Data creators and maintainers have been largely ignored in the literature; despite their role being completely new, we know very little about how it fits into other job responsibilities and expectations. The suitability of data maintenance interfaces to their users is also largely unknown, though the available literature suggests that improvements are needed in this area.
For IRs to attract the level of use they need to revolutionise digital scholarship, they must be both useful and usable. For IRs to be useful, they must first have information in them, and little is known about their usability for the group (data creators and maintainers) who create the information. We know that this group is made up of librarians and authors, for the most part, but we do not know how the work of populating an IR fits into their workflow. Thus far, usability reports have been largely negative, and we can make suggestions as to how to avoid these mistakes, but we cannot make general suggestions for good design of IRs from a data maintainer's perspective. Observational studies of data maintainers would provide an understanding of the tasks IRs are used for and the way they fit into data maintainers' larger work roles, and also suggest ways of improving the fit of IR software to the tasks for which it is used. More formal usability testing could then be used for fine-tuning the design of data maintainers' IR interfaces. Virtually nothing is known about IR end-users. We do not know how many people are using IRs, whether they are academics or lay people, or how they most often find IRs, though it is reasonable to suspect they may find IRs with Google, given that this is the starting point for most information seekers. Studies of usage logs from a well-populated IR could answer these questions, and provide avenues for further investigation into how we might improve information seekers' IR experience.
The work to prepare this review was funded by the ARROW Project (Australian Research Repositories Online to the World) . The ARROW Project is funded by the Australian Commonwealth Department of Education, Science and Training, under the Research Information Infrastructure Framework for Australian Higher Education.
- Harnad, S. (1999) "Free at Last: The Future of Peer-Reviewed Journals". D-Lib 5(12),
retrieved: 7 March 2007.
- Harnad, S., et al., (2003) "Mandated RAE CVs Linked to University ePrint Archives: Enhancing UK Research Impact and Assessment". Ariadne, Issue 35, March/April 2003 http://www.ariadne.ac.uk/issue35/harnad/
- Lynch, C.A. (2003) "Institutional Repositories: Essential Infrastructure for the Scholarship in the Digital Age". Association of Research Libraries,
retrieved: 16 January 2007.
- Callan, P. (2004) "The Development and Implementation of a University-Wide Self Archiving policy at Queensland University of Technology: Insights from the Front Line". In Proc. SPARC IR04: Institutional repositories: The Next Stage. Washington D.C: SPARC
- Kim, J. (2006) "Motivating and Impeding Factors Affecting Faculty Contribution to Institutional Repositories". In Proc. Joint Conference on Digital Libraries. Chapel Hill, NC, USA: ACM Press
- Whitehead, D. (2005) "Repositories: What's the Target? An ARROW Perspective". In Proc. International Association of Technological University Libraries Conference. Quebec City, Canada: IATUL
- DSpace. http://www.dspace.org,
retrieved: 15 February 2007.
- ePrints. http://www.eprints.org/software/,
retrieved: 15 February 2007.
- Mark, T. and M.K. Shearer (2006) "Institutional Repositories: A Review of Content Management Strategies". In Proc. World Library and Information Congress: 72nd IFLA General Conference and Council. Seoul, Korea
- Agosto, D.E. (2002) "Bounded Rationality and Satisficing in Young People's Web Based Decision Making". Journal of the American Society for Information Science and Technology. 53(1): p. 16-27.
- Bell, S.J. (2004) "The Infodiet: How Libraries Can Offer an Appetizing Alternative to Google". Chronicle of Higher Education. 50(24): p. 15.
- De Rosa, C., et al. (2005) "Perceptions of Libraries and Information Resources". OCLC
retrieved: 23 May 2007.
- Miller, R.M. (2006) "Readers' Attitudes to Self-Archiving in the UK". Napier University, Edinburgh. Masters dissertation.
- Gadd, E., C. Oppenheim, and C. Probets (2003) "RoMEO Studies 3: How Academics Expect to Use Open Access Research Papers". Journal of Librarianship and Information Science. 35(3): p. 171-187.
- Davis, P.M. and M.L.J. Conolly (2007) "Institutional Repositories: Evaluating the Reasons for Non-Use of Cornell's Installation of Dspace". D-Lib 13(3), http://www.dlib.org/dlib/march07/davis/03davis.html,
retrieved: 13 March 2007.
- Nicholas, D., P. Huntington, and H.R. Jamali (2007) "The Impact of Open Access Publishing (and Other Access Initiatives) on Use and Users of Digital Scholarly Journals". Learned Publishing. 20(1): p. 11-15.
- Nielsen, J. (1994) "Know the User". In Usability Engineering, J. Nielsen, Editor. San Francisco, CA: Morgan Kaufmann p. 73-77.
- Kim, J. (2006) "Finding Documents in a Digital Institutional Repository: DSpace and ePrints". In Proc. 68th Annual Meeting of the American Society for Information Science and Technology. Charlotte, North Carolina: American Society for Information Science and Technology
- Ottaviani, J. (2006) "University of Michigan DSpace (a.k.a. Deep Blue) Usability Studies: Summary Findings". University of Michigan,
retrieved: 7 March 2007.
- Atkinson, L. (2006) "The Rejection of D-Space: Selecting Theses Database Software at the University of Calgary Archives". In Proc. 9th International Symposium on Electronic Theses and Dissertations. Quebec City, QC, Canada
- Huntington, P., et al. (2006) "Article Decay in The Digital Environment: An Analysis of Usage of OhioLINK by Date of Publication, Employing Deep Log Methods". Journal of the American Society for Information Science and Technology. 57(13): p. 1840-1851.
- Nicholas, D. and P. Huntington (2006) "Electronic Journals: Are They Really Used?" Interlending and Document Supply. 34(2): p. 48-50.
- Nicholas, D., et al. (2006) "The Information Seeking Behaviour of the Users of Digital Scholarly Journals". Information Processing and Management. 42(5): p. 1345-1365.
- Nicholas, D., P. Huntington, and D. Watkinson (2005) "Scholarly Journal Usage: The Results of Deep Log Analysis". Journal of Documentation. 61(2): p. 248-280.
- Jones, S., S.J. Cunningham, and R. McNab (1998) "An Analysis of Usage of a Digital Library". In Proc. European Conference on Digital Libraries. Heraklion, Crete: Springer p. 261-277.
- Mahoui, M. and S.J. Cunningham (2000) "A Comparative Transaction Log Analysis of Two Computing Collections". In Proc. European Conference on Digital Libraries. Lisbon, Portugal: Springer p. 418-423.
- Mahoui, M. and S.J. Cunningham (2001) "Search Behavior in a Research Oriented Digital Library". In Proc. European Conference on Digital Libraries. Darmstadt, Germany: Springer p. 13-24.
- Nordlie, R. (1999) "'User Revealment': A Comparison of Initial Queries and Ensuing Question Development in Online Searching and in Human Reference Interactions". In Proc. 22nd Annual ACM Conference on Research and Development in Information Retrieval. Berkeley, CA, USA: ACM Press p. 11-18.
- Kuhlthau, C.C. (1991) "Inside the Search Process: Information Seeking from the User's Perspective". Journal of the American Society for Information Science and Technology. 42(5): p. 361-371.
- Marchionini, G. (1995) "Information Seeking in Electronic Environments". Cambridge Series on Human-Computer Interaction, ed. J. Long. Vol. 9, Cambridge, UK: Cambridge University Press.
- Crabtree, A., et al. (1997) "Talking in the Library: Implications for the Design of Digital Libraries". In Proc. Second International ACM Conference on Digital Libraries. Philadelphia, PA, USA: ACM Press p. 221-228.
- Nicholas, D., et al. (2006) "Finding Information in (Very Large) Digital Libraries: A Deep Log Approach to Determining Differences in Use According to Method of Access". Journal of Academic Librarianship. 32(2): p. 119-126.
- Pinfield, S., M. Gardner, and J. MacColl (2002) "Setting Up an Institutional e-Print Archive". Ariadne Issue 31, March/April 2002 http://www.ariadne.ac.uk/issue31/eprint-archives/ retrieved: 7 March 2007.
- Pinfield, S. (2001) "How Do Physicists Use an E-Print Archive? Implications for Institutional E-Print Services". D-Lib 7(12),
retrieved: 28 Feb 2007.
- Piorun, M.E., L.A. Palmer, and J. Comes (2007) "Challenges and Lessons Learned: Moving From Image Database to Institutional Repository". OCLC Systems and Services. 23(2): p. 148-157.
- Woodland, J. and J. Ng (2006) "'Too Many Systems, Too Little Time': Integrating an Institutional Repository into a University Publications System". In Proc. Vala 2006 13th Biennial Conference and Exhibition. Melbourne, Australia
- Bosc, H. and S. Harnad (2005) "In a Paperless World: a New Role For Academic Libraries". Learned Publishing. 2005(18): p. 95-100.
- Cervone, F.H. (2004) "The Repository Adventure". Library Journal. 129(10): p. 44-46.
- Bell, S., N. Fried Foster, and S. Gibbons (2005) "Reference Librarians and the Success of Institutional Repositories". Reference Services Review. 33(5): p. 283-290.
- Carver, B. (2003) "Creating an Institutional Repository: A Role for Libraries". Ex-Libris,
http://marylaine.com/exlibris/xlib181.html, retrieved: 7 March 2007.
- Jenkins, B., E. Breakstone, and C. Hixson (2005) "Content In, Content Out: The Dual Role of the Reference Librarian in Institutional Repositories". Reference Services Review. 33(3): p. 312-324.
- Lyon, L. (2003) "eBank UK: Building the Links Between Research Data, Scholarly Communication, and Learning". Ariadne Issue 36, July 2003 http://www.ariadne.ac.uk/issue36/lyon/, retrieved: 19 March 2007.
- Proudman, V. (2006) "The Nereus International Subject Based Repository: Meeting the Needs of Both Libraries and Economists". Library Hi Tech. 24(4): p. 620-631.
- Bevan, S.J. (2007) "Developing an Institutional Repository: Cranfield QUEprints -- A Case Study". OCLC Systems and Services. 23(2): p. 170-182.
- Fried Foster, N. and S. Gibbons (2005) "Understanding Faculty to Improve Content Recruitment for Institutional Repositories". D-Lib 11(1),
retrieved: 12 January 2007.
- Carr, L. and S. Harnad (2005) "Keystroke Economy: A Study of the Time and Effort Involved in Self Archiving". University of Southampton,
retrieved: 12 February 2007.
- Cunningham, S.J., et al. (2007) "An Ethnographic Study of Institutional Repository Librarians: Their Experiences of Usability". In Proc. Open Repositories 2007. San Antonio, TX, USA
- Roda, C., et al. (2005) "Digital Image Library Development in Academic Environment: Designing and Testing Usability". OCLC Systems and Services. 21(4): p. 264-284.
- Chabot, S., (2006) "The DSpace Digital Repository: A Project Analysis" posted to Subject/Object:The Personal Website of Steven Chabot, Armarian on 9 November 2006.
retrieved: 28 May 2007.
- Sefton, P., (2006) "The Affiliation Issue in Institutional Repository software" posted to PT's Outing on 6 June 2006.
retrieved: 28 May 2007.
- Millington, P. and W.J. Nixon (2007) "EPrints 3 Pre-Launch Briefing". Ariadne Issue 50, January 2007
retrieved: 28 May 2007
- Harnad, S., et al. (2004) "The Access/Impact Problem and the Green and Gold Roads to Open Access". Serials Review. 30: p. 310-314.
- Australian Research Repositories Online to the World