Web Magazine for Information Professionals

Three Perspectives on the Evolving Infrastructure of Institutional Research Repositories in Europe

Marjan Vernooy-Gerritsen, Gera Pronk and Maurits van der Graaf report on the most significant results from two surveys conducted to provide an overview of repositories with research output in the European Union.

Since 2006, the EU-sponsored DRIVER Project has aimed to build an interoperable, trusted and long-term repository infrastructure. As part of the DRIVER Project, a survey was carried out in order to obtain an overview of repositories with research output in the European Union in 2006 [1]. This study was updated by an expanded survey in 2008, in which 178 institutional research repositories [2] from 22 European countries participated. In this article we will present the most important results [3].

As we will argue in this article, the institutional research repositories can be seen as an important innovation to the scientific information infrastructure. There are three kinds of stakeholder directly involved in this: authors, institutions and information users [4]. From the perspective of the authors, the institutional research repository will have an important function as an electronic archive for their own research output, for example, material published elsewhere, but also working papers, internal records and other 'grey literature', that are not (as yet) published elsewhere. Authors might use the electronic archive for generating a publications list on their own Web site, sending out URLs of publications to colleagues and so on. From the perspective of the institutions, the research repository might have two functions:

  1. an administration tool for the institutions in relation to annual reports, research assessment exercises, etc.; and
  2. a way of showcasing the research output of the institutions.

From the perspective of the information user, the research repositories might be a source for grey literature and an alternative route to toll-access literature.

First, we will discuss the state of the art of research repositories in Europe using the results of the survey. Thereafter, we will discuss other eye-catching results of the survey from the viewpoint of the above-mentioned three stakeholders in research repositories. Lastly, we will discuss the stages in innovation adoption and implementation of research repositories and propose a three-track action plan for the further development of the research repository infrastructure in Europe.

State of the Art

Growing Number of Research Repositories in Europe

The number of OAI-PMH-compliant institutional research repositories in the European Union, including Norway, Switzerland and Croatia, can be estimated on the basis of the response rate of the survey at 280 to 290. This number appears to have increased by some 25 to 30 repositories per year over the last 3 years. It is lower than is generally assumed on the basis of the number of repositories registered in OpenDOAR and similar registries. The difference lies in our definition: the term 'repository' is now used in a much broader sense than a few years ago, especially for datasets in the area of archives and heritage collections. Therefore, the definition of which repositories to include in the 2008 survey has been refined with the phrase 'containing research output from contemporary researchers'. The estimate of 280 to 290 research repositories in Europe means that, when compared to the 593 universities in Europe that are full members of the European University Association, nearly half of the universities have now implemented an institutional research repository.
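The 'nearly half' comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above. It assumes, purely for illustration, one repository per university and that every repository belongs to an EUA full member, neither of which the survey states. The function and variable names are our own.

```python
# Figures quoted in the text; the calculation is only an illustration.
EUA_FULL_MEMBERS = 593                # universities (EUA full members)
ESTIMATED_REPOSITORIES = (280, 290)   # low and high estimate from the survey


def coverage_share(repositories: int, universities: int) -> float:
    """Return repositories as a percentage of universities, to one decimal."""
    return round(100 * repositories / universities, 1)


# Assumption (not from the survey): one repository per university.
low, high = (coverage_share(n, EUA_FULL_MEMBERS) for n in ESTIMATED_REPOSITORIES)
print(f"{low}% to {high}%")  # prints "47.2% to 48.9%"
```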

One-third Full Text: More than 60% 'Grey' Literature

Which publication types and other materials are presently covered by the research repositories in Europe? The large majority of research repositories focus on the full text of various publication types, while less than half also contain metadata-only records relating to publications. But quantitatively, metadata-only records take up 51% of all records, while full-text records take up one-third (see pie chart below in Figure 1). Only a small percentage of the records contain non-textual materials such as primary datasets, images, video and music. A closer look at the full-text records in the repositories shows that 62% of these records are grey literature (theses, proceedings and working papers); 38% of those records contain primary literature: journal articles and books/book chapters (see pie chart below in Figure 2). The respondents also estimated the number of records of each type in their repositories. From these data it appears that a typical research repository in Europe contained in total 8,545 items in September 2008.

Figure 1: Record types in research repositories. Full text: 33%; metadata only: 51%; non-textual materials: 4%; other materials: 12%

Figure 2: Publication types of full-text records. Articles: 34%; books: 4%; theses: 39%; proceedings: 14%; working papers: 9%
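The split between grey and primary literature quoted in the text follows directly from the Figure 2 percentages. A minimal sketch of that grouping is given below. The dictionary keys and the `GREY` set are our own labels for the categories named in the article.

```python
# Percentages of full-text records by publication type, as given in Figure 2.
full_text_types = {
    "articles": 34,
    "books": 4,
    "theses": 39,
    "proceedings": 14,
    "working papers": 9,
}

# Grey literature as defined in the text: theses, proceedings and working papers.
GREY = {"theses", "proceedings", "working papers"}

grey = sum(share for kind, share in full_text_types.items() if kind in GREY)
primary = sum(share for kind, share in full_text_types.items() if kind not in GREY)

print(f"grey literature: {grey}%, primary literature: {primary}%")
```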

Wide Variation in Depositing Work Processes

How is the material deposited in the research repository? The results of a question in the survey about the work processes of depositing are presented in the pie chart below (see Figure 3). Work processes vary from self-depositing by academics to independent collection of the materials by the repository staff. Compared to the results of the 2006 survey, there is a remarkable increase in the percentage of repositories that use a combination of various workflows (28% in 2006 versus 44% in 2008).

Is that a welcome development? We think that from the perspective of the authors, self-depositing will be the preferred working method, as they will use the repositories as an electronic archive for their materials and might not like a time-consuming procedure before those materials can be used. From the perspective of the institutions, however, comprehensiveness might be important, especially with the purpose of showcasing the research output of the institution. In that instance, workflow C (collecting materials independently of the academics) might be helpful.

Figure 3: Work processes. A: self-depositing by academics; quality control by specialised staff members: 21%; B: delivery by academics; depositing by specialised staff members: 20%; C: collection of materials by staff members independent of academics: 9%; Combination of A, B or C: 44%; Other: 6%.

Progress in Technical Harmonisation

Technical harmonisation is an important goal of the DRIVER Project and essential in order to build a common infrastructure for research repositories. Some progress has been made over the last years (see Table 1 below).

Which software package is used for the research repository?

Software package | 2008
ARNO | 1.1%
CDSware | 3.4%
Digitool | 1.1%
DIVA | 2.2%
DSpace | 30.3%
Fedora | 2.2%
GNU Eprints | 19.7%
iTOR | 0.6%
MyCoRe | 2.2%
OPUS | 8.4%
VITAL | 0.6%
Locally developed software package | 16.9%
Other | 11.2%

Table 1: Table displaying which software package is used by research repositories.

Firstly, the software market is still fragmented, but has two clear market leaders (DSpace and GNU Eprints, with a combined market share of 50%, an increase over the 43.9% in 2006). Secondly, the DRIVER guidelines have been developed by the DRIVER Project to ensure high-level interoperability and retrieval of content. From the survey it appears that 82% of the respondents knew about the DRIVER guidelines and 54.5% make every effort to follow them.
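The combined share of the two market leaders can be reproduced from the 2008 column of Table 1. The sketch below is only an illustration: the variable names are our own, and the percentages are held as decimal strings to avoid floating-point rounding.

```python
from decimal import Decimal

# 2008 shares from Table 1, kept as strings so Decimal stores them exactly.
shares_2008 = {
    "ARNO": "1.1", "CDSware": "3.4", "Digitool": "1.1", "DIVA": "2.2",
    "DSpace": "30.3", "Fedora": "2.2", "GNU Eprints": "19.7", "iTOR": "0.6",
    "MyCoRe": "2.2", "OPUS": "8.4", "VITAL": "0.6",
    "locally developed": "16.9", "other": "11.2",
}
shares = {name: Decimal(value) for name, value in shares_2008.items()}

# Combined share of the two market leaders named in the text.
leaders = shares["DSpace"] + shares["GNU Eprints"]

# The column total falls just short of 100 because each entry is rounded.
total = sum(shares.values())

print(f"DSpace + GNU Eprints: {leaders}%")
print(f"All packages: {total}%")
```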

Three Perspectives

Author Perspective: Electronic Archive Function

The function of a research repository as an electronic archive for the depositing author will be greatly enhanced by a persistent identifier for each deposited item and a guarantee of long-term availability for the entire archive. Personalised services, such as the ability to generate publication lists on the personal Web site of the author, will further enhance the utility of the research repositories for academics. The results of the survey are presented in Table 2 below. While some form of persistent identifier has now been implemented by most research repositories, together with long-term availability in more than half, the number of research repositories with personalisation functionality is still limited to less than 30%.

Survey data related to the electronic archive function | 2008
Persistent identifier | 84.3%
Long-term availability | 52.2%
Personal services | 28.7%

Table 2: Survey data related to the electronic archive function

Another important topic for the depositing authors might be the variation in the forms of availability of full-text content supported by the repository: Open Access, Open Access with embargo period, Campus Access or No Access (archive only). The percentages of repositories offering the various forms of availability for full text are presented in Table 3 below. It would appear that the repositories have been offering more options in recent years. However, from an additional analysis of the 2008 data, it appears that 47% of the research repositories still offer only one form of availability, namely the Open Access option. For Dutch repositories, SURFfoundation recommends the Open Access option as the default option in the deposit workflow while giving authors the choice of other options.

With regard to the availability of the full text materials (articles, books, book chapters, theses etc.): how are they available? | 2008 | 2006
Open Access: publicly available | 96.6% | 94.7%
Open Access with embargo: publicly available after a certain period of no access | 32.6% | 18.4%
Campus Access: only available for users within our institution | 30.3% | 26.3%
No Access: archived but NOT available at all | 18.0% | 14.0%
Other | 6.7% | 7.9%

Table 3: How full-text materials are made available

Lastly, the depositing author of full text of material published elsewhere is confronted with copyright rules and - in the case of journal articles - the question of which version should be deposited. From Table 4 displaying the data from both surveys, it appears that there is a clear trend from the pre-print form and/or the published form towards the post-print form.

Which statement best describes the form of journal articles in your research repository? | 2008 | 2006
Most articles are available in pre-print form only (pre-refereeing) | 10.3% | 17.6%
Most articles are available in postprint form (final draft post-refereeing) | 46.2% | 30.4%

Table 4: Form of articles in research repositories

Institutional Perspective: Administration Tool and Showcasing Research Output

Comprehensive coverage is an important success factor for the function of showcasing research output as well as for the administration tool function for annual reports and/or research assessment exercises. Coverage by the research repositories was estimated by the respondents of the 2008 survey to be on average 35% of the research output of their institutions. In another estimate, respondents indicated that the percentage of academics of their institution delivering material to research repositories was on average 33%. These estimates are similar to those made by the respondents to the 2006 survey, suggesting no real progress in this respect.

This finding leads to the next question: what is the institutions' policy with regard to depositing? The results are presented in Table 5 below, together with the figures from the 2006 survey. Nearly one-third of the institutions (32%) have some sort of (partly) mandatory deposit. This percentage has increased somewhat since 2006 (24.6%). Just over half of the institutions have an official policy of voluntary deposit while nearly 15% have not formulated any official policy.

Which statement best describes the policy of your institution for the academics with regard to depositing material? | 2008 | 2006
Mandatory deposit: academics are required to deposit materials | 11.8% | 8.8%
Partly mandatory deposit: academics are required to deposit some materials (for example theses), and are free to voluntarily deposit other materials | 20.2% | 15.8%
Voluntarily deposit with strong encouragement: academics are strongly encouraged to deposit materials | 29.2% | 30.7%
Voluntarily deposit: academics are free to deposit materials | 23.0% | 20.2%
There is no official policy | 14.6% | 21.9%
Other | 1.1% | 2.6%

Table 5: Institutional policy on deposit
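The grouped figures quoted in the text (32% with some form of mandatory deposit in 2008 against 24.6% in 2006, and just over half with a voluntary policy) can be derived from the rows of Table 5. The sketch below shows one way to do so. The short keys and the `combined` helper are our own labels, not survey terminology.

```python
from decimal import Decimal

# (2008, 2006) percentages from Table 5; the short keys are our own labels.
policy = {
    "mandatory": ("11.8", "8.8"),
    "partly_mandatory": ("20.2", "15.8"),
    "voluntary_encouraged": ("29.2", "30.7"),
    "voluntary": ("23.0", "20.2"),
    "no_policy": ("14.6", "21.9"),
    "other": ("1.1", "2.6"),
}

MANDATORY = ("mandatory", "partly_mandatory")
VOLUNTARY = ("voluntary_encouraged", "voluntary")


def combined(keys, year_index):
    """Sum the shares of the given policy rows (index 0 = 2008, 1 = 2006)."""
    return sum(Decimal(policy[key][year_index]) for key in keys)


for label, keys in (("(partly) mandatory", MANDATORY), ("voluntary", VOLUNTARY)):
    print(f"{label}: 2008 {combined(keys, 0)}%, 2006 {combined(keys, 1)}%")
```

On these figures the combined voluntary share barely moved between the two surveys, while the (partly) mandatory share grew at the expense of institutions without an official policy.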

User Perspective: Accessibility and Retrievability

Accessibility, and in particular accessibility of full text, and retrievability are the most important issues for the users of the repositories. In Table 6 below, data from the survey on both are presented. Overall, accessibility of repository information by search engines appears to be increasing. The large majority of the repositories are now accessible via Google (91%) and Google Scholar (72.5%). In 2006 these proportions were much lower (64.9% and 51.8% respectively). However, accessibility by other specialised search engines (Scirus, Scientific Commons) is limited, while the contents of just under half of the repositories are also listed in the local library catalogue (47.8%) or in a national or regional catalogue (44.4%). Retrievability of items in a repository will be enhanced by the addition of an author identifier together with keyword and subject indexing (in the English language or in other languages). The data from the survey show that there is room for improvement in these respects and no noticeable progress since 2006.

Factors related to accessibility and retrievability

Accessible via: | 2008 | 2006
Google | 91.0% | 64.9%
Google Scholar | 72.5% | 51.8%
Scirus | 29.8% | 18.4%
Scientific Commons | 28.7% | -
Library catalogue | 47.8% | 53.5%
National or regional catalogue | 44.4% | 47.4%

Enhanced retrieval with: | 2008 | 2006
Author identifier | 30.9% | 32.5%
English-language keyword and subject indexing | 51.7% | 60.5%
Keyword or subject indexing in a non-English language | 34.2% | 31.6%
No keyword or subject indexing system | 14.0% | 7.9%

Table 6: Factors related to accessibility and retrievability

Next Steps in the Further Development of European Repositories

Actions at Institutional Level

Institutional research repositories are a major innovation within the scientific information infrastructure. In the box below a model for adoption and implementation of IT innovations within an organisation is described [5].

Stages in Innovation Adoption and Implementation
  1. Initiation: a match is found between innovation and its application in the organisation
  2. Adoption: a decision is reached to invest resources to accommodate the implementation effort
  3. Adaptation: the innovation is developed, installed and maintained. Procedures are developed and maintained. Members are trained both in the new procedures and in the innovation.
  4. Acceptance: organisational members are induced to commit to the innovation's usage.
  5. Routinisation: usage of the technology application is encouraged as a normal activity.
  6. Infusion: increased organisational effectiveness is achieved through using the IT application in a more comprehensive and integrated manner to support higher-level aspects of work.

The above model implies that the first stages (initiation, adoption and adaptation) will be mainly processes on the level of a research institution or university. This survey shows that nearly 300 institutions in Europe have already passed the adoption stage and this number is increasing annually by approximately 30 institutions. To help the new entrants in the research repository world, DRIVER has set up a number of programmes such as the DRIVER Guidelines, the Mentor Service and the Tutorial for data providers [6]. Clearly, these kinds of programmes, supported by lobbying efforts directed at decision makers within the research institutions, will be needed for some years to come in order to reach the stage whereby the overwhelming majority of research institutions will have a research repository.

Actions at the Authors' Level

With regard to authors within an institution with a research repository, the stages of acceptance and routinisation will apply. A series of studies [7][8][9][10] has been carried out in the area of self-archiving. The results point to two gaps in the self-archiving behaviour of academic authors:

Actions by research repositories with regard to authors should therefore focus on closing these gaps. We emphasise the need for a user-friendly depositing procedure if we are to convince authors who still use their departmental or personal Web sites for archival purposes to change to their institutional repository instead. We also recommend offering authors personalisation functionality, such as the option to generate a publication list on their personal homepage. In addition, the manifest benefits of long-term availability and persistent identifiers are compelling arguments in wooing those authors. Research repositories should offer services that lower the threshold for authors who do not self-archive as yet, for instance a wider variety of availability options.

Why is there no mandatory policy in some institutions? From the comments made by respondents it appears that two kinds of mandatory policies currently prevail:

  1. only depositing of theses is mandatory
  2. the 'depositing' of metadata pertaining to research publications is mandatory.

Only a very few responses point to a mandatory policy for all research publications produced by the institution's researchers. One respondent commented: 'we have an institutional mandate, but: never urge an academic, you won't get anything. Convince them'. We concur with this and believe that a 'seduce-to-use' policy will ultimately be more successful.

Actions at the Infrastructure Level to Support Usage

When the institutional research repositories are fully incorporated in the scholarly communication system, authors and institutions can only reap the benefits of this innovation when the material in the repositories is widely used by users/readers of scholarly information. Optimal usage can be achieved through a reliable infrastructure for research repositories that enhances the accessibility and retrievability of their content. DRIVER aims to develop such a pan-European infrastructure, offering sophisticated services and functionality for authors, institutions and information users.

Conclusion

Clearly, the present state of research repositories is not yet in the final phase of innovation implementation called 'infusion', whereby the scholarly communication system as a whole will function at a higher level. However, working on three tracks to improve the functionality of the repositories for authors, institutions and users will in our view make this final stage of innovation adoption achievable within a decade.

References

  1. Maurits van der Graaf, "DRIVER: Seven Items on a European Agenda for Digital Repositories" 2007, Ariadne, Issue 52,
    http://www.ariadne.ac.uk/issue52/vandergraf/
    The European Repository Landscape; Maurits van der Graaf, Kwame van Eijndhoven, 2008
    http://www.aup.nl/do.php?a=show_visitor_book&isbn=9789053564103
  2. A research repository is defined by (1) containing research output from contemporary researchers and (2) OAI-PMH-compliant. In addition, 22 thematic repositories took part in the study: their results are not reported here.
  3. The complete results of this 2008 survey will shortly be presented in a detailed report, to be downloaded from the DRIVER Support Web site http://www.driver-support.eu/
  4. The academic publishers are another important stakeholder, not to be discussed here. Their role will be studied extensively in the PEER Observatory - see http://www.peerproject.eu/
  5. R. B. Cooper, R. W. Zmud, Information technology implementation research: a technological diffusion approach, Management Science, Volume 36, Issue 2 (February 1990), 123-139.
  6. Respectively:
    DRIVER Guidelines 2.0: Guidelines for content providers - Exposing textual resources with OAI-PMH, November 2008
    http://www.driver-support.eu/documents/DRIVER_Guidelines_v2_Final_2008-11-13.pdf
    About the DRIVER Project
    http://www.driver-support.eu/managers.html
    DRIVER: Mentor Service
    http://www.driver-support.eu/mentor.html
  7. A. Swan and S. Brown, Open access self-archiving, 2005.
  8. Antelman K., Self-archiving practice and the influence of publisher policies in the social sciences, Learned Publishing 19, 85-89, 2006.
  9. Loughborough University Institutional Repository: Finding open access articles using Google, Google Scholar, OAIster and OpenDOAR http://hdl.handle.net/2134/4084
  10. Faculty attitudes and behaviors regarding scholarly communication: survey findings from the University of California, 2007.

Author Details

Dr Marjan Vernooy-Gerritsen
Programme Manager IT & Research
SURFfoundation
Utrecht
The Netherlands

Web site: http://www.surf.nl/

Gera Pronk
Project Manager IT & Research
SURFfoundation
Utrecht
The Netherlands

Web site: http://www.surf.nl/

Maurits van der Graaf
Pleiade Management and Consultancy
Amsterdam
The Netherlands

Email: m.vdgraaf@pleiade.nl
Web site: http://www.pleiade.nl/
