Filling Institutional Repositories: Practical Strategies from the DAEDALUS Project
DAEDALUS is a three-year project based at the University of Glasgow funded under the JISC Focus on Access to Institutional Resources (FAIR) Programme. The main focus of the project has been developing institutional repositories to hold content ranging from peer-reviewed published papers to theses and working papers. Separate repositories have been developed for published material and other material.
This article will detail some of the strategies we have adopted in gathering existing content for the institutional repositories we have developed at Glasgow. It will not consider the case that needs to be made to persuade academics to deposit content in repositories, as this is already well documented. Instead it will concentrate on practical strategies that can be used to identify potential content and which will result in real content being added to a repository. The majority of the strategies discussed should be seen in the context of the need to gather existing content in order to populate a repository. At the outset of developing a repository it is vital to be able to demonstrate to stakeholders how it will work, and it is only possible to do this with content in place. Addressing the need to establish a process whereby academics systematically self-archive or, at the very least, provide publication details on an ongoing basis requires a different approach.
This article will concentrate solely on published peer-reviewed journal articles, as this is the area which has proved most challenging in terms of persuading academics to give us their content, and also in relation to publishers' copyright agreements. This material is at the heart of the open access/scholarly communications crisis debate. Although the project is also collecting departmental working papers, technical reports and other related material we have not experienced any major difficulties in persuading University departments to allow this content to be deposited. Similarly, although the process of changing the University regulations in order to permit the electronic deposit of theses is proving slow we do not anticipate being faced with the same sort of challenges in gathering this content. We hope that institutions in the process of building repositories will find our experiences useful.
At the outset of the project it was clear that it would be vital to get academics on board as soon as possible. A first step in this direction was to ask prominent academics from the three territorial research areas of the University to become members of the DAEDALUS Project Board. This helped us to begin developing relationships with the different faculties of the University. A small number of academics known to be interested in open access issues were also contacted and asked if they would be willing to submit content. However, the main activity in the first few months of the project consisted of giving presentations to a wide variety of University committees, ranging from faculty research committees to departmental and library committees. Our focus during these initial talks was the benefits of institutional repositories to academics and information about how they could contribute content. It soon became clear that most (though not all) academics were fairly sympathetic to the aims of the project, though many of them were concerned about copyright issues.
Despite a generally encouraging response, this did not translate into real content being deposited in the repository. During talks to staff we had explained that project staff would be happy to add content on behalf of authors, although a self-archiving facility was also in place. We found that it was difficult to get staff to give or send us electronic copies of their papers, even when they had promised to do so. This was our first indication that while staff may be sympathetic many of them do not have the time or the inclination to contribute. They were happy to give us permission to do the work on their behalf, but could not commit to doing the work themselves. Clearly the advantages of institutional repositories were not yet sufficiently convincing to academics to persuade them to play an active part in the process.
Within the first year of the project a University-wide event on open access and institutional repositories was held. The event was publicised by email and in the University Newsletter. Subsequent to the event each of the attendees was contacted individually, and follow-up meetings were arranged. It was hoped that this would be a good method of generating content. However, although it helped to open a dialogue with academics, this did not always translate into content.
With levels of content in the service still relatively low it was clear that additional strategies would need to be developed. Experience so far had indicated that relying primarily on staff to come up with content was not sufficient. Practical strategies that would actually result in some content in the repository were required.
Staff Web Sites
As a means of gathering support from staff who were likely to be sympathetic to the open access cause a survey of personal staff Web pages was carried out. This enabled us to establish which members of staff were already in the habit of making the full text of publications available on their personal Web sites. An approach was made to these staff explaining the aims of the project, and asking if they would be interested in their content being made available in the repository. In most cases the staff we contacted were keen, although some pointed out that they thought publishers would not mind articles being made available on personal sites, but would be less keen on the organised nature of an institutional repository. Others had not considered the fact that they might not actually be permitted even to post articles on a personal site. However, the majority of academics we contacted were happy for us to establish which of their publications could be added to the repository.
Publisher Copyright Agreements
In adopting this strategy we were committing to checking the copyright agreements pertaining to each of the individual articles we hoped to add. This proved to be a challenging and time-consuming activity. The only central resource currently available for checking publisher copyright policies is the list created by the RoMEO Project (Rights Metadata for Open Archiving) and now maintained by the SHERPA Project (Securing a Hybrid Environment for Research Access and Preservation). While an extremely useful resource and one that is growing all the time, the list does not cover all publishers. As a result, it has been necessary to track down policies from publishers' Web sites, or to contact publishers directly where these do not exist or where they do not address the issue of whether an author is permitted to make his or her paper available in a repository. No two publisher policies are exactly the same, and many do not explicitly state what rights authors have in relation to repositories. In some cases this may be deliberate, but in many cases the lack of information is more likely to stem from a lack of awareness on the part of the publisher that authors want to know what rights they retain in relation to deposit in repositories. Interpreting publisher copyright policies is also a difficult area, particularly as there is no real precedent and no case law.
Where copyright policies did not exist or where they were unclear, we contacted the publishers directly and asked for permission. Where it was possible to identify an individual responsible for rights management we contacted them directly either by email or by letter. Generally speaking we have found that publishers are happy to accept such requests by email. Although some publishers reply quickly, others may take some weeks and some do not reply at all. We found that publishers were more likely to give permission for specific papers to be added than to outline their general policy on the issue. Consequently permissions for most articles have to be established on a case-by-case basis.
As an additional means of populating the repository we decided to identify journals and/or publishers with copyright policies which permitted deposit in repositories. The SHERPA/RoMEO list was particularly helpful in this respect. Having established a particular journal to target, our next step was to find out which Glasgow authors had published in the journal. The easiest means of doing this was by searching standard abstracting and indexing databases such as Web of Science, MEDLINE, etc. using the Journal Title and Author Affiliation fields. Having established who our target authors were we then made contact with them. Our approach was to explain the aims of the project, explain that we were contacting them as authors of an article within a particular journal, outline the copyright policy saying that this permitted articles to be added to a repository, and then ask for permission to add the article. In some cases we asked the authors to let us know if they were willing to give permission or not, but we have also made use of an opt-out strategy, whereby we said to authors that we would go ahead and add the article unless they asked us not to. We have only adopted this opt-out policy after discussion with a senior member of staff within the department or faculty concerned. So far no members of staff have ever got back to us and asked us not to add their article, but it is unclear whether this is because they support the project or because they have no strong feelings about the issue. Contacting staff individually and asking for a response allows a relationship to be developed, but issues of lack of time and apathy do mean that it is inevitable that some members of staff will not reply to requests.
As a small case study we decided to choose a journal with a copyright agreement that left us in some doubt as to whether it permitted deposit in an institutional repository. We were aware that the journal Nature had publicised the fact that from February 2002 they no longer required authors to sign away their copyright but were instead asking authors to sign an exclusive license. The message from the publishers was that this would permit authors to retain more rights than previously, as they would be 'free to reuse their papers in any of their future printed work, and have the right to post a copy of the published paper on their own websites'. This statement was further clarified within the Nature Author License FAQ:
'The license says I may post the PDF on my "own" web site. What does "own" mean?
It means a personal site, or portion of a site, either owned by you or at your institution (provided this institution is not-for-profit), devoted to you and your work. If in doubt, please contact firstname.lastname@example.org'
As this does not explicitly include institutional repositories (they are not devoted to the work of one particular author), we decided that it would be a useful exercise to contact Nature and ask if deposit in an institutional repository fell within the terms of the license. In order to do this we began by establishing which Glasgow authors had published in Nature (following the same procedure as outlined above). We then contacted each of them individually and asked if they would be willing for us to approach Nature on their behalf. Of the 22 individuals contacted, 16 replied, and all were happy for us to go ahead. We were pleasantly surprised to receive a positive reply from Nature, indicating that they were willing to allow us to add the articles to the repository so long as the authors were members of staff at the University. Interestingly they did not fully address the issue of a perceived distinction between personal Web sites and institutional repositories, and this meant that it was not entirely clear whether their positive response could be applied more widely to other Nature Publishing Group publications.
The authors were then contacted to advise them that permission had been granted. Again we did not experience any resistance to going ahead and adding the articles to the repository. The Nature articles did raise an interesting practical issue - as many of the articles were fairly short they frequently ended halfway down a page. This meant that sections of unrelated articles also appeared on the same page. Seeking permission from the authors of these unrelated articles was not practical, but we did not feel that blanking out areas of the page was appropriate either. As Nature had specifically indicated that we could use the pdf copy available on the journal Web site we made the decision to use this version without making any changes.
Open Access Journals as a Source of Content
Assuming that Glasgow academics who had published in open access journals would not be averse to these articles being made available in the repository, we identified relevant articles from BioMed Central journals, and approached the authors concerned. It would be useful to be able to identify additional content in other open access journals, but so far we have not found an easy way of doing this. The Directory of Open Access Journals is very useful, but it does not enable searching by institution or author affiliation. Other repositories have been able to add the entire content of a particular open access journal where that journal is hosted by the institution, and this is a useful way of seeding a repository.
All of the strategies described so far have been relatively small-scale approaches to the challenge of filling an institutional repository with content. While very worthwhile in the short term, it is clear that such strategies are not sustainable, particularly as they rely on a large investment of project staff time for relatively small gains. In addition, the strategies deal with existing content, and do not address the need to establish a system whereby new content is added to the database automatically (either through self-archiving or some other systematic method of gathering content). It was clear that a more wide-ranging strategy was necessary.
Faculty and Departmental Publications Databases
Any strategy likely to be successful had to take into account the fact that most academic staff, while supportive, were unwilling to deposit content in the repositories themselves. Until such time as they are required to do so, either by the institution or by funding bodies, it seems unlikely that the majority of staff will undertake this activity. At the same time, we needed to find a way of gathering content that was more systematic than our current approach. Taking advantage of the fact that each of the faculties within the University was at the beginning of gathering publication details for the forthcoming Research Assessment Exercise, we decided to investigate the possibility of becoming involved in this process. A number of faculties had chosen to use bibliographic software packages such as Reference Manager to collate details about their publications. As staff were going to be required to provide publications details to departmental or faculty administrative staff for inclusion in the database, we started to test out the possibility of importing bibliographic details from Reference Manager into our repository for published and peer-reviewed papers. A Perl script was written which enables bibliographic details in RIS format (a tagged format developed by Research Information Systems) to be imported directly into the ePrints.org software.
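To illustrate the kind of processing such an import involves, the following is a minimal sketch in Python of reading RIS records and mapping them to repository fields. It is not the project's Perl script, which is not reproduced here. The sample record, the function names and the output field names ("creators", "publication" and so on) are hypothetical and do not represent the ePrints.org import schema; only the RIS tags themselves (TY, AU, TI/T1, JO/JF, PY/Y1, ER) are taken from the format.

```python
import re
from collections import defaultdict

# A tagged RIS line is a two-character tag, two spaces, a hyphen, then the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Hypothetical sample record, for illustration only.
SAMPLE = """TY  - JOUR
AU  - Author, A.
AU  - Author, B.
TI  - An example article title
JO  - Example Journal
PY  - 2004
ER  - 
"""


def parse_ris(text):
    """Split RIS text into records, each a dict mapping a tag to a list of values."""
    records = []
    current = None
    for raw in text.splitlines():
        match = TAG_LINE.match(raw.rstrip())
        if not match:
            continue  # ignore blank lines and anything that is not a tagged line
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":  # TY opens a new record
            current = defaultdict(list)
            current["TY"].append(value)
        elif tag == "ER":  # ER closes the current record
            if current is not None:
                records.append(dict(current))
            current = None
        elif current is not None:
            current[tag].append(value)  # repeated tags such as AU accumulate
    return records


def first(record, *tags):
    """Return the first value found under any of the given tags, or None."""
    for tag in tags:
        values = record.get(tag)
        if values:
            return values[0]
    return None


def to_repository_fields(record):
    """Map one RIS record to illustrative (assumed) repository field names."""
    return {
        "type": first(record, "TY"),
        "creators": record.get("AU", []),
        "title": first(record, "TI", "T1"),
        "publication": first(record, "JO", "JF"),
        "date": first(record, "PY", "Y1"),
    }
```

Calling parse_ris(SAMPLE) and passing each resulting record to to_repository_fields gives one dictionary per publication, which a real import would then have to translate into whatever structure the repository software expects.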
Having established that this process was feasible technically, it was then necessary to convince the faculties to allow us to import the details from their databases. One faculty already had a substantial publications database developed over several years, but most faculties were in the position of starting from scratch. This proved to be an advantage, as the Faculty Support Teams within the Library were approached by a number of faculties to aid with the creation of such databases. In return for carrying out this work we have been able to open a dialogue about the possibility of importing the records into our repository for published and peer-reviewed papers. We are currently in discussion with several faculties about the practicalities of setting this up, initially as a pilot. Such imports will only populate the repository with bibliographic details, and so from each faculty we are seeking authorisation to go ahead and add full text wherever this is permitted without the requirement to contact all authors individually. Gaining such a commitment from even one faculty would have a significant impact on the amount of content in the repository.
It is important to point out that publications databases maintained by departments and faculties are not viewed as a competitor to the repository we are developing. In many cases they are used to hold data in addition to bibliographic details, e.g. relating to research grants, and this is not information which would be appropriate for the public domain. In addition, software such as Reference Manager is not OAI-compliant, and so is not suitable for being searched by harvesters such as OAIster. Instead we hope that the two can be complementary, and that our repository will form the publicly accessible face of the information in the databases.
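As a rough indication of what OAI compliance means in practice, a harvester asks a repository for its records by sending an OAI-PMH request over HTTP. The Python sketch below builds such a request URL. The base URL in the usage note is a placeholder, not the address of any Glasgow service, and the function name is our own; the verb, metadataPrefix and set parameters are those defined by the OAI-PMH protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for the given repository base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return base_url + "?" + urlencode(params)
```

For example, list_records_url("https://repository.example.org/oai") asks for all records in unqualified Dublin Core. A desktop bibliographic package has no equivalent interface for a harvester to call, which is why the repository rather than the local database acts as the public face.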
Ultimately we are aiming to develop a workflow which would enable us to add content systematically on a University-wide basis. This would operate on the basis that each faculty or department would create and maintain a locally held publications database using Reference Manager or a similar package. We anticipate that this would operate in the following way:
- Academics periodically provide updated publications details for departmental/faculty publications database;
- Departmental/faculty publications database administrators provide periodic updates for the repository by sending records for import into ePrints;
- Bibliographic details are imported into ePrints;
- Full text of articles is added where publishers permit - where possible the pdf version will be used. If this is not possible or we do not have a subscription to the title in question, staff will be contacted directly to ask if they can provide an electronic copy.
Although the proposed workflow allows a significant percentage of the process to be automated, staff input will still be necessary. Most significantly, time will need to be spent checking whether publisher copyright agreements allow articles to be added or not. In addition, metadata will have to be checked and subject headings will have to be added. The resource implications of such a model should become clearer over the next few months of the project as we begin to import details from faculty publications databases on a trial basis. This will help us establish whether such a model will be sustainable in the long term. However, a key element of the model is that academics will only be required to submit details of their publications once, thus no additional work is required on their part.
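The full-text step of this workflow can be sketched as a simple decision for each imported record. The Python below is an illustration only: the Article fields, the policy table and the wording of each outcome are assumptions made for the example, and in practice the policy check is the manual, case-by-case activity described earlier rather than a table lookup.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    journal: str
    pdf_available: bool  # True if the publisher PDF can be obtained, e.g. via a subscription


# Hypothetical policy table: journal title -> whether deposit in a repository is permitted.
POLICIES = {
    "Example Journal A": True,
    "Example Journal B": False,
}


def next_step(article, policies):
    """Decide what to do with the full text of one imported bibliographic record."""
    permitted = policies.get(article.journal)
    if permitted is None:
        # No policy on file: staff must look it up or write to the publisher.
        return "check publisher policy or contact publisher"
    if not permitted:
        return "keep bibliographic record only"
    if article.pdf_available:
        return "attach publisher PDF"
    return "ask author for an electronic copy"
```

The first branch is where most staff time is likely to go, since every journal missing from the table needs its policy established before any full text can be added.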
Filling a repository for published and peer-reviewed papers is a slow process, and it is clear that it is a task that requires a significant amount of staff input from those charged with developing the repository. Although we have succeeded in adding a reasonable amount of content to the repository, we have also been offered significant amounts of content that cannot be added because of restrictive publisher copyright agreements. In some cases academics have offered between ten and twenty articles and we have not been able to add any of them to the repository. This is a clear demonstration that major changes need to take place at a high level in order for repositories to be successful. Although some academics have taken the decision to try to avoid publishing in the journals of publishers with restrictive policies, this is still relatively rare. We can inform staff about the issues, but we cannot and should not dictate in which journals they publish. Change is only likely to happen if staff are required, either by the funding councils or by their institution, to make their publications available either by publishing in open access journals or in journals that permit deposit in a repository. Academics also need to be assured that their chances of scoring highly in the Research Assessment Exercise will not be adversely affected by publishing in open access journals. It is clear that while academics can see the benefits of institutional repositories, there has not yet been a sufficient cultural shift to persuade them to take action. It will be very interesting to see whether the policy adopted by the Queensland University of Technology requiring academics to deposit their research outputs in the University's Eprint repository is more widely adopted.
On a more positive note, there have been a number of recent encouraging developments, in particular the statement from the Wellcome Trust supporting open access, and the ongoing Parliamentary Inquiry into Scientific Publications. Changes in the scholarly communications process at this level will make a huge difference to the success of the institutional repository movement. At the same time open access issues are starting to become mainstream news for academics, and greater awareness of the issues can only help the development of repositories. It will be critical for repositories to start to prove themselves in the foreseeable future, and it is to be hoped that such developments will go a long way towards helping them to do this.
- DAEDALUS Project http://www.lib.gla.ac.uk/daedalus/index.html
- JISC FAIR Programme http://www.jisc.ac.uk/index.cfm?name=programme_fair
- DAEDALUS repository for published and peer reviewed material http://eprints.gla.ac.uk
- DAEDALUS repository for pre-prints, grey literature, theses etc. https://dspace.gla.ac.uk
- SHERPA/RoMEO list of publisher copyright policies http://www.sherpa.ac.uk/romeo.php
- Nature Author News
- Nature Author License FAQ
- BioMed Central http://www.biomedcentral.com
- Directory of Open Access Journals http://www.doaj.org
- OAIster http://www.oaister.org
- Queensland University of Technology: E-print repository for research output at QUT (Policy document) http://www.qut.edu.au/admin/mopp/F/F_01_03.html
- Wellcome Trust Position Statement in support of open access publishing http://www.wellcome.ac.uk/doc_WTD002766.html