Preserving Electronic Scholarly Journals: Portico
The work of academics - in teaching and research - is not possible without reliable access to the accumulated scholarship of the past. As scholars have become more dependent upon the convenience and enhanced accessibility of electronic scholarly resources, concern about the long-term preservation and future accessibility of the electronic portion of the scholarly record has grown. One recent survey found that 83% of academic staff surveyed believe it is 'very important' to preserve electronic scholarly resources for future use. As usage of electronic scholarly resources continues to grow, the urgency of this concern is likely to rise, and it is a concern that is well founded. Recent studies have found that 13% of Internet sources cited in three prestigious journals were not retrievable from the original hyperlink only 27 months after publication. The fragility of electronic resources has significant implications for scholars who are troubled by the possibility of gaps in the scholarly record and the impact that these may have upon their ability to generate new scholarship which builds upon the work of today's researchers. The concerns for libraries may be even more striking. In 2003-04, libraries surveyed by the Association of Research Libraries (ARL) expended total institutional resources of US$301,699,645 to license electronic materials. On average, 31% of total library material expenditures are devoted to electronic resources. If one extends this expenditure trend beyond the membership of the ARL, the total investment which libraries across the higher education community are making in licensing access to electronic resources is truly noteworthy and suggests that efforts to protect and preserve these resources would be a wise investment.
And yet even as scholarly use of electronic resources has grown and library expenditures have been increasingly directed toward electronic resources, reliable long-term preservation arrangements for this critical part of the scholarly record have not yet fully emerged. Because many librarians are uncertain of the source of ongoing preservation and access for the digital materials for which they are expending considerable institutional resources, they are, at least in the case of electronic scholarly journals, frequently continuing to receive, process and store the print format journals even as they license access to the electronic format. Because libraries do not yet feel secure in relying exclusively upon the electronic format, they - and scholarly publishers - are not yet able to decrease the expenditures associated with the receipt and storage of print journals. In short, until a reliable archiving arrangement is in place, neither libraries nor publishers are fully able to make the transition to secure reliance upon the electronic format which is so clearly preferred by scholars, researchers and students. This need for a robust archiving solution is perhaps best expressed in the endorsement in late 2005 by the ARL, the Association of College and Research Libraries (ACRL) and others of the 'Urgent Action Needed to Preserve Scholarly Electronic Journals' statement, which documents the urgency of the preservation need and recognises that now is the time for the library community to act in support of initiatives that will ensure enduring access to scholarly e-journals.
Characteristics of a Digital Preservation Archive
Even as the urgency of the preservation need has become clear, key questions have remained about who is responsible for carrying out long-term preservation of digital resources, how this activity will be financed over the long term, and how the reliability of any given archival approach can be gauged. Some of the earliest answers to these fundamental questions were offered in the May 2002 report, 'Trusted Digital Repositories: Attributes and Responsibilities', produced by the Research Libraries Group (RLG) and OCLC. Work currently under way at RLG and the Center for Research Libraries (CRL) is advancing these ideas further, as will the forthcoming survey commissioned by ARL.
From even this very early work, a number of key characteristics are emerging as essential to a reliable archiving effort. First, organisational mission is critical. Long-term preservation must be at the core of the mission of the organisation undertaking the archival activities. This focus will help to ensure that preservation will be at the fore as resource allocation decisions are made and organisational priorities identified. Second, the archival organisation must have an economic model able to sustain the work of long-term preservation. In order to reduce risk, it is important for this financial support to come from multiple sources so that the archive is not overly reliant upon - and vulnerable to - any single source of funding. Third, the archive must have a technological infrastructure - including hardware, software and appropriately skilled staff - to meet the challenge of preserving complex and varied electronic resources. There must also be a commitment and ability to maintain, enhance and update this infrastructure as technologies and formats change over time. Finally, the archive must have relationships both with libraries, which serve the interests of current and future students and researchers, and with content creators, such as scholarly publishers, who are responsible for creating and distributing the electronic resources. These factors taken together form a useful foundation upon which a robust archival effort can be established, and as the community gains more experience with digital preservation, no doubt further requirements will also be identified.
From this early understanding of what is required to ensure the long-term preservation of electronic scholarly resources, possible models for meeting this need are beginning to emerge. National libraries such as the Koninklijke Bibliotheek in the Netherlands are beginning to ingest and hold electronic journals, and the Library of Congress, through the National Digital Information Infrastructure Preservation Program, is building a network of archival partners that is working to ensure the long-term preservation of a variety of digital materials. Third parties such as OCLC, and more recently Portico, are also emerging. Long-term preservation is still such a new endeavour that we will no doubt continue to learn more about how to construct reliable archiving solutions; however, an overview of one entity, Portico, may provide a useful illustration of how the key characteristics of an archival entity can be given practical shape.
Portico: An Overview
Anticipating the need for robust preservation of electronic journals, in 2002 JSTOR launched a project which has now become Portico. Portico is a new, not-for-profit electronic archiving service established in order to address the scholarly community's critical and urgent need for a robust, reliable means to preserve electronic scholarly journals. Portico builds upon and advances JSTOR's efforts to provide a trusted and reliable community-based archive, and Portico works with JSTOR to expand significantly the preservation infrastructure developed on behalf of the scholarly community. JSTOR has provided initial support for Portico's development together with Ithaka, The Andrew W. Mellon Foundation and the Library of Congress. Portico's mission is to preserve scholarly literature published in electronic form and to ensure that these materials remain accessible to future scholars, researchers, and students.
Portico began as the JSTOR Electronic-Archiving Initiative launched by JSTOR with a grant from The Andrew W. Mellon Foundation and was intended to build upon the Foundation's seminal e-journal archiving programme. The charge of the initiative was to build an infrastructure and economic model able to sustain an electronic journal archive. Initially the focus of the project was designing and prototyping content handling and archival systems, crafting potential archive service models, testing possible models with libraries and publishers, and drafting a business model able to support a long-term archival effort. For more than two years, project staff worked on the development of technologies necessary to meet the project objectives and engaged in extensive discussions with publishers and libraries to craft an approach that balances the needs of both communities, while researching what would be necessary to build a sustainable business model for the archive. During 2004 the project was transferred to Ithaka, and efforts to hone the Portico archival service continued. These efforts involved wide-ranging discussions with a large and informal network of librarians from more than fifty academic institutions of all types and sizes, and the engagement of ten publishers, ranging from small scholarly societies to a university press and large commercial publishers, who agreed to participate in the discovery phase of the project.
Building from the findings that emerged from our iterative and collaborative discussions with the community, a new service, now known as Portico, was shaped and launched in 2005. The Portico electronic archiving service is initially focused on the long-term preservation of electronic scholarly journals. The Portico archive, which is a centralised repository, is open to a scholarly publisher's complete list of journals, including those titles which may be published in electronic format only, or print and electronic formats, or which may have been 'reborn' or digitised from print. Portico is focused on preserving the intellectual content of the electronic scholarly journal; we do not attempt to recreate or preserve for the long term the exact look and feel of the journal or the publisher's Web site or delivery platform.
Portico's archival approach for electronic journals is managed preservation focused on the publishers' e-journal source files. Source files are the electronic files containing graphics, text, or other material that comprise an electronic journal article, issue, or volume. Source files may differ from files presented online, most typically by including additional information such as richer mark-up or higher-quality graphics. Portico receives source files directly from the scholarly publishers who have agreed to contribute content to the Portico archiving service. Portico subjects the publishers' source files to a systematic normalisation process that migrates the content from the publishers' proprietary data structure to an archival format based upon the NLM Archive and Interchange DTD. Both the source file and the normalised files are retained in the archive, and Portico takes responsibility for the long-term preservation and management of the archived materials.
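The retention rule described above (keep the publisher's source file and the normalised file side by side) can be illustrated with a minimal sketch. This is not Portico's implementation: the names ArchivedFile, ArchiveUnit, normalise and ingest are hypothetical, and the normalise function is only a placeholder that tags the content, standing in for a real migration to an archival format based upon the NLM Archive and Interchange DTD.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass(frozen=True)
class ArchivedFile:
    """One file held in the archive (hypothetical structure)."""
    name: str
    content: bytes
    role: str  # "source" or "normalised"

    @property
    def checksum(self) -> str:
        # A fixity value lets later audits detect silent corruption.
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class ArchiveUnit:
    """All files archived for one article (hypothetical structure)."""
    article_id: str
    files: list = field(default_factory=list)


def normalise(source: ArchivedFile) -> ArchivedFile:
    # Placeholder only: a real normaliser would map the publisher's
    # proprietary mark-up onto the archival format.
    marker = b"<!-- normalised placeholder -->"
    return ArchivedFile(source.name + ".normalised", marker + source.content, "normalised")


def ingest(article_id: str, source_files: dict) -> ArchiveUnit:
    """Archive every source file together with its normalised counterpart."""
    unit = ArchiveUnit(article_id)
    for name, content in source_files.items():
        src = ArchivedFile(name, content, "source")
        unit.files.append(src)             # the original is never discarded
        unit.files.append(normalise(src))  # the normalised copy is kept beside it
    return unit


unit = ingest("article-001", {"article.xml": b"<publisher-article/>"})
for f in unit.files:
    print(f.role, f.name, f.checksum[:12])
```

Keeping the untouched source alongside the normalised copy means a later, better normalisation can always be re-run from the original.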
Accessing the Archive
Portico's normalisation efforts are focused on ensuring that content remains available and accessible into the future. Portico recognises that while access to e-journal literature today may not be a concern, librarians and their constituents do need to have assurance of future access, a theme echoed in the Urgent Action statement noted earlier. To address this need, all libraries supporting the Portico archive have campus-wide access to archived content when specific trigger events occur and titles are no longer available from the publisher or another source. A trigger event occurs when a publisher ceases operations, ceases to publish a title, no longer offers back issues, or suffers a catastrophic and sustained failure of its delivery platform.
In addition to these trigger events, both publishers and libraries have recognised that in some cases, even after a library has terminated a licence to an electronic resource, it may be necessary for that library to continue to have ongoing access. This is commonly known as 'perpetual access' or post-cancellation access. A publisher may choose to extend perpetual access to a library and that access can be provided through the Portico archive, if the publisher desires. In addition, select librarians at participating libraries are granted password-controlled access to the archive for verification purposes. This verification access, which is granted to the entire archive, is not intended to be used as a replacement for commercial document delivery services or to fulfil inter-library loan requests. Finally, all publishers participating in the archive have full access to their own content and any content for which a trigger event prevails.
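The campus-wide access conditions described above can be restated as a small decision function. This is a sketch of the stated rules only, not Portico's actual access-control logic; the type and field names (TriggerEvent, TitleStatus, Library, campus_access_allowed) are hypothetical, and verification access for select librarians and publishers' access to their own content are left out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TriggerEvent(Enum):
    PUBLISHER_CEASED_OPERATIONS = auto()
    TITLE_CEASED_PUBLICATION = auto()
    BACK_ISSUES_NO_LONGER_OFFERED = auto()
    DELIVERY_PLATFORM_FAILURE = auto()  # catastrophic and sustained


@dataclass
class TitleStatus:
    trigger_events: frozenset = frozenset()
    available_elsewhere: bool = True            # from the publisher or another source
    perpetual_access_via_archive: bool = False  # the publisher's choice


@dataclass
class Library:
    supports_archive: bool
    held_cancelled_licence: bool = False  # previously licensed the title, since terminated


def campus_access_allowed(library: Library, title: TitleStatus) -> bool:
    """Return True if the library's campus may read the archived title."""
    if not library.supports_archive:
        return False
    # Trigger-event access: an event has occurred AND the title is not
    # available from the publisher or another source.
    triggered = bool(title.trigger_events) and not title.available_elsewhere
    # Post-cancellation access: only if the publisher has chosen the archive
    # as its perpetual access mechanism.
    post_cancellation = title.perpetual_access_via_archive and library.held_cancelled_licence
    return triggered or post_cancellation


supporter = Library(supports_archive=True)
orphaned = TitleStatus(
    trigger_events=frozenset({TriggerEvent.PUBLISHER_CEASED_OPERATIONS}),
    available_elsewhere=False,
)
print(campus_access_allowed(supporter, orphaned))
```

Modelling the trigger condition and the perpetual access condition as separate terms keeps the publisher's optional choice distinct from the events that open the archive to every supporting library.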
The Portico archive relies upon the co-operative participation of both publishers and libraries. To participate in Portico, a publisher:
- signs a non-exclusive archiving licence that gives Portico the right to ingest, normalise, archive, and migrate the publisher's content
- indicates whether Portico will serve as a perpetual access mechanism
- supplies electronic journal source files in a timely way, and
- makes an annual financial contribution.
To participate in the Portico archive, a library:
- signs an archiving licence agreement
- makes an annual support payment, and
- provides IP or other relevant information for user authentication purposes.
Sustaining the Archive
Financial support is critical to a long-term preservation effort of any kind. Two kinds of support are needed: funds for initial development of technological infrastructure and early operations, and funds that can support the operation of the archive over time. Portico has secured substantial grant support from JSTOR, Ithaka, the Library of Congress, and The Andrew W. Mellon Foundation to cover the costs of initial development. Unlike in the for-profit sector, the 'venture capital' funds made available for Portico's start-up phase do not need to be repaid. The 'return' that these investors seek is a functioning archival service.
While Portico will not attempt to recover its initial funding, the organisation does need to cover its ongoing operating costs from diversified funding sources; it cannot rely upon any single revenue stream. The chief beneficiaries of the archive - publishers and academic institutions - are asked to provide the primary sources of funding; however, charitable foundations and government agencies will also be expected to provide support. As noted above, publishers make an annual contribution to the archive and share with other supporting publishers the ongoing costs to receive, normalise, store, and migrate journal content. Fees are based on publishers' total journals revenues (i.e. subscription, advertising, licensing) and range from US$250 to US$75,000 per year.
Libraries are also asked to make an annual payment to support the ongoing work of the archive, including the addition of new content to the archive, maintenance and enhancement of the technological infrastructure, and format migrations as technology evolves. Library Annual Archive Support payments are tiered and vary according to the amount which a library expends on building and maintaining its collections. A library self-reports to Portico its total Library Materials Expenditure (LME), and the corresponding Annual Archive Support payments range from US$1,500 to US$24,000 per annum. To encourage broad participation in Portico from the outset, from institutions of all types and sizes, Portico designates early participants (institutions which begin Portico support in 2006 and 2007) as 'Portico Archive Founders', and recognises their early support of this important initiative by providing significant savings on Founders' annual support fees. We believe that robust and international support of this new electronic archiving service - very early on - will send an important signal to all constituents in the scholarly community that the long-term preservation of born-digital content is an important priority and is being dealt with seriously by those most affected. (Details of the publisher and library contributions are available from the Portico Web site.)
Portico Today and Looking Ahead
As of mid-April 2006, nine publishers have committed more than 3,200 journals to the Portico archive. Participating publishers include Elsevier, John Wiley & Sons, Oxford University Press, American Mathematical Society, American Anthropological Association, University of Chicago Press, UK Serials Group, Berkeley Electronic Press, and Symposium Journals (UK). We are in discussions with a large number of publishers - commercial houses, university presses, and scholarly societies - and are encouraged by how these are progressing. Response from libraries has also been encouraging, with nearly three dozen institutions having committed to supporting the Portico archive since library participation fees were announced at the American Library Association Midwinter meeting. We are currently reaching out to libraries both in the US and internationally and are encouraged by the responses which we are receiving. We expect to build the base of participating publishers and libraries over time, and as participation in Portico grows, we will keep the community informed via updates to the Portico Web site.
We recognise that participating in a long-term preservation effort is a new collaborative activity for both publishers and libraries. As with any new endeavour in the earliest days, questions may be more abundant than answers; nonetheless, it is important to find a way to begin archiving. We are pleased that Portico has been able to play a small role in moving forward on this complex issue, but we are mindful that we as a community will continue to learn more about the challenges of digital preservation and to build upon these lessons. As we do so, we will rely upon ongoing dialogue with the community, which was so integral to Portico's creation, and upon the guidance offered by the Portico Advisory Committee, which is composed of leaders from across the spectrum of organisations involved in scholarly communication. Our participation in the Library of Congress' National Digital Information Infrastructure Preservation Program has already begun to yield helpful findings, and we look forward to broadening our engagement even as we work to meet our primary responsibility: the long-term care and keeping of the scholarly literature entrusted to us.
- "Electronic Research Resources" survey of 7,403 faculty conducted in 2003 by Odyssey, a market research firm, on behalf of Ithaka (unpublished).
- Dellavalle, R. P., Hester, E. J., Heilig, L. F., Drake, A. L., Kuntzman, J. W., Graber, M., Schilling, L., "Information Science: Going, Going, Gone: Lost Internet References", Science 302, no. 5646, October 31, 2003, p. 787-8. Analyzed journals included the New England Journal of Medicine, the Journal of the American Medical Association, and Science. A separate study found that up to 33% of footnote citations did not yield the quoted source (see Carlson, S., "Scholars Note 'Decay' of Citations to Online References", The Chronicle of Higher Education 51, no. 28, March 14, 2005, p. A30.)
- The average 2003-04 library expenditure for electronic materials was $2,718,015. See Young, M. and Kyrillidou, M., "ARL Statistics 2003-04", Association of Research Libraries, 2005.
- In 2003, 78% of surveyed faculty characterized electronic scholarly journals as "invaluable research tools." See "Electronic Research Resources" survey.
- See ARL Endorses Call for Action to Preserve E-Journals
http://www.arl.org/arl/pr/presvejrnloct05.html Accessed March 13, 2006.
- See RLG-OCLC report "Trusted Digital Repositories: Attributes and Responsibilities"
http://www.rlg.org/legacy/longterm/repositories.pdf For a useful overview of the current effort to develop repository certification processes, see Dale, R., "Making Certification Real: Developing Methodology for Evaluating Repository Trustworthiness" RLG DigiNews, October 15, 2005
http://www.rlg.org/en/page.php?Page_ID=20793&Printable=1&Article_ID=1780. Accessed April 5, 2006. The report forthcoming from ARL in mid-August 2006 is described in the ARL Bimonthly Report 245, April 2006
http://www.arl.org/newsltr/245/preserv.html. All accessed April 5, 2006.
- For additional information on each of these efforts see:
http://www.kb.nl/index-en.html, http://www.digitalpreservation.gov/, http://www.oclc.org/
and http://www.portico.org/ Accessed April 5, 2006.
- Portico Web site http://www.portico.org/
Note that an extended overview of Portico will appear in a forthcoming issue of Serials Review. Details regarding JSTOR are available at: http://www.jstor.org.
- More information on this programme is available from
http://www.diglib.org/preserve/ejp.htm Accessed April 3, 2006.
- For information on Ithaka see: http://www.ithaka.org Accessed March 13, 2006.
- Publishers participating in the Portico pilot included: the American Economic Association, American Mathematical Society, American Political Science Association, Association for Computing Machinery, Blackwell, Ecological Society of America, National Academy of Sciences (PNAS), Royal Society, University of Chicago Press, and John Wiley & Sons.
- Details of the Journal Archiving and Interchange Document Type Definition created by the National Center for Biotechnology Information of the National Library of Medicine are available at
http://dtd.nlm.nih.gov/. Accessed March 13, 2006.
- Additional details regarding Portico's archival approach are available from Owens, E., "A Format-Registry-Based Automated Workflow for the Ingest and Preservation of Electronic Journals," November 8, 2005, Digital Library Federation Fall Forum, Charlottesville, VA.
http://www.portico.org/about/community.html. Accessed March 13, 2006.
- See Urgent Action Needed to Preserve Scholarly Electronic Journals
http://www.arl.org/osc/EjournalPreservation_Final.pdf Accessed March 12, 2006.
- The members of the Portico Advisory Committee are: John Ewing, American Mathematical Society; Kevin Guthrie, Ithaka; Daniel Greenstein, University of California; Anne R. Kenney, Cornell University; Clifford Lynch, Coalition for Networked Information; Carol Mandel, New York University; David M. Pilachowski, Williams College; Rebecca Simon, University of California Press; Michael Spinella, JSTOR; Suzanne Thorin, Syracuse University; Mary Waltham, http://www.MaryWaltham.com; Craig Van Dyck, John Wiley & Sons, Inc.