Increasing scholarly use of computers and electronic resources raises a number of related challenges.
Computer-based research produces digital data with significant secondary use value. Yet that value cannot fully be realised unless the data are created and described according to relevant standards, systematically collected, preserved, and reported to the widest possible community.
The outpouring of digital resources which make up a growing share of our cultural heritage makes digital preservation an urgent cause. Commercial publishers and information services, the entertainment industry, and more traditional repositories - museums, archives, and libraries - are regularly "publishing" in electronic form. The worldwide web hosts other forms of cultural expression.
Scholars need to search for relevant information across the numerous on-line indices and catalogues which provide references to the resources they require.
In 1995 the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils committed £1,500,000 over three years to establish The Arts and Humanities Data Service (AHDS). The AHDS will respond to and address these several problems as they confront a well-defined range of academic disciplines. It will collect, describe, and preserve the electronic resources which result from scholarly research in the humanities, and make its collections readily available to scholars through an on-line catalogue designed to interoperate with other electronic finding aids.
Because these aims can only be achieved by adopting community-wide standards, and because our users demand and deserve seamless access to scholarly resources irrespective of where or by whom they are managed and of the form that they take (e.g. paper-based, digital, or artefactual), the AHDS will also seek the widest possible collaboration to develop a generalised and extensible framework for digital resource creation, description, preservation, and location. That framework will act as an essential guide to the Service's own work, and be documented in a Service Providers' Handbook and Standards Reference Guidelines. The AHDS will also produce less technical Guides to Good Practice which will help raise awareness amongst the scholarly community about the importance and value of electronic information and provide guidance in its creation, description, and use.
Organisationally the AHDS is a distributed service comprising an Executive and a number of Service Providers as follows:
The Executive, King's College London, Library, Strand, London WC2R 2LS
Daniel Greenstein, Director
phone/fax: 0171 873-2445
The Archaeology Data Service, Department of Archaeology, University of York, The King's Manor, York YO1 2EP
Julian Richards, Director
phone: 01904 433901
The Performing Arts Data Service, Glasgow University, Glasgow G12 8QQ
Stephen Arnold and Tony Pearson, Directors
email: email@example.com, firstname.lastname@example.org
phone: 0141 339-8855
The Visual Arts Service (to be selected, 1996)
The Service Providers will:
The Executive will co-ordinate and support the work of the Service Providers, and take the lead in developing the AHDS's collection policy and in producing the Service Providers Handbook and Standards Reference Guidelines, the AHDS catalogue, and its Guides to Good Practice.
The division of labour amongst the Service Providers (by both discipline and data format) provides the AHDS with a significant strategic advantage by enabling the interdisciplinary and intermedia discussions which will identify the standards and good practices the AHDS needs to implement. Long-term data storage provides an example of an archival issue that the AHDS will have to address and about which it will want to make recommendations to the wider community. From the data archivists' point of view, it may be desirable to treat electronic images, texts, and databases rather differently, notably because the case for compression is more compelling for images than it is for texts and databases. Clearly, every arts and humanities discipline will require access to information stored in each of these formats, and these needs will be reflected in the Service Providers' mixed media collections. Yet it is not efficient to encourage every Service Provider to develop the same level of expertise with every data format. Rather, members of a single Service Provider will immerse themselves in the relevant literature and consult with other experts in the UK and abroad, in order to inform discussion within the AHDS about the practices which are most relevant for a particular data format.
The AHDS's organisation will also facilitate the development of a coherent approach to data description or metadata. The information which is required to adequately describe or catalogue an electronic text may be rather different than that which is required for an electronic image or a database. Again, we feel that media specialists within the AHDS must initiate discussion about what level of description is most appropriate for the type of data with which they are most familiar. This division of labour will also ensure that the data description standards adopted by the AHDS will satisfy the needs of scholars across the humanities. Arguably, historians, archaeologists, and visual arts scholars will want to know rather different things about a collection of digitised images that is based, for example, on some of the holdings of the British Museum. Currently there is substantial work underway in the development of data description standards, but this is narrowly focused on particular kinds of collections and the needs of particular communities. For example, library catalogues interoperate because there is a substantial level of agreement between the various flavours of MARC (a standard for machine-readable bibliographic records). A framework for information interchange between communities demands a more inclusive approach to metadata than has hitherto emerged, one which the AHDS's interdisciplinary and intermedia organisation is designed to facilitate.
In order to ensure the long-term viability and interchange of electronic resources some attention must be paid to data creation standards. A common approach to data creation is required on at least four different levels:
Agreement about the use of common, or at least interoperable, standards ensures a level of consistency across accessioned digital materials without which electronic resource management and preservation are impossible. Such standards also enable the migration of electronic information from one processing (hardware and software) platform to another as technologies change.
Yet standards are not exclusively in the interest of digital archivists and data creators. They serve users' needs as well, ensuring, for example, that the electronic resources which they require can be obtained in a form that is compatible with their local hardware and software environments. It is precisely because data creation standards promise to benefit the widest possible community that they must be identified, documented, and actively promoted. Data creation standards should not be conceived as a set of prescriptive or restrictive practices. Rather, we need to develop a flexible standards framework that accommodates local practice while ensuring the consistency essential for effective information interchange. Such a framework of data creation standards will be identified by the AHDS in the widest possible consultation and documented closely in the Service Providers' Handbook and Standards Reference Guidelines. It will also feature largely in the Guides to Good Practice which will offer explanation and instruction to a wider community.
Collection is central to the AHDS's mission but it is essential to redefine the term as appropriate to this extensively networked and increasingly digital age. Certainly the AHDS Service Providers will act as repositories for digital research data and will actively encourage scholars who are conducting computer-based research to safeguard their electronic outputs by depositing them with the AHDS. Indeed with its Guides to Good Practice, the AHDS will point potential depositors to the data creation standards and practices which they will need to consider in order to secure the longevity of their materials. Yet preservation is only one of the incentives with which the AHDS will attract depositors. Documentation is another. Without information about its contents and form, an electronic resource cannot be accessed. Imagine using a traditional library which does not have a catalogue or other signposts to the books, journals, monographs, manuscripts and other objects that comprise its collection. Nor is it sufficient for data creators simply to provide that level of documentation which they think is most appropriate for their electronic datasets. Just as preservation relies upon attention to data creation standards, location requires some level of conformity in the way that electronic resources are described. By depositing their data with the AHDS, data creators will ensure that their electronic products are documented according to the data description standards which are beginning internationally to emerge. Accordingly they will enhance the possibility that potentially interested users will find the resources they require.
The promises of preservation, documentation, and location have proved attractive to several UK funding agencies which support computer-based humanities research. So far, the Economic and Social Research Council and the Humanities Research Board of the British Academy require grant-holders to offer any datasets they create to the AHDS for deposit; the Leverhulme Trust and the Wellcome Unit for the History of Medicine encourage their grant-holders to consider the same. The combined experience of the Historical Data Service and the Oxford Text Archive - two Service Providers which predate the AHDS - demonstrates that the same promises are attractive to data creators more generally, who are invited to approach the AHDS to discuss the long-term disposition of their electronic resources.
The AHDS's collection policy must also take account of the fact that scholarly resources know no national boundaries. The AHDS's users will want to identify electronic resources which result from scholarly research outside the UK. Accordingly, we need to extend our definition of "collection" to include datasets stored by other digital archives with which the AHDS can negotiate reciprocal agreements. There are good precedents for this already within the AHDS. The Oxford Text Archive has an agreement with the electronic text centre in Michigan. The Historical Data Service is part of an international network of social science data archives and benefits substantially from an integrated catalogue which allows users to search across their respective holdings. The AHDS seeks actively to multiply such agreements and to extend them into areas which are appropriate for the other arts communities that it serves, notably in archaeology and the visual and performing arts.
Not all datasets need to be deposited with the AHDS or with one of its associated data archives in order to be known to the AHDS's catalogue and its users. Increasingly, computer-based scholarly research results in datasets which are made available over the network from numerous sites. The AHDS shares the scholarly community's interest in preserving these materials and in enabling users to locate them. Where data creators and the AHDS can agree compatible data preservation and description procedures, data deposit is neither necessary nor desirable. Accordingly, our concept of "collection" needs to be extended still further to include electronic resources which are known to the AHDS's catalogue but neither stored at nor managed by any of its Service Providers or associated data archives.
Just as the AHDS collection policy cannot require central deposit, it cannot require that every item in the collection be made freely available. Scholars need to find the materials upon which their research and teaching depend, irrespective of whether those materials are in the public domain. Equally, those responsible for commercial and other resources to which access may be restricted have an interest in preserving those resources and making their existence known to the wider scholarly community. The AHDS's collection policy must take these realities into account. Accordingly, it will negotiate the acquisition of some commercial and other resources to which access may be restricted. Equally, it will ensure that its catalogue acts as a gateway to restricted resources that are maintained and managed at other sites and by other agencies.
In sum, the AHDS's collection policy will be built on our understanding that the time has passed (if it ever even existed) when any single agency could create a vast and comprehensive collection of scholarly digital resources. The challenge today is to develop "collections" which can be preserved according to the same minimum standards and which may be integrated from the user's point of view - that is, accessed globally through several information gateways.
Digital resource preservation is vital to the scholarly community. Archaeologists provide a vivid example as to why. In the process of excavation, archaeologists may "destroy" some of the primary evidence upon which their scholarly investigations are based. Excavation records accordingly take on seminal importance; they are the only window onto sites which no longer exist in their original or undisturbed form, and must be preserved. Where such records are kept on paper, well-defined archival practices and cataloguing principles ensure their availability to scholars working 50 or even 100 years hence. Where they take the form of complex databases, image banks, and digital site-maps - as they do now with increasing frequency - there are as yet no accepted (in some cases even tested) models which promise to achieve the same end.
Historians' use of computer databases provides a somewhat different view of the same problem. For a generation historians have compiled databases - electronic summaries of information culled from primary sources such as censuses, parish registers, legislative records, and newspapers. Unlike the digital record left by the archaeologist, that created by the historian does not significantly alter the artefacts on which it is based. Subsequent scholars can if they wish refer to the manuscripts and printed editions from which so many historical databases derive. Yet the machine-readable record is no less important. It forms the building blocks upon which more comprehensive historical analyses may be developed as databases are extended, reworked, and compared with other, newer electronic collections. Historical databases may also record typological information which simply cannot be found elsewhere.
A computer-aided analysis of early modern communities may develop a machine-readable catalogue of regional saints' days or standardised values for idiosyncratic currencies and other monetary measures. A socio-economic analysis of industrialising societies may develop a machine-readable classification of eighteenth-century British occupations and place names. These reference materials require substantial intellectual investment and may, irrespective of the data on which they are based, be useful to subsequent scholars. Linguistic corpora - databases in their own right - are similarly worth preserving. They, too, may act as the building blocks of increasingly comprehensive and synoptic analyses of language. They may also hold a key to accurate and instantaneous machine-translation of spoken and written texts. And of course, library, museum, and archive catalogues, indeed all indices of scholarly and other information, are themselves a kind of database without which access to information would be improbable if not impossible. The case for preserving digital databases is not therefore parochially academic. It is universal and it is compelling.
The case for computer-tractable texts is different, yet again, but no less urgent. Texts are fundamental to scholarship in the humanities and are regularly rendered into machine-readable form. For more than a generation, arts scholars have been producing electronic texts in support of linguistic, content, stylistic and other analyses which are most effectively conducted by computer. More recently, scholars have begun to deliver electronic critical editions. As the corpus of electronic text expands, so do the horizons for scholarly investigation, but only if the corpus can be maintained over time. But there are other, perhaps more compelling reasons to preserve electronic texts. At present an increasing number of late-twentieth century "texts" - one thinks of on-line digests of legal, medical, and economic information, the enormous output of government departments, particularly in the United States, and of course, the vast quantities of textual material currently present on the worldwide web - are only available in machine-readable form. Commercial and scholarly book and journal publishers are turning increasingly toward electronic editions, many of which are not or cannot be mirrored by more traditional paper-based ones. The situation with images and with digital audio and video recordings is similar to that of texts; only the technology is newer so the corpus of currently available material is not perhaps so large. Yet the high tide is approaching. The entertainment industry is actively developing digital technologies and it is only a matter of time before its combined outputs are only available in computer-tractable form. Museums, archives, and libraries are also experimenting with digitising collections in order to extend access to them (some good virtual exhibitions already exist on the "net") or to protect the rarest objects from the ravages of physical handling and use.
Without viable methods of digital resource preservation these databases, texts, images, and sounds will be lost to future generations. What is at stake is nothing less than our cultural heritage.
It is one thing to recognise the urgent case for digital preservation. It is another to address it. The problems are vast and as yet without satisfactory solutions. There are technical problems to be sure. For example, no satisfactory or reliable estimates exist regarding the longevity of particular magnetic media. Strategic issues are more intractable. There is no agreement even about what preservation entails. Is it possible to preserve electronic information independently of the processing platforms upon which it is initially mounted without any loss of significant content? Does the content of a multimedia installation, in other words, comprise simply a collection of digital texts, images, and sounds linked together by a set of explicit pointers? If so, then it is feasible to store the data independently of the software and hardware which present these features to a user in a particular way. Alternatively, if the look and feel of the multimedia application, that is, the way it presents itself to the user, is considered a crucial component of its content, then we need to think as well about preserving particular computers, operating systems, and software applications. Should we adopt this latter view then the job of the digital archivist converges closely with that traditionally belonging to the curator of a science and technology museum. Debate is equally vigorous about the definition of a "digital publication", particularly where that publication exists originally in a networked environment. Are the contents of a web page strictly limited to the text, images, and sounds which are represented on that page together with a series of pointers to other pages, or do they extend to the external information which the pointers identify? If we accept the latter view, then preserving the AHDS web page could involve a digital picture of the entire contents of the worldwide web.
The scale of the problem in this case is only compounded when we think that the worldwide web is constantly and dynamically revised, updated and amended.
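The difference between the two readings of a web page's contents can be made concrete with a small sketch. The link graph below is invented for illustration, and the function name `snapshot_scope` is hypothetical; the point is only that following pointers turns preservation of one page into preservation of everything reachable from it.

```python
from collections import deque
from typing import Dict, List, Set

# Invented link graph: page name -> pages it points to (illustrative only).
LINKS: Dict[str, List[str]] = {
    "ahds-home": ["service-providers", "external-a"],
    "service-providers": ["ahds-home", "external-b"],
    "external-a": ["external-c"],
    "external-b": [],
    "external-c": ["external-a"],
}


def snapshot_scope(start: str, links: Dict[str, List[str]],
                   follow_pointers: bool) -> Set[str]:
    """Return the set of pages that must be captured to preserve `start`.

    follow_pointers=False: the page is its own text, images, and pointers.
    follow_pointers=True:  the page also includes whatever its pointers identify,
                           applied repeatedly (a breadth-first traversal).
    """
    if not follow_pointers:
        return {start}
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


narrow = snapshot_scope("ahds-home", LINKS, follow_pointers=False)
broad = snapshot_scope("ahds-home", LINKS, follow_pointers=True)
print(len(narrow), len(broad))
```

Under the narrow reading one page is captured; under the broad reading every page reachable from it is captured, and on a network that is continually revised that set never stops changing.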
As may be expected, those communities which are traditionally responsible for preserving our cultural heritage - library, archive, and museum communities - are the ones struggling to define the problems inherent in digital resource preservation and to recommend tentative steps which may produce solutions. In particular these communities are seeking experimentation with different models of digital preservation, which can be applied to particular and well-defined subsets of electronic information, and then documented carefully to enable them to be costed and scaled to fit the needs of other preservation initiatives. The AHDS was established fully with this approach in view. Focusing on its own holdings, and in consultation with the wider community, it will develop and implement strategies for digital preservation, and document them in the Service Providers' Handbook and Standards Reference Guidelines. Here we will not merely describe our practices. We will also cost them so that they may be scaled either up or down and evaluated with respect to their prospective application to other digital collections and in digital archives organised differently than the AHDS. In this respect, the AHDS hopes to make a significant contribution to the wider discussion which must take place within the library and archive communities in order to ensure that the electronic outputs of today are available for use and evaluation tomorrow.
Data description standards are crucial and must be adopted, documented, and implemented on a community-wide basis if we are to enable scholars to search seamlessly across the numerous on-line finding aids which point to the resources they require. Accordingly, the AHDS will collaborate extensively with other agencies to identify appropriate data description standards, document these in the Service Providers' Handbook and Standards Reference Guidelines, and implement them with regard to its own collections and catalogue. The problem that we face is integrating the very different descriptions which are used to document the various resources upon which scholars depend. For example, records from a library catalogue may provide MARC-conformant information. Those from digital text archives, museums and archives may reveal information more closely conformant to the recommendations made by the TEI, the Consortium for the Computer Interchange of Museum Information (CIMI) and the Encoded Archival Description (EAD), respectively. What is required is a means of positioning the rich and distinctive descriptions that are appropriate to particular resources within a more general framework. In this regard, we are encouraged by work on the Dublin Core and the Warwick Framework, and with object-oriented data models. Together these may enable hierarchical integration of domain-specific data description standards.
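One way to picture "positioning" a domain-specific description within a more general framework is a record that carries a small shared core alongside the untouched native description. The sketch below is illustrative only and is not the AHDS's design: the core element names loosely follow Dublin Core, the source field names are heavily simplified stand-ins for MARC and TEI header fields, and `Record`, `CROSSWALKS`, and `to_record` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Record:
    """A shared core description plus the full domain-specific description."""
    scheme: str                                  # e.g. "MARC" or "TEI"
    core: Dict[str, str]                         # elements every catalogue shares
    detail: Dict[str, str] = field(default_factory=dict)  # native record, unchanged


# Hypothetical, simplified crosswalks: native field -> shared core element.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "MARC": {"245a": "title", "100a": "creator", "260c": "date"},
    "TEI": {
        "titleStmt/title": "title",
        "titleStmt/author": "creator",
        "publicationStmt/date": "date",
    },
}


def to_record(scheme: str, native: Dict[str, str]) -> Record:
    """Lift the mappable fields into the core; keep everything in `detail`."""
    crosswalk = CROSSWALKS[scheme]
    core = {
        crosswalk[name]: value
        for name, value in native.items()
        if name in crosswalk
    }
    return Record(scheme=scheme, core=core, detail=dict(native))


# Invented sample descriptions, for illustration only.
library_item = to_record("MARC", {
    "245a": "Hamlet", "100a": "Shakespeare, William", "260c": "1603",
    "300a": "1 v.",                       # no core equivalent: stays in detail
})
text_item = to_record("TEI", {
    "titleStmt/title": "Hamlet (electronic edition)",
    "titleStmt/author": "Shakespeare, William",
    "publicationStmt/date": "1995",
    "encodingDesc": "transcribed from a printed edition",
})

print(library_item.core)
print(text_item.core)
```

Both records can be searched through the same core elements, while the richer native description remains available for anyone who needs it. A fuller model would also have to handle repeated fields and elements with no clean equivalent.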
No framework may be developed for the preservation, integration, and location of scholarly electronic resources which does not benefit from the lessons of practical application. Accordingly, the AHDS catalogue will be developed as a means of testing, evaluating, and refining those recommendations which bear directly on data description, resource location and interchange. The catalogue will provide users with seamless access to the resources that are deposited at and managed by the AHDS Service Providers and to those which reside at sites with which the AHDS has data exchange agreements. To test our extension of the definition of "catalogue" we also seek participation from a select number of institutions which manage on-line catalogues of both digital and non-digital materials, notably from the university, library, archive, and museum communities. Though we will concentrate initially on resources managed within the UK, we hope to extend our efforts at least on a limited international basis.
To elucidate, take as an example an Elizabethan scholar who is interested in the Bard. That user must be able to enter a gateway (or, more probably, one of several gateways) to humanities resources and search for "Shakespeare, William B". An initial query may return some very rudimentary information about the many resources which are known to a variety of interoperating electronic catalogues, indices, and other finding aids. Accordingly, the first five records returned by such a query may be drawn from catalogues which are maintained by the Oxford Text Archive, the Archaeology Data Service, a Theatre Museum, a manuscript archive, and a university library. To permit this level of integration, metadata records describing these holdings must share at least a small range of information. Yet this range of information is not yet sufficient for the scholar to assess whether the resources identified are worth acquiring or pursuing further. A richer level of description is required for the electronic text, the digital excavation record, and the objects listed respectively in the performing arts museum, archive, and library catalogues. We may imagine, then, that the user conducts a second-order search to retrieve fuller information on the excavation record from the Archaeology Data Service and acquires the more specific detail appropriate to that resource. In addition, specific information may be required to enable the scholar to acquire, mount, and use the data locally. This more technical description may be acquired in a third-order search and may only be needed for digital data.
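The three orders of search described above can be sketched as three lookups of increasing depth. This is a minimal illustration under stated assumptions, not a description of the AHDS catalogue: the catalogue contents, record identifiers, and field values are all invented, and the function names `first_order`, `second_order`, and `third_order` are hypothetical.

```python
from typing import Dict, List, Optional

# Invented sample holdings. Each record has a shared "core", a richer
# domain-specific "detail", and a "technical" description that exists
# only for digital resources.
CATALOGUES: Dict[str, Dict[str, dict]] = {
    "Oxford Text Archive": {
        "ota-1": {
            "core": {"title": "Hamlet (electronic text)",
                     "subject": "Shakespeare, William"},
            "detail": {"markup": "structurally encoded transcription"},
            "technical": {"format": "plain text with markup"},
        },
    },
    "Archaeology Data Service": {
        "ads-1": {
            "core": {"title": "Playhouse excavation record",
                     "subject": "Shakespeare, William"},
            "detail": {"site_type": "theatre", "record_type": "excavation database"},
            "technical": {"format": "relational database export"},
        },
    },
    "A university library": {
        "lib-1": {
            "core": {"title": "Collected plays",
                     "subject": "Shakespeare, William"},
            "detail": {"extent": "3 volumes", "location": "main stacks"},
            "technical": None,   # a printed book: nothing to mount locally
        },
    },
}


def first_order(query: str) -> List[dict]:
    """Search the shared core elements of every catalogue; return brief hits."""
    needle = query.lower()
    hits = []
    for catalogue, records in CATALOGUES.items():
        for record_id, record in records.items():
            if any(needle in value.lower() for value in record["core"].values()):
                hits.append({"catalogue": catalogue, "id": record_id,
                             **record["core"]})
    return hits


def second_order(catalogue: str, record_id: str) -> dict:
    """Retrieve the richer, resource-specific description of one hit."""
    return CATALOGUES[catalogue][record_id]["detail"]


def third_order(catalogue: str, record_id: str) -> Optional[dict]:
    """Retrieve the technical description; None for non-digital resources."""
    return CATALOGUES[catalogue][record_id]["technical"]


hits = first_order("shakespeare")
for hit in hits:
    print(hit["catalogue"], "-", hit["title"])
print(second_order("Archaeology Data Service", "ads-1"))
print(third_order("A university library", "lib-1"))
```

The first lookup depends only on the small range of information every record shares; the second and third are addressed to a single catalogue, so each can keep whatever description suits its own holdings.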
The more integrated approach to resource location which we envisage is predicated on an extended definition of "collection". Not all datasets need to be deposited with the AHDS or with one of its associated data archives in order to be known to the AHDS's catalogue and its users. Increasingly, computer-literate scholars and commercial information services are making digital resources available over the network. Additionally, more traditional repositories of our cultural heritage - libraries, archives, and museums - are providing information about their holdings in on-line catalogues and other finding aids. The AHDS shares the scholarly community's interest in enabling users easily to locate these digital resources and to exploit these finding aids. Accordingly, our resource location tools need knowledge of electronic resources regardless of their physical location. Our aim is not to construct a single gateway to humanities resources or to centralise the management of them; only to construct a working prototype which may demonstrate the prospects for interoperability and interchange on a far wider scale.
We have already indicated that digital resource preservation and interchange requires agreement amongst information services with regard to how they store, describe, preserve and provide access to the electronic resources which they manage. We have also shown that information services rely upon data creators to adopt data encoding and formatting standards which will ensure their electronic outputs can be preserved over the longer term and be included in and accessible from resource location or cataloguing systems. Accordingly it is not sufficient to document a framework for greater interoperation amongst information services. We must also educate a larger community of data creators about the importance of digital resource preservation and interchange and about the practices which they should consider adopting if we are collectively to achieve these dual aims. The AHDS's contribution to what must inevitably be a much broader exercise is a series of publications collectively referred to as Guides to Good Practice. These will target scholars contemplating data creation or secondary analysis and highlight issues and methods which they need to consider. They will also identify potential pitfalls and provide comprehensive references for further reading about particular subjects. Perhaps most importantly, they will be written by subject specialists (e.g. literary scholars) for like-minded subject specialists (e.g. other literary scholars) and thus employ vocabulary and illustrative examples which are more approachable than so much of the methodological literature that is available today. Some of the pamphlets will be written by the AHDS Service Providers and provide general guidance, for example, in the construction of historical databases, linguistic corpora, and archaeological site-mapping materials. Others will be more narrowly focused on particular methodological issues (e.g. nominal record linkage, encoding critical apparatus in an electronic text, the production of "archive-quality" images) and will be commissioned from scholars actively working in related fields.
A framework for data creation, description, preservation, and interchange cannot be developed by the AHDS working in isolation. Success requires substantial collaboration on at least two fronts. We must solicit input from both scholars who have an interest in using our collections and those who will add to the collections through deposit. Members of these two most crucial communities will be invited to inform us of their requirements so that we may ensure that they are met by the resources we choose to collect, by the framework that we document in our Service Providers' Handbook and Standards Reference Guidelines, by the operation of our catalogue, and by the instructional materials we provide in the Guides to Good Practice.
On another front, we must collaborate with other information services. The development of robust and viable strategies for digital resource preservation requires experimentation with different models and substantial collaboration amongst digital archivists and librarians. It also requires dialogue with organisations which document and promote the data creation standards that we all require. To give scholars more coherent and uniform access to the vast and growing number of on-line catalogues, indices, and digital resources, the organisations which construct and maintain such finding aids and collections must work together to develop compatible approaches to data description and to build interoperable systems. While input from standards initiatives is crucial, we must also prototype common solutions in collaboration with the institutions which create and maintain the on-line tools on which scholars increasingly rely.
Elsewhere the AHDS is described as a broker facilitating collaboration amongst these various communities. This function derives directly from the AHDS's very narrowly defined remit and from our recognition that our goals cannot adequately be fulfilled without extensive consultation and co-operation. We believe that by collaborating in the development of a generalisable framework for the preservation and interchange of electronic resources, all stakeholders have the opportunity to improve their own services or practices, extend and encourage access to their own collections, and elaborate their own institutional or professional identities. In the hope that our causes are one and the same, the AHDS invites the widest possible participation in its work.
Daniel Greenstein, Director
Jennifer Trant, Collections and Standards Development,
Arts and Humanities Data Service Executive
King's College London, Library
London WC2R 2LS
fax/phone: +44 (0)171 873-2445
 This typology, specifically as applied to imaging is explored in Jennifer Trant, "Framing the Picture: Standards for Imaging Systems". A paper presented at the International Conference on Hypermedia and Interactivity in Museums, San Diego, California, October, 1995.
 Ron Zweig, "Virtual Records and Real History", History and Computing, 4(1992), 174-82. For another description of the extent of the problem see Proposal for the legal Deposit of Non-Print Publications to the Department of National Heritage from the British Library. January 1996.
 Margaret Hedstrom, "Mass storage and long-term preservation", paper delivered at Reconnecting Science and Humanities in Digital Libraries. A Symposium Sponsored by The University of Kentucky and The British Library, 19-21 October 1995, Lexington, Kentucky.
 Preserving Digital Information. Draft Report of the Task Force on Archiving of Digital Information, commissioned by the Commission on Preservation and Access (CPA) and the Research Libraries Group (RLG). Version 1.0, August 1995. "Long Term Preservation of Electronic Materials: a JISC/British Library workshop as part of the Electronic Libraries Programme. Organised by UKOLN 27th-28th November 1995 at the University of Warwick", British Library R & D Report 6238 (London, British Library, 1996); U.S. National Archives and Records Administration. (1994). Digital Imaging and Optical Digital Disk Storage Systems: Long-Term Access Strategies for Federal Agencies. Technical Information Paper No. 12. National Technical Information Service, Washington, D.C. (ftp://ftp.nara.gov/pub/technical_information_papers and gopher://gopher.nara.gov:70/11/managers/archival/papers/ postscri).
 International Documentation Committee of the International Council of Museums (CIDOC) offers a somewhat different approach than CIMI to the description of museum information.
 OCLC/NCSA Metadata Workshop Report, by Stuart Weibel, Jean Godby, Eric Miller, and Ron Daniel; "A Syntax for Dublin Core Metadata: Recommendations from the Second Metadata Workshop", by Lou Burnard, Eric Miller, Liam Quin, and C.M. Sperberg-McQueen; "Issues of Document Description in HTML" , Eric J. Miller; "On Information Factoring in Dublin Metadata Records", C. M. Sperberg-McQueen