Web Magazine for Information Professionals

The Tasks of the AHDS: Ten Years on

Alastair Dunning reviews 10 years in the history of the Arts and Humanities Data Service.

An article by Dan Greenstein and Jennifer Trant in an early issue (July 1996) of Ariadne introduced readers to the aims and organisation of the fledgling Arts and Humanities Data Service (AHDS) [1]. Exactly ten years on, as the AHDS undergoes a systematic review by its funders, it seems appropriate to take stock of how the AHDS has evolved, comparing its current position with that envisaged for it when the organisation commenced work in the 1990s.

Since Greenstein’s and Trant’s article, the AHDS has grown considerably and has become an established member of the digital library community. Staff numbers have roughly doubled; the AHDS now employs around 25 core staff and 15 other project staff. Annual funding is now just over a million pounds, double the original annual sum provided by the Joint Information Systems Committee (JISC) [2]. Significantly, the Arts and Humanities Research Council (AHRC) now shares 50% of these annual costs with the JISC [3]. The AHRC, as the crucial stakeholder in the arts and humanities community, sits well next to the JISC, which is already extensively embedded within the information science community. The AHDS also now receives significant project income: in 2005-6 this totalled close to a million pounds.

Despite this growth, the issues that Greenstein and Trant identified in their article - e.g. data creation, awareness and education, collections, preservation - remain key to the AHDS’ task; indeed the continued success of the AHDS is evident from the fact that the statement of purpose defined by Greenstein and Trant still offers a clear direction today: ‘It will collect, describe, and preserve the electronic resources which result from scholarly research in the humanities, and make its collections readily available to scholars through an on-line catalogue.’

Further evidence underlines the AHDS’ continued focus on these issues. The number of collections ingested in a year exceeded 100 for the first time in 2004-5 (166 collections deposited; the previous high was 97). Overall, the AHDS can boast over 60,000 images, 1 million archaeological records and several hundred text, database and multimedia collections. User hits continue to grow, as does the number of telephone and email enquiries [4]. A complete preservation service has been established. The AHDS has run over 100 workshops, and has, on behalf of the AHRC, marked over 1800 applications for funding from the scholarly community [5][6].

Yet running such an organisation requires constant reassessment of these issues, and this is probably true of any team within the JISC community, or indeed of any Web-based service. One could perhaps define the AHDS’ work as a variation of the Sisyphean task. Sisyphus, you may remember, strove to push his giant rock to the summit of a mountain, but each time he reached it, the rock fell back to the bottom [7]. The AHDS has not been so unfortunate as to have to return to the foot of each mountain, but there are certain similarities. The rapidly changing digital environment reconfigures and recontextualises the objectives of an online service; once one summit is reached, a whole host of new ones come into view and the organisation must prepare itself to push onwards to those new summits.

This article therefore reviews the AHDS by articulating how the AHDS has climbed the summits initially outlined by Greenstein and Trant and pointing to the new peaks that have appeared on the horizon.

Data Creation and Description

The original Ariadne article on the AHDS [1] declared that ‘the use of common, or at least interoperable standards ensures a level of consistency across accessioned digital materials without which electronic resource management and preservation is impossible.’ It noted that the AHDS would have a key role in developing this consistency of standards.

The AHDS has devoted much of its attention to advising the research community on the use of particular standards for data creation and description. The service can take satisfaction from the fact that, within the arts and humanities community, the use of open standards is, where feasible, now the default position for most creators of serious data resources (although this is not the case for smaller, more ‘personal’ resources) [8]. JPEG and uncompressed TIFF are the acknowledged formats for image creation; XML and plain text for text files; and databases are generally SQL-compliant.

The AHDS has been careful, however, to ensure that additional information is wrapped around this guidance on data formats. The focus of the AHDS as an advice service is not data capture by itself but the whole orbit of a digitisation project. Thus AHDS advice has drawn attention to themes such as project management, metadata, workflow and copyright, emphasising that creating a digital resource is more than selecting the correct technical standard: it requires the integration of a whole range of issues.

The AHDS has pursued this task through multiple pathways. Information papers have drawn attention to particular issues in digital resource creation [9]. Case studies have highlighted projects that have dealt successfully with these themes [10]. AHDS Digitisation Workshops have introduced over 900 delegates to digitisation, and provided a valuable meeting point for potential projects to convene and engage with one another. Perhaps most valuably, users have been free to call AHDS staff to discuss general or specific problems. Well over 2,000 user queries were received in the period from August 2005 to April 2006 alone.

Of course, the AHDS has not been alone in this work. Within the UK, the Technical Advisory Service for Images, UKOLN, TechDIS, the New Opportunities Fund and the Minerva guidelines have all done valuable work in promoting the standards message [11]. Having a variety of services discussing standards has also helped to promote debate on how standards are adopted, enforced or adapted, as in the recent paper on pragmatism and open standards presented at the WWW 2006 Conference in Edinburgh [12].

So, has the AHDS climbed the mountain named data creation? Well, yes: the large number of projects employing common standards, and the general discourse of open standards now associated with digital resource creation, mean that the AHDS has reached the summit it originally set out to climb. But this new viewpoint has confirmed that establishing data standards for digitisation should not be the final point on this journey.

Implementing standards and best practice when creating a resource for research or teaching is not simply about allowing a resource to function in a technical sense. How a source is digitised directly impinges on how the resource will be analysed and exploited by the academic community. Technical choices made early on can radically affect the impact of the resource on scholars in the future.

The AHDS acknowledged this upon its inception. Its series of Guides to Good Practice provides subject-specific information on the development of digital resources, introducing subject specialists to the overlap between intellectual methodologies and standards in digitisation [13]. So, for example, historians can discover how best to exploit spatio-temporal data [14], and archaeologists have a trusted source for using Computer-Aided Design (CAD) [15].

But after ten years it has become apparent that this issue needs reinvigorating when considering resource creation, the standards and methodologies used, and a resource’s place in the research agenda. All manner of questions arise: is this methodology for digitisation the most fruitful for the exploitation of a resource? Should rich or light mark-up be employed here? Should these artefacts be captured at high or low resolution? Will this resource work better as a 2D or a 3D image? How will a resource created with this methodology add to the intellectual debate? Is digitisation even worthwhile? Answering such questions demands both technical and intellectual input; issues about the data standards and practices used in digital resource development cannot be separated from the academic reasons for digitising.

To provide the necessary environment for such discussion, the AHDS needs to continue working with other important stakeholders; indeed, in some areas it has already started to do so. For instance the major funding programme for digital resource development, the AHRC’s Resource Enhancement Scheme, has now begun a process of seeking community engagement so that digitisation can occur in a more strategic fashion [16]. The AHDS and the ARIA (Arts and Humanities Research ICT Awareness and Training) Project [17] at De Montfort University have also combined to produce the ICT Guides [18] that inform researchers about methods, research projects and training in resource creation. Such events and partnerships will provide the basis for more coherent discussion on the technical and intellectual contexts for digitisation.

Preservation

“The problems are vast and as yet without satisfactory solutions”: the original Ariadne portrait of the AHDS cited a rather gloomy 1995 preservation study. More than a decade later, the problems are still vast. A growing number of expert bodies are pinpointing an ever larger array of challenges in sustaining datasets, Web sites, emails, etc. across changes in technological platforms and standards [19].

Nevertheless, effective methods for tackling the issues of digital preservation are now much more apparent. ‘Solutions’ is perhaps too definitive a word; it is more apt to speak of finding systems, processes and methods for undertaking preservation work.

The task of preservation, of ensuring that digital files can function in the future whatever technical platform they run on, remains an essential element of the AHDS’ raison d’être. Greenstein and Trant’s article highlighted the range of electronic resources being developed as part of the research process: text, image, database etc. Such resources would increasingly play a fundamental role within scholarship; failing to preserve these texts, corpora, datasets and multimedia collections would undermine the framework of evidence and record upon which researchers develop their scholarly arguments.

After ten years, the AHDS has reached a point where it can look back upon its preservation achievements with a certain amount of pride. Having established its own digital repository and an accompanying systematic workflow for the ingest of digital collections, the AHDS has an infrastructure as well as policies and procedures in place to ensure the long-term availability of the collections deposited with it. The establishment of the OAIS (Open Archival Information System) reference model offered the AHDS a framework upon which to base its repository. Layered upon this the AHDS has a range of Preservation Handbooks, Ingest Manuals and metadata tools to assist AHDS staff [20].

AHDS expertise in the field is well acknowledged; AHDS preservation projects include Sherpa Digital Preservation (or Sherpa DP), the E-Prints Feasibility Study (both of which investigated preservation issues for e-prints) and the Digital Images Archiving Study [21]. The Medical Research Council recently consulted with the AHDS over the creation of its own digital archive.

The presence of AHDS subject-specialist staff has been important here. The sheer range of digital material developed in the arts and humanities demands not only wide-ranging technical knowledge but also an understanding of how the resource will be used by the relevant research communities. This in turn affects how the resource is described at a technical level (i.e. in terms of preservation and collection-level metadata), and prepared for ingest within the AHDS repository. Without this blend of skills it would be difficult for any organisation to develop a working preservation system.

Inevitably, however, other challenges present themselves as the AHDS grapples with the steep slope of preservation. Sustaining the functionality of a digital resource, beyond its raw data, is the most pressing of these. Current preservation strategies work well for smaller or straightforward datasets (single texts, still image collections, small to medium databases or multimedia collections). However, when a resource demands the kind of user interaction (whether to analyse, sort or subset) that is provided by hosting it online, preservation becomes trickier. It is no longer simply a case of preserving the raw data, but also of preserving the tools, scripts and functionality that allow the data to be sifted and interrogated. Increasingly, such online resources also require intellectual preservation: the expectation that the content of the resource will be updated and enhanced with new discoveries, and that emails and queries relating to this content will be acknowledged and answered.

Debate is ongoing about how best to confront this new challenge. Various options suggest themselves: the need for funding councils to take a more stringent line in dictating standards, both for data creation and for the tools and scripts that analyse such data; the need to fund a network of expertise (whether as part of the AHDS or of a much wider network) that can sustain the functionality and content of different Web sites; and the need to provide a common set of tools that can be applied to a variety of resources.

As with the data creation issues, this is not something the AHDS can tackle alone; it requires a larger infrastructure for such an environment to develop. How the AHDS and the relevant stakeholders in the digital library community respond to this challenge will be crucial in grappling with this new preservation peak.

Collecting and Locating Resources

The past ten years have been a boom time for digitisation. The JISC report on digitisation in the UK estimated that over £130 million has been spent on digitisation during the lifetime of the AHDS [22]. Naturally this has resulted in greater numbers of collections being deposited with the AHDS, especially in the past few years. In the first six years of its life, the AHDS ingested approximately 260 collections. In the three-year period 2002-5, the figure was a more substantial 342.

As envisaged by Greenstein and Trant in 1996, the AHDS has focused on developing tools to allow users to locate these resources. However, achieving this has been a long and somewhat painful task. An important internal milestone for the AHDS was developing a metadata framework that could describe all deposited collections. The initial scope of the AHDS perhaps gave individual subject areas a little too much freedom in how they described their collections, which caused problems when the initial AHDS Gateway was developed: the level of interoperability between resources from different subject areas was low. The common metadata framework that underpins the new AHDS cross-search catalogue (released in 2003) has rectified this problem [23]. Users can now easily search for and download AHDS collections via this catalogue. There are also separate search tools for single items, such as archaeological records and visual arts images [24].

The bulk of the AHDS’ more recent collections come from projects funded by the AHRC. AHRC grant holders are obliged to offer copies of their digital outputs - a strategic decision that has been of crucial import to the AHDS, enabling a much closer relationship with AHRC award winners, and the ingest of some fascinating collections.

For instance, Designing Shakespeare illustrates 40 years of Shakespearian performance in Stratford and London [25]; the Spanish ‘Little War’ database allows historians to formulate questions on guerrilla action in the Spanish Napoleonic Wars [26]; the Sheffield Corpus of Chinese is a valuable corpus allowing analysis of different generations of Chinese language [27]; the Stone in Archaeology database catalogues the use of different geological types [28]; the Imperial War Museum: Posters of Conflict is a digitised archive of military posters from World War One to the present day [29].

While this variety is pleasing and makes for a joyful serendipity when searching the catalogue, it also highlights another key factor for the AHDS. Back in 1996, Greenstein and Trant noted that the AHDS would be only one of many cultural organisations making material available online. Developing a critical mass of material relevant to a scholar’s needs requires a high level of interoperability between many of these disparate content providers. Ten years on, this is still a problem, not just for the AHDS but also for the wider digital information community. An unpublished report by Sheila Anderson, Director of the AHDS, notes that ‘a major barrier to the full exploitation and use of the material is presented by the majority of this work taking place within discipline and institutional silos, thus massively reducing the ability to engage meaningfully across disparate collections of digital content, domains and discipline areas.’ [30] The whole could be, but is not yet, greater than the sum of its parts. This is one mountain that loomed large in 1996 and continues to do so to this day.

This mountain will need to be approached from numerous angles; AHDS work with the e-Science agenda will be one of these. Back in 2002, an Ariadne conference report entitled ‘The Information Grid’ highlighted the mutual importance of metadata and grid technologies in developing systematic interoperability [31]. Work undertaken at the recently opened Arts and Humanities E-Science Support Centre has been able to demonstrate this importance [32]. Briefing papers on the development of ontologies and the exploitation of Web Services indicate how greater interoperability between resources can be nurtured and then exploited [33]. The Centre is also working with AHRC-funded e-Science projects that are looking at issues related to grid-enabling disparate digital datasets [34].

Once again, it will be vital to work with others in the humanities computing community. Developing and exploiting the grid is, by definition, not something that can be done alone. The related AHDS E-Science Scoping Study [35] has also begun to address this issue, bringing together scholars and technologists to discuss the tools, resources and skills needed to ensure that grid-enabled resources are interoperable. The final report from this project, due in Autumn 2006, will play an important role in developing a broader strategy for enhancing and joining-up digital collections in the arts and humanities.

Conclusion

This article has highlighted some of the key issues that link the AHDS of 1996 with that of 2006. But there are numerous other topics for the AHDS, or any publicly funded digital archive, to address: the relationship between e-prints, research datasets and other academic materials; the impact of mass digitisation programmes such as Google Print; the need to preserve personal digital materials, e.g. notes, emails, sketches; licensing, authenticating and reusing digital resources over a distributed grid network; methods of automating metadata creation. It is a list that could go on, and it provides the AHDS with a whole new range of peaks to ascend within the next few years.

References

  1. Greenstein D., Trant J., “AHDS: Arts and Humanities Data Service”, Ariadne issue 4, July 1996 http://www.ariadne.ac.uk/issue4/ahds/
  2. JISC Web site http://www.jisc.ac.uk/
  3. Arts and Humanities Research Council Web site http://www.ahrc.ac.uk/
  4. The JISC Monitoring Unit provides statistics on AHDS performance. See http://www.mu.jisc.ac.uk/servicedata/ahds/. Other statistics, relating to collections ingested, enquiries received, technical appendices marked, as well as links to notable projects, publications and resources can be found on the AHDS timeline at http://ahds.ac.uk/about/ahds-timeline.htm
  5. For details of AHDS workshops since 2003 see http://ahds.ac.uk/news/events/past-events.htm
  6. The AHDS have been marking the Technical Appendices submitted as part of applications to the AHRC since 1999-2000. For more details on the relationship between the AHDS and the AHRC see http://ahds.ac.uk/ahrc/ahrc-advice.htm
  7. For more information on Sisyphus see http://en.wikipedia.org/wiki/Sisyphus
  8. By more personal resources, I mean the type of resources developed quickly and roughly for personal use, without any thought given to re-use by others. Numerous such databases, notes, sketches etc. sit on academics’ desktops and laptops.
  9. For the AHDS Information Papers (including those on copyright, metadata, project management) see http://ahds.ac.uk/creating/information-papers/
  10. AHDS Case Studies http://ahds.ac.uk/creating/case-studies/
  11. TASI http://www.tasi.ac.uk
    UKOLN http://www.ukoln.ac.uk; TechDIS http://www.techdis.ac.uk/
    NOF Advisory Service http://www.ukoln.ac.uk/nof/support/
    Minerva guidelines http://www.minervaeurope.org/publications/technicalguidelines.htm
  12. Kelly, B., Dunning, A., Rahtz, S., Hollins, P. and Phipps, L., “A Contextual Framework For Standards”, WWW 2006 Edinburgh, http://www.ukoln.ac.uk/web-focus/papers/e-gov-workshop-2006/
  13. A list of all AHDS Guides to Good Practice: http://ahds.ac.uk/creating/guides/
  14. Gregory I., “A Place in History: A Guide to Using GIS in Historical Research”, 2002, http://ahds.ac.uk/history/creating/guides/gis/.
  15. Eiteljorg II H., Fernie K., Huggett J., Robinson D., “CAD: A Guide to Good Practice”, 2002, http://ads.ahds.ac.uk/project/goodguides/cad/
  16. Details about this call for submissions are available at: http://www.ahrc.ac.uk/apply/research/sfi/ahrcsi/strategic_resource_enhancement_programme.asp
  17. The ARIA Project Web site is at http://aria.dmu.ac.uk/
  18. The ICT Guides are at http://ahds.ac.uk/ictguides/
  19. For example, the work of the Digital Preservation Coalition http://www.dpconline.org/ or the Digital Curation Centre http://www.dcc.ac.uk
  20. All AHDS documents on its repository are at http://ahds.ac.uk/preservation/
  21. Sherpa DP (Digital Preservation): http://ahds.ac.uk/about/projects/sherpa-dp/; Digital Image Archiving Study: http://ahds.ac.uk/about/projects/archiving-studies/; a page for the E-Prints Feasibility Study will be ready in August 2006, viewable via http://ahds.ac.uk/projects/
  22. Parkinson N., “Digitisation in the UK: the case for a UK Framework - A report based on the Loughborough University study”, 2005, http://www.jisc.ac.uk/parkinson.html
  23. The AHDS common metadata framework and other AHDS documents on metadata can be found at http://ahds.ac.uk/metadata/
  24. All AHDS collections can be searched via http://ahds.ac.uk/collections/
  25. Designing Shakespeare is available at http://ahds.ac.uk/performingarts/designing-shakespeare/
  26. The Spanish ‘Little War’, 1808-1814 database can be downloaded from http://www.ahds.ac.uk/catalogue/collection.htm?uri=hist-5095-1
  27. The Sheffield Corpus of Chinese can be downloaded from http://ahds.ac.uk/catalogue/collection.htm?uri=lll-2481-1. Unlike the other collections cited here, the corpus was the outcome of a pilot project funded by the British Academy and is a collaborative project between the School of East Asian Studies and the Humanities Research Institute of the University of Sheffield.
  28. The Stone in Archaeology archive can be searched at http://ads.ahds.ac.uk/catalogue/archive/stones_ahrb_2005/
  29. Details about and access to Posters of Conflict are available at http://vads.ahds.ac.uk/collections/IWMPC.html
  30. Anderson S., “The Arts and Humanities Data Service: future challenges, future possibilities”. Unpublished report
  31. “The Information Grid”, Ariadne issue 32, July 2002, http://www.ariadne.ac.uk/issue32/information-grid/
  32. The Arts and Humanities E-Science Centre ( http://www.ahessc.ac.uk/ ) is a JISC-funded project hosted at King’s College London, by the AHDS and the AHRC’s Methods Network http://methodsnetwork.ac.uk/
  33. The Briefing Papers are available from http://www.ahessc.ac.uk/BriefingPapers.html. Besides Web Services and Ontologies, there are articles on the Access Grid, the Grid and Virtual Research Environments.
  34. These projects are documented at http://www.ahessc.ac.uk/WorkshopsDemonstrators.html
  35. E-Science Scoping Study http://ahds.ac.uk/e-science/e-science-scoping-study.htm

Author Details

Alastair Dunning
Communications Manager
Arts and Humanities Data Service
King’s College London

Email: alastair.dunning@ahds.ac.uk
Web site: http://ahds.ac.uk/
