Hopes and Deliverables
Near the start of CSIP, a list of project deliverables was drawn up. To encourage 'buy-in', top of the list was something of evident value - a Gallery Services application to help staff give customers what they wanted to know at the point of enquiry. But to deliver this and other applications, the 'Virtual Repository' would be necessary.
An early ambition was to be able to link images to the National Art Library (NAL) catalogue; unlike the collections database, the library catalogue software couldn't talk to the digital asset management system. The library contains not just text reference books, but also artefactual material - rare books, manuscripts, book art etc.
Christopher showed a diagram by SSL illustrating the system architecture of CSIP. Four boxes represent the sources: the Collections Information System, the Picture Library, the National Art Library and the Theatre Museum Archive. The Virtual Repository sits in the middle, with two sample applications shown on top: Gallery Services, and a system giving public access to images from the National Art Library.
The diagram showed that links between the Virtual Repository (VR) and the databases are achieved through a Z39.50 access mechanism, while the EAD-encoded archive material is harvested through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The Virtual Repository and end-user applications exchange information using SOAP messages.
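To make the harvesting step concrete, here is a minimal sketch of how a harvester might build a request under the Open Archives Initiative's Protocol for Metadata Harvesting. The endpoint address, the date and the function name are hypothetical; the article does not describe the V&A's actual configuration. The `oai_dc` prefix is used because every conforming repository must support it; the prefix used for the EAD material is not stated.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the article does not give the real address.
BASE_URL = "https://archive.example.org/oai"


def list_records_url(base_url, metadata_prefix, from_date=None, set_spec=None):
    """Build a ListRecords request URL for a metadata harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Illustrative date only; ask for records changed since the start of 2005.
url = list_records_url(BASE_URL, "oai_dc", from_date="2005-01-01")
print(url)
```

The repository answers such a request with an XML document of records, which the harvester would then map into its own store.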
The V&A team faced a number of issues. One was the hierarchical nature of information in archives and collections information. A chest of drawers is an object, but also has components. Within the CIS, the Museum needs separate records and images for each part. However, you don't want to bother the public with these records of components. Likewise in a library, you may have a description of a series of books as well as the books individually.
'With archives it is the same problem, but writ large', said Christopher. Consider the V&A's archives for Habitat. The individual document is catalogued, also the file it's in, the series from which that file comes, and the department of Habitat it came from; then you need to deal with Habitat as an entity - when the firm started, how it kept records, etc. Archival catalogues typically have layers of description, making it complicated to work out what to serve up to an enquirer.
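The layering described above can be sketched as records that point to their parents. This is an illustration only, not the V&A's data model: the `Record` class, the level names and the sample titles are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    record_id: str
    level: str  # hypothetical levels: "entity", "department", "series", "file", "item"
    title: str
    parent_id: Optional[str] = None


def context_chain(record_id: str, records: Dict[str, Record]) -> List[str]:
    """Return titles from the top-level entity down to the given record."""
    chain = []
    current = records.get(record_id)
    while current is not None:
        chain.append(current.title)
        current = records.get(current.parent_id) if current.parent_id else None
    return list(reversed(chain))


def top_level(records: Dict[str, Record]) -> List[Record]:
    """Records with no parent: the ones a public view might show first."""
    return [r for r in records.values() if r.parent_id is None]


# Hypothetical sample data mirroring the layers described in the text.
records = {
    "e1": Record("e1", "entity", "Habitat"),
    "d1": Record("d1", "department", "Example department", "e1"),
    "s1": Record("s1", "series", "Example series", "d1"),
    "f1": Record("f1", "file", "Example file", "s1"),
    "i1": Record("i1", "item", "Example document", "f1"),
}

print(" > ".join(context_chain("i1", records)))
```

An application can then decide how much of the chain to serve: the full context for a researcher, or only the top-level record for a casual enquirer.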
The standards used by archives, libraries and museums provide for different levels of granularity. Take people's names: in the archive standard, a name is a single data field. In the library system, it is more broken down; and in the museums standard (SPECTRUM) it is broken into many parts: forename, surname, title and pre-title, plus dates, occupation and much more. Getting such differently-constructed record sets to work together and look the same is hard.
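A small sketch shows why the mismatch runs one way. Collapsing a finely divided name into a single display string is easy; recovering the parts from a single string is not. The field names below are hypothetical and loosely follow the parts listed above; they are not the actual SPECTRUM unit names.

```python
# Hypothetical ordering of name parts for display.
NAME_ORDER = ("pre_title", "title", "forename", "surname")


def display_name(parts):
    """Join whichever name parts are present into one string."""
    return " ".join(parts[key] for key in NAME_ORDER if parts.get(key))


# A finely divided record, as a museum system might hold it.
museum_name = {"title": "Dr", "forename": "Jane", "surname": "Smith"}

# The single-field form an archive catalogue might hold.
archive_name = display_name(museum_name)
print(archive_name)
```

Going the other way, from "Dr Jane Smith" back to separate fields, needs guesswork about which word is a title and which a surname, which is one reason records of different granularity are hard to present uniformly.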
At first, the team planned to use the Dublin Core model, with fields such as Description, Subject, Identifier, Coverage etc. A draft mapping of CIS data to DCMI was sketched out, but was soon found to be inadequate. The DCMI Subject field mapped to ten distinct library fields and half a dozen collections information fields, and losing those distinctions would be anathema.
The team also studied the Conceptual Reference Model of CIDOC, the International Committee for Museum Documentation. CIDOC-CRM is 'a formal ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information'. It seemed like logical perfection; but they concluded, as Christopher said, 'it was beyond our feeble mental capacities to apply it to the whole of our collections and bibliographic and archival data.'
They settled for a compromise: an expansion of Dublin Core, adding a column with fields that they needed, mapped to the CIS, library and archival system data. Thus they derived their own Common Data Model for mapping source material into the Virtual Repository.
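The difference between a plain Dublin Core mapping and an expanded one can be sketched as follows. This is an assumption-laden illustration, not the V&A's Common Data Model: every source field name and qualifier below is hypothetical.

```python
# Plain Dublin Core: several source fields collapse into one element.
DC_ONLY = {
    "cis.subject_depicted": "subject",
    "cis.style": "subject",
    "library.topical_subject": "subject",
}

# Expanded model: the same element, plus a qualifier that keeps the distinction.
EXTENDED = {
    "cis.subject_depicted": ("subject", "depicted"),
    "cis.style": ("subject", "style"),
    "library.topical_subject": ("subject", "topic"),
}


def crosswalk(record, mapping):
    """Map source fields to target fields, grouping values per target."""
    out = {}
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target is None:
            continue  # unmapped source fields are dropped in this sketch
        out.setdefault(target, []).append(value)
    return out


# Hypothetical collections record.
source = {"cis.subject_depicted": "lions", "cis.style": "Rococo"}

flat = crosswalk(source, DC_ONLY)
rich = crosswalk(source, EXTENDED)
print(flat)
print(rich)
```

With the plain mapping both values land in one undifferentiated `subject` list; with the expanded mapping each keeps its own key, so an application can still tell what an object depicts from the style it is made in.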
Surely there are thousands of museums facing similar problems; wasn't the Museum re-inventing the wheel? Apparently few museums have made the attempt; there is no model yet to follow. The CSIP team at the V&A are keen to talk to people about the work they have done, so others can learn from the difficulties and successes they have encountered.
Where Has CSIP Reached?
The Virtual Repository is now in place - and it works. A prototype Gallery Services application exists, serving information and pictures in a way that Gallery staff and the public can use, but it isn't installed yet. Progress has slowed recently, but Christopher is optimistic that they will be able to push ahead and deliver.
Sarah Winmill has drafted a list of 'lessons learned so far':
- It is possible to integrate your data without putting it all in one place.
- High-level buy-in for the project is essential.
- Market your project carefully; talk about benefits and deliverables, not technology.
- The major challenge is no longer the technology, but the underlying understanding of our data.
- Don't wait for perfection!