Jonathan Kendal on the creation of LEODIS, a Public Libraries sector digitization and database project.
To begin with, as this is predominantly a libraries publication, I feel an introduction to my background may be helpful in understanding this approach to digitisation.
My relationship with the Leodis Database is as technical creator and manager, and my background is purely technical. I studied Printing and Photographic Technology for 3 years at Kitson College Leeds and then Computer Science for 3 years at Manchester Metropolitan University, followed by 1 year's research in Computer Modelling at Manchester Metropolitan University, Department of Mechanical Engineering.
I state this now as I firmly believe that it is important to realise that digitisation projects are quite technical and probably best led by qualified technical staff, rather than traditional library staff. The process of indexing and placing images online is relatively simple; it is the back-end implementation which requires great consideration and technical knowledge in order to achieve a stable and future-proof solution.
In 1996 Leeds City Council established the Internet Project Office with the goal of delivering The City of Leeds website (www.leeds.gov.uk). The Internet Project Office was formed as a section of Leeds Libraries & Information Services, though corporately funded from the centre. This office comprises 1 Project Manager, 1 Internet Programmer (yours truly), 1.5 Web Designers and 1 Administration Assistant.
The website delivered was sub-divided into 6 key headings, one of which remains as Libraries. The content under Libraries at this stage was considered a little dry and the Internet Project Office team decided to create an added value element to the site. We contacted local studies and offered to represent the Local Historic Photographic Collection online. As these were still the early days of internet technology, we hastily constructed an Access database, indexed and scanned a selection of 200 images and put them online by way of a CSV (comma separated variable) file export, interrogated by a cgi-bin program written in the C programming language.
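The original cgi-bin program was written in C and is not reproduced here. As a rough illustration of the idea only, the following Python sketch searches a CSV export for a term. The column names and the sample rows are invented for the example and are not taken from the real export.

```python
import csv
import io

# Hypothetical CSV export: the columns and rows are made up for illustration.
SAMPLE_EXPORT = """id,title,year
1,Briggate looking north,1905
2,Kirkgate Market interior,1932
3,Leeds Bridge from the river,1905
"""


def search_csv(csv_text, term):
    """Return every row in which any field contains the term (case-insensitive)."""
    term = term.lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if any(term in value.lower() for value in row.values())]


if __name__ == "__main__":
    for row in search_csv(SAMPLE_EXPORT, "market"):
        print(row["id"], row["title"], row["year"])
```

A flat file scanned from top to bottom like this is simple to deploy, but every search reads the whole file, which is one reason such an approach suits a small pilot better than a large collection.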
The popularity of this basic service led to an increase in the online collection to 2,100 by October 2000, when the service was terminated and replaced by the Leodis Database (www.leodis.net), which had been under construction since January 1999.
The Leodis Database now holds some 9,500 images and forms the basis of a UK NOF (New Opportunities Fund - www.nof.org.uk) bid to enable funding for the indexing of the entire Leeds Collection, 40,000 plus images, playbills and maps.
The development and launch of the new Leodis Database stems from an early stage when it was obvious that the photographs were popular and that, given the right vehicle, we could publish the entire collection. We had viewed others' efforts and considered many inappropriate due to their closed architecture and proprietary approach. Latterly, increased interest has come about because of the UK Government's drive toward digitisation of archive materials held in public collections.
As an aside to the main thrust of our work on the City of Leeds website we began to investigate further:
As a crucial element of the database we were to implement, we sought open standards for a catalogue, an image format (of true digitisation standard, i.e. 300dpi) and a programming language with which to interact with the catalogue.
The use of open standards would guarantee that any efforts made today could be easily ported to new systems in the future. At the end of the project we aimed to have underlying transferable data tied to platform-independent images, NOT a product of the day tied to a proprietary solution. The results would also be published on the internet, which necessitated a second image saved as a 72dpi JPG - the most suitable type for the nature of our collection, i.e. continuous tone images.
Preliminary research resulted in the discovery of the Dublin Core (www.purl.org/dc/) as a suitably open cataloguing standard, TIFF as a platform-independent image file format and SQL (Structured Query Language) as an open language by which to interact with a finished database. We also noted that the Dublin Core was able to catalogue more than simple images: it held the possibility of cataloguing maps, movies, playbills etc.
To the pilot stage: we had enough money in our Capital Budget to purchase an NT server with a large amount of disk space, fitted with Internet Information Server and MSSQL 7.0. This affordable platform would form the basis for an intranet pilot scheme. MSSQL was an economically convenient choice and is also well regarded. Oracle would have served as well but it was considered too expensive for the pilot.
An MSSQL table was designed to match the Dublin Core catalogue and methods of interaction with the database were considered. Client software to interact with MSSQL can prove costly and does not easily offer the kind of interface desired to enable easy upload of new material. Internet Information Server offers the ability to engage in rapid development and deployment of intranet client-based interfaces to MSSQL 7.0 databases via Active Server Page (ASP) server-based applications. These applications could be written in JavaScript or VBScript; as VBScript is native to the platform where the solution was to reside, we chose VBScript (there are no browser compatibility issues here as the applications are server NOT client based).
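To make the idea of a table matching the Dublin Core concrete, here is a minimal sketch with one text column per Dublin Core element. It is not the Leodis schema: SQLite stands in for MSSQL 7.0 so the example can run anywhere, Python stands in for VBScript, and the table name `catalogue`, the helper names and the choice of the identifier as primary key are assumptions made for illustration.

```python
import sqlite3

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def create_catalogue(conn):
    """Create a one-column-per-element table keyed on the identifier (assumed design)."""
    columns = ", ".join(
        f'"{name}" TEXT PRIMARY KEY' if name == "identifier" else f'"{name}" TEXT'
        for name in DC_ELEMENTS
    )
    conn.execute(f"CREATE TABLE catalogue ({columns})")


def add_record(conn, record):
    """Insert a record given as a dict mapping element name to value."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    names = ", ".join(f'"{name}"' for name in record)
    marks = ", ".join("?" for _ in record)
    # Values are bound as parameters rather than pasted into the SQL string.
    conn.execute(f"INSERT INTO catalogue ({names}) VALUES ({marks})",
                 list(record.values()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_catalogue(conn)
    add_record(conn, {"identifier": "demo-0001",
                      "title": "Example street scene",
                      "format": "image/jpeg"})
    print(conn.execute("SELECT identifier, title FROM catalogue").fetchall())
```

Because the table is plain SQL with plain text columns, the same definition could in principle be recreated on another database server with little change, which is the portability the open-standards approach is aiming for.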
Realising this ability (cheap and rapid client development), it was decided to enable viewing and upload to the database via Internet Explorer (or Netscape). Internet Explorer provided us with free clients, not requiring licences to interact with MSSQL 7.0. The envisaged user interface for uploading would be no more complicated than a Q and A form and would allow anyone with little training to successfully and safely enter new records.
The result is a scalable database based on a respected cataloguing system, tied to 300dpi TIFF images (offline) and 72dpi JPG images (online). Catalogue viewing is via the internet and upload via closed intranet. The viewing and uploading applications are server based, written as Active Server Page applications in VBScript using embedded SQL to interact with the MSSQL 7.0 server.
We have maintained our aim of transferable data and images. The upload and viewing applications are today's best offer; tomorrow may require another method of delivery, and we feel extremely confident that the architecture we have in place will see our collection through the many changes in delivery that will no doubt be required.
The indexing of the images in accordance with the Dublin Core (www.purl.org/dc/) is a separate manual process undertaken by Local Studies staff.
The Internet Office interprets and advises use of the Dublin Core definitions to index the image collection and other media.
Each image has a label attached which is completed by knowledgeable staff; this indexed image is then ready to enter the system.
In practice the system is proving extremely robust and practical. Evidence of this is found in the ease with which Local Studies has employed and trained temporary staff to scan and upload images via the intranet client.
The process is basically a 4 stage process
In this way the Leodis Database has grown to some 9,500 images to date.
The fact that the backend of Leodis is a database interfaced by SQL, and that each record can be uniquely identified, has enabled the addition of parallel tables to the database, tied by each element's unique identifier (the Resource Identifier element of the Dublin Core).
This ability has enabled the rapid development of interactive features based on the existing catalogue.
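The parallel-table idea can be sketched as follows. This is an illustration under stated assumptions, not the Leodis implementation: SQLite again stands in for MSSQL 7.0, the `comments` table is a hypothetical example of an interactive feature, and the catalogue is cut down to two columns with made-up rows.

```python
import sqlite3


def build_demo(conn):
    # Cut-down catalogue: only the identifier and title elements are shown.
    conn.execute("CREATE TABLE catalogue (identifier TEXT PRIMARY KEY, title TEXT)")
    # Hypothetical parallel table; the catalogue table itself is left unchanged.
    conn.execute("""CREATE TABLE comments (
        identifier TEXT REFERENCES catalogue(identifier),
        comment TEXT)""")
    conn.executemany("INSERT INTO catalogue VALUES (?, ?)",
                     [("demo-0001", "Example street scene"),
                      ("demo-0002", "Example market hall")])
    conn.executemany("INSERT INTO comments VALUES (?, ?)",
                     [("demo-0001", "My grandfather worked here."),
                      ("demo-0001", "Taken before the road was widened.")])


def comment_counts(conn):
    """Return (identifier, title, number of comments) for every catalogue record."""
    return conn.execute("""
        SELECT c.identifier, c.title, COUNT(m.comment)
        FROM catalogue AS c
        LEFT JOIN comments AS m ON m.identifier = c.identifier
        GROUP BY c.identifier, c.title
        ORDER BY c.identifier""").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    for row in comment_counts(conn):
        print(row)
```

The point of the design is that a new feature adds a new table keyed on the identifier, so the catalogue records themselves never need restructuring.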
We have so far developed :
Referring back to the Dublin Core and the mention that it has the ability to catalogue more than simple images, we are now beginning to integrate new media:
You will note that display pages need to respond differently to each type of media they encounter for display; the media type is detected via the Dublin Core format field. Using this field to predict the behaviour of each page displayed opens up the possibility of cataloguing any media file capable of view or download via the internet.
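A display page switching on the format field might look something like the sketch below. It assumes the format field holds an internet media type such as `image/jpeg`, which is a common Dublin Core practice but is not confirmed for Leodis; the function name and the returned action strings are hypothetical.

```python
def display_action(dc_format):
    """Choose how a page should present a record from its Dublin Core format value."""
    media_type = dc_format.strip().lower()
    # The part before the slash gives the broad family of the media type.
    major = media_type.split("/", 1)[0]
    if major == "image":
        return "show inline image"
    if major == "video":
        return "offer movie player"
    if major == "audio":
        return "offer audio player"
    # Anything unrecognised can still be served as a plain download.
    return "offer download link"


if __name__ == "__main__":
    for value in ("image/jpeg", "video/mpeg", "application/pdf"):
        print(value, "->", display_action(value))
```

The fallback branch is what makes the approach open-ended: a new kind of media can be catalogued before a dedicated display for it exists.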
The site has now been online as the Leodis Database Pilot since October 2000, though no marketing has been carried out aside from online registration via the main search engines.
Figures for access (compiled via MediaHouse Inc, LiveStats 5.0) are encouraging and the growth rate is steady.
In terms of the popularity of the added value services we are currently hosting :
The most valuable asset at the end of a digitisation program is the raw data and the images. Ensure that this data is easily transferable to new systems: can the native database system export to various other industry-standard formats and, at the most basic level, to a CSV (Comma Separated Variable) file? Ensure that images are retained safely at high quality in a platform-independent format, from which new copies for future applications can be derived.
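As a rough check of the "most basic level" of portability, the sketch below dumps a whole table to CSV with a header row. It is an illustration only: SQLite stands in for whatever database system is in use, and the function name and the two-column demo table are assumptions.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to the file-like object `out`, header row first."""
    # A table name cannot be bound as a parameter, so it is quoted as an identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    writer = csv.writer(out)
    # cursor.description holds one entry per column; its first item is the name.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalogue (identifier TEXT, title TEXT)")
    conn.execute("INSERT INTO catalogue VALUES (?, ?)",
                 ("demo-0001", "Street scene, looking north"))
    buffer = io.StringIO()
    export_table_to_csv(conn, "catalogue", buffer)
    print(buffer.getvalue())
```

Note that the `csv` module quotes any field containing a comma, so titles with commas survive the round trip into another system.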
The user interface that sits on top to form the internet site is temporary and will need many changes over time, though it is important in creating a respectable image.
Use technically qualified staff to evaluate and implement solutions or to advise on the purchase of external solutions.
Ensure that the delivered database and image store product can be managed and updated by novice staff with basic training.
You will (probably) always require qualified technical staff to implement new interfaces to your data.