Web Magazine for Information Professionals

The LEODIS Database

Jonathan Kendal on the creation of LEODIS, a Public Libraries sector digitization and database project.

Personal Background

To begin with, as this is predominantly a libraries publication, I feel an introduction to my background may be helpful in understanding this approach to digitisation.

My relationship with the Leodis Database is as technical creator and manager, and my background is purely technical. I studied Printing and Photographic Technology for 3 years at Kitson College Leeds and then Computer Science for 3 years at Manchester Metropolitan University, followed by 1 year's research in Computer Modelling at Manchester Metropolitan University, Department of Mechanical Engineering.

I state this now as I firmly believe that it is important to realise that digitisation projects are quite technical and probably best led by qualified technical staff, rather than traditional library staff. The process of indexing and placing images online is relatively simple; it is the back-end implementation which requires great consideration and technical knowledge in order to achieve a stable and future-proof solution.

Project Background

In 1996 Leeds City Council established the Internet Project Office with the goal of delivering The City of Leeds website (www.leeds.gov.uk). The Internet Project Office was formed as a section of Leeds Libraries & Information Services, though corporately funded from the centre. This office comprises 1 Project Manager, 1 Internet Programmer (yours truly), 1.5 Web Designers and 1 Administration Assistant.

The website delivered was sub-divided into 6 key headings, one of which remains as Libraries. The content under Libraries at this stage was considered a little ‘dry’ and the Internet Project Office team decided to create an ‘added value’ element to the site. We contacted Local Studies and offered to represent the Local Historic Photographic Collection online. Still in the early days of internet technology, we hastily constructed an Access database, indexed and scanned a selection of 200 images and put them online by way of a CSV (comma separated variable) file export, interrogated by a cgi-bin program written in the C programming language.

The popularity of this basic service led to an increase in the online collection to 2,100 images by October 2000, when the service was terminated and replaced by the Leodis Database (www.leodis.net), which had been under construction since January 1999.

The Leodis Database now holds some 9,500 images and forms the basis of a UK NOF (New Opportunities Fund - www.nof.org.uk) bid to enable funding for the indexing of the entire Leeds Collection, 40,000 plus images, playbills and maps.

Technical Architecture

The development and launch of the new Leodis Database stems from an early stage when it was obvious that the photographs were popular and that given the right vehicle we could publish the entire collection. We had viewed others’ efforts and considered many inappropriate due to their closed architecture and proprietary approach. Latterly, increased interest has come about because of the UK Government’s drive toward digitisation of archive materials held in public collections.

As an aside to the main thrust of our work on the City of Leeds website we began to investigate further:

As a crucial element of the database we were to implement we sought open standards for a catalogue, image format (for true digitisation standard i.e. 300dpi) and a programming language with which to interact with the catalogue.

The use of open standards would guarantee that any efforts made today could be easily ported to new systems of the future. At the end of the project we aimed to have underlying transferable data tied to platform independent images, NOT a product of the day tied to a proprietary solution. The results would also be published on the internet, which necessitated a second image saved as a 72dpi JPG, the most suitable type for the nature of our collection i.e. continuous tone images.
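To give a feel for the difference between the two resolutions, the short Python sketch below works out the pixel dimensions of the same print scanned at 300dpi and at 72dpi. The 8 x 6 inch print size and the function name are assumptions made purely for illustration; the article does not state the sizes of the prints in the collection.

```python
def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of a print (measured in inches) scanned at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)

# Hypothetical 8 x 6 inch print; real print sizes in the collection will vary.
master = pixel_size(8, 6, 300)  # archival TIFF held offline
web = pixel_size(8, 6, 72)      # JPG derivative published online

print("300dpi master:", master)
print("72dpi web copy:", web)
```

Whatever the print size, the 300dpi master holds (300/72) squared, or roughly 17 times, as many pixels as the 72dpi copy, which is why it is the master that new copies should be derived from.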

Preliminary research resulted in the discovery of the Dublin Core (www.purl.org/dc/) as a suitably open cataloguing standard, TIFF as a platform independent image file and SQL (Structured Query Language) as an open language by which to interact with a finished database. We also noted that the Dublin Core was able to catalogue more than simple images – it held the possibility of cataloguing maps, movies, playbills etc.

To pilot stage: we had enough money in our Capital Budget to purchase an NT server with a large amount of disk space, fitted with Internet Information Server and MSSQL 7.0. This affordable platform would form the basis for an ‘intranet’ pilot scheme. The choice of MSSQL was economically convenient, and it is also well regarded. Oracle would have served as well but it was considered too expensive for the pilot.

An MSSQL table was designed to match the Dublin Core Catalogue and methods of interaction with the database were considered. Client software to interact with MSSQL can prove costly, and does not easily offer the kind of interface desired to enable easy upload of new material. Internet Information Server offers the ability to engage in rapid development and deployment of intranet client based interfaces to MSSQL 7.0 databases via Active Server Page (ASP) server based applications. These applications could be written in Javascript or VBScript. As VBScript is native to the platform where the solution was to reside, we chose VBScript (there are no browser compatibility issues here as the applications are server NOT client based).
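As a rough illustration of a table designed to match the Dublin Core, the Python sketch below creates one column for each of the fifteen Dublin Core elements. It uses SQLite as a stand-in for MSSQL 7.0 so that it can be run anywhere; the table name, the dc_ column prefix and the column types are assumptions, not the actual Leodis schema.

```python
import sqlite3

# The fifteen elements of the Dublin Core element set.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

def create_catalogue(conn):
    """Create a hypothetical catalogue table with one column per element."""
    columns = []
    for name in DC_ELEMENTS:
        if name == "identifier":
            # Assumption: the Resource Identifier is a database-generated number.
            columns.append("dc_identifier INTEGER PRIMARY KEY AUTOINCREMENT")
        else:
            columns.append(f"dc_{name} TEXT")
    conn.execute(f"CREATE TABLE catalogue ({', '.join(columns)})")

conn = sqlite3.connect(":memory:")  # SQLite stands in for MSSQL 7.0 here
create_catalogue(conn)

# List the column names that were actually created.
columns = [row[1] for row in conn.execute("PRAGMA table_info(catalogue)")]
print(columns)
```

Keeping the column names this close to the standard means an export of the table is largely self-describing to anyone who knows the Dublin Core.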

Realising this ability (cheap and rapid client development) it was decided to enable viewing and upload to the database via Internet Explorer (or Netscape). Internet Explorer provided us with free clients, not requiring licenses to interact with MSSQL 7.0 – the envisaged user interface for uploading would be no more complicated than a Q and A form and allow anyone with little training to successfully and safely enter new records.

The result is a scalable database based on a respected cataloguing system, tied to 300dpi TIFF images (offline) and 72dpi JPG images (online). Catalogue viewing is via the internet and upload via closed intranet. The viewing and uploading applications are server based, written as Active Server Page applications in VBScript using embedded SQL to interact with the MSSQL 7.0 server.

We have maintained our aim of transferable data and images. The upload and viewing applications are today’s best offer; tomorrow may require another method of delivery, and we feel extremely confident that the architecture we have in place will see our collection through the many changes in delivery that will no doubt be required.

Indexing the Images

The indexing of the images in accordance with the Dublin Core (www.purl.org/dc/) is a separate manual process undertaken by Local Studies staff.

The Internet Office interprets and advises use of the Dublin Core definitions to index the image collection and other media.

Each image has a label attached which is completed by knowledgeable staff. The indexed image is then ready to enter the system.

In Practice

In practice the system is proving extremely robust and practical. Evidence of this is found in the ease with which Local Studies has employed and trained temporary staff to scan and upload images via the intranet client.

The process has 4 stages:

  1. Enter indexing data associated with the image on to the database via the Internet Explorer (IE) form interface on the Intranet site.
  2. Record the unique number returned by the IE interface and attach it to the physical image for scanning reference.
  3. Scan and save the physical image as unique number TIFF and JPG in chosen directories.
  4. Mirror Intranet changes to the Internet site www.leodis.net, upload the database and transfer newly associated JPG images to the site. Record TIFFs on a suitable offline storage medium e.g. CD.
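Stages 1 to 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Leodis code, which is written as ASP applications in VBScript against MSSQL 7.0: SQLite stands in for the server, and the table layout, function names and file extensions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the intranet MSSQL server
conn.execute(
    "CREATE TABLE catalogue ("
    "dc_identifier INTEGER PRIMARY KEY AUTOINCREMENT, "
    "dc_title TEXT, dc_description TEXT)"
)

def add_record(conn, title, description):
    """Stage 1: store the indexing data; stage 2: return the unique number."""
    cur = conn.execute(
        # Parameterised query, so form input is never pasted into the SQL text.
        "INSERT INTO catalogue (dc_title, dc_description) VALUES (?, ?)",
        (title, description),
    )
    conn.commit()
    return cur.lastrowid

def scan_filenames(record_id):
    """Stage 3: name the master and web scans after the unique number."""
    return f"{record_id}.tif", f"{record_id}.jpg"

record_id = add_record(conn, "Hypothetical title", "Hypothetical description")
print(record_id, scan_filenames(record_id))
```

Because the file names are derived from the record number alone, the images remain tied to their catalogue entries even if the database is later moved to another system.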

In this way the Leodis Database has grown to some 9,500 images to date.

New Services

The fact that the backend of Leodis is a database interfaced by SQL and that each record can be uniquely identified has enabled the addition of parallel tables to the database, tied by each element’s unique identifier (the Resource Identifier element of the Dublin Core).

This ability has enabled the rapid development of interactive features based on the existing catalogue.
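The idea of a parallel table can be sketched as follows in Python, again with SQLite standing in for MSSQL. The photo_comments table, its columns and the sample rows are all hypothetical; the point is only that the new table carries the catalogue's identifier, so the catalogue itself never has to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalogue (
    dc_identifier INTEGER PRIMARY KEY,
    dc_title TEXT
);
-- Parallel table: tied to the catalogue only through the identifier.
CREATE TABLE photo_comments (
    comment_id INTEGER PRIMARY KEY,
    dc_identifier INTEGER NOT NULL REFERENCES catalogue(dc_identifier),
    comment TEXT
);
INSERT INTO catalogue VALUES (1, 'Hypothetical photograph A');
INSERT INTO catalogue VALUES (2, 'Hypothetical photograph B');
INSERT INTO photo_comments (dc_identifier, comment)
    VALUES (2, 'Hypothetical visitor comment');
""")

def comments_for(conn, record_id):
    """Return (title, comment) pairs for one catalogue record."""
    rows = conn.execute(
        "SELECT c.dc_title, p.comment "
        "FROM catalogue AS c "
        "JOIN photo_comments AS p ON p.dc_identifier = c.dc_identifier "
        "WHERE c.dc_identifier = ?",
        (record_id,),
    )
    return rows.fetchall()

print(comments_for(conn, 2))
```

A feature added this way can also be removed by dropping its table, leaving the catalogue data untouched.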

We have so far developed Web Cards, Photo Comments, Photo Albums and photographic ordering.

New Media

Referring back to the Dublin Core and the mention that it has the ability to catalogue more than simple images, we are now beginning to integrate new media:

You will note that display pages need to respond differently to each type of media they encounter for display – this is detected via the Dublin Core format field. Using this field to predict the behaviour of each page displayed opens up the possibility of cataloguing any media file capable of view or download via the internet.
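A minimal sketch of this kind of format-driven display is given below in Python. It assumes that the format field holds an internet media type such as image/jpeg, which is the Dublin Core's recommended practice but is not confirmed here as the Leodis convention; the record layout, function name and HTML are hypothetical.

```python
from html import escape

def render(record):
    """Choose the page fragment for a record from its Dublin Core format field."""
    title = escape(record["title"])
    url = escape(record["url"])
    fmt = record["format"]
    if fmt.startswith("image/"):
        # Images are shown inline on the page.
        return f'<img src="{url}" alt="{title}">'
    if fmt.startswith("video/"):
        # Movies are offered as a link to play.
        return f'<a href="{url}">Play: {title}</a>'
    # Anything else is offered for download.
    return f'<a href="{url}">Download: {title}</a>'

# Hypothetical records for illustration only.
print(render({"title": "Street scene", "url": "/images/1.jpg", "format": "image/jpeg"}))
print(render({"title": "Town plan", "url": "/media/3.pdf", "format": "application/pdf"}))
```

Supporting a new kind of media then means adding one more branch, with no change to the catalogue records already entered.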

Current Usage

The site has now been online as the Leodis Database Pilot since October 2000, though no marketing has been carried out aside from online registration via the main search engines.

Figures for access (compiled via MediaHouse Inc, LiveStats 5.0) are encouraging and the growth rate is steady.

October 2000 : 4503 Host Sessions
November : 5467
December : 6014
January 2001 : 7201
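For readers who want the growth rate as a percentage, the short Python sketch below derives it from the four monthly figures above. The figures are those reported by LiveStats; the variable and function names are simply illustrative.

```python
# Host sessions per month, as listed above.
sessions = [
    ("October 2000", 4503),
    ("November 2000", 5467),
    ("December 2000", 6014),
    ("January 2001", 7201),
]

def monthly_growth(figures):
    """Percentage change from each month to the next, to one decimal place."""
    changes = []
    for (_, previous), (month, current) in zip(figures, figures[1:]):
        changes.append((month, round((current - previous) / previous * 100, 1)))
    return changes

growth = monthly_growth(sessions)
first, last = sessions[0][1], sessions[-1][1]
overall = round((last - first) / first * 100, 1)

for month, percent in growth:
    print(f"{month}: {percent}% up on the previous month")
print(f"Overall since October 2000: {overall}%")
```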

In terms of the popularity of the added value services we are currently hosting :

165 Web Cards - posted over the last 12 weeks
1 Photo Comments - posted over the last 8 weeks.
16 Photo Albums – created over the last 4 days
£1,340.00 in Photographic Orders since inception.

Conclusion

The most valuable asset at the end of a digitisation program is the raw data and the images. Ensure that this data is easily transferable to new systems: check that the native database system can export to other industry standard formats and, at the most basic level, to a CSV (Comma Separated Variable) file. Ensure that images are retained safely at a high quality in a platform independent format, from which new copies for future applications can be derived.
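As a simple check of the CSV point, the Python sketch below exports a whole table, column names included, to CSV text. SQLite stands in for whichever database system is in use, and the table, the sample row and the function name are hypothetical.

```python
import csv
import io
import sqlite3

def export_csv(conn, table):
    """Return the contents of a table as CSV text, with a header row."""
    # The table name is inserted directly into the SQL, so it must come
    # from trusted code and never from user input.
    cur = conn.execute(f"SELECT * FROM {table}")
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cur.description])  # header row
    writer.writerows(cur)  # one CSV line per record
    return buffer.getvalue()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogue (dc_identifier INTEGER PRIMARY KEY, dc_title TEXT)")
conn.execute("INSERT INTO catalogue VALUES (1, 'Hypothetical title, with a comma')")

text = export_csv(conn, "catalogue")
print(text)
```

Fields that contain commas are quoted automatically by the csv module, which is the kind of detail worth testing before relying on any export as the route to a future system.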

The user interface that goes on top to form the internet site is temporary and will need many changes over time, though it is important in creating a respectable image.

Use technically qualified staff to evaluate and implement solutions or to advise on the purchase of external solutions.

Ensure that the delivered database and image store product can be managed and updated by novice staff with basic training.

You will (probably) always require qualified technical staff to implement new interfaces to your data.

References

  1. Dublin Core: http://www.purl.org/dc/
  2. Leodis Database: http://www.leodis.net
  3. Microsoft (IIS and MSSQL): http://www.microsoft.com/
  4. Mediahouse LiveStats: http://www.mediahouse.com
  5. NOF (New Opportunities Fund): http://www.nof.org.uk

Author Details

 
Jonathan Kendal
Jonathan.kendal@leeds.gov.uk
Internet Database Manager
Leeds Library & Information Services
Leeds City Council