Web Magazine for Information Professionals

SEAMLESS: Introduction to the Project

Mary Rowlatt describes SEAMLESS, the Essex-based project.

SEAMLESS is a two-year research project, funded by the British Library, which aims to develop a new model for citizens’ information - one which is distributed, and based on partnerships and common standards.

The objectives of the SEAMLESS project are to:

Currently the project team (Essex Libraries, Fretwell Downing Data Systems Ltd. and Education for Change Ltd.) is working with 29 organisations in Essex (national government departments, County Council departments, District Councils, Health authorities, business organisations, educational establishments, CABs, voluntary and charitable groups etc.) to develop the necessary standards and set up a prototype system. The application of metatags and the creation of a common thesaurus are being investigated. Once the system has been tested, modified, and proved viable, it is hoped that it will be opened up to all information providers in the region, and that it will form the basis for the development and delivery of citizens’ information in Essex in the future.

Why do we need a profile for citizens’ information?

Developments in three discrete, but inter-related areas are converging to create a need for a new profile, or standard attribute set, for citizens’ information which will support greater interoperability, help to improve resource description and discovery, and act as a basis for the development of new e-services:

Although this approach is gradually being upgraded to a www-based environment, which allows access to an increasing amount of information, it still produces information ‘islands’ which can only be bridged through superficial high-level hyperlinks. Another problem, which is well recognised by information professionals, is that the current state of indexing and description of documents and resources on the web is inadequate, which means that searches tend to favour recall at the expense of precision. There is a need for further development, and practical application, in areas such as the use of metadata, and automated techniques based on harvesting and web crawlers, in order to improve this situation.

* the publication of the influential report on the future for Britain’s public libraries, ‘New Library - the Peoples’ Network’ [1], has highlighted the need for public libraries to be linked up to a high speed, high capacity digital network. Attention now is turning to the content and services that public libraries will be able to deliver over that network. The provision of citizens’, or community, information has traditionally been one of the public library’s core functions and there is considerable interest in the question of whether and how citizens’ information resources held locally can be aggregated, or made available, as a national resource.

Related research

The British Library funded CIRCE project (http://www.gloscc.gov.uk/circe/index.htm) has been investigating the potential for networking public library community information databases. The fundamental difference between CIRCE and SEAMLESS is that the SEAMLESS team do not see a long-term future for public library community information databases as such. Rather, they take the view that public libraries may become marginalised as information providers unless the twin ‘threats’ of competition from other information providers and the trend to remote access to information encouraged by the development of the www are addressed. The SEAMLESS project proposes to develop, test and evaluate a new model for citizens’ information provision in which the public library becomes the facilitator, co-ordinator and standard setter for a distributed system (made up of the information resources of a network of local information providers) and provides expertise and training on demand.

Two basic, but crucial, pre-conditions underpin this new model. The first is that a substantial degree of co-operation is needed between the various information providers in any given locality: no one organisation can provide a successful citizens’ information service in isolation. The second is that some common technical and information standards need to be developed and adopted in order to facilitate successful co-operation and to enable the necessary sharing of data between partners and efficient dissemination of data to the wider public.

One of the key aims of the SEAMLESS project is to test whether some of the large body of previous research into interoperability and metadata could beneficially be applied to a new domain - that of citizens’ information. (See www.ukoln.ac.uk/metadata/, www.ukoln.ac.uk/elib/ and www2.echo.lu/libraries/en/metadata/matahome.html for more information on a number of European Union (EU) and Joint Information Systems Committee (JISC) projects funded under the Telematics for Libraries and Electronic Libraries (e-Lib) programmes.) Interest in this area continues to grow and JISC and BLRIC (British Library Research and Innovation Centre) have recently established UK Interoperability Focus to explore, publicise and mobilise the benefits and practice of interoperability across diverse information sectors (www.ukoln.ac.uk/interop-focus/).

Extant profiles

Standard attribute sets are a useful starting point for considering data representation in any area. A number of these either currently exist or are emergent in the area of citizens’ information. The SEAMLESS team studied existing standard attribute sets and compared their elements and possible application. The team also looked at a variety of sources describing the general application of metadata [2] [3] [4] [5] [6].

US MARC Community Information Format
This is the extension of the US MARC attribute set that covers cataloguing of community information. Further details about this attribute set are available at the Library of Congress website (http://lcweb.loc.gov/marc/community/eccihome.html).
 
Dublin Core
The Dublin Core seeks to establish a way to describe documents and “document-like objects” such as web pages, in a way which will enable search engines to index and retrieve them. Further information is available from the website ( http://purl.org/dc/ ).
 
GILS
The Government (or Global) Information Locator Service is the result of an international agreement (based on original work among government departments in the US) to provide a standard for locating information, whether held in libraries, data centres, or published on the Internet. The standard adopted for this service is ISO 23950, also known as (ANSI) Z39.50.[7] Further information is available from the website ( http://www.usgs.gov/gils/ ).
 
CIMI
Consortium for the Computer Interchange of Museum Information. Since 1990 CIMI has made substantial progress in the development of standards for structuring museums’ data and enabling widespread search and retrieval capabilities. Further information is available from the CIMI website ( http://www.cimi.org ).
 
IMS
Instructional Management Systems. The IMS Project is developing and promoting open specifications for facilitating online activities such as locating and using educational content, tracking learner progress, reporting learner performance and exchanging student records between administrative systems. Further information can be found on the IMS website (http://www.imsproject.org/what.html).
 

Development of the SEAMLESS profile

SEAMLESS was established with the intention that a wide range of types of organisation should be included, so it was important to ensure that the final system would be hospitable to different types of information and that it would meet the needs of varying types of organisation and the particular needs of their customers. In setting out to define a common information profile (attribute set) the project team contacted a wide variety of potential partner organisations, selected to include some who had expressed interest following the launch conference, some who had worked with the library service before, and some who, it was felt, would enhance the variety of information challenges for the pilot project.

Meetings were held with each organisation during Spring 1998 to give them more information about SEAMLESS and to collect information about their role and services, and a workshop was held in April. An Information Audit was carried out during June and July 1998 to analyse the organisations’ information products and systems in detail and to assist them in the selection of information sets to make available for the pilot project. This information was then collated and there followed an iterative process of developing a set of information attributes which was both broad enough to encompass the range of domains represented and suitably constrained so as to be manageable in the real world working environment of the organisations concerned.

Following the research into existing standards the team undertook a detailed analysis of the sample data supplied by partner organisations during the Information Audit. The team identified and mapped the various elements within each data set to establish overlaps and common terms. Research staff from Essex Libraries, Education for Change and Fretwell Downing then met to discuss the various options.

The original proposal for the SEAMLESS project postulated an information profile based upon the Dublin Core. Initial research within the project indicated that GILS provided a better basis for development. It was felt that it provided a more hospitable attribute set for the elements identified within the sample data than Dublin Core, while being less complicated to apply and offering more potential for accommodating future developments than USMARC. It is also compliant with the international standard for information searching, ISO 23950 (Z39.50), which is used in the project.

Having decided that GILS might be the standard to use, the research team then undertook detailed matching of the data obtained from partners in the Information Audit to the full GILS Core Elements. The profile had to be able to cope with elements from three data formats: databases, where every field would need to be tagged in order to be displayed; web pages, where only searchable elements were required; and Word documents, where again searchable elements were needed but where substantial editing might be required to produce useable data.

This work proved that the majority of data would fit into the GILS Core Elements. The major gap was for information relating to educational courses where there seemed to be nowhere to include information about entry requirements, resulting qualification, target audience or the duration or type of course.

The team therefore reconsidered the other extant standards and decided that the IMS profile included elements which would plug this gap. Following advice from Fretwell Downing four IMS elements were included in the SEAMLESS Information Profile as a Learning Provision Subset. In addition, discussions with the participating information providers indicated a desire to incorporate the Alta Vista format for the keyword and description attributes. These therefore appear in the SEAMLESS profile without the SEAMLESS prefix (se.), the intention being that these tags can be recognised by the Alta Vista robots as well as by SEAMLESS.
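
The prefix rule described above can be sketched in a few lines of Python. This is only an illustration: the article does not give the exact tag syntax, so the HTML META form used here (`<meta name="..." content="...">`) is an assumption, as are the function names and the sample values. The attribute names are taken from the profile table later in the article.

```python
from html import escape
from typing import Mapping

# Attributes written without the "se." prefix so that Alta Vista
# robots can recognise them as well as SEAMLESS (per the article).
UNPREFIXED = frozenset({"keywords", "description"})


def meta_name(attribute: str) -> str:
    """Return the tag name for a SEAMLESS attribute (assumed naming scheme)."""
    return attribute if attribute in UNPREFIXED else "se." + attribute


def render_meta_tags(record: Mapping[str, str]) -> str:
    """Render one HTML META tag per attribute, escaping names and values."""
    tags = []
    for attribute, value in record.items():
        name = escape(meta_name(attribute), quote=True)
        content = escape(value, quote=True)
        tags.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(tags)


if __name__ == "__main__":
    # Hypothetical record; the values are invented for illustration.
    print(render_meta_tags({
        "title": "Adult education in Essex",
        "keywords": "education",
        "description": "Evening classes for adults",
    }))
```

Keeping the unprefixed names in a single set means the rule lives in one place if further attributes ever need to be shared with general-purpose search engines.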

Matching also showed that for the majority of the data currently included in the project, the full set of GILS Core Elements was not required. GILS includes some quite complicated nested tags and requires some expertise to implement correctly. The intention is that partner organisations will add the tags themselves and the team was conscious that the process had to be simplified as much as possible. The workload involved in manipulating data for SEAMLESS had already been identified as a potential problem by many of the organisations and it was felt that any long and complicated tagging process might cause some organisations to drop out of the project.

After discussion with GILS experts at Fretwell Downing and Sebastian Hammer of Index Data, Denmark, the team developed a set of 33 SEAMLESS information attributes (the ‘SEAMLESS Information Profile’) which can for the most part be mapped directly onto the equivalent GILS Core Elements.

Details of the SEAMLESS profile

The 33 elements are (mandatory elements in bold type):

| Element No. | Name | Description |
|---|---|---|
| 1 | title | assigned title or description of the resource |
| 2 | source | the organisation or provider who is making the information available to SEAMLESS |
| 3 | date-last-modified | in the form DD/MM/YYYY |
| 4 | channel | term(s) from the SEAMLESS Channels list |
| 5 | keywords | term(s) from the SEAMLESS thesaurus |
| 6 | originator | the body primarily responsible for the intellectual content of the information |
| 7 | contact-name | the person to contact for more information |
| 8 | contact-organisation | the name of the organisation to contact for more information |
| 9 | contact-address | the address of the organisation to contact for more information |
| 10 | contact-network-address | email address to contact for more information |
| 11 | distributor | this element will apply mainly to bibliographic items |
| 12 | cost | cost information |
| 13 | begin-date | in the form DD/MM/YYYY |
| 14 | end-date | in the form DD/MM/YYYY |
| 15 | time-textual | time/date expressed in words |
| 16 | linkage | show URL, URI, SICI, PII, DOI, PURL, ISBN, ISSN etc. here |
| 17 | linkage-type | e.g. HTML, MIME, plain text etc. |
| 18 | medium | e.g. CD-ROM, Book, Video etc. |
| 19 | place | one term plus its post town, e.g. Chelmsford |
| 20 | description | a textual description relating to the general nature and content |
| 21 | contributor | e.g. co-author |
| 22 | date-of-publication-structured | in the form DD/MM/YYYY |
| 23 | date-of-publication-textual | date expressed in words |
| 24 | language | language of the intellectual content of the resource |
| 25 | general-constraint | e.g. copyright, use & reuse, intellectual property etc. |
| 26 | control-identifier | any local reference number that uniquely identifies the resource within its domain |
| 27 | record-review-date | in the form DD/MM/YYYY |
| 28 | supplemental-information | a field to map miscellaneous information |
| 29 | body | body text (where appropriate); basic formatting (white space) is preserved |
| | Learning provision sub-set | |
| 30 | ims.prerequisite | entry requirements for courses |
| 31 | ims.educationalobjective | qualification or intended learning result of course |
| 32 | ims.level | the target audience or level of the course |
| 33 | ims.duration | length of the course and/or the type of study e.g. full time, part time etc. |
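
Five of the elements above are specified "in the form DD/MM/YYYY". As a minimal sketch of how a partner organisation might check those values before submitting a record, the following Python fragment tests both the shape and the calendar validity of each date. The function names and the idea of a pre-submission check are assumptions for illustration, not part of the SEAMLESS system as described.

```python
import re
from datetime import datetime
from typing import List, Mapping

# Profile elements whose description reads "in the form DD/MM/YYYY".
DATE_ATTRIBUTES = frozenset({
    "date-last-modified",
    "begin-date",
    "end-date",
    "date-of-publication-structured",
    "record-review-date",
})

# Exactly two digits, two digits and four digits, separated by slashes.
_DATE_SHAPE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def is_valid_date(value: str) -> bool:
    """True if value has the DD/MM/YYYY shape and names a real calendar date."""
    if not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        return False
    return True


def invalid_dates(record: Mapping[str, str]) -> List[str]:
    """Return the names of date attributes in the record that fail the check."""
    return sorted(
        name
        for name, value in record.items()
        if name in DATE_ATTRIBUTES and not is_valid_date(value)
    )
```

The explicit shape test is there because `strptime` on its own would also accept unpadded values such as 1/6/1998.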

Mapping of SEAMLESS Profile Attributes to GILS Core Elements

The mappings are as shown in the table below. Note that where GILS provides several groupings of sub-elements, the decision was taken within the SEAMLESS project to provide a “flat” (i.e. non-nested) schema, which it was felt would ease the process of data preparation across a wide variety of locations and by staff with varying levels of technical understanding.

 

| SEAMLESS Element No. | SEAMLESS Element Name | GILS Element No. | Equivalent GILS Core Element |
|---|---|---|---|
| 1 | title | 4 | Title |
| 2 | source | 1019 | Record source |
| 3 | date-last-modified | 1012 | Date of last modification |
| 4 | channel | 2074 | Controlled Subject Index sub-group: Controlled term |
| 5 | keywords | 2074 | Controlled Subject Index sub-group: Controlled term |
| 6 | originator | 1005 | Originator |
| 7 | contact-name | 2023 | Point of Contact sub-group: Contact Name |
| 8 | contact-organisation | 2024 | Point of Contact sub-group: Contact Organization |
| 9 | contact-address | 2025 - 2029 | Point of Contact sub-group: Contact Street Address; Contact City; Contact State or Province; Contact Zip or Postal Code |
| 10 | contact-network-address | 2030 | Point of Contact sub-group: Contact Network Address |
| 11 | distributor | 2006 | Availability sub-group: Distributor Name |
| 12 | cost | 2055 | Order Process sub-group: Cost Information |
| 13 | begin-date | 2072 | Availability sub-group: Beginning Date |
| 14 | end-date | 2073 | Availability sub-group: Ending Date |
| 15 | time-textual | 2045 | Availability sub-group: Available Time Textual |
| 16 | linkage | 2021 | Availability sub-group: Linkage |
| 17 | linkage-type | 2022 | Availability sub-group: Linkage Type |
| 18 | medium | 1031 | Availability sub-group: Medium |
| 19 | place | 2042 | Spatial Domain sub-group: Place Keyword |
| 20 | description | 62 | Abstract |
| 21 | contributor | 1003 | Contributor |
| 22 | date-of-publication-structured | 31 | Date of Publication sub-group: Date of Publication Structured |
| 23 | date-of-publication-textual | 31 | Date of Publication sub-group: Date of Publication Textual |
| 24 | language | 54 | Language of Resource |
| 25 | general-constraint | 2005 | Use Constraint |
| 26 | control-identifier | 1007 | Control Identifier |
| 27 | record-review-date | 2051 | Record Review Date |
| 28 | supplemental-information | 2050 | Supplemental Information |
| 29 | body | None | None |
| | Learning provision sub-set | | |
| 30 | ims.prerequisite | None | SEAMLESS/IMS specific sub-group |
| 31 | ims.educationalobjective | None | SEAMLESS/IMS specific sub-group |
| 32 | ims.level | None | SEAMLESS/IMS specific sub-group |
| 33 | ims.duration | None | SEAMLESS/IMS specific sub-group |
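
Because the SEAMLESS schema is flat, the mapping above amounts to a simple lookup from attribute name to GILS element number. The Python sketch below shows one way that lookup might be held and applied. The element numbers are copied from the table; the function name, the return shape and the handling of unknown names are assumptions made for illustration, not a description of the project's software.

```python
from typing import Dict, List, Mapping, Optional, Tuple

# SEAMLESS attribute -> GILS element number, copied from the table above.
# None marks attributes for which the table gives no GILS element number.
SEAMLESS_TO_GILS: Dict[str, Optional[str]] = {
    "title": "4",
    "source": "1019",
    "date-last-modified": "1012",
    "channel": "2074",
    "keywords": "2074",
    "originator": "1005",
    "contact-name": "2023",
    "contact-organisation": "2024",
    "contact-address": "2025-2029",  # one flat field spanning several GILS elements
    "contact-network-address": "2030",
    "distributor": "2006",
    "cost": "2055",
    "begin-date": "2072",
    "end-date": "2073",
    "time-textual": "2045",
    "linkage": "2021",
    "linkage-type": "2022",
    "medium": "1031",
    "place": "2042",
    "description": "62",
    "contributor": "1003",
    "date-of-publication-structured": "31",
    "date-of-publication-textual": "31",
    "language": "54",
    "general-constraint": "2005",
    "control-identifier": "1007",
    "record-review-date": "2051",
    "supplemental-information": "2050",
    "body": None,
    "ims.prerequisite": None,
    "ims.educationalobjective": None,
    "ims.level": None,
    "ims.duration": None,
}


def to_gils(record: Mapping[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split a flat record into (GILS number, value) pairs and unmapped names.

    Names missing from the lookup are reported as unmapped as well.
    """
    mapped: List[Tuple[str, str]] = []
    unmapped: List[str] = []
    for name, value in record.items():
        number = SEAMLESS_TO_GILS.get(name)
        if number is None:
            unmapped.append(name)
        else:
            mapped.append((number, value))
    return mapped, unmapped
```

For example, a record holding `title`, `body` and `place` would yield pairs for elements 4 and 2042, with `body` reported as unmapped.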

Mapping of SEAMLESS Profile Attributes to Dublin Core Elements

During discussion several partners expressed concern about implementing a SEAMLESS attribute set which would not provide additional retrieval advantages in the wider web community beyond those systems already recognising GILS. There was some feeling, particularly in the academic organisations, that they did not wish to cut themselves off from the Dublin Core community. The team therefore decided to include a mapping of SEAMLESS attributes to Dublin Core Elements as part of the system. This is shown in the following table. For details of similar work see the ‘Dublin Core/MARC/GILS crosswalk’ [8].

 

| SEAMLESS ELEMENT | Purpose | DUBLIN CORE ELEMENT | Purpose |
|---|---|---|---|
| title | The assigned title or description of the resource. | Title | The name given to the resource by the Creator or Publisher. |
| originator | To identify the organisation(s) or person(s) responsible for the creation of the resource. | Creator | The person(s) or organisation(s) primarily responsible for the intellectual content of the resource. |
| keywords | To specify the subject or topic of the resource using a controlled vocabulary that describes its content for resource description and discovery purposes. | Subject | The topic of the resource, or keywords or phrases that describe the subject or content of the resource. |
| description | A textual description relating to the general nature and content of the resource. | Description | A textual description of the contents of the resource, including abstracts in the case of document-like objects or contents descriptions in the case of visual resources. |
| distributor | To identify the entity responsible for making the resource available in its present form such as a publishing house, university department or corporate entity. | Publisher | The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. |
| contributor | To identify other significant contributors to the intellectual content of the resource in addition to the originator. | Contributor | Person(s) or organisation(s) in addition to those specified in the Creator element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the Creator element. |
| date of publication | To show the date the resource was published. | Date | The date the resource was made available in its present form. |
| medium | To specify the physical format and data representation of the resource. | Type | The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. |
| linkage | To provide the location or address of an automatic linkage to an electronic resource. | Identifier | String or number used to uniquely identify the resource. |
| linkage-type | To identify the data content type associated with the electronic resource e.g. HTML for a web page, PDF for a Portable Document Format file. | Format | The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. |
| None at present (GILS: SOURCES OF DATA) | | Source | The work, either print or electronic, from which this resource is derived, if applicable. |
| language | To indicate to the user the language of the intellectual content of the resource. | Language | Language of the intellectual content of the resource. |
| None at present (GILS: CROSS REFERENCE RELATIONSHIP, CROSS REFERENCE LINKAGE) | | Relation | Relationship to other resources. |
| begin-date, end-date, time-textual, place | To indicate any start or end dates associated with the resource; to indicate the expression of dates and times in words; to indicate the location where the activity occurs. | Coverage | The spatial locations and temporal durations characteristic of the resource. |
| general-constraint | To indicate if any access constraints pertain to the use of the resource. | Rights | The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intention of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made if such a field is empty or not present. |
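
The Dublin Core mapping differs from the GILS one in that several SEAMLESS attributes can share a single Dublin Core element (Coverage, for instance). A minimal Python sketch of such a crosswalk follows. Two points are assumptions rather than statements from the table: that both date-of-publication attributes map to Date (the table lists only "date of publication"), and that values sharing an element are simply collected in a list. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping

# SEAMLESS attribute -> Dublin Core element, following the table above.
SEAMLESS_TO_DC: Dict[str, str] = {
    "title": "Title",
    "originator": "Creator",
    "keywords": "Subject",
    "description": "Description",
    "distributor": "Publisher",
    "contributor": "Contributor",
    # Assumption: both publication-date attributes map to Date.
    "date-of-publication-structured": "Date",
    "date-of-publication-textual": "Date",
    "medium": "Type",
    "linkage": "Identifier",
    "linkage-type": "Format",
    "language": "Language",
    "begin-date": "Coverage",
    "end-date": "Coverage",
    "time-textual": "Coverage",
    "place": "Coverage",
    "general-constraint": "Rights",
}


def to_dublin_core(record: Mapping[str, str]) -> Dict[str, List[str]]:
    """Group a record's values by Dublin Core element.

    Attributes with no Dublin Core equivalent in the table are skipped.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, value in record.items():
        element = SEAMLESS_TO_DC.get(name)
        if element is not None:
            grouped[element].append(value)
    return dict(grouped)
```

Source and Relation never appear in the output, since the table lists no SEAMLESS attribute for either.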

Searchable attributes

For the initial implementation the following attributes will be searchable:

keyword, subject, name, place and date.

Comments please

The SEAMLESS team would welcome comments on the proposed citizens’ information profile as outlined above from colleagues active in the fields of metadata and interoperability research and from public libraries and other organisations providing information to the public. Please contact either Mary Rowlatt (maryr@essexcc.gov.uk ) or the SEAMLESS team (seamless@essexcc.gov.uk ).


References:

[1] New Library - the Peoples’ Network
Library and Information Commission, 1997
Available from: http://www.lic.gov.uk/publications/newlibrary.html
[2] Dempsey, Lorcan and Heery, Rachel
Metadata: a current view of practice and issues
Journal of Documentation, Vol. 54(2), March 1998, p145 - 172
[3] European Commission, DGXIII -E4
Report of the Metadata Workshop held in Luxembourg, 1st and 2nd December, 1997
[4] Dempsey, Lorcan and Heery, Rachel, with contributions from Martin Hamilton, Debra Hiom, Jon Knight, Traugott Koch, Marianne Peereboom and Andy Powell
A review of metadata: a survey of current resource description formats
DESIRE 1, deliverable 3.2(1), March 1997
Available from: http://www.ukoln.ac.uk/metadata/desire/overview/
[5] Younger, Jennifer A
Resource description in the digital age
Library Trends, Vol. 45(3), Winter 1997, p462 - 487
[6] Heery, Rachel
Review of metadata formats
Program, Vol. 30(4), October 1996, p345 - 373
[7] ISO 23950 1998/ANSI/NISO Z39.50 1995
Information retrieval (Z39.50): application service definition and protocol specification
ISO, 1998
[8] Dublin Core/MARC/GILS crosswalk
Network Development and MARC Standards Office, last updated 04/07/97
Available from: http://www.loc.gov/marc/dccrocc.html

Author details

Mary Rowlatt
Information Services Manager and Project Leader for SEAMLESS
maryr@essexcc.gov.uk

Cathy Day
Research Assistant
SEAMLESS project
seamless@essexcc.gov.uk

Jo Morris
Research Assistant
SEAMLESS project
seamless@essexcc.gov.uk

Kevin Atkins
Network Services Consultant
Fretwell-Downing Data Systems Ltd.
katkins@fdgroup.co.uk