Unique Identifiers in a Digital World
Andy Powell reports on a seminar organised
jointly by Book Industry Communication and the UK Office for Library and Information
Networking on the use of unique identifiers in electronic publishing.
This article appears only in the Web version of Ariadne.
On the afternoon of Friday 14 March, more than 50 people involved in electronic
publishing met for a seminar reviewing recent developments in the unique identification of
digital objects. Delegates included representatives of publishers, libraries and other
organisations. The seminar was organised jointly by Book Industry Communication (BIC)
and the UK Office for Library and Information Networking (UKOLN) with support from
the eLib programme. A brief report follows:
Introduction - Why we need identifiers
Brian Green (BIC) and Mark Bide (Mark Bide and Associates) introduced the seminar
with an overview of why the publishing industry needs identifiers [1].
Unique identifiers for digital objects are an essential part of the technology that allows:
- electronic trading including rights transactions;
- copyright management;
- electronic tables of contents;
- production tracking and other in house administration;
- bibliographic control and resource discovery.
Several issues were highlighted:
- What level of 'granularity' is required? Traditionally publishers have worked at the
book or journal level, using the International Standard Book Number (ISBN) and
International Standard Serial Number (ISSN) as identifiers. However, the unit of
publication is getting smaller. Recent schemes allow for the identification of individual
articles within publications. Increasingly we need to identify much smaller fragments of
complete works, for example parts of text, images, video clips, pieces of software, etc.
- Identifiers are either 'dumb' or 'intelligent'. A dumb identifier has no inherent meaning
and can only be resolved by looking it up in a database. Intelligent identifiers contain
some meaning. Consider the ISBN. This is a relatively intelligent identifier because its
various parts have some meaning: the first part, for example, identifies the country,
language or geographic region in which the book was published. However, as book
rights are sold from one publisher to another, the intelligence of the ISBN decreases and
any particular ISBN can only be resolved by querying it against a central database. As
the unit of publication gets smaller, the number of identifiers required grows and it
becomes increasingly difficult to maintain intelligent identification schemes. The trend
is likely to be towards 'dumb' identifiers.
- It is important to distinguish between identification and location. The Uniform
Resource Locator (URL) that we are all familiar with on the Web is a locator rather
than an identifier. If an object moves, its associated URL changes and people using the
old URL are likely to get a failure indicating that it is no longer available. A true
identifier must remain the same whatever the current location of the object. The IETF
URN Working Group are in the process of defining Uniform Resource Names (URNs)
[2] which are persistent identifiers for information resources.
- How persistent should an identifier be? In the Internet world it is generally accepted
that identifiers for digital objects need to last for a long time - significantly longer than
the objects they identify. Indeed, they probably need to outlast current Internet
technology and computer systems.
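The 'intelligent' structure of the ISBN described in the list above can be made concrete with a short sketch. It assumes the ten-digit, hyphenated form of the ISBN (group, publisher, title and check character, verified with a modulus 11 check) and uses the ISBN quoted in reference [1] as sample input. The function names are illustrative only and do not come from the seminar.

```python
DIGITS = "0123456789"


def split_isbn10(isbn):
    """Split a hyphenated ten-digit ISBN into its four meaningful parts."""
    parts = isbn.split("-")
    if len(parts) != 4:
        raise ValueError("expected four hyphen-separated parts")
    if sum(len(part) for part in parts) != 10:
        raise ValueError("a ten-digit ISBN has exactly ten characters")
    group, publisher, title, check = parts
    return {"group": group, "publisher": publisher, "title": title, "check": check}


def isbn10_is_valid(isbn):
    """Verify the modulus 11 check character of a ten-digit ISBN."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # 'X' stands for ten, and only as the check character
        elif char in DIGITS:
            value = int(char)
        else:
            return False
        # Weights run from 10 (first character) down to 1 (check character).
        total += (10 - position) * value
    return total % 11 == 0


print(split_isbn10("1-873671-18-0"))
print(isbn10_is_valid("1-873671-18-0"))
```

A 'dumb' identifier would offer nothing to split in this way: software could only check that it is well formed and then look it up in a database.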
A further complication is that publishing is currently in a transitional period.
Publishers must continue to deal with traditional paper publications, while also being
involved with new electronic-only publications and with parallel (print and electronic)
publications.
The music industry is facing similar problems. In response the International Confederation
of Authors and Composers' Societies (CISAC) [3] has been developing the Common
Information System (CIS). This system includes identifiers for various manifestations of
content and for creators and publishers. A recent development is the International
Standard Work Code (ISWC) which identifies the musical composition itself, rather than
the recorded or printed expression of the work. It has been suggested that the ISWC
might be extended to cover literature and the visual arts as well. Creators and publishers
are identified by the Compositeur, Auteur, Editeur (CAE) number, which will be extended
and renamed the Interested Party (IP) number.
The Digital Object Identifier
Carol Risher (Association of American Publishers (AAP)) and Albert Simmonds (RR
Bowker) gave an overview of the Digital Object Identifier (DOI) [4]. Their presentation
included a video based largely on the first public demonstration of the DOI given in
February that showed documents and other files being retrieved on the Web using DOIs
rather than URLs. Development of the DOI is being performed by RR Bowker and the
Corporation for National Research Initiatives (CNRI) on behalf of the AAP.
A DOI contains two parts. The first part, known as the 'Publisher ID', indicates the
numbering agency and publisher and is assigned by the DOI Agency. The second part,
known as the 'Item ID', is assigned by the publisher and can be made up of any
alphanumeric sequence of characters. The use of an existing standard scheme in the Item ID, a
SICI or PII for example, is encouraged though some publishers may choose to use a
proprietary scheme. A DOI can be assigned to any digital object at a level of granularity
that is appropriate to the publisher. Typically this might mean that a separate DOI is
assigned to each component (text, image, sound, video) of a multimedia document.
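A minimal sketch of how software might separate the two parts of a DOI follows. The seminar report does not spell out the syntax, so the sketch assumes that the Publisher ID and the Item ID are separated by the first slash in the string. The sample DOI and the names `Doi` and `parse_doi` are made up for illustration.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    publisher_id: str  # assigned by the DOI Agency
    item_id: str       # assigned by the publisher, free-form


def parse_doi(doi):
    """Split a DOI at its first slash (assumed separator) into two parts."""
    publisher_id, separator, item_id = doi.strip().partition("/")
    if not separator or not publisher_id or not item_id:
        raise ValueError(f"not a two-part DOI: {doi!r}")
    return Doi(publisher_id, item_id)


# Hypothetical DOI: the Item ID is opaque to the parser and may itself
# contain slashes, so only the first slash is treated as the separator.
print(parse_doi("10.1000/chapter-3/figure-2"))
```

Because the Item ID is treated as an opaque string, the publisher remains free to put a SICI, a PII or a proprietary scheme in it.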
The DOI system has two parts - the 'DOI agency' and the 'DOI computers'. The DOI
agency assigns Publisher IDs, issues guidelines for DOI usage and works with the relevant
standards bodies to maintain the integrity of the system as a whole. The DOI computers
form a distributed system that resolves any DOI to its associated URL. The system is based
on the CNRI handle system [5]. Any user who knows the DOI of a digital object can
query the DOI Directory directly by typing it into a Web based search form. Typically,
however, DOIs are likely to be embedded in Web pages, hidden behind clickable buttons.
Queries to the DOI Directory are resolved and the client is passed directly to the
publisher's system.
The current state of the DOI system is as follows:
- the DOI system is real and can be used now;
- publisher procedures are still being formulated;
- DOIs tend to be long but in general will not be seen;
- the DOI system is free to readers;
- publishers will have to pay to register a Publisher ID with the DOI agency;
- a European agency is planned.
Once assigned, a DOI remains unchanged. If the ownership of an object changes, the new
owner registers the change with the DOI agency. If the object pointed to by the DOI
moves (that is, the URL changes), the DOI entry for that object can be updated.
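The separation of a persistent identifier from a changeable location can be sketched with a toy in-memory directory. This is only an illustration of the idea, not the handle system or the DOI Directory itself: the class `DoiDirectory`, its method names, the sample DOI and the example URLs are all hypothetical.

```python
class DoiDirectory:
    """Toy stand-in for a DOI directory: maps each DOI to a URL and an owner."""

    def __init__(self):
        self._records = {}

    def register(self, doi, url, owner):
        # Once assigned, a DOI is never reassigned to another object.
        if doi in self._records:
            raise ValueError(f"{doi} is already registered")
        self._records[doi] = {"url": url, "owner": owner}

    def resolve(self, doi):
        """Return the current URL for a DOI."""
        return self._records[doi]["url"]

    def owner(self, doi):
        return self._records[doi]["owner"]

    def move(self, doi, new_url):
        # The object moved: only the directory entry changes, not the DOI.
        self._records[doi]["url"] = new_url

    def transfer(self, doi, new_owner):
        # Ownership changed: the new owner is recorded against the same DOI.
        self._records[doi]["owner"] = new_owner


directory = DoiDirectory()
doi = "10.1000/example-item"  # made-up identifier
directory.register(doi, "http://publisher-a.example/articles/1", "Publisher A")
before = directory.resolve(doi)

directory.transfer(doi, "Publisher B")
directory.move(doi, "http://publisher-b.example/archive/1")
after = directory.resolve(doi)

print(before)
print(after)
```

A Web page that embeds the DOI keeps working after the move, whereas a page that embedded the original URL would not.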
It is anticipated that the charges associated with registering with the DOI agency will be
small enough that DOIs will be used in non-commercial areas of the Internet as well as by
commercial publishers. The DOI agency will assign Publisher IDs to individuals and other
organisations in addition to traditional publishers.
The DOI is non-proprietary and will be introduced to ISO in May. Development of the
DOI system will continue over the summer culminating in a full demonstration at the
Frankfurt Book Fair in October 1997.
The SICI and the BICI
Sandy Paul (SISAC/BISAC) gave an overview of the Serial Item and Contribution
Identifier (SICI) [6], a scheme for identifying serials and parts of serials. The scheme has
been in use since the late 1980s and is now widely used, mainly at the issue level, by a
broad range of publishers in EDI message transactions and by libraries and subscription
agents.
The original version of the SICI allowed an identifier to be assigned to each issue of a
serial (the Serial Item Identifier) and to each contribution (article) within a serial (the
Serial Contribution Identifier). Recently the SICI has been updated to identify fragments
other than articles (for example a table of contents, an abstract or an index) and to identify
particular physical formats. The SICI contains the ISSN of the serial.
A final draft of the Book Item and Component Identifier (BICI) [7] is now available. This
is essentially a book version of the SICI, using the ISBN in place of the ISSN. The BICI
can be used to identify a part, a chapter or a section within a chapter, or any other text
component, such as an introduction, foreword or index. It can also identify an entry in a
directory, encyclopaedia or similar work that is not structured into chapters.
The PII
Norman Paskin (Elsevier Science) gave an overview of the Publisher Item Identifier (PII)
[8] which was developed in 1995 by the Scientific and Technical Information (STI) group
of publishers. The requirements for the PII were:
- format independence;
- capability for future extension;
- one document per identifier, one identifier per document;
- easy to generate;
- generated by the publisher;
- minimal restrictions on applicability;
- compatible with other standards.
The PII is made up of 17 characters and contains the ISBN or ISSN in order to guarantee
uniqueness. It is a 'dumb' identifier that has capacity for 10,000 items per journal per
year. Future versions of the PII will have extensions to cover document components and
versions. Development of any new version of the PII will take account of developments in
other areas, for example the DOI system and URNs.
Some interesting figures were given for the numbers of identifiers required for the STI
area of publishing. Estimating 1 million articles per year, identifying all the versions of all
the components of those articles may require somewhere in the region of 10^14 identifiers!
[9]
Group Sessions
The seminar closed with three group sessions covering:
- Copyright management applications
- Using DOIs in the information supply chain
- DOI syntax and system.
These were followed by group reports and a plenary discussion. Some interesting issues
were raised.
- Should the DOI Agency be closely aligned to a country's ISBN agency?
- Should the publisher part of the Publisher ID be based on the Interested Party
number from the Common Information System (see above)?
- Can DOIs be assigned to off-line (for example CD-ROM based) digital objects?
Yes.
- Does the DOI have any relevance to traditional print-only publications? No.
- What happens to 'dead' DOIs?
- How does the DOI system cope with digital objects that are mirrored across several
sites? The DOI resolves to the URL of a page that contains a list of pointers to the
mirrored resources.
It was generally agreed that the group sessions could have gone on for far longer than the
45 minutes allocated and that follow-up meetings in specific areas may be required.
This was an interesting seminar and thanks are due to Brian Green (BIC) and Rosemary
Russell (UKOLN) for organising a very successful event.
References
[1] Unique Identifiers: a brief introduction, Brian Green and Mark Bide, ISBN 1-873671-18-0
http://www.bic.org.uk/bic/uniquid
[2] IETF URN Working Group,
http://www.bunyip.com/research/ietf/urn-ietf/
[3] International Confederation of Authors and Composers' Societies (CISAC),
http://www.cisac.org/
[4] Digital Object Identifiers,
http://www.doi.org/
[5] CNRI Handle System,
http://www.handle.net/
[6] SICI standard,
http://sunsite.Berkeley.EDU/SICI/
[7] A Standard Identifier for Book Items and Contributions - draft (Report prepared for BIC and the British National Bibliography Research Fund), David Martin - available after 21 April 1997
http://www.bic.org.uk/bic/bici.html
[8] The PII as a means of Document identification,
http://www.elsevier.nl/inca/homepage/about/pii/
[9] Information Identifiers, Norman Paskin,
Learned Publishing (vol 10 issue 2, pp 135-156)
Author Details
Andy Powell,
Technical Development and Research Officer,
Email: A.Powell@ukoln.ac.uk
Web page: http://www.ukoln.ac.uk/~lisap/
Tel: +44 1225 323933
Address: UKOLN,
University of Bath, Bath, BA2 7AY

Material on this page is copyright Ariadne/original
authors.
This article last updated/links checked on 8-Apr-1997