This article is based on a presentation given at the Innovations in Reference Management workshop, January 2010.</p> </div> </div> </div> <!-- v3: amended in light of author's further final-read revisions 2010-02-12-11-11 rew --><p>It seems fair to say that the lion's share of work on developing online tools for reference and citation management by students and researchers has focused on familiar types of publication. They generally comprise resources that can be neatly and discretely bound in the covers of a book or journal, or their electronic analogues, like the Portable Document Format (PDF): objects in established library or database systems, with ISBNs and ISSNs underwritten by the authority of formal publication and legal deposit.</p> <p>Yet, increasingly, native Web resources are also becoming eminently citable, and managing both the resources, and references to them, is an ongoing challenge. Moreover, the issues associated with referencing this kind of material have received comparatively little attention, beyond introducing the convention that includes the URL and the date it was accessed in bibliographies. While it may be hard to quantify the "average lifespan of a web page" [<a href="#1">1</a>], what is undeniable is that Web resources are highly volatile and prone to deletion or amendment without warning.</p> <p>Web Preservation is one field of endeavour which attempts to counter the Web's transient tendency, and a variety of approaches continue to be explored. 
The aim of this article is to convey the fairly simple message that many themes and concerns of Web preservation are equally relevant in the quest for effective reference management in academic research, particularly given the rate at which our dependence on Web-delivered resources is growing.</p> <p>Digital preservation is, naturally, a strong theme in the work of the University of London Computer Centre (ULCC)'s Digital Archives Department, and Web preservation has featured particularly strongly in recent years. This article will draw upon several initiatives with which we have been involved recently. These include: the 2008 JISC Preservation of Web Resources Project (JISC-PoWR) [<a href="#2">2</a>], on which we worked with Brian Kelly and Marieke Guy of UKOLN; our work for the UK Web Archiving Consortium; and the ongoing JISC ArchivePress Project [<a href="#3">3</a>] (itself, in many ways, a sequel to JISC-PoWR).</p> <p>Another perspective that I bring is as a part-time student myself, on the MSc E-Learning programme at Edinburgh University. As a consequence I have papers to read, and write, and a dissertation imminent. So for this reason too I have a stake in making it easier to keep track of information for reading lists, footnotes and bibliographies, whether with desktop tools or Web-based tools, or through features in online VLEs, databases and repositories.</p> <p>To bring more clarity to such discussions, the PILIN Project has devised an abstract model of identifiers and identifier services, which is presented here in summary. Given such an abstract model, it is possible to compare different identifier schemes, despite variations in terminology; and policies and strategies can be formulated for persistence without committing to particular systems. The abstract model is formal and layered; in this article, we give an overview of the distinctions made in the model. 
This presentation is not exhaustive, but it presents some of the key concepts represented, and some of the insights that result.</p> <p>The main goal of the Persistent Identifier Linking Infrastructure (PILIN) project [<a href="#1">1</a>] has been to scope the infrastructure necessary for a national persistent identifier service. There are a variety of approaches and technologies already on offer for persistent digital identification of objects. But true identity persistence cannot be bound to particular technologies, domain policies, or information models: any formulation of a persistent identifier strategy needs to outlast current technologies, if the identifiers are to remain persistent in the long term.</p> <p>For that reason, PILIN has modelled the digital identifier space in the abstract. It has arrived at an ontology [<a href="#2">2</a>] and a service model [<a href="#3">3</a>] for digital identifiers, and for how they are used and managed, building on previous work in the identifier field [<a href="#4">4</a>] (including the thinking behind URI [<a href="#5">5</a>], DOI [<a href="#6">6</a>], XRI [<a href="#7">7</a>] and ARK [<a href="#8">8</a>]), as well as semiotic theory [<a href="#9">9</a>]. The ontology, as an abstract model, addresses the questions 'what is (and isn't) an identifier?' and 'what does an identifier management system do?'. This more abstract view also brings clarity to the ongoing conversation of whether URIs can be (and should be) universal persistent identifiers.</p> <h2 id="Identifier_Model">Identifier Model</h2> <p>For the identifier model to be abstract, it cannot commit to a particular information model. The notion of an identifier depends crucially on the understanding that an identifier only identifies one distinct thing. But different domains will have different understandings of what things are distinct from each other, and what can legitimately count as a single thing. 
(This includes aggregations of objects, and different versions or snapshots of objects.) In order for the abstract identifier model to be applicable to all those domains, it cannot impose its own definitions of what things are distinct: it must rely on the distinctions specific to the domain.</p> <p>This means that information modelling is a critical prerequisite to introducing identifiers to a domain, as we discuss elsewhere [<a href="#10">10</a>]: identifier users should be able to tell whether any changes in a thing's content, presentation, or location mean it is no longer identified by the same identifier (i.e. whether the identifier is restricted to a particular version, format, or copy).</p> <p>The abstract identifier model also cannot commit to any particular protocols or service models. In fact, the abstract identifier model should not even presume the Internet as a medium. A sufficiently abstract model of identifiers should apply just as much to URLs as it does to ISBNs, or names of sheep; the model should not be inherently digital, in order to avoid restricting our understanding of identifiers to the current state of digital technologies. This means that our model of identifiers comes close to the understanding in semiotics of signs, as our definitions below make clear.</p> <p>There are two important distinctions between digital identifiers and other signs which we needed to capture. First, identifiers are managed through some system, in order to guarantee the stability of certain properties of the identifier. This is different to other signs, whose meaning is constantly renegotiated in a community. Those identifier properties requiring guarantees include the accountability and persistence of various facets of the identifier—most crucially, what is being identified. For digital identifiers, the <strong>identifier management system</strong> involves registries, accessed through defined services. 
An HTTP server, a PURL [<a href="#11">11</a>] registry, and an XRI registry are all instances of identifier management systems.</p> <p>Second, digital identifiers are straightforwardly <strong>actionable</strong>: actions can be made to happen in connection with the identifier. Those actions involve interacting with computers, rather than other people: the computer consistently does what the system specifies is to be done with the identifier, and has no latitude for subjective interpretation. This is in contrast with human language, which can involve complex processes of interpretation, and where there can be considerable disconnect between what a speaker intends and how a listener reacts. Because the interactions involved are much simpler, the model can concentrate on two actions which are core to digital identifiers, but which are only part of the picture in human communication: working out what is being identified (<em>resolution</em>), and accessing a representation of what is identified (<em>retrieval</em>).</p> <p>So to model managing and acting on digital identifiers, we need a concept of things that can be identified, names for things, and the relations between them. (Semiotics already gives us such concepts.) We also need a model of the systems through which identifiers are managed and acted on; what those systems do, and who requests them to do so; and what aspects of identifiers the systems manage.</p> <p>Our identifier model (as an ontology) thus encompasses:</p> <ul> <li><strong>Entities</strong> - including actors and identifier systems;</li> <li><strong>Relations</strong> between entities;</li> <li><strong>Qualities</strong>, as desirable properties of entities. 
Actions are typically undertaken in order to make qualities apply to entities.</li> <li><strong>Actions</strong>, as the processes carried out on entities (and corresponding to <strong>services</strong> in implementations).</li> </ul> <p>An individual identifier system can be modelled using concepts from the ontology, with an identifier system model.</p> <p>In the remainder of this article, we go through the various concepts introduced in the model under these classes. We present the concept definitions under each section, before discussing issues that arise out of them. <em>Resolution</em> and <em>Retrieval</em> are crucial actions for identifiers, whose definition involves distinct issues; they are discussed separately from other Actions. We briefly discuss the standing of HTTP URIs in the model at the end.</p> <p>It was a Tuesday, over coffee, that the esteemed editor of this publication presented me with a copy of <em>Website Optimization</em> and asked if I would be interested in reviewing it. Two days later, at a regular team meeting for the Repositories Support Project [<a href="#1">1</a>] (RSP), we discussed (rather generally) how we might boost the search ranking and usage of the RSP Web site. 
The only interesting persistent identifiers are also persistently actionable (that is, you can "click" them); however, unlike a simple hyperlink, persistent identifiers are supposed to continue to provide access to the resource, even when it moves to other servers or even to other organisations.</p> <p><a href="http://www.ariadne.ac.uk/issue56/tonkin" target="_blank">read more</a></p> issue56 feature article emma tonkin Tue, 29 Jul 2008 23:00:00 +0000 DCC Workshop on Persistent Identifiers http://www.ariadne.ac.uk/issue44/dcc-pi-rpt <div class="field field-type-text field-field-teaser-article"> <div class="field-items"> <div class="field-item odd"> <p><a href="/issue44/dcc-pi-rpt#author1">Philip Hunter</a> gives a personal view of this workshop held in Glasgow, 30 June - 1 July, supported by NISO, CETIS, ERPANET, UKOLN and the DCC.</p> </div> </div> </div> <p>A Digital Curation Centre (DCC) Meeting on Persistent Identifiers was held over 30 June - 1 July 2005 at the Wolfson Building at the University of Glasgow. This is a new construction (2002) just opposite the 1970s Boyd-Orr building, mentioned before in <em>Ariadne</em>'s pages. 
The architecture of this building is quite unlike that of the Boyd-Orr building, however: it is light and airy, with more imaginative use of space, and the lecture theatre in which the meeting took place is in the shape of an eye, situated at the edge of the main open space.

The term "Cool URI" comes from advice provided by W3C (the World Wide Web Consortium). The paper "Cool URIs don't change" <a href="#ref-01">[1]</a> begins by saying: