Common Web tools and techniques cannot easily manipulate library resources. While photo sharing, link logging, and Web logging sites make it easy to use and reuse content, barriers still exist that limit the reuse of library resources within new Web services. [1][2] To support the reuse of library information in Web 2.0-style services, we need to allow many types of applications to connect with our information resources more easily. One such connection is a universal method to copy any resource of interest. Because the copy-and-paste paradigm resonates with both users and Web developers, it makes sense that users should be able to copy items they see online and paste them into desktop applications or other Web applications. Recent developments proposed in weblogs [3][4] and discussed at technical conferences [5] [6] suggest exactly this: extending the 'clipboard' copy-and-paste paradigm onto the Web. To fit this new, extended paradigm, we need to provide a uniform, simple method for copying rich digital objects out of any Web application.
The initial Microsoft Live Clipboard specification provides a straightforward way of accomplishing Web clipboard copy and paste. It uses a combination of JavaScript-based in-browser code and an XML wrapper for item content, to provide users with clipboard functionality for some common types of objects defined by microformat specifications [7]. Early work using the Live Clipboard technique in the National Science Digital Library, a U.S. National Science Foundation programme, integrates this technique with more robust digital library protocols, providing for clipboard copy and paste of complex digital objects in software environments with robust architectures [8]. These are tremendously exciting innovations and demonstrate the potential of this approach.
The Live Clipboard demonstrations show simple objects, such as event or business card-like information, being copied between commonly used Web sites and desktop applications. The NSDL demonstration moves complex objects [9][10] between what are presumed to be scholarly communications tools. Both sets of demonstrations are alike in that they are driven by interface events: users click and choose clipboard actions from menus. However, the Web 2.0 approach calls for an additional blurring of the lines between user- and machine-driven operations, so it is necessary to devise an approach that also allows software to drive clipboard-style copy and paste functions on users' behalf.
In an automated processing model that supports scripted copying of objects found on Web pages, the following three functional criteria must be met:
Without a standard way for software to identify objects on Web pages, scripts must resort to screen scraping and other unsustainable techniques for guessing where objects start and end. The same logic applies to the requirements of a standard way to find an API entry point, and a common definition of an API for retrieving objects. Without these, third-party applications have to hard-code or guess at the locations and protocols offered by the plethora of Web 2.0 and digital library APIs and their various implementations across the Web.
Fortunately, the digital library community already comes close to satisfying each of these requirements. Protocols such as OAI-PMH [11] and OpenURL [12] each provide frameworks for implementing services that support standardised object fetching through an API. The COinS convention for embedding OpenURL ContextObjects in the HTML SPAN element [13] provides a standard way of identifying objects on Web pages whenever a ContextObject contains an identifier reference.
Members of the gcs-pcs-list [14] first experimented in this area by starting with these digital library tools, because they were already available and well-known. To provide scriptable object copying from Web pages, we combined COinS (with identifiers) on Web pages with JavaScript-based calls to the OAI-PMH functions ListMetadataFormats and GetRecord. To enable JavaScript code to find OAI-PMH services automatically, we added HTML LINK tags pointing to relevant OAI-PMH services for our resources, following the pattern for feed auto-discovery now widely implemented across the Internet [15]. These experiments took the form of Greasemonkey scripts [16] which, upon finding COinS with identifiers and OAI-PMH LINK elements, would automatically query the OAI-PMH services' ListMetadataFormats functions and present users with direct links to OAI-PMH GetRecord functions for each available format [17].
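The two OAI-PMH calls described above can be sketched as plain URL construction. This is a minimal illustration in Python rather than the original Greasemonkey JavaScript; the base URL and item identifier are hypothetical, and only the verb and argument names (ListMetadataFormats, GetRecord, identifier, metadataPrefix) come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


def list_metadata_formats_url(base_url, identifier):
    # ListMetadataFormats takes an optional 'identifier' argument, which
    # restricts the answer to formats available for that one item.
    return oai_pmh_url(base_url, "ListMetadataFormats", identifier=identifier)


def get_record_url(base_url, identifier, metadata_prefix):
    # GetRecord requires both 'identifier' and 'metadataPrefix'.
    return oai_pmh_url(
        base_url, "GetRecord",
        identifier=identifier, metadataPrefix=metadata_prefix,
    )


# Hypothetical values, for illustration only.
BASE = "http://example.org/oai"
ITEM = "oai:example.org:item/42"

print(list_metadata_formats_url(BASE, ITEM))
print(get_record_url(BASE, ITEM, "oai_dc"))
```

A script following this pattern would first request the formats list, then offer one GetRecord link per metadata prefix it finds there.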
This worked quite well, and provided an interesting set of demonstrations. We wrote connectors for a variety of well-known Web sites - the Library of Congress American Memory collections, the arXiv.org pre-print service, Google Books, Amazon.com, and more [18]. These connectors, combined with the 'get this item in formats X, Y, or Z' links that were automatically written into Web pages for users to click, showed great promise and interested our colleagues.
The main problem with this approach was the difficulty explaining how it worked, especially to those less familiar with library-specific technologies. Despite the widespread adoption of OpenURL and the proliferation of OAI-PMH-based content and service providers, few people in the library profession understand the steps necessary to implement these services, and far fewer people outside the library profession can successfully wade through the jargon necessary to understand either. Even if each were readily understood, still more barriers make this approach unlikely to succeed. Firstly, relatively few resources are actually available over OAI-PMH; among the few collections with OAI-PMH interfaces, most typically provide access only to metadata, not full objects [19]. Secondly, many OAI-PMH providers use item identifiers unique to metadata records, and therefore items are not cross-referenced by more widely-known content identifiers. Given these conditions, it would be a mistake to presume that this approach could quickly scale to provide bare object access to a much larger swathe of library resources.
Ultimately we want to provide automated access to our resources through clearly defined and familiar techniques that can be implemented in only a few hours of work by a typical Web developer. These rapid implementations need to accommodate both the data provider making resources accessible and the downstream clients that need to access such resources. For such a framework to succeed, these techniques must be understandable to the Web community at large without prior knowledge of specific digital library standards.
We addressed this problem by writing a much simpler specification that meets the requirements listed above and remains easy to understand and implement. The 'unAPI' specification [20] is less than two pages long and defines only three components, one to address each of the functional criteria listed previously:
We developed this specification on the public gcs-pcs-list between January and June 2006. Revisions were released nearly every month during this period and made publicly available at http://unapi.info. At every revision stage, after discussing and deciding on issues collected along the way, participants developed at least three independent test implementations similar to those described in the next section. This helped ensure that the specification was indeed manageable, and it enabled us to understand the issues raised during implementation. We completed and published unAPI Version 1 on 23 June 2006 [21].
The unAPI specification itself contains a simple informative example, excerpted here. The unAPI HTML convention for identifying objects in Web pages is patterned after the technique developed by the microformats.org community for combining machine-readable data values as attribute values in ABBR elements with human-readable representations of that data as text content inside the ABBR:
<abbr class="unapi-id"
title="http://unapi.info/news/archives/9"></abbr>
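A client can collect these identifiers with an ordinary HTML parser. The following is a minimal sketch in Python using only the standard library; the class and function names are illustrative and not part of the unAPI specification, and the surrounding page markup is made up.

```python
from html.parser import HTMLParser


class UnapiIdParser(HTMLParser):
    """Collect the title values of ABBR elements carrying class 'unapi-id'."""

    def __init__(self):
        super().__init__()
        self.identifiers = []

    def handle_starttag(self, tag, attrs):
        if tag != "abbr":
            return
        attributes = dict(attrs)
        # An element may carry several space-separated class names.
        classes = (attributes.get("class") or "").split()
        title = attributes.get("title")
        if "unapi-id" in classes and title:
            self.identifiers.append(title)


def find_unapi_ids(html):
    parser = UnapiIdParser()
    parser.feed(html)
    parser.close()
    return parser.identifiers


# The ABBR element is the example from the specification; the enclosing
# paragraph is invented for illustration.
PAGE = """<p>A news entry
<abbr class="unapi-id"
title="http://unapi.info/news/archives/9"></abbr></p>"""

print(find_unapi_ids(PAGE))
```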
The unAPI LINK autodiscovery pattern mimics the pattern used by Web browsers to discover news feeds:
<link rel="unapi-server"
type="application/xml" title="unAPI"
href="http://unapi.info/news/unapi.php" />
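Discovering the service entry point works the same way as feed autodiscovery: scan the page for the LINK element and read its href. The sketch below, in Python with the standard library only, assumes that a relative href should be resolved against the page URL; that behaviour and all names in the code are assumptions for illustration, not requirements quoted from the specification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class UnapiServerParser(HTMLParser):
    """Remember the href of the first LINK element with rel 'unapi-server'."""

    def __init__(self):
        super().__init__()
        self.server_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.server_href is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "unapi-server" in rels and attributes.get("href"):
            self.server_href = attributes["href"]


def find_unapi_server(html, page_url):
    parser = UnapiServerParser()
    parser.feed(html)
    parser.close()
    if parser.server_href is None:
        return None
    # Assumption: resolve relative hrefs against the page's own URL.
    return urljoin(page_url, parser.server_href)


# The LINK element is the example from the specification; the HEAD wrapper
# is invented for illustration.
PAGE = """<head>
<link rel="unapi-server"
type="application/xml" title="unAPI"
href="http://unapi.info/news/unapi.php" />
</head>"""

print(find_unapi_server(PAGE, "http://unapi.info/news/archives/9"))
```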
The unAPI HTTP functions comprise a 'list all object formats' function with no parameters, a 'list formats for a particular object' function with an identifier parameter, and a 'get a particular format for a particular object' function with identifier and format parameters. The first two functions return a simple XML response listing formats: the first lists the formats supported for all items available from the unAPI service, and the second lists the formats available for the identified object. For example, a call to an unAPI service such as this:
http://unapi.info/news/unapi.php?id=http://unapi.info/news/archives/9
...might return an XML response like this:
<?xml version="1.0" encoding="UTF-8"?>
<formats id="http://unapi.info/news/archives/9">
<format name="oai_dc" type="application/xml"
docs="http://www.openarchives.org/OAI/2.0/oai_dc.xsd"/>
<format name="mods" type="application/xml" />
</formats>
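The three functions differ only in which parameters are present, so a client needs very little code to address them and to read the formats list. The following Python sketch uses the parameter names id and format as they appear in the example URLs; the function names are illustrative. It percent-encodes the identifier, which the example call above does not do; a server decoding the query string should see the same identifier value either way.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def unapi_url(server, identifier=None, fmt=None):
    """Build the URL for one of the three unAPI functions.

    No arguments: list formats for all objects.
    identifier only: list formats for that object.
    identifier and fmt: fetch the object in that format.
    """
    if fmt is not None and identifier is None:
        raise ValueError("a format request also needs an identifier")
    params = {}
    if identifier is not None:
        params["id"] = identifier
    if fmt is not None:
        params["format"] = fmt
    return f"{server}?{urlencode(params)}" if params else server


def parse_formats(xml_bytes):
    """Return one dict of attributes per format element in the response."""
    root = ET.fromstring(xml_bytes)
    return [dict(element.attrib) for element in root.iter("format")]


SERVER = "http://unapi.info/news/unapi.php"
ITEM = "http://unapi.info/news/archives/9"

# The example response from the specification, as bytes off the wire.
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<formats id="http://unapi.info/news/archives/9">
<format name="oai_dc" type="application/xml"
docs="http://www.openarchives.org/OAI/2.0/oai_dc.xsd"/>
<format name="mods" type="application/xml" />
</formats>"""

print(unapi_url(SERVER, ITEM))
for entry in parse_formats(RESPONSE):
    print(entry["name"], "->", unapi_url(SERVER, ITEM, entry["name"]))
```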
The following examples all implement unAPI Version 1. Each example includes a link to user-visible records and unAPI links to one of the records in that view. Follow the links to see for yourself what unAPI looks like.
Sample record view:
http://dev.gapines.org/opac/extras/opensearch/1.1/-/html-full/title/Tales+of+the+gross+and+gruesome
Sample unAPI formats list:
http://dev.gapines.org/opac/extras/unapi?id=tag:dev.gapines.org,2006:biblio-record_entry/307171/-
Sample object via unAPI:
http://dev.gapines.org/opac/extras/unapi?id=tag:dev.gapines.org,2006:biblio-record_entry/307171/-&format=marcxml

Figure 1: Screenshot of Evergreen Record
Sample record view:
http://ualweb.library.ualberta.ca/uhtbin/cgisirsi/x/0/0/57/5?user_id=WUAARCHIVE&searchdata1=1565847547%7B020%7D
Sample unAPI formats list:
http://chelsea.library.ualberta.ca/unapi/server?id=2623311
Sample object via unAPI:
http://chelsea.library.ualberta.ca/unapi/server?id=2623311&format=mods

Figure 2: Cocoon pipeline for SRU Proxy
Sample record view:
http://canarydatabase.org/record/488?view=export
Sample unAPI formats list:
http://canarydatabase.org/unapi?id=http://canarydatabase.org/record/488
Sample object via unAPI:
http://canarydatabase.org/unapi?id=http://canarydatabase.org/record/488&format=bibtex

Figure 3: Canary Database record export links
Umlaut unAPI service:
http://umlaut.library.gatech.edu/unapi?
Sample unAPI formats list:
http://umlaut.library.gatech.edu/unapi?id=ctx_ver%3DZ39.88-2004%26ctx_enc%3Dinfo%253Aofi%252Fenc%253AUTF-8%26rft_id%3Dinfo%253Adoi%252F10.1038%252F438531a
Sample object via unAPI:
http://umlaut.library.gatech.edu/unapi?id=ctx_ver%3DZ39.88-2004%26ctx_enc%3Dinfo%253Aofi%252Fenc%253AUTF-8%26rft_id%3Dinfo%253Adoi%252F10.1038%252F438531a&format=umlaut-xml
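The Umlaut identifiers look dense because they appear to be encoded twice: the OpenURL ContextObject is first serialised as percent-encoded key/value pairs, and that whole string is then percent-encoded again to travel as the value of the id parameter. The Python sketch below reproduces the formats-list URL above under that reading; the function names are illustrative, and the two-layer interpretation is inferred from the URL itself rather than taken from Umlaut documentation.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

SERVER = "http://umlaut.library.gatech.edu/unapi"

# The ContextObject fields recovered from the sample URL.
FIELDS = [
    ("ctx_ver", "Z39.88-2004"),
    ("ctx_enc", "info:ofi/enc:UTF-8"),
    ("rft_id", "info:doi/10.1038/438531a"),
]


def formats_url(server, fields):
    # Layer 1: serialise the ContextObject as key=value pairs.
    context_object = urlencode(fields)
    # Layer 2: encode that whole string as a single query parameter value.
    return f"{server}?id={quote(context_object, safe='')}"


def decode_identifier(url):
    # Undo layer 2 to recover the ContextObject string, then layer 1
    # to recover the individual fields.
    context_object = parse_qs(urlsplit(url).query)["id"][0]
    return {key: values[0] for key, values in parse_qs(context_object).items()}


url = formats_url(SERVER, FIELDS)
print(url)
print(decode_identifier(url))
```

Identifiers that are plain URLs or short tokens, as in the other examples, need only the second layer.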
Sample Flickr.com record view:
http://flickr.com/photos/dchud/31568800/
Sample unAPI formats list:
http://opa.onebiglibrary.net/?id=http://flickr.com/photos/dchud/31568800/
Sample object via unAPI:
http://opa.onebiglibrary.net/?id=http://flickr.com/photos/dchud/31568800/&format=jpeg_Medium
Sample blog view:
http://lackoftalent.org/michael/blog/
Sample unAPI formats list:
http://lackoftalent.org/michael/blog/unapi.php?id=oai:lackoftalent.org:technosophia:45
Sample object via unAPI:
http://lackoftalent.org/michael/blog/unapi.php?id=oai:lackoftalent.org:technosophia:45&format=marcxml

Figure 4: MARCXML from WordPress
Sample blog entry view:
http://unapi.info/news/archives/16
Sample unAPI formats list:
http://unapi.info/news/unapi.php?id=http%3A//unapi.info/news/archives/16
Sample object via unAPI:
http://unapi.info/news/unapi.php?id=http%3A//unapi.info/news/archives/16&format=oai_dc
Greasemonkey scripts: unAPI links are not necessarily visible to users by default. Client-side page rewriting scripts can provide a visual indicator of available unAPI links and formats where such indicators might not otherwise be present in page design. The following Greasemonkey scripts both provide visual indicators of available unAPI links using different styles; both are based on the same code.
To use these scripts, first install Greasemonkey [16], restart your Firefox browser, and then install one or both of these scripts. Both will work on all of the sample record views listed above.

Figure 5: Web-based unAPI Validator
Like any other standards development process, the unAPI specification process had its share of difficult and contentious issues. Ultimately we chose the simplest solutions that would work in the widest possible set of applications. Some of these issues included:
With unAPI Version 1 complete, it is now available for general use. We have started to experiment by combining the unAPI copying functions with the Atom Publishing Protocol or with the Live Clipboard as a paste function. For example, we enhanced the unalog social bookmarking application to copy out objects found via unAPI in bookmarked pages, and to paste these objects in using Atom. Objects pasted into unalog are then available to other users in both the unalog user interface and through a new unAPI interface in unalog. Figure 6 demonstrates this; the images, from Flickr, were obtained through OPA's unAPI interface and its JSON object wrapping format.

Figure 6: unAPI Copy and Atom Paste in unalog.
As we move forward with implementing unAPI Version 1, we continue to watch related developing techniques such as microformats and HTTP header Link Templates [29]. If a microformat for identifying arbitrary identifiers in HTML or a similar technique within HTML itself emerged, unAPI would not need to specify use of the ABBR pattern. Similarly, if Link Templates or another technique made API patterns more easily discovered and specified, unAPI would not even need to define its own parameter names or LINK element semantics. If all of these missing pieces were to appear in widely accepted solutions, the unAPI object-copy paradigm could exist as a mere one-paragraph convention (a significant reduction from the present length of one and a half pages). In the meantime, we believe that unAPI Version 1 can help to get more out of library - or any other - Web applications. It follows the Unix traditions of doing one thing well and being easily connected and combined to form more complex functionality. We hope that it proves useful and helps to bring the library community closer to the level of simplified integration demanded by users today.
Ariadne is published every three months by UKOLN. UKOLN is funded by MLA, the Museums, Libraries and Archives Council, and by the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath, where it is based. Material referred to on this page is copyright Ariadne (University of Bath) and original authors.