Librarian / Programmer
Yale Center for Medical Informatics
Towards Library Groupware With Personalised Link Routing
'Library groupware' - a set of networked tools supporting information management for individuals and for distributed groups - is a new class of service we may choose to provide in our libraries. In its simplest form, library groupware would help people manage information as they move through the diversity of online resources and online communities that make up today's information landscape. Complex implementations might integrate equally well with enterprise-wide systems such as courseware and portals on a university campus, and desktop file storage on private individual computers. Ideally, successful library groupware should provide individuals and groups with a common set of information functions they may apply to any information they find anywhere.
In this article we make a simple case for library groupware as a unifying service model across disparate information environments. We consider the distributed, personalised collection development model that groupware would serve, and propose an architectural model which might provide a first step in an evolutionary path from today's commonplace digital library services towards integrated library groupware.
Why Do We Need Groupware?
Consider three networked applications that are already used constantly: link resolvers, which short-cut access from one Web resource to related resources or library services; bibliographic reference managers, which enable users to manage records about information resources they might need to reference again; and weblogs, which let anyone write whatever they want about anything they like. Support for each of these applications varies widely in today's libraries. Link resolvers are centralised tools used via the Web by users and library staff to connect licensed resources and library services; traditional tools for reference management are desktop applications introduced to library visitors via bibliographic instruction, although recent versions and new products make Web-based reference management possible; weblogs are typically managed by weblog users themselves, with only a few libraries or campus computing services providing weblog support.
It is interesting to examine the relationships between these tools and what they help users to do. For example, is following a cited reference link to a link resolver the same kind of action as following a link on someone's weblog? Are citing a work in a peer-reviewed paper and citing a work on a weblog the same action, or are they different somehow? Because the support levels libraries provide for each kind of application vary widely, it might seem natural to consider that these applications and their functions are quite different. But it also seems likely that to the library users following and citing many references from many sources as they manage the bibliographic lifecycle of their ongoing work, the functions these applications provide are quite similar.
In a fluid world where users move regularly between informal discussion and scholarly/research domains, we can consider the functional areas of linking, reference management, and weblogging to be service points on a single continuum of information gathering, study, and creation. Following a reference from a weblog or from a scholarly article are each similar steps in exploring threads of related ideas. Capturing a reference in your own weblog or reference library indicates that the citation somehow relates to your own thought process. Publicly citing a reference more closely associates your thinking with that of others.
The link resolver solution works because it simplifies navigation through diverse library resources. There are so many online resources with so many different interfaces that it can become nearly impossible for users to move naturally through the threads of ideas embodied in the content of those resources without link resolvers. Libraries that provide reference-linking services with link resolvers provide navigational clarity to this sea of interface complexity. Resolver services also let librarians customise the connections between the formally published resources contained within the centralised information space defined by library collections. Although these are major improvements for users and librarians, the benefits are limited to the use of centrally collected library resources.
The broader information landscape - including library resources among weblogs, pre-print archives, and decentralised information resources and repositories mingling with enormous desktop computing power and storage on private devices - is where users and groups find, collect, and use information today. We would do well to consider how we might bring better navigational clarity and the ability to customise connections to this more diverse and decentralised information landscape.
Formalising Personal Collection Development
The increasing network-savvy of information consumers, always connected in multi-user gaming, chat, and file-sharing environments, signals a shift away from a model of centralised collection development. To consider the relationship between these newer patterns of information usage and traditional library collection development is to realise quickly that we have enlarged the idea of what collection development means. More than ever, libraries are sharing collection development responsibilities with library users. As decision-making about how to organise information expands from the centre (libraries) to the edge (users and user groups), we need to find ways to make the resources libraries provide fit more easily into a larger and more dynamic information landscape.
We are beginning to see efforts addressing this need. The Interactive University at UC Berkeley is building the Scholar's Box application to help users better integrate digital resources from libraries with other information sources and tools. The Scholar's Box makes it easier to create personalised and themed collections of digital cultural objects for use as research and learning materials. Benefits of such a tool include simplifying integration of digital primary source materials into teaching and learning, and simplifying integration of user-built collections with other end-user and institutional tools for managing and sharing information.
The Scholar's Box application enables these functions by bundling:
- the ability to import, export, and transform collections packaged using contemporary standards such as the Metadata Encoding and Transmission Standard (METS) and the IMS Content Packaging specification (IMS-CP),
- the ability to search databases internal (library public access catalogues) and external (Amazon, Google) to academic environments, and
- the ability to publish collections in standard document formats and to personalised resources like weblogs.
A core motto of the Scholar's Box project is 'Gather, Create, Share.' This motto speaks of the need to put more control in the hands of individuals to select information from a diversity of sources, to collect and organise that information as they see fit, and to enable broad use in a manner not limited by the boundaries of traditional systems and individual applications. These objectives match those already sought by the aforementioned information consumers as they navigate their own information landscapes.
Managing Information Across Communities
The most prominent current example of individuals and groups defining the shape of their own information landscapes is the tremendous growth of weblogs. Weblogs are no longer just the realm of undergraduates talking about their online friends and social lives; scholars now use them for both scholarly and avocational reasons. Some use weblogs to keep up with their own academic work and that of their colleagues. Other academics use weblogs for personal material or to write about personal opinions that may or may not have to do with their scholarly work. These sites can be particularly illuminating, as academics seem to feel freer to express their opinions in their own places on the Web. Among technologists and scientists, Web pages and weblogs are (and have been for a while) quite common. As scientific communication has moved online, scientists have begun to post pages with reprints, supplemental materials, and other publications more frequently. In the last few years a larger community of social scientists, humanists, and scholars from many other disciplines has also moved online, especially as new software has made starting and contributing to weblogs easier.
We are in the early stages of understanding how people use weblogs for research, but it seems clear that weblogs have already become essential methods of interaction in academia, where weblogs help academics to connect, augmenting formal scholarly communications. In a way, weblogs represent the informal end of the continuum from formal to informal scholarly communications: starting at the other end with peer-reviewed publications, we can envision pre-print and e-print archives, institutional repositories, online community forums, mailing lists, and weblogs as tending to have varying degrees of formality depending on the level and character of administrative policy, peer review, and institutional stewardship brought to each. Weblogs can also be seen as a new tool for controlling and personalising both formal and informal aspects of research and teaching. Keeping weblog pages with results of saved searches or tables of contents, for instance, is an easy way to link storage and sharing with traditional information-seeking tools. Course weblogs are also proving to be useful, directing students to Internet resources on certain topics, and allowing teachers to post material and get feedback (often through comments) from students. In this context, the boundaries between scholarly communication systems, weblogs, and dedicated courseware systems as teaching tools are not so clearly drawn.
Indeed, much like bibliometric techniques for formal communication systems, tools for connecting weblogs to each other and to other information services are already in widespread use. 'TrackBack,' for example, allows a weblog author to connect their own comments directly to others' posts:
"In a nutshell, TrackBack was designed to provide a method of notification between websites: it is a method of person A saying to person B, 'This is something you may be interested in.' To do that, person A sends a TrackBack ping to person B... the TrackBack ping has created an explicit reference between my site and yours. These references can be utilized to build a diagram of the distributed conversation. Say, for example, that another weblogger posted her thoughts on what I wrote, and sent me a TrackBack ping. The conversation could then be traced from your original post, to my post, then to her post. This threaded conversation can be automatically mapped out using the TrackBack metadata." 
TrackBack closely resembles the citation practices followed in scholarly and other publishing contexts for generations. That members of the blogosphere have defined techniques for accomplishing this indicates that people are perhaps more willing than ever to speak informally, and to speak publicly, in ways that bolster connections forward (by leaving TrackBack ping URLs) for others as readily as backward (by citing preceding sources).
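The 'threaded conversation' described in the quotation can be illustrated with a small sketch. It assumes each ping has already been recorded as a pair of URLs (the post that sent the ping, and the post that received it); the URLs, the `pings` list, and both function names are hypothetical and are not part of TrackBack itself.

```python
from collections import defaultdict

# Each ping is (pinging_post, pinged_post): the first post comments on the second.
# All URLs are made-up placeholders.
pings = [
    ("http://example.org/my-post", "http://example.org/your-post"),
    ("http://example.org/her-post", "http://example.org/my-post"),
]


def build_replies(pings):
    """Map each pinged post to the list of posts that sent it a ping."""
    replies = defaultdict(list)
    for source, target in pings:
        replies[target].append(source)
    return replies


def trace_thread(root, replies, depth=0, seen=None):
    """Return (depth, url) pairs for the conversation rooted at `root`."""
    seen = set() if seen is None else seen
    if root in seen:  # guard against posts that ping each other in a loop
        return []
    seen.add(root)
    thread = [(depth, root)]
    for reply in replies.get(root, []):
        thread.extend(trace_thread(reply, replies, depth + 1, seen))
    return thread


thread = trace_thread("http://example.org/your-post", build_replies(pings))
for depth, url in thread:
    print("  " * depth + url)
```

Walking the recorded pings from the original post outward reproduces the path the quotation describes: your post, then my post, then her post.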
Other tools gaining prominence include Blogdex, which generates a summary of popular links anywhere on the Web by analysing the outward link patterns from weblogs, and Technorati, which provides an impact factor-like ranking of weblogs by inbound links from other weblogs. Delicious, Furl and the authors' unalog are 'shared link logs' allowing distributed individuals and groups to quickly categorise and share bookmarks and recently read links. Biologging directly connects weblogging to the PubMed database by allowing users of a custom PubMed interface to add entries for interesting citations to a shared weblog.
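The idea of ranking by inbound links can be sketched in a few lines. This is only an illustration of the general counting technique, not a description of how Blogdex or Technorati actually compute their lists; the weblog names, URLs, and the `rank_by_inbound` function are all invented for the example.

```python
from collections import Counter

# Hypothetical data: each weblog mapped to the set of URLs it links out to.
outbound_links = {
    "blog-a.example": {"http://blog-b.example/", "http://news.example/story"},
    "blog-b.example": {"http://news.example/story"},
    "blog-c.example": {"http://blog-b.example/", "http://news.example/story"},
}


def rank_by_inbound(outbound_links):
    """Count how many distinct weblogs link to each URL, most-linked first."""
    counts = Counter()
    for links in outbound_links.values():
        for url in links:
            counts[url] += 1
    # Sort by descending count, then by URL so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_inbound(outbound_links)
for url, count in ranking:
    print(count, url)
```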
These new services and tools indicate that increasingly people want to share information about what they are reading, what they have to say about it, and what others have to say about what they say. Many new services such as weblogs are quickly becoming mainstream, as major institutions such as Harvard Law School and MIT are bringing up public weblog services for their community members. Information-sharing innovations are also coming from within academia. The University of Minnesota Libraries, for example, has added their community weblogging system UThink as a target in their link resolver system, so users can post a citation directly onto their own weblogs.
In the current library software marketplace, where digital library services can include metasearch portals with citation clipboards and UThink with private weblogs connected to link resolvers, it seems clear that we are in the middle of a wave of innovation and integration of these new services. The common thread running through these innovations is that each new service helps individuals move and connect more kinds of information from more diverse resources through the various information communities in which they participate. We are still at a stage where each innovation adds value within a well-defined community or information context, even while we are learning that we will have to meet the needs of users who regularly move between formal and informal communities, and between public and private contexts. Before long, our ability to meet these users' needs will be limited by our inability to allow users to create and connect information sources and services as they see fit.
A Simple Architectural Solution: Personalised Link Routing
Because these services have so much in common, it seems likely that one or more common architectural patterns could help formalise the roles and relationships of each. One view of how to build these services can be found in a reconception of our first-generation link resolution systems. In most deployments (aside from UThink), link resolvers take a single anonymous user at a single library from one information source to one of many services of use for that source: from a reference to a full-text article, for instance, or from an article to a cited reference list, or from a metadata record to an inter-library loan form. Hence the term 'resolver': the system reports which services - as pre-defined by librarians - are available relative to a given information object, and resolves a user service choice by redirecting users to their chosen site. The entire transaction is stateless, in that which service one user chooses for any arbitrary source has no effect on his next choice for a different source, or on the next user's available choices for the same source. For each source item, one set of suggested services for that source appears, and usually one service request is then resolved.
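The stateless transaction described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the behaviour of any particular resolver product: the service names, the `example.edu` URLs, the rule that full text requires an ISSN, and the citation fields are all hypothetical.

```python
from urllib.parse import urlencode

# Librarian-defined services: name -> base URL (all hypothetical).
SERVICES = {
    "full_text": "https://fulltext.example.edu/lookup",
    "cited_references": "https://citations.example.edu/lookup",
    "interlibrary_loan": "https://ill.example.edu/request",
}


def list_services(citation):
    """Report which pre-defined services apply to this citation.

    Assumed rule for illustration: full text is offered only when an ISSN is known.
    """
    available = ["cited_references", "interlibrary_loan"]
    if citation.get("issn"):
        available.insert(0, "full_text")
    return available


def resolve(citation, choice):
    """Return the redirect URL for the chosen service.

    Nothing is stored: there is no user argument and no memory of earlier calls.
    """
    if choice not in list_services(citation):
        raise ValueError(f"service not available: {choice}")
    return SERVICES[choice] + "?" + urlencode(citation)


citation = {"issn": "1234-5678", "volume": "5", "issue": "4"}
print(list_services(citation))
print(resolve(citation, "full_text"))
```

Because neither function takes a user or keeps any record, two calls with the same citation always offer the same choices, which is the limitation the following paragraphs address.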
There are many potentially interesting artifacts from these transactions, such as usage logs, and analysis thereof, or anecdotal user feedback. But typically there is no remembered state, in that there is no attempt within each separate transaction for the system to recognise the user involved (aside from simple authentication and authorisation in, say, an off-campus proxy context), and there is no attempt to determine any preferences that user might have for potential service targets. A user cannot specify her own source and target categories; she is left to enter the process only from - and exit only to - sources and targets defined by library staff. There is no opportunity for insertion of per-user rules that will trigger secondary services for a given source link, such as link logging to a private or group weblog, or automated subject-specific indexing based on the referring source. Users cannot configure resolvers to automate services to be performed, for instance, in the background at the same time as they select a resolution service (e.g. 'log all my links automatically but this time I want to read the full text'). And users cannot stipulate that, perhaps, they want resolution to happen within a different library's resolver (e.g. 'I'm just visiting this university library for the week, please forward these requests to the resolver at my home institution').
These missing features can be summarised in the phrase 'personalised link routing.' 'Personalised' means the addition of functions that will vary depending on the user, either as predetermined by librarians or as specified by the user. 'Link routing' means arbitrary rewiring of the current single transaction paradigm (source -> service list -> target). 'Personalised link routing' would allow hooks at and between each phase of the current resolver pipeline, which would support multiple, arbitrary, parallel, or sequential actions to be specified by either librarians, as at present, or by users.
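One way to picture hooks at each phase of the pipeline is the sketch below. It is an assumption-laden illustration rather than a proposed design: the `LinkRouter` class, the three phase names, and the example hooks are hypothetical, and librarian-defined hooks are modelled simply as hooks registered with no user.

```python
from collections import defaultdict

# The three phases of the current pipeline: source -> service list -> target.
PHASES = ("source", "service_list", "target")


class LinkRouter:
    def __init__(self):
        # hooks[owner][phase] -> list of actions; owner None means "librarian default".
        self.hooks = defaultdict(lambda: defaultdict(list))

    def add_hook(self, phase, action, user=None):
        """Register an action at a phase, for everyone (user=None) or one user."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self.hooks[user][phase].append(action)

    def route(self, user, reference, chosen_target):
        """Run librarian hooks, then the user's own hooks, at each phase in order."""
        context = {"user": user, "reference": reference, "target": chosen_target}
        owners = [None] if user is None else [None, user]
        results = []
        for phase in PHASES:
            for owner in owners:
                for action in self.hooks[owner][phase]:
                    results.append(action(context))
        return results


link_log = []


def log_link(context):
    """Per-user secondary action: 'log all my links automatically'."""
    link_log.append((context["user"], context["reference"]))
    return "logged"


def redirect(context):
    """Librarian-defined primary action: send the user to the chosen target."""
    return "redirect:" + context["target"]


router = LinkRouter()
router.add_hook("target", redirect)                # applies to every user
router.add_hook("source", log_link, user="alice")  # applies to alice only

results = router.route("alice", "doi:10.0000/example", "full_text")
anonymous_results = router.route(None, "doi:10.0000/example", "full_text")
print(results, anonymous_results, link_log)
```

An anonymous request still behaves like a first-generation resolver, while the recognised user gets her background logging in addition to the redirect she chose.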
To explore this model further, here are examples of how some personalised and routable actions might be wired in to the various steps:
- Adding a bibliographic reference to a weblog
As UThink demonstrates, this can be a new target added to the list of resolver services. In UThink, the resolver offers to send the reference to a campus-based weblog. In contrast, a personalised router would expand on this by allowing a user to customise a list of additional targets, such as other weblogs, and then to send the bibliographic reference to one or more of those. Weblog targets need only to be able to parse, store, and later render the reference. To become a link router source, the weblog target needs to be able to send the reference back to the router, or resolve the bibliographic reference itself.
- Routing to multiple targets
From the previous example, it is also easy to imagine sending reference information to multiple targets, such as courses in a courseware server. In this scenario, a lecturer could quickly add current news links to a list of course readings, and simultaneously route the same references to both his own personal reference library, and a group link log shared with colleagues from the same discipline. Ideally these diverse options would be presented in the same screen where users now see a choice between resolver targets. The lecturer could choose to send references automatically to his own reference library by default, and from one reference to the next, which other resources (the course reading list, the group link log) he chooses might vary.
- Visiting researcher router bounce
For a visiting researcher without a local account, the router could offer temporary guest access, during which the researcher could specify his home institution. Within the same session, router lookups could be automatically bounced to his home library's router, where that router's knowledge of the researcher's own preferences would manage what happens next - saving the reference in one of his course collections, for instance. This model of 'bouncing' or 'chaining' routers makes sense in today's world, where remote resources guess a user's affiliation based on IP addresses (and, thus, might continue first to send the visiting researcher to the local link router, rather than his home library's router, even if he enters databases from his home library's home page). Request 'chaining' might come into play if the researcher wants both to read a full-text article locally (in the institution he is visiting) and also store the reference in his home institution course collection or perhaps publish it to a weblog.
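The multiple-target and 'bounce' scenarios above can be sketched together. Everything here is hypothetical: the `Router` class, its attribute names, and the target names are invented for illustration, and delivery is modelled as appending to a list rather than contacting a real weblog, courseware server, or remote router.

```python
class Router:
    def __init__(self, name):
        self.name = name
        self.default_targets = {}  # user -> targets always used for that user
        self.home_routers = {}     # guest user -> that user's home Router
        self.delivered = []        # (user, target, reference) records

    def route(self, user, reference, chosen_targets=(), visited=None):
        """Deliver to default and chosen targets, then bounce to a home router."""
        visited = set() if visited is None else visited
        if self.name in visited:  # stop two routers bouncing to each other forever
            return
        visited.add(self.name)

        # Defaults first, then whatever the user picked for this reference.
        targets = list(self.default_targets.get(user, []))
        targets += [t for t in chosen_targets if t not in targets]
        for target in targets:
            self.delivered.append((user, target, reference))

        # Chain the request to the user's home institution, if one is known.
        home = self.home_routers.get(user)
        if home is not None:
            home.route(user, reference, visited=visited)


home = Router("home-library")
home.default_targets["lecturer"] = ["reference_library"]

# At home: the default target plus one target chosen for this reference.
home.route("lecturer", "news-link-1", ["course_reading_list"])

# Visiting: read full text locally, and bounce the reference home as well.
visited_library = Router("visited-library")
visited_library.home_routers["lecturer"] = home
visited_library.route("lecturer", "article-2", ["full_text"])

print(home.delivered)
print(visited_library.delivered)
```

In this sketch the home router, not the visited one, holds the lecturer's preferences, so the bounced request picks up his default reference library without the visited library needing to know about it.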
This 'personalised routing' model seems very conducive to imagining additional scenarios and information paths, with a variety of potential connections between information and services in any of these systems. For instance, an instructor using a system like Scholar's Box to build collections for use in teaching could seamlessly route found items to colleagues, or to a reference library for later use in authoring research articles. The same instructor could also route information in the other direction, from a reference library or a colleague's weblog back into one or more teaching collections.
To build personalised link routers, two implementation paths are available which involve enhancements to existing systems. The first, and most straightforward, would involve layering personalised routing services onto existing resolvers. Enhancing link resolution rule engines to add new hooks in the different request phases and more flexible routing/bouncing/chaining should be feasible. Layering in user and group management services should also be feasible, especially if we leverage recent work such as the Open Knowledge Initiative or Shibboleth specifications, among others, for which enterprise-class implementations are available or under development.
A second implementation path might involve integration with MyLibrary and UPortal-type systems, which are 'personalisation' services by definition. Interesting questions along this path might include how close a binding might be necessary between personalised link routers and portals. Should all the personalisation happen in a portal, with routers just serving as rule engines for service resolution? Or should portal systems just be tuned to be well-behaved sources and targets, with personalised routing functions living in the routers? It is easy to imagine different institutions with different I.T. administration models preferring a design that allows different pieces to live under different management branches.
In considering either of these implementation models, issues of distributed storage, security, portability and descriptive models quickly come to the fore. Fortunately, significant progress is being made on each of these issues, and it seems possible that modular solutions might be ready for integration very soon. Indeed, modular separation of services and easy integration was a core design principle largely responsible for the success of the link resolver paradigm. Ideally, it should also be possible to integrate next-generation groupware services with external non-library toolkits (with 'non-library' meaning 'blogosphere and otherwise from the general Internet community'). As highlighted earlier in this article, many of these technical innovations occur far away from libraries. Supporting users who define and implement their own processing models reinforces the pattern of increasingly distributed collection development.
When faced with the difficulties involved in integrating link resolvers, federated search engines, courseware servers, and other contemporary systems, the library community has solved one set of problems related to collection development and navigation. At the same time, we have amplified the integration problem by investing in our own incompatible resources with their own administration and navigation nightmares. If we can be successful in delivering a second-generation user front-end to these disparate services and resources that successfully integrates with how users move through and manage information in 2004, we will have taken a significant step toward a vision of integrated groupware. The idea of 'library groupware' suggests a change in library services and philosophy, but helping users manage information across their diverse personal collections and information communities remains true to the core mission of libraries.
- The canonical overview of the general link resolver model is Van de Sompel H., Hochstenbach, P., "Reference Linking in a Hybrid Library Environment: Part 1: Frameworks for Linking", D-Lib Magazine 5(4), April 1999, http://dlib.org/dlib/april99/van_de_sompel/04van_de_sompel-pt1.html
- An excellent overview of the social shift toward "information consumption" can be found in 2003 OCLC Environmental Scan: Pattern Recognition, available online at http://www.oclc.org/membership/escan/
- More information on Scholar's Box can be found at http://iu.berkeley.edu/IU/SB and http://raymondyee.net/wiki/ScholarsBox
- Glenn D., "Scholars Who Blog: The soapbox of the digital age draws a crowd of academics." The Chronicle of Higher Education 49(39), June 2003, available online at http://chronicle.com/free/v49/i39/39a01401.htm
- Trott M., Trott B., "A Beginner's Guide to TrackBack." Available online at http://www.movabletype.org/trackback/beginners/
- Blogdex, available online at http://blogdex.net/
- "Top 100 Technorati," available online at http://technorati.com/cosmos/top100.html
- Delicious, available online at http://del.icio.us/
- Furl, available online at http://furl.net/
- unalog, available online at http://unalog.org/
- Biologging, available online at http://www.biologging.com/
- Nackerud S., "Post Database Citations in Your Blog!" http://blog.lib.umn.edu/archives/000477.html
- Open Knowledge Initiative specifications, available online at http://web.mit.edu/oki/specs/
- Shibboleth, available online at http://shibboleth.internet2.edu/