Web Magazine for Information Professionals

The Networked Library Service Layer: Sharing Data for More Effective Management and Cooperation

Janifer Gatenby identifies criteria for determining which data in various library systems could be more beneficially shared and managed at a network level.

Libraries’ collections fall into three parts: physical, digital and licensed. These are managed by multiple systems: the ILS (Integrated Library System), ERM (Electronic Resource Management), digital management systems, digital repositories, resolvers, inter-library loan and reference systems. At the same time, libraries are increasingly co-operating in collecting and storing resources. This article examines how to identify data that are best located at global, collective and local levels. An example is explored, namely the benefits of moving data from different local systems to the network level so as to manage acquisition of the total collection as a whole and in combination with consortia members. Also covered is how to achieve rapid development of standards to plug existing gaps that are hindering system interoperability.

Evolution of Library Management Systems

The integrated library management system (ILMS, or just ILS) was conceived in the 1970s, when library collections were purely physical and library users visited the library to consult and borrow materials. The ILS was conceived as an ideal state in which all systems within the library would share common data, centred on the bibliographic metadata describing the collection. It was the successful solution to duplication across separate cataloguing, acquisitions and circulation systems and, as bonus by-products, it made possible management statistics and the online public access catalogue (OPAC) [1].

The library’s ILS operated as an independent and isolated system, even within the library’s own institution. Interaction with external libraries and systems was limited. Union catalogues existed for co-operative cataloguing, but the records were downloaded to the library’s ILS and public discovery was mainly at the local level, supplemented by abstract and index (A&I) databases. The union catalogues were also the vehicle for inter-library loan, which was usually seen as a supplementary service for privileged users. Purchase orders were sent electronically to library suppliers.

The central importance of the ILS remained relatively unchanged until the turn of the twenty-first century, though the ILS evolved to interface with self-service, theft-detection and institutional authentication systems. Portals came into favour with the capacity to present users with one interface for searching multiple targets, typically the local library, A&I databases and the databases of other libraries. However, the most significant changes have arisen in the period roughly since 1998, with the Internet, the World Wide Web and the growth of digital publishing eliciting new, more powerful portals, systems that manage electronic licences, digital repositories, physical inter-library loans, resolvers that link from citations to full text, and question-and-answer services.

Thus, somewhere in the period 1998-2008, the ILS no longer merited its capital “I”: it no longer integrated a library’s processes in one system. Large libraries now have multiple systems, which are often better coupled with external data sources than with their peer systems; the original problem of the 1970s has re-manifested itself. Andrew Pace [2] put it this way: “We end up with a non-integrated or disintegrating suite of services, databases, lists and content … which are a huge mess”. Lorcan Dempsey [3] separates the library collection into three: the bought collection (managed by the ILS), the licensed collection (managed by the electronic resource management system, ERM, and the resolver) and the digital collection (managed sometimes by multiple systems, e.g. one for digitised materials and one for the institutional repository). These multiple systems are increasingly expensive and the ensemble is clumsy to maintain.

Current Library Priorities

As libraries respond to the challenges of a three-part collection and a user community expecting national and increasingly global discovery and delivery, they are forming larger co-operatives. Consortia are creating virtual shared physical collections, treating all user requests equally and allowing unmediated requests; examples include the PiCarta [4] service of the Dutch union catalogue, GBV [5] and Libraries Australia’s [6] Copies Direct. The consortia may operate via union catalogues, such as WorldCat [7], TEL [8], Libraries Australia, GBV and Sudoc [9], or via virtual union catalogues. Co-operative stores are also emerging: in the U.S., 21% of library stores now serve collectives [10][11]. Consortia are also sharing digital resources and rationalising digital subscriptions, though as Müller warns [12], licensing too often poses barriers to inter-library access. Examples of digitisation co-operatives are numerous, including The European Digital Library (9 national libraries) [13], Louisiana Digital Library (19 libraries) [14], Memòria Digital de Catalunya (17 libraries) [15] and the Arizona Memory Project [16].

With physical collections, close library co-operation was geographically bound, often constrained by the limits of an internal courier service. This is no longer so: libraries can make alliances to share electronically with libraries anywhere on the globe that have similar or complementary collections. Thus international co-operation has moved from the sidelines to centre stage. This has resulted in the phenomenal growth of WorldCat [7] in 2006 and 2007 with the loading of large union catalogues from Europe, Australasia and Africa and of the former RLG [17] database.

In recognition of user preferences for digital material that can be delivered immediately and remotely, libraries are now realising that they must shift their focus from the physical to the digital collection [18] by allocating more of the budget to digital collections and services. To this end they are increasingly collecting digital materials, buying licensed access to digital materials and creating their own digital content. Their main library management systems, based on an architecture conceived in the 1970s for entirely physical collections, are not well adapted to the collection, management, exposure and delivery of digital material.

Matching Systems to Priorities

The architecture of the ILS is a product of its time, when storage was expensive, communications were slow, narrow-band and less robust, when the World Wide Web was non-existent and terms such as social networking, mashups and syndication had totally different meanings. The ILS was designed to ingest data (‘into the building’) but not to emit it, nor to access external data. Yet now, from the standpoint of libraries, linking externally is imperative and the nature of the links is diverse: links to other libraries, services, search engines, social network sites, archives, discovery sites, encyclopaedias, virtual learning environments (VLEs), e-commerce systems, distance learning systems, etc.

It is time to re-examine the architecture of the ILS for smoother integration into the current environment. The Digital Library Federation [19] has taken the initiative via two groups, the ILS Discovery Interface Group and the Electronic Resource Management Initiative, both producing recommendations for interoperability of ILS and ERM systems. These recommendations seek to make an immediate improvement in data availability and reusability. At the same time it is also necessary to take a longer-term view by re-considering the data within all current library systems, including those that grew up alongside the ILS, as the ILS itself changes.

Optimum Data Storage

One way to re-examine the architecture of the ILS and other library systems is to look at the data that are held and assess the optimum level of storage for those data. Are the data sharable, poolable or private? The following characteristics indicate that there are benefits in sharing or aggregating data:

Data that are not sharable include:

In the case of dynamic and sensitive data, summary, snapshot and anonymised data may be shared where there is a perceived benefit.

There is a clear need to define data at multiple levels:

The following tables illustrate the various categories of library data that are managed by various local and network systems. The columns indicate various data types and data sources. The two rows underneath the columns represent enquiry and maintenance functions performed on the data.

Core data (sources):
* direct members
* union catalogues
* QA dept

Mined data:
* Dewey
* QP history
* audience level
* holdings count (work + levels)
* collections
* identities
* fiction finder

Other harvested data:
* reviews
* biographies
* publisher data
* circulation data

Articles

Social data:
* tags
* reviews
* lists

Links:
* work clusters
* external links from URIs
* external links via search

Maintenance functions: add online, add in background, upload / harvest, modify / enhance, download, feeds
Enquiry functions: enquire online / enquire by programme

Table 1: Bibliographic and Authority Data supporting Search, Present, Select, Evaluate, Annotate, Download

Discovery operates increasingly at a network level. The ILS is designed to be the first point of discovery, but that is not the first choice of users. OCLC’s report on the perceptions of libraries and information resources [20] indicates that only 2% of users start at a library portal whereas 84% start with a search engine. Libraries have largely digested these statistics and as a result are increasingly exposing their collections in search engines, either directly or via worldcat.org [21]. Shunning the OPACs of ILS systems, some libraries have adopted independent portals such as Encore [22], Primo [23], Endeca ProFind [24] and WorldCat Local, a localised view of worldcat.org [7]. However, the path from discovery to delivery is still far from smooth for the user. One need is to position the ILS at a secondary point in the discovery-to-delivery sequence; that is, to make locational, availability and statistical usage information available to external discovery and delivery systems.
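For illustration, the kind of availability lookup an ILS could expose to external discovery and delivery systems might be sketched as follows; the field names, identifiers and holdings data are hypothetical, not a published schema:

```python
import json

# Hypothetical local holdings store; a real ILS would query its own database.
HOLDINGS = {
    "ocm123": [
        {"location": "Main Library", "call_number": "025.04 GAT", "on_loan": False},
        {"location": "Store", "call_number": "025.04 GAT c.2", "on_loan": True},
    ]
}

def availability(record_id):
    """Return locational and availability information for one record,
    in a form an external portal could consume as JSON."""
    copies = HOLDINGS.get(record_id, [])
    return {
        "record_id": record_id,
        "copies": len(copies),
        "available": sum(1 for c in copies if not c["on_loan"]),
        "locations": [c["location"] for c in copies if not c["on_loan"]],
    }

print(json.dumps(availability("ocm123")))
```

Wrapped in any ordinary Web framework, such a function would let an external portal answer “can I get this, and where?” without screen-scraping the OPAC.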

Holdings data

Knowledge Base:
* Dbase sources
* article issue links
* serial patterns

Library Registry:
* services
* policies
* resolvers
* addresses

Copyright Registry

Registry of digital masters

User data:
* addresses
* courses
* privileges
* preferences
* payments
* circulation

Maintenance functions: add online, add in background, upload / harvest, modify / enhance, download, feeds
Enquiry functions: enquire online / enquire by programme / access

Table 2: Locate and Deliver; User circulation

Most of these data files reside at the local level but are candidates for moving to the network level, particularly registry and knowledge base information. User data and circulation are exceptions, but external access should be possible. Circulation log information stripped of borrower identification may be loaded centrally and combined to indicate popularity of resources. The Danish Danbib [25] and Slovenian COBISS [26] systems collect loan statistics from their nation’s local library systems and combine them with the union catalogue to provide services such as the Slovenian “best read books”.
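The anonymisation step can be sketched simply: borrower identifiers are dropped before loan counts leave the local system, so only resource popularity reaches the network. Field names below are illustrative:

```python
from collections import Counter

def popularity_from_logs(circulation_logs):
    """Aggregate anonymised loan counts per bibliographic record.

    Each log entry is assumed to be a dict with a record identifier and
    a borrower identifier; the borrower field is discarded before the
    data leave the local system.
    """
    counts = Counter()
    for entry in circulation_logs:
        # Strip borrower identification: only the record id is counted.
        counts[entry["record_id"]] += 1
    return counts

logs = [
    {"record_id": "ocm123", "borrower_id": "u1"},
    {"record_id": "ocm123", "borrower_id": "u2"},
    {"record_id": "ocm456", "borrower_id": "u1"},
]
print(popularity_from_logs(logs).most_common(1))  # [('ocm123', 2)]
```

Summaries of this kind, pooled across libraries, are what make “best read books” services possible without exposing any borrower data.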


* addresses
* contacts
* contracts
* performance
* parameters & rules

* status
* usage history
* issue level holdings

* physical / online
* aggregator
* duration
* conditions
* packages

Registry of digital masters

* collection policies
* selections
* orders
* invoices
* funds & budgets
Maintenance functions: add online, add in background, upload / harvest, modify / enhance, download, feeds
Enquiry functions: enquire online / enquire by programme / access

Table 3: Acquisitions, Collection and Resource Management

Collection Acquisition and Management at the Network Level

As library collections are increasingly shared, there may be significant advantages (in terms of both cost and efficiency) in moving more acquisitions and licensing data and processes to the network level where they can be shared among the ILS, ERM and repositories and with other libraries. Moreover, libraries are finding their ILS acquisitions modules inadequate for managing the acquisition of the newer parts of whole collections. There is already a clear need for the acquisitions of the three parts of the collection to be managed as a whole; moving data to the network, thereby enabling shared network services, is one solution.

Storage and budgetary demands are pressing libraries to collect, digitise, store and preserve collectively. Network-level data facilitate co-operative selection and collection building. At the network level it is also easier to enrich the data pool with evaluative content and to provide seamless links with user discovery, requesting and reference services. The data available to the user are enriched too, as the collection strengths of libraries relative to other libraries become more explicit.

At first sight, the supplier file (including providers, vendors and licensors [27]) is an obvious candidate for network level data. Typically, ILS systems have discrete files including suppliers’ names, physical and electronic addresses, contacts and other such information that are manually keyed and maintained in each instance of an ILS and ERM.

Exploring this further, there may be advantages in pooling and making available information on materials reviewed (and review comments), selected (and the budget to which the resource was attributed), rejected (and reasons), and re-located to storage (with reasons, e.g. low recent circulation). Selections could be linked to the reviews that inspired them or could be flagged as a user’s direct request, possibly with a link to a reference enquiry. To assist selection, centralised metadata could be enriched by data that are mined, loaded or linked, indicating such things as in-print status, copyright status, sales and circulation statistics. OCLC’s WorldCat Selection service [28] is a start in this direction. It groups suggestions provided by a growing number of suppliers, then downloads consolidated selections to the library’s ILS for completion of the order process. This circumvents each library needing either to pre-load the suggestions or to visit each supplier’s Web site individually; thus it gives some efficiencies, but more could be achieved by better ILS integration and by centralised access to non-financial and non-sensitive information. A complementary service is OCLC’s Collection Analysis service [29] which allows the comparison of collections and as such can serve as a basic tool underlying co-operative collections.
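The consolidation step described above can be sketched as a de-duplication across supplier feeds, so that each title is reviewed once rather than once per supplier site. Supplier names, identifiers and field names below are purely illustrative:

```python
def consolidate_suggestions(suppliers):
    """Merge selection suggestions from several suppliers, de-duplicating
    on a shared identifier (ISBN here, for illustration), and recording
    which suppliers offer each title."""
    seen = {}
    for supplier, items in suppliers.items():
        for item in items:
            entry = seen.setdefault(item["isbn"],
                                    {"title": item["title"], "offered_by": []})
            entry["offered_by"].append(supplier)
    return seen

suggestions = {
    "Supplier A": [{"isbn": "978-0-00-000000-2", "title": "Example Title"}],
    "Supplier B": [{"isbn": "978-0-00-000000-2", "title": "Example Title"}],
}
pool = consolidate_suggestions(suggestions)
print(pool["978-0-00-000000-2"]["offered_by"])  # ['Supplier A', 'Supplier B']
```

A selector then sees a single entry per title, with the competing sources attached, instead of visiting each supplier’s site in turn.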

Another area where ILS data have not been shared, but could be, is serials prediction. It is necessary to predict the date of appearance of the next issue of a serial in order to know when to claim non-receipt; failure to claim in time results in gaps that frequently cannot be closed for serials with limited print runs. Complex patterns, often far more complex than can be represented in MARC21, are recorded for each serial and used by ILS systems to predict both the citation (enumeration and chronology) and the appearance date of the next issue, including indexes, supplements, tables of contents and other special issues. The algorithms to do this are very complicated and successful only up to a point. This makes serials prediction an obvious candidate for a Web service, one that could be much smarter if it knew the latest issues received by a large number of libraries. If serial check-in data could be recorded at a network level, extending network holdings to the issue level, the amount of guesswork in serial claims could be significantly reduced. It would be better still if the issue could be linked (directly or via a resolver) to a table of contents and to available online article content, where applicable. Andrew Pace [30] proposes a more radical approach to serials management: electronic serials do not require prediction, and where serials are released in both print and electronic formats, claims for late or missing physical issues can be based on the existence or absence of the electronic issue. Thus the art of serial prediction becomes redundant for a large part of most libraries’ serial collections.
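A minimal sketch of prediction and claiming, assuming a simple fixed days-between-issues pattern (real MARC-style publication patterns are far richer, which is exactly why the algorithms are complicated):

```python
from datetime import date, timedelta

def next_expected(last_received: date, frequency_days: int) -> date:
    """Predict the appearance date of the next issue from a fixed
    frequency; a network service could instead use actual check-ins
    reported by many libraries."""
    return last_received + timedelta(days=frequency_days)

def should_claim(last_received: date, frequency_days: int,
                 today: date, grace_days: int = 14) -> bool:
    """Claim non-receipt once the predicted date plus a grace period
    has passed without a check-in."""
    due = next_expected(last_received, frequency_days)
    return today > due + timedelta(days=grace_days)

# A monthly serial last checked in on 1 March, still missing on 1 May:
print(should_claim(date(2008, 3, 1), 30, today=date(2008, 5, 1)))  # True
```

The network version replaces the guessed `frequency_days` with observed receipts: if other libraries have already checked in the issue, the claim can be sent with confidence.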

Full Text:
* articles
* datasets
* digitised collections
* locations

Full Text Links:
* union catalogue
* authority files
* biographies

Full Text Rights Data:
* owners

Preservation Data

Maintenance functions: add online, add in background, upload / harvest, modify / enhance, download, feeds
Enquiry functions: enquire online / enquire by programme / access

Table 4: Digital Data

The optimal place to store and maintain data could be local, regional, national or global, depending on the nature of the data and the infrastructures available. Data may be maintained at one location then stored or replicated at another, or summaries of local data may be made available to the network. At first glance, dynamic data such as physical circulation and private information, e.g. financial data, belong at the local level, though historic summaries and statistics could be made available to the network. In a similar vein, acquisitions information underlying the management and growth of the collective collection, with the exception of financial data, needs to be managed at a network level, possibly regional, national or thematic. These collections may then be exposed to multiple Web sites: a global stage where they are more easily discovered and accessed.

As the data are stored at various levels, systems need to adapt to address the data wherever they are located.

Need for Standardisation

Moving data to the network level will help to disentangle the ILS, ERM, resolver, digital management, digital repository and reference systems and make their data accessible to all systems, with the network system bearing the brunt of interoperability. There is the potential for the data to be of higher value in a well-managed network environment. Disentangling the data, however, necessitates a standards layer that does not currently exist. NISO [31] started a Web services initiative, VIEWS [32], but this has lain dormant since 2004. The fastest way to achieve this layer, and possibly the best way, is to encourage adoption by using existing extensible standards.

Arguably, all interoperation can be modelled as either enquiry or maintenance (additions, updates, deletions). Even transactions within protocols such as inter-library loan can be divided into notifications of action taken or requests for action, and these can be conveyed as changed data fields. Thus, if a data schema can be developed and agreed for each class of data to interoperate, the brunt of the standardisation will have been accomplished. The same schemas can be used for enquiry and for maintenance. This has the potential to reduce standards efforts significantly and to make interoperability a reality sooner.
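As an illustration of one schema serving both operations, the same record payload (element names below are hypothetical, not a published schema) can be embedded unchanged in either an enquiry response or a maintenance request:

```python
import xml.etree.ElementTree as ET

def holding_record_xml(library_id, record_id, status):
    """Build one illustrative record conforming to a shared schema."""
    rec = ET.Element("holding")
    ET.SubElement(rec, "library").text = library_id
    ET.SubElement(rec, "record").text = record_id
    ET.SubElement(rec, "status").text = status
    return rec

def wrap(operation, record):
    """Wrap the identical record in an enquiry or maintenance envelope,
    e.g. operation = "searchResponse" or "updateRequest"."""
    msg = ET.Element(operation)
    msg.append(record)
    return ET.tostring(msg, encoding="unicode")

rec = holding_record_xml("NL-0100010", "ocm123", "available")
print(wrap("updateRequest", rec))
```

Only the envelope differs between the two message types; the record schema, once agreed, is standardised exactly once.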

There are several standards offering enquiry, in particular Z39.50 [33], OpenSearch [34] and SRU [35]. In addition NCIP [36] and OpenURL [37] provide information on single records. SRU is arguably the most suitable standard to consider, as it is easily extensible and has the best architecture capable of handling result sets and their manipulation [38]. It includes metadata about the result set including record count and result set position. SRU extensions allow the definition of different search context sets (access points or indexes) and record schemas.
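An SRU searchRetrieve request is simply a URL carrying well-known parameters, so a client can be sketched in a few lines; the endpoint below is a placeholder, not a real server:

```python
from urllib.parse import urlencode

def sru_search_url(base, cql_query, schema="marcxml",
                   start=1, maximum=10, version="1.1"):
    """Build an SRU searchRetrieve URL using the standard parameters
    (operation, version, query, recordSchema, startRecord,
    maximumRecords). The base URL is a placeholder."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,           # a CQL query string
        "recordSchema": schema,       # which agreed record schema to return
        "startRecord": start,         # result-set position
        "maximumRecords": maximum,
    }
    return base + "?" + urlencode(params)

print(sru_search_url("http://example.org/sru", 'dc.title = "networked library"'))
```

The `startRecord`/`maximumRecords` pair is what gives SRU its result-set handling, and `recordSchema` is the hook by which new data schemas of the kind proposed here can be plugged in.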

Several SRU elements needing definition can already be identified. These include:

For each distinct dataset a search context set and a response data schema will meet the requirements. The process for registering and achieving consensus is considerably easier than for a full standard and can incorporate a trial use period.

Similarly, for data addition and maintenance, there are existing standards that can be extended to encompass new data schemas. These include OAI-PMH [23], the Atom syndication format [40] and RSS [41], so-called PULL mechanisms in which a database makes data available for external systems to harvest and does not monitor their subsequent use. The widely employed FTP [42] is either a PUSH or a PULL mechanism. SRU Record Update [43] and the Atom Publishing Protocol [44] are PUSH mechanisms designed for one system to update another, in real time or in the background, as if it were an online client. SRU Record Update allows for the exchange of diagnostics and linking identifiers. All these standards support multiple data schemas.
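An OAI-PMH harvest (a PULL mechanism) is likewise driven by URLs with a `verb` parameter; a sketch of request construction, including resumptionToken handling, against a placeholder endpoint:

```python
from urllib.parse import urlencode

def oai_list_records_url(base, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL. The base URL is a
    placeholder for a repository's OAI endpoint."""
    if resumption_token:
        # Per the protocol, resumptionToken is an exclusive argument
        # used to fetch the next page of a large result.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base + "?" + urlencode(params)

print(oai_list_records_url("http://example.org/oai"))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

Because the harvested payload is selected by `metadataPrefix`, OAI-PMH already has the extension point needed to carry new, agreed data schemas rather than only Dublin Core.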


Conclusion

Moving appropriate data to the network level, with basic enquiry and update Web services, is a first step in re-engineering library systems. In fact it is not only the ILS that needs re-engineering, but also the newer solutions that, like the ILS, have created silos of data, often locked inside proprietary systems and databases. It is important for libraries to own and control their data resources; to be free to share them, provide access to them and expose them. It is less important that libraries own or run the software that manipulates and manages the data.
References


  1. OPAC: Online Public Access Catalogue. Typically a module supplied with an ILS system.
  2. Pace, Andrew K. Private discussion, 2008.
  3. Dempsey, Lorcan (2007) The network reconfigures the library systems environment. Lorcan Dempsey’s weblog, 6 July 2007. http://orweblog.oclc.org/archives/001379.html
  4. PiCarta: Online database comprising the Dutch Union catalogue and article content http://www.oclc.org/nl/nl/picarta/
  5. GBV: Web portal of the GBV Common Library Network of the German States Bremen, Hamburg, Mecklenburg-Vorpommern, Niedersachsen, Sachsen-Anhalt, Schleswig-Holstein, Thüringen and the Foundation of Prussian Cultural Heritage http://www.gbv.de/vgm/
  6. Libraries Australia: Public interface of the national union catalogue of Australia. http://librariesaustralia.nla.gov.au/apps/kss/
  7. WorldCat: Union catalogue of global dimensions managed by OCLC http://www.worldcat.org/
  8. TEL: The European Library. Catalogue of 44 European national libraries. http://www.theeuropeanlibrary.org/portal/
  9. Sudoc: Système universitaire de documentation (Sudoc) http://www.sudoc.abes.fr/
  10. Payne, Lizanne, Library storage facilities and the future of print collections in North America. 2007.
  11. Shared print collections program, OCLC Programs and Research, 2007. http://www.oclc.org/programs/ourwork/collectivecoll/sharedprint/default.htm
    OCLC is a non-profit, membership, computer library service and research organisation http://www.oclc.org/about/
  12. Müller, Harald, Rights and distribution: legal problems of document delivery by libraries. Keynote paper IFLA ILDS Singapore, October 2007. http://www.nlbconference.com/ilds/speakers-muller.htm
  13. European Digital Library Project: A Targeted Project funded by the European Commission under the eContentplus Programme and coordinated by the German National Library http://www.edlproject.eu/
  14. Louisiana Digital Library http://louisdl.louislibraries.org/
  15. Memòria Digital de Catalunya http://www.cbuc.cat/mdc/
  16. Arizona Memory Project http://azmemory.lib.az.us/
  17. RLG: Research Libraries Group. Now a part of OCLC. http://www.oclc.org/community/rlg/
  18. National and State Libraries Australasia. The big bang: creating the new library universe, 2007 http://www.nsla.org.au/publications/papers/2007/pdf/NSLA.Discussion-Paper-20070629-The.Big.Bang..creating.the.new.library.universe.pdf
  19. Digital Library Federation Electronic Resource Management Initiative Phase 2 White paper on interoperability between acquisitions modules of ILS and ERM systems, 2008
    Also: Draft recommendations, Digital Library Federation ILS Discovery Interface Group, 2008
  20. Perceptions of libraries and information resources, OCLC, 2005 http://www.oclc.org/reports/2005perceptions.htm
  21. Accessing library material through Google and other Web sites. Gatenby, Janifer. Paper for ELAG (European Library Automation Group), May 2007, Barcelona, Spain. http://elag2007.upf.edu/papers/gatenby_2.pdf
  22. Encore: Library portal system provided by Innovative Interfaces http://www.encoreforlibraries.com/
  23. OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting http://www.openarchives.org/OAI/openarchivesprotocol.html
  24. Endeca Profind: Enterprise providing an Information Access Platform http://endeca.com/
  25. Danbib: National union catalogue of Denmark http://www.dbc.dk/top/top_danbib_eng.html
  26. COBISS: National union catalogue of Slovenia http://www.cobiss.net/cobiss_platform.htm
  27. These files do not typically include publisher data which are buried within bibliographic description in a non-normalised form and thus difficult to parse and reuse.
  28. WorldCat Selection: A centralised selection service managed by OCLC http://www.oclc.org/selection/
  29. Collection Analysis: Service provided by OCLC to allow libraries to analyse their collections in relation to other libraries http://www.oclc.org/collectionanalysis/
  30. Pace, Andrew K. Electronic Resource Management: Homegrown perspective 2005 http://www.lib.ncsu.edu/e-matrix/presentations.html
  31. NISO: National Information Standards Organization (USA) http://www.niso.org/
  32. VIEWS: Vendor Initiative for Enabling Web Services. Hosted by NISO. http://www.niso.org/committees/VIEWS/VIEWS-info.html
  33. Z39.50: Information retrieval protocol http://www.loc.gov/z3950/agency/
  34. OpenSearch: A collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation developed by A9, an Amazon subsidiary http://www.opensearch.org/Home
  35. SRU: Search and Retrieve via URL. Search mechanism hosted by the Library of Congress http://www.loc.gov/standards/sru/
  36. NCIP: Z39.83 Circulation Interchange Protocol managed by NISO. http://www.niso.org/standards/standard_detail.cfm?std_id=728
  37. OpenURL: Z39.88 - 2004 The OpenURL Framework for Context-Sensitive Services http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListSets
  38. SRU updates Z39.50. NCIP does not handle a result set; OpenSearch does not have standard searches and can be viewed as a subset of SRU; and OpenURL is not intended to be a search mechanism.
  39. ISO Holdings schema ISO 20775. Information and Documentation: Schema for Holdings Information http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=39735
  40. Atom syndication format http://www.ietf.org/rfc/rfc4287.txt
  41. RSS: formally “RDF Site Summary”, (known colloquially as “Really Simple Syndication”) is a family of Web feed formats used to publish frequently updated content http://www.rssboard.org/rss-specification
  42. FTP: File transfer protocol: IETF RFC 959 http://tools.ietf.org/html/rfc959
  43. SRU record update: Update mechanism developed by the SRU community, hosted by the Library of Congress http://www.loc.gov/standards/sru/record-update/
  44. Atom Publishing Protocol http://www.ietf.org/rfc/rfc5023.txt

Author Details

Janifer Gatenby
Research Integration and Standards

Email: janifer.gatenby@oclc.org
Web site: http://www.oclc.org/
