Motivations for the Development of a Web Resource Synchronisation Framework

Stuart Lewis, Richard Jones and Simeon Warner explain some of the motivations behind the development of the ResourceSync Framework.

This article describes the motivations behind the development of the ResourceSync Framework. The Framework addresses the need to synchronise resources between Web sites.  Resources cover a wide spectrum of types, such as metadata, digital objects, Web pages, or data files.  There are many scenarios in which the ability to perform some form of synchronisation is required. Examples include aggregators such as Europeana that want to harvest and aggregate collections of resources, or preservation services that wish to archive Web sites as they change.  Some of these use cases are described in this article, along with concrete examples of them in practice, together with issues that they may pose to the Framework as it is developed.

Background

Interoperability lies at the heart of the Internet.  Without protocols, standards, and frameworks, the Internet would not and could not exist.  These standards are used to determine how clients and servers communicate with each other, how resources are transferred, and how they are described in order to support their use.  Each standard has been developed in response to a need for some form of resource handling or description.

Within the library, repository, and general Web resource world, there are a number of protocols and frameworks used to work with resources.  They include well-known standards such as:

  • Transfer:
    • HTTP [1]: Hyper Text Transfer Protocol
    • FTP [2]: File Transfer Protocol
    • SCP [3]: Secure Copy
  • Metadata Harvesting:
    • OAI-PMH [4]: Open Archives Initiative Protocol for Metadata Harvesting
  • Search:
    • OpenSearch [5]: Search results in a syndication format
    • Z39.50 [6]: Searching remote library databases
    • SRU/SRW [7]: Search and Retrieve via URL or Web Service
  • Discovery:
    • RSS [8]: Rich Site Summary (often known as Really Simple Syndication) feed format
    • Atom [9]: Syndication feed format
  • Deposit:
    • SWORD [10]: Simple Web-service Offering Repository Deposit

One requirement that spans several of these areas, yet is a distinct problem in its own right, is that of resource synchronisation. We define resource synchronisation as the need to keep two or more collections of resources synchronised, or ‘in sync’: that is, additions, updates, and deletions from one collection are reflected in the other, so that, subject to any delay in synchronisation, they contain identical resources.  

There are variations on this definition related to factors such as whether the whole or part of the collection needs to be synchronised, the number of sources or destinations that are kept synchronised, or the latency between changes occurring in the source collection and being reflected in the destination collection. The process of defining these differences allows the gathering of requirements for a generalised Web resource synchronisation framework.

Terminology

The use cases in this article are described using the following terms:

  • Source: When two collections are being kept in sync, the source is the collection that is being copied.
  • Destination: The destination is where the source is copied into.
  • Synchronisation: The process of keeping the destination in sync with the source such that the destination has accurate copies of the resources at the source.
  • Mode of deployment: The fashion in which the combination of source and destination are configured.  For example this could be a typical pattern of one source syncing to one destination, many sources to a single destination, or one source to many destinations.
  • Push and Pull: When a resource is to be synchronised, there are two ways that the transfer can be initiated: the first is the source can push the content to the destination, and the second is that the destination can pull it from the source.

ResourceSync Background

This article defines a series of use cases for Web resource synchronisation. They are being used by the NISO and OAI ‘ResourceSync’ project. The project, funded by the Alfred P. Sloan Foundation and JISC, aims to research, develop, prototype, test, and deploy mechanisms for the large-scale synchronisation of Web resources.

These use cases provide both the motivation for the development of a Web resource synchronisation framework, and the requirements that it must fulfil. When developing anything new, be that a product, service, or framework, use cases [11] provide an easy method to think about ways in which that new development will be used.  When developing use cases, actors (the bodies involved) and actions are put together to describe the different behaviours or goals that must be supported.  The use cases below show different purposes for a Web synchronisation framework, and therefore help to define the requirements it must fulfil. When evaluating the success of the new framework, it can be judged by whether it can fulfil all of the identified use cases.

Dimensions

There are a number of dimensions that are useful in parameterising the use cases.  These dimensions vary between each use case, and the combination of the dimensions makes each use case unique in the requirements-gathering exercise for ResourceSync:

  • Number of items: How many items are there to synchronise?
  • Rate of change of items: Do the items change (and therefore require syncing) relatively frequently or infrequently?
  • Types of change: What types of change occur, for example the addition of new items, the modification of existing items, or the deletion of old items?
  • Resource size(s): Are the resources large, medium, or small?
  • Access restrictions (to resource and/or change communications): Are there requirements to keep the resources protected by a username and password, or a similar method?
  • Network type (local, open Web, etc): Are the source and destination a local private network and therefore close to each other with fast transfer speeds, or at geographically distant locations on the Internet connected at varying connection speeds?
  • Data formats: What type of files or resources are being transferred?
  • Transfer protocols: What protocols are used to transfer the resources from the source to the destination?

Use Cases

Sixteen use cases have been defined to guide the development of the ResourceSync Framework.  Each one is briefly described and is accompanied by a diagram showing the source of the resources, the destination of the synchronisation action, and a pictorial expression of the resource(s) being synchronised.  The diagrams were drawn during the first project technical meeting in Washington, DC, on 14 June 2012.

To explain the use cases, each has a description of what makes that case unique, how it is typically deployed, some concrete examples of systems that have this requirement and currently meet it by an alternative method, and, if relevant, a list of issues that need to be considered when defining functionality to fulfil the use case.

1. One-to-one Sync between Two Systems

Two paired systems - a Source and a Destination - that are collections of resources. The Destination must be kept up to date with the Source.

Figure 1: One-to-one sync between two systems

Features of Use Case

This is the most basic synchronisation use case, and deals with all the essential features of ResourceSync: initial synchronisation, incremental update, and baseline audit to ensure the collections are indeed synchronised.
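The three phases named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the ResourceSync Framework: it assumes each collection can be represented as a mapping from URI to content bytes, and the function names (`fingerprint`, `plan_changes`, `synchronise`, `audit`) are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a fixity value (SHA-256 hex digest) for a resource's content."""
    return hashlib.sha256(content).hexdigest()


def plan_changes(source: dict, destination: dict) -> dict:
    """Compare two collections (URI -> bytes) and list what must change."""
    created = sorted(uri for uri in source if uri not in destination)
    deleted = sorted(uri for uri in destination if uri not in source)
    updated = sorted(
        uri for uri in source
        if uri in destination
        and fingerprint(source[uri]) != fingerprint(destination[uri])
    )
    return {"created": created, "updated": updated, "deleted": deleted}


def synchronise(source: dict, destination: dict) -> dict:
    """Apply the planned changes to the destination and return the plan.

    Called on an empty destination this is the initial synchronisation;
    called again later it acts as an incremental update.
    """
    plan = plan_changes(source, destination)
    for uri in plan["created"] + plan["updated"]:
        destination[uri] = source[uri]
    for uri in plan["deleted"]:
        del destination[uri]
    return plan


def audit(source: dict, destination: dict) -> bool:
    """Baseline audit: True when no changes remain to be applied."""
    return not any(plan_changes(source, destination).values())
```

In this sketch the incremental update is found by comparing the two collections in full. A real deployment would more likely rely on change communications from the Source, with the full comparison reserved for the audit.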

Mode of Deployment

One Source, one Destination.  It is possible that the Source and Destination are formally aware of each other, but this is not always necessary.

Concrete Examples

arXiv.org [12] mirrors.  The arXiv repository service hosted at Cornell University is mirrored to a number of geographically dispersed sites.  This provides both data redundancy (for disaster-recovery purposes) and speed of access via local mirrors.

2. Aggregator (Many-to-one)

A single Destination system that is synchronising from multiple Sources for the purposes of building an aggregation of their resources.

Figure 2: Aggregator (many-to-one)

Features of Use Case

In this case, a single system is attempting to represent the content from multiple systems, possibly in some kind of union catalogue (eg for cross-searching).

Mode of Deployment

Multiple Sources, one Destination.  The Sources may not be formally aware that the Destination is synchronising their content if they are offering their content on the open Web for harvesting.

Concrete Examples

  • OAIster [13]
  • Europeana [14]

OAIster and Europeana are aggregators of metadata and content.  They harvest these from many sources using public interfaces, and then offer the aggregated resources via their own search services.

Issues

Resources that are duplicated across the Source systems (not necessarily within one Source) may result in duplicates in the Destination.
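One possible way for a Destination to detect such duplicates is to key the aggregation on a content hash rather than on the URI. The sketch below assumes each Source is a mapping from URI to content bytes and that duplicates are byte-identical; the `aggregate` function and the example URIs are hypothetical, and near-duplicate metadata records would need a more sophisticated comparison.

```python
import hashlib


def aggregate(sources: dict) -> dict:
    """Merge several Sources (name -> {uri: bytes}) into one collection.

    The result is keyed by SHA-256 digest, so byte-identical resources
    held by different Sources collapse into a single entry that
    remembers every location at which the resource was found.
    """
    merged = {}
    for name in sorted(sources):
        for uri, content in sorted(sources[name].items()):
            digest = hashlib.sha256(content).hexdigest()
            entry = merged.setdefault(
                digest, {"content": content, "locations": []}
            )
            entry["locations"].append(uri)
    return merged
```

Keeping the list of locations means the Destination can still point users back to each original Source.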

3. Master Copy (One-to-many)

Many Destinations synchronise from a single Source. This is considered to be the most likely scenario for ResourceSync usage from the perspective of a Source.

Figure 3: Master copy (one-to-many)

Features of Use Case

There is a single Source that is providing resources to multiple Destinations, which will, therefore, be either mirrors or partial mirrors of that Source.

Mode of Deployment

One Source, multiple Destinations. The Source is unlikely to have any agreement with or be formally aware of all the Destinations.

Concrete Examples

Many traditional institutional repositories are harvested in this way.  They offer their open content and metadata for harvesting and reuse.

4. Selective Synchronisation

The Source wishes to supply and/or advertise sub-sets of its full set of resources, to allow Destinations to synchronise with one or more of those sub-sets.  If the Destination wishes to synchronise selectively from the Source, the selection criteria are those provided by the Source.

Figure 4: Selective synchronisation

Features of Use Case

It indicates that the Source is not just a large aggregation of resources, but that each of those resources may have properties or belong to sets or collections about which the Destination may be interested in knowing, prior to any synchronisation attempt.

Mode of Deployment

One Source, advertising metadata to be used for selection, and any number of Destinations, each of which may wish to synchronise a different sub-set of the Source’s resources.

Concrete Examples

To provide similar or equivalent functionality to OAI-PMH Set [15]. For example DSpace provides its content in ‘Collections’, which are logical divisions in the overall repository content.

Issues

There are two ways that this could be presented to the Destination:

  1. As metadata associated with the full aggregation, so that filtering of resources can be performed at the Destination
  2. As an API, so that the Destination can request the sub-sets it requires from the Source
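The first option, filtering at the Destination, might look like the following sketch. It assumes the Source publishes, for every resource, the names of the sets it belongs to; the inventory structure and the `select_by_set` function are hypothetical illustrations, not part of any specification.

```python
def select_by_set(inventory: dict, wanted_sets) -> list:
    """Return the URIs whose advertised sets overlap the wanted sets.

    `inventory` maps each resource URI to the set names advertised for
    it by the Source; `wanted_sets` lists the sets the Destination
    wishes to synchronise.
    """
    wanted = set(wanted_sets)
    return sorted(uri for uri, sets in inventory.items() if wanted & set(sets))
```

With the second option the same selection would run at the Source, and the Destination would receive only the matching resources.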

An alternative approach would be to avoid supporting sub-sets within a Source and instead to present multiple Sources, one for each collection to be exposed. In this way a Destination would select the appropriate Sources to synchronise with (which could have overlapping content).

5. XML Metadata Harvesting / Identification

The set of resources in the Source may include some that are metadata records for other resources in that Source.  The Destination may want to only synchronise the metadata records, but may also want the option in future to trace back to the other resource(s) that the metadata record describes.

Figure 5: XML Metadata Harvesting/Identification

Features of Use Case

It indicates that the Destination may care about what kind of resource a given resource is, and which other resources it is related to.

Mode of Deployment

One Source with a mixture of metadata records and other resources, and any number of Destinations interested in synchronising metadata only.

Concrete Examples

OAI-PMH is used to synchronise collections of XML datastreams that are identified by an item identifier, a metadataPrefix [16], and a datestamp.

6. Statistics Collection

A Service wishes to listen to change notifications from a Source, and keep a record of the changes that have taken place (perhaps including types of change, frequency, etc), and to make available statistics regarding the Source.  The Service does not, at any point, synchronise the resources itself.

Figure 6: Statistics collection

Features of Use Case

In this use case we are not so much interested in the change, as in the fact that a change has occurred.  This is a very simple use case, as it does not require any content synchronisation.

Mode of Deployment

One Source providing change communication with sufficient metadata for the one Service listening to create the desired statistics.

Concrete Examples

Sites such as the Registry of Open Access Repositories (ROAR) [17] and the Directory of Open Access Repositories (OpenDOAR) [18] are already using OAI-PMH for similar purposes (eg providing resource counts).

7. Large Data Files

In some environments the resources to be synchronised may be very large (in the order of many gigabytes for research data systems).  Due to the load this may place on the consumer, and limits imposed by transfer protocols or file systems, they may require some specific information about the size of the resource in advance of the synchronisation taking place, or the option to synchronise only part of the resource.

Figure 7: Large data files

Features of Use Case

It deals not so much with the need for synchronisation as for the need to make consumers aware of what the implications of the synchronisation action will be, and/or to offer them appropriate synchronisation options (such as partial synchronisation of changed content using a tool such as diff).

Mode of Deployment

One Source and one Destination exchanging large data files.  The Source may need to provide an indication of the size of the resource, any available retrieval/diff protocols, whether it is an interesting change (from a Destination perspective), when it was last modified, and fixity information.  It is likely that using more specialist retrieval/diff protocols will mean that the Source and the Destination need to be formally aware of each other.

Concrete Examples

Research Data Management can require the movement of large files or packages of files over the network asynchronously from the usage or production of the data. The DataFlow [19] Project at the University of Oxford is transferring zipped research data between a client (DataStage) and server (DataBank) environment using SWORDv2.

Issues

We separate out the notion of providing hooks for the efficient update of large data files from the transfer methods themselves. There are various complexities around providing alternative synchronisation options which are Out of Scope.

8. Protected Resources / Access Control

Some information systems keep their sensitive content hidden for a number of reasons.  However some require, for example, a separate public user interface to publish those materials that can be public.  The key to this use case is that the synchronisation is carried out over an authorised trust boundary.

Figure 8: Protected resources/access control

Features of Use Case

The publisher of changes is likely to be a private system, requiring a trusted or protected relationship with the synchronising downstream system.

Mode of Deployment

One Source and one Destination, where there is likely to be a trusted one-to-one relationship between them.  The synchronisation will need to be able to expose resources which are not publicly accessible.

Concrete Examples

CRIS (Current Research Information Systems) often manage and store information about the research outputs of an institution.  Some of these may be surfaced through a repository or research portal.  Whilst they often employ PUSH technologies, a PULL technology allowing synchronisation could be an alternative.

Legal deposit of digital published content (to a national library repository) would require a framework where only trusted Destinations could harvest the content.

Issues

Successful interaction in the context of Web trust and security mechanisms is in scope. Development of additional ResourceSync mechanisms is Out of Scope.

9. Link Maintenance

A resource on the Source has changed its identifier or has moved from its original URI to a new one.  The Destination does not need to re-sync, but it may need to update its references.

Figure 9: Link maintenance

Features of Use Case

No synchronisation of the physical resource needs to take place, but the Destination needs to be aware that a change has taken place, and to be able to update its references.

Mode of Deployment

One Source which has moved some of its resources, and a Destination which has previously synchronised resources from the Source.

Concrete Examples

Any Web-based system holding resources that moves resource identifiers internally.

Issues

Is there a difference between a resource ‘move’ and a resource ‘delete followed by a (re-)create’?  Should we treat a move as a combination of a related delete-and-create?

10. Migration / One-off Sync

Sometimes a system will want to migrate all of its data to a newer environment as a one-off operation, prior to shutdown of the legacy system.  The use of a protocol like ResourceSync would be to alert the new system of the resources that it needs to import and then to provide the resources.

Figure 10: Migration / one-off sync

Features of Use Case

It is a one-off operation, and also may need to guarantee a prompt response from any service that is synchronising the legacy data, to ensure that migration takes place in a timely way prior to the shutdown of the legacy system.

Mode of Deployment

One Source (the legacy system) and one Destination (the new system), which are formally aware of each other.

There are also likely to be specific rules that the new system would want to apply to the legacy system’s data in order to import them into a new structure, and it is unclear at this point whether there is a role for ResourceSync there or not.

Large-scale migration is usually done at least twice: a first time for testing the new system, and a last and final full migration just before cutover. During the final migration the original system is either down or read-only. Parallel work on both the legacy system and the new one would allow staged cutover.

Concrete Examples

Any system migration which needs to maintain legacy data will have this kind of requirement; examples are numerous and unbounded.  A concrete example might be an institution moving from an EPrints [20] repository to a DSpace [21] repository.

Issues

Expecting systems outside the legacy/new system pairing (which might have synched with the now defunct legacy system) to understand that this operation has taken place is Out of Scope, unless a redirect at the protocol level is practical.

11. Service Description

A Destination has discovered a ResourceSync endpoint, and wants to know what the capabilities/features/supported components of that endpoint are, as well as other relevant administrative information regarding the service.

Figure 11: Service description

Features of Use Case

This focuses on how a Destination learns about the Source’s features prior to engaging in any synchronisation activities.

Mode of Deployment

One Source providing information about its service, and one Destination determining which features exposed by the Source it can take advantage of.

Concrete Examples

A similar approach is used in AtomPub [22] and SWORD, which provide a Service Document that describes the capabilities of the server.

Registries such as ROAR [17] and OpenDOAR [18] could use such a description to populate their registry knowledge base.

Issues

None

12. Auto-discovery

A user or user agent is at a Web site and wishes to discover any ResourceSync endpoints on behalf of the Destination which will then use them.

Figure 12: Auto-discovery

Features of Use Case

This is about how the ResourceSync service provided by a Source is discoverable from its normal Web site representation.

Mode of Deployment

One Source, with a front-end or other interface that can direct the user or Destination server to the appropriate place to carry out synchronisation processes.

Concrete Examples

There are lots of examples of auto-discovery on the Web, including:

  • Host-Meta [23]
  • robots.txt [24]
  • sitemap [25]
  • .well-known [26]

13. Discovery Layer / Directory

It will be necessary for potential consumers of content to be able to find sources to synchronise from (assuming that the relationship between the client and server is not by prior arrangement). This use case addresses the need to provide directories of potential Sources that support the ResourceSync protocol.

Figure 13: Discovery layer / directory

Features of Use Case

It is concerned not so much with the synchronisation of resources as with the discovery of Sources with which to be synchronised.

Mode of Deployment

Many Sources being discoverable by a Destination, possibly via some kind of aggregator or directory service.  The Source must present enough information to allow the construction of such directories.

Concrete Examples

In the domain of Open Access Repositories there are registries of systems (such as OpenDOAR [18], ROAR [17] and re3data [27]) that support the discovery of repositories, the kinds of content they hold, the API endpoints they have, and other information about the collection.

Issues

Building a Directory itself is Out of Scope for the project.

14. Pre- / Smart-caching

A Destination synchronises all or a subset of resources from a Source in order to provide a cached copy.

Figure 14: Pre- / Smart-caching

Features of Use Case

Only a subset of the “operational” copies of the resources need to be synchronised, and they are not being permanently synchronised, only for the purposes of speeding up delivery.

Mode of Deployment

One Source, providing the master copies of resources, and one Destination acting as a local cache of the resources in the Source.

Concrete Examples

Content Delivery Networks (CDN) [28] which provide a global network of local caches or mirrors for the efficient and fast transfer of content.

Issues

Usage statistics of the resources at the Source need to be accumulated from all cache destinations; however this is an issue for all cache use cases, not just for this project.

15. Cache Invalidation

An application that consumes data from one or more remote datasets uses a cache that stores local copies of remote data. These caches need to be invalidated when the remote data are changed.  That is, locally cached content is marked as invalidated if the resource changes in the Source.

Figure 15: Cache invalidation

Features of Use Case

This uses the change communication as a trigger for local behaviour changes, rather than strictly for synchronisation (although synchronisation may ensue, it is not the primary consequence).

Mode of Deployment

One Source, providing the master copies of resources, and one Destination acting as a local cache of the resources in the Source.

Concrete Examples

DSNotify [29]

Issues

Notification (push) for low latency may be required.

Possible important notification types: Updated, Deleted, Expired
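A cache that reacts to these three notification types could be sketched as follows. The class, its method names, and the lower-case type strings are assumptions made for illustration; how notifications would actually reach the Destination is not defined here.

```python
class InvalidatingCache:
    """A local cache whose entries are invalidated by change notifications."""

    def __init__(self):
        self._entries = {}   # URI -> locally cached content
        self._stale = set()  # URIs marked as invalidated

    def store(self, uri, content):
        """Cache a fresh copy of a resource and clear any stale mark."""
        self._entries[uri] = content
        self._stale.discard(uri)

    def notify(self, uri, change_type):
        """React to a notification from the Source about one resource."""
        if change_type in ("updated", "expired"):
            # Keep the old copy but mark it as no longer trustworthy.
            if uri in self._entries:
                self._stale.add(uri)
        elif change_type == "deleted":
            # The resource is gone at the Source, so drop the local copy.
            self._entries.pop(uri, None)
            self._stale.discard(uri)
        else:
            raise ValueError("unknown notification type: %r" % (change_type,))

    def get(self, uri):
        """Return the cached content, or None if absent or invalidated."""
        if uri in self._stale:
            return None
        return self._entries.get(uri)
```

A `None` result tells the application to fetch the resource from the Source again, at which point `store` refreshes the entry.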

16. Linked Data Triple Synchronisation

The Source is or has a large triple-store, and the Destination does not want to synchronise the entire dataset whenever a triple in that store changes.  A dataset consumer wants to mirror or replicate (parts of) a linked dataset. The periodically running synchronisation process needs to know which triples have changed at what time in order to perform efficient updates in the Destination dataset.

Figure 16: Linked data triple synchronisation

Features of Use Case

Effectively this means that the resolution of the identifiers available in the Source is more granular than the resolution that the Destination actually wants: only parts of a resource are being synchronised, not the whole resource.

Mode of Deployment

One Source that is or contains a triple-store, and a Destination which wishes to keep up to date without transferring the whole dataset each time.

Concrete Examples

Any triple store that needs to be synchronised, for example DBpedia, the structured data form of Wikipedia.

Issues

This is a specific case of ‘diff’, at the level of the entire dataset unless portions of the triple-store are exposed as resources that can be separately synchronised.
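The idea of a triple-level ‘diff’ can be illustrated with plain set operations. The sketch below assumes a dataset can be represented as a set of (subject, predicate, object) tuples and ignores complications such as blank nodes; `diff_triples` and `apply_diff` are hypothetical names.

```python
def diff_triples(old, new) -> dict:
    """Return the triples added to and removed from a dataset."""
    old_set, new_set = set(old), set(new)
    return {"added": new_set - old_set, "removed": old_set - new_set}


def apply_diff(triples, diff: dict) -> set:
    """Bring a Destination's copy up to date by applying a diff."""
    return (set(triples) - diff["removed"]) | diff["added"]
```

Only the changed triples need to travel from the Source to the Destination, rather than the entire dataset.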

Use Cases Judged Out of Scope

The following use cases have been designated Out of Scope for an initial specification of ResourceSync, but they should be taken into account as the development goes on, to ensure that no avenues are closed off for future versions.  Reasons for being out of scope are given in the ‘issues’ section for each use case.

1. Temporary Resources / TTL

The resource will only be available for a known (by the Source) fixed amount of time.  There may be some systems that only hold content for a limited period of time before it is deleted, such as systems that are used for staging content in workflows.  The content that is announced via ResourceSync then might have a known time to live (TTL), before it is no longer available.

Figure 17: Temporary resources/TTL

Features of Use Case

It suggests that the resource in the change communication may only be available for a limited time, and so the Destination must synchronise in a timely manner.

Mode of Deployment

One Source, which contains resources which will only be available for a fixed amount of time, and one Destination which is capable of responding sufficiently quickly.

Concrete Examples

Twitter search results are often only available for a fixed length of time, because the complete mass of tweets is too large to be fully indexed.

Any system which offers support to a workflow, and expects the content to move on in time (such as a staging repository), or other environments which only retain information for a short period of time.

2. Destination-defined Selective Sync

The Destination wants to synchronise with a sub-set of the total set of resources held by the Source, and wants to provide a set of query parameters to the Source in order to be given a set of change communications which meet those criteria.

Figure 18: Destination-defined selective sync

Features of Use Case

It places the onus on the Source to provide an API that has the ability to provide filtering on queries sent by the Destination.

Mode of Deployment

One Source that supports Destination-defined queries, one Destination which wants to query for sub-sets of the Source’s resources.

Concrete Examples

OAI-PMH Sets are similar, except they are usually defined by the Source, not the Destination.

Issues

This has significant overlap with the notion of an interoperable search facility. It would rely on agreed information about resources being indexed.

3. Complex Web Objects

Sometimes it will be necessary to synchronise not only atomic resources but larger complex resources (such as those represented by an ORE [30] Resource Map).  While resource maps themselves could be synchronised like atomic resources, the synchronisation may require referenced resources to also be synchronised.  Furthermore, if synchronising such resources results in their URIs being translated into the namespace of the target system, then the resource map being synchronised may need to be rewritten as it is synchronised.  Some composite objects may also need to be transferred atomically.

Figure 19: Complex Web objects

Features of Use Case

It suggests that the synchronisation operation is both a) not a strict copy, as some parts of the synchronised resource may need to be localised in the target system, and b) not limited to synchronising just the primary resource but also resources which it references.

Mode of Deployment

One or more Sources providing composite objects which may span across multiple sources, and one Destination wishing to synchronise those resources.

Concrete Examples

This scenario refers to any resource that references other resources in order to provide their full expression.  Examples would include ORE Resource Maps, which describe an aggregated set of resources to be viewed as a whole; an HTML page which references images and other embedded content; or a SWORD Statement which references its various packaged resources.

Issues

How would it handle synchronisation recursion depth for resource references? How to handle cross-site resources?

4. Reuse Conditions of Content

Some content will have reuse conditions which are required to ensure that the synchronised resource is not inappropriately passed on by the Destination to other Destinations.

Figure 20: Reuse conditions of content

Features of Use Case

Some resources have metadata associated with them that is specifically to do with the rules by which they should be synchronised.

Mode of Deployment

One Source with licensed content, one Destination.

Concrete Examples

Licensed content. Embargoed content with release conditions.

Private->Private->Public synchronisation chains, where the first sync may be between two private systems but future downstream synchronisations then make the resources openly available.

Issues

This is a complex topic for which only partial solutions in limited domains currently exist.

5. Intra-application Event Notification

Software applications, for example traditional institutional repositories, are often made up of discrete components in a Service Oriented Architecture [31] style, allowing the platform to be installed on a single server, or to scale out and be split across multiple servers.  Such applications often have change event mechanisms to inform other components when resources have changed and need to be propagated.  If such applications develop and deploy ResourceSync, it could replace some of the intra-application communication with a standardised protocol, allowing more interoperability between components from different platforms.

Figure 21: Intra-application event notification

Features of Use Case

ResourceSync is being used to replace internal change event notification systems as well as providing an outward-facing change event publisher.

Mode of Deployment

One publisher of changes and only a few internal consumers of those changes, although there are likely to be external consumers of the same change notification system (which raises the question of whether the internal version would contain different or more information).  A low level of latency would be required, and should be possible because the components are closely inter-related.

Concrete Examples

The search indexers used in DSpace receive event notifications when resources have changed, so that they can re-index them (and create / delete events for adding / deleting resources from the index).

Issues

ResourceSync may be useful within applications but the focus of this project is the Web. Applications using ResourceSync internally may want to namespace or extend event types.
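As an illustration only, the following minimal Python sketch shows one publisher of change events serving a small number of internal consumers, with a search indexer reacting to create, update and delete events. The class names, the event type strings and the "x-internal:" prefix used to namespace an application-specific event are all assumptions made for this example; they are not taken from the ResourceSync Framework or from DSpace.

```python
from collections import defaultdict


class ChangePublisher:
    """Hypothetical in-process publisher: handlers register per event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, uri):
        # Deliver the event to every handler registered for this type.
        for handler in self._handlers.get(event_type, []):
            handler(event_type, uri)


class SearchIndexer:
    """Hypothetical internal consumer that keeps a set of indexed URIs."""

    def __init__(self):
        self.indexed = set()

    def on_change(self, event_type, uri):
        if event_type in ("created", "updated"):
            self.indexed.add(uri)      # (re-)index the resource
        elif event_type == "deleted":
            self.indexed.discard(uri)  # remove it from the index


publisher = ChangePublisher()
indexer = SearchIndexer()
for event_type in ("created", "updated", "deleted"):
    publisher.subscribe(event_type, indexer.on_change)

# An internal-only event type, namespaced so that it cannot be confused
# with the outward-facing ones (the prefix is an assumption).
internal_log = []
publisher.subscribe(
    "x-internal:embargo-lifted",
    lambda event_type, uri: internal_log.append((event_type, uri)),
)

publisher.publish("created", "http://example.org/item/1")
publisher.publish("created", "http://example.org/item/2")
publisher.publish("deleted", "http://example.org/item/1")
publisher.publish("x-internal:embargo-lifted", "http://example.org/item/2")

print(sorted(indexer.indexed))
print(internal_log)
```

After these four events the indexer holds only the second item, and the namespaced event has reached only the handler that subscribed to it.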

6. Cyclic Synchronisation

Some systems require synchronisation in both directions - from a source to a destination, and then back again.  These may be chained together in several steps.  For example:

A -> B -> C -> A

How do we prevent unstoppable cycles of synchronisation?

Figure 22: Cyclic synchronisation

Features of Use Case

The systems involved will need to track identifiers and versions of records as they move through the synchronisation chain to ensure that change events do not constantly cycle around the system. Furthermore, parallel updates need to be flagged and notifications issued about them.
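One possible way of stopping a change from circulating forever can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a mechanism defined by the Framework: each node remembers a digest (fixity value) per identifier and silently drops any incoming change whose digest it already holds. The Node class, the fixity helper and the record identifier are hypothetical names, and the sketch does not attempt to detect or flag parallel updates.

```python
import hashlib
from dataclasses import dataclass, field


def fixity(content: str) -> str:
    """Digest used to decide whether an incoming change is really new."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)       # identifier -> digest
    downstream: list = field(default_factory=list)  # nodes this one syncs to
    applied: int = 0                                # changes accepted so far

    def receive(self, identifier: str, content: str) -> None:
        digest = fixity(content)
        if self.store.get(identifier) == digest:
            return  # this exact version is already held: the cycle stops here
        self.store[identifier] = digest
        self.applied += 1
        for node in self.downstream:
            node.receive(identifier, content)


a, b, c = Node("A"), Node("B"), Node("C")
a.downstream.append(b)
b.downstream.append(c)
c.downstream.append(a)  # closes the loop A -> B -> C -> A

a.receive("record-1", "first version")
a.receive("record-1", "second version")
print(a.applied, b.applied, c.applied)
```

Each change is applied exactly once per node; when it arrives back at A, the digest matches and it is not forwarded again.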

Mode of Deployment

A tree of nodes, with changes being made at the bottom by a large number of nodes, propagating up the tree to fewer and fewer nodes, which then propagate back down to larger numbers of nodes.

Concrete Examples

Library catalogue records often sync up to union catalogues and beyond (perhaps local to regional to national to international).  Changes are made at the local level, and can then propagate up to other systems.  Changes also propagate down from higher union catalogues.

Issues

How can we know that two Web resources are the same?  This could be a provenance issue, but equally it may come down simply to whether the content has changed (if it has not, the fixity information will be the same).

Some advice needs to be given to implementers, even though this is considered Out of Scope.

Conclusions

This article has described the purpose of the ResourceSync Framework that is currently under development. In particular it has described both the use cases that will guide the development and the purpose of use cases in the development process.

The variety of use cases shows that the ResourceSync Framework will be able to fulfil many different uses, from transferring large datasets in a laboratory, to populating Web archives of frequently changing Web sites; from providing mirrors of Web sites, to performing wholesale migrations of resources from old to new sites.

When the Framework is fully developed, this list of use cases, together with their associated issues for consideration, can be used as a checklist to ensure that the Framework supports all of the functions and modes that it needs to.

Acknowledgements

The ResourceSync team gratefully acknowledges the support of the Sloan Foundation for the project.  In addition, the team members acknowledge the generous support of JISC in funding the participation of several UK members in the technical committee of ResourceSync.

This set of use cases was initially formed by the authors of this article, but was subsequently developed and completed by the whole project team.

The core project team consists of:

  • Los Alamos National Laboratory and The Open Archives Initiative: Martin Klein, Robert Sanderson, Herbert Van de Sompel
  • Cornell University and The Open Archives Initiative: Bernhard Haslhofer, Simeon Warner
  • NISO: Todd Carpenter, Nettie Lagace, Peter Murray
  • Old Dominion University and The Open Archives Initiative: Michael L. Nelson
  • University of Michigan and The Open Archives Initiative: Carl Lagoze

The technical group consists of:

  • Manuel Bernhardt, Delving B.V.
  • Kevin Ford, Library of Congress
  • Richard Jones, Cottage Labs
  • Graham Klyne, University of Oxford
  • Stuart Lewis, University of Edinburgh
  • David Rosenthal, LOCKSS
  • Christian Sadilek, Red Hat
  • Shlomo Sanders, Ex Libris Inc.
  • Sjoerd Siebinga, Delving B.V.
  • Ed Summers, Library of Congress
  • Paul Walk, UKOLN
  • Jeff Young, OCLC

References

  1. Hypertext Transfer Protocol, RFC 2616 http://www.ietf.org/rfc/rfc2616.txt
  2. File Transfer Protocol, RFC 959 http://www.ietf.org/rfc/rfc959.txt
  3. Secure Copy http://en.wikipedia.org/wiki/Secure_copy
  4. Open Archives Initiative Protocol for Metadata Harvesting
    http://www.openarchives.org/OAI/openarchivesprotocol.html
  5. OpenSearch http://www.opensearch.org/
  6. Z39.50 http://www.loc.gov/z3950/agency/Z39-50-2003.pdf
  7. SRU/SRW http://www.loc.gov/standards/sru/
  8. RSS http://en.wikipedia.org/wiki/RSS
  9. The Atom syndication format, RFC 4287 http://www.ietf.org/rfc/rfc4287.txt
  10. Simple Web-service Offering Repository Deposit (SWORD) http://swordapp.org/
  11. Use cases http://en.wikipedia.org/wiki/Use_case
  12. arXiv.org http://arxiv.org/
  13. The OAIster database http://www.oclc.org/oaister/
  14. Europeana http://www.europeana.eu/
  15. OAI-PMH sets http://www.openarchives.org/OAI/openarchivesprotocol.html#Set
  16. OAI-PMH metadataPrefix
    http://www.openarchives.org/OAI/openarchivesprotocol.html#MetadataNamespaces
  17. Registry of Open Access Repositories (ROAR) http://roar.eprints.org/
  18. Directory of Open Access Repositories (OpenDOAR) http://www.opendoar.org/
  19. DataFlow Project http://www.dataflow.ox.ac.uk/
  20. EPrints repository platform http://www.eprints.org/software/
  21. DSpace repository platform http://www.dspace.org/
  22. The Atom Publishing Protocol (AtomPub), RFC 5023 http://tools.ietf.org/html/rfc5023
  23. Web Host Metadata, RFC 6415 http://tools.ietf.org/html/rfc6415
  24. About robots.txt http://www.robotstxt.org/
  25. Sitemaps.org http://www.sitemaps.org/
  26. Defining Well-Known Uniform Resource Identifiers, RFC 5785 http://tools.ietf.org/html/rfc5785
  27. Registry of Research Data Repositories http://www.re3data.org/
  28. Content delivery network http://en.wikipedia.org/wiki/Content_delivery_network
  29. Popitsch, N., Haslhofer, B., “DSNotify – A solution for event detection and link maintenance in dynamic datasets”, Web Semantics: Science, Services and Agents on the World Wide Web, 9 (3), September 2011 DOI:10.1016/j.websem.2011.05.002
  30. ORE Specifications and User Guides http://www.openarchives.org/ore/1.0/toc
  31. Service-oriented architecture http://en.wikipedia.org/wiki/Service-oriented_architecture

Author Details

Stuart Lewis
Head of Digital Library
University of Edinburgh

Email: stuart.lewis@ed.ac.uk
Web site: http://www.ed.ac.uk/

Stuart Lewis is Head of Digital Library Services at the University of Edinburgh where he is currently responsible for a service portfolio including acquisitions, metadata, e-resources, digital library development, information systems, repositories, and research publications. He has worked with open repositories in various roles over the past six years and has a particular interest in interoperability issues. He is the Community Manager of the SWORD v2 Project, which continues to develop the SWORD repository deposit standard.  Prior to working at Edinburgh, Stuart held the position of Digital Development Manager at the University of Auckland, New Zealand, and before that led the Web Applications and Repository Projects Team at Aberystwyth University.

Richard Jones
Founder, Cottage Labs

Email: richard@cottagelabs.com
Web site: http://www.cottagelabs.com/

Richard Jones has been working in Open Source and in/around Higher Education for over a decade. He is a long-term contributor to open source software, and in particular the DSpace repository platform. He is also an advocate of Open Access, and has written numerous articles on the subject, as well as co-authoring a book on a related topic. He has worked for a number of large HE institutions over the years, including the University of Edinburgh, the University of Bergen and Imperial College London. Subsequently he moved out of HE and first into commercial research and development (at HP Labs and Symplectic), and then on ultimately to founding Cottage Labs.

Simeon Warner
Director of the Repositories Group
Cornell University Library

Email: simeon.warner@cornell.edu
Web site: http://www.cornell.edu/

Simeon Warner is Director of the Repositories Group at Cornell University Library. Current projects include development of an archival repository, the arXiv e-print archive, and Project Euclid. He was one of the developers of arXiv and his research interests include Web information systems, interoperability, plagiarism detection, and open-access scholarly publishing. He has been actively involved with the Open Archives Initiative (OAI) since its inception and was one of the authors of the OAI-PMH and OAI-ORE specifications.

 

Date published: 
30 November 2012

This article has been published under Creative Commons Attribution 3.0 Unported (CC BY 3.0) licence. Please note this CC BY licence applies to textual content of this article, and that some images or other non-textual elements may be covered by special copyright arrangements. For guidance on citing this article (giving attribution as required by the CC BY licence), please see below our recommendation of 'How to cite this article'.

How to cite this article

Stuart Lewis, Richard Jones, Simeon Warner. "Motivations for the Development of a Web Resource Synchronisation Framework". November 2012, Ariadne Issue 70 http://www.ariadne.ac.uk/issue70/lewis-et-al

