Mining the Archives: Metadata Development and Implementation

Martin White looks through the Ariadne archive to track the development and implementation of metadata in a variety of settings.

I was an early starter in the world of metadata. Within hours of arriving at the offices of the British Non-Ferrous Metals Research Association in Euston Street, London, in 1970 to start a career as an information scientist, I was writing my first abstract. ‘Writing’ is the correct verb, as my A3 abstract would then be typed up on an IBM golfball typewriter for production. At the bottom of this form was a section called ‘Index Terms’, and it was made very clear at the outset that mistakes in the abstract were regrettable, but mistakes in indexing were unforgivable. A small team would take my index terms and cut holes in 10,000-hole optical coincidence cards so that anyone in the Association could find information of importance to them. An error in indexing could well mean that an article might be lost for ever, or turn up in a totally unrelated search. The word ‘metadata’ was never used, but to this day I can remember the care with which the list of controlled terms was compiled, and the pleasure I felt when my suggestion for adding Memory Alloys to the list was approved.

In this column I am working back through the Ariadne archives on metadata. The term is now so pervasive in information management that my choice has been a very personal one, highlighting a few of the many papers that, at the time of publication, set out new directions for metadata standards and adoption. There is no better place to start than Paul Miller’s definitive 1996 paper on the structure and value of metadata [1]. It remains one of the classic papers on metadata, providing a good bibliography of early developments and a summary of the key elements of the Dublin Core metadata framework. Paul trained as an archaeologist but soon moved towards a lifelong interest in applying structure to information and data through metadata; he currently runs the Cloud of Data Consultancy [2]. Paul is slightly vague about the origin of the term metadata and, since I like an information discovery challenge, I went looking for it: the earliest reference I have found is by Philip Bagley in a 1968 US Air Force research report entitled 'Extension of Programming Language Concepts' [3]. Only after finding this document did I check the Wikipedia entry on metadata [4], where I found the same reference.

By 1997 UKOLN [5] had appointed a Metadata Officer, Michael Day, who contributed a short paper to Ariadne on the implications of metadata for digital preservation. Michael set out five important questions which still represent challenges for the profession today:

  • Who will define what preservation metadata are needed?
  • Who will decide what needs to be preserved?
  • Who will archive the preserved information?
  • Who will create the metadata?
  • Who will pay for it?

It is good to know that Michael is currently Digital Preservation Officer at the British Library and no doubt continues to muse upon the questions he has raised over the years. 1997 was also the year in which the Metadata Corner series [6] was established in Ariadne, usually written by members of the Metadata Group team at UKOLN. I have not considered these columns in this review, a decision based on space rather than value, for their value in guiding practitioners through the challenges and opportunities of metadata implementation was enormous.

Early in 1998 Paul Miller reported on a meeting organised by the Telematics for Libraries Programme of the European Commission's Directorate General XIII (of fond memory!) to consider how to integrate metadata, and in particular Dublin Core, into the projects funded under the European Commission's Telematics Programme. Normally I do not include conference reports in this column, but this event was especially important in recognising the value of high-quality metadata in a European context. The title of the report was “Dublin Comes to Europe” [7], a reminder that work on the Dublin Core began at a meeting organised by OCLC in Dublin, Ohio, in 1995.

In my opinion metadata came of age with the launch of the Resource Discovery Network (RDN) [8] in November 1999. RDN had its origins in the Follett Report on libraries in higher education, published in December 1993 [9]. Alistair Dunning's description of the event brought back many memories, as I attended the conference and came away very excited by the potential of RDN; but Alistair also hints at the amount of work and politics behind the scenes needed to set up RDN, with King's College London and the University of Bath (UKOLN) as the lead institutions. I was also present at one of the early meetings on the RDN initiative and, given some of the comments made, was amazed that it was launched at all. To move out of chronological sequence for a moment, a very good review of the subsequent development of RDN was published in Ariadne in 2006: Debra Hiom provides a narrative history [10] and a complementary timeline [11]. The date at the top of the timeline paper is December 2011. A little metadata problem!*

To return to the story, arguably one of the most important of the contributions published on metadata in Ariadne was an early description of application profiles. The problem that application profiles addressed is elegantly summarised by Rachel Heery and Manjula Patel in their paper “Application Profiles: Mixing and Matching Metadata Schemas” [12] in which they refer to the respective roles of standards makers and implementors:

Both sets of people are intent on describing resources in order to manipulate them in some way. Standard makers are concerned to agree a common approach to ensure inter-working systems and economies of scale. However implementors, although they may want to use standards in part, in addition will want to describe specific aspects of a resource in a “special” way. Although the separation between those involved in standards making and implementation may be considered a false dichotomy, as many individuals involved in the metadata world take part in both activities, it is useful to distinguish the different priorities inherent in the two activities.[12]
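The mixing and matching that Heery and Patel describe can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of any profile from their paper: the profile reuses three Dublin Core elements (identified by the DCMI element-set namespace) and adds one element from a hypothetical local namespace. The names `fundingBody`, `ProfileElement`, `PROFILE` and `validate` are invented for illustration.

```python
from dataclasses import dataclass

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical local namespace standing in for an implementor's own elements
LOCAL = "http://example.org/terms/"


@dataclass(frozen=True)
class ProfileElement:
    namespace: str
    name: str
    required: bool = False
    repeatable: bool = True


# A hypothetical application profile: standard elements plus one local element
PROFILE = [
    ProfileElement(DC, "title", required=True, repeatable=False),
    ProfileElement(DC, "creator", required=True),
    ProfileElement(DC, "date", repeatable=False),
    ProfileElement(LOCAL, "fundingBody"),
]


def validate(record: dict[tuple[str, str], list[str]]) -> list[str]:
    """Check a record against PROFILE; an empty list means it conforms."""
    known = {(e.namespace, e.name): e for e in PROFILE}
    # Elements the profile does not draw in are reported first
    problems = [f"not in profile: {ns}{name}" for ns, name in record if (ns, name) not in known]
    # Then check the usage rules the profile places on each element
    for key, element in known.items():
        values = record.get(key, [])
        if element.required and not values:
            problems.append(f"missing required element: {element.name}")
        if not element.repeatable and len(values) > 1:
            problems.append(f"element not repeatable: {element.name}")
    return problems


good = {
    (DC, "title"): ["Metadata for the Masses"],
    (DC, "creator"): ["Paul Miller"],
    (LOCAL, "fundingBody"): ["Example funder"],
}
bad = {(DC, "title"): ["First title", "Second title"], (DC, "rights"): ["CC BY"]}
print(validate(good))
print(validate(bad))
```

Here `validate(good)` returns an empty list, while `validate(bad)` reports the `rights` element the profile does not include, the repeated title and the missing creator. Holding the profile as data rather than code is one way of letting implementors adjust their "special" requirements without touching the validator.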

The importance of application profiles cannot be overstated: without them metadata schemas would have grown into metadata silos, and arguably much of the development of resource discovery (I use the lower case with deliberation) could not have taken place. The clarity of the paper is typical of the ethos of Ariadne under editors Philip Hunter and Richard Waller. Another aspect of Ariadne's editorial direction was a willingness to publish papers on initiatives that had encountered difficulties along the way, describing how those difficulties were handled. A good example is David Little's “Sharing History of Science and Medicine Gateway Metadata Using OAI-PMH”, published in 2003 [13]. The Protocol for Metadata Harvesting (OAI-PMH) was developed by the Open Archives Initiative (OAI). The subsequent history of OAI is a subject in its own right, so I won’t dwell on it further, but the paper illustrates both the importance and the difficulties of cross-domain metadata harvesting and integration.
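For readers who have not met OAI-PMH, a harvester works by sending HTTP requests carrying a `verb` argument to a repository's base URL. The sketch below only builds `ListRecords` request URLs; it does not fetch or parse responses. The arguments `verb`, `metadataPrefix`, `set`, `from` and `resumptionToken` are defined by the protocol, while the function name and the base URL are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (construction only, no fetching)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it is sent with the verb alone
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec  # selective harvesting by set
        if from_date is not None:
            params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


BASE = "https://repository.example.org/oai"  # hypothetical repository endpoint
print(list_records_url(BASE))
print(list_records_url(BASE, from_date="2003-01-01"))
print(list_records_url(BASE, resumption_token="abc123"))
```

Defaulting to `oai_dc` reflects the protocol's requirement that every repository supports unqualified Dublin Core, which is what made it the common denominator for harvesting. A real harvester would keep following the `resumptionToken` returned in each response until the repository stops supplying one.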

The challenges of metadata development and implementation are substantial, and demand a very high degree of patience from all concerned. A good example is the development of metadata for learning objects. Carol Jean Godby outlined the initial work on learning object metadata [14] in 2004 in the context of the development of application profiles. Four years later, Sarah Currier summarised progress in “Metadata for Learning Resources: An Update on Standards Activity for 2008” [15], highlighting the number of initiatives under way in this area. There is more than a hint of frustration at the lack of coherence and progress. Only relatively recently has the Learning Resources Metadata Initiative started to fulfil the ambitions expressed over a decade ago, in particular by the IEEE. These things really do take time, and have to balance the work involved in tagging against the benefits to the user. This will remain a challenge for all involved in metadata development and adoption. To end on a positive note, take a look at Richard Gartner's paper on the Linking the Parliamentary Record Through Metadata (LIPARM) Project for the UK Parliament [16].

The LIPARM Project should for the first time make feasible the joining up of the scattered UK Parliamentary record and bring to fruition the potential of the digitised record which has until now remained to some extent latent. Such an ambition has been held by many Parliamentary historians, librarians, archivists and publishers for some time, and while the project can only represent the initial steps to achieving this, it has established a robust architecture which integrates well with existing resources and so should be readily extensible as new collections, both historical and contemporary, adopt it. [16]

This article exemplifies what can be achieved in resource discovery by carefully matching metadata schemas to user requirements, and combines a detailed account of how the project was undertaken with a vision of the benefits it will bring. I read it and was immensely encouraged, but even so it is just one step in a particular domain. There remains plenty of work for metadata visionaries to define and accomplish, and I look forward to reading further success stories in a few years' time.


Figure 1 of "The LIPARM Project: A New Approach to Parliamentary Metadata": Four major collections of Parliamentary proceedings, each using a different interface [16]

When reading articles in Ariadne, as with any Web magazine, it is difficult to appreciate the length, depth and breadth of an article. Furthermore, one of the attributes of Ariadne is the range of its content, from short reviews to major contributions of nearly 6,000 words. The paper I have in mind was written by a team from UKOLN working on a Jisc-funded project on application profiles for digital repositories [17]. In 2010 Talat Chaudhri [18] and his co-authors highlighted the need for a collaborative approach to the development of application profiles. I’d like to quote a paragraph in full because it should be on the desktop (virtual or physical) of every development team.

Application profiles, in order to be workable, living standards, need to be re-examined in their constituent parts in far greater detail than before, and that a range of implementation methods need to be practically tested against the functional and user requirements of different software systems. This can only be achieved through a process of engagement with users, service providers (such as repository managers), technical support staff and developers. While the ideal target audience are the end-users of the repository service, it is in practice difficult to engage them in the abstract with unfamiliar, possibly complex, metadata schemas. So much of the process must inevitably be mediated through the repository managers' invaluable everyday experience in dealing directly with users – at least until the stage in the process when test interfaces, test repositories or live services can be demonstrated. In order to engage developers in the process of building and testing possible implementation methods, it is absolutely crucial to collect and present tangible evidence of user requirements. It is highly likely that practical implementation in repositories will vary greatly between individual services based on different software platforms. However, it may well be that there are other more significant factors in individual cases. [17]

This paper also shows the depth of expertise in metadata at UKOLN. Talat paid special attention to the Functional Requirements for Bibliographic Records (FRBR), which IFLA published in 1998 and which, I have to admit, I never really understood, even with the help of the very clear exposition of the FRBR entity-relationship model [19] that Talat contributed to Ariadne in 2009.


Figure 1: left: FRBR Group 1: Entities and 'vertical' relationships; right: FRBR: Creators, contributors and agents. From "Assessing FRBR in Dublin Core Application Profiles" [19]
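As a rough aid to the left-hand diagram, the four FRBR Group 1 entities and their 'vertical' relationships (a work is realized through expressions, an expression is embodied in manifestations, a manifestation is exemplified by items) can be sketched as nested Python dataclasses. This is a simplification under stated assumptions: it treats the model as a strict tree, whereas FRBR allows, for example, a single manifestation to embody more than one expression. The class layout, the `walk` helper and the example data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # A single exemplar, e.g. one physical copy held by a library
    identifier: str


@dataclass
class Manifestation:
    # The physical embodiment of an expression, e.g. a particular edition
    description: str
    items: list[Item] = field(default_factory=list)  # "is exemplified by"


@dataclass
class Expression:
    # The realisation of a work, e.g. a text in a particular language
    description: str
    manifestations: list[Manifestation] = field(default_factory=list)  # "is embodied in"


@dataclass
class Work:
    # A distinct intellectual or artistic creation
    title: str
    expressions: list[Expression] = field(default_factory=list)  # "is realized through"


def walk(work: Work) -> list[str]:
    """Flatten the Work > Expression > Manifestation > Item tree into indented lines."""
    lines = [f"Work: {work.title}"]
    for expr in work.expressions:
        lines.append(f"  Expression: {expr.description}")
        for man in expr.manifestations:
            lines.append(f"    Manifestation: {man.description}")
            for item in man.items:
                lines.append(f"      Item: {item.identifier}")
    return lines


hamlet = Work("Hamlet", [
    Expression("Original English text", [
        Manifestation("A hypothetical paperback edition", [Item("copy-001"), Item("copy-002")]),
    ]),
    Expression("A French translation"),
])
print("\n".join(walk(hamlet)))
```

Printing the tree shows the one work, its two expressions, the single edition of the English text and its two copies, mirroring the top-to-bottom reading of the diagram.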

I felt much more at ease with the 2006 paper by another UKOLN metadata specialist, Emma Tonkin [20], on plain-text tagging using folksonomies [21]. What is remarkable about this paper is that the term ‘folksonomy’ [22] had been coined only a couple of years previously, and yet UKOLN was already considering the opportunities and challenges of crowd-sourced tagging.


Figure 1: The sum of the records from various tagging services creates a 'tag ensemble'. From "Folksonomies: The Fall and Rise of Plain-text Tagging" [21]
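The idea of a tag ensemble can be illustrated by summing the tags that several services hold for the same resource. This is a minimal sketch, assuming each service simply returns a list of plain-text tags; the service names and tags are hypothetical, and the case-folding step is an assumption rather than a method taken from Emma Tonkin's paper.

```python
from collections import Counter


def tag_ensemble(records: dict[str, list[str]]) -> Counter:
    """Sum tag counts for one resource across several tagging services."""
    ensemble = Counter()
    for tags in records.values():
        # Assumed normalisation: trim whitespace and fold case before counting
        ensemble.update(tag.strip().lower() for tag in tags)
    return ensemble


# Hypothetical tags recorded for the same resource by three services
records = {
    "service_a": ["metadata", "Dublin Core", "ariadne"],
    "service_b": ["Metadata", "folksonomy"],
    "service_c": ["metadata ", "dublin core"],
}
ensemble = tag_ensemble(records)
print(ensemble.most_common(2))  # [('metadata', 3), ('dublin core', 2)]
```

Normalising before summing matters: without it, 'Metadata' and 'metadata' would be counted as different tags, one of the familiar weaknesses of uncontrolled plain-text tagging.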

Compiling this review prompted me to go back and look at the origin of the term. One of the many benefits of working my way through the Ariadne archive is uncovering the timeline of library and information science. We often forget, or do not have the time to check, who invented what over the last couple of decades, and we sometimes fail to remember the rich research heritage that is now just a click away.

*Editor's note: The date actually represents when Debra's timeline was restructured during Ariadne's platform redesign in 2011. Pragmatism ruled: it was too time-consuming to alter.

References

  1. Paul Miller. "Metadata for the Masses". September 1996, Ariadne Issue 5
    http://www.ariadne.ac.uk/issue5/metadata-masses/
  2. The Cloud of Data http://cloudofdata.com/
  3. Philip Bagley. "Extension of Programming Language Concepts". 1968, US Air Force research report (PDF) http://www.dtic.mil/dtic/tr/fulltext/u2/680815.pdf
  4. Wikipedia entry on Metadata, 3 October 2014 http://en.wikipedia.org/wiki/Metadata
  5. UKOLN http://www.ukoln.ac.uk/
  6. Ariadne Metadata Corner articles: Michael Day. "Metadata Corner: Working Meeting on Electronic Records Research". July 1997, Ariadne Issue 10 http://www.ariadne.ac.uk/issue10/metadata/ ;
    Rachel Heery. "Metadata Corner: Naming Names - Metadata Registries". September 1997, Ariadne Issue 11 http://www.ariadne.ac.uk/issue11/metadata/ ;
    Tony Gill, Paul Miller. "Metadata Corner: DC5 - the Search for Santa". November 1997, Ariadne Issue 12 http://www.ariadne.ac.uk/issue12/metadata/ ;
    Michael Day, Rachel Heery, Andy Powell. "Metadata Corner: CrossROADS and Interoperability". March 1998, Ariadne Issue 14 http://www.ariadne.ac.uk/issue14/metadata/ ;
    Michael Day. "Metadata Corner". July 1998, Ariadne Issue 16 http://www.ariadne.ac.uk/issue16/delos/
  7. Paul Miller. "Dublin Comes to Europe". March 1998, Ariadne Issue 14
    http://www.ariadne.ac.uk/issue14/dublin/
  8. Alistair Dunning. "RDN: Resource Discovery Network". December 1999, Ariadne Issue 22 http://www.ariadne.ac.uk/issue22/dunning/
  9. Lynne J. Brindley. "Joint Funding Councils' Libraries Review Group (the 'Follett') Report: the contribution of the Information Technology Sub-committee". 1994, Program, Vol. 28, Iss. 3, pp. 275-278. See also: Joint Funding Councils' Libraries Review Group: Report (The Follett Report), December 1993 http://www.ukoln.ac.uk/services/papers/follett/report/
  10. Debra Hiom. "Retrospective on the RDN". April 2006, Ariadne Issue 47
    http://www.ariadne.ac.uk/issue47/hiom/
  11. Debra Hiom. "RDN Timeline". April 2006, Ariadne Issue 47 http://www.ariadne.ac.uk/issue47/hiom
  12. Rachel Heery, Manjula Patel. "Application Profiles: Mixing and Matching Metadata Schemas". September 2000, Ariadne Issue 25 http://www.ariadne.ac.uk/issue25/app-profiles/
  13. David Little. "Sharing History of Science and Medicine Gateway Metadata Using OAI-PMH". January 2003, Ariadne Issue 34 http://www.ariadne.ac.uk/issue34/little/
  14. Jean Godby. "What Do Application Profiles Reveal about the Learning Object Metadata Standard?". October 2004, Ariadne Issue 41 http://www.ariadne.ac.uk/issue41/godby/
  15. Sarah Currier. "Metadata for Learning Resources: An Update on Standards Activity for 2008". April 2008, Ariadne Issue 55 http://www.ariadne.ac.uk/issue55/currier/
  16. Richard Gartner. "The LIPARM Project: A New Approach to Parliamentary Metadata". November 2012, Ariadne Issue 70 http://www.ariadne.ac.uk/issue70/gartner
  17. Talat Chaudhri, Julian Cheal, Richard Jones, Mahendra Mahey, Emma Tonkin. "Towards a Toolkit for Implementing Application Profiles". January 2010, Ariadne Issue 62
    http://www.ariadne.ac.uk/issue62/chaudhri-et-al/
  18. Talat Chaudhri Network & Editing Services http://talatchaudhri.net/
  19. Talat Chaudhri. "Assessing FRBR in Dublin Core Application Profiles". January 2009, Ariadne Issue 58 http://www.ariadne.ac.uk/issue58/chaudhri/
  20. Emma Tonkin: Research Portal, King's College, London
    https://kclpure.kcl.ac.uk/portal/emma.tonkin.html
  21. Emma Tonkin. "Folksonomies: The Fall and Rise of Plain-text Tagging". April 2006, Ariadne Issue 47 http://www.ariadne.ac.uk/issue47/tonkin/
  22. Thomas Vander Wal. "Folksonomy Coinage and Definition". 2 February 2007, vanderwal.net http://www.vanderwal.net/folksonomy.html

Author Details

Martin White
Managing Director
Intranet Focus Ltd
12 Allcard Close
Horsham
RH12 5AJ
UK

Email: martin.white@intranetfocus.com
Web site: http://www.intranetfocus.com

Martin White has been tracking developments in technology since the late 1970s, initially in electronic publishing in the days of videotext and laser discs. He has been a Visiting Professor at the iSchool, University of Sheffield, since 2002 and is Chair of the eScience Advisory Group of the Royal Society of Chemistry.

Date published: 
Friday, 13 February 2015
Copyright statement: 

This article has been published under Creative Commons Attribution 3.0 Unported (CC BY 3.0) licence. Please note this CC BY licence applies to textual content of this article, and that some images or other non-textual elements may be covered by special copyright arrangements. For guidance on citing this article (giving attribution as required by the CC BY licence), please see below our recommendation of 'How to cite this article'.