Web Magazine for Information Professionals

The LIPARM Project: A New Approach to Parliamentary Metadata

Richard Gartner outlines a collaborative project which aims to link together the digitised UK Parliamentary record by providing a metadata scheme, controlled vocabularies and a Web-based interface.

Parliamentary historians in the United Kingdom are particularly fortunate as their key primary source, the record of Parliamentary proceedings, is almost entirely available in digitised form. Similarly, those needing to consult and study contemporary proceedings as scholars, journalists or citizens have access to the daily output of the UK's Parliaments and Assemblies in electronic form shortly after their proceedings take place.

Unfortunately, the full potential of this resource for all of these users is limited by the fact that it is scattered throughout a heterogeneous information landscape and so cannot be approached as a unitary resource. It is not a simple process, for instance, to recognise the same person if he or she appears in more than one of these collections or, for that matter, to identify the same legislation if it is referenced inconsistently in different resources. As a result, searching the record or subjecting it to more sophisticated analysis becomes problematic as soon as one attempts to move beyond a single constituent collection.

Finding some mechanism to allow these collections to be linked and so used as a coherent, integrated resource has been on the wish-list of Parliamentary historians and other stakeholders in this area for some time. In the mid-2000s, for instance, the History of Parliament Trust brought together the custodians of several digitised collections to examine ways in which this could be done. In 2011, some of these ideas came to fruition when JISC (Joint Information Systems Committee) funded a one-year project named LIPARM (Linking the Parliamentary Record through Metadata) which aimed to design a mechanism for encoding these linkages within XML architectures and to produce a working prototype for an interface which would enable the potential offered by this new methodology to be realised in practice.

This article explains the rationale of the LIPARM Project and how it uses XML to link together core components of the Parliamentary record within a unified metadata scheme. It introduces the XML schema created by the project, Parliamentary Metadata Language (PML), and the set of controlled vocabularies for Parliamentary proceedings which the project compiled to support it. It also discusses the project's experience of converting two XML-encoded collections of Parliamentary proceedings to PML, and the work on the prototype Web-based union catalogue which will form the initial gateway to PML-encoded metadata.

Background: The Need for Integrated Parliamentary Metadata

The UK's Parliamentary record has been the focus of a number of major digitisation initiatives which have made its historical corpus available in almost its entirety: in addition, the current publishing operations of the four Parliaments and Assemblies in the UK ensure that the contemporary record is available in machine-readable form on a daily basis. Unfortunately, these collections have limited interoperability owing to their disparate approaches to data and metadata, which render the federated searching and browsing of their contents currently impossible. In addition, the disparity of platforms on which they are offered, and the wide diversity of user interfaces they use to present the data (as shown by the small sample in Figure 1), make extensive research a time-consuming and cumbersome process whenever it has to extend beyond the confines of a single collection.

Figure 1: Four major collections of Parliamentary proceedings, each using a different interface


A more integrated approach to Parliamentary metadata offers major potential for new research: it would, for instance, allow the comprehensive tracking of an individual's career, including all of their contributions to debates and proceedings. It would allow the process of legislation to be traced automatically, voting patterns to be analysed, and the emergence of themes and topics in Parliamentary history to be studied on a large scale.

One example of the linkages that could usefully be made in an integrated metadata architecture can be seen in the career of Sir James Craig, the Prime Minister of Northern Ireland from 1921 to 1940.  Figure 2 illustrates some of the connections that could be made to represent his career:-

Figure 2: Sample of potential linkages for a Parliamentarian


The connections shown here are to the differing ways in which he is named in the written proceedings, to his tenures in both Houses, the constituencies he represented, the offices he held and the contributions he made to debates. Much more complex relationships are, of course, possible and desirable.

The advantages of an integrated approach to metadata which would allow these connections to be made have long been recognised by practitioners in this field, and several attempts have been made to create potential strategies for realising them. But it was only in 2011 that these took more concrete form when a one-day meeting sponsored by JISC brought together representatives from the academic, publishing, library and archival sectors to devise a strategy for integrating Parliamentary metadata. Their report proposed the creation of an XML schema for linking core components of this record and the creation of a series of controlled vocabularies for these components which could form the basis of the semantic linkages to be encoded in the schema [1]. These proposals then formed the basis of a successful bid to JISC for a project to put them into practice: the result was the LIPARM (Linking the Parliamentary Record through Metadata) Project.

The LIPARM Project

The LIPARM Project, a joint venture of King's College London, the History of Parliament Trust, the Institute for Historical Research, the National Library of Wales and Queen's University, Belfast, has four core components:-

The Parliamentary Metadata Language (PML) Schema

The core of the LIPARM metadata architecture is the XML schema used to link components of the Parliamentary record. The schema, called Parliamentary Metadata Language (PML), defines seven concepts as central to the record: these are shown in Figure 3 with examples of their usage in the context of the UK Parliament.

Figure 3: Core components of the Parliamentary Metadata Language schema


These concepts are deliberately defined in generic terms to allow them to be applied outside the context of a single legislature.

Units is used to define the administrative sections of the legislature and their relationships; it may include, for instance, administratively defined units (such as "Government", or "Opposition"), chambers of the legislature, or geographic units (such as constituencies).

Functions records roles or offices filled by members, including named officers of state. Calendar objects are the temporal units within which legislative activities take place (in the UK, these include Parliaments, their constituent sessions and the individual sittings which take place within them).

The remaining concepts record the proceedings themselves and their relationships to each other. Each unitary component of the proceedings is recorded in a Proceedings object element: these may include debates, any item of business or the meetings of committees. A particular type of proceedings object, the holding of a vote, merits its own element, Vote event, in which fine details of votes cast are recorded.  These components can then be nested together within grouping elements, Proceedings Groups, which are used to define any relevant conjunction of these proceedings objects or vote events: usually they are used to define Acts or Bills and group together all stages of the proceedings from which they result.

The top-level XML elements for each of these components have a consistent set of attributes to define them more precisely than their generic element names allow. A sample unit element for a Parliamentary constituency, for instance, may take this form:-

<unit
        ID="s-s001-v0001-constituencies-0040"
        regURI="http://liparm.ac.uk/id/unit/constituency/londonderry1920-1929"
        type="constituencies"
        typeURI="http://liparm.ac.uk/id/unittype/constituency">
        <label>Londonderry</label>
</unit>

The unit is defined as a constituency (as opposed to other units such as chambers of Parliament, committees and so on) by the type attribute, which provides a human-readable description of the category of unit to which it belongs, and the typeURI attribute, which provides a Uniform Resource Identifier (URI) for this type. A URI is a precise mechanism for providing unambiguous identifiers for any concept or object which is valid anywhere within digital space: here the URI is defined in a controlled vocabulary (compiled by the LIPARM Project) and identifies this unit as a Parliamentary constituency.

In addition to defining the type of unit being referenced, the example also demonstrates how the content of the element is itself presented. The label element provides the human-readable version of the element's contents which will be displayed when the file is delivered to the end-user: this is repeatable and has a lang attribute indicating the language in which the content is presented (so allowing multi-lingual versions of the data). The regURI attribute contains the URI for a controlled form of the element's contents, in this case for the Londonderry constituency which existed from 1920 to 1929: this allows the precise identification of the content and also allows this occurrence of the data to be linked to those in the same or any other PML-encoded file.

A final attribute common to all elements is ID, which contains an XML identifier for the element itself: this is used as a reference point to which linkages can be made from other elements in the same file. This is the key mechanism by which components of the PML are linked together in order to express the relationships central to a Parliamentary record.

One core relationship of this type is that of a contribution made by a person to a Parliamentary body or unit, to a given proceeding or to a vote: this is expressed using the widely-used contribution element which can be nested within unit, proceedingsObject or voteEvent. A person's service as a constituency MP, for instance, may be expressed in this way:-

<unit
        ID="s-s001-v0001-constituencies-0053"
        regURI="http://liparm.ac.uk/id/unit/constituency/queensuniversity1920-1969"
        type="constituencies"
        typeURI="http://liparm.ac.uk/id/unittype/constituency">
        <label>Queen's University</label>
        <contributions>
                <contribution
                        contributorID="s-s001-v0001-persons-0036"
                        type="constituency mp"
                        typeURI="http://liparm.ac.uk/id/contributions/constituencymp"
                        startDate="1921-03-12"
                        endDate="1924-04-17"/>
        </contributions>
</unit>

This generic contribution element uses the type and typeURI mechanisms outlined above to indicate the type of contribution recorded. The mandatory contributorID attribute contains the XML ID for the person element in which the details of the MP in question are contained, and the startDate and endDate attributes indicate the temporal limits of his term of service. This same set of attributes is used throughout the schema to record any type of contribution.
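The way a contributorID points back to a person element through its XML ID can be illustrated with a short script. This is only a minimal sketch using Python's standard library: the pml wrapper element, the content of the person element and the name "Example Member" are assumptions made for illustration and are not taken from the PML schema documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical PML fragment. The <pml> wrapper and the <person> content are
# assumptions; the unit and contribution attributes follow the article's sample.
PML = """
<pml>
  <person ID="s-s001-v0001-persons-0036">
    <label>Example Member</label>
  </person>
  <unit ID="s-s001-v0001-constituencies-0053"
        regURI="http://liparm.ac.uk/id/unit/constituency/queensuniversity1920-1969"
        type="constituencies"
        typeURI="http://liparm.ac.uk/id/unittype/constituency">
    <label>Queen's University</label>
    <contributions>
      <contribution contributorID="s-s001-v0001-persons-0036"
                    type="constituency mp"
                    typeURI="http://liparm.ac.uk/id/contributions/constituencymp"
                    startDate="1921-03-12"
                    endDate="1924-04-17"/>
    </contributions>
  </unit>
</pml>
"""


def service_records(pml_text):
    """Resolve each contribution's contributorID to the element carrying that ID."""
    root = ET.fromstring(pml_text)
    # Index every element in the file that has an ID attribute.
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    records = []
    for unit in root.iter("unit"):
        for contrib in unit.iter("contribution"):
            person = by_id.get(contrib.get("contributorID"))
            records.append({
                "person": person.findtext("label") if person is not None else None,
                "unit": unit.findtext("label"),
                "type": contrib.get("type"),
                "start": contrib.get("startDate"),
                "end": contrib.get("endDate"),
            })
    return records


records = service_records(PML)
for record in records:
    print(record)
```

Building the ID index once and then following references keeps the lookup independent of where the person element happens to sit in the file, which is the property the ID mechanism is intended to provide.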

The encoding of votes requires a specialised element, voteEvent, in which the full details of all votes cast, the motions on which they are taken, the proceedings in which they occur and the roles of tellers and other participants can be recorded. A (much simplified) entry for a division within a Parliamentary debate may take this form:-

<voteEvent
        ID="s-s001-v0001-voteevent-00002"
        calendarObjectID="s-s001-v0001-calendarobject-015"
        proceedingsObjectID="s-s001-v0001-proceedingsubject-00121"
        type="division"
        typeURI="http://liparm.ac.uk/voteevents/division"
        startDate="1921-12-01">
        <motionText>Division 2 That the words proposed to be left out stand part of the Question</motionText>
        <options>
                <option
                        regURI="http://liparm.ac.uk/id/votingoption/yes">
                        <label>Ayes</label>
                        <vote voterID="s-s001-v0001-persons-0006"/>
                        <vote voterID="s-s001-v0001-persons-0007"/>
                </option>
                <option
                        regURI="http://liparm.ac.uk/id/votingoption/no">
                        <label>Noes</label>
                        <vote voterID="s-s001-v0001-persons-0004"/>
                        <vote voterID="s-s001-v0001-persons-0053"/>
                </option>
        </options>
</voteEvent>

The attributes for the voteEvent element include references to the calendarObject during which the vote takes place and to the proceedingsObject element for the proceedings during which it occurs. The votes themselves are recorded by vote elements which are nested within option elements for each of the options available: those who cast their votes are identified by the voterID attribute which contains the identifier of their person element. In addition contribution elements can be used to encode roles other than the casting of votes (for instance acting as vote tellers).
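Because votes are nested inside the option they were cast for, a division can be tallied by walking the option elements. The following is a minimal sketch under stated assumptions: the voteEvent is a cut-down fragment, and the short voter identifiers (person-a and so on) are hypothetical placeholders rather than identifiers from the project's data.

```python
import xml.etree.ElementTree as ET

# Cut-down voteEvent; the voterID values are hypothetical placeholders.
VOTE_EVENT = """
<voteEvent ID="voteevent-example" type="division" startDate="1921-12-01">
  <options>
    <option regURI="http://liparm.ac.uk/id/votingoption/yes">
      <label>Ayes</label>
      <vote voterID="person-a"/>
      <vote voterID="person-b"/>
    </option>
    <option regURI="http://liparm.ac.uk/id/votingoption/no">
      <label>Noes</label>
      <vote voterID="person-c"/>
      <vote voterID="person-d"/>
    </option>
  </options>
</voteEvent>
"""


def tally(vote_event_xml):
    """Group voter identifiers by the controlled URI of the option they chose."""
    event = ET.fromstring(vote_event_xml)
    result = {}
    for option in event.iter("option"):
        voters = [vote.get("voterID") for vote in option.findall("vote")]
        result[option.get("regURI")] = {
            "label": option.findtext("label"),
            "voters": voters,
        }
    return result


result = tally(VOTE_EVENT)
for uri, entry in result.items():
    print(entry["label"], len(entry["voters"]), uri)
```

Keying the tally on the option's regURI rather than its label means that divisions recorded with different display labels could still be compared, provided they use the same controlled vocabulary.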

The PML schema includes more features than can be outlined here, including source elements which provide links to the full-text or multimedia sources for the records of proceedings and external linkage elements which provide a generic linking feature to any external resource (for instance biographies of members).

The LIPARM Controlled Vocabularies

The regURI attribute, common to most PML elements, is the key mechanism by which components of the PML files can be linked outside the confines of a single document. This attribute contains a URI for the contents of its respective element, usually defined in a controlled vocabulary: these URIs may cover every feature encoded within the file, including persons, proceedings objects, votes, constituencies and proceedings groups such as Acts or Bills. In addition, the typeURI attribute contains URIs for controlled terms for each type of component.

Realising the full potential of these linking facilities requires the parallel use of controlled vocabularies which define these URIs. For this reason, the LIPARM Project has produced an extensive series of these vocabularies in which each component is assigned its unique URI for use in PML files. Vocabularies have been defined for such key components of the record as Members of Parliament, Acts, Bills, constituencies and sessions for both the Westminster and Stormont Parliaments. In addition, vocabularies have been compiled for types of PML components: these define URIs for the typeURI attribute to use in order to define these generic elements more precisely.

The controlled vocabularies are published in MADS (Metadata Authority Description Schema), an XML schema published by the Library of Congress for encoding thesauri, term lists and taxonomies in an interoperable and exchangeable format [2]. A sample record for a Westminster constituency encoded in MADS takes this form:

<mads>
        <authority>
                <name valueURI="http://liparm.ac.uk/id/unit/constituency/croydonnorthwest1955-1997">
                        <namePart>Croydon North West</namePart>
                </name>
                <temporal point="start">1955</temporal>
                <temporal point="end">1997</temporal>
        </authority>
</mads>

The valueURI attribute shown here records the URI for this constituency, which is used within regURI attributes in a PML file to identify it unambiguously.
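To illustrate how a regURI in a PML file can be resolved against a vocabulary, the sketch below indexes a MADS record by its valueURI and looks up a unit in it. It assumes the un-namespaced element names used in the sample record (published MADS files normally declare an XML namespace, which would need to be handled), and the unit element is a hypothetical fragment written for this example.

```python
import xml.etree.ElementTree as ET

# MADS record as given in the article's sample (no namespace assumed).
MADS = """
<mads>
  <authority>
    <name valueURI="http://liparm.ac.uk/id/unit/constituency/croydonnorthwest1955-1997">
      <namePart>Croydon North West</namePart>
    </name>
    <temporal point="start">1955</temporal>
    <temporal point="end">1997</temporal>
  </authority>
</mads>
"""

# Hypothetical PML unit that cites the vocabulary entry through regURI.
UNIT = """
<unit regURI="http://liparm.ac.uk/id/unit/constituency/croydonnorthwest1955-1997">
  <label>Croydon N.W.</label>
</unit>
"""


def authority_index(mads_xml):
    """Map each valueURI to the controlled name and dates recorded for it."""
    root = ET.fromstring(mads_xml)
    index = {}
    for authority in root.iter("authority"):
        name = authority.find("name")
        dates = {t.get("point"): t.text for t in authority.findall("temporal")}
        index[name.get("valueURI")] = {
            "name": name.findtext("namePart"),
            "start": dates.get("start"),
            "end": dates.get("end"),
        }
    return index


index = authority_index(MADS)
unit = ET.fromstring(UNIT)
entry = index[unit.get("regURI")]
print(unit.findtext("label"), "->", entry["name"], entry["start"], entry["end"])
```

The display label in the unit may vary from file to file; it is the shared URI that allows the different occurrences to be recognised as the same constituency.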

Generating PML Records

To test the viability of the LIPARM architecture, several years of two pre-existing collections of digitised Parliamentary proceedings are being converted to PML records. They are from the records of debates in the House of Commons of both the Westminster and Stormont Parliaments from the 1920s which are already available on the Internet on the Millbank Hansard [3] and Stormont Papers [4] Web sites respectively. Both collections contain as their core data XML-encoded transcripts of debates and their associated metadata.

The process of generating PML from these records is being undertaken by the Centre for Data Digitisation and Analysis at Queen's University, Belfast. PML components such as information on persons, functions, proceedings objects and votes are extracted from the XML and formatted to conform to the PML schema. Internal linkages within the PML are set up using the XML IDs of each component, and links to the LIPARM controlled vocabularies are made by matching entries across their MADS files and embedding their corresponding URIs within PML regURI attributes. Much of this process can be automated relatively easily owing to the shared XML architecture of PML and the data files, but much manual editing is also necessary to establish linkages where automatic matching fails.
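The split between automatic matching and manual editing can be sketched as follows. The matching rule used here (a case-insensitive comparison of names plus a check that the year falls within the entry's date range) is an assumption made for illustration; the article does not describe the rule actually used by the conversion team. The authority list, the function names and the variant spelling "L'derry" are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical authority entries, as they might be read from MADS files.
AUTHORITIES = [
    {
        "name": "Londonderry",
        "start": 1920,
        "end": 1929,
        "uri": "http://liparm.ac.uk/id/unit/constituency/londonderry1920-1929",
    },
]


def normalise(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def link_units(units, year, authorities):
    """Set regURI on units with exactly one match; return the rest for manual editing."""
    unmatched = []
    for unit in units:
        label = normalise(unit.findtext("label", default=""))
        hits = [
            a for a in authorities
            if normalise(a["name"]) == label and a["start"] <= year <= a["end"]
        ]
        if len(hits) == 1:
            unit.set("regURI", hits[0]["uri"])
        else:
            # No match, or an ambiguous one: leave for a human editor.
            unmatched.append(unit)
    return unmatched


units = [
    ET.fromstring("<unit><label>Londonderry</label></unit>"),
    ET.fromstring("<unit><label>L'derry</label></unit>"),
]
unmatched = link_units(units, 1921, AUTHORITIES)
print("linked:", units[0].get("regURI"))
print("needs manual editing:", [u.findtext("label") for u in unmatched])
```

Returning the unmatched elements instead of guessing keeps ambiguous or abbreviated names out of the linked data until they have been checked by hand.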

The User Interface

The final component of the LIPARM Project is a user interface to the PML-encoded metadata: this will take the form of a prototype Web-based union catalogue to the initial collections encoded by the project team. This interface is, at the time of writing, being designed by a team at the National Library of Wales.

The interface is designed to provide access to the main components of a PML file and to enable users to navigate the links between them. Browsing or searching by, for instance, a Member of Parliament will provide links to their offices held, the constituencies they serve and all of their contributions to proceedings, from speeches in debates to voting in divisions. Links to the source files (in the case of the two initial collections, to the digitised page on the Web site for each collection) will be provided for each entry.

The interface is due to go live in December 2012.

Conclusion

The LIPARM Project sets out primarily to define an architecture and methodology for joining together Parliamentary resources, and, based on evaluation and feedback, appears so far to be meeting its objectives. The PML schema, designed to synthesise the requirements of practitioners from a variety of interested sectors and evaluated by them after publication, appears to meet well their expressed needs in diverse environments. Its integration with controlled vocabularies published in re-usable and interchangeable formats works well in providing cross-document linkages at a semantic level.

The practicalities of implementing the LIPARM architecture in working environments will depend to a large extent on local circumstances, but the experiences of the conversion team at Belfast have shown that the generation of PML records even from complex sources can be readily achieved with modest resources. They have also demonstrated that the LIPARM controlled vocabularies can be incorporated into conversion workflows with little difficulty: only the messiness of some data has required major manual intervention. The work on the user interface, although not completed at the time of writing, shows that the presentation of the often complex relationships encoded in PML can be readily achieved using pre-existing technologies.


The LIPARM Project should for the first time make feasible the joining up of the scattered UK Parliamentary record and bring to fruition the potential of the digitised record which has until now remained to some extent latent.  Such an ambition has been held by many Parliamentary historians, librarians, archivists and publishers for some time, and while the project can only represent the initial steps to achieving this, it has established a robust architecture which integrates well with existing resources and so should be readily extensible as new collections, both historical and contemporary, adopt it.

The Web site for the LIPARM Project [5] contains full documentation for the schema, the schema itself, a sample PML file and the controlled vocabularies compiled by the project (in MADS and RDF formats).

References

  1. Richard Gartner and Lorna Hughes, Parliamentary Metadata Meeting: a brief report
    http://digitisation.jiscinvolve.org/wp/files/2011/07/parliamentary-metadata-2011.pdf
  2. MADS: Metadata Authority Description Schema
    http://www.loc.gov/standards/mads/
  3. Hansard 1803-2005
    http://hansard.millbanksystems.com/
  4. The Stormont Papers
    http://stormontpapers.ahds.ac.uk/stormontpapers/
  5. The LIPARM Project
    http://liparm.cerch.kcl.ac.uk/

Author Details

Richard Gartner
Lecturer
Centre for eResearch
Department of Digital Humanities
King's College London

Email: richard.gartner@kcl.ac.uk
Web site: http://www.kcl.ac.uk/innovation/groups/cerch/people/gartner/

Richard Gartner is a lecturer in Library and Information Science at King's College London. Before joining academia, he worked as a practising librarian for 20 years, specialising in digital libraries and metadata. His research concentrates on integrated metadata strategies for complex digital libraries and archives, particularly within XML architectures. In addition to acting as Principal Investigator for the LIPARM Project, he has also recently worked on projects in research information management, English place names, naval history and environmental science.