Metadata, PICS and Quality

Chris Armstrong

Chris Armstrong looks at the possibility of a PICS application acting as a quality filter.

A recent Ariadne article by Anagnostelis, Cooke and McNab ended with a reference to the Platform for Internet Content Selection (PICS) [1] and added that while PICS controls neither the publication nor the distribution of information, it offers “individuals and organisations the option of filtering out or filtering in selected views of networked information”. There follows a reference to the Centre for Information Quality Management (CIQM) and its proposal to use PICS filtering in order to allow users to set constraints on the minimum quality of resources retrieved [2]. This article seeks to amplify this basic idea.

CIQM was originally set up by The Library Association and the UK Online User Group to act as a clearing house to which database users could report problems relating to the quality of any aspect of a database being used (search software, data, indexing, documentation, training, for example). CIQM undertakes to forward the problem to the appropriate body (information provider, online host, CD-ROM publisher or Internet resource provider) and route the response back to the user. The service is free to users.

Users tend to judge a new database on the description printed in vendor catalogues, on usage guides such as the KR DIALOG Blue Sheets and on general publicity material, so it is not surprising that searches are often performed which exceed the capabilities of the database. Users rarely know of the information provider’s policy as to inclusion: some databases index every article in a journal, others only substantive or key articles, still others vary their rules depending on the journal. Sometimes users’ expectations are based on nothing more than the idea that best practice will be followed, and many of the quality issues reported to CIQM reflect a gap between expectations and the reality of searching. Thus users have to face the twin spectres of unknown database specifications and unknown adherence to the unknown specifications.

Building on its work as a clearing house, CIQM has been looking at ways in which database and resource quality can be assured to users - if not measured against a fixed standard, then set against a published specification or user-level agreement of the database as it is at a fixed point in time. These have come to be known as Database Labels and several prototypes can be seen at our web site [4]. The Labels can be used to describe any electronic resource whether it is supplied on diskette or CD-ROM, by conventional online or as an Internet or Web resource; however, with the current ongoing discussions on metadata, the Dublin Core and the Warwick Framework it is evident that, for Internet-based resources at least, there exists the basis of a more direct means of assuring users that a resource reaches an acceptable standard.

Metadata and PICS

The Dublin Metadata Workshop of 1995 and the Warwick Metadata Workshop of just over a year later were “convened to promote the development of consensus concerning network resource description across a broad spectrum of stakeholders, including the computer science community, text markup, and librarians” [5]. That is to say, they were concerned with the use of descriptive data elements stored as a part of the resource they describe (metadata), which amount to a simple declaration of the resource’s content, ownership, currency, etc. As the Nordic Metadata Project shows, metadata can enhance access by making documents more easily searchable and deliverable over the Internet [6]. Already, the idea of ‘labelling’ resources is apparent, although at this stage no application had surfaced other than a general wish to make resources easier to locate.

With the outcry about censorship, an extension of the metadata methodology was developed that allowed for software filtering of resources as they were located. PICS establishes Internet conventions for label formats (here, ‘label’ does not equate in any way to a CIQM Database Label) and distribution methods, while dictating neither a labelling vocabulary nor how the labels should be used and who should use them. Resnick and Miller [7] suggest that, “It is analogous to specifying where on a package a label should appear, and in what font it should be printed, without specifying what it should say.”

An overview of PICS and its uses can be found in an article by Resnick and Miller [7], while its use is described in detail in a document prepared for the technical subcommittee of PICS by Miller, Resnick and Singer [8]; label syntax and communication protocols are further described in a document by Krauskopf, Miller, Resnick and Treese [9].

It is important to note that PICS itself is a methodology or an infrastructure and not a rating service or an active system for selection/censorship. It is values-neutral: it is the applications that govern the implementation.

PICS labels describe content on one or more dimensions using a purpose-made vocabulary and allow selection software to determine access. In its most publicised role, PICS can be used to control Internet access by children. The local selection software can be set to inhibit access if violence exceeds a level previously determined by the parent or teacher, for example. The resource owner or labeller indicates on a mutually agreed scale the level of violence (or nudity, profanity, etc) for their site and the selection software bars sites whose label equals or exceeds this level. Sites without labelling can also be barred. Other uses for such a system can easily be imagined.
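The selection rule just described can be sketched in a few lines of Python. This is only an illustration of the logic, not PICS label syntax or the behaviour of any real selection software: the dimension names, the numeric levels and the function name is_blocked are all assumptions made for the example.

```python
from typing import Optional


def is_blocked(label: Optional[dict], limits: dict,
               block_unlabelled: bool = True) -> bool:
    """Decide whether local selection software should bar a site.

    label  -- ratings given by the resource owner or labeller, keyed by
              dimension (hypothetical names), or None for an unlabelled site
    limits -- levels set in advance by the parent or teacher; a site is
              barred when its rating on any dimension equals or exceeds
              the limit for that dimension
    """
    if label is None:
        # Sites without labelling can also be barred.
        return block_unlabelled
    for dimension, limit in limits.items():
        rating = label.get(dimension)
        if rating is None:
            # Assumption: a missing dimension is treated like a missing label.
            if block_unlabelled:
                return True
            continue
        if rating >= limit:
            return True
    return False


# Hypothetical limits on an assumed 0-4 scale.
limits = {"violence": 2, "nudity": 1}

print(is_blocked({"violence": 1, "nudity": 0}, limits))  # below both limits
print(is_blocked({"violence": 2, "nudity": 0}, limits))  # violence equals the limit
print(is_blocked(None, limits))                          # unlabelled site
```

The comparison is deliberately “equals or exceeds”, as in the description above; whether unlabelled sites are barred is left as a switch because the text presents it as an option rather than a rule.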

The first labels were designed to allow circumnavigation of indecent sites in response to the US Communications Decency Act - for example, using the Recreational Software Advisory Council (RSAC) rating system or the SafeSurf vocabulary - and this is probably still the main application, but as Resnick wrote in his Scientific American article, ‘Filtering Information on the Internet’, labels can be used to “convey characteristics that require human judgement - whether a Web page is funny or offensive - as well as information not readily apparent from the words and graphics [within the page], such as the Web site’s policies about use or resale of personal data.” [10]

Resnick and Miller [7] had already noted that “new infrastructures are often used in unplanned ways, to meet latent needs” and suggest that electronic journal articles could be encoded (seminal article, review article, short notice, etc); intellectual property vocabularies may develop; and reputation vocabularies could “associate labels with commercial sites that had especially good or especially bad business practices.” It is this last application which comes closest in spirit to the concept of CIQM Database Labels and Internet Resource Labels.

Over the last two years considerable development work has taken place in all the metadata areas; PICS development work is supported by the World Wide Web Consortium (W3C), the organisation responsible for the development of Web standards. Behind the general and easily understood concepts, complex work on syntax and vocabulary has gone on at the same time that further PICS applications have been suggested. PICS labels may be used to carry digital signatures of resources or to protect computers from viruses and, coming full ‘meta circle’, as it were, it has been pointed out that mechanisms to restrict access and to gain access are two sides of the same coin: requests to RESTRICT access to any site dealing with a given topic or having a less than up-to-date currency rating are very similar to requests to FIND any site by that topic or with a non-current date code.

At the PICS Working Group meeting held in London on 13/14th January 1997, usage of PICS within the USA was reported as widespread. It also appears that PICS is being endorsed at government level in Europe. In light of the PICS development work, it seems clear that the Internet community would benefit from work to make use of PICS for storing data on resource quality.

Locating Worthwhile Resources

In addition to searching the Internet by way of search engines such as Alta Vista or services such as Excite Web Reviews [11] and the Magellan Internet Guide [12], users can access material through a variety of subject-specific gateways, such as ADAM [13], EEVL [14], OMNI [15] and SOSIG [16], which will direct them to selected and evaluated resources in their interest area. All of these have disadvantages - ranging from the massive results of Alta Vista and the uneven reviews of the services reviewed in the Anagnostelis, Cooke and McNab article [1] to the subject gateways which either simply describe the resource or offer an arbitrary scoring mechanism. No standard quality vocabulary has been developed and users are invariably unable to judge the strengths and weaknesses of sites.

One idea is being explored by the IEEE Computer Society Standards Activities Board. The project currently under discussion is a proposal to use PICS specifications to indicate peer endorsement of articles [17]. Their view, as a professional body actively pursuing electronic publication, is that PICS could ensure that members and/or consumers know what materials have been peer reviewed and so endorsed. CIQM believes that the same mechanism could be used for a more extensive labelling of resources and offer users a quality assurance mechanism covering a range of criteria.

The advantages of a PICS-based system come with the standardised vocabulary and scales which could be imposed on quality judgements. Users would no longer have to interpret the meaning behind a site designated as “cool” or guess how current they could expect a three-star site to be; the quality vocabulary would include scales for these and other quality criteria such as those highlighted by CIQM’s earlier work and originally itemised by the Southern California Online User Group [18].

A considerable amount of work has already been undertaken by both CIQM and others on quality evaluation and resource criteria. In developing any set of PICS label values this should, of course, be taken into account. Two of the most extensive studies are DESIRE [19] and Alison Cooke’s Doctoral Research project at the University of Wales Aberystwyth, which has resulted in sets of evaluation criteria for various Internet resource types [3]. Developments in the medical community, such as the codes of conduct for medical Web sites drawn up by the Health on the Net Foundation [20] and the British Medical Internet Association [21], are also relevant. Additional references in this area are given below [22] [23] [24] [25] [26].

The existing CIQM Database Labels [4] contain a mixture of quantitative and qualitative information: for example, not only do they detail the number of records on the database and the percentages of records from different geographical regions, but they contain over twenty quality assurance statements such as, “All text fields are spell checked”, “All authors for every article are indexed” or “There is no duplicated information in this database”. The PICS labels could contain a similar mix of factual elements (for example: author/ownership, type of corporate source, length, subject coverage, geographical coverage/relevance) and qualitative elements (spell check indicator, accuracy measurement, indication of peer-reviewing, timeliness, etc).

Even within these few examples, there exists almost sufficient data to enable an assessment of the resource: the name of the author coupled with his professional standing and the knowledge that he writes from a university or research institute (as opposed to, for example, from home); the length of the article and its topicality; the fact that care has been exercised in its production (spell checked, facts confirmed, citations validated, etc) and the knowledge that it is current, peer-reviewed and regularly updated all serve to attest to the value of the site.

As a part of DESIRE mentioned above [19], the SOSIG project has produced a detailed list of quality selection criteria for subject gateways [27]. This cataloguing tool is designed to be used by subject gateways “to define or refine their quality selection criteria”. There are five sections: Scope Policy, Content Criteria, Form Criteria, Process Criteria and Collection Management Policy. Each of these areas is covered in some detail. Content Criteria, for example, contains sections on validity, authority and reputation, substantiveness, accuracy, comprehensiveness, uniqueness, composition and organisation, and currency and adequacy of maintenance. Under each of these headings is a series of criteria couched as questions, with a series of hints and checks that can be used to discern whether a resource meets a particular criterion. Under Validity, the criteria in question are given as:

How valid is the content of the information?
Does the information appear to be well researched?
What data sources have been used?
Do the resources fulfil the stated purpose?
Has the format been derived from another format?
Does the information claim to be unbiased (when in fact it is biased)?
Is the information what it appears to be?
Why is the information there?/What was the motivation of the information provider when they made the information available?/Do they have an ulterior motive?
Does the resource point to other sources which could be contacted for confirmation?
Is the content of the resource verifiable - can you cross check information?

In terms of a PICS Quality Label, these could be translated into scales such as:

Few references/ –> /Many references
Data sources are poor/ –> /very good
Bibliography/No bibliography
Scope statement/No scope statement
Scope statement supported/ –> /not supported by content
Copy of data available elsewhere; scale: on paper/on CD-ROM/electronically/not a copy
Information has a geographical/political/other bias
Site is provided by personal/business/publisher/academia/research institution/other
Information is incomplete/adequate/complete
Vanity publishing/ –> /refereed article
Author e-mail contact/postal contact/no contact information

Thus, for example, users could select sites by subject terms in the normal way and filter out those which do not contain at least an adequate number of references; that are available on another publishing medium; that are provided by an institution that is ‘lower’ down the scale than “publisher” (that is, personal or business); that are not refereed; and that have less than adequate data sources. A search on AltaVista, for example, would still result in the same number of returns that would have been supplied before labelling but users of the PICS Quality Label will find that many of them are blocked from their workstation because they do not meet the standard which they themselves have set.
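The filter described in this example can be sketched as follows. No PICS quality vocabulary exists yet, so everything here is hypothetical: the field names, the intermediate points on each scale (the list above gives only the end points) and the decision to treat unlabelled resources as failing the standard are assumptions made for illustration. The “other” provider type is left out because it has no obvious place in the ordering.

```python
# Hypothetical ordinal scales, lowest value first. Only the end points
# and the provider types come from the list above; the rest are assumed.
REFERENCES = ["few", "adequate", "many"]
DATA_SOURCES = ["poor", "adequate", "good", "very good"]
PROVIDER = ["personal", "business", "publisher", "academia",
            "research institution"]


def at_least(scale, value, minimum):
    """True if `value` sits at or above `minimum` on an ordered scale."""
    return scale.index(value) >= scale.index(minimum)


def meets_standard(label):
    """Apply the five constraints from the example in the text."""
    if label is None:
        # Assumption: an unlabelled resource cannot be assessed, so it fails.
        return False
    return (
        at_least(REFERENCES, label["references"], "adequate")
        and label["copy"] == "not a copy"
        and at_least(PROVIDER, label["provider"], "publisher")
        and label["refereed"]
        and at_least(DATA_SOURCES, label["data_sources"], "adequate")
    )


# Invented search results: (title, label) pairs.
results = [
    ("Refereed survey", {"references": "many", "copy": "not a copy",
                         "provider": "academia", "refereed": True,
                         "data_sources": "good"}),
    ("Personal home page", {"references": "few", "copy": "not a copy",
                            "provider": "personal", "refereed": False,
                            "data_sources": "poor"}),
    ("Unlabelled page", None),
]

# The search still returns every hit; the filter decides which are shown.
visible = [title for title, label in results if meets_standard(label)]
blocked = [title for title, label in results if not meets_standard(label)]
print(len(results), "hits;", len(visible), "shown;", len(blocked), "blocked")
```

Treating each scale as an ordered list keeps the user’s settings readable (“at least adequate”, “publisher or above”) and means a new scale can be added without changing the comparison logic.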

For the PICS Quality Ratings to work effectively, the general PICS system must be adopted by common browsers so that users can actually retrieve data according to quality criteria such as geographical relevance, authority and timeliness in addition to the search terms employed to discover the resource in the first place simply by setting their local software appropriately. A quality filter effectively limits what users see to only the best resources; the filter being set in advance by information specialists or varied on a search-by-search basis by users able to judge and use information quality criteria for themselves. Adoption of PICS-compliant searching by the regular Internet browsers seems likely to take place - PICS is already supported, for example, by Microsoft’s Internet Explorer.

Standardising search engines by the PICS quality vocabulary

In addition to ad hoc use, a PICS quality vocabulary could be adopted by subject gateways as a standard means of evaluating the sites they include. Such an access mechanism would be immediately useful to users, providing a meaningful comparative evaluation of the resources to which they point. If taken up by more than one subject gateway it would allow sensible comparison of, for example, an OMNI site with an EEVL site and, additionally, would allow easy and quick transfer of records between gateways.

If search engines themselves were to incorporate the mechanism, users would end up with an extremely powerful access tool. In this case the quality limiting could be undertaken by the search engine, obviating the need for a second-stage local processing/filtering. The search interface could incorporate a few buttons or scales covering a subset of the PICS quality criteria and users would alter their defaults at the same time that they entered the search terms. If left alone the quality filter could be set to function at median levels or to remain inoperative.
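A search engine that applied the quality limits itself might behave roughly as in the sketch below. It is a toy, not a description of any existing engine: the two scales, the record layout, the title-matching “search” and the names effective_minimums and search are all hypothetical, and every record is assumed to carry a complete label.

```python
# Hypothetical three-point scales, lowest value first.
SCALES = {
    "references": ["few", "adequate", "many"],
    "completeness": ["incomplete", "adequate", "complete"],
}


def effective_minimums(user_choices, default="median"):
    """Combine the user's settings with the engine's defaults.

    default="median" -- criteria left alone are filtered at the midpoint
    default="off"    -- criteria left alone are not filtered at all
    """
    minimums = {}
    for criterion, scale in SCALES.items():
        if criterion in user_choices:
            minimums[criterion] = user_choices[criterion]
        elif default == "median":
            minimums[criterion] = scale[(len(scale) - 1) // 2]
    return minimums


def search(index, term, user_choices=None, default="median"):
    """Return titles that match the term and satisfy every minimum."""
    minimums = effective_minimums(user_choices or {}, default)
    hits = []
    for record in index:
        if term.lower() not in record["title"].lower():
            continue
        label = record["label"]
        if all(SCALES[c].index(label[c]) >= SCALES[c].index(m)
               for c, m in minimums.items()):
            hits.append(record["title"])
    return hits


# Invented index entries.
INDEX = [
    {"title": "Metadata primer",
     "label": {"references": "many", "completeness": "complete"}},
    {"title": "Metadata notes",
     "label": {"references": "few", "completeness": "adequate"}},
    {"title": "Gardening tips",
     "label": {"references": "many", "completeness": "complete"}},
]

print(search(INDEX, "metadata"))                 # defaults at median levels
print(search(INDEX, "metadata", default="off"))  # filter inoperative
print(search(INDEX, "metadata", {"references": "few"}))
```

Because the limits are applied while the results are assembled, nothing has to be blocked afterwards at the workstation; the user simply receives a shorter list.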

PICS metadata would be beneficial to everyone - users and providers alike. Users retrieve data that can be trusted to a known degree while the providers of the enhanced resources would gain a more accurate assessment of site use as their counters will tend to reflect the number of actual users rather than simply the number of passing visitors. For sites that rely on advertising revenue, this has to be an added strength.

References

[1] Anagnostelis, B.; Cooke, A. and McNab, A., “Never mind the quality, check the badge-width!”, Ariadne Issue 9,
http://www.ariadne.ac.uk/issue9/quality-ratings/
[2] Armstrong, C. J. Quality on the Internet, db-Qual, Vol 2 Issue 1, January 1997,
http://www.fdgroup.co.uk/dbq_3_4.htm
[3] Cooke, A. Finding Quality on the Internet: a guide for librarians and information professionals. London: Library Association Publishing, in preparation.
[4] CIQM Database Labels,
http://www.fdgroup.co.uk/ciqm.htm
[5] The Dublin Core Metadata Element Set Home Page,
http://purl.org/metadata/dublin_core/
[6] Nordic Metadata Project,
http://linnea.helsinki.fi/meta/index.html
[7] Resnick, P. and Miller, J. PICS: Internet Access Controls without Censorship. Communications of the ACM, 1996,
http://www.w3.org/pub/WWW/PICS/iacwcv2.htm
[8] Miller, J.; Resnick, P. and Singer, D. Platform for Internet Content Selection Version 1.1: Rating Services and Rating Systems (and their Machine Readable Descriptions), May 1996,
http://www.w3.org/pub/WWW/PICS/services.html
[9] Krauskopf, T.; Miller, J.; Resnick, P. and Treese, W. Platform for Internet Content Selection Version 1.1: PICS Label Distribution - Label Syntax and Communication Protocols,
http://www.w3.org/pub/WWW/PICS/labels.htm
[10] Resnick, P. Filtering Information on the Internet. Scientific American, March 1997,
http://www.sciam.com/0397issue/0397resnick.html
[11] Excite Web Reviews,
http://www.excite.com/Reviews/
[12] Magellan Internet Guide,
http://www.mckinley.com/
[13] ADAM: Art, Design, Architecture and Media Information Gateway,
http://adam.ac.uk/
[14] Edinburgh Engineering Virtual Library (EEVL),
http://www.eevl.ac.uk/
[15] OMNI (Organising Medical Networked Information),
http://www.omni.ac.uk/
[16] SOSIG (Social Science Gateway),
http://www.sosig.ac.uk/
[17] PICS Peer Review,
http://www.computer.org/standard/Internet/peer.htm
[18] Southern California Online User Group Quality Criteria,
http://bubl.ac.uk/archive/lis/org/ciqm/databa1.txt
[19] Day, M. et al. Selection Criteria for Quality Controlled Information Gateways: Report for DESIRE (Development of a European Service for Information on Research and Education),
http://www.ukoln.ac.uk/metadata/DESIRE/quality/report.html
[20] Health on the Net Foundation,
http://www.hon.ch/
[21] British Medical Internet Association,
http://www.healthcentre.org.uk/bmia/index.html
[22] Widener University. Web Sites Evaluation Checklists. (Includes checklists for Advocacy, Business, Informational, News and Personal web sites),
http://www.science.widener.edu/~withers/webeval.htm
[23] Grassian, E. Thinking Critically about World Wide Web Resources,
http://www.ucla.edu/campus/computing/bruinonline/trainers/critical.html
[24] Bartelstein, A. Teaching Students to Think Critically about Internet Resources,
http://weber.u.washington.edu/~libr560/NETEVAL/index.html
[25] Ciolek, M. Information Systems Quality and Standards,
http://coombs.anu.edu.au/SpecialProj/QLTY/QltyHome.html
[26] Tillman, H. Evaluating Quality on the Net,
http://www.tiac.net/users/hope/findqual.html
[27] Quality Selection Criteria for Subject Gateways,
http://sosig.ac.uk/desire/qindex.html

Author Details

C. J. Armstrong,
Centre for Information Quality Management
Email: lisqual@cix.compulink.co.uk
Tel: 01974 251441