Web Magazine for Information Professionals

Book Review: Annual Review of Information Science and Technology, 2004 (Volume 38)

Michael Day reviews a recent volume of this key annual publication on information science and technology.

The Annual Review of Information Science and Technology (ARIST) is an important annual publication containing review articles on many topics of relevance to library and information science, published on behalf of the American Society for Information Science and Technology (ASIST). Since volume 36, the editor of ARIST has been Professor Blaise Cronin of Indiana University, Bloomington.

The twelve chapters in volume 38 are divided into three sections, dealing with theory, technology, and policy.

Theory

The opening chapter in the theory section is an investigation into the relevance of science and technology studies (STS) to information studies by Nancy Van House of the University of California, Berkeley. STS is a highly interdisciplinary field, drawing on disciplines as diverse as sociology, history, philosophy, anthropology and cultural studies to explore the social contexts of science and technology. While some of the more nebulous consequences of science studies have been explored in debates like the Social Text affair [1], Van House contents herself with an introductory review of key developments in STS and workplace studies, followed by an analysis of the influence - both actual and potential - of such approaches on information studies. For example, the chapter discusses some of the ways in which STS concepts and methodologies are being used to help understand the complex 'ecologies' of scholarly communication [2] or the hidden ethical and moral consequences of knowledge representation technologies like classification [3].

The second chapter, by Yvonne Rogers of the University of Sussex, is an introduction to new theoretical approaches being developed for human-computer interaction (HCI). This proliferation of approaches makes the field a lively one, but Rogers (p. 88) warns that what "was originally a bounded problem space with a clear focus and a small set of methods for designing computer systems that were easier and more efficient to use by a single user is now turning into a diffuse problem space with less clarity in terms of its objects of study." The chapter first reviews early attempts to apply cognitive theory to HCI, before explaining why in the 1980s researchers began to adopt methodologies developed in other disciplines, including ecological psychology, Activity Theory and cultural anthropology (including ethnography). Rogers' own studies suggest that while the designers and implementers of systems are familiar with (at least) some of these theoretical approaches, they do not always know how to use them in practice. The chapter ends with some suggestions as to how theory can best be used in both research and interface design.

The final chapter in the theory section deals with the relatively new topic of virtual communities and was written by David Ellis, University of Wales, Aberystwyth, Rachel Oldridge, University of Hertfordshire, and Ana Vasconcelos, Sheffield Hallam University. The chapter first explores the origins of online communities and how they are perceived to interact with other forms of social interaction. Sections on communities of practice, virtual arenas and networked virtual communities follow, the last popular in research contexts (e.g. for sharing papers and datasets) and in Higher Education. The authors conclude (p. 175) that virtual communities provide researchers with the opportunity "to study the behavior, or perceptions, of dispersed communities in real time, as well as over time."

Technology

The section on technology begins with a chapter on latent semantic analysis (LSA) by Susan Dumais of Microsoft Research. LSA is a technique for improving information retrieval by using statistical techniques to analyse large collections of texts in order to induce knowledge about the meaning of documents (p. 191). Dumais notes that LSA is purely statistical, using neither natural language processing techniques nor human-generated knowledge representation systems. The chapter provides a short mathematical overview of LSA and a summary of its use in information retrieval, information filtering, cross-language retrieval and other contexts, e.g. text classification and the link analysis performed by algorithms like PageRank [4] and hyperlink-induced topic search (HITS) [5]. A further section describes the use of LSA in the cognitive sciences to model aspects of human memory.
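
To give a flavour of the technique Dumais describes, the following is a minimal sketch of LSA in Python: a term-document matrix of raw counts is decomposed with a truncated singular value decomposition, and a query is folded into the resulting low-dimensional 'semantic' space. The toy corpus, the choice of two dimensions and the absence of any term weighting are illustrative assumptions, not details taken from the chapter.

    # A minimal LSA sketch: truncated SVD of a term-document matrix.
    # The toy corpus and k=2 dimensions are illustrative assumptions.
    import numpy as np

    docs = ["human machine interface", "machine learning system",
            "user interface design", "human computer interaction"]

    # Term-document matrix of raw term counts (no weighting applied).
    vocab = sorted({w for d in docs for w in d.split()})
    A = np.array([[d.split().count(w) for d in docs] for w in vocab],
                 dtype=float)

    # A ~= U_k S_k Vt_k: the k largest singular values induce a
    # 'semantic' space from purely statistical co-occurrence data.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    doc_vectors = (np.diag(S[:k]) @ Vt[:k]).T   # documents in latent space

    # Fold a query into the same space and rank documents by cosine.
    q = np.array([1.0 if w in ("human", "interface") else 0.0
                  for w in vocab])
    q_vec = np.linalg.pinv(np.diag(S[:k])) @ U[:, :k].T @ q
    sims = doc_vectors @ q_vec / (np.linalg.norm(doc_vectors, axis=1)
                                  * np.linalg.norm(q_vec))
    print(sorted(zip(sims.tolist(), docs), reverse=True))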

Chapter five, by Judit Bar-Ilan of the Hebrew University of Jerusalem, concerns the use of Web search engines in information science research. After the usual sections discussing definitions, Bar-Ilan goes on to discuss two main categories of search engine research: first, the study of search engines as objects of investigation in their own right; second, the use of search engines as a means of collecting data for information science research [6]. The topics covered in more detail include the social study of search engine use through log analysis and the social and political aspects of Web searching. A section on informetric work covers research relating to the structure of the Web and link analysis, looking in detail at the latter's influence on search engine ranking algorithms (e.g. Google's PageRank) and its use in developing evaluation metrics like Web Impact Factors [7]. A final section looks at applications, including problems with the evaluation of search engine performance and effectiveness. In her introduction (p. 238), Bar-Ilan expresses the concern that the fast-moving nature of Web technologies means that the chapter may be out of date by the time of publication. While it is clear that later volumes of ARIST will need to return to this topic, Bar-Ilan's chapter provides a good summary of the state of the art in early 2003.
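
Since PageRank recurs throughout this section, a minimal sketch may help to fix ideas: each page's score is a weighted sum of the scores of the pages linking to it, computed by power iteration. The toy graph, damping factor and convergence threshold below are illustrative assumptions, not Google's actual values.

    # A minimal PageRank sketch [4]: power iteration on a toy link graph.
    damping = 0.85                      # assumed; the textbook value
    links = {"a": ["b", "c"],           # page -> pages it links to
             "b": ["c"],
             "c": ["a"],
             "d": ["c"]}
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(100):
        # A page receives a share of the rank of every page linking to it.
        new_rank = {p: (1 - damping) / len(pages)
                       + damping * sum(rank[q] / len(links[q])
                                       for q in pages if p in links[q])
                    for p in pages}
        converged = max(abs(new_rank[p] - rank[p]) for p in pages) < 1e-9
        rank = new_rank
        if converged:
            break

    print(sorted(rank.items(), key=lambda kv: -kv[1]))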

Hsinchun Chen and Michael Chau of the University of Arizona then tackle a slightly different aspect of information science research on the Web: the mining of Web content and related data to create new information or knowledge. Introductory sections review various aspects of machine learning research and their application in pre-Web contexts, e.g. for named-entity extraction or the provision of relevance feedback in information retrieval. The remaining sections look at Web mining in more detail, focusing in turn on the mining of Web content, structure and usage.
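
As an illustration of the usage-mining strand, the sketch below groups requests from a (much simplified) server log into per-visitor sessions, a common first step in Web usage mining. The log format and the 30-minute idle timeout are conventional assumptions, not details from the chapter.

    # A minimal Web usage mining sketch: sessionising a simplified log.
    from collections import defaultdict
    from datetime import datetime, timedelta

    log_lines = ["10.0.0.1 2004-03-01T09:00:00 /index.html",
                 "10.0.0.1 2004-03-01T09:05:00 /search.html",
                 "10.0.0.2 2004-03-01T09:06:00 /index.html",
                 "10.0.0.1 2004-03-01T10:00:00 /index.html"]

    timeout = timedelta(minutes=30)     # assumed: a conventional cut-off
    sessions = defaultdict(list)        # visitor -> list of page sequences
    last_seen = {}

    for line in log_lines:
        host, stamp, path = line.split()
        when = datetime.fromisoformat(stamp)
        # An idle gap longer than the timeout starts a new session.
        if host not in last_seen or when - last_seen[host] > timeout:
            sessions[host].append([])
        sessions[host][-1].append(path)
        last_seen[host] = when

    for host, visits in sessions.items():
        print(host, visits)             # 10.0.0.1 has two sessions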

The next chapter, by Peter Bath of the University of Sheffield, looks in more detail at the use of data mining techniques for health and medical information. After a section discussing different definitions, Bath goes on to explore the potential for using data mining techniques in health and medicine. This includes a brief review of the different types of data being generated and integrated in data warehouses or clinical data repositories, e.g. medical records, laboratory test reports and medical images. The remainder of the chapter looks at data mining techniques that have already been applied to medical and health data - e.g. for the diagnosis and prognosis of disease - and reviews challenges including the quality of data, the validity and usability of data mining methods and tools, and user acceptance.
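
To make the diagnosis and prognosis application concrete, the following is a minimal sketch of learning a diagnostic rule from tabular clinical data with a decision tree. The toy dataset and the use of scikit-learn are illustrative assumptions; the chapter itself surveys a much broader range of methods.

    # A minimal diagnostic classification sketch on toy clinical data.
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Each row: [age, systolic blood pressure, serum cholesterol];
    # label 1 = disease present. Values are invented for illustration.
    X = [[63, 145, 233], [37, 130, 250], [41, 130, 204], [56, 120, 236],
         [57, 140, 192], [44, 120, 263], [52, 172, 199], [57, 150, 168]]
    y = [1, 1, 0, 0, 0, 1, 1, 0]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    model = DecisionTreeClassifier(max_depth=2, random_state=0)
    model.fit(X_train, y_train)

    # In practice, data quality and clinical validation - challenges
    # Bath highlights - would dominate any such exercise.
    print("held-out accuracy:", model.score(X_test, y_test))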

The final chapter in the technology section focuses on the indexing, browsing, and searching of digital video. In this chapter, Alan Smeaton of Dublin City University first provides a succinct introduction to the coding of digital video - emphasising the importance of compression - and the family of standards developed under the auspices of MPEG (the Moving Picture Experts Group). The next sections introduce various techniques for providing access to digital video. The first of these describes conventional approaches based on the manual annotation of video segments, which Smeaton concludes is sufficient for some applications but "expensive, not scalable, and not always very effective" (p. 386). The chapter then goes on to explore the potential for automatically extracting information from video content - including the division of video clips into shots (shot boundary detection) and the identification of basic features (indexing primitives) - and how such structured video can be used to support searching, browsing and summarisation. A section on the evaluation of the effectiveness of video information retrieval refers to experiments undertaken by the TREC (Text REtrieval Conference) video track, which in 2003 became an independent evaluation activity known as TRECVID [8]. A final section on trends speculates on the potential influence of new technological developments, notes the importance of user issues, and predicts increased take-up of MPEG-7, a standard for describing video features.
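
To illustrate the shot boundary detection Smeaton describes, the sketch below flags a hard cut by comparing grey-level histograms of consecutive frames. The synthetic frames and the fixed threshold are illustrative assumptions; real systems tune such thresholds against labelled video.

    # A minimal shot boundary detection sketch using histogram differences.
    import numpy as np

    rng = np.random.default_rng(0)
    # Simulate 20 frames of one shot, then 20 of a visually different
    # shot; each 'frame' is a small greyscale image.
    shot_a = rng.normal(60, 10, size=(20, 32, 32))
    shot_b = rng.normal(180, 10, size=(20, 32, 32))
    frames = np.clip(np.concatenate([shot_a, shot_b]), 0, 255)

    def histogram(frame, bins=16):
        h, _ = np.histogram(frame, bins=bins, range=(0, 255))
        return h / h.sum()

    threshold = 0.5   # assumed: tuned on labelled data in practice
    for i in range(1, len(frames)):
        # L1 distance between successive histograms; a large jump
        # suggests a hard cut between shots.
        diff = np.abs(histogram(frames[i]) - histogram(frames[i - 1])).sum()
        if diff > threshold:
            print(f"possible shot boundary between frames {i - 1} and {i}")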

Policy

Chapter nine is a review of the roles that information and communication technologies (ICTs) play in political life, written by Alice Robbin, Christina Courtright and Leah Davis of Indiana University, Bloomington. The chapter looks at both theoretical and practical aspects of the relationships between governments and citizens, focusing on three main categories: e-government, e-governance and e-democracy. The authors conclude that it is too early to know whether ICTs will have a significant effect on political life and politics, suggesting that "we are witnessing small and incremental behavioral and structural changes" (pp. 463-464). They recommend that future research into these topics should rigorously examine normative claims and empirical evidence.

The following chapter is a comprehensive introduction to legal aspects of the World Wide Web by Alexandre López Borrull of the Universitat Autònoma de Barcelona and Charles Oppenheim of Loughborough University. The chapter first deals with copyright issues, noting recent changes in the law that have tilted the balance of rights away from users in favour of owners. This section also covers copyright in libraries, music sharing Web sites like Napster, and digital rights management technologies. Rather depressingly, López Borrull and Oppenheim conclude that "problems associated with copyright on the Internet are likely to increase rather than decrease in the future" (p. 498). The next sections deal with legal issues that are more specific to the Internet, e.g. domain name disputes, litigation over deep linking, framing, caching and 'spamdexing.' After a short section on software patents, the authors move on to a wide-ranging discussion of issues relating to pornography and censorship, defamation and the legal liabilities of employers and ISPs (Internet Service Providers). Final sections deal with topics as varied as legal jurisdiction (differences in law between countries), the legal deposit of networked publications, and unsolicited e-mail. López Borrull and Oppenheim recommend that information professionals "should maintain a watching brief on legal developments in their countries and, where necessary, take appropriate legal advice or consult their professional associations" (p. 529). While most chapters in ARIST 38 will chiefly interest researchers, this one can be firmly recommended to practitioners who need an overview of the legal aspects of the Web.

Chapter eleven returns to the subject of the preservation of digital objects, previously discussed in a 2001 ARIST chapter by Elizabeth Yakel [9]. In her introduction, Patricia Galloway of the University of Texas at Austin provides some practical examples of digital preservation, e.g. the development of standard formats for social science data and the use of XML as a strategy for preserving digital text files at the Public Record Office of Victoria, Australia. After some preliminary definitions, Galloway briefly introduces the main stakeholders in digital preservation, including digital libraries, archives and museums, computer scientists, the software industry, commercial content providers, and the creators and users of digital content. The chapter's overview of research since 2000 focuses mainly on recordkeeping developments, like the Model Requirements for the Management of Electronic Records (MoReq) [10] and the InterPARES Project [11]. The next section deals with 'genre-specific' preservation problems, focusing on e-mail, office files, Web pages, databases and audiovisual objects. A short section introducing the Reference Model for an Open Archival Information System (OAIS) is followed by further analysis of preservation methods and metadata standards. The final section outlines some possible future directions for digital preservation research and practice. For those new to this topic, the chapter would be usefully supplemented by the earlier article by Yakel and another recent review article by Helen Tibbo [12].

The final chapter is a review of the Internet and unrefereed scholarly publishing by the late Rob Kling of Indiana University, Bloomington. This starts with a brief look at recent developments in scholarly communication and publication, focusing on the growth of communication existing outside the traditional peer-reviewed journal system. In his definitions, Kling prefers to use the term 'manuscript' - rather than the more conventional 'preprint' - to refer to the documents that authors circulate before their acceptance for publication. Further sections look at the development of 'guild publishing' - the publication of manuscript series by institutions, e.g. series of working papers or technical reports - and the use of disciplinary repositories (like arXiv.org) for the sharing of unpublished manuscripts.

Conclusions

ARIST 38 is a worthy addition to its predecessor volumes. The nature of the publication precludes any general conclusions, but two trends stand out. First, the sections on theory and technology attest to the wide range of analytical techniques now being used in information science, techniques that originate in many other disciplines, most notably ethnography, anthropology, sociology and psychology. Second, the chapters on data mining suggest that the use of statistical techniques in some areas of information science and technology is becoming more widespread.

This volume - as ever - will not often be read from cover to cover. The strength of ARIST is that its chapters - together with corresponding ones in predecessor volumes - provide a good way for readers to investigate a new topic, as well as a means of keeping up to date with an existing one. It is perhaps worth noting that the volume as a whole contains almost 2,000 bibliographical references, reflecting its importance in directing readers to work published elsewhere.

While ARIST has in the past (on occasion) been criticised for a bias towards US research, it is worth noting that six of the twelve chapters in volume 38 were produced by authors with an institutional affiliation outside that country. Of the nine authors who produced these chapters, six were based in the United Kingdom and the remainder in Ireland, Israel and Spain. It would be even more useful if chapters in future volumes began to include research that is not published in English.

The consistency of recent ARIST volumes suggests that it remains in very good editorial hands, despite the practical problems elaborated in the editor's introduction. Cronin and his associate editor Debora Shaw are to be congratulated on the production of another excellent volume of ARIST.

References

  1. Sokal, A., Bricmont, J., Fashionable nonsense: postmodern intellectuals' abuse of science. New York: Picador, 1998.
  2. Kling, R., McKim, G., King, A., A bit more to IT: scholarly communications forums as socio-technological interaction networks. Journal of the American Society for Information Science and Technology, 54, 2003, 47-67.
  3. Bowker, G. C., Star, S. L., Sorting things out: classification and its consequences. Cambridge, Ma.: MIT Press, 1999.
  4. Brin, S., Page, L., The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7), 1998, 107-117. Also available at: http://dbpubs.stanford.edu:8090/pub/1998-8
  5. Kleinberg, J. M., Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 1999, 604-632.
  6. Bar-Ilan, J., Data collection methods on the Web for informetric purposes: a review and analysis. Scientometrics, 50(1), 2001, 7-32.
  7. Ingwersen, P., The calculation of Web impact factors. Journal of Documentation, 54(2), 1998, 236-243.
  8. TREC Video Retrieval Evaluation: http://www.itl.nist.gov/iaui/894.02/projects/trecvid/
  9. Yakel, E., Digital preservation. Annual Review of Information Science and Technology, 35, 2001, 337-378.
  10. MoReq: http://www.cornwell.co.uk/moreq.html
  11. InterPARES: http://www.interpares.org/
  12. Tibbo, H., On the nature and importance of archiving in the digital age. Advances in Computers, 57, 2003, 1-67.

Author Details

Michael Day
UKOLN

Email: m.day@ukoln.ac.uk
Web site: http://www.ukoln.ac.uk/
