SIGIR '97

David Nichols

David Nichols reports on the follow-on conference SIGIR '97.

SIGIR is a well-established technical conference and a little daunting for those hangers-on from Digital Libraries ’97 who lacked a background in information retrieval. It was good, therefore, that the opening Salton Award lecture by Tefko Saracevic of Rutgers University, entitled Users Lost, made us feel at home. He described the history of the field and what he saw as a split, in the early 80s, between the technical, algorithm-based side and the user-oriented side. (He also felt that funding had gone overwhelmingly to the technical side.) He criticized the large number of papers that included the phrase ‘this has implications for system design’ but never actually spelt out what those implications were, nor went on to design systems that took note of them.

He estimated that only about 3 or 4 of the papers at Digital Libraries and SIGIR reported on actual users carrying out a search or a similar activity. Although Saracevic criticized ‘Intelligent Agents’ for having ‘as much intelligence as my shoelace’, he did say that Information Retrieval should ‘plead guilty’ to losing sight of the users.

A second plenary address came from George Miller of Princeton, who described the use of WordNet [4], a lexical reference system, to aid information retrieval systems. He reported some success in using synonyms that have only one meaning to help clarify terms that have several possible meanings. After these two presentations, the bulk of the conference was what I had suspected: ‘hard’ computer science that is probably of limited interest to most readers of Ariadne (with titles like ‘Almost-constant-time clustering of arbitrary corpus subsets’). However, there were some relevant papers.
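Miller’s exact method is not described here in enough detail to reproduce, so the following is only a minimal sketch of the general idea, under stated assumptions: each sense of an ambiguous word is represented by a set of synonyms, synonyms that belong to just one sense are treated as unambiguous clues, and the sense whose clues appear most often in the surrounding text wins. The lexicon `LEXICON`, its sense labels and the functions `monosemous` and `disambiguate` are hypothetical illustrations, not WordNet data or the WordNet API.

```python
# Hypothetical toy lexicon standing in for WordNet: sense label -> set of synonyms.
# This is NOT real WordNet data; it only illustrates the idea.
LEXICON = {
    "plant/factory": {"plant", "factory", "mill"},
    "plant/flora": {"plant", "flora", "vegetation"},
    "mill/grinder": {"mill", "grinder"},
}


def monosemous(word):
    """True if the word belongs to exactly one sense in the toy lexicon."""
    return sum(word in synonyms for synonyms in LEXICON.values()) == 1


def disambiguate(word, context):
    """Pick the sense of `word` whose one-meaning synonyms occur most in `context`.

    Returns None if the word is not in the lexicon. Ties go to the sense
    listed first in LEXICON.
    """
    context = set(context)
    scores = {}
    for sense, synonyms in LEXICON.items():
        if word not in synonyms:
            continue
        # Only unambiguous synonyms count as clues; "mill" has two senses, so it is ignored.
        clues = {s for s in synonyms if s != word and monosemous(s)}
        scores[sense] = len(clues & context)
    if not scores:
        return None
    return max(scores, key=scores.get)


print(disambiguate("plant", "the factory closed last year".split()))      # plant/factory
print(disambiguate("plant", "dense vegetation covers the hill".split()))  # plant/flora
```

Excluding ambiguous synonyms such as ‘mill’ is the point of the design: a clue that itself has several meanings would reintroduce the ambiguity it is meant to resolve.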

Raya Fidel reported on users’ perceptions of a profile-based filtering system that delivered news articles by email. Interestingly, users thought that they weren’t missing many relevant items, i.e. that the articles they received contained everything they were interested in. In fact, this was not the case at all. It appeared that because the filtered sets they received contained many non-relevant articles (the precision was low), they assumed they were seeing all the relevant ones! Fidel also reported that several of the criteria users expressed related to the form of an article (e.g. whether it was a case study) rather than its textual content. This suggests that filtering on appropriate metadata can be a valuable approach when constructing profiles.
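Fidel’s observation turns on the difference between two standard measures: precision is the fraction of delivered articles that are relevant, and recall is the fraction of all relevant articles that were delivered. A low-precision feed can feel exhaustive while still having low recall. The sketch below computes both for sets of article IDs; the IDs and relevance judgements are made-up illustration data, not figures from Fidel’s study.

```python
def precision_recall(delivered, relevant):
    """Return (precision, recall) for a set of delivered items and a set of relevant items."""
    delivered, relevant = set(delivered), set(relevant)
    hits = len(delivered & relevant)  # relevant items that were actually delivered
    precision = hits / len(delivered) if delivered else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: ten articles delivered, six relevant articles in the collection.
delivered = {1, 2, 3, 4, 5, 6, 8, 9, 10, 12}
relevant = {2, 5, 7, 11, 13, 17}
p, r = precision_recall(delivered, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.20 recall=0.33
```

In this made-up case the reader receives plenty of articles, eight of them irrelevant, yet four of the six relevant ones never arrive.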

The UK representation at the conference was dominated by the Glasgow Information Retrieval Group [5], including Mark Dunlop’s presentation of an alternative evaluation approach for information retrieval systems. Instead of the prevalent recall and precision graphs, he proposed an expected search duration measure that takes account of interface effects as well as the underlying algorithms. This seemed to accord with other opinions about the value of traditional evaluation measures: several people commented that in many real-world applications precision (how many of the retrieved results are relevant) was more important than recall (how many of all the relevant results are retrieved). In addition, there seemed to be a lack of papers about interfaces in general, whereas it seems to some non-IR people that interface effects could easily swamp some of the reported effects of improved algorithms.
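Dunlop’s actual formulation is not given in this report, so what follows is only a simplified, hypothetical model of the idea that evaluation should measure the user’s time rather than ranking quality alone. It assumes a user scans a ranked list from the top until a wanted number of relevant documents have been found, paying a fixed cost to judge each document and a fixed cost for each results page viewed. The function `search_duration` and its timing parameters are illustrative assumptions.

```python
import math


def search_duration(ranking, wanted, secs_per_doc, docs_per_page, secs_per_page):
    """Estimated seconds to find `wanted` relevant documents in a ranked list.

    ranking: relevance of each result in ranked order (True = relevant).
    Hypothetical cost model: every document examined costs secs_per_doc,
    and every results page opened costs secs_per_page.
    Returns None if the list holds fewer than `wanted` relevant documents.
    """
    if wanted <= 0:
        return 0
    found = 0
    for examined, is_relevant in enumerate(ranking, start=1):
        if is_relevant:
            found += 1
        if found == wanted:
            pages = math.ceil(examined / docs_per_page)
            return examined * secs_per_doc + pages * secs_per_page
    return None


# One ranking, two hypothetical interfaces.
ranking = [True, False, False, True, False, True, False, False, False, True]
print(search_duration(ranking, 3, secs_per_doc=5, docs_per_page=10, secs_per_page=4))  # 34
print(search_duration(ranking, 3, secs_per_doc=3, docs_per_page=2, secs_per_page=8))   # 42
```

Under this toy model the second interface is slower despite cheaper per-document judgements, because paging costs dominate. The ranking is identical in both cases, which is exactly the kind of interface effect that precision and recall cannot see.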

One exception to the dearth of interface work was a description of the Cat-a-Cone system [6] by Marti Hearst (formerly of Xerox PARC and now at UC Berkeley). Cat-a-Cone is a 3D browsing tool for very large category hierarchies such as Medical Subject Headings (MeSH). It builds on previous work on the Cone Tree [7] visualization interface and integrates viewing of the hierarchy with viewing of the results. No user studies of Cat-a-Cone have been reported to date, and it will be interesting to see whether real users can cope with such a complex 3D interface.

The liveliest session at SIGIR was a panel on ‘Real Life Information Retrieval: Commercial Search Engines’. Doug Cutting from Excite [8] presented some interesting statistics on their Web search engine:

hundreds of queries per second
$0.02 revenue per search
an estimated $0.0001 per query in hardware costs
average query length in 1996: 1.5 words
average query length in 1997: 2.2 words (Cutting suggested that this may mean users are learning that longer queries give better results)
average query length in Europe in 1997: 1.5 words (maybe we’re not learning fast enough!)
use of Boolean operators is low, at about 10%
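To put Cutting’s cost figures in proportion, the back-of-envelope calculation below assumes exactly 100 queries per second (the low end of ‘hundreds’) and treats a search and a query as the same unit; both are assumptions for illustration only. Python’s `Decimal` type is used so that the money arithmetic is exact.

```python
from decimal import Decimal

QUERIES_PER_SECOND = 100                # assumption: low end of "hundreds of queries per second"
REVENUE_PER_SEARCH = Decimal("0.02")    # dollars, as reported
HARDWARE_PER_QUERY = Decimal("0.0001")  # dollars, as estimated
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY          # 8640000
revenue_per_day = REVENUE_PER_SEARCH * queries_per_day          # 172800.00 dollars
hardware_per_day = HARDWARE_PER_QUERY * queries_per_day         # 864.0000 dollars
revenue_to_hardware = REVENUE_PER_SEARCH / HARDWARE_PER_QUERY   # 200

# Relative growth in mean query length from 1996 (1.5 words) to 1997 (2.2 words).
length_growth = (Decimal("2.2") - Decimal("1.5")) / Decimal("1.5")

print(f"{queries_per_day:,} queries/day")
print(f"revenue ${revenue_per_day:,.2f}/day, hardware ${hardware_per_day:,.2f}/day")
print(f"revenue/hardware ratio {revenue_to_hardware:.0f}; query length grew {length_growth:.0%}")
```

The 200:1 ratio of revenue to hardware cost does not depend on the assumed query rate; only the daily totals do.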

Cutting’s morals for Web search developers were to ‘keep the interface very simple’ and to ‘optimize the default experience’. Jan Pedersen from Verity [9] posed two questions to the SIGIR community: ‘how do you rank documents with one- or two-word queries?’ and ‘where are all the papers on the Web?’ Karen Sparck Jones raised the point that practical evaluation methodologies are hard to achieve on the Web. Terry Noreault from OCLC [10] predicted that cross-database searching would be the next big trend, and said that in general the SIGIR papers didn’t inform him much at all. Matt Koll (formerly of PLS [11]) claimed that commercial systems had made significant progress in many areas (including scalability, summarization and federated searching) without the aid of SIGIR people. He also criticized the ‘effect size’ the community was discussing and accused its members of over-analysing the same data. Needless to say, the question session was quite interesting, with the response led by Bruce Croft, who detailed the successes of Information Retrieval, including the dominance of a ranking approach to result presentation.
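Pedersen’s question has no single answer, and none is recorded from the panel. As one conventional baseline (not a method proposed by any panellist), a ranked system can weight each query word by its inverse document frequency, so that with only one or two words the rarer word does most of the work. The sketch assumes pre-tokenized documents; the function `rank` and the toy corpus are illustrative.

```python
import math
from collections import Counter


def rank(query, docs):
    """Return indices of docs with a positive score for `query`, best first.

    query: list of terms; docs: list of token lists.
    Score = sum over query terms t of (count of t in doc) * log(N / df(t)).
    """
    n = len(docs)
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        # Terms absent from the whole collection (df == 0) are skipped.
        score = sum(tf[t] * math.log(n / df[t]) for t in query if df[t])
        if score > 0:
            scored.append((score, i))
    # Highest score first; ties broken by original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


docs = [
    ["web", "search", "engine", "ranking"],
    ["library", "catalogue", "search"],
    ["web", "page", "design"],
    ["search", "query", "log", "search"],
]
print(rank(["web", "search"], docs))  # [0, 2, 3, 1]
```

Here the document containing only ‘web’ ranks above the one that mentions ‘search’ twice, because ‘search’ occurs in three of the four documents and so carries little weight.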

After so many sessions in the same room, the conference finally finished, and those of us who had stayed for both Digital Libraries and SIGIR were thoroughly conferenced-out. SIGIR was much less relevant than Digital Libraries, but its proceedings are worth watching as a source of interesting research. Next year SIGIR is in Melbourne, Australia [12].

The author is very grateful to the British Library Research and Innovation Centre for financial support which allowed him to attend Digital Libraries ’97 (and then stay on for SIGIR).

References

[1] SIGIR ’97:
http://www.acm.org/sigir/conferences/sigir97/

[2] Doubletree Hotel, Philadelphia:
http://www.doubletreehotels.com/DoubleT/Hotel61/79/79Main.htm

[3] The home page of Digital Libraries ’97:
http://www.lis.pitt.edu/~diglib97/

[4] WordNet - a Lexical Database for English:
http://www.cogsci.princeton.edu/~wn/

[5] The Information Retrieval Group of the Department of Computing Science at the University of Glasgow:
http://www.dcs.gla.ac.uk/ir/

[6] The Cat-a-Cone system:
http://www.sims.berkeley.edu/~hearst/cac-overview.html

[7] For an example of a Cone Tree:
http://www.cgl.uwaterloo.ca/~j2carrie/cone_tree.html

[8] Excite:
http://www.excite.com/

[9] Verity:
http://www.verity.com/

[10] OCLC:
http://www.oclc.org/

[11] PLS:
http://www.pls.com/

[12] SIGIR ’98:
http://www.cs.mu.oz.au/sigir98/

Author details

David Nichols,
Research Associate,
Cooperative Systems Engineering Group,
Computing Department,
Lancaster University,
Lancaster LA1 4YR
Email: dmn@comp.lancs.ac.uk