The Joint Information Systems Committee (JISC) Information Environment (IE, a development from the DNER - Distributed National Electronic Resource) is intended to help users in the UK academic sector maximise the value of published information resources by developing a coherent environment out of the confusing array of systems and services currently available.
The EDNER Project (Formative Evaluation of the DNER, <http://www.cerlim.ac.uk/edner>) is funded to undertake ongoing evaluation of the developing IE over the full three years of the JISC 5/99 Learning & Teaching and Infrastructure Programme period, i.e. from 2000 to 2003. The EDNER Project is led by the Centre for Research in Library & Information Management (CERLIM) at the Manchester Metropolitan University; the Centre for Studies in Advanced Learning Technology (CSALT) at Lancaster University is a partner. This paper reports on work in progress and initial findings of the evaluation team.
There is a considerable body of research on user behaviour in respect of information retrieval (IR) systems, although research on retrieval from the World Wide Web is not as advanced. However, surveys of web usage give some sense of what the average web searcher is doing and point to differences between web searches and queries with traditional IR systems. Observations of the average web searcher (Spink et al, 1998; Ellis et al, 1998) point out that ineffective use may be caused by a lack of understanding of how a search engine interprets a query. Users are rarely aware of whether a search service defaults to AND or OR, and they expect a search engine to discriminate automatically between single terms and phrases. Also, devices such as relevance feedback work well if the user ranks ten or more items, whereas in reality users will only rank one or two items for feedback (Croft, 1995). Koll (1993) found that users provide few clues as to what they want, approaching a search with an attitude of 'I'll know it when I see it', which creates difficulties in the formulation of a query statement.
Larsen (1997) is of the opinion that Internet search systems will evolve to meet the behaviour of the average web searcher. Accordingly, there has been a shift towards the introduction of search features that appear to respond to the ways in which users actually search these systems, e.g. search assistance, query formulation, query modification and navigation. The notion that improved interaction may be key in improving results is attractive in principle but not necessarily true in reality. Nick Lethaby (Verity Inc), paraphrased in Andrews (1996), states "users don't want to interact with a search engine much beyond keying in a few words and letting it set out results". This can also be seen from the DEvISE results (Johnson et al, 2001), where the Interaction dimension had the weakest correlation with users' overall rating of satisfaction (Efficiency had the strongest correlation, followed by Effectiveness, Utility and then Interaction). It can thus be assumed that most users do not use advanced search features, enter complex queries, or want to interact with search systems. As a consequence, systems such as search engines are now trying to automate query formulation, shifting the burden of formulating precise or extensive terminology from the user to the system.
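The kind of ranking reported for the satisfaction dimensions can be illustrated with a minimal sketch. It assumes a Pearson correlation between each dimension's ratings and the overall satisfaction rating; the correlation measure actually used in the cited study is not stated here, and the function names and the example ratings are hypothetical, made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_dimensions(ratings, overall):
    """Order dimensions by how strongly they correlate with overall satisfaction."""
    scores = {name: pearson(values, overall) for name, values in ratings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up ratings from five imaginary users (NOT data from any study).
example_overall = [1, 2, 3, 4, 5]
example_ratings = {
    "Efficiency": [1, 2, 3, 4, 5],
    "Interaction": [3, 1, 4, 2, 5],
}
example_ranking = rank_dimensions(example_ratings, example_overall)
```

With these made-up numbers the hypothetical Efficiency ratings track overall satisfaction exactly, while the Interaction ratings do so only loosely, so Efficiency is ranked first.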
Spink et al (1998) conducted a study in which 357 Excite users responded to an interactive survey asking about their search topics, intended query terms, search frequency for information on their topic and demographic data. Search topics were spread across 16 topic categories. Most respondents searched on a single topic as determined by their query terms and search topic statements. The search terms recorded are those that the participants intended to use, rather than those actually used. The mean number of terms was low at 3.34. Many of the terms were clearly meant as a phrase but there was no indication that quotation marks were used. Excite requires quotation marks to indicate a phrase search; otherwise the terms are linked by the Boolean operator OR. Few queries included explicit Boolean or other operators.
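The difference between a default OR search and a quoted phrase search can be sketched as follows. This is a simplified illustration, not a description of how Excite or any other search engine is implemented: the function names are hypothetical, and matching is done by plain case-insensitive substring comparison.

```python
import re


def parse_query(query):
    """Split a query into units: quoted phrases stay whole, other words are single terms."""
    units = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        units.append((phrase or word).lower())
    return units


def matches(document, query, default_operator="OR"):
    """Return True if the document satisfies the query under the default operator.

    Simplification: a unit 'matches' if it occurs anywhere in the text as a substring.
    """
    text = document.lower()
    hits = [unit in text for unit in parse_query(query)]
    if not hits:
        return False
    return any(hits) if default_operator == "OR" else all(hits)


doc = "Notes on information theory and retrieval of records"

# Unquoted terms under a default OR: either word alone is enough to match.
unquoted_or = matches(doc, "information retrieval")

# Quoted phrase: the two words must appear together, which they do not here.
quoted_phrase = matches(doc, '"information retrieval"')
```

A user who types two words intending a phrase, but omits the quotation marks, therefore retrieves this document even though it does not contain the phrase.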
Jansen et al (2000) analysed transaction logs containing 51,473 queries posed by 18,113 users of Excite and from this argued that "while Internet search engines are based on IR principles, Internet searching is very different from IR searching as traditionally practised and researched in online databases, CD-ROM and OPACs" (p.208). They found that web users are not comfortable with Boolean and other advanced means of searching, and do not frequently browse the results beyond the first page. Other studies also show that most web searchers do not view more than the first 10 results (Hoelscher, 1998; Silverstein et al, 1999). In addition, Jansen et al found that the mean number of queries per user was 2.8, with a number (not specified) of users going on to modify their original query and view subsequent results. The queries themselves were short in comparison to searches on regular IR systems, with the average query containing only 2.21 terms.
Further to this Jansen (2000) ran analyses that compared query results with use of advanced techniques on the one hand to results without on the other, and found that on average only 2.7 new results were retrieved. From this he posits, "use of complex queries is not worth the trouble. Based on their conduct, it appears that most web searchers do not think it worth the trouble either." He also points out that the behaviour of web searchers follows the principle of least effort (Zipf, 1949). This has also been recorded by Marchionini (1992) who stated, "humans will seek the path of least cognitive resistance" (p.156) and Griffiths (1996) "increasing the cognitive burden placed on the user … can affect successful retrieval of information. Where an application required fewer actions from the user, greater success was achieved as there was less possibility for a user to make an error" (p.203).
A number of studies have been conducted into the use of electronic resources by students. Cmor and Lippold (2001) put forward a number of observations from their experience of student searching behaviour on the web. These can be summarised as: 1) students use the web for everything; 2) they will spend hours searching or just a few minutes; 3) searching skills vary and students will often assess themselves as being more skilled than they actually are; and 4) they will give discussion list comments the same academic weight as peer reviewed journal articles.
Navarro-Prieto et al. (1999) sought to develop an empirically based model of web searching, for which 23 students were recruited from the School of Cognitive and Computer Science, University of Sussex. Ten of these participants were Computer Science students and thirteen were Psychology students. Their findings highlight a number of interesting points: 1) whilst the Computer Science students are more likely to be able to describe how search engines develop their databases, neither of the two groups has a clear idea of how search engines use queries to search for information; 2) most participants considered their levels of satisfaction with the results of their search to be 'good' or 'OK'; and 3) most participants cannot remember their searches, and tended to forget those search engines and queries that did not give any successful results.
From their research Navarro-Prieto et al. were able to identify three general patterns of searching: 1) a top-down strategy, where participants searched in a general area and then narrowed down their search from the links provided until they found what they were looking for; 2) a bottom-up strategy, where participants looked for a specific keyword provided in their instructions and then scrolled through the results until they found the desired information (this strategy was most often used by experienced searchers); and 3) mixed strategies, where participants used both of the above in parallel (this strategy was only used by experienced participants).
Twidale et al (1995) conducted a study that considered the role of collaborative learning during information searching, which informed the development of Ariadne. Quoting relevant literature they identified common problems as: retrieving zero hits, retrieving hundreds of hits, frequent errors, little strategy variation and locating few of the relevant records. The only specific searching issue addressed was that of 'errors made in searching', which described how simple typing errors in an otherwise sound strategy led to few hits and subsequently to the strategy being abandoned. More general observations revealed a number of collaborative interactions between students: 1) students will often work in groups (2-4) around a single terminal, discussing ideas and planning their next actions; 2) groups working on adjacent terminals will discuss what they are doing, compare results and sometimes seem to compete to find the information; 3) individuals working on adjacent terminals will occasionally lean over to ask their neighbour for help; and 4) individuals working at separate terminals will monitor the activity of others.
The JISC Circular 1/99: Monitoring and Evaluating User Behaviour in Information Seeking and Use of Information Technology and Information Services in UK HE sought to develop a Framework which would complement the work already undertaken by JISC through its Technical Advisory Unit (TAU) and Monitoring and Advisory Unit (MAU). The Framework specifically focuses on the development of a longitudinal profile of the use of electronic information services (EIS) and the development of an understanding of the "triggers and barriers that affect such use" (Rowley 2001, p.A2). The JUSTEIS project (JISC Usage Survey Trends: Trends in Electronic Information Service) was contracted to undertake Strand A and C of the Framework and the JUBILEE project (JISC User Behaviour in Information Seeking: Longitudinal Evaluation of EIS) Strand D.
Strand A of the JUSTEIS project is an annual survey which "seeks to measure and evaluate the overall awareness, uptake, usage and usefulness of information technologies and information services" in HE in the UK (Rowley 2001, p.A2). This survey was conducted by telephone interview, email and paper-based questionnaire. Strand C is a general survey of EIS provision which aims to develop profiles of current and planned service provision. Data was gathered via a Web survey of the resource access provided by individual HEIs, supplemented by telephone interviews with senior LIS managers. The JUBILEE project, which is undertaking Strand D, focuses on qualitative longitudinal monitoring of the information behaviour, needs and opportunities of both specific academic and student communities and academics and students in general. Questionnaires, interviews, focus groups, electronic communication and feedback on case study reports formed the basis on which this survey was conducted.
In summary the work of JUBILEE and JUSTEIS found that:
This work is continuing into a third cycle which will focus on a broad based survey profiling user behaviour and factors influencing it (Strand A); a survey of purchase intentions supplemented by a focussed survey of web provision (Strand C); a discipline based programme of qualitative monitoring of the development of EIS in HE (Strand D) and, synthesis in which integration of all strands will occur (Strand E).
Whilst there are some similarities in the aims of EDNER and JUSTEIS and JUBILEE there are significant differences. EDNER is particularly interested in whether and how JISC projects aim to influence practice and the extent to which they succeed in so doing. It is perhaps not surprising that many of the findings from EDNER are different to those of JUSTEIS and JUBILEE, due to the different aims, foci and approaches taken.
The following sections briefly outline the methodological approach adopted by the EDNER user study and present some of the major findings.
The aim of the EDNER study reported here was to develop understanding of users' searching behaviour in the IE by asking them to assess the quality of DNER services according to a range of defined criteria (Quality Attributes, see section 3.1). This was achieved by firstly establishing a quality attributes methodology, with appropriate revisions and adaptations for its use in this context. This approach is based on the classic definitions of 'quality' such as 'fitness for (the user's) purpose' or 'conformance to (the user's) requirements' (Brophy and Coulling, 1996) but seeks to probe beneath the surface to explore the different dimensions of quality (see Section 3.1).
Test searches were then designed (one for each of the services to be used by the participants, fifteen in total). These searches were designed to be of sufficient complexity to challenge the user without being impossible to answer. Participants were recruited via Manchester Metropolitan University's Student Union Job Shop and twenty-seven students from a wide range of courses participated. Each student was paid for his or her participation. One third of the sample were students from the Department of Information and Communications studying for an Information and Library Management degree, while the remaining two thirds were studying a wide variety of subjects; all were at various stages of their courses. No restrictions were placed on their computer, searching or Internet experience. Testing was conducted in a controlled environment based within the Department of Information and Communications. Each participant carried out the fifteen test searches and completed a questionnaire for each task undertaken. Data gathered via the questionnaires was analysed in two ways: 1) quantitative data was analysed using SPSS (Statistical Package for the Social Sciences), and 2) open response question data was analysed using qualitative techniques.
It should be stressed that this study focussed entirely on user-centred evaluation. EDNER is also concerned with expert evaluation, but this aspect of the work will be reported elsewhere.
Garvin's Quality Attributes have been applied to information services by Brophy (1998). Garvin (1987) identified eight attributes that can be used to evaluate the quality of services, and with some changes of emphasis, one significant change of concept and the introduction of two additional attributes (Currency and Usability) they apply well to ILS. They are:
Performance is concerned with establishing confirmation that a service meets its most basic requirement. These are the primary operating features of the product or service. For example, a library which claimed to offer a 'quality' service would be expected to provide some minimum set of services - a catalogue of its holdings for example. The most basic quality question is then 'Does this catalogue exist?'. Today, most users would also expect that part of the minimum would be that the catalogue was available online and covered all of the library's core holdings. These are performance attributes.
With Conformance the question is whether the product or service meets the agreed standard. This may be a national or international standard or locally determined service standard. The standards themselves, however they are devised, must of course relate to customer requirements. For information services there are obvious conformance questions around the utilisation of standards and protocols such as XML, RDF, Dublin Core, OAI, Z39.50 etc. It is worth noting that many conformance questions can only be answered by expert analysts since users are unlikely to have either the expertise or the access needed to make technical or service-wide assessments.
Features are the secondary operating attributes, which add to a product or service in the user's eyes but are not essential to it. They may provide an essential marketing edge. It is not always easy to distinguish 'performance' characteristics from 'features', especially as what is essential to one customer may be an optional extra to another, and there is a tendency for 'features' to become 'performance' attributes over time - direct links from the catalogue to full text are an example of a feature currently developing in this way.
Users place high value on the Reliability of a product or service. For products this usually means that they perform as expected (or better). For information services, a major issue is usually availability of the service. Therefore broken links, unreliability and slow response times can have a detrimental effect on a user's perception of a service.
Garvin uses the term Durability, defined as 'the amount of use the product will provide before it deteriorates to the point where replacement or discard is preferable to repair'. In the case of information services this will relate to the sustainability of the service over a period of time. In simple terms, will the service still be in existence in three or five years? Again, this is more likely to be assessed by experts in the field than by end users, although they may have useful contributions on the assessment of the attribute based on comparisons with similar services.
For most users of information services an important issue is the Currency of information, i.e. how up to date the information provided is when it is retrieved.
Serviceability relates to when things go wrong. How easy will it then be to put them right? How quickly can they be repaired? How much inconvenience will be caused to the user, and how much cost? For users of an electronic information service this may translate to the level of help available to them at the time of the search. So the availability of instructions and prompts throughout, context sensitive help and usefulness of help will be important.
Whilst Aesthetics and Image is a highly subjective area, it is of prime importance to users. In electronic environments it brings in the whole debate about what constitutes good design. In a web environment, the design of the home page may be the basis for user selection of services, and this may have little to do with actual functionality. You may have a great information service behind that home page, but do the users ever find it?
Perceived Quality is one of the most interesting of attributes because it recognises that all users make their judgments on incomplete information. They do not carry out detailed surveys of 'hit rates' or examine the rival systems' performance in retrieving a systematic sample of records. Most users do not read the service's mission statement or service standards and do their best to by-pass the instructions pages. Yet, users will quickly come to a judgment about the service based on the reputation of the service among their colleagues and acquaintances, their preconceptions and their instant reactions to it.
The addition of Usability as an attribute is important in any user-centred evaluation. User-centred models are much more helpful when personal preferences and requirements are factored in - so, for example, usability to a blind person may mean something quite different to usability to a sighted person.
The approach also maps well to many of the quality assurance approaches which government (in the UK) is sponsoring - for example in public libraries the talk is now of 'Best Value'. In European business circles, the talk is of 'business excellence' and the European Foundation for Quality Management has re-titled its annual award as 'The European Award for Business Excellence'. One aspect of this that has become important recently is its emphasis on the satisfaction of all the stakeholders.
The study was concerned with two questions: 1) How do students discover and locate information? and 2) How do services (and aspects of services) rate in a student evaluation, and what criteria are most important to them? To this end the study was split into two days of testing, the first of which was concerned with how students discover and locate information and the second with evaluation of IE services. The following section presents a selection of the results of the research.
Students were asked to find information on fifteen set tasks, designed to be typical of information seeking in an academic environment, completing a questionnaire after each. Every time they started a new task we asked them where they went first to try to find relevant information. The following presents the most frequently cited starting points:
From these results it is clear that the majority of participants use a search engine in the first instance. This concurs with the JUBILEE and JUSTEIS results, which found that use of search engines (SEs) predominates over all other types of EIS. Search engines are liked for their familiarity and because they have provided successful results on previous occasions. Individual search engines become "my personal favourite" and phrases such as "tried and tested", "my usual search engine" and "trusted" were frequently given by the students when asked why they chose this source first. Many reasons for students' confidence in Google were given, such as "Couldn't think how else to start a search", "Google is always my first choice", "I used Google because of the site's reliability", "I think it is the easiest search to use", "Its better to look on Google than on the library journal search for this one as I wasn't sure of the exact name of the journal".
Only 12.4% of participants located the required information on a website they had heard of prior to the task; 57.4% found it on a website they had never heard of before, and 30.2% were unable to find any information.
Students were asked how difficult or easy they found each of the tasks:
Students were asked how successful they had been in locating the information:
Even when users can find information it is not always an easy task. This may have serious implications for developers of services as a number of studies (Johnson et al, 2001) have shown that users will often trade performance for the path of least cognitive resistance (minimum effort and time).
Students were asked to search for as long (or short) a time as they wanted, with a maximum of 30 minutes to be spent on any one task. The time taken by the majority of participants looking for information was between 1 and 15 minutes. Other research (Craven and Griffiths, 2002) found that the average time taken to search for information was between 15 and 19 minutes. The DEvISE project (Johnson et al., 2001) also found that Efficiency correlated most strongly with General Satisfaction, with Effectiveness second, which may suggest that the amount of time and effort required from the user matters more than the relevance of the items found.
Students were asked why they stopped trying to locate information; the reasons given were:
One respondent gave a very simple reason for stopping - 'Teatime!'
Results show varying degrees of satisfaction across each of the services and each of the Attributes. The following figures present a selection of the results across six of the 5/99 Programme projects' available services, designated A to F to preserve anonymity.
Figure 1 Graphical representation of the Performance Attribute results, 5/99 Projects
Figure 3 Graphical representation of the Usability Attribute results, 5/99 Projects
Users expressed an increase in post-search Perceived Quality on Service A and Service C, coupled with high levels of Satisfaction Overall and across each of the Attributes. High levels of satisfaction were recorded across many of the Attributes for both of these services and this appears to have a positive impact on the preconceptions of the users.
On Service D Perceived Quality pre and post searching remained static despite high levels of satisfaction with Performance. Alongside this Performance score, satisfaction with Usability and Aesthetics was lower (Figures 2 and 3) and Overall Satisfaction was also low. This seems to indicate that users' perceptions of quality are driven by factors other than just the performance of a system. It also raises interesting questions as to how fixed preconceptions about quality may affect the results of the evaluation of a system or service. In each instance Satisfaction Overall corresponded closely with post-search Perceived Quality.
Use of the Quality Attributes as evaluation criteria allows investigation of what happens between users' perceptions of the quality of a service before and after they use it. Because lower-scoring Attributes can be identified individually, service providers and developers can target specific areas for improvement. In IR terms, performance would traditionally be measured by recall and precision, and in a Web environment it may be measured by user satisfaction (Johnson et al 2001). These results seem to demonstrate that other measures play an important role in user evaluation. Using a Quality Management approach, we can demonstrate that users' preconceptions play a major role in their evaluation of a service and can be hard to shift - an example is Service D, where pre- and post-search Perceived Quality did not change despite the fact that users were satisfied that the required information was found. However, where a service performs well across all Attributes, user perceptions can change through use of the service (for example Service C). This raises interesting questions about how services are developed, how users are trained and how services are marketed.
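For readers less familiar with the traditional IR measures mentioned above, the standard definitions are: precision is the proportion of retrieved items that are relevant, and recall is the proportion of relevant items that are retrieved. A minimal sketch follows; the function name and the document identifiers are hypothetical and are not taken from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items and a set of relevant items."""
    hits = len(retrieved & relevant)  # items that are both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four items retrieved, three items relevant, two in common.
example_retrieved = {"d1", "d2", "d3", "d4"}
example_relevant = {"d1", "d3", "d5"}
example_precision, example_recall = precision_recall(example_retrieved, example_relevant)
```

In this made-up example precision is 2/4 and recall is 2/3. Neither figure says anything about the effort required of the user or their impression of the service, which is the gap the Quality Attributes are intended to address.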
Final issues that have arisen as a result of this research may be summarised by the following:
Users did comment on the Help feature and the availability of Instructions and Prompts - not many participants used Help, but those that did reported mixed feelings about its usefulness. Some attention is therefore needed to develop Help into a feature that actually assists users. Users gave some good feedback on the availability of Instructions and Prompts when this feature was available and made sense.
Two results in particular raise very interesting and important issues:
Students either have little awareness of alternative ways of finding information to the search engine route or have tried other methods and still prefer to use Google - a situation we now refer to as the Googling phenomenon. Further to this, even when students are able to locate information it is not always easy (even when using Google), and with a third of participants failing to find information, user awareness, training and education need to be improved. If the IE is truly to be embedded and integrated into learning and teaching, further work needs to be done to equip students with the awareness and skills to use electronic resources other than Google.
In addition, use of Quality Attributes as evaluation criteria allows investigation of what happens to perceptions of quality of service before and after use of the service. Because lower-scoring Attributes can be identified individually, service providers and developers can target specific areas for improvement. In IR terms, performance would traditionally be measured by recall and precision, and in a Web environment it may be measured by user satisfaction (Johnson et al 2001). These results are early indicators from work in progress but seem to demonstrate that other measures play an important role in user evaluation. Using a Quality Management approach, we can demonstrate that users' preconceptions play a major role in their evaluation of a service and can be hard to shift. This raises interesting questions about how services are developed, how users are trained and how services are marketed. As the IE develops further, this approach provides a tool for evaluating its products from a user perspective. In particular it is capable of identifying specific areas that may benefit from further attention.
Jill R. Griffiths and Peter Brophy
Centre for Research in Library & Information Management (CERLIM),
the Manchester Metropolitan University,