Electronic journals are playing an increasing role in the education of students. The ARL Directory of Scholarly Electronic Journals lists nearly 4,000 peer-reviewed journal titles and 4,600 conferences available electronically, and many academic libraries now subscribe to e-journal services such as those provided by MCB Emerald, Omnifile and ingentaJournals.
In comparison, electronic books have been slow to make an impact on Higher Education. Initiatives such as Project Gutenberg and the Electronic Text Centre have, for many years, been digitising out-of-copyright texts and making them available online. However, up-to-date academic textbooks, and the technology to deliver them to students, have only recently become available: lecturers are beginning to mount their publications online (see http://eboni.dis.strath.ac.uk:8080/search for an indication of the work being conducted in this area); Taylor and Francis will shortly become the first English-language publisher to digitise its entire library of book titles, many of them academic, for sale in e-book format via the Internet; MetaText, a division of netLibrary Inc., is creating Web-based digital textbooks that provide students and lecturers with a range of learning tools, including multimedia content, chat rooms and self-grading quizzes; and goReader has produced a prototype portable electronic device, designed specifically for university students, which can hold more than 500 textbooks and weighs just five pounds.
However, despite the obvious technical and financial advantages of producing books electronically (such as the storage of multiple titles in a small space, the rapid production and worldwide dissemination of material, and the ability to exploit multimedia and hypertext capabilities), the concept has faced much criticism from the end-user's point of view. People read about 25% more slowly from computer screens than from printed paper, and many opt to print and read digitised material rather than scroll through large chunks of text while sitting in front of a humming monster of a machine.
Morkes and Nielsen addressed this usability issue in a series of studies in 1997 and 1998 [9, 10]. They demonstrated that users' ability to retrieve information from Web publications, and their subjective satisfaction with those publications, can be improved by up to 159% by altering the on-screen design of the text. These findings were applied, in the WEB Book study, to an electronic textbook available on the Internet and led to a 92% increase in usability, indicating the positive impact of attending to the appearance of the content when preparing an academic text for publication on the Web.
It is essential that this matter be explored thoroughly so that the latest e-book developments are fully informed from a design perspective, as well as from content and technology perspectives, and are delivered to the end-user in a form which maximises their usability. EBONI (Electronic Books ON-screen Interface) has been funded under the JISC DNER Learning and Teaching Programme to investigate this very issue, and aims to produce a set of best practice guidelines for publishing electronic textbooks on the Web which reflect the usability requirements of students and academics throughout the UK. The methodology EBONI will employ to arrive at these guidelines is outlined below.
EBONI will investigate the impact of design issues on the usability of e-books through one large-scale user evaluation in one academic discipline and several satellite studies in other disciplines. In this way, students and lecturers from a range of disciplines and backgrounds will be involved in evaluating electronic textbooks available on the Internet.
To this end, the Project Team have developed an E-book Evaluation Model, which each study will implement to a varying degree (depending on its specific objectives, feasibility and the availability of resources) and which will ensure that the results of each evaluation are comparable at some level. This evaluation model, or methodology, comprises various options for selecting material and participants, and describes the different tasks and evaluation techniques which can be employed in an experiment. These range from simple retrieval tasks measuring participants' ability to find information in the material to high cognitive skill tasks set by lecturers to measure participants' understanding of concepts in the texts, and from Web-based questionnaires measuring subjective satisfaction to one-to-one interviews with participants discussing all elements of interacting with the test material. As such, it offers comprehensive and wide-ranging methods of measuring the usability of e-books, incorporating traditional IR concepts such as speed and accuracy of retrieval, as well as measures which reflect the complex requirements of learners and teachers in Higher Education.
Some of the elements in this methodology are derived from the work of Morkes and Nielsen and of Spool et al. Between them, they employed retrieval and memory tasks, and techniques including covert observation and subjective satisfaction questionnaires, to evaluate the usability of online technical white papers, Web sites with tourist information, and popular Internet sites. Clearly, this material differs in content, style and scope from the educational material that EBONI aims to evaluate, and the techniques we derived from their methodologies have been refined accordingly. In particular, the TILT project's expertise in conducting evaluation studies of teaching software across a wide range of disciplines at Glasgow University has provided a useful reference point in designing those elements of the methodology particularly concerned with measuring educational aspects of the test material.
Each phase of EBONI's e-book evaluation model is outlined below.
Central to any comparative evaluation, and fundamental to the development of a methodology, is the selection of the objects (or experimental material) to be evaluated. This process will be directed by the particular objectives of the investigation and it is important that the chosen material will enable those objectives to be met in full.
In an e-book evaluation, texts may be selected for comparison according to three parameters:
For example, the Visual Book experiment studied the application of the paper book metaphor to the design and production of electronic books, particularly focusing on the role of visual components. Its evaluation, therefore, compared the same text in electronic and paper media. The WEB Book experiment, on the other hand, focused on the impact of appearance on the usability of textbooks on the Web. To this end, two electronic versions of the same text, each of which exhibited different styles of presentation (or formats), were selected as the material for evaluation.
EBONI's core study builds on the experience of the Visual Book and the WEB Book, and the texts selected for evaluation vary according to two parameters: appearance and content. Like the WEB Book, it aims to measure the impact on usability of various design techniques evident in books on the Internet, but it has a broader focus and is concerned with evaluating a wide spectrum of styles evident in the online textbooks of, in the first instance, one academic discipline. As such, several texts (different in content), each of which represents a different style, have been selected for evaluation, all of which are aimed at students at the same level in the same discipline.
In all, the possible combinations of objects selected for comparative evaluation in an e-book usability study include:
The next stage in developing a methodology is the selection of the key actors who will play a role in the process of evaluating the chosen material.
In an e-book usability study, four main roles, or possible actors, can be distinguished:
Preliminary questionnaires can be used to glean more specific information about participants. They can help to distinguish between different groups of participants, identify significant differences in population samples, and record any other user needs, characteristics or abilities which may affect the results of the experiment.
The information gathered as a result of a pre-questionnaire will usually be factual and, in the case of electronic book evaluations, is likely to fall into two broad categories, both of which may be relevant when interpreting results: background information about the user, and details of the equipment he or she is using to conduct the experiment.
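As an illustration of how the two categories of pre-questionnaire data might be recorded and used to separate participant groups, the sketch below stores each participant's answers in a small record and groups participants by any one attribute. The field names (year of study, weekly hours online, screen resolution, browser) and the sample values are hypothetical examples chosen for this sketch, not EBONI's actual questionnaire items.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PreQuestionnaire:
    """One participant's factual answers; all field names are hypothetical."""
    participant_id: str
    # Background information about the user
    year_of_study: int
    hours_online_per_week: float
    # Details of the equipment used to conduct the experiment
    screen_resolution: str
    browser: str


def group_by(responses, field):
    """Return {value of field: [participant ids]} so groups can be compared."""
    groups = defaultdict(list)
    for r in responses:
        groups[getattr(r, field)].append(r.participant_id)
    return dict(groups)


responses = [
    PreQuestionnaire("p1", 1, 5.0, "800x600", "Netscape 4"),
    PreQuestionnaire("p2", 2, 12.0, "1024x768", "IE 5"),
    PreQuestionnaire("p3", 1, 8.0, "1024x768", "IE 5"),
]

print(group_by(responses, "screen_resolution"))
# {'800x600': ['p1'], '1024x768': ['p2', 'p3']}
```

Grouping by an equipment field such as screen resolution makes it easy to check later whether differences in task results track differences in the equipment participants used rather than the test material itself.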
Engaging participants in performing tasks using the selected material enables the evaluator to collect comparable quantitative data which can be used to measure both the "effectiveness" and "efficiency" aspects of the usability of the text. Three types of task, each of which measures different aspects of usability, are outlined below.
The Scavenger Hunt is suggested by Spool et al. as a means of discovering how quickly and/or accurately participants can find information in an electronic text. Participants hunt through the material selected for evaluation in search of specific facts, without using the browser's Find command. Questions should be chosen whose answers participants are unlikely to know already, to ensure that they are fully engaged in the experimental process. A judgement task can also be included, in which the participant first has to find relevant information and then make a judgement about it. If resources allow, the time each participant takes to complete these tasks should be measured.
Scavenger Hunts are appropriate whenever the goals of the experiment require the accuracy and/or speed with which information can be retrieved from the test material to be determined; the difficulty of the tasks set should be adjusted to the level of knowledge and expertise of the participants.
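To make the accuracy and speed measures concrete, the following sketch scores one participant's Scavenger Hunt as the proportion of questions answered correctly and the mean time taken per question. It is a minimal illustration under assumed conventions: the record layout, the question identifiers and the timings are hypothetical, and an actual study might, for example, time only correct answers instead.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HuntAnswer:
    question_id: str   # hypothetical question identifier
    correct: bool      # whether the fact found was the right one
    seconds: float     # time taken, recorded if resources allow


def score_hunt(answers):
    """Return (accuracy as a proportion, mean time in seconds) for one participant."""
    accuracy = sum(a.correct for a in answers) / len(answers)
    mean_time = mean(a.seconds for a in answers)
    return accuracy, mean_time


answers = [
    HuntAnswer("q1", True, 95.0),
    HuntAnswer("q2", False, 210.0),
    HuntAnswer("q3", True, 140.0),
    HuntAnswer("q4", True, 75.0),
]
acc, t = score_hunt(answers)
print(f"accuracy={acc:.0%}, mean time={t:.1f}s")  # accuracy=75%, mean time=130.0s
```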
The results of the Scavenger Hunt will feed directly into two measures of usability:
Memory tasks are suggested by Morkes and Nielsen as a method of measuring a participant's ability to recognise and recall information from an electronic text after spending a specified amount of time reading it. Data gathered from such tasks can be used to infer how the appearance of information on screen affects users' ability to remember that information. In a memory task, the participant reads a chapter or a chunk of text for a short period of time, learning as much as possible in preparation for a short exam. The exam comprises a set of multiple-choice questions (measuring recognition) and a question asking the participant to recall a list of items (measuring recall).
These tasks will comprise two more measures: recognition and recall.
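The two memory measures can be expressed as simple proportions: recognition as the share of multiple-choice questions answered correctly, and recall as the share of target items the participant writes down. The sketch below assumes that recalled items are matched to targets by case-insensitive exact wording and that repeated items count once; these matching rules, the answer key and the item lists are hypothetical choices for illustration, and a real marking scheme might accept synonyms.

```python
def recognition_score(responses, answer_key):
    """Proportion of multiple-choice questions answered correctly."""
    correct = sum(1 for q, choice in responses.items() if answer_key.get(q) == choice)
    return correct / len(answer_key)


def recall_score(recalled_items, target_items):
    """Proportion of target items recalled (case-insensitive; duplicates count once)."""
    targets = {t.strip().lower() for t in target_items}
    recalled = {r.strip().lower() for r in recalled_items}
    return len(targets & recalled) / len(targets)


answer_key = {"m1": "b", "m2": "d", "m3": "a"}
responses = {"m1": "b", "m2": "c", "m3": "a"}
print(recognition_score(responses, answer_key))  # two of three correct

targets = ["metadata", "hypertext", "XML", "OCR", "SGML"]
recalled = ["Hypertext", "xml", "PDF", "hypertext"]
print(recall_score(recalled, targets))  # 0.4
```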
High cognitive skill tasks differ from the Scavenger Hunt and memory tasks in that they are intended to measure participants' ability to engage with the selected material in a manner requiring a higher degree of cognitive skill than simple retrieval and memory. The EBONI Project Team suggests this type of task for e-book evaluations that require a more complex measure of the usability of the text, one which reflects the requirements of learners and teachers in Higher Education.
Lecturers in the appropriate discipline are asked to read the test material and to assign tasks, the results of which will indicate students' understanding of the concepts in that material. These "high cognitive skill" tasks should reflect the more complex uses to which HE material is often put, and should measure how effectively the electronic textbook enables participants to engage in processes involving skills appropriate to their Higher Education course.
A lecturer will also be asked to assess the results of these tasks. Adoption of the same marking scheme across all evaluations which implement high cognitive skill tasks will enable results from different experiments to be compared easily and effectively.
Each e-book study will be different in terms of the procedures it adopts to evaluate the selected material. The various techniques that can be employed are as follows:
Along with effectiveness and efficiency, satisfaction is one of the key aspects of usability, as defined in ISO 9241, Part 11. Satisfaction is measured after participants have used the test material and carried out any tasks which form part of the experiment, so that their responses are informed and based on experience.
Questionnaires are one method of determining participants' satisfaction. They have the following advantages:
Morkes and Nielsen's subjective satisfaction indices for measuring Web site usability are a useful reference point when creating questionnaires specific to an e-book study. They used the following four indices to measure satisfaction:
However, it is important to remember that Morkes and Nielsen were concerned with evaluating technical white papers and Web sites with tourist information. Therefore, some of their measures (e.g. "entertaining", and "how satisfied are you with the site's quality of language?") will be irrelevant to measuring users' satisfaction with academic material and should be omitted.
In addition, studies especially concerned with learning and teaching aspects of the test material (such as those employing high cognitive skill tasks) may find it appropriate to enlist the help of a lecturer in the relevant discipline in devising the questionnaire; he or she may be able to advise, for example, on items to include in an index measuring participants' satisfaction with the educational elements of the test material.
Covert observation involves watching a person, or group of people, without their knowledge and recording their behaviour as faithfully as possible. It can be used in e-book evaluations to examine closely the interaction between users and the test material. While interviewing and think-aloud procedures discover information about participants' thoughts, views and opinions, and heuristic evaluation involves speculation by evaluators as to the cause of confusion or difficulties, covert observation enables participants' physical behaviour to be studied and draws attention to specific problems. As Bernard notes, covert observation is increasingly used as part of a mixed-method strategy, as researchers combine qualitative and quantitative data to answer questions of interest.
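One common way to turn questionnaire responses into satisfaction indices is to average the items assigned to each index, after dropping items judged irrelevant to academic material (such as "entertaining", as discussed above). The sketch below assumes a 1 to 5 rating scale and invents its index and item names purely for illustration; they are not Morkes and Nielsen's indices or EBONI's questionnaire.

```python
from statistics import mean

# Hypothetical indices, each listing the questionnaire items (rated 1-5) it averages.
INDICES = {
    "ease_of_use": ["easy_to_navigate", "easy_to_find_information"],
    "quality": ["accurate", "well_organised", "entertaining"],
    "educational_value": ["helped_understanding", "relevant_to_course"],
}

# Items judged irrelevant to academic material are excluded from every index.
OMITTED = {"entertaining"}


def index_scores(ratings):
    """Average each index's retained, answered items; ratings maps item -> score."""
    scores = {}
    for index, items in INDICES.items():
        kept = [ratings[i] for i in items if i not in OMITTED and i in ratings]
        if kept:  # skip an index if none of its items were answered
            scores[index] = mean(kept)
    return scores


ratings = {
    "easy_to_navigate": 4, "easy_to_find_information": 2,
    "accurate": 5, "well_organised": 4, "entertaining": 1,
    "helped_understanding": 3, "relevant_to_course": 5,
}
print(index_scores(ratings))  # {'ease_of_use': 3, 'quality': 4.5, 'educational_value': 4}
```

Keeping the omitted items in a separate set, rather than deleting them from the index definitions, leaves a record of what was removed when adapting a Web-site questionnaire to academic material.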
Using video as an observation tool will enable the evaluator to investigate interaction issues that are not easily studied by any other method, and will therefore provide additional data to that derived from other evaluation techniques. Observation is appropriate to investigations which are particularly concerned with HCI issues.
The "think-aloud" technique involves one or more participants explaining aloud to at least one evaluator what they are doing at each stage of performing the tasks, and why.
Having participants give a running commentary as they proceed through the tasks provides valuable insights into the problems that they may encounter with the material selected for evaluation. Think-aloud sessions provide qualitative information about participants' cognitive processes, can reveal their learning strategies, motivations and affective state, and break down the steps taken to complete tasks. Because participants reveal their thoughts as they work on a particular task, it can be assumed that those thought processes correspond directly to that task, thus generating very specific information about the test material.
Of primary interest will be the participant's explanation of how he or she is navigating the test material, reasons for errors, difficulty or hesitation, and comments on various aspects of the material. The information uncovered can support or contradict results of other evaluation techniques, or uncover unexpected thought processes.
Evaluators may interact with the participant during the test, asking questions to clarify areas of confusion or suggesting new avenues of exploration, but such interaction is generally kept to a minimum. The evaluator can also observe the participant's behaviour during the task, which adds another source of data.
The subjective satisfaction questionnaire is a form of structured interview, where participants are asked to respond to as nearly identical a set of stimuli as possible. In this type of research, the input that triggers each person's responses is controlled so that the output can be reliably compared.
Semistructured interviews, on the other hand, use a "script" or set of clear instructions and cover a list of questions in a predetermined order, but the interviewer and respondent are free to follow leads. Patrick Dilley suggests structuring the flow of questions to lead the conversation pointedly yet comprehensively toward the larger research questions of the study. Even if the interviewer deviates from the script later, a printed list of questions serves as a guide to return to should he or she lose the thread while listening to answers. Semistructured interviews can, therefore, be used to elicit full feedback on selected aspects of the experiment, and to follow leads on additional themes raised by the participant. As Dwyer notes, interviews can draw out information not only about what an individual does or thinks but also about the how and why of behaviour.
Interviews will be conducted on a one-to-one basis after any tasks have been performed and the subjective satisfaction questionnaire has been completed.
EBONI's core study will implement all of the above tasks and techniques, making it a relatively expensive experiment. Some of the satellite studies, however, will be smaller in scale and will implement only some of the tasks and techniques, depending on the availability of resources. Thus, the total expense of each experiment will vary along two dimensions: task complexity (ranging from simple retrieval tasks to more complex high cognitive skill tasks) and technique complexity (from inexpensive questionnaires to interviews requiring time and expertise). EBONI's hypothesis is that very expensive experiments derived from this e-book evaluation model, with unlimited resources, can be mapped to simple, unsophisticated experiments employing only one task or technique. Following this logic, the results of EBONI's core study and the various satellite studies will all be comparable at some level.
The results of all user evaluations will feed directly into a set of best practice guidelines for producing educational material on the Web. Available in January 2002, these will be sent to relevant organisations, targeting publishers of electronic material, similar or related programmes, libraries and museums involved in digitising collections, and interested parties in the HE community in general. In addition, they will be available from the project Web site, together with an example of a text on which they have been implemented.
EBONI is based at the Centre for Digital Library Research, University of Strathclyde. The project welcomes feedback at all stages, and interested parties are invited to join the project mailing list.
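The idea that every study is "comparable at some level" can be pictured as set intersection: each study implements some subset of the tasks and techniques, and two or more studies can be compared on whatever measures they share. The sketch below uses hypothetical study names and subsets; because the core study implements everything, each satellite's full set of measures is shared with it, which mirrors the mapping hypothesis described above.

```python
# Hypothetical record of which tasks and techniques each study implements.
STUDIES = {
    "core": {"scavenger_hunt", "memory", "high_cognitive", "questionnaire",
             "observation", "think_aloud", "interview"},
    "satellite_a": {"scavenger_hunt", "questionnaire"},
    "satellite_b": {"memory", "questionnaire", "interview"},
}


def shared_measures(studies, names):
    """Return the tasks/techniques implemented by every named study."""
    return set.intersection(*(studies[n] for n in names))


print(sorted(shared_measures(STUDIES, ["core", "satellite_a"])))
# ['questionnaire', 'scavenger_hunt']
print(sorted(shared_measures(STUDIES, ["core", "satellite_a", "satellite_b"])))
# ['questionnaire']
```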