With the instruction to produce only one more issue this year, I felt it was important to publish as much of the content in the pipeline as I could. We have previously developed the WebCorp [1] suite of software tools, designed to extract language examples from the Web and to uncover frequent and changing usage patterns automatically. eMargin, with its emphasis on manual annotation and analysis, was therefore something of a departure for us.

The eMargin Project came about in 2007 when we attempted to apply our automated Corpus Linguistic analysis techniques to the study of English Literature. To do this, we built collections of works by particular authors and made these available through our WebCorp software, allowing other researchers to examine, for example, how Dickens uses the word 'woman', how usage varies across his novels, and which other words are associated with 'woman' in Dickens' works.
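As a rough illustration of the kind of question described above, the following Python sketch counts the words that appear near a chosen node word and compares how often that word occurs in each text. It is not WebCorp's implementation: the function names, the tokenisation rule, the window size and the two sample texts are all assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens (an assumed, naive rule)."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(text, node, window=4):
    """Count the words found within `window` tokens either side of each occurrence of `node`."""
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


def frequency_by_work(works, node):
    """Return occurrences of `node` per 1,000 tokens for each titled text."""
    result = {}
    for title, text in works.items():
        tokens = tokenize(text)
        result[title] = 1000 * tokens.count(node) / len(tokens) if tokens else 0.0
    return result


# Hypothetical sample texts, invented for illustration (not quotations from Dickens).
sample = {
    "Text A": "The old woman spoke. The young woman listened to the old woman.",
    "Text B": "A man walked by the river and saw no one.",
}

if __name__ == "__main__":
    print(collocates(sample["Text A"], "woman", window=1).most_common())
    print(frequency_by_work(sample, "woman"))
```

Raw counts of this kind are only a starting point; a corpus tool would normally add a statistical measure of association and a concordance view of each occurrence in context.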

What we found was that, although our tools were generally well received, there was some resistance amongst literary scholars to this large-scale automated analysis of literary texts. Our top-down approach, relying on frequency counts and statistical analyses, was contrary to the traditional bottom-up approach employed in the discipline, relying on the intuition of literary scholars. In order to develop new software to meet the requirements of this new audience, we needed to gain a deeper understanding of the traditional approach and its limitations. Compounding the problem is the fact that, often, not all students in the class have read the text in its entirety.

The traditional mode of study in the discipline is 'close reading': the detailed examination and interpretation of short text extracts down to individual word level. This variety of 'practical criticism' was greatly influenced by the work of I.A. Richards in the 1920s [2] but can actually be traced back to the 11th Century [3]. What this approach usually involves in practice in the modern study of English Literature is that the teacher will specify a passage for analysis, often photocopying this and distributing it to the students. Students will then read the passage several times, underlining words or phrases which seem important, writing notes in the margin, and making links between different parts of the passage, drawing out themes and motifs. On each re-reading, the students' analysis gradually takes shape (see Figure 1). Close reading takes place either in preparation for seminars or in small groups during seminars, and the teacher will then draw together the individual analyses during a plenary session in the classroom. It has encompassed 10,000-hole optical coincidence cards, online database services, videotext, laser discs, and CD-ROMs, the World Wide Web, mobile services and big data solutions. I find the historical development of information resource management absolutely fascinating, yet feel that in general it is poorly documented from an analytical perspective even though there are some excellent archives.

These archives include the back issues of Ariadne from January 1996. Ariadne has always been one of my must-reads as a way of keeping in touch with issues and developments in e-delivery of information. The recently launched new Ariadne platform [1] has provided easier access to these archives. Looking through its content has reminded me of the skills and vision of the UK information profession as it sought to meet emerging user requirements with very limited resources. The archives have always been available on the Ariadne site, but the recent update to the site and the availability of good tags on the archive content have made it much easier to mine the archive issues.

The Ariadne team, in particular Richard Waller, has given me the opportunity to mine those archives [2] and trace some of the developments in electronic service delivery in the UK.

Indeed, working through the archives is now probably too easy, as in the preparation of this column I have found myself moving sideways from many of the feature articles to revel in the other columns that have been a feature of Ariadne. This article is a personal view of some of these developments and is in no way intended to be a definitive account. Its main purpose is to encourage others to look into the archive and learn from the experiences of the many innovators who have patiently coped with the challenges of emerging technology, resource limitations and often a distinct lack of strategy and policy at both an institutional and government level. School does not prepare you for reading primary journals and how best to make use of Chemical Abstracts, but I quickly found that working in the library was much more fun than in a laboratory. I obtained an excellent result in one vacation project on physical chemistry problems by reverse engineering the problems through Chemical Abstracts! Therefore, as it turned out, I had started my career as an information scientist before I even graduated. By 1977 I was working with The Chemical Society on the micropublishing of journals and taking part in a British Library project on the future of chemical information. Re-reading the outcomes of that project makes me realise how difficult it is to forecast the future. Now my past has re-asserted itself to good effect as I have both the honour and excitement of being Chair of the eContent Committee of the Royal Society of Chemistry.

The immediate problem you face reading this admirable summary of the potential benefits of markup is that many of the hyperlinks have disappeared. History has been technologically terminated. Almost 15 years passed by before the Royal Society of Chemistry set up Project Prospect and turned semantic markup into a production process [4]. Dr Rzepa is now Professor of Computational Chemistry at Imperial College, London.

By the mid-1990s good progress had been made in e-journal production technologies and the first e-only journals were beginning to appear. Among them was Glacial Geology and Geomorphology (GGG), which existed in a printed version only in so far as readers could print out a selection from it. One aim of GGG is therefore to provide the benefits of electronic transfer as well as other value added products in an accepted academic, peer-reviewed system. The author of the article describing the project [5] was Dr. Brian Whalley, who went on to become a Professor in the Geomaterials Research Group, Queen's University of Belfast. As you will discover from his author profile (another Ariadne innovation), Brian is still active though retired from formal education. What struck me about this article was the author's vision in January 1996 of how e-journals could be of benefit in university teaching. One of its very earliest decisions was to hold a conference every two years at which new developments could be reported. The first conference was held in Germany in 1968, and over the following years it would be held in 15 different countries across 4 continents. The opportunity to share experiences from these differing perspectives doesn't happen that often and brings real benefits, such as highly productive networking. This year's Online Information, held on 20-21 November, felt like a slightly different event to previous years. Besides the long-standing debate on what information and knowledge really mean, the world of current technologies is changing at a pace which inevitably influences all spheres of human activity. But the first of those spheres to tackle is perhaps that of information: how we create, disseminate, and use it. The past few years have seen coverage of highly topical areas such as virtualisation and the cloud, the mobile university and access management.
According to the Digital Preservation Coalition (DPC), more than 70 new domains are registered and more than 500,000 documents are added to the Web every minute [1]. This scale, coupled with its ever-evolving use, presents significant challenges to those concerned with preserving both the content and context of the Web. This concept allows for managing research and publication data together with related metadata, internal and/or external links and access rights. Development of eSciDoc was initiated by a collaborative venture between FIZ Karlsruhe – Leibniz Institute for Information Infrastructure and the Max Planck Digital Library (MPDL) and was funded by the German Federal Ministry of Education and Research. Hence research has to be regarded as one of the aces remaining to us, and thus I hope the importance of gathering, managing and preserving research outcomes for long-term access will be widely appreciated and supported. On the surface, libraries would seem to have much human and technological infrastructure ready-constructed to repurpose for data: digital library platforms and institutional repositories may appear fit for purpose. However, unless libraries understand the salient characteristics of research data, and how they do and do not fit with library processes and infrastructure, they run the risk of embarrassing missteps as they come to grips with the data challenge.

Whether managing research data is 'the new special collections,'[1] a new form of regular academic-library collection development, or a brand-new library specialty, the possibilities have excited a great deal of talk, planning, and educational opportunity in a profession seeking to expand its boundaries.

Faced with shrinking budgets and staffs, library administrators may well be tempted to repurpose existing technology infrastructure and staff to address the data curation challenge. Existing digital libraries and institutional repositories seem on the surface to be a natural fit for housing digital research data. Unfortunately, significant mismatches exist between research data and library digital warehouses, as well as the processes and procedures librarians typically use to fill those warehouses. Repurposing warehouses and staff for research data is therefore neither straightforward nor simple; in some cases, it may even prove impossible. 'Small data,' however, may prove to be the bigger problem: data emerging from individual researchers and labs, especially those with little or no access to grants, or a hyperlocal research focus. Though each small-data producer produces only a trickle of data compared to the likes of the Large Hadron Collider Project, the tens of thousands of small-data producers in aggregate may well produce as much data (or more, measured in bytes) as their Big Data counterparts [2]. Securely and reliably storing and auditing this amount of data is a serious challenge. The burgeoning 'small data' store means that institutions without local Big Data projects are by no means exempt from large-scale storage considerations.

Small data also represents a serious challenge in terms of human resources. Best practices instituted in a Big Data project reach all affected scientists quickly and completely; conversely, a small amount of expert intervention in such a project pays immense dividends. Because of the great numbers of individual scientists and labs producing small data, however, immensely more consultations and consultants are necessary to bring practices and the resulting data to an acceptable standard. When that tool does not include long-term data viability as a development goal, the data it produces are often neither interoperable nor preservable.

A major consequence of the diversity of forms and formats of digital research data is a concomitant diversity in desired interactions. The biologist with a 3-D stack of microscope images interacts very differently with those images than does a manuscript scholar trying to extract the underlying half-erased text from a palimpsest. These varying affordances must be respected by dissemination platforms if research data are to enjoy continued use.

One important set of interactions involves actual changes to data. Many sorts of research data are considerably less usable in their raw state than after they have had filters or algorithms or other processing performed on them. Others welcome correction, or are refined by comparison with other datasets. Two corollaries emerge: first, that planning and acting for data stewardship must take place throughout the research process, rather than being an add-on at the end; and second, that digital preservation systems designed to steward only final, unchanging materials can only fail when faced with real-world datasets and data-use practices. Recent technological advances in digital photography and image processing not only offer a high level of documentation, they also provide powerful analytical tools for conservation monitoring of cultural objects.