Personal collections such as those kept in the British Library have long documented diverse careers and lives, and include a wide variety of document (and artefact) types, formats and relationships. In recent years these collections have become ever more 'digital'. Not surprisingly, given the inexorable march of technological innovation, individuals are capturing and storing an ever-increasing amount of digital information about or for themselves, including documents, articles, portfolios of work, digital images, and audio and video recordings. People can now correspond by email, and have personal Web pages, blogs, and electronic diaries. Many issues arise from this increasingly empowered landscape of personal collection, dissemination, and digital memory, which will have major future impacts on librarianship and archival practice as our lives are increasingly recorded digitally rather than on paper. Not only the media and formats but, as we discovered in our research into digital collections, also the contents of works created by individuals are changing as their creators exploit the possibilities afforded by the various software applications available. We need to understand and address these issues now if future historians, biographers and curators are to be able to make sense of life in the early twenty-first century. There is a real danger otherwise that we will lose whole swathes of personal, family and cultural memory.
Various aspects of these personal digital archives have been studied, usually under the heading of 'personal information management' or PIM, such as work on the process of finding documents that have been acquired. As Jones says in his comprehensive literature review on the subject, 'much of the research relating to ... PIM is fragmented by application and device ...'. Important studies have focused on:
The research reported in this article, which forms part of a longer-term study 'Digital Lives: research collections for the 21st century', takes a wider look at personal digital document acquisition and creation, organisation, retrieval, disposal or archiving, considering all applications and formats.
'Digital Lives' is a research project focusing on personal digital collections and their relationship with research repositories. It brings together expert curators and practitioners in digital preservation, digital manuscripts, literary collections, Web archiving, history of science, and oral history from the British Library with researchers in the School of Library, Archive and Information Studies at University College London, and the Centre for Information Technology and Law at the University of Bristol.
The Digital Lives research project aims to be a groundbreaking study addressing these major gaps in research and thinking on personal digital collections. The full study is considering not only how collections currently being deposited are changing, but also the fate of the research collections of the future being created now and implications for collection development and practice. We are seeking to clarify our understanding of an enormously complex and changing environment, engage with major issues, and evaluate radical new practices and tools that could assist curators in the future.
Within this broad remit, this article focuses on the first stage of the digital archive process: individuals' own digital behaviour and their build-up of a digital collection.
We wanted to find out:
We used in-depth interviews to explore the views, practices and experiences of a number of eminent individuals in the fields of politics, the arts and the sciences, plus an equal number of young or mid-career professional practitioners. Questions covered the subjects listed above, with the first question addressing the history of the interviewee's experience with computers and ICT. The narration of this account often touched on topics such as training, manipulating files, backing up and transfer, and collaborative work, and thus offered information contextualised in more general experiences, attitudes and perceptions.
For this qualitative phase of the research, we interviewed a wide spectrum of respondents in terms of age, background, professional expertise, and type and extent of computer usage. This was to elicit a diverse range of experiences and behaviours.
The 25 interviewees included respondents who were:
During the course of the research a fascinating variety of experiences, behaviours and approaches was uncovered, ranging from the digitisation of scientific records, to forwarding emails to oneself so that the subject line could be changed, to filming a theatre production and then projecting it onto the surrounding environment.
Overall, the breadth of disciplines, backgrounds, ages and experiences of the individuals interviewed gave such contrasting and varied accounts that it is almost impossible to generalise findings at this preliminary stage. However, the narratives do provide excellent descriptions of a whole range of 'digital' behaviours that will be very useful in drawing up a questionnaire survey to be undertaken in the next phase of the research.
Despite 'home' usually being where an individual's use of computers first developed, later usage seems to be dominated by work. Nearly all respondents have a collection of digital photographs, and a minority have a blog or page on a social networking Web site, but overwhelmingly documents and other 'electronic artefacts' produced are work-related, and the work environment was the one thought of first when answering the question.
An important point which has implications for archiving is that for some people, much work is undertaken remotely – directly from a server. This includes
There was a surprising enthusiasm for updating technology (although, of course, the sample was biased towards those who have a 'digital collection'). Only one respondent showed any reluctance – an interviewee who has been using C120 standard cassette tapes to record a diary for 40 years (he played one to us!) and intends to continue doing so. Equally, he said he did not wish to digitise his accumulated collection.
One unexpected finding was that, unless specifically asked to discuss 'non-work' digital artefacts, respondents did not readily include discussion of them in their accounts. In fact, there was far less convergence between professional and work-related items on the one hand, and non-work and leisure items on the other, than had been expected. There was little evidence of a division of computers between home and work use (though there was such a division among family members – with each person generally having their own log-in and/or separate directories). The organisation of documents seemed always to reflect the separation of work and leisure.
Several key points arose. First, the norm is to be self-taught, even where people's jobs involve very sophisticated application of computer software. This is mainly because of an early fascination with computers or the influence of parents or older siblings, but also out of necessity. Often a PhD topic would require the use of a particular application, which students would take it upon themselves to learn for the sake of their studies. This ranged from citation-referencing software such as EndNote to high-level computer programming.
Second, where respondents were not (or only partially) self-taught, training given was often informal, sporadic or ad hoc. For example, ongoing help was provided for one of the BL interviewees by a variety of family and friends. In another case, a digital collection was managed by one of the interviewee's sons. In other cases a relationship was built up between the computer user and a particular individual (such as 'Reg, the computer man') who was called as required, and in whom all trust was placed.
Despite many respondents working in an environment in which IT support was readily available, such support did not spring to mind when respondents spoke of their needs in this area. When it was mentioned, comments were generally negative: 'they don't know anything' or '[they are] generally unhelpful'. A leading information manager called IT support workers 'an interesting breed', explaining that 'they don't understand that non-IT people have skills. IT and librarians will always clash, because IT people are always concerned with security and librarians with information-sharing'. However, one respondent – perhaps significantly, also an IT expert – found them 'very good', having worked directly with them in his job.
Finally, in the cases where informal help was provided, many misconceptions were apparent, although whether these were due to the quality of advice or teaching given is not known. To give one example, a Web author did not realise that her novel, written as a serial blog, and her emails were stored remotely from her own computer and that, without an Internet connection, she would therefore be unable to access them. She has never been unable to do so as her broadband Internet connection is on all the time. She had no notion of how to manage email messages and did not know that these were retained after being opened. In this respect, the current study supported the findings of Marshall and colleagues, whose study of how people acquire, keep, and access their digital 'belongings' showed 'a scattered and contradictory understanding of computers in the abstract' (p. 26).
No obvious pattern has emerged so far. The main points are, firstly, that documents are stored in folders that reflect either chronological creation or topic, depending on appropriateness. The variety of material and the general nature of the work determined this decision. For example, where documents are related to one theme – such as repeated experiments – they are more likely to be filed by date, or in folders by date range. The system's automatic allocation of 'date modified' was considered not to be of use for files where a date is important. This is because often the important date is the one on which the experiment or event took place and not the one on which the document was modified.
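The gap between a file's 'date modified' and the date that matters to its creator can be illustrated with a short Python sketch. It assumes, purely for illustration, that the event date is written at the start of the file name in `YYYY-MM-DD` form; the function name and the example file names are hypothetical and are not drawn from any interviewee's system.

```python
import re
from datetime import date
from typing import Optional

# Assumed convention: the file name begins with the event date as YYYY-MM-DD.
DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def event_date_from_name(filename: str) -> Optional[date]:
    """Return the date written at the start of a file name, or None if absent."""
    match = DATE_PREFIX.match(filename)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits were present but do not form a real calendar date.
        return None


print(event_date_from_name("2007-03-14_ice-melt-run.txt"))
print(event_date_from_name("notes.txt"))
```

A date carried in the name stays with the file when it is edited, copied or moved to a new machine, whereas the modification time reported by the file system (for example via `os.path.getmtime`) changes whenever the file is saved again.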
Many other people also tend to put the date manually as part of the file name, even though the system automatically records and displays the date when a document was last modified. This is for four main reasons:
Another broad finding was that collections appear to grow organically – instead of moving completed files and folders to a less visible position on a computer, other folders are simply created next to existing resources. However, a minority of respondents do delete files once they are backed up elsewhere, or create folders into which they relocate folders of completed documents, so that there are not too many folders at the top level. The study also found similar results to Jones et al. in that people replicate file structures where similar filing is required for different projects. An obvious example of this is a lecturer who has a folder for each course, within which are subfolders containing, respectively, PowerPoint slides, lecture notes, current student work, etc.
A theme that emerged with regard to email in particular was that documents and other digital artefacts accumulate unintentionally. There were examples of email archives containing literally thousands of out-of-date messages, kept only because it was less effort to retain than to delete them. According to Marshall, 'most personal information in the digital world is not collected intentionally and thus does not form a coherent collection; instead heterogeneous materials accumulate invisibly over time' (p. 5). Whilst the present study's results would not suggest that 'most' information is collected inadvertently, this may be true for emails and attendant attachments.
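The replication of one file structure across projects can be sketched in Python as follows. This is a minimal illustration only: the subfolder names and the function `create_course_folders` are hypothetical, loosely modelled on the lecturer example above, and do not describe any tool used by the interviewees.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical subfolder template, loosely based on the lecturer example.
COURSE_TEMPLATE = ("slides", "lecture_notes", "student_work")


def create_course_folders(
    root: Path, course: str, template: Iterable[str] = COURSE_TEMPLATE
) -> List[Path]:
    """Create one subfolder per template entry under root/course."""
    created = []
    for name in template:
        folder = root / course / name
        # exist_ok=True makes the call safe to repeat for an existing course.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling the function once per course yields the same layout each time, which is the behaviour interviewees achieved by hand when they copied a familiar folder structure into a new project.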
Another 'theme' or 'generality' that came out of the interviews was that a change in computer is often the motivating force and the main way in which files are removed from an 'active' location. In other words, the act of transferring files from one computer to another includes that of weeding files, whereby those that are no longer active are either discarded with the old machine or, where the hardware is retained, kept in long-term storage.
Finally, there were few examples of documents not being organised into folders and directory hierarchies, but retrieved or accessed by keyword only – a finding that has echoes in the work of Jones and colleagues (p. 1505), who found that all but one of their sample of 14 professionals and academics refused to pilot a search-based system for retrieving information without organising it hierarchically. Our research found that folder hierarchies represented 'information in their own right' and that 'Folders … represented key components of a project and thus constituted an emerging understanding of the associated information items and their various relationships to one another'. This supports the approach that has been adopted by the Digital Manuscripts Project at the British Library, for example, where the value of contextual information beyond simply the digital files has been emphasised. In the present study, each folder and subfolder system formed discrete units of work themed around time periods or different tasks, such as teaching, research etc. An aspect of folders not mentioned by Jones was the facility to browse. Lecturers said that they sometimes needed material given in a particular talk for something else. Using a traditional folder system they could browse filenames both to find specific files and for inspiration – in at least one case, the interviewee had forgotten that a file existed and remembered it only on seeing the file name.
Ironically, of the two respondents who did not use folders and hierarchies, one was a computer expert, who is adept at information retrieval, and the other a novice. The latter did not appear to know about folders or hierarchies, and had much help from one of his sons in indexing his files, which were integrated into a bigger collection of cassette tapes, hard copy articles, letters etc.
Almost all respondents said that they deleted fewer files 'these days' because there were not the electronic storage problems that marked earlier computer usage, with some exceptions such as institutional file storage limits (see email below). For many it was actually less of an effort to simply keep a file than to delete it. However, people were aware that they were creating possible problems in the future in the form of 'document overload' – in other words, having too many files and directories to easily navigate to active documents. As mentioned above, there were ways of obviating this problem.
With regard to the transferring of files, many respondents generally only remember their most recent computer transfer, or their behaviour in this respect over the last three or four years. When asked about periods before this, they usually prefix answers with the words 'I must have ...' and sometimes puzzle about what happened to 'all those floppy disks'.
Many documents appear to have been lost when changing from an old to a new computer, where the former was then sold on or discarded. There were cases, though, where old computers were kept simply for the files they contained, although subsequently retrieving the files and migrating them to newer computers was problematic: in some cases the only way to remove a file was on a floppy disk, which in turn could not be read by a newer computer. There were cases of this within the 'eminent' sample, whose collections may be deposited at the BL or a similar institution; these interviewees were told of the capacity of the BL both to extract files and to access corrupted or deleted material through computer forensics.
Back-up policies appear to relate to three major factors:
In a minority of advanced user cases, synchronous or automatic back-up is undertaken, and two interviewees have the new Mac system with its 'Time Machine' function, enabling users to restore files and folders to their status and 'position' on any given date. Nevertheless, even this method on its own is vulnerable if there is no backing up to an additional, separate store.
Other back-up methods included storage media such as external drives, floppy disks, CDs or DVDs, and alternative computers. However, email was also used quite extensively. This is discussed more below.
As Jones points out, 'decisions concerning whether and how to keep … information are an essential part of personal information management'. Although Jones was not necessarily talking about information to be kept in the long term, clearly, archiving files – and deciding whether to keep particular documents or not – is also a critical aspect of PIM.
Our study found, as did earlier work by Ravasio et al., that saving work that was completed (even if not actually using the word 'archiving') was an important part of working with digital documents.
The main point that emerged is that the decision to archive appears to depend on both affective and utilitarian factors. These were:
Regarding the first of these points, the time and/or emotional investment in producing a document appeared to be a major factor in its retention, regardless of whether it would ever be needed again. Marshall et al. (p. 30) suggest that 'value' can be calculated using five factors:
For the current project, creative time and effort appeared to be interlinked to the extent that they formed one factor. In many cases the 'emotional impact' was also inextricably linked with the time and effort expended in creating the artefact, although this was also influenced by the contextual factors surrounding its creation and history. The work and emotional effort going into a project defined it as an important statement of achievement, and thus heightened its value and guaranteed its continued existence. Sometimes, in the case of key artefacts, there would be back-up copies (for example, on CD) just to ensure survival. Indeed, some respondents look at their archive as a reflection of their life's work, and keep items of no further practical value. Logically, one might assume that it would be hard copy or 'physical' artefacts that would be retained as such a testimony. Much hard copy material is also kept because of the time or effort invested in its creation, or because it represents important points in the creators' lives. However, the reluctance to dispose of electronic files indicated that they too constituted an important part of a professional or academic portfolio. Of course, in the case of many (increasingly prevalent) digital files such as animated images, audio and video, it is not meaningful to print out a hard copy.
It has long been known from conventional (neither digital nor hybrid) archives that people retain items for their sentimental value and as biographical records or pointers to a person's individual or family life story. Thus while people differ in how many items are retained for this kind of reason, the importance of at least a modest degree of personal archiving is widely and strongly felt. Etherton has noted in an illuminating account of the role of archives: 'Families very often keep personal records of people and events such as family photograph albums and baby albums which record the growth of and achievements of a child's early years', and argues convincingly that such things play an important role in a person's sound psychological well-being, helping to provide individuals with a sense of belonging and a sense of place. Indeed, social work and medical professionals working with children in care and with terminally ill young mothers ensure that life story books, memory boxes and oral history recordings are prepared in order to provide 'fundamental information on the birth family and on the early details of the child'.
It is perhaps not surprising that this general need for personal memory is (to varying degrees) also felt by academics and professionals in respect of their careers as well as of their home life. Moreover, such a need has begun to embrace digital objects as well as non-digital ones, as is borne out by some of the comments of interviewees.
On several occasions interviewees showed researchers either hard copy or electronic files which strongly reminded them of the contextual aspects surrounding the creation or acquisition of those files. Examples of such contexts were working with friends, undertaking specific activities, or being constrained by the technology. For example, a geophysicist showed us printouts of his attempts to model an ice block melting during his PhD research. This evoked memories of limited computing power and memory (the model could not be visualised on-screen or stored locally), alongside nervous anticipation about the results of his early research efforts. Programs had to be run overnight from departmental computers and the results retrieved from a remote printer in the morning.
Many examples were encountered where creative effort and labour lent considerable value to documents, even where they were now, to all intents and purposes, without any practical worth. For example, WordPerfect files were not deleted even though they may have been unreadable to the creator. It needs to be borne in mind, however, that the retention of obsolete files is not necessarily emotional or irrational. Once a file is destroyed it is gone, but in the case of an obsolete file there remains some hope of recovery. In science, for example, datasets and records of analyses need to be kept in order to allow colleagues the possibility of re-analysis. A scientist who deletes a file (obsolete or not) might expose himself or herself to the criticism that he or she has actively denied other researchers the possibility of re-analysis: a gradual obsolescence might be deemed more acceptable.
With storage capacity so much larger now, it is easier to retain documents. In fact, as mentioned above, many interviewees simply did not bother to delete documents that they no longer needed. Thus the long-term existence of a document no longer necessarily implies that it has been invested with considerable value.
Another reason for archiving is that documents may be required in the future. Much of the literature on this discusses acquired documents in this context. The research reported here, however, shows that created documents are also archived for their potential later usage in different contexts. Not surprisingly, lecturers kept Word and PowerPoint files for later reference. They included material for courses no longer taught or current, just in case it was needed at a later stage – even to illustrate historical points (where it is the contemporary nature of the discipline that matters). Even student essays were kept (partly) for this reason. One interviewee said that his undergraduate essays contained 'decent' reviews of past literature that could be 'plundered' for later use. Academic writers also found that they could 'recycle' parts of old articles. For example, a conclusion to one article could be used in a section outlining the author's own prior research in a further paper. Of course, quite often interviewees could not specify exactly how a document would be used later but still felt that, as long as there was a possibility that it might be of use, it was worth keeping. Email messages that were never, or no longer, 'actionable' were also stored.
Hard copy is also an option for some interviewees, with the hard copies often being generated by others on their behalf (e.g. by receiving hard copies of journal articles). Final versions of articles are particularly likely to be printed out. One scientist interviewed has his journal articles in electronic and hard copy form, and binds the latter every time he reaches another 30 publications. In some cases, the hard copy is actually more extensive than the digital back-up. For example, a playwright whom we interviewed prints his work every day, reads through and makes any changes by hand. The next day he makes the adjustments on his computer, saving the document under the same name and thus overwriting his previous version. He keeps and files all his printouts and thus has a comprehensive record of all stages of his work. In addition, every 30 days or so, he does create a new electronic version (i.e. by changing the filename).
The exploration of the use of email proved to be one of the most fruitful and interesting areas of study. Usage has gone far beyond the original purpose of email (i.e. communicating with others) and is being appropriated in various innovative ways by respondents. These include:
Within the topic of email, the creation and usage of different accounts was particularly instructive:
The research and the nature of the personal collections and digital behaviour described above clearly have significant implications for large institutional repositories such as the British Library. Large, hybrid collections of contemporary papers, partly generated using computers, including eMANUSCRIPTS and eARCHIVES, have resulted in personal collections of substantial quantity and complexity in terms of version control of documents, archival appraisal and selection. Our findings so far indicate that these issues will remain pertinent in dealing with personal hybrid and digital collections, although they may need to be tackled in different ways. For example, more of the management, appraisal and selection of archival material from personal digital collections may need to be carried out by creators in partnership with repositories in their lifetimes, rather than retrospectively. Moreover, repositories may need to deal with a greater variety of digital formats as part of a continuous decision-making process and workflow, rather than parcelling out different aspects of personal collections to format specialists.
From a different perspective, curators have also often dealt with intermediaries authorised by creators to control their archives (often posthumously). However, with institutions and commercial service providers offering the creators of personal digital collections services in their lifetime such as email, social networking and file storage, the decision to pass control of potential archive material to intermediaries is sometimes taken less advisedly, and may lead to further complexities. Digital Lives research will be examining issues concerning rights and storage services in more detail.
Finally, there is one aspect that has not been mentioned in this report but constituted part of our qualitative research among creators of personal digital collections. It is that of attitudes to rights issues including privacy, personal control and misuse of information, and copyright. Again, these are issues traditionally encountered by repositories, which have in the past balanced archive creators' concerns about privacy, protection of sensitive information and intellectual property against access for researchers. Our interviewees among the creators of personal digital collections seemed relaxed on the whole about these issues, or otherwise to have given them little thought. In a context of legal compliance, it may also be appropriate to consider issues of cultural change when thinking about how rights issues are to be handled by archive creators and repositories in the future. Again, this is a discrete area of research that is receiving more attention in the Digital Lives Project.
Issues of acquiring, creating, manipulating, storing, archiving and managing personal digital archives are extremely complex and few patterns emerge from the interviews described. This may be because there are many distinct styles of conducting digital lives or because the scope of what is meant by digital lives lacks adequate definition at present. The sample was made up of people with widely differing backgrounds, who used computers in a great variety of different ways. Our research found significant differences in:
While not yet yielding any general conclusions, the study has already highlighted for the researchers some of the issues relating to the deposit of personal digital collections with which, increasingly, repositories will be faced. With further analysis and dissemination, the project findings will greatly inform the British Library and other repositories. One such issue, for example, is the blurring of the distinction (at least, in the interviewees' views) between what is created or stored online and off-line, and a certain misunderstanding about this issue. This was particularly true of email, where some respondents did not know whether their messages were stored on their own computer or remotely, and, indeed, had never given it any thought.
A certain ambiguity was also revealed, regarding 'back-up', 'storage' and 'archive'. In part, this was just a question of terminology, but vague areas were revealed where, for example, back-ups for active documents – often several draft versions – were retained permanently because this was easier than deleting them, even though such dormant and somewhat repetitive documents were not considered part of an archive. Indeed, many interviewees did not regard even their long-term retained documents as an 'archive' of enduring value.
The interplay between digital and non-digital artefacts, and individual artefacts having both digital and hard copy elements, is becoming a big issue for repositories. Our research showed that hard copy and digital versions of works were often not the same (e.g. in some cases a printout preceded further modifications, which remained in electronic form only), mirroring observations made by the Digital Manuscripts Project at the British Library. There were also examples of major drafts being written only in hard copy, with later or final drafts being committed to computer.
This article – and the research to date – has elicited and highlighted some of the major issues. In the next phase of the research, we will attempt to quantify some of the behaviours outlined here, and to explore in more depth the personal digital collection practices of various specific groups by means of a large-scale online survey. This will help to delineate commonalities and differences, to elucidate how they came about, and to articulate implications for library and – in particular – archival professionals.
Two related aspects that we are keen to begin to explore and characterise are the questions of:
Finally, a provisional curatorial response to the tentative conclusions of this paper may include the following points:
The 'Digital Lives' research project is being generously funded by the Arts and Humanities Research Council (Grant number BLRC 8669). Special thanks are due to Neil Beagrie who conceived of the idea for the project and was its Principal Investigator until leaving the British Library on 7 December 2007. Members of the research team who arranged and attended interviews (in addition to the authors) were: Jamie Andrews, Alison Hill, Rob Perks and Lynn Young, all from the British Library. The authors also wish to thank the interviewees themselves for their valuable contributions to the project.