Embedding Web Preservation Strategies Within Your Institution
The Web is where you go to find out what is happening now; it is, or should be, where the most up-to-date information about a topic, company or institution is to be found. Every day more and more information is added to existing Web sites and new ones appear at a frightening pace. Having all this information at our fingertips is undoubtedly a good thing but there is also a downside: more information means that it is increasingly difficult to find the bits that are relevant to you, and often the new information simply replaces what was there before.
It is this second point that inspired the Preservation of Web Resources Project and its series of three workshops held in London, Aberdeen and Manchester, the first of which was reviewed in the previous edition of Ariadne by Stephen Emmott of the LSE. The more we use the Web as our primary publication channel for certain types of information, the more the problem of losing the last version of that information grows, and the more aware we become of the ticking timebomb of our discarded Web resources. It is sobering to recall the early days of television before anyone took the decision to record everything for posterity; how many fascinating gems of broadcasting have been lost forever? Will we also be mourning our lost Web sites in the future?
This third workshop has brought together opinion, feedback and research from the first two and was a chance for the delegates to get a first look at the Handbook being developed by the project team. The theme was “embedding Web preservation strategies within your institution” and this was addressed in presentations and breakout sessions as follows:
Morning (10.00 – 12.40)
- Presentation 1. Introduction to JISC PoWR (Kevin Ashley, ULCC)
- Presentation 2. Records Management vs. Web Management: Beyond the Stereotypes (Marieke Guy, UKOLN)
- Breakout Session 1: Web Preservation in your Organisation
- Presentation 3. Web Preservation in a Web 2.0 Environment (Brian Kelly, UKOLN)
Afternoon (13.30 – 16.00)
- Presentation 4. The JISC-PoWR Workshops - Inputs and Outcomes (Marieke Guy, UKOLN)
- Presentation 5. The JISC-PoWR Handbook - Explaining Web Preservation (Kevin Ashley, ULCC)
- Presentation 6. The JISC-PoWR Handbook - Identifying Web Issues (Richard Davis, ULCC)
- Breakout Session 2: The Next Steps for Web Preservation in your Organisation
- Presentation 7. The JISC-PoWR Handbook - Recommended Approaches (Ed Pinsent, ULCC)
- Future possibilities and Final Thoughts
Introduction to JISC PoWR
Kevin Ashley began by clarifying what the project’s goals were: this series of workshops would lead to the publishing of a handbook that would support both effective decision making about Web resource preservation and the implementation of those decisions. Where the earlier workshops helped to validate the project team’s thinking, this final workshop aimed to validate their understanding.
When outlining a decision-making process, Kevin referred to ‘MoSCoW’, which Stephen Emmott mentioned in July’s ‘At the Event’, and this neatly illustrated the symbiotic nature of the workshops and the delegates’ input. (MoSCoW is a model of prioritisation – Must do; Should do; Could do; Won’t do.) This is an ideal starting point for anyone implementing a preservation strategy. It may at first seem much easier just to preserve everything, but doing so could lead to future conflicts with Data Protection. On the other hand, not preserving anything can leave you open to legal liability. Suppose someone makes a decision based on information held solely on your Web site and you then change that information, so that their decision becomes flawed: if they pursue the issue, how do you prove that the misinformation was not on your site? A further complication is that even if there is a robust preservation process in operation it may only show what was changed, not why it was changed. At this point I felt a flutter of despair. Where does this end? What next? Preserving every tweet from Twitter? What about your reason for tweeting in the first place?
In his conclusion, Kevin explained the outline structure of the handbook and, as a result of these workshops and feedback from the PoWR Web site, what will no longer be included, such as managing preserved content, how much and how often, and implications of change. He closed with a further request for feedback on whether the handbook is useful and how it could be improved, and for case studies from our own institutions.
Records Management vs. Web Management: Beyond the Stereotypes
Marieke Guy had clearly had some fun exploring the stereotypical characteristics of Web Managers and Records Managers to highlight the assumption that they are opposites. Web Managers tend to be male, technology-literate and open, questing for the new, and not bothered about preservation; whereas Records Managers are pessimistic (they deal with planning for the worst imaginable contingencies), take a sensible, risk-management approach, want to be closed (not too many cries of ‘Let’s try re-organising this archive!’ in libraries and universities throughout the land), and are very interested in preservation and its necessary flipside, the destruction of records.
She briefly reminded us of Alison Wildish and Lizzie Richmond’s presentation about the University of Bath from Workshop 1 and then tied it all together with a succinct and positive conclusion. Web Managers need Records Managers and Records Managers need Web Managers, so get together and begin to share knowledge and skills. Web preservation is too large and complex a task to be the sole responsibility of one person or team. The responsibilities must first be agreed upon and then shared amongst all stakeholders. This will inevitably require additional resources and senior management buy-in, but more of that later.
Breakout Session 1: Web Preservation in your Organisation
To pre-empt seminar fatigue the focus was then turned on the delegates. Three groups were each given a scenario: to think about a preservation audit for either our institution’s Web site, our online prospectus, or a student’s Personal Learning Environment including Web 2.0 services. I was quite relieved to be in the group looking at the prospectus. It initially seemed quite obvious and self-contained, but as soon as we began to discuss the online version we realised that this was not the case. A printed prospectus will cover the application process and the course details but on the Web site there will be links through to departments, schools and faculties, particular academics and research projects, clubs, societies and, in Oxford’s case, the colleges. One office may hold the responsibility for producing the prospectus and so could potentially be able to preserve it, but they are unlikely to also be responsible for the content on the departments’, academics’ or projects’ Web sites. How will this content be preserved?
Some thought-provoking points arose in the wider discussion that followed the exercise. There may be risks involved in having a preservation policy that states ‘do nothing’, but having one that says ‘do something’ simply presents other risks. Taking a much broader view, what actually is the risk? It may seem plausible, even responsible, to approach senior management with the (worst case) scenario that not preserving online information may lead to a future lawsuit, but then most of us still remember the doom-laden scenario of Y2K. Ultimately, what is the real preservation driver?
Web Preservation in a Web 2.0 Environment
Brian Kelly spoke on this topic at the 2nd Workshop in Aberdeen, part of IWMW 2008, and gave a comprehensive recap here. Where Web 1.0 was primarily concerned with content, Web 2.0 focuses on collaboration and communication. Web 1.0 sites are housed in single locations, whereas Web 2.0 brings together numerous third-party services; the network becomes the platform and this creates more complex IPR (Intellectual Property Rights) issues.
He posed two questions: will the use of these services lead to new preservation concerns? And how should we respond to these new challenges?
Using eight case studies covering topics such as migrating content from one blog to another, moving wiki content to a static Web page, and what happens to your work when a service like Slideshare vanishes, he neatly illustrated that if anyone hoped that Web 2.0 was going to make preservation easier, they were mistaken. It will lead to new preservation concerns. How should we respond? In some cases (Twitter, Skype, Facebook status messages) the individual is probably happy to think of their contributions as disposable, but in others, such as blogs, the only real way to be sure that the content is preserved may be for the bloggers themselves to take responsibility and copy or export the data regularly.
The main point I took from this talk and its subsequent discussion is the matter of responsibility. Many of these services are associated with the individual, not the institution that they work in. It would be fair to assume that the institution should accept the responsibility for preserving the Web site of the department that an academic works in and may create content for, but if the academic also runs a blog through a service like Blogger, would the responsibility not then move to the academic (or even to Blogger)? Students too, I believe, fall outside the institution’s responsibility. Some universities provide these services in-house but what happens when students leave? Is there any point today in institutions ‘reinventing the wheel’ by developing an in-house blogging tool? Why allocate precious resources to a service that will almost certainly be done better elsewhere, will be updated more frequently, is probably free to use and, perhaps most importantly, may already be in use by students by the time they start at the institution?
The JISC-PoWR Workshops: Inputs and Outcomes
Marieke Guy returned to give an overview of how this project had run and what resources had been created apart from the handbook: presentations on Slideshare, a wiki, mp3 recordings of the talks. She also apologised for the limited range of outputs, citing the shortage of time and funds for the project. This was unnecessary; the fact that anything at all has been done about this ever-growing concern is more than enough and I think that all the delegates were suitably appreciative. When asked what else the project team could do if they applied for more funding there was no shortage of suggestions, ranging from help with existing software and creating policy guidelines right through to JISC running a central preservation service.
The JISC-PoWR Handbook: Explaining Web Preservation
Kevin Ashley also returned to give an extremely useful overview of the Web preservation landscape that I will condense into What, How and When.
What: Take a second to ask yourself this question. Imagine that, some time from now, your institution has been preserving its Web resources for a decade and you want to find out what was on the homepage of the main Web site five years ago: what do you actually expect to see? Is it just the raw content, the text and links? Is it a faithful recreation of how the site looked, functioned and ‘felt’? If the site utilised RSS feeds or some clever AJAX utility, would they still work? If so, are they pulling other archived content for that particular chosen date? Is that also preserved by your institution? What about navigating the preserved site? Internal links might work, but what will external ones link to?
How: Will the preservation process happen within the authoring system or server? Acceptable perhaps in the short term, but if done through a CMS then you are likely to create a dependency on an external provider. If it happens through the browser then the site becomes representative, like a publication, but also ‘frozen’. The most widely used option is harvesting, examples being the Internet Archive and the British Library, but this is still far from perfect. How much or how little do you capture? Can the harvesting tool access content held on databases? Will it even capture style sheets?
When: Do you preserve daily, monthly, annually? Who decides? What triggers it: every change or just major changes? Is it automatic or manually initiated?
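To make the harvesting questions a little more concrete, here is a minimal sketch in Python of the kind of check one might run on a single page before relying on a harvester. It is not taken from the workshop or the handbook. The class name `ResourceAudit`, the function `audit_page` and the address `www.example.ac.uk` are all hypothetical. The sketch simply lists the style sheets a capture would need to include and separates internal links from external ones, which are the links most likely to point somewhere unexpected in a preserved copy.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceAudit(HTMLParser):
    """Collect style sheets and links found on one page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.stylesheets = []
        self.internal_links = []
        self.external_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Resolve relative addresses against the page's own address.
        url = urljoin(self.base_url, href)
        if urlparse(url).scheme not in ("http", "https"):
            return  # ignore mailto:, javascript: and similar
        if tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            self.stylesheets.append(url)
        elif tag == "a":
            if urlparse(url).netloc == self.host:
                self.internal_links.append(url)
            else:
                self.external_links.append(url)


def audit_page(html, base_url):
    """Parse one page of HTML and return the populated audit object."""
    parser = ResourceAudit(base_url)
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical page content, used only to illustrate the idea.
SAMPLE = """<html><head><link rel="stylesheet" href="/css/site.css"></head>
<body><a href="/prospectus/">Prospectus</a>
<a href="http://www.archive.org">Internet Archive</a></body></html>"""

report = audit_page(SAMPLE, "http://www.example.ac.uk/")
print("Style sheets to capture:", report.stylesheets)
print("Internal links:", report.internal_links)
print("External links:", report.external_links)
```

A sketch like this only sees what is in the delivered HTML. Content generated from databases or pulled in by scripts would not appear, which is exactly the limitation raised above.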
The JISC-PoWR Handbook: Identifying Web Issues
Richard Davis picked up the baton by identifying the main issues that will affect preservation. First identify what is on your Web and then, of that list, what is only available on the Web? It is this information that we should concentrate on and ask: Who owns it? Where is it? And what is it for?
He revisited many issues that had arisen over the course of all three workshops, from the basic – sheer quantity and variety of content – to the specific – problems of aggregation and personalisation.
Breakout Session 2: The Next Steps for Web Preservation in Your Organisation
We were given another chance to expound our opinions through the scenario of thinking about the first steps in creating our own institution’s Web preservation strategy. We duly listed who needed to be involved, what training and education would be necessary, and how we would audit the existing information and then implement the strategy. However, this also highlighted that before any of this could be done resourcing would need to be addressed; how could a strategy be implemented without a budget or available staff?
The JISC-PoWR Handbook: Recommended Approaches
Ed Pinsent tied the whole series up with a concise summary that reflected the content of the handbook and managed to be both cheerily optimistic and, most importantly, realistic. There have been times in the workshops when the already thorny issue of preservation seemed insurmountable, but Ed reassuringly stated that it was possible. Just remember:
- not everything
- not every version of every resource
- not forever
- not perfect
Do what it is possible to do whether it is a policy review, some quick wins or a full strategic approach. When it actually comes to “Embedding Web Preservation Strategies within Your Institution” this may be done by:
- Convincing the decision makers
- Including Web Preservation in policy
- Specifying preservation-friendly features in future procurements
- Securing resources to manage the capture and curation of resources
We can all begin the process by working out who we should be collaborating with inside our institution, carrying out an audit of what information we actually have (even if this is just finding out what all the registered domains are from our DNS Manager), and establishing who actually requires the resources to be preserved and what can be discarded.
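As a rough illustration of that first audit, and not something the handbook prescribes, the list of registered domains could be recorded alongside who owns each one and who, if anyone, needs it preserved. The short Python sketch below does this; the `WebResource` record, the `triage` function and every domain and role name in it are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WebResource:
    """One entry in a hypothetical inventory of registered domains."""
    domain: str
    owner: Optional[str] = None        # who is responsible for the content
    required_by: Optional[str] = None  # who needs it preserved, if anyone


def triage(resources):
    """Sort an inventory into three lists of domain names.

    keep:    has an owner and someone requires it to be preserved
    discard: has an owner but nobody requires it (a candidate only)
    unowned: nobody is responsible yet, so no decision can be made
    """
    keep, discard, unowned = [], [], []
    for resource in resources:
        if resource.owner is None:
            unowned.append(resource.domain)
        elif resource.required_by:
            keep.append(resource.domain)
        else:
            discard.append(resource.domain)
    return keep, discard, unowned


# Hypothetical inventory, as might be compiled from the DNS Manager's list.
inventory = [
    WebResource("www.example.ac.uk", owner="Web team", required_by="Records Manager"),
    WebResource("oldproject.example.ac.uk", owner="Research office"),
    WebResource("conference2005.example.ac.uk"),
]

keep, discard, unowned = triage(inventory)
print("Preserve:", keep)
print("Candidates to discard:", discard)
print("Owner still to be identified:", unowned)
```

The third list is perhaps the most useful: a domain with no identified owner is one where the collaboration described above has yet to start.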
The working draft of this handbook is available to download from the JISC PoWR Web site.
This series of workshops has certainly been useful and the handbook deserves a place on any Web manager’s desk, but more than this it was also reassuring and, let’s be honest, a relief to meet peers facing the same problem who have also yet to grasp the nettle. I learnt much in the workshops but the points that stand out are:
- To implement a Web preservation strategy effectively, buy-in has to be attained at a very high level. Yes, it is vital to collaborate with the relevant colleagues inside the institution to agree upon and carry out the project; but without senior management’s backing it is unlikely to be successful. In one of the discussions we surmised that most universities lack an information strategy because most senior level post-holders come from an academic background. Business has grasped the importance of the Chief Information Officer; one day HEIs might too.
- Once preservation is seen as important, it will be gradually accepted by the current workforce and so can be planned for to ensure it will always be enforced. The perfect time to introduce staff to your institution’s preservation policy and procedures is during their induction.
- If content contributors were always aware that their work would still be accessible in the future, would this raise its quality? Books are not simply published by their authors; they go through a rigorous selection, editing and proofing process. When was the last time your Web pages went through this quality assurance procedure?
- There will not be a ‘one-size-fits-all’ answer, so assess each area independently. The research work carried out in institutions may already be comprehensively preserved, and the best of it will be published and thus exist in many formats in many locations. Concentrate on the most important areas that are not provided for elsewhere.
- It is somewhat encouraging to remember that not everything is really worth keeping, although it will be time-consuming to gain agreement on what is.
What will be most interesting is seeing what has happened one year from now. Will some proactive delegate meet the challenge, learn from the handbook and be in a position to present their experience to others? I hope so and I hope that the JISC-PoWR team are inspired to keep the momentum going on this project so that there is still a platform in place to allow that presentation to occur.
References
- JISC-PoWR Project Web site http://jiscpowr.jiscinvolve.org/
- Stephen Emmott, Preservation of Web Resources: Making a Start, July 2008, Ariadne, Issue 56 http://www.ariadne.ac.uk/issue56/jisc-powr-rpt/
- Institutional Web Management Workshop 2008
- Internet Archive http://www.archive.org
- British Library http://www.bl.uk
- Working draft of JISC-PoWR Handbook http://jiscpowr.jiscinvolve.org/handbook/