Retooling Libraries for the Data Challenge http://www.ariadne.ac.uk/issue64/salo <div class="field field-type-text field-field-teaser-article"> <div class="field-items"> <div class="field-item odd"> <p><a href="/issue64/salo#author1">Dorothea Salo</a> examines how library systems and procedures need to change to accommodate research data.</p> </div> </div> </div> <p>Eager to prove their relevance among scholars leaving print behind, libraries have participated vocally in the last half-decade's conversation about digital research data. On the surface, libraries would seem to have much human and technological infrastructure already in place that could be repurposed for data: digital library platforms and institutional repositories may appear fit for purpose. However, unless libraries understand the salient characteristics of research data, and how those characteristics do and do not fit with library processes and infrastructure, they run the risk of embarrassing missteps as they come to grips with the data challenge.</p> <p>Whether managing research data is 'the new special collections,'[<a href="#1">1</a>] a new form of regular academic-library collection development, or a brand-new library specialty, the possibilities have prompted a great deal of talk, planning, and educational opportunity in a profession seeking to expand its boundaries.</p> <p>Faced with shrinking budgets and staffs, library administrators may well be tempted to repurpose existing technology infrastructure and staff to address the data curation challenge. Existing digital libraries and institutional repositories seem on the surface to be a natural fit for housing digital research data. 
Unfortunately, significant mismatches exist between research data and library digital warehouses, as well as between research data and the processes and procedures librarians typically use to fill those warehouses. Repurposing warehouses and staff for research data is therefore neither straightforward nor simple; in some cases, it may even prove impossible.</p> <h2 id="Characteristics_of_Research_Data">Characteristics of Research Data</h2> <p>What do we know about research data? What are its salient characteristics with respect to stewardship?</p> <h3 id="Size_and_Scope">Size and Scope</h3> <p>Perhaps the commonest mental image of research data is terabytes of information pouring out of the merest twitch of the Large Hadron Collider Project. So-called 'Big Data' both captures the imagination of and creates sheer terror in the practical librarian or technologist. 'Small data,' however, may prove to be the bigger problem: data emerging from individual researchers and labs, especially those with little or no access to grants, or with a hyperlocal research focus. Though each small-data producer generates only a trickle of data compared to the likes of the Large Hadron Collider Project, the tens of thousands of small-data producers may well, in aggregate, produce as much data as their Big Data counterparts (or more, measured in bytes) [<a href="#2">2</a>]. Securely and reliably storing and auditing this amount of data is a serious challenge. The burgeoning 'small data' store means that institutions without local Big Data projects are by no means exempt from large-scale storage considerations.</p> <p>Small data also represents a serious challenge in terms of human resources. Best practices instituted in a Big Data project reach all affected scientists quickly and completely, so a small amount of expert intervention in such a project pays immense dividends. 
Because of the great numbers of individual scientists and labs producing small data, however, far more consultations and consultants are needed to bring practices and the resulting data up to an acceptable standard.</p> <h3 id="Variability">Variability</h3> <p>Digital research data come in every imaginable shape and form. Even narrowing the universe of research data to images yields everything from scans of historical glass-plate negatives to digital microscope images of unicellular organisms, taken hundreds at a time at varying focal depths so that the organism can be examined in three dimensions. The tools that researchers use naturally shape the resulting data. When a tool is proprietary, unfortunately, the file format it produces may be proprietary as well. When a tool does not include long-term data viability as a development goal, the data it produces are often neither interoperable nor preservable.</p> <p>A major consequence of the diversity of forms and formats of digital research data is a concomitant diversity in desired interactions. The biologist with a 3-D stack of microscope images interacts very differently with those images than does a manuscript scholar trying to recover the half-erased underlying text of a palimpsest. Dissemination platforms <em>must</em> respect these varying affordances if research data are to remain in use.</p> <p>One important set of interactions involves actual changes to data. Many sorts of research data are considerably less usable in their raw state than after filters, algorithms, or other processing have been applied to them. Others benefit from correction, or are refined by comparison with other datasets. 
Two corollaries emerge: first, that planning and acting for data stewardship must take place throughout the research process, rather than being an add-on at the end; and second, that digital preservation systems designed to steward only final, unchanging materials are bound to fail when faced with real-world datasets and data-use practices.</p> Thu, 29 Jul 2010 23:00:00 +0000