Quotation is the heart of scholarly argument and teaching, the activity of bringing insight to something complex by focused discussion of its parts. Philosophers who have reflected on the question of quotation have identified two necessary components: a name, pointer, or citation on the one hand and a reproduction or repetition on the other. Robert Sokolowski calls quotation a 'curious conjunction of being able to name and to contain'; V.A. Howard is more succinct: quotation is 'replication-plus-reference'. We are less interested in the metaphysical aspects of quotation than in the practical ones.
The tools and techniques described here were supported by the National Science Foundation under Grants No. 0916148 & No. 0916421. Any opinions, findings and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF).
Quotation, when accompanied by citation, allows us to bring the reader's attention to bear on a particular part of a larger whole efficiently and without losing the surrounding context. A work of Biblical exegesis, for example, can quote or merely cite 'Genesis 1:29' without having to reproduce the entire Hebrew Bible, or even the Book of Genesis; a reader can resolve that citation to a particular passage about the creation of plants, and can see that passage as a discrete node at the bottom of a narrowing hierarchy: Hebrew Bible, Genesis, Chapter 1, Verse 29. We take this for granted.
Quoting a text is easy. But how can we quote an image? This remains difficult even in the 21st century, when it is easy to reproduce digital images, pass them around through networks, and manipulate them on our desks.
A scholar wishing to refer to a particular part of an image will generally do something like this: She will open one version of an image in some editing software, select and 'cut' a section from it, and 'paste' that section into a document containing the text of her commentary or argument. She might add to the text of her argument a reference to the source of the image. The language that describes this process is that of mechanical work – cutting and pasting – rather than the language of quotation and citation. The process yields a fragment of an image with only a tenuous connection to the ontological hierarchy of the object of study. The same scholar who would never give a citation to 'The Bible, page 12' rather than to 'Genesis 1:29' will, of necessity, cite an image-fragment in a way similarly unlikely to help readers find the source and locate the fragment in its natural context.
Because images are so easily accessed, manipulated, and shared, scholars and teachers are increasingly using technology to produce new scholarship and new pedagogies built on images. The authors of this article, one a botanist working with plants as herbarium specimens and as living accessions in a botanical garden, and one a classicist working with Greek manuscripts, have frequent need to 'quote' images. This article will describe a particular solution to this challenge that has emerged from an ongoing project in the digital humanities. We will discuss the theoretical assumptions behind this approach to using images for scholarship, describe the current technological implementation of this approach, and give two examples of how we use image quotation in the real world.
We will begin by describing the humanities project that was the occasion for a digital library infrastructure that allows image quotation using canonical citation. We will then describe two examples of projects that rely heavily on this approach to the integration of images and text. The first example is firmly in the humanities, a commentary on one peculiar feature of some Byzantine manuscripts. The second is a project that is both in the sciences and aligned with pedagogy: a lesson in herbarium specimens and the relationship between historical botany and research in the field. Together these examples span disciplines and unite the realms of research and teaching; they have in common the need to 'name and contain' particular parts of images, accurately, concisely, and with confidence toward the future.
The Homer Multitext, hereafter 'HMT', is a project of the Center for Hellenic Studies of Harvard University, a research institution in Washington, DC. The HMT intends to bring together, publish, and build scholarship on the primary source documents for our knowledge of the history of Greek epic poetry, particularly the Homeric Iliad and Odyssey. HMT's editors, Casey Dué and Mary Ebbott, designed it as a digital project from the outset.
The data that form the foundation of the HMT are widely varied: inscriptions in Linear B, the earliest written form of Greek; quotations in early Greek prose writers such as Herodotus, Thucydides, and the Athenian orators and dramatists; fragmentary papyri; late-antique commentaries and lexica; and Byzantine manuscripts. Most significantly, the project depends on digital facsimile editions of Byzantine manuscripts of the Iliad, highly complex documents that include the poetic text juxtaposed with many discrete commentary texts. Two centuries of efforts at editing and understanding these texts in isolation have proven that the only responsible way to interpret these documents is by analysing the component-texts and their interrelations in context.
The range, diversity, and potential quantity of sources that will eventually comprise the Homer Multitext demand a generic digital library service that can scale indefinitely and that can remain relevant as underlying technologies change. The HMT's infrastructure is generic in that it contains and serves 'scholarly primitives' with regard only to type – texts, images, collections of data-objects. It is agnostic of technology because the infrastructure is defined by protocols of request and reply and because the sole method for linking between objects is citation.
The HMT is the occasion for the technological infrastructure that allows image quotation, the subject of our discussion, and the occasion for the first of the examples of how this technology has practical application for digital research.
The infrastructure for the HMT is called CITE, an acronym for 'Collections, Indices, Texts, and Extensions.' Texts and Collections are the two main categories of data that concern the project. Collections of data can include lexica, collections of morphological data, or collections of metadata describing the individual folios of a manuscript. For collections of particular kinds of data, the infrastructure defines 'Extensions' that afford particular functionalities. Image-collections are one such Extension, since in addition to querying metadata fields or retrieval by identifier, we want to get and display binary image data in various ways.
The acronym CITE also refers to the basic mechanism for linking in this digital library architecture: citation. Citation gives us a concise means of identifying, retrieving, and manipulating a scholarly object, and drawing associations between objects. A proper citation should be hierarchical, giving access to a scholarly object at a broad level (e.g. 'the Iliad'), or at a very precise one (e.g. 'the third letter iota in line 1 of Book 1 of the Iliad as presented in the Venetus A manuscript').
In the CITE architecture, citation is expressed through a Uniform Resource Name (URN) notation. A URN is related to a Uniform Resource Locator (URL) as an ISBN is related to a library's call-number. The latter describes where a resource resides in one particular library; the former describes what it is.
For a text, a CITE-URN would look like this:
urn:cts:greekLit:tlg0012.tlg001.msA:1.12
'urn' identifies the string as a Uniform Resource Name. 'cts' identifies it as following the 'canonical text services' protocol; CTS is the part of CITE that handles textual sources. 'greekLit' says that the subsequent identifiers will be from the list related to Greek Literature that the Center for Hellenic Studies maintains; these identifiers correspond to identifiers used by the Thesaurus Linguae Graecae, a decades-old project of the University of California at Irvine. 'tlg0012' is the identifier in our 'greekLit' namespace for Homer. 'tlg001' is the work 'Iliad' under the text-group Homer. 'msA' is a particular edition of the Iliad. '1.12' is the hierarchical citation within the work, in this case, book 1, line 12.
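The component-by-component reading above can be expressed as a short parsing routine. This is a minimal sketch, not part of the CITE software: the class and function names are our own for illustration, and it assumes only the layout described in the text (colon-separated fields, with the text-group, work, and edition joined by periods).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    """Illustrative container for the parts of a CTS URN."""
    namespace: str            # e.g. 'greekLit'
    text_group: str           # e.g. 'tlg0012' (Homer)
    work: Optional[str]       # e.g. 'tlg001' (the Iliad)
    version: Optional[str]    # e.g. 'msA' (a particular edition)
    passage: Optional[str]    # e.g. '1.12' (book 1, line 12)


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its hierarchical components."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace = parts[2]
    work_ids = parts[3].split(".")
    # Pad with None so that a citation at a broader level still parses.
    text_group, work, version = (work_ids + [None, None, None])[:3]
    passage = parts[4] if len(parts) > 4 and parts[4] else None
    return CtsUrn(namespace, text_group, work, version, passage)


urn = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.msA:1.12")
book, line = urn.passage.split(".")
```

Because the citation is hierarchical, the same routine accepts a broader reference such as the work alone, in which case the edition and passage are simply absent.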
A URN such as this, combined with a call to an online service that uses the CITE protocol, will retrieve the passage in question. Here we have addressed an implementation of the service at http://hmt-cts.appspot.com/, added the request 'GetPassage', and given a URN as a request-parameter:
The service will return an XML-formatted response containing the Greek text of Iliad Book 1, line 12, as it appears on the Venetus A manuscript.
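The shape of such a request can be sketched as follows. This is an illustration under stated assumptions: the text confirms only the service address, the request name 'GetPassage', and that the URN is passed as a parameter. The parameter names `request` and `urn`, and the absence of any further path on the service address, are our assumptions and may not match the deployed service.

```python
from urllib.parse import urlencode

SERVICE = "http://hmt-cts.appspot.com/"  # service address given in the text


def get_passage_url(urn: str, service: str = SERVICE) -> str:
    """Build a GetPassage request for a text URN.

    The query parameter names 'request' and 'urn' are assumptions,
    not confirmed by the article.
    """
    query = urlencode({"request": "GetPassage", "urn": urn})
    return f"{service}?{query}"


url = get_passage_url("urn:cts:greekLit:tlg0012.tlg001.msA:1.12")
```

The point of the sketch is the separation of concerns: the URN says what is wanted, and everything else in the address belongs to one replaceable implementation of the service.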
By defining a URN-structured citation for each scholarly object, of any type and at any level of detail, the HMT can very easily publish, link, and align a wide range of sources. Publishing a Byzantine manuscript, which contains an Iliadic text, several commentary texts, extra-Homeric poetry, and graphical elements, all represented as digital images of pages and as transcriptions (and ultimately translations), becomes a simple matter of linking URNs.
The 'I' in CITE is for 'Indices', which are the linchpin of the HMT's digital library infrastructure. A CITE Index is a simple two-column table, consisting of two URNs to be associated, or one URN and one unformed piece of textual data. Text-to-Image linking thus becomes very straightforward, given CITE's URN-standard for citing images, and specific regions of images.
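A two-column index of this kind needs very little machinery. The sketch below is illustrative only: the rows are placeholders (the image identifier EXAMPLE-IMAGE and its regions are hypothetical, not HMT data), and the function names are our own.

```python
from typing import List, Tuple

# Each row associates a text URN with a region of interest on an image.
# These rows are hypothetical placeholders, not data from the project.
INDEX: List[Tuple[str, str]] = [
    ("urn:cts:greekLit:tlg0012.tlg001.msA:1.1",
     "urn:cite:hmt:chsimg.EXAMPLE-IMAGE:0.1,0.2,0.5,0.03"),
    ("urn:cts:greekLit:tlg0012.tlg001.msA:1.2",
     "urn:cite:hmt:chsimg.EXAMPLE-IMAGE:0.1,0.23,0.5,0.03"),
]


def base_image(image_urn: str) -> str:
    """Drop any region-of-interest predicate, leaving the whole-image URN."""
    return ":".join(image_urn.split(":")[:4])


def regions_for(text_urn: str, rows: List[Tuple[str, str]]) -> List[str]:
    """Image regions indexed to a given passage of text."""
    return [image for text, image in rows if text == text_urn]


def texts_on(image_urn: str, rows: List[Tuple[str, str]]) -> List[str]:
    """Passages indexed to any region of a given whole image."""
    return [text for text, image in rows if base_image(image) == image_urn]
```

Reading the table in one direction answers 'where on the page is this line?'; reading it in the other answers 'which lines appear on this page?'.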
The 'E' in CITE stands for 'Extensions', that is, extensions to the collection-services protocol that delivers pieces of structured data that are not texts. Extensions exist for known types of data for which we want to provide particular modes of interaction defined by requests and responses. For collections of images, the most obvious request is 'GetBinaryImage', and the response is binary image data ready for display.
In the CITE Image service, then, the 'GetBinaryImage' request takes a URN as a citation, just as the Text Service's GetPassage does. A URN to an image might look like this:
urn:cite:hmt:chsimg.VA001VN-0503
Here the URN is identified as of the type 'cite', and the image's identifier is located under the 'hmt' namespace. 'chsimg' is an 'image-group', and 'VA001VN-0503' is the (arbitrary) identifier of a particular image.
Figure 1: Venetus A, folio 1-verso, retrieved from the HMT's Image-service
A CITE URN serves as a parameter to a request from an image-service. The most basic request for an image from the HMT's service at the University of Houston would look like this:
In this system, the image is considered generically and abstractly. The CITE infrastructure does not specify file-type, file-name, location, or any other technological details about how the image is stored or how the service retrieves and serves it. So the URN – urn:cite:hmt:chsimg.VA001VN-0503 – should remain legitimate and viable regardless of changes to the technology that implements the service.
To identify, retrieve, and reproduce a portion of an image, that is to 'quote' the image, the CITE infrastructure allows a further predicate on the URN that defines a rectangle:
urn:cite:hmt:chsimg.VA001VN-0503:0.2,0.27,0.1,0.1
Here, attached to the URN for the image is a predicate consisting of four numbers: 0.2, 0.27, 0.1, 0.1. These are fractions representing the left, top, width, and height of a rectangular region of interest, relative to the whole image. Because they are proportions of the image's dimensions (in effect, percentages) rather than pixel counts, these regions of interest expressed in URN-notation remain valid regardless of how the image is scaled either in storage or delivery. The URN given above defines the head of Helen of Troy as she is illustrated on Folio 1-verso of the Venetus A manuscript. Helen's head appears in a rectangle that begins 20% of the image's width from the left and 27% of its height from the top, and whose width and height are each 10% of the corresponding dimension of the larger image.
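The scale-independence of the predicate can be made concrete with a short routine that separates the region from the image URN and converts it to pixels for an image of any size. This is a minimal sketch with function names of our own choosing; the pixel dimensions in the usage lines are hypothetical, chosen only to show that one URN serves two renderings.

```python
from typing import Optional, Tuple

Roi = Tuple[float, float, float, float]  # left, top, width, height as fractions


def split_image_urn(urn: str) -> Tuple[str, Optional[Roi]]:
    """Return the whole-image URN and, if present, its region of interest."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cite"]:
        raise ValueError(f"not a CITE image URN: {urn!r}")
    base = ":".join(parts[:4])
    if len(parts) == 4:
        return base, None
    values = tuple(float(v) for v in parts[4].split(","))
    if len(values) != 4 or not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError(f"bad region of interest: {parts[4]!r}")
    return base, values


def roi_to_pixels(roi: Roi, image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Convert a fractional region to (x, y, width, height) in pixels."""
    left, top, width, height = roi
    return (round(left * image_width), round(top * image_height),
            round(width * image_width), round(height * image_height))


base, roi = split_image_urn("urn:cite:hmt:chsimg.VA001VN-0503:0.2,0.27,0.1,0.1")
full_size = roi_to_pixels(roi, 1000, 2000)   # hypothetical full-size rendering
thumbnail = roi_to_pixels(roi, 500, 1000)    # hypothetical half-size rendering
```

The two results differ in pixels but describe the same rectangle on the page, which is what allows the citation to outlive any particular rendering of the image.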
Figure 2: The head of Helen of Troy, defined and retrieved by urn:cite:hmt:chsimg.VA001VN-0503:0.2,0.27,0.1,0.1
The full service-address, request, URN combination for retrieving Helen's head as a quotation of the larger image of Venetus A, 1-verso is:
The Homer Multitext has published digital images of five important Byzantine manuscripts of the Iliad. By making these freely available in the digital realm, the HMT allows scholarship that was previously a practical impossibility. This digital library allows us to place, side by side, manuscripts that have never been in the same country since their original production in Constantinople in the 900s and 1000s of the Common Era.
One interesting feature of several of these Byzantine manuscripts is the scribes' practice of summarising each of the 24 poetic books of the Iliad. These summaries appear on the pages of the manuscript where a given book begins. Furthermore, each summary is a single poetic line, written in Greek using the vocabulary of Homeric epic, and adhering to the same poetic meter, dactylic hexameter, as the epic poem itself.
Figure 3: The summary of Book 1 ('Alpha') that appears in the Venetus A manuscript
For example, the summary of Book 1 ('Alpha') that appears in the Venetus A manuscript is:
ἄλφα λιτὰς Χρύσου. λοιμὸν στρατοῦ· ἔχθος ἀνάκτων
Alpha contains prayers of Chrysēs; plague among the army; enmity of the leaders.
Figure 4: Meter of the summary of Book 1 ('Alpha') in the Venetus A manuscript.
The meter of that summary is:
¯ ˘ ˘ | ¯ ¯ | ¯ :¯ | ¯ ¯ | ¯ ˘ ˘ | ¯ ¯
a dactyl, three spondees, a dactyl, and a spondee, with a 'penthemimeral caesura' (a word-break after the fifth half-foot).
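The description of the meter can be checked mechanically against the scansion marks. The sketch below is our own illustration, not part of the HMT tooling: it assumes the notation used above (feet separated by '|', a colon marking the caesura) and handles only a caesura that falls directly after the first long syllable of a foot.

```python
from typing import List, Optional, Tuple

LONG = "\u00af"   # macron, a long syllable
SHORT = "\u02d8"  # breve, a short syllable

FEET = {LONG + SHORT + SHORT: "dactyl", LONG + LONG: "spondee"}


def scan(line: str) -> Tuple[List[str], Optional[int]]:
    """Name each foot and report the half-foot after which the caesura falls."""
    names: List[str] = []
    caesura_after: Optional[int] = None
    for index, foot in enumerate(line.split("|")):
        marks = "".join(foot.split())  # drop all whitespace
        if ":" in marks:
            before, _, after = marks.partition(":")
            if before != LONG:
                raise ValueError("sketch only handles a caesura after a foot's first long")
            # Each foot is two half-feet; the first long is the first of them.
            caesura_after = 2 * index + 1
            marks = before + after
        names.append(FEET[marks])
    return names, caesura_after


L, S = LONG, SHORT
summary_meter = f"{L} {S} {S} | {L} {L} | {L} :{L} | {L} {L} | {L} {S} {S} | {L} {L}"
feet, caesura = scan(summary_meter)
```

For the summary of Book 1 this yields a dactyl, three spondees, a dactyl, and a spondee, with the caesura after half-foot 5, in agreement with the description above.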
The editors of the HMT wanted to present a side-by-side comparison of these metrical summaries from two manuscripts – the 10th century manuscript Marcianus Graecus Z.454 (=822), the 'Venetus A', and the 11th century manuscript Escorialensis ω.I.12 (513 = Allen E4). For each of these, we needed to identify a digital image of a manuscript page, or folio-side. Each of these pages contains many texts – poetic texts and different kinds of commentary – so we wanted to 'quote' the particular area of that image that contains the metrical summary.
The complete publication of these texts  is at:
The data themselves are straightforward, since we needed to capture only four scholarly 'objects' for each of 24 books, for each of two manuscripts: the metrical book-summary as it appears on the manuscript, a transcription of it, a translation of it, and, optionally, a paragraph of commentary.
Figure 5: A simple XML structure for this image-based commentary (larger format)
The relevant portion of each page of the manuscript is easily quoted by specifying an image URN with a defined region of interest.
Figure 6: The entry for the summary of Iliad Book 1 on the Venetus A
The entry for the summary of Iliad Book 1 on the Venetus A is:
<cell role="label">1 Alpha</cell>
<cell role="data">+ ἄλφα λιτὰς Χρύσου. λοιμὸν στρατοῦ· ἔχθος ἀνάκτων +</cell>
<cell role="data">Alpha contains prayers of Chrysēs; plague among the army; enmity of the leaders.</cell>
The document follows a very simple interpretation of the Text Encoding Initiative's standards for XML documents. All 24 summaries for the Venetus A appear in one division, with all 24 summaries from the E4 manuscript in another.
Figure 7: A very simple document structure for comparing elements of two manuscripts
The URN notation for regions of interest on images allows us to publish this image-based edition with commentary very efficiently and with confidence. The image's identifiers will not change, and the regions of interest will remain valid regardless of how the images are scaled and delivered.
The simple XML document constitutes the HMT's publication of this material. Its presentation is a separate concern, and in this case we handle it with a stylesheet that transforms the XML into HTML for display; this transformation includes resolving the citations to regions of images into binary images for display.
Figure 8: The XML commentary transformed, with citations to images resolved into quotations
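The step that turns a citation into a quotation amounts to rewriting each image URN as an image element whose source is a request to an image service. The project does this in XSLT; the following is a sketch of the same idea in Python, under stated assumptions. The service address `https://example.org/images` is a placeholder, and the parameter names `request` and `urn` are assumed rather than taken from the service's documentation.

```python
from html import escape
from urllib.parse import urlencode


def img_element(urn: str, service: str, alt: str = "") -> str:
    """Render a CITE image URN as an HTML <img> element.

    'service' is the address of an image service; the query parameter
    names used here are assumptions for illustration.
    """
    query = urlencode({"request": "GetBinaryImage", "urn": urn})
    src = f"{service}?{query}"
    # escape() makes the address and caption safe inside attribute values.
    return f'<img src="{escape(src)}" alt="{escape(alt)}"/>'


element = img_element(
    "urn:cite:hmt:chsimg.VA001VN-0503:0.2,0.27,0.1,0.1",
    "https://example.org/images",  # placeholder service address
    alt="Head of Helen",
)
```

Because the published XML holds only the URN, changing the image service means changing one address in the transformation, not the document.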
These metrical summaries are quirky little texts that have received very little attention, probably because they have never been published anywhere. It has certainly never before been easy to compare this text as it appears on a Venetian manuscript with its counterpart on a Spanish one. Several things emerge even at first glance. First, the two manuscripts in no case present identical summaries for a given book. Variations include different punctuation and accentuation of the Greek, variations in noun cases, inversions of word order, and substitutions of words. In places the variation seems attributable to oddities of Homeric Greek, which the author or authors of these summaries used with, perhaps, imperfect understanding. We have not concluded much from our initial look at these texts, but we are becoming convinced at least that there was not a single source-document containing 24 metrical summaries from which subsequent scribes drew their texts. Rather, this was traditional material, perhaps existing in many versions, or perhaps existing mainly in the memories of professional scribes. Did they have to pass an exam on the contents of each book of the Iliad? If so, these would be very useful mnemonics, and their variant forms would be easy to understand.
The herbaria of the world house millions of plant specimens. These specimens are a tremendous scientific resource. They contain most of the definitive 'type specimens' that identify individual plant species. They provide life-size examples of plant structures, and occasionally contain viable seeds. Almost every specimen is labelled according to the date and place it was collected, constituting valuable data on plant biogeography.
The potential of herbaria, however, has barely been explored. In recent years, scientists have used herbarium data to explore the spread of alien plants, to determine the range of plants in phylogeographic studies and to sequence the DNA of historical fern samples. But this work has been done on the merest tip of an iceberg of data that is largely ignored. Botanists estimate that at least 70,000 living plant species have never been described. At least half of those undescribed species have probably already been collected and are sitting in herbaria waiting to be discovered. In 2009 botanists found type specimens of a number of plant species collected by Charles Darwin during the 1831-36 voyage of the HMS Beagle; these specimens sat in several of the most important herbaria in the U.S. and Europe but were not 'found' until quite recently. Other treasures surely lurk in herbarium collections, just waiting for someone to unearth them.
The potential of herbaria to contribute to scientific research and teaching is hampered by lack of access. Herbaria tend to be old-fashioned collections. Dried plant specimens are stacked in metal cabinets, usually divided by genus and species but not necessarily easily searchable by any other data such as collection location. Many herbaria have put their inventories online and many have made progress with digital imaging of specimens, but the sheer volume of specimens makes it difficult to digitise entire collections. Herbarium research often requires the researcher to flip through individual specimens by hand, either having requested the specimens as a loan and received them by mail or by visiting the herbarium in person. Instructors who wish to use herbarium specimens as a teaching tool must pull individual specimens from their local herbarium and carry them to the classroom or laboratory, with the permission and assistance of the herbarium curator.
Digital library infrastructure holds great promise for using herbarium specimens to teach botany. Many students find physical herbarium specimens strange and off-putting; digital image quotation, however, can make it easy to explain the parts of a specimen and lead students to areas that are not immediately obvious on the specimen page itself. For example, an instructor can zoom in on a particular part of a plant to illustrate its structure, focus on a label to explain what information it contains, or examine the different methods used to affix plants to a page. It is a simple matter to link to other forms of data, such as a Google Map indicating the location in which a plant was collected.
Here is an example of a short guide for students on how to read an herbarium specimen, and how historical and modern specimens relate to living collections, either in situ in the wild, or ex situ in the collection of a botanical garden:
This document is more prosaic than the previous example, that of the Homeric manuscripts. Here we present a straightforward piece of expository prose that we want to illustrate with images. Some of the illustrations are entire digital images, but at other times we want to call out a particular part of an image, to 'quote' it. We used the TEI's standard <figure> and <link> elements, the latter of which provides for a 'target' attribute; for this 'target' we gave a CITE URN to an image or region of interest on an image.
Obviously even the most flexible use of digital imagery will not help students or researchers get all of the benefits of handling actual herbarium specimens. A photograph of a specimen will not provide the tactile data of a real dried plant, nor will it ever furnish DNA. The problem of digitising entire collections is also very real. Nevertheless, herbaria as libraries of scientific data need to be as accessible as possible for scholars and teachers; the more precisely and easily we can incorporate digital facsimiles of herbarium specimens into online publications, the more easily we can reach a wide audience effectively.
The files used for these two examples are available to download.
For the metrical summaries on two Byzantine manuscripts of the Iliad, we have made available a downloadable metrical_summaries.zip archive. This archive contains the following:
metrical_summaries.xml. This is the XML file. It includes a document-type declaration that should allow an XML editor to validate it against the 'teilite.rng' schema. It also includes a stylesheet declaration that will allow a modern Web browser to display it using the specified XSLT stylesheet.
hmt_roi.xsl. This is an XSLT stylesheet that will transform the simple XML of the 'metrical_summaries.xml' file into HTML for display, including constructing image-elements that resolve the citations to images and regions of interest into quotations in the form of binary images.
metrical_summaries.html. This is a transformation of the XML file, using the XSLT file, useful in cases where a browser is not able to perform the transformation automatically. It is also potentially useful for inspecting the output of the XSLT file.
css. This is a directory containing Cascading Style Sheets (CSS) files and incidental image files for displaying the manuscript commentary.
ROI-Calculations.xlsx. This document is an Excel spreadsheet (which will also work with the open-source OpenOffice and its variants) used in the workflow for defining regions of interest. This workflow is described below at some length.
For the handout on how to read herbarium specimens, we have made available a downloadable herbarium_specimens.zip. This .zip archive contains the following:
BotanicalSpecimens.xml. The illustrated essay itself, in TEI-XML, using the minimal markup necessary to represent the content.
BotanicalSpecimens.html. The XML file transformed to HTML using the stylesheets described below. This file depends on the tei.css file included in the archive.
cite-urn-figure.xsl and stylesheet. A file and a directory. The latter is the TEI's standard package of stylesheets for transforming TEI documents to HTML. In that directory, the only customisation has been made to the file /xhtml/tei.xsl, in order to include cite-urn-figure.xsl in the process of transformation. That file includes templates for transforming CITE-URNs to requests to the image services that can resolve the URNs to binary image quotations.
ROI-Calculations.xlsx. This document is an Excel spreadsheet (which will also work with the open-source OpenOffice and its variants) used in the workflow for defining regions of interest. This workflow is described below at some length.
oXygen_transformation_scenario.xml. This is an export of the 'transformation scenario' that the oXygen XML editor (http://oxygenxml.com) uses to convert BotanicalSpecimens.xml to BotanicalSpecimens.html. This file includes values for certain parameters that will override the default values in the TEI's stylesheets.
To view the examples, download and decompress the .zip archives. For the Metrical Summaries, it should be enough to open the file metrical_summaries.xml or metrical_summaries.html in a modern Web browser; we have tested this with Safari and Firefox. For the Botanical Specimens, similarly, open BotanicalSpecimens.html in a Web browser.
The other files in the archive should remain where they are relative to the main document files.
To examine the underlying code, open either metrical_summaries.xml or BotanicalSpecimens.xml in any text-editing software.
At the moment, creating a URN for image citation requires a certain amount of set-up.
We begin by using the National Institutes of Health's image-manipulation tool, ImageJ (http://rsbweb.nih.gov/ij/). This program is free, and it gives us the ability to draw rectangles on images and capture co-ordinates from those rectangles.
Figure 9: An image opened in ImageJ, with a rectangle drawn, and 'Show Info…' displayed
By drawing a rectangle on an image and selecting 'Show Info…' from the 'Image' menu, we can get information for the height and width of the overall image (in pixels), as well as of the selected rectangle (in terms of 'x' (left), 'y' (top), width, and height, expressed as pixels).
Pixel dimensions are not canonical, as digital images are normally scaled. To translate the pixel dimensions into percentages, we use a spreadsheet document that serves as a 'URN Calculator'.
Figure 10: A spreadsheet file that calculates CITE URNs
This customised spreadsheet accepts dimension information as pixels, as well as other data such as image-group, namespace, and image-ID. It calculates a complete CITE-URN with a region of interest predicate expressed as percentages (rounded to three significant digits).
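The arithmetic the spreadsheet performs is simple enough to state directly. The sketch below is not the spreadsheet itself but an equivalent calculation under our own naming; the pixel measurements in the usage lines are hypothetical values of the kind ImageJ reports.

```python
def roi_urn(namespace: str, image_group: str, image_id: str,
            image_width: int, image_height: int,
            x: int, y: int, width: int, height: int) -> str:
    """Build a CITE image URN with a fractional region-of-interest predicate.

    All measurements are in pixels, taken from the same rendering of the image.
    """
    if x < 0 or y < 0 or x + width > image_width or y + height > image_height:
        raise ValueError("rectangle does not lie within the image")
    pairs = [(x, image_width), (y, image_height),
             (width, image_width), (height, image_height)]
    # '.3g' rounds each fraction to three significant digits.
    predicate = ",".join(f"{pixels / total:.3g}" for pixels, total in pairs)
    return f"urn:cite:{namespace}:{image_group}.{image_id}:{predicate}"


# Hypothetical measurements of the same rectangle on two renderings.
from_full_size = roi_urn("hmt", "chsimg", "VA001VN-0503", 1000, 2000, 200, 540, 100, 200)
from_thumbnail = roi_urn("hmt", "chsimg", "VA001VN-0503", 500, 1000, 100, 270, 50, 100)
```

Both calls produce the same URN, which is why the rectangle can be drawn on a small working copy of an image and still cite the corresponding region of the full-resolution original.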
This combination of ImageJ and the 'roi-calculator' was our tool for both examples described in this article. We should note that this system allowed us to work quickly from very low-resolution versions of the manuscript images, while the resulting URNs are valid for the high-resolution version.
We would be quick to note an important difference between this admittedly involved process and the traditional, equally involved process of cutting and pasting areas from images: the final result of this process is a canonical reference, abstracted from any ephemeral manifestation of files, formats, and services. Even if the CITE architecture falls by the wayside, the information in a CITE-URN would admit mechanical reproduction into a new useful format, provided only that the images continue to exist somewhere.
This digital library infrastructure is a work in progress. The best way for interested readers to follow progress in the CITE architecture and its implementations is through the Homer Multitext's Web site or by contacting the authors directly.
This infrastructure, we should emphasise, assumes openly licensed content. There is no provision in CITE for scholarly primitives to which access is limited to subscribers. The Clemson Herbarium and South Carolina Botanical Garden are public resources, and as such their data are freely available. The HMT's editors have always adhered to a policy of open content; all data in the HMT that are not in the public domain are licensed under a Creative Commons license, and all source code for the project is licensed under the General Public License.
Professor of Classics
Christopher Blackwell received a BA in Classics from Marlboro College in Vermont, and a PhD in Classics from Duke University in North Carolina. He is currently Professor of Classics at Furman University in South Carolina. He is the author of books and articles on Alexander the Great, Athenian Democracy, and topics in digital humanities. With Neel Smith, Blackwell is a Project Architect of the Homer Multitext Project of the Center for Hellenic Studies of Harvard University.
Graduate Research Fellow
Amy Hackney Blackwell received a BA in History from Duke University in North Carolina, an MA in History from Vanderbilt University in Tennessee, a JD from the University of Virginia School of Law, and is currently a PhD candidate in Plant and Environmental Science at Clemson University in South Carolina. Hackney Blackwell has written books and articles on legal, historical, literary, and scientific topics for professional, popular, and scholarly audiences. With Patrick McMillan she is conducting research on policies, practices, pedagogies, legal and diplomatic issues, and scientific applications of biological research in the context of public Botanical Gardens.