Web Magazine for Information Professionals

Electronic Homer

Martin Mueller reads Homer electronically with the TLG, Perseus, and the Chicago Homer.

Introduction and summary

In the following pages I look at reading Homer in Greek as a paradigm of “reading with a dictionary” and other forms of “look-up” reading for which a digital environment offers distinct advantages. I take as my point of departure the activity of reading Homer in a print environment with a text, dictionary, and commentary, and then consider the added value of three electronic tools:

  1. the Thesaurus Linguae Graecae (TLG), a virtually complete archive of all ancient Greek texts
  2. the Perseus Project, a bilingual text-and-dictionary web site that provides access to a large chunk of classical and Hellenistic Greek texts
  3. the Chicago Homer, a specialized bilingual web site of Early Greek epic that will be published by the University of Chicago Press late in 2000 [1]

What can you do with any electronic tool that you cannot do with a printed text and a dictionary? And what can you do with a special tool like the Chicago Homer that you cannot do with a more broadly based tool such as Perseus or the TLG? I want to give concrete answers to those questions, but I also want to use the particular example as a way of reflecting on the ways in which information technology offers some distinct advantages for “reading” canonical texts and navigating the envelope of annotation and finding tools that has traditionally surrounded them.

Reading electronically

There is much to be said in favor of lowering the claims made for what electronic editions can or cannot do. The transformative power of “e-ditions” is sometimes wildly exaggerated, and in the collision of authorial hyperbole and readerly skepticism it is easy to lose sight of the limited but significant advances that electronic editions offer to readers in search of better understanding. Any edition of a text, whether printed or electronic, rests on ancient technologies of reading and writing, which in turn rest on evolved human capacities for processing language. It is worth repeating the obvious point that the core activities of reading and writing in a narrow sense have remained and will remain quite unaffected by information technology. Although virtually all writers have come to depend on the convenience of word processors, it is doubtful whether word processors have on average produced faster or better writers. As for reading, the computer industry has only begun to design reader-friendly devices, and it will be some time before any computer screen can compete with a moderately well designed printed page. Even when it does, it will not transform the act of reading.

Considered as a tool for reading, an electronic edition of a text is a very poor cousin of a decently printed book. Sometimes an electronic edition is the only version of a document available to a reader. In such cases, it serves as a better surrogate than, say, microfilm. But with highly canonical texts like Homer printed editions are plentiful and are either available freely in libraries or can be bought for little money. Thus the added value of something like the Chicago Homer cannot rest on any claim that it presents a superior tool for reading a text, where “reading” is understood as the bundle of activities involved in going through a book more or less from beginning to end.

If the computer is on the whole a wretched tool for reading (and is likely to remain so), it is on the other hand a terrific tool for looking up things. The simple reasons for this bear spelling out. Reading in a broad sense consists of “getting ready” or “getting there” and actually reading. Once I have the book I want to read, getting ready is simple: I sit down, open the book, and read. I may also adjust the light, get a cushion, or do various other things involved in curling up with a book. In any event, getting ready takes up only a small fraction of the time spent reading, unless I’m the kind of reader for whom getting ready to read is the more important part of the activity, in which case I may doze off after a few minutes.

When I look things up in a dictionary or look for a passage in a book, the balance shifts. Getting ready takes as much time as reading itself. In fact, looking for something in a book that is not designed for look-ups may take much longer than reading it once I have found it. What people ordinarily call “reading” is actually a fairly special case of the complex activity “getting there and reading.” It describes a situation in which “getting there” takes up a negligible fraction of the total activity because I have the text at hand, am competent to read it without further aid, or have immediate access to and am familiar with such look-up tools as I require. But such situations of readerly equilibrium are special.[2] They do not apply to the novice who has insufficient knowledge to make sense of the text or to the curious and expert reader who finds his knowledge no longer sufficient and wants to know more.

While computers offer no advantages for “reading” in the narrow sense, they can, on the other hand, reduce by orders of magnitude the time it takes to “get there.” If a digital environment does better than print in achieving an acceptable balance of “getting there” and “reading,” it offers a clear advantage for working with a text or “reading” in the broad sense.

Reading ancient Greek electronically

The reading of ancient Greek offers a good example of the very practical components of this cost-benefit analysis. Very few readers know ancient Greek well enough to read it without frequent recourse to a dictionary or grammar, and because of their highly specialized interests, the few readers who can do so are likely to be particularly intensive users of such reference works. Reading ancient Greek is an activity with an intrinsically high look-up cost. If, in reading the Iliad, I need to look up a word, it may take me thirty seconds to find it in a dictionary. If I follow up a citation, it may take another thirty seconds if it is in the Iliad; it may take a minute or more if it is in another book on a shelf within reach. If I do this occasionally, it is a pleasant interruption from the flow of reading. If I do it often, the minutes add up very quickly.

Reading ancient Greek with the help of the TLG

There are three electronic tools that in different ways can cut down the time of getting there. The most fundamental is the Thesaurus Linguae Graecae (TLG). This is an electronic archive of virtually every ancient Greek text. From one perspective, it is “merely” a digital transcription of print editions, some better than others. The digital encoding includes nothing but the texts and the conventional citation schemes. The query potential of the data is quite primitive since the archive only supports conventional string searches. From another perspective, this thirty-year-old project is a phenomenal achievement because it has put all the remains of Greek literature within the confines of one search space. For any word in any text, I can find the other occurrences of that word in the text, in the whole corpus, or in a chosen subset of the corpus. The task of looking for all occurrences of a Greek word in the printed canon is practically impossible. Ten years ago such a search with the search tool Pandora took about 45 minutes. Today, it may not be possible to formulate a search that takes more than a few seconds to execute on an up-to-date desktop computer.

The TLG makes no concessions to the user. The available search engines for it do not tell you what a word means; they simply return all the lines of text in which a specified character string occurs, together with the canonical citation of the line. You must know a fair amount about Greek dialects and morphological rules to retrieve the differently inflected occurrences of a “word.” Search results are not summarized in any way, whether a search returns three or 3,000 hits. But if you want to know whether a word in the Iliad also occurs in the late antique poet Nonnus, the TLG is the only place to go, and it will tell you in a split second. And the TLG is a good demonstration of the fact that in the conversion from printed to digital text the first step is by far the biggest: once an electronic text exists, a user with sufficient expertise in both the discipline and the technology can perform an amazing variety of previously impossible searches.
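The kind of string search described above can be sketched in a few lines of Python. This is an illustration only, not the TLG's software or data format: the three-line corpus (Iliad lines in rough Latin transliteration, to be checked against an edition), the name `string_search`, and the use of regular expressions for truncation are all assumptions made for the example.

```python
import re

# Toy corpus of (citation, transliterated line) pairs. Illustrative only;
# a real archive would hold millions of lines in Greek encoding.
CORPUS = [
    ("Il. 1.1", "menin aeide thea Peleiadeo Achileos"),
    ("Il. 1.75", "menin Apollonos hekatebeletao anaktos"),
    ("Il. 5.178", "hiron menisas chalepe de theou epi menis"),
]

def string_search(pattern, corpus=CORPUS):
    """Return the (citation, line) pairs whose text matches the pattern."""
    rx = re.compile(pattern)
    return [(cite, line) for cite, line in corpus if rx.search(line)]

# An exact search finds only one inflected form of the word.
exact = string_search(r"\bmenin\b")

# A truncated search finds more forms, but the user must know where to cut.
truncated = string_search(r"\bmeni")

for cite, line in truncated:
    print(cite, line)
```

The exact search misses the nominative "menis" at 5.178. The truncated search catches it, but it also matches "menisas", a form of the related verb, in the same line. Sifting such hits is left to the user, which is why the search rewards expertise in the language.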

The text and dictionary environment of Perseus

The text-and-dictionary environment of Perseus makes its archive much more accessible to a reader with less expertise in the discipline or the technology. Perseus offers a very special digital environment, currently unmatched for any other substantial linguistic corpus. It contains a large chunk of the surviving texts from archaic, classical, and Hellenistic Greece, many of them derived from the TLG, and all of them accompanied by English translations. Every wordform in the Perseus corpus contains its possible morphological descriptions, and through this morphological parser it is linked to the lemmata or dictionary entry forms in Liddell-Scott-Jones (LSJ), the most authoritative dictionary of ancient Greek. All the citations in the dictionary are in turn linked back to the Perseus corpus. The English equivalent of this would be a digital corpus in which any wordform in much of the literature from Chaucer to Joyce is linked directly to its lemma in the Oxford English Dictionary.[3]

When I read the Iliad in the mountain cabin where I am writing this piece, it is not faster to look up a word online than it is to look it up in the dictionary: my modem connection is very slow, and since I have used my copy of LSJ for forty years, I can find my way around it quite fast. On the other hand, if I want to follow up a citation from LSJ, I have a choice between using my very slow modem or not doing it at all: there is no corpus of Greek literature on the shelves of my mountain cabin. At home over a DSL connection or in my office, the advantages of Perseus are striking: if I click on a word in the text, I will get to the dictionary entry in less than three seconds, and in less than ten seconds I can get from a citation in the dictionary to the full text. If I look up enough words, the time savings are very considerable.

There is of course a “moral hazard” aspect to this improvement. The easier it is to do something, the less I’m inclined to ask whether it is worth doing in the first place. Every form of saving time generates new ways of wasting it. That melancholy truth, however, is a better argument for moderating my enthusiasm about such advances than for rejecting them. If you have an interest in reading ancient Greek (a very big “if”) there is little doubt that the digital environment of the Perseus corpus offers distinct advantages to readers at all levels of competence and encourages forms of textual exploration by drastically lowering the cost of “getting there.”

If I make my way through a book of the Iliad with the help of Perseus I will almost certainly use a printed text of the Iliad to “read” and use the screen text as a point of departure for look-ups. Screen display of Greek would have to improve by orders of magnitude before I give up the clarity and familiarity of the printed page. It is a nice question whether in such a case the computer is an extension of the book or the other way round. But the modalities of my work are more heavily dependent on the computer than on the book. Without the book, I would curse the wretched display of the text on the screen. Without the computer, I would have to pay the look-up costs of the print medium, usually high, and often prohibitive.

Compared with the raw text archive of the TLG, the data in Perseus are more heavily mediated. The morphological tools in particular make Greek texts much more accessible to readers with little Greek. Expert readers will also find Perseus more convenient and informative provided they need not go beyond the (generous) limits of its archive. Searching by lemma, which you can do in Perseus, but not in the TLG, is simpler and faster than capturing the variants of a lemma through truncation and wildcard characters. Perseus also contains primitive but effective frequency data. Proper humanists like to shudder at numbers, but very little reflection shows that in order to claim knowledge of a word you must know not only what it means, but how common it is and in what contexts it is likely to occur. Perseus keeps frequency data by broad genres (hexameter, poetry, drama, prose) as well as by author. It tells you at a glance and with some precision that the first word of the Iliad is more common in epic than in poetry or drama and much less common in prose. It is much harder to extract this kind of information with equal precision from the TLG or a print dictionary.
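A small sketch may make the difference between string search and lemma search concrete. It assumes a table in which every token already carries its lemma. The table, the counts, and the function name `lemma_frequency` are invented for illustration and do not reproduce Perseus data or its interface.

```python
from collections import Counter

# Hypothetical lemmatized tokens: (genre, wordform, lemma).
# The forms are transliterated and the distribution is made up.
TOKENS = [
    ("epic", "menin", "menis"),
    ("epic", "menis", "menis"),
    ("epic", "thea", "thea"),
    ("epic", "menios", "menis"),
    ("drama", "menin", "menis"),
    ("drama", "thea", "thea"),
    ("drama", "logos", "logos"),
    ("prose", "logos", "logos"),
    ("prose", "logon", "logos"),
    ("prose", "thea", "thea"),
    ("prose", "logoi", "logos"),
]

def lemma_frequency(lemma, tokens=TOKENS):
    """Share of each genre's tokens that belong to the given lemma."""
    totals = Counter(genre for genre, _, _ in tokens)
    hits = Counter(genre for genre, _, lem in tokens if lem == lemma)
    return {genre: hits[genre] / totals[genre] for genre in totals}

freq = lemma_frequency("menis")
for genre, share in freq.items():
    print(f"{genre:6s} {share:.2f}")
```

Because the query names the lemma, every inflected form is counted without any wildcard, and the per-genre shares fall out of a single pass over the table.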

Translation and transliteration in Perseus

Oscar Wilde said of Loeb editions that they were the kind of book where you have to look at the original to figure out what the translation means. He had earned the right to this joke by his great competence as a Hellenist. Today most professional Hellenists depend heavily on translations, a fact that is no less true for being weakly acknowledged. To read Greek for the most part means to read bilingually: it is probably a very common practice to “preread” in English and focus on key passages in the original. As an “electronic Loeb Library” Perseus does not only provide a service ad usum delphini, but is a basic reference tool for the professional scholar.

Perseus also allows the display of text in Latin transliteration. Hellenists like to look askance at this practice, for reasons that have little to do with philology and a lot with guild mentality. There is nothing particularly authentic about Greek as it has been printed for 500 years. Transliteration not only involves no significant loss of information but is arguably closer to the orthographical practices of Plato’s youth. But most importantly, transliteration can be a surprisingly effective tool in making some aspects of the original accessible to a reader without Greek. An educated speaker of English has a tacit knowledge of many Greek words, which it is possible to draw on in pedagogical situations through the combination of translation, transliteration, and some of the search features in Perseus. With some guidance, even Greekless readers can get a feel for the semantic contours of keywords or concepts in Greek texts that it would not be possible to derive from the translation alone. This limited but real access to aspects of the original for a Greekless reader is a distinct feature of the electronic environment in Perseus. It is not matched by anything available in print.

The Chicago Homer

The Chicago Homer is a highly specialized tool. It restricts itself to the small corpus of the quarter million words that make up Early Greek epic (Homer, Hesiod, Homeric Hymns), but building on the base of TLG and Perseus, it processes the data of its narrow corpus much more heavily in ways that are responsive to the linguistic peculiarities of the corpus and the needs of sophisticated researchers as well as readers with little or no Greek. The very granular (and often manual) data processing that underlies this project is simply not feasible with larger corpora like Perseus or TLG, and except for the New Testament there probably is no Greek text that has received similarly close attention. The parallel is not an accident. There will come a day when nobody will read ancient Greek anymore. But Homer and the New Testament will almost certainly be the last two texts to go.

Like TLG and Perseus, the Chicago Homer does not concern itself with textual variants, but accepts a more or less standard text for what it is. While it is the case that the electronic medium offers many opportunities (so far unrealized) for representing textual variance in Early Greek epic, it is also the case that for all practical purposes dependence on any standard text is good enough to realize most of the benefits that accrue from the query potential of a text in a digital format.

Like Perseus, the Chicago Homer includes several features for readers with little or no Greek. It uses the translations of Richmond Lattimore and Daryl Hine. Both translators closely follow the line structure of the original and make it easier for a Greekless reader to navigate the original via the translation. The Chicago Homer also includes an English-Greek index derived from Lattimore’s Homer translations, which permits quite precise triangulations of semantic aspects of the original.

The most distinctive features of the Chicago Homer are extensions of Morpheus, the morphological parser in Perseus. Morpheus is a bundle of rules that establish for every wordform in the Perseus corpus its possible grammatical descriptions. The Chicago Homer disambiguates the many instances where a wordform has more than one valid description and establishes the grammatical form it represents in a specific location. This is sometimes useful for the novice reader, although such readers usually do not have much difficulty determining which of several possible descriptions applies in a particular context. The true utility of such disambiguation lies elsewhere: the Chicago Homer contains a complete inventory of all morphological phenomena separately or in combination. Thus readers who encounter a particular morphological form (e.g. aorist optative passive) can look for other forms that match it wholly or in part. At a more abstract level, the Chicago Homer supports inquiries into the frequency and distribution of morphological phenomena across the corpus of Early Greek epic. This is a pretty arcane field, but for those interested in it, the Chicago Homer gives very fast access at different levels of granularity to information that could not easily be extracted from the TLG or Perseus.
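Matching a morphological form “wholly or in part” can be pictured as a filter over disambiguated parses. The sketch below is a guess at the general shape of such a query, not the Chicago Homer's actual schema. The token table (textbook paradigm forms of the verb luo in transliteration, with made-up location labels), the field names, and the function `match` are hypothetical.

```python
# Hypothetical disambiguated tokens: exactly one parse per location.
PARSED = [
    {"loc": "tok1", "form": "elusa", "lemma": "luo",
     "tense": "aorist", "mood": "indicative", "voice": "active"},
    {"loc": "tok2", "form": "luoimi", "lemma": "luo",
     "tense": "present", "mood": "optative", "voice": "active"},
    {"loc": "tok3", "form": "lutheien", "lemma": "luo",
     "tense": "aorist", "mood": "optative", "voice": "passive"},
    {"loc": "tok4", "form": "eluthen", "lemma": "luo",
     "tense": "aorist", "mood": "indicative", "voice": "passive"},
]

def match(tokens, **features):
    """Return the tokens whose parse agrees with every feature named.

    Naming all features asks for a whole match; naming fewer asks
    for a partial one.
    """
    return [t for t in tokens
            if all(t.get(key) == value for key, value in features.items())]

whole = match(PARSED, tense="aorist", mood="optative", voice="passive")
partial = match(PARSED, mood="optative")

print([t["form"] for t in whole])
print([t["form"] for t in partial])
```

The filter depends on each location having a single parse. With several candidate parses per wordform, a query for optatives would also return forms that are optative only under a reading the context rules out.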

A second extension is also based on but leads away from Morpheus. Phrasal repetition (“rosy-fingered dawn”) lies at the heart of Homeric poetry, and the kind of repetition found in Homer can be captured by a computer with considerable precision. Because morphological disambiguation has lemmatization as a byproduct, the Chicago Homer contains a version of the text in which every inflected form is replaced by its lemma, and this abstract model of the text is the basis for a complete index of all repeated phrases or strings of words that occur more than once. Because the lemmatized text ignores textual difference due to inflectional variants, the index supports a considerable amount of “fuzzy matching.” Repeated phrases are treated as lexical items and can be searched by length, frequency, location, or words contained. As a result, the Chicago Homer offers significant advantages for the systematic investigation of repetitive phenomena, whether at a summary or a finely granular level. Printed commentaries, such as the great nineteenth century editions by Leaf and Ameis-Hentze, have done an excellent job of recording repeated lines and half-lines. But there are several ways in which the management of this information in a digital environment is superior:

  1. The local identification of repetitions in print commentaries rests on variable criteria of significance. In the Chicago Homer, an algorithm is used to generate the list of repetitions. While this algorithm is liable to a (very small) margin of error for false positives and negatives, its very “stupidity” is a huge advantage: repeated phrases that readers experience as “meaningful” are embedded in a wealth of “meaningless” functional or grammatical repetition that turns out to be surprisingly interesting for many purposes. The Chicago Homer offers for the first time a firm empirical base for defining the circumstances under which something “counts as” a repetition of a certain kind.
  2. The sheer mass of Homeric repetitions makes it difficult to generalize about repetitive phenomena with precision if the evidence exists in the form of individual annotation of particular lines, from which it is possible, but extremely tedious, to aggregate data for a specific purpose. The database in the Chicago Homer contains some 180,000 occurrences of about 36,000 distinct repetitions. These data can be quickly grouped, sorted, and summarized by various criteria.
  3. Perhaps most strikingly, repeated phrases can be displayed in the text as links from which the reader can go to other occurrences of the phrase, navigating, as it were, the neural networks of bardic memory. While the index of repeated phrases exists in the Chicago Homer as tables in a database, the results of queries based on them can be “projected” on the “screen” of the text. This is probably the most original feature of this project, and it can be thought of as a visual simulation of the experience of the putative original audience for whom repeated phrases resonated with the contexts of their other occurrences. This feature is of equal interest to the novice and expert reader. For the novice, it identifies potential points of resonance. For the expert, it establishes the degrees and parameters of such resonance with a precision not found in any other tool.
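The idea of an index of repetitions built on a lemmatized text can be sketched as follows. This is a minimal illustration under stated assumptions and not the algorithm used by the Chicago Homer, whose details are not given here. The three toy lines, their labels, the transliterated forms, and the choice to count lemma sequences of two or three words within a single line are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lemmatized lines: (label, [(wordform, lemma), ...]).
LINES = [
    ("line 1", [("phane", "phaino"),
                ("rhododaktulos", "rhododaktulos"), ("Eos", "Eos")]),
    ("line 2", [("rhododaktulon", "rhododaktulos"),
                ("Eo", "Eos"), ("eiden", "horao")]),
    ("line 3", [("phane", "phaino"), ("thea", "thea")]),
]

def repeated_phrases(lines, min_len=2, max_len=3):
    """Index every lemma sequence found at more than one location."""
    index = defaultdict(list)
    for label, tokens in lines:
        lemmas = [lemma for _, lemma in tokens]
        for n in range(min_len, max_len + 1):
            for start in range(len(lemmas) - n + 1):
                phrase = tuple(lemmas[start:start + n])
                index[phrase].append((label, start))
    # A mechanical criterion: keep whatever occurs at least twice.
    return {p: locs for p, locs in index.items() if len(locs) > 1}

repeats = repeated_phrases(LINES)
for phrase, locations in repeats.items():
    print(" ".join(phrase), locations)
```

The nominative phrase in line 1 and the accusative phrase in line 2 share no surface string, yet they are indexed as one repetition because both reduce to the same pair of lemmas. The stored locations are what a display layer would need in order to turn each occurrence into a link to the others.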

Conclusion

Let us take stock of the extensions that various electronic tools offer to the reader of Homer beyond the resources of the page and the associated finding tools of a print environment. The TLG, Perseus, and the Chicago Homer wrap different layers of information around the text. A great deal of tedious labor went into the construction of these layers in such a way that a contemporary user can, in most instances, retrieve information virtually instantaneously. In the TLG, this additional layer was created by the transformation of the corpus of printed Greek into a single search environment in which for any single text all other texts become an archive of lexical information. It took decades to complete this task, and probably nobody in the early seventies imagined that thirty years later a user with relatively modest resources could within seconds complete almost any imaginable search.

In Perseus, the additional layer consists of the integration of a narrower corpus with a morphological parser, the major dictionary, and the simple but powerful quantitative reporting powers of a digital text environment. As a result, the canonical Greek authors from Homer to Plutarch exist in a search space that greatly lowers access hurdles for inexperienced users and gives to professional Hellenists much more precise data about usage.

The Chicago Homer surrounds the small corpus of Early Greek epic with a more intensive layer of information. It goes beyond Perseus in disambiguating all word forms with multiple morphological descriptions. Its index of repetitions takes advantage of information technology to create a complete inventory of a stylistic feature that is crucial to this particular form of poetry. This is a very corpus specific form of textual elaboration and probably would not be a sensible thing to do for other works.

How much difference do these tools make to our reading of Homer? None of them will turn a reader without aptitude or interest into a good reader of Homer. A gifted reader with a dog-eared print copy will always do better than a plodding user with a fancy digital toolkit, and disappointment is in store for whoever thinks that using Perseus or the Chicago Homer is a substitute for learning Greek. On the other hand, for a sensitive reader with a knack for taking advantage of the query potential of electronic texts, the tools create new opportunities for working with texts and will produce some insights not easily available to the reader of the printed page. One should not expect more from computers.

References

[1] The Chicago Homer is a bilingual database that uses the search and display capabilities of electronic texts to make distinctive features of Early Greek epic accessible to readers with or without Greek. Its editors are Ahuvia Kahane and Martin Mueller. Its technical editors are Craig Berry and Bill Parod. The Chicago Homer will be published by the University of Chicago Press with the support of Northwestern University Academic Technologies and the Northwestern University Library.
[2] Craig Berry (in correspondence) draws my attention to the famous scene in the eighth book of the Confessions, where the despairing Augustine hears the voice of a child next door repeating the refrain “tolle lege” (take, read), opens his copy of the Bible at random and finds the verse that triggers his conversion: “Augustine’s conversion experience could be described (perversely but accurately) as a fantasy of rapid-access text retrieval. The point being that – like the airplane – the electronic text/concordance is a technology that finally lets us do something we’ve always wanted to do.”
[3] The analogy ignores some questions of scale and orthography. Perseus, which includes pretty much all the highly canonical Greek texts, contains about four million words (roughly five times the Shakespearean corpus), compared to the 76 million words of the TLG. An English “Perseus” from Chaucer to Joyce would be a database of some 500 million words with several million distinct wordforms. The aggregate of Chadwyck-Healey databases approximates such a corpus, but the variability of English spelling makes cross-corpus searches difficult, and there are as yet no tools that let users look up a modern spelling and retrieve all orthographic variants. Perhaps because ancient Greek is a heavily inflected language, the orthographic practice of manuscripts and editions was standardized from an early period on. Thus orthographic variance is a relatively minor difficulty in the digital transcription of ancient Greek texts. It is a major problem with creating an English corpus that can be dependably searched.
[4] See also a companion article to this one by Jeffrey Rydberg-Cox et al, “Knowledge Management in the Perseus Digital Library”, Ariadne Issue 25, http://www.ariadne.ac.uk/issue25/rydberg-cox/

Author Details

 Martin Mueller
email: martinmueller@nwu.edu
www.nwu.edu