Electronic Publication of Ancient Near Eastern Texts
The civilizations of the ancient Near East produced the world's first written texts. In both Egypt and Mesopotamia, recognizable texts begin to appear in the late fourth millennium B.C. A well-developed system of numerical tabulation, combined with a varied and sophisticated repertoire of sealings and seal impressions, is evident even earlier across a wide geographical range in Western Asia, and evidence from recent archaeological discoveries in Egypt promises to push the origins of writing even further into antiquity.
For the first two millennia or so of the world's written record, Near Eastern texts were written in one of several varieties of cuneiform, or in Egyptian hieroglyphs and their cursive variant known as hieratic. In the latter half of the second millennium B.C. scripts with recognizably alphabetical characteristics begin to appear, and rapidly spread among the languages and dialects of the Eastern Mediterranean world, eventually spawning a host of descendants and borrowings across Asia, Africa and Europe.
Writing more than a half-century ago, A. T. Olmstead began his monumental study of the Old Persian period with the memorable statement: "When Cyrus entered Babylon in 539 B.C., the world was old. More significant, the world knew its antiquity." Ancient scholars and scribes collected and catalogued historical and scientific records from their own immediate and distant pasts; they observed and organized natural phenomena; they abstracted medical, mathematical, astronomical and theological ideas; they sought to understand the world, and to preserve their understanding of it. On a more mundane level, ancient scribes recorded commercial transactions on behalf of individuals, organizations, and political entities; they recorded contracts, deeds and legal proceedings; they wrote notes and letters; and they doodled in the margins. The roots of "Western" (not to say "modern") scholarship on the societies and cultures of the ancient Near East are ancient in themselves. Among the most celebrated literary compositions of western civilization are the editions, translations and interpretations of, and the commentaries on, Hebrew, Aramaic and Greek religious texts from the ancient Near East. The communities which produced the scholarship on religious texts, and the societies in which they lived and flourished, maintained the languages of the Bible as living entities. A more ignominious fate befell the languages of Mesopotamia and Egypt. Texts of various sorts continued to be written in Akkadian and Egyptian into the first centuries of this era, but knowledge of them soon died out to the point where there was no longer even the recognition that the languages behind visible texts inscribed on the standing walls of ruined buildings were ancestral or cognate to living tongues, or that living tongues such as Coptic were related in any way to such ancient writings.
Aside from occasional descriptions of monuments in early travellers' accounts, many of which connect the remains of ancient sites with descriptions in biblical and classical literature, and aside from the astonishing and fantastic speculations of scholars such as Athanasius Kircher, almost nothing was learned of ancient Egyptian or Mesopotamian societies until the eighteenth century. The more careful observations and drawings made by such travellers as Carsten Niebuhr in Iran and Robert Wood and James Dawkins in Syria eventually resulted in publications which were fundamental to the early decipherments of Palmyrene and Old Persian. French and English colonial adventures in Egypt at the turn of the nineteenth century resulted in the recovery of multilingual inscriptions - notably the Rosetta Stone - leading directly to the decipherment of Egyptian. English-speaking scholars, working with Niebuhr's drawings of bi- and trilingual cuneiform texts, as well as with the raft-loads of inscribed monuments and tablets which appeared from the excavations of Botta at Khorsabad and Layard at Nineveh in Assyria and were shipped back to the museums in Paris and London, competed with one another for the honor of being called "the decipherer" of the languages of these inscriptions.
Even before there was universal acceptance that both Egyptian and Akkadian had been deciphered, there was a growing corpus of secondary literature including text publications, editions, commentaries, catalogues, dictionaries and so on. From the start, hieroglyphs and cuneiform characters posed problems for the typographers charged with seeing manuscripts into print. Already in the first generation there were reasonably successful attempts to build typefonts capable of representing a wide variety of cuneiform characters. Such efforts were quickly followed by similar movements in Egyptological publication. A parallel trend continued - and continues - in both Assyriological and Egyptological publishing: the use of handwritten text to reproduce individual characters normalized to standard forms, as well as hand-drawn facsimiles or copies of the texts themselves. There have never been universally accepted standards among either the Egyptological or the Assyriological communities on how to represent texts in transliteration or transcription. Particular fields have developed individual styles, as have "schools" of scholarship which, for reasons well-known in academe, tend also to fall into groups according to nationalist criteria or language of scholarship.
Until the 1960s, scholarship on ancient Near Eastern texts was conducted with the long-established tools of the trade: the eye, the pen and the index card. Individual scholars, as well as collaborative projects, collected data with - for the most part - specific purposes in mind. The Assyrian Dictionary of the University of Chicago, for example, had adopted and modified the procedures of the Oxford English Dictionary for the collection of lexical data - to date they have collected nearly two million cards. Modifications of card-based systems, such as needle-sorted punch-cards, existed, but it was the development of electronic text processing which offered the first real promise as a tool to sort large amounts of data in complex ways.
Encouraged by the success and usefulness for ancient Near Eastern studies of such projects as the Tuckerman Tables, scholars began to experiment with how computers could be harnessed to process textual data. Many early projects were individualized and often conducted in relative isolation, but in 1965 Stanislav Segert and I. J. Gelb, working respectively on South Arabic and Amorite, developed a mutually acceptable code for the representation of Semitic phonemes for use in text processing on an IBM mainframe. Other projects, like those of the Sumerologists Gene Gragg and Miguel Civil, exploited the big mainframes to considerable advantage in the analysis of textual corpora.
Despite such use, computers remained a tool which was relatively invisible from the point of view of the published results of research. Text had to be processed into one or another idiosyncratic machine-readable form for manipulation in the computer, and then re-rendered into forms acceptable to the reader's eye for publication. However, with the development of the personal computer in the 1970s, its wholesale adoption by the scholarly community by the end of the 1980s, and the gradual and (nearly) universal network wiring which began in the 1990s, the division between text processing, tool development and publication became less and less evident. The remarkable success of such large-scale computerized text corpus projects as the State Archives of Assyria project in Helsinki, and the increasing availability of inexpensive and highly effective off-the-shelf tools for desktop text processing, encouraged the development of multi-use filing systems and the incremental accumulation of "personal" text corpora by virtually every scholar. It is issues surrounding the development of these resources, the long-sought-for standardization of encoding, and the consequent ability to communicate and collaborate more effectively, which we hoped would be addressed at the Chicago Conference in early October 1999.
The Plan for the Conference
In December 1998 we announced the conference with a call for papers. We indicated that a primary focus of the conference would be on Web publication of "tagged" texts using the new Extensible Markup Language (XML). XML provides a simple and extremely flexible standardized syntax for representing complex information and for delivering it over the World Wide Web. Furthermore, it is based on a proven approach because it is a streamlined subset of the Standard Generalized Markup Language (SGML) that has been used for electronic publication worldwide for more than a decade. XML therefore makes possible powerful and efficient forms of electronic publication via the Internet, including academic publication of philological and archaeological data.
With the technology and infrastructure in place, it is appropriate for ancient Near East specialists to begin considering what is involved in publishing their data on the Web in XML format. XML itself is merely a starting point because its very simplicity and flexibility require the development of specific tagging schemes appropriate to each domain of research. It was the intention of the organizers of the conference to bring together researchers who have begun working on electronic publication in various ways using such tools as SGML, HTML, and XML, or who are interested in exploring these techniques, and to foster collaboration in the development of specific XML/SGML tagging schemes, especially for cuneiform texts in which a number of the conference participants specialize. In addition, it was our intention to inaugurate a formal working group on cuneiform markup to provide an ongoing forum for communication and collaboration in this field. We stressed, however, that the issues under discussion are not of interest only to cuneiformists - presentations and discussion concerning other ancient Near Eastern scripts and languages were encouraged and explicitly solicited.
Similarly, we recognized the need to present "texts in context": as archaeological artifacts among other artifacts. Archaeologists and philologists share the need for efficient and flexible electronic publication of complex data. In many cases also they have overlapping interests in terms of substantive historical questions. Indeed, it is likely that cooperation on the level of technical methodology in pursuit of effective electronic publication will have the beneficial effect of reducing the tendency toward balkanization among disciplines. An ancillary goal of our conference, therefore, was to stimulate interest in interdisciplinary research projects that involve both archaeological and philological data. By facilitating electronic access to philological data by archaeologists and vice versa, and by learning a common data representation technique such as XML, we can expect to generate new ways of representing or even conceiving of the conceptual relationships not just within but also between archaeological and philological datasets, which are so often considered in isolation. The potential to store these different kinds of datasets and their interrelationships in a commonly accepted, rigorous, formal framework offers exciting prospects for subsequent linguistic, socioeconomic, and historical research.
We have no doubt that electronic publication will play an essential role in future research on the ancient Near East. Philologists and archaeologists alike work with complex, highly structured datasets consisting of visual as well as textual information which call for "hyperlinks" among different kinds of data. But devising suitable forms of electronic publication is not a trivial matter and can only be done on a collaborative basis. Suitable electronic publications will represent in a standardized fashion the large number of internal and external cross-references among the many individual elements of each dataset and will capture the semantic diversity of the many possible types of such cross-references, representing, for example, various kinds of spatial, temporal, or linguistic relationship. Furthermore, the goal of such publication is not simply to facilitate human navigation of large and complex bodies of information but also to permit automated computer-aided analyses of data derived from many disparate sources. We believe that XML will be an important medium for this because Web publication using this format promises to be a simple and effective means of merging complex datasets from multiple sources for purposes of broader scale retrieval and analysis, avoiding the problems caused by the existing proprietary, limited, and inflexible data formats which have hindered electronic publication to date. XML is a non-proprietary, cross-platform, and fully internationalized standard that has been enthusiastically embraced by the software industry in general. For this reason our conference was announced as focussing specifically on the use of XML in the publication of ancient Near Eastern texts.
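The idea of typed cross-references between datasets from different sources can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the element names (`text`, `link`, `locus`, `period`), the `type` and `target` attributes, and the sample identifiers and labels are invented and do not come from any existing project or standard.

```python
import xml.etree.ElementTree as ET

# Two hypothetical datasets, as if published by different projects.
# All element names, attribute names, ids and labels are invented.
TEXTS = """\
<texts>
  <text id="T1">
    <link type="spatial" target="L7"/>
    <link type="temporal" target="P2"/>
  </text>
</texts>
"""

CONTEXTS = """\
<contexts>
  <locus id="L7" label="Room 3, floor deposit"/>
  <period id="P2" label="Phase II"/>
</contexts>
"""


def resolve_links(texts_xml, contexts_xml):
    """Follow each typed link from a text to a record in the other dataset."""
    # Index the archaeological records by their id attribute.
    targets = {el.get("id"): el for el in ET.fromstring(contexts_xml)}
    resolved = {}
    for text in ET.fromstring(texts_xml).iter("text"):
        resolved[text.get("id")] = {
            link.get("type"): targets[link.get("target")].get("label")
            for link in text.iter("link")
        }
    return resolved


if __name__ == "__main__":
    print(resolve_links(TEXTS, CONTEXTS))
```

Because the link type is recorded explicitly, a program can ask for the spatial context of a text without confusing it with a temporal or linguistic relationship.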
Goal of the Conference
A major goal of our conference was to assess the prospects for establishing a formal international standards organization charged with setting technical standards for the interchange of Near Eastern data in digital form. Both the conference and the establishment of such an organization are timely in light of the recent development of internet-oriented data standards and software that now provide a common ground for cooperation among diverse philological and archaeological projects, which have heretofore adopted quite idiosyncratic approaches. This common ground, not just for academic research but in all areas of information exchange, is created by the Extensible Markup Language (XML) and a growing array of software tools that make use of XML to disseminate information on the Internet.
The XML Standard
As we noted in our original announcement of the conference, XML is a nonproprietary "open" or public standardized data format which provides a simple and extremely flexible "tag"-based syntax for representing complex information as a stream of ASCII or Unicode text and delivering it over the World Wide Web. Furthermore, it is based on a proven approach because it is a subset of the ISO-ratified Standard Generalized Markup Language (SGML), which has been used for electronic publication worldwide for more than a decade. XML therefore makes possible powerful and efficient forms of electronic publication via the Internet, including academic publication of philological and archaeological data. But XML itself is merely a starting point, for its very simplicity and flexibility, which ensure its widespread adoption, require the development of specific XML tagging schemes or "markup languages" appropriate to each domain of research. Such a tagging scheme expresses the abstract logical structure of a particular kind of data in a rigorous and consistent fashion. Thus, for example, chemists have already created a "Chemical Markup Language" using XML to express the structure of molecules and chemical reactions, so that the data they work with can be easily shared and searched on the Web. Likewise, NASA has created an "Astronomical Instrument Markup Language," biologists have created a "Biological Markup Language," and so on. Once such tagging schemes exist, various kinds of software can then be developed to present different views of logically structured data for different purposes, or to create new sets of data structured in a particular way, with the assurance that these data structures can be created and viewed on any computer anywhere without special conversions or translations.
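As a rough illustration of what a domain-specific tagging scheme looks like in practice, the following Python sketch marks up one line of a transliterated text and reads it back with the standard-library XML parser. The element names (`text`, `line`, `w`), the `damaged` attribute, and the sample transliteration are assumptions chosen for this example; they are not a proposed or existing standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for one line of a transliterated text.
# Tag and attribute names are illustrative only.
SAMPLE = """\
<text id="sample-1" lang="akk">
  <line n="1">
    <w>a-na</w>
    <w>be-li2-ia</w>
    <w damaged="yes">qi2-bi2-ma</w>
  </line>
</text>
"""


def words_by_line(xml_source):
    """Return {line number: [(word, is_damaged), ...]} for a marked-up text."""
    root = ET.fromstring(xml_source)
    result = {}
    for line in root.iter("line"):
        words = [(w.text, w.get("damaged") == "yes") for w in line.iter("w")]
        result[line.get("n")] = words
    return result


if __name__ == "__main__":
    for n, words in words_by_line(SAMPLE).items():
        # Show damaged words in square brackets, one possible "view".
        print(n, " ".join(f"[{w}]" if damaged else w for w, damaged in words))
```

The markup records the logical structure (lines, words, state of preservation) rather than any particular appearance, so the bracket convention used when printing is a decision made by the software, not by the data.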
For general reference see Robin Cover's The SGML/XML Web Page
Formation of a Working Group for Text Markup
There was a consensus among the conference participants that XML should be used as the basis for future electronic publication of Near Eastern data. The establishment of a formal working group for Near Eastern text markup was also strongly endorsed, as a vehicle for the collaborative development and dissemination of suitable XML tagging schemes and associated software. Stephen Tinney of the University of Pennsylvania, the editor of the Pennsylvania Sumerian Dictionary, who has substantial experience in electronic text processing and in the use of SGML and XML, in particular, was elected to be the chair of the working group.
The name and scope of the new standards organization remain to be decided. A number of conference participants emphasized the importance of including Near Eastern languages and texts of all periods within the scope of the text markup group, rather than arbitrarily limiting it to ancient Near Eastern texts in general or cuneiform texts in particular, because comparable issues arise in dealing with non-European scripts and languages regardless of their date. Similarly, several people expressed what seemed to be a generally held desire to find ways to include electronically published archaeological data within our standards-setting effort. This would ensure maximum interoperability of textual and archaeological datasets, so that it would be easy to obtain information about the spatial provenience and the material-cultural context of excavated or monumentally inscribed texts, and conversely so that it would be easy to obtain philological information about texts viewed as artifacts from an archaeological perspective.
In the opinion of the conference organizing committee, therefore, a suitable name for the new standards organization would be "Organization for Markup of Near Eastern Information" (OMNEI). This name emphasizes the central role of XML markup as well as the organization's potentially wide scope in terms of Near Eastern information of all kinds, including both primary data (philological, archaeological, and geographical) and relevant secondary literature. Even restricting the scope to "Near Eastern" information is rather arbitrary from a technical standpoint, but this mirrors the scope of the existing academic infrastructure of Near and Middle Eastern departments, institutes, and centers to which members of this organization will in most cases already belong. OMNEI would serve as an umbrella organization for various standards-setting efforts necessary for the interchange of Near Eastern information, beginning with a Working Group for Text Markup chaired by Stephen Tinney. Eventually there could be a parallel Working Group for Archaeological Markup whose efforts would be integrated with those of the Text Markup group. Note that OMNEI's mission is not just to devise XML tagging schemes but also to facilitate the development of well-documented Web browser-based software that could be widely shared among Near Eastern projects, and to coordinate training and professional development for researchers who want to learn how to use these tagging schemes and software. Thus at some point it might also be desirable to create a formal Task Force for Training and Professional Development within the OMNEI organization.
In the aftermath of the conference, discussion is underway concerning these details, including the name and the precise scope and mode of operation of our new international organization, as well as a schedule of future meetings. Decisions will be announced in the near future, but it is clear already that there is a widespread desire to make this organization as broadly based as possible so that it can facilitate the cooperative development of effective and widely accepted technical standards. Judging by the success of the recent conference, it seems likely that many leading Near and Middle Eastern departments and institutes worldwide can be enlisted in support of this venture. The Oriental Institute of the University of Chicago will continue to do everything possible to sponsor this effort and to support it with its reputation and resources, in collaboration with the University of Chicago's Department of Near Eastern Languages and Civilizations, Center for Middle Eastern Studies, and Committee on the Ancient Mediterranean World.
What follows is a brief summary of the main points touched on in the formal presentations and in the open discussion sessions. It is not an exhaustive account of everything that was said. For further details on the formal presentations, in particular, please contact the presenters individually. Following each section is a paragraph including links to on-line or other electronic publications pertinent to the issues discussed in the section.
Friday October 8th
- Stephen Tinney of the University of Pennsylvania led off the Friday morning session with a presentation entitled "From Dictionary to Superdocument: XML, the Pennsylvania Sumerian Dictionary, and the Universe." Tinney surveyed some of the basic concepts underlying XML and the "markup" approach to electronic text representation, and then he outlined his ideas concerning the implementation of a corpus-based lexicon such as the Pennsylvania Sumerian Dictionary on the Internet using XML. He pointed out that such a lexicon can and should transcend the limitations of existing printed dictionaries. In particular, an electronic lexicon would not be a static entity but would be the dynamic product of three types of interlinked and constantly updated data, comprising primary text corpora, grammatical analyses, and secondary literature. In other words, the same data would be reusable in different contexts, and many possible views of the data could be constructed for different users. One such "view," of course, is a printed or printable version of the lexicon in the traditional format. Tinney concluded his talk by presenting and commenting briefly on an XML "document type definition" (DTD) which defines a set of element (tag) types and their attributes by means of which a corpus-based lexicon, for any language, could be represented.
The Pennsylvania Sumerian Dictionary
The Index to Sumerian Secondary Literature
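The notion of a lexicon built from interlinked data, with several views derived from the same source, might be sketched as follows in Python. This is not Tinney's DTD: the element names (`corpus`, `line`, `lexicon`, `entry`, `attestation`), the identifiers, and the short sample lines are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus and lexicon; all names, ids and sample lines
# are illustrative and not taken from any published dataset.
CORPUS = """\
<corpus>
  <line id="t1.1">lugal-e e2 mu-du3</line>
  <line id="t2.4">e2 lugal-la</line>
</corpus>
"""

LEXICON = """\
<lexicon>
  <entry lemma="lugal" gloss="king">
    <attestation ref="t1.1"/>
    <attestation ref="t2.4"/>
  </entry>
  <entry lemma="e2" gloss="house">
    <attestation ref="t1.1"/>
    <attestation ref="t2.4"/>
  </entry>
</lexicon>
"""


def load(corpus_xml, lexicon_xml):
    """Parse both datasets into simple Python structures."""
    lines = {l.get("id"): l.text for l in ET.fromstring(corpus_xml).iter("line")}
    entries = []
    for entry in ET.fromstring(lexicon_xml).iter("entry"):
        refs = [a.get("ref") for a in entry.iter("attestation")]
        entries.append((entry.get("lemma"), entry.get("gloss"), refs))
    return lines, entries


def print_view(lines, entries):
    """A traditional dictionary view: lemma, gloss, quoted attestations."""
    out = []
    for lemma, gloss, refs in sorted(entries):
        out.append(f"{lemma} '{gloss}'")
        for ref in refs:
            out.append(f"    {ref}: {lines[ref]}")
    return "\n".join(out)


def index_view(entries):
    """A reverse view: which lemmata are attested in each corpus line."""
    index = {}
    for lemma, _gloss, refs in entries:
        for ref in refs:
            index.setdefault(ref, []).append(lemma)
    return index


if __name__ == "__main__":
    lines, entries = load(CORPUS, LEXICON)
    print(print_view(lines, entries))
    print(index_view(entries))
```

The lexicon stores only references to corpus lines, so a correction to the corpus appears in every view the next time it is generated.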
In the discussion that followed Tinney's presentation, and in other discussions throughout the conference, the concern was expressed that electronic publications of the type he and others envisage would be evanescent and might become inaccessible because of the notoriously rapid obsolescence of digital media, the instability of the Web addresses (URLs) for electronic publications, and the dependence of the scholarly community on a few technologically expert colleagues, such as Tinney, whose eventual departure or retirement might orphan their brainchildren. Tinney and a number of other conference participants responded to these important concerns at various times during the conference by making the following points:
- Sustained institutional support of electronic publications is necessary, just as it is for printed publications; thus the publisher, be it a private company or a university institute, museum, or department, must take responsibility not only for peer review and editorial oversight, but also for preserving the accessibility of its publications by systematically migrating electronic files to new physical media and upgrading the necessary software and hardware as needed. An institutionally supported and sponsored publication such as the Pennsylvania Sumerian Dictionary cannot and would not be left to an individual scholar, whether it exists in printed or electronic form. The basic assumption here, of course, is that the publicly available hardware and software infrastructure of the Internet and the World Wide Web is not going to disappear or regress, but will be maintained and developed indefinitely, just as the infrastructure for producing and disseminating printed publications has been maintained and developed over the centuries since Gutenberg. Moreover, it is safe to assume that an increasing number of scholars will acquire the software and the relatively simple technical know-how required for them to produce and use XML-based electronic publications -- especially if an organization exists to help them do this.
- Governmental or quasi-governmental agencies (the Library of Congress?) will eventually take responsibility for archiving and making permanently available many kinds of electronic publications, on the model of the government-funded Arts and Humanities Data Service established in the United Kingdom a few years ago, in part to meet this need. University libraries may also have a part to play in this.
- XML is a nonproprietary standard, like HTML (on which the Web is currently based), so it is not subject to the whims and fortunes of an individual software company such as Microsoft. Furthermore, unlike proprietary database formats, XML is a text-based format (using the ASCII and Unicode international character encoding standards), which means that any computer anywhere can read and print XML datasets as plain text. Indeed, for this reason it is "human-readable" on an immediate level in a way that other data-encoding formats are not.
- The capability for permanent Web addresses ("permanent URLs") which can be reliably referenced over a long period has been or will be developed by Internet standards bodies, because everyone, not just the academic community, requires this feature. In the meantime, physical distribution of electronic publications on optical disks can ensure accessibility.
- The second presentation on Friday morning was by Stephan Seidlmayer of the Berlin-Brandenburg Academy of Sciences and Humanities, on the subject of "The Ancient Egyptian Dictionary Project: Data Exchange and Publication on the Internet." Seidlmayer described the history of the Ancient Egyptian Dictionary project and outlined the plan for taking it onto the Internet using XML. The pre-computer text corpus of the Ancient Egyptian Dictionary was stored on a large number of handwritten index cards, produced from the 1920s to the 1960s, as was typical of dictionary projects of this kind. This information was used to produce the twelve-volume Wörterbuch der ägyptischen Sprache, which is now out-of-date and in need of revision. Much of the original material has been digitized and updated, and the current text corpus of the Ancient Egyptian Dictionary project is stored in a DB2 relational database with local client-server access. Once a suitable XML markup scheme has been developed, this information will be converted to XML format and made available on the Internet, to facilitate international cooperation in this dictionary project.
Das digitalisierte Zettelarchiv des Wörterbuchs der ägyptischen Sprache
- The third session on Friday morning was an open discussion, "The Current State of Electronic Publication: Problems and Possibilities," moderated by Charles E. Jones and John Sanders of the Oriental Institute. Participants engaged in lively and interesting discussion of issues relating to the on-line publication of text. Concerns were raised about copyright and intellectual property rights. Jeffrey Rydberg-Cox from Perseus and Mark Olsen from ARTFL shared their own experiences with the use of text over which individuals or organizations claim ownership. It was evident that the law governing the re-use of text is both unclear and incompletely understood by the participants. There was a general sense that openness and collaboration were to be encouraged, and indeed are essential, if large-scale projects are to be successful. Differences between commercial and non-commercial models of publication and long-term institutional support - whether from commercial publishers or from universities or other non-commercial institutions - seemed to present a source of anxiety for participants, particularly as they have an impact on the long-term accessibility of on-line electronic publications.
- In the first session on Friday afternoon, Jeffrey Rydberg-Cox of the Perseus Project, based at Tufts University, spoke about "Creating, Integrating, and Expanding Electronic Texts in the Perseus Digital Library." The conference participants, almost all of whom work with ancient Near Eastern texts, found it very useful to learn more about this relatively large and well-established project that deals with Greek and Latin texts and their cultural and geographical context. Rydberg-Cox described how the Perseus team currently operates in terms of both tagging procedures and software development, and the kinds of lexical and morphological searching Perseus makes possible via a simple Web browser interface.
Perseus Searching Tools
Teaching with Perseus
- The next presentation was entitled "XML and Digital Imaging Considerations for an Interactive Cuneiform Sign Database." It was given in three parts by Theodoros Arvanitis and Sandra Woolley of the School of Electronic and Electrical Engineering at the University of Birmingham in Britain, and by Tom Davis, a forensic handwriting specialist in the Department of English at the University of Birmingham. Dr. Arvanitis read an introductory statement by Alasdair Livingstone, the Assyriological member of this project, who unfortunately could not be present. The Birmingham team described the objectives of their collaborative project, the results of the first year's work, and their plans for future work. A major goal of their project is to experiment with various digital image representations of cuneiform signs in order to determine which techniques for image capture, formatting, and compression are most effective for disseminating detailed facsimiles of cuneiform texts on the Web for research purposes. Another aspect of their research involves the automated analysis and categorization of cuneiform signs and scripts. The Birmingham team has also kindly offered to host a future meeting of the new working group on Near Eastern text markup.
Cuneiform Database Project, University of Birmingham
XML and Digital Imaging Considerations for an Interactive Cuneiform Sign Database - A Powerpoint Presentation
- The final session on Friday afternoon was devoted to an open discussion of "Editing, Disseminating, and Preserving Electronic Publications," moderated by Charles Jones and John Sanders of the University of Chicago's Oriental Institute, with panelists Patrick Durusau of Scholars Press, James Eisenbraun of Eisenbrauns Inc., and Thomas Urban of the Oriental Institute Publications Office. The necessity of careful editorial oversight of electronic publications was emphasized by several participants, in light of the ease of "self-publication" on the Internet. On the other hand, it was recognized that the electronic medium makes possible a variety of types of publications of varying degrees of formality and completeness, ranging from the equivalent of "privately circulated" manuscripts, by means of which a group of colleagues informally shares ideas and data, to official institutional publications corresponding to printed monographs in peer-reviewed series or journals. Several participants stressed the important role even of lightly edited individual publications on the Web, which need not be regarded as the author's final word, and to which electronic access might be restricted to those who understand their limitations and can make best use of them. The line between what is "published" versus "unpublished" is now somewhat blurred because all types of Web publications are equally accessible from a technical standpoint.
Another point made during this discussion had to do with the role of publishers, which might seem to be threatened in the era of electronic publication. Jim Eisenbraun pointed out that printing, binding, and distributing printed books is not the major expense in publishing, in any case. The major expense is incurred at the editorial stage, and the traditional role of publishers in this and the associated expenses will not be diminished, regardless of the medium of distribution. The financial basis for Web-based electronic publication will be some kind of subscription system, however, rather than the purchase of physical media.
Recapitulating a point made in the morning discussion session, Patrick Durusau made an explicit call for open source development of resources and tools.
The Oriental Institute Web site
Oriental Institute Publications
- On Friday evening there were three presentations. The first was a presentation of "The Achaemenid Royal Inscriptions Project" by Gene Gragg and Matthew Stolper of the University of Chicago. Stolper gave a brief overview of the background and goals of the project, and Gragg described the project's existing encoding scheme and his plans to convert this to XML. Gragg demonstrated the use of the XML-oriented Extensible Stylesheet Language (XSL) in Internet Explorer to generate various views of XML-encoded texts within a Web browser application.
Achaemenid Royal Inscriptions Project
The Afroasiatic Index Project
- The second evening presentation was by Hans van den Berg of the Center for Computer-aided Egyptological Research at Utrecht University. In a talk entitled "Egyptian Hieroglyphic Text Processing, XML, and the New Millennium," van den Berg noted the substantial progress that has been made in Egyptology in developing standardized character encodings of hieroglyphic signs, to the point where there is now a proposal before the Unicode Consortium for a 16-bit character encoding system that covers most of the known signs. The need for XML arises when representing the palaeographic characteristics of hieroglyphic texts, in terms of both character anomalies and specific positional information (i.e., the juxtaposition or superposition of individual signs). Van den Berg presented a set of XML tags that can represent such palaeographic characteristics.
Centre for Computer-aided Egyptological Research (CCER)
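The division of labour described above, with sign identity carried by a character code and arrangement and anomalies carried by markup, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the element names (line, group, sign), the attributes (arrangement, anomaly), and the sample data are hypothetical and are not van den Berg's tag set. The sign codes are Gardiner-style placeholders. The linear output uses ":" for superposition, "*" for juxtaposition, and "-" between successive groups, a convention chosen here only to make the nesting visible.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are illustrative
# only and are NOT the tag set presented at the conference.
SAMPLE = """
<line>
  <group arrangement="stacked">
    <sign code="R4"/>
    <group arrangement="side-by-side">
      <sign code="X1"/>
      <sign code="Q3" anomaly="reversed"/>
    </group>
  </group>
  <sign code="A1"/>
</line>
"""

# Separator used when flattening each kind of group (an assumed convention).
JOIN = {"stacked": ":", "side-by-side": "*"}


def linearize(node):
    """Flatten the nested positional markup into a one-line string."""
    if node.tag == "sign":
        return node.get("code")
    parts = [linearize(child) for child in node]
    if node.tag == "group":
        return JOIN[node.get("arrangement")].join(parts)
    # Top-level line: successive groups follow one another.
    return "-".join(parts)


def anomalies(root):
    """Collect the signs that carry a palaeographic anomaly note."""
    return {
        sign.get("code"): sign.get("anomaly")
        for sign in root.iter("sign")
        if sign.get("anomaly")
    }


root = ET.fromstring(SAMPLE)
linear = linearize(root)
noted = anomalies(root)
```

Here the positional information lives entirely in the nesting of group elements, so the same sign codes can be rearranged without touching the character encoding itself.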
- The last presentation of the evening was by Mark Olsen, Assistant Director of the Project for American and French Research on the Treasury of the French Language (ARTFL) of the University of Chicago. Olsen's title was "Using Encoded Texts at ARTFL: The Case for Simplicity." He argued that tagging schemes should be kept as simple as possible, drawing on the experience of the ARTFL project and the negative example of the Text Encoding Initiative, which has developed overly elaborate tagsets that are difficult to support. The main problems with a complex tagset involve the expense of developing software to support such a tagset and the need to spend extra time and effort training data entry staff to use it. In the discussion that followed Olsen's presentation there was general agreement that the complexity of tagging schemes should be kept to a minimum. Elsewhere during the conference, however, a distinction was made between the number of distinct element types (tags) used to mark up a text and the degree (or "granularity") of the tagging, because the software and procedures for dealing with an intensively marked up text are no different than those for a lightly tagged text, if the same simple set of tags is used in each case. What is at issue is the appropriate degree of logical abstraction for a tagging scheme which is to faithfully describe the data and maximize its reusability in different contexts.
Project for American and French Research on the Treasury of the French Language, University of Chicago (ARTFL)
ARTFL Experiments and Development Projects Page
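The distinction between the number of element types and the granularity of tagging can be made concrete with a small sketch. It assumes a hypothetical three-element tag set (text, line, w) and an invented sample line; neither is taken from ARTFL, the TEI, or any project presented at the conference. The same two functions process a lightly tagged and an intensively tagged version of the line without modification.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set and placeholder transliteration, for illustration only.
# LIGHT tags one word; DENSE tags every word. Both use the same element types.
LIGHT = '<text><line n="1"><w lemma="lugal">lugal</w> kalam-ma</line></text>'
DENSE = (
    '<text><line n="1"><w lemma="lugal">lugal</w> '
    '<w lemma="kalam">kalam-ma</w></line></text>'
)


def lines_as_plain_text(xml_source):
    """Return each line's text with all inline markup stripped."""
    root = ET.fromstring(xml_source)
    return {
        line.get("n"): "".join(line.itertext())
        for line in root.iter("line")
    }


def tags_used(xml_source):
    """Return the set of distinct element types in the document."""
    return {element.tag for element in ET.fromstring(xml_source).iter()}


def tagged_word_count(xml_source):
    """Count how many words have been individually marked up."""
    return sum(1 for _ in ET.fromstring(xml_source).iter("w"))


light_lines = lines_as_plain_text(LIGHT)
dense_lines = lines_as_plain_text(DENSE)
```

Both versions yield the same plain text and the same set of element types; only the count of tagged words differs, which is the sense in which denser tagging need not require different software.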
Saturday October 9th
- The first presentation on Saturday morning was given by Jeremy Black and Eleanor Robson of the University of Oxford. They discussed "The Electronic Text Corpus of Sumerian Literature," a Web-based project whose aim is to make accessible to a wide variety of readers, specialists and laypeople alike, hundreds of Sumerian literary works. Black presented the philological and pedagogical rationale for the project, while Robson discussed its operating procedure. This procedure involves the use of SGML tags and a simple wordprocessing macro interface for the entry and markup of transliterated texts by Sumerologists, and hence a minimum of custom software development. Robson showed the project's Web browser interface via an online Internet connection, emphasizing the project's use of basic HTML generated from the underlying SGML version of the texts. Because the intended audience goes beyond scholars at major research universities, users of the electronic Sumerian text corpus should not and do not need the latest version of Web browser software running on the fastest computers with high-speed Internet connections in order to use the texts effectively.
The Electronic Text Corpus of Sumerian Literature (ETCSL)
The ETCSL document type definition for composite texts
The ETCSL document type definition for English prose translations
The ETCSL document type definition for bibliographies
SGML Declaration for all ETCSL SGML files
- The next presentation was by Miguel Civil of the University of Chicago's Oriental Institute, who drew on his decades-long experience in cuneiform text encoding to comment on the history of efforts in this area. Civil gave an overview of his own approach and the software he has developed to integrate text corpora with grammatical and lexical information. During the course of the conference a number of participants congratulated Civil for his influential pioneering work and for his generosity in supplying otherwise unavailable editions of texts in digital form to a wide variety of colleagues.
Sumerian Lexical Archive (SLA)
- The Saturday morning session ended with an open discussion of "Standards for Text Encoding and Markup," moderated by Gene Gragg and Steve Tinney. This discussion continued after lunch, after Gene Gragg, the Director of the Oriental Institute of the University of Chicago, had made a formal proposal to create a "working group on cuneiform markup" chaired by Stephen Tinney. There was strong support of this proposal, with considerable discussion concerning the scope and thus the name of the proposed standards group, as we have already mentioned.
- David Schloen, an archaeologist in the University of Chicago's Oriental Institute, gave the final formal presentation on Saturday afternoon, entitled "Texts and Context: Using XML to Integrate and Retrieve Archaeological Data on the Web." Schloen noted that XML is as suitable for representing archaeological databases as it is for representing ancient texts. But whether the information is expressed in XML or in some other data format (e.g., a relational database), archaeologists need an appropriate data model that captures in a rigorous and consistent fashion the idiosyncrasies of units of archaeological observation, as well as the spatial and temporal interrelationships among them. Schloen proposed a hierarchical, "item-based" data model, rather than the "class-based" (tabular) data model which currently prevails. The item-based data model has the advantage of being straightforwardly represented in XML as a nested hierarchy of tagged elements with their attributes. Moreover, texts can be treated like any other type of artifact, as items in a spatial hierarchy with their own properties. Schloen concluded by presenting an XML tagging scheme dubbed ArchaeoML ("Archaeological Markup Language") which can represent any kind of archaeological data on any spatial scale, including the vector map shapes and raster images which belong to individual archaeological items.
In the discussion that followed, the question arose of the precise relationship between electronically represented texts and archaeological data disseminated on the Web using XML. Schloen's response was that the physical characteristics and archaeological context of a text would be represented as for any other artifact, but the XML "item" element representing a given text as an archaeological item would have a link to another Web location containing the contents of the text from a philological perspective. The same kind of link would operate from the other direction, so that each XML "text" element in an electronically represented corpus of texts would be able to retrieve its geographical location and archaeological context from an archaeological dataset.
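A minimal sketch may help to show how an item-based hierarchy and the link from an archaeological item to its philological record could fit together. The element names (item, property, textLink), the identifiers, and the example.org address are all hypothetical, introduced only for illustration; they are not the actual ArchaeoML tags. The sketch assumes that spatial containment is expressed purely by nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names, ids, and the URL are illustrative
# assumptions and are NOT the actual ArchaeoML tag set.
SITE_XML = """
<item id="site-1" type="site">
  <item id="area-A" type="excavation-area">
    <item id="room-3" type="room">
      <item id="tablet-17" type="tablet">
        <property name="material" value="clay"/>
        <textLink href="http://example.org/corpus/text-17.xml"/>
      </item>
      <item id="jar-4" type="vessel">
        <property name="material" value="ceramic"/>
      </item>
    </item>
  </item>
</item>
"""


def spatial_context(root, item_id):
    """Return the chain of item ids from the outermost container to item_id."""
    def walk(node, path):
        path = path + [node.get("id")]
        if node.get("id") == item_id:
            return path
        for child in node.findall("item"):
            found = walk(child, path)
            if found:
                return found
        return None

    return walk(root, [])


def text_links(root):
    """Map each item that carries a textLink to the linked location."""
    return {
        item.get("id"): item.find("textLink").get("href")
        for item in root.iter("item")
        if item.find("textLink") is not None
    }


root = ET.fromstring(SITE_XML)
context = spatial_context(root, "tablet-17")
links = text_links(root)
```

In this arrangement the tablet is handled like the jar beside it, as one more item with properties and a findspot, and only the link distinguishes it as something that also has textual content held elsewhere. The reverse lookup described in the discussion would follow the same pattern from the text corpus side.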
- The conference ended with a final open discussion entitled "What's It Good For? Uses of Electronically Published Texts," moderated by Matthew Stolper of the University of Chicago's Oriental Institute. Stolper commented on the impact electronic publication has had and will have in his own research. During this discussion several previously discussed issues were revisited, including the need to welcome a wide variety of different types of electronic publication intended for different purposes, ranging from the relatively simple sharing of transliterated texts to comprehensive and authoritative critical editions. This should not cause problems as long as the user is made aware of the author's intention, recognizing that it is preferable whenever possible to deploy the full panoply of peer review, copy-editing, and "official" dissemination under a reputable institution's imprimatur.
Abzu: Guide to Resources for the Study of the Ancient Near East Available on the Internet
The program of the conference is also available
- Excellent recent surveys of the development of writing appear in the appropriate chapters of J. T. Hooker (editor) Reading the Past: Ancient Writing from Cuneiform to the Alphabet, London: The British Museum, 1995, and in Peter T. Daniels and William Bright (Editors) The World's Writing Systems, New York: Oxford University Press, 1996. [Return to text]
- The corpus of such counters and tokens is collected and analysed in detail in the two volumes of Denise Schmandt-Besserat's Before Writing, Austin: University of Texas Press, 1992 - significant and important commentary on the corpus appeared in subsequent literature. [Return to text]
- Widespread press coverage during 1999 accompanied the appearance of the formal publication of some of this material in Günter Dreyer's Umm El-Qaab I: Das prädynastische Königsgrab U-j und seine frühen Schriftzeugnisse, Mainz am Rhein: Verlag Philipp von Zabern, 1999; c1998. (Deutsches Archäologisches Institut Abteilung Kairo: Archäologische Veröffentlichungen; v. 86). [Return to text]
- Standard inventories of the character set used for dialects of Akkadian number about six hundred cuneiform signs - René Labat (with Florence Malbran-Labat) Manuel d'Épigraphie Akkadienne (Signes, Syllabaire, Idéogrammes), Paris: Librairie Orientaliste Paul Geuthner, 1988 (Sixth edition). For Egyptian, calculations of the inventory range - depending on how one interprets variants of sign forms - from several hundred as in the basic sign list in Alan Gardiner's Egyptian Grammar, Oxford: Griffith Institute, 1957 (third edition), to several thousand, as in Nicolas Grimal's (and colleagues) Hieroglyphica, Utrecht: Center for Computer-aided Egyptological Research, 1993 (Publications Interuniversitaires de Recherches Égyptologiques Informatisées; v. 1). [Return to text]
- See again The World's Writing Systems cited in note 1 above. [Return to text]
- A. T. Olmstead, History of the Persian Empire, Chicago: The University of Chicago Press, 1948, page 1. [Return to text]
- Modern literature on ancient scholarship is vast and rapidly increasing. Particularly interesting are the products of the State Archives of Assyria, for instance Simo Parpola's volume Letters from Assyrian and Babylonian Scholars, Helsinki: Helsinki University Press, 1993. (State Archives of Assyria, v. 10). Notable also is the recent publication of the Babylonian astronomical diaries by A. J. Sachs and Hermann Hunger Astronomical Diaries and Related Texts from Babylonia, Wien: Verlag der Österreichischen Akademie der Wissenschaften, 1988-1996, which presents a quite astounding corpus of scholarly observation spanning nearly six hundred years of history. [Return to text]
- The Jewish and Christian bibles and the scholarly literatures from which they are inseparable are fundamentally important to virtually all early western scholarship. Emanuel Tov's Textual Criticism of the Hebrew Bible, Minneapolis: Fortress Press, 1992 presents the scope of discussion for this corpus in a single volume. Also worthy of note is Ernst Würthwein's The Text of the Old Testament, Grand Rapids: William B. Eerdmans, 1995 (Translated from the fifth German edition by Erroll F. Rhodes). [Return to text]
- A series of essays at the beginning and end of Jack Sasson's monumental four volume Civilizations of the Ancient Near East, New York: Charles Scribner's Sons, 1995, give an elegant overview of the legacy of "The Ancient Near East in Western Thought" (Volume I, Part 1) and the recovery of antiquity in the last two centuries: "Retrospective Essays" (Volume IV, Part 11). [Return to text]
- An assessment of Kircher's Egyptology is found in Enrichetta Leospo's article "Atanasio Kircher e l'Egitto: Il formarsi di una collezione egizia nel Museo del Collegio Romano" in Morigi Govi, Cristiana; Curto, Silvio; Pernigotti, Sergio, Editors, L'Egitto fuori dell'Egitto: Dalla riscoperta all'Egittologia, Bologna: Editrice CLUEB, 1991, pages 269-281 [Return to text]
- A wonderful account of Carsten Niebuhr's extraordinary expedition of the 1760's to South Arabia and Iran appears in Thorkild Hansen's Arabia Felix: The Danish Expedition of 1761-1767, New York: Harper and Row, 1964. [Return to text]
- See Peter T. Daniels's article "The Decipherment of Ancient Near Eastern Scripts" in the first volume of Sasson's Civilizations of the Ancient Near East, (see note  above). [Return to text]
- This year marks the bicentennial of the discovery of the Rosetta Stone. To commemorate that event the British Museum has mounted an exhibition, Cracking the Codes, with an excellent catalogue by Richard Parkinson, Cracking Codes: The Rosetta Stone and Decipherment. London: The British Museum, 1999. [Return to text]
- For the former see the essays in Elisabeth Fontan (editor), De Khorsabad à Paris: La Découverte des Assyriens, Paris: Réunion des Musées Nationaux, 1994; and for both, Mogens Trolle Larsen's The Conquest of Assyria: Excavations in an Antique Land 1840-1860, London: Routledge, 1994. [Return to text]
- Edward Hincks emerges as a fundamental force in decipherment history. See, for instance, the essays collected by Kevin J. Cathcart in The Edward Hincks Bicentenary Lectures, (Dublin: Department of Near Eastern Languages, University College Dublin, 1994). [Return to text]
- To the best of my knowledge there has not yet been a study of Assyriological or Egyptological typography, though it is of considerable interest for the history of both fields. A set of type which was apparently quite flexible in its customizability was used for the publication of such fundamental works as Robert Francis Harper's seminal fourteen-volume Assyrian and Babylonian Letters Belonging to the K. Collection of the British Museum, Chicago: University of Chicago Press; 1892-1914. The Institut Français d'Archéologie Orientale began to use a font of movable hieroglyphic type early in the century: Émile Chassinat published the first catalogue of the type available from them in the Catalogue des signes hiéroglyphiques de l'imprimerie de l'Institut français du Caire, [with supplement issues in 1930 and a complete reprinting as recently as 1983]. Not long after that, Alan Gardiner published his Catalogue of the Egyptian Hieroglyphic Printing Type..., Oxford and Chicago: Oxford University Press and The University of Chicago Press, 1929, which was designed for the production of his then forthcoming Egyptian Grammar, which in its third edition (Oxford: Griffith Institute, 1957) remains in print to this day. [Return to text]
- With the exception of a few introductory pages, every word of Adolf Erman and Hermann Grapow's essential Wörterbuch der aegyptischen Sprache, (Leipzig: J. C. Hinrichs'sche Buchhandlung, 1925, etc.) was handwritten, as was each volume of Anton Deimel's Sumerisches Lexikon, (Rome: Biblical Institute Press, 1928, etc.). Such practice was not unusual into the 1960's - see for example volume 8 of Benno Landsberger's Materialien zum Sumerischen Lexikon, Rome: Biblical Institute Press, 1962, and others. [Return to text]
- Hand-drawn facsimiles of ancient texts remain, along with photographs, the primary form of interpretive presentation of both cuneiform and hieroglyphic texts. Styles of copying vary widely among fields and sub-fields of ancient Near Eastern philology, but copies are seldom absent from text publications, and if they are, the authors are generally taken to task for the omission in reviews. [Return to text]
- A droll (if not particularly scholarly) account of a number of lexicographical projects - including the Chicago dictionaries - appears in Israel Shenker's Harmless Drudges: Wizards of Language--Ancient, Medieval and Modern, (Bronxville: Barnhart Books, 1979). [Return to text]
- Bryant Tuckerman Planetary, Lunar, and Solar Positions 601 B.C. to A.D. 1 at Five-Day and Ten-Day Intervals, (Philadelphia: The American Philosophical Society, 1962), and Planetary, Lunar, and Solar Positions A.D. 2 to A.D. 1649 at Five-Day and Ten-Day Intervals, (Philadelphia: The American Philosophical Society, 1964) [Return to text]
- Some of the history of this collaboration is outlined in the introduction to the final report of Gelb's project Computer-Aided Study of Amorite, (Chicago: The Oriental Institute of the University of Chicago, 1980). [Return to text]
- Gragg made use of Civil's unpublished editions of Sumerian literary compositions typed onto punch-cards and processed in the mainframe in the preparation of his Sumerian Dimensional Infixes, (Kevelaer and Neukirchen-Vluyn: Verlag Butzon und Bercker and Neukirchener Verlag, 1973). For the later work of both, see below. [Return to text]
- The State Archives of Assyria project http://www.helsinki.fi/science/saa/cna.html has published more than a score of volumes to date, and has revolutionized the field of Assyrian Studies because of the size and quality of the corpus it has collected, and because of the generosity of its staff. [Return to text]
- The Oriental Institute
The Franke Institute for the Humanities
[Return to text]
Charles E. Jones
Research Associate and Archivist - Bibliographer
The Oriental Institute
University of Chicago
1155 E. 58th St. Chicago IL 60637-1569
Voice (773) 702-9537
Fax (773) 702-9853
The Oriental Institute
and The Department of Near Eastern Languages and Civilizations
University of Chicago
1155 E. 58th St. Chicago IL 60637-1569
Voice (773) 702-1382
Fax (773) 702-9853