Web Magazine for Information Professionals

SGML, XML and Databases

Stephen Emmott reports on a one-day meeting in London

It was clear from the crowd gathered in the reception area of the Brunei Gallery Lecture Theatre that SGML’s appeal is far reaching. From grey suits to combat trousers, the first of the day’s 180 attendees represented at a glance the diversity of domains into which SGML extends. The conversation before the opening talk confirmed the cooperative spirit underlying the phenomenal growth of SGML’s progeny - HTML and XML. That this was happening now, in both academic and commercial circles, indicated the meeting’s true agenda - XML.

For the W3C, XML is a replacement for SGML. It uses angle-bracketed tags as popularised by HTML but allows the author to create their own tags. Content can be labelled as the author sees fit rather than as constrained by the tags available. Because the author dictates how each tag is to be used (through the DTD), a web browser or any other piece of software equipped to interpret XML can manipulate the content as the author intended. Moreover, it is possible for the author to be a machine.
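
To make the idea of author-defined tags concrete, here is a minimal sketch in Python. The element names (memo, to, deadline, body), the DTD and the sample content are all invented for illustration and were not shown at the meeting. Note also that Python's standard-library parser reads the DTD but does not validate the document against it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names and the DTD are invented for this sketch.
MEMO = """<!DOCTYPE memo [
  <!ELEMENT memo (to, deadline, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT deadline (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>Editorial team</to>
  <deadline>1998-12-01</deadline>
  <body>Copy for the next issue is due.</body>
</memo>"""


def read_memo(source: str) -> dict:
    """Return the memo's fields keyed by the author's own tag names."""
    root = ET.fromstring(source)
    # The stdlib parser is non-validating: it accepts the DTD but does not enforce it.
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    print(read_memo(MEMO))
```

The software reading the memo needs no prior knowledge of the tags beyond their names; a validating parser would additionally check the document against the declared content model.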

The day’s six talks were split into a morning of the theoretical and descriptive followed by an afternoon of case studies. While a steady stream of London’s traffic victims filed to their seats, the SGML UK Chapter began another well-organised though relaxed event.

John Chelsom from CSW Informatics opened with An Overview of SGML/XML and Databases. Although SGML structures content into documents, databases are required to manage their build, storage and generation. Crucially, this production fits within an overall workflow - “the wider context”. It is here that XML stands out from SGML, as HTML did before it. XML is flexible, portable and accessible to newcomers - thereby creating the wider context. And XML doesn’t suffer one of HTML’s primary limitations, the ‘page horizon’. With HTML we talk and operate in units of one or more pages, whilst XML allows us to talk and operate in units of any size - from one character to entire collections. This is more complex to manage - hence databases - but XML reveals the document as just one way to gather content together, a point taken up by Alex Brown from Griffin-Brown Digital Publishing Ltd.

Given that XML can work in terms of fragments of content, Alex revealed in Using Databases in the SGML/XML Production Lifecycle that the databases required must be more flexible than traditional RDBMS. Instead, OODBMS are required (a fact supported by at least one vendor of such databases exhibiting during the breaks). The model presented placed XML at the centre of the process - a repository of fragments. From it documents are produced: a paper document, an on-screen presentation, a web-site, a CD-ROM, etc.
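
The repository model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fragment ids, the element names and the two output formats are hypothetical, and an in-memory XML tree stands in for the OODBMS discussed in the talk.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical repository: fragment ids and text are invented for this sketch.
REPOSITORY = ET.fromstring("""<repository>
  <fragment id="intro"><title>Introduction</title><para>XML stores content in reusable pieces.</para></fragment>
  <fragment id="pricing"><title>Pricing</title><para>Prices are listed per unit.</para></fragment>
  <fragment id="contact"><title>Contact</title><para>Write to the sales office.</para></fragment>
</repository>""")


def select(ids):
    """Pick fragments by id, in the order requested."""
    by_id = {f.get("id"): f for f in REPOSITORY.iter("fragment")}
    return [by_id[i] for i in ids]


def to_text(fragments):
    """Render fragments as plain text, e.g. for a paper document."""
    return "\n\n".join(
        f"{f.findtext('title').upper()}\n{f.findtext('para')}" for f in fragments
    )


def to_html(fragments):
    """Render the same fragments as HTML, e.g. for a web site."""
    return "".join(
        f"<h2>{escape(f.findtext('title'))}</h2><p>{escape(f.findtext('para'))}</p>"
        for f in fragments
    )


if __name__ == "__main__":
    brochure = select(["intro", "pricing"])
    print(to_text(brochure))
    print(to_html(brochure))
```

The same selection of fragments feeds both renderers, so a "document" is simply one choice of fragments plus one choice of output.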

That the document is no longer the absolute container for content - as dictated by the wordprocessor or the spreadsheet - emerged as a central theme for the day. The power of the database to enable its content to be reshaped is now as valid for papers and articles as it is for invoices and inventories.

But Alex called for caution. XML is promised for many uses, whether it is ideally suited to them or not. Specifically, XML wasn’t designed for encoding data; its designers have reverse engineered its suitability. “XML is just one solution” when dealing with fragments of content.

For Daniel Rivers-Moore of Rivcom, this isn’t a concern. XML is a simple, powerful and easy to implement ‘version’ of SGML, well suited to encoding content. With the strengths of HTML but none of its weaknesses, XML opens up the possibility for structured information management on the WWW.

In XML, Schemas and Industrial Data, Daniel’s concern was that language by itself isn’t enough for communication: shared schemas are also required (i.e., shared by both sender and receiver). For a document-centric view of the world, the DTD is sufficient. But XML offers alternatives to the document-centric view, including application-to-application contact, e.g. passing data from one database to another. Daniel argued that with a sufficient schema, XML is suitable for encoding data, and demonstrated a schema that enabled an XML file to be used as a miniature address book. His view is supported by activity at the W3C - particularly DCD - lifting XML to the status of “lingua franca”.
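
The role of a shared schema can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the schema is a plain Python dictionary rather than DCD syntax, it is not the schema demonstrated at the meeting, and the element names and sample entries are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared schema: each <entry> must carry exactly these child
# elements, in this order. This is NOT DCD syntax and not the schema shown
# at the meeting.
SCHEMA = {"entry": ["name", "email", "phone"]}

ADDRESS_BOOK = """<addressbook>
  <entry><name>A. Sender</name><email>sender@example.org</email><phone>555-0100</phone></entry>
  <entry><name>B. Receiver</name><email>receiver@example.org</email><phone>555-0101</phone></entry>
</addressbook>"""


def load_entries(source: str, schema: dict) -> list:
    """Parse the address book and reject entries that break the shared schema."""
    expected = schema["entry"]
    entries = []
    for entry in ET.fromstring(source).iter("entry"):
        found = [child.tag for child in entry]
        if found != expected:
            raise ValueError(f"entry has {found}, schema expects {expected}")
        entries.append({child.tag: child.text for child in entry})
    return entries


if __name__ == "__main__":
    for person in load_entries(ADDRESS_BOOK, SCHEMA):
        print(person["name"], person["email"])
```

Because sender and receiver agree on the schema, the receiving application can treat each entry as a record with known fields rather than as free-form text.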

The afternoon’s talks took the audience through three case studies that in many ways teased out the issues raised during the morning:

* Philip Ward (Technical Support Operations, Ford of Europe): Technical Information Publishing in a Distributed Enterprise revealed how he and his team were adapting to the documentation needs of a multi-national organisation by offering solutions that change the way the organisation operates. Document production exists within a wider context that is too complex for any one piece of software. Instead, a localised solution is required using widely available tools. Primary technology: XML, databases, Java, and CORBA.

* Tony Swithenby (Infrastructures for Information): Information Delivery Using SGML, XML and Databases showed that XML could be used to bind disparate teams within an organisation. A suite of applications was shown to share data through a single XML file. Using a scaled model of a shop-floor robot, Tony demonstrated how team members could share data and effect updates to the robot’s movements. Primary technology: XML.

* Tom Catteau (SGML Technologies Group): SGML and Database Technology for the European Union’s Budget. The EU’s budget is published in 11 languages 3 times a year. With multiple authors and translators this presents a tricky version control problem. By operating on selective parts of the document, multiple versions could be generated. Primary technology: SGML and databases.

By the end of the day there appeared to be three identifiable groups in the audience - those who’d been involved with SGML for many years; those who work with databases; and those who’d come into contact with one or both via HTML and/or XML. Whilst the vocal part of the audience seemed primarily concerned with SGML, for many the speakers had presented XML as a way forward - something to be taken away and applied.

References


1. Extensible Markup Language (XML) http://www.w3.org/XML/
2. SGML UK Home Page http://www.sgml.org.uk/
3. Some Background on SGML for the World-Wide Web http://www.w3.org/People/Connolly/drafts/html-essay.html
4. The World Wide Web: Past, Present and Future http://www.w3.org/People/Berners-Lee-Bio.html/1996/ppf.html
5. W3C Data Formats http://www.w3.org/TR/NOTE-rdfarch
6. Zooleika: sgml uk conference report http://www.zooleika.org.uk/tech/sgml/sgml_uk.htm

Abbreviations

DCD - Document Content Definition
DTD - Document Type Definition
HTML - Hypertext Markup Language
OODBMS - Object-oriented Database Management System
RDBMS - Relational Database Management System
SGML - Standard Generalized Markup Language
WWW - World Wide Web
XML - Extensible Markup Language

Author Details

Stephen Emmott
Web Editor
Press & Public Relations Office
rm 4.14, Waterloo Bridge House
57 Waterloo Road, London SE1 8WA
http://www.kcl.ac.uk/
tel: +44 (0) 171 872 3342; fax: +44 (0) 171 872 0214
.gamut - Web Editors’ forum:
http://www.gold.ac.uk/gamut/