Web Magazine for Information Professionals

MCF: Will Dublin Form the Apple Core?

Jon Knight looks at how Dublin Core and Apple's new MCF metadata file format might make useful and interesting bedfellows.

For many years librarians and computer scientists have been researching and developing metadata standards and technology. Although library OPACs are obviously commercially viable systems for maintaining metadata about hard copy resources, they are still something of a niche market. With the explosion in information provision on the Internet, this niche metadata market is itself set to explode, as an increasing number of companies develop a commercial interest in providing and supporting the indexing, cataloguing and navigation of Internet resources.

One major computer vendor that has started to make a concerted push into metadata standards for online resources is Apple Computer. As part of the 'Project X' research programme Apple has produced a metadata file format called the Metadata Content Format (MCF) [1]. MCF is a text based file format that provides an extensible structure for encoding and transporting metadata. If it were just a closed metadata file format, MCF would be a relatively weak foray into the metadata arena, as the associated metadata content that Apple has devised so far is relatively simplistic. However, MCF has a number of major points in its favour which may well make it a technology worth watching.

A growing toolbox

The first plus point for MCF is that it has a growing number of tools available to aid the generation of the metadata and that make use of this metadata in novel ways. The most notable of these tools is probably Apple's HotSauce [2]. HotSauce is available as both a standalone application and a web browser plugin for Apple Macintoshes and Wintel boxes running Windows95/NT. HotSauce provides the end user with a way of visualizing MCF files in such a way as to give a "hotsauced" site a three dimensional graphical site map. This map can be navigated around by "flying through" the graphical representation of the site's structure generated from the metadata held in the MCF file.

The 3D view of a site's structure can provide the end user with more cues as to where they can go and what parts of the site are likely to hold the most data. There are certainly human computer interaction issues that still need to be addressed (such as how to deal with very large sites with a broad, flat hierarchy in a way that prevents the screen from becoming too overcrowded with objects) but this does appear to offer a very interesting new way of making use of relatively low quality metadata to aid navigation.

In the case of the current release of HotSauce, little information aside from the object's title, its URL and the URLs of the hyperlinks from it is used to generate the representation. This information can be generated either by using the HotSauce tool itself to make and edit MCF files, or by using one of the growing number of third party MCF generating scripts. One example of what such a script can produce is the fly-through of the eLib programme information [3], which was generated by Andy Powell from information held in the ROADS [4] based eLib project database. This demonstrates how MCF can be generated automatically, not only from HTML documents held within a local web site but also from hand generated catalogues and robot generated indexes on a subject basis.
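To give a flavour of what a homemade generating script has to gather, here is a minimal sketch in Python that pulls a page's title and outgoing hyperlinks out of its HTML. The names TitleAndLinks and describe_page and the example.org address are invented for illustration, and the result is a plain Python dictionary rather than real MCF syntax, which is not reproduced here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TitleAndLinks(HTMLParser):
    """Collects the <title> text and the href of every <a> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def describe_page(url, html):
    """Return the three fields mentioned above: URL, title and links.

    Hypothetical helper: the dictionary layout is an assumption, not MCF.
    """
    parser = TitleAndLinks(url)
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "links": parser.links}
```

A real script would still need to write these records out in MCF's own syntax and to walk every page on the site; the sketch only covers the extraction step for a single page.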

Dublin Core in MCF

The next plus point of MCF is its very minimalism. It is a simple format and yet provides plenty of room for expansion. The simple syntax makes it relatively easy to write new, homemade scripts that will be able to read and write MCF files. The flexibility allows MCF to carry metadata other than the simple titles and URLs that HotSauce currently uses. Apple have already said that they are interested in incorporating the Dublin Core Element Set [5] within MCF files.

This is good news; one of the goals of the entire Dublin Core development to date has been to build an abstract, lowest common denominator metadata content model that can be expressed concretely in a number of ways. Existing proposals are on the table for the concrete representation of Dublin Core metadata with HTML2.0/3.2 META elements [6], WHOIS++ document templates [7], USMARC records [8] and an SGML DTD [9]; adding an MCF based concrete representation of Dublin Core seems to be a natural and sensible step for Apple and MCF's supporters. It also forwards the goals of the Dublin Core supporters to get something out there.

By incorporating Dublin Core into MCF, Apple gain a number of advantages. Firstly, they can build on existing metadata that either is already in a Dublin Core concrete representation (such as Dublin Core metadata embedded within HTML documents) or can be translated into Dublin Core concepts via a mapping function (such as extracting the "core" out of a USMARC record). This means that the tedious and troublesome task of creating large amounts of metadata to "pump prime" the MCF format has effectively already been done for them; all they need to do is write a few tools that can grab this metadata, convert it into a Dublin Core format if necessary and then stuff it into the MCF files.
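The "grab" step for metadata embedded in HTML documents could look something like the following Python sketch. It assumes that Dublin Core elements are carried in META elements whose NAME begins with "DC." (for example DC.title), in the spirit of the HTML proposal [6]; that naming convention, and the names DublinCoreMeta and extract_dublin_core, are assumptions made for illustration rather than anything Apple has specified.

```python
from html.parser import HTMLParser


class DublinCoreMeta(HTMLParser):
    """Collects META elements whose NAME starts with "DC." (assumed convention)."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        if name.lower().startswith("dc.") and content is not None:
            # Strip the "DC." prefix; elements may repeat, so keep a list.
            element = name[3:].lower()
            self.elements.setdefault(element, []).append(content)


def extract_dublin_core(html):
    """Return a dict mapping each element name to its list of values."""
    parser = DublinCoreMeta()
    parser.feed(html)
    parser.close()
    return parser.elements
```

Sources that are not already in Dublin Core form, such as USMARC records, would need their own mapping function to arrive at the same kind of dictionary before anything is written to an MCF file.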

Once this is done, end user tools such as HotSauce and its successors can present the user with a much richer information environment. Rather than simply relying on the titles of objects to guide the user's selection whilst browsing, they could provide the user with detailed descriptions and abstracts or allow arbitrary groupings to be set up based on controlled subject vocabularies held within the Dublin Core. Resources that have had their MCF metadata extracted from, or extended by, information provided by quality review sites (such as the eLib funded SBIGs ADAM, EEVL, OMNI and SOSIG) might be given some visual highlighting to make them stand out from the rest of the resources at a site. The richer the metadata available, the richer the navigation experience can be made for the end users.
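As a rough illustration of the grouping idea, the Python sketch below gathers resource titles under each subject term they carry. The record layout (a dictionary with a "title" string and an optional "subject" list) and the function name group_by_subject are hypothetical; how a browsing tool would actually hold or display such groups is not something the current tools define.

```python
from collections import defaultdict


def group_by_subject(records):
    """Map each subject term to the titles of the resources that carry it.

    Records without any subject are filed under a placeholder heading.
    """
    groups = defaultdict(list)
    for record in records:
        subjects = record.get("subject") or ["(no subject)"]
        for subject in subjects:
            groups[subject].append(record["title"])
    return dict(groups)
```

A resource with several subject terms appears in several groups, which is what lets a browsing tool offer more than one route to the same object.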

The second advantage for Apple in incorporating Dublin Core into MCF is that they may be able to take advantage of the interest being shown in Dublin Core in general by the developers of the various Internet indexing and directory services. If authors can generate Dublin Core metadata that is embedded in their documents, and this can be easily turned into MCF files by the webmaster of the site that mounts the documents, then the robot indexers could grab a single MCF file that describes in detail all the resources available on the site, in much the same way as they currently check robots.txt for permission to index the server. In many respects this is what Martijn Koster [10] did some years ago with Aliweb [11], but with the added advantage that the metadata can be used for improving browsing navigation for end users as well as for running a search engine.

This provides benefits to all parties concerned. The authors get to describe their resources in detail in a way that they feel will appeal to their target audience. The webmaster of a site will only have to run a local DC-to-MCF converter at regular intervals in order to reduce the traffic due to robots. The indexing services in turn only have to pull a single file from sites carrying a DC enhanced MCF file, which will speed up their indexing operation. Apple benefits because MCF will be used more and so there will be more metadata for their tools such as HotSauce to process, making the tools more attractive to end users. Lastly, the Dublin Core community will benefit by having more commercial backing.

Conclusions

Apple's MCF is a relative newcomer to the metadata field. However, it is a promising commercial entrant into this arena and the option of having Dublin Core metadata elements held in MCF files appears to offer a number of advantages. Whether the three dimensional representation of metadata structures provided by tools like HotSauce will turn out to be more than a passing fad remains to be seen. It does open up interesting new possibilities in applying HCI research and, irrespective of whether HotSauce itself takes off or not, MCF looks like a format with a future.

References

[1] Metadata Content Format (MCF),
http://mcf.research.apple.com/

[2] HotSauce,
http://mcf.research.apple.com/hs/download.html

[3] eLib MCF fly-through,
http://www.ukoln.ac.uk/ROADS/MCF/elib.mcf

[4] ROADS eLib project Web site,
http://www.ukoln.ac.uk/roads/

[5] Dublin Core Element Set,
http://purl.org/metadata/dublin_core_elements

[6] HTML 2.0/3.2 META Elements,
http://www.ukoln.ac.uk/ariadne/issue5/metadata-masses/

[7] WHOIS++ template records,
ftp://ds.internic.net/internet-drafts/draft-ietf-asid-whois-schema-00.txt

[8] USMARC records,
http://lcweb.loc.gov/marc/dccross.html

[9] SGML DTD,
http://www.uic.edu/~cmsmcq/tech/metadata.syntax.html

[10] Martijn Koster,
mak@webcrawler.com

[11] Aliweb,
http://www.nexor.co.uk/public/aliweb/

Author Details

Jon Knight works on the ROADS eLib project at the University of Loughborough, UK
Email: jon@net.lut.ac.uk
Personal Web Page: http://www.roads.lut.ac.uk/People/jon.html