QMSearch: A Quality Metrics-aware Search Framework

aaron krowne; urvashi gadi

QMSearch: A Quality Metrics-aware Search Framework

Aaron Krowne and Urvashi Gadi present a framework which improves searching in the context of scholarly digital libraries by taking a 'quality metrics-aware' approach.

In this article we present a framework, QMSearch, which improves searching in the context of scholarly digital libraries by taking a 'quality metrics-aware' approach. This means the digital library deployer or end-user can customise how results are presented, including aspects of both ranking and organisation in general, based upon standard metadata attributes and quality indicators derived from the general library information environment. To achieve this, QMSearch is generalised across metadata fields, quality indicators, and user communities, by abstracting all of these notions and rendering them into one or more 'organisation specifications' which are used by the system to determine how to organise results. The system is being built as open source software on top of Apache Lucene, to afford sustainability as well as state-of-the art search engine capability. It is currently at the working prototype stage. Herein we chiefly motivate and explicate the model, architecture and development of QMSearch. We also give a summary of the first round of our focus group studies, upon which the development work is based.

While the field of information retrieval (IR) has made great strides in the past few decades, we have found in our work on the MetaScholar Initiative [1] that many problems remain unsolved, particularly in scholarly settings. Not only are general-purpose metasearch engines and scholarly databases not completely fulfilling to end users, but the IR field [2] has largely been dealing with a specific setting of metadata-poor, general-purpose documents queried by general-interest users. This setting, as it turns out, is far from comprehensive, and does not match with the scholarly environment.

The IR field has produced many useful metrics--such as precision and recall to evaluate search engine performance--but we have found in our investigations that these metrics fall short of addressing many clarity and usability issues scholars encounter in digital library metasearch systems. Further, we have come to question the model of having results as a simple linear list, with quantitative evaluation taking the form of objective functions on that list.

On the contrary, we find that metrics like precision and recall cannot tell the digital librarian how different types of metadata should be broken out, emphasised, faceted, filtered, hidden or revealed. Their usefulness is also biased towards determining the efficiency of known-item retrieval, rather than telling us much about user comprehension and general satisfaction with the metasearch system.

We have also noticed that little formal exploration has been done regarding the needs of different kinds of users (e.g. novice vs. expert, undergrad vs. grad student vs. professor vs. practitioner, etc.) or modes of usage (e.g., free exploration vs. known-item retrieval, dabbling vs. working in one's depth, etc.). These 'facetisations of user space' suggest that most retrieval evaluation work to date has been of limited scope relative to the real world.

The Scholarly Digital Library Setting

A key aspect of the data-rich environment of the digital library is that there is a large quantity of latent information which can (to varying extents) be used to make inferences about how desireable primary information resources are to users. For example, access logs convey information about usages (popularity), citations convey information about scholarly interest, inclusion (as in ''bookbags" or reserves lists) conveys information about pedagogy, and so forth. We call this quality information in general and quality indicators in specific.

Below we list some quality indicators (i.e., record or metadata attributes or aspects) which typically carry a significant level of discriminatory value for the retrieval task:

rating
vettedness (how thorough is review)
popularity (circulation/activations/views)
granularity (collection/archive vs. box/series, individual item)
has publication (yes/no, 1/0)
reference/citation linkage (cites, is-cited, or both)
selection (e.g for pedagogical purposes, or for aggregations/sequences by any user)
categorisation

Previous work [3] has shown that even when sparse, this kind of information can be collected and integrated into retrieval to yield significant additional utility. We believe it behooves digital libraries to begin to incorporate this information if they wish to add unique value to users in today's metasearch landscape.

The Quality Metrics Project

These insights have limited effect if digital libraries do not have tools to act upon them. In that spirit, we have undertaken in the Quality Metrics project to build a working prototype of a quality metrics-aware digital library metasearch system, called QMSearch. We plan to make this system openly available to the digital library community (and of course, any other parties who wish to deploy more flexible search functionality). We intend this paper to do double-duty; besides reporting, as a call for others in the research and practice communities to try QMSearch in their settings and help it to take shape.

Development Goals and Requirements

Contemporary search systems are generally based upon technical analysis of the search problem and assumptions about what users want, for an essentially general notion of documents and search inquiry. In our estimation, there is a dearth of information in the digital library and information retrieval field on the fundamentals of what users want and need out of search, particularly in a setting with rich metadata, heterogeneous objects, and extensive (though often latent) quality information. Thus, our primary goal with Quality Metrics is to undertake a fact-finding mission among scholarly users, to determine what their needs and expectations are in this setting.

Our second broad goal, and the focus of this report, is to build a working search system embodying the solutions to the above fundamental problems. Finally, the third goal is to test this system to determine how well we have solved the problems identified in scholarly metasearch systems. This last part, being led by Virginia Tech, will take the form of user studies with a more quantitative methodology, to be reported on in the near future.

In the remainder of this section, we flesh out some of the key requirements which have shaped the development of the QMSearch system, satisfying the second broad goal of the project.

System Requirements

The System Must Treat Organisation Beyond Ranking

There is more to retrieval presentation than just ranking, and QMSearch must be built to support some of these other kinds of organisation. The system must be able to create flexible groupings, delimitations, embellishments, and so forth, for returned objects of a heterogeneous nature.

This need was instrumental in inspiring the entire project, as we noticed in previous MetaScholar activities [4] that in our heterogeneous collections, users had little idea of what 'kinds' of things they were looking at. Such is the hazard of any metasearch system by definition, and in fact, many metasearch systems we surveyed made some attempt to address this problem. However, none did so comprehensively, in a way that was transferable to other settings, either in terms of models or systems or both [5].

The System Must Rank Based on Any Number of Indicators

One consequence of the specialised focus of the information retrieval field has been that it has remained wedded to a notion of ranking based upon content-query similarity, with occasional expansion to an additional metric, such as link network-based metrics (e.g., Google's 'PageRank' [6]). Yet, as discussed above, we know that much more information is available which conveys, or could be interpreted as conveying, some level of information about quality (or fitness). Thus, a major developmental goal of our project is to find some way to integrate this kind of information into the search system, without building a new one from scratch for every distinct deployment. Such instances will generally integrate different quality indicators and/or necessitate a different ranking formulas, but should not have to be 'hardcoded' from scratch each time.

The System Must be Modular

The system has to be modular, so that digital library developers can 'drop it in' to their existing settings. In general, we favour a componentised paradigm for digital library architecture (as in ODL [7] or OCKHAM-xform [8]), and we believe that sophisticated search functionality is a good example of a component that should be separated out.

Further, we find that in our experience, it is unlikely that a digital library system which excels at functions like harvesting and aggregating, browsing, or management, will also be likely to excel at search. Perhaps more importantly, we find that search functionality is especially dependent on scenario, in a way that extremely-standardised modalities like harvesting and (increasingly) repository storage are not (thanks to initiatives like OAI [9] and Fedora [10]).

In fact, we hope a major outcome of our work will be to standardise digital library metasearch, modelling it generally in a way that affords more capable core functionality and better interfacing with other digital library components, without the 'watering-down' effect typical of one-size-fits-all solutions.

The System Must Address the Digital Library Setting

We are a library, so we are naturally interested in solving the metasearch problem for our stakeholders. The digital library setting is much different from the general Web setting. While metasearch exists in both, characteristics of the users, inquiry scenarios, and data diverge significantly between the two. Table 1 breaks down some of the key distinctions. As will be made evident in a later section where we report on the focus group studies, all of these aspects have practical implications with regards to scholarly metasearch.

Table 1: Some differences between the Web and digital library settings, which are key for metasearch in the two realms.

Digital Library Setting	Web Setting
- rich metadata attributes - controlled metadata - a milieu of quality information - domain specialisation - scholarly community, subcommunities - open, academic	- general purpose - no particular target community - commercial (spam and ads) - poor or no metadata - metadata not uniform - obfuscation of ranking

Theoretical Model

In this section we describe our theoretical model which formally defines all of the conceptual structures and functionalities necessary to solve the core problems described above, as well as to better meet the search needs of digital library users in general.

Theoretical Goal

Our goal is to produce a model which will always allow us to present records in a way that is based solidly upon intuitive notions of their inherent quality or fitness, despite varying, missing or unreliable underlying quality indicator information. This guiding principle of resilience is a property existing ranking frameworks tend to lack.

The thrust of our solution is to achieve this kind of resilience by integrating into comprehensive quality metrics as many of these indicators as possible, and to do so without confusing or overwhelming either the digital library deployer or the end user. While some of this indicator information might not be present for some individual records, subcollections, or even entire digital libraries, complementary information from other indicators is still likely to be present. Thus, in most real-world circumstances, there should be considerable value to be added by the QMSearch framework.

5S Rendering of QMSearch

We couch our model in the language of 5S; an extant digital library modelling system which breaks DLs down into structures, streams, scenarios, spaces, and societies [11][12]. 5S represents these entities, and their relationships to each other, in a formalised manner. Below we sketch the presentation and scoring sub-models of QMSearch, including their interactions with each other and all of the 5S elements:

Streams - Search results can be modeled as record identifiers delivered in streams. Records themselves are streams of XML metadata, which may be returned as part of the results stream.
Structures - Search result streams have structure (linear ranked ordering, separation into logical bins, graph-theoretic connections, linkages to taxonomies, etc.).
Spaces - Spaces are used to render the structure onto search result streams. For example, in traditional IR, you have a stream of search results structured (ordered) linearly. This is a one-dimensional rank space, which is discretised from a continuous, one-dimensional rank space. The scorings are generated using another space, such as a vector space or probability space.

In our model, we are extending the results organisation space into two or more dimensions, and adding binning into the mix. As an example of how this might be instantiated, the first second dimension could be viewed as a separation into bins based on media type (image, video, text, sound, etc.). The second dimension, 'within' this first one, could be the usual text similarity (rendered as a linear list).

Attributes used to map objects to different dimensions and positions along these dimensions should be configurable, because one might want to use various metadata elements and quality indicators for organising along the desired number of dimensions.
Scenarios - In different scenarios (serving different information needs) user notions of value and interest change. Also, the expected mappings of attributes and values to the results organisation spaces dramatically changes. These different valuations and selections of attributes of interest should be considered by the system.
Societies - Different societies have different notions of value, which attach to the various content and collection attributes we consider. As above, different societies may have different typical notions of how the attributes should effect organisation of results into spaces. Societies also have different sets of frequently-engaged-upon scenarios, as above.

This all suggests that valuations and attributes of interest should facet for a given user based, firstly, upon the society or societies they belong to, secondly, upon their own unique characteristics, and finally, upon the specific scenarios they are engaged in as they interact with the search system.

Model Detail

The detailed theoretical modelling of QMSearch can be broken into two connected but conceptually distinct sub-components:

The presentation model - Determines how objects should be organised logically, given their attributes and valuations delivered by the scoring model.
The scoring model - Determines how quality indicators (either explicit metadata attributes or latent information which must be mined/inferred) are fused into scalar score values upon which the presentation model is built.

The presentation model in fact is closely-tied to much of what is typically described as 'visualisation' as well as 'presentation' or 'reporting'. Fundamentally, it deals with establishing the informational basis for all of these activities, all of which might be pursued as the end of the process of digital library searching. Thus the presentation model must solve both the 'dealing with overload' and 'dealing with heterogeneity' problems.

The scoring model actually extends beyond scoring to the gathering of information (quality indicators) which is necessary to perform scoring. Therefore it addresses both the 'heterogeneity' and 'sparsity' problems, as it deals with extracting and integrating (potentially latent) quality information.

The Presentation Model

In our model, we are abstracting the notion of a dimension of organisation separately from the notion of a quality indicator. A dimension is some aspect of organisation which relies upon a score (which is just a scalar value) to order and group items. This score may be made up of one or more indicators. Thus, an organisation of results may integrate more quality indicators than dimensions, with the additional indicators grouped together and mapped to the dimensions by way of the scoring function. This grouping provides much of the resilience described earlier.

In Figure 1, we give a diagrammatic illustration of these aspects of the presentation model. Shown are two presentation 'views' incorporating the same three underlying metadata attributes- 'vettedness' (our made-up term for degree of peer-review), domain (which we define as OAI repository of origin in this case), and query-content similarity (the usual search engine relevance metric). These attributes are hypothetical, but realistic.

In this figure, each square (or cube) should be thought of as a set of objects 'in the same bin', as determined by whether their keys fall into the appropriate range for the corresponding dimension. The axes point in the direction of increasing score. The top portion of the diagram illustrates a logical 'slicing' process, by which we can imagine zooming-in on the 'best' object in the assortment. The bottom portion displays 'mock-up' screen shots, illustrating a potential way to render the corresponding presentation model in a standard 2-D Web browser interface. This demonstrates the correspondence between dimensions and implicit 'screen axes', as well as the way indicators and metadata attributes play into this relationship.

diagram (81KB) : An illustration of the presentation model aspect of QMSearch. Two views are shown, both built upon the same three quality indicators (vettedness, content-query similarity, and OAI repository domain). However, the overall effect of the views are very different because of the different ways the underlying indicators are mapped to display axes and rendered into a final presentation.

Figure 1: An illustration of the presentation model aspect of QMSearch. Two views are shown, both built upon the same three quality indicators (vettedness, content-query similarity, and OAI repository domain). However, the overall effect of the views are very different because of the different ways the underlying indicators are mapped to display axes and rendered into a final presentation.

The two hypothetical views shown render these attributes in two different dimensionalities--the first view contains two dimensions, the second three (each logical dimension appears as an axis in this conceptual sketch). Note that in view 2, each indicator used corresponds to one axis. However, in view 1, vettedness and query-content similarity are 'squeezed' into a single axis. This is done with the help of a combination function which takes the two attributes and yields a single score.

When noting the 'presentational rendering' portion of the diagram, it becomes apparent why one might want to select a different number of axes for the same set of attributes: because these different dimensionalities support radically different presentations. View 1 has a natural rendering as a familiar 'tabbed' display, with the 'domain' axis corresponding to the tabs, and the vettedness + content-query similarity axis corresponding to vertical rank. Within each tab, we essentially have the same presentation model as Google, which fuses content-query similarity with inferred PageRank values for each object. However, this is only the vertical organisation: simultaneously we allow the user to switch between various 'horizontal' domains (Web, images, library holdings, books, etc.).

View 2 allows us to construct an 'A9-like' display of the results [13]. Here, three dimensions are all available, despite the fact that the display is only two-dimensional. This is achieved by mapping domain to columns, vettedness to vertical panels, then query-content similarity to vertical organisation within vettedness bins of the same value.

Perhaps the most important innovation of the presentation model is this packing of many (but arbitrary) indicators into a single logical dimension of organisation (and therefore presentational axis). This means the system deployer (or even the end-user) has the potential ability to select any and all indicators considered important for retrieval, and display them in as few or as many dimensions as necessary for clarity.

In sum, each display axis used by the presentation model is predicated upon the following components:

An underlying scoring function which generates a scalar value based on one or more metadata attributes. When two or more attributes are the inputs, we call this a combination function.
A function which sorts the scoring values along the axis, given the type of the value (integer, real, character, etc.). Such a function is generally obvious and natural given the score and underlying attributes.
A function that bins the values. This function is responsible for giving us the 'solid block' model, as opposed to a scatter-plot field of points corresponding to each object (as in an n-dimensional vector space). As shown in the presentation model diagram, such a function is critical for enabling presentational elements such as tabs or panels.

Note that the scoring function is the nexus between the scoring model and the presentation model. Given this model and the above functions, we can thus say the digital library deployer must specify (at minimum) the following items in preparing the presentation of a QMSearch system:

Number of display axes (equivalently, logical organisation dimensions).
The grouping of indicators to axes.
The scoring/combination function for each axis.
The sorting function for each axis.
A binning function for each axis.
The stylistic aspects of the presentational rendering.

The Scoring Model

The scoring model is critical for mapping quality indicators to display dimensions. The problem of scoring encompasses both the translation of explicit metadata fields into scalar scores, as well as inferring/extracting new indicators which can subsequently be translated into scores. An important part of what the scoring subsystem does is to gather sparse information and make it 'dense'.

In Table 2 we give some examples of indicators, whether they are typically explicit (metadata fields) or implicit (part of the general library information environment), where they originate, and the kind of scoring function one might expect to be built upon them to produce an indicator.

Table 2: Quality indicators and potential scoring functions, with attention to whether the attributes are based upon implicit or explicit data, and what the data source is.

indicator	type	data source	scoring function
rating	explicit	numeric ratings	AVG (ratings)
vettedness	explicit	peer review data / publication venue	count (reviewers) / trust (publisher)
citedness	implicit	citation links	Amsler, etc.
popularity	implicit	activation records	%age of views
granularity	explicit	containment data	1/0 (collection / item)
topical sim.	implicit	co- classification, activation / selection by users in same affinity group	sim(topic(query), topic(doc))

Whether a scoring function is based on explicit or implicit indicators actually depends on the situation, and is not universal. For example, in Table 2, 'popularity' was classified as an 'implicit' indicator. However, one could implement this indicator as a simple count of views of a record, thus making it quite explicit. In this case, one would be losing some fidelity, as one could not distinguish between kinds of activation. An a posteriori estimate of popularity might instead be based on sophisticated analysis of log data, which would be more of an implicit version of the indicator.

Scoring based on implicit indicators may require offline computation, but scoring based on explicit ones never does. This is because explicit indicators are encoded as actual metadata fields and are usable in scoring computations with minimal transformation. For example, a vettedness-based score requires no offline computation if peer-review data is stored in a relational database as links between review records, people, and objects, but offline computation would be required if this information incorporated some latent element. In this example, such a thing might be necessary if an impact factor of publication (journals) were used to estimate trustworthiness.

The scoring model addresses the central "quality metrics" nature of this project, because it has the ability to take latent and explicit digital library information and turn it into actualised scores which are representative of object quality. These scores can then be manifest to the user through the presentational system.

Related Work

Many aspects of our model closely resemble existing work done at Microsoft Research on the 'Data Cube'; a relatively recent relational operator developed to facilitate the comprehension and manipulation of multidimensional data objects [14]. We extend the data cube model here by delving into the aggregation functions (our combination functions) and rendering the selected objects into a presentation.

Thus, others have recognised that there is a need to better comprehend and present multifaceted data objects. In our case, these objects are metadata records in a digital library metasearch engine, as opposed relations in a DBMS.

Instantiating the Model

So far we have discussed how an organisation of results is made up of one or more logical dimensions (which map to display 'axes'). Each dimension has a scoring function, which maps one or more quality indicators to a single scalar value. This value is then used by a binning function to separate the items out into bins for that dimension.

At this point we supply the missing pieces of the puzzle. Firstly, all of the above elements must be defined and definable somehow. This is done through the device of the organisation specification (or "org spec" for brevity's sake). Secondly, we must account for the presence of many scenarios and societies. This is done through the definition of multiple org specs by the digital library deployer. Such multiple org specs select their dimensions and quality indicators (as well as how they are combined) based on the usual needs and values of the constituent societies of the digital library, as well as the usual scenarios of their members.

In Figure 2, some example org specs are shown, in the XML format we have defined. These specs come from the same QMSearch deployment and define two alternative "profiles" for viewing search results for the collection. We omit a complete schema for the org specs for brevity's sake (as well as the fact the format is still changing).

 <organization>
      <dim name="collection">
         <key>
               <metadata>oaiset</metadata> 
    </key>
         <binning type="natural" />

    <dim name="textsim">
               <key>
                    <metadata>score</metadata>
               </key>
               <binning type="trivial" />
         </dim>
     </dim>
</organization>

<organization> 
    <dim name="collection">
          <key>
                <metadata>oaiset</metadata>
          </key>
          <binning type="natural" />


          <dim name="views">
                <key> <metadata>views</metadata>
                </key>
                <binning type="fixed">4</binning>

 
                <dim name="textsim">
                      <key>
                             <metadata>score</metadata>
                      </key>
                      <binning type="trivial" />
                </dim>
          </dim>
     </dim>
</organization>

Figure 2: Sample org specs. Top: This org spec contains two dimensions of organisation. The outer is named "collection" and has the semantics of grouping items based on the value of their oaiset field ("natural" binning provides for one bin per key value, oaiset in this case). Within this is a dimension which ranks linearly based on the usual text similarity score ("trivial" binning places all of the results in a single bin). This org spec underlies the screenshot shown later in this paper. Bottom: This is a three-dimensional org spec. The outer dimension is the same, based on oaiset, as is the inner-most, based on text similarity. However in this case a middle dimension is added which partitions the records into bins based on their value for a views indicator, which is split into four ranges. The end result is a layout that could be visualised as lists inside table cells, as opposed to inside columns. The table cells would correspond to a (collection, view range) pair.

Another important aspect of results presentation is the notion of filtering. This is the act of narrowing-down a results set (or other collection) based on attributes of items in the set. Typically this means requiring that certain metadata fields have a certain value or fall within a certain range, and is very familiar to users through "advanced search" interfaces.

In our system, we provide the ability to do this kind of filtering within an org spec through use of a <filter> tag (not pictured in the examples). Within this tag, arbitrary Boolean clauses based on metadata fields can combined into a filter for the results set. The processing of these filters is independent of the core retrieval and quality metrics organisation functionalities. Since there is conceptually nothing new here, we will not delve further into the topic.

Finally, the results returned by the QMSearch system must be formally specified in some way. This output must reflect the organisation for the results as defined by the org spec given at the time of the search. We have made this provision by defining another XML format for the output stream, which mirrors the input org spec by having the same dimension structure. On top of this, it adds bins to separate out results within the dimensions, and of course the records themselves (along with their metadata/indicator fields). An example of this output format, based on real-world results, is shown in Figure 3.

 <qm_search_output> 
  <query user="black life">
    <term>black</term>
    <term>life</term> 
  </query>

<dim name="collection"> 
  <bin ord="0" value="Florida Center for Library Automation - Florida 
 Environment Online" count="2" isNull="0">

    <dim name="textsim">
     <bin ord="0" count="2">

       <item id="oai:harvester.americansouth.org:record/25216">
        <score dim="collection">Florida Center for Library
        Automation - Florida Environment Online</score>
        <score dim="textsim">0.43918777</score>
 
        <metadata>
          <datestamp>2006-03-10T17:40:30Z</datestamp>
          <identifier> oai:harvester.americansouth.org:record/25216</identifier> 
          <url>
          http://www.americansouth.org/viewrecord.php?id=25216</url> 
          <title>The widow spiders of Florida [electronic
          resource] / John D. McCrone, Karl J. Stone.</title>

                          ...
         </metadata>
        </item>

               ...

     </bin>
    </dim>
   </bin>

   <bin ord="1" value="Florida Center for Library Automation - Florida 
    Heritage Collection" count="2" isNull="0">
     
       <dim name="textsim">
         <bin ord="0" count="2">
        
          <item id="oai:harvester.americansouth.org:record/30329">
           <score dim="collection">Florida Center for Library
            Automation - Florida Heritage Collection</score>
            <score dim="textsim">0.47943082</score>

            <metadata>
             <url>
              http://www.americansouth.org/viewrecord.php?id=30329</url>

              <title>Twelve Black Floridians, by Leedell W.
              Neyland.</title>

              ...
            </metadata>
          </item>

               ...

         </bin>
      </dim>

    </bin>

    ...
    
   </bin>
 </dim>
</qm_search_output>

Figure 3: A sample of QMSearch output. This output fragment (compressed for comprehensibility) corresponds to the collection/repository column-based profile which is shown rendered in the screenshot Figure 7.

All of the conceptual and organisational structures of QMSearch are shown in the diagrams of Figure 4. These diagrams show both the containment relationships as well as the cardinality of the elements involved in the system.

Other details, such as where latent quality indicators come from, will be discussed later in the Architecture section.

diagram (36KB) : Diagrams of key theoretical constructs of QMSearch. Left: Org specs as profiles making up a QMSearch deployment. Right: The entities that make up a result set returned by QMSearch. Right: The entities that make up a result set returned by QMSearch. Both: Arrows mean contains. Quantifiers +, *, and ? mean 'one or more', 'zero or more', and 'zero or one', respectively. The tag d connects the two diagrams, and represents the number of logical dimensions of the results set

Figure 4: Diagrams of key theoretical constructs of QMSearch. Left: Org specs as profiles making up a QMSearch deployment. Right: The entities that make up a result set returned by QMSearch. Both: Arrows mean "contains." Quantifiers +, *, and ? mean 'one or more', 'zero or more', and 'zero or one', respectively. The tag d connects the two diagrams, and represents the number of logical dimensions of the results set.

Results of Focus Groups

This section consists of an overview of our findings in the first phase of the focus group investigations. The focus groups were held at Emory University, during the fall of 2005. Nine of these introductory focus groups were held in total, consisting of graduate students and faculty members from Emory humanities and sciences programs (as well as a few staffers from the library). The participants were focused towards mock-ups illustrating the proposed QMSearch system, as well as the facilitators' questions and prompting about the participant's experiences and the underlying concepts.

There are a number of reasons we opted to use focus groups for the initial, and bulk of our investigation. We find that focus groups are very good when there is a general, but not definite idea of how to solve a problem or address user needs. Having this initial idea gives some kernel that can be commented-upon by participants, who (along with the moderator) are able to give feedback based on their own experiences, opinions, needs, and situations, usually coming to some understanding of differences and similarities. This property makes focus groups very good for requirements elicitation and participatory design. Focus groups are also economical, as a single moderator (potentially with supporting note-takers) can 'cover' many users with each session. Rather than spending 27 hours to to interview the same number of users, focus groups (averaging three participants per group) allow us to do something similar in only nine hours.

Research Hypotheses

Before delving into the findings, it is necessary to provide some perspective by making the research hypotheses explicit. These are basically that:

The digital library setting is different from the general Web setting because it has richer metadata, more focused purpose (e.g. by discipline or topic), and carries more information about a particular scholarly community.
However, metasearch is still useful (and highly-demanded) as a paradigm in this context.
Attributes of digital library information, either explicit (encoded in metadata) or implicit (needing to be extracted by analysis/data mining) convey information about the quality or fitness of resources with respect to each inquiry
This quality information can be used to better-organise search results to make the search-centric inquiry process more efficient and fulfilling for users.
Notions of quality will be subjective, varying from DL to DL, subcommunity to subcommunity, individual to individual, and even inquiry to inquiry by the same individual.
Despite this apparent sophistication, a metasearch system could be built that affords the utility described above by exposing and allowing manipulation (either by digital librarians or end users) of quality indicators and their bearing on results organisation (i.e., customisation of quality metrics in metasearch).

This progressive chain of hypotheses, informed by our general experience, intuition and past studies under the MetaScholar initiative, guided our development of the theoretical model and our investigation for the initial round of focus group interviews.

Major Focus Group Findings

The focus groups were very successful in the sense of confirming almost all of the research hypotheses above. What mostly remains to be seen is how useful the QMSearch system is to users. This is the role of subsequent focus groups and the user studies we are initiating at Virginia Tech.

The key aspects of our hypothesis that the focus groups confirmed were:

Metasearch is seen as a useful, if not the most salient paradigm, but existing Web/metasearch solutions fall short for comprehensive scholarly inquiry. This manifested as lack of comprehensiveness, poor/opaque ranking, and fragmentation, among other complaints.
Metrics that involve some sort of value judgment in their constituent indicators are seen as useful, but must be balanced with alternatives and given in a fully transparent way. This transparency is perhaps the most-emphasised point we heard, and is a major failing of existing offerings across the board.
Metadata is seen as crucial for faceting the results display. This is a major 'win' over the generic/Web metasearch scenario, in that scholars actually desire to make use of rich metadata.
A wide variety of indicators are seen as useful but in different ways by different people. Similarly, different interfaces are liked and disliked by various individuals.
Users do desire the ability to manipulate how ranking/organisation are done, seeing this in basically the same light as 'advanced search' is currently.

Other major findings that we did not anticipate but which did not surprise us were:

Users want feedback about availability of library items (e.g. full text or not, at a subscribing library or not, ability to print or not, etc.) Searchers want to seem simplest interfaces first, but still have the ability to move to more advanced manipulations progressively. Most scholars use both basic and advanced searches.
Most scholars are actually using resources from all sorts of 'realms', including the Web, the library, professional organisations, and the mainstream press. Yet, scholarly metasearch portals rarely draw upon all of these realms.
There is widespread use of commercial/general metasearch systems like Google, Google Scholar, Amazon A9, Amazon's book search, and so forth. Sometimes these are used in unexpected ways, such as to explore topics in which the scholar is not an expert, to look up general "encyclopedic" information, or to retrieve full text copies of items already located in academic search engines.
Scholars are even using ratings and reviews from the general public, particularly younger scholars.
Terminology for metadata fields, quality indicators, and display facets can be very troublesome.

Many of the findings boil town to the insight that scholars do not want to be told what is good, but rather to be given transparent tools to apply their own nuanced and inquiry-specific notions of what 'good' is. This capability goes far beyond the metasearch systems that are available today, and seems to counter the commercial-sector wisdom that the search system should simply provide results without any details of how they were attained.

Other Findings

In addition to confirming much of what we already expected, the studies provided rich serendipitous insights. We began integrating many of these in subsequent focus groups as we progressed, as well as in the design and architecture of QMSearch, which was being continuously refined.

Some of the findings are likely to remain outside of the scope of this short project, though we certainly hope to follow up on them with other grant projects or ongoing library systems engineering processes.

Some key unexpected findings were:

Relational information is very much in demand; e.g. to reviews, to categories, to and from citations, to different editions, and so forth. This kind of network information allows navigation that seems to be closely interspersed with search in the wider inquiry process.
There is a very strong desire to see 'under the hood' of the search engine in order to know how ranking is done and what it means, even if manipulation of ranking is not intended. In other words, knowing how ranking is done is a trust issue in addition to a functional issue.
We learned of many interesting indicators we didn't think of (e.g., religious vs. non-religious provenance, presence or absence of fields such as abstract, etc.)
A number of participants explicitly requested having more social feedback in the search system, seeing scholarly inquiry as more of a collaborative undertaking. Social information is generally seen as useful but subject to the subjectivity concerns above.

Other interesting unexpected findings:

Scholars are sometimes very averse to 'popularity'-based ranking. To exemplify, one told us he/she even goes to Altavista sometimes instead of Google because Altavista is based purely upon text similarity.
There was not much complaining about ads or spam. It is unknown whether this doesn't bother scholars or whether they are simply so used to the phenomenon that they do not think to mention it (however we did not really bring up this issue).

A final noteworthy finding is that scholars are very interested in the prospects of metasearch as applied to archival research. Many of them echoed a desire to integrate more archival information, even if it was not in the form of completely digitised records.

Discussion

The above findings do seem to suggest we are on the right track with regards to core design. That is, the theoretical model described earlier is basically correct. However, the findings also hint at some areas where modelling, design and implementation needs to be expanded. These are tackled later in the section on "Future Work."

Architecture

In this section we discuss the implementation of a system based on the above theoretical model.

Software Basis

By the start of the Quality Metrics project in 2005, it had long been the case that there were a number of free, open-source search engine systems or digital library systems that included some search facility. This gave us the opportunity to focus Quality Metrics on solving higher-level metasearch problems and to avoid re-inventing core search engine techniques long-since established. Thus, we resolved not to produce a search engine from scratch and to instead leverage some existing system. While none of the ones in our initial survey met the software development goals of Quality Metrics, many had very good search functionality [15].

The system we chose as the basis of our implementation was the Lucene search engine system, a part of the Apache Project. Besides Lucene's key qualities of having an extensive query syntax (encompassing both ranked keyword and Boolean search, as well as proximity search and much more) and its ability to store and search data in a fielded manner, there are a number of other attractive features of the Lucene project:

The system is well architected and coded.
It is modular in design (not coupled to any particular information system).
It is based on Java standards.
Lucene has high 'market share' and user recognition in the open source search engine market.
The project has strong developer and user communities.
Development is very active, while simultaneously capabilities are full.

These non-functional qualities of the system and the project itself improve the odds that our work will remain useful long after the end of the initial grant project phase. In particular, the open source nature of Lucene combined with the strong developer community means that future deployers of our work will be able to tweak QMSearch, potentially with the help of Lucene developers, if and when we do not have the resources to help them. The Lucene community also may be able to support aspects of deploying the system (aside from customising it). It is also likely some members of this informal community will take interest in QMSearch, when it is released.

Document Model

In Lucene, an index is made up of documents. As mentioned above, Lucene has a fielded conception of a document. This is ideal for our purposes in the digital library setting, because we can map metadata records to documents, and their elements to fields. A schematic diagram of a Lucene document is shown in Figure 5, along with a comparison of the implicit model of metadata encoding to the explicit document model of the Lucene index. Correspondences between the entities are indicated with dotted arrows.

diagram (32KB) : The Lucene Document class (left) and logical comparison to the data model of metadata in general (right). The mapping of entities between the two models, as in our framework, is indicated by dotted arrows. Solid arrows mean contains, and the arrow labels +, *, and ? stand for 'one or more', 'zero or more', and 'zero or one', respectively.

Figure 5: The Lucene Document class (left) and logical comparison to the data model of metadata in general (right). The mapping of entities between the two models, as in our framework, is indicated by dotted arrows. Solid arrows mean "contains", and the arrow labels +, *, and ? stand for 'one or more', 'zero or more', and 'zero or one', respectively.

Implicit in this mapping is a 'flattening' of metadata. That is, fields in Lucene documents cannot contain sub-fields. While this may seem a major problem, it actually is not. The first reason is that popular metadata encodings, such as Dublin Core, and the actual usage of other forms which allow nested elements, end up being flat. More importantly, metadata with complex structure is probably overkill in terms of what is exposed in a "standard" search interface. It is already rare enough that users use fields; we do not expect much demand for sophisticated querying of sub-fields [16].

Thus, Lucene's simple fielded document model gives us most of the capability we are looking for without much complexity. Lucene allows us to leverage the query model users are already accustomed to, without any addition development by us.

QMSearch Architecture

Since Lucene is so complete, we have needed no modifications to its internals in order to achieve the basic, core functionality of QMSearch. Instead, we have added a scoring layer which post-processes the Lucene search, and an analysis layer, which pre-processes the Lucene index (i.e. the underlying data which is being searched). There is also a presentation layer we have built with XSLT, however, it is only loosely-coupled to the whole system and could be replaced with nearly any kind of presentation component (discussed in detail later). This overall architecture of the system is shown in Figure 6.

diagram (64KB) : QMSearch system architecture

Figure 6: QMSearch system architecture.

The components of the system fit together as follows:

Harvesting or other forms of data collection are done, pulling together the information to be indexed. The DL deployer must provide the harvester, but there are scores of them available off the shelf [17].
The deployer provides a Lucene indexer, which normalises the metadata and loads it into the Lucene index (rendered as Lucene Documents). Writing an indexer already must be done in a Lucene deployment, so this is not an additional requirement of QMSearch.
Zero or more QMSearch analyzers (also provided by the deployer) are run on the index, adding indicators to the documents. These indicators are simply additional fields containing latent, data-mined information (which can be drawn from auxiliary data sources).
Core search functionality is performed by Lucene, on the resulting index. As far as Lucene is concerned, the data it is searching is no different than usual (it fits the same schema, discussed in Figure 5).
Search results go through the QMSearch scoring layer, which calculates zero or more quality metric scores based on metadata fields and indicators. The plan for how to do this is given in the organisation specification, discussed later. The scoring layer outputs the results in a structured XML form (also detailed later).
The presentation layer takes the structured XML and renders it as a user interface (our testbed does this with an XSLT and HTML + CSS + Javascript system). This layer also handles interface manipulations as well as dispatching modified queries and org specs to the search system core (Step 4 above).

The QMSearch system components are discussed in detail in the following sections.

Analysis

The analysis subsystem is actually quite simple. It is run after the indexing phase, and simply loops through the supplied analyzer modules and executes them. Each module reads the Lucene index, plus potentially any number of auxiliary data files supplied by the user for that particular analysis task, and distills this data into one or more quality indicator attributes that are added to (or updated in) the records.

The real complexity in the analysis subsystem is wrapped into each analyzer. They may do extremely simple tasks (such as reading a field from an external database and adding it to each record) or extremely sophisticated ones involving intensive data mining or computation. {C}{C}{C}

Scoring

The scoring system structures the results based on the org spec supplied with the query. The first step of this task is to go through all of the results returned by the Lucene search core and produce as many scores for each result as there are dimensions in the org spec. This is necessary because, as discussed in the theory section, each dimension of organisation necessitates a score which allows organisation along that dimension.

The second step is to then use these additional scores, along with the binning specification of each dimension, to hierarchically facet the results. For instance, if the score corresponding to the outer dimension combined with its binning results in three bins, then each of of those bins is recursively descended into and processed with respect to the remaining nested dimensions. A typical second dimension would be a linear list of results organised by text similarity. This dimension would then consist of only a single bin (the trivial bin), and there would be no recursion to further nested dimensions.

As discussed in the theory section, this process generates what is essentially a hierarchical facetisation of the output (or forest of facetisation trees). This is encoded as an XML format that looks something like the sketch given in Figure 3. The result set is also returned with supplementary information about the query and other attributes of the results (such as matches that weren't returned due to size limits) which is useful for contextualising and manipulating the output.

Presentation

Since the search output is delivered as a rigorously-defined, structured XML stream, any system which can read and interpret XML can use the data or construct a user interface around it. Due to the proliferation of XML tools and libraries, this supports very high portability of the system.

In our testbed deployment (which also constitutes the default for the distribution), we use XSL (along with CSS and Javascript) to turn our results sets into Web-based results displays and full-fledged functioning user interfaces.

In fact, our testbed search system essentially lacks a Web application server, despite being Web-based. The only module we have in this role simply takes the search parameters as CGI and passes them on to the QMSearch. It then returns the result as XML directly to the browser. The XML file is linked declaratively to the corresponding XSL file, so modern browsers will retrieve this file as well and apply the XSL transformation on the client-side. The end result is that the user sees a Web page (complete with interactive Javascript trimmings) instead of the raw XML file.

There are variants of this setup that still use the XSL-based interface. One attractive variant would be to use a stylesheet application server like Cocoon to do the XSL transformations server-side. A benefit of this would be to noticeably speed up the round-trip time of the request, as a separate fetch of the XSL file would not have to be done by the client. It is also likely that the XSL transformation can be applied faster due to a more powerful server and its ability to cache objects between requests.

Progress

Current Status

Currently we consider QMSearch to be approaching a "0.75" release. It supports full Lucene query syntax, org specs with unlimited nested and parallel dimensions, dimensions keyed on values that can be straight or weighted averages of fields, arbitrary Boolean filtering based on fields, and arbitrary analyzer modules. The current release comes with XSL templates to create interfaces and display results based on four testbed org specs [18]. A number of simple analysers are included, which we are actually using. These analyzers do little more than load and normalise data gathered from auxiliary databases, however.

Figure 7 shows a screenshot of one org spec and output template on our testbed system as of this writing. The content is based on the American South collection, which is a harvested collection based on various archives on the topic of the history and culture of the American South.

Figure 7: A screenshot of the QMSearch prototype system, as of March 2006. In this shot, the user has selected our 'M9' profile (org spec + XSL template). The collection is populated with content from the American South digital library.

Many features are visible in this screenshot. We consider it a 'two-dimensional' display because there are two logical dimensions which are encoded in the org spec: collection/archive (horizontal columns), and text similarity (linear lists of items within the columns). The overall effect is similar to Amazon's A9 [19], which arranges results for a query from many different sources in separate columns.

Also like A9, we have constructed a client-side interface so that the user can easily expand and collapse columns they are interested in (magnifying glass, plus and minus boxes). In the current settings, four columns (the ones with the most results) default to expanded. All remaining columns (four in this case) default to collapsed. A 'sticky' note floats over them to inform the user they can be expanded.

Certain features cannot be shown with a static screenshot, aside from the manipulation of these controls. In many places, mouse-over hints are present, explaining metadata fields, headings, and controls. For example, a mouse-over of the "+" or magnifying glass of a collapsed column gives the name of the corresponding collection.

Also not visible is the "more..." link which gives a pop-up of all the results in a column/collection when more than five are present. This allows access to the full results set without overly cluttering the display.

Finally, visible at the top are four thumbnails, serving as exemplars for our four search profiles. By clicking on the thumbnails, users can switch views fluidly.

Next Steps

While we will release the software to interested parties now, we will not do a general release until 1.0. Aside from numerous minor enhancements, the key feature on the "roadmap" to 1.0 is an organisation specification editor to serve as an "advanced search" for users and an profile editor for DL deployers. Such functionality is extremely important in providing the ease of use of deployment we hope to attain, as well as the full search flexibility the framework can logically deliver. To achieve this, we are exploring the application of the schema-driven metadata editor developed at the Virginia Tech Digital Library Research Lab [20].

We are also in the process of conducting a second round of focus groups with the current working prototype (with the American South and CITIDEL CS collections). At the same time, Virginia Tech is initiating its quantitative user studies of the prototype. We will be reporting on these activities shortly.

Research Issues and Future Work

The above findings do seem to suggest we are on the right track with regards to core design. That is, our model of quality indicators and metadata attributes being treated interchangeably; both being source-able for ranking or facetisation of presentation in the form of dimensions and bins, seems to be at least a major component of any successful solution.

However, the findings also suggest the following areas where modelling, design and implementation needs to be expanded:

Relational information. Relational information is inadequately integrated and interconnected in the current model. Links of primary records to collections, categories, ratings, and alternate versions can be established with the current framework, but only with a lot of a posteriori implementation leg-work by the DL deployer. To make this process easier, infrastructural provisions would need to be made, likely including some modelling of link graphs in Lucene's index layer, and some exposure of this information to query syntax.
Transparency. Similarly, there is no methodical way in our framework to date to have the inner workings of the ranking exposed. Of course, the DL deployer can go as far as adding interface-level prose to do this, but once again this involves 'extra' work on their part. We could begin to 'mechanise' the transparency process by provisions like 'rendering' organisation specifications based on some transformation template (much like how the HTML output of search results is presently created). Thus, any org spec the DL deployer can create, we can in theory provide an automated facility to "explain" it to users. However, the specifics of how to do this so end users would actually understand the org specs would likely require further research with scholarly patrons.
Manipulation. Our initial design has placed the emphasis for building organisation specs on the DL deployer. The end users would then be set to have one or more of these specs as the defaults for their search profile, with the ability to switch between them. However, it became clear in the course of the user studies that scholars were more willing and desirous to have lower-level manipulation of results presentation. This is why we think the 'advanced search' functionality discussed above in the context of an org spec editor is needed. The specifics of how to expose this functionality to users will be nontrivial, due to a usability vs. capability trade-off. In addition, even if such an editor proves useful, a new problem is created of transforming the output of searches based on ad hoc org specs, as the presentation-layer stylesheets are currently static and bound to particular org specs. This issue may not be solved within the duration of the present grant project.

In addition, our work to date represents only the beginning of maximising the ranking capabilities of the search system. With the basic QMSearch framework in place, there is a great deal of potential to leverage sophisticated analysis and data mining engines and automated ranking frameworks to strengthen quality information and improve how it is used.

For example, in [21], work was done which combined many sources of evidence, based on different but interconnected objects, into a unified score. Such a 'link fusion' system could potentially be used to provide a summary indicator value for objects based on an unlimited number and typology of interlinks. Work [22][23] was done showing that scoring/ranking based on mixtures of indicators could be optimised with genetic algorithms. Such a technique is very fitting for the QMSearch model. Finally, a system [24] was constructed to gather and fuse sparse quality information into usable rankings. Such a system could potentially make for a very useful QMSearch analyser.

Eventually we can envision the use of a tool such as 5SGraph [25] to configure a QMSearch system graphically using a quality metrics 5S metamodel. Once this formal model is encoded, the system could be instantiated using a tool such as DLGen [26].

Conclusion

In this paper we have introduced QMSearch, a system which re-contextualises metasearch by robustly integrating "quality metrics" for heterogeneous digital library objects. This system has been founded upon focus group user studies, which we used to develop our model and specific ideas for its implementation. We are currently in the process of testing our working prototype of QMSearch, both in additional focus groups and in quantitative user studies.

The key innovations of QMSearch are:

accommodate and provide a framework for integrating (making explicit) implicit quality indicators which are available in the library information landscape,
to provide for alternative, multifaceted scoring/ranking metrics based on any number of these indicators or explicit metadata attributes, and
to provide results based on these metrics in a hierarchical XML format representing the requested organisation, beyond simple linear ordering, and to allow this output to be templatised into a user interface as a separate step.

The upshot of these advances is to better accommodate the scholarly digital library setting, by fostering flexibility, transparency, and comprehensibility for the end user, as well as superior information integration and modularity for the digital librarian.

While a fully featured system that 'completely' addresses all of the issues in this problem space is likely not in the cards for the scope of this short, largely-investigative project, we believe we have made a great deal of progress in advancing digital library metasearch. We believe QMSearch will allow digital libraries to better fulfil user needs and expectations, providing a strong contender to more opaque and less-flexible search engine components in the near future.

References

The MetaScholar Initiative http://www.metascholar.org/
As epitomised for example by TREC and SIGIR.
For example: Seonho Kim, Uma Murthy, Kapil Ahuja, Sandi Vasile, and Edward A. Fox. Effectiveness of implicit rating data on characterizing users in complex information systems. In European Conference on Digital Libraries, 2005.
For example, MetaArchive, AmericanSouth, and MetaCombine.
Aaron Krowne, Martin Halbert, Urvashi Gadi, and Edward A. Fox. Quality metrics interim report 1. Technical report, Emory University Woodruff Library, July 2005.
Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7):107-117, 1998.
Hussein Suleman. Open Digital Libraries. PhD thesis, Virginia Tech, November 2002.
Aaron Krowne. A draft standard for an ockham-based oai transformation service framework (ockham-xform). Technical report, Emory University Woodruff Library, August 2005.
Carl Lagoze and Herbert Van de Sompel. The open archives initiative: Building a low-barrier interoperability framework. In JCDL, June 2001.
Thornton Staples, Ross Wayland, and Sandra Payette. The fedora project: An open-source digital object repository management system. D-Lib Magazine, April 2003.
Marcos Andre; Goncalves, Edward A. Fox, Layne T. Watson, and Neill A. Kipp. Streams, structures, spaces, scenarios, societies (5s): A formal model for digital libraries. ACM Trans. Inf. Syst., 22(2):270-312, 2004.
Marcos Goncalves and Edward A. Fox. 5SL: A language for declarative specification and generation of digital libraries. In Proceedings of JCDL 2002, June 2002.
See A9.com home page http://www.a9.com/
Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Discov., 1(1):29-53, 1997.
Aaron Krowne, Martin Halbert, Urvashi Gadi, and Edward A. Fox. Quality metrics interim report 1. Technical report, Emory University Woodruff Library, July 2005.
Whether system developers can use this is another matter, which is why we believe there is an important niche for XML databases with XPath/XQuery (or similar) searching.
For Open Archives, see http://www.openarchives.org/tools/tools.html
Visible at http://metacluster.library.emory.edu/quality_metrics/
See http://a9.com/
See http://oai.dlib.vt.edu/odl/software/mdedit/
Wensi Xi, Benyu Zhang, Yizhou Lu, Zheng Chen, Shuicheng Yan, Huajun Zeng, Wei-Ying Ma, and Edward A. Fox. Link fusion: A unified link analysis framework for multi-type interrelated data objects. In The Thirteenth World Wide Web conference, 2004.
Weiguo Fan, Michael D. Gordon, Praveen Pathak, Wensi Xi, and Edward A. Fox. Ranking function optimization for effective web search by genetic programming: An empirical study. In HICSS, 2004.
Martin Utesch. Genetic query optimization in database systems. In Postgresql 6.3 Documentation. 1997.
Seonho Kim, Uma Murthy, Kapil Ahuja, Sandi Vasile, and Edward A. Fox. Effectiveness of implicit rating data on characterizing users in complex information systems. In European Conference on Digital Libraries, 2005
Qinwei Zhu. 5SGraph: A modeling tool for digital libraries. Master's thesis, Virginia Tech, 2002.
Rohit Kelapure. Scenario-based generation of digital library services. Master's thesis, Virginia Tech, June 2003.

Author Details

Aaron Krowne
Woodruff Library
Emory University

Email: akrowne@emory.edu
Web site: http://web.library.emory.edu/

Urvashi Gadi
Woodruff Library
Emory University

Email: ugadi@emory.edu
Web site: http://web.library.emory.edu/

Return to top