A thesaurus is a reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which contains definitions and pronunciations. In information science, library science, and information technology, specialized thesauri are designed for information retrieval. They are a type of controlled vocabulary used for indexing or tagging. Such a thesaurus can be used as the basis of an index for online material. The Art and Architecture Thesaurus, for example, is used to index the Canadian [...]. Information retrieval thesauri are formally organized so that existing relationships between concepts are made explicit. As a result, they are more complex than simpler controlled vocabularies such as authority lists and synonym rings. Each term is placed in context, allowing a user to distinguish between "bureau" the office and "bureau" the furniture. Following international standards, they are generally arranged hierarchically by themes, topics or facets. Unlike a literary thesaurus, these specialized thesauri typically focus on one discipline, subject or field of study. In information technology, a thesaurus represents a database or list of semantically orthogonal topical search keys. In the field of artificial intelligence, a thesaurus may sometimes be referred to as an ontology. (Excerpt from <a href="http://en.wikipedia.org/wiki/Thesaurus">Wikipedia article: Thesaurus</a>)
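
The way an information retrieval thesaurus makes relationships explicit, and separates "bureau" the office from "bureau" the furniture, can be sketched as a small data structure. This is a minimal illustration, not any particular thesaurus product: the class names and sample terms are hypothetical, and the BT (broader term), NT (narrower term) and UF (used for) labels follow the conventions common in thesaurus standards.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus concept, identified by its qualified preferred term."""
    preferred: str
    broader: set[str] = field(default_factory=set)        # BT: broader terms
    narrower: set[str] = field(default_factory=set)       # NT: narrower terms
    non_preferred: set[str] = field(default_factory=set)  # UF: entry terms pointing here


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}

    def add(self, term: str, uf: tuple[str, ...] = ()) -> Concept:
        """Create (or fetch) a concept and record any non-preferred entry terms."""
        concept = self.concepts.setdefault(term, Concept(term))
        concept.non_preferred.update(uf)
        return concept

    def add_broader(self, narrower: str, broader: str) -> None:
        # Hierarchical links are reciprocal: BT on one side, NT on the other.
        self.add(narrower).broader.add(broader)
        self.add(broader).narrower.add(narrower)

    def lookup(self, word: str) -> list[str]:
        """Return every preferred term whose base label or entry term matches `word`."""
        word = word.lower()
        return sorted(
            c.preferred
            for c in self.concepts.values()
            # Strip a trailing qualifier such as " (furniture)" before comparing.
            if c.preferred.lower().split(" (")[0] == word
            or word in {u.lower() for u in c.non_preferred}
        )


# Hypothetical sample data.
t = Thesaurus()
t.add("bureau (office)", uf=("government office",))
t.add("bureau (furniture)", uf=("writing desk",))
t.add_broader("bureau (office)", "organizations")
t.add_broader("bureau (furniture)", "furniture")
print(t.lookup("bureau"))  # ['bureau (furniture)', 'bureau (office)']
```

A lookup on the ambiguous word returns both qualified concepts, and each one carries its own hierarchy, which is what lets a user pick the intended sense.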


Theora is a free lossy video compression format. It is developed by the Xiph.Org Foundation and distributed without licensing fees alongside their other free and open media projects, including the Vorbis audio format and the Ogg container. (Excerpt from <a href="http://en.wikipedia.org/wiki/Theora">Wikipedia article: Theora</a>)


The Getty Thesaurus of Geographic Names (abbreviated TGN or GTGN) is a product of the J. Paul Getty Trust included in the Getty Vocabulary Program. The TGN includes names and associated information about places. Places in TGN include administrative political entities (e.g., cities, nations) and physical features (e.g., mountains, rivers). Both current and historical places are included, along with other information related to history, population, culture, art and architecture. The resource is available through a private license to museums, art libraries, archives, visual resource collection catalogers and bibliographic projects, and is available free of charge to the general public on the Getty Vocabulary website. (Excerpt from <a href="http://en.wikipedia.org/wiki/Getty_Thesaurus_of_Geographic_Names">Wikipedia article: Thesaurus of Geographic Names</a>)

text mining

Text mining, sometimes referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities). (Excerpt from <a href="http://en.wikipedia.org/wiki/Text_mining">Wikipedia article: Text mining</a>)
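
The first two stages named above, structuring the input text and then deriving patterns from the structured data, can be illustrated with a deliberately tiny pipeline. This sketch assumes a toy stop-word list and uses document frequency as the "pattern"; real text mining systems use far richer linguistic features and statistical models.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real systems use much larger ones).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "for", "on"}


def structure(text: str) -> list[str]:
    """Stage 1: turn raw text into lowercase tokens, dropping stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def document_frequency(docs: list[str]) -> Counter:
    """Stage 2: a simple pattern, namely how many documents mention each term."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(structure(doc)))  # count each term once per document
    return df


# Hypothetical product reviews.
docs = [
    "The battery of the tablet drains quickly.",
    "Battery life is poor on this tablet.",
    "The screen is bright and the battery lasts.",
]
df = document_frequency(docs)
print(df.most_common(2))  # [('battery', 3), ('tablet', 2)]
```

The third stage, evaluation and interpretation, is left to the analyst here: a term that recurs across many reviews is a candidate trend worth inspecting.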


Tesseract is a free software optical character recognition (OCR) engine for various operating systems. Originally developed as proprietary software at Hewlett-Packard between 1985 and 1995, it saw very little work in the following decade. It was then released as open source in 2005 by Hewlett-Packard and the University of Nevada, Las Vegas (UNLV). Tesseract development has been sponsored by Google since 2006. It is released under the Apache License, Version 2.0. Tesseract has been considered one of the most accurate free software OCR engines available. (Excerpt from <a href="http://en.wikipedia.org/wiki/Tesseract_(software)">Wikipedia article: Tesseract</a>)

terminology service

A terminology service is a structured network service that offers terminology-related services, for example mapping a term from one controlled vocabulary to another or expanding terms within a thesaurus. (Excerpt from <a href="http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/glossary/">JISC Information Environment Glossary</a>)
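
The two example operations, mapping a term between vocabularies and expanding a term within a thesaurus, can be sketched as plain functions. This is a hypothetical local illustration only: a real terminology service would expose these over the network, and the vocabulary names ("vocabA", "vocabB") and all term data below are invented.

```python
from typing import Optional

# Hypothetical crosswalk between two made-up vocabularies.
CROSSWALK = {
    ("vocabA", "Motor vehicles"): ("vocabB", "Automobiles"),
    ("vocabA", "Bicycles"): ("vocabB", "Cycles"),
}
# Hypothetical thesaurus fragment: term -> narrower terms.
NARROWER = {
    "Vehicles": ["Motor vehicles", "Bicycles"],
    "Motor vehicles": ["Trucks"],
}


def map_term(term: str, source: str, target: str) -> Optional[str]:
    """Map a term from the source vocabulary to the target one, if a mapping exists."""
    hit = CROSSWALK.get((source, term))
    return hit[1] if hit is not None and hit[0] == target else None


def expand(term: str, seen: Optional[set] = None) -> list[str]:
    """Return the term followed by all of its narrower terms, depth-first."""
    seen = set() if seen is None else seen
    if term in seen:  # guard against cycles in badly formed data
        return []
    seen.add(term)
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(expand(child, seen))
    return result


print(map_term("Bicycles", "vocabA", "vocabB"))  # Cycles
print(expand("Vehicles"))  # ['Vehicles', 'Motor vehicles', 'Trucks', 'Bicycles']
```

A search interface could OR together the expanded list so that a query for "Vehicles" also retrieves items indexed only under "Trucks".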


Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional interactive text-oriented communications facility using a virtual terminal connection. User data is interspersed in-band with Telnet control information in an 8-bit byte-oriented data connection over the Transmission Control Protocol (TCP). Telnet was developed in 1969 beginning with RFC 15, extended in RFC 854, and standardized as Internet Engineering Task Force (IETF) Internet Standard STD 8, one of the first Internet standards. Historically, Telnet provided access to a command-line interface (usually of an operating system) on a remote host. Most network equipment and operating systems with a TCP/IP stack support a Telnet service for remote configuration (including systems based on Windows NT). Because Telnet transmits all data, including passwords, unencrypted, its use for this purpose has waned in favor of SSH. (Excerpt from <a href="http://en.wikipedia.org/wiki/Telnet">Wikipedia article: Telnet</a>)
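
"In-band" control means command bytes share the same byte stream as user data. RFC 854 marks each command with the IAC byte (255), followed by verbs such as WILL (251), WONT (252), DO (253), DONT (254), or a subnegotiation framed by SB (250) and IAC SE (240). The sketch below separates user data from commands and refuses every option the peer proposes, which is the simplest policy the protocol permits. It is illustrative only: the function name `process` is hypothetical, and commands split across two reads are not handled.

```python
IAC, DONT, DO, WONT, WILL, SB, SE = 255, 254, 253, 252, 251, 250, 240


def process(data: bytes) -> tuple[bytes, bytes]:
    """Split an incoming Telnet byte stream into (user_data, reply_to_send)."""
    text, reply = bytearray(), bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b != IAC:  # ordinary user data
            text.append(b)
            i += 1
            continue
        cmd = data[i + 1] if i + 1 < len(data) else None
        if cmd == IAC:  # IAC IAC encodes a literal 0xFF data byte
            text.append(IAC)
            i += 2
        elif cmd in (DO, DONT, WILL, WONT) and i + 2 < len(data):
            opt = data[i + 2]
            if cmd == DO:  # peer asks us to enable an option: refuse
                reply += bytes([IAC, WONT, opt])
            elif cmd == WILL:  # peer offers to enable an option: refuse
                reply += bytes([IAC, DONT, opt])
            # DONT/WONT match our already-disabled state, so no reply is needed.
            i += 3
        elif cmd == SB:  # skip subnegotiation up to and including IAC SE
            end = data.find(bytes([IAC, SE]), i + 2)
            i = len(data) if end == -1 else end + 2
        else:  # other two-byte commands (NOP, GA, ...) or a truncated command
            i += 2
    return bytes(text), bytes(reply)


incoming = b"hi" + bytes([IAC, DO, 1]) + b"!" + bytes([IAC, WILL, 3]) + bytes([IAC, IAC])
user_data, reply = process(incoming)
print(user_data, list(reply))
```

In a real client, `reply` would be written back to the TCP socket and `user_data` shown on the virtual terminal.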


The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities. The community runs a mailing list, meetings and conference series, and maintains a technical standard, a wiki and a toolset. The Guidelines define some 500 different textual components and concepts (word, sentence, character, glyph, person, etc.), which can be expressed using a markup language and defined by a DTD or XML schema. Early versions of the Guidelines used SGML as a means of expression; more recently XML has been adopted. (Excerpt from <a href="http://en.wikipedia.org/wiki/Text_Encoding_Initiative">Wikipedia article: TEI DTD</a>)
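
Because TEI documents are XML, concepts such as "person" become elements that ordinary XML tools can query. The sketch below uses an abbreviated, hand-written sample (it omits header elements a schema-valid TEI file requires) in the TEI namespace `http://www.tei-c.org/ns/1.0`, and extracts the title and the marked-up personal names with Python's standard ElementTree.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Abbreviated, illustrative TEI sample (not schema-valid).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>Sample letter</title></titleStmt></fileDesc>
  </teiHeader>
  <text><body>
    <p>Letter from <persName>Ada Lovelace</persName>
       to <persName>Charles Babbage</persName>.</p>
  </body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)
title = root.findtext(".//tei:title", namespaces=TEI_NS)
people = [el.text for el in root.iterfind(".//tei:persName", TEI_NS)]
print(title, people)  # Sample letter ['Ada Lovelace', 'Charles Babbage']
```

The same query works on any document that follows the Guidelines' element names, which is the practical payoff of a shared encoding standard.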

techwatch report

TechWatch's main output is its peer-reviewed horizon-scanning reports. Originally, these reports focused exclusively on technologies and standards, but as the impact of new technologies has become much more interwoven with legal and social issues, the reports have changed slightly to accommodate this. So, whilst the focus of the reports is still primarily on technology and standards, discussion of a particular technology will often need to encompass an awareness of its social impact. (Excerpt from <a href="http://www.jisc.ac.uk/whatwedo/services/techwatch/reports">this source</a>)


Technorati is an Internet search engine for blogs. By June 2008, Technorati was indexing 112.8 million blogs and over 250 million pieces of tagged social media. The name Technorati is a blend of the words technology and literati, which invokes the notion of technological intelligence or intellectualism. Technorati uses and contributes to open source software and has an active software developer community, many of them from open-source culture. Its founder, David Sifry, is a major open-source advocate; he was a founder of LinuxCare and later of Wi-Fi access point software developer Sputnik. Technorati provides a public developers' wiki, where developers and contributors collaborate, as well as various open APIs. (Excerpt from <a href="http://en.wikipedia.org/wiki/Technorati">Wikipedia article: Technorati</a>)


Taxonomy (from Ancient Greek taxis "arrangement" and nomia "method") is the practice and science of classification, or the result of it. Taxonomy uses taxonomic units, known as taxa (singular taxon). The result, a taxonomy or taxonomic scheme, is a particular classification ("the taxonomy of ..."), arranged in a hierarchical structure or classification scheme. Typically this is organized by supertype-subtype relationships, also called generalization-specialization relationships or, less formally, parent-child relationships, usually indicated by the phrase 'is a kind of' or 'is a subtype of'. In such an inheritance relationship, the subtype by definition has the same properties, behaviours, and constraints as the supertype plus one or more additional properties, behaviours, or constraints. For example, a bicycle is a kind of vehicle, so any bicycle is also a vehicle, but not every vehicle is a bicycle. Therefore a subtype needs to satisfy more constraints than its supertype: to be a bicycle is more constrained than to be a vehicle. If other kinds of relationships between concepts are also included, a taxonomy is extended into an ontology. Thus various ontologies also include a taxonomy. This holds especially for upper-level ontologies (arrangements of generic concepts). (Excerpt from <a href="http://en.wikipedia.org/wiki/Taxonomy">Wikipedia article: Taxonomy</a>)
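
The bicycle/vehicle example maps directly onto class inheritance, where the subtype keeps every constraint of its supertype and adds its own. The classes below are a hypothetical illustration of that "is a kind of" relationship, not a model taken from any ontology.

```python
class Vehicle:
    """Supertype: its single illustrative constraint is a non-negative wheel count."""

    def __init__(self, wheels: int) -> None:
        if wheels < 0:
            raise ValueError("wheels must be non-negative")
        self.wheels = wheels


class Bicycle(Vehicle):
    """Subtype: inherits Vehicle's constraint and adds more."""

    def __init__(self) -> None:
        super().__init__(wheels=2)  # additional constraint: exactly two wheels
        self.pedals = 2             # additional property


print(isinstance(Bicycle(), Vehicle))    # True: every bicycle is a vehicle
print(isinstance(Vehicle(4), Bicycle))   # False: not every vehicle is a bicycle
print(issubclass(Bicycle, Vehicle))      # True: Bicycle "is a kind of" Vehicle
```

Any check that holds for a `Vehicle` also holds for a `Bicycle`, but a `Bicycle` must satisfy more conditions, which is exactly why it is the more constrained type.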


In online computer systems terminology, a tag is a non-hierarchical keyword or term assigned to a piece of information (such as an Internet bookmark, digital image, or computer file). This kind of metadata helps describe an item and allows it to be found again by browsing or searching. Tags are generally chosen informally and personally by the item's creator or by its viewer, depending on the system. Tagging was popularized by websites associated with Web 2.0 and is an important feature of many Web 2.0 services. It is now also part of some desktop software. (Excerpt from <a href="http://en.wikipedia.org/wiki/Tag_(metadata)">Wikipedia article: Tagging</a>)
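
Because tags are flat keywords rather than a hierarchy, a tagging system can be backed by a simple inverted index from each tag to the items carrying it. The sketch below is hypothetical: the `TagIndex` class, the case-folding policy and the sample file names are illustrative choices, not how any particular Web 2.0 service works.

```python
from collections import defaultdict


class TagIndex:
    """Map free-form tags to the items that carry them (flat, non-hierarchical)."""

    def __init__(self) -> None:
        self._items_by_tag: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _norm(tag: str) -> str:
        # Design choice: treat "Summer" and "summer" as the same tag.
        return tag.strip().lower()

    def tag(self, item: str, *tags: str) -> None:
        for t in tags:
            self._items_by_tag[self._norm(t)].add(item)

    def find(self, *tags: str) -> set[str]:
        """Return the items carrying ALL of the given tags."""
        sets = [self._items_by_tag.get(self._norm(t), set()) for t in tags]
        return set.intersection(*sets) if sets else set()


idx = TagIndex()
idx.tag("photo1.jpg", "beach", "Summer")
idx.tag("photo2.jpg", "summer", "city")
print(sorted(idx.find("summer")))  # ['photo1.jpg', 'photo2.jpg']
```

Browsing corresponds to listing one tag's items; searching corresponds to intersecting several tags' item sets.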

tag cloud

A tag cloud (keyword cloud, or weighted list in visual design) is a visual depiction of user-generated tags, or simply the word content of a site, typically used to describe the content of web sites. Tags are usually single words and are normally listed alphabetically, and the importance of each tag is shown with font size or color. Thus, it is possible to find a tag alphabetically and by popularity. The tags are usually hyperlinks that lead to a collection of items that are associated with a tag. Sometimes, further visual properties are manipulated, such as the font color, intensity, or weight. (Excerpt from <a href="http://en.wikipedia.org/wiki/Tag_cloud">Wikipedia article: Tag cloud</a>)
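
The two properties described above, alphabetical order plus a visual weight that reflects popularity, can be computed in a few lines. This sketch assumes a linear mapping from tag count to a font size between 12 and 32 pixels; both the range and the linear scale are arbitrary choices, and some implementations use a logarithmic scale instead so that very popular tags do not dominate.

```python
def tag_cloud(counts: dict[str, int], min_px: int = 12, max_px: int = 32) -> list[tuple[str, int]]:
    """Return (tag, font size in px) pairs, sorted alphabetically by tag."""
    if not counts:
        return []
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return [
        (tag, round(min_px + (n - lo) * (max_px - min_px) / span))
        for tag, n in sorted(counts.items())
    ]


print(tag_cloud({"python": 40, "ajax": 10, "rss": 25}))
# [('ajax', 12), ('python', 32), ('rss', 22)]
```

Each pair could then be rendered as a hyperlink whose `font-size` is the computed value, giving a list that is readable both alphabetically and by popularity.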

tablet computer

A tablet computer, or simply tablet, is a one-piece mobile computer. Devices typically have a touchscreen, with finger or stylus gestures replacing the conventional computer mouse. The touchscreen is often supplemented by physical buttons or input from sensors such as accelerometers. An on-screen, hideable virtual keyboard is usually used for typing. Tablets are distinguished from smartphones and personal digital assistants by their larger size: they are usually 7 inches (18 cm) or larger, measured diagonally. Though generally self-contained, a tablet computer may be connected to a physical keyboard or other input device. A number of hybrids that have detachable keyboards have been sold since the mid-1990s. Convertible touchscreen notebook computers have an integrated keyboard that can be hidden by a swivel or slide joint. Booklet tablets have dual touchscreens and can be used as a notebook by displaying a virtual keyboard on one of the displays. (Excerpt from <a href="http://en.wikipedia.org/wiki/Tablet_computer">Wikipedia article: Tablet computer</a>)


Web syndication is a form of syndication in which website material is made available to multiple other sites. Most commonly, web syndication refers to making web feeds available from a site in order to provide other people with a summary or update of the website's recently added content (for example, the latest news or forum posts). The term can also describe other ways of licensing website content so that other websites can use it. (Excerpt from <a href="http://en.wikipedia.org/wiki/Web_syndication">Wikipedia article: Web syndication</a>)
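
On the consuming side, a web feed is just XML that another site can parse to show the latest items. The sketch below reads an RSS 2.0 feed (a `channel` containing `item` elements with `title` and `link`) using only the standard library. The feed content and `example.org` URLs are made up, and it assumes, as is conventional but not guaranteed, that items appear newest first.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://example.org/</link>
    <description>Hypothetical feed</description>
    <item><title>Second post</title><link>https://example.org/2</link></item>
    <item><title>First post</title><link>https://example.org/1</link></item>
  </channel>
</rss>"""


def latest_items(feed_xml: str, limit: int = 10) -> list[tuple[str, str]]:
    """Return up to `limit` (title, link) pairs in feed order."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return []
    return [
        (item.findtext("title", ""), item.findtext("link", ""))
        for item in channel.findall("item")[:limit]
    ]


for title, link in latest_items(FEED):
    print(f"{title}: {link}")
```

A syndicating site would fetch the feed periodically over HTTP and render these pairs as a list of headlines linking back to the source.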

sword protocol

SWORD (Simple Web-service Offering Repository Deposit) is an interoperability standard that allows digital repositories to accept the deposit of content from multiple sources in different formats (such as XML documents) via a standardized protocol. In the same way that HTTP allows any web browser to talk to any web server, SWORD allows clients to talk to repository servers. SWORD is a profile (specialism) of the Atom Publishing Protocol, but restricts itself solely to the scope of depositing resources into scholarly systems. (Excerpt from <a href="http://en.wikipedia.org/wiki/SWORD_(protocol)">Wikipedia article: Sword protocol</a>)


SWF is a file format for multimedia, vector graphics and ActionScript in the Adobe Flash environment. The format originated with FutureWave Software, was transferred to Macromedia, and then came under the control of Adobe. SWF files can contain animations or applets of varying degrees of interactivity and function. For many years SWF was the dominant format for displaying "animated" vector graphics on the Web. It may also be used for programs, commonly browser games, using ActionScript. (Excerpt from <a href="http://en.wikipedia.org/wiki/SWF">Wikipedia article: SWF</a>)


Scalable Vector Graphics (SVG) is a family of specifications of an XML-based file format for describing two-dimensional vector graphics, both static and dynamic (i.e. interactive or animated). The SVG specification is an open standard that has been under development by the World Wide Web Consortium (W3C) since 1999. SVG images and their behaviors are defined in XML text files. This means that they can be searched, indexed, scripted and, if required, compressed. Since they are XML files, SVG images can be created and edited with any text editor, but drawing programs are also available that support SVG file formats. (Excerpt from <a href="http://en.wikipedia.org/wiki/Scalable_Vector_Graphics">Wikipedia article: SVG</a>)
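
Since an SVG image is just XML text, it can be generated, searched and parsed with general-purpose XML tools rather than a drawing program. The sketch below builds a small badge (a rounded rectangle with a centred label) using Python's standard ElementTree; the `make_badge` function, dimensions and colours are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def make_badge(label: str) -> str:
    """Build a small SVG image as XML text: a rounded rectangle with a label."""
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width="120", height="40", viewBox="0 0 120 40")
    ET.SubElement(svg, "rect", width="120", height="40", rx="6", fill="#336699")
    text = ET.SubElement(svg, "text", x="60", y="25", fill="white")
    text.set("text-anchor", "middle")  # hyphenated names can't be keyword arguments
    text.text = label
    return ET.tostring(svg, encoding="unicode")


source = make_badge("SVG")
print(source)
# Because the image is plain text, ordinary string or XML tools can search it.
print("<rect" in source)
```

Saving `source` to a `.svg` file produces an image that browsers render directly, and that a text editor can still open and modify.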


The Standardized Usage Statistics Harvesting Initiative (SUSHI) protocol standard (ANSI/NISO Z39.93-2007) defines an automated request and response model for the harvesting of electronic resource usage data utilizing a Web services framework. Built on SOAP, a versioned Web Services Description Language (WSDL), and XML schema with the syntax of the SUSHI protocol, this standard is intended to replace the time-consuming user-mediated collection of usage data reports. SUSHI was designed to be both generalised and extensible, so that it could be used to retrieve a variety of usage reports. An extension designed specifically to work with COUNTER reports is provided with the standard, as these are expected to be the most frequently retrieved usage reports. (Excerpt from <a href="http://www.niso.org/workrooms/sushi">this source</a>)

subject heading

An index term, subject term, subject heading, or descriptor, in information retrieval, is a term that captures the essence of the topic of a document. Index terms make up a controlled vocabulary for use in bibliographic records. They are an integral part of bibliographic control, which is the function by which libraries collect, organize and disseminate documents. They are used as keywords to retrieve documents in an information system, for instance a catalog or a search engine. A popular form of keyword on the web is the tag, which is directly visible and can also be assigned by non-experts. Index terms can consist of a word, phrase, or alphanumerical term. They are created by analyzing the document either manually with subject indexing or automatically with automatic indexing or more sophisticated methods of keyword extraction. Index terms can either come from a controlled vocabulary or be freely assigned. (Excerpt from <a href="http://en.wikipedia.org/wiki/Subject_heading">Wikipedia article: subject heading</a>)
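
Automatic indexing with freely assigned terms can be approximated by keyword extraction. The sketch below uses TF-IDF weighting, which is one common choice among many and not the method of any specific indexing system: a term scores highly when it is frequent in one document but rare across the collection. The sample documents are invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs: list[str], k: int = 2) -> list[list[str]]:
    """Suggest k index terms per document by TF-IDF weight."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


docs = [
    "thesaurus terms and thesaurus relations",
    "telnet terms and telnet sessions",
    "tag clouds and tag terms",
]
print(top_keywords(docs))
# [['thesaurus', 'relations'], ['telnet', 'sessions'], ['tag', 'clouds']]
```

Words such as "and" or "terms" that occur in every document get an inverse document frequency of log(1) = 0, so they are never suggested; a production indexer would typically also map the results onto a controlled vocabulary.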


