
E. Design Information Retrieval Systems

Introduction to Core Competency E: Design, query, and evaluate information retrieval systems.

Information retrieval systems are at the core of library and information science. However, librarians compete with information technology (IT) specialists from the field of computer science for jurisdiction over this domain and over the work associated with information retrieval systems. Andrew Abbott writes about this conflict in his seminal 1998 article, “Professionalism and the Future of Librarianship.” A brief, two-page summary of Abbott's article explains this conflict. The future of librarianship, Abbott argues, is contingent on the link between librarians and their work. Information technology has changed the traditional work of librarians, creating new work and eliminating old work.

Information retrieval systems, also known as content management systems and knowledge management systems, are one technology shared by library science and IT professionals, and this sharing causes conflicts in terminology and domain knowledge. In “After the Dot-Bomb: Getting Web Information Retrieval Right This Time,” Marcia Bates (2002) discusses the predilection of IT professionals to stick a hot new term onto a standard old practice. One example is “ontology,” a term used in philosophy for the study of the nature of being, which IT people have applied to the schemes that describe and control the contents of information retrieval systems. Ontologies in this sense are nothing new. Librarians have created them for years under the names classification systems, lists of indexing terms, thesauri, and controlled vocabularies, such as the Library of Congress Subject Headings and the Getty Art & Architecture Thesaurus (AAT).

The Web is a giant information retrieval system. Databases are information retrieval systems; a library catalog is an information retrieval system. What they all have in common is that they store and retrieve information objects, whether these are Web sites, customer records, product requirements, or bibliographic information. The differences are in the ways the information objects are stored and retrieved. Individual sites on the Web are stored at random and retrieved by search engines that search the full text of documents. Employee records, product requirements, and bibliographic information are stored or indexed according to categories designed into the retrieval system and are retrieved by accessing these categories, or “access points.” To retrieve an employee record, possible access points would be the employee name, the employee number, the division where she works, or any other category or field under which information is stored. Bibliographic information is stored and retrieved the same way; a book or journal article is retrieved through the access points of title, author, subjects, etc.
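The access-point idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee records, the field names (name, number, division), and the helper functions are hypothetical and do not come from any particular system.

```python
from collections import defaultdict

# Hypothetical employee records; the field names are assumptions for illustration.
RECORDS = [
    {"name": "Ana Ruiz", "number": "1001", "division": "Engineering"},
    {"name": "Ben Cho", "number": "1002", "division": "Marketing"},
    {"name": "Cara Diaz", "number": "1003", "division": "Engineering"},
]


def build_access_points(records, fields):
    """Map each (field, value) pair to the positions of the records that hold it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        for field in fields:
            index[(field, record[field].lower())].append(position)
    return index


def retrieve(index, records, field, value):
    """Return the records filed under one access point, e.g. division=Engineering."""
    return [records[i] for i in index.get((field, value.lower()), [])]


ACCESS_POINTS = build_access_points(RECORDS, ["name", "number", "division"])
engineers = retrieve(ACCESS_POINTS, RECORDS, "division", "Engineering")
print([r["name"] for r in engineers])
```

Each field chosen at design time becomes a way into the collection; a field left out of the index cannot be used as an access point later without re-indexing.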

As in any file cabinet, information items in a database can be stored randomly or under an organizing system. The difference appears when retrieving the information. Searching through indexed fields instead of the full text is a more efficient, more accurate, and more reliable way of aggregating (pulling together like information) or segregating (separating out unlike information).
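A small sketch may make the contrast between full-text and fielded searching concrete. The records, the author names, and both search functions below are invented for illustration; real search engines and databases are far more elaborate.

```python
# Hypothetical bibliographic records; authors, titles, and text are invented.
ARTICLES = [
    {"author": "Okafor", "title": "Web retrieval",
     "text": "Search engines index the full text of pages."},
    {"author": "Lindqvist", "title": "Professions",
     "text": "Okafor and others describe similar jurisdiction disputes."},
    {"author": "Okafor", "title": "Indexing terms",
     "text": "Controlled vocabularies support subject access."},
]


def full_text_search(articles, word):
    """Match the word anywhere in the record, regardless of field."""
    word = word.lower()
    return [a["title"] for a in articles
            if any(word in str(value).lower() for value in a.values())]


def field_search(articles, field, value):
    """Match only when the chosen field holds exactly this value."""
    return [a["title"] for a in articles if a[field].lower() == value.lower()]


print(full_text_search(ARTICLES, "okafor"))
print(field_search(ARTICLES, "author", "Okafor"))
```

The full-text search returns all three titles, including the article that merely mentions Okafor, while the author-field search aggregates the two works written by Okafor and segregates the third.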

Concepts in information retrieval are complex, and gaining mastery in designing, querying, and evaluating these systems is challenging. After completing two classes devoted solely to information retrieval, I am not an expert in taxonomies, thesauri, or ontologies, but I can demonstrate competency and a desire to continue learning. The Internet and the Web dominate this area today, and information retrieval and storage systems will continue to grow more sophisticated. Take, for example, DOORS by Tau, defined by a user as a requirements analysis and documenting flow-down tool used for information storage and retrieval, but for the purpose of product development!

I had the experience of creating two databases in three-man teams in LIBR 202 Information Retrieval, and later of working in other teams and solo in LIBR 247 Vocabulary Design. In Travel Memorabilia Collection: Creating the Database, we created a database to organize, preserve, and make accessible a collection of travel memorabilia consisting of entrance/admission tickets to popular and cultural places visited by tourists and travelers all over the world. The searchable fields, or access points, are the physical characteristics of the objects and the information provided on them, such as place name, place type, country, city, language, currency value, content, visual image, logo, date, validation, and physical condition. The document includes a User Guide, the Data Structure, the Validation Lists, Rules for Indexing, Data Records, and Evaluation and Revisions of the Structure, Lists, and Rules. As part of this project, I helped evaluate another team's database, and another team evaluated ours.

In Subject Headings for a Database: Developing the Controlled Vocabulary, we built a database for a collection of 15 documents relating to information storage and retrieval. This database consists of fields of descriptive information and subject fields. Descriptive fields identify a specific article. Subject fields locate articles about a particular subject. To populate the subject fields, we developed controlled vocabularies. The controlled vocabularies are designed to let a user search by subject using a list of authorized terms (subject headings) and predictably find documents that are about a particular subject. This database uses two kinds of subject headings to represent the concepts of the articles it collects: one is a precoordinate vocabulary and the other is a postcoordinate vocabulary. The document includes a User Guide, the Validation List for the two controlled vocabularies, the Data Structure, Data Records, and Evaluation, including testing procedures and test searches.
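The difference between the two kinds of vocabulary can be sketched as follows. In a precoordinate vocabulary the indexer combines concepts into one heading before anyone searches; in a postcoordinate vocabulary the searcher combines single-concept terms at search time. The terms, headings, and document identifiers below are hypothetical and are not the project's actual validation lists.

```python
# Hypothetical authorized terms; the project's real validation lists are not reproduced here.
POSTCOORDINATE_TERMS = {"information retrieval", "evaluation", "thesauri", "indexing"}
PRECOORDINATE_HEADINGS = {"Information retrieval--Evaluation", "Thesauri--Design"}

# Each document carries both kinds of subject heading.
DOCUMENTS = {
    "doc1": {"pre": ["Information retrieval--Evaluation"],
             "post": ["information retrieval", "evaluation"]},
    "doc2": {"pre": ["Thesauri--Design"],
             "post": ["thesauri", "indexing"]},
    "doc3": {"pre": ["Information retrieval--Evaluation"],
             "post": ["information retrieval", "evaluation", "indexing"]},
}


def validate(terms, authorized):
    """Return the terms that are not on the validation list."""
    return [t for t in terms if t not in authorized]


def search_precoordinate(heading):
    """Look up one heading whose concepts were combined by the indexer."""
    return sorted(d for d, f in DOCUMENTS.items() if heading in f["pre"])


def search_postcoordinate(*terms):
    """Combine single-concept terms at search time with Boolean AND."""
    wanted = set(terms)
    return sorted(d for d, f in DOCUMENTS.items() if wanted <= set(f["post"]))


print(search_precoordinate("Information retrieval--Evaluation"))
print(search_postcoordinate("evaluation", "indexing"))
print(validate(["evaluation", "ontology"], POSTCOORDINATE_TERMS))
```

The validation step is what makes the vocabulary controlled: a term such as "ontology" that is absent from the list is rejected at indexing time, so searchers can rely on the authorized form.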

On my own, I prepared an information retrieval system for swimming pools in Silicon Valley as the final project for LIBR 247 Vocabulary Design. This was an enormous project that involved completing a user needs analysis, researching existing retrieval systems for swimming pools, researching useful metadata schemes and controlled vocabularies related to sports facilities, creating a metadata table and a style sheet, populating the records, writing an annotated bibliography, and analyzing the results.

I can demonstrate some mastery in querying information retrieval systems based on my LIBR 244 Online Searching coursework. The Midterm Quiz tests my understanding of Dialog commands and briefly discusses the concepts of precision and recall, which govern how to narrow a search and how to broaden one. The Final Quiz requires that I use Dialog, Lexis-Nexis, and the Internet to answer reference questions. I answered hundreds of these questions in my Online Searching class at Foothill College with John Hogle, who taught the same class at SLIS. Unfortunately, my answers to the Online Searching Final Exam are only in print and not in digital format, but the test shows evidence of the searches I completed using Dialog Telnet.
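Precision and recall can be computed directly once the relevant documents for a question are known. The sketch below uses the standard definitions (precision is the share of retrieved items that are relevant; recall is the share of relevant items that were retrieved); the document numbers and the two sample searches are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, using set overlap."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document numbers: four items truly answer the question.
RELEVANT = {1, 2, 3, 4}

narrow = precision_recall({1, 2}, RELEVANT)                      # narrowed search
broad = precision_recall({1, 2, 3, 4, 5, 6, 7, 8}, RELEVANT)     # broadened search

print(narrow)  # (1.0, 0.5)
print(broad)   # (0.5, 1.0)
```

Narrowing the search raises precision at the cost of recall, and broadening it does the reverse, which is the trade-off an online searcher manages with each added or dropped term.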

Abbott, A. (1998). Professionalism and the future of librarianship. Library Trends, 45, 430-444. Retrieved December 18, 2003, from InfoTrac Web Expanded Academic ASAP.

Bates, M. J. (2002). After the dot-bomb: Getting Web information retrieval right this time. First Monday, 7(7). Retrieved April 2, 2007, from http://firstmonday.org/issues/issue7_7/bates/index.html