Chapter 13

Information Search

Introduction 13.1

  • Information search should be a joyous experience
    • The new generation of digital libraries, databases, and search methods has enabled a much wider range of users to explore the growing number of information spaces.
  • Information retrieval and database management are being pushed aside by the new generation of terms:
    • information gathering, seeking, filtering, collaborative filtering, sensemaking, and visual analytics.
    • data mining from data warehouses and data marts
    • knowledge networks or semantic webs
  • Common goals of information search
    • Finding a small set of items that satisfy an information need within a large set of data
    • Making sense of information and discovering patterns within a collection of data
  • Problems of older search interfaces
    • They were very hard to learn and use because of:
      • Complex commands
      • Boolean operators
      • Senseless concepts
    • It was also hard to:
      • Repeat searches across multiple interfaces
      • Narrow broad searches
      • Integrate other tools
  • Newer Interfaces
    • Information search is becoming much easier as a new generation of strategies for query formulation and information presentation emerges.

Search terminology

  • Task objects (such as movies for rent) are stored in structured relational databases, textual document libraries, or multimedia document libraries
  • A structured relational database consists of relations and a schema to describe the relations
  • Relations have items (usually called tuples or records), and each item has multiple attributes (often called fields), which each have attribute values
  • A textual document library consists of a set of collections (typically up to a few hundred collections per library) with some descriptive attributes (metadata) about the library (name, location, owner)
  • Each collection also has a name and other metadata (media type, creator, dates) that describe it. A collection contains a set of items (typically 10 to 100,000 per collection).
  • Although the items in a collection may vary greatly, there is generally a superset of attributes that cover them all.
  • A multimedia document library consists of collections of documents that contain sound, images, video, animations, datasets, etc.
  • Directories hold metadata about the items within a library and help direct users to the appropriate locations
  • The World Wide Web is considered an unstructured collection because it contains fewer attributes.
    • Tools that facilitate dynamically created metadata (tagging and annotation) are emerging
  • Task actions are decomposed into browsing or searching
    • Here are some examples of task actions:
      • Specific fact finding (known-item search)
        • Find the e-mail address of the President of the United States.
      • Extended fact finding
        • What other books are by the author of “Jurassic Park”?
      • Exploration of availability
        • Is there new work on voice recognition in the ACM digital library?
      • Open-ended browsing and problem analysis
        • Is there promising new research on fibromyalgia that might help my patient?
  • After determining their information needs, users must decide where to search
  • Finding aids help users find the information they are looking for
    • Table of contents
    • Indexes
    • Introductions
  • Preview and overview surrogates
    • Can give the user an idea of the size, scope, or structure and can help determine the relevance of collections
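The terminology above (relations with a schema, items with attributes and attribute values, collections with metadata) can be sketched as plain data structures. This is only a minimal illustration: the class names `Relation` and `Collection`, their methods, and the sample values are assumptions made for this sketch, not part of any particular database or library system.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Relation:
    """A relation: a schema (attribute names) plus items (tuples/records)."""
    name: str
    schema: tuple[str, ...]
    items: list[dict[str, Any]] = field(default_factory=list)

    def add_item(self, **attribute_values: Any) -> None:
        # Each item must supply a value for exactly the attributes in the schema.
        if set(attribute_values) != set(self.schema):
            raise ValueError(f"item does not match schema {self.schema}")
        self.items.append(attribute_values)


@dataclass
class Collection:
    """A named set of items with collection-level metadata."""
    name: str
    metadata: dict[str, Any]
    items: list[dict[str, Any]] = field(default_factory=list)

    def attribute_superset(self) -> set[str]:
        # Items may vary, but the union of their attributes covers them all.
        return set().union(*(item.keys() for item in self.items))


# Hypothetical sample data, for illustration only.
movies = Relation("movies", ("title", "year"))
movies.add_item(title="Jurassic Park", year=1993)

shelf = Collection("rental shelf", {"media type": "video"})
shelf.items.append({"title": "Jurassic Park", "year": 1993})
shelf.items.append({"title": "Home video", "creator": "unknown"})
print(sorted(shelf.attribute_superset()))
```

The relation rejects items that do not fit its schema, while the collection accepts varied items and only reports the superset of attributes, which mirrors the difference between structured databases and document libraries described above.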

Searching in Textual Documents and Database Querying 13.2

  • Expert users can use SQL:
    • SELECT DOCUMENT#
    • FROM JOURNAL-DB
    • WHERE (DATE >= 2004 AND DATE <= 2008)
    • AND (LANGUAGE = ENGLISH OR FRENCH)
    • AND (PUBLISHER = ASIST OR HFES OR ACM)
  • SQL has powerful features, but it requires training
  • While SQL is a standard, form fill-in queries have simplified query formulation
  • Other methods include:
    • Natural-language queries, which are meant to appeal to users, although the computer’s capacity for processing such queries is often limited to eliminating frequent terms or commands and searching for the remaining words.
    • Form fill-in queries, which have substantially simplified query formulation while still allowing some Boolean combinations.
    • Query by example, in which users enter attribute values and some keywords in relational table templates.
    • Evidence shows that users perform better and have higher satisfaction when they can view and control the search
  • Five-phase framework to clarify user interfaces for textual search
    • Formulation: expressing the search
      • Provide access to the appropriate sources in libraries and collections.
      • Use fields for limiting the source: structured fields such as year, media, or language; and text fields such as titles or abstracts of documents.
      • Recognize phrases to allow entry of names and concepts.
      • Permit variants to allow relaxation of search constraints, such as case sensitivity, stemming, partial matches, phonetic variations, abbreviations, or synonyms from a thesaurus.
      • Control the size of the result set.
    • Initiation of action: launching the search
      • Include explicit actions initiated by buttons with consistent labels, locations, sizes, and colors.
      • Include implicit actions initiated by changes to a parameter that immediately produce new sets of search results.
    • Review of results: reading messages and outcomes
      • Present explanatory messages.
      • View an overview of the results and previews of items.
      • Manipulate visualizations.
      • Adjust the size of the result set and which fields are displayed.
      • Change the sequencing (alphabetical, chronological, relevance ranked, etc.).
      • Explore clustering (by attribute value, topics, etc.).
      • Examine selected items.
    • Refinement: formulating the next step
      • Use meaningful messages to guide users in progressive refinement.
      • Make changing of search parameters convenient.
      • Explore relevance feedback.
    • Use: compiling or disseminating insight
      • Allow queries, parameter settings, and results to be saved and annotated, sent by e-mail, or used as input to other programs.
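The SQL query at the start of this section is written in a compact textbook style. A runnable equivalent might look like the sketch below, which uses Python's built-in `sqlite3` module. The table name `journal_db`, its column names, and the sample rows are assumptions invented for this illustration. The `search` function also hints at how a form fill-in interface could work: each parameter plays the role of one form field, and the user never has to write the Boolean expression.

```python
import sqlite3

# In-memory database with a hypothetical journal table and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal_db ("
    "document_id INTEGER, year INTEGER, language TEXT, publisher TEXT)"
)
conn.executemany(
    "INSERT INTO journal_db VALUES (?, ?, ?, ?)",
    [
        (1, 2005, "English", "ACM"),
        (2, 2003, "English", "ACM"),       # outside the date range
        (3, 2007, "French", "HFES"),
        (4, 2006, "German", "ASIST"),      # language not requested
        (5, 2008, "English", "Elsevier"),  # publisher not requested
    ],
)


def search(conn, year_from, year_to, languages, publishers):
    """Build the query from form-like fields instead of hand-written SQL."""
    lang_marks = ", ".join(["?"] * len(languages))
    pub_marks = ", ".join(["?"] * len(publishers))
    query = (
        "SELECT document_id FROM journal_db "
        "WHERE year BETWEEN ? AND ? "
        f"AND language IN ({lang_marks}) "
        f"AND publisher IN ({pub_marks}) "
        "ORDER BY document_id"
    )
    params = [year_from, year_to, *languages, *publishers]
    return [row[0] for row in conn.execute(query, params)]


result = search(conn, 2004, 2008, ["English", "French"], ["ASIST", "HFES", "ACM"])
print(result)  # [1, 3]
```

Values are passed as bound parameters rather than pasted into the SQL string, so the field contents cannot change the structure of the query.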

Multimedia Document Searches 13.3

Image search
Map search
Design or diagram search
Sound search
Video search
Animation search

  • Image Search:
    • Finding photos with images such as the Statue of Liberty is a challenge
      • Query-by-Image-Content (QBIC) is difficult
      • Search by profile (shape of lady), distinctive features (torch), colors (green copper)
    • Use simple drawing tools to build templates or profiles to search with
    • More success is attainable by searching restricted collections
      • Search a vase collection
      • Find a vase with a long neck by drawing a profile of it
    • Critical searches such as fingerprint matching require a minimum of 20 distinct features
    • For small collections of personal photos, effective browsing and lightweight annotation are important
  • Map Search
    • On-line maps are plentiful
    • Search by latitude/longitude is the structured-database solution
    • Today's maps allow the use of structured aspects and multiple layers
      • City, state, and site searches
      • Flight information searches
      • Weather information searches
      • Mapquest, Google Maps, etc.
    • Mobile devices can allow “here” as a point of reference
  • Design/Diagram Searches
    • Some computer-assisted design packages support search of designs
    • Allows searches of diagrams, blueprints, newspapers, etc., e.g. search for a red circle in a blue square or a piston in an engine
    • Document-structure recognition for searching newspapers
  • Sound Search
    • Music information retrieval (MIR) supports audio input
    • Searching phone conversations may become possible in the future on a speaker-independent basis
  • Video Search
    • Provide an overview
    • Segmentation into scenes and frames
    • Support multiple search methods
    • Informedia project
  • Animation Search
    • Prevalence increased with the popularity of Flash
    • Possible to search for specific animations like a spinning globe
    • Search for moving text on a black background
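Of the search types above, map search is the easiest to make concrete, because latitude and longitude are ordinary structured attributes. The sketch below filters sites by great-circle distance from a "here" reference point, as a mobile device might. It uses the standard haversine formula. The site names, coordinates, and function names are hypothetical values chosen for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sites_near(sites, here, radius_km):
    """Return names of sites within radius_km of 'here', nearest first."""
    hits = [(haversine_km(*here, s["lat"], s["lon"]), s["name"]) for s in sites]
    return [name for dist, name in sorted(hits) if dist <= radius_km]


# Hypothetical sites and a hypothetical current position.
sites = [
    {"name": "airport", "lat": 41.50, "lon": -74.10},
    {"name": "museum", "lat": 40.78, "lon": -73.96},
    {"name": "cafe", "lat": 40.70, "lon": -74.01},
]
here = (40.71, -74.00)

print(sites_near(sites, here, 10))  # ['cafe', 'museum']
```

A production map service would typically use a spatial index instead of scanning every site, but the query itself has the same shape: a reference point plus a range.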

Advanced Filtering and Search Interfaces 13.4

  • Filtering with complex Boolean queries
    • Widespread use of Boolean queries has been limited by how difficult they are to use.
    • The Boolean meanings of AND and OR differ from their use in natural English, which can lead to confusion. For example:
      • In natural English, a human asked to "List employees living in Lubbock AND Amarillo" will respond with a list of all the employees living in Lubbock and all the employees living in Amarillo.
      • However, a Boolean AND returns only employees who live in both cities at once, so the query would most likely return an empty list.
    • Many metaphors and visual aids have been developed to help alleviate the difficulty of using Boolean queries, but their usefulness diminishes quickly as queries increase in complexity.
  • Automatic Filtering
    • Users create a collection of key phrases that are used to automatically filter dynamically generated information (e-mails, newspaper stories, journal articles)
    • Each time a new document is generated that matches the user's search criteria, a notification is sent to the user or the results are saved for later review.
  • Dynamic queries
    • A search approach which uses sliders and buttons to allow users to specify ranges and categories.
    • Also called direct-manipulation queries: by manipulating buttons, sliders, and fields, users can rapidly reverse actions, make incremental changes, and get immediate feedback.
    • Because most users of dynamic queries expect immediate feedback, the data must usually be readily available on the user's local system.
    • Problems with storing information on the user's local system can be avoided by using query previews, which give users an interactive overview of the data fields available to them.
    • Potential issues with dynamic queries include time wasted on no-hit queries and mega-return queries.
  • Faceted metadata search
    • Search technique that uses categorical menus, images, and keywords. Example: NewEgg.com
  • Query by example
    • Search that lets users submit data as an example that is used to search for similar data.
  • Implicit search
    • Data is collected based on other searches and context to present other possibilities to users. Example: Product suggestions on Amazon.com
  • Collaborative filtering
    • Groups of users combine their personal searches to help each other find items of interest. Example: Digg
  • Multilingual searches
    • Allows users to search for information in multiple languages. Practical applications might include more cost-effective searches for documents whose content is worth translating professionally.
  • Visual field specification
    • Information is presented in a more visual way. Example: Picking the city you want to fly to from a map of the US instead of using text-based selection.
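The Lubbock and Amarillo example under Boolean queries earlier in this section can be reproduced in a few lines. The sketch below assumes a small, made-up employee list in which each employee has exactly one city; the names are hypothetical. It shows why the literal Boolean reading of AND returns nothing, while OR matches what the English sentence actually means.

```python
# Hypothetical employee records; each employee lives in exactly one city.
employees = [
    {"name": "Ana", "city": "Lubbock"},
    {"name": "Ben", "city": "Amarillo"},
    {"name": "Cy", "city": "Austin"},
]

# Literal Boolean AND: the same record must satisfy both conditions at once.
literal_and = [
    e["name"] for e in employees
    if e["city"] == "Lubbock" and e["city"] == "Amarillo"
]

# What the English request means: employees in either city (Boolean OR).
intended_or = [
    e["name"] for e in employees
    if e["city"] == "Lubbock" or e["city"] == "Amarillo"
]

print(literal_and)  # []
print(intended_or)  # ['Ana', 'Ben']
```

Interfaces that let users pick several values for one attribute, such as checkboxes for cities, can apply OR within the attribute automatically and so sidestep this confusion.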