Latent Semantic Indexing - What Is It?
Written by bmlengel on January 26th, 2010
Latent semantic indexing (LSI) is an information retrieval method that uses a mathematical technique, singular value decomposition (SVD), to identify the concepts expressed in a body of text. It is built on latent semantic analysis (LSA), a natural language processing technique that examines how terms are distributed across a collection of documents and derives a set of concepts from those patterns. Because LSI matches on concepts rather than literal strings, the documents returned for a query do not necessarily contain the exact words or phrases the searcher typed.
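To make the idea concrete, here is a minimal sketch of concept-based retrieval in Python with NumPy. The toy corpus, the function names, and the choice of two concepts are all assumptions made for illustration, and projecting queries with the truncated left singular vectors is only one common way of folding a query into the concept space.

```python
import numpy as np

# Toy corpus invented for illustration. Document 0 never uses the word "automobile".
docs = [
    "car engine repair",
    "automobile engine maintenance",
    "automobile car dealer",
    "banana bread recipe",
    "bread baking recipe oven",
]

def term_document_matrix(texts):
    """Return a terms-by-documents count matrix and the term-to-row mapping."""
    vocab = sorted({word for text in texts for word in text.split()})
    row = {word: i for i, word in enumerate(vocab)}
    A = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            A[row[word], j] += 1.0
    return A, row

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

A, row = term_document_matrix(docs)

# SVD: A = U @ diag(s) @ Vt. Keeping the top k singular vectors gives k "concepts".
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed number of concepts for this tiny corpus
U_k = U[:, :k]

# Project every document into the k-dimensional concept space.
doc_vectors = (U_k.T @ A).T  # shape: (number of documents, k)

def query_scores(query):
    """Score each document against the query by cosine similarity in concept space."""
    q = np.zeros(A.shape[0])
    for word in query.split():
        if word in row:  # words outside the vocabulary are ignored
            q[row[word]] += 1.0
    q_hat = U_k.T @ q
    return [cosine(q_hat, d) for d in doc_vectors]

scores = query_scores("automobile")
for text, score in zip(docs, scores):
    print(f"{score:+.2f}  {text}")
```

In this toy setup the query "automobile" scores "car engine repair" close to 1 even though that document never contains the query word, while the two baking documents score close to 0. A real index would normally weight the counts (for example with tf-idf) and keep far more than two concepts.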
LSI addresses two of the most troublesome weaknesses of ordinary Boolean keyword search. The first is synonymy: several different words can share a meaning. The second is polysemy: a single word can have several meanings. Polysemy is a common reason irrelevant documents show up for a query, and synonymy is a common reason relevant documents are left out.
Another application of LSI is automated document categorization. Sample documents are used to establish the conceptual basis of each category. The concepts found in a new document are then compared with those of the sample documents, and the document is assigned to the category whose examples it most closely resembles.
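The categorization procedure described above can be sketched roughly as follows. This is an illustrative sketch, not a reference implementation: the example documents, the category labels, the use of a per-category centroid, and the `categorize` helper are all assumptions introduced here.

```python
import numpy as np

# Hypothetical labeled example documents, one (category, text) pair each.
examples = [
    ("autos", "car engine repair"),
    ("autos", "automobile engine maintenance"),
    ("autos", "automobile car dealer"),
    ("baking", "banana bread recipe"),
    ("baking", "bread baking recipe oven"),
]
texts = [text for _, text in examples]

vocab = sorted({word for text in texts for word in text.split()})
row = {word: i for i, word in enumerate(vocab)}

def to_vector(text):
    """Raw term-count vector; words outside the vocabulary are ignored."""
    v = np.zeros(len(vocab))
    for word in text.split():
        if word in row:
            v[row[word]] += 1.0
    return v

# Terms-by-documents matrix built from the example documents only.
A = np.column_stack([to_vector(text) for text in texts])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :2]  # keep two concepts (an assumption that suits this toy data)

def concepts(text):
    """Project a text into the concept space learned from the examples."""
    return U_k.T @ to_vector(text)

# Represent each category by the mean concept vector of its example documents.
centroids = {}
for label in sorted({label for label, _ in examples}):
    members = [concepts(text) for lab, text in examples if lab == label]
    centroids[label] = np.mean(members, axis=0)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def categorize(text):
    """Return the category whose centroid is most similar to the text.

    A text with no known words scores 0 everywhere, so the first label wins.
    """
    vec = concepts(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

print(categorize("automobile repair"))
print(categorize("oven bread"))
```

Averaging the example vectors is the simplest way to summarize a category. Comparing a new document against each individual example and taking the nearest ones would be an equally reasonable design.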
Another benefit of LSI is that it is not tied to any particular language, because it relies purely on mathematics. It can therefore extract semantic content from documents written in any language without consulting a thesaurus or dictionary. A query can even be made in one language while the documents being searched are in another, typically when the index has been built from a collection that includes translated documents.
LSI also works with terms that are not words in the ordinary sense, such as the DNA sequences of genes. Biological and medical documents can therefore be searched and categorized with LSI. For example, LSI can classify genes based on biological information extracted from the titles and abstracts stored in biological databases.
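One way to picture "terms that are not exactly words" is to cut a sequence into short overlapping fragments and treat each fragment as a term. The sketch below does that with 3-letter fragments (k-mers). The sequences are made-up strings rather than real genes, and the k-mer tokenization is an assumption chosen for illustration, not something the technique prescribes.

```python
import numpy as np

def kmers(sequence, k=3):
    """Split a sequence into overlapping fragments of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# Invented toy sequences, not real genes.
toy_sequences = ["ATGCGATACG", "ATGCGTTACG", "GGGTTTCCCA", "GGGTTACCCA"]

def kmer_matrix(sequences, k=3):
    """Build a k-mers-by-sequences count matrix, the analogue of terms-by-documents."""
    vocab = sorted({t for seq in sequences for t in kmers(seq, k)})
    row = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sequences)))
    for j, seq in enumerate(sequences):
        for t in kmers(seq, k):
            A[row[t], j] += 1.0
    return A, vocab

A, vocab = kmer_matrix(toy_sequences)

# Same decomposition as for text: keep the strongest concepts only.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
n_concepts = 2  # assumed value for this tiny example
seq_vectors = (s[:n_concepts, None] * Vt[:n_concepts, :]).T  # one row per sequence

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Compare the first sequence with each of the others in concept space.
for j in range(1, len(toy_sequences)):
    print(toy_sequences[0], toy_sequences[j], round(cosine(seq_vectors[0], seq_vectors[j]), 3))
```

From this point on, the pipeline is the same as for ordinary text: the matrix does not care whether its rows stand for words or sequence fragments.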
LSI also adapts to changes in terminology, and it tolerates misspelled words, typographical errors, unreadable characters, and other kinds of noise in documents. That makes it suitable for text produced by speech-to-text conversion programs and for text extracted from images by optical character recognition software. Check out http://ArticlesOnTap.com for more on this.





