Automatic indexing using a 3-Poisson model. (English) Zbl 0771.68046

Summary: The question of how to index documents in information retrieval systems is a difficult decision problem. This paper presents a set of formal statistical rules for the selection of index terms for a document collection. A combination of three Poisson distributions is examined in detail as a model of index term distribution. A measure of the relative level of treatment of each term in the collection is discussed, and a study of the overall effectiveness of a term as a potential index term is presented. The 3-Poisson model was found to be successful in selecting the index terms for the Cranfield document collection.
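
As an illustration of what "a combination of three Poisson distributions" can look like in practice, the following is a minimal sketch of a three-component Poisson mixture over within-document term frequencies. The summary does not give the paper's formulas or parameter estimates, so everything here is an assumption: the standard mixture form P(k) = sum_i w_i * exp(-lam_i) * lam_i^k / k!, the function names, and the example weights and rates are all hypothetical. The posterior class probabilities are shown only as one plausible way to relate an observed frequency to a level of treatment, not as the measure the paper defines.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k (k >= 0, lam > 0)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def three_poisson_pmf(k, weights, rates):
    """Probability of seeing a term k times in a document under a
    three-component Poisson mixture (assumed form, not taken from the paper)."""
    if len(weights) != 3 or len(rates) != 3:
        raise ValueError("expected exactly three weights and three rates")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * poisson_pmf(k, lam) for w, lam in zip(weights, rates))


def class_posteriors(k, weights, rates):
    """Posterior probability of each of the three classes given frequency k,
    computed with Bayes' rule from the mixture components."""
    joint = [w * poisson_pmf(k, lam) for w, lam in zip(weights, rates)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical parameters: most documents barely mention the term,
# some treat it in passing, and a few treat it extensively.
example_weights = (0.7, 0.2, 0.1)
example_rates = (0.2, 2.0, 8.0)

if __name__ == "__main__":
    for k in (0, 2, 8):
        post = class_posteriors(k, example_weights, example_rates)
        print(k, three_poisson_pmf(k, example_weights, example_rates), post)
```

Under these made-up parameters, a document in which the term occurs eight times is assigned mostly to the third (high-rate) class, while a document with zero occurrences is assigned mostly to the first. In a real application the weights and rates would have to be estimated from the collection's term-frequency data.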

MSC:

68P20 Information storage and retrieval of data