Long-Term Learning for Web Search Engines

Charles Kemp & Kotagiri Ramamohanarao

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2431)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

This paper considers how web search engines can learn from the successful searches recorded in their user logs. Document Transformation is a feasible approach that uses these logs to improve document representations. Existing test collections do not allow an adequate investigation of Document Transformation, but we show how a rigorous evaluation of this method can be carried out using the referer logs kept by web servers. We also describe a new strategy for Document Transformation that is suitable for long-term incremental learning. Our experiments show that Document Transformation improves retrieval performance over a medium-sized collection of webpages. Commercial search engines may be able to achieve similar improvements by incorporating this approach.
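The general idea of Document Transformation can be illustrated with a minimal Python sketch. It is not the strategy developed in the paper: the additive update rule, the `beta` step size, the `q` query parameter name, and the example URL are all assumptions chosen for illustration. The sketch extracts query terms from a search-engine referer URL and nudges the term weights of the visited document toward those terms.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def query_terms_from_referer(referer, param="q"):
    """Return lowercase query terms from a search-engine referer URL.

    The parameter name "q" is an assumption; real engines differ.
    """
    values = parse_qs(urlparse(referer).query).get(param, [])
    return values[0].lower().split() if values else []


def transform_document(doc_weights, terms, beta=0.1):
    """Return a new term-weight mapping nudged toward the query terms.

    Each distinct query term gains beta in weight; other terms are
    unchanged. This additive rule is illustrative only.
    """
    updated = Counter(doc_weights)
    for term in set(terms):
        updated[term] += beta
    return dict(updated)


# Hypothetical document representation and referer log entry.
doc = {"search": 1.0, "engine": 1.0}
referer = "http://search.example.com/results?q=web+search+learning"

terms = query_terms_from_referer(referer)
doc = transform_document(doc, terms)
```

Applying the update once per logged visit makes the learning incremental: a document that is repeatedly reached through the same query terms gradually accumulates weight on them. A practical system would probably also need normalization or weight decay, which this sketch omits.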

Author information

Authors and Affiliations

Department of Computer Science & Software Engineering, The University of Melbourne, Australia
Charles Kemp & Kotagiri Ramamohanarao

Editor information

Editors and Affiliations

Department of Computer Science, University of Helsinki, P.O. Box 26, 00014, Helsinki, Finland
Tapio Elomaa, Heikki Mannila & Hannu Toivonen

About this paper

Cite this paper

Kemp, C., Ramamohanarao, K. (2002). Long-Term Learning for Web Search Engines. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2002. Lecture Notes in Computer Science, vol 2431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45681-3_22

DOI: https://doi.org/10.1007/3-540-45681-3_22
Published: 18 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44037-6
Online ISBN: 978-3-540-45681-0
eBook Packages: Springer Book Archive
