
Test Collection-Based IR Evaluation Needs Extension toward Sessions – A Case of Extremely Short Queries

Conference paper
Information Retrieval Technology (AIRS 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5839)


Abstract

There is overwhelming evidence that real users of IR systems often prefer extremely short queries (one or two words) but try out several queries if needed. Such behavior is fundamentally different from the process modeled in traditional test collection-based IR evaluation, which assumes more verbose queries and only one query per topic. In the present paper, we propose an extension to test collection-based evaluation: we utilize sequences of short queries based on empirically grounded but idealized session strategies. We employ TREC data and have test persons suggest search words, while simulating the sessions based on the idealized strategies for repeatability and control. The experimental results show that, surprisingly, web-like very short queries (including sequences of one-word queries) typically lead to good enough results even in a TREC-type test collection. This finding explains the observed real-user behavior: because a few very simple attempts normally lead to good enough results, there is no need to invest more effort. We conclude by discussing the consequences of this finding for IR evaluation.
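To make the session-based extension concrete, the sketch below simulates one idealized session strategy over a test collection: queries are issued in a fixed order, and the session stops as soon as the result is "good enough". This is a minimal illustration under assumed conventions; the stopping rule (first relevant document in the top k) and all names (run_session, retrieve) are hypothetical and not the authors' exact protocol.

```python
from typing import Callable, Sequence, Set, Tuple

# A retrieval system maps a query string to a ranked list of document ids.
RetrievalFn = Callable[[str], Sequence[str]]


def run_session(queries: Sequence[str],
                retrieve: RetrievalFn,
                relevant: Set[str],
                k: int = 10) -> Tuple[int, bool]:
    """Issue the session's queries in order and stop at the first query
    whose top-k ranking contains a relevant document (a hypothetical
    "good enough" stopping rule). Returns (queries_used, success)."""
    for used, query in enumerate(queries, start=1):
        top_k = list(retrieve(query))[:k]
        if any(doc_id in relevant for doc_id in top_k):
            return used, True
    return len(queries), False


if __name__ == "__main__":
    # Stand-in for a real engine run against a TREC collection.
    def retrieve(query: str) -> Sequence[str]:
        index = {"virus": ["d3", "d7"], "ebola": ["d1", "d9"]}
        return index.get(query, [])

    # An idealized strategy: one-word queries first, then a longer one.
    session = ["virus", "ebola", "ebola outbreak"]
    print(run_session(session, retrieve, relevant={"d9"}))
    # -> (2, True): the second one-word query already found a relevant document
```

Because the strategies are fixed in advance, repeated simulation runs over the same collection are deterministic, which preserves the repeatability that test collection-based evaluation is valued for.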





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Keskustalo, H., Järvelin, K., Pirkola, A., Sharma, T., Lykke, M. (2009). Test Collection-Based IR Evaluation Needs Extension toward Sessions – A Case of Extremely Short Queries. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_6


  • DOI: https://doi.org/10.1007/978-3-642-04769-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04768-8

  • Online ISBN: 978-3-642-04769-5

  • eBook Packages: Computer Science (R0)
