
Test Collection-Based IR Evaluation Needs Extension toward Sessions – A Case of Extremely Short Queries

Conference paper
Information Retrieval Technology (AIRS 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5839)


Abstract

There is overwhelming evidence that real users of IR systems often prefer extremely short queries (one or two words) but try out several queries if needed. Such behavior is fundamentally different from the process modeled in traditional test collection-based IR evaluation, which assumes more verbose queries and only one query per topic. In the present paper, we propose an extension to test collection-based evaluation: we utilize sequences of short queries based on empirically grounded but idealized session strategies. We employ TREC data and have test persons suggest search words, while simulating the sessions based on the idealized strategies for repeatability and control. The experimental results show that, surprisingly, web-like very short queries (including sequences of one-word queries) typically lead to good enough results even in a TREC-type test collection. This finding explains the observed real-user behavior: because a few very simple attempts normally lead to good enough results, there is no need to invest more effort. We conclude by discussing the consequences of this finding for IR evaluation.
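To make the session-based extension concrete, the sketch below simulates one idealized session strategy over a test collection: queries are issued in a fixed order, and the session stops as soon as the result is "good enough". This is a minimal illustration under assumed conventions; the stopping rule (first relevant document in the top k) and all names (run_session, retrieve) are hypothetical and not the authors' exact protocol.

```python
from typing import Callable, Sequence, Set, Tuple

# A retrieval system maps a query string to a ranked list of document ids.
RetrievalFn = Callable[[str], Sequence[str]]


def run_session(queries: Sequence[str],
                retrieve: RetrievalFn,
                relevant: Set[str],
                k: int = 10) -> Tuple[int, bool]:
    """Issue the session's queries in order and stop at the first query
    whose top-k ranking contains a relevant document (a hypothetical
    "good enough" stopping rule). Returns (queries_used, success)."""
    for used, query in enumerate(queries, start=1):
        top_k = list(retrieve(query))[:k]
        if any(doc_id in relevant for doc_id in top_k):
            return used, True
    return len(queries), False


if __name__ == "__main__":
    # Stand-in for a real engine run against a TREC collection.
    def retrieve(query: str) -> Sequence[str]:
        index = {"virus": ["d3", "d7"], "ebola": ["d1", "d9"]}
        return index.get(query, [])

    # An idealized strategy: one-word queries first, then a longer one.
    session = ["virus", "ebola", "ebola outbreak"]
    print(run_session(session, retrieve, relevant={"d9"}))
    # -> (2, True): the second one-word query already found a relevant document
```

Because the strategies are fixed in advance, repeated simulation runs over the same collection are deterministic, which preserves the repeatability that test collection-based evaluation is valued for.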





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Keskustalo, H., Järvelin, K., Pirkola, A., Sharma, T., Lykke, M. (2009). Test Collection-Based IR Evaluation Needs Extension toward Sessions – A Case of Extremely Short Queries. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_6


  • DOI: https://doi.org/10.1007/978-3-642-04769-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04768-8

  • Online ISBN: 978-3-642-04769-5

  • eBook Packages: Computer Science (R0)
