Abstract
When a user of a microblogging site authors or browses a microblog post, that activity provides cues about the topic she is interested in at that point in time. Example-based search, which retrieves tweets similar to a single exemplary tweet such as the one just authored, can help provide the user with relevant content. We investigate various components of microblog posts, such as the associated timestamp, the author’s social network, and the content of the post, and develop approaches that harness these factors to find relevant tweets given a query tweet. An empirical analysis of these techniques on real-world Twitter data is then presented to quantify the utility of the various factors in assessing tweet relevance. We observe that tweets that are similar in content to the query but also contain extra information not already present in it are perceived as useful. We then develop a composite technique that combines the various approaches by scoring tweets using a dynamic, query-specific linear combination of the separate techniques. An empirical evaluation establishes the effectiveness of the composite technique and shows that it outperforms each of its constituents.
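To make the idea of a linear combination of separate techniques concrete, the following is a minimal sketch and not the paper's implementation. The abstract names the factors (content, timestamp, social network) but not the formulas, so every scorer here is an assumption chosen for illustration: cosine similarity over whitespace tokens, exponential decay over the time gap, and Jaccard overlap of followee sets. The `Tweet` class, the function names, and the `half_life` parameter are hypothetical. How the paper chooses its query-specific weights is not stated in the abstract, so the weights are simply passed in per query.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tweet:
    """Hypothetical record for a post; field names are illustrative."""
    text: str
    timestamp: float  # seconds since some epoch
    followees: frozenset = field(default_factory=frozenset)


def content_score(query, cand):
    """Assumed content measure: cosine similarity of token counts."""
    a = Counter(query.text.lower().split())
    b = Counter(cand.text.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def time_score(query, cand, half_life=3600.0):
    """Assumed temporal measure: halves for every half_life seconds of gap."""
    return 0.5 ** (abs(query.timestamp - cand.timestamp) / half_life)


def social_score(query, cand):
    """Assumed social measure: Jaccard overlap of the authors' followee sets."""
    union = query.followees | cand.followees
    if not union:
        return 0.0
    return len(query.followees & cand.followees) / len(union)


SCORERS = {"content": content_score, "time": time_score, "social": social_score}


def composite_score(query, cand, weights):
    """Weighted average of the component scores named in `weights`.

    `weights` is supplied per query; deriving it is outside this sketch.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * SCORERS[name](query, cand) for name, w in weights.items()) / total


def rank(query, candidates, weights):
    """Return candidates ordered from highest to lowest composite score."""
    return sorted(candidates,
                  key=lambda c: composite_score(query, c, weights),
                  reverse=True)
```

A caller would build a `Tweet` for the query, pick a weight dictionary such as `{"content": 0.6, "time": 0.2, "social": 0.2}` for that query, and pass both to `rank` along with the candidate tweets. Normalizing by the weight total keeps scores comparable across queries that use different weightings.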
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
P., D., Chakraborti, S. (2012). Finding Relevant Tweets. In: Gao, H., Lim, L., Wang, W., Li, C., Chen, L. (eds) Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32281-5_23
DOI: https://doi.org/10.1007/978-3-642-32281-5_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32280-8
Online ISBN: 978-3-642-32281-5
eBook Packages: Computer Science, Computer Science (R0)