research-article

Product feature categorization with multilevel latent semantic association

Published: 02 November 2009

Abstract

In recent years, the number of freely available online reviews has been growing rapidly. Aspect-based opinion mining techniques have been employed to identify reviewers' opinions on different product aspects. Such fine-grained opinion mining helps potential customers make their purchase decisions. Product-feature extraction and categorization are essential for mining aspect-oriented opinions effectively. Because reviewers often use different words to describe the same aspect, product-feature extraction and categorization are challenging. Manual product-feature extraction and categorization is tedious and time-consuming, and practically infeasible for the massive number of products. In this paper, we propose an unsupervised product-feature categorization method based on multilevel latent semantic association. After extracting product-features from semi-structured reviews, we construct a first latent semantic association (LaSA) model that groups words into a set of concepts according to their virtual context documents; this model generates the latent semantic structure of each product-feature. A second LaSA model then categorizes the product-features according to their latent semantic structures and their context snippets in the reviews. Experimental results demonstrate that our method achieves better performance than existing approaches. Moreover, the proposed method is language- and domain-independent.
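The abstract describes a two-level pipeline: words are first grouped into concepts from their virtual context documents, and product-features are then grouped using those concepts together with their context snippets. The Python sketch below illustrates only that data flow, under loudly stated assumptions: a word's virtual context document is taken to be the bag of words within a fixed window around it, a feature's latent semantic structure is approximated by a bag of concept ids, and both LaSA models are replaced by a plain cosine-similarity k-means. All names (virtual_context_documents, cosine_kmeans, categorize_features) and the toy reviews are hypothetical; this is not the authors' implementation or model.

```python
import math
from collections import Counter, defaultdict


def virtual_context_documents(sentences, window=2):
    """Assumption: a word's virtual context document is the bag of words
    found within +/- `window` tokens of it anywhere in the corpus."""
    docs = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    docs[word][tokens[j]] += 1
    return dict(docs)


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cosine_kmeans(vectors, k, iters=20):
    """Hypothetical stand-in for a LaSA model: k-means under cosine similarity,
    seeded by deterministic farthest-first traversal. Returns {item: cluster id}."""
    items = sorted(vectors)
    if not items:
        return {}
    k = max(1, min(k, len(items)))
    centroids = [Counter(vectors[items[0]])]
    while len(centroids) < k:
        # Next seed: the item least similar to its most similar existing seed.
        seed = min(items, key=lambda it: max(cosine(vectors[it], cen) for cen in centroids))
        centroids.append(Counter(vectors[seed]))
    assign = {}
    for _ in range(iters):
        new = {it: max(range(k), key=lambda c: cosine(vectors[it], centroids[c])) for it in items}
        if new == assign:  # converged
            break
        assign = new
        for c in range(k):
            members = [vectors[it] for it in items if assign[it] == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = sum(members, Counter())
    return assign


def mentions(tokens, phrase):
    """True if the (possibly multiword) phrase occurs contiguously in tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def categorize_features(sentences, features, n_concepts=4, n_categories=2, window=2):
    """Two-level sketch: words -> concepts, then product features -> categories."""
    # Level 1: group words into concepts using their virtual context documents.
    word_concept = cosine_kmeans(virtual_context_documents(sentences, window), n_concepts)
    # Assumed concept-level profile of a feature: concepts of its own words plus
    # concepts of every word in the sentences (context snippets) that mention it.
    profiles = {}
    for feature in features:
        phrase = feature.split()
        profile = Counter(word_concept[w] for w in phrase if w in word_concept)
        for tokens in sentences:
            if mentions(tokens, phrase):
                profile.update(word_concept[w] for w in tokens if w in word_concept)
        profiles[feature] = profile
    # Level 2: group features whose concept-level profiles are similar.
    return word_concept, cosine_kmeans(profiles, n_categories)


if __name__ == "__main__":
    reviews = [
        ["the", "battery", "lasts", "long"],
        ["the", "power", "lasts", "long"],
        ["the", "screen", "looks", "sharp"],
        ["the", "display", "looks", "sharp"],
    ]
    concepts, categories = categorize_features(reviews, ["battery", "power", "screen", "display"])
    print(categories)  # battery/power share one category, screen/display the other
```

On these four toy sentences, "battery" and "power" have identical virtual context documents, so they fall into the same concept and then into the same feature category, while "screen" and "display" form the other category. This mirrors the motivation in the abstract (different words for the same aspect), but a realistic setup would need far more review text, the paper's actual LaSA models in place of k-means, and a tuned number of concepts and categories.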

Cited By

  • (2024) Determining directions of service quality management using online review mining with interpretable machine learning. International Journal of Hospitality Management, 118, 103684. DOI: 10.1016/j.ijhm.2023.103684. Online publication date: Apr-2024.
  • (2023) Interpretable machine learning-based approach for customer segmentation for new product development from online product reviews. International Journal of Information Management, 70, 102641. DOI: 10.1016/j.ijinfomgt.2023.102641. Online publication date: Jun-2023.
  • (2023) Early Unimodal Sentiment Analysis of Comment Text Based on Traditional Machine Learning. In Multi-Modal Sentiment Analysis, pp. 53-134. DOI: 10.1007/978-981-99-5776-7_3. Online publication date: 27-Nov-2023.

Information

Published In

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009
2162 pages
ISBN:9781605585123
DOI:10.1145/1645953
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. latent semantic association
  2. opinion mining
  3. product feature categorization

Conference

CIKM '09
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 4
Reflects downloads up to 23 Oct 2024.

Citations

Cited By

  • (2022) Explore and Interpret the Correlations Among VR Applications. In 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pp. 22-26. DOI: 10.1109/ISMAR-Adjunct57072.2022.00015. Online publication date: Oct-2022.
  • (2021) Explainable neural network-based approach to Kano categorisation of product features from online reviews. International Journal of Production Research, 60(23), 7053-7073. DOI: 10.1080/00207543.2021.2000656. Online publication date: 16-Nov-2021.
  • (2021) Sentiment Analysis on Social Media for Emotional Prediction During COVID-19 Pandemic Using Efficient Machine Learning Approach. In Computational Intelligence and Healthcare Informatics, pp. 215-233. DOI: 10.1002/9781119818717.ch12. Online publication date: 25-Oct-2021.
  • (2020) Needs-Based Product Configurator Design for Mass Customization Using Hierarchical Attention Network. IEEE Transactions on Automation Science and Engineering, pp. 1-10. DOI: 10.1109/TASE.2019.2957136. Online publication date: 2020.
  • (2020) Sentiment Analysis. DOI: 10.1017/9781108639286. Online publication date: 23-Sep-2020.
  • (2019) Amalgamation of General and Domain Specific Word Embeddings for Improved Hierarchical Aspect Aggregation. In 2019 IEEE 13th International Conference on Semantic Computing (ICSC), pp. 55-62. DOI: 10.1109/ICOSC.2019.8665518. Online publication date: Jan-2019.
  • (2019) A novel topic-based framework for recommending long tail products. Computers & Industrial Engineering, 106063. DOI: 10.1016/j.cie.2019.106063. Online publication date: Sep-2019.