Regularized Dual-PPMI Co-clustering for Text Data

Published: 11 July 2021

Abstract

Co-clustering of document-term matrices has proved to be more effective than one-sided clustering. By their nature, text data are also generally unbalanced and directional. Recently, the von Mises-Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing the directional nature of text. In this paper, we propose a novel co-clustering approach based on a matrix formulation of vMF model-based co-clustering. This formulation leads to a flexible method for text co-clustering that can easily incorporate both word-word semantic relationships and document-document similarities. In contrast to existing methods, which generally incorporate similarities additively, we propose a dual multiplicative regularization that better encapsulates the underlying structure of text data. Extensive evaluations on various real-world text datasets demonstrate the superior performance of our proposed approach over baseline and competitive methods, in terms of both clustering results and co-cluster topic coherence.
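
As an illustration only, the idea of a dual multiplicative regularization can be sketched as follows. The abstract does not give the exact formulation, so everything here is an assumption: the function names `ppmi` and `dual_ppmi_smooth` are hypothetical, the similarities are taken to be positive pointwise mutual information (PPMI) computed from binarized co-occurrence counts, the identity is added so that each document and word keeps its own signal, and the product `M_d @ X @ M_w` is one plausible reading of "multiplicative" rather than the method confirmed by the paper. Rows are then L2-normalized so that documents lie on the unit hypersphere, as a directional (vMF-style) model expects.

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information of a non-negative co-occurrence matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # marginal counts per row
    col = counts.sum(axis=0, keepdims=True)   # marginal counts per column
    expected = row @ col / total              # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0              # zero counts or empty rows/columns
    return np.maximum(pmi, 0.0)               # keep positive associations only

def dual_ppmi_smooth(X):
    """Assumed sketch: smooth a document-term matrix on both sides, then L2-normalize rows."""
    X = np.asarray(X, dtype=float)
    B = (X > 0).astype(float)                 # binary occurrence matrix
    word_sim = ppmi(B.T @ B)                  # word-word PPMI (terms x terms)
    doc_sim = ppmi(B @ B.T)                   # document-document PPMI (docs x docs)
    M_d = doc_sim + np.eye(X.shape[0])        # assumption: identity keeps the original signal
    M_w = word_sim + np.eye(X.shape[1])
    Z = M_d @ X @ M_w                         # two-sided multiplicative smoothing
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                   # leave all-zero rows untouched
    return Z / norms
```

The normalized matrix returned by `dual_ppmi_smooth` could then be passed to any directional co-clustering routine. An additive scheme would instead add similarity-based penalty terms to the clustering objective, which is the contrast the abstract draws.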

    Published In

    SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2021
    2998 pages
    ISBN:9781450380379
    DOI:10.1145/3404835

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. co-clustering
    2. information retrieval
    3. regularization
    4. text mining

    Qualifiers

    • Short-paper

    Funding Sources

    • French National Research Agency (ANR)

    Conference

    SIGIR '21

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2023) Boosting Subspace Co-Clustering via Bilateral Graph Convolution. IEEE Transactions on Knowledge and Data Engineering 36:3 (960-971). DOI: 10.1109/TKDE.2023.3300814. Online publication date: 3-Aug-2023
    • (2023) Multi-objective genetic model for co-clustering ensemble. Applied Soft Computing 135:C. DOI: 10.1016/j.asoc.2023.110058. Online publication date: 1-Mar-2023
    • (2023) Fast parameterless prototype-based co-clustering. Machine Learning 113:4 (2153-2181). DOI: 10.1007/s10994-023-06474-y. Online publication date: 21-Nov-2023
    • (2022) Subspace Co-clustering with Two-Way Graph Convolution. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (3938-3942). DOI: 10.1145/3511808.3557706. Online publication date: 17-Oct-2022
