An efficient density based ant colony approach on web document clustering. (English) Zbl 07843624

Summary: Use of the World Wide Web (WWW) has been increasing as users need more information, and the volume of document information available to end users through the internet keeps growing. The web’s document search process is essential for finding documents relevant to user queries. As the number of web pages increases, it becomes increasingly challenging for users to find documents that match their interests, and existing Document Information Retrieval (DIR) approaches are time-consuming for large document collections. To alleviate the problem, this paper presents Spatial Clustering Ranking Pattern (SCRP) based Density Ant Colony Information Retrieval (DACIR) for user-query-based DIR. In the first stage, the Term Frequency Weight (TFW) technique identifies the frequency-based weight of the query terms. Based on the weight score, the documents are grouped and ranked using the proposed SCRP technique. Finally, based on the ranking, the DACIR algorithm retrieves the most relevant documents. The proposed method outperforms traditional information retrieval methods in the quality of returned objects while performing significantly better in run time.
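The summary does not give the formula behind the TFW stage, so the following is only a minimal Python sketch of generic term-frequency weighting, assuming whitespace tokenization and a score equal to the summed relative frequency of the query terms. The function names `term_frequency_weights` and `rank_documents` and the sample documents are hypothetical, and the sketch is not the paper's TFW, SCRP, or DACIR procedure.

```python
from collections import Counter


def term_frequency_weights(query, documents):
    """Score each document by the summed relative frequency of the query terms.

    Assumption: plain whitespace tokenization and lowercase matching;
    the paper's actual TFW definition is not given in the summary.
    """
    query_terms = query.lower().split()
    scores = []
    for doc in documents:
        tokens = doc.lower().split()
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores.append(sum(counts[term] / total for term in query_terms))
    return scores


def rank_documents(query, documents):
    """Return document indices ordered from highest to lowest score."""
    scores = term_frequency_weights(query, documents)
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


# Hypothetical sample collection, for illustration only.
docs = [
    "ant colony optimization for clustering",
    "web document clustering with density based clustering",
    "cooking recipes for pasta",
]
scores = term_frequency_weights("document clustering", docs)
ranking = rank_documents("document clustering", docs)
```

Here the second document ranks first because it contains both query terms, and the third scores zero because it contains neither. A grouping or ant-colony stage would consume such scores, but how the paper does so is not stated in the summary.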

MSC:

65H05 Numerical computation of solutions to single equations
65F10 Iterative numerical methods for linear systems
91C20 Clustering in the social and behavioral sciences
62G07 Density estimation
08A02 Relational systems, laws of composition
Full Text: DOI
