Classification of run-length encoded binary data. (English) Zbl 1103.68744
Summary: In classification of binary featured data, distance computation is carried out by considering each feature. We represent the given binary data as run-length encoded data. This would lead to a compact or compressed representation of data. Further, we propose an algorithm to directly compute the Manhattan distance between two such binary encoded patterns. We show that classification of data in such compressed form would improve the computation time by a factor of 5 on large handwritten data. The scheme is useful in large data clustering and classification which depend on distance measures.
MSC:
68T10 | Pattern recognition, speech recognition |
68P30 | Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science) |
68W05 | Nonnumerical algorithms |
References:
[1] | Zhang, T.; Ramakrishnan, R.; Linvy, M., BIRCH: an efficient data clustering method for very large databases, (Proceedings of ACM SIGMOD, International Conference on Management of Data (1996), ACM Press: ACM Press New York) |
[2] | B.C.M. Fung, Hierarchical document clustering using frequent itemsets, Thesis submitted in Simon Fraser University, 1999, available in \(\langle;\) http://citeseer.ist.psu.edu/581906.html \(\rangle;\); B.C.M. Fung, Hierarchical document clustering using frequent itemsets, Thesis submitted in Simon Fraser University, 1999, available in \(\langle;\) http://citeseer.ist.psu.edu/581906.html \(\rangle;\) |
[3] | Makinen, V.; Navarro, G.; Ukkonen, E., Approximate matching of run-length compressed strings, Algorithmica, 35, 4, 347-369 (2003) · Zbl 1045.68059 |
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.