×

Classification of run-length encoded binary data. (English) Zbl 1103.68744

Summary: In classification of binary featured data, distance computation is carried out by considering each feature. We represent the given binary data as run-length encoded data. This would lead to a compact or compressed representation of data. Further, we propose an algorithm to directly compute the Manhattan distance between two such binary encoded patterns. We show that classification of data in such compressed form would improve the computation time by a factor of 5 on large handwritten data. The scheme is useful in large data clustering and classification which depend on distance measures.

MSC:

68T10 Pattern recognition, speech recognition
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
68W05 Nonnumerical algorithms
Full Text: DOI

References:

[1] Zhang, T.; Ramakrishnan, R.; Linvy, M., BIRCH: an efficient data clustering method for very large databases, (Proceedings of ACM SIGMOD, International Conference on Management of Data (1996), ACM Press: ACM Press New York)
[2] B.C.M. Fung, Hierarchical document clustering using frequent itemsets, Thesis submitted in Simon Fraser University, 1999, available in \(\langle;\) http://citeseer.ist.psu.edu/581906.html \(\rangle;\); B.C.M. Fung, Hierarchical document clustering using frequent itemsets, Thesis submitted in Simon Fraser University, 1999, available in \(\langle;\) http://citeseer.ist.psu.edu/581906.html \(\rangle;\)
[3] Makinen, V.; Navarro, G.; Ukkonen, E., Approximate matching of run-length compressed strings, Algorithmica, 35, 4, 347-369 (2003) · Zbl 1045.68059
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.