A novel framework for automatic sorting of postal documents with multi-script address blocks. (English) Zbl 1213.68516

Summary: Recognition of numeric postal codes in a multi-script environment is a classical problem in any postal automation system. In such postal documents, determining the script of the handwritten postal code is crucial for subsequently invoking the digit recognizer for the respective script. The current framework attempts to infer the script of the numeric postal code without any bias from the script of the textual part of the address block, as the two might differ in a multi-script environment. The scope of the current work is to recognize postal codes written in any of four popular scripts, viz., Latin, Devanagari, Bangla and Urdu. For this purpose, we first implement a Hough transform based technique to localize the postal-code blocks in structured postal documents with a defined address block region.
Isolated handwritten digit patterns are then extracted from the localized postal-code region. In the next stage of the developed framework, similarly shaped digit patterns of the four scripts are grouped into 25 clusters. A script-independent unified pattern classifier is then designed to classify the digits of the numeric postal code into one of these 25 clusters. Based on these classification decisions, a rule-based script inference engine infers the script of the numeric postal code. One of the four script-specific classifiers is subsequently invoked to recognize the digit patterns of the corresponding script.
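The script inference step can be illustrated with a minimal sketch. The summary does not give the actual rules, so the code below assumes a simple compatibility vote: every cluster lists the scripts that contain a digit shape in that cluster, and the script compatible with the most digits wins. The table `CLUSTER_SCRIPTS` and the function `infer_script` are hypothetical and use only six made-up clusters rather than the 25 of the paper.

```python
from collections import Counter

# Hypothetical cluster -> scripts table (NOT taken from the paper).
# Each cluster id lists the scripts assumed to contain a digit of that shape.
CLUSTER_SCRIPTS = {
    0: {"Latin", "Devanagari", "Bangla"},  # illustrative shared round shape
    1: {"Latin"},
    2: {"Devanagari"},
    3: {"Bangla"},
    4: {"Urdu"},
    5: {"Devanagari", "Bangla"},
}


def infer_script(cluster_ids):
    """Return the script compatible with the most digits, or None on a tie.

    cluster_ids holds the cluster label that the unified classifier
    assigned to each digit of one postal code.
    """
    votes = Counter()
    for cid in cluster_ids:
        for script in CLUSTER_SCRIPTS.get(cid, ()):
            votes[script] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    # An ambiguous code (two scripts with equal support) is left undecided.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    print(infer_script([0, 3, 5, 3, 0, 5]))
```

Under this assumed rule, a code whose digits all fall into shared clusters stays undecided, which is one reason a real engine would likely need additional rules.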
A novel quad-tree based image partitioning technique is also developed in this work for effective feature extraction from the digit patterns. The average recognition accuracy over ten-fold cross-validation for the support vector machine (SVM) based 25-class unified pattern classifier is 92.03%. With randomly selected six-digit numeric strings of the four scripts, an average script inference accuracy of 96.72% is achieved. The average ten-fold cross-validation recognition accuracies of the individual SVM classifiers for the Latin, Devanagari, Bangla and Urdu numerals are 95.55%, 95.63%, 97.15% and 96.20%, respectively.
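As a rough illustration of quad-tree based partitioning, the sketch below recursively splits a binary digit image into four sub-images and computes one feature per leaf. The summary does not specify the split rule or the features, so two assumptions are made here: each region is split at the centre of mass of its foreground pixels, and the per-leaf feature is the foreground pixel density. The function names are hypothetical.

```python
def centroid(img, r0, r1, c0, c1):
    """Centre of mass of foreground (non-zero) pixels in rows r0..r1-1 and
    columns c0..c1-1; falls back to the geometric centre if there are none."""
    total = rsum = csum = 0
    for r in range(r0, r1):
        for c in range(c0, c1):
            if img[r][c]:
                total += 1
                rsum += r
                csum += c
    if total == 0:
        return (r0 + r1) // 2, (c0 + c1) // 2
    return rsum // total, csum // total


def quadtree_regions(img, depth, r0=0, r1=None, c0=0, c1=None):
    """Return 4**depth leaf regions (r0, r1, c0, c1); leaves may be empty."""
    if r1 is None:
        r1, c1 = len(img), len(img[0])
    if depth == 0:
        return [(r0, r1, c0, c1)]
    rm, cm = centroid(img, r0, r1, c0, c1)  # assumed split point
    leaves = []
    for a, b, c, d in ((r0, rm, c0, cm), (r0, rm, cm, c1),
                       (rm, r1, c0, cm), (rm, r1, cm, c1)):
        leaves.extend(quadtree_regions(img, depth - 1, a, b, c, d))
    return leaves


def density_features(img, depth):
    """Foreground density of each leaf region (0.0 for an empty region)."""
    feats = []
    for r0, r1, c0, c1 in quadtree_regions(img, depth):
        area = (r1 - r0) * (c1 - c0)
        fg = sum(1 for r in range(r0, r1) for c in range(c0, c1) if img[r][c])
        feats.append(fg / area if area else 0.0)
    return feats


if __name__ == "__main__":
    image = [[1] * 4 for _ in range(4)]
    print(density_features(image, 1))
```

Splitting at the centre of mass rather than the geometric centre makes the partition follow the stroke distribution of each digit, so leaves tend to cover comparable amounts of ink even when the digit is off-centre.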

MSC:

68T10 Pattern recognition, speech recognition
Full Text: DOI
