Abstract
Detecting and extracting information from the machine-readable zone (MRZ) on passports and visas is becoming increasingly important for verifying document authenticity. However, existing computer vision methods for similar tasks, such as optical character recognition (OCR), fail to extract the MRZ from digital images of passports with reasonable accuracy. We present a specially designed model based on convolutional neural networks that extracts MRZ information from digital images of passports of arbitrary orientation and size. Our model achieves a 100% MRZ detection rate and a 99.25% character recognition macro-F1 score on a passport and visa dataset.
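For readers unfamiliar with the character recognition metric named above, the following is a minimal sketch of a macro-averaged F1 score over character classes. The abstract does not describe the authors' evaluation protocol, so the function name `macro_f1`, the assumption that predicted and ground-truth MRZ strings are already aligned character by character, and the sample strings are all illustrative assumptions rather than the paper's method.

```python
from collections import Counter


def macro_f1(true_chars, pred_chars):
    """Macro-averaged F1 over character classes.

    Assumes (hypothetically) that both sequences are aligned and equal
    in length, i.e. position i in each refers to the same MRZ character.
    """
    if len(true_chars) != len(pred_chars):
        raise ValueError("sequences must be aligned and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_chars, pred_chars):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was missed

    # Every class seen in either sequence contributes equally to the mean.
    classes = set(true_chars) | set(pred_chars)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Made-up MRZ fragment with one misread character ("0" read as "O").
    truth = "P<UTO0"
    guess = "P<UTOO"
    print(f"macro-F1: {macro_f1(truth, guess):.4f}")
```

Because the average is taken per class rather than per character, an error on a rare character lowers the score as much as an error on a frequent one such as the `<` filler.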
Availability of data and material
Not available.
Funding
This research was supported by Lendbuzz.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Code availability
Not available.
Additional information
This article was revised to update the second author's name.
About this article
Cite this article
Liu, Y., Joren, H., Gupta, O. et al. MRZ code extraction from visa and passport documents using convolutional neural networks. IJDAR 25, 29–39 (2022). https://doi.org/10.1007/s10032-021-00384-2
DOI: https://doi.org/10.1007/s10032-021-00384-2