MRZ code extraction from visa and passport documents using convolutional neural networks

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

A Correction to this article was published on 11 July 2023

This article has been updated

Abstract

Detecting and extracting information from the machine-readable zone (MRZ) on passports and visas is becoming increasingly important for verifying document authenticity. However, computer vision methods designed for similar tasks, such as optical character recognition, fail to extract the MRZ from digital images of passports with reasonable accuracy. We present a purpose-built model based on convolutional neural networks that extracts MRZ information from digital images of passports of arbitrary orientation and size. Our model achieves a 100% MRZ detection rate and a 99.25% character recognition macro-F1 score on a passport and visa dataset.
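For readers unfamiliar with the metric, the following is a minimal sketch of how a character-level macro-F1 score can be computed. It assumes the predicted and ground-truth MRZ strings are already aligned character by character, and that macro-averaging means the unweighted mean of per-class F1 scores. The function name `macro_f1` and the sample strings are illustrative only; the evaluation protocol actually used in the paper is not shown on this page and may differ.

```python
from collections import Counter


def macro_f1(true_chars, pred_chars):
    """Unweighted mean of per-character-class F1 for two aligned sequences.

    Assumption: both sequences have the same length and position i in one
    corresponds to position i in the other.
    """
    if len(true_chars) != len(pred_chars):
        raise ValueError("sequences must be aligned and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_chars, pred_chars):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[t] += 1   # t was the truth but was missed

    # Every class that occurs in either sequence contributes equally.
    classes = set(true_chars) | set(pred_chars)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: the recognizer confuses the letter O with the digit 0.
truth = "P<UTOERIKSSON"
guess = "P<UT0ERIKSSON"
print(f"macro-F1 = {macro_f1(truth, guess):.4f}")
```

Because every class carries equal weight, a single confusion on a rare character lowers the macro-F1 score more than it would lower plain per-character accuracy.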

Availability of data and material

Not available.

Funding

This research was supported by Lendbuzz.

Author information

Corresponding author

Correspondence to Yichuan Liu.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Code availability

Not available.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article was revised to update the second author's name.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Joren, H., Gupta, O. et al. MRZ code extraction from visa and passport documents using convolutional neural networks. IJDAR 25, 29–39 (2022). https://doi.org/10.1007/s10032-021-00384-2

  • DOI: https://doi.org/10.1007/s10032-021-00384-2