Abstract
Recognizing cartoon characters accurately helps animators design and create cartoon scenarios from existing cartoon materials. Current deep learning approaches are sensitive to image rotation and rely heavily on rich textures, which rarely exist in cartoon figures. To address this problem, our work focuses on the distinct nature of shapes, which mostly encode the geometric structure of contours and thus yield more discriminative and robust features than textures. We propose a rotation robust shape transformer for cartoon character recognition. Because learned convolutional filters can hardly detect discriminative gradient information in cartoon figures, we leverage multi-scale shape context (SC) to capture the geometry of sampled contour points rather than differences in gray level. Further, we propose a rotation-invariant positional encoding to describe the geometric relations among local shape features. The contributions of the different scales of SC templates are learned by an attention-based transformer encoder. The resulting network learns shape information effectively from cartoon contours alone. This simple design attains nearly 100% recognition accuracy, beating both handcrafted and deep learning methods on the proposed challenging Cartoon dataset as well as traditional shape datasets. In particular, we achieve 86.19% recognition accuracy on the rotation test set, 58.30 percentage points higher than the state-of-the-art methods. Moreover, we develop an online cartoon character recognition application for animation scenarios.
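As a rough, illustrative sketch (not the authors' implementation), the classical log-polar shape context of Belongie et al., which the abstract builds on, can be computed as follows. The bin counts, radial edge values, and mean-distance scale normalization here are illustrative choices; note that binning angles against the image axes is not rotation-invariant by itself, which is the gap the paper's rotation-invariant positional encoding is designed to close.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """Log-polar shape context histogram for each sampled contour point.

    points: (N, 2) array of contour coordinates.
    Returns an (N, n_r * n_theta) array of normalized descriptors.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise offsets, distances, and angles between contour samples.
    diff = points[None, :, :] - points[:, None, :]
    dist = np.linalg.norm(diff, axis=-1)
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    # Normalize distances by the mean pairwise distance for scale invariance.
    r = dist / dist[dist > 0].mean()

    # Log-spaced radial bin edges, uniform angular bins (illustrative values).
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    descriptors = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i == j or r[i, j] >= r_edges[-1]:
                continue  # skip self-pairs and points beyond the outermost ring
            rb = max(np.searchsorted(r_edges, r[i, j]) - 1, 0)
            tb = int(angle[i, j] / (2 * np.pi) * n_theta) % n_theta
            descriptors[i, rb * n_theta + tb] += 1

    # Normalize each histogram to sum to one.
    sums = descriptors.sum(axis=1, keepdims=True)
    return np.divide(descriptors, sums,
                     out=np.zeros_like(descriptors), where=sums > 0)
```

A multi-scale variant, as described in the abstract, would evaluate such histograms at several radial extents per point and let the transformer's attention weight the scales.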
Data availability
The datasets generated or analyzed during the current study are available on Google Drive (https://drive.google.com/drive/folders/1vhw907BYVosw7wMKmhD7CAe4x0NbenIG?usp=sharing).
Notes
The histogram is one of the most commonly used shape descriptors.
We provide an introduction video of the application in the supplementary material.
We provide more instances in the supplementary material.
Acknowledgements
This work was supported in part by the Natural Science Foundation of China under Grant 62272083 and Grant 61876030, in part by the Liaoning Provincial Natural Science Foundation under Grant 2022-MS-128, in part by the Fundamental Research Funds for the Central Universities under Grant DUT23YG109, and in part by the U.S. National Science Foundation under Grant IIS-1814745.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "A Rotation Robust Shape Transformer for Cartoon Character Recognition."
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mp4 8730 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jia, Q., Chen, X., Wang, Y. et al. A rotation robust shape transformer for cartoon character recognition. Vis Comput 40, 5575–5588 (2024). https://doi.org/10.1007/s00371-023-03123-2