Abstract
Online video stabilization is important for hand-held camera shooting and remote robot control. Existing methods either require the whole video to perform offline stabilization, which results in long latency, or ignore the non-uniform motion field in each frame, which leads to large distortion. The non-uniform motion includes dynamic foreground motion and non-planar background motion. To better describe the shaky motion field online, we propose a novel attentive, multi-scale regression and refinement framework called ACP-Net. It models camera motion at progressive levels and consists of a flow-guided quiescent attention (FQA) module and a cascaded pyramid prediction (CPP) module. The FQA module takes optical flow as an extra input and generates a soft mask to suppress the disturbance from dynamic foreground objects. Based on the attentive feature, the CPP module uses a multi-scale residual pyramid structure to perform coarse-to-fine stabilization. Experimental results on public benchmarks show that the proposed method achieves state-of-the-art performance both qualitatively and quantitatively, compared with both online and offline methods.
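The two ideas named in the abstract, a flow-derived soft mask and coarse-to-fine residual refinement, can be illustrated with a toy NumPy sketch. This is not the ACP-Net implementation: the abstract gives no architectural details, so the function names (`soft_quiescent_mask`, `coarse_to_fine_motion`), the median-based mask, the `temperature` parameter, and the reduction of camera motion to a single translation per frame are all assumptions made for illustration only.

```python
import numpy as np


def soft_quiescent_mask(flow, temperature=1.0):
    """Hypothetical soft mask: close to 1 where the flow agrees with the
    dominant (median) motion, close to 0 on independently moving regions."""
    dominant = np.median(flow.reshape(-1, 2), axis=0)
    deviation = np.linalg.norm(flow - dominant, axis=-1)
    return np.exp(-deviation / temperature)


def downsample(flow, factor):
    """Average-pool an (H, W, 2) flow field by an integer factor."""
    h, w = flow.shape[0] // factor, flow.shape[1] // factor
    cropped = flow[:h * factor, :w * factor]
    return cropped.reshape(h, factor, w, factor, 2).mean(axis=(1, 3))


def coarse_to_fine_motion(flow, factors=(4, 2, 1)):
    """Toy residual pyramid: start from the coarsest level and let each finer
    level predict only a correction to the current estimate."""
    estimate = np.zeros(2)
    for factor in factors:
        residual = downsample(flow, factor) - estimate
        weights = soft_quiescent_mask(residual)[..., None]
        # Mask-weighted average of the residual motion at this level.
        estimate = estimate + (residual * weights).sum(axis=(0, 1)) / weights.sum()
    return estimate


def make_demo_flow():
    """Synthetic flow: static background under camera shake plus a moving object."""
    flow = np.zeros((32, 32, 2))
    flow[...] = [2.0, -1.0]           # background motion caused by the camera
    flow[8:20, 8:20] = [10.0, 10.0]   # dynamic foreground object
    return flow


if __name__ == "__main__":
    demo = make_demo_flow()
    print("masked coarse-to-fine estimate:", coarse_to_fine_motion(demo))
    print("naive mean of the flow field:  ", demo.mean(axis=(0, 1)))
```

In this synthetic case the masked estimate stays near the background motion, while the unweighted mean is pulled toward the foreground object. That is the kind of disturbance the soft mask is meant to suppress.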
Y. Xu and Q. Zhang contributed equally.
Acknowledgement
Mr Yufei Xu, Mr Qiming Zhang, and Dr Jing Zhang are supported in part by ARC FL-170100117 and IH-180100002.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, Y., Zhang, Q., Zhang, J., Tao, D. (2022). Attentive Cascaded Pyramid Network for Online Video Stabilization. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_2
DOI: https://doi.org/10.1007/978-3-031-20497-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science (R0)