Nonlocal regularized CNN for image segmentation. (English) Zbl 1451.68307

Summary: Non-local dependency is an important prior for many image segmentation tasks. Convolutional operations are building blocks that process one local neighborhood at a time, so convolutional neural networks (CNNs) usually do not make explicit use of the non-local prior in image segmentation tasks. Although pooling and dilated convolution can enlarge the receptive field and thereby use some non-local information during the feature extraction step, current CNN architectures have no non-local prior in the feature classification step. In this paper, we present a non-local total variation (TV) regularized softmax activation function for semantic image segmentation tasks. The proposed method can be integrated into the architecture of CNNs. To handle the difficulty that the non-smoothness of non-local TV poses for back-propagation in CNNs, we develop a primal-dual hybrid gradient method to realize the back-propagation of non-local TV in CNNs. Experimental evaluations of the non-local TV regularized softmax layer on a series of image segmentation datasets showcase its good performance. Many CNNs can benefit from our proposed method on image segmentation tasks.
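
The idea of a non-local TV regularized softmax can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions that the summary does not state: softmax is viewed as the minimizer of -<A, O> + <A, log A> over the probability simplex, the regularizer is an anisotropic non-local TV with a dense weight matrix, and the dual variable is updated by plain projected dual ascent rather than the primal-dual hybrid gradient scheme of the paper. Only the forward computation is shown. The names `nltv_softmax`, `lam`, `tau` and `n_iter` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nonlocal_grad(A, s):
    """Non-local gradient: (K A)[i, j, c] = s[i, j] * (A[j, c] - A[i, c])."""
    return s[:, :, None] * (A[None, :, :] - A[:, None, :])

def nonlocal_grad_adjoint(xi, s):
    """Adjoint K^T of nonlocal_grad, so that <xi, K A> = <K^T xi, A>."""
    g = s[:, :, None] * xi
    return g.sum(axis=0) - g.sum(axis=1)

def nltv_softmax(logits, weights, lam=0.5, tau=0.1, n_iter=200):
    """Approximately solve (assumed model, not taken from the paper)
        min_A  -<A, O> + <A, log A> + lam * sum |K A|,  rows of A on the simplex,
    using the dual form  lam * sum |K A| = max_{|xi| <= lam} <xi, K A>.
    For fixed xi the minimizer in A is softmax(O - K^T xi); xi is updated by
    projected gradient ascent (clipping to [-lam, lam])."""
    s = np.sqrt(weights)
    xi = np.zeros(weights.shape + (logits.shape[1],))
    A = softmax(logits)
    for _ in range(n_iter):
        xi = np.clip(xi + tau * nonlocal_grad(A, s), -lam, lam)
        A = softmax(logits - nonlocal_grad_adjoint(xi, s))
    return A

# Toy data: six "pixels" whose features form two groups. Pixel 2 has the
# features of the first group, but its logits favor the class of the second.
features = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
weights = np.exp(-(features[:, None] - features[None, :]) ** 2 / 0.1)
np.fill_diagonal(weights, 0.0)  # no self-edges
logits = np.array([[2.0, 0.0], [2.0, 0.0], [-1.0, 0.0],
                   [-2.0, 0.0], [-2.0, 0.0], [-2.0, 0.0]])

plain = softmax(logits)
reg = nltv_softmax(logits, weights)
print("class-0 probability of pixel 2, plain softmax:", plain[2, 0])
print("class-0 probability of pixel 2, regularized:  ", reg[2, 0])
```

In this toy setting the regularizer pulls pixel 2 toward the class of the pixels it resembles, which is the effect a non-local prior in the classification step is meant to have. With `lam=0` the function reduces to the ordinary softmax.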

MSC:

68U10 Computing methodologies for image processing
49M37 Numerical methods based on nonlinear programming

References:

[1] R. Adams; L. Bischof, Seeded region growing, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 641-647 (1994) · doi:10.1109/34.295913
[2] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha and V. K. Asari, Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation, arXiv: 1802.06955.
[3] V. Badrinarayanan, A. Kendall and R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, arXiv: 1511.00561.
[4] L. Barghout and L. Lee, Perceptual information processing system, US Patent App. 10/618,543, (2004).
[5] M. Benning; C. Brune; M. Burger; J. Müller, Higher-order tv methods-enhancement via bregman iteration, Journal of Scientific Computing, 54, 269-310 (2013) · Zbl 1308.94012 · doi:10.1007/s10915-012-9650-3
[6] H. Birkholz, A unifying approach to isotropic and anisotropic total variation denoising models, Journal of Computational and Applied Mathematics, 235, 2502-2514 (2011) · Zbl 1207.94005 · doi:10.1016/j.cam.2010.11.003
[7] J. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 8, 679-698 (1986) · doi:10.1016/B978-0-08-051581-6.50024-6
[8] G. Gilboa; S. Osher, Nonlocal operators with applications to image processing, Multiscale Modeling & Simulation, 7, 1005-1028 (2008) · Zbl 1181.35006 · doi:10.1137/070698592
[9] K. He, X. Zhang, S. Ren and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in Proceedings of the IEEE International Conference on Computer Vision, IEEE, 2015, 1026-1034.
[10] F. Jia, J. Liu and X. Tai, A regularized convolutional neural network for semantic image segmentation, Analysis and Applications, (2020) 1-19.
[11] M. Johnson-Roberson, C. Barto, R. Mehta, S. N. Sridhar, K. Rosaen and R. Vasudevan, Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks?, preprint, arXiv: 1610.01983.
[12] M. Kass, A. Witkin and D. Terzopoulos, Snakes: Active contour models, International Journal of Computer Vision, 1, (1988) 321-331. · Zbl 0646.68105
[13] P. Krähenbühl and V. Koltun, Efficient inference in fully connected crfs with gaussian edge potentials, Advances in Neural Information Processing Systems, (2011), 109-117.
[14] A. Krizhevsky, I. Sutskever and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, (2012), 1097-1105.
[15] Y. LeCun; B. Boser; J. S. Denker; D. Henderson; R. E. Howard; W. Hubbard; L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation, 1, 541-551 (1989) · doi:10.1162/neco.1989.1.4.541
[16] G. Lin, C. Shen, A. V. D. Hengel and I. Reid, Efficient piecewise training of deep structured models for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2016, 3194-3203.
[17] J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2015, 3431-3440.
[18] M. Lysaker, A. Lundervold and X.-C. Tai, Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time, IEEE Transactions on Image Processing, 12, (2003), 1579-1590. · Zbl 1286.94020
[19] D. R. Martin; C. C. Fowlkes; J. Malik, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 530-549 (2004) · doi:10.1109/TPAMI.2004.1273918
[20] K. Mikula, A. Sarti and F. Sgallari, Co-volume level set method in subjective surface based medical image segmentation, in Handbook of Biomedical Image Analysis, Springer, (2005), 583-626.
[21] D. Mumford; J. Shah, Optimal approximations by piecewise smooth functions and associated variational problems, Communications on Pure and Applied Mathematics, 42, 577-685 (1989) · Zbl 0691.49036 · doi:10.1002/cpa.3160420503
[22] H. Noh, S. Hong and B. Han, Learning deconvolution network for semantic segmentation, in Proceedings of the IEEE International Conference on Computer Vision, IEEE, 2015, 1520-1528.
[23] O. Oktay, et al., Attention u-net: Learning where to look for the pancreas, preprint, arXiv: 1804.03999.
[24] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man and Cybernetics, 9, 62-66 (1979) · doi:10.1109/TSMC.1979.4310076
[25] O. Ronneberger, P. Fischer and T. Brox, U-net: Convolutional networks for biomedical image segmentation, in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, 234-241.
[26] L. I. Rudin; S. Osher; E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60, 259-268 (1992) · Zbl 0780.49028 · doi:10.1016/0167-2789(92)90242-F
[27] L. Shapiro and G. C. Stockman, Computer Vision, Prentice Hall, 2001.
[28] L. Shapiro and G. C. Stockman, Computer Vision, Prentice Hall, 2001.
[29] J. Shi; J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 888-908 (2000)
[30] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, preprint, arXiv: 1409.1556.
[31] M. Unger, T. Mauthner, T. Pock and H. Bischof, Tracking as segmentation of spatial-temporal volumes by anisotropic weighted tv, in International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, Springer, 2009, 193-206.
[32] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell, Understanding convolution for semantic segmentation, in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2018, 1451-1460. · Zbl 1387.62080
[33] K. Wei, K. Yin, X.-C. Tai and T. F. Chan, New region force for variational models in image segmentation and high dimensional data clustering, preprint, arXiv: 1704.08218.
[34] K. Yin; X.-C. Tai, An effective region force for some variational models for learning and clustering, Journal of Scientific Computing, 74, 175-196 (2018) · Zbl 1419.62159 · doi:10.1007/s10915-017-0429-4
[35] F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, preprint, arXiv: 1511.07122.
[36] L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, Advances in Neural Information Processing Systems, (2005), 1601-1608.
[37] X. Zheng; Y. Wang; G. Wang; J. Liu, Fast and robust segmentation of white blood cell images by self-supervised learning, Micron, 107, 55-71 (2018) · doi:10.1016/j.micron.2018.01.010
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.