Loose-limbed people: estimating 3D human pose and motion using non-parametric belief propagation. (English) Zbl 1254.68283

Summary: We formulate the problem of 3D human pose estimation and tracking as one of inference in a graphical model. Unlike traditional kinematic tree representations, our model of the body is a collection of loosely-connected body-parts. In particular, we model the body using an undirected graphical model in which nodes correspond to parts and edges to kinematic, penetration, and temporal constraints imposed by the joints and the world. These constraints are encoded using pairwise statistical distributions that are learned from motion-capture training data. Human pose and motion estimation is formulated as inference in this graphical model and is solved using particle message passing (PaMPas). PaMPas is a form of non-parametric belief propagation that uses a variant of particle filtering applicable to general graphical models with loops. The loose-limbed model and decentralized graph structure allow us to incorporate information from “bottom-up” visual cues, such as limb and head detectors, into the inference process. These detectors enable automatic initialization and aid recovery from transient tracking failures. We illustrate the method by automatically tracking people in multi-view imagery using a set of calibrated cameras and present a quantitative evaluation using the HumanEva dataset.
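The idea of passing particle-based messages over a loopy graph of body parts can be illustrated with a minimal sketch. This is not the authors' PaMPas algorithm. It assumes 1D states per part, hand-set Gaussian pairwise potentials on relative offsets (standing in for the distributions the paper learns from motion capture), and a fixed set of particles drawn uniformly per node with no resampling, i.e. a simpler fixed-particle variant of non-parametric belief propagation. The names `particle_bp`, `gauss` and all parameters are hypothetical.

```python
import math
import random


def gauss(x, mu, sigma):
    """Unnormalized Gaussian kernel exp(-(x - mu)^2 / (2 sigma^2))."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def particle_bp(n_nodes, edges, obs, obs_sigma, pair_sigma,
                n_particles=150, lo=-5.0, hi=5.0, iters=6, seed=0):
    """Loopy belief propagation on fixed particle sets (hypothetical sketch).

    edges: {(i, j): mu_ij} encoding the constraint x_j - x_i ~ mu_ij (1D states).
    obs:   per-node observation, or None when a part has no local evidence.
    Returns the belief-weighted mean of each node's particles.
    """
    rng = random.Random(seed)
    # Proposal is uniform on [lo, hi]; its constant density cancels on normalization.
    parts = [[rng.uniform(lo, hi) for _ in range(n_particles)]
             for _ in range(n_nodes)]

    def local(i, x):
        return 1.0 if obs[i] is None else gauss(x, obs[i], obs_sigma)

    def pair(i, j, xi, xj):
        # Pairwise potential on the relative offset, usable in either direction.
        if (i, j) in edges:
            return gauss(xj - xi, edges[(i, j)], pair_sigma)
        return gauss(xi - xj, edges[(j, i)], pair_sigma)

    nbrs = {i: set() for i in range(n_nodes)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)

    # msgs[(i, j)][m] is the message i -> j evaluated at particle m of node j.
    msgs = {(i, j): [1.0] * n_particles for i in nbrs for j in nbrs[i]}

    for _ in range(iters):
        new = {}
        for i, j in msgs:
            # Reweight node i's particles by its evidence and all messages except j's.
            w = []
            for k, xi in enumerate(parts[i]):
                v = local(i, xi)
                for nb in nbrs[i] - {j}:
                    v *= msgs[(nb, i)][k]
                w.append(v)
            # Monte Carlo estimate of the integral over x_i at each particle of j.
            out = [sum(wk * pair(i, j, xi, xj) for wk, xi in zip(w, parts[i]))
                   for xj in parts[j]]
            z = sum(out) or 1.0
            new[(i, j)] = [o / z for o in out]  # normalize to avoid underflow
        msgs = new  # synchronous (parallel) message update

    means = []
    for i in range(n_nodes):
        # Belief at each particle: local evidence times all incoming messages.
        b = []
        for k, x in enumerate(parts[i]):
            v = local(i, x)
            for nb in nbrs[i]:
                v *= msgs[(nb, i)][k]
            b.append(v)
        z = sum(b)
        means.append(sum(bk * x for bk, x in zip(b, parts[i])) / z)
    return means


# A three-part loop with consistent offsets; only part 0 is observed (at 0.0).
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0}
print(particle_bp(3, edges, [0.0, None, None], obs_sigma=0.3, pair_sigma=0.3))
```

In the usage line, anchoring part 0 should pull the estimated means of parts 1 and 2 toward 1.0 and 2.0, even though the graph contains a loop. For Gaussian models, loopy belief propagation recovers correct means when it converges [99]; this particle version only approximates them, and its beliefs tend to be overconfident.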

MSC:

68T45 Machine vision and scene understanding

Software:

HumanEva; OpenCV

References:

[1] Agarwal, A., & Triggs, B. (2006). Recovering 3D human pose from monocular images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(1), 44–58. · doi:10.1109/TPAMI.2006.21
[2] Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: people detection and articulated pose estimation. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
[3] Balan, A., Sigal, L., & Black, M. J. (2005). A quantitative evaluation of video-based 3D person tracking. In IEEE workshop on visual surveillance and performance evaluation of tracking and surveillance (pp. 349–356). October 2005.
[4] Banerjee, A., Dhillon, I. S., Ghosh, J., & Sra, S. (2005). Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, 6, 1345–1382. · Zbl 1190.62116
[5] Bergtholdt, M., Kappes, J., Schmidt, S., & Schnorr, C. (2010). A study of parts-based object class detection using complete graphs. International Journal of Computer Vision, 87(1–2), 93–117. · doi:10.1007/s11263-009-0209-1
[6] Bhatia, S., Sigal, L., Isard, M., & Black, M. J. (2004). 3D human limb detection using space carving and multi-view eigen models. In IEEE Workshop on articulated and nonrigid motion, CVPR’04 CDROM proceedings.
[7] Bregler, C., & Malik, J. (1998). Tracking people with twists and exponential maps. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 8–15).
[8] Bo, L., Sminchisescu, C., Kanaujia, A., & Metaxas, D. (2008). Fast algorithms for large scale conditional 3D prediction. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
[9] Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8, 679–714. · doi:10.1109/TPAMI.1986.4767851
[10] Cham, T.-J., & Rehg, J. (1999). A multiple hypothesis approach to figure tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 239–245).
[11] Cheung, G. K. M., Baker, S., & Kanade, T. (2003). Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 77–84).
[12] Choo, K., & Fleet, D. J. (2001). People tracking with hybrid Monte Carlo. In IEEE international conference on computer vision (ICCV) (Vol. 2, pp. 321–328).
[13] Cochran, W. G. (1977). Sampling techniques. New York: Wiley. · Zbl 0353.62011
[14] Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619. · doi:10.1109/34.1000236
[15] Cooper, G. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42, 393–405. · Zbl 0717.68080 · doi:10.1016/0004-3702(90)90060-D
[16] Corazza, S., Muendermann, L., Chaudhari, A., Demattio, T., Cobelli, C., & Andriacchi, T. (2006). A markerless motion capture system to study musculoskeletal biomechanics: visual hull and simulated annealing approach. Annals of Biomedical Engineering, 34(6), 1019–1029. · doi:10.1007/s10439-006-9122-8
[17] Deutscher, J., & Reid, I. D. (2005). Articulated body motion capture by stochastic search. International Journal of Computer Vision, 61(2), 185–205. · doi:10.1023/B:VISI.0000043757.18370.9c
[18] Deutscher, J., Blake, A., & Reid, I. (2000). Articulated body motion capture by annealed particle filtering. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 126–133).
[19] Deutscher, J., Isard, M., & MacCormick, J. (2002). Automatic camera calibration from a single Manhattan image. In European conference on computer vision (ECCV) (Vol. 4, pp. 175–188). · Zbl 1039.68617
[20] Doucet, A., Godsill, S. J., & Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197–208. · doi:10.1023/A:1008935410038
[21] Doucet, A., de Freitas, N., & Gordon, N. (2001). Sequential Monte Carlo methods in practice. In Statistics for engineering and information sciences. Berlin: Springer. · Zbl 0967.00022
[22] Eichner, M., & Ferrari, V. (2009). Better appearance models for pictorial structures. In British machine vision conference (BMVC).
[23] Elgammal, A., & Lee, C. (2004). Inferring 3D body pose from silhouettes using activity manifold learning. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 681–688).
[24] Elidan, G., McGraw, I., & Koller, D. (2006). Residual belief propagation: Informed scheduling for asynchronous message passing. In Proceedings of the twenty-second conference on uncertainty in AI (UAI), July 2006.
[25] Felzenszwalb, P., & Huttenlocher, D. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79. · doi:10.1023/B:VISI.0000042934.15159.49
[26] Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 264–271).
[27] Fischler, M., & Elschlager, R. (1973). The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1), 67–92. · doi:10.1109/T-C.1973.223602
[28] Foley, J., van Dam, A., Feiner, S., & Hughes, J. (1990). Computer graphics: Principles and practice. Reading: Addison Wesley. ISBN:0-201-12110-7 · Zbl 0875.68891
[29] Forsyth, D. A., Arikan, O., Ikemoto, L., O’Brien, J., & Ramanan, D. (2006). Computational studies of human motion: Part 1, tracking and motion synthesis. ISBN:1-933019-30-1, 178 pp.
[30] Gall, J., Potthoff, J., Schnoerr, C., Rosenhahn, B., & Seidel, H.-P. (2006). Interacting annealing particle filters: Mathematics and a recipe for applications (Technical Report MPI-I-2006-4-009). Saarbruecken, Germany, September 2006.
[31] Gall, J., Rosenhahn, B., & Seidel, H.-P. (2007). Clustered stochastic optimization for object recognition and pose estimation. In LNCS: Vol. 4713. Annual symposium of the German association for pattern recognition (DAGM) (pp. 32–41).
[32] Gall, J., Rosenhahn, B., Brox, T., & Seidel, H.-P. (2010). Optimization and filtering for human motion capture: A multi-layer framework. International Journal of Computer Vision, 87(1), 75–92. · doi:10.1007/s11263-008-0173-1
[33] Gavrila, D. (1999). The visual analysis of human movement: A survey. Computer Vision and Image Understanding, 73(1), 82–98. · Zbl 0924.68174 · doi:10.1006/cviu.1998.0716
[34] Gavrila, D., & Davis, L. (1996). 3-D model-based tracking of humans in action: A multi-view approach. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 73–80).
[35] Grauman, K., Shakhnarovich, G., & Darrell, T. (2003). Inferring 3D structure with a statistical image-based shape model. In IEEE International conference on computer vision (ICCV) (pp. 641–648).
[36] Guan, P., Weiss, A., Balan, A., & Black, M. J. (2009). Estimating human shape and pose from a single image. In IEEE International Conference on computer vision (ICCV).
[37] Hinton, G. E. (1976). Using relaxation to find a puppet. In Proceeding of the A.I.S.B. Summer conference (pp. 148–157).
[38] Hogg, D. C. (1983). Model-based vision: A program to see a walking person. Image and Vision Computing, 1, 5–20. · doi:10.1016/0262-8856(83)90003-3
[39] Horaud, R., Niskanen, M., Dewaele, G., & Boyer, E. (2008). Human motion tracking by registering an articulated surface to 3-D points and normals. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
[40] Hua, G., Yang, M.-H., & Wu, Y. (2005). Learning to estimate human pose with data driven belief propagation. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 747–754).
[41] Ihler, A. T., Sudderth, E. B., Freeman, W. T., & Willsky, A. S. (2003). Efficient multiscale sampling from products of Gaussian mixtures. Advances in Neural Information Processing Systems, 16, 1–8.
[42] Intel Open Source Computer Vision Library. Available at http://www.intel.com/research/mrl/research/opencv/ .
[43] Ioffe, S., & Forsyth, D. (2001a). Human tracking with mixtures of trees. In IEEE international conference on computer vision (ICCV) (Vol. 1, pp. 690–695).
[44] Ioffe, S., & Forsyth, D. (2001b). Probabilistic methods for finding people. International Journal of Computer Vision, 43(1), 45–68. · Zbl 0972.68605 · doi:10.1023/A:1011179004708
[45] Isard, M. (2003). Pampas: Real-valued graphical models for computer vision. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 613–620).
[46] John, V., Ivekovic, S., & Trucco, E. (2009). Articulated human motion tracking with HPSO. In International conference on computer vision theory and applications (VISAPP) (pp. 531–538).
[47] Jordan, M. I., Sejnowski, T. J., & Poggio, T. (2001). Graphical models: Foundations of neural computation. Cambridge: MIT Press. · Zbl 1058.68097
[48] Ju, S., Black, M. J., & Yacoob, Y. (1996). Cardboard people: A parameterized model of articulated motion. In International conference on automatic face and gesture recognition (pp. 38–44).
[49] Kakadiaris, I. A., & Metaxas, D. (1996). Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint selection. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 81–87).
[50] Kehl, R., Bray, M., & Gool, L. V. (2005). Full body tracking from multiple views using stochastic sampling. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 129–136).
[51] Kinoshita, K., Ma, Y., Lao, S., & Kawade, M. (2006). A fast and robust 3D head pose and gaze estimation system. In International conference on multimodal interfaces (ICMI) (pp. 137–138).
[52] Kirkpatrick, S., Gelatt, C., & Vecchi, M. (1982). Optimisation by simulated annealing (Technical report). IBM Thomas J. Watson Research Centre, Yorktown Heights, NY, USA.
[53] Knossow, D., Ronfard, R., & Horaud, R. (2008). Human motion tracking with a kinematic parameterization of extremal contours. International Journal of Computer Vision, 79(2), 247–269. · doi:10.1007/s11263-007-0116-2
[54] Koller, D., Lerner, U., & Angelov, D. (1999). A general algorithm for approximate inference and its application to hybrid Bayes nets. In Proceedings of the 15th annual conference on uncertainty in artificial intelligence (pp. 324–333).
[55] Lan, X., & Huttenlocher, D. (2005). Beyond trees: Common factor models for 2D human pose recovery. In IEEE international conference on computer vision (ICCV) (pp. 470–477).
[56] Lee, C.-S., & Elgammal, A. (2007). Modeling view and posture manifold for tracking. In IEEE international conference on computer vision (ICCV).
[57] Li, R., Yang, M.-H., Sclaroff, S., & Tian, T.-P. (2006). Monocular tracking of 3D human motion with a coordinated mixture of factor analyzers. In European conference on computer vision (ECCV) (Vol. 2, pp. 137–150).
[58] Li, R., Tian, T.-P., & Sclaroff, S. (2007). Simultaneous learning of nonlinear manifold and dynamical models for high-dimensional time series. In IEEE international conference on computer vision (ICCV).
[59] Lu, Z., Perpinan, M. C., & Sminchisescu, C. (2007). People tracking with the Laplacian eigenmaps latent variable model. Advances in Neural Information Processing Systems (NIPS).
[60] MacCormick, J., & Isard, M. (2000). Partitioned sampling, articulated objects, and interface-quality hand tracking. In European conference on computer vision (ECCV) (Vol. 2, pp. 3–19).
[61] Marr, D., & Nishihara, H. K. (1978). Representation and recognition of the spatial organization of three dimensional structure. Proceedings of the Royal Society of London. Series B, Biological Sciences, 200, 269–294. · doi:10.1098/rspb.1978.0020
[62] Moeslund, T., & Granum, E. (2001). A survey of computer vision-based human motion capture. Computer Vision and Image Understanding, 81(3), 231–268. · Zbl 1011.68548 · doi:10.1006/cviu.2000.0897
[63] Mori, G., Ren, X., Efros, A., & Malik, J. (2004). Recovering human body configurations: Combining segmentation and recognition. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 326–333).
[64] Navaratnam, R., Fitzgibbon, A., & Cipolla, R. (2007). Semi-supervised joint manifold learning for multi-valued regression. In IEEE international conference on computer vision (ICCV).
[65] Nevatia, R., & Binford, T. O. (1973). Structured descriptions of complex objects. In Proc. 3rd international joint conference on artificial intelligence (pp. 641–647).
[66] Opelt, A., Pinz, A., & Zisserman, A. (2006). A boundary-fragment-model for object detection. In European conference on computer vision (ECCV) (Vol. 2, pp. 575–588).
[67] Poppe, R. W. (2007a). Vision-based human motion analysis: An overview. Computer Vision and Image Understanding, 108(1–2), 4–18. · doi:10.1016/j.cviu.2006.10.016
[68] Poppe, R. (2007b). Evaluating example-based pose estimation: experiments on the HumanEva sets. In Workshop on evaluation of articulated human motion and pose estimation (EHuM2).
[69] Ramanan, D., Forsyth, D., & Zisserman, A. (2005). Strike a pose: Tracking people by finding stylized poses. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 271–278).
[70] Ramanan, D., & Forsyth, D. (2003). Finding and tracking people from the bottom up. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 467–474).
[71] Rodgers, J., Anguelov, D., Pang, H.-C., & Koller, D. (2006). Object pose detection in range scan data. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 2445–2452).
[72] Rosales, R., & Sclaroff, S. (2000). Inferring body pose without tracking body parts. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 721–727).
[73] Rosales, R., & Sclaroff, S. (2002). Learning body pose via specialized maps. Advances in Neural Information Processing Systems, 15, 1263–1270.
[74] Rosenhahn, B., Schmaltz, C., Brox, T., Weickert, J., Cremers, D., & Seidel, H.-P. (2008). Markerless motion capture of man-machine interaction. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
[75] Shakhnarovich, G., Viola, P., & Darrell, T. (2003). Fast pose estimation with parameter sensitive hashing. In IEEE international conference on computer vision (ICCV) (Vol. 2, pp. 750–757).
[76] Siddiqui, M., & Medioni, G. (2006). Robust real-time upper body limb detection and tracking. In ACM international workshop on video surveillance & sensor networks (VSSN).
[77] Sidenbladh, H., & Black, M. J. (2003). Learning the statistics of people in images and video. International Journal of Computer Vision, 54(1–3), 183–209. · Zbl 1077.68881 · doi:10.1023/A:1023765619733
[78] Sidenbladh, H., Black, M. J., & Fleet, D. (2000). Stochastic tracking of 3D human figures using 2D image motion. In European conference on computer vision (ECCV) (Vol. 2, pp. 702–718).
[79] Sigal, L., & Black, M. J. (2006a). Predicting 3D people from 2D pictures. In LNCS: Vol. 4069. AMDO 2006–IV conference on articulated motion and deformable objects, Mallorca, Spain, July (pp. 185–195).
[80] Sigal, L., & Black, M. J. (2006b). Measure locally, reason globally: Occlusion-sensitive articulated pose estimation. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 2041–2048).
[81] Sigal, L., Zhu, Y., Comaniciu, D., & Black, M. J. (2004a). Tracking complex objects using graphical object models. In LNCS: Vol. 3417. 1st international workshop on complex motion (pp. 227–238). Berlin: Springer.
[82] Sigal, L., Bhatia, S., Roth, S., Black, M. J., & Isard, M. (2004b). Tracking loose-limbed people. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 421–428).
[83] Sigal, L., Balan, A., & Black, M. J. (2007). Combined discriminative and generative articulated pose and non-rigid shape estimation. Advances in Neural Information Processing Systems (NIPS).
[84] Sigal, L., Balan, A., & Black, M. J. (2010). HumanEva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. International Journal of Computer Vision, 87(1–2), 4–27. · doi:10.1007/s11263-009-0273-6
[85] Sminchisescu, C., & Triggs, B. (2003). Estimating articulated human motion with covariance scaled sampling. The International Journal of Robotics Research, 22(6), 371–393. · doi:10.1177/0278364903022006003
[86] Sminchisescu, C., Kanaujia, A., Li, Z., & Metaxas, D. (2005). Discriminative density propagation for 3D human motion estimation. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 390–397).
[87] Sminchisescu, C., Kanaujia, A., & Metaxas, D. (2006). Learning joint top-down and bottom-up processes for 3D visual inference. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 1743–1752).
[88] Sudderth, E., Ihler, A., Freeman, W., & Willsky, A. (2003). Nonparametric belief propagation. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 605–612).
[89] Sudderth, E., Mandel, M., Freeman, W., & Willsky, A. (2004). Distributed occlusion reasoning for tracking with nonparametric belief propagation. Advances in Neural Information Processing Systems, 17, 1369–1376.
[90] Sun, J., Shum, H., & Zheng, N. (2002). Stereo matching using belief propagation. In European conference on computer vision (ECCV) (pp. 510–524). · Zbl 1039.68730
[91] Tian, T.-P., & Sclaroff, S. (2010). Fast globally optimal 2D human detection with loopy graph models. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
[92] Urtasun, R., & Darrell, T. (2008). Local probabilistic regression for activity-independent human pose inference. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
[93] Urtasun, R., Fleet, D. J., & Fua, P. (2006). Gaussian process dynamical models for 3D people tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 238–245).
[94] Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 511–518).
[95] Wachter, S., & Nagel, H. (1999). Tracking of persons in monocular image sequences. Computer Vision and Image Understanding, 74(3), 174–192. · doi:10.1006/cviu.1999.0758
[96] Wainwright, M., Jaakkola, T., & Willsky, A. (2001). Tree-based reparameterization for approximate estimation on loopy graphs. Advances in Neural Information Processing Systems (NIPS), 1001–1008.
[97] Wang, P., & Rehg, J. M. (2006). A modular approach to the analysis and evaluation of particle filters for figure tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 790–797).
[98] Weber, M., Welling, M., & Perona, P. (2000). Unsupervised learning of models for recognition. In European conference on computer vision (ECCV) (pp. 18–32).
[99] Weiss, Y., & Freeman, W. T. (2001). Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Computation, 13, 2173–2200. · Zbl 0992.68055 · doi:10.1162/089976601750541769
[100] Wu, Y., Hua, G., & Yu, T. (2003). Tracking articulated body by dynamic Markov network. In IEEE international conference on computer vision (ICCV) (pp. 1094–1101).
[101] Wyvill, G., & Kunii, T. L. (1985). A functional model for constructive solid geometry. The Visual Computer, 1(1), 3–14. · doi:10.1007/BF01901265
[102] Xu, X., & Li, B. (2007). Learning motion correlation for tracking articulated human body with a rao-blackwellised particle filter. In IEEE international conference on computer vision (ICCV).
[103] Yonemoto, S., Arita, D., & Taniguchi, R. (2000). Real-time human motion analysis and IK-based human figure control. In Proceedings of the workshop on human motion (HUMO).