Coding of image feature descriptors for distributed rate-efficient visual correspondences. (English) Zbl 1235.68301

Summary: Establishing visual correspondences is a critical step in many computer vision tasks involving multiple views of a scene. In a dynamic environment with mobile cameras, visual correspondences need to be updated on a recurring basis. At the same time, the use of wireless links between camera motes imposes tight rate constraints. This combination motivates us to consider the problem of establishing visual correspondences in a distributed fashion between cameras operating under rate constraints. We propose a solution based on constructing distance-preserving hashes using binarized random projections. By exploiting the fact that descriptors of regions in correspondence are highly correlated, we propose a novel use of distributed source coding via linear codes on the binary hashes to exchange feature descriptors more efficiently when establishing correspondences across multiple camera views. A systematic approach is used to evaluate the trade-off between rate and visual correspondence retrieval performance; under a stringent matching criterion, our proposed methods outperform a baseline scheme employing transform coding of descriptors.
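
The hashing step named in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only the generic sign-of-random-projection construction, in which each hash bit is the sign of the inner product between a descriptor and a random Gaussian direction, so that the Hamming distance between two hashes tracks the angle between the descriptors. The function names, the 128-dimensional descriptor size, the 256-bit hash length, and the Gaussian stand-in descriptors are all assumptions made for illustration. The distributed source coding stage is not shown.

```python
import math
import random


def make_projections(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian directions in R^dim.

    Assumption: every camera uses the same seed, so all of them
    hash with identical projections without exchanging them.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def binary_hash(descriptor, projections):
    """One bit per projection: 1 if the inner product is non-negative, else 0."""
    return [
        1 if sum(p * x for p, x in zip(row, descriptor)) >= 0.0 else 0
        for row in projections
    ]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def estimated_angle(a, b):
    """Angle estimate in radians: for sign random projections the
    expected fraction of differing bits equals angle / pi."""
    return math.pi * hamming(a, b) / len(a)


if __name__ == "__main__":
    rng = random.Random(7)
    projections = make_projections(dim=128, num_bits=256, seed=1)

    # Hypothetical stand-ins for real feature descriptors.
    d = [rng.gauss(0.0, 1.0) for _ in range(128)]
    near = [x + rng.gauss(0.0, 0.05) for x in d]    # same region, slightly perturbed
    far = [rng.gauss(0.0, 1.0) for _ in range(128)]  # unrelated region

    h = binary_hash(d, projections)
    print("Hamming distance to perturbed copy:", hamming(h, binary_hash(near, projections)))
    print("Hamming distance to unrelated descriptor:", hamming(h, binary_hash(far, projections)))
```

Under these assumptions, hashes of corresponding regions differ in only a few bits, while hashes of unrelated regions differ in roughly half of them. That strong correlation between matching hashes is the property the summary says is exploited by the linear-code stage.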

MSC:

68T45 Machine vision and scene understanding

Software:

PCA-SIFT; SIFT

References:

[1] Ahlswede, R., & Csiszár, I. (1981). To get a bit of information may be as hard as to get full information. IEEE Transactions on Information Theory, 27(4), 398–408. · Zbl 0504.94019 · doi:10.1109/TIT.1981.1056381
[2] Avidan, S., & Shashua, A. (1998). Novel view synthesis by cascading trilinear tensors. IEEE Transactions on Visualization and Computer Graphics, 4(4), 293–306. · doi:10.1109/2945.765324
[3] Barton-Sweeney, A., Lymberopoulos, D., & Savvides, A. (2006). Sensor localization and camera calibration in distributed camera sensor networks. In Proc. IEEE basenets.
[4] Berg, A., Berg, T., & Malik, J. (2005). Shape matching and object recognition using low distortion correspondence. In Proc. IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 26–33).
[5] Bickel, P. J., & Doksum, K. A. (2000). Mathematical statistics: basic ideas and selected topics, 2nd edn. (Vol. 1). New York: Prentice Hall. · Zbl 0403.62001
[6] Cai, H., Mikolajczyk, K., & Matas, J. (2008). Learning linear discriminant projections for dimensionality reduction of image descriptors. In Proc. British machine vision conf.
[7] Chandrasekhar, V., Takacs, G., Chen, D., Tsai, S. S., Grzeszczuk, R., & Girod, B. (2009a). CHoG: compressed histogram of gradients. In Conference on computer vision and pattern recognition, Miami, FL, USA (pp. 2504–2511).
[8] Chandrasekhar, V., Takacs, G., Chen, D., Tsai, S. S., Singh, J., & Girod, B. (2009b). Transform coding of image feature descriptors. In Proc. SPIE visual communication and image processing.
[9] Charikar, M. S. (2002). Similarity estimation techniques from rounding algorithms. In Proc. ACM symposium on theory of computing (pp. 380–388). · Zbl 1192.68226
[10] Chen, P. W. C., Ahammad, P., Boyer, C., Huang, S. I., Lin, L., Lobaton, E. J., Meingast, M. L., Oh, S., Wang, S., Yan, P., Yang, A., Yeo, C., Chang, L. C., Tygar, D., & Sastry, S. S. (2008). CITRIC: a low-bandwidth wireless camera network platform. Tech. Rep. UCB/EECS-2008-50, EECS Department, University of California, Berkeley. http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-50.html
[11] Cheng, Z., Devarajan, D., & Radke, R. J. (2007). Determining vision graphs for distributed camera networks using feature digests. EURASIP Journal on Advances in Signal Processing, 2007, Article ID 57034, 11 pages. · Zbl 1168.94320 · doi:10.1155/2007/89691
[12] Cover, T., & Thomas, J. (1991). Elements of information theory. New York: Wiley. · Zbl 0762.94001
[13] Devarajan, D., & Radke, R. J. (2004). Distributed metric calibration of large camera networks. In Proc. workshop on broadband advanced sensor networks.
[14] Downes, I., Rad, L. B., & Aghajan, H. (2006). Development of a mote for wireless image sensor networks. In Proc. COGnitive systems with Interactive Sensors (COGIS).
[15] Ferrari, V., Tuytelaars, T., & Van Gool, L. (2004). Simultaneous object recognition and segmentation by image exploration. In Proc. European conference on computer vision (Vol. 1, pp. 40–54). Berlin: Springer. · Zbl 1098.68761
[16] Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395. http://doi.acm.org/10.1145/358669.358692 · doi:10.1145/358669.358692
[17] Franke, U., & Joos, A. (2000). Real-time stereo vision for urban traffic scene understanding. In Proc. IEEE intelligent vehicles symposium (pp. 273–278).
[18] Gallager, R. G. (1963). Low-density parity-check codes. Cambridge: MIT Press. · Zbl 0156.40701
[19] Goemans, M. X., & Williamson, D. P. (1995). Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6), 1115–1145. · Zbl 0885.68088 · doi:10.1145/227683.227684
[20] Hartley, R., & Zisserman, A. (2000). Multiple view geometry in computer vision. Cambridge: Cambridge University Press. · Zbl 0956.68149
[21] Indyk, P., & Motwani, R. (1998). Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on theory of computing (pp. 604–613). New York: ACM. · Zbl 1029.68541
[22] Jain, P., Kulis, B., & Grauman, K. (2008). Fast image search for learned metrics. In IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008 (pp. 1–8).
[23] Körner, J., & Marton, K. (1979). How to encode the modulo-two sum of binary sources. IEEE Transactions on Information Theory, 25(2), 219–221. · Zbl 0401.94017 · doi:10.1109/TIT.1979.1056022
[24] Larsen, B., & Aone, C. (1999). Fast and effective text mining using linear-time document clustering. In Proc. ACM SIGKDD international conference on knowledge discovery and data mining (pp. 16–22). New York: ACM Press.
[25] Lee, H., & Aghajan, H. (2006). Collaborative node localization in surveillance networks using opportunistic target observations. In Proc. ACM international workshop on video surveillance and sensor networks (pp. 9–18). New York: ACM Press.
[26] Lin, Y. C., Varodayan, D., & Girod, B. (2007). Image authentication based on distributed source coding. In Proc. IEEE international conference on image processing. · Zbl 1372.94160
[27] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110. · doi:10.1023/B:VISI.0000029664.99615.94
[28] Ma, Y., Soatto, S., Kosecka, J., & Sastry, S. S. (2004). An invitation to 3-D vision: from images to geometric models. Berlin: Springer. · Zbl 1043.65040
[29] Martinian, E., Yekhanin, S., & Yedidia, J. S. (2005). Secure biometrics via syndromes. In Proc. Allerton conference on communications, control and computing.
[30] Matusik, W., & Pfister, H. (2004). 3D TV: a scalable system for real-time acquisition, transmission, and autostereoscopic display of dynamic scenes. ACM Transactions on Graphics, 23(3), 814–824. · doi:10.1145/1015706.1015805
[31] Mikolajczyk, K., & Matas, J. (2007). Improving descriptors for fast tree matching by optimal linear projection. In IEEE 11th international conference on computer vision, 2007 (pp. 1–8).
[32] Mikolajczyk, K., & Schmid, C. (2004). Scale and affine invariant interest point detectors. International Journal of Computer Vision, 60(1), 63–86. · doi:10.1023/B:VISI.0000027790.02288.f2
[33] Mikolajczyk, K., & Schmid, C. (2005). A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 1615–1630. · doi:10.1109/TPAMI.2005.188
[34] Oh, S., Schenato, L., Chen, P., & Sastry, S. (2007). Tracking and coordination of multiple agents using sensor networks: system design, algorithms and experiments. Proceedings of the IEEE, 95, 234–254. · doi:10.1109/JPROC.2006.887296
[35] Rahimi, M., Baer, R., Iroezi, O., Garcia, J., Warrior, J., Estrin, D., & Srivastava, M. (2005). Cyclops: in situ image sensing and interpretation. In Proc. ACM conference on embedded networked sensor systems.
[36] Richardson, T. J., & Urbanke, R. L. (2001). The capacity of low-density parity-check codes under message-passing decoding. IEEE Transactions on Information Theory, 47(2), 599–618. · Zbl 1019.94033 · doi:10.1109/18.910577
[37] Roy, S., & Sun, Q. (2007). Robust hash for detecting and localizing image tampering. In Proc. IEEE international conference on image processing.
[38] Salakhutdinov, R., & Hinton, G. (2009). Semantic hashing. International Journal of Approximate Reasoning, 50(7), 969–978. · doi:10.1016/j.ijar.2008.11.006
[39] Schaffalitzky, F., & Zisserman, A. (2002). Multi-view matching for unordered image sets, or “how do I organize my holiday snaps?”. In Proc. European conference on computer vision (Vol. 1, pp. 414–431). Berlin: Springer. · Zbl 1034.68662
[40] Se, S., Lowe, D., & Little, J. (2002). Global localization using distinctive visual features. In Proc. IEEE/RSJ international conference on intelligent robots and system (Vol. 1).
[41] Shum, H., & Kang, S. B. (2000). A review of image-based rendering techniques. In Proc. SPIE visual communications and image processing (pp. 2–13). Bellingham: SPIE.
[42] Slepian, D., & Wolf, J. (1973). Noiseless coding of correlated information sources. IEEE Transactions on Information Theory, 19(4), 471–480. · Zbl 0259.94008 · doi:10.1109/TIT.1973.1055037
[43] Szewczyk, R., Osterweil, E., Polastre, J., Hamilton, M., Mainwaring, A. M., & Estrin, D. (2004). Habitat monitoring with sensor networks. Communications of the ACM, 47(6), 34–40. · doi:10.1145/990680.990704
[44] Teixeira, T., Lymberopoulos, D., Culurciello, E., Aloimonos, Y., & Savvides, A. (2006). A lightweight camera sensor network operating on symbolic information. In Proc. workshop on distributed smart cameras, Boulder, Colorado.
[45] Weiss, Y., Torralba, A., & Fergus, R. (2009). Spectral hashing. Advances in Neural Information Processing Systems, 21, 1753–1760.
[46] Winder, S. A. J., & Brown, M. (2007). Learning local image descriptors. In IEEE conference on computer vision and pattern recognition, 2007. CVPR’07 (pp. 1–8).
[47] Wyner, A. D., & Ziv, J. (1976). The rate distortion function for source coding with side information at the decoder. IEEE Transactions on Information Theory, 22(1), 1–10. · Zbl 0324.94010 · doi:10.1109/TIT.1976.1055508
[48] Yeo, C., Ahammad, P., & Ramchandran, K. (2008a). A rate-efficient approach for establishing visual correspondences via distributed source coding. In Proc. SPIE visual communications and image processing. · Zbl 1235.68301
[49] Yeo, C., Ahammad, P., & Ramchandran, K. (2008b). Rate-efficient visual correspondences using random projections. In Proc. IEEE international conference on image processing. · Zbl 1235.68301
[50] Yeo, C., Ahammad, P., Zhang, H., & Ramchandran, K. (2009). Rate-constrained distributed distance testing and its applications. In Proc. IEEE international conference on acoustics, speech, and signal processing.
[51] Zhang, Z. (1998). Determining the epipolar geometry and its uncertainty: a review. International Journal of Computer Vision, 27(2), 161–195. · doi:10.1023/A:1007941100561
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.