
Source localization for multiple speech sources using low complexity non-parametric source separation and clustering. (English) Zbl 1217.94067

Summary: This article presents a new method for localizing multiple concurrent speech sources that relies on simultaneous blind signal separation and direction of arrival (DOA) estimation. It also presents a method for solving the intersection point selection problem that arises when multiple speech sources are located using multiple sensor arrays. The proposed method is based on a low-complexity non-parametric blind signal separation method, which makes it suitable for real-time applications on embedded platforms. Besides having lower complexity than a previously presented method, it also improves DOA estimation accuracy. Performance is evaluated with both real recordings and simulations, and a real-time prototype of the proposed method has been implemented on a DSP platform to evaluate its computational and memory complexity in a real application.
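The summary names, but does not detail, the intersection point selection problem. The following is a minimal sketch of that problem only, not the authors' solution. It assumes each sensor array is a point in the plane that reports one unambiguous full-circle bearing per source. Intersecting every DOA from one array with every DOA from a second array gives up to N² candidate points for N sources, and only N of them are true. As an illustrative assumption, the remaining "ghost" points are rejected here by checking them against the DOAs of a hypothetical third array. The function names, array geometry and angular tolerance are all hypothetical.

```python
import math
from itertools import product
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def intersect_bearings(p1: Point, theta1: float, p2: Point, theta2: float,
                       eps: float = 1e-9) -> Optional[Point]:
    """Intersect the rays p1 + t1*u(theta1) and p2 + t2*u(theta2), t1, t2 > 0.

    Angles are in radians from the +x axis. Returns None when the rays are
    (nearly) parallel or would meet behind either array.
    """
    d1x, d1y = math.cos(theta1), math.sin(theta1)
    d2x, d2y = math.cos(theta2), math.sin(theta2)
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    # Solve [d1, -d2] [t1, t2]^T = (p2 - p1) with Cramer's rule.
    det = d2x * d1y - d1x * d2y
    if abs(det) < eps:
        return None
    t1 = (d2x * ry - d2y * rx) / det
    t2 = (d1x * ry - d1y * rx) / det
    if t1 <= 0 or t2 <= 0:
        return None
    return (p1[0] + t1 * d1x, p1[1] + t1 * d1y)


def candidate_locations(p1: Point, doas1: List[float],
                        p2: Point, doas2: List[float]) -> List[Point]:
    """All pairwise DOA intersections (true sources plus ghosts)."""
    points = []
    for a, b in product(doas1, doas2):
        q = intersect_bearings(p1, a, p2, b)
        if q is not None:
            points.append(q)
    return points


def angle_diff(a: float, b: float) -> float:
    """Absolute angular difference wrapped to [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def select_with_third_array(candidates: List[Point], p3: Point,
                            doas3: List[float], tol: float) -> List[Point]:
    """Hypothetical ghost rejection: keep a candidate only if its bearing
    seen from a third array matches one of that array's DOAs within tol."""
    kept = []
    for x, y in candidates:
        seen = math.atan2(y - p3[1], x - p3[0])
        if min(angle_diff(seen, d) for d in doas3) < tol:
            kept.append((x, y))
    return kept


def bearing(p: Point, s: Point) -> float:
    """Ideal (noise-free) DOA of source s as seen from array position p."""
    return math.atan2(s[1] - p[1], s[0] - p[0])


# Usage: two sources, three arrays, noise-free DOAs.
sources = [(2.0, 3.0), (4.0, 1.0)]
p1, p2, p3 = (0.0, 0.0), (6.0, 0.0), (0.0, 4.0)
doas = {p: [bearing(p, s) for s in sources] for p in (p1, p2, p3)}
cands = candidate_locations(p1, doas[p1], p2, doas[p2])
print(len(cands))  # 4: two true sources plus two ghost intersections
found = select_with_third_array(cands, p3, doas[p3], tol=math.radians(2.0))
print([(round(x, 3), round(y, 3)) for x, y in found])  # [(2.0, 3.0), (4.0, 1.0)]
```

With real, noisy DOA estimates the tolerance trades missed sources against surviving ghosts. A third array is only one possible way to break the ambiguity.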

MSC:

94A12 Signal theory (characterization, reconstruction, filtering, etc.)

References:

[1] Yılmaz, Ö.; Rickard, S.: Blind separation of speech mixtures via time-frequency masking, IEEE transactions on signal processing 52, No. 7, 1830-1847 (2004) · Zbl 1369.94383
[2] Araki, S.; Sawada, H.; Mukai, R.; Makino, S.: A novel blind source separation method with observation vector clustering, (2005)
[3] Cermak, J.; Araki, S.; Sawada, H.; Makino, S.: Blind speech separation by combining beamformers and a time frequency binary mask, (2006)
[4] Araki, S.; Makino, S.; Blin, A.; Mukai, R.; Sawada, H.: Underdetermined blind separation for speech in real environments with sparseness and ICA, Proceedings of IEEE international conference on acoustics, speech and signal processing 3, iii/881-iii/884 (2004)
[5] Sawada, H.; Araki, S.; Mukai, R.; Makino, S.: Blind extraction of dominant target sources using ICA and time-frequency masking, IEEE transactions on audio, speech, and language processing 14, No. 6, 2165-2173 (2006)
[6] Araki, S.; Makino, S.; Sawada, H.; Mukai, R.: Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask, Proceedings of IEEE international conference on acoustics, speech and signal processing 3, iii/81-iii/84 (2005)
[7] Araki, S.; Sawada, H.; Mukai, R.; Makino, S.: Blind sparse source separation with spatially smoothed time-frequency masking, (2006) · Zbl 1178.94108
[8] Swartling, M.; Grbić, N.; Claesson, I.: Direction of arrival estimation for multiple speakers using time-frequency orthogonal signal separation, Proceedings of IEEE international conference on acoustics, speech and signal processing 4, 833-836 (2006)
[9] Brandstein, M. S.; Adcock, J. E.; Silverman, H. F.: A closed-form location estimator for use with room environment microphone arrays, IEEE transactions on speech and audio processing 5, No. 1, 45-50 (1997)
[10] Swartling, M.; Nilsson, M.; Grbić, N.: Distinguishing true and false source locations when localizing multiple concurrent speech sources, 361-364 (2008)
[11] Di Claudio, E. D.; Parisi, R.; Orlandi, G.: Multi-source localization in reverberant environments by ROOT-MUSIC and clustering, Proceedings of IEEE international conference on acoustics, speech and signal processing 2, 921-924 (2000)
[12] Nishiura, T.; Yamada, T.; Nakamura, S.; Shikano, K.: Localization of multiple sound sources based on a CSP analysis with a microphone array, Proceedings of IEEE international conference on acoustics, speech and signal processing 2, 1053-1056 (2000)
[13] Balan, R.; Rosca, J.; Rickard, S.; Ó Ruanaidh, J.: The influence of windowing of time delay estimates, Proceedings of conference on information sciences and systems 1, 15-17 (2000)
[14] Knapp, C. H.; Carter, G. C.: The generalized cross correlation method for estimation of time delay, IEEE transactions on acoustics, speech, and signal processing ASSP 24, No. 4, 320-327 (1976)
[15] Microphone arrays: signal processing techniques and applications (2001)
[16] Zhang, C.; Florêncio, D.; Zhang, Z.: Why does PHAT work well in low noise, reverberative environments?, 2565-2568 (2008)
[17] Rickard, S.; Balan, R.; Rosca, J.: Real-time time-frequency based blind source separation, 651-656 (2001)
[18] Allen, J. B.; Berkley, D. A.: Image method for efficiently simulating small-room acoustics, Journal of the acoustical society of America 65, No. 4, 943-950 (1979)
[19] Vaidyanathan, P. P.: Multirate systems and filter banks, (1993) · Zbl 0784.93096
[20] Yiu, K. F. C.; Grbić, N.; Nordholm, S.; Teo, K. L.: Multi-criteria design of oversampled uniform DFT filter banks, IEEE signal processing letters 11, No. 6, 541-544 (2004)
[21] Forsythe, G. E.; Malcolm, M. A.; Moler, C. B.: Computer methods for mathematical computations, (1977) · Zbl 0361.65002