Abstract
Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al as a mean of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model contrasting with usual BSS assumptions and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of Fitzgeral et al requires a posterior clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Cao, Y., Eggermont, P.P.B., Terebey, S.: Cross Burg entropy maximization and its application to ringing suppression in image reconstruction. IEEE Transactions on Image Processing 8(2), 286–292 (1999)
Cemgil, A.T.: Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience (Article ID 785152), 17 pages (2009); doi:10.1155/2009/785152
Févotte, C.: Itakura-Saito nonnegative factorizations of the power spectrogram for music signal decomposition. In: Wang, W. (ed.) Machine Audition: Principles, Algorithms and Systems, ch. 11. IGI Global Press (August 2010), http://perso.telecom-paristech.fr/~fevotte/Chapters/isnmf.pdf
Févotte, C., Bertin, N., Durrieu, J.L.: Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation 21(3), 793–830 (2009), http://www.tsi.enst.fr/~fevotte/Journals/neco09_is-nmf.pdf
FitzGerald, D., Cranitch, M., Coyle, E.: Non-negative tensor factorisation for sound source separation. In: Proc. of the Irish Signals and Systems Conference, Dublin, Ireland (September 2005)
FitzGerald, D., Cranitch, M., Coyle, E.: Extended nonnegative tensor factorisation models for musical sound source separation. Computational Intelligence and Neuroscience (Article ID 872425), 15 pages (2008)
Helén, M., Virtanen, T.: Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine. In: Proc. 13th European Signal Processing Conference (EUSIPCO 2005) (2005)
Lee, D.D., Seung, H.S.: Learning the parts of objects with nonnegative matrix factorization. Nature 401, 788–791 (1999)
Neeser, F.D., Massey, J.L.: Proper complex random processes with applications to information theory. IEEE Transactions on Information Theory 39(4), 1293–1302 (1993)
Ozerov, A., Févotte, C.: Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech and Language Processing 18(3), 550–563 (2010), http://www.tsi.enst.fr/~fevotte/Journals/ieee_asl_multinmf.pdf
Parry, R.M., Essa, I.: Estimating the spatial position of spectral components in audio. In: Rosca, J.P., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds.) ICA 2006. LNCS, vol. 3889, pp. 666–673. Springer, Heidelberg (2006)
Shashua, A., Hazan, T.: Non-negative tensor factorization with applications to statistics and computer vision. In: Proc. 22nd International Conference on Machine Learning, pp. 792–799. ACM, Bonn (2005)
Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging 1(2), 113–122 (1982)
Smaragdis, P.: Convolutive speech bases and their application to speech separation. IEEE Transactions on Audio, Speech, and Language Processing 15(1), 1–12 (2007)
Smaragdis, P., Brown, J.C.: Non-negative matrix factorization for polyphonic music transcription. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2003) (October 2003)
Vincent, E., Gribonval, R., Févotte, C.: Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech and Language Processing 14(4), 1462–1469 (2006), http://www.tsi.enst.fr/~fevotte/Journals/ieee_asl_bsseval.pdf
Vincent, E., Sawada, H., Bofill, P., Makino, S., Rosca, J.P.: First stereo audio source separation evaluation campaign: Data, algorithms and results. In: Davies, M.E., James, C.J., Abdallah, S.A., Plumbley, M.D. (eds.) ICA 2007. LNCS, vol. 4666, pp. 552–559. Springer, Heidelberg (2007)
Vincent, E., Araki, S., Bofill, P.: Signal Separation Evaluation Campaign. In: (SiSEC 2008) / Under-determined speech and music mixtures task results (2008), http://www.irisa.fr/metiss/SiSEC08/SiSEC_underdetermined/dev2_eval.html
Virtanen, T.: Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria. IEEE Transactions on Audio, Speech and Language Processing 15(3), 1066–1074 (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Févotte, C., Ozerov, A. (2011). Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues. In: Ystad, S., Aramaki, M., Kronland-Martinet, R., Jensen, K. (eds) Exploring Music Contents. CMMR 2010. Lecture Notes in Computer Science, vol 6684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23126-1_8
Download citation
DOI: https://doi.org/10.1007/978-3-642-23126-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23125-4
Online ISBN: 978-3-642-23126-1
eBook Packages: Computer ScienceComputer Science (R0)