Hierarchical multidimensional scaling for the comparison of musical performance styles. (English) Zbl 1485.62162

Summary: Quantifying stylistic differences between musical artists is of academic interest to the music community and is also useful for applications such as music information retrieval and recommendation systems. Information about stylistic differences can be obtained by comparing the performances of different artists across common musical pieces. In this article we develop a statistical methodology for identifying and quantifying systematic stylistic differences among artists that are consistent across audio recordings of a common set of pieces, in terms of several musical features. Our focus is on a comparison of 10 different orchestras, based on data from audio recordings of the nine Beethoven symphonies. Because generative or fully parametric models of raw audio data can be far more complex than necessary for our goal of identifying differences between orchestras, we propose to reduce the data from a set of audio recordings to pairwise distances between orchestras, based on three musical characteristics of the recordings, namely tempo, dynamics and timbre. For each of these characteristics, we obtain multiple pairwise distance matrices, one for each movement of each symphony. We develop a hierarchical multidimensional scaling (HMDS) model to identify and quantify systematic differences between orchestras in terms of these three musical characteristics, and we interpret the results in the context of known qualitative information about the orchestras. This methodology recovers several expected systematic similarities between orchestras and also identifies some novel results. For example, we find that modern recordings are highly similar to each other compared with older recordings.
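The data reduction described in the summary (one pairwise distance matrix per movement, then a low-dimensional configuration of orchestras) can be illustrated with a minimal sketch. This is not the HMDS model of the paper: it is a non-hierarchical stand-in that pools synthetic distance matrices and applies classical (Torgerson) multidimensional scaling. The function names, the number of movements, the noise level and the pooling rule are all assumptions made for illustration only.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, dim=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by noise or rounding.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


rng = np.random.default_rng(0)
n_orchestras, dim = 10, 2
n_movements = 5  # arbitrary illustrative count, not taken from the paper

# Hypothetical shared "style" positions of the orchestras (synthetic data).
latent = rng.normal(size=(n_orchestras, dim))

# One synthetic distance matrix per movement: shared positions plus noise.
movement_dists = np.stack([
    pairwise_distances(latent + 0.1 * rng.normal(size=latent.shape))
    for _ in range(n_movements)
])

# Pool across movements (root mean square), then embed once.
pooled = np.sqrt((movement_dists ** 2).mean(axis=0))
embedding = classical_mds(pooled, dim)
recovered = pairwise_distances(embedding)
```

Pooling before embedding discards the movement-level variation that a hierarchical model would estimate explicitly, so this sketch only conveys the general idea of recovering a common configuration from several distance matrices.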

MSC:

62P15 Applications of statistics to psychology
62F15 Bayesian inference
62R10 Functional data analysis
91C15 One- and multidimensional scaling in the social and behavioral sciences
