-
Inferring the parameters of Taylor's law in ecology
Authors:
Lionel Truquet,
Joel E. Cohen,
Paul Doukhan
Abstract:
Taylor's power law (TL) or fluctuation scaling has been verified empirically for the abundances of many species, human and non-human, and in many other fields including physics, meteorology, computer science, and finance. TL asserts that the variance is directly proportional to a power of the mean, exactly for population moments and, whether or not population moments exist, approximately for sampl…
▽ More
Taylor's power law (TL) or fluctuation scaling has been verified empirically for the abundances of many species, human and non-human, and in many other fields including physics, meteorology, computer science, and finance. TL asserts that the variance is directly proportional to a power of the mean, exactly for population moments and, whether or not population moments exist, approximately for sample moments. In many papers, linear regression of log variance as a function of log mean is used to estimate TL's parameters. We provide some statistical guarantees with large-sample asymptotics for this kind of inference under general conditions, and we derive confidence intervals for the parameters. In many ecological applications, the means and variances are estimated over time or across space from arrays of abundance data collected at different locations and time points. When the ratio between the time-series length and the number of spatial points converges to a constant as both become large, the usual normalized statistics are asymptotically biased. We provide a bias correction to get correct confidence intervals. TL, widely studied in multiple sciences, is a source of challenging new statistical problems in a nonstationary spatiotemporal framework. We illustrate our results with both simulated and real data sets.
△ Less
Submitted 27 August, 2024;
originally announced August 2024.
-
Gaps Between Consecutive Primes and the Exponential Distribution
Authors:
Joel E. Cohen
Abstract:
Based on the primes less than $4 \times 10^{18}$, Oliveira e Silva et al. (2014) conjectured an asymptotic formula for the sum of the $k$th power of the gaps between consecutive primes less than a large number $x$. We show that the conjecture of Oliveira e Silva holds if and only if the $k$th moment of the first $n$ gaps is asymptotic to the $k$th moment of an exponential distribution with mean…
▽ More
Based on the primes less than $4 \times 10^{18}$, Oliveira e Silva et al. (2014) conjectured an asymptotic formula for the sum of the $k$th power of the gaps between consecutive primes less than a large number $x$. We show that the conjecture of Oliveira e Silva holds if and only if the $k$th moment of the first $n$ gaps is asymptotic to the $k$th moment of an exponential distribution with mean $\log n$, though the distribution of gaps is not exponential. Asymptotically exponential moments imply that the gaps asymptotically obey Taylor's law of fluctuation scaling: variance of the first $n$ gaps $\sim$ (mean of the first $n$ gaps)$^2$. If the distribution of the first $n$ gaps is asymptotically exponential with mean $\log n$, then the expectation of the largest of the first $n$ gaps is asymptotic to $(\log n)^2$. The largest of the first $n$ gaps is asymptotic to $(\log n)^2$ if and only if the Cramér-Shanks conjecture holds. Numerical counts of gaps and the maximal gap $G_n$ among the first $n$ gaps test these results. While most values of $G_n$ are better approximated by $(\log n)^2$ than by other models, seven values of $n$ with $G_{n} >2 e^{-γ}(\log n)^2$ suggest that $\limsup_{n \to\infty} G_n/[2e^{-γ}(\log n)^2]$ may exceed 1.
△ Less
Submitted 12 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models
Authors:
Jeremy E. Cohen,
Valentin Leplat
Abstract:
Regularized nonnegative low-rank approximations such as sparse Nonnegative Matrix Factorization or sparse Nonnegative Tucker Decomposition are an important branch of dimensionality reduction models with enhanced interpretability. However, from a practical perspective, the choice of regularizers and regularization coefficients, as well as the design of efficient algorithms, is challenging because o…
▽ More
Regularized nonnegative low-rank approximations such as sparse Nonnegative Matrix Factorization or sparse Nonnegative Tucker Decomposition are an important branch of dimensionality reduction models with enhanced interpretability. However, from a practical perspective, the choice of regularizers and regularization coefficients, as well as the design of efficient algorithms, is challenging because of the multifactor nature of these models and the lack of theory to back these choices. This paper aims at improving upon these issues. By studying a more general model called the Homogeneous Regularized Scale-Invariant, we prove that the scale-invariance inherent to low-rank approximation models causes an implicit regularization with both unexpected beneficial and detrimental effects. This observation allows to better understand the effect of regularization functions in low-rank approximation models, to guide the choice of the regularization hyperparameters, and to design balancing strategies to enhance the convergence speed of dedicated optimization algorithms. Some of these results were already known but restricted to specific instances of regularized low-rank approximations. We also derive a generic Majorization Minimization algorithm that handles many regularized nonnegative low-rank approximations, with convergence guarantees. We showcase our contributions on sparse Nonnegative Matrix Factorization, ridge-regularized Canonical Polyadic decomposition and sparse Nonnegative Tucker Decomposition.
△ Less
Submitted 8 June, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm
Authors:
Axel Marmoret,
Jérémy E. Cohen,
Frédéric Bimbot
Abstract:
Music Structure Analysis (MSA) is a Music Information Retrieval task consisting of representing a song in a simplified, organized manner by breaking it down into sections typically corresponding to ``chorus'', ``verse'', ``solo'', etc. In this work, we extend an MSA algorithm called the Correlation Block-Matching (CBM) algorithm introduced by (Marmoret et al., 2020, 2022b). The CBM algorithm is a…
▽ More
Music Structure Analysis (MSA) is a Music Information Retrieval task consisting of representing a song in a simplified, organized manner by breaking it down into sections typically corresponding to ``chorus'', ``verse'', ``solo'', etc. In this work, we extend an MSA algorithm called the Correlation Block-Matching (CBM) algorithm introduced by (Marmoret et al., 2020, 2022b). The CBM algorithm is a dynamic programming algorithm that segments self-similarity matrices, which are a standard description used in MSA and in numerous other applications. In this work, self-similarity matrices are computed from the feature representation of an audio signal and time is sampled at the bar-scale. This study examines three different standard similarity functions for the computation of self-similarity matrices. Results show that, in optimal conditions, the proposed algorithm achieves a level of performance which is competitive with supervised state-of-the-art methods while only requiring knowledge of bar positions. In addition, the algorithm is made open-source and is highly customizable.
△ Less
Submitted 30 November, 2023;
originally announced November 2023.
-
Generalizations of Bertrand's Postulate to Sums of Any Number of Primes
Authors:
Joel E. Cohen
Abstract:
In 1845, Bertrand conjectured that twice any prime strictly exceeds the next prime. Tchebichef proved Bertrand's postulate in 1850. In 1934, Ishikawa proved a stronger result: the sum of any two consecutive primes strictly exceeds the next prime, except for the only equality $2+3=5$. This observation is a special case of a more general result, perhaps not previously noticed: if $p_n$ denotes the…
▽ More
In 1845, Bertrand conjectured that twice any prime strictly exceeds the next prime. Tchebichef proved Bertrand's postulate in 1850. In 1934, Ishikawa proved a stronger result: the sum of any two consecutive primes strictly exceeds the next prime, except for the only equality $2+3=5$. This observation is a special case of a more general result, perhaps not previously noticed: if $p_n$ denotes the $n$th prime, $n=1, 2, \ldots$, with $p_1=2, p_2=3, \ldots$, and if $c_1, \ldots, c_g$ are nonnegative integers (not necessarily distinct), and $d_1, \ldots, d_h$ are positive integers (not necessarily distinct), and $g>h\ge 1$, then there exists a positive integer $N$ such that $p_{n-c_1}+p_{n-c_2}+\cdots +p_{n-c_g}>p_{n+d_1}+\cdots +p_{n+d_h}$ for all $n\ge N$. We prove this result using only the prime number theorem. For any instance of this result, we sketch a way to find the least possible $N$. We give some numerical results and unanswered questions.
△ Less
Submitted 5 May, 2023;
originally announced May 2023.
-
Polytopic Analysis of Music
Authors:
Axel Marmoret,
Jérémy E. Cohen,
Frédéric Bimbot
Abstract:
Structural segmentation of music refers to the task of finding a symbolic representation of the organisation of a song, reducing the musical flow to a partition of non-overlapping segments. Under this definition, the musical structure may not be unique, and may even be ambiguous. One way to resolve that ambiguity is to see this task as a compression process, and to consider the musical structure a…
▽ More
Structural segmentation of music refers to the task of finding a symbolic representation of the organisation of a song, reducing the musical flow to a partition of non-overlapping segments. Under this definition, the musical structure may not be unique, and may even be ambiguous. One way to resolve that ambiguity is to see this task as a compression process, and to consider the musical structure as the optimization of a given compression criteria. In that viewpoint, C. Guichaoua developed a compression-driven model for retrieving the musical structure, based on the "System and Contrast" model, and on polytopes, which are extension of nhypercubes. We present this model, which we call "polytopic analysis of music", along with a new opensource dedicated toolbox called MusicOnPolytopes (in Python). This model is also extended to the use of the Tonnetz as a relation system. Structural segmentation experiments are conducted on the RWC Pop dataset. Results show improvements compared to the previous ones, presented by C. Guichaoua.
△ Less
Submitted 22 December, 2022; v1 submitted 21 December, 2022;
originally announced December 2022.
-
Convolutive Block-Matching Segmentation Algorithm with Application to Music Structure Analysis
Authors:
Axel Marmoret,
Jérémy E. Cohen,
Frédéric Bimbot
Abstract:
Music Structure Analysis (MSA) consists of representing a song in sections (such as ``chorus'', ``verse'', ``solo'' etc), and can be seen as the retrieval of a simplified organization of the song. This work presents a new algorithm, called Convolutive Block-Matching (CBM) algorithm, devoted to MSA. In particular, the CBM algorithm is a dynamic programming algorithm, applying on autosimilarity matr…
▽ More
Music Structure Analysis (MSA) consists of representing a song in sections (such as ``chorus'', ``verse'', ``solo'' etc), and can be seen as the retrieval of a simplified organization of the song. This work presents a new algorithm, called Convolutive Block-Matching (CBM) algorithm, devoted to MSA. In particular, the CBM algorithm is a dynamic programming algorithm, applying on autosimilarity matrices, a standard tool in MSA. In this work, autosimilarity matrices are computed from the feature representation of an audio signal, and time is sampled on the barscale. We study three different similarity functions for the computation of autosimilarity matrices. We report that the proposed algorithm achieves a level of performance competitive to that of supervised State-of-the-Art methods on 3 among 4 metrics, while being unsupervised.
△ Less
Submitted 26 September, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Taylor's Law for some infinitely divisible probabbility distributions from population models
Authors:
Joel E. Cohen,
Thierry E Huillet
Abstract:
In a family of random variables, Taylor's law or Taylor's power law offluctuation scaling is a variance function that gives the variance $σ^{2}>0$ of a random variable (rv) $X$ with expectation $μ>0$ as a powerof $μ$: $σ^{2}=Aμ^{b}$ for finite real $A>0,\ b$ that are thesame for all rvs in the family. Equivalently, TL holds when $\log σ^{2}=a+b\log μ,\ a=\log A$, for all rvs in some set. Here we a…
▽ More
In a family of random variables, Taylor's law or Taylor's power law offluctuation scaling is a variance function that gives the variance $σ^{2}>0$ of a random variable (rv) $X$ with expectation $μ>0$ as a powerof $μ$: $σ^{2}=Aμ^{b}$ for finite real $A>0,\ b$ that are thesame for all rvs in the family. Equivalently, TL holds when $\log σ^{2}=a+b\log μ,\ a=\log A$, for all rvs in some set. Here we analyze thepossible values of the TL exponent $b$ in five families of infinitelydivisible two-parameter distributions and show how the values of $b$ dependon the parameters of these distributions. The five families areTweedie-Bar-Lev-Enis, negative binomial, compound Poisson-geometric,compound geometric-Poisson (or Pólya-Aeppli), and gamma distributions.These families arise frequently in empirical data and population models, and they are limit laws of Markov processes that we exhibit in each case.
△ Less
Submitted 27 June, 2022;
originally announced June 2022.
-
Semi-Supervised Convolutive NMF for Automatic Piano Transcription
Authors:
Haoran Wu,
Axel Marmoret,
Jérémy E. Cohen
Abstract:
Automatic Music Transcription, which consists in transforming an audio recording of a musical performance into symbolic format, remains a difficult Music Information Retrieval task. In this work, which focuses on piano transcription, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization. In the semi-supervis…
▽ More
Automatic Music Transcription, which consists in transforming an audio recording of a musical performance into symbolic format, remains a difficult Music Information Retrieval task. In this work, which focuses on piano transcription, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization. In the semi-supervised setting, only a single recording of each individual notes is required. We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and a little worse than supervised deep learning state-of-the-art methods, while however suffering from generalization issues.
△ Less
Submitted 14 April, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Barwise Compression Schemes for Audio-Based Music Structure Analysis
Authors:
Axel Marmoret,
Jérémy E. Cohen,
Frédéric Bimbot
Abstract:
Music Structure Analysis (MSA) consists in segmenting a music piece in several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article…
▽ More
Music Structure Analysis (MSA) consists in segmenting a music piece in several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure using a dynamic programming algorithm. This work explores both low-rank approximation models such as Principal Component Analysis or Nonnegative Matrix Factorization and "piece-specific" Auto-Encoding Neural Networks, with the objective to learn latent representations specific to a given song. Such approaches do not rely on supervision nor annotations, which are well-known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for 3s tolerance) on the RWC-Pop dataset, showcasing the importance of the barwise compression processing for MSA.
△ Less
Submitted 15 April, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Dictionary-based Low-Rank Approximations and the Mixed Sparse Coding problem
Authors:
Jeremy E. Cohen
Abstract:
Constrained tensor and matrix factorization models allow to extract interpretable patterns from multiway data. Therefore identifiability properties and efficient algorithms for constrained low-rank approximations are nowadays important research topics. This work deals with columns of factor matrices of a low-rank approximation being sparse in a known and possibly overcomplete basis, a model coined…
▽ More
Constrained tensor and matrix factorization models allow to extract interpretable patterns from multiway data. Therefore identifiability properties and efficient algorithms for constrained low-rank approximations are nowadays important research topics. This work deals with columns of factor matrices of a low-rank approximation being sparse in a known and possibly overcomplete basis, a model coined as Dictionary-based Low-Rank Approximation (DLRA). While earlier contributions focused on finding factor columns inside a dictionary of candidate columns, i.e. one-sparse approximations, this work is the first to tackle DLRA with sparsity larger than one. I propose to focus on the sparse-coding subproblem coined Mixed Sparse-Coding (MSC) that emerges when solving DLRA with an alternating optimization strategy. Several algorithms based on sparse-coding heuristics (greedy methods, convex relaxations) are provided to solve MSC. The performance of these heuristics is evaluated on simulated data. Then, I show how to adapt an efficient MSC solver based on the LASSO to compute Dictionary-based Matrix Factorization and Canonical Polyadic Decomposition in the context of hyperspectral image processing and chemometrics. These experiments suggest that DLRA extends the modeling capabilities of low-rank approximations, helps reducing estimation variance and enhances the identifiability and interpretability of estimated factors.
△ Less
Submitted 21 January, 2022; v1 submitted 24 November, 2021;
originally announced November 2021.
-
Exploring single-song autoencoding schemes for audio-based music structure analysis
Authors:
Axel Marmoret,
Jérémy E. Cohen,
Frédéric Bimbot
Abstract:
The ability of deep neural networks to learn complex data relations and representations is established nowadays, but it generally relies on large sets of training data. This work explores a "piece-specific" autoencoding scheme, in which a low-dimensional autoencoder is trained to learn a latent/compressed representation specific to a given song, which can then be used to infer the song structure.…
▽ More
The ability of deep neural networks to learn complex data relations and representations is established nowadays, but it generally relies on large sets of training data. This work explores a "piece-specific" autoencoding scheme, in which a low-dimensional autoencoder is trained to learn a latent/compressed representation specific to a given song, which can then be used to infer the song structure. Such a model does not rely on supervision nor annotations, which are well-known to be tedious to collect and often ambiguous in Music Structure Analysis. We report that the proposed unsupervised auto-encoding scheme achieves the level of performance of supervised state-of-the-art methods with 3 seconds tolerance when using a Log Mel spectrogram representation on the RWC-Pop dataset.
△ Less
Submitted 7 March, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
-
Nonnegative Tucker Decomposition with Beta-divergence for Music Structure Analysis of Audio Signals
Authors:
Axel Marmoret,
Florian Voorwinden,
Valentin Leplat,
Jérémy E. Cohen,
Frédéric Bimbot
Abstract:
Nonnegative Tucker decomposition (NTD), a tensor decomposition model, has received increased interest in the recent years because of its ability to blindly extract meaningful patterns, in particular in Music Information Retrieval. Nevertheless, existing algorithms to compute NTD are mostly designed for the Euclidean loss. This work proposes a multiplicative updates algorithm to compute NTD with th…
▽ More
Nonnegative Tucker decomposition (NTD), a tensor decomposition model, has received increased interest in the recent years because of its ability to blindly extract meaningful patterns, in particular in Music Information Retrieval. Nevertheless, existing algorithms to compute NTD are mostly designed for the Euclidean loss. This work proposes a multiplicative updates algorithm to compute NTD with the beta-divergence loss, often considered a better loss for audio processing. We notably show how to implement efficiently the multiplicative rules using tensor algebra. Finally, we show on a music structure analysis task that unsupervised NTD fitted with beta-divergence loss outperforms earlier results obtained with the Euclidean loss.
△ Less
Submitted 2 August, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
-
An AO-ADMM approach to constraining PARAFAC2 on all modes
Authors:
Marie Roald,
Carla Schenker,
Vince D. Calhoun,
Tülay Adalı,
Rasmus Bro,
Jeremy E. Cohen,
Evrim Acar
Abstract:
Analyzing multi-way measurements with variations across one mode of the dataset is a challenge in various fields including data mining, neuroscience and chemometrics. For example, measurements may evolve over time or have unaligned time profiles. The PARAFAC2 model has been successfully used to analyze such data by allowing the underlying factor matrices in one mode (i.e., the evolving mode) to ch…
▽ More
Analyzing multi-way measurements with variations across one mode of the dataset is a challenge in various fields including data mining, neuroscience and chemometrics. For example, measurements may evolve over time or have unaligned time profiles. The PARAFAC2 model has been successfully used to analyze such data by allowing the underlying factor matrices in one mode (i.e., the evolving mode) to change across slices. The traditional approach to fit a PARAFAC2 model is to use an alternating least squares-based algorithm, which handles the constant cross-product constraint of the PARAFAC2 model by implicitly estimating the evolving factor matrices. This approach makes imposing regularization on these factor matrices challenging. There is currently no algorithm to flexibly impose such regularization with general penalty functions and hard constraints. In order to address this challenge and to avoid the implicit estimation, in this paper, we propose an algorithm for fitting PARAFAC2 based on alternating optimization with the alternating direction method of multipliers (AO-ADMM). With numerical experiments on simulated data, we show that the proposed PARAFAC2 AO-ADMM approach allows for flexible constraints, recovers the underlying patterns accurately, and is computationally efficient compared to the state-of-the-art. We also apply our model to two real-world datasets from neuroscience and chemometrics, and show that constraining the evolving mode improves the interpretability of the extracted patterns.
△ Less
Submitted 8 July, 2022; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Uncovering audio patterns in music with Nonnegative Tucker Decomposition for structural segmentation
Authors:
Axel Marmoret,
Jérémy E. Cohen,
Nancy Bertin,
Frédéric Bimbot
Abstract:
Recent work has proposed the use of tensor decomposition to model repetitions and to separate tracks in loop-based electronic music. The present work investigates further on the ability of Nonnegative Tucker Decompositon (NTD) to uncover musical patterns and structure in pop songs in their audio form. Exploiting the fact that NTD tends to express the content of bars as linear combinations of a few…
▽ More
Recent work has proposed the use of tensor decomposition to model repetitions and to separate tracks in loop-based electronic music. The present work investigates further on the ability of Nonnegative Tucker Decompositon (NTD) to uncover musical patterns and structure in pop songs in their audio form. Exploiting the fact that NTD tends to express the content of bars as linear combinations of a few patterns, we illustrate the ability of the decomposition to capture and single out repeated motifs in the corresponding compressed space, which can be interpreted from a musical viewpoint. The resulting features also turn out to be efficient for structural segmentation, leading to experimental results on the RWC Pop data set which are potentially challenging state-of-the-art approaches that rely on extensive example-based learning schemes.
△ Less
Submitted 17 April, 2021;
originally announced April 2021.
-
PARAFAC2 AO-ADMM: Constraints in all modes
Authors:
Marie Roald,
Carla Schenker,
Jeremy E. Cohen,
Evrim Acar
Abstract:
The PARAFAC2 model provides a flexible alternative to the popular CANDECOMP/PARAFAC (CP) model for tensor decompositions. Unlike CP, PARAFAC2 allows factor matrices in one mode (i.e., evolving mode) to change across tensor slices, which has proven useful for applications in different domains such as chemometrics, and neuroscience. However, the evolving mode of the PARAFAC2 model is traditionally m…
▽ More
The PARAFAC2 model provides a flexible alternative to the popular CANDECOMP/PARAFAC (CP) model for tensor decompositions. Unlike CP, PARAFAC2 allows factor matrices in one mode (i.e., evolving mode) to change across tensor slices, which has proven useful for applications in different domains such as chemometrics, and neuroscience. However, the evolving mode of the PARAFAC2 model is traditionally modelled implicitly, which makes it challenging to regularise it. Currently, the only way to apply regularisation on that mode is with a flexible coupling approach, which finds the solution through regularised least-squares subproblems. In this work, we instead propose an alternating direction method of multipliers (ADMM)-based algorithm for fitting PARAFAC2 and widen the possible regularisation penalties to any proximable function. Our numerical experiments demonstrate that the proposed ADMM-based approach for PARAFAC2 can accurately recover the underlying components from simulated data while being both computationally efficient and flexible in terms of imposing constraints.
△ Less
Submitted 3 February, 2021;
originally announced February 2021.
-
Pareto GAN: Extending the Representational Power of GANs to Heavy-Tailed Distributions
Authors:
Todd Huster,
Jeremy E. J. Cohen,
Zinan Lin,
Kevin Chan,
Charles Kamhoua,
Nandi Leslie,
Cho-Yu Jason Chiang,
Vyas Sekar
Abstract:
Generative adversarial networks (GANs) are often billed as "universal distribution learners", but precisely what distributions they can represent and learn is still an open question. Heavy-tailed distributions are prevalent in many different domains such as financial risk-assessment, physics, and epidemiology. We observe that existing GAN architectures do a poor job of matching the asymptotic beha…
▽ More
Generative adversarial networks (GANs) are often billed as "universal distribution learners", but precisely what distributions they can represent and learn is still an open question. Heavy-tailed distributions are prevalent in many different domains such as financial risk-assessment, physics, and epidemiology. We observe that existing GAN architectures do a poor job of matching the asymptotic behavior of heavy-tailed distributions, a problem that we show stems from their construction. Additionally, when faced with the infinite moments and large distances between outlier points that are characteristic of heavy-tailed distributions, common loss functions produce unstable or near-zero gradients. We address these problems with the Pareto GAN. A Pareto GAN leverages extreme value theory and the functional properties of neural networks to learn a distribution that matches the asymptotic behavior of the marginal distributions of the features. We identify issues with standard loss functions and propose the use of alternative metric spaces that enable stable and efficient learning. Finally, we evaluate our proposed approach on a variety of heavy-tailed datasets.
△ Less
Submitted 22 January, 2021;
originally announced January 2021.
-
Matrix-wise $\ell_0$-constrained Sparse Nonnegative Least Squares
Authors:
Nicolas Nadisic,
Jeremy E Cohen,
Arnaud Vandaele,
Nicolas Gillis
Abstract:
Nonnegative least squares problems with multiple right-hand sides (MNNLS) arise in models that rely on additive linear combinations. In particular, they are at the core of most nonnegative matrix factorization algorithms and have many applications. The nonnegativity constraint is known to naturally favor sparsity, that is, solutions with few non-zero entries. However, it is often useful to further…
▽ More
Nonnegative least squares problems with multiple right-hand sides (MNNLS) arise in models that rely on additive linear combinations. In particular, they are at the core of most nonnegative matrix factorization algorithms and have many applications. The nonnegativity constraint is known to naturally favor sparsity, that is, solutions with few non-zero entries. However, it is often useful to further enhance this sparsity, as it improves the interpretability of the results and helps reducing noise, which leads to the sparse MNNLS problem. In this paper, as opposed to most previous works that enforce sparsity column- or row-wise, we first introduce a novel formulation for sparse MNNLS, with a matrix-wise sparsity constraint. Then, we present a two-step algorithm to tackle this problem. The first step divides sparse MNNLS in subproblems, one per column of the original problem. It then uses different algorithms to produce, either exactly or approximately, a Pareto front for each subproblem, that is, to produce a set of solutions representing different tradeoffs between reconstruction error and sparsity. The second step selects solutions among these Pareto fronts in order to build a sparsity-constrained matrix that minimizes the reconstruction error. We perform experiments on facial and hyperspectral images, and we show that our proposed two-step approach provides more accurate results than state-of-the-art sparse coding heuristics applied both column-wise and globally.
△ Less
Submitted 22 June, 2022; v1 submitted 22 November, 2020;
originally announced November 2020.
-
Continuous dictionaries meet low-rank tensor approximations
Authors:
Clement Elvira,
Jeremy E. Cohen,
Cedric Herzet,
Remi Gribonval
Abstract:
In this short paper we bridge two seemingly unrelated sparse approximation topics: continuous sparse coding and low-rank approximations. We show that for a specific choice of continuous dictionary, linear systems with nuclear-norm regularization have the same solutions as a BLasso problem. Although this fact was already partially understood in the matrix case, we further show that for tensor data,…
▽ More
In this short paper we bridge two seemingly unrelated sparse approximation topics: continuous sparse coding and low-rank approximations. We show that for a specific choice of continuous dictionary, linear systems with nuclear-norm regularization have the same solutions as a BLasso problem. Although this fact was already partially understood in the matrix case, we further show that for tensor data, using BLasso solvers for the low-rank approximation problem leads to a new branch of optimization methods yet vastly unexplored. In particular, the proposed Frank-Wolfe algorithm is showcased on an automatic tensor rank selection problem.
△ Less
Submitted 14 September, 2020;
originally announced September 2020.
-
A Flexible Optimization Framework for Regularized Matrix-Tensor Factorizations with Linear Couplings
Authors:
Carla Schenker,
Jeremy E. Cohen,
Evrim Acar
Abstract:
Coupled matrix and tensor factorizations (CMTF) are frequently used to jointly analyze data from multiple sources, also called data fusion. However, different characteristics of datasets stemming from multiple sources pose many challenges in data fusion and require to employ various regularizations, constraints, loss functions and different types of coupling structures between datasets. In this pa…
▽ More
Coupled matrix and tensor factorizations (CMTF) are frequently used to jointly analyze data from multiple sources, also called data fusion. However, different characteristics of datasets stemming from multiple sources pose many challenges in data fusion and require to employ various regularizations, constraints, loss functions and different types of coupling structures between datasets. In this paper, we propose a flexible algorithmic framework for coupled matrix and tensor factorizations which utilizes Alternating Optimization (AO) and the Alternating Direction Method of Multipliers (ADMM). The framework facilitates the use of a variety of constraints, loss functions and couplings with linear transformations in a seamless way. Numerical experiments on simulated and real datasets demonstrate that the proposed approach is accurate, and computationally efficient with comparable or better performance than available CMTF methods for Frobenius norm loss, while being more flexible. Using Kullback-Leibler divergence on count data, we demonstrate that the algorithm yields accurate results also for other loss functions.
△ Less
Submitted 19 July, 2020;
originally announced July 2020.
-
Nonconcavity of the Spectral Radius in Levinger's Theorem
Authors:
Lee Altenberg,
Joel E. Cohen
Abstract:
Let ${\bf A} \in R^{n \times n}$ be a nonnegative irreducible square matrix and let $r({\bf A})$ be its spectral radius and Perron-Frobenius eigenvalue. Levinger asserted and several have proven that $r(t):=r((1{-}t) {\bf A} + t {\bf A}^\top)$ increases over $t \in [0,1/2]$ and decreases over $t \in [1/2,1]$. It has further been stated that $r(t)$ is concave over $t \in (0,1)$. Here we show that t…
▽ More
Let ${\bf A} \in R^{n \times n}$ be a nonnegative irreducible square matrix and let $r({\bf A})$ be its spectral radius and Perron-Frobenius eigenvalue. Levinger asserted and several have proven that $r(t):=r((1{-}t) {\bf A} + t {\bf A}^\top)$ increases over $t \in [0,1/2]$ and decreases over $t \in [1/2,1]$. It has further been stated that $r(t)$ is concave over $t \in (0,1)$. Here we show that the latter claim is false in general through a number of counterexamples, but prove it is true for ${\bf A} \in R^{2\times 2}$, weighted shift matrices (but not cyclic weighted shift matrices), tridiagonal Toeplitz matrices, and the 3-parameter Toeplitz matrices from Fiedler, but not Toeplitz matrices in general. A general characterization of the range of $t$, or the class of matrices, for which the spectral radius is concave in Levinger's homotopy remains an open problem.
△ Less
Submitted 19 August, 2020; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Sparse Separable Nonnegative Matrix Factorization
Authors:
Nicolas Nadisic,
Arnaud Vandaele,
Jeremy E. Cohen,
Nicolas Gillis
Abstract:
We propose a new variant of nonnegative matrix factorization (NMF), combining separability and sparsity assumptions. Separability requires that the columns of the first NMF factor are equal to columns of the input matrix, while sparsity requires that the columns of the second NMF factor are sparse. We call this variant sparse separable NMF (SSNMF), which we prove to be NP-complete, as opposed to s…
▽ More
We propose a new variant of nonnegative matrix factorization (NMF), combining separability and sparsity assumptions. Separability requires that the columns of the first NMF factor are equal to columns of the input matrix, while sparsity requires that the columns of the second NMF factor are sparse. We call this variant sparse separable NMF (SSNMF), which we prove to be NP-complete, as opposed to separable NMF which can be solved in polynomial time. The main motivation to consider this new model is to handle underdetermined blind source separation problems, such as multispectral image unmixing. We introduce an algorithm to solve SSNMF, based on the successive nonnegative projection algorithm (SNPA, an effective algorithm for separable NMF), and an exact sparse nonnegative least squares solver. We prove that, in noiseless settings and under mild assumptions, our algorithm recovers the true underlying sources. This is illustrated by experiments on synthetic data sets and the unmixing of a multispectral image.
△ Less
Submitted 12 June, 2020;
originally announced June 2020.
-
Computing the proximal operator of the $\ell_1$ induced matrix norm
Authors:
Jeremy E. Cohen
Abstract:
In this short article, for any matrix $X\in\mathbb{R}^{n\times m}$ the proximity operator of two induced norms $ \|X\|_1 $ and $ \|X\|_{\infty}$ are derived. Although no close form expression is obtained, an algorithmic procedure is described which costs roughly $\mathcal{O}(nm)$. This algorithm relies on a bisection on a real parameter derived from the Karush-Kuhn-Tucker conditions, following the…
▽ More
In this short article, for any matrix $X\in\mathbb{R}^{n\times m}$ the proximity operator of two induced norms $ \|X\|_1 $ and $ \|X\|_{\infty}$ are derived. Although no close form expression is obtained, an algorithmic procedure is described which costs roughly $\mathcal{O}(nm)$. This algorithm relies on a bisection on a real parameter derived from the Karush-Kuhn-Tucker conditions, following the proof idea of the proximal operator of the $ \max $ function found in Parikh(2014).
△ Less
Submitted 15 January, 2021; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Accelerating Block Coordinate Descent for Nonnegative Tensor Factorization
Authors:
Andersen Man Shun Ang,
Jeremy E. Cohen,
Nicolas Gillis,
Le Thi Khanh Hien
Abstract:
This paper is concerned with improving the empirical convergence speed of block-coordinate descent algorithms for approximate nonnegative tensor factorization (NTF). We propose an extrapolation strategy in-between block updates, referred to as heuristic extrapolation with restarts (HER). HER significantly accelerates the empirical convergence speed of most existing block-coordinate algorithms for…
▽ More
This paper is concerned with improving the empirical convergence speed of block-coordinate descent algorithms for approximate nonnegative tensor factorization (NTF). We propose an extrapolation strategy in-between block updates, referred to as heuristic extrapolation with restarts (HER). HER significantly accelerates the empirical convergence speed of most existing block-coordinate algorithms for dense NTF, in particular for challenging computational scenarios, while requiring a negligible additional computational budget.
△ Less
Submitted 20 November, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Universal Lipschitz Approximation in Bounded Depth Neural Networks
Authors:
Jeremy E. J. Cohen,
Todd Huster,
Ra Cohen
Abstract:
Adversarial attacks against machine learning models are a rather hefty obstacle to our increasing reliance on these models. Due to this, provably robust (certified) machine learning models are a major topic of interest. Lipschitz continuous models present a promising approach to solving this problem. By leveraging the expressive power of a variant of neural networks which maintain low Lipschitz co…
▽ More
Adversarial attacks against machine learning models are a rather hefty obstacle to our increasing reliance on these models. Due to this, provably robust (certified) machine learning models are a major topic of interest. Lipschitz continuous models present a promising approach to solving this problem. By leveraging the expressive power of a variant of neural networks which maintain low Lipschitz constants, we prove that three layer neural networks using the FullSort activation function are Universal Lipschitz function Approximators (ULAs). This both explains experimental results and paves the way for the creation of better certified models going forward. We conclude by presenting experimental results that suggest that ULAs are a not just a novelty, but a competitive approach to providing certified classifiers, using these results to motivate several potential topics of further research.
△ Less
Submitted 9 April, 2019;
originally announced April 2019.
-
Identifiability of Complete Dictionary Learning
Authors:
Jérémy E. Cohen,
Nicolas Gillis
Abstract:
Sparse component analysis (SCA), also known as complete dictionary learning, is the following problem: Given an input matrix $M$ and an integer $r$, find a dictionary $D$ with $r$ columns and a matrix $B$ with $k$-sparse columns (that is, each column of $B$ has at most $k$ non-zero entries) such that $M \approx DB$. A key issue in SCA is identifiability, that is, characterizing the conditions unde…
▽ More
Sparse component analysis (SCA), also known as complete dictionary learning, is the following problem: Given an input matrix $M$ and an integer $r$, find a dictionary $D$ with $r$ columns and a matrix $B$ with $k$-sparse columns (that is, each column of $B$ has at most $k$ non-zero entries) such that $M \approx DB$. A key issue in SCA is identifiability, that is, characterizing the conditions under which $D$ and $B$ are essentially unique (that is, they are unique up to permutation and scaling of the columns of $D$ and rows of $B$). Although SCA has been vastly investigated in the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds in the minimum number of samples (that is, columns of $M$) that leads to identifiability. In this work, we provide new results in the deterministic scenario when the data has a low-rank structure, that is, when $D$ is (under)complete. While previous bounds feature a combinatorial term $r \choose k$, we exhibit a sufficient condition involving $\mathcal{O}(r^3/(r-k)^2)$ samples that yields an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by $r-1$ columns of $D$. We also exhibit a necessary lower bound on the number of samples that contradicts previous results in the literature when $k$ equals $r-1$. Our bounds provide a drastic improvement compared to the state of the art, and imply for example that for a fixed proportion of zeros (constant and independent of $r$, e.g., 10\% of zero entries in $B$), one only requires $\mathcal{O}(r)$ data points to guarantee identifiability.
△ Less
Submitted 28 March, 2019; v1 submitted 27 August, 2018;
originally announced August 2018.
-
Nonnegative PARAFAC2: a flexible coupling approach
Authors:
Jeremy E. Cohen,
Rasmus Bro
Abstract:
Modeling variability in tensor decomposition methods is one of the challenges of source separation. One possible solution to account for variations from one data set to another, jointly analysed, is to resort to the PARAFAC2 model. However, so far imposing constraints on the mode with variability has not been possible. In the following manuscript, a relaxation of the PARAFAC2 model is introduced,…
▽ More
Modeling variability in tensor decomposition methods is one of the challenges of source separation. One possible solution to account for variations from one data set to another, jointly analysed, is to resort to the PARAFAC2 model. However, so far imposing constraints on the mode with variability has not been possible. In the following manuscript, a relaxation of the PARAFAC2 model is introduced, that allows for imposing nonnegativity constraints on the varying mode. An algorithm to compute the proposed flexible PARAFAC2 model is derived, and its performance is studied on both synthetic and chemometrics data.
△ Less
Submitted 14 February, 2018;
originally announced February 2018.
-
Curve Registered Coupled Low Rank Factorization
Authors:
Jeremy Emile Cohen,
Rodrigo Cabral Farias,
Bertrand Rivet
Abstract:
We propose an extension of the canonical polyadic (CP) tensor model where one of the latent factors is allowed to vary through data slices in a constrained way. The components of the latent factors, which we want to retrieve from data, can vary from one slice to another up to a diffeomorphism. We suppose that the diffeomorphisms are also unknown, thus merging curve registration and tensor decompos…
▽ More
We propose an extension of the canonical polyadic (CP) tensor model where one of the latent factors is allowed to vary through data slices in a constrained way. The components of the latent factors, which we want to retrieve from data, can vary from one slice to another up to a diffeomorphism. We suppose that the diffeomorphisms are also unknown, thus merging curve registration and tensor decomposition in one model, which we call registered CP. We present an algorithm to retrieve both the latent factors and the diffeomorphism, which is assumed to be in a parametrized form. At the end of the paper, we show simulation results comparing registered CP with other models from the literature.
△ Less
Submitted 9 February, 2018;
originally announced February 2018.
-
Spectral Unmixing with Multiple Dictionaries
Authors:
Jeremy E. Cohen,
Nicolas Gillis
Abstract:
Spectral unmixing aims at recovering the spectral signatures of materials, called endmembers, mixed in a hyperspectral or multispectral image, along with their abundances. A typical assumption is that the image contains one pure pixel per endmember, in which case spectral unmixing reduces to identifying these pixels. Many fully automated methods have been proposed in recent years, but little work…
▽ More
Spectral unmixing aims at recovering the spectral signatures of materials, called endmembers, mixed in a hyperspectral or multispectral image, along with their abundances. A typical assumption is that the image contains one pure pixel per endmember, in which case spectral unmixing reduces to identifying these pixels. Many fully automated methods have been proposed in recent years, but little work has been done to allow users to select areas where pure pixels are present manually or using a segmentation algorithm. Additionally, in a non-blind approach, several spectral libraries may be available rather than a single one, with a fixed number (or an upper or lower bound) of endmembers to chose from each. In this paper, we propose a multiple-dictionary constrained low-rank matrix approximation model that address these two problems. We propose an algorithm to compute this model, dubbed M2PALS, and its performance is discussed on both synthetic and real hyperspectral images.
△ Less
Submitted 8 November, 2017;
originally announced November 2017.
-
Dictionary-based Tensor Canonical Polyadic Decomposition
Authors:
Jérémy E. Cohen,
Nicolas Gillis
Abstract:
To ensure interpretability of extracted sources in tensor decomposition, we introduce in this paper a dictionary-based tensor canonical polyadic decomposition which enforces one factor to belong exactly to a known dictionary. A new formulation of sparse coding is proposed which enables high dimensional tensors dictionary-based canonical polyadic decomposition. The benefits of using a dictionary in…
▽ More
To ensure interpretability of extracted sources in tensor decomposition, we introduce in this paper a dictionary-based tensor canonical polyadic decomposition which enforces one factor to belong exactly to a known dictionary. A new formulation of sparse coding is proposed which enables high dimensional tensors dictionary-based canonical polyadic decomposition. The benefits of using a dictionary in tensor decomposition models are explored both in terms of parameter identifiability and estimation accuracy. Performances of the proposed algorithms are evaluated on the decomposition of simulated data and the unmixing of hyperspectral images.
△ Less
Submitted 8 November, 2017; v1 submitted 3 April, 2017;
originally announced April 2017.
-
About Notations in Multiway Array Processing
Authors:
Jeremy E. Cohen
Abstract:
This paper gives an overview of notations used in multiway array processing. We redefine the vectorization and matricization operators to comply with some properties of the Kronecker product. The tensor product and Kronecker product are also represented with two different symbols, and it is shown how these notations lead to clearer expressions for multiway array operations. Finally, the paper reca…
▽ More
This paper gives an overview of notations used in multiway array processing. We redefine the vectorization and matricization operators to comply with some properties of the Kronecker product. The tensor product and Kronecker product are also represented with two different symbols, and it is shown how these notations lead to clearer expressions for multiway array operations. Finally, the paper recalls the useful yet widely unknown properties of the array normal law with suggested notations.
△ Less
Submitted 3 February, 2016; v1 submitted 4 November, 2015;
originally announced November 2015.
-
Exploring multimodal data fusion through joint decompositions with flexible couplings
Authors:
Rodrigo Cabral Farias,
Jeremy Emile Cohen,
Pierre Comon
Abstract:
A Bayesian framework is proposed to define flexible coupling models for joint tensor decompositions of multiple data sets. Under this framework, a natural formulation of the data fusion problem is to cast it in terms of a joint maximum a posteriori (MAP) estimator. Data driven scenarios of joint posterior distributions are provided, including general Gaussian priors and non Gaussian coupling prior…
▽ More
A Bayesian framework is proposed to define flexible coupling models for joint tensor decompositions of multiple data sets. Under this framework, a natural formulation of the data fusion problem is to cast it in terms of a joint maximum a posteriori (MAP) estimator. Data driven scenarios of joint posterior distributions are provided, including general Gaussian priors and non Gaussian coupling priors. We present and discuss implementation issues of algorithms used to obtain the joint MAP estimator. We also show how this framework can be adapted to tackle the problem of joint decompositions of large datasets. In the case of a conditional Gaussian coupling with a linear transformation, we give theoretical bounds on the data fusion performance using the Bayesian Cramer-Rao bound. Simulations are reported for hybrid coupling models ranging from simple additive Gaussian models, to Gamma-type models with positive variables and to the coupling of data sets which are inherently of different size due to different resolution of the measurement devices.
△ Less
Submitted 25 May, 2016; v1 submitted 28 May, 2015;
originally announced May 2015.
-
Sample and population exponents of generalized Taylor's law
Authors:
Andrea Giometto,
Marco Formentin,
Andrea Rinaldo,
Joel E. Cohen,
Amos Maritan
Abstract:
Taylor's law (TL) states that the variance $V$ of a non-negative random variable is a power function of its mean $M$, i.e. $V=a M^b$. The ubiquitous empirical verification of TL, typically displaying sample exponents $b \simeq 2$, suggests a context-independent mechanism. However, theoretical studies of population dynamics predict a broad range of values of $b$. Here, we explain this apparent cont…
▽ More
Taylor's law (TL) states that the variance $V$ of a non-negative random variable is a power function of its mean $M$, i.e. $V=a M^b$. The ubiquitous empirical verification of TL, typically displaying sample exponents $b \simeq 2$, suggests a context-independent mechanism. However, theoretical studies of population dynamics predict a broad range of values of $b$. Here, we explain this apparent contradiction by using large deviations theory to derive a generalized TL in terms of sample and populations exponents $b_{jk}$ for the scaling of the $k$-th vs the $j$-th cumulant (conventional TL is recovered for $b=b_{12}$), with the sample exponent found to depend predictably on the number of observed samples. Thus, for finite numbers of observations one observes sample exponents $b_{jk}\simeq k/j$ (thus $b\simeq2$) independently of population exponents. Empirical analyses on two datasets support our theoretical results.
△ Less
Submitted 16 December, 2014;
originally announced December 2014.