-
A second-order-like optimizer with adaptive gradient scaling for deep learning
Authors:
Jérôme Bolte,
Ryan Boustany,
Edouard Pauwels,
Andrei Purica
Abstract:
In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with the RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory requirements of standard DL methods as AdamW or SGD with momentum.After having recalled our geometrical motivations, we provide quite extensive experiments. On image classif…
▽ More
In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with the RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory requirements of standard DL methods as AdamW or SGD with momentum.After having recalled our geometrical motivations, we provide quite extensive experiments. On image classification (CIFAR-10, ImageNet) and language modeling (GPT-2), INNAprop consistently matches or outperforms AdamW both in training speed and accuracy, with minimal hyperparameter tuning in large-scale settings. Our code is publicly available at \url{https://github.com/innaprop/innaprop}.
△ Less
Submitted 8 October, 2024;
originally announced October 2024.
-
SPH modelling of AGB wind morphology in hierarchical triple systems \& comparison to observation of R Aql
Authors:
Jolien Malfait,
Lionel Siess,
Owen Vermeulen,
Mats Esseldeurs,
Sofia H. J. Wallström,
Anita M. S. Richards,
Frederik De Ceuster,
Silke Maes,
Jan Bolte,
Leen Decin
Abstract:
Asymmetric 3D structures are observed in the outflows of evolved low- and intermediate-mass stars, and are believed to be shaped through the interaction of companions that are hidden within the dense wind. We investigate how triple systems can shape the outflow of AGB stars. We focus on coplanar systems in a hierarchical, stable orbit, consisting of an AGB star with one relatively close companion,…
▽ More
Asymmetric 3D structures are observed in the outflows of evolved low- and intermediate-mass stars, and are believed to be shaped through the interaction of companions that are hidden within the dense wind. We investigate how triple systems can shape the outflow of AGB stars. We focus on coplanar systems in a hierarchical, stable orbit, consisting of an AGB star with one relatively close companion, and one at large orbital separation. We model a grid of hierarchical triple systems including a wind-launching AGB star, with the smoothed-particle-hydrodynamic Phantom code. We vary the outer companion mass, the AGB wind velocity and the orbital eccentricities to study the impact of these parameters on the wind morphology. Further, we investigate if the outflow of the AGB star R Aql could be shaped by a triple system, by post-processing one of our triple models with a radiative transfer routine, and comparing this to data of the ALMA ATOMIUM programme. The characteristic wind structures resulting from a hierarchical triple system are the following. A large two-edged spiral wake results behind the outer companion star. This structure lies on top of the spiral structure formed by the close binary, which is affected by the orbital motion around the system centre-of-mass. This dense inner spiral pattern interacts with, and strongly impacts, the spiral wake of the outer companion, resulting in a waved-pattern on the outer edge of this spiral wake. From the comparison of our models to the observations of R Aql, we conclude that this circumstellar environment might be shaped by a similar system as the ones modelled in this work, but an elaborate study of the observational data is needed to determine better the orbital parameters and characteristics of the central system.
△ Less
Submitted 29 August, 2024;
originally announced August 2024.
-
Chemical tracers of a highly eccentric AGB-main sequence star binary
Authors:
T. Danilovich,
J. Malfait,
M. Van de Sande,
M. Montargès,
P. Kervella,
F. De Ceuster,
A. Coenegrachts,
T. J. Millar,
A. M. S. Richards,
L. Decin,
C. A. Gottlieb,
C. Pinte,
E. De Beck,
D. J. Price,
K. T. Wong,
J. Bolte,
K. M. Menten,
A. Baudry,
A. de Koter,
S. Etoka,
D. Gobrecht,
M. Gray,
F. Herpin,
M. Jeste,
E. Lagadec
, et al. (10 additional authors not shown)
Abstract:
Binary interactions have been proposed to explain a variety of circumstellar structures seen around evolved stars, including asymptotic giant branch (AGB) stars and planetary nebulae. Studies resolving the circumstellar envelopes of AGB stars have revealed spirals, discs and bipolar outflows, with shaping attributed to interactions with a companion. For the first time, we have used a combined chem…
▽ More
Binary interactions have been proposed to explain a variety of circumstellar structures seen around evolved stars, including asymptotic giant branch (AGB) stars and planetary nebulae. Studies resolving the circumstellar envelopes of AGB stars have revealed spirals, discs and bipolar outflows, with shaping attributed to interactions with a companion. For the first time, we have used a combined chemical and dynamical analysis to reveal a highly eccentric and long-period orbit for W Aquilae, a binary system containing an AGB star and a main sequence companion. Our results are based on anisotropic SiN emission, the first detections of NS and SiC towards an S-type star, and density structures observed in the CO emission. These features are all interpreted as having formed during periastron interactions. Our astrochemistry-based method can yield stringent constraints on the orbital parameters of long-period binaries containing AGB stars, and will be applicable to other systems.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Geometric and computational hardness of bilevel programming
Authors:
Jérôme Bolte,
Quoc-Tung Le,
Edouard Pauwels,
Samuel Vaiter
Abstract:
We first show a simple but striking result in bilevel optimization: unconstrained $C^\infty$ smooth bilevel programming is as hard as general extended-real-valued lower semicontinuous minimization. We then proceed to a worst-case analysis of box-constrained bilevel polynomial optimization. We show in particular that any extended-real-valued semi-algebraic function, possibly non-continuous, can be…
▽ More
We first show a simple but striking result in bilevel optimization: unconstrained $C^\infty$ smooth bilevel programming is as hard as general extended-real-valued lower semicontinuous minimization. We then proceed to a worst-case analysis of box-constrained bilevel polynomial optimization. We show in particular that any extended-real-valued semi-algebraic function, possibly non-continuous, can be expressed as the value function of a polynomial bilevel program. Secondly, from a computational complexity perspective, the decision version of polynomial bilevel programming is one level above NP in the polynomial hierarchy ($Σ^p_2$-hard). Both types of difficulties are uncommon in non-linear programs for which objective functions are typically continuous and belong to the class NP. These results highlight the irremediable hardness attached to general bilevel optimization and the necessity of imposing some form of regularity on the lower level.
△ Less
Submitted 17 July, 2024;
originally announced July 2024.
-
Inexact subgradient methods for semialgebraic functions
Authors:
Jérôme Bolte,
Tam Le,
Éric Moulines,
Edouard Pauwels
Abstract:
Motivated by the widespread use of approximate derivatives in machine learning and optimization, we study inexact subgradient methods with non-vanishing additive errors and step sizes. In the nonconvex semialgebraic setting, under boundedness assumptions, we prove that the method provides points that eventually fluctuate close to the critical set at a distance proportional to $ε^ρ$ where $ε$ is t…
▽ More
Motivated by the widespread use of approximate derivatives in machine learning and optimization, we study inexact subgradient methods with non-vanishing additive errors and step sizes. In the nonconvex semialgebraic setting, under boundedness assumptions, we prove that the method provides points that eventually fluctuate close to the critical set at a distance proportional to $ε^ρ$ where $ε$ is the error in subgradient evaluation and $ρ$ relates to the geometry of the problem. In the convex setting, we provide complexity results for the averaged values. We also obtain byproducts of independent interest, such as descent-like lemmas for nonsmooth nonconvex problems and some results on the limit of affine interpolants of differential inclusions.
△ Less
Submitted 30 April, 2024;
originally announced April 2024.
-
ATOMIUM: Molecular inventory of 17 oxygen-rich evolved stars observed with ALMA
Authors:
S. H. J. Wallstrom,
T. Danilovich,
H. S. P. Muller,
C. A. Gottlieb,
S. Maes,
M. Van de Sande,
L. Decin,
A. M. S. Richards,
A. Baudry,
J. Bolte,
T. Ceulemans,
F. De Ceuster,
A. de Koter,
I. El Mellah,
M. Esseldeurs,
S. Etoka,
D. Gobrecht,
E. Gottlieb,
M. Gray,
F. Herpin,
M. Jeste,
D. Kee,
P. Kervella,
T. Khouri,
E. Lagadec
, et al. (13 additional authors not shown)
Abstract:
The dusty winds of cool evolved stars are a major contributor of the newly synthesised material enriching the Galaxy and future generations of stars. However, the details of the physics and chemistry behind dust formation and wind launching have yet to be pinpointed. Recent spatially resolved observations show the importance of gaining a more comprehensive view of the circumstellar chemistry, but…
▽ More
The dusty winds of cool evolved stars are a major contributor of the newly synthesised material enriching the Galaxy and future generations of stars. However, the details of the physics and chemistry behind dust formation and wind launching have yet to be pinpointed. Recent spatially resolved observations show the importance of gaining a more comprehensive view of the circumstellar chemistry, but a comparative study of the intricate interplay between chemistry and physics is still difficult because observational details such as frequencies and angular resolutions are rarely comparable. Aiming to overcome these deficiencies, ATOMIUM is an ALMA Large Programme to study the physics and chemistry of the circumstellar envelopes of a diverse set of oxygen-rich evolved stars under homogeneous observing conditions at three angular resolutions between ~0.02"-1.4". Here we summarize the molecular inventory of these sources, and the correlations between stellar parameters and molecular content. Seventeen oxygen-rich or S-type asymptotic giant branch (AGB) and red supergiant (RSG) stars have been observed in several tunings with ALMA Band 6, targeting a range of molecules to probe the circumstellar envelope and especially the chemistry of dust formation close to the star. We systematically assigned the molecular carriers of the spectral lines and measured their spectroscopic parameters and the angular extent of the emission of each line from integrated intensity maps. Across the ATOMIUM sample, we detect 291 transitions of 24 different molecules and their isotopologues. This includes several first detections in oxygen-rich AGB/RSG stars: PO v=1, SO2 v1=1 and v2=2, and several high energy H2O transitions. We also find several first detections in S-type AGB stars: vibrationally excited HCN v2=2,3 and SiS v=4,5,6, as well as first detections of the molecules SiC, AlCl, and AlF in W Aql...
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
One-step differentiation of iterative algorithms
Authors:
Jérôme Bolte,
Edouard Pauwels,
Samuel Vaiter
Abstract:
In appropriate frameworks, automatic differentiation is transparent to the user at the cost of being a significant computational burden when the number of operations is large. For iterative algorithms, implicit differentiation alleviates this issue but requires custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagatio…
▽ More
In appropriate frameworks, automatic differentiation is transparent to the user at the cost of being a significant computational burden when the number of operations is large. For iterative algorithms, implicit differentiation alleviates this issue but requires custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagation, a method as easy as automatic differentiation and as performant as implicit differentiation for fast algorithms (e.g., superlinear optimization methods). We provide a complete theoretical approximation analysis with specific examples (Newton's method, gradient descent) along with its consequences in bilevel optimization. Several numerical examples illustrate the well-foundness of the one-step estimator.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
The VLT/SPHERE view of the ATOMIUM cool evolved star sample. I. Overview: Sample characterization through polarization analysis
Authors:
M. Montargès,
E. Cannon,
A. de Koter,
T. Khouri,
E. Lagadec,
P. Kervella,
L. Decin,
I. McDonald,
W. Homan,
L. B. F. M. Waters,
R. Sahai,
C. A. Gottlieb,
J. Malfait,
S. Maes,
B. Pimpanuwat,
M. Jeste,
T. Danilovich,
F. De Ceuster,
M. Van de Sande,
D. Gobrecht,
S. H. J. Wallström,
K. T. Wong,
I. El Mellah,
J. Bolte,
F. Herpin
, et al. (10 additional authors not shown)
Abstract:
Aims. Through the ATOMIUM project, based on an ALMA large program, we aim to present a consistent view of a sample of 17 nearby cool evolved stars (Aymptotic Giant Branch and red supergiant stars).
Methods. Here we present VLT/SPHERE-ZIMPOL polarimetric maps obtained in the visible of 14 out of the 17 ATOMIUM sources. They were obtained contemporaneously with the ALMA high spatial resolution dat…
▽ More
Aims. Through the ATOMIUM project, based on an ALMA large program, we aim to present a consistent view of a sample of 17 nearby cool evolved stars (Aymptotic Giant Branch and red supergiant stars).
Methods. Here we present VLT/SPHERE-ZIMPOL polarimetric maps obtained in the visible of 14 out of the 17 ATOMIUM sources. They were obtained contemporaneously with the ALMA high spatial resolution data. To help interpret the polarized signal, we produced synthetic maps of light scattering by dust, through 3D radiative transfer simulations with the RADMC3D code.
Results. The degree of linear polarization (DoLP) observed by ZIMPOL spreads across several optical filters. We infer that it primarily probes dust located just outside of the point spread function, and in or near the plane of the sky, with a total optical depth close to unity in the line of sight, representing only a fraction of the total circumstellar dust. The maximum DoLP ranges from 0.03-0.38 depending on the source, fractions that can be reproduced by our 3D pilot models for grains composed of common dust species. The spatial structure of the DoLP shows a diverse set of shapes. Only for three sources do we note a correlation between the ALMA CO and SiO lines, which trace the gas density, and the DoLP, which traces the dust.
Conclusion. The clumpiness of the DoLP and the lack of a consistent correlation between the gas and the dust location show that, in the inner circumstellar environment (CSE), dust formation occurs at very specific sites. This has potential consequences for the derived mass-loss rates and dust-to-gas ratio in the inner region of the CSE. Except for $π^1$~Gru and perhaps GY Aql, we do not detect interactions between the circumstellar wind and the hypothesized companions that shape the wind at larger scales. This suggests that the orbits of any other companions are tilted out of the plane of the sky.
△ Less
Submitted 5 January, 2023;
originally announced January 2023.
-
Differentiating Nonsmooth Solutions to Parametric Monotone Inclusion Problems
Authors:
Jérôme Bolte,
Edouard Pauwels,
Antonio Silveti-Falls
Abstract:
We leverage path differentiability and a recent result on nonsmooth implicit differentiation calculus to give sufficient conditions ensuring that the solution to a monotone inclusion problem will be path differentiable, with formulas for computing its generalized gradient. A direct consequence of our result is that these solutions happen to be differentiable almost everywhere. Our approach is full…
▽ More
We leverage path differentiability and a recent result on nonsmooth implicit differentiation calculus to give sufficient conditions ensuring that the solution to a monotone inclusion problem will be path differentiable, with formulas for computing its generalized gradient. A direct consequence of our result is that these solutions happen to be differentiable almost everywhere. Our approach is fully compatible with automatic differentiation and comes with assumptions which are easy to check, roughly speaking: semialgebraicity and strong monotonicity. We illustrate the scope of our results by considering three fundamental composite problem settings: strongly convex problems, dual solutions to convex minimization problems and primal-dual solutions to min-max problems.
△ Less
Submitted 15 December, 2022;
originally announced December 2022.
-
3D hydrodynamical survey of the impact of a companion on the morphology and dynamics of AGB outflow
Authors:
Jolien Malfait,
Silke Maes,
Ward Homan,
Jan Bolte,
Lionel Siess,
Frederik De Ceuster,
Leen Decin
Abstract:
With the use of high-resolution ALMA observations, complex structures that resemble those observed in post-AGB stars and planetary nebulae are detected in the circumstellar envelopes of low-mass evolved stars. These deviations from spherical symmetry are believed to be caused primarily by the interaction with a companion star or planet. With the use of three-dimensional hydrodynamic simulations, w…
▽ More
With the use of high-resolution ALMA observations, complex structures that resemble those observed in post-AGB stars and planetary nebulae are detected in the circumstellar envelopes of low-mass evolved stars. These deviations from spherical symmetry are believed to be caused primarily by the interaction with a companion star or planet. With the use of three-dimensional hydrodynamic simulations, we study the impact of a binary companion on the wind morphology and dynamics of an AGB outflow. We classifiy the wind structures and morphology that form in these simulations with the use of a classification parameter, constructed with characteristic parameters of the binary configuration. Finally we conclude that the companion alters the wind expansion velocity through the slingshot mechanism, if it is massive enough.
△ Less
Submitted 28 July, 2022;
originally announced July 2022.
-
On the complexity of nonsmooth automatic differentiation
Authors:
Jérôme Bolte,
Ryan Boustany,
Edouard Pauwels,
Béatrice Pesquet-Popescu
Abstract:
Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the backward mode turns out to be independent of the dimension when using programs with locally Lipschitz semi-algebraic or definable elementary functions. This co…
▽ More
Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the backward mode turns out to be independent of the dimension when using programs with locally Lipschitz semi-algebraic or definable elementary functions. This considerably extends Baur-Strassen's smooth cheap gradient principle. We illustrate our results by establishing fast backpropagation results of conservative gradients through feedforward neural networks with standard activation and loss functions. Nonsmooth backpropagation's cheapness contrasts with concurrent forward approaches, which have, to this day, dimensional-dependent worst-case overhead estimates. We provide further results suggesting the superiority of backward propagation of conservative gradients. Indeed, we relate the complexity of computing a large number of directional derivatives to that of matrix multiplication, and we show that finding two subgradients in the Clarke subdifferential of a function is an NP-hard problem.
△ Less
Submitted 6 February, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Automatic differentiation of nonsmooth iterative algorithms
Authors:
Jérôme Bolte,
Edouard Pauwels,
Samuel Vaiter
Abstract:
Differentiation along algorithms, i.e., piggyback propagation of derivatives, is now routinely used to differentiate iterative solvers in differentiable programming. Asymptotics is well understood for many smooth problems but the nondifferentiable case is hardly considered. Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)? Does it have any variational meaning and c…
▽ More
Differentiation along algorithms, i.e., piggyback propagation of derivatives, is now routinely used to differentiate iterative solvers in differentiable programming. Asymptotics is well understood for many smooth problems but the nondifferentiable case is hardly considered. Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)? Does it have any variational meaning and can it be used effectively in machine learning? Is there a connection with classical derivative? All these questions are addressed under appropriate nonexpansivity conditions in the framework of conservative derivatives which has proved useful in understanding nonsmooth AD. For nonsmooth piggyback iterations, we characterize the attractor set of nonsmooth piggyback iterations as a set-valued fixed point which remains in the conservative framework. This has various consequences and in particular almost everywhere convergence of classical derivatives. Our results are illustrated on parametric convex optimization problems with forward-backward, Douglas-Rachford and Alternating Direction of Multiplier algorithms as well as the Heavy-Ball method.
△ Less
Submitted 31 May, 2022;
originally announced June 2022.
-
Swarm gradient dynamics for global optimization: the density case
Authors:
Jérôme Bolte,
Laurent Miclo,
Stéphane Villeneuve
Abstract:
Using jointly geometric and stochastic reformulations of nonconvex problems and exploiting a Monge-Kantorovich gradient system formulation with vanishing forces, we formally extend the simulated annealing method to a wide class of global optimization methods. Due to an inbuilt combination of a gradient-like strategy and particles interactions, we call them swarm gradient dynamics. As in the origin…
▽ More
Using jointly geometric and stochastic reformulations of nonconvex problems and exploiting a Monge-Kantorovich gradient system formulation with vanishing forces, we formally extend the simulated annealing method to a wide class of global optimization methods. Due to an inbuilt combination of a gradient-like strategy and particles interactions, we call them swarm gradient dynamics. As in the original paper of Holley-Kusuoka-Stroock, the key to the existence of a schedule ensuring convergence to a global minimizer is a functional inequality. One of our central theoretical contributions is the proof of such an inequality for one-dimensional compact manifolds. We conjecture the inequality to be true in a much wider setting. We also describe a general method allowing for global optimization and evidencing the crucial role of functional inequalities {à} la Łojasiewicz.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Subgradient sampling for nonsmooth nonconvex minimization
Authors:
Jérôme Bolte,
Tam Le,
Edouard Pauwels
Abstract:
Risk minimization for nonsmooth nonconvex problems naturally leads to first-order sampling or, by an abuse of terminology, to stochastic subgradient descent. We establish the convergence of this method in the path-differentiable case and describe more precise results under additional geometric assumptions. We recover and improve results from Ermoliev and Norkin [Cybern. Syst. Anal., 34 (1998), pp.…
▽ More
Risk minimization for nonsmooth nonconvex problems naturally leads to first-order sampling or, by an abuse of terminology, to stochastic subgradient descent. We establish the convergence of this method in the path-differentiable case and describe more precise results under additional geometric assumptions. We recover and improve results from Ermoliev and Norkin [Cybern. Syst. Anal., 34 (1998), pp. 196--215] by using a different approach: conservative calculus and the ODE method. In the definable case, we show that first-order subgradient sampling avoids artificial critical points with probability one and applies moreover to a large range of risk minimization problems in deep learning, based on the backpropagation oracle. As byproducts of our approach, we obtain several results on integration of independent interest, such as an interchange result for conservative derivatives and integrals or the definability of set-valued parameterized integrals.
△ Less
Submitted 23 July, 2024; v1 submitted 28 February, 2022;
originally announced February 2022.
-
The Iterates of the Frank-Wolfe Algorithm May Not Converge
Authors:
Jérôme Bolte,
Cyrille W. Combettes,
Édouard Pauwels
Abstract:
The Frank-Wolfe algorithm is a popular method for minimizing a smooth convex function $f$ over a compact convex set $\mathcal{C}$. While many convergence results have been derived in terms of function values, hardly nothing is known about the convergence behavior of the sequence of iterates $(x_t)_{t\in\mathbb{N}}$. Under the usual assumptions, we design several counterexamples to the convergence…
▽ More
The Frank-Wolfe algorithm is a popular method for minimizing a smooth convex function $f$ over a compact convex set $\mathcal{C}$. While many convergence results have been derived in terms of function values, hardly nothing is known about the convergence behavior of the sequence of iterates $(x_t)_{t\in\mathbb{N}}$. Under the usual assumptions, we design several counterexamples to the convergence of $(x_t)_{t\in\mathbb{N}}$, where $f$ is $d$-time continuously differentiable, $d\geq2$, and $f(x_t)\to\min_\mathcal{C}f$. Our counterexamples cover the cases of open-loop, closed-loop, and line-search step-size strategies. We do not assume \emph{misspecification} of the linear minimization oracle and our results thus hold regardless of the points it returns, demonstrating the fundamental pathologies in the convergence behavior of $(x_t)_{t\in\mathbb{N}}$.
△ Less
Submitted 17 February, 2022;
originally announced February 2022.
-
ATOMIUM: ALMA tracing the origins of molecules in dust forming oxygen rich M-type stars: Motivation, sample, calibration, and initial results
Authors:
C. A. Gottlieb,
L. Decin,
A. M. S. Richards,
F. De Ceuster,
W. Homan,
S. H. J. Wallstrom,
T. Danilovich,
T. J. Millar,
M. Montarges,
K. T. Wong,
I. McDonald,
A. Baudry,
J. Bolte,
E. Cannon,
E. De Beck,
A. de Koter,
I. El Mellah,
S. Etoka,
D. Gobrecht,
M. Gray,
F. Herpin,
M. Jeste,
P. Kervella,
T. Khouri,
E. Lagadec
, et al. (11 additional authors not shown)
Abstract:
This overview paper presents ATOMIUM, a Large Programme in Cycle 6 with the Atacama Large Millimeter-submillimeter Array (ALMA). The goal of ATOMIUM is to understand the dynamics and the gas phase and dust formation chemistry in the winds of evolved asymptotic giant branch (AGB) and red supergiant (RSG) stars. A more general aim is to identify chemical processes applicable to other astrophysical e…
▽ More
This overview paper presents ATOMIUM, a Large Programme in Cycle 6 with the Atacama Large Millimeter-submillimeter Array (ALMA). The goal of ATOMIUM is to understand the dynamics and the gas phase and dust formation chemistry in the winds of evolved asymptotic giant branch (AGB) and red supergiant (RSG) stars. A more general aim is to identify chemical processes applicable to other astrophysical environments. 17 oxygen-rich AGB and RSG stars spanning a range in (circum)stellar parameters and evolutionary phases were observed in a homogeneous observing strategy allowing for an unambiguous comparison. Data were obtained between 213.83 and 269.71 GHz at high (0.025-0.050 arcsec), medium (0.13-0.24 arcsec), and low (about 1 arcsec) angular resolution. The sensitivity per 1.3 km/s channel was 1.5-5 mJy/beam. 13 molecules were designated as primary molecules in the survey: CO, SiO, AlO, AlOH, TiO, TiO2, HCN, SO, SO2, SiS, CS, H2O, and NaCl. The scientific motivation, survey design, sample properties, data reduction, and an overview of the data products are described; and we highlight one scientific result - the wind kinematics of the ATOMIUM sources. The ATOMIUM sources often have a slow wind acceleration, and a fraction of the gas reaches a velocity which can be up to a factor of two times larger than previously reported terminal velocities assuming isotropic expansion, and the wind kinematic profiles establish that the radial velocity described by the momentum equation for a spherical wind structure cannot capture the complexity of the velocity field. In 15 sources, some molecular transitions other than 12CO v=0 J=2-1 reach a higher outflow velocity, with a spatial emission zone that is often greater than 30 stellar radii, but much less than the extent of CO. Binary interaction with a (sub)stellar companion might (partly) explain the non-monotonic behaviour of the projected velocity field.
△ Less
Submitted 13 December, 2021; v1 submitted 8 December, 2021;
originally announced December 2021.
-
ATOMIUM: Halide molecules around the S-type AGB star W Aquilae
Authors:
T. Danilovich,
M. Van de Sande,
J. M. C. Plane,
T. J. Millar,
P. Royer,
M. A. Amor,
K. Hammami,
L. Decock,
C. A. Gottlieb,
L. Decin,
A. M. S. Richards,
E. De Beck,
A. Baudry,
J. Bolte,
E. Cannon,
F. De Ceuster,
A. de Koter,
S. Etoka,
D. Gobrecht,
M. Gray,
F. Herpin,
W. Homan,
M. Jeste,
P. Kervella,
T. Khouri
, et al. (14 additional authors not shown)
Abstract:
S-type asymptotic giant branch (AGB) stars are thought to be intermediates in the evolution of oxygen- to carbon-rich AGB stars. The chemical compositions of their circumstellar envelopes are also intermediate, but have not been studied in as much detail as their carbon- and oxygen-rich counterparts. We aim to determine the abundances of AlCl and AlF from rotational lines, which have been observed…
▽ More
S-type asymptotic giant branch (AGB) stars are thought to be intermediates in the evolution of oxygen- to carbon-rich AGB stars. The chemical compositions of their circumstellar envelopes are also intermediate, but have not been studied in as much detail as their carbon- and oxygen-rich counterparts. We aim to determine the abundances of AlCl and AlF from rotational lines, which have been observed for the first time towards an S-type AGB star, W Aql. In combination with models based on PACS observations, we aim to update our chemical kinetics network based on these results. We analyse ALMA observations towards W Aql of AlCl in the ground and first two vibrationally excited states and AlF in the ground vibrational state. Using radiative transfer models, we determine the abundances and spatial abundance distributions of Al$^{35}$Cl, Al$^{37}$Cl, and AlF. We also model HCl and HF emission and compare these models to PACS spectra to constrain the abundances of these species. AlCl is found in clumps very close to the star, with emission confined within 0.1$^{\prime\prime}$ of the star. AlF emission is more extended, with faint emission extending 0.2$^{\prime\prime}$ to 0.6$^{\prime\prime}$ from the continuum peak. We find peak abundances, relative to H$_2$, of $1.7\times 10^{-7}$ for Al$^{35}$Cl, $7\times 10^{-8}$ for Al$^{37}$Cl and $1\times 10^{-7}$ for AlF. From the PACS spectra, we find abundances of $9.7\times 10^{-8}$ and $\leq 10^{-8}$, relative to H$_2$, for HCl and HF, respectively. The AlF abundance exceeds the solar F abundance, indicating that fluorine synthesised in the AGB star has already been dredged up to the surface of the star and ejected into the circumstellar envelope. From our analysis of chemical reactions in the wind, we conclude that AlF may participate in the dust formation process, but we cannot fully explain the rapid depletion of AlCl seen in the wind.
△ Less
Submitted 10 September, 2021;
originally announced September 2021.
-
SPH modelling of wind-companion interactions in eccentric AGB binary systems
Authors:
J. Malfait,
W. Homan,
S. Maes,
J. Bolte,
L. Siess,
F. De Ceuster,
L. Decin
Abstract:
The late evolutionary stages of low- and intermediate-mass stars are characterised by mass loss through a dust-driven stellar wind. Recent observations reveal complex structures within these winds, that are believed to be formed primarily via interaction with a companion. How these complexities arise, and which structures are formed in which type of systems, is still poorly understood. Particularl…
▽ More
The late evolutionary stages of low- and intermediate-mass stars are characterised by mass loss through a dust-driven stellar wind. Recent observations reveal complex structures within these winds, that are believed to be formed primarily via interaction with a companion. How these complexities arise, and which structures are formed in which type of systems, is still poorly understood. Particularly, there is a lack of studies investigating the structure formation in eccentric systems. We aim to improve our understanding of the wind morphology of eccentric AGB binary systems by investigating the mechanism responsible for the different small-scale structures and global morphologies that arise in a polytropic wind with different velocities. Using the smoothed particle hydrodynamics (SPH) code Phantom, we generate nine different high-resolution, 3D simulations of an AGB star with a solar-mass companion with various wind velocity and eccentricity combinations. The models assume a polytropic gas, with no additional cooling. We conclude that for models with a high wind velocity, the short interaction with the companion results in a regular spiral morphology, that is flattened. In the case of a lower wind velocity, the stronger interaction results in the formation of a high-energy region and bow-shock structure that can shape the wind into an irregular morphology if instabilities arise. High-eccentricity models show a complex, phase-dependent interaction leading to wind structures that are irregular in three dimensions. However, the significant interaction with the companion compresses matter into an equatorial density enhancement, irrespective of eccentricity.
△ Less
Submitted 2 July, 2021;
originally announced July 2021.
-
SPH modelling of companion-perturbed AGB outflows including a new morphology classification scheme
Authors:
S. Maes,
W. Homan,
J. Malfait,
L. Siess,
J. Bolte,
F. De Ceuster,
L. Decin
Abstract:
Asymptotic giant branch (AGB) stars are known to lose a significant amount of mass by a stellar wind, which controls the remainder of their stellar lifetime. High angular-resolution observations show that the winds of these cool stars typically exhibit mid- to small-scale density perturbations such as spirals and arcs, believed to be caused by the gravitational interaction with a (sub-)stellar com…
▽ More
Asymptotic giant branch (AGB) stars are known to lose a significant amount of mass by a stellar wind, which controls the remainder of their stellar lifetime. High angular-resolution observations show that the winds of these cool stars typically exhibit mid- to small-scale density perturbations such as spirals and arcs, believed to be caused by the gravitational interaction with a (sub-)stellar companion. We aim to explore the effects of the wind-companion interaction on the 3D density and velocity distribution of the wind, as a function of three key parameters: wind velocity, binary separation and companion mass. For the first time, we compare the impact on the outflow of a planetary companion to that of a stellar companion. We intend to devise a morphology classification scheme based on a singular parameter. With our grid of models we cover the prominent morphology changes in a companion-perturbed AGB outflow: slow winds with a close, massive binary companion show a more complex morphology. Additionally, we prove that massive planets are able to significantly impact the density structure of an AGB wind. We find that the interaction with a companion affects the terminal velocity of the wind, which can be explained by the gravitational slingshot mechanism. We distinguish between two types of wind focussing to the orbital plane resulting from distinct mechanisms: global flattening of the outflow as a result of the AGB star's orbital motion and the formation of an EDE as a consequence of the companion's gravitational pull. We investigate different morphology classification schemes and uncover that the ratio of the gravitational potential energy density of the companion to the kinetic energy density of the AGB outflow yields a robust classification parameter for the models presented in this paper.
△ Less
Submitted 1 July, 2021;
originally announced July 2021.
-
Numerical influence of ReLU'(0) on backpropagation
Authors:
David Bertoin,
Jérôme Bolte,
Sébastien Gerchinovitz,
Edouard Pauwels
Abstract:
In theory, the choice of ReLU(0) in [0, 1] for a neural network has a negligible influence both on backpropagation and training. Yet, in the real world, 32 bits default precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (f…
▽ More
In theory, the choice of ReLU(0) in [0, 1] for a neural network has a negligible influence both on backpropagation and training. Yet, in the real world, 32 bits default precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations of backpropagation outputs which occur around half of the time in 32 bits precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. For our experiments on ImageNet the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also evidence that reconditioning approaches as batch-norm or ADAM tend to buffer the influence of ReLU'(0)'s value. Overall, the message we convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
△ Less
Submitted 3 November, 2023; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Nonsmooth Implicit Differentiation for Machine Learning and Optimization
Authors:
Jérôme Bolte,
Tam Le,
Edouard Pauwels,
Antonio Silveti-Falls
Abstract:
In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clar…
▽ More
In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter-tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.
△ Less
Submitted 5 April, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Atomium: The astounding complexity of the near circumstellar environment of the M-type AGB star R Hydrae. I. Morpho-kinematical interpretation of CO and SiO emission
Authors:
Ward Homan,
Bannawit Pimpanuwat,
Fabrice Herpin,
Taissa Danilovich,
Iain McDonald,
Sofia H. J. Wallström,
Anita M. S. Richards,
Alain Baudry,
Raghvendra Sahai,
Tom J. Millar,
Alex de Koter,
C. A. Gottlieb,
Pierre Kervella,
Miguel Montargès,
Marie Van de Sande,
Leen Decin,
Albert Zijlstra,
Sandra Etoka,
Manali Jeste,
Holger S. P. Müller,
Silke Maes,
Jolien Malfait,
Karl Menten,
John Plane,
Kelvin Lee
, et al. (14 additional authors not shown)
Abstract:
Evolved low- to intermediate-mass stars are known to shed their gaseous envelope into a large, dusty, molecule-rich circumstellar nebula which typically develops a high degree of structural complexity. Most of the large-scale, spatially correlated structures in the nebula are thought to originate from the interaction of the stellar wind with a companion. As part of the Atomium large programme, we…
▽ More
Evolved low- to intermediate-mass stars are known to shed their gaseous envelope into a large, dusty, molecule-rich circumstellar nebula which typically develops a high degree of structural complexity. Most of the large-scale, spatially correlated structures in the nebula are thought to originate from the interaction of the stellar wind with a companion. As part of the Atomium large programme, we observed the M-type asymptotic giant branch (AGB) star R Hydrae with ALMA. The morphology of the inner wind of R Hya, which has a known companion at ~3500 au, was determined from maps of CO and SiO obtained at high angular resolution. A map of the CO emission reveals a multi-layered structure consisting of a large elliptical feature at an angular scale of ~10'' that is oriented along the north-south axis. The wind morphology within the elliptical feature is dominated by two hollow bubbles. The bubbles are on opposite sides of the AGB star and lie along an axis with a position angle of ~115 deg. Both bubbles are offset from the central star, and their appearance in the SiO channel maps indicates that they might be shock waves travelling through the AGB wind. An estimate of the dynamical age of the bubbles yields an age of the order of 100 yr, which is in agreement with the previously proposed elapsed time since the star last underwent a thermal pulse. When the CO and SiO emission is examined on subarcsecond angular scales, there is evidence for an inclined, differentially rotating equatorial density enhancement, strongly suggesting the presence of a second nearby companion. The position angle of the major axis of this disc is ~70 deg in the plane of the sky. We tentatively estimate that a lower limit on the mass of the nearby companion is ~0.65 Msol on the basis of the highest measured speeds in the disc and the location of its inner rim at ~6 au from the AGB star.
△ Less
Submitted 15 April, 2021;
originally announced April 2021.
-
Second-order step-size tuning of SGD for non-convex optimization
Authors:
Camille Castera,
Jérôme Bolte,
Cédric Févotte,
Edouard Pauwels
Abstract:
In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. For doing so, one estimates curvature, based on a local quadratic model and using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version o…
▽ More
In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. For doing so, one estimates curvature, based on a local quadratic model and using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach. For such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than SGD, RMSprop, or ADAM.
△ Less
Submitted 21 November, 2021; v1 submitted 5 March, 2021;
originally announced March 2021.
-
Magritte, a modern software library for 3D radiative transfer: II. Adaptive ray-tracing, mesh construction and reduction
Authors:
Frederik De Ceuster,
Jan Bolte,
Ward Homan,
Silke Maes,
Jolien Malfait,
Leen Decin,
Jeremy Yates,
Peter Boyle,
James Hetherington
Abstract:
Radiative transfer is a notoriously difficult and computationally demanding problem. Yet, it is an indispensable ingredient in nearly all astrophysical and cosmological simulations. Choosing an appropriate discretization scheme is a crucial part of the simulation, since it not only determines the direct memory cost of the model but also largely determines the computational cost and the achievable…
▽ More
Radiative transfer is a notoriously difficult and computationally demanding problem. Yet, it is an indispensable ingredient in nearly all astrophysical and cosmological simulations. Choosing an appropriate discretization scheme is a crucial part of the simulation, since it not only determines the direct memory cost of the model but also largely determines the computational cost and the achievable accuracy. In this paper, we show how an appropriate choice of directional discretization scheme as well as spatial model mesh can help alleviate the computational cost, while largely retaining the accuracy. First, we discuss the adaptive ray-tracing scheme implemented in our 3D radiative transfer library Magritte, that adapts the rays to the spatial mesh and uses a hierarchical directional discretization based on HEALPix. Second, we demonstrate how the free and open-source software library Gmsh can be used to generate high quality meshes that can be easily tailored for Magritte. In particular, we show how the local element size distribution of the mesh can be used to optimise the sampling of both analytically and numerically defined models. Furthermore, we show that when using the output of hydrodynamics simulations as input for a radiative transfer simulation, the number of elements in the input model can often be reduced by an order of magnitude, without significant loss of accuracy in the radiation field. We demonstrate this for two models based on a hierarchical octree mesh resulting from adaptive mesh refinement (AMR), as well as two models based on smoothed-particle hydrodynamics (SPH) data.
△ Less
Submitted 30 November, 2020;
originally announced November 2020.
-
Atomium: A high-resolution view on the highly asymmetric wind of the AGB star Pi1 Gruis. I. First detection of a new companion and its effect on the inner wind
Authors:
Ward Homan,
Miguel Montarges,
Bannawit Pimpanuwat,
Anita M. S. Richards,
Sofia H. J. Wallstrom,
Pierre Kervella,
Leen Decin,
Albert Zijlstra,
Taissa Danilovich,
Alex de Koter,
Karl Menten,
Raghvendra Sahai,
John Plane,
Kelvin Lee,
Rens Waters,
Alain Baudry,
Ka Tat Wong,
Tom J. Millar,
Marie Van de Sande,
Eric Lagadec,
David Gobrecht,
Jeremy Yates,
Daniel Price,
Emily Cannon,
Jan Bolte
, et al. (13 additional authors not shown)
Abstract:
The nebular circumstellar environments of cool evolved stars are known to harbour a rich morphological complexity of gaseous structures on different length scales. A large part of these density structures are thought to be brought about by the interaction of the stellar wind with a close companion. The S-type asymptotic giant branch star Pi1 Gruis, which has a known companion at ~440 au and is tho…
▽ More
The nebular circumstellar environments of cool evolved stars are known to harbour a rich morphological complexity of gaseous structures on different length scales. A large part of these density structures are thought to be brought about by the interaction of the stellar wind with a close companion. The S-type asymptotic giant branch star Pi1 Gruis, which has a known companion at ~440 au and is thought to harbour a second, closer-by (<10 au) companion, was observed with the Atacama Large Millimeter/submillimeter Array as part of the ATOMIUM Large programme. In this work, the brightest CO, SiO, and HCN molecular line transitions are analysed. The continuum map shows two maxima, separated by 0.04'' (6 au). The CO data unambiguously reveal that Pi1 Gru's circumstellar environment harbours an inclined, radially outflowing, equatorial density enhancement. It contains a spiral structure at an angle of 38+/-3 deg with the line-of-sight. The HCN emission in the inner wind reveals a clockwise spiral, with a dynamical crossing time of the spiral arms consistent with a companion at a distance of 0.04'' from the asymptotic giant branch star, which is in agreement with the position of the secondary continuum peak. The inner wind dynamics imply a large acceleration region, consistent with a beta-law power of ~6. The CO emission suggests that the spiral is approximately Archimedean within 5'', beyond which this trend breaks down as the succession of the spiral arms becomes less periodic. The SiO emission at scales smaller than 0.5'' exhibits signatures of gas in rotation, which is found to fit the expected behaviour of gas in the wind-companion interaction zone. An investigation of SiO maser emission reveals what could be a stream of gas accelerating from the surface of the AGB star to the companion. Using these dynamics, we have tentatively derived an upper limit on the companion mass to be ~1.1 Msol.
△ Less
Submitted 12 October, 2020;
originally announced October 2020.
-
(Sub)stellar companions shape the winds of evolved stars
Authors:
L. Decin,
M. Montargès,
A. M. S. Richards,
C. A. Gottlieb,
W. Homan,
I. McDonald,
I. El Mellah,
T. Danilovich,
S. H. J. Wallström,
A. Zijlstra,
A. Baudry,
J. Bolte,
E. Cannon,
E. De Beck,
F. De Ceuster,
A. de Koter,
J. De Ridder,
S. Etoka,
D. Gobrecht,
M. Gray,
F. Herpin,
M. Jeste,
E. Lagadec,
P. Kervella,
T. Khouri
, et al. (10 additional authors not shown)
Abstract:
Binary interactions dominate the evolution of massive stars, but their role is less clear for low- and intermediate-mass stars. The evolution of a spherical wind from an asymptotic giant branch (AGB) star into a nonspherical planetary nebula (PN) could be due to binary interactions. We observed a sample of AGB stars with the Atacama Large Millimeter/submillimeter Array (ALMA) and found that their…
▽ More
Binary interactions dominate the evolution of massive stars, but their role is less clear for low- and intermediate-mass stars. The evolution of a spherical wind from an asymptotic giant branch (AGB) star into a nonspherical planetary nebula (PN) could be due to binary interactions. We observed a sample of AGB stars with the Atacama Large Millimeter/submillimeter Array (ALMA) and found that their winds exhibit distinct nonspherical geometries with morphological similarities to planetary nebulae (PNe). We infer that the same physics shapes both AGB winds and PNe; additionally, the morphology and AGB mass-loss rate are correlated. These characteristics can be explained by binary interaction. We propose an evolutionary scenario for AGB morphologies that is consistent with observed phenomena in AGB stars and PNe.
△ Less
Submitted 28 September, 2020; v1 submitted 24 September, 2020;
originally announced September 2020.
-
A Hölderian backtracking method for min-max and min-min problems
Authors:
Jérôme Bolte,
Lilian Glaudin,
Edouard Pauwels,
Mathieu Serrurier
Abstract:
We present a new algorithm to solve min-max or min-min problems out of the convex world. We use rigidity assumptions, ubiquitous in learning, making our method applicable to many optimization problems. Our approach takes advantage of hidden regularity properties and allows us to devise a simple algorithm of ridge type. An original feature of our method is to come with automatic step size adaptatio…
▽ More
We present a new algorithm to solve min-max or min-min problems out of the convex world. We use rigidity assumptions, ubiquitous in learning, making our method applicable to many optimization problems. Our approach takes advantage of hidden regularity properties and allows us to devise a simple algorithm of ridge type. An original feature of our method is to come with automatic step size adaptation which departs from the usual overly cautious backtracking methods. In a general framework, we provide convergence theoretical guarantees and rates. We apply our findings on simple GAN problems obtaining promising numerical results.
△ Less
Submitted 17 July, 2020;
originally announced July 2020.
-
A mathematical model for automatic differentiation in machine learning
Authors:
Jerome Bolte,
Edouard Pauwels
Abstract:
Automatic differentiation, as implemented today, does not have a simple mathematical model adapted to the needs of modern machine learning. In this work we articulate the relationships between differentiation of programs as implemented in practice and differentiation of nonsmooth functions. To this end we provide a simple class of functions, a nonsmooth calculus, and show how they apply to stochas…
▽ More
Automatic differentiation, as implemented today, does not have a simple mathematical model adapted to the needs of modern machine learning. In this work we articulate the relationships between differentiation of programs as implemented in practice and differentiation of nonsmooth functions. To this end we provide a simple class of functions, a nonsmooth calculus, and show how they apply to stochastic approximation methods. We also evidence the issue of artificial critical points created by algorithmic differentiation and show how usual methods avoid these points with probability one.
△ Less
Submitted 29 October, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Long term dynamics of the subgradient method for Lipschitz path differentiable functions
Authors:
Jerome Bolte,
Edouard Pauwels,
Rodolfo Rios-Zertuche
Abstract:
We consider the long-term dynamics of the vanishing stepsize subgradient method in the case when the objective function is neither smooth nor convex. We assume that this function is locally Lipschitz and path differentiable, i.e., admits a chain rule. Our study departs from other works in the sense that we focus on the behavoir of the oscillations, and to do this we use closed measures. We recover…
▽ More
We consider the long-term dynamics of the vanishing stepsize subgradient method in the case when the objective function is neither smooth nor convex. We assume that this function is locally Lipschitz and path differentiable, i.e., admits a chain rule. Our study departs from other works in the sense that we focus on the behavoir of the oscillations, and to do this we use closed measures. We recover known convergence results, establish new ones, and show a local principle of oscillation compensation for the velocities. Roughly speaking, the time average of gradients around one limit point vanishes. This allows us to further analyze the structure of oscillations, and establish their perpendicularity to the general drift.
△ Less
Submitted 29 May, 2020;
originally announced June 2020.
-
Curiosities and counterexamples in smooth convex optimization
Authors:
Jerome Bolte,
Edouard Pauwels
Abstract:
Counterexamples to some old-standing optimization problems in the smooth convex coercive setting are provided. We show that block-coordinate, steepest descent with exact search or Bregman descent methods do not generally converge. Other failures of various desirable features are established: directional convergence of Cauchy's gradient curves, convergence of Newton's flow, finite length of Tikhono…
▽ More
Counterexamples to some old-standing optimization problems in the smooth convex coercive setting are provided. We show that block-coordinate, steepest descent with exact search or Bregman descent methods do not generally converge. Other failures of various desirable features are established: directional convergence of Cauchy's gradient curves, convergence of Newton's flow, finite length of Tikhonov path, convergence of central paths, or smooth Kurdyka-Lojasiewicz inequality. All examples are planar. These examples are based on general smooth convex interpolation results. Given a decreasing sequence of positively curved C k convex compact sets in the plane, we provide a level set interpolation of a C k smooth convex function where k $\ge$ 2 is arbitrary. If the intersection is reduced to one point our interpolant has positive definite Hessian, otherwise it is positive definite out of the solution set. Furthermore , given a sequence of decreasing polygons we provide an interpolant agreeing with the vertices and whose gradients coincide with prescribed normals.
△ Less
Submitted 29 January, 2020; v1 submitted 22 January, 2020;
originally announced January 2020.
-
Wind morphology around cool evolved stars in binaries: the case of slowly accelerating oxygen-rich outflows
Authors:
I. El Mellah,
J. Bolte,
L. Decin,
W. Homan,
R. Keppens
Abstract:
The late stellar evolutionary phases of low and intermediate-mass stars are strongly constrained by their mass-loss rates. The wind surrounding cool evolved stars frequently shows non-spherical features, thought to be due to an unseen companion orbiting the donor star. We study the morphology of the circumbinary envelope, in particular around oxygen-rich asymptotic giant branch (AGB) stars. We run…
▽ More
The late stellar evolutionary phases of low and intermediate-mass stars are strongly constrained by their mass-loss rates. The wind surrounding cool evolved stars frequently shows non-spherical features, thought to be due to an unseen companion orbiting the donor star. We study the morphology of the circumbinary envelope, in particular around oxygen-rich asymptotic giant branch (AGB) stars. We run a grid of 70 3D hydrodynamics simulations of a progressively accelerating wind propagating in the Roche potential formed by a mass-loosing evolved star in orbit with a main sequence companion. We resolve the flow structure both in the immediate vicinity of the secondary, where bow shocks, outflows and wind-captured disks form, and up to 40 orbital separations, where spiral arms, arcs and equatorial density enhancements develop. When the companion is deeply engulfed in the wind, the lower terminal wind speeds and more progressive wind acceleration around oxygen-rich AGB stars make them more prone than carbon-rich AGB stars to display more disturbed outflows, a disk-like structure around the companion and a wind concentrated in the orbital plane. In these configurations, a large fraction of the wind is captured by the companion which leads to a significant shrinking of the orbit over the mass-loss timescale, if the donor star is at least a few times more massive than its companion. Provided the companion has a mass of at least a tenth of the mass of the donor star, it can compress the wind in the orbital plane up to large distances. Our grid of models covers a wide scope of configurations function of the dust chemical content, the terminal wind speed relative to the orbital speed, the extension of the dust condensation region around the cool evolved star and the mass ratio. It provides a frame of reference to interpret high-resolution maps of the outflows surrounding cool evolved stars.
△ Less
Submitted 13 January, 2020;
originally announced January 2020.
-
Optimal Complexity and Certification of Bregman First-Order Methods
Authors:
Radu-Alexandru Dragomir,
Adrien Taylor,
Alexandre d'Aspremont,
Jérôme Bolte
Abstract:
We provide a lower bound showing that the $O(1/k)$ convergence rate of the NoLips method (a.k.a. Bregman Gradient) is optimal for the class of functions satisfying the $h$-smoothness assumption. This assumption, also known as relative smoothness, appeared in the recent developments around the Bregman Gradient method, where acceleration remained an open issue. On the way, we show how to constructiv…
▽ More
We provide a lower bound showing that the $O(1/k)$ convergence rate of the NoLips method (a.k.a. Bregman Gradient) is optimal for the class of functions satisfying the $h$-smoothness assumption. This assumption, also known as relative smoothness, appeared in the recent developments around the Bregman Gradient method, where acceleration remained an open issue. On the way, we show how to constructively obtain the corresponding worst-case functions by extending the computer-assisted performance estimation framework of Drori and Teboulle (Mathematical Programming, 2014) to Bregman first-order methods, and to handle the classes of differentiable and strictly convex functions.
△ Less
Submitted 17 February, 2021; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning
Authors:
Jérôme Bolte,
Edouard Pauwels
Abstract:
Modern problems in AI or in numerical analysis require nonsmooth approaches with a flexible calculus. We introduce generalized derivatives called conservative fields for which we develop a calculus and provide representation formulas. Functions having a conservative field are called path differentiable: convex, concave, Clarke regular and any semialgebraic Lipschitz continuous functions are path d…
▽ More
Modern problems in AI or in numerical analysis require nonsmooth approaches with a flexible calculus. We introduce generalized derivatives called conservative fields for which we develop a calculus and provide representation formulas. Functions having a conservative field are called path differentiable: convex, concave, Clarke regular and any semialgebraic Lipschitz continuous functions are path differentiable. Using Whitney stratification techniques for semialgebraic and definable sets, our model provides variational formulas for nonsmooth automatic differentiation oracles, as for instance the famous backpropagation algorithm in deep learning. Our differential model is applied to establish the convergence in values of nonsmooth stochastic gradient methods as they are implemented in practice.
△ Less
Submitted 9 April, 2020; v1 submitted 23 September, 2019;
originally announced September 2019.
-
An Inertial Newton Algorithm for Deep Learning
Authors:
Camille Castera,
Jérôme Bolte,
Cédric Févotte,
Edouard Pauwels
Abstract:
We introduce a new second-order inertial optimization method for machine learning called INNA. It exploits the geometry of the loss function while only requiring stochastic approximations of the function values and the generalized gradients. This makes INNA fully implementable and adapted to large-scale optimization problems such as the training of deep neural networks. The algorithm combines both…
▽ More
We introduce a new second-order inertial optimization method for machine learning called INNA. It exploits the geometry of the loss function while only requiring stochastic approximations of the function values and the generalized gradients. This makes INNA fully implementable and adapted to large-scale optimization problems such as the training of deep neural networks. The algorithm combines both gradient-descent and Newton-like behaviors as well as inertia. We prove the convergence of INNA for most deep learning problems. To do so, we provide a well-suited framework to analyze deep learning loss functions involving tame optimization in which we study a continuous dynamical system together with its discrete stochastic approximations. We prove sublinear convergence for the continuous-time differential inclusion which underlies our algorithm. Additionally, we also show how standard optimization mini-batch methods applied to non-smooth non-convex problems can yield a certain type of spurious stationary points never discussed before. We address this issue by providing a theoretical framework around the new idea of $D$-criticality; we then give a simple asymptotic analysis of INNA. Our algorithm allows for using an aggressive learning rate of $o(1/\log k)$. From an empirical viewpoint, we show that INNA returns competitive results with respect to state of the art (stochastic gradient descent, ADAGRAD, ADAM) on popular deep learning benchmark problems.
△ Less
Submitted 28 July, 2021; v1 submitted 29 May, 2019;
originally announced May 2019.
-
Towards Corner Case Detection for Autonomous Driving
Authors:
Jan-Aike Bolte,
Andreas Bär,
Daniel Lipinski,
Tim Fingscheidt
Abstract:
The progress in autonomous driving is also due to the increased availability of vast amounts of training data for the underlying machine learning approaches. Machine learning systems are generally known to lack robustness, e.g., if the training data did rarely or not at all cover critical situations. The challenging task of corner case detection in video, which is also somehow related to unusual e…
▽ More
The progress in autonomous driving is also due to the increased availability of vast amounts of training data for the underlying machine learning approaches. Machine learning systems are generally known to lack robustness, e.g., if the training data did rarely or not at all cover critical situations. The challenging task of corner case detection in video, which is also somehow related to unusual event or anomaly detection, aims at detecting these unusual situations, which could become critical, and to communicate this to the autonomous driving system (online use case). Such a system, however, could be also used in offline mode to screen vast amounts of data and select only the relevant situations for storing and (re)training machine learning algorithms. So far, the approaches for corner case detection have been limited to videos recorded from a fixed camera, mostly for security surveillance. In this paper, we provide a formal definition of a corner case and propose a system framework for both the online and the offline use case that can handle video signals from front cameras of a naturally moving vehicle and can output a corner case score.
△ Less
Submitted 26 February, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Quartic First-Order Methods for Low-Rank Minimization
Authors:
Radu-Alexandru Dragomir,
Alexandre d'Aspremont,
Jérôme Bolte
Abstract:
We study a generalized nonconvex Burer-Monteiro formulation for low-rank minimization problems. We use recent results on non-Euclidean first order methods to provide efficient and scalable algorithms. Our approach uses geometries induced by quartic kernels on matrix spaces; for unconstrained cases we introduce a novel family of Gram kernels that considerably improves numerical performances. Numeri…
▽ More
We study a generalized nonconvex Burer-Monteiro formulation for low-rank minimization problems. We use recent results on non-Euclidean first order methods to provide efficient and scalable algorithms. Our approach uses geometries induced by quartic kernels on matrix spaces; for unconstrained cases we introduce a novel family of Gram kernels that considerably improves numerical performances. Numerical experiments for Euclidean distance matrix completion and symmetric nonnegative matrix factorization show that our algorithms scale well and reach state of the art performance when compared to specialized methods.
△ Less
Submitted 17 February, 2021; v1 submitted 30 January, 2019;
originally announced January 2019.
-
Hessian Riemannian gradient flows in convex programming
Authors:
Felipe Alvarez,
Jérôme Bolte,
Olivier Brahic
Abstract:
Motivated by a constrained minimization problem, it is studied the gradient flows with respect to Hessian Riemannian metrics induced by convex functions of Legendre type. The first result characterizes Hessian Riemannian structures on convex sets as those metrics that have a specific integration property with respect to variational inequalities, giving a new motivation for the introduction of Breg…
▽ More
Motivated by a constrained minimization problem, it is studied the gradient flows with respect to Hessian Riemannian metrics induced by convex functions of Legendre type. The first result characterizes Hessian Riemannian structures on convex sets as those metrics that have a specific integration property with respect to variational inequalities, giving a new motivation for the introduction of Bregman-type distances. Then, the general evolution problem is introduced and a differential inclusion reformulation is given. A general existence result is proved and global convergence is established under quasi-convexity conditions, with interesting refinements in the case of convex minimization. Some explicit examples of these gradient flows are discussed. Dual trajectories are identified and sufficient conditions for dual convergence are examined for a convex program with positivity and equality constraints. Some convergence rate results are established. In the case of a linear objective function, several optimality characterizations of the orbits are given: optimal path of viscosity methods, continuous-time model of Bregman-type proximal algorithms, geodesics for some adequate metrics and projections of $\dot q$-trajectories of some Lagrange equations and completely integrable Hamiltonian systems.
△ Less
Submitted 26 November, 2018;
originally announced November 2018.
-
Many-particle quantum graphs: A review
Authors:
Jens Bolte,
Joachim Kerner
Abstract:
In this paper we review recent work that has been done on quantum many-particle systems on metric graphs. Topics include the implementation of singular interactions, Bose-Einstein condensation, sovable models and spectral properties of some simple models in connection with superconductivity in wires.
In this paper we review recent work that has been done on quantum many-particle systems on metric graphs. Topics include the implementation of singular interactions, Bose-Einstein condensation, sovable models and spectral properties of some simple models in connection with superconductivity in wires.
△ Less
Submitted 2 May, 2018;
originally announced May 2018.
-
Nonconvex Lagrangian-Based Optimization: Monitoring Schemes and Global Convergence
Authors:
Jérôme Bolte,
Shoham Sabach,
Marc Teboulle
Abstract:
We introduce a novel approach addressing global analysis of a difficult class of nonconvex-nonsmooth optimization problems within the important framework of Lagrangian-based methods. This genuine nonlinear class captures many problems in modern disparate fields of applications. It features complex geometries, qualification conditions, and other regularity properties do not hold everywhere. To addr…
▽ More
We introduce a novel approach addressing global analysis of a difficult class of nonconvex-nonsmooth optimization problems within the important framework of Lagrangian-based methods. This genuine nonlinear class captures many problems in modern disparate fields of applications. It features complex geometries, qualification conditions, and other regularity properties do not hold everywhere. To address these issues we work along several research lines to develop an original general Lagrangian methodology which can deal, all at once, with the above obstacles. A first innovative feature of our approach is to introduce the concept of Lagrangian sequences for a broad class of algorithms. Central to this methodology is the idea of turning an arbitrary descent method into a multiplier method. Secondly, we provide these methods with a transitional regime allowing us to identify in finitely many steps a zone where we can tune the step-sizes of the algorithm for the final converging regime. Then, despite the min-max nature of Lagrangian methods, using an original Lyapunov method we prove that each bounded sequence generated by the resulting monitoring schemes are globally convergent to a critical point for some fundamental Lagrangian-based methods in the broad semialgebraic setting, which to the best of our knowledge, are the first of this kind.
△ Less
Submitted 9 January, 2018;
originally announced January 2018.
-
The multiproximal linearization method for convex composite problems
Authors:
Jérôme Bolte,
Zheng Chen,
Edouard Pauwels
Abstract:
Composite minimization involves a collection of smooth functions which are aggregated in a nonsmooth manner. In the convex setting, we design an algorithm by linearizing each smooth component in accordance with its main curvature. The resulting method, called the Multiprox method, consists in solving successively simple problems (e.g. constrained quadratic problems) which can also feature some pro…
▽ More
Composite minimization involves a collection of smooth functions which are aggregated in a nonsmooth manner. In the convex setting, we design an algorithm by linearizing each smooth component in accordance with its main curvature. The resulting method, called the Multiprox method, consists in solving successively simple problems (e.g. constrained quadratic problems) which can also feature some proximal operators. To study the complexity and the convergence of this method we are led to study quantitative qualification conditions to understand the impact of multipliers on the complexity bounds. We obtain explicit complexity results of the form $O(\frac{1}{k})$ involving new types of constant terms. A distinctive feature of our approach is to be able to cope with oracles involving moving constraints. Our method is flexible enough to include the moving balls method, the proximal Gauss-Newton's method, or the forward-backward splitting, for which we recover known complexity results or establish new ones. We show through several numerical experiments how the use of multiple proximal terms can be decisive for problems with complex geometries.
△ Less
Submitted 23 March, 2019; v1 submitted 7 December, 2017;
originally announced December 2017.
-
First Order Methods beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems
Authors:
Jérôme Bolte,
Shoham Sabach,
Marc Teboulle,
Yakov Vaisbourd
Abstract:
We focus on nonconvex and nonsmooth minimization problems with a composite objective, where the differentiable part of the objective is freed from the usual and restrictive global Lipschitz gradient continuity assumption. This longstanding smoothness restriction is pervasive in first order methods (FOM), and was recently circumvent for convex composite optimization by Bauschke, Bolte and Teboulle,…
▽ More
We focus on nonconvex and nonsmooth minimization problems with a composite objective, where the differentiable part of the objective is freed from the usual and restrictive global Lipschitz gradient continuity assumption. This longstanding smoothness restriction is pervasive in first order methods (FOM), and was recently circumvent for convex composite optimization by Bauschke, Bolte and Teboulle, through a simple and elegant framework which captures, all at once, the geometry of the function and of the feasible set. Building on this work, we tackle genuine nonconvex problems. We first complement and extend their approach to derive a full extended descent lemma by introducing the notion of smooth adaptable functions. We then consider a Bregman-based proximal gradient methods for the nonconvex composite model with smooth adaptable functions, which is proven to globally converge to a critical point under natural assumptions on the problem's data. To illustrate the power and potential of our general framework and results, we consider a broad class of quadratic inverse problems with sparsity constraints which arises in many fundamental applications, and we apply our approach to derive new globally convergent schemes for this class.
△ Less
Submitted 20 June, 2017;
originally announced June 2017.
-
Qualification Conditions in Semi-algebraic Programming
Authors:
Jérôme Bolte,
Antoine Hochart,
Edouard Pauwels
Abstract:
For an arbitrary finite family of semi-algebraic/definable functions, we consider the corresponding inequality constraint set and we study qualification conditions for perturbations of this set. In particular we prove that all positive diagonal perturbations, save perhaps a finite number of them, ensure that any point within the feasible set satisfies Mangasarian-Fromovitz constraint qualification…
▽ More
For an arbitrary finite family of semi-algebraic/definable functions, we consider the corresponding inequality constraint set and we study qualification conditions for perturbations of this set. In particular we prove that all positive diagonal perturbations, save perhaps a finite number of them, ensure that any point within the feasible set satisfies Mangasarian-Fromovitz constraint qualification. Using the Milnor-Thom theorem, we provide a bound for the number of singular perturbations when the constraints are polynomial functions. Examples show that the order of magnitude of our exponential bound is relevant. Our perturbation approach provides a simple protocol to build sequences of "regular" problems approximating an arbitrary semi-algebraic/definable problem. Applications to sequential quadratic programming methods and sum of squares relaxation are provided.
△ Less
Submitted 7 March, 2018; v1 submitted 23 May, 2017;
originally announced May 2017.
-
Solvable models of interacting n-particle systems on quantum graphs
Authors:
Jens Bolte,
George Garforth
Abstract:
We introduce n-particle quantum graphs with singular two-particle interactions in such a way that eigenfunctions can be given in the form of a Bethe ansatz. We show that this leads to a secular equation characterising eigenvalues of the Hamiltonian that is based on a finite-dimensional determinant. These findings generalise previous results about two-particle quantum graphs.
We introduce n-particle quantum graphs with singular two-particle interactions in such a way that eigenfunctions can be given in the form of a Bethe ansatz. We show that this leads to a secular equation characterising eigenvalues of the Hamiltonian that is based on a finite-dimensional determinant. These findings generalise previous results about two-particle quantum graphs.
△ Less
Submitted 3 April, 2017;
originally announced April 2017.
-
A family of functional inequalities: Lojasiewicz inequalities and displacement convex functions
Authors:
Jérôme Bolte,
Adrien Blanchet
Abstract:
For displacement convex functionals in the probability space equip\-ped with the Monge-Kantorovich metric we prove the equivalence between the gradient and functional type Łoja\-sie\-wicz inequalities. \chg{We also discuss the more general case of $λ$-convex functions and we provide a general convergence theorem for the corresponding gradient dynamics. Specialising our results to the Boltzmann ent…
▽ More
For displacement convex functionals in the probability space equip\-ped with the Monge-Kantorovich metric we prove the equivalence between the gradient and functional type Łoja\-sie\-wicz inequalities. \chg{We also discuss the more general case of $λ$-convex functions and we provide a general convergence theorem for the corresponding gradient dynamics. Specialising our results to the Boltzmann entropy, we recover Otto-Villani's theorem asserting the equivalence between logarithmic Sobolev and Talagrand's inequalities. The choice of power-type entropies shows a new equivalence between Gagliardo-Nirenberg inequality and a nonlinear Talagrand inequality. Some nonconvex results and other types of equivalences are discussed.
△ Less
Submitted 8 October, 2018; v1 submitted 8 December, 2016;
originally announced December 2016.
-
The Berry-Keating operator on a lattice
Authors:
Jens Bolte,
Sebastian Egger,
Stefan Keppeler
Abstract:
We construct and study a version of the Berry-Keating operator with a built-in truncation of the phase space, which we choose to be a two-dimensional torus. The operator is a Weyl quantisation of the classical Hamiltonian for an inverted harmonic oscillator, producing a difference operator on a finite, periodic lattice. We investigate the continuum and the infinite-volume limit of our model in con…
▽ More
We construct and study a version of the Berry-Keating operator with a built-in truncation of the phase space, which we choose to be a two-dimensional torus. The operator is a Weyl quantisation of the classical Hamiltonian for an inverted harmonic oscillator, producing a difference operator on a finite, periodic lattice. We investigate the continuum and the infinite-volume limit of our model in conjunction with the semiclassical limit. Using semiclassical methods, we show that a specific combination of the limits leads to a logarithmic mean spectral density as it was anticipated by Berry and Keating.
△ Less
Submitted 14 February, 2017; v1 submitted 20 October, 2016;
originally announced October 2016.
-
Exactly solvable interacting two-particle quantum graphs
Authors:
Jens Bolte,
George Garforth
Abstract:
We construct models of exactly solvable two-particle quantum graphs with certain non-local two-particle interactions, establishing appropriate boundary conditions via suitable self-adjoint realisations of the two-particle Laplacian. Showing compatibility with the Bethe ansatz method, we calculate quantisation conditions in the form of secular equations from which the spectra can be deduced. We com…
▽ More
We construct models of exactly solvable two-particle quantum graphs with certain non-local two-particle interactions, establishing appropriate boundary conditions via suitable self-adjoint realisations of the two-particle Laplacian. Showing compatibility with the Bethe ansatz method, we calculate quantisation conditions in the form of secular equations from which the spectra can be deduced. We compare spectral statistics of some examples to well known results in random matrix theory, analysing the chaotic properties of their classical counterparts.
△ Less
Submitted 3 September, 2016;
originally announced September 2016.
-
Discontinuous Galerkin finite element methods for radiative transfer in spherical symmetry
Authors:
D. Kitzmann,
J. Bolte,
A. B. C. Patzer
Abstract:
The discontinuous Galerkin finite element method (DG-FEM) is successfully applied to treat a broad variety of transport problems numerically. In this work, we use the full capacity of the DG-FEM to solve the radiative transfer equation in spherical symmetry. We present a discontinuous Galerkin method to directly solve the spherically-symmetric radiative transfer equation as a two-dimensional probl…
▽ More
The discontinuous Galerkin finite element method (DG-FEM) is successfully applied to treat a broad variety of transport problems numerically. In this work, we use the full capacity of the DG-FEM to solve the radiative transfer equation in spherical symmetry. We present a discontinuous Galerkin method to directly solve the spherically-symmetric radiative transfer equation as a two-dimensional problem. The transport equation in spherical atmospheres is more complicated than in the plane-parallel case due to the appearance of an additional derivative with respect to the polar angle. The DG-FEM formalism allows for the exact integration of arbitrarily complex scattering phase functions, independent of the angular mesh resolution. We show that the discontinuous Galerkin method is able to describe accurately the radiative transfer in extended atmospheres and to capture discontinuities or complex scattering behaviour which might be present in the solution of certain radiative transfer tasks and can, therefore, cause severe numerical problems for other radiative transfer solution methods.
△ Less
Submitted 25 July, 2016;
originally announced July 2016.
-
A Gutzwiller trace formula for large hermitian matrices
Authors:
Jens Bolte,
Sebastian Egger,
Stefan Keppeler
Abstract:
We develop a semiclassical approximation for the dynamics of quantum systems in finite-dimensional Hilbert spaces whose classical counterparts are defined on a toroidal phase space. In contrast to previous models of quantum maps, the time evolution is in continuous time and, hence, is generated by a Schrödinger equation. In the framework of Weyl quantisation, we construct discrete, semiclassical F…
▽ More
We develop a semiclassical approximation for the dynamics of quantum systems in finite-dimensional Hilbert spaces whose classical counterparts are defined on a toroidal phase space. In contrast to previous models of quantum maps, the time evolution is in continuous time and, hence, is generated by a Schrödinger equation. In the framework of Weyl quantisation, we construct discrete, semiclassical Fourier integral operators approximating the unitary time evolution and use these to prove a Gutzwiller trace formula. We briefly discuss a semiclassical quantisation condition for eigenvalues as well as some simple examples.
△ Less
Submitted 3 August, 2017; v1 submitted 18 December, 2015;
originally announced December 2015.
-
Monte-Carlo simulations of intensity profiles for energetic particle propagation
Authors:
R. C. Tautz,
J. Bolte,
A. Shalchi
Abstract:
Aims. Numerical test-particle simulations are a reliable and frequently used tool to test analytical transport theories and to predict mean-free paths. The comparison between solutions of the diffusion equation and the particle flux is used to critically judge the applicability of diffusion to the stochastic transport of energetic particles in magnetized turbulence. Methods. A Monte-Carlo simulati…
▽ More
Aims. Numerical test-particle simulations are a reliable and frequently used tool to test analytical transport theories and to predict mean-free paths. The comparison between solutions of the diffusion equation and the particle flux is used to critically judge the applicability of diffusion to the stochastic transport of energetic particles in magnetized turbulence. Methods. A Monte-Carlo simulation code is extended to allow for the generation of intensity profiles as well as anisotropy-time profiles. Due to the relatively low number density of computational particles, a kernel function has to be used to describe the spatial extent of each particle. Results. The obtained intensity profiles are interpreted as solutions of the diffusion equation by inserting the diffusion coefficients that have been directly determined from the mean-square displacements. The comparison shows that the time dependence of the diffusion coefficients needs to be considered, in particular the initial ballistic phase and the often sub-diffusive perpendicular coefficient. Conclusions. It is argued that the perpendicular component of the distribution function is essential if agreement between the diffusion solution and the simulated flux is to be obtained. In addition, time-dependent diffusion can provide a better description than the classic diffusion equation only after the initial ballistic phase.
△ Less
Submitted 24 November, 2015;
originally announced November 2015.
-
From error bounds to the complexity of first-order descent methods for convex functions
Authors:
Jérôme Bolte,
Trong Phong Nguyen,
Juan Peypouquet,
Bruce Suter
Abstract:
This paper shows that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization. In a first stage, this objective led us to revisit the interplay between error bounds and the Kurdyka-Łojasiewicz (KL) inequality. One can show the equivalence between the two concepts for convex functions having a moderately flat profile near th…
▽ More
This paper shows that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization. In a first stage, this objective led us to revisit the interplay between error bounds and the Kurdyka-Łojasiewicz (KL) inequality. One can show the equivalence between the two concepts for convex functions having a moderately flat profile near the set of minimizers (as those of functions with Hölderian growth). A counterexample shows that the equivalence is no longer true for extremely flat functions. This fact reveals the relevance of an approach based on KL inequality. In a second stage, we show how KL inequalities can in turn be employed to compute new complexity bounds for a wealth of descent methods for convex problems. Our approach is completely original and makes use of a one-dimensional worst-case proximal sequence in the spirit of the famous majorant method of Kantorovich. Our result applies to a very simple abstract scheme that covers a wide class of descent methods. As a byproduct of our study, we also provide new results for the globalization of KL inequalities in the convex framework.
Our main results inaugurate a simple methodology: derive an error bound, compute the desingularizing function whenever possible, identify essential constants in the descent method and finally compute the complexity using the one-dimensional worst case proximal sequence. Our method is illustrated through projection methods for feasibility problems, and through the famous iterative shrinkage thresholding algorithm (ISTA), for which we show that the complexity bound is of the form $O(q^{k})$ where the constituents of the bound only depend on error bound constants obtained for an arbitrary least squares objective with $\ell^1$ regularization.
△ Less
Submitted 20 July, 2016; v1 submitted 28 October, 2015;
originally announced October 2015.