subscribe to arXiv mailings

Benign Overfitting in Single-Head Attention

Authors: Roey Magen, Shuning Shang, Zhiwei Xu, Spencer Frei, Wei Hu, Gal Vardi

Abstract: The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy training data but still achieves near-optimal test performance, has been extensively studied in recent years for linear models and fully-connected/convolutional networks. In this work, we study benign overfitting in a single-head softmax attention model, which is the fundamental building block of Transformers… ▽ More The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy training data but still achieves near-optimal test performance, has been extensively studied in recent years for linear models and fully-connected/convolutional networks. In this work, we study benign overfitting in a single-head softmax attention model, which is the fundamental building block of Transformers. We prove that under appropriate conditions, the model exhibits benign overfitting in a classification setting already after two steps of gradient descent. Moreover, we show conditions where a minimum-norm/maximum-margin interpolator exhibits benign overfitting. We study how the overfitting behavior depends on the signal-to-noise ratio (SNR) of the data distribution, namely, the ratio between norms of signal and noise tokens, and prove that a sufficiently large SNR is both necessary and sufficient for benign overfitting. △ Less

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2410.01774 [pdf, other]

Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context

Authors: Spencer Frei, Gal Vardi

Abstract: Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forward pass of the transformer can produce predictions for that unlabeled test example. A line of recent work has shown that when linear transformers are pre-traine… ▽ More Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forward pass of the transformer can produce predictions for that unlabeled test example. A line of recent work has shown that when linear transformers are pre-trained on random instances for linear regression tasks, these trained transformers make predictions using an algorithm similar to that of ordinary least squares. In this work, we investigate the behavior of linear transformers trained on random linear classification tasks. Via an analysis of the implicit regularization of gradient descent, we characterize how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test-time. We further show that in some settings, these trained transformers can exhibit "benign overfitting in-context": when in-context examples are corrupted by label flipping noise, the transformer memorizes all of its in-context examples (including those with noisy labels) yet still generalizes near-optimally for clean test examples. △ Less

Submitted 2 October, 2024; originally announced October 2024.

Comments: 34 pages

arXiv:2407.18904 [pdf, other]

Birational geometry of Fano varieties of lines on cubic fourfolds containing pairs of cubic scrolls

Authors: Corey Brooke, Sarah Frei, Lisa Marquand, Xuqiang Qin

Abstract: We characterize the birational geometry of some hyperkähler fourfolds of Picard rank $3$ obtained as the Fano varieties of lines on cubic fourfolds containing pairs of cubic scrolls. In each of the two cases considered, we provide a census of the birational models, relating each model to familiar geometric constructions. We also provide structural results about the birational automorphism groups,… ▽ More We characterize the birational geometry of some hyperkähler fourfolds of Picard rank $3$ obtained as the Fano varieties of lines on cubic fourfolds containing pairs of cubic scrolls. In each of the two cases considered, we provide a census of the birational models, relating each model to familiar geometric constructions. We also provide structural results about the birational automorphism groups, giving generators in both cases and a full set of relations in one case. Finally, as a byproduct of our analysis, we obtain non-isomorphic cubic fourfolds whose Fano varieties of lines are birationally equivalent. △ Less

Submitted 2 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

Comments: v2: updated figures and ancillary files. 34 pages, 4 figures, two ancillary files

MSC Class: 14J42; 14E08; 14E07; 14J35

arXiv:2406.13510 [pdf, ps, other]

Conic bundle threefolds differing by a constant Brauer class and connections to rationality

Authors: Sarah Frei, Lena Ji, Soumya Sankar, Bianca Viray, Isabel Vogt

Abstract: A double cover $Y$ of $\mathbb{P}^1 \times \mathbb{P}^2$ ramified over a general $(2,2)$-divisor will have the structure of a geometrically standard conic bundle ramified over a smooth plane quartic $Δ\subset \mathbb{P}^2$ via the second projection. These threefolds are rational over algebraically closed fields, but over nonclosed fields, including over $\mathbb{R}$, their rationality is an open p… ▽ More A double cover $Y$ of $\mathbb{P}^1 \times \mathbb{P}^2$ ramified over a general $(2,2)$-divisor will have the structure of a geometrically standard conic bundle ramified over a smooth plane quartic $Δ\subset \mathbb{P}^2$ via the second projection. These threefolds are rational over algebraically closed fields, but over nonclosed fields, including over $\mathbb{R}$, their rationality is an open problem. In this paper, we characterize rationality over $\mathbb{R}$ when $Δ(\mathbb{R})$ has at least two connected components (extending work of M. Ji and the second author) and over local fields when all odd degree fibers of the first projection have nonsquare discriminant. We obtain these applications by proving general results comparing the conic bundle structure on $Y$ with the conic bundle structure on a well-chosen intersection of two quadrics. The difference between these two conic bundles is encoded by a constant Brauer class, and we prove that this class measures a certain failure of Galois descent for the codimension 2 Chow group of $Y$. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 18 pages

MSC Class: 14C25 (Primary) 14E08; 14G27; 14H40; 14K30 (Secondary)

arXiv:2404.00522 [pdf, other]

Minimum-Norm Interpolation Under Covariate Shift

Authors: Neil Mallinar, Austin Zane, Spencer Frei, Bin Yu

Abstract: Transfer learning is a critical part of real-world machine learning deployments and has been extensively studied in experimental works with overparameterized neural networks. However, even in the simplest setting of linear regression a notable gap still exists in the theoretical understanding of transfer learning. In-distribution research on high-dimensional linear regression has led to the identi… ▽ More Transfer learning is a critical part of real-world machine learning deployments and has been extensively studied in experimental works with overparameterized neural networks. However, even in the simplest setting of linear regression a notable gap still exists in the theoretical understanding of transfer learning. In-distribution research on high-dimensional linear regression has led to the identification of a phenomenon known as \textit{benign overfitting}, in which linear interpolators overfit to noisy training labels and yet still generalize well. This behavior occurs under specific conditions on the source covariance matrix and input data dimension. Therefore, it is natural to wonder how such high-dimensional linear models behave under transfer learning. We prove the first non-asymptotic excess risk bounds for benignly-overfit linear interpolators in the transfer learning setting. From our analysis, we propose a taxonomy of \textit{beneficial} and \textit{malignant} covariate shifts based on the degree of overparameterization. We follow our analysis with empirical studies that show these beneficial and malignant covariate shifts for linear interpolators on real image data, and for fully-connected neural networks in settings where the input data dimension is larger than the training sample size. △ Less

Submitted 17 July, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: The Forty-first International Conference on Machine Learning (ICML 2024)

arXiv:2403.12517 [pdf, other]

On decompositions for Fano schemes of intersections of two quadrics

Authors: Pieter Belmans, Jishnu Bose, Sarah Frei, Benjamin Gould, James Hotchkiss, Alicia Lamarche, Jack Petok, Cristian Rodriguez Avila, Saket Shah

Abstract: We propose conjectural semiorthogonal decompositions for Fano schemes of linear subspaces on intersections of two quadrics, in terms of symmetric powers of the associated hyperelliptic (resp. stacky) curve. When the intersection is odd-dimensional, we moreover conjecture an identity in the Grothendieck ring of varieties and other motivic contexts. The evidence for these conjectures is given by upg… ▽ More We propose conjectural semiorthogonal decompositions for Fano schemes of linear subspaces on intersections of two quadrics, in terms of symmetric powers of the associated hyperelliptic (resp. stacky) curve. When the intersection is odd-dimensional, we moreover conjecture an identity in the Grothendieck ring of varieties and other motivic contexts. The evidence for these conjectures is given by upgrading recent results of Chen-Vilonen-Xue, to obtain formulae for the Hodge numbers of these Fano schemes. This allows us to numerically verify the conjecture in the hyperelliptic case, and establish a combinatorial identity as evidence for the stacky case. △ Less

Submitted 8 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 27 pages, comments are very welcome; v2: added an inadvertently omitted reference in support of Conjecture A

arXiv:2402.15615 [pdf, other]

Attached and separated rotating flow over a finite height ridge

Authors: Stefan Frei, Erik Burman, Edward R Johnson

Abstract: This paper discusses the effect of rotation on the boundary layer in high Reynolds number flow over a ridge using a numerical method based on stabilised finite elements that captures steady solutions up to Reynolds number of order $10^6$. The results are validated against boundary layer computations in shallow flows and for deep flows against experimental observations reported in Machicoane et al.… ▽ More This paper discusses the effect of rotation on the boundary layer in high Reynolds number flow over a ridge using a numerical method based on stabilised finite elements that captures steady solutions up to Reynolds number of order $10^6$. The results are validated against boundary layer computations in shallow flows and for deep flows against experimental observations reported in Machicoane et al. (Phys. Rev. Fluids, 2018). In all cases considered the boundary layer remains attached, even at large Reynolds numbers, provided the Rossby number of the flow is sufficiently small. At any fixed Rossby number the flow detaches at sufficiently high Reynolds number to form a steady recirculating region in the lee of the ridge. At even higher Reynolds numbers no steady flow is found. This disappearance of steady solutions closely reproduces the transition to unsteadiness seen in the laboratory. △ Less

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.00209 [pdf, other]

Modeling and numerical simulation of fully Eulerian fluid-structure interaction using cut finite elements

Authors: Stefan Frei, Tobias Knoke, Marc C. Steinbach, Anne-Kathrin Wenske, Thomas Wick

Abstract: We present a monolithic finite element formulation for (nonlinear) fluid-structure interaction in Eulerian coordinates. For the discretization we employ an unfitted finite element method based on inf-sup stable finite elements. So-called ghost penalty terms are used to guarantee the robustness of the approach independently of the way the interface cuts the finite element mesh. The resulting system… ▽ More We present a monolithic finite element formulation for (nonlinear) fluid-structure interaction in Eulerian coordinates. For the discretization we employ an unfitted finite element method based on inf-sup stable finite elements. So-called ghost penalty terms are used to guarantee the robustness of the approach independently of the way the interface cuts the finite element mesh. The resulting system is solved in a monolithic fashion using Newton's method. Our developments are tested on a numerical example with fixed interface. △ Less

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2312.17373 [pdf, other]

A non-intrusive neural-network based BFGS algorithm for parameter estimation in non-stationary elasticity

Authors: Stefan Frei, Jan Reichle, Stefan Volkwein

Abstract: We present a non-intrusive gradient and a non-intrusive BFGS algorithm for parameter estimation problems in non-stationary elasticity. To avoid multiple (and potentially expensive) solutions of the underlying partial differential equation (PDE), we approximate the PDE solver by a neural network within the algorithms. The network is trained offline for a given set of parameters. The algorithms are… ▽ More We present a non-intrusive gradient and a non-intrusive BFGS algorithm for parameter estimation problems in non-stationary elasticity. To avoid multiple (and potentially expensive) solutions of the underlying partial differential equation (PDE), we approximate the PDE solver by a neural network within the algorithms. The network is trained offline for a given set of parameters. The algorithms are applied to an unsteady linear-elastic contact problem; their convergence and approximation properties are investigated numerically. △ Less

Submitted 15 August, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2310.02541 [pdf, other]

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data

Authors: Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu

Abstract: Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting -- achieving a perfect fit to training data with near-random performanc… ▽ More Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting -- achieving a perfect fit to training data with near-random performance on test data -- before transitioning ("grokking") to near-optimal generalization later in training. In this work, we show that both of these phenomena provably occur in two-layer ReLU networks trained by GD on XOR cluster data where a constant fraction of the training labels are flipped. In this setting, we show that after the first step of GD, the network achieves 100% training accuracy, perfectly fitting the noisy labels in the training data, but achieves near-random test accuracy. At a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a "grokking" phenomenon. This provides the first theoretical result of benign overfitting in neural network classification when the data distribution is not linearly separable. Our proofs rely on analyzing the feature learning process under GD, which reveals that the network implements a non-generalizable linear classifier after one step and gradually learns generalizable features in later steps. △ Less

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.00531 [pdf, ps, other]

On abelian varieties whose torsion is not self-dual

Authors: Sarah Frei, Katrina Honigs, John Voight

Abstract: We construct infinitely many abelian surfaces $A$ defined over the rational numbers such that, for $\ell\leqslant 7$ prime, the $\ell$-torsion subgroup of $A$ is not isomorphic as a Galois module to the $\ell$-torsion subgroup of the dual abelian surface. We do this by analyzing the action of the Galois group on the $\ell$-adic Tate module and its reduction modulo $\ell$. We construct infinitely many abelian surfaces $A$ defined over the rational numbers such that, for $\ell\leqslant 7$ prime, the $\ell$-torsion subgroup of $A$ is not isomorphic as a Galois module to the $\ell$-torsion subgroup of the dual abelian surface. We do this by analyzing the action of the Galois group on the $\ell$-adic Tate module and its reduction modulo $\ell$. △ Less

Submitted 1 September, 2023; originally announced September 2023.

Comments: 22 pages

arXiv:2308.03215 [pdf, other]

The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning

Authors: Nikhil Ghosh, Spencer Frei, Wooseok Ha, Bin Yu

Abstract: In this work, we investigate the dynamics of stochastic gradient descent (SGD) when training a single-neuron autoencoder with linear or ReLU activation on orthogonal data. We show that for this non-convex problem, randomly initialized SGD with a constant step size successfully finds a global minimum for any batch size choice. However, the particular global minimum found depends upon the batch size… ▽ More In this work, we investigate the dynamics of stochastic gradient descent (SGD) when training a single-neuron autoencoder with linear or ReLU activation on orthogonal data. We show that for this non-convex problem, randomly initialized SGD with a constant step size successfully finds a global minimum for any batch size choice. However, the particular global minimum found depends upon the batch size. In the full-batch setting, we show that the solution is dense (i.e., not sparse) and is highly aligned with its initialized direction, showing that relatively little feature learning occurs. On the other hand, for any batch size strictly smaller than the number of samples, SGD finds a global minimum which is sparse and nearly orthogonal to its initialization, showing that the randomness of stochastic gradients induces a qualitatively different type of "feature selection" in this setting. Moreover, if we measure the sharpness of the minimum by the trace of the Hessian, the minima found with full batch gradient descent are flatter than those found with strictly smaller batch sizes, in contrast to previous works which suggest that large batches lead to sharper minima. To prove convergence of SGD with a constant step size, we introduce a powerful tool from the theory of non-homogeneous random walks which may be of independent interest. △ Less

Submitted 6 August, 2023; originally announced August 2023.

arXiv:2306.09927 [pdf, other]

Trained Transformers Learn Linear Models In-Context

Authors: Ruiqi Zhang, Spencer Frei, Peter L. Bartlett

Abstract: Attention-based neural networks such as transformers have demonstrated a remarkable ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an unseen task, they can formulate relevant per-token and next-token predictions without any parameter updates. By embedding a sequence of labeled training data and unlabeled test data as a prompt, this allows for transformer… ▽ More Attention-based neural networks such as transformers have demonstrated a remarkable ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an unseen task, they can formulate relevant per-token and next-token predictions without any parameter updates. By embedding a sequence of labeled training data and unlabeled test data as a prompt, this allows for transformers to behave like supervised learning algorithms. Indeed, recent work has shown that when training transformer architectures over random instances of linear regression problems, these models' predictions mimic those of ordinary least squares. Towards understanding the mechanisms underlying this phenomenon, we investigate the dynamics of ICL in transformers with a single linear self-attention layer trained by gradient flow on linear regression tasks. We show that despite non-convexity, gradient flow with a suitable random initialization finds a global minimum of the objective function. At this global minimum, when given a test prompt of labeled examples from a new prediction task, the transformer achieves prediction error competitive with the best linear predictor over the test prompt distribution. We additionally characterize the robustness of the trained transformer to a variety of distribution shifts and show that although a number of shifts are tolerated, shifts in the covariate distribution of the prompts are not. Motivated by this, we consider a generalized ICL setting where the covariate distributions can vary across prompts. We show that although gradient flow succeeds at finding a global minimum in this setting, the trained transformer is still brittle under mild covariate shifts. We complement this finding with experiments on large, nonlinear transformer architectures which we show are more robust under covariate shifts. △ Less

Submitted 19 October, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 50 pages, revised definition 3.2 and corollary 4.3

arXiv:2304.09306 [pdf, ps, other]

doi 10.1007/s40993-024-00515-8

A threefold violating a local-to-global principle for rationality

Authors: Sarah Frei, Lena Ji

Abstract: In this note we construct an example of a smooth projective threefold that is irrational over $\mathbb Q$ but is rational at all places. Our example is a complete intersection of two quadrics in $\mathbb P^5$, and we show it has the desired rationality behavior by constructing an explicit element of order $4$ in the Tate--Shafarevich group of the Jacobian of an associated genus $2$ curve. In this note we construct an example of a smooth projective threefold that is irrational over $\mathbb Q$ but is rational at all places. Our example is a complete intersection of two quadrics in $\mathbb P^5$, and we show it has the desired rationality behavior by constructing an explicit element of order $4$ in the Tate--Shafarevich group of the Jacobian of an associated genus $2$ curve. △ Less

Submitted 26 June, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: 7 pages. v2: the example is now unconditional

MSC Class: Primary: 14E08. Secondary: 14G12; 14J20; 11G30

Journal ref: Res. Number Theory 10 (2024), no. 2, Paper No. 39, 9 pp

arXiv:2303.01462 [pdf, ps, other]

Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization

Authors: Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro

Abstract: Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization. In this work we establish a number of settings where the satisfaction of these KKT conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks: the estim… ▽ More Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization. In this work we establish a number of settings where the satisfaction of these KKT conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks: the estimators interpolate noisy training data and simultaneously generalize well to test data. The settings include variants of the noisy class-conditional Gaussians considered in previous work as well as new distributional settings where benign overfitting has not been previously observed. The key ingredient to our proof is the observation that when the training data is nearly-orthogonal, both linear classifiers and leaky ReLU networks satisfying the KKT conditions for their respective margin maximization problems behave like a nearly uniform average of the training examples. △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: 53 pages

arXiv:2303.01456 [pdf, ps, other]

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

Authors: Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro

Abstract: In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are highly vulnerable to adversarial e… ▽ More In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are highly vulnerable to adversarial examples. Our results hold even in cases where the network has many more parameters than training examples. Despite the potential for harmful overfitting in such overparameterized settings, we prove that the implicit bias of gradient flow prevents it. However, the implicit bias also leads to non-robust solutions (susceptible to small adversarial $\ell_2$-perturbations), even though robust networks that fit the data exist. △ Less

Submitted 31 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 42 pages; NeurIPS 2023 camera ready

arXiv:2210.07082 [pdf, other]

Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

Authors: Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu

Abstract: The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient… ▽ More The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an $\ell_2$-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training. We provide experiments which suggest that a small initialization scale is important for finding low-rank neural networks with gradient descent. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: 54 pages

arXiv:2207.14035 [pdf, ps, other]

Groups of symplectic involutions on symplectic varieties of Kummer type and their fixed loci

Authors: Sarah Frei, Katrina Honigs

Abstract: We describe the Galois action on the middle $\ell$-adic cohomology of smooth, projective fourfolds $K_A(v)$ that occur as a fiber of the Albanese morphism on moduli spaces of sheaves on an abelian surface $A$ with Mukai vector $v$. We show this action is determined by the action on $H^2_{ét}(A_{\bar{k}},\mathbb{Q}_\ell(1))$ and on a subgroup $G_A(v) \leqslant (A\times \hat{A})[3]$, which depends o… ▽ More We describe the Galois action on the middle $\ell$-adic cohomology of smooth, projective fourfolds $K_A(v)$ that occur as a fiber of the Albanese morphism on moduli spaces of sheaves on an abelian surface $A$ with Mukai vector $v$. We show this action is determined by the action on $H^2_{ét}(A_{\bar{k}},\mathbb{Q}_\ell(1))$ and on a subgroup $G_A(v) \leqslant (A\times \hat{A})[3]$, which depends on $v$. This generalizes the analysis carried out by Hassett and Tschinkel [HT13] over $\mathbb{C}$. As a consequence, over number fields, we give a condition under which $K_2(A)$ and $K_2(\hat{A})$ are not derived equivalent. The points of $G_A(v)$ correspond to involutions of $K_A(v)$. Over $\mathbb{C}$, they are known to be symplectic and contained in the kernel of the map $\mathrm{Aut}(K_A(v))\to \mathrm{O}(H^2(K_A(v),\mathbb{Z}))$. We describe this kernel for all varieties $K_A(v)$ of dimension at least $4$. When $K_A(v)$ is a fourfold over a field of characteristic 0, the fixed-point loci of the involutions contain K3 surfaces whose cycle classes span a large portion of the middle cohomology. We examine the fixed loci in fourfolds $K_A(0,l,s)$ over $\mathbb{C}$ where $l$ is a $(1,3)$-polarization, finding the K3 surface to be elliptically fibered under a Lagrangian fibration of $K_A(0,l,s)$. △ Less

Submitted 15 April, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: 44 pages. v3: minor revisions. final version to appear in Forum Math. Sigma

arXiv:2207.07093 [pdf, ps, other]

doi 10.14231/AG-2024-014

Curve classes on conic bundle threefolds and applications to rationality

Authors: Sarah Frei, Lena Ji, Soumya Sankar, Bianca Viray, Isabel Vogt

Abstract: We undertake a study of conic bundle threefolds $π\colon X\to W$ over geometrically rational surfaces whose associated discriminant covers $\tildeΔ\toΔ\subset W$ are smooth and geometrically irreducible. First, we determine the structure of the group $\mathrm{CH}^2 X_{\overline{k}}$ of rational equivalence classes of curves. Precisely, we construct a Galois-equivariant group homomorphism from… ▽ More We undertake a study of conic bundle threefolds $π\colon X\to W$ over geometrically rational surfaces whose associated discriminant covers $\tildeΔ\toΔ\subset W$ are smooth and geometrically irreducible. First, we determine the structure of the group $\mathrm{CH}^2 X_{\overline{k}}$ of rational equivalence classes of curves. Precisely, we construct a Galois-equivariant group homomorphism from $\mathrm{CH}^2X_{\overline{k}}$ to a group scheme associated to the discriminant cover $\tildeΔ\to Δ$ of $X$. The target group scheme is a generalization of the Prym variety of $\tildeΔ\toΔ$ and so our result can be viewed as a generalization of Beauville's result that the algebraically trivial curve classes on $X_{\overline{k}}$ are parametrized by the Prym variety. We apply our structural result on curve classes to study the refined intermediate Jacobian torsor (IJT) obstruction to rationality introduced by Hassett--Tschinkel and Benoist--Wittenberg. The first case of interest is $W = \mathbb P^2$ and $Δ$ is a smooth plane quartic. In this case, we show that the IJT obstruction characterizes rationality when the ground field has less arithmetic complexity (precisely, when the $2$-torsion in the Brauer group of the ground field is trivial). We also show that a hypothesis of this form is necessary by constructing, over any $k \subset\mathbb R$, a conic bundle threefold with $Δ$ a smooth quartic where the IJT obstruction vanishes, yet $X$ is irrational over $k$. △ Less

Submitted 21 July, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

Comments: 39 pages. Comments welcome! v2: Updated introduction. v3: Added Section 3, Subsection 5.5, and Example 8.6

MSC Class: 14C25 (Primary); 14E08; 14G27; 14H40; 14K30 (Secondary)

Journal ref: Algebr. Geom. 11 (3) (2024) 421-459

arXiv:2207.02081 [pdf, other]

Efficient coarse correction for parallel time-stepping in plaque growth simulations

Authors: Stefan Frei, Alexander Heinlein

Abstract: In order to make the numerical simulation of atherosclerotic plaque growth feasible, a temporal homogenization approach is employed. The resulting macro-scale problem for the plaque growth can be further accelerated by using parallel time integration schemes, such as the parareal algorithm. However, the parallel scalability is dominated by the computational cost of the coarse propagator. Therefore… ▽ More In order to make the numerical simulation of atherosclerotic plaque growth feasible, a temporal homogenization approach is employed. The resulting macro-scale problem for the plaque growth can be further accelerated by using parallel time integration schemes, such as the parareal algorithm. However, the parallel scalability is dominated by the computational cost of the coarse propagator. Therefore, in this paper, an interpolation-based coarse propagator, which uses growth values from previously computed micro-scale problems, is introduced. For a simple model problem, it is shown that this approach reduces both the computational work for a single parareal iteration as well as the required number of parareal iterations. △ Less

Submitted 5 July, 2022; originally announced July 2022.

Comments: This article has been written as a contribution to a conference proceedings, which builds on our previous article arXiv:2203.06526

MSC Class: 65M60; 65Y05

arXiv:2203.06581 [pdf, other]

Analysis of an implicitly extended Crank-Nicolson scheme for the heat equation on a time-dependent domain

Authors: Stefan Frei, Maneesh Kumar Singh

Abstract: We consider a time-stepping scheme of Crank-Nicolson type for the heat equation on a moving domain in Eulerian coordinates. As the spatial domain varies between subsequent time steps, an extension of the solution from the previous time step is required. Following Lehrenfeld \& Olskanskii [ESAIM: M2AN, 53(2):\,585-614, 2019], we apply an implicit extension based on so-called ghost-penalty terms. Fo… ▽ More We consider a time-stepping scheme of Crank-Nicolson type for the heat equation on a moving domain in Eulerian coordinates. As the spatial domain varies between subsequent time steps, an extension of the solution from the previous time step is required. Following Lehrenfeld \& Olskanskii [ESAIM: M2AN, 53(2):\,585-614, 2019], we apply an implicit extension based on so-called ghost-penalty terms. For spatial discretisation, a cut finite element method is used. We derive a complete a priori error analysis in space and time, which shows in particular second-order convergence in time under a parabolic CFL condition. Finally, we present numerical results in two and three space dimensions that confirm the analytical estimates, even for much larger time steps. △ Less

Submitted 28 April, 2023; v1 submitted 13 March, 2022; originally announced March 2022.

MSC Class: 65M12; 65M60; 65M85

arXiv:2203.06526 [pdf, other]

doi 10.1016/j.jcp.2023.112347

Towards parallel time-stepping for the numerical simulation of atherosclerotic plaque growth

Authors: Stefan Frei, Alexander Heinlein

Abstract: The numerical simulation of atherosclerotic plaque growth is computationally prohibitive, since it involves a complex cardiovascular fluid-structure interaction (FSI) problem with a characteristic time scale of milliseconds to seconds, as well as a plaque growth process governed by reaction-diffusion equations, which takes place over several months. In this work we combine a temporal homogenizatio… ▽ More The numerical simulation of atherosclerotic plaque growth is computationally prohibitive, since it involves a complex cardiovascular fluid-structure interaction (FSI) problem with a characteristic time scale of milliseconds to seconds, as well as a plaque growth process governed by reaction-diffusion equations, which takes place over several months. In this work we combine a temporal homogenization approach, which separates the problem in computationally expensive FSI problems on a micro scale and a reaction-diffusion problem on the macro scale, with parallel time-stepping algorithms. It has been found in the literature that parallel time-stepping algorithms do not perform well when applied directly to the FSI problem. To circumvent this problem, a parareal algorithm is applied on the macro-scale reaction-diffusion problem instead of the micro-scale FSI problem. We investigate modifications in the coarse propagator of the parareal algorithm, in order to further reduce the number of costly micro problems to be solved. The approaches are tested in detailed numerical investigations based on serial simulations. △ Less

Submitted 25 April, 2023; v1 submitted 12 March, 2022; originally announced March 2022.

arXiv:2202.07626 [pdf, other]

Random Feature Amplification: Feature Learning and Generalization in Neural Networks

Authors: Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

Abstract: In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although line… ▽ More In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features. △ Less

Submitted 13 September, 2023; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: 46 pages; JMLR camera ready revision

arXiv:2202.05928 [pdf, ps, other]

Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data

Authors: Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

Abstract: Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We… ▽ More Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear. △ Less

Submitted 13 September, 2023; v1 submitted 11 February, 2022; originally announced February 2022.

Comments: 39 pages; minor corrections

arXiv:2111.08668 [pdf, ps, other]

Reduction of Brauer classes on K3 surfaces, rationality and derived equivalence

Authors: Sarah Frei, Brendan Hassett, Anthony Várilly-Alvarado

Abstract: We consider the reduction of Brauer classes on surfaces over number fields, with a view toward applications to rationality and derived equivalence. We show that a Brauer class on a very general polarized K3 surface over a number field becomes trivial upon reduction for a set of places of positive natural density. As a consequence, there are cubic fourfolds which become rational upon reduction for… ▽ More We consider the reduction of Brauer classes on surfaces over number fields, with a view toward applications to rationality and derived equivalence. We show that a Brauer class on a very general polarized K3 surface over a number field becomes trivial upon reduction for a set of places of positive natural density. As a consequence, there are cubic fourfolds which become rational upon reduction for a positive proportion of places, and there are twisted derived equivalent K3 surfaces which become derived equivalent upon reduction for a positive proportion of places. △ Less

Submitted 4 March, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: 23 pages; minor changes to clarify main argument

MSC Class: 14J28; 14F22 (primary); 14E08; 14F08 (secondary)

arXiv:2106.13805 [pdf, other]

Self-training Converts Weak Learners to Strong Learners in Mixture Models

Authors: Spencer Frei, Difan Zou, Zixiang Chen, Quanquan Gu

Abstract: We consider a binary classification problem when the data comes from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbolβ_{\mathrm{pl}}$ can achieve classification error at most… ▽ More We consider a binary classification problem when the data comes from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbolβ_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbolβ_0 := \boldsymbolβ_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbolβ_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbolβ_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm. △ Less

Submitted 25 August, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: 23 pages. This version has added more detailed comparisons with related work, fixed a technical issue in the original proof, and improved the convergence guarantee to be about the last iterate of stochastic gradient descent

arXiv:2106.13792 [pdf, ps, other]

Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent

Authors: Spencer Frei, Quanquan Gu

Abstract: Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the particular s… ▽ More Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the particular setup in each problem, making it difficult to generalize across different settings. To address this drawback in the literature, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that gradient descent on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities. △ Less

Submitted 13 September, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: 16 pages. Updated presentation, changed results from online SGD to batch GD

arXiv:2106.09394 [pdf, other]

On temporal homogenization in the numerical simulation of atherosclerotic plaque growth

Authors: Stefan Frei, Alexander Heinlein, Thomas Richter

Abstract: A temporal homogenization approach for the numerical simulation of atherosclerotic plaque growth is extended to fully coupled fluid-structure interaction (FSI) simulations. The numerical results indicate that the two-scale approach yields significantly different results compared to a simple heuristic averaging, where only stationary long-scale FSI problems are solved, confirming the importance of… ▽ More A temporal homogenization approach for the numerical simulation of atherosclerotic plaque growth is extended to fully coupled fluid-structure interaction (FSI) simulations. The numerical results indicate that the two-scale approach yields significantly different results compared to a simple heuristic averaging, where only stationary long-scale FSI problems are solved, confirming the importance of incorporating stress variations on small time-scales. In the homogenization approach, a periodic fine-scale problem, which is periodic with respect to the heart beat, has to be solved for each long-scale time step. Even if no exact initial conditions are available, periodicity can be achieved within only 2-3 heart beats by simple time-stepping. △ Less

Submitted 17 June, 2021; originally announced June 2021.

MSC Class: 65M60; 74F10; 76M10; 76Z05; 92C35

arXiv:2104.09437 [pdf, other]

Provable Robustness of Adversarial Training for Learning Halfspaces with Noise

Authors: Difan Zou, Spencer Frei, Quanquan Gu

Abstract: We analyze the properties of adversarial training for learning adversarially robust halfspaces in the presence of agnostic label noise. Denoting $\mathsf{OPT}_{p,r}$ as the best robust classification error achieved by a halfspace that is robust to perturbations of $\ell_{p}$ balls of radius $r$, we show that adversarial training on the standard binary cross-entropy loss yields adversarially robust… ▽ More We analyze the properties of adversarial training for learning adversarially robust halfspaces in the presence of agnostic label noise. Denoting $\mathsf{OPT}_{p,r}$ as the best robust classification error achieved by a halfspace that is robust to perturbations of $\ell_{p}$ balls of radius $r$, we show that adversarial training on the standard binary cross-entropy loss yields adversarially robust halfspaces up to (robust) classification error $\tilde O(\sqrt{\mathsf{OPT}_{2,r}})$ for $p=2$, and $\tilde O(d^{1/4} \sqrt{\mathsf{OPT}_{\infty, r}} + d^{1/2} \mathsf{OPT}_{\infty,r})$ when $p=\infty$. Our results hold for distributions satisfying anti-concentration properties enjoyed by log-concave isotropic distributions among others. We additionally show that if one instead uses a nonconvex sigmoidal loss, adversarial training yields halfspaces with an improved robust classification error of $O(\mathsf{OPT}_{2,r})$ for $p=2$, and $O(d^{1/4}\mathsf{OPT}_{\infty, r})$ when $p=\infty$. To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise. △ Less

Submitted 19 April, 2021; originally announced April 2021.

Comments: 42 pages, 2 figures

arXiv:2103.11243 [pdf, other]

doi 10.1016/j.cma.2022.114637

A mechanically consistent model for fluid-structure interactions with contact including seepage

Authors: Erik Burman, Miguel A. Fernández, Stefan Frei, Fannie M. Gerosa

Abstract: We present a new approach for the mechanically consistent modelling and simulation of fluid-structure interactions with contact. The fundamental idea consists of combining a relaxed contact formulation with the modelling of seepage through a porous layer of co-dimension 1 during contact. For the latter, a Darcy model is considered in a thin porous layer attached to a solid boundary in the limit of… ▽ More We present a new approach for the mechanically consistent modelling and simulation of fluid-structure interactions with contact. The fundamental idea consists of combining a relaxed contact formulation with the modelling of seepage through a porous layer of co-dimension 1 during contact. For the latter, a Darcy model is considered in a thin porous layer attached to a solid boundary in the limit of infinitesimal thickness. In combination with a relaxation of the contact conditions the computational model is both mechanically consistent and simple to implement. We analyse the approach in detailed numerical studies with both thick- and thin-walled solids, within a fully Eulerian and an immersed approach for the fluid-structure interaction and using fitted and unfitted finite element discretisations. △ Less

Submitted 20 March, 2021; originally announced March 2021.

arXiv:2101.01152 [pdf, other]

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

Authors: Spencer Frei, Yuan Cao, Quanquan Gu

Abstract: We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave isotropic and hard margin distributions. Eq… ▽ More We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave isotropic and hard margin distributions. Equivalently, such networks can generalize when the data distribution is linearly separable but corrupted with adversarial label noise, despite the capacity to overfit. To the best of our knowledge, this is the first work to show that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise. △ Less

Submitted 15 February, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: 30 pages, 10 figures

arXiv:2011.08691 [pdf, other]

doi 10.1063/5.0037971

Falling balls in a viscous fluid with contact: Comparing numerical simulations with experimental data

Authors: Henry von Wahl, Thomas Richter, Stefan Frei, Thomas Hagemeier

Abstract: We evaluate a number of different finite element approaches for fluid-structure (contact) interaction problems against data from physical experiments. For this we take the data from experiments by Hagemeier [Mendeley Data, doi: 10.17632/mf27c92nc3.1]. This consists of trajectories of single particles falling through a highly viscous fluid and rebounding off the bottom fluid tank wall. The resultin… ▽ More We evaluate a number of different finite element approaches for fluid-structure (contact) interaction problems against data from physical experiments. For this we take the data from experiments by Hagemeier [Mendeley Data, doi: 10.17632/mf27c92nc3.1]. This consists of trajectories of single particles falling through a highly viscous fluid and rebounding off the bottom fluid tank wall. The resulting flow is in the transitional regime between creeping and turbulent flows. This type of configuration is particularly challenging for numerical methods due to the large change of the fluid domain and the contact between the wall and particle. In the numerical simulations we consider both rigid body and linear elasticity models for the falling particles. In the first case, we compare results obtained with the well established Arbitrary Lagrangian Eulerian (ALE) approach and a moving domain CutFEM method together with a simple and common approach for contact avoidance. For the full fluid-structure interaction (FSI) problem with contact, we use a fully Eulerian approach in combination with a unified FSI-contact treatment using Nitsche's method. For higher computational efficiency we use the geometrical symmetry of the experimental set up to reformulate the FSI system into two spatial dimensions. Finally, we show full three dimensional ALE computations to study the effects of small perturbations in the initial state of the particle to investigate deviations from a perfectly vertical fall observed in the experiment. The methods are implemented in open-source finite element libraries and the results are made freely available to aide reproducibility. △ Less

Submitted 17 November, 2020; originally announced November 2020.

Journal ref: Physics of Fluids 33, 033304 (2021)

arXiv:2010.00539 [pdf, other]

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

Authors: Spencer Frei, Yuan Cao, Quanquan Gu

Abstract: We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces. If $\mathsf{OPT}$ is the best classification error achieved by a halfspace, by appealing to the notion of soft margins we are able to show that gradient descent finds halfspaces with classification error $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ in… ▽ More We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces. If $\mathsf{OPT}$ is the best classification error achieved by a halfspace, by appealing to the notion of soft margins we are able to show that gradient descent finds halfspaces with classification error $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity for a broad class of distributions that includes log-concave isotropic distributions as a subclass. Along the way we answer a question recently posed by Ji et al. (2020) on how the tail behavior of a loss function can affect sample complexity and runtime guarantees for gradient descent. △ Less

Submitted 13 February, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

Comments: 25 pages, 1 table

arXiv:2007.13906 [pdf, other]

A locally modified second-order finite element method for interface problems and its implementation in 2 dimensions

Authors: Stefan Frei, Gozel Judakova, Thomas Richter

Abstract: The locally modified finite element method, which is introduced in [Frei, Richter: SINUM 52(2014), p. 2315-2334], is a simple fitted finite element method that is able to resolve weak discontinuities in interface problems. The method is based on a fixed structured coarse mesh, which is then refined into sub-elements to resolve an interior interface. In this work, we extend the locally modified f… ▽ More The locally modified finite element method, which is introduced in [Frei, Richter: SINUM 52(2014), p. 2315-2334], is a simple fitted finite element method that is able to resolve weak discontinuities in interface problems. The method is based on a fixed structured coarse mesh, which is then refined into sub-elements to resolve an interior interface. In this work, we extend the locally modified finite element method {in two space dimensions} to second order using an isoparametric approach in the interface elements. Thereby we need to take care that the resulting curved edges do not lead to degenerate sub-elements. We prove optimal a priori error estimates in the $L^2$-norm and in a discrete energy norm. Finally, we present numerical examples to substantiate the theoretical findings. △ Less

Submitted 28 April, 2023; v1 submitted 27 July, 2020; originally announced July 2020.

arXiv:2005.14426 [pdf, other]

Agnostic Learning of a Single Neuron with Gradient Descent

Authors: Spencer Frei, Yuan Cao, Quanquan Gu

Abstract: We consider the problem of learning the best-fitting single neuron as measured by the expected square loss $\mathbb{E}_{(x,y)\sim \mathcal{D}}[(σ(w^\top x)-y)^2]$ over some unknown joint distribution $\mathcal{D}$ by using gradient descent to minimize the empirical risk induced by a set of i.i.d. samples $S\sim \mathcal{D}^n$. The activation function $σ$ is an arbitrary Lipschitz and non-decreasin… ▽ More We consider the problem of learning the best-fitting single neuron as measured by the expected square loss $\mathbb{E}_{(x,y)\sim \mathcal{D}}[(σ(w^\top x)-y)^2]$ over some unknown joint distribution $\mathcal{D}$ by using gradient descent to minimize the empirical risk induced by a set of i.i.d. samples $S\sim \mathcal{D}^n$. The activation function $σ$ is an arbitrary Lipschitz and non-decreasing function, making the optimization problem nonconvex and nonsmooth in general, and covers typical neural network activation functions and inverse link functions in the generalized linear model setting. In the agnostic PAC learning setting, where no assumption on the relationship between the labels $y$ and the input $x$ is made, if the optimal population risk is $\mathsf{OPT}$, we show that gradient descent achieves population risk $O(\mathsf{OPT})+ε$ in polynomial time and sample complexity when $σ$ is strictly increasing. For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+ε$. When labels take the form $y = σ(v^\top x) + ξ$ for zero-mean sub-Gaussian noise $ξ$, we show that the population risk guarantees for gradient descent improve to $\mathsf{OPT} + ε$. Our sample complexity and runtime guarantees are (almost) dimension independent, and when $σ$ is strictly increasing, require no distributional assumptions beyond boundedness. For ReLU, we show the same results under a nondegeneracy assumption for the marginal distribution of the input. △ Less

Submitted 31 August, 2020; v1 submitted 29 May, 2020; originally announced May 2020.

Comments: 31 pages, 3 tables. This version improves the risk bound from O(OPT^1/2) to O(OPT) for strictly increasing activation functions

arXiv:1912.08503 [pdf, other]

3D-2D Stokes-Darcy coupling for the modelling of seepage with an application to fluid-structure interaction with contact

Authors: Erik Burman, Miguel A. Fernández, Stefan Frei, Fannie M. Gerosa

Abstract: In this note we introduce a mixed dimensional Stokes-Darcy coupling where a $d$ dimensional Stokes' flow is coupled to a Darcy model on the $d-1$ dimensional boundary of the domain. The porous layer introduces tangential creeping flow along the boundary and allows for the modelling of boundary flow due to surface roughness. This leads to a new model of flow in fracture networks with reservoirs in… ▽ More In this note we introduce a mixed dimensional Stokes-Darcy coupling where a $d$ dimensional Stokes' flow is coupled to a Darcy model on the $d-1$ dimensional boundary of the domain. The porous layer introduces tangential creeping flow along the boundary and allows for the modelling of boundary flow due to surface roughness. This leads to a new model of flow in fracture networks with reservoirs in an impenetrable bulk matrix. Exploiting this modelling capability, we then formulate a fluid-structure interaction method with contact, where the porous layer allows for mechanically consistent contact and release. Physical seepage in the contact zone due to rough surfaces is modelled by the porous layer. Some numerical examples are reported, both on the Stokes'-Darcy coupling alone and on the fluid-structure interaction with contact in the porous boundary layer. △ Less

Submitted 18 December, 2019; originally announced December 2019.

arXiv:1910.03054 [pdf, other]

Eulerian time-stepping schemes for the non-stationary Stokes equations on time-dependent domains

Authors: Erik Burman, Stefan Frei, Andre Massing

Abstract: This article is concerned with the discretisation of the Stokes equations on time-dependent domains in an Eulerian coordinate framework. Our work can be seen as an extension of a recent paper by Lehrenfeld & Olshanskii [ESAIM: M2AN, 53(2):585-614, 2019], where BDF-type time-stepping schemes are studied for a parabolic equation on moving domains. For space discretisation, a geometrically unfitted f… ▽ More This article is concerned with the discretisation of the Stokes equations on time-dependent domains in an Eulerian coordinate framework. Our work can be seen as an extension of a recent paper by Lehrenfeld & Olshanskii [ESAIM: M2AN, 53(2):585-614, 2019], where BDF-type time-stepping schemes are studied for a parabolic equation on moving domains. For space discretisation, a geometrically unfitted finite element discretisation is applied in combination with Nitsche's method to impose boundary conditions. Physically undefined values of the solution at previous time-steps are extended implicitly by means of so-called ghost penalty stabilisations. We derive a complete a priori error analysis of the discretisation error in space and time, including optimal $L^2(L^2)$-norm error bounds for the velocities. Finally, the theoretical results are substantiated with numerical examples. △ Less

Submitted 1 December, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

arXiv:1910.02934 [pdf, other]

Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks

Authors: Spencer Frei, Yuan Cao, Quanquan Gu

Abstract: The skip-connections used in residual networks have become a standard architecture choice in deep learning due to the increased training stability and generalization performance with this architecture, although there has been limited theoretical understanding for this improvement. In this work, we analyze overparameterized deep residual networks trained by gradient descent following random initial… ▽ More The skip-connections used in residual networks have become a standard architecture choice in deep learning due to the increased training stability and generalization performance with this architecture, although there has been limited theoretical understanding for this improvement. In this work, we analyze overparameterized deep residual networks trained by gradient descent following random initialization, and demonstrate that (i) the class of networks learned by gradient descent constitutes a small subset of the entire neural network function class, and (ii) this subclass of networks is sufficiently large to guarantee small training error. By showing (i) we are able to demonstrate that deep residual networks trained with gradient descent have a small generalization gap between training and test error, and together with (ii) this guarantees that the test error will be small. Our optimization and generalization guarantees require overparameterization that is only logarithmic in the depth of the network, while all known generalization bounds for deep non-residual networks have overparameterization requirements that are at least polynomial in the depth. This provides an explanation for why residual networks are preferable to non-residual ones. △ Less

Submitted 7 October, 2019; originally announced October 2019.

Comments: 37 pages. In NeurIPS 2019

arXiv:1908.05580 [pdf, ps, other]

doi 10.1137/19M1281721

Weak imposition of Signorini boundary conditions on the boundary element method

Authors: Erik Burman, Stefan Frei, Matthew W. Scroggs

Abstract: We derive and analyse a boundary element formulation for boundary conditions involving inequalities. In particular, we focus on Signorini contact conditions. The Calderón projector is used for the system matrix and boundary conditions are weakly imposed using a particular variational boundary operator designed using techniques from augmented Lagrangian methods. We present a complete numerical a pr… ▽ More We derive and analyse a boundary element formulation for boundary conditions involving inequalities. In particular, we focus on Signorini contact conditions. The Calderón projector is used for the system matrix and boundary conditions are weakly imposed using a particular variational boundary operator designed using techniques from augmented Lagrangian methods. We present a complete numerical a priori error analysis and present some numerical examples to illustrate the theory. △ Less

Submitted 13 May, 2020; v1 submitted 15 August, 2019; originally announced August 2019.

Journal ref: SIAM Journal on Numerical Analysis 58(4), 2020, 2334-2350

arXiv:1906.02261 [pdf, ps, other]

doi 10.1112/S0010437X21007089

Rational points and derived equivalence

Authors: Nicolas Addington, Benjamin Antieau, Sarah Frei, Katrina Honigs

Abstract: We give the first examples of derived equivalences between varieties defined over non-closed fields where one has a rational point and the other does not. We begin with torsors over Jacobians of curves over Q and F_q(t), and conclude with a pair of hyperkaehler 4-folds over Q. The latter is independently interesting as a new example of a transcendental Brauer-Manin obstruction to the Hasse princip… ▽ More We give the first examples of derived equivalences between varieties defined over non-closed fields where one has a rational point and the other does not. We begin with torsors over Jacobians of curves over Q and F_q(t), and conclude with a pair of hyperkaehler 4-folds over Q. The latter is independently interesting as a new example of a transcendental Brauer-Manin obstruction to the Hasse principle. △ Less

Submitted 16 December, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

Comments: 20 pages, magma code as ancillary files. final version to appear in Compositio Math

Journal ref: Compositio Math. 157 (2021) 1036-1050

arXiv:1903.12234 [pdf, other]

doi 10.1137/19M1258396

Efficient approximation of flow problems with multiple scales in time

Authors: Stefan Frei, Thomas Richter

Abstract: In this article we address flow problems that carry a multiscale character in time. In particular we consider the Navier-Stokes flow in a channel on a fast scale that influences the movement of the boundary which undergoes a deformation on a slow scale in time. We derive an averaging scheme that is of first order with respect to the ratio of time-scales $ε$. In order to cope with the problem of un… ▽ More In this article we address flow problems that carry a multiscale character in time. In particular we consider the Navier-Stokes flow in a channel on a fast scale that influences the movement of the boundary which undergoes a deformation on a slow scale in time. We derive an averaging scheme that is of first order with respect to the ratio of time-scales $ε$. In order to cope with the problem of unknown initial data for the fast scale problem, we assume near-periodicity in time. Moreover, we construct a second-order accurate time discretisation scheme and derive a complete error analysis for a corresponding simplified ODE system. The resulting multiscale scheme does not ask for the continuous simulation of the fast scale variable and shows powerful speed-ups up to 1:10000 compared to a resolved simulation. Finally, we present some numerical examples for the full Navier-Stokes system to illustrate the convergence and performance of the approach. △ Less

Submitted 31 March, 2020; v1 submitted 28 March, 2019; originally announced March 2019.

Journal ref: SIAM Multiscale Modeling and Simulation 18(2), 2020

arXiv:1810.06735 [pdf, ps, other]

Moduli spaces of sheaves on K3 surfaces and Galois representations

Authors: Sarah Frei

Abstract: We consider two K3 surfaces defined over an arbitrary field, together with a smooth proper moduli space of stable sheaves on each. When the moduli spaces have the same dimension, we prove that if the étale cohomology groups (with Q_ell coefficients) of the two surfaces are isomorphic as Galois representations, then the same is true of the two moduli spaces. In particular, if the field of definitio… ▽ More We consider two K3 surfaces defined over an arbitrary field, together with a smooth proper moduli space of stable sheaves on each. When the moduli spaces have the same dimension, we prove that if the étale cohomology groups (with Q_ell coefficients) of the two surfaces are isomorphic as Galois representations, then the same is true of the two moduli spaces. In particular, if the field of definition is finite and the K3 surfaces have equal zeta functions, then so do the moduli spaces, even when the moduli spaces are not birational. △ Less

Submitted 12 May, 2021; v1 submitted 15 October, 2018; originally announced October 2018.

Comments: 16 pages. Minor changes to match published version, which appeared in Selecta Math

arXiv:1810.04766 [pdf, other]

An edge-based pressure stabilisation technique for finite elements on arbitrarily anisotropic meshes

Authors: Stefan Frei

Abstract: In this article, we analyse a stabilised equal-order finite element approximation for the Stokes equations on anisotropic meshes. In particular, we allow arbitrary anisotropies in a sub-domain, for example along the boundary of the domain, with the only condition that a maximum angle is fulfilled in each element.This discretisation is motivated by applications on moving domains as arising e.g. in… ▽ More In this article, we analyse a stabilised equal-order finite element approximation for the Stokes equations on anisotropic meshes. In particular, we allow arbitrary anisotropies in a sub-domain, for example along the boundary of the domain, with the only condition that a maximum angle is fulfilled in each element.This discretisation is motivated by applications on moving domains as arising e.g. in fluid-structure interaction or multiphase-flow problems. To deal with the anisotropies, we define a modification of the original Continuous Interior Penalty stabilisation approach. We show analytically the discrete stability of the method and convergence of order ${\cal O}(h^{3/2})$ in the energy norm and ${\cal O}(h^{5/2})$ in the $L^2$-norm of the velocities. We present numerical examples for a linear Stokes problem and for a non-linear fluid-structure interaction problem, that substantiate the analytical results and show the capabilities of the approach. △ Less

Submitted 10 October, 2018; originally announced October 2018.

MSC Class: 65N12; 65N30; 76D07

arXiv:1808.08758 [pdf, other]

A Nitsche-based formulation for fluid-structure interactions with contact

Authors: Erik Burman, Miguel A. Fernández, Stefan Frei

Abstract: We derive a Nitsche-based formulation for fluid-structure interaction (FSI) problems with contact. The approach is based on the work of Chouly and Hild [SIAM Journal on Numerical Analysis. 2013;51(2):1295--1307] for contact problems in solid mechanics. We present two numerical approaches, both of them formulating the FSI interface and the contact conditions simultaneously in equation form on a joi… ▽ More We derive a Nitsche-based formulation for fluid-structure interaction (FSI) problems with contact. The approach is based on the work of Chouly and Hild [SIAM Journal on Numerical Analysis. 2013;51(2):1295--1307] for contact problems in solid mechanics. We present two numerical approaches, both of them formulating the FSI interface and the contact conditions simultaneously in equation form on a joint interface-contact surface $Γ(t)$. The first approach uses a relaxation of the contact conditions to allow for a small mesh-dependent gap between solid and wall. The second alternative introduces an artificial fluid below the contact surface. The resulting systems of equations can be included {in a consistent fashion} within a monolithic variational formulation, which prevents the so-called "chattering" phenomenon. To deal with the topology changes in the fluid domain at the time of impact, we use a fully Eulerian approach for the FSI problem. We compare the effect of slip and no-slip interface conditions and study the performance of the method by means of numerical examples. △ Less

Submitted 27 August, 2018; originally announced August 2018.

MSC Class: 65M60

arXiv:1806.00999 [pdf, other]

On the implementation of a locally modified finite element method for interface problems in deal.II

Authors: Stefan Frei, Thomas Richter, Thomas Wick

Abstract: In this work, we describe a simple finite element approach that is able to resolve weak discontinuities in interface problems accurately. The approach is based on a fixed patch mesh consisting of quadrilaterals, that will stay unchanged independent of the position of the interface. Inside the patches we refine once more, either in eight triangles or in four quadrilaterals, in such a way that the i… ▽ More In this work, we describe a simple finite element approach that is able to resolve weak discontinuities in interface problems accurately. The approach is based on a fixed patch mesh consisting of quadrilaterals, that will stay unchanged independent of the position of the interface. Inside the patches we refine once more, either in eight triangles or in four quadrilaterals, in such a way that the interface is locally resolved. The resulting finite element approach can be considered a fitted finite element approach. In our practical implementation, we do not construct this fitted mesh, however. Instead, the local degrees of freedom are included in a parametric way in the finite element space, or to be more precise in the local mappings between a reference patch and the physical patches. We describe the implementation in the open source C++ finite element library deal.II in detail and present two numerical examples to illustrate the performance of the approach. Finally, detailed studies of the behavior of iterative linear solvers complement this work. △ Less

Submitted 4 June, 2018; originally announced June 2018.

MSC Class: 65M60

arXiv:1804.02768 [pdf, other]

Finite element simulation of fluid dynamics and CO$_2$ gas exchange in the alveolar sacs of the human lung

Authors: Luis J. Caucha, Stefan Frei, Obidio Rubio

Abstract: In this article we present a numerical framework based on continuum models for the fluid dynamics and the CO$_2$ gas distribution in the alveolar sacs of the human lung during expiration and inspiration, including the gas exchange to the cardiovascular system. We include the expansion and contraction of the geometry by means of the Arbitrary Lagrangian Eulerian (ALE) method. For discretisation, we… ▽ More In this article we present a numerical framework based on continuum models for the fluid dynamics and the CO$_2$ gas distribution in the alveolar sacs of the human lung during expiration and inspiration, including the gas exchange to the cardiovascular system. We include the expansion and contraction of the geometry by means of the Arbitrary Lagrangian Eulerian (ALE) method. For discretisation, we use equal-order finite elements in combination with pressure-stabilisation techniques based on local projections or interior penalties. We derive formulations for both techniques that are suitable on arbitrarily anisotropic meshes. These formulations are novel within the ALE method. Moreover, we investigate the effect of different boundary conditions, that vary between inspiration and expiration. We present numerical results on a simplified two-dimensional alveolar sac geometry and investigate the influence of the pressure stabilisations as well as the boundary conditions △ Less

Submitted 8 April, 2018; originally announced April 2018.

MSC Class: 65M60; 76Z05

arXiv:1706.00632 [pdf, other]

An adaptive Newton algorithm for optimal control problems with application to optimal electrode design

Authors: Thomas Carraro, Simon Dörsam, Stefan Frei, Daniel Schwarz

Abstract: In this work we present an adaptive Newton-type method to solve nonlinear constrained optimization problems in which the constraint is a system of partial differential equations discretized by the finite element method. The adaptive strategy is based on a goal-oriented a posteriori error estimation for the discretization and for the iteration error. The iteration error stems from an inexact soluti… ▽ More In this work we present an adaptive Newton-type method to solve nonlinear constrained optimization problems in which the constraint is a system of partial differential equations discretized by the finite element method. The adaptive strategy is based on a goal-oriented a posteriori error estimation for the discretization and for the iteration error. The iteration error stems from an inexact solution of the nonlinear system of first order optimality conditions by the Newton-type method. This strategy allows to balance the two errors and to derive effective stopping criteria for the Newton-iterations. The algorithm proceeds with the search of the optimal point on coarse grids which are refined only if the discretization error becomes dominant. Using computable error indicators the mesh is refined locally leading to a highly efficient solution process. The performance of the algorithm is shown with several examples and in particular with an application in the neurosciences: the optimal electrode design for the study of neuronal networks. △ Less

Submitted 2 June, 2017; originally announced June 2017.

arXiv:1703.03516 [pdf, ps, other]

The a-number of hyperelliptic curves

Authors: Sarah Frei

Abstract: It is known that for a smooth hyperelliptic curve to have a large $a$-number, the genus must be small relative to the characteristic of the field, $p>0$, over which the curve is defined. It was proven by Elkin that for a genus $g$ hyperelliptic curve $C$ to have $a_C=g-1$, the genus is bounded by $g<\frac{3p}{2}$. In this paper, we show that this bound can be lowered to $g <p$. The method of proof… ▽ More It is known that for a smooth hyperelliptic curve to have a large $a$-number, the genus must be small relative to the characteristic of the field, $p>0$, over which the curve is defined. It was proven by Elkin that for a genus $g$ hyperelliptic curve $C$ to have $a_C=g-1$, the genus is bounded by $g<\frac{3p}{2}$. In this paper, we show that this bound can be lowered to $g <p$. The method of proof is to force the Cartier-Manin matrix to have rank one and examine what restrictions that places on the affine equation defining the hyperelliptic curve. We then use this bound to summarize what is known about the existence of such curves when $p=3,5$ and $7$. △ Less

Submitted 26 June, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

Comments: 7 pages. v2: revised and improved the proof of the main theorem based on suggestions from the referee. To appear in the proceedings volume of Women in Numbers Europe-2

arXiv:1603.04130 [pdf, ps, other]

A lower bound for $p_c$ in range-$R$ bond percolation in two and three dimensions

Authors: Spencer Frei, Edwin Perkins

Abstract: We use the connection between bond percolation and SIR epidemics to establish lower bounds for the critical percolation probability in $2$ and $3$ dimensions as the range becomes large. The bound agrees with the conjectured asymptotics for the long range critical probability, refines results of M. Penrose, and complements results of van der Hofstad and Sakai in dimensions greater than $6$. We use the connection between bond percolation and SIR epidemics to establish lower bounds for the critical percolation probability in $2$ and $3$ dimensions as the range becomes large. The bound agrees with the conjectured asymptotics for the long range critical probability, refines results of M. Penrose, and complements results of van der Hofstad and Sakai in dimensions greater than $6$. △ Less

Submitted 14 March, 2016; originally announced March 2016.

MSC Class: 60K35 (Primary) 60J68; 60J80; 92D30 (Secondary)

arXiv:1007.4104 [pdf, ps, other]

doi 10.1103/PhysRevE.83.056101

Quantification of deviations from rationality with heavy-tails in human dynamics

Authors: Thomas Maillart, Didier Sornette, Stefan Frei, Thomas Duebendorfer, Alexander Saichev

Abstract: The dynamics of technological, economic and social phenomena is controlled by how humans organize their daily tasks in response to both endogenous and exogenous stimulations. Queueing theory is believed to provide a generic answer to account for the often observed power-law distributions of waiting times before a task is fulfilled. However, the general validity of the power law and the nature of o… ▽ More The dynamics of technological, economic and social phenomena is controlled by how humans organize their daily tasks in response to both endogenous and exogenous stimulations. Queueing theory is believed to provide a generic answer to account for the often observed power-law distributions of waiting times before a task is fulfilled. However, the general validity of the power law and the nature of other regimes remain unsettled. Using anonymized data collected by Google at the World Wide Web level, we identify the existence of several additional regimes characterizing the time required for a population of Internet users to execute a given task after receiving a message. Depending on the under- or over-utilization of time by the population of users and the strength of their response to perturbations, the pure power law is found to be coextensive with an exponential regime (tasks are performed without too much delay) and with a crossover to an asymptotic plateau (some tasks are never performed). The characterization of the availability and efficiency of humans on their actions revealed by our study have important consequences to understand human decision-making, optimal designs of policies such as for Internet security, with spillovers to collective behaviors, crowds dynamics, and social epidemics. △ Less

Submitted 23 July, 2010; originally announced July 2010.

Comments: 17 pages, 4 figures

Showing 1–50 of 50 results for author: Frei, S