Abstract

Optimal transport maps and plans between two absolutely continuous measures |$\mu$| and |$\nu$| can be approximated by solving semidiscrete or fully discrete optimal transport problems. These two problems ensue from approximating |$\mu$| or both |$\mu$| and |$\nu$| by Dirac measures. Extending an idea from Gigli (2011, On Hölder continuity-in-time of the optimal transport map towards measures along a curve. Proc. Edinb. Math. Soc. (2), 54, 401–409), we characterize how transport plans change under the perturbation of both |$\mu$| and |$\nu$|⁠. We apply this insight to prove error estimates for semidiscrete and fully discrete algorithms in terms of errors solely arising from approximating measures. We obtain weighted |$L^2$| error estimates for both types of algorithms with a convergence rate |$O(h^{1/2})$|⁠. This coincides with the rate in Theorem 5.4 in Berman (2018, Convergence rates for discretized Monge–Ampère equations and quantitative stability of optimal transport. Preprint available at arXiv:1803.00785) for semidiscrete methods, but the error notion is different.

1. Introduction

The optimal transport (OT) problem, first proposed by Monge (1781) and later generalized by Kantorovich (1942), has been extensively studied from the theoretical point of view (Brenier, 1987, 1989; Caffarelli, 1992a,b, 1996; Villani, 2009; Gigli, 2011; Santambrogio, 2015). It has a wide variety of applications in economics (Chiappori et al., 2010; Beiglböck et al., 2013), fluid mechanics (Brenier, 1989), meteorology (Cullen & Douglas, 1999; Cullen & Maroofi, 2003), image processing (Rubner et al., 2000; Rabin et al., 2010), transport (Xia, 2003; Carlier & Santambrogio, 2005) and optic design (Gutiérrez & Huang, 2009; Prins et al., 2014). In this section, we briefly introduce the basic theory and numerical methods of OT and point out our contribution to the numerical analysis of OT in this paper.

1.1 Introduction to optimal transport

If |$X, Y$| are subdomains of |${\mathbb{R}}^d$| and |$\mu \in P(X), \nu \in P(Y)$| are given probability measures, the Monge formulation of OT is to find an optimal map |$T: X \to Y$| that minimizes the transport cost, i.e.
(1.1)
where the cost function |$c(x,y): X \times Y \to [0,\infty )$| is given. Hereafter, |$T_{\#}\mu$| denotes the pushforward of measure |$\mu$| through |$T$|⁠, namely |$T_{\#}\mu = \nu$| means that for any measurable set |$A \subset Y$|⁠, we have |$\nu (A) = \mu (T^{-1}(A))$|⁠, or equivalently
(1.2)
for all continuous functions |$\phi :Y\to \mathbb{R}$|⁠. Since this problem is difficult to study, and sometimes the optimal map does not exist, Kantorovich (1942) generalized the notion of transport map and considered the following problem:
(1.3)
where |$\varPi (\mu ,\nu )$| is the set of transport plans between |$\mu$| and |$\nu$|⁠, namely
Here, |$\pi _1$| and |$\pi _2$| are the projections defined as |$\pi _1(x,y) = x, \pi _2(x,y) = y$|⁠. This definition implies that for |$\gamma \in \varPi (\mu ,\nu )$| and any measurable sets |$A \subset X, B \subset Y$|⁠, we have |$\gamma (A \times Y) = \mu (A), \gamma (X \times B) = \nu (B)$|⁠. For |$X=Y={\mathbb{R}}^d$| and |$1 \le p < \infty$|⁠, let |$P_p({\mathbb{R}}^d) \subset P({\mathbb{R}}^d)$| be the set of probability measures with bounded |$p$| moment, i.e.
(1.4)
which clearly contains those probabilities measures with bounded support. If |$\mu , \nu \in P_p({\mathbb{R}}^d)$| then |$\gamma \in P_p({\mathbb{R}}^{2d})$| for all |$\gamma \in \varPi (\mu ,\nu )$| because
(1.5)
Moreover, if |$c(x,y) = |x-y|^p$|⁠, the Kantorovich problem (1.3) always has a finite minimum value. In fact, this defines the well-known Wasserstein metric on |$P_p({\mathbb{R}}^d)$|⁠:
In addition, for quadratic cost, i.e. |$p = 2$|⁠, Brenier (1987) shows that there exists a unique optimal map |$T = \nabla \varphi$| for a convex function |$\varphi$| (⁠|$T$| is uniquely determined |$\mu$|-a.e.) provided that |$\mu$| gives no mass to |$(d-1)$|-surfaces of class |$C^2$|⁠. If |$\mu , \nu$| are absolutely continuous measures with respect to Lebesgue measure with densities |$f,g\ge 0$|⁠, then
whence |$\varphi$| satisfies the generalized Monge–Ampère equation
(1.6)
with second-type boundary condition |$\nabla \varphi (X) = Y$| (Figalli, 2017, Section 4.6) provided that the domains |$X,Y \subset{\mathbb{R}}^d$| are chosen so that
(1.7)
The optimal transport map |$T$| induces an optimal transport plan |$\gamma \in P(X,Y)$| given by |$\gamma = (id,T)_{\#}\mu$|⁠, where |$id$| denotes the identity map, namely
for all measurable sets |$A \subset X\times Y$|⁠. Similar results also hold for any |$p \in (1,\infty )$|⁠, and for general costs |$c(x,y)$| satisfying a twist condition (Villani, 2009, Chapter 10). However, the quantitative stability and error estimates studied in this paper are restricted to the case |$p=2$|⁠. Their generalization to |$p \neq 2$| and more general costs is open.

1.2 Numerical methods for OT and our contribution

In view of the numerous and diverse applications of OT, developing fast and sound numerical methods for OT is of paramount importance. Several algorithms do exist, ranging from those inspired by partial differential equations (PDEs) and variational techniques for absolutely continuous measures |$\mu$| and |$\nu$| (Benamou & Brenier, 2000; Angenent et al., 2003; Froese, 2012; Benamou et al., 2014; Papadakis et al., 2014; Lindsey & Rubinstein, 2017; Benamou & Duval, 2019  Hamfeldt, 2019) to those approximating one or both measures by Dirac masses and then solving the approximate OT (Mérigot, 2011; Burkard et al., 2012; Cuturi, 2013; Schmitzer & Schnörr, 2013; Benamou et al., 2015; Lévy, 2015; Oberman & Ruan, 2015; Schmitzer, 2016; Kitagawa et al., 2019). However, intrinsic difficulties make their numerical analysis far from complete.

In this paper, we develop stability and error analyses for quadratic costs that account for the effect of approximating measures |$\mu$| and |$\nu$| by Dirac masses. To this end, we extend the stability estimates of Gigli (2011), originally derived for optimal maps, to optimal plans |$\gamma$|⁠. This is critical because optimal maps might not exist for Dirac measures. We do not develop new techniques to approximate |$\gamma$|⁠.

If at least one of the two measures is discrete then it is possible to solve for the exact transport map numerically without further approximations, which includes semidiscrete algorithms (Aurenhammer et al., 1998; Mérigot, 2011; Lévy, 2015; Kitagawa et al., 2019) and fully discrete methods (Burkard et al., 2012; Schmitzer & Schnörr, 2013; Schmitzer, 2016). For these methods, since their errors solely come from approximating absolutely continuous measures with discrete measures, our results directly lead to error estimates. We also point out that an error estimate for a semidiscrete method was recently obtained by Berman (2018), but there are no such results for fully discrete schemes.

We now describe our results. Let |$X, Y$| be compact sets of |${\mathbb{R}}^d$| and |$\mu , \nu$| be absolutely continuous measures with respect to the Lebesgue measure with densities |$f\in L^1(X), g\in L^1(Y)$|⁠, respectively. Let
be approximations of |$\mu , \nu$| governed by a parameter |$h>0$|⁠, which means the Wasserstein distances satisfy
To make this more concrete, we briefly introduce one way to obtain an approximation |$\mu _h$| of |$\mu$|⁠. Choose |$(x_i)_{i=1}^N$| such that |$X \subset \cup _{i=1}^N B_h(x_i)$|⁠, where |$B_r(x)$| denotes the open ball with radius |$r$| centered at |$x$|⁠. Then, consider Voronoi tessellations: let
(1.8)
be the Voronoi cell for |$x_i$| and
Notice that we may assume |$f_i> 0$| since otherwise for |$f_i = 0$| we could just drop the Dirac measure at |$x_i$|⁠. Define the map |$U_h: X \to (x_i)_{i=1}^N$| such that |$U_h(x) = x_i$| for all |$x \in V_i$|⁠. It can be easily checked that this map |$U_h$| is well defined and satisfies |$|U_h(x) - x| \le h$| a.e. in |$X$| because the intersection between |$V_i$| and |$V_j$| for |$i \neq j$| is of zero Lebesgue measure and |$X \subset \cup _{i=1}^N B_h(x_i)$|⁠; in particular, |$V_i \subset B_h(x_i)$| for all |$1 \le i \le N$|⁠. Therefore, |$U_h$| is a transport map from |$\mu$| to |$\mu _h$|⁠, and thus
(1.9)
Similarly, we could approximate the absolutely continuous measure |$\nu$| with density |$g$| and bounded support |$Y$| with |$\nu _h = \sum _{j=1}^M g_j \delta _{y_j}$| satisfying |$W_2(\nu , \nu _h) \le h$|⁠.
The semidiscrete algorithms solve the OT between |$\mu _h$| and |$\nu$| upon finding a nodal function |$\varphi _h: (x_i)_{i=1}^N \to{\mathbb{R}}$| such that
(1.10)
where the discrete subdifferential is given by
This type of discretization was introduced by Oliker & Prussner (1989) for Dirichlet boundary conditions, whereas error estimates have been derived in Neilan & Zhang (2018); Nochetto & Zhang (2019). The set |$F_i$| coincides with the subdifferential of the convex envelope |$\varGamma (\varphi _h)$| associated with |$\varphi _h$| (Nochetto & Zhang, 2019, Lemma 2.1). The function |$\varGamma (\varphi _h)$| is piecewise linear and induces a mesh with nodes |$(x_i)_{i=1}^N$| whose elements may be quite elongated; see (Nochetto & Zhang, 2019, Section 2.2). We also refer to Berman (2018), who has derived error estimates for the second-type boundary condition involving |$\varGamma (\varphi _h)$|⁠.
Denote the barycenter of |$\partial \varphi _h(x_i)$| with respect to the measure |$\nu$| by |$m_i$|⁠, namely |$m_i:= f_i^{-1}\int _{F_i} yg(y) \, \textrm{d}y$|⁠, and define the map |$T_h$| such that |$T_h(x_i) = m_i$| for all |$1 \le i \le N$|⁠. Under proper assumptions on measures |$(\mu , \nu )$|⁠, to be stated in Section 2, we prove a weighted |$L^2$| error estimate in Theorem 4.2 for the exact optimal transport map |$T$|  
(1.11)
This rate of convergence coincides with that in (Berman, 2018, Theorem 5.4) for |$\nabla \varGamma (\varphi _h)$|⁠, but the error notion is different; we refer to Section 4 for details. Our approach is tailored to discrete transport plans and thus extends to fully discrete methods.
Fully discrete methods aim to find the discrete optimal transport plan
between |$\mu _h$| and |$\nu _h$| through the constrained minimization problem
(1.12)
If we construct the map |$T_h(x_i):= \frac{1}{f_i} \sum _{j=1}^M \gamma _{h,ij} y_j$| from |$\gamma _h$| for |$1\le i \le N$|⁠, then we also obtain a weighted |$L^2$| error estimate in Theorem 5.1 for the optimal map |$T$|  
(1.13)
under suitable assumptions on measures |$(\mu , \nu )$| described in Section 2. This is a new error estimate for fully discrete schemes, and the convergence rate in (1.13) is the same as (1.11) for semidiscrete methods.

1.3 Outline

In Section 2, we introduce the notion of |$\lambda$|-regularity and show that it leads to |$W^1_\infty$|-regularity of the transport map |$T$|⁠. We prove in Section 3 that |$\lambda$|-regularity implies a stability bound characterizing how the optimal transport plan |$\gamma$| changes under perturbations of both |$\mu$| and |$\nu$|⁠; this hinges on an idea from Gigli (2011). We measure the change of transport plans using either a weighted |$L^2$| norm in Theorem 3.5 or the Wasserstein metric in Corollary 3.9. We apply the stability bound to derive error estimates in Sections 4 and 5. For semidiscrete schemes, we obtain weighted |$L^2$| error estimates in Theorem 4.1 and Corollary 4.2 of Section 4 with a convergence rate |$O(h^{1/2})$|⁠. We also compare our geometric estimate of Corollary 4.2 with a similar one due to Berman (2018); however, the two error notions are different. Moreover, we obtain in Theorem 5.1 of Section 5 an entirely new error estimate of order |$O(h^{1/2})$| for fully discrete schemes with errors measured in both the weighted |$L^2$| norm and the Wasserstein distance.

2. Regularity and nondegeneracy

We now discuss the assumptions we make on measures |$\mu , \nu \in P_2({\mathbb{R}}^d)$| in order to prove quantitative stability bounds for optimal transport plans and error estimates for numerical methods.

 

Assumption 2.1 (⁠|$\lambda$|-regularity).
We say that |$(\mu ,\nu ,\varphi )$| is |$\lambda$|-regular for |$\lambda> 0$| if |$\varphi : {\mathbb{R}}^d \to{\mathbb{R}}$| is a convex function such that |$\nabla \varphi$| is the optimal transport map from |$\mu$| to |$\nu$| and |$\varphi$| is |$\lambda$|-smooth in the sense that
(2.1)
for any |$x_0,x_1 \in{\mathbb{R}}^d$| and |$t \in [0,1]$|⁠, i.e. |$\frac{\lambda }{2}|x|^2 - \varphi (x)$| is convex. We also say |$(\mu ,\nu )$| is |$\lambda$|-regular if there exists a |$\varphi$| such that |$(\mu ,\nu ,\varphi )$| is |$\lambda$|-regular.

We will see below that (2.1) implies |$\varphi \in W^2_\infty (\mathbb{R}^d)$|⁠; note that if |$\varphi \in C^2(\mathbb{R}^d)$| then |$D^2 \varphi (x) \le \lambda I$| for all |$x\in \mathbb{R}^d$|⁠. Although we require |$\varphi$| to be defined in the whole |${\mathbb{R}}^d$| in Assumption 2.1, this is not restrictive because any convex function |$\varphi :X\to \mathbb{R}$| defined in a bounded convex set |$X$| can be extended as in (Figalli, 2017, Theorem 4.23)
Then, one can show that |$E{\varphi } = \varphi$| in |$X$|⁠, and |$E{\varphi }$| satisfies (2.1) for any |$x_0,x_1 \in{\mathbb{R}}^d$| if (2.1) holds for |$\varphi$| and any |$x_0, x_1 \in X$|⁠.
Now, we introduce the Legendre transform  |$\varphi ^*$| of |$\varphi$|⁠, which is defined by
(2.2)
Since |$\varphi$| is convex, given |$y \in \partial \varphi (x)$| we readily get for all |$z\in \mathbb{R}^d$|  
whence in view of (2.2) the following two key properties of |$\varphi ^*$| are valid:
and |$y \in \partial \varphi (x)$| if and only if |$x \in \partial \varphi ^*(y)$|⁠. To see that |$y \in \partial \varphi (x)$| implies |$x \in \partial \varphi ^*(y)$|⁠, let |$z \in{\mathbb{R}}^d$| be arbitrary and notice that (2.2) for |$\varphi ^*(z)$| yields
Consequently, if |$\varphi$| and |$\varphi ^*$| are of class |$C^1$| then |$\nabla \varphi ^*=(\nabla \varphi )^{-1}$| is the inverse of the transport map |$\nabla \varphi$|⁠. Moreover,
(2.3)
provided |$(\mu ,\nu ,\varphi )$| is |$\lambda$|-regular. In fact, since |$(\nabla \varphi )_{\#}\mu = \nu$|⁠, in view of (1.2) we have
We exploit the convexity of |$\varphi$| and |$\frac{\lambda }{2} |\cdot |^2 - \varphi$| to obtain
Since |$\mu \in P_2({\mathbb{R}}^d)$|⁠, (1.4) imply that the following integrals are finite and yield (2.3)

The following lemma states that |$\lambda$|-regularity of a convex function |$\varphi$| is equivalent to uniform convexity of its Legendre transform |$\varphi ^*$| (Azé & Penot, 1995, Proposition 2.6).

 

Lemma 2.2
(⁠|$\lambda$|-regularity vs uniform convexity).

If |$\varphi : {\mathbb{R}}^d \to{\mathbb{R}}$| is convex then the following statements are valid:

  • (a) if |$\varphi$| is |$\lambda$|-smooth then its Legendre transform |$\varphi ^*$| must satisfy
    (2.4)
    for all |$y_0,y_1\in \mathbb{R}^d$| and |$t \in [0,1]$|⁠, i.e. |$\varphi ^*(y) - \frac{1}{2\lambda }|y|^2$| is convex;
  • (b) if |$\varphi (y) - \frac{1}{2\lambda }|y|^2$| is convex then its Legendre transform |$\varphi ^*$| is |$\lambda$|-smooth.

 

Proof.
For completeness, we repeat the proof of (Azé & Penot, 1995, Proposition 2.6) to show (a); the proof of (b) is similar. Given |$y_0, y_1 \in{\mathbb{R}}^d$| and |$t \in [0,1]$|⁠, for any |$x_0, v \in{\mathbb{R}}^d$| we set |$x_t:= x_0 + tv$| and |$y_t:= y_0 + t(y_1 - y_0)$| and use (2.2) to write
We exploit that |$x_0, v$| are arbitrary. Taking supremum with respect to |$x_0$|⁠, we get
Maximizing the last two terms with respect to |$v$|⁠, we obtain |$v=\frac{1}{\lambda }(y_1-y_0)$| and
which is the asserted inequality (2.4). That |$\varphi ^*(y)-\frac{1}{2\lambda } |y|^2$| is convex is a straightforward calculation that completes the proof.

 

Lemma 2.3
(⁠|$W^2_\infty$|-regularity).
A |$\lambda$|-smooth convex map |$\varphi$| is of class |$W^2_\infty (\mathbb{R}^d)$| and the Lipschitz constant of |$T=\nabla \varphi$| is |$\lambda$|⁠, namely
(2.5)

 

Proof.
According to Lemma 2.2(a), the function |$\psi (y):= \varphi ^*(y) - \frac{1}{2\lambda } |y|^2$| is convex. If |$x_1\in \partial \varphi ^*(y_1)$| and |$x_2\in \partial \varphi ^*(y_2)$|⁠, then
Adding the two inequalities and rearranging terms yield
This implies the assertion (2.5).

It is now apparent from Lemma 2.3 that the constant |$\lambda$| dictates the stability of the transport map |$T=\nabla \varphi$|⁠; |$\lambda$| is also the stability constant that appears in our error estimates of Sections 4 and 5. If |$\varphi \in C^2({\mathbb{R}}^d)$| then (2.5) is equivalent to |$D^2 \varphi (x) \le \lambda I$| for all |$x$|⁠. As we have already mentioned at the end of Section 1.1, the optimal map from |$\mu$| to |$\nu$| for quadratic cost is given by |$\nabla \varphi$| under suitable assumptions on |$\mu$|⁠. Caffarelli’s regularity results (Caffarelli, 1992a,b, 1996) provide sufficient conditions for |$\lambda$|-regularity: if both |$\overline{X}={\operatorname{supp}}(\mu )$| and |$\overline{Y}={\operatorname{supp}}(\nu )$| are uniformly convex domains of |${\mathbb{R}}^d$| with |$C^2$| boundary and the densities |$f,g$| of |$\mu , \nu$| are bounded away from |$0$| and |$f \in C^{0,\alpha }(\overline{X}), g\in C^{0,\alpha }(\overline{Y})$|⁠, then the solution |$\varphi$| of the corresponding second boundary value Monge–Ampère problem (1.6) is of class |$C^{2,\alpha }(\overline{X})$|⁠. This implies that |$(\mu ,\nu ,\varphi )$| is |$\lambda$|-regular for some |$\lambda> 0$| if we extend |$\varphi$| to |${\mathbb{R}}^d$|⁠. Moreover, the same assumptions imply that |$(\nu ,\mu ,\varphi ^*)$| is |$\xi$|-regular for the Legendre transform |$\varphi ^* \in C^{2,\alpha }(\overline{Y})$| of |$\varphi$| and some |$\xi> 0$|⁠.

3. Quantitative stability of optimal transport plans

In this section, we generalize the quantitative stability bounds of (Gigli, 2011, Corollary 3.4) for optimal transport maps and show some consequences of our theorem under suitable assumptions on measures |$\mu$| and |$\nu$|⁠. They will be useful in Sections 4 and 5 to derive error estimates. The stability estimate in (Gigli, 2011, Corollary 3.4) is based on bounding the |$L^2$| difference between an optimal transport map and another feasible transport map by the difference between their transport costs. Proposition 3.1 below is just a generalization of this important property in that it replaces transport maps by transport plans. This is essential in deriving error estimates because optimal maps might not exist for semidiscrete and fully discrete optimal transport problems; see Section 1.2 as well as (Santambrogio, 2015, Section 1.4) for some examples.

 

Proposition 3.1
(stability of transport plans).
Let |$(\mu ,\nu ,\varphi )$| be |$\lambda$|-regular for |$\lambda> 0$|⁠, and |$T = \nabla \varphi$| be the optimal transport map from |$\mu$| to |$\nu$|⁠. Then, for any transport plan |$\gamma \in \varPi (\mu ,\nu )$|⁠, there holds
(3.1)

 

Proof.
We proceed in the same way as in Gigli (2011). If |$\varphi ^*$| is the Legendre transform of |$\varphi$| then |$\int _Y \varphi ^*(y) \: \textrm{d}\nu$| is finite from (2.3). Since |$T_{\#} \mu = \nu$|⁠, using (1.2), we have
In view of Lemma 2.3 (⁠|$W^2_\infty$|-regularity), the map |$T=\nabla \varphi$| is of class |$W^1_\infty$|⁠. It thus follows that |$x = \nabla \varphi ^* (T(x))$| for all |$x\in X$| because |$\nabla \varphi ^* = (\nabla \varphi )^{-1}$|⁠. Using
together with Lemma 2.2 (⁠|$\lambda$|-regularity vs. uniform convexity), namely |$x \mapsto \varphi ^*(x) - \frac{1}{2\lambda }|x|^2$| is convex, we further obtain that
Noticing that
we deduce
In view of (1.2), the last two terms are equal and cancel out. Consequently,
whence
Rearranging the equation above gives (3.1).

 

Remark 3.2
(interpretation of (3.1)).

Notice that the right-hand side in (3.1) without factor |$\lambda$| is the difference of transport costs between the given transport plan |$\gamma$| and the optimal transport map |$T$|⁠. The factor |$\lambda$| acts as a stability constant.

 

Remark 3.3
(stability of transport maps).
We point out that the left-hand side of (3.1) can be seen as the square of a weighted |$L^2$|-error between the transport plan |$\gamma$| and the optimal map |$T$|⁠. To understand this, notice that if the plan |$\gamma$| in Proposition 3.1 is induced by a map |$S: {\mathbb{R}}^d \to{\mathbb{R}}^d$|⁠, i.e. |$(id, S)_{\#} \mu = \gamma$|⁠, then (3.1) can be written as
which is the same as (Gigli, 2011, Proposition 3.3). This is an estimate of |$\Vert S - T \Vert ^2_{L^2_{\mu }(X)}$|⁠.

Let us recall the gluing lemma for measures (see (Villani, 2009, p.23)), which plays an important role in proving the triangle inequality of Wasserstein distance in the theory of optimal transport. We refer to (Santambrogio, 2015, Lemma 5.5) for a proof.

 

Lemma 3.4
(gluing of measures).

Let |$(X_i, \mu _i)$| be probability spaces for |$i=1,2,3$|⁠, with |$X_i\subset{\mathbb{R}}^d$|⁠, and |$\gamma _{1,2} \in \varPi (\mu _1, \mu _2)$|⁠, |$\gamma _{2,3} \in \varPi (\mu _2, \mu _3)$|⁠. Then, there exists (at least) one measure |$\gamma \in P(X_1 \times X_2 \times X_3)$| such that |$(\pi _{1,2})_{\#} \gamma =\gamma _{1,2}$| and |$(\pi _{2,3})_{\#} \gamma =\gamma _{2,3}$|⁠, where |$\pi _{i,j}$| is the projection defined as |$\pi _{i,j}(x_1,x_2,x_3) = (x_i,x_j)$|⁠.

We use this lemma to glue the measures |$\mu ,\nu$| with their approximations |$\mu _h,\nu _h$| as follows. Given optimal transport plans |$\gamma _h\in \varPi (\mu _h,\nu _h)$| between |$\mu _h$| and |$\nu _h$|⁠, |$\alpha _h \in \varPi (\mu , \mu _h)$| between |$\mu$| and |$\mu _h$|⁠, and |$\beta _h \in \varPi (\nu _h, \nu )$| between |$\nu _h$| and |$\nu$|⁠, we let |$\varGamma _h \in P(X_1 \times X_2 \times X_3 \times X_4)$| be a probability measure so that
(3.2)
where |$X_1=X_2=X$| and |$X_3=X_4=Y$|⁠; Lemma 3.4 guarantees the existence of |$\varGamma _h$|⁠. This in particular implies that projections |$\pi _i$| on the coordinate |$x_i$| satisfy
(3.3)
along with |$\sigma :=(\pi _{1,4})_\# \varGamma _h \in \varPi (\mu ,\nu )$|⁠. Moreover, the following formula holds
(3.4)
for |$(x_1,x_2;\alpha _h)$| and similar expressions are valid for |$(x_2,x_3;\gamma _h)$|⁠, |$(x_3,x_4;\beta _h)$| and |$(x_1,x_4;\sigma )$|⁠. If |$X=Y={\mathbb{R}}^d$|⁠, and |$\mu ,\mu _h,\nu _h,\nu \in P_2({\mathbb{R}}^d)$| then |$\alpha _h,\beta _h,\gamma _h,\sigma \in P_2({\mathbb{R}}^{2d})$| according to (1.5).

Now, we state and prove our main perturbation estimates for transport plans.

 

Theorem 3.5
(perturbation of optimal transport plans).
Let |$T = \nabla \varphi$| be the optimal transport map from |$\mu$| to |$\nu$| and |$(\mu ,\nu ,\varphi )$| be |$\lambda$|-regular for |$\lambda>0$|⁠. Let |$\mu _h, \nu _h \in P_2({\mathbb{R}}^d)$| be approximations of |$\mu , \nu$|⁠, and let |$\alpha _h \in \varPi (\mu , \mu _h), \beta _h \in \varPi (\nu _h, \nu )$| be two transport plans between |$\mu ,\mu _h$| and |$\nu _h,\nu$| with |$L^2$|-errors
(3.5)
and let |$e_h:= e_{\alpha _h} + e_{\beta _h}$|⁠. If |$\gamma _h$| is an optimal transport plan between |$\mu _h$| and |$\nu _h$|⁠, then
(3.6)

 

Remark 3.6
(interpretation of (3.6)).

We emphasize that, in view of Remark 3.3 (stability of transport maps), the left-hand side of (3.6) can be viewed as the square of the |$L^2$|-error between the optimal map |$T$| and the transport plan |$\gamma _h$|⁠. The quantity |$e_h$| is a measure of the approximation errors between |$\mu ,\mu _h$| and |$\nu ,\nu _h$|⁠. A typical choice of |$\alpha _h, \beta _h$| is to take the optimal transport plans between |$\mu ,\mu _h$| and |$\nu _h,\nu$| respectively, which implies |$e_h = W_2(\mu ,\mu _h) + W_2(\nu _h,\nu )$|⁠.

 

Remark 3.7
(non-uniqueness).

In general, an optimal transport plan |$\gamma _h$| between |$\mu _h$| and |$\nu _h$| may be neither unique nor induced by a map, no matter how small |$e_h$| is; for instance, if |$X=\{(\pm 1,0)\}_{i=1}^2, Y=\{(0,\pm 1)\}_{j=1}^2$| then any plan from |$X$| to |$Y$| has the same cost. However, (3.6) in Theorem 3.5 shows that all optimal transport plans are somewhat close to the map |$T$| in a certain |$L^2$|-sense dictated by |$e_h^{1/2}$|⁠.

 

Proof of Theorem 3.5.
The proof proceeds as in (Gigli, 2011, Corollary 3.4). Let |$\varGamma _h\in P_2(X^2 \times Y^2)$| be a measure gluing |$\mu , \mu _h, \nu _h, \nu$| and satisfying (3.2) and (3.3). Since |$\sigma = (\pi _{1,4})_{\#} \varGamma _h \in \varPi (\mu ,\nu )$| is a transport plan between |$\mu$| and |$\nu$|⁠, Proposition 3.1 (stability of transport plans) gives
(3.7)
according to the definition of Wasserstein distance |$W_2^2(\mu ,\nu ) = \int _{X} |T(x)-x|^2 \: \textrm{d}\mu (x)$| and the fact that |$T$| is the optimal map from |$\mu$| to |$\nu$|⁠. Since |$\gamma _h= (\pi _{2,3})_\# \varGamma _h \in \varPi (\mu _h,\nu _h)$| is an optimal transport plan between |$\mu _h$| and |$\nu _h$|⁠, i.e.
applying the triangle inequality, we deduce
(3.8)
Combining the relation |$\sigma =(\pi _{1,4})_\#\varGamma _h$| between |$\sigma$| and |$\varGamma _h$|⁠, namely
with the triangle inequality yields
In view of (3.2) and (3.8), this expression can be equivalently rewritten as
This implies the inequality
whence substitution into (3.7) immediately gives
(3.9)
To prove (3.6), we first recall (3.2) and (3.3) to write
and next utilize the triangle inequality together with the Lipschitz property (2.5) of the optimal transport map |$T$| to obtain
Rearranging the above inequality and using (3.9) prove (3.6).

One could also consider a plan |$\gamma _h \in \varPi (\mu _h, \nu _h)$| that is not exactly optimal, but close to the optimal transport plan. For this case, we prove the following corollary.

 

Corollary 3.8
(perturbation of transport plans in |$L^2$|⁠).
Let |$\mu ,\nu ,\mu _h,\nu _h$| and |$\varphi$| be as in Theorem 3.5 (perturbation of optimal transport plans). Let |$\gamma _h\in \varPi (\mu _h,\nu _h)$| be a transport plan between |$\mu _h$| and |$\nu _h$|⁠, and let |$\widetilde{e}_h$| be given by
(3.10)
and |$e_h=e_{\alpha _h} + e_{\beta _h}$| be defined in (3.5). We then have
(3.11)

 

Proof.
We argue as in Theorem 3.5, except that instead of (3.8) we now have
whence (3.9) becomes
(3.12)
The rest of the proof continues as in Theorem 3.5.

Instead of measuring the |$L^2$|-error in (3.11) between the optimal transport map |$T$| from |$\mu$| to |$\nu$| and the optimal transport plan |$\gamma _h$| from |$\mu _h$| to |$\nu _h$|⁠, we could alternatively characterize the discrepancy between the corresponding plans |$\gamma$| and |$\gamma _h$| in terms of the Wasserstein distance. This is possible in light of (1.5) because |$\gamma , \gamma _h \in P_2({\mathbb{R}}^{2d})$| provided |$\mu ,\nu \in P_2({\mathbb{R}}^d)$| and |$\mu _h,\nu _h\in P_2({\mathbb{R}}^d)$|⁠, respectively.

 

Corollary 3.9
(perturbation of transport plans in Wasserstein metric).
Let |$T = \nabla \varphi$| be the optimal transport map from |$\mu$| to |$\nu$| and |$(\mu ,\nu ,\varphi )$| be |$\lambda$|-regular for |$\lambda> 0$|⁠. Let |$\gamma = (id,T)_\# \mu$| be the optimal transport plan induced by |$T$|⁠. Let |$\mu _h, \nu _h \in P_2({\mathbb{R}}^d)$| be approximations to |$\mu ,\nu$| and |$\gamma _h \in \varPi (\mu _h, \nu _h)$| be a transport plan between |$\mu _h$| and |$\nu _h$|⁠. Then, we have
where |$e_h = W_2(\mu , \mu _h) + W_2(\nu , \nu _h)$| and |$\widetilde{e}_h$| is defined in (3.10).

 

Proof.
Let |$\alpha _h \in \varPi (\mu , \mu _h), \beta _h = \varPi (\nu _h, \nu )$| be the optimal transport plans for the corresponding OT, and |$\varGamma _h\in P_2({\mathbb{R}}^4)$| be a gluing measure satisfying (3.2) and (3.3). We need to construct a suitable transport plan |$\widetilde{\varGamma }_h \in \varPi (\gamma ,\gamma _h)$| from |$\gamma$| to |$\gamma _h$| and use the fact that the corresponding transport cost dominates |$W_2(\gamma , \gamma _h)$|⁠. To this end, we introduce the map |$S: {\mathbb{R}}^{4d} \to{\mathbb{R}}^{2d}$| defined as |$S(x,x^{\prime},y^{\prime},y):= \left(x, T(x)\right)$|⁠, and consider the pushforward of measure |$\varGamma _h$| through |$S$|  
Since |$(\pi _{2,3})_{\#} {\varGamma }_h = \gamma _h$| according to (3.2), we infer that the map |$\widetilde{S}:=(S, \pi _{2,3} \big ):{\mathbb{R}}^{4d} \to{\mathbb{R}}^{4d}$| defined by
induces a pushforward measure |$\widetilde{\varGamma }_h:= \widetilde{S}_{\#} \varGamma _h \in \varPi (\gamma ,\gamma _h)$| of |$\varGamma _h$| through |$\widetilde{S}$| that happens to be a transport plan between |$\gamma$| and |$\gamma _h$|⁠. This implies
Using the triangle inequality along with (3.4) yields
because |$\alpha _h$| and |$\beta _h$| are optimal transport plans. For the last term, we recall that |$\sigma =(\pi _{1,4})_\#\varGamma _h$| and employ (3.4) again, together with (3.12), to arrive at
This finishes the proof of the corollary.

In the next two sections, we explain how to use the preceding stability estimates to obtain some useful error estimates for numerical solutions of OT between the probability measures |$\mu ,\nu$| with densities |$f,g$|⁠, supported in |$\overline{X}, \overline{Y} \subset{\mathbb{R}}^d$|⁠, respectively, and quadratic cost |$c(x,y) = |x-y|^2$|⁠.

4. Error estimates for semi-discrete schemes

In this section, we present and prove error estimates for semidiscrete schemes such as those in Aurenhammer et al. (1998); Mérigot (2011); Lévy (2015); Kitagawa et al. (2019). We conclude our discussion with comparisons between our results and those of Berman (2018).

Let |$\mu _h = \sum _{i=1}^N f_i \delta _{x_i}$| (⁠|$f_i> 0$|⁠) be an approximation of |$\mu$| satisfying |$W_2(\mu , \mu _h) \le h$|⁠. One way to obtain such an approximation is explained in Section 1.2. The semidiscrete schemes are designed to compute the optimal transport plan between the discrete measure |$\mu _h$| and the continuous measure |$\nu$|⁠. The semidiscrete problem is equivalent to finding a nodal function |$\varphi _h:= \big (\varphi _h(x_i)\big )_{i=1}^N \in{\mathbb{R}}^N$| so that
(4.1)
where |$Y = \{y \in{\mathbb{R}}^d: g(y)> 0 \}$| and the discrete subdifferential set is given by
The convex sets |$F_i$| are called Laguerre cells (or generalized Voronoi cells). We do not discuss methods to determine |$\varphi _h$| for which we refer to Aurenhammer et al. (1998); Mérigot (2011); Lévy (2015); Kitagawa et al. (2019). As pointed out in (Berman, 2018, Lemma 5.3), vector |$\varphi _h \in{\mathbb{R}}^N$| is unique up to a constant if |$Y$| is a Lipschitz domain, which is a direct consequence of results in Kitagawa et al. (2019). Once we have found |$\varphi _h$|⁠, the optimal transport map from |$\nu$| to |$\mu _h$| is given by the map |$S_h$|  
and the induced optimal transport plan |$\gamma _h \in P(X \times Y)$| reads |$\gamma _h = (S_h, id)_{\#} \nu$| and
(4.2)
for all Borel sets |$E \subset X\times Y$|⁠. We also point out that there is no optimal map from |$\mu _h$| to |$\nu$| since every node |$x_i$| is associated with the Laguerre cell |$F_i$|⁠, which has infinite number of points (multivalued function). We are now ready to estimate the difference of |$\gamma _h$| and the continuous optimal map |$T$|⁠.

 

Theorem 4.1
(convergence rate for semidiscrete schemes).
Let the triple |$(\mu ,\nu ,\varphi )$| be |$\lambda$|-regular for |$\lambda> 0$| where the measures |$\mu$| to |$\nu$| have non-negative densities |$f$| and |$g$|⁠. Let |$T = \nabla \varphi$| be the optimal transport map from |$\mu$| to |$\nu$|⁠. Let |$\mu _h = \sum _{i=1}^N f_i \delta _{x_i}$| be an approximation of |$\mu$| with |$f_i> 0$|⁠. If |$\varphi _h$| is a nodal function that solves (4.1) and |$\gamma _h$| is the corresponding semidiscrete optimal plan (4.2), then we have
(4.3)
where
(4.4)
Moreover, if |$\mu _h$| satisfies |$W_2(\mu , \mu _h) \le h$| then there exists a constant |$C$| depending on |$(\mu , \nu )$| such that for |$h \le \lambda ^{-1}$| we have
(4.5)

 

Proof.
The first equality in (4.3) follows from the relation between |$\gamma _h=(S_h,id)_\#\nu$| and |$\varphi _h$| expressed in (4.1) and (4.2)
Now, we resort to Theorem 3.5 (perturbation of optimal transport plans). Since there is no discretization of |$\nu$|⁠, we let |$\beta _h = (id,id)_{\#} \nu$| and |$\alpha _h$| be the optimal transport plan between |$\mu$| and |$\mu _h$|⁠. Then, (3.6) in Theorem 3.5 implies that
with |$e_h=e_{\alpha _h} = W_2(\mu , \mu _h)$| and |$e_{\beta _h} = 0$|⁠. This proves (4.3), whereas (4.5) follows directly from |$W_2(\mu ,\mu _h) \le h$| and |$h\le \lambda ^{-1}$|⁠.

We recall that the optimal transport plan |$\alpha _h=(id,U_h)_{\#\mu }$| between |$\mu$| and |$\mu _h$| defined before (1.9) satisfies |$W_2(\mu , \mu _h) \le h$| according to (1.9). In addition, our estimates (4.3) and (4.5) contain geometric information about the Laguerre cells |$F_i$| provided |$g$| is strictly positive. To see this, for each |$1\le i\le N$| we introduce the center of mass (or barycenter)  |$m_i$| of |$F_i$| with respect to the measure |$\nu$|  
(4.6)

 

Corollary 4.2
(convergence rate for center of mass and Laguerre cells).
We make the same hypotheses as in Theorem 4.1 and further assume that |$g(y) \in [c_1, c_2]$| for all |$y \in Y$| for some |$c_2 \ge c_1> 0$|⁠, and that |$Y$| is a bounded convex set. Then, there exists a constant |$C$| depending on |$d$| and |$c_1, c_2$| such that
(4.7)
where |$F_i, m_i$| and |$E_h$| are defined in (4.1), (4.6) and (4.4), respectively. Moreover, if |$\mu _h$| satisfies |$W_2(\mu , \mu _h) \le h$|⁠, then there exists a constant |$C$| depending on |$(d, \mu , \nu , c_1, c_2)$| such that for |$h \le \lambda ^{-1}$| we have
(4.8)

 

Proof.
Using the facts that |$f_i = \nu (F_i) = \int _{F_i} g(y) \: \textrm{d}y$| and |$\int _{F_i} (m_i - y) g(y) \: \textrm{d}y = 0$|⁠, which are valid due to (4.1) and (4.6), we infer that
whence
Therefore, to prove (4.7) it remains to show that there exists a constant |$C$| such that
Notice that |$F_i = \partial \varphi _h(x_i) \cap Y$| is convex because both |$\partial \varphi _h(x_i)$| and |$Y$| are convex. Let |$\widetilde{m}_i:= |F_i|^{-1} \int _{F_i} y \: \textrm{d}y$| be the center of mass of |$F_i$| with respect to the Lebesgue measure, and apply (Gutiérrez, 2016, Theorem 1.8.2) to deduce the existence of an ellipsoid |$E$| centered at |$\widetilde{m}_i$| such that |$d^{-3/2} E \subset F_i \subset E$|⁠, where |$\alpha E$| denotes the |$\alpha$|-dilation of |$E$| with respect to |$\widetilde{m}_i$|⁠. Then, we have
where |$c_d$| is a constant depending on |$d$|⁠. In fact, we will prove below that for any ellipsoid |$E_0$| centered at |$z$|⁠, we have
(4.9)
where |$c^{\prime}_d$| is a constant depending on |$d$|⁠. We finally utilize the properties
and |$\mbox{diam}(E) \ge \mbox{diam}(F_i)$| because |$F_i \subset E$|⁠, to obtain
This finishes our proof of (4.7) by choosing |$C = c_d c_1 c_2^{-1}$|⁠. Finally, (4.8) is a direct consequence of (4.7) because |$E_h^2 \le C\lambda h$| whenever |$W_2(\mu ,\mu _h) \le h$| and |$h\le \lambda ^{-1}$|⁠.
It remains to show the geometric property (4.9). Let |$E_0$| be given by
and |$a_1 = \max _{1\le j \le d} a_j$|⁠; hence, |${\operatorname{diam}}(E_0)=2a_1$| and the barycenter |$z$| of |$E_0$| coincides with the origin. We consider now the subset of |$E_0$|  
which satisfies
We see that |$E_0 \subset E_1 = [-a_1,a_1] \times \big \{(y_j)_{j=2}^d: \sum _{j=2}^d \frac{y_j^2}{a_j^2} \le 1 \big \}$|⁠, whence |$|E_0| < |E_1| \le c_d |\widetilde{E}_0|$|⁠, the last inequality being a consequence of a scaling argument between |$E_1$| and |$\widetilde{E}_1$|⁠. Since |$|y|\ge \frac{a_1}{3} = \frac{1}{6} {\operatorname{diam}} (E_0)$| in |$\widetilde{E}_0$|⁠, we infer that
This shows (4.9) and concludes the proof.

 

Remark 4.3
(discrete transport map).
To compensate for not having an optimal transport map from |$\mu _h$| to |$\nu$|⁠, because |$(x_i\mapsto F_i)_{i=1}^N$| is a multivalued function, we could define the discrete transport map |$T_h$| to be
Expression (4.8) immediately yields the weighted |$L^2$| error estimate

 

Remark 4.4
(mean errors).

We point out that since |$\sum _{i=1}^N f_i = 1$|⁠, (4.8) shows that on average the pointwise error |$|T(x_i) - m_i|^2$| and |${\operatorname{diam}}(F_i)^2$| are of order |$O(h)$|⁠.

 

Remark 4.5
(approximate transport plans).
In practice, the semidiscrete schemes may not solve (4.1) exactly, but rather approximately: let |$\widetilde{\varphi }_h$| solve
with |$\widetilde{f}_i$| close to |$f_i$|⁠, and induce an optimal plan |$\widetilde{\gamma }_h$| between |$\widetilde{\mu }_h = \sum _{i=1}^N \widetilde{f}_i \delta _{x_i}$| and |$\nu$|⁠. We note that the error estimate (4.5) in Theorem 4.1 (convergence rate for semidiscete schemes) only requires |$W_2(\mu , \widetilde{\mu }_h) \le Ch$|⁠, and thus deduce that as long as the approximate measure |$\widetilde{\mu }_h$| satisfies |$W_2(\mu _h, \widetilde{\mu }_h) \le Ch$|⁠, we are still able to obtain
because |$W_2(\mu , \widetilde{\mu }_h) \le W_2(\mu , \mu _h) + W_2(\mu _h, \widetilde{\mu }_h) \le Ch$|⁠. This shows that enforcing (4.1) exactly may not be computationally profitable.

We conclude this section comparing our results of Theorem 4.1 and Corollary 4.2 with that of (Berman, 2018, Theorem 5.4). The nodal function |$\varphi _h=(\varphi _h(x_i))_{i=1}^N$| that satisfies (4.1) also induces the convex function |$\varGamma (\varphi _h)$|⁠, in fact the convex envelope of |$(\varphi _h(x_i))_{i=1}^N$|⁠. For convenience, we still denote |$\varGamma (\varphi _h)$| by |$\varphi _h$| and observe that, being piecewise linear, it dictates a partition |$\mathcal{T}_h$| of |$X$| into simplices with vertices |$(x_i)_{i=1}^N$|⁠. However, this partition |$\mathcal{T}_h$| of |$X$| might not be shape-regular in general even for a quasi-uniform distribution of nodes |$(x_i)_{i=1}^N$|⁠; we refer to (Nochetto & Zhang, 2019, Section 2.2). Therefore, there is no direct relation between elements |$K_j$| of |$\mathcal{T}_h$| containing |$x_i$| and the Voronoi cells |$V_i$| in (1.8). By assuming that |$\lambda ^{-1} I \le D^2 \varphi \le \lambda I$| for some |$\lambda> 0$| and the density |$g$| satisfies |$g(y) \ge c_1> 0$| for all |$y\in Y$|⁠, Berman obtains the following error estimate for the piecewise constant function |$\nabla \varphi _h$| over |$\mathcal{T}_h$| (Berman, 2018, Theorem 5.4)
(4.10)
where the constant |$C$| depends on |$\lambda , c_1$| and other information. We see that this rate of convergence is similar to those in Theorem 4.1 and Corollary 4.2, but the error notion is different. To explore this fact, we rewrite (4.10) as follows
Since the Hessian |$D^2 \varphi$| is uniformly bounded both from above and below, we infer that |$\nabla \varphi$| and |$(\nabla \varphi )^{-1}$| are both Lipschitz with Lipschitz constants proportional to |$\lambda$| and |$\lambda ^{-1}$|⁠. Therefore, the previous inequality is equivalent to
Let |$z_j = |K_j|^{-1} \int _{K_j} x \: \textrm{d}x$| be the barycenter of |$K_j$| with respect to the Lebesgue measure. An argument similar to the proof of Corollary 4.2 yields
where |$y_j = \nabla \varphi _h(x)$| is the constant slope for |$x \in K_j$|⁠. Exploiting again that |$T=\nabla \varphi$| is Lipschitz, we see that (4.10) leads to
(4.11)
This looks similar to (4.8), but involving simplices |$K_j$| instead of Laguerre cells |$F_i$|⁠; (4.8) and (4.11) are, however, intrinsically different. Several comments are in order.
  • |$\bullet$|

    The Laguerre cell |$F_i$| is the convex hull of all vectors |$y_j=\nabla \varphi |_{K_j}$| where |$K_j\in \mathcal{T}_h$| are the simplices containing |$x_i$|⁠, the so-called star (or patch) |$\omega _i$| (Nochetto & Zhang, 2018, Section 5.3). Therefore, |$m_i$| in (4.8) can be replaced by any |$y_j$| for |$K_j\subset \omega _i$| because of the occurrence of |${\operatorname{diam}} (F_i)$| in (4.8). However, there is no immediate relation between |$f_i=\nu (F_i)$| and |$|\omega _i|$| or |$\mu (\omega )$|⁠.

  • |$\bullet$|

    Both estimates (4.8) and (4.11) require a lower positive bound for |$g$|⁠. However, (4.5) just entails |$\lambda$|-regularity of |$(\mu ,\nu ,\varphi )$|⁠, but not a lower bound on either |$g$| or |$f$|⁠; note that the assumption |$f_i>0$| in Theorem 4.1 is for convenience because if |$f_i=0$| we could simply drop the Dirac mass at |$x_i$|⁠.

  • |$\bullet$|

    The estimate (4.5) applies to discrete transport plans and extends to fully discrete schemes; see Section 5. It is not clear what a fully discrete version of (4.11) could be if the measure |$\nu$| is further discretized into a sum of Dirac measures.

  • |$\bullet$|

    The derivation of Theorem 4.1 and Corollary 4.2 is purely analytical and hinges on the quantitative stability estimates of Gigli (2011). In contrast, the proof of (4.11) in Berman (2018) uses a complexification argument to deduce estimates from well-known inequalities in Kähler geometry and pluripotential theory.

  • |$\bullet$|

    Both (4.8) and (4.11) are meaningful ways to measure the error between the continuous optimal map |$T$| and the semidiscrete one |$T_h$|⁠. Our error notion is natural in dealing with optimal transport maps and plans, while the error notion in Berman (2018) is perhaps more natural in the setting of Monge–Ampère equations.

5. Error estimates for fully discrete schemes

If we approximate both measures |$\mu , \nu$| by discrete measures |$\mu _h = \sum _{i=1}^N f_i \delta _{x_i}$| and |$\nu _h = \sum _{j=1}^M g_j \delta _{y_j}$|⁠, respectively, then we can consider the following linear programming problem to approximate the optimal transport map |$T$| (Cuturi, 2013; Schmitzer & Schnörr, 2013; Benamou et al., 2015; Oberman & Ruan, 2015; Schmitzer, 2016): find the discrete measure |$\gamma _h=\sum _{i=1}^N\sum _{j=1}^M\gamma _{h,ij} \delta _{x_i} \delta _{y_j}$| such that
(5.1)
where |$c_{ij} = |x_i - y_j|^2$|⁠. One thing worth pointing out here is that the solution of the above problem may not be unique even though the continuous problem has a unique optimal map |$T$| (and plan |$\gamma$|⁠); see Remark 3.7 (non-uniqueness). The minimum transport cost of (5.1) is equal to |$W^2_2(\mu _h, \nu _h)$| according to the definition, but in practice we may not get an exact optimal plan |$\gamma _h$|⁠, whence the cost |$\sum _{i,j} \gamma _{h,ij} c_{ij}$| might be larger than |$W^2_2(\mu _h, \nu _h)$|⁠.
Since the discrete plan |$\gamma _h$| may not be induced by a map, i.e. for some |$i$| there might exist more than one |$j$| such that |$\gamma _{h,ij}> 0$|⁠, one may need to define a map |$T_h$| from |$\gamma _h$| to approximate the optimal transport map |$T$| between |$\mu$| and |$\nu$|⁠. One way is to use a barycentric projection (Oberman & Ruan, 2015, Definition 5) and define |$T_h$| to be
(5.2)
This is a discrete counterpart of (4.6). In general, the quantity |$T_h(x_i)$| may not belong to the set |$\{y_j: j=1,\cdots ,M\}$|⁠. The following theorem quantifies the errors committed in approximating the optimal plan |$\gamma$| by |$\gamma _h$| and the optimal map |$T$| by the map |$T_h$| generated from |$\gamma _h$|⁠.

 

Theorem 5.1
(convergence rate for fully discrete schemes).
Let the triple |$(\mu ,\nu ,\varphi )$| be |$\lambda$|-regular for |$\lambda> 0$| where the measures |$\mu$| to |$\nu$| have non-negative densities |$f$| and |$g$|⁠. Let |$T = \nabla \varphi$| be the optimal transport map from |$\mu$| to |$\nu$|⁠. Let the approximations |$\mu _h = \sum _{i=1}^N f_i \delta _{x_i}$|⁠, |$\nu _h = \sum _{j=1}^M g_j \delta _{y_j}$| of |$\mu , \nu$| satisfy |$W_2(\mu ,\mu _h), W_2(\nu ,\nu _h) \le C_1h$|⁠. If |$\gamma _h = \sum _{i,j=1}^{N,M}\gamma _{h,ij} \delta _{x_i} \delta _{y_j}$|  |$\in \varPi (\mu _h, \nu _h)$| is a discrete transport plan so that
then for |$h \le \min \{\lambda ,\lambda ^{-1}\}$| we have
(5.3)
where |$C$| is a constant depending on |$(\mu , \nu , C_1, C_2)$|⁠. In addition, the Wasserstein distance between |$\gamma$| and |$\gamma _h$| satisfies
(5.4)
and the map |$T_h$| generated by |$\gamma _h$| according to (5.2) verifies
(5.5)

 

Proof.
The equality in (5.3) follows from the definition of |$\gamma _h=\sum _{i,j=1}^{N,M}\gamma _{h,ij} \delta _{x_i} \delta _{y_j}$|⁠. By letting |$\alpha _h \in \varPi (\mu ,\mu _h), \beta _h \in \varPi (\nu _h, \nu )$| be the corresponding optimal transport plans between |$\mu ,\mu _h$| and |$\nu ,\nu _h$|⁠, the inequality in (5.3) is a direct consequence of (3.11) in Corollary 3.8 (perturbation of transport plans in |$L^2$|⁠) because
and |$h\le \min \{\lambda ,\lambda ^{-1}\}$|⁠. Similarly, (5.4) follows from Corollary 3.9 (perturbation of transport plans in Wasserstein metric). The proof of (5.5) is a discrete version of Corollary 4.2 (convergence rate for center of mass and Laguerre cells). We first notice that (5.1), together with the definition (5.2) of |$T_h$|⁠, yields
This implies the orthogonality relation
whence
Finally, we combine this equality and (5.3) to arrive at
This shows (5.5) and thus finishes the proof of this theorem.

Theorem 5.1 (convergence rates for fully discrete schemes) is the first result known to us that quantitatively measures the errors between |$\gamma$| and |$\gamma _h$| and between |$T$| and |$T_h$|⁠. The convergence rate |$O(h^{1/2})$| coincides with that for semidiscrete schemes developed in Section 4 and first proved by Berman (2018) for a different error notion. Moreover, Theorem 5.1 reveals that it is enough to solve the linear programming problem (5.1) approximately provided that the cost for |$\gamma _h$| is within |$O(h)$| from the optimal cost. Therefore, approximate techniques from Cuturi (2013); Benamou et al. (2015); Oberman & Ruan (2015) are relevant in practice.

It is also worth pointing out that, according to Theorem 5.1, although |$\gamma _h$| is not generally sparse it must be close to the sparse plan |$\widetilde{\gamma }_h:= (S_h)_{\#\mu _h}$|⁠, the pushforward of |$\mu _h$| by the map |$S_h:=(id,T_h): \{x_i\}_{i=1}^N\subset X \to X\times Y$| and given by
for all measurable sets |$A \subset X\times Y$|⁠. We hope this observation might provide insight on acceleration processes for solving problem (5.1) that take advantage of sparsity of |$\gamma _h$|⁠, as shown for instance in Oberman & Ruan (2015).

Funding

R.H. Nochetto was partially supported by the National Science Foundation Grants DMS-1411808 and DMS-1908267. W. Li was partially supported by National Science Foundation Grant DMS-1411808 and the Patrick and Marguerite Sung Fellowship in Mathematics of the University of Maryland.

References

Angenent
,
S.
,
Haker
,
S.
&
Tannenbaum
,
A.
(
2003
)
Minimizing flows for the Monge–Kantorovich problem
.
SIAM J. Math. Anal.
,
35
,
61
97
.

Aurenhammer
,
F.
,
Hoffmann
,
F.
&
Aronov
,
B.
(
1998
)
Minkowski-type theorems and least-squares clustering
.
Algorithmica
,
20
,
61
76
.

Azé
,
D.
&
Penot
,
J.-P.
(
1995
)
Uniformly convex and uniformly smooth convex functions
.
Ann. Fac. Sci. Toulouse Math. (6)
,
4
,
705
730
.

Beiglböck
,
M.
,
Henry-Labordère
,
P.
&
Penkner
,
F.
(
2013
)
Model-independent bounds for option prices—a mass transport approach
.
Finance Stoch.
,
17
,
477
501
.

Benamou
,
J.-D.
&
Brenier
,
Y.
(
2000
)
A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem
.
Numer. Math.
,
84
,
375
393
.

Benamou
,
J.-D.
,
Carlier
,
G.
,
Cuturi
,
M.
,
Nenna
,
L.
&
Peyré
,
G.
(
2015
)
Iterative Bregman projections for regularized transportation problems
.
SIAM J. Sci. Comput.
,
37
,
A1111
A1138
.

Benamou
,
J.-D.
&
Duval
,
V.
(
2019
)
Minimal convex extensions and finite difference discretization of the quadratic Monge–Kantorovich problem
.
European J. Appl. Math.
,
30
,
1041
1078
.

Benamou
,
J.-D.
,
Froese
,
B. D.
&
Oberman
,
A. M.
(
2014
)
Numerical solution of the optimal transportation problem using the Monge–Ampère equation
.
J. Comput. Phys.
,
260
,
107
126
.

Berman
,
R. J.
(
2018
)
Convergence rates for discretized Monge–Ampère equations and quantitative stability of optimal transport
.
Preprint available at arXiv:1803.00785
.

Brenier
,
Y.
(
1987
)
Décomposition polaire et réarrangement monotone des champs de vecteurs
.
C. R. Acad. Sci. Paris Sér. I Math.
,
305
,
805
808
.

Brenier
,
Y.
(
1989
)
The least action principle and the related concept of generalized flows for incompressible perfect fluids
.
J. Amer. Math. Soc.
,
2
,
225
255
.

Burkard
,
R.
,
Dell’Amico
,
M.
&
Martello
,
S.
(
2012
)
Assignment Problems, Revised Reprint
, vol. 106.
Philadelphia, PA, USA
:
Society for Industrial and Applied Mathematics (SIAM)
.

Caffarelli
,
L. A.
(
1992a
)
Boundary regularity of maps with convex potentials
.
Comm. Pure Appl. Math.
,
45
,
1141
1151
.

Caffarelli
,
L. A.
(
1992b
)
The regularity of mappings with a convex potential
.
J. Amer. Math. Soc.
,
5
,
99
104
.

Caffarelli
,
L. A.
(
1996
)
Boundary regularity of maps with convex potentials, II
.
Ann. of Math. (2)
,
144
,
453
496
.

Carlier
,
G.
&
Santambrogio
,
F.
(
2005
)
A variational model for urban planning with traffic congestion
.
ESAIM Control Optim. Calc. Var.
,
11
,
595
613
.

Chiappori
,
P.-A.
,
McCann
,
R. J.
&
Nesheim
,
L. P.
(
2010
)
Hedonic price equilibria, stable matching, and optimal transport: equivalence, topology, and uniqueness
.
Econom. Theory
,
42
,
317
354
.

Cullen
,
M.
&
Maroofi
,
H.
(
2003
)
The fully compressible semi-geostrophic system from meteorology
.
Arch. Rational Mech. Anal.
,
167
,
309
336
.

Cullen
,
M. J. P.
&
Douglas
,
R. J.
(
1999
)
Applications of the Monge–Ampère equation and Monge transport problem to meteorology and oceanography
.
Monge Ampère Equation: Applications to Geometry and Optimization (Deerfield Beach, FL, 1997)
.
Contemp. Math.
, vol. 226.
Providence, RI
:
American Mathematical Society
, pp.
33
53
.

Cuturi
,
M.
(
2013
)
Sinkhorn distances: lightspeed computation of optimal transport
.
Advances in Neural Information Processing Systems 26
, (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani & K. Q. Weinberger eds).
Red Hook, NY, United States
:
Curran Associates, Inc.
pp.
2292
2300
.

Figalli
,
A.
(
2017
)
The Monge–Ampère Equation and Its Applications
.
Zurich Lectures in Advanced Mathematics
.
Zürich
:
European Mathematical Society (EMS)
.

Froese
,
B. D.
(
2012
)
A numerical method for the elliptic Monge–Ampère equation with transport boundary conditions
.
SIAM J. Sci. Comput.
,
34
,
A1432
A1459
.

Gigli
,
N.
(
2011
)
On Hölder continuity-in-time of the optimal transport map towards measures along a curve
.
Proc. Edinb. Math. Soc. (2)
,
54
,
401
409
.

Gutiérrez
,
C. E.
(
2016
)
The Monge–Ampère Equation
.
Progress in Nonlinear Differential Equations and Their Applications
, vol. 89, 2nd edn.
Birkhäuser
:
Springer
.

Gutiérrez
,
C. E.
&
Huang
,
Q.
(
2009
)
The refractor problem in reshaping light beams
.
Arch. Rational Mech. Anal.
,
193
,
423
.

Hamfeldt
,
B. F.
(
2019
)
Convergence framework for the second boundary value problem for the Monge–Ampère equation
.
SIAM J. Numer. Anal.
,
57
,
945
971
.

Kantorovich
,
L. V.
(
1942
)
On the translocation of masses
.
C. R. (Doklady) Acad. Sci. URSS (N.S.)
,
37
,
199
201
.

Kitagawa
,
J.
,
Mérigot
,
Q.
&
Thibert
,
B.
(
2019
)
Convergence of a Newton algorithm for semi-discrete optimal transport
.
J. Eur. Math. Soc. (JEMS)
,
21
,
2603
2651
.

Lévy
,
B.
(
2015
)
A numerical algorithm for |$L_2$| semi-discrete optimal transport in 3D
.
ESAIM Math. Model. Numer. Anal.
,
49
,
1693
1715
.

Lindsey
,
M.
&
Rubinstein
,
Y. A.
(
2017
)
Optimal transport via a Monge–Ampère optimization problem
.
SIAM J. Math. Anal.
,
49
,
3073
3124
.

Mérigot
,
Q.
(
2011
)
A multiscale approach to optimal transport
.
Comput. Graph. Forum
,
30
,
1583
1592
.
Wiley Online Library
.

Monge
,
G.
(
1781
)
Mémoire sur la théorie des déblais et des remblais
.
Histoire de l’Académie Royale des Sciences de Paris
.

Neilan
,
M.
&
Zhang
,
W.
(
2018
)
Rates of convergence in |${\mathrm{W}}_{\mathrm{p}}^2$|-norm for the Monge–Ampère equation
.
SIAM J. Numer. Anal.
,
56
,
3099
3120
.

Nochetto
,
R. H.
&
Zhang
,
W.
(
2018
)
Discrete ABP estimate and convergence rates for linear elliptic equations in non-divergence form
.
Found. Comput. Math.
,
18
,
537
593
.

Nochetto
,
R. H.
&
Zhang
,
W.
(
2019
)
Pointwise rates of convergence for the Oliker–Prussner method for the Monge–Ampère equation
.
Numer. Math.
,
141
,
253
288
.

Oberman
,
A. M.
&
Ruan
,
Y.
(
2015
)
An efficient linear programming method for optimal transportation
.
Preprint available at arXiv:1509.03668
.

Oliker
,
V. I.
&
Prussner
,
L. D.
(
1989
)
On the numerical solution of the equation |$({\partial }^2\mathrm{z}/\partial{\mathrm{x}}^2 )({\partial }^2\mathrm{z}/\partial{\mathrm{y}}^2)-{(({\partial }^2\mathrm{z}/\mathrm{\partial x\partial y}))}^2=\mathrm{f}$| and its discretizations. I
.
Numer. Math.
,
54
,
271
293
.

Papadakis
,
N.
,
Peyré
,
G.
&
Oudet
,
E.
(
2014
)
Optimal transport with proximal splitting
.
SIAM J. Imaging Sci.
,
7
,
212
238
.

Prins
,
C. R.
,
ten Thije Boonkkamp
,
J. H. M.
,
Van Roosmalen
,
J.
,
Jzerman
,
W. L.
&
Tukker
,
T. W.
(
2014
)
A Monge–Ampère-solver for free-form reflector design
.
SIAM J. Sci. Comput.
,
36
,
B640
B660
.

Rabin
,
J.
,
Delon
,
J.
&
Gousseau
,
Y.
(
2010
)
Regularization of transportation maps for color and contrast transfer
.
2010 17th IEEE International Conference on Image Processing (ICIP)
.
IEEE
, pp.
1933
1936
.

Rubner
,
Y.
,
Tomasi
,
C.
&
Guibas
,
L. J.
(
2000
)
The earth mover’s distance as a metric for image retrieval
.
Int. J. Comput. Vis.
,
40
,
99
121
.

Santambrogio
,
F.
(
2015
)
Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling
.
Progress in Nonlinear Differential Equations and Their Applications
, vol. 87.
Cham
:
Birkhäuser/Springer
.

Schmitzer
,
B.
(
2016
)
A sparse multiscale algorithm for dense optimal transport
.
J. Math. Imaging Vision
,
56
,
238
259
.

Schmitzer
,
B.
&
Schnörr
,
C.
(
2013
)
A hierarchical approach to optimal transport
.
International Conference on Scale Space and Variational Methods in Computer Vision
.
Springer
, pp.
452
464
.

Villani
,
C.
(
2009
)
Optimal Transport: Old and New
.
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]
, vol. 338.
Berlin
:
Springer
.

Xia
,
Q.
(
2003
)
Optimal paths related to transport problems
.
Commun. Contemp. Math.
,
5
,
251
279
.

Author notes

Dedicated to the memory of John W. Barrett

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)