The infinity Wasserstein distance $W_\infty$ and the weak topology

Question

Let $X$ be a compact metric space (without isolated points). The $\infty$-Wasserstein distance $W_\infty$ on the space of Borel probability measures on $X$ can be described as $$W_\infty(\mu,\nu) = \inf\{r>0 \mid \mu(U)\le\nu(U_r)\ \text{ for all open } U\subseteq X\},$$ where $U_r=\{x\in X \mid d(x,U)\le r\}$. The topology induced by this distance is in general strictly finer than the weak (or vague) topology. For example, if $X=[0,1]$ and $\delta_x$ denotes the point mass at $x$, the measures $\mu_n=\frac{n-1}{n}\delta_0+\frac{1}{n}\delta_1$ converge weakly to $\delta_0$, but $W_\infty(\mu_n,\delta_0)=1$ for every $n$. On the other hand, $W_\infty$ dominates the Lévy–Prokhorov distance, so $W_\infty$-convergence does imply weak convergence.
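The example can be checked by hand, but here is a minimal Python sketch that does it mechanically. It assumes finitely supported measures on the real line and uses the monotone (quantile) coupling, which is optimal in one dimension for both $W_1$ and $W_\infty$. The function name `monotone_coupling_costs` and the dict representation of a measure are illustrative choices only.

```python
from fractions import Fraction

def monotone_coupling_costs(mu, nu):
    """Return (W_1, W_inf) for finitely supported probability measures on the line.

    mu, nu: dicts mapping a point to its mass. Masses are assumed exact
    (int or Fraction) and to sum to 1 (this is an assumption, not checked).
    """
    # Sort atoms by position and drop zero-mass atoms.
    a = sorted((x, m) for x, m in mu.items() if m > 0)
    b = sorted((x, m) for x, m in nu.items() if m > 0)
    i = j = 0
    rem_a, rem_b = a[0][1], b[0][1]  # mass still to be transported at current atoms
    w1, w_inf = Fraction(0), Fraction(0)
    while True:
        dist = abs(Fraction(a[i][0]) - Fraction(b[j][0]))
        moved = min(rem_a, rem_b)    # mass matched between the two current atoms
        w1 += moved * dist           # average displacement
        w_inf = max(w_inf, dist)     # largest displacement of any positive mass
        rem_a -= moved
        rem_b -= moved
        if rem_a == 0:
            i += 1
            if i == len(a):
                break
            rem_a = a[i][1]
        if rem_b == 0:
            j += 1
            if j == len(b):
                break
            rem_b = b[j][1]
    return w1, w_inf

for n in (2, 10, 100):
    mu_n = {0: Fraction(n - 1, n), 1: Fraction(1, n)}
    print(n, monotone_coupling_costs(mu_n, {0: Fraction(1)}))
```

For each $n$ the first component is $W_1(\mu_n,\delta_0)=\frac1n$, which tends to $0$ (and $W_1$ metrizes weak convergence on a compact space), while the second component stays equal to $1$.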

Question: is $W_\infty$-convergence equivalent to weak convergence if one restricts to fully supported measures?

(By fully supported I mean those measures $\mu$ such that $\mu(U)>0$ for every nonempty open set $U\subseteq X$.)

I believe this is the case at least for $X=[0,1]$, but I just want to make sure it is not obviously ridiculous!

This is a duplicate of a question I asked on stackexchange, but perhaps it is too specialised for that site.

Couldn't you just add Lebesgue measure to both $\mu_n$ and $\delta_0$ (and normalize if you want), and then you have two fully supported measures with weak but not $W_\infty$ convergence for the same reason? — Ronnie Pavlov, Commented Sep 23, 2021 at 13:20
And it shouldn't hold even if you assume your measures are nonatomic; you could just replace your delta-measures by normalized Lebesgue measure over a pair of disjoint closed intervals, and it should have weak but not $W_\infty$ convergence for the same reason as yours. It seems that $W_\infty$ is just a drastically stronger notion of convergence. — Ronnie Pavlov, Commented Sep 23, 2021 at 13:24
But doesn't adding the Lebesgue measure in fact give $W_\infty$ convergence? My intuition for what goes wrong in the example I gave is that any small neighbourhood $U$ of $1$ must be extended all the way to $0$ to get $\delta_0(U_r)>\mu_n(U)$. But with Lebesgue available, you can get away with extending by a small amount. Thanks for your response, and sorry if I am way off the mark. — Baruch Spinoza, Commented Sep 23, 2021 at 14:00
Also, I guess what I really should have said instead of "without isolated points" is "connected". — Baruch Spinoza, Commented Sep 23, 2021 at 14:12
Of course you're right. I figured I was missing something easy. Interesting question! — Ronnie Pavlov, Commented Sep 23, 2021 at 20:16
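The point settled in the comments above, that mixing in Lebesgue measure restores $W_\infty$-convergence on $[0,1]$, can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\nu_n=\frac12(\mu_n+\mathrm{Leb})$ and $\nu=\frac12(\delta_0+\mathrm{Leb})$, uses the one-dimensional identity $W_\infty=\sup_t|Q_{\nu_n}(t)-Q_\nu(t)|$ for the quantile functions $Q$, and approximates the supremum on a grid. The quantile formulas are derived by hand from the CDFs, and all function names are hypothetical.

```python
def quantile_limit(t):
    # nu = (delta_0 + Lebesgue)/2 on [0, 1]: the CDF is 1/2 + x/2 on [0, 1],
    # so the quantile is 0 up to t = 1/2 and then 2t - 1.
    return max(0.0, 2.0 * t - 1.0)

def quantile_n(t, n):
    # nu_n = (mu_n + Lebesgue)/2 with mu_n = (n-1)/n delta_0 + 1/n delta_1:
    # the CDF is (n-1)/(2n) + x/2 on [0, 1) and jumps to 1 at x = 1.
    return min(1.0, max(0.0, 2.0 * t - (n - 1) / n))

def w_inf_estimate(n, grid=100_000):
    # Approximate sup over t in (0, 1) of |Q_n(t) - Q(t)| on grid midpoints.
    ts = ((k + 0.5) / grid for k in range(grid))
    return max(abs(quantile_n(t, n) - quantile_limit(t)) for t in ts)

for n in (2, 10, 100):
    print(n, w_inf_estimate(n))
```

On the interval $\frac12<t<1-\frac1{2n}$ the two quantile functions differ by exactly $\frac1n$, and elsewhere by at most that, so the estimate should come out at $\frac1n$ up to floating-point error, in line with the intuition that the neighbourhood only needs to be extended by a small amount.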

Steve · Accepted Answer · 2021-09-24 09:24:02Z

I interpret your condition that "$X$ has no isolated points" as the following two conditions. First, every ball $B(x, \varepsilon)$ has non-empty interior. This means, in particular, that we can find arbitrarily fine partitions of $X$ into sets that each have non-empty interior. Second, if $X = \Omega_1 \stackrel{.}{\cup} \Omega_2$ for two non-empty sets $\Omega_1, \Omega_2$, then $\inf_{x_1 \in \Omega_1, x_2 \in \Omega_2} d(x_1, x_2) = 0$. I am not entirely sure whether these conditions can be stated in simpler terms, but they don't seem unreasonable to me.

If this is what you had in mind (it's certainly satisfied for $[0, 1]$ or convex and compact subsets of $\mathbb{R}^d$), then I believe the two notions of convergence are indeed equivalent, see the proposed proof below. It's rather lengthy, so I hope I didn't miss anything. Please let me know if the proof seems incomplete.

Let $\mu \in \mathcal{P}(X)$ be fully supported and $(\mu_k)_{k \in \mathbb{N}}$ be a sequence in $\mathcal{P}(X)$ such that $\mu_k$ converges weakly to $\mu$. In particular, $d_P(\mu_k, \mu) \rightarrow 0$ for the Prokhorov metric $d_P$.

Let $\mathcal{F}_1, \mathcal{F}_2, \dots$ be a sequence of successively refined partitions of $X$, i.e., $\mathcal{F}_N = \{\Omega_{N, 1}, \dots, \Omega_{N, N}\}$, $X = \stackrel{.}{\cup}_{i=1}^N \Omega_{N, i}$, such that $\delta(N) := \max_{i=1, \dots, N} \sup_{x, y \in \Omega_{N, i}}d(x, y) \rightarrow 0$ for $N \rightarrow \infty$ and each $\Omega_{N, i}$ has non-empty interior, and choose some fixed $x_{N, i} \in \Omega_{N, i}$. Note that by the "connectedness condition" on $X$, for any non-empty $J \subsetneq \{1, \dots, N\}$, it holds that $\min_{j \in J} \min_{i \not\in J} d(x_{N, i}, x_{N, j}) \leq 2 \delta(N)$.

Define $\bar\mu^N := \sum_{i=1}^N \delta_{x_{N, i}} \mu(\Omega_{N, i})$. It is clear that $W_\infty(\mu, \bar\mu^N) \leq \delta(N)$. The fact that the partitions are refinements implies, in particular, that $\bar\mu^{N_2}(\Omega_{N_1, i}) = \bar\mu^{N_1}(\Omega_{N_1, i})$ for $N_2 \geq N_1$. Define $w(N) := \min_{i=1, \dots, N} \mu(\Omega_{N, i})$, which is positive by assumption on $\mu$.
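To make the bound $W_\infty(\mu, \bar\mu^N) \leq \delta(N)$ concrete, here is a small sketch for one specific case, chosen purely for illustration: $X=[0,1]$, $\mu$ Lebesgue measure, cells $\Omega_{N,i}=[\frac{i}{N},\frac{i+1}{N})$ and representatives $x_{N,i}=\frac{i}{N}$ (taking $N$ to be powers of $2$ makes the partitions refinements of each other). It again relies on the one-dimensional quantile formula for $W_\infty$, and the function names are hypothetical.

```python
import math

def discretized_quantile(t, N):
    # bar-mu^N puts mass 1/N on each left endpoint i/N, i = 0, ..., N-1,
    # so for t in (i/N, (i+1)/N) its quantile is i/N.
    return min(math.floor(t * N), N - 1) / N

def w_inf_to_discretization(N, grid=100_000):
    # The quantile function of Lebesgue measure on [0, 1] is t itself;
    # approximate sup_t |t - Q_N(t)| on grid midpoints.
    ts = ((k + 0.5) / grid for k in range(grid))
    return max(abs(t - discretized_quantile(t, N)) for t in ts)

for N in (2, 4, 8, 16):
    print(N, w_inf_to_discretization(N), "<=", 1 / N)
```

Here $\delta(N)=\frac1N$, and the estimate approaches $\frac1N$ from below: every point of a cell is moved to the cell's left endpoint, so no mass travels further than the cell diameter.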

Note that for any $N_1, N_2 \in \mathbb{N}$ with $N_2 \geq N_1$ and any open $A \subset X$ such that $\bar\mu^{N_2}(A^{3 \delta(N_1)}) < 1$, it holds that $$ \bar\mu^{N_2}(A^{4 \delta(N_1)}) \geq \bar\mu^{N_2}(A^{2 \delta(N_1)}) + w(N_1). $$ The reason is as follows: $\bar\mu^{N_2}(A^{3 \delta(N_1)}) < 1$ implies that there exists some $i \in \{1, \dots, N_1\}$ such that $\Omega_{N_1, i} \cap A^{2 \delta(N_1)} = \emptyset$ (otherwise every cell would meet $A^{2 \delta(N_1)}$ and, having diameter at most $\delta(N_1)$, would lie in $A^{3 \delta(N_1)}$, so that $A^{3 \delta(N_1)} = X$). By the connectedness condition on $X$, such an $i$ can moreover be chosen with $\Omega_{N_1, i} \subset A^{4 \delta(N_1)}$. Since this $\Omega_{N_1, i}$ is disjoint from $A^{2 \delta(N_1)}$ and contained in $A^{4 \delta(N_1)}$, \begin{align} \bar\mu^{N_2}(A^{4 \delta(N_1)}) &\geq \bar\mu^{N_2}(A^{2 \delta(N_1)}) + \bar\mu^{N_2}(\Omega_{N_1, i})\\ &= \bar\mu^{N_2}(A^{2 \delta(N_1)}) + \bar\mu^{N_1}(\Omega_{N_1, i})\\ &\geq \bar\mu^{N_2}(A^{2 \delta(N_1)}) + w(N_1). \end{align}

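Now for $\varepsilon > 0$, we choose $N_1$ large enough so that $\delta(N_1) < \varepsilon$. Let $r(N_1) := \min\{2 \delta(N_1), w(N_1)\}$. Choose $N_2 \geq N_1$ and $k$ large enough so that $d_P(\bar\mu^{N_2}, \mu_k) \leq r(N_1)$; this is possible since $d_P(\bar\mu^{N_2}, \mu) \leq W_\infty(\bar\mu^{N_2}, \mu) \leq \delta(N_2) \rightarrow 0$ and $d_P(\mu, \mu_k) \rightarrow 0$.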

Let $A \subset X$ be open. Either $\bar\mu^{N_2}(A^{3 \delta(N_1)}) = 1$, in which case $\bar\mu^{N_2}(A^{4 \delta(N_1)}) = 1 \geq \mu_k(A)$ holds trivially, or $\bar\mu^{N_2}(A^{3 \delta(N_1)}) < 1$, in which case by the above $$ \bar\mu^{N_2}(A^{4 \delta(N_1)}) \geq \bar\mu^{N_2}(A^{2 \delta(N_1)}) + w(N_1) \geq \bar\mu^{N_2}(A^{r(N_1)}) + r(N_1) \geq \mu_k(A), $$ where the middle step uses $r(N_1) \leq 2 \delta(N_1)$ and $r(N_1) \leq w(N_1)$, and the last step follows since $d_P(\bar\mu^{N_2}, \mu_k) \leq r(N_1)$. Therefore, $W_\infty(\bar\mu^{N_2}, \mu_k) \leq 4 \delta(N_1) \leq 4 \varepsilon$. By the triangle inequality, $W_\infty(\mu, \mu_k) \leq W_\infty(\bar\mu^{N_2}, \mu_k) + W_\infty(\bar\mu^{N_2}, \mu) \leq 4 \varepsilon + \delta(N_2) \leq 5 \varepsilon$, which yields the claim.

Thanks a lot, this all looks good to me. And for $X=[0,1]$ your argument does match with what I had in mind. Very nice! — Baruch Spinoza, Commented Sep 24, 2021 at 11:27
