
Matrix versions of the Hellinger distance. (English) Zbl 1420.15016

Lett. Math. Phys. 109, No. 8, 1777-1804 (2019); correction ibid. 109, No. 12, 2779-2781 (2019).
Let \((p_{1},p_{2},\dots,p_{n})\) and \((q_{1},q_{2},\dots,q_{n})\) be two probability distributions. Their Hellinger distance is defined as \(\left\{ \sum_{i}\left(\frac{1}{2}(p_{i}+q_{i})-\sqrt{p_{i}q_{i}}\right)\right\} ^{1/2}\). In terms of the diagonal matrices \(P:=\mathrm{diag}(p_{1},p_{2},\dots,p_{n})\) and \(Q:=\mathrm{diag}(q_{1},q_{2},\dots,q_{n})\), this can be written as \[ d_{H}(P,Q):=\sqrt{\operatorname{tr}\mathcal{A}(P,Q)-\operatorname{tr}\mathcal{G}(P,Q)}, \] where \(\mathcal{A}(P,Q)=\frac{1}{2}(P+Q)\) and \(\mathcal{G}(P,Q)=(PQ)^{1/2}\) are the arithmetic and geometric means of \(P\) and \(Q\), respectively. The goal of the present paper is to examine extensions of this definition to general \(n\times n\) complex positive semidefinite matrices.
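The scalar definition can be checked with a few lines of Python. This is an illustrative sketch, not code from the paper; the function name `hellinger` and the sample vectors are hypothetical, and the normalisation (no factor 2 in front of the geometric-mean term) is the one used above.

```python
import math


def hellinger(p, q):
    """Hellinger distance between probability vectors p and q, normalised as
    d_H = { sum_i [ (p_i + q_i)/2 - sqrt(p_i * q_i) ] }^{1/2}."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # tr A(P, Q) - tr G(P, Q) for the diagonal matrices P = diag(p), Q = diag(q)
    s = sum(0.5 * (pi + qi) - math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(s, 0.0))  # clip tiny negative rounding error


# Hypothetical example distributions
p = [0.2, 0.3, 0.5]
q = [0.1, 0.6, 0.3]
print(hellinger(p, q))
```

Each summand equals \(\frac{1}{2}(\sqrt{p_i}-\sqrt{q_i})^2\), so the sum is nonnegative and \(d_H\) is \(1/\sqrt{2}\) times the Euclidean distance between the vectors \((\sqrt{p_i})\) and \((\sqrt{q_i})\); the `max(s, 0.0)` guard only absorbs floating-point rounding.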
Although \(\mathcal{A}\) has a unique natural definition, there are several ways to define the square root of a product of positive semidefinite matrices, and hence several candidates for \(\mathcal{G}\). Let \(A\) and \(B\) be arbitrary positive semidefinite matrices, and let \(A^{1/2}\) and \(B^{1/2}\) be their (unique) positive semidefinite square roots. Write \(\Vert\cdot\Vert_{2}\) for the Frobenius norm and \(\mathbb{P}\) for the set of \(n\times n\) positive definite matrices. The following functions are considered: \(d_{1}(A,B):=\left\Vert A^{1/2}-B^{1/2}\right\Vert _{2}=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}A^{1/2}B^{1/2}\right\} ^{1/2}\); \(d_{2}(A,B):=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}(A^{1/2}BA^{1/2})^{1/2}\right\} ^{1/2}\); \(d_{3}(A,B):=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}A\#B\right\} ^{1/2}\), where \(A\#B:=A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}\) (for invertible \(A\)); and \(d_{4}(A,B):=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}\mathcal{L}(A,B)\right\} ^{1/2}\), where \(\mathcal{L}(A,B):=\exp\left(\frac{1}{2}(\log A+\log B)\right)\) (defined only for positive definite matrices). The functions \(d_{1}\) and \(d_{2}\) are metrics (\(d_{2}\) is known as the Bures distance or Bures-Wasserstein metric), whereas \(d_{3}\) and \(d_{4}\) fail the triangle inequality and so are not metrics. The main results of the paper concern the functions \(\Phi_{k}(A,B):=d_{k}(A,B)^{2}\) for \(k=3\) and \(4\). In particular, it is shown that \(\Phi_{3}\) and \(\Phi_{4}\) are divergence functions (see S.-i. Amari [Information geometry and its applications. Tokyo: Springer (2016; Zbl 1350.94001)]) and have useful convexity properties; for example (Theorem 8), for each \(A\in\mathbb{P}\) the function \(X\longmapsto\Phi_{4}(A,X)\) is strictly convex on \(\mathbb{P}\).
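A minimal NumPy sketch of the four functions follows, assuming real symmetric or complex Hermitian inputs with \(A\) positive definite (and also \(B\) for \(d_4\)). It is an illustration only, not the authors' implementation: matrix square roots, logarithms and exponentials are evaluated through the spectral decomposition \(M=U\,\mathrm{diag}(w)\,U^{*}\) rather than a dedicated library routine, and all helper names (`_fun_herm`, `sqrtm_psd`, `logm_pd`, `expm_herm`, `geo_mean`, `log_euclid_mean`, `d_k`) are hypothetical.

```python
import numpy as np


def _fun_herm(M, f):
    """Return f(M) for a Hermitian matrix M, computed from M = U diag(w) U^*."""
    M = (M + M.conj().T) / 2  # re-symmetrise to remove rounding asymmetry
    w, U = np.linalg.eigh(M)
    return (U * f(w)) @ U.conj().T  # U diag(f(w)) U^*


def sqrtm_psd(M):
    """Positive semidefinite square root; tiny negative eigenvalues are clipped."""
    return _fun_herm(M, lambda w: np.sqrt(np.clip(w, 0.0, None)))


def logm_pd(M):
    """Principal logarithm of a positive definite matrix."""
    return _fun_herm(M, np.log)


def expm_herm(M):
    """Matrix exponential of a Hermitian matrix."""
    return _fun_herm(M, np.exp)


def geo_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}; requires A invertible."""
    Ah = sqrtm_psd(A)
    Aih = np.linalg.inv(Ah)
    return Ah @ sqrtm_psd(Aih @ B @ Aih) @ Ah


def log_euclid_mean(A, B):
    """L(A, B) = exp((log A + log B) / 2); A and B must be positive definite."""
    return expm_herm(0.5 * (logm_pd(A) + logm_pd(B)))


def d_k(A, B, k):
    """d_k(A, B) = {tr(A + B) - 2 tr G_k(A, B)}^{1/2} for k = 1, 2, 3, 4."""
    if k == 1:
        G = sqrtm_psd(A) @ sqrtm_psd(B)
    elif k == 2:
        Ah = sqrtm_psd(A)
        G = sqrtm_psd(Ah @ B @ Ah)
    elif k == 3:
        G = geo_mean(A, B)
    elif k == 4:
        G = log_euclid_mean(A, B)
    else:
        raise ValueError("k must be 1, 2, 3 or 4")
    phi = np.trace(A + B).real - 2.0 * np.trace(G).real  # this is Phi_k(A, B)
    return np.sqrt(max(phi, 0.0))  # clip rounding noise below zero
```

For commuting arguments such as \(P=\mathrm{diag}(p_i)\) and \(Q=\mathrm{diag}(q_i)\), all four choices of \(\mathcal{G}\) reduce to \(\mathrm{diag}(\sqrt{p_iq_i})\), so `d_k(P, Q, k)` returns the same value for every \(k\), namely \(\sqrt{2}\,d_H(P,Q)\) (the factor comes from the coefficient 2 in the definitions of \(d_k\)). The four quantities separate only when \(A\) and \(B\) do not commute; squaring the return value gives \(\Phi_k(A,B)\).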


15A60 Norms of matrices, numerical range, applications of functional analysis to matrix theory
51K05 General theory of distance geometry
15B48 Positive matrices and their generalizations; cones of matrices
49K35 Optimality conditions for minimax problems
94A17 Measures of information, entropy
81P45 Quantum information, communication, networks (quantum-theoretic aspects)


Citations: Zbl 1350.94001


