Interpolating between the Jaccard distance and an analogue of the normalized information distance. (English) Zbl 07638203

Summary: Jiménez, Becerra and Gelbukh (2013) defined a family of ‘symmetric Tversky ratio models’ \(S_{\alpha ,\beta}\), \(0\leq\alpha\leq1\), \(\beta > 0\). Each function \(D_{\alpha ,\beta} = 1-S_{\alpha, \beta}\) is a semimetric on the powerset of a given finite set.
We show that \(D_{\alpha, \beta}\) is a metric if and only if \(0\leq \alpha \leq \frac{1}{2}\) and \(\beta\geq1/(1-\alpha)\). This result is formally verified in the Lean proof assistant.
The extreme points of this parametrized space of metrics are \(\mathcal{V}_1 = D_{1/2, 2}\), the Jaccard distance, and \(\mathcal{V}_\infty = D_{0, 1}\), an analogue of the normalized information distance of M. Li, Chen, X. Li, Ma and Vitányi (2004).
As a second, more general interpolation, we also show that \(\mathcal{V}_p\) is a metric for \(1\leq p\leq \infty\), where \begin{gather*} \varDelta_p(A,B) = (|B\setminus A|^p + |A\setminus B|^p)^{1/p}, \\ \mathcal{V}_p(A, B) = \frac{\varDelta_p(A,B)}{|A\cap B| + \varDelta_p(A, B)}. \end{gather*}
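
For illustration only, the definition of \(\mathcal{V}_p\) can be sketched in Python and the triangle inequality checked by brute force on a small powerset. This is a numerical sanity check under stated assumptions, not the Lean verification mentioned in the summary. The function names `delta_p`, `v_p`, `powerset` and `triangle_holds` are hypothetical. Two conventions are assumed rather than taken from the summary: the value \(0\) when \(A = B = \emptyset\) (where the quotient is \(0/0\)), and the limit \(\max(|B\setminus A|, |A\setminus B|)\) for \(\varDelta_\infty\).

```python
import math
from itertools import combinations


def delta_p(a, b, p):
    """Delta_p(A, B) = (|B \\ A|^p + |A \\ B|^p)^(1/p); p = inf is read as the max (assumed limit)."""
    x, y = len(b - a), len(a - b)
    if math.isinf(p):
        return float(max(x, y))
    return (x ** p + y ** p) ** (1.0 / p)


def v_p(a, b, p):
    """V_p(A, B) = Delta_p / (|A ∩ B| + Delta_p), with 0/0 taken as 0 (assumed convention)."""
    d = delta_p(a, b, p)
    denom = len(a & b) + d
    return 0.0 if denom == 0 else d / denom


def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def triangle_holds(universe, p, tol=1e-12):
    """Brute-force check of V_p(A, C) <= V_p(A, B) + V_p(B, C) over all triples of subsets."""
    sets = powerset(universe)
    return all(
        v_p(a, c, p) <= v_p(a, b, p) + v_p(b, c, p) + tol
        for a in sets
        for b in sets
        for c in sets
    )


if __name__ == "__main__":
    A, B = frozenset({1, 2, 3}), frozenset({2, 3, 4})
    print(v_p(A, B, 1))         # p = 1 coincides with the Jaccard distance
    print(v_p(A, B, math.inf))  # p = inf uses max(|B \ A|, |A \ B|)
    print(all(triangle_holds(range(4), p) for p in (1, 2, math.inf)))
```

For \(p = 1\) the numerator is \(|A \mathbin{\triangle} B|\) and the denominator is \(|A\cup B|\), so the first printed value agrees with the Jaccard distance of the two example sets. An exhaustive check on a four-element universe is evidence, not a proof.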

MSC:

03-XX Mathematical logic and foundations
68-XX Computer science