Embedding-based statistical inference
on generative models

Hayden Helm (Nomic AI & Helivan Research)
Aranyak Acharyya (Johns Hopkins University)
Brandon Duderstadt (Nomic AI)
Youngser Park (Johns Hopkins University)
Carey Priebe (Johns Hopkins University)

Abstract

The recent cohort of publicly available generative models can produce human-expert-level content across a variety of topics and domains. Given a model in this cohort as a base model, methods such as parameter-efficient fine-tuning, in-context learning, and constrained decoding have further increased generative capabilities and improved both computational and data efficiency. Entire collections of derivative models have emerged as a byproduct of these methods, and each of these models has a set of associated covariates, such as a score on a benchmark or an indicator of whether the model has (or had) access to sensitive information, that may or may not be available to the user. For some model-level covariates, it is possible to use “similar” models to predict an unknown covariate. In this paper we extend recent results related to embedding-based representations of generative models – the data kernel perspective space – to classical statistical inference settings. We demonstrate that using the perspective space as the basis of a notion of “similar” is effective for multiple model-level inference tasks.



1 Introduction

Generative models have recently met or surpassed human-level standards on benchmarks across a range of tasks (Nori et al., 2023; Katz et al., 2024; Dubey et al., 2024). While these claims warrant skepticism and further robustness evaluation (Ness et al., 2024), the impressive capabilities have created a competitive environment for training state-of-the-art models and have inspired the development of complementary methods to adapt models to particular use cases. For example, quantizing the weights of a neural model enables a trade-off of model precision with vRAM and disk space (Gholami et al., 2022) while methods such as Low Rank Adaptation (LoRA) (Hu et al., 2021) and prompt-tuning (Lester et al., 2021) enable compute- and data-efficient model adaptation. Other methods such as retrieval-augmented generation (Lewis et al., 2020), model merging (Matena & Raffel, 2022), constrained decoding (Hokamp & Liu, 2017), etc. (Dettmers et al., 2024; Edge et al., 2024), have similarly contributed to the rapid development of a large population of diverse and accessible models.

Each model in the population has an accompanying set of covariates – scores on benchmarks, training mixture proportions, model safety scores, probability of hallucination, etc. – that are a function of the model, the training set, the architecture, the retrieval database, or derivatives thereof. For a given model, the covariate of interest or the function to calculate it may not be available to the user. For example, a proprietary model may have been trained on copyrighted data and the indicator for whether or not the model has had access to the data is unknown. Methods for predicting model-level covariates are necessary in these settings, and others, to fully understand the behavior and properties of a model.

In this paper we extend recent theoretical results for embedding-based representations of generative models (Acharyya et al., 2024) to statistical inference settings. In particular, our results show that the embedding-based representations of a collection of models can be used for consistent inference for a wide class of inference problems. We demonstrate the effectiveness of the representations for three downstream inference tasks, including predicting the presence of sensitive information in a model’s training mixture and predicting a model’s safety. We include empirical investigations of performance sensitivity to hyperparameters required to induce the model representations.

1.1 Background & related work

Our work is an extension of recent theoretical and empirical results on embedding-based representations of generative models (Acharyya et al., 2024), which itself is a continuation of a long line of embedding-based investigations of the inputs and outputs of generative models (Mikolov, 2013; Reimers, 2019; Neelakantan et al., 2022; Patil et al., 2023). Of particular relevance is recent empirical work defining the data kernel (Duderstadt et al., 2023) and investigations into its ability to track the dynamics of interacting models (Helm et al., 2024), in which the experiments demonstrate the ability to parlay the embeddings of a collection of inputs or outputs into useful vector representations of the generative models themselves in both white-box and black-box settings. The results herein further theoretically and empirically validate these findings in the context of statistical inference.

Our work is also related to using embedding-based techniques for inference on complicated objects such as entire mouse connectomes (Wang et al., 2020), physiological data (Chen et al., 2022), and classification distributions (Helm et al., 2021). In each of these settings, the authors define a pairwise distance matrix on the objects and apply multidimensional scaling to obtain vector representations of each of the objects. Once there is one vector representation per object, standard inference methods for the specific task can be used. The method we study herein follows this general formula, with the additional complication that the objects are random mappings.

Lastly, our work is a part of the relatively new literature on inference on generative models. For example, FlashHELM (Perlitz et al., 2023) uses the score on a subset of a benchmark such as HELM (Liang et al., 2022) to quickly identify where a model fits on a leaderboard. More recent work proposes scaling laws to predict the performance of base models on a suite of benchmarks as a function of training FLOPs (Ruan et al., 2024). The method proposed herein does not assume access to a scoring function nor does it assume access to a function of the model’s weights at inference time. Perhaps most importantly, our setting and method can be applied to general model-level prediction scenarios.

2 The Data Kernel Perspective Space

Following Acharyya et al. (2024), we consider a generative model $f$ with weights $W$ and decoding function (temperature, maximum response length, etc.) $\delta$ to be a random map from a query space $\mathscr{Q}$ to a response space $\mathcal{X}$. We let $f_1, \ldots, f_n$ be $n$ such random maps for a shared $\mathscr{Q}$ and $\mathcal{X}$. For example, $f_i$ may be the neural model parameterized by the fixed context augmentation $a_i$, the retrieval vector database $V_i$, the fine-tuned model with weights $W + \Delta W_i$ or the model with decoding function $\delta_i$.
Given a collection of queries $Q = \{q_1, \ldots, q_m\}$ we let $F_{ij}$ denote the distribution on the response space for fixed $f_i$ and $q_j$ and assume model responses $f_i(q_j)_1, \ldots, f_i(q_j)_r = x_{ij1}, \ldots, x_{ijr}$ are i.i.d. realizations from $F_{ij}$.

Our goal is to induce a representation of the neural models that captures information relevant to model-level statistical inference. We abuse notation by letting $x$ represent a realization of a random variable and the random variable itself and include necessary context when there is potential for confusion.

We let $g: \mathcal{X} \to \mathbb{R}^p$ be an embedding function that maps a response to a $p$-dimensional real-valued vector and, for notational convenience, let $\bar{x}_{ij} = \frac{1}{r} \sum_{k=1}^{r} g(x_{ijk})$ be the average of the embedded responses. We use $\bar{X}_i \in \mathbb{R}^{m \times p}$ to denote the matrix where the $j$th row is $\bar{x}_{ij}$ and view this matrix as a representation of model $f_i$ with respect to $Q$ and $g$.

We define $D$ as the $n \times n$ pairwise distance matrix with entries

$$D_{ii'} = \frac{1}{m} \left\| \bar{X}_i - \bar{X}_{i'} \right\|_F. \tag{1}$$

Each entry $D_{ii'}$ is the average difference between the average embedded responses of model $f_i$ and model $f_{i'}$ and captures the difference in average model behavior with respect to $Q$ and $g$. With the form described by Eq. (1), $D$ is a Euclidean distance matrix and the multidimensional scaling (MDS) of $D$ yields $d$-dimensional Euclidean representations of the matrices $\bar{X}_i$ (Torgerson, 1952) and, thus, Euclidean representations of the models $f_1, \ldots, f_n$ with respect to $Q$ and $g$. Letting $\widehat{\psi} := \text{MDS}(D)$, we refer to the $i$th row of $\widehat{\psi}$, denoted $\widehat{\psi}_i \in \mathbb{R}^d$, as the perspective of model $f_i$ and the entire configuration $\widehat{\psi}$ as the data kernel perspective space (DKPS).
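The construction in this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the embedded responses are already available as an array of shape (n, m, r, p), it uses classical (Torgerson) MDS via double centering and an eigendecomposition, and the function names and the random stand-in data are hypothetical.

```python
import numpy as np


def average_embeddings(embedded):
    """Map embedded responses of shape (n, m, r, p) to X_bar of shape (n, m, p)."""
    return embedded.mean(axis=2)


def distance_matrix(x_bar):
    """D[i, i'] = (1/m) * ||X_bar_i - X_bar_i'||_F, following Eq. (1)."""
    n, m, _ = x_bar.shape
    flat = x_bar.reshape(n, -1)  # Frobenius norm == Euclidean norm of the flattened matrix
    diff = flat[:, None, :] - flat[None, :, :]
    return np.linalg.norm(diff, axis=-1) / m


def classical_mds(dist, d):
    """Classical MDS: double-center the squared distances, keep the top-d eigenpairs."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:d]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # guard against tiny negative values
    return eigvecs[:, top] * scale


# Random stand-in for g(x_ijk); real use would embed actual model responses.
rng = np.random.default_rng(0)
n, m, r, p, d = 6, 10, 5, 4, 2
embedded = rng.normal(size=(n, m, r, p))

x_bar = average_embeddings(embedded)
D = distance_matrix(x_bar)
psi_hat = classical_mds(D, d)  # row i is the perspective of model f_i
print(psi_hat.shape)
```

Because $D$ is a Euclidean distance matrix, keeping all $n - 1$ dimensions in the MDS step reproduces the pairwise distances exactly (up to floating-point error); choosing a smaller $d$ trades fidelity for a lower-dimensional representation.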

2.1 Analytical properties of the data kernel perspective space

While the perspectives are Euclidean objects, it is not possible to comment on their properties without imposing constraints on the $F_{ij}$. To facilitate analysis we assume each query is from the query distribution $G$ with support $\mathscr{Q}$ and let $\mu_i \in \mathbb{R}^{m \times p}$ denote the matrix whose $j$th row is $\mathbb{E}_{x \sim F_{ij}}\left[g(x)\right]$. We let $\Delta$ be such that $\Delta_{ii'} = \frac{1}{m} \|\mu_i - \mu_{i'}\|_F$.
For settings where $m$ grows and $q_j \overset{iid}{\sim} G$, we refer to $\mu_i$ as $\mu_i(G)$ and let $\Delta^*(G)$ be such that $\Delta^*_{ii'}(G) = \lim_{m \to \infty} \frac{1}{m} \|\mu_i - \mu_{i'}\|_F$. Finally, we assume $g$ is bounded. (Footnote 1: Boundedness is typically satisfied in practice for well-defined $g$: for language models, an element of $\mathcal{X}$ is a finite sequence from a finite vocabulary; for text-to-image models $\mathcal{X} = \{0, \ldots, 255\}^3$ is finite.)

With $q_1, \ldots, q_m \overset{iid}{\sim} G$, and $x_{ij1}, \ldots, x_{ijr} \overset{iid}{\sim} F_{ij}$, we have

$$D_{ii'} = \frac{1}{m} \left\| \bar{X}_i - \bar{X}_{i'} \right\|_F \to \Delta^*_{ii'}(G) \tag{2}$$

with high probability as $m, r \to \infty$ for all $(i, i') \in \{1, \ldots, n\} \times \{1, \ldots, n\}$. Under technical assumptions described in Acharyya et al. (2024), there exists $\psi(G) := \text{MDS}(\Delta^*(G))$ such that $\widehat{\psi} \to \psi(G)$. The configuration $\psi(G)$ captures the true geometry of the mean discrepancies of the model responses with respect to queries from $G$. We drop the argument of $\psi$ when the distribution of the queries is clear. In a more general setting where the models are assumed to be i.i.d. realizations from a model distribution $F_{model}$, the distance matrix $D$ converges to $\Delta^*$ and, under technical assumptions, there exists $\psi$ such that $\widehat{\psi} \to \psi$ as $m, r \to \infty$, for all $n$ (Acharyya et al., 2024).
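A small simulation can make the role of $r$ in this convergence concrete. The sketch below is illustrative only: it fixes the query set (so $m$ does not grow), draws hypothetical mean matrices $\mu_i$ and $\mu_{i'}$ at random, and models each embedded response as its mean plus bounded uniform noise. Under these assumptions the estimated entry $D_{ii'}$ should approach $\Delta_{ii'}$ as the number of responses per query increases.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 50, 8

# Hypothetical mean embedded responses for two models (row j <-> query q_j).
mu_a = 0.2 * rng.normal(size=(m, p))
mu_b = 0.2 * rng.normal(size=(m, p))
delta = np.linalg.norm(mu_a - mu_b) / m  # Delta_{ii'} for this fixed query set


def sample_x_bar(mu, r, rng):
    """Average r noisy embedded responses per query; the noise is bounded in [-1, 1]."""
    noise = rng.uniform(-1.0, 1.0, size=(r,) + mu.shape)
    return (mu + noise).mean(axis=0)


errors = {}
for r in (5, 50, 5000):
    x_bar_a = sample_x_bar(mu_a, r, rng)
    x_bar_b = sample_x_bar(mu_b, r, rng)
    d_hat = np.linalg.norm(x_bar_a - x_bar_b) / m  # entry of D from Eq. (1)
    errors[r] = abs(d_hat - delta)
    print(f"r={r:5d}  |D - Delta| = {errors[r]:.5f}")
```

The simulation says nothing about the technical assumptions needed for $\widehat{\psi} \to \psi(G)$; it only illustrates that averaging more responses per query reduces the sampling error in a single entry of $D$.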

In settings where the query space contains only a single element or where each $F_{ij}$ is a point mass on $x_{ij}$, $\Delta_{ii'}$ is exactly the maximum mean discrepancy (Gretton et al., 2012) of the distributions $F_{ij}$ and $F_{i'j}$ under a linear map applied to $g(x)$. For settings where $|\mathscr{Q}| > 1$ and the $F_{ij}$ are not all point masses, the analysis is more complicated since both $F_{ij}$ and $\bar{x}_{ij}$ are random variables.
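For the single-query case ($m = 1$), the correspondence with the maximum mean discrepancy can be written out explicitly. The short derivation below is a sketch that assumes the linear kernel $k(x, x') = \langle g(x), g(x') \rangle$ on embedded responses, which is our reading of "a linear map applied to $g(x)$"; $x, \tilde{x} \sim F_{ij}$ and $x', \tilde{x}' \sim F_{i'j}$ denote independent draws.

```latex
\begin{align*}
\mathrm{MMD}^2(F_{ij}, F_{i'j})
  &= \mathbb{E}\left[k(x, \tilde{x})\right]
     - 2\,\mathbb{E}\left[k(x, x')\right]
     + \mathbb{E}\left[k(x', \tilde{x}')\right] \\
  &= \left\| \mathbb{E}_{x \sim F_{ij}}[g(x)] \right\|_2^2
     - 2 \left\langle \mathbb{E}_{x \sim F_{ij}}[g(x)],\,
                      \mathbb{E}_{x' \sim F_{i'j}}[g(x')] \right\rangle
     + \left\| \mathbb{E}_{x' \sim F_{i'j}}[g(x')] \right\|_2^2 \\
  &= \left\| \mathbb{E}_{x \sim F_{ij}}[g(x)]
            - \mathbb{E}_{x' \sim F_{i'j}}[g(x')] \right\|_2^2
   = \left\| \mu_i - \mu_{i'} \right\|_F^2
   = \Delta_{ii'}^2 .
\end{align*}
```

The second line uses independence of the draws, and the last line uses the fact that for $m = 1$ the matrices $\mu_i$ and $\mu_{i'}$ each have a single row, so the Frobenius norm reduces to the Euclidean norm and the factor $\frac{1}{m}$ equals one.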

2.2 Statistical inference in the data kernel perspective space

Consider the classical statistical learning problem (Hastie et al., 2009, Chapter 2): Given training data $\mathcal{T}_n = \{(X_1, y_1), \ldots, (X_n, y_n)\}$ assumed to be i.i.d. realizations from the joint distribution $P_{XY}$, select the decision function $h: \mathcal{X} \to \mathcal{Y}$ that minimizes the expected value of a loss function $\ell$ with respect to $P_{XY}$ within a class of decision functions $\mathcal{H}$ for a test observation $X$ assumed to be an independent realization from the marginal distribution $P_X$. Or, with $\mathcal{R}_\ell(P_{XY}, h) := \mathbb{E}_{P_{XY}}\left[\ell(h(X), y)\right]$, select $h$ such that

$$h \in \operatorname*{argmin}_{h \in \mathcal{H}} \mathcal{R}_\ell(P_{XY}, h).$$

We let $\mathcal{R}_\ell^*(P_{XY}, \mathcal{H})$ denote the expected loss of $h$ in the argmin and let $\mathcal{R}_\ell^*(P_{XY})$ denote the minimum $\mathcal{R}_\ell^*(P_{XY}, \mathcal{H})$ over all $\mathcal{H}$.

The joint distribution $P_{XY}$ is often unavailable and the decision function is selected based on $\mathcal{T}_n$. We let $h(\,\cdot\,; \mathcal{T}_n)$ denote such a decision function and say the sequence of decision functions $\left(h(\,\cdot\,; \mathcal{T}_1), \ldots, h(\,\cdot\,; \mathcal{T}_n)\right)$ is consistent for $P_{XY}$ with respect to $\mathcal{H}$ if

$$Pr\left( \left| \mathcal{R}_\ell(P_{XY}, h(\,\cdot\,; \mathcal{T}_n)) - \mathcal{R}_\ell^*(P_{XY}, \mathcal{H}) \right| > \epsilon \right) \to 0$$

as $n \to \infty$.

In our setting we observe $\widehat{\mathcal{T}}_n = \{(\widehat{\psi}_1, y_1), \ldots, (\widehat{\psi}_n, y_n)\}$ and the test observation $\widehat{\psi}_{n+1}$ where $\widehat{\psi}_i$ is an errorful observation of the true-but-unknown $\psi_i$ and $y_i$ is the model-level covariate corresponding to $f_i$. Two important extensions of the convergence results described in Acharyya et al. (2024) are the following theorems:

Theorem 1.

Under technical assumptions described in Appendix A,

$$\mathcal{R}_\ell(P_{\psi Y}, h(\,\cdot\,; \widehat{\mathcal{T}}_n)) \to \mathcal{R}_\ell(P_{\psi Y}, h(\,\cdot\,; \mathcal{T}_n))$$

as $m, r \to \infty$;

and

Theorem 2.

Under technical assumptions described in Appendix A, if $(h(\,\cdot\,; \mathcal{T}_1), \ldots, h(\,\cdot\,; \mathcal{T}_n))$ is consistent for $P_{\psi Y}$ with respect to $\mathcal{H}$ then $(h(\,\cdot\,; \widehat{\mathcal{T}}_1), \ldots, h(\,\cdot\,; \widehat{\mathcal{T}}_n))$ is consistent for $P_{\psi Y}$ with respect to $\mathcal{H}$ as $n, m, r \to \infty$.

The proofs of Theorems 1 and 2 are provided in Appendix A.

These results do not say that, with enough models, queries, and responses, inference using the $\widehat{\psi}$ is “optimal”. Instead, Theorem 1 says that, with enough queries and enough responses for each query, the performance of a decision function that uses $\widehat{\mathcal{T}}_n$ will be “close” to the performance of the decision function that uses $\mathcal{T}_n$ – regardless of how far away $\mathcal{R}_\ell(P_{\psi Y}, h(\,\cdot\,; \mathcal{T}_n))$ is from $\mathcal{R}_\ell^*(P_{\psi Y}, \mathcal{H})$.

Similarly, Theorem 2 says that if there is a sequence of decision functions known to be consistent with respect to $\mathcal{H}$ when using $\mathcal{T}_n$, then the same sequence of decision functions is consistent with respect to $\mathcal{H}$ when using $\widehat{\mathcal{T}}_n$ where $m$ and $r$ are allowed to grow. The result does not provide instructions on how to construct such a sequence. Nor does it provide insight as to how close $\mathcal{R}_\ell^*(P_{\psi Y}, \mathcal{H})$ is to $\mathcal{R}_\ell^*(P_{\psi Y})$. Most importantly – and most specific to our setting – Theorem 2 does not provide guidance for how to select a query distribution $G$ such that $|\mathcal{R}_\ell^*(P_{fY}) - \mathcal{R}_\ell^*(P_{\psi Y})|$ is minimal.

While we do not provide theoretical details as to the situations in which optimal model performance is possible, if $y_i$ can be described as a function of the average of model responses for a set of queries from an optimal $G^*$ then $\psi(G^*)$ will be such that $\mathcal{R}_\ell^*(P_{fY}) = \mathcal{R}_\ell^*(P_{\psi Y})$ (Devroye et al., 2013, Chapter 32). Conversely, it is unclear how effective decision functions constructed with $\psi(G^*)$ will be for model-level covariates that cannot be described as a function of the average response. Further, it is unclear how “close” a query distribution $G$ needs to be to $G^*$ to induce an informative representation of the models with respect to $y_i$. We demonstrate a potential trade-off between the proximity of $G$ to $G^*$ and the number of queries necessary to tease out model differences in one of the experimental settings below.

Lastly, we note that our setting, methodology, and theorems are general to classes of generative models in which there exists a well-defined embedding function from the output space to $\mathbb{R}^p$, though our experimental settings focus entirely on collections of language models.

2.3 An illustrative example – “Was RA Fisher great?”

Figure 1: Left. The 2-d Data Kernel Perspective Space (DKPS) and covariate surface for a collection of 550 models parameterized by fixed augmentations. Right. The performance of the 1-nearest neighbor regressor in DKPS for predicting the probability that an unlabeled model responds “yes” to “Was RA Fisher great?”.

The first model-level inference task we study is a toy example where the task is to predict the probability that a model will respond “yes” to the question “Was RA Fisher great?”. The question is chosen due to its subjectiveness – there is no correct answer to the question of a person’s greatness – and its duality – Ronald A. Fisher is considered one of the most influential statisticians in history and is considered an advocate of the 20th century Eugenics movement (Bodmer et al., 2021).

We use a 4-bit version of Meta’s LLaMA-2-7B-Chat (Touvron et al., 2023) as a base model and consider a collection of models parameterized by fixed context augmentations. Each augmentation $a_i$ contains information related to RA Fisher’s statistical achievements (e.g., $a_i$ = “RA Fisher pioneered the principles of the design of experiments”) or to his involvement in the eugenics movement (e.g., $a_i$ = “RA Fisher’s view on eugenics were primarily based on anecdotes and prejudice.”) and is prepended to every query. The covariate corresponding to a given model is calculated by prompting the base model with the appropriately formatted prompt “Give a precise answer to the question based on the context. Don’t be verbose. The answer should be either a yes or a no. $a_i$. Was RA Fisher great?” until there are 100 valid responses. We let $y_i$ be the proportion of these 100 valid responses that are “yes”.

To induce a DKPS for this task, we consider queries sampled from OpenAI’s ChatGPT with the prompt “Provide 100 questions related to RA Fisher.”. For a given query $q_j$ we prompt the base model with the appropriately formatted prompt “$a_i \; q_j$” and fix $r = 1$. The left figure of Figure 1 is a 3-d figure where the first two dimensions are the DKPS of the $n = 550$ ($275$ statistics augmentations, $275$ eugenics augmentations) models induced with $m = 100$ queries. Model responses are embedded by averaging the per-token last layer activation of the base model. The third dimension is an interpolated $y_i$ surface with a linear kernel (Du Toit, 2008). The first dimension of the DKPS is clearly capable of distinguishing between models adorned with “statistics” augmentations and models adorned with “eugenics” augmentations. The shape of the interpolated covariate surface is highly correlated with this feature. A description of the augmentations that parameterize the models and the queries used to induce the DKPS is provided in Appendix B.1.
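The construction described above can be sketched in a few lines. The following is a minimal illustration, assuming the embedded responses are already available as an array of shape (models, queries, replicates, embedding dimension); the function name `induce_dkps`, the synthetic data, and the omission of any normalising constant in the distance are assumptions made for this sketch, and classical MDS is written out in NumPy rather than calling the Graspologic implementation used in the paper.

```python
import numpy as np

def induce_dkps(embedded, n_dims=2):
    """Sketch of inducing a perspective space.

    embedded: array of shape (n_models, m_queries, r_replicates, p).
    Returns the (n_models, n_dims) coordinates and the distance matrix D.
    """
    n = embedded.shape[0]
    # Average the r replicate embeddings for each (model, query) pair.
    mean_resp = embedded.mean(axis=2)              # (n, m, p)
    flat = mean_resp.reshape(n, -1)                # one row per model
    # Frobenius norm of the difference of average embedded responses.
    diff = flat[:, None, :] - flat[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Classical MDS: double-centre the squared distances, then eigendecompose.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    coords = eigvecs[:, order] * np.sqrt(top_vals)
    return coords, D

# Synthetic stand-in for two groups of models (hypothetical data).
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(5, 10, 1, 8))
group_b = rng.normal(1.0, 0.1, size=(5, 10, 1, 8))
coords, D = induce_dkps(np.concatenate([group_a, group_b]))
```

With these synthetic groups the first coordinate separates the two sets of models, mirroring in miniature the “statistics” versus “eugenics” separation reported for Figure 1.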

The right figure of Figure 1 shows the performance of a 1-nearest neighbor (1-NN) regressor in DKPS for a varying number of labeled models and a varying number of queries. The DKPS is induced with $n$ models and the regressor is trained with $n - 1$ of the model-level covariates. The mean squared error reported is the error of the 1-NN regressor for predicting the “left out” model’s covariate ($\pm$ 1 S.E.). As Theorems 1 and 2 suggest, the performance of the regressor is dependent on both the number of models and the number of queries: the more models and the more queries the better. The size of each effect, however, can depend on the other quantity. The number of models does not have a large impact on predictive performance if the number of queries is small. The number of queries has a large impact on performance regardless of the number of models.
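A leave-one-out evaluation of this kind can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function name `loo_one_nn_mse`, the choice to average over every model as the held-out model, and the toy coordinates and covariates are all assumptions.

```python
import numpy as np

def loo_one_nn_mse(coords, y):
    """Leave-one-out 1-NN regression error in a fixed embedding.

    coords: (n, d) model representations; y: (n,) model-level covariates.
    Returns the mean squared error and its standard error.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)        # a model cannot be its own neighbour
    nearest = dist.argmin(axis=1)         # index of each model's nearest neighbour
    sq_err = (y[nearest] - y) ** 2
    mse = sq_err.mean()
    se = sq_err.std(ddof=1) / np.sqrt(len(y))
    return mse, se

# Toy example with four models on a line (hypothetical values).
toy_coords = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]]
toy_y = [0.2, 0.2, 0.8, 0.9]
mse, se = loo_one_nn_mse(toy_coords, toy_y)
```

In the toy example the two left-hand models predict each other exactly and the two right-hand models are each off by 0.1, so the error is driven entirely by how well proximity in the embedding tracks the covariate.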

3 Experiments

We next consider two experiments with more realistic model-level covariates: predicting whether or not a model has had access to sensitive information and predicting model safety. For all experiments we fix $r = 1$ as suggested by the empirical rates of convergence of the perspectives described in Acharyya et al. (2024). We use the MDS implementation from Graspologic (Chung et al., 2019) throughout. We use the profile likelihood of the singular values of $D$ to determine the dimensionality of the DKPS (Zhu & Ghodsi, 2006) and note that this may be larger than two. We show only the first two dimensions for visualization purposes.
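For readers unfamiliar with the profile-likelihood rule, a minimal sketch is given below. It follows the usual description of the method (split the sorted values into two groups, model each group as Gaussian with its own mean and a shared variance, and keep the split with the largest log-likelihood). The function name `profile_likelihood_elbow` and the example values are assumptions, and the paper's own implementation may differ in detail.

```python
import numpy as np

def profile_likelihood_elbow(values):
    """Return the number of leading values kept by a two-group Gaussian split."""
    d = np.sort(np.asarray(values, dtype=float))[::-1]   # descending order
    p = len(d)
    best_q, best_ll = 1, -np.inf
    for q in range(1, p):
        left, right = d[:q], d[q:]
        # Pooled sum of squares around the two group means.
        ss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        var = max(ss / max(p - 2, 1), 1e-12)              # shared variance estimate
        ll = -0.5 * p * np.log(2 * np.pi * var) - 0.5 * ss / var
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

# Illustrative values with an obvious gap after the third largest entry.
svals = [10.0, 9.5, 9.8, 1.0, 0.9, 1.1, 1.05]
q_hat = profile_likelihood_elbow(svals)   # 3 for these illustrative values
```

In practice the input would be the singular values of the distance matrix, for example `np.linalg.svd(D, compute_uv=False)`, and the returned count would be used as the embedding dimension.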

3.1 Has a model seen sensitive information?

Figure 2: Left. The 2-d data kernel perspective space (DKPS) of 50 fine-tuned models – 25 with “sensitive” data in the fine-tuning data mixture (red), 25 with none (black) – induced by an evaluation set containing 10 prompts relevant to the sensitive data. For models trained on sensitive data, color intensity correlates with amount of sensitive data in the training mixture. Center. The 2-d DKPS of the models induced by a set of 10 prompts “orthogonal” to the difference between models with sensitive data in their fine-tuning data mixture and models with no sensitive data in their fine-tuning data mixture. Right. Classification performance as a function of number of labeled models and size of evaluation set for both sensitive and orthogonal evaluation sets.

Modern language models are trained with trillions of tokens of text (Touvron et al., 2023). For proprietary models such as OpenAI’s GPT series or Anthropic’s Claude series, the exact sources of the training mixtures are unknown and – given the models’ propensities to produce content that is strikingly similar to copyrighted content (Henderson et al., 2023) – their curation and use are ethically questionable (Lemley & Casey, 2020). Further, the training mixture of some models may include sensitive information such as personal information, trade secrets, or government-classified information that should never be presented to the end user. Developing classifiers to identify models that are either trained on sensitive or copyrighted information or are likely to produce sensitive or copyrighted information is thus paramount to uphold the rights of the stakeholders of the original content.

To investigate the utility of DKPS for this purpose, we again use a 4-bit version of LLaMA-2-7B-Chat as a base model and train 50 different LoRA adapters with different subsets of the Yahoo! Answers (YA) dataset (Zhang et al., 2015). The YA dataset consists of data from 10 topics. We consider data from the topic “Politics & Government” to be “sensitive” information, data from the topics “Society & Culture”, “Science & Mathematics”, “Health”, “Education & Reference”, “Computers & Internet”, and “Sports” to be “not-sensitive”, and data from the remaining topics to be “orthogonal”. We trained each of the 50 adapters with 500 question-answer pairs for 3 epochs with a learning rate of $5 \times 10^{-5}$ and a batch size of 8. Each adapter is rank 8, has a scaling factor of 32, targets all attention layers, does not have bias terms, and has a dropout probability of 0.05 when training. For 25 of the adapters, the adapter training mixture consisted wholly of randomly selected not-sensitive data. For the remaining 25 adapters, the adapter training mixture consisted of $p_i$ randomly selected sensitive data and $500 - p_i$ randomly selected not-sensitive data. We let $y_i$ be the indicator of whether or not the adapter’s training mixture contained any sensitive data.

We study classification of the models in two different DKPS: one induced by a set of randomly selected sensitive queries, one induced by a set of randomly selected orthogonal queries. Both DKPS use the open source embedding model nomic-embed-v1.5 (Nussbaum et al., 2024). A 2-d DKPS induced by $m = 10$ randomly selected sensitive queries is shown on the left of Figure 2. In this space the models are separated by their label and the models that have seen more sensitive information are generally farther from the class-boundary. A 2-d DKPS induced by $m = 10$ orthogonal queries is shown in the center of Figure 2. Here, by contrast, the models are not easily separable by their label.

Figure 3: Top. The 1-d FLD projection of the models from a DKPS induced by queries from the sensitive topic versus the amount of sensitive data the adapter had access to during training. Bottom. The same but for a DKPS induced by queries from orthogonal topics.

The right figure of Figure 2 shows the classification performance of Fisher’s Linear Discriminant trained on varying amounts of DKPS representations of models for the two query distributions. For a given $n$ and $m$, we induce the DKPS with all models and a randomly selected query set. We report the expected risk of the classifier trained on a random subset of the models for the remaining models. We observe similar phenomena to the “Was RA Fisher great?” experiment in that both the number of models and the number of queries impact performance. For a fixed $m$, the expected risk curves also highlight the observed difference in separability of the models in the two DKPS, with the expected risk with sensitive queries being significantly lower than the expected risk with orthogonal queries. Indeed, the expected risk with $m = 10$ sensitive queries is similar to the expected risk with $m = 50$ orthogonal queries.
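The evaluation loop described above (fit a linear discriminant on a random subset of models, score it on the rest, repeat) can be sketched as follows. This is a hedged illustration: the helper names `fit_fld` and `estimate_risk`, the small ridge term, the midpoint threshold (which implicitly assumes equal class priors), the number of trials, and the synthetic coordinates are all assumptions rather than details taken from the paper.

```python
import numpy as np

def fit_fld(X, y):
    """Two-class Fisher linear discriminant: direction w and midpoint threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, with a small ridge for numerical stability.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw += 1e-8 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    threshold = 0.5 * (w @ mu0 + w @ mu1)
    return w, threshold

def estimate_risk(coords, labels, n_train, n_trials=200, seed=0):
    """Average held-out error over random labeled/unlabeled splits."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    errors = []
    for _ in range(n_trials):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        if len(np.unique(labels[train])) < 2:
            continue                      # need both classes to fit the discriminant
        w, t = fit_fld(coords[train], labels[train])
        pred = (coords[test] @ w > t).astype(int)
        errors.append(np.mean(pred != labels[test]))
    return float(np.mean(errors))

# Synthetic, well-separated 2-d representations for 50 models (hypothetical).
rng = np.random.default_rng(1)
coords = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(25, 2)),
                    rng.normal([3.0, 0.0], 0.3, size=(25, 2))])
labels = np.array([0] * 25 + [1] * 25)
risk = estimate_risk(coords, labels, n_train=10)
```

Running the same function on representations induced by different query sets, and for several values of `n_train`, would produce curves of the kind shown on the right of Figure 2.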

Lastly, we highlight the 1-d projection learned by Fisher’s linear discriminant for $m = 10$ for both DKPS in Figure 3. As can be seen in the top figure of Figure 3, the projection of the models is correlated with the proportion of sensitive data that the model has had access to when the DKPS is induced with sensitive queries. While the linear goodness-of-fit is not large ($R^2 = 0.37$), the correlation is statistically significant per the hypothesis test using Kendall’s rank correlation coefficient ($\tau = 0.42$, $p < 0.01$). The projection of the models when the DKPS is induced with orthogonal queries has a line of best fit with a negligible slope and a much smaller Kendall’s rank correlation coefficient ($\tau = 0.08$). The latter corresponds to a $p$-value of $0.57$, meaning we fail to reject the null hypothesis that the amount of sensitive information the model has had access to is uncorrelated with the learned 1-d projection.
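Kendall's coefficient compares the ordering of two variables pair by pair, which is why it can detect a monotone association even when a straight line fits poorly. A minimal sketch of the tie-free variant (often called tau-a) follows; the function name and the example numbers are hypothetical, and the paper does not state which variant or software was used. For a p-value, a library routine such as `scipy.stats.kendalltau` could be used instead.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # pairs tied in either variable contribute to neither count
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical 1-d projections and amounts of sensitive data for five adapters.
projection = [0.1, 0.4, 0.3, 0.9, 0.7]
amount = [10, 50, 120, 200, 400]
tau = kendall_tau_a(projection, amount)   # 8 concordant, 2 discordant pairs -> 0.6
```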

Throughout this experiment we use the term “orthogonal” for queries from topics that are not used when fine-tuning the models under study. The term is chosen because, naïvely, queries from these topics should not elicit different responses from models trained on sensitive data and models trained on not-sensitive data. In reality, the sensitive data and the not-sensitive data have different underlying token distributions and this difference will cause systematic differences in model responses after fine-tuning even for topics that are irrelevant a priori. Further, it is likely that the documents in the “orthogonal” topics share some content commonalities with documents in the sensitive and not-sensitive topics. We see this phenomenon in the classification results in the right figure of Figure 2 where the linear classifier is able to perform better than chance with enough “orthogonal” queries.

3.2 How safe is a model?

Figure 4: Left. A graph where each node is a model and an edge between two models exists if model $i$ is fine-tuned from model $i'$ or if model $i'$’s weights were used in a model-merge that resulted in model $i$, etc. Right, top. The two-dimensional data kernel perspective spaces (DKPS) corresponding to the toxicity and bias prediction tasks. Dot size is proportional to model toxicity or bias. Right, bottom. Relative performance of three regression techniques. Local predictions in DKPS are more effective than both global predictions and local predictions in model relationship space.

Model safety is one of the biggest concerns when deploying a language model in production. An unsafe model is prone to propagating harmful stereotypes (Ferrara, 2024), using toxic language (Wen et al., 2023), and misunderstanding the user’s intent (Ji et al., 2023) – all of which can adversely affect the user and their experience. Hence, developing techniques to quantify how unsafe a model is forms an important part of the model-production pipeline. As with predicting if a model has had access to sensitive information above, we investigate using DKPS to predict model safety through the lens of model toxicity and model bias.

We consider a collection of 58 models. Each model in the collection is a base model, a fine-tuned version of a base model, or the result of weight-merging various other models in the collection. We view this collection of models as a graph where each model is a node and an undirected edge exists between nodes if one of the models is a fine-tuned version of the other or if one of the models is the result of a model merge including the other. The graph representing the collection of models is shown on the left of Figure 4. The list of models under study is provided in Appendix B.2.

Figure 5: The relative time improvement (larger is better) when using local predictions in DKPS instead of calculating the ground-truth model-level covariate using HuggingFace’s API.

For each model in the collection we consider two covariates: model toxicity and model bias. To determine a model’s toxicity, we prompt each model with a collection of queries from the Real Toxicity Prompts (RTP) (Gehman et al., 2020) dataset and subsequently evaluate the toxicity of each response with the neural model roberta-hate-speech-dynabench-r4 (Vidgen et al., 2021). The model-level toxicity is simply the average response toxicity. An analogous process is used to define a model’s bias with the dataset Bias in Open-ended Language Generation Dataset (BOLD) (Dhamala et al., 2021) and the regard model (Sheng et al., 2019).

We induce the perspective spaces for the toxicity and bias tasks with randomly sampled prompts from RTP and BOLD, respectively, and the embedding model nomic-embed-v1.5. The 2-d DKPS – induced with $m = 2000$ queries – for both tasks is shown in the top right of Figure 4. The size of the dot is correlated with the model’s covariate. We highlight an example unlabeled model (green) and its corresponding neighbor in DKPS (red) and neighbors in graph-space (blue) in Figure 4. Importantly, the relative position of a model in the respective DKPS is predictive of the model’s toxicity and the model’s bias.

We quantify this observation by evaluating regressors for predicting the model-level toxicity and bias of an unlabeled model. We consider three different regressors for these tasks. The first is a constant equal to the average covariate of the labeled models (i.e., the “global mean”). The second uses the average covariate of models that share an edge with the unlabeled model (i.e., “1-NN (graph)”). The third uses the covariate of the nearest neighbor in DKPS (i.e., “1-NN (DKPS)”). For 1-NN (DKPS) we consider varying amounts of randomly sampled queries from RTP and BOLD to induce the DKPS. We use the global mean as a standard and report the relative absolute error of the three methods in the bottom right of Figure 4. The reported performance of the 1-NN (DKPS) regressor is the average of the smaller of $200$ and $2000/m$ random samples of $m$ queries. Notably, given enough queries local predictions in DKPS outperform local predictions in the graph space and predictions using the global mean.
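The three predictors can be compared for a single held-out model as in the sketch below. It assumes that “relative absolute error” means each method's absolute error divided by the absolute error of the global mean, and that the graph predictor falls back to the global mean when the held-out model has no labeled neighbours; both are assumptions of this illustration, as are the function name `compare_regressors` and the toy data.

```python
import numpy as np

def compare_regressors(coords, adjacency, y, target):
    """Relative absolute error of three predictors for one held-out model."""
    labeled = np.array([i for i in range(len(y)) if i != target])
    # 1) Global mean of the labeled covariates.
    global_mean = y[labeled].mean()
    # 2) Mean covariate of labeled models sharing an edge with the target.
    neighbours = [i for i in labeled if adjacency[target, i]]
    graph_pred = y[neighbours].mean() if neighbours else global_mean
    # 3) Covariate of the nearest labeled model in the embedding.
    dists = np.linalg.norm(coords[labeled] - coords[target], axis=1)
    dkps_pred = y[labeled[dists.argmin()]]
    baseline = abs(global_mean - y[target])   # assumed non-zero in this sketch
    return {
        "global mean": 1.0,
        "1-NN (graph)": abs(graph_pred - y[target]) / baseline,
        "1-NN (DKPS)": abs(dkps_pred - y[target]) / baseline,
    }

# Toy collection of four models: lineage edges (0-2, 1-3) do not match behaviour.
toy_coords = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.2, 0.0]])
toy_y = np.array([0.10, 0.12, 0.50, 0.55])
toy_adj = np.array([[0, 0, 1, 0],
                    [0, 0, 0, 1],
                    [1, 0, 0, 0],
                    [0, 1, 0, 0]])
result = compare_regressors(toy_coords, toy_adj, toy_y, target=0)
```

In this toy case the embedding neighbour is behaviourally close to the target while its graph neighbour is not, so the DKPS predictor has a relative error below one and the graph predictor has a relative error above one.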

In addition to reporting the relative performance, we report the time it takes to use the DKPS machinery to predict Mistralv1.0’s (Jiang et al., 2023) toxicity and bias relative to using the evaluation model through HuggingFace’s API (Wolf et al., 2019). The time we report for the HuggingFace API is the time it takes to calculate the “ground truth” used to calculate the performance of the regressors above and is the total time required for Mistralv1.0 to produce responses and for the evaluation model to produce scores for 2000 queries. The time we report for 1-NN (DKPS) includes the time it takes for Mistralv1.0 to produce $m$ responses, the time it takes nomic-embed-v1.5 to embed the responses, the time it takes to induce the DKPS, and the time it takes to train and use the nearest neighbor regressor. It does not include the time it takes to produce and evaluate the responses of the other models in the collection. The relative efficiency of DKPS, as seen in Figure 5, is approximately proportional to $1/m$. This relationship will hold for all model-level inference tasks where the covariate is proportional to the sum of a function of individual responses.
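The dependence on $m$ can be made explicit with a back-of-the-envelope calculation. The symbols below are introduced only for this illustration and do not appear in the paper: $t_{\mathrm{gen}}$ is the time to generate one response, $t_{\mathrm{eval}}$ the time to score one response with the evaluation model, $t_{\mathrm{emb}}$ the time to embed one response, and $c$ the fixed cost of inducing the DKPS and applying the nearest neighbor regressor.

```latex
\begin{align*}
T_{\mathrm{HF}} &= 2000 \, (t_{\mathrm{gen}} + t_{\mathrm{eval}}), \\
T_{\mathrm{DKPS}}(m) &= m \, (t_{\mathrm{gen}} + t_{\mathrm{emb}}) + c, \\
\frac{T_{\mathrm{HF}}}{T_{\mathrm{DKPS}}(m)}
  &= \frac{2000 \, (t_{\mathrm{gen}} + t_{\mathrm{eval}})}{m \, (t_{\mathrm{gen}} + t_{\mathrm{emb}}) + c}
  \approx \frac{2000}{m} \cdot \frac{t_{\mathrm{gen}} + t_{\mathrm{eval}}}{t_{\mathrm{gen}} + t_{\mathrm{emb}}}
  \quad \text{when } c \ll m \, (t_{\mathrm{gen}} + t_{\mathrm{emb}}).
\end{align*}
```

Under these assumptions the ratio shrinks in proportion to $1/m$, with a constant that depends on how expensive scoring is relative to embedding.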

4 Discussion

We have demonstrated – both theoretically and empirically – that embedding-based representations of generative models can be used for various model-level inference tasks. While the results we presented show the potential of our approach, there are choices throughout the collection-of-models-to-covariate-prediction pipeline that can affect performance and implementation practicality.

As mentioned in Section 2.2, one major decision is the query distribution $G$ (or, more practically, the query set $Q$). In particular, the representations of the models that the query set induces may or may not be relevant to a particular inference task. We demonstrate this phenomenon in Section 3.1 where the queries from the “sensitive-only” query distribution can induce representations of similar discriminative ability as queries from the “orthogonal-only” query distribution with $1/5$ of the queries. We expect a similar but less dramatic effect within the distributions of “sensitive-only” queries and anticipate that curating an “optimal” set of queries for a given $g$ and for a fixed $|Q|$ will soon be a highly active research area.

Another decision is the distance function used to define $D$. The MDS of the distance matrix studied herein (Eq. (1)) produces representations of the models that are consistent for objects that capture the true mean discrepancy geometry of the model responses. For model-level covariates that cannot naturally be described as a function of mean discrepancy geometry, we do not expect that information-theoretic optimal performance when using $\widehat{\psi}(G)$ is possible for any $n$, $m$, and $r$. Instead, for representations of the models that can be used generally without information loss, it is necessary to replace the Frobenius norm of the differences of the average embedded response with a more expressive distance such as an extension of a distance defined directly on the cumulative distributions of responses or with a task-specific distance (Helm et al., 2020). Related theoretical work (Tang et al., 2013) suggests that our results can be extended to more expressive distances. We do not expect the naïve replacement of the Frobenius norm with a more expressive distance function to be universally better for model-level inference tasks in practice as there are likely computational and query set quality trade-offs to consider. As with the active curation of an optimal query set, we expect these trade-offs to be important future research topics.

In the experiments above we have access to $n$ models and $n' < n$ corresponding model-level covariates. We induce a DKPS with the entire collection of models and use the $n'$ labeled models to predict a label for the remaining $n - n'$ models. In practice an unlabeled model may not be available when inducing the DKPS or, if $n$ is large, it may be too expensive to induce a DKPS whenever a prediction for a new unlabeled model is required. Out-of-sample techniques (Bengio et al., 2003; Trosset & Priebe, 2008) can be used to meet these imposed constraints at the cost of a slight degradation in representation quality and, hence, inference performance.
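One simple way to place a new model into an existing classical-MDS embedding, without recomputing it, is sketched below. This is a Gower/Nyström-style extension written for illustration; the cited out-of-sample methods may differ in detail, and the function names and synthetic points are assumptions.

```python
import numpy as np

def classical_mds(D, n_dims):
    """Classical MDS coordinates from a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_dims]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

def out_of_sample(D, coords, d_new):
    """Embed a new point from its distances d_new to the n in-sample points."""
    D2, d2 = D ** 2, d_new ** 2
    # Inner products between the (centred) new point and each in-sample point.
    b = -0.5 * (d2 - D2.mean(axis=1) - d2.mean() + D2.mean())
    # Least-squares solve coords @ y = b for the new coordinates.
    y_new, *_ = np.linalg.lstsq(coords, b, rcond=None)
    return y_new

# Synthetic check: six in-sample points and one held-out point in the plane.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
new_point = rng.normal(size=2)
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
d_new = np.linalg.norm(points - new_point, axis=1)
coords = classical_mds(D, n_dims=2)
y_new = out_of_sample(D, coords, d_new)
```

Because the synthetic points are exactly two-dimensional, the new coordinates reproduce the original distances to the in-sample points; with real perspective spaces truncated to a few dimensions the match would only be approximate, which is the degradation in representation quality mentioned above.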

Implementation of inference in DKPS in practice will require an upfront, one-time cost of generating responses from a subset of models in the collection and scoring their outputs. Since this is required to compare the models with respect to the covariates anyway, we followed the timing paradigm presented in Perlitz et al. (2023) and did not include this cost when comparing the methods in Figure 5. Once the models in the initial subset are scored we expect the relative time efficiency (and implied relative computational efficiency) of inference in DKPS to be worthwhile to practitioners.

We fixed $r = 1$ and let $m$ grow throughout our experiments. For Theorems 1 and 2 to hold, both $m$ and $r$ must grow. In practice, the trade-off between getting responses for more queries or getting more responses for the same queries depends on the distributions $F_{ij}$. For example, if $F_{ij}$ is a point mass then $r > 1$ for $q_j$ is unnecessary. Conversely, if $F_{ij}$ is a complicated distribution on $\mathbb{R}^p$ then more responses are necessary to properly estimate it and, hence, to properly capture the difference between average model responses.

Lastly, we note that the data kernel perspective space could be used to extend the statistical Turing test framework presented in Helm et al. (2023) by comparing the conditional distributions of populations of humans and populations of machines in low-dimensional Euclidean space.

Acknowledgements

We would like to thank Avanti Athreya, Tianyi Chen, Ben Johnson, Michael Trosset, Tim Wang, Zekun Wang, and Weiwei Yang for providing helpful comments throughout the development of this manuscript.

References

  • Acharyya et al. (2024) Aranyak Acharyya, Michael W. Trosset, Carey E. Priebe, and Hayden S. Helm. Consistent estimation of generative model representations in the data kernel perspective space, 2024. URL https://arxiv.org/abs/2409.17308.
  • Bengio et al. (2003) Yoshua Bengio, Jean-François Paiement, Pascal Vincent, Olivier Delalleau, Nicolas Roux, and Marie Ouimet. Out-of-sample extensions for lle, isomap, mds, eigenmaps, and spectral clustering. Advances in neural information processing systems, 16, 2003.
  • Bodmer et al. (2021) Walter Bodmer, RA Bailey, Brian Charlesworth, Adam Eyre-Walker, Vernon Farewell, Andrew Mead, and Stephen Senn. The outstanding scientist, ra fisher: his views on eugenics and race. Heredity, 126(4):565–576, 2021.
  • Chen et al. (2022) Guodong Chen, Hayden S Helm, Kate Lytvynets, Weiwei Yang, and Carey E Priebe. Mental state classification using multi-graph features. Frontiers in Human Neuroscience, 16:930291, 2022.
  • Chung et al. (2019) Jaewon Chung, Benjamin D. Pedigo, Eric W. Bridgeford, Bijan K. Varjavand, Hayden S. Helm, and Joshua T. Vogelstein. Graspy: Graph statistics in python. Journal of Machine Learning Research, 20(158):1–7, 2019. URL http://jmlr.org/papers/v20/19-490.html.
  • Dettmers et al. (2024) Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms. Advances in Neural Information Processing Systems, 36, 2024.
  • Devroye et al. (2013) Luc Devroye, László Györfi, and Gábor Lugosi. A probabilistic theory of pattern recognition, volume 31. Springer Science & Business Media, 2013.
  • Dhamala et al. (2021) Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta. Bold: Dataset and metrics for measuring biases in open-ended language generation. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, pp.  862–872, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383097. doi: 10.1145/3442188.3445924. URL https://doi.org/10.1145/3442188.3445924.
  • Du Toit (2008) Wilna Du Toit. Radial basis function interpolation. PhD thesis, Stellenbosch: Stellenbosch University, 2008.
  • Dubey et al. (2024) Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
  • Duderstadt et al. (2023) Brandon Duderstadt, Hayden S Helm, and Carey E Priebe. Comparing foundation models using data kernels. arXiv preprint arXiv:2305.05126, 2023.
  • Edge et al. (2024) Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. From local to global: A graph rag approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024.
  • Ferrara (2024) Emilio Ferrara. Fairness and bias in artificial intelligence: A brief survey of sources, impacts, and mitigation strategies. Sci, 6(1), 2024. ISSN 2413-4155. doi: 10.3390/sci6010003. URL https://www.mdpi.com/2413-4155/6/1/3.
  • Gehman et al. (2020) Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A Smith. Realtoxicityprompts: Evaluating neural toxic degeneration in language models. arXiv preprint arXiv:2009.11462, 2020.
  • Gholami et al. (2022) Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W Mahoney, and Kurt Keutzer. A survey of quantization methods for efficient neural network inference. In Low-Power Computer Vision, pp.  291–326. Chapman and Hall/CRC, 2022.
  • Gretton et al. (2012) Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. The Journal of Machine Learning Research, 13(1):723–773, 2012.
  • Hastie et al. (2009) Trevor Hastie, Robert Tibshirani, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009.
  • Helm et al. (2023) Hayden Helm, Carey E. Priebe, and Weiwei Yang. A statistical turing test for generative models, 2023. URL https://arxiv.org/abs/2309.08913.
  • Helm et al. (2024) Hayden Helm, Brandon Duderstadt, Youngser Park, and Carey E Priebe. Tracking the perspectives of interacting language models. arXiv preprint arXiv:2406.11938, 2024.
  • Helm et al. (2020) Hayden S. Helm, Ronak D. Mehta, Brandon Duderstadt, Weiwei Yang, Christoper M. White, Ali Geisa, Joshua T. Vogelstein, and Carey E. Priebe. A partition-based similarity for classification distributions, 2020.
  • Helm et al. (2021) Hayden S Helm, Weiwei Yang, Sujeeth Bharadwaj, Kate Lytvynets, Oriana Riva, Christopher White, Ali Geisa, and Carey E Priebe. Inducing a hierarchy for multi-class classification problems. arXiv preprint arXiv:2102.10263, 2021.
  • Henderson et al. (2023) Peter Henderson, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A Lemley, and Percy Liang. Foundation models and fair use. Journal of Machine Learning Research, 24(400):1–79, 2023.
  • Hokamp & Liu (2017) Chris Hokamp and Qun Liu. Lexically constrained decoding for sequence generation using grid beam search. arXiv preprint arXiv:1704.07138, 2017.
  • Hu et al. (2021) Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
  • Ji et al. (2023) Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, et al. Ai alignment: A comprehensive survey. arXiv preprint arXiv:2310.19852, 2023.
  • Jiang et al. (2023) Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
  • Katz et al. (2024) Daniel Martin Katz, Michael James Bommarito, Shang Gao, and Pablo Arredondo. Gpt-4 passes the bar exam. Philosophical Transactions of the Royal Society A, 382(2270):20230254, 2024.
  • Lemley & Casey (2020) Mark A Lemley and Bryan Casey. Fair learning. Tex. L. Rev., 99:743, 2020.
  • Lester et al. (2021) Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691, 2021.
  • Lewis et al. (2020) Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
  • Liang et al. (2022) Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. Holistic evaluation of language models. arXiv preprint arXiv:2211.09110, 2022.
  • Matena & Raffel (2022) Michael S Matena and Colin A Raffel. Merging models with fisher-weighted averaging. Advances in Neural Information Processing Systems, 35:17703–17716, 2022.
  • Mikolov (2013) Tomas Mikolov. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
  • Neelakantan et al. (2022) Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, et al. Text and code embeddings by contrastive pre-training. arXiv preprint arXiv:2201.10005, 2022.
  • Ness et al. (2024) Robert Osazuwa Ness, Katie Matton, Hayden Helm, Sheng Zhang, Junaid Bajwa, Carey E. Priebe, and Eric Horvitz. Medfuzz: Exploring the robustness of large language models in medical question answering, 2024. URL https://arxiv.org/abs/2406.06573.
  • Nori et al. (2023) Harsha Nori, Nicholas King, Scott Mayer McKinney, Dean Carignan, and Eric Horvitz. Capabilities of gpt-4 on medical challenge problems, 2023. URL https://arxiv.org/abs/2303.13375.
  • Nussbaum et al. (2024) Zach Nussbaum, John X. Morris, Brandon Duderstadt, and Andriy Mulyar. Nomic embed: Training a reproducible long context text embedder, 2024. URL https://arxiv.org/abs/2402.01613.
  • Patil et al. (2023) Rajvardhan Patil, Sorio Boit, Venkat Gudivada, and Jagadeesh Nandigam. A survey of text representation and embedding techniques in nlp. IEEE Access, 11:36120–36146, 2023. doi: 10.1109/ACCESS.2023.3266377.
  • Perlitz et al. (2023) Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, and Leshem Choshen. Efficient benchmarking (of language models). arXiv preprint arXiv:2308.11696, 2023.
  • Reimers (2019) N Reimers. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084, 2019.
  • Ruan et al. (2024) Yangjun Ruan, Chris J Maddison, and Tatsunori Hashimoto. Observational scaling laws and the predictability of language model performance. arXiv preprint arXiv:2405.10938, 2024.
  • Sekhon (2021) Senan Sekhon. A result on convergence of double sequences of measurable functions. arXiv preprint arXiv:2104.09819, 2021.
  • Sheng et al. (2019) Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. The woman worked as a babysitter: On biases in language generation. arXiv preprint arXiv:1909.01326, 2019.
  • Tang et al. (2013) Minh Tang, Daniel L. Sussman, and Carey E. Priebe. Universally consistent vertex classification for latent positions graphs. The Annals of Statistics, 41(3):1406 – 1430, 2013. doi: 10.1214/13-AOS1112. URL https://doi.org/10.1214/13-AOS1112.
  • Torgerson (1952) Warren S Torgerson. Multidimensional scaling: I. theory and method. Psychometrika, 17(4):401–419, 1952.
  • Touvron et al. (2023) Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
  • Trosset & Priebe (2008) Michael W Trosset and Carey E Priebe. The out-of-sample problem for classical multidimensional scaling. Computational statistics & data analysis, 52(10):4635–4642, 2008.
  • Vidgen et al. (2021) Bertie Vidgen, Tristan Thrush, Zeerak Waseem, and Douwe Kiela. Learning from the worst: Dynamically generated datasets to improve online hate detection. In ACL, 2021.
  • Wang et al. (2020) Nian Wang, Robert J. Anderson, David G. Ashbrook, Vivek Gopalakrishnan, Youngser Park, Carey E. Priebe, Yi Qi, Rick Laoprasert, Joshua T. Vogelstein, Robert W. Williams, and G. Allan Johnson. Variability and heritability of mouse brain structure: Microscopic mri atlases and connectomes for diverse strains. NeuroImage, 222:117274, 2020. ISSN 1053-8119. doi: https://doi.org/10.1016/j.neuroimage.2020.117274. URL https://www.sciencedirect.com/science/article/pii/S1053811920307606.
  • Wen et al. (2023) Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, and Minlie Huang. Unveiling the implicit toxicity in large language models. arXiv preprint arXiv:2311.17391, 2023.
  • Wolf et al. (2019) Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. Huggingface’s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019.
  • Zhang et al. (2015) Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. URL https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf.
  • Zhu & Ghodsi (2006) Mu Zhu and Ali Ghodsi. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2):918–930, 2006.

Appendix A Proofs of Theorems 1 & 2

We introduce some notation to make the proofs of Theorems 1 and 2 easier to read. Bold letters (such as $\mathbf{B}$ or $\boldsymbol{\mu}$) are used to represent vectors and matrices. Any vector by default is a column vector. For a matrix $\mathbf{B}$, the $j$-th row is denoted by $(\mathbf{B})_{j\cdot}$, and the $(i,i')$-th entry is denoted by $\mathbf{B}_{ii'}$. Moreover, $\lVert \mathbf{B} \rVert_{F}$ denotes the Frobenius norm of the matrix $\mathbf{B}$. For any two vectors $\mathbf{x}$ and $\mathbf{y}$, $\lVert \mathbf{x} - \mathbf{y} \rVert$ denotes the Euclidean distance between $\mathbf{x}$ and $\mathbf{y}$. The set $\{1, 2, \dots, n\}$ is denoted by $[n]$. The set of $d \times d$ orthogonal matrices is denoted by $\mathcal{O}(d)$.
For a sequence of random variables $X_{1}, \ldots, X_{n}$, we say $X_{n}$ converges in probability to $X$ if $\lim_{n \to \infty} Pr\left[\lVert X_{n} - X \rVert > \epsilon\right] = 0$ for every $\epsilon > 0$. We denote convergence in probability with $X_{n} \to^{P} X$.

Recall that our setting includes the observed training set $\widehat{\mathcal{T}} = \{(\widehat{\psi}_{1}, y_{1}), \ldots, (\widehat{\psi}_{n}, y_{n})\}$ with the true-but-not-observed $(\psi_{i}, y_{i}) \overset{iid}{\sim} P_{\psi Y}$ and realizations $\psi_{i} \in \mathbb{R}^{d}$ and $y_{i} \in \mathbb{R}^{d'}$, an unlabeled test observation $\widehat{\psi}_{n+1}$, a class of decision functions $\mathcal{H} \subset \{h : \mathbb{R}^{d} \to \mathbb{R}^{d'}\}$, and a loss function $\ell : \mathbb{R}^{d'} \times \mathbb{R}^{d'} \to \mathbb{R}_{\geq 0}$.

A.1 Proof of Theorem 1

Define

$$\mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \widehat{\mathcal{T}}_{n})) = \mathbb{E}_{(\psi_{i}, y_{i})_{i \in [n+1]} \overset{iid}{\sim} P_{\psi Y}}\left[ l\left( h(\widehat{\psi}_{n+1}; \{(\widehat{\psi}_{i}, y_{i})\}_{i=1}^{n}), y_{n+1} \right) \right]$$

and, analogously,

$$\mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \mathcal{T}_{n})) = \mathbb{E}_{(\psi_{i}, y_{i})_{i \in [n+1]} \overset{iid}{\sim} P_{\psi Y}}\left[ l\left( h(\psi_{n+1}; \{(\psi_{i}, y_{i})\}_{i=1}^{n}), y_{n+1} \right) \right].$$

Recall that $n$ remains fixed. Following Acharyya et al. (2024), we let $m$ grow with the number of replicates $r$. Thus, $\widehat{\psi}_{i}$ depends on $r$. We write $\widehat{\psi}_{i}^{(r)}$ to emphasize this dependence when necessary. Note that $\mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \widehat{\mathcal{T}}_{n}))$ also depends on $r$ (and $m$, through $r$).

We make some assumptions about the decision function $h$ and the loss function $l$.

Assumption 1. The decision function $h$ is invariant to affine transformations of the form $\psi \mapsto \mathbf{W}\psi + \mathbf{a}$ with $\mathbf{W}$ orthogonal. That is, for any $\mathbf{W} \in \mathcal{O}(d)$ and $\mathbf{a} \in \mathbb{R}^{d}$,

$$h(\mathbf{W}\psi_{n+1} + \mathbf{a}; \{(\mathbf{W}\psi_{i} + \mathbf{a}, y_{i})\}_{i=1}^{n}) = h(\psi_{n+1}; \{(\psi_{i}, y_{i})\}_{i=1}^{n}).$$
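As an informal numerical check of Assumption 1, the sketch below uses a 1-nearest-neighbor regressor, the decision function mentioned in Appendix B.1. Euclidean distances are unchanged by an orthogonal map followed by a translation, so the nearest training point, and hence the prediction, should be unchanged. The function name `one_nn_predict`, the synthetic data, and the dimensions are illustrative assumptions and are not taken from the paper's code.

```python
import numpy as np


def one_nn_predict(psi_test, psi_train, y_train):
    """Return the label of the training point closest to psi_test (Euclidean)."""
    dists = np.linalg.norm(psi_train - psi_test, axis=1)
    return y_train[np.argmin(dists)]


rng = np.random.default_rng(0)
n, d = 20, 3  # illustrative sizes only

# Synthetic stand-ins for the embedded models psi_i and covariates y_i.
psi_train = rng.normal(size=(n, d))
y_train = rng.normal(size=n)
psi_test = rng.normal(size=d)

# Random orthogonal matrix W (Q factor of a Gaussian matrix) and translation a.
W, _ = np.linalg.qr(rng.normal(size=(d, d)))
a = rng.normal(size=d)

pred_original = one_nn_predict(psi_test, psi_train, y_train)

# Points are stored as rows, so applying W to each point is a right-multiply by W.T.
pred_transformed = one_nn_predict(psi_test @ W.T + a, psi_train @ W.T + a, y_train)

invariant = bool(pred_original == pred_transformed)
print("prediction unchanged under (W, a):", invariant)
```

The check covers only one random draw; it illustrates the invariance rather than proving it. Decision functions that depend on coordinates directly (for example, axis-aligned decision trees) would generally not satisfy the assumption.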

Assumption 2. The decision function $h$ is continuous. That is, if

$$\max_{i \in [n]} \left\lVert \widehat{\psi}_{i}^{(r)} - \psi_{i} \right\rVert \to 0 \text{ as } r \to \infty,$$

then

$$\left\lVert h\left(\widehat{\psi}_{n+1}^{(r)}; \{(\widehat{\psi}_{i}^{(r)}, y_{i})\}_{i=1}^{n}\right) - h\left(\psi_{n+1}; \{(\psi_{i}, y_{i})\}_{i=1}^{n}\right) \right\rVert \to 0.$$

Assumption 3. We assume that the class $\mathcal{H}$ is such that, for every $h \in \mathcal{H}$, the image set of the function $h$ is closed, bounded and complete.

Assumption 4. The loss function $l$ is continuous. That is, for every $y \in \mathbb{R}^{d'}$, $\lVert l(h', y) - l(h'', y) \rVert \to 0$ if $\lVert h' - h'' \rVert \to 0$.

Thus, by Theorem 2 of Acharyya et al. (2024), we can say that there exist sequences $\{\mathbf{W}^{(u)}\}_{u=1}^{\infty}$ and $\{\mathbf{a}^{(u)}\}_{u=1}^{\infty}$, where $\mathbf{W}^{(u)} \in \mathcal{O}(d)$ and $\mathbf{a}^{(u)} \in \mathbb{R}^{d}$ for all $u \in \mathbb{N}$, such that

$$\max_{i \in [n]} \left\lVert \widehat{\psi}_{i}^{(r_{u})} - (\mathbf{W}^{(u)}\psi_{i} + \mathbf{a}^{(u)}) \right\rVert \to 0 \text{ as } u \to \infty. \tag{3}$$

Now,

$$\begin{aligned}
&\mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \widehat{\mathcal{T}}_{n})) - \mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \mathcal{T}_{n})) \\
&= \mathbb{E}\left[ l(h(\widehat{\psi}_{n+1}^{(r_{u})}; \{(\widehat{\psi}_{i}^{(r_{u})}, y_{i})\}_{i=1}^{n}), y_{n+1}) - l(h(\psi_{n+1}; \{(\psi_{i}, y_{i})\}_{i=1}^{n}), y_{n+1}) \right] \\
&= \mathbb{E}\left[ l(h(\widehat{\psi}_{n+1}^{(r_{u})}; \{(\widehat{\psi}_{i}^{(r_{u})}, y_{i})\}_{i=1}^{n}), y_{n+1}) - l(h(\mathbf{W}^{(u)}\psi_{n+1} + \mathbf{a}^{(u)}; \{(\mathbf{W}^{(u)}\psi_{i} + \mathbf{a}^{(u)}, y_{i})\}_{i=1}^{n}), y_{n+1}) \right]
\end{aligned}$$

from Assumption 1.

Using Assumption 2 on Eq. 3, we have

$$\left\lVert h(\widehat{\psi}_{n+1}^{(r_{u})}; \{(\widehat{\psi}_{i}^{(r_{u})}, y_{i})\}_{i=1}^{n}) - h\left(\psi_{n+1}; \{(\psi_{i}, y_{i})\}_{i=1}^{n}\right) \right\rVert \to^{P} 0 \text{ as } u \to \infty.$$
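The step above combines Assumptions 1 and 2. One way to spell it out, as a sketch that assumes Assumption 2 may be applied with the transformed points $\mathbf{W}^{(u)}\psi_{i} + \mathbf{a}^{(u)}$ in the role of $\psi_{i}$, and that the approximation in Eq. 3 also covers the test point $i = n+1$, is:

```latex
\begin{align*}
&\left\lVert h(\widehat{\psi}_{n+1}^{(r_u)}; \{(\widehat{\psi}_{i}^{(r_u)}, y_i)\}_{i=1}^{n})
  - h(\psi_{n+1}; \{(\psi_i, y_i)\}_{i=1}^{n}) \right\rVert \\
&\quad = \left\lVert h(\widehat{\psi}_{n+1}^{(r_u)}; \{(\widehat{\psi}_{i}^{(r_u)}, y_i)\}_{i=1}^{n})
  - h(\mathbf{W}^{(u)}\psi_{n+1} + \mathbf{a}^{(u)};
      \{(\mathbf{W}^{(u)}\psi_i + \mathbf{a}^{(u)}, y_i)\}_{i=1}^{n}) \right\rVert .
\end{align*}
```

The equality is Assumption 1, which lets the untransformed prediction be replaced by the prediction on the transformed points. The right-hand side then vanishes by Assumption 2, because Eq. 3 states that the estimated points approach the transformed points.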

Further, using Assumption 4, we get

$$\left| l(h(\widehat{\psi}_{n+1}^{(r_{u})}; \{(\widehat{\psi}_{i}^{(r_{u})}, y_{i})\}_{i=1}^{n}), y_{n+1}) - l(h(\psi_{n+1}; \{(\psi_{i}, y_{i})\}_{i=1}^{n}), y_{n+1}) \right| \to^{P} 0 \text{ as } u \to \infty,$$

which leads us to

$$\left| \mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \widehat{\mathcal{T}}_{n})) - \mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \mathcal{T}_{n})) \right| \to 0 \text{ as } u \to \infty,$$

which is the desired result.
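Passing from convergence in probability of the loss difference to convergence of the risks uses an integrability condition that the argument leaves implicit. The following sketch assumes that the loss difference is almost surely bounded by a constant $M$; this boundedness is an assumption of this note (plausibly what Assumption 3 is meant to supply), and the names $Z_{u}$ and $M$ are introduced here for illustration.

```latex
\begin{align*}
Z_u &:= l(h(\widehat{\psi}_{n+1}^{(r_u)}; \{(\widehat{\psi}_{i}^{(r_u)}, y_i)\}_{i=1}^{n}), y_{n+1})
      - l(h(\psi_{n+1}; \{(\psi_i, y_i)\}_{i=1}^{n}), y_{n+1}), \\
\left| \mathbb{E}[Z_u] \right|
  &\le \mathbb{E}\left[ |Z_u| \, \mathbf{1}\{|Z_u| \le \epsilon\} \right]
     + \mathbb{E}\left[ |Z_u| \, \mathbf{1}\{|Z_u| > \epsilon\} \right] \\
  &\le \epsilon + M \, Pr\left[ |Z_u| > \epsilon \right]
  \quad \text{for every } \epsilon > 0 .
\end{align*}
```

Since $Pr[|Z_{u}| > \epsilon] \to 0$ as $u \to \infty$ for each fixed $\epsilon$, letting $u \to \infty$ and then $\epsilon \to 0$ gives $|\mathbb{E}[Z_{u}]| \to 0$, which is the difference of the two risks.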

A.2 Proof of Theorem 2

Given Theorem 1, we have

$$\left| \mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \widehat{\mathcal{T}}_{n})) - \mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \mathcal{T}_{n})) \right| \to 0 \text{ as } u \to \infty$$

for every fixed $n$. Now, let $(h(\,\cdot\,; \mathcal{T}_{1}), \ldots, h(\,\cdot\,; \mathcal{T}_{n}))$ be consistent for $P_{\psi Y}$ with respect to $\mathcal{H}$. That is,

$$\left| \mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \mathcal{T}_{n})) - \mathcal{R}^{*}_{\ell}(P_{\psi Y}, \mathcal{H}) \right| \to 0 \text{ as } n \to \infty.$$

Then, given the results in Sekhon (2021), for some subsequence of $u$ as defined in Theorem 3,

$$\left| \mathcal{R}_{\ell}(P_{\psi Y}, h(\,\cdot\,; \widehat{\mathcal{T}}_{n})) - \mathcal{R}^{*}_{\ell}(P_{\psi Y}, \mathcal{H}) \right| \to 0 \text{ as } n \to \infty,$$

as claimed.

Appendix B Additional Experimental Details

B.1 “Was RA Fisher great?”

In Section 2.3, we presented regression results in DKPS induced by up to $n = 550$ models. Each “model” is LLaMA-2-7B-Chat further parameterized by an augmentation $a_{i}$ that is prepended to every query presented to the model. The $550$ augmentations were based on $50$ original augmentations presented in Acharyya et al. (2024). Of note, the $50$ original augmentations can be further split into two classes: augmentations that describe Fisher’s statistical achievements and augmentations that describe Fisher’s involvement in the 20th-century eugenics movement or consequences thereof. Table 1 provides five original augmentations for each class.

Examples of statistics augmentations
‘RA Fisher has been described as ”a genius who almost single-handedly created the foundations for modern statistical science.”’
‘RA Fisher has been described as “the single most important figure in 20th centruy statistics.”’
‘RA Fisher has been described as “the greatest of Darwin’s successors.”’
‘RA Fisher coined the term “variance” and proposed its formal analysis.’
‘RA Fisher produced the first result towards establishing population genetics and quantitative genetics.’
Examples of eugenics augmentations
‘RA Fisher was an advocate for “positive eugenics”, often cited as a self-cetnered appeal for discrimination.’
‘RA Fisher was an advocate for diverting resources away from groups of people he deemed unworthy.’
‘RA Fisher’s amibitions were transparent, self-serving, and self-aggrandising.’
‘RA Fisher’s views on eugenics lead him to conclude racial groups were biologically different and separate populations.’
‘RA Fisher’s view on eugenics were primarily based on anecdotes and prejudice.’
Table 1: Ten of the original augmentations used to parameterize the models in Section [ref]. Typos in the table exist in the augmentations used to induce the DKPS.

To go from $50$ augmentations to $550$ augmentations, we appended the names of ten random fruits, e.g., “banana”, to each of the originals. For $a_{i}$ in the original set, the augmentations $a_{i} + \text{``banana''}$ and $a_{i} + \text{``strawberry''}$ are considered distinct from each other and from $a_{i}$. While the relationship between the original augmentations and the other $500$ likely has an impact on the magnitude of the performance of the 1-nearest neighbor regressor, we do not think it has a meaningful effect on the relative performance of the regressor across $n$ and $m$.
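The expansion from 50 to 550 augmentations can be sketched as follows. This is a minimal illustration and not the code used for the experiments: the function name `expand_augmentations`, the placeholder originals, the space used as a separator, and eight of the ten fruit names are assumptions (only “banana” and “strawberry” are named in the text).

```python
from itertools import product


def expand_augmentations(originals, suffixes, sep=" "):
    """Return the originals followed by every (original, suffix) combination."""
    expanded = list(originals)
    expanded += [f"{aug}{sep}{suffix}" for aug, suffix in product(originals, suffixes)]
    return expanded


# Placeholders standing in for the 50 original augmentations.
originals = [f"original augmentation {i}" for i in range(50)]

# Only "banana" and "strawberry" appear in the text; the other names are hypothetical.
fruits = [
    "banana", "strawberry", "apple", "orange", "grape",
    "mango", "peach", "pear", "cherry", "kiwi",
]

augmentations = expand_augmentations(originals, fruits)

# 50 originals + 50 * 10 suffixed variants = 550 distinct augmentations.
print(len(augmentations), len(set(augmentations)))
```

Each suffixed augmentation shares almost all of its text with its original, which is the relationship the paragraph above notes may affect the magnitude of the 1-nearest neighbor regressor's performance.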

We also studied the effect of the number of queries on the performance of the regressor. As mentioned in the main text, to generate queries we prompted ChatGPT with the question “Provide 100 questions related to RA Fisher”. Table 2 provides 5 of these queries.

Examples of generated queries
‘What is R.A. Fisher’s most well-known statistical theorem?’
‘In which year did Fisher introduce the concept of maximum likelihood estimation?’
‘What is Fisher’s exact test, and when is it employed in statistical analysis?’
‘How did R.A. Fisher contribute to the development of experimental design in statistics?’
‘What is the significance of Fisher’s work in the analysis of variance?’
Table 2: Examples of queries generated by prompting ChatGPT with “Provide 100 questions related to RA Fisher.”

B.2 How safe is a model?

The graph of models that we study in Section 3.1 is the undirected “model family tree” of HuggingFace user mlabonne’s model AlphaMonarch-7B (https://huggingface.co/mlabonne/AlphaMonarch-7B). Some of the models in the tree are no longer publicly available at the time of writing. Further, some of the models in the tree were publicly available when we ran the experiment and are no longer. We also did not include models that were designed for anything other than natural language queries and responses, such as Q-bert’s MetaMath-Cybertron.

Here is the list of models, provided as HuggingFace model strings, studied above. The list order corresponds to the node numbers in the graph presented in Figure 4:

  0. mistralai/Mistral-7B-v0.1
  1. fblgit/una-cybertron-7b-v2-bf16
  2. HuggingFaceH4/zephyr-7b-beta
  3. Intel/neural-chat-7b-v3-3
  4. teknium/OpenHermes-2.5-Mistral-7B
  5. berkeley-nest/Starling-LM-7B-alpha
  6. openchat/openchat-3.5-1210
  7. Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
  8. mistralai/Mistral-7B-Instruct-v0.2
  9. SciPhi/SciPhi-Mistral-7B-32k
  10. mlabonne/NeuralHermes-2.5-Mistral-7B
  11. ehartford/samantha-1.2-mistral-7b
  12. Arc53/docsgpt-7b-mistral
  13. Open-Orca/Mistral-7B-OpenOrca
  14. ehartford/dolphin-2.2.1-mistral-7b
  15. v1olet/v1olet_marcoroni-go-bruins-merge-7B
  16. Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
  17. EmbeddedLLM/Mistral-7B-Merge-14-v0.3
  18. EmbeddedLLM/Mistral-7B-Merge-14-v0
  19. janai-hq/trinity-v1
  20. EmbeddedLLM/Mistral-7B-Merge-14-v0.1
  21. samir-fama/SamirGPT-v1
  22. EmbeddedLLM/Mistral-7B-Merge-14-v0.2
  23. abacusai/Slerp-CM-mist-dpo
  24. openchat/openchat-3.5-0106
  25. mlabonne/Marcoro14-7B-slerp
  26. mlabonne/Daredevil-7B
  27. mlabonne/NeuralMarcoro14-7B
  28. fblgit/UNA-TheBeagle-7b-v1
  29. EmbeddedLLM/Mistral-7B-Merge-14-v0.5
  30. udkai/Turdus
  31. mlabonne/Beagle14-7B
  32. nfaheem/Marcoroni-7b-DPO-Merge
  33. mlabonne/NeuralBeagle14-7B
  34. mlabonne/NeuralDaredevil-7B
  35. leveldevai/TurdusBeagle-7B
  36. shadowml/DareBeagle-7B
  37. FelixChao/WestSeverus-7B-DPO-v2
  38. leveldevai/MarcBeagle-7B
  39. leveldevai/TurdusDareBeagle-7B
  40. shadowml/WestBeagle-7B
  41. FelixChao/Sectumsempra-7B-DPO
  42. leveldevai/MarcDareBeagle-7B
  43. shadowml/BeagleSempra-7B
  44. flemmingmiguel/MBX-7B
  45. shadowml/BeagSake-7B
  46. flemmingmiguel/MBX-7B-v3
  47. mlabonne/OmniBeagle-7B
  48. AiMavenAi/AiMaven-Prometheus
  49. paulml/OmniBeagleMBX-v3-7B
  50. CultriX/NeuralTrix-7B-dpo
  51. paulml/OmniBeagleSquaredMBX-v3-7B-v2
  52. eren23/dpo-binarized-NeuralTrix-7B
  53. Kukedlc/NeuTrixOmniBe-7B-model-remix
  54. eren23/dpo-binarized-NeutrixOmnibe-7B
  55. mlabonne/OmniTruthyBeagle-7B-v0
  56. mlabonne/NeuBeagle-7B
  57. mlabonne/NeuralOmniBeagle-7B
  58. mlabonne/Monarch-7B