BOPIM: Bayesian Optimization for influence maximization on temporal networks

Eric Yanchenko111Akita International University, Global Connectivity Program, Akita, Japan

The goal of influence maximization (IM) is to select a small set of seed nodes which maximizes the spread of influence on a network. In this work, we propose 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM, a Bayesian Optimization (BO) algorithm for IM on temporal networks. The IM task is well-suited for a BO solution due to its expensive and complicated objective function. We propose a simple surrogate function to model the objective function and leverage Gaussian Process regression with shrinkage priors to fit the model. A major difficulty in combinatorial BO is constructing an appropriate covariance matrix. We overcome this challenge with a kernel function based on the amount of overlap in seed set neighbors, thus tailoring the solution to our IM application. This also allows us to use the Expected Improvement acquisition function to choose the next point to evaluate. In numerical experiments on real-world networks, we prove that 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM outperforms competing methods and yields comparable influence spreads to a gold-standard greedy algorithm while being as much as five times faster. We also demonstrate the proposed method’s ability to quantify uncertainty in optimal seed sets. To our knowledge, this is the first attempt to look at uncertainty in the seed sets for IM.

Keywords: Diffusion, Dynamic networks, Shrinkage priors, Uncertainty quantification

1 Introduction

Influence maximization (IM) is a canonical network science problem where the goal is to select a small set of seed nodes in order to maximize the influence spread on a graph (Kempe et al.,, 2003; Chen et al.,, 2009). Over the past two decades, IM has garnered significant research interest due to its relevance in online marketing campaigns (Domingos and Richardson,, 2001; Bharathi et al.,, 2007; Chen et al., 2010b, ), rumor spreading (Kempe et al.,, 2003; Murata and Koga,, 2018), public health campaigns (Wilder et al.,, 2017; Yadav et al.,, 2017; Wilder et al.,, 2018; Yadav et al.,, 2018), vaccine strategies (Holme,, 2004; Lee et al.,, 2012), and more. Traditionally, the problem assumes the graph to be static, i.e., nodes and edges are fixed. In many situations, however, this may be unrealistic. For example, consider a marketing campaign on X where a company promotes a new product. As users and followers change daily, an effective IM algorithm must account for this dynamism. Therefore, recent work has looked at the IM problem on temporal networks (Holme and Saramäki,, 2012) where edges change with time. Since this combinatorial optimization problem is NP-hard, heuristics must be employed. Some methods use a greedy algorithm and probabilistic approximation of the expected influence spread (Aggarwal et al.,, 2012; Osawa and Murata,, 2015; Erkol et al.,, 2020), while others eschew costly influence spread calculations by simply ranking nodes in order of importance (Michalski et al.,, 2014; Murata and Koga,, 2018; Michalski et al.,, 2020). We refer the interested reader to Yanchenko et al., 2024b for a review of temporal IM.

While IM is a well-known problem in the network, information and computer sciences, it has received relatively little attention from statisticians. This has led to several consequences on IM methodology development, not least of which being a lack of uncertainty quantification. Additionally, as the IM task is an optimization problem, statistics has a whole suit of techniques to offer that hitherto have been unexplored. One such approach is Bayesian Optimization (BO) which has emerged as a popular paradigm for optimizing complicated objective functions (Snoek et al.,, 2012; Shahriari et al.,, 2015; Frazier,, 2018). Machine learning hyper-parameter tuning (Snoek et al.,, 2012; Wu et al.,, 2019; Turner et al.,, 2021), experimental design (Frazier and Wang,, 2015; Shahriari et al.,, 2015; Greenhill et al.,, 2020) and computer experiments (Sürer et al.,, 2023; Sauer et al.,, 2023) are just a few of the many applications where BO has been successfully implemented. BO approximates a complicated objective function with a simple surrogate function and fits this model via Bayesian regression. The statistical model is maximized with an acquisition function before evaluating this new point on the original objective function. Finally, this point is added to the data set and the process repeats.

In this work, we propose 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM, Bayesian OPtimization for Influence Maximization, a novel BO algorithm for IM on temporal networks, which is one of the first solutions to the IM problem from a statistical angle. We construct an intuitive surrogate model for the influence spread function as well as a novel kernel function based on the similarity in neighbors of the seed set. We then leverage a popular global-local shrinkage prior and Expected Improvement acquisition function to select the next candidate point. In numerical experiments on real-world networks, 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM yields seed sets with influence spreads comparable to those of a greedy algorithm but at a fraction of the computational time. Additionally, our approach yields measures of uncertainty on the seed sets.

There have been many contributions to the combinatorial BO literature, including Baptista and Poloczek, (2018); Oh et al., (2019); Daulton et al., (2022); Papenmeier et al., (2023). These methods cannot be applied directly to our problem, however, due to the cardinality constraint on the number of node in the seed set. Our work primarily builds on Baptista and Poloczek, (2018); Garnett et al., (2010); Buathong et al., (2020). We leverage the ideas from Baptista and Poloczek, (2018) to construct the mean function for our Gaussian process, but this requires some modifications thanks to the cardinality constraint and input space size. Garnett et al., (2010); Buathong et al., (2020), on the other hand, address the challenges of constructing a covariance function for combinatorial optimization problems. Their approaches cannot be used directly either, however, because our input space is nodes in a graph, i.e., non-Euclidean. Though recently, there has been some work on BO for functions defined on graphs (Wan et al.,, 2024).

The layout of the paper is as follows. For the remainder of Section 1, we introduce notation and give a formal problem statement. Section 2 presents the main algorithm while Section 3 is devoted to numerical experiments. We share concluding thoughts and future directions in Section 4.

1.1 Notation and problem statement

Let 𝒢=(G1,,GT)𝒢subscript𝐺1subscript𝐺𝑇\mathcal{G}=(G_{1},\dots,G_{T})caligraphic_G = ( italic_G start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_G start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ) be a temporal graph with T𝑇Titalic_T snapshots or “slices,” where a graph snapshot Gt=(V,Et)subscript𝐺𝑡𝑉subscript𝐸𝑡G_{t}=(V,E_{t})italic_G start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = ( italic_V , italic_E start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) is defined by node set V𝑉Vitalic_V and edge set Etsubscript𝐸𝑡E_{t}italic_E start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. We let |V|=n𝑉𝑛|V|=n| italic_V | = italic_n be the number of nodes and define m=|t=1TEt|𝑚superscriptsubscript𝑡1𝑇subscript𝐸𝑡m=|\cup_{t=1}^{T}E_{t}|italic_m = | ∪ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_E start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | as the number of unique edges. Given an influence diffusion process and integer k<n𝑘𝑛k<nitalic_k < italic_n, the goal of IM is to find a set of nodes S𝑆Sitalic_S with |S|=k𝑆𝑘|S|=k| italic_S | = italic_k such that if the nodes in S𝑆Sitalic_S are “influenced” at time t=1𝑡1t=1italic_t = 1, then the spread of influence at time T+1𝑇1T+1italic_T + 1 is maximized.

Central to this problem is the influence diffusion mechanism. Some of the most popular choices are the Independent Cascade (IC) (Wang et al.,, 2012; Wen et al.,, 2017) and Linear Threshold (Chen et al., 2010a, ; Goyal et al.,, 2011) models. We adopt the Susceptible-Infected (SI) model from epidemiological studies (Osawa and Murata,, 2015; Murata and Koga,, 2018). In the SI model, a node is either in a susceptible (S)222We use S to refer both to the susceptible state and seed set, but from context the usage should be clear. or infected (I) state. A node is infected if it was influenced at some point during the process, and susceptible otherwise. At time t=1𝑡1t=1italic_t = 1, nodes in the seed set are set to state I, and all other nodes initialize to state S. At time t𝑡titalic_t, a node in state I attempts to influence all of its neighbors. Specifically, if node i𝑖iitalic_i is in state I, node j𝑗jitalic_j is in state S, and there exists an edge between node i𝑖iitalic_i and j𝑗jitalic_j at time t𝑡titalic_t, then node j𝑗jitalic_j will be in state I at time t+1𝑡1t+1italic_t + 1 with probability λ𝜆\lambdaitalic_λ. The process concludes at time T+1𝑇1T+1italic_T + 1 and the number of nodes in state I corresponds to the total influence spread. If we let σ(S)𝜎𝑆\sigma(S)italic_σ ( italic_S ) be the expected number of influenced nodes at time T+1𝑇1T+1italic_T + 1 for seed set S𝑆Sitalic_S, the IM problem seeks the set of nodes of size k𝑘kitalic_k that maximizes σ(S)𝜎𝑆\sigma(S)italic_σ ( italic_S ) on 𝒢𝒢\mathcal{G}caligraphic_G, i.e.,

S=argmaxSV,|S|=kσ(S).superscript𝑆subscriptformulae-sequence𝑆𝑉𝑆𝑘𝜎𝑆S^{*}=\arg\max_{S\subseteq V,|S|=k}\sigma(S).italic_S start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_arg roman_max start_POSTSUBSCRIPT italic_S ⊆ italic_V , | italic_S | = italic_k end_POSTSUBSCRIPT italic_σ ( italic_S ) . (1)

Finding Ssuperscript𝑆S^{*}italic_S start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is a combinatorial optimization problem with a cardinality constraint and has been shown to be NP-hard under many popular diffusion mechanisms (Li et al.,, 2018) so finding the global optimum by brute-force calculation is infeasible even for moderate n𝑛nitalic_n. Indeed, as the influence spread function is typically computed using Monte Carlo (MC) simulations, even evaluating σ(S)𝜎𝑆\sigma(S)italic_σ ( italic_S ) is costly. Many IM algorithms efficiently compute the influence spread function and apply a greedy algorithm to choose the seed nodes, while others skip evaluation of the objective function and instead use a heuristic to rank nodes by influence. In the remainder of this work, we describe a BO algorithm to approximate the solution in (1).

2 Bayesian Optimization

We now present the main contribution of this paper, the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm, by addressing the three main steps of the BO schema: the objective function, statistical model and acquisition function.

2.1 Objective function

The first step in a BO algorithm is the initial evaluation of the objective function. Let f:𝒳:𝑓𝒳f:\mathcal{X}\to\mathbb{R}italic_f : caligraphic_X → blackboard_R be some real-valued function defined on domain 𝒳𝒳\mathcal{X}caligraphic_X that we wish to maximize. BO is particularly useful when f()𝑓f(\cdot)italic_f ( ⋅ ) is computationally expensive to evaluate, lacks special structure, e.g., concavity, and/or has unavailable first and second order derivatives. Then we evaluate f(𝒙i)𝑓subscript𝒙𝑖f(\bm{x}_{i})italic_f ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) for 𝒙i𝒳subscript𝒙𝑖𝒳\bm{x}_{i}\in\mathcal{X}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ caligraphic_X and i=1,,N0𝑖1subscript𝑁0i=1,\dots,N_{0}italic_i = 1 , … , italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT for a user-defined N0subscript𝑁0N_{0}italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT.

For the temporal IM problem, let 𝒙𝒳={0,1}n𝒙𝒳superscript01𝑛\bm{x}\in\mathcal{X}=\{0,1\}^{n}bold_italic_x ∈ caligraphic_X = { 0 , 1 } start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT be such that xj=1subscript𝑥𝑗1x_{j}=1italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = 1 if node j𝑗jitalic_j is in the seed set, and 0 otherwise, for j=1,,n𝑗1𝑛j=1,\dots,nitalic_j = 1 , … , italic_n. Since the seed set is limited to k𝑘kitalic_k nodes, we impose the cardinality constraint that j=1nxj=ksuperscriptsubscript𝑗1𝑛subscript𝑥𝑗𝑘\sum_{j=1}^{n}x_{j}=k∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_k. Then we define f(𝒙)𝑓𝒙f(\bm{x})italic_f ( bold_italic_x )333f(𝒙)=σ(S)𝑓𝒙𝜎𝑆f(\bm{x})=\sigma(S)italic_f ( bold_italic_x ) = italic_σ ( italic_S ) if xj=1subscript𝑥𝑗1x_{j}=1italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = 1 whenever jS𝑗𝑆j\in Sitalic_j ∈ italic_S, and 0 otherwise. as the expected influence spread for seed set 𝒙𝒙\bm{x}bold_italic_x. We stress that f()𝑓f(\cdot)italic_f ( ⋅ ) is highly dependent on 𝒢𝒢\mathcal{G}caligraphic_G as well as the diffusion process, but suppress this dependence in the notation for convenience. f()𝑓f(\cdot)italic_f ( ⋅ ) is an ideal candidate for BO because its evaluation does not yield derivatives and it is expensive to compute via MC simulations. For the initial N0subscript𝑁0N_{0}italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT evaluations of f()𝑓f(\cdot)italic_f ( ⋅ ), we remark that nodes with high degrees on the temporally aggregated network, G~=tGt~𝐺subscript𝑡subscript𝐺𝑡\tilde{G}=\cup_{t}G_{t}over~ start_ARG italic_G end_ARG = ∪ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_G start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, are good candidates for the optimal seed set. Therefore, instead of randomly sampling 𝒙isubscript𝒙𝑖\bm{x}_{i}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT from 𝒳𝒳\mathcal{X}caligraphic_X, we sample nodes proportionally to their degree. Mathematically, this means sampling k𝑘kitalic_k times without replacement from a distribution with probability mass function 𝖯(Z=j)=dj/kdk𝖯𝑍𝑗subscript𝑑𝑗subscript𝑘subscript𝑑𝑘\mathsf{P}(Z=j)=d_{j}/\sum_{k}d_{k}sansserif_P ( italic_Z = italic_j ) = italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT / ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT where djsubscript𝑑𝑗d_{j}italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is the degree of node j𝑗jitalic_j on G~~𝐺\tilde{G}over~ start_ARG italic_G end_ARG for Z{1,,n}𝑍1𝑛Z\in\{1,\dots,n\}italic_Z ∈ { 1 , … , italic_n }. This sampling scheme ensures that the model better fits the objective function near the expected optimal seed set (see Section 3.2 for empirical evidence). With 𝒙1,,𝒙N0subscript𝒙1subscript𝒙subscript𝑁0\bm{x}_{1},\dots,\bm{x}_{N_{0}}bold_italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_italic_x start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT sampled, we define 𝐗=(𝒙1T,,𝒙N0T)T{0,1}N0×n𝐗superscriptsuperscriptsubscript𝒙1𝑇superscriptsubscript𝒙subscript𝑁0𝑇𝑇superscript01subscript𝑁0𝑛{\bf X}=(\bm{x}_{1}^{T},\dots,\bm{x}_{N_{0}}^{T})^{T}\in\{0,1\}^{N_{0}\times n}bold_X = ( bold_italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , … , bold_italic_x start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT × italic_n end_POSTSUPERSCRIPT and evaluate f(𝒙i)𝑓subscript𝒙𝑖f(\bm{x}_{i})italic_f ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) using MC simulations.

2.2 Statistical model

Next we need an adequate surrogate model for the objective function. The most popular approach uses Gaussian Process (GP) regression. First, let

yi=f(𝒙i)+ϵi,subscript𝑦𝑖𝑓subscript𝒙𝑖subscriptitalic-ϵ𝑖y_{i}=f(\bm{x}_{i})+\epsilon_{i},italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_f ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) + italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ,

where yisubscript𝑦𝑖y_{i}italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the observed influence spread, f(𝒙i)𝑓subscript𝒙𝑖f(\bm{x}_{i})italic_f ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) is the objective function from Section 2.1 and ϵi𝖭𝗈𝗋𝗆𝖺𝗅(0,σ2)similar-tosubscriptitalic-ϵ𝑖𝖭𝗈𝗋𝗆𝖺𝗅0superscript𝜎2\epsilon_{i}\sim\mathsf{Normal}(0,\sigma^{2})italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∼ sansserif_Normal ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) is the noise term. Then our GP regression model is

𝒚𝖭𝗈𝗋𝗆𝖺𝗅(μ(𝐗),σ2γ𝐊+σ2𝐈N0)similar-to𝒚𝖭𝗈𝗋𝗆𝖺𝗅𝜇𝐗superscript𝜎2𝛾𝐊superscript𝜎2subscript𝐈subscript𝑁0\bm{y}\sim\mathsf{Normal}(\mu({\bf X}),\sigma^{2}\gamma{\bf K}+\sigma^{2}{\bf I% }_{N_{0}})bold_italic_y ∼ sansserif_Normal ( italic_μ ( bold_X ) , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_γ bold_K + italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_I start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) (2)

where μ()𝜇\mu(\cdot)italic_μ ( ⋅ ) is the mean function and 𝐊𝐊{\bf K}bold_K is the covariance defined by a kernel function κ(,)𝜅\kappa(\cdot,\cdot)italic_κ ( ⋅ , ⋅ ) such that 𝐊ij=κ(𝒙i,𝒙j)subscript𝐊𝑖𝑗𝜅subscript𝒙𝑖subscript𝒙𝑗{\bf K}_{ij}=\kappa(\bm{x}_{i},\bm{x}_{j})bold_K start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = italic_κ ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ). Additionally, 𝒚=(y1,,yN0)T𝒚superscriptsubscript𝑦1subscript𝑦subscript𝑁0𝑇\bm{y}=(y_{1},\dots,y_{N_{0}})^{T}bold_italic_y = ( italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_y start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT, 𝐗𝐗{\bf X}bold_X is defined in the previous sub-section, and γ>0𝛾0\gamma>0italic_γ > 0 is a kernel hyper-parameter. The two key components of the GP regression model are the mean function μ()𝜇\mu(\cdot)italic_μ ( ⋅ ) and kernel function κ(,)𝜅\kappa(\cdot,\cdot)italic_κ ( ⋅ , ⋅ ), so we consider each of these in turn.

2.2.1 Mean function

First, we construct a mean function, μ()𝜇\mu(\cdot)italic_μ ( ⋅ ) parameterized by 𝜷𝜷\bm{\beta}bold_italic_β, to serve as a surrogate for f()𝑓f(\cdot)italic_f ( ⋅ ). Motivated by Baptista and Poloczek, (2018), we assume that the surrogate model is linear in its parameters, i.e.,

μ(𝒙i)=j=1nxijβj=𝒙i𝜷𝜇subscript𝒙𝑖superscriptsubscript𝑗1𝑛subscript𝑥𝑖𝑗subscript𝛽𝑗subscript𝒙𝑖𝜷\mu(\bm{x}_{i})=\sum_{j=1}^{n}x_{ij}\beta_{j}=\bm{x}_{i}\bm{\beta}italic_μ ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_italic_β (3)

for 𝜷=(β1,,βn)T𝜷superscriptsubscript𝛽1subscript𝛽𝑛𝑇\bm{\beta}=(\beta_{1},\dots,\beta_{n})^{T}bold_italic_β = ( italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_β start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT. GP regression places a multivariate normal prior 𝜷𝖭𝗈𝗋𝗆𝖺𝗅(𝟎n,Σβ)similar-to𝜷𝖭𝗈𝗋𝗆𝖺𝗅subscript0𝑛subscriptΣ𝛽\bm{\beta}\sim\mathsf{Normal}({\bf 0}_{n},\Sigma_{\beta})bold_italic_β ∼ sansserif_Normal ( bold_0 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , roman_Σ start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT ) where 𝟎nsubscript0𝑛{\bf 0}_{n}bold_0 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is the vector of zeros of length n𝑛nitalic_n and ΣβsubscriptΣ𝛽\Sigma_{\beta}roman_Σ start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT is some covariance matrix. We also empirically standardize 𝒚𝒚\bm{y}bold_italic_y to have mean zero to remove the need for an intercept.

Since n𝑛nitalic_n is large for most real-world networks, the dimension of 𝜷𝜷\bm{\beta}bold_italic_β will also be large. Additionally, it is unlikely that each component of 𝜷𝜷\bm{\beta}bold_italic_β is significant, i.e., many nodes inclusion in the seed set does not lead to significant influence spread. Thus, we endow 𝜷𝜷\bm{\beta}bold_italic_β with a sparsity-inducing shrinkage prior. As in Baptista and Poloczek, (2018), we use the Horseshoe (HS) prior (Carvalho et al.,, 2009, 2010) which is designed to shrink insignificant coefficient estimates towards zero while still allowing for accurate estimation of the important coefficients. Specifically, the HS prior handles sparsity via a half-Cauchy prior distribution on the variance of the coefficients, i.e.,

βj|λj2,τ2,σ2conditionalsubscript𝛽𝑗superscriptsubscript𝜆𝑗2superscript𝜏2superscript𝜎2\displaystyle\beta_{j}|\lambda_{j}^{2},\tau^{2},\sigma^{2}italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 𝖭𝗈𝗋𝗆𝖺𝗅(0,λj2τ2σ2)similar-toabsent𝖭𝗈𝗋𝗆𝖺𝗅0superscriptsubscript𝜆𝑗2superscript𝜏2superscript𝜎2\displaystyle\sim\mathsf{Normal}(0,\lambda_{j}^{2}\tau^{2}\sigma^{2})∼ sansserif_Normal ( 0 , italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT )
λj2,τ2superscriptsubscript𝜆𝑗2superscript𝜏2\displaystyle\lambda_{j}^{2},\tau^{2}italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 𝖢+(0,1)similar-toabsentsuperscript𝖢01\displaystyle\sim\mathsf{C}^{+}(0,1)\ ∼ sansserif_C start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ( 0 , 1 ) (4)

where j=1,,n𝑗1𝑛j=1,\dots,nitalic_j = 1 , … , italic_n and 𝖢+(0,1)superscript𝖢01\mathsf{C}^{+}(0,1)sansserif_C start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ( 0 , 1 ) is the standard half-Cauchy distribution. Thus, τ2superscript𝜏2\tau^{2}italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT represents the global shrinkage and λj2superscriptsubscript𝜆𝑗2\lambda_{j}^{2}italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT controls the local shrinkage of each coefficient. For the Gibbs sampler, we leverage an auxiliary variable formulation (Makalic and Schmidt,, 2015). Please see the Supplemental Materials for complete MCMC details. We also considered the R2D2 prior (Zhang et al.,, 2022; Yanchenko et al., 2024a, ) and Dirichlet-Laplace (Bhattacharya et al.,, 2015), but the results were similar, so we focus on the HS prior.

2.2.2 Coefficient interpretation

The coefficient βjsubscript𝛽𝑗\beta_{j}italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT represents the marginal importance of each node to the overall influence spread. This means that if the estimate of βjsubscript𝛽𝑗\beta_{j}italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is positive and large, then including node j𝑗jitalic_j in the seed set leads to a larger total influence spread. Conversely, small and negative coefficient estimates correspond to nodes whose inclusion in the seed set does not yield large influence spreads. We stress, however, that the coefficient estimates are not interpretable in the traditional sense. Typically, the coefficient βjsubscript𝛽𝑗\beta_{j}italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT for a binary explanatory variable xj{0,1}subscript𝑥𝑗01x_{j}\in\{0,1\}italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ { 0 , 1 } corresponds to the increase in the response when xj=1subscript𝑥𝑗1x_{j}=1italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = 1 and all other variables are fixed. This interpretation is invalid in our context, however, due to the constraint j=1nxj=ksuperscriptsubscript𝑗1𝑛subscript𝑥𝑗𝑘\sum_{j=1}^{n}x_{j}=k∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_k. In other words, βjsubscript𝛽𝑗\beta_{j}italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT does not correspond to the increase in the influence spread if node j𝑗jitalic_j is added to the seed set and all other nodes are held fixed, because then the number of nodes in the seed set would be greater than k𝑘kitalic_k. Instead, we interpret βjβisubscript𝛽𝑗subscript𝛽𝑖\beta_{j}-\beta_{i}italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT as the expected increase in the influence spread function if node j𝑗jitalic_j replaced node i𝑖iitalic_i in the seed set. Thus, the estimates yield a relative sense of the increase/decrease of the objective function if a node is seeded, and in this sense, are analogous to a node-ranking heuristic.

2.2.3 Kernel function

To complete our GP regression model, we must also specify the covariance, 𝐊𝐊{\bf K}bold_K, defined by the kernel function, κ(,)𝜅\kappa(\cdot,\cdot)italic_κ ( ⋅ , ⋅ ). If 𝒙i,𝒙jpsubscript𝒙𝑖subscript𝒙𝑗superscript𝑝\bm{x}_{i},\bm{x}_{j}\in\mathbb{R}^{p}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT, then it is straightforward to compute their variance using, e.g., a Gaussian or Matern kernel. In our setting, however, 𝒙i,𝒙j{0,1}nsubscript𝒙𝑖subscript𝒙𝑗superscript01𝑛\bm{x}_{i},\bm{x}_{j}\in\{0,1\}^{n}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and correspond to nodes in a graph, so it is not immediately obvious how to choose a kernel.

Recall that for the IM task, we initially influence a small set of nodes which then propagate information to the rest of the network. Clearly, these initially selected nodes (as well as their neighboring nodes) are very important for the final influence spread (Erkol et al.,, 2020). Indeed, if the nodes corresponding to seed sets 𝒙isubscript𝒙𝑖\bm{x}_{i}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and 𝒙jsubscript𝒙𝑗\bm{x}_{j}bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT have many neighbors in common, then we would expect their influence spreads also to be similar.

Inspired by this observation, as well as Garnett et al., (2010), who studies the BO task on combinatorial search-spaces, we propose the following kernel function. Let {vi1,,vik}subscript𝑣subscript𝑖1subscript𝑣subscript𝑖𝑘\{v_{i_{1}},\dots,v_{i_{k}}\}{ italic_v start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , italic_v start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT } correspond to seed set 𝒙isubscript𝒙𝑖\bm{x}_{i}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. First, we find all neighbors of {vi1,,vik}subscript𝑣subscript𝑖1subscript𝑣subscript𝑖𝑘\{v_{i_{1}},\dots,v_{i_{k}}\}{ italic_v start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , italic_v start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT } at time t=1𝑡1t=1italic_t = 1, and call this neighboring set 𝒮isubscript𝒮𝑖\mathcal{S}_{i}caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. In other words, if node vsubscript𝑣v_{\ell}italic_v start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT has an incoming edge from some node in {vi1,,vik}subscript𝑣subscript𝑖1subscript𝑣subscript𝑖𝑘\{v_{i_{1}},\dots,v_{i_{k}}\}{ italic_v start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , italic_v start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT } at time t=1𝑡1t=1italic_t = 1, then v𝒮isubscript𝑣subscript𝒮𝑖v_{\ell}\in\mathcal{S}_{i}italic_v start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT ∈ caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Of course, {vi1,,vik}𝒮isubscript𝑣subscript𝑖1subscript𝑣subscript𝑖𝑘subscript𝒮𝑖\{v_{i_{1}},\dots,v_{i_{k}}\}\in\mathcal{S}_{i}{ italic_v start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , italic_v start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT } ∈ caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT as well. We also compute the set of neighboring nodes set, 𝒮jsubscript𝒮𝑗\mathcal{S}_{j}caligraphic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, for 𝒙jsubscript𝒙𝑗\bm{x}_{j}bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. Again, if 𝒮isubscript𝒮𝑖\mathcal{S}_{i}caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and 𝒮jsubscript𝒮𝑗\mathcal{S}_{j}caligraphic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT have many common elements, then we want κ(𝒙i,𝒙j)𝜅subscript𝒙𝑖subscript𝒙𝑗\kappa(\bm{x}_{i},\bm{x}_{j})italic_κ ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) to be large. Therefore, we can compute the Jaccard coefficient between 𝒮isubscript𝒮𝑖\mathcal{S}_{i}caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and 𝒮jsubscript𝒮𝑗\mathcal{S}_{j}caligraphic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and use this as our kernel function. The Jaccard coefficient (JC) (Jaccard,, 1901, 1912) is a simple metric to quantify the similarity between two finite sets. If A𝐴Aitalic_A and B𝐵Bitalic_B are two such sets, then the JC is the number of common elements between the two sets divided by the total number of elements in each set, i.e.,

JC(A,B)=|AB||AB|.𝐽𝐶𝐴𝐵𝐴𝐵𝐴𝐵JC(A,B)=\frac{|A\cap B|}{|A\cup B|}.italic_J italic_C ( italic_A , italic_B ) = divide start_ARG | italic_A ∩ italic_B | end_ARG start_ARG | italic_A ∪ italic_B | end_ARG .

If A𝐴Aitalic_A and B𝐵Bitalic_B are the same set, then JC(A,B)=1𝐽𝐶𝐴𝐵1JC(A,B)=1italic_J italic_C ( italic_A , italic_B ) = 1 and if A𝐴Aitalic_A and B𝐵Bitalic_B have no common elements, then JC(A,B)=0𝐽𝐶𝐴𝐵0JC(A,B)=0italic_J italic_C ( italic_A , italic_B ) = 0 such that 0JC(A,B)10𝐽𝐶𝐴𝐵10\leq JC(A,B)\leq 10 ≤ italic_J italic_C ( italic_A , italic_B ) ≤ 1. Additionally, it is clear that JC(A,B)=JC(B,A)𝐽𝐶𝐴𝐵𝐽𝐶𝐵𝐴JC(A,B)=JC(B,A)italic_J italic_C ( italic_A , italic_B ) = italic_J italic_C ( italic_B , italic_A ).

With kernel function in hand, we can easily construct our covariance matrix. Indeed,

𝐊ij=κ(𝒙i,𝒙j)={JC(𝒮i,𝒮j)ij1+δi=jsubscript𝐊𝑖𝑗𝜅subscript𝒙𝑖subscript𝒙𝑗cases𝐽𝐶subscript𝒮𝑖subscript𝒮𝑗𝑖𝑗1𝛿𝑖𝑗{\bf K}_{ij}=\kappa(\bm{x}_{i},\bm{x}_{j})=\begin{cases}JC(\mathcal{S}_{i},% \mathcal{S}_{j})&i\neq j\\ 1+\delta&i=j\end{cases}bold_K start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = italic_κ ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) = { start_ROW start_CELL italic_J italic_C ( caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , caligraphic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) end_CELL start_CELL italic_i ≠ italic_j end_CELL end_ROW start_ROW start_CELL 1 + italic_δ end_CELL start_CELL italic_i = italic_j end_CELL end_ROW (5)

where δ>0𝛿0\delta>0italic_δ > 0 is some small constant to ensure positive-definiteness of our covariance matrix. This definition satisfies the desideratum that points close in the input space are more strongly correlated where we have defined two seed sets to be “close” if they have many neighbors in common at time t=1𝑡1t=1italic_t = 1.

2.3 Acquisition function

With our GP model fully defined, we can move on to the final step in the BO algorithm: the acquisition function. This function determines which seed set to evaluate our objective function at next. We want to probe areas of the search space with a high likelihood of being near the global maximum (exploitation), while also considering seed sets which have not been explored yet to prevent getting stuck in a local optimum (exploration).

For the acquisition function, we use the Expected Improvement function (Frazier,, 2018)as it is one of the most popular acquisition functions and does not require any hyper-parameter tuning. The idea is to evaluate the objective function at the value which leads to the largest expected increase compared to the current maximum. Specifically, assume that we are in b𝑏bitalic_bth iteration of the BO loop. Let 𝒙{0,1}n𝒙superscript01𝑛\bm{x}\in\{0,1\}^{n}bold_italic_x ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and μ(𝒙)𝜇𝒙\mu(\bm{x})italic_μ ( bold_italic_x ) and σ2(𝒙)superscript𝜎2𝒙\sigma^{2}(\bm{x})italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( bold_italic_x ) be the conditional mean and variance, respectively, of our GP model given the current data. By properties of the multivariate normal distribution, we have

μ(𝒙)=𝒙T𝜷+κ(𝒙,𝐗)T(γ𝐊+𝐈N0+b)1(𝒚𝐗𝜷)𝜇𝒙superscript𝒙𝑇𝜷superscript𝜅superscript𝒙𝐗𝑇superscript𝛾𝐊subscript𝐈subscript𝑁0𝑏1𝒚𝐗𝜷\mu(\bm{x})=\bm{x}^{T}\bm{\beta}+\kappa^{*}(\bm{x},{\bf X})^{T}(\gamma{\bf K}+% {\bf I}_{N_{0}+b})^{-1}(\bm{y}-{\bf X}\bm{\beta})italic_μ ( bold_italic_x ) = bold_italic_x start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_β + italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( bold_italic_x , bold_X ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( italic_γ bold_K + bold_I start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_b end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_y - bold_X bold_italic_β )

and

σ2(𝒙)=σ2{(γ+1)κ(𝒙,𝐗)T(γ𝐊+𝐈N0+b)1κ(𝒙,𝐗)}superscript𝜎2𝒙superscript𝜎2𝛾1superscript𝜅superscript𝒙𝐗𝑇superscript𝛾𝐊subscript𝐈subscript𝑁0𝑏1superscript𝜅𝒙𝐗\sigma^{2}(\bm{x})=\sigma^{2}\{(\gamma+1)-\kappa^{*}(\bm{x},{\bf X})^{T}(% \gamma{\bf K}+{\bf I}_{N_{0}+b})^{-1}\kappa^{*}(\bm{x},{\bf X})\}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( bold_italic_x ) = italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT { ( italic_γ + 1 ) - italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( bold_italic_x , bold_X ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( italic_γ bold_K + bold_I start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_b end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( bold_italic_x , bold_X ) }

where

κ(𝒙,𝐗)=(κ(𝒙,𝒙1),,κ(𝒙,𝒙N0+b))T.superscript𝜅𝒙𝐗superscript𝜅𝒙subscript𝒙1𝜅𝒙subscript𝒙subscript𝑁0𝑏𝑇\kappa^{*}(\bm{x},{\bf X})=(\kappa(\bm{x},\bm{x}_{1}),\dots,\kappa(\bm{x},\bm{% x}_{N_{0}+b}))^{T}.italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( bold_italic_x , bold_X ) = ( italic_κ ( bold_italic_x , bold_italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , … , italic_κ ( bold_italic_x , bold_italic_x start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_b end_POSTSUBSCRIPT ) ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT .

In practice, we replace 𝜷𝜷\bm{\beta}bold_italic_β and σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT with their corresponding posterior medians. Now, let Δ(𝒙)=μ(𝒙)fΔ𝒙𝜇𝒙superscript𝑓\Delta(\bm{x})=\mu(\bm{x})-f^{*}roman_Δ ( bold_italic_x ) = italic_μ ( bold_italic_x ) - italic_f start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT where f=maxi{1,,N0+b{f(𝒙i)}f^{*}=\max_{i\in\{1,\dots,N_{0}+b}\{f(\bm{x}_{i})\}italic_f start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_max start_POSTSUBSCRIPT italic_i ∈ { 1 , … , italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_b end_POSTSUBSCRIPT { italic_f ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) }, i.e., the maximum influence spread of the seed sets we have already evaluated. Then

EI(𝒙)=max{Δ(𝒙),0}+σ(𝒙)ϕ(Δ(𝒙)σ(𝒙))|Δ(𝒙)|Φ(Δ(𝒙)σ(𝒙))𝐸𝐼𝒙Δ𝒙0𝜎𝒙italic-ϕΔ𝒙𝜎𝒙Δ𝒙ΦΔ𝒙𝜎𝒙EI(\bm{x})=\max\{\Delta(\bm{x}),0\}+\sigma(\bm{x})\phi\left(\frac{\Delta(\bm{x% })}{\sigma(\bm{x})}\right)-|\Delta(\bm{x})|\Phi\left(\frac{\Delta(\bm{x})}{% \sigma(\bm{x})}\right)italic_E italic_I ( bold_italic_x ) = roman_max { roman_Δ ( bold_italic_x ) , 0 } + italic_σ ( bold_italic_x ) italic_ϕ ( divide start_ARG roman_Δ ( bold_italic_x ) end_ARG start_ARG italic_σ ( bold_italic_x ) end_ARG ) - | roman_Δ ( bold_italic_x ) | roman_Φ ( divide start_ARG roman_Δ ( bold_italic_x ) end_ARG start_ARG italic_σ ( bold_italic_x ) end_ARG ) (6)

where ϕ()italic-ϕ\phi(\cdot)italic_ϕ ( ⋅ ) and Φ()Φ\Phi(\cdot)roman_Φ ( ⋅ ) are the probability density function and distribution function, respectively, of the standard normal (Frazier,, 2018). We choose the next seed set to evaluate, 𝒙~~𝒙\tilde{\bm{x}}over~ start_ARG bold_italic_x end_ARG as the seed set with largest expected improvement, i.e.,

𝒙~=argmax𝒙{0,1}nEI(𝒙).~𝒙subscript𝒙superscript01𝑛𝐸𝐼𝒙\tilde{\bm{x}}={\arg\max}_{\bm{x}\in\{0,1\}^{n}}EI(\bm{x}).over~ start_ARG bold_italic_x end_ARG = roman_arg roman_max start_POSTSUBSCRIPT bold_italic_x ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_E italic_I ( bold_italic_x ) .

The astute reader will notice that we have simply traded one combinatorial optimization problem (maximizing f()𝑓f(\cdot)italic_f ( ⋅ )) for another (maximizing EI()𝐸𝐼EI(\cdot)italic_E italic_I ( ⋅ )). EI()𝐸𝐼EI(\cdot)italic_E italic_I ( ⋅ ), however, is much cheaper to evaluate than f()𝑓f(\cdot)italic_f ( ⋅ ) so finding its optimum will be significantly faster. Indeed, we propose a simple greedy algorithm to find the maximum of EI()𝐸𝐼EI(\cdot)italic_E italic_I ( ⋅ ) with details in the Supplemental Materials. The main idea is that nodes in the current seed set are swapped with other nodes. If the switch improves the solution (larger EI()𝐸𝐼EI(\cdot)italic_E italic_I ( ⋅ )), then its kept, otherwise the previous seed set is used. In this way, the algorithm makes the locally optimal decision. The algorithm continues until all nodes have been looped through and no changes are made.

After solving for 𝒙~=argmaxEI(𝒙)~𝒙𝐸𝐼𝒙\tilde{\bm{x}}=\arg\max EI(\bm{x})over~ start_ARG bold_italic_x end_ARG = roman_arg roman_max italic_E italic_I ( bold_italic_x ), we append it and f(𝒙~)𝑓~𝒙f(\tilde{\bm{x}})italic_f ( over~ start_ARG bold_italic_x end_ARG ) to 𝐗𝐗{\bf X}bold_X and 𝒚𝒚\bm{y}bold_italic_y, respectively, update the covariance matrix 𝐊𝐊{\bf K}bold_K, and re-fit the GP regression model to update the posterior distribution of 𝜷𝜷\bm{\beta}bold_italic_β and σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. This step is repeated B𝐵Bitalic_B times where B𝐵Bitalic_B is the number of iterations in the BO loop. Then the optimal seed set is 𝒙=argmax𝒙{𝒙1,,𝒙N0+B}f(𝒙)superscript𝒙subscript𝒙subscript𝒙1subscript𝒙subscript𝑁0𝐵𝑓𝒙\bm{x}^{*}=\arg\max_{\bm{x}\in\{\bm{x}_{1},\dots,\bm{x}_{N_{0}+B}\}}f(\bm{x})bold_italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_arg roman_max start_POSTSUBSCRIPT bold_italic_x ∈ { bold_italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_italic_x start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_B end_POSTSUBSCRIPT } end_POSTSUBSCRIPT italic_f ( bold_italic_x ) where this maximum is taken over the N0+Bsubscript𝑁0𝐵N_{0}+Bitalic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_B sampled points.

The steps for the entire algorithm are outlined in Algorithm 1. For a given network 𝒢𝒢\mathcal{G}caligraphic_G, diffusion mechanism, seed size k𝑘kitalic_k, and budget N0,Bsubscript𝑁0𝐵N_{0},Bitalic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_B, the algorithm outputs the optimal seed set to maximize the influence spread on 𝒢𝒢\mathcal{G}caligraphic_G. We call the method 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM for Bayesian OPtimization for Influence Maximization on temporal networks.

Result: Optimal seed set S𝑆Sitalic_S

Input: Objective function f(𝒙)𝑓𝒙f(\bm{x})italic_f ( bold_italic_x ); temporal network 𝒢𝒢\mathcal{G}caligraphic_G; seed set size k𝑘kitalic_k; budget B𝐵Bitalic_B; number of initial samples N0subscript𝑁0N_{0}italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT;

for N0subscript𝑁0N_{0}italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT times do

       Sample 𝒙isubscript𝒙𝑖\bm{x}_{i}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT proportional to node’s aggregate degree  Evaluate yi=f(𝒙i)subscript𝑦𝑖𝑓subscript𝒙𝑖y_{i}=f(\bm{x}_{i})italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_f ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT )
end for
Construct 𝐊𝐊{\bf K}bold_K using (5)  Compute posterior of 𝜷𝜷\bm{\beta}bold_italic_β and σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPTfor  B𝐵Bitalic_B times do
       Find 𝒙~=argmax𝒙,jxj=kEI(𝒙)~𝒙subscript𝒙subscript𝑗subscript𝑥𝑗𝑘𝐸𝐼𝒙\tilde{\bm{x}}=\arg\max_{\bm{x},\sum_{j}x_{j}=k}EI(\bm{x})over~ start_ARG bold_italic_x end_ARG = roman_arg roman_max start_POSTSUBSCRIPT bold_italic_x , ∑ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_k end_POSTSUBSCRIPT italic_E italic_I ( bold_italic_x )   Compute y=f(𝒙~)𝑦𝑓~𝒙y=f(\tilde{\bm{x}})italic_y = italic_f ( over~ start_ARG bold_italic_x end_ARG )  Append observation and re-compute posterior of 𝜷𝜷\bm{\beta}bold_italic_β and σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT  Update 𝐊𝐊{\bf K}bold_K
end for
return 𝐱=argmax𝐱{𝐱1,,𝐱N0+B}f(𝐱)superscript𝐱subscript𝐱subscript𝐱1subscript𝐱subscript𝑁0𝐵𝑓𝐱\bm{x}^{*}=\arg\max_{\bm{x}\in\{\bm{x}_{1},\dots,\bm{x}_{N_{0}+B}\}}f(\bm{x})bold_italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_arg roman_max start_POSTSUBSCRIPT bold_italic_x ∈ { bold_italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_italic_x start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_B end_POSTSUBSCRIPT } end_POSTSUBSCRIPT italic_f ( bold_italic_x )
Algorithm 1 Bayesian optimization for influence maximization: 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM

2.4 Discussion

Finding the seed set for IM on temporal networks is a cardinality-constrained, combinatorial optimization problem with non-Euclidean inputs. The challenges associated with the constraint are mitigated with our simple surrogate function (3) that is linear in the marginal contribution of each node to the influence spread. This mean function, however, implicitly assumes that there is no interaction between nodes in the seed set. From previous studies on IM, we know that this is not the case as there can be large overlap in the influence of a particular node. Instead of accounting for this property in the mean function, we explicitly incorporate the interaction between nodes via the kernel function. Our kernel function quantifies the similarity in potential seeds sets by the number of common neighbors at time t=1𝑡1t=1italic_t = 1.

𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM is much faster than a standard greedy approach since the costly objective function f()𝑓f(\cdot)italic_f ( ⋅ ) only needs to be evaluated N0+Bsubscript𝑁0𝐵N_{0}+Bitalic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_B times, as opposed to O(nk)𝑂𝑛𝑘O(nk)italic_O ( italic_n italic_k ) times for a greedy algorithm. Thus, the number of evaluations of the objective function for 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM is independent of k𝑘kitalic_k. Moreover, it is trivial to adapt the algorithm to other diffusion mechanisms. In practice, the diffusion model should be chosen with great care based on the application at hand. But since the diffusion model only appears in the evaluation of the objective function, and the objective function is treated as a “black-box”, changing the diffusion model has no impact on the implementation of the method. This differs from, e.g., Dynamic Degree (Murata and Koga,, 2018), which only works for the SI model, or Erkol et al., (2020), which is based on the SIR model.

In addition to its efficiency, 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM allows for uncertainty quantification (UQ) of the total influence spread, σ(S)𝜎𝑆\sigma(S)italic_σ ( italic_S ), and seed set, S𝑆Sitalic_S. By evaluating a given seed set using the posterior distribution of 𝜷𝜷\bm{\beta}bold_italic_β, we obtain a complete posterior predictive distribution for the total influence spread. This is in contrast with node-ranking heuristics, which do not yield an estimated influence spread, and probabilistic approximations, which yield a point estimate but not a measure of uncertainty of the total spread. Moreover, if a researcher wants to estimate the expected influence spread for another seed set SSsuperscript𝑆𝑆S^{\prime}\neq Sitalic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ italic_S, then the entire calculation needs to be re-run for previous methods. After 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM has been fit once, the influence spread for any seed set can be estimated immediately and the uncertainty in the optimal seed set can also easily be quantified. Please see Section 3.4 for a full discussion of UQ in the seed sets.

2.5 Hyper-parameter selection

Finally, there are several hyper-parameters that need to be chosen for 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM. Choosing γ𝛾\gammaitalic_γ, the kernel hyper-parameter, is perhaps the most tricky. We note that γ𝛾\gammaitalic_γ corresponds to the contribution of 𝐊𝐊{\bf K}bold_K to the overall variance of the GP regression model. If γ𝛾\gammaitalic_γ is small, then most of the variance is simply coming from the random noise term, ϵisubscriptitalic-ϵ𝑖\epsilon_{i}italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, but if γ𝛾\gammaitalic_γ is large, then the variance in the objective function evaluations is largely driven by our kernel function. There are many different ways that we could choose γ𝛾\gammaitalic_γ. If we want a fully Bayesian approach, then we could endow γ𝛾\gammaitalic_γ with a prior and update its posterior distribution during the MCMC routine. In practice, however, we found that this did not lead to very good performance and required adding a Metropolis-Hastings step to an otherwise entirely Gibbs sampler. Another choice is to fix γ𝛾\gammaitalic_γ at its maximum likelihood estimate (MLE). In our experiments, this choice proved to perform better than the fully Bayesian approach, but the MLE estimate of γ𝛾\gammaitalic_γ was consistently close to 1. Indeed, if instead of finding the MLE for γ𝛾\gammaitalic_γ after each new observation we simply set γ=1𝛾1\gamma=1italic_γ = 1, then this yielded the best performance. This choice corresponds to equal contribution from 𝐊𝐊{\bf K}bold_K and the random error term.

There are several other parameters that must be chosen in our algorithm. For the number of initial samples N0subscript𝑁0N_{0}italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and BO loop iterations B𝐵Bitalic_B, if these values are large, then the surrogate function will fit the true function well, but at a high computational cost. We found that setting N0=5subscript𝑁05N_{0}=5italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 5 and B=25𝐵25B=25italic_B = 25 led to a good balance of fit and speed. To show that the BO iterations improve the solution quality, we report a progress curve in Figure 1 for a single run of the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm. Please see Section 3 for a description of the network. We can clearly see that as the number of BO iterations increases, the total influence spread also increases. Indeed, for this particular example, the solution quality (percentage of influenced nodes) increases by 18 % during the BO loop, which corresponds to a relative increase of almost 100%.

Refer to caption
Figure 1: Progress curve of 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm for Reality network.

For computing the posterior of 𝜷𝜷\bm{\beta}bold_italic_β and σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, the number of MCMC samples also must be selected. Again, more MCMC samples lead to better estimates of the posterior distribution but also longer run-times. Since we only use the median of the posterior distribution, we can keep the number of MCMC samples relatively small to increase speed. We recommend 2,000 iterations with the first 500 used for burn-in. For hyper-parameters in the Horseshoe prior, we simply use the default settings from the original papers. Lastly, for the number of MC simulations for the objective function calculation, it is important that this output is accurate so we recommend 1,000 iterations. Unless otherwise noted, we use these settings for all experiments in Section 3.

3 Experiments

We now study the performance of the proposed method with numerical experiments. All experiments were performed in Python and the code for Algorithm 1 is available on GitHub: https://github.com/eyanchenko/BOPIM.

3.1 Data

The following experiments are conducted on four real-world networks. Reality (n=64𝑛64n=64italic_n = 64, m=722𝑚722m=722italic_m = 722) (Eagle and Pentland,, 2006), Hospital (n=75𝑛75n=75italic_n = 75, m=1139𝑚1139m=1139italic_m = 1139) (Vanhems et al.,, 2013), Bluetooth (n=136𝑛136n=136italic_n = 136, m=5949𝑚5949m=5949italic_m = 5949) (Kotz and Henderson,, 2004) and Conference 2 (n=199𝑛199n=199italic_n = 199, m=1550𝑚1550m=1550italic_m = 1550) (Chaintreau et al.,, 2007) are all proximity networks where an edge exists if two people are close to each other. In order to discretize the continuous appearing and disappearing of edges, each network is aggregated into T𝑇Titalic_T temporal snapshots of equal duration. We note that these networks are relatively small due to the slow speed of the greedy algorithm. Therefore, in the Supplemental Materials, we also conduct an experiment using the proposed method on the DNC email network (Rossi and Ahmed,, 2015) with n=1891𝑛1891n=1891italic_n = 1891 nodes to demonstrate its scalability.

3.2 Surrogate function validation

3.2.1 Settings

In the first set of simulations, we quantify the fit of the surrogate model to the true objective function f()𝑓f(\cdot)italic_f ( ⋅ ). Towards this end, we first run Algorithm 1 with N0=5subscript𝑁05N_{0}=5italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 5 and B=25𝐵25B=25italic_B = 25 and save the final posterior distribution of 𝜷𝜷\bm{\beta}bold_italic_β. Using this posterior distribution, we then compute out-of-sample influence spread predictions on N=100𝑁100N=100italic_N = 100 seed sets and compare with the values of the true objective function. If the surrogate model fits well, then its predicted influence spread will be close to the true influence spread. We quantify this similarity using the mean absolute prediction error (MAPE). If 𝜷^^𝜷\widehat{\bm{\beta}}over^ start_ARG bold_italic_β end_ARG is the posterior median of 𝜷𝜷\bm{\beta}bold_italic_β, and 𝒙superscript𝒙\bm{x}^{*}bold_italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT corresponds to the new seed set, then

y^=(𝒙)T𝜷^+κ(𝒙,𝐗)T(γ𝐊+𝐈N0+b)1(𝒚𝐗𝜷^)^𝑦superscriptsuperscript𝒙𝑇^𝜷𝜅superscriptsuperscript𝒙𝐗𝑇superscript𝛾𝐊subscript𝐈subscript𝑁0𝑏1𝒚𝐗^𝜷\hat{y}=(\bm{x}^{*})^{T}\hat{\bm{\beta}}+\kappa(\bm{x}^{*},{\bf X})^{T}(\gamma% {\bf K}+{\bf I}_{N_{0}+b})^{-1}(\bm{y}-{\bf X}\hat{\bm{\beta}})over^ start_ARG italic_y end_ARG = ( bold_italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_β end_ARG + italic_κ ( bold_italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_X ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( italic_γ bold_K + bold_I start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_b end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_y - bold_X over^ start_ARG bold_italic_β end_ARG )

where κ(𝒙,𝐗)j=κ(𝒙,𝒙j)𝜅subscriptsuperscript𝒙𝐗𝑗𝜅superscript𝒙subscript𝒙𝑗\kappa(\bm{x}^{*},{\bf X})_{j}=\kappa(\bm{x}^{*},\bm{x}_{j})italic_κ ( bold_italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_X ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_κ ( bold_italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) for j=1,,N0+B𝑗1subscript𝑁0𝐵j=1,\dots,N_{0}+Bitalic_j = 1 , … , italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_B. Recall that κ(𝒙,𝒙j)𝜅superscript𝒙subscript𝒙𝑗\kappa(\bm{x}^{*},\bm{x}_{j})italic_κ ( bold_italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) is the Jaccard coefficient between the neighbors of the seed set corresponding to 𝒙jsubscript𝒙𝑗\bm{x}_{j}bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and 𝒙superscript𝒙\bm{x}^{*}bold_italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. The MAPE is computed as

1Ni=1N|y^iyi|1𝑁superscriptsubscript𝑖1𝑁subscript^𝑦𝑖subscript𝑦𝑖\frac{1}{N}\sum_{i=1}^{N}|\hat{y}_{i}-y_{i}|divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT | over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT |

where N𝑁Nitalic_N is the number of out-of-sample seed sets. A smaller MAPE means a closer fit.

We compute the MAPE for each of the four data sets. We fix λ=0.05,0.05,0.01,0.01𝜆0.050.050.010.01\lambda=0.05,0.05,0.01,0.01italic_λ = 0.05 , 0.05 , 0.01 , 0.01, and k=3,4,6,10𝑘34610k=3,4,6,10italic_k = 3 , 4 , 6 , 10 for the Reality, Hospital, Bluetooth and Conference 2 datasets, respectively. k𝑘kitalic_k was chosen as approximately 5% of the total number of nodes for each network. For the out-of-sample predictions, we consider two node sampling schemes. One samples nodes at random while the other samples according to their degree on the aggregate network, as in 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM. We expect the model to have a closer fit when the test points are sampled in the same way as the training points. Results are averaged over 25 Monte Carlo (MC) replications.

3.2.2 Results

The results are in Table 1. Note that the MAPE results are interpreted as follows: for the Conference 2 network with degree-based sampling of seed sets, for example, the MAPE value of 1.23 means that, on average, the estimated influence spread of the surrogate function (3) is within about one node of the true influence spread. We can see that for all networks and both sampling schemes, the MAPE values are quite small. Additionally, the degree-based sampling always leads to smaller MAPE values, which is to be expected since this is how the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm samples points for the initial fitting stage. Indeed, the surrogate model fits the true objective function more closely when the seed nodes have a large degree, which is where we hypothesize that the global optimum lies. These results give use confidence that our surrogate function is doing an adequate job of modeling the true objective function.

Dataset Sampling MAPE (se)
Reality Random 3.60 (0.19)
Degree 3.22 (0.11)
Hospital Random 4.88 (0.16)
Degree 3.49 (0.12)
Bluetooth Random 2.78 (0.08)
Degree 1.77 (0.04)
Conference 2 Random 7.45 (0.11)
Degree 1.23 (0.05)
Table 1: Mean absolute prediction errors for validating surrogate function simulations. Standard error in parentheses.

3.3 Influence maximization

For the second set of experiments, we compare the method’s performance on the IM task.

3.3.1 Settings

We simulate the influence diffusion process and select seed sets for IM using 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM. The main metrics of comparison are the number of influenced nodes at the conclusion of the spreading process, as well as the total algorithmic run time. We compare 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM with a greedy algorithm as this is considered the “gold standard” in IM. Erkol et al., (2022) proved that the SI model of diffusion is monotone and sub-modular so we use the CELF (Leskovec et al.,, 2007) step to increase the speed of the greedy algorithm. We also compare with random sampling seed nodes (Random), randomly sampling proportional to a nodes’ degree (Random Degree) and Dynamic Degree (Murata and Koga,, 2018). All experiments are conducted under the ex post assumption where the future evolution of the network is known when selecting seed nodes as in Murata and Koga, (2018); Osawa and Murata, (2015); Aggarwal et al., (2012); Erkol et al., (2020). While this assumption may be unrealistic in some cases, the main goal of this work is to propose a BO scheme for IM rather than tackling the ex ante problem (Yanchenko et al.,, 2023). We leave this as an important avenue of future work. Finally, to compare the different methods, we vary the number of seed nodes (k)𝑘(k)( italic_k ), number of snapshots (T)𝑇(T)( italic_T ), and infection parameter (λ)𝜆(\lambda)( italic_λ ).

3.3.2 Results

First, we fix T=10𝑇10T=10italic_T = 10 and λ=0.05,0.05,0.01,0.01𝜆0.050.050.010.01\lambda=0.05,0.05,0.01,0.01italic_λ = 0.05 , 0.05 , 0.01 , 0.01 for the Reality, Hospital, Bluetooth and Conference 2 datasets, respectively, while varying k𝑘kitalic_k. For this setting, we record both the influence spread and computing time with the results in Figures 2 and 3. For each data set, the proposed BO method yields comparable influence spread to the greedy algorithm. For Hospital and Conference 2, in particular, the spreads are almost indistinguishable between the two methods. 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM also yields a larger influence spread then Dynamic Degree in almost all settings. While Random Degree performs similarly to Dynamic Degree for some settings, Random always performs the worst. In terms of computing time, the proposed method has a large advantage over the greedy algorithm. In each case, the computational time greatly increases with k𝑘kitalic_k for the greedy algorithm, whereas that of the proposed method increases more slowly. For large k𝑘kitalic_k, the proposed method is as much as five times faster than the greedy algorithm.

Keeping the same values of λ𝜆\lambdaitalic_λ as before, we now fix k𝑘kitalic_k such that roughly 5% of the total number of nodes are initially activated. We vary the number of aggregations of the temporal network, where small and large T𝑇Titalic_T correspond to granular and fine aggregations, respectively. While the “real-time” that the network diffuses remains the same, we can vary the number of aggregations to see how this affects the influence spread. The results are in Figure 4. The trends are similar to that of varying k𝑘kitalic_k. Across networks and T𝑇Titalic_T, the proposed method yields comparable influence spread to that of the greedy algorithm, performing especially well in the Reality and Hospital networks. Dynamic Degree and Random Degree again yield consistently smaller influence spreads than both 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM and Greedy. In general, the total spread increases with T𝑇Titalic_T as there are more rounds for influence to diffuse, but this trend need not be monotonic as different aggregations changes when certain nodes have edges. Indeed, in the Reality network, Dynamic Degree’s results are quite sensitive to T𝑇Titalic_T as compared to the proposed method.

Finally, we fix k𝑘kitalic_k at 5% of n𝑛nitalic_n and T=10𝑇10T=10italic_T = 10 and vary the infection probability λ𝜆\lambdaitalic_λ. The results are in Figure 5 and are similar for each network: as λ𝜆\lambdaitalic_λ increases, the total influence spread also increases. The distinction between different methods is least pronounced in this setting, but 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM has almost identical influence spreads to Greedy in all settings and networks.

Refer to caption
Figure 2: Influence spread results for increasing seed size (k)𝑘(k)( italic_k ).
Refer to caption
Figure 3: Computation time results for increasing seed size (k)𝑘(k)( italic_k ).
Refer to caption
Figure 4: Influence spread results for increasing number of snapshots (T)𝑇(T)( italic_T ).
Refer to caption
Figure 5: Influence spread results for increasing infection probability (λ)𝜆(\lambda)( italic_λ ).

3.4 Uncertainty quantification

The proposed method also provides measures of uncertainty for the optimal seed sets. Uncertainty quantification (UQ) allows for a richer understanding of the outputted solution and can answer questions such as: Are there many seed sets which yield near optimal spread? Can the selection of seed nodes be viewed on a spectrum as opposed to binary inclusion/exclusion? We take a first step towards answering these questions with the proposed method.

There are two main ways that we can use the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM output for UQ. First, recall that we use GP regression to model the objective function. In (3), βisubscript𝛽𝑖\beta_{i}italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT corresponds to the inclusion (xi=1subscript𝑥𝑖1x_{i}=1italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 1) or exclusion (xi=0subscript𝑥𝑖0x_{i}=0italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 0) of node i𝑖iitalic_i in the seed set, and we obtain the complete posterior distribution for each coefficient. For the acquisition function step of 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM, we only use the median of this posterior distribution. We can leverage the entire distribution, however, to obtain a clearer picture of a node’s importance.

To do this, we save the final posterior distribution of 𝜷𝜷\bm{\beta}bold_italic_β in 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM for the Reality network with k=5𝑘5k=5italic_k = 5, T=10𝑇10T=10italic_T = 10, λ=0.05𝜆0.05\lambda=0.05italic_λ = 0.05 (15,000 MCMC samples, 5,000 burn-in). In Figure 6(a), we report the box-plot for the posterior distribution of βisubscript𝛽𝑖\beta_{i}italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT for each node i{1,,n}𝑖1𝑛i\in\{1,\dots,n\}italic_i ∈ { 1 , … , italic_n }. The largest posterior distributions correspond with the ideal nodes for the seed set. From the figure, nodes 2, 3, 38 and 59 have noticeably larger values of the posterior distribution than that of the other nodes. Thus, while k=5𝑘5k=5italic_k = 5 nodes are chosen for the seed set, it may be that nodes 2, 3, 38 and 59 are more influential than the other node included in the seed set.

While 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM seeks the global optimum of our objective function, in practice, running the algorithm multiple times will give different optimal seed sets due to the randomness inherent in the algorithm. We can leverage this randomness for another way to quantify the uncertainty in the optimal seed sets. We now run the algorithm for 50 MC simulations (1,500 MCMC samples with 500 burn-in), and for each iteration, we find the optimal seed set. Then we can average these results to find the proportion of times that each node is assigned to the optimal seed set. These proportions are reported in Figure 6(b). For example, node 2 was selected in the seed set in 100% of the iterations, thus making it the most influential node. While nodes 1, 2 and 3 have noticeably larger proportions than all other nodes, the fourth and fifth largest (6 and 9) have much smaller proportions than that of those three.

Finally, we note some differences between the two methods of UQ. Since Figure 6(b) is based on a single iteration, the results are subject to more stochastic variability, whereas much of this effect is averaged out in Figure 6(b). For example, nodes 38 and 59 appear to be key candidates for the seed set based on their large posterior distribution, but once we average over the results we see that these very rarely appears in the top k𝑘kitalic_k of all nodes. On the other hand, the posterior mean of nodes 9 and 6 are close to zero in this one iteration, but they are the fourth and fifth most important nodes, respectively, when we re-run the algorithm many times. Regardless, these two figures together give us a good sense of the uncertainty in our optimal seed set and show that there are multiple sets of k𝑘kitalic_k nodes which likely yield similar influence spreads.

Refer to caption
(a) Posterior distribution box plots of 𝜷𝜷\bm{\beta}bold_italic_β.
Refer to caption
(b) Proportion of MC samples selected for seed set.
Figure 6: Posterior distribution summaries for the Reality network.

4 Discussion and conclusion

In this work, we apply a Bayesian Optimization framework to solve the influence maximization problem on temporal networks. The proposed 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm models the influence spread function via Gaussian Process regression and Horseshoe shrinkage prior. A simple mean function quantifies the importance of each nodes in the seed set, while our kernel function computes the correlation between seed sets based on their number of common neighbors. Experiments showed that despite being such a simple surrogate, the statistical model’s estimated influence spread is, on average, within a few nodes to the true spread. In the IM simulations, the proposed method yields influence spreads comparable to that of seed sets from a greedy algorithm. The BO approach, however, is much faster than the greedy algorithm. While in some settings, Dynamic Degree or Random Degree may have been similar to 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM, the proposed algorithm consistently yielded the best results across all networks and settings. Moreover, the UQ found evidence that there are many seed sets which yield comparably optimal influence spread, thus implying that the objective function is relatively “flat.”

There are many interesting avenues to extend this work. First, we could study the performance of the proposed method under various forms of model mis-specification. Indeed, we take a first step in this direction in the Supplemental Materials where we consider two different cases. First, the algorithm is trained on one value of ksuperscript𝑘k^{\prime}italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT but then the resulting model fit is used to find the optimal seed set for kk𝑘superscript𝑘k\neq k^{\prime}italic_k ≠ italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. In the second setting, we fit the algorithm on one value of Tsuperscript𝑇T^{\prime}italic_T start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, the number of temporal snapshots, and then use the resulting seed set to compute the influence spread on the network with TT𝑇superscript𝑇T\neq T^{\prime}italic_T ≠ italic_T start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT snapshots. In both cases, the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm performs quite favorably. This not only implies a robustness of the method, but also can lead to computational advantages as the model may not need to be re-fit for every new parameter combination. These are important and encouraging findings, but there is certainly room for more study.

The posterior sampling scheme is another area where future work could be needed. In this work, we used a fully Gibbs approach to sample from the posterior distribution. As the size of the network increases, however, this step will become slower and more difficult. Thus, alternative approaches like Variational Bayes (Kingma and Welling,, 2013), where direct sampling from the posterior distribution is replaced with an optimization problem, may be necessary for computational advantages. Another area of extension is considering more complicated surrogate functions that take into account interactions between nodes in the seed set (Baptista and Poloczek,, 2018). This should lead to better performance that may even exceed the greedy algorithm. The number of parameters in the model, however, would be O(n2)𝑂superscript𝑛2O(n^{2})italic_O ( italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) which could lead to computational challenges. Considering the ex ante IM task is another important open-problem.

Acknowledgements

This work was conducted while the author was on a Japan Society for the Promotion of Science (JSPS) fellowship. The author would also like to thank the editor and two anonymous reviewers whose comments greatly improved the quality of the manuscript.

References

  • Aggarwal et al., (2012) Aggarwal, C. C., Lin, S., and Yu, P. S. (2012). On influential node discovery in dynamic social networks. In Proceedings of the 2012 SIAM International Conference on Data Mining, pages 636–647. SIAM.
  • Baptista and Poloczek, (2018) Baptista, R. and Poloczek, M. (2018). Bayesian optimization of combinatorial structures. In International Conference on Machine Learning, pages 462–471. PMLR.
  • Bharathi et al., (2007) Bharathi, S., Kempe, D., and Salek, M. (2007). Competitive influence maximization in social networks. In Internet and Network Economics: Third International Workshop, WINE 2007, San Diego, CA, USA, December 12-14, 2007. Proceedings 3, pages 306–311. Springer.
  • Bhattacharya et al., (2015) Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. (2015). Dirichlet–laplace priors for optimal shrinkage. Journal of the American Statistical Association, 110(512):1479–1490.
  • Buathong et al., (2020) Buathong, P., Ginsbourger, D., and Krityakierne, T. (2020). Kernels over sets of finite sets using rkhs embeddings, with application to bayesian (combinatorial) optimization. In International Conference on Artificial Intelligence and Statistics, pages 2731–2741. PMLR.
  • Carvalho et al., (2009) Carvalho, C. M., Polson, N. G., and Scott, J. G. (2009). Handling sparsity via the horseshoe. In Artificial intelligence and statistics, pages 73–80. PMLR.
  • Carvalho et al., (2010) Carvalho, C. M., Polson, N. G., and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2):465–480.
  • Chaintreau et al., (2007) Chaintreau, A., Hui, P., Crowcroft, J., Diot, C., Gass, R., and Scott, J. (2007). Impact of human mobility on opportunistic forwarding algorithms. IEEE Transactions on Mobile Computing, 6(6):606–620.
  • (9) Chen, W., Wang, C., and Wang, Y. (2010a). Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1029–1038.
  • Chen et al., (2009) Chen, W., Wang, Y., and Yang, S. (2009). Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 199–208.
  • (11) Chen, W., Yuan, Y., and Zhang, L. (2010b). Scalable influence maximization in social networks under the linear threshold model. In 2010 IEEE International Conference on Data Mining, pages 88–97. IEEE.
  • Daulton et al., (2022) Daulton, S., Wan, X., Eriksson, D., Balandat, M., Osborne, M. A., and Bakshy, E. (2022). Bayesian optimization over discrete and mixed spaces via probabilistic reparameterization. Advances in Neural Information Processing Systems, 35:12760–12774.
  • Domingos and Richardson, (2001) Domingos, P. and Richardson, M. (2001). Mining the network value of customers. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 57–66.
  • Eagle and Pentland, (2006) Eagle, N. and Pentland, A. (2006). Reality mining: sensing complex social systems. Personal and Ubiquitous Computing, 10:255–268.
  • Erkol et al., (2020) Erkol, Ş., Mazzilli, D., and Radicchi, F. (2020). Influence maximization on temporal networks. Physical Review E, 102(4):042307.
  • Erkol et al., (2022) Erkol, Ş., Mazzilli, D., and Radicchi, F. (2022). Effective submodularity of influence maximization on temporal networks. Physical Review E, 106(3):034301.
  • Frazier, (2018) Frazier, P. I. (2018). A tutorial on bayesian optimization. arXiv preprint arXiv:1807.02811.
  • Frazier and Wang, (2015) Frazier, P. I. and Wang, J. (2015). Bayesian optimization for materials design. In Information science for materials discovery and design, pages 45–75. Springer.
  • Garnett et al., (2010) Garnett, R., Osborne, M. A., and Roberts, S. J. (2010). Bayesian optimization for sensor set selection. In Proceedings of the 9th ACM/IEEE international conference on information processing in sensor networks, pages 209–219.
  • Goyal et al., (2011) Goyal, A., Lu, W., and Lakshmanan, L. V. (2011). Celf++ optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th International Conference Companion on World Wide Web, pages 47–48.
  • Greenhill et al., (2020) Greenhill, S., Rana, S., Gupta, S., Vellanki, P., and Venkatesh, S. (2020). Bayesian optimization for adaptive experimental design: A review. IEEE access, 8:13937–13948.
  • Holme, (2004) Holme, P. (2004). Efficient local strategies for vaccination and network attack. Europhysics Letters, 68(6):908.
  • Holme and Saramäki, (2012) Holme, P. and Saramäki, J. (2012). Temporal networks. Physics Reports, 519(3):97–125.
  • Jaccard, (1901) Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des alpes et des jura. Bull Soc Vaudoise Sci Nat, 37:547–579.
  • Jaccard, (1912) Jaccard, P. (1912). The distribution of the flora in the alpine zone. 1. New phytologist, 11(2):37–50.
  • Kempe et al., (2003) Kempe, D., Kleinberg, J., and Tardos, É. (2003). Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146.
  • Kingma and Welling, (2013) Kingma, D. P. and Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
  • Kotz and Henderson, (2004) Kotz, D. and Henderson, T. (2004). Crawdad network repository. https://crawdad.org/.
  • Lee et al., (2012) Lee, S., Rocha, L. E. C., Liljeros, F., and Holme, P. (2012). Exploiting temporal network structures of human interaction to effectively immunize populations. PLOS ONE, 7(5):0036439.
  • Leskovec et al., (2007) Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., and Glance, N. (2007). Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 420–429.
  • Li et al., (2018) Li, Y., Fan, J., Wang, Y., and Tan, K.-L. (2018). Influence maximization on social graphs: A survey. IEEE Transactions on Knowledge and Data Engineering, 30(10):1852–1872.
  • Makalic and Schmidt, (2015) Makalic, E. and Schmidt, D. F. (2015). A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters, 23(1):179–182.
  • Michalski et al., (2020) Michalski, R., Jankowski, J., and Pazura, P. (2020). Entropy-based measure for influence maximization in temporal networks. In Computational Science–ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part IV 20, pages 277–290. Springer.
  • Michalski et al., (2014) Michalski, R., Kajdanowicz, T., Bródka, P., and Kazienko, P. (2014). Seed selection for spread of influence in social networks: Temporal vs. static approach. New Generation Computing, 32(3):213–235.
  • Murata and Koga, (2018) Murata, T. and Koga, H. (2018). Extended methods for influence maximization in dynamic networks. Computational Social Networks, 5(1):1–21.
  • Oh et al., (2019) Oh, C., Tomczak, J., Gavves, E., and Welling, M. (2019). Combo: Combinatorial bayesian optimization using graph representations. In ICML Workshop on Learning and Reasoning with Graph-Structured Data.
  • Osawa and Murata, (2015) Osawa, S. and Murata, T. (2015). Selecting seed nodes for influence maximization in dynamic networks. In Mangioni, G., Simini, F., Uzzo, S. M., and Wang, D., editors, Complex Networks VI, pages 91–98. Springer.
  • Papenmeier et al., (2023) Papenmeier, L., Nardi, L., and Poloczek, M. (2023). Bounce: Reliable high-dimensional bayesian optimization for combinatorial and mixed spaces. Advances in Neural Information Processing Systems, 36:1764–1793.
  • Polson and Scott, (2010) Polson, N. G. and Scott, J. G. (2010). Shrink globally, act locally: Sparse bayesian regularization and prediction. Bayesian statistics, 9(501-538):105.
  • Rossi and Ahmed, (2015) Rossi, R. A. and Ahmed, N. K. (2015). The network data repository with interactive graph analytics and visualization. In AAAI.
  • Sauer et al., (2023) Sauer, A., Gramacy, R. B., and Higdon, D. (2023). Active learning for deep gaussian process surrogates. Technometrics, 65(1):4–18.
  • Shahriari et al., (2015) Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and De Freitas, N. (2015). Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104(1):148–175.
  • Snoek et al., (2012) Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms. Advances in neural information processing systems, 25.
  • Sürer et al., (2023) Sürer, Ö., Plumlee, M., and Wild, S. M. (2023). Sequential bayesian experimental design for calibration of expensive simulation models. Technometrics, pages 1–15.
  • Turner et al., (2021) Turner, R., Eriksson, D., McCourt, M., Kiili, J., Laaksonen, E., Xu, Z., and Guyon, I. (2021). Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020. In NeurIPS 2020 Competition and Demonstration Track, pages 3–26. PMLR.
  • Vanhems et al., (2013) Vanhems, P., Barrat, A., Cattuto, C., Pinton, J.-F., Khanafer, N., Régis, C., Kim, B.-a., Comte, B., and Voirin, N. (2013). Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PloS one, 8(9):e73970.
  • Wan et al., (2024) Wan, X., Osselin, P., Kenlay, H., Ru, B., Osborne, M. A., and Dong, X. (2024). Bayesian optimisation of functions on graphs. Advances in Neural Information Processing Systems, 36.
  • Wang et al., (2012) Wang, C., Chen, W., and Wang, Y. (2012). Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery, 25:545–576.
  • Wen et al., (2017) Wen, Z., Kveton, B., Valko, M., and Vaswani, S. (2017). Online influence maximization under independent cascade model with semi-bandit feedback. Advances in Neural Information Processing Systems, 30.
  • Wilder et al., (2018) Wilder, B., Onasch-Vera, L., Hudson, J., Luna, J., Wilson, N., Petering, R., Woo, D., Tambe, M., and Rice, E. (2018). End-to-end influence maximization in the field. In AAMAS, volume 18, pages 1414–1422.
  • Wilder et al., (2017) Wilder, B., Yadav, A., Immorlica, N., Rice, E., and Tambe, M. (2017). Uncharted but not uninfluenced: Influence maximization with an uncertain network. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pages 1305–1313.
  • Wu et al., (2019) Wu, J., Chen, X.-Y., Zhang, H., Xiong, L.-D., Lei, H., and Deng, S.-H. (2019). Hyperparameter optimization for machine learning models based on bayesian optimization. Journal of Electronic Science and Technology, 17(1):26–40.
  • Yadav et al., (2017) Yadav, A., Wilder, B., Rice, E., Petering, R., Craddock, J., Yoshioka-Maxwell, A., Hemler, M., Onasch-Vera, L., Tambe, M., and Woo, D. (2017). Influence maximization in the field: The arduous journey from emerging to deployed application. In Proceedings of the 16th conference on autonomous agents and multiagent systems, pages 150–158.
  • Yadav et al., (2018) Yadav, A., Wilder, B., Rice, E., Petering, R., Craddock, J., Yoshioka-Maxwell, A., Hemler, M., Onasch-Vera, L., Tambe, M., and Woo, D. (2018). Bridging the gap between theory and practice in influence maximization: Raising awareness about hiv among homeless youth. In IJCAI, pages 5399–5403.
  • (55) Yanchenko, E., Bondell, H. D., and Reich, B. J. (2024a). The r2d2 prior for generalized linear mixed models. The American Statistician, pages 1–20.
  • Yanchenko et al., (2023) Yanchenko, E., Murata, T., and Holme, P. (2023). Link prediction for ex ante influence maximization on temporal networks. Applied Network Science, 8(70).
  • (57) Yanchenko, E., Murata, T., and Holme, P. (2024b). Influence maximization on temporal networks: a review. Applied Network Science, 9(1):1–25.
  • Zhang et al., (2022) Zhang, Y. D., Naughton, B. P., Bondell, H. D., and Reich, B. J. (2022). Bayesian regression using a prior on the model fit: The r2-d2 shrinkage prior. Journal of the American Statistical Association, 117(538):862–874.

Supplemental Materials

Horseshoe prior Gibbs sampler

In this section, we report the Horseshoe prior (Polson and Scott,, 2010) Gibbs sampler, using the auxiliary variable formulation of Makalic and Schmidt, (2015).

  1. 1.

    Sample 𝜷|𝝀,𝝂,τ2,σ2,𝐘𝖭(𝝁,σ2𝐕)similar-toconditional𝜷𝝀𝝂superscript𝜏2superscript𝜎2𝐘𝖭𝝁superscript𝜎2𝐕\bm{\beta}|\bm{\lambda},\bm{\nu},\tau^{2},\sigma^{2},{\bf Y}\sim\mathsf{N}(\bm% {\mu},\sigma^{2}{\bf V})bold_italic_β | bold_italic_λ , bold_italic_ν , italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , bold_Y ∼ sansserif_N ( bold_italic_μ , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_V ) where 𝝁=𝐕𝐗T𝐘𝝁superscript𝐕𝐗𝑇𝐘\bm{\mu}={\bf V}{\bf X}^{T}{\bf Y}bold_italic_μ = bold_VX start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Y, 𝐕=(𝐗T𝐊1𝐗+𝐒1)1𝐕superscriptsuperscript𝐗𝑇superscript𝐊1𝐗superscript𝐒11{\bf V}=({\bf X}^{T}{\bf K}^{-1}{\bf X}+{\bf S}^{-1})^{-1}bold_V = ( bold_X start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_K start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_X + bold_S start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT,
    𝐒=diag(λ12τ2,,λn2τ2)𝐒𝑑𝑖𝑎𝑔superscriptsubscript𝜆12superscript𝜏2superscriptsubscript𝜆𝑛2superscript𝜏2{\bf S}=diag(\lambda_{1}^{2}\tau^{2},\dots,\lambda_{n}^{2}\tau^{2})bold_S = italic_d italic_i italic_a italic_g ( italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , … , italic_λ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ).

  2. 2.

    Sample σ2|𝜷,𝝀,𝝂,τ2,,𝐘𝖨𝖦(a1+(N0+n)/2,b1+((𝐘𝐗𝜷)T𝐊1(𝐘𝐗𝜷)+𝜷T𝐒1𝜷)/2)\sigma^{2}|\bm{\beta},\bm{\lambda},\bm{\nu},\tau^{2},,{\bf Y}\sim\mathsf{IG}(a% _{1}+(N_{0}+n)/2,b_{1}+(({\bf Y}-{\bf X}\bm{\beta})^{T}{\bf K}^{-1}({\bf Y}-{% \bf X}\bm{\beta})+\bm{\beta}^{T}{\bf S}^{-1}\bm{\beta})/2)italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | bold_italic_β , bold_italic_λ , bold_italic_ν , italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , , bold_Y ∼ sansserif_IG ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ( italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_n ) / 2 , italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ( ( bold_Y - bold_X bold_italic_β ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_K start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_Y - bold_X bold_italic_β ) + bold_italic_β start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_S start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_β ) / 2 ).

  3. 3.

    Sample λj2|𝜷,𝝂,τ2,σ2𝖨𝖦(1,νj1+βj2/(2τ2σ2)\lambda_{j}^{2}|\bm{\beta},\bm{\nu},\tau^{2},\sigma^{2}\sim\mathsf{IG}(1,\nu_{% j}^{-1}+\beta^{2}_{j}/(2\tau^{2}\sigma^{2})italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | bold_italic_β , bold_italic_ν , italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∼ sansserif_IG ( 1 , italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT + italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT / ( 2 italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ), j=1,,n𝑗1𝑛j=1,\dots,nitalic_j = 1 , … , italic_n

  4. 4.

    Sample τ2|𝜷,𝝀,σ2,ξ𝖨𝖦((n+1)/2,ξ1+k=1nβk2/(2λk2σ2))similar-toconditionalsuperscript𝜏2𝜷𝝀superscript𝜎2𝜉𝖨𝖦𝑛12superscript𝜉1superscriptsubscript𝑘1𝑛subscriptsuperscript𝛽2𝑘2subscriptsuperscript𝜆2𝑘superscript𝜎2\tau^{2}|\bm{\beta},\bm{\lambda},\sigma^{2},\xi\sim\mathsf{IG}((n+1)/2,\xi^{-1% }+\sum_{k=1}^{n}\beta^{2}_{k}/(2\lambda^{2}_{k}\sigma^{2}))italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | bold_italic_β , bold_italic_λ , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_ξ ∼ sansserif_IG ( ( italic_n + 1 ) / 2 , italic_ξ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT / ( 2 italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ).

  5. 5.

    Sample νj|λk𝖨𝖦(1,1+λj2)similar-toconditionalsubscript𝜈𝑗subscript𝜆𝑘𝖨𝖦11superscriptsubscript𝜆𝑗2\nu_{j}|\lambda_{k}\sim\mathsf{IG}(1,1+\lambda_{j}^{-2})italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∼ sansserif_IG ( 1 , 1 + italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT ) for j=1,,n𝑗1𝑛j=1,\dots,nitalic_j = 1 , … , italic_n.

  6. 6.

    Sample ξ|τ2𝖨𝖦(1,1+τ2)similar-toconditional𝜉superscript𝜏2𝖨𝖦11superscript𝜏2\xi|\tau^{2}\sim\mathsf{IG}(1,1+\tau^{-2})italic_ξ | italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∼ sansserif_IG ( 1 , 1 + italic_τ start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT ).

Greedy algorithm for EI()𝐸𝐼EI(\cdot)italic_E italic_I ( ⋅ )

Result: Optimum seed set 𝒙~~𝒙\tilde{\bm{x}}over~ start_ARG bold_italic_x end_ARG

Input: seed set size k𝑘kitalic_k, current optimum y~~𝑦\tilde{y}over~ start_ARG italic_y end_ARG, sampled sets 𝐗𝐗{\bf X}bold_X, observations 𝒚𝒚\bm{y}bold_italic_y, ;

Initialize seed set S𝑆Sitalic_S

run=1𝑟𝑢𝑛1run=1italic_r italic_u italic_n = 1

while run>0𝑟𝑢𝑛0run>0italic_r italic_u italic_n > 0 do

       run=0𝑟𝑢𝑛0run=0italic_r italic_u italic_n = 0  Randomly order nodes  for vS𝑣𝑆v\in Sitalic_v ∈ italic_S do
             for uVS𝑢𝑉𝑆u\in V\setminus Sitalic_u ∈ italic_V ∖ italic_S do
                   S=Ssuperscript𝑆𝑆S^{\prime}=Sitalic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = italic_S  Remove v𝑣vitalic_v from Ssuperscript𝑆S^{\prime}italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and replace with u𝑢uitalic_uif EI(S)>EI(S)𝐸𝐼superscript𝑆𝐸𝐼𝑆EI(S^{\prime})>EI(S)italic_E italic_I ( italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) > italic_E italic_I ( italic_S ) then
                         S=S𝑆superscript𝑆S=S^{\prime}italic_S = italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPTrun=1𝑟𝑢𝑛1run=1italic_r italic_u italic_n = 1
                   end if
                  
             end for
            
       end for
      
end while
Algorithm 2 Greedy

IM for large networks

To demonstrate the performance of 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM on large networks, we present the results for IM on the DNC email network (Rossi and Ahmed,, 2015) with n=1891𝑛1891n=1891italic_n = 1891 nodes. We fix λ=0.05𝜆0.05\lambda=0.05italic_λ = 0.05 and T=10𝑇10T=10italic_T = 10 while varying k=25,50,,100𝑘2550100k=25,50,\dots,100italic_k = 25 , 50 , … , 100. Because of the long run times, we set N0=5,B=20formulae-sequencesubscript𝑁05𝐵20N_{0}=5,\ B=20italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 5 , italic_B = 20 and only use 3 MC simulations for 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM, but still use 25 for Random and Random Degree. The influence spread results and run times are in Figure 7. We can see that the proposed method provides comparable results to the greedy algorithm, but is faster for all but the first value of k𝑘kitalic_k. While 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM performs similarly to Random Degree, it greatly outperforms Dynamic Degree.

Refer to caption
Figure 7: Results for Email DNC network.

Robustness experiments

In this section, we explore the robustness of the proposed method to varying the size of the seed set as well as the number of temporal snapshots.

Varying k𝑘kitalic_k

First, we are interested in the performance of the proposed method when the fitting steps are done on one value of ksuperscript𝑘k^{\prime}italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, but then we find the optimal seed set for kk𝑘superscript𝑘k\neq k^{\prime}italic_k ≠ italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. Specifically, we let ksuperscript𝑘k^{\prime}italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT be fixed and then run the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm for B𝐵Bitalic_B BO loop iterations as usual. After the last iteration, however, we re-fit the GP regression model and use the acquisition function to find the optimal seed set of size k𝑘kitalic_k which is not necessarily the same as ksuperscript𝑘k^{\prime}italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. We then use this seed set to compute the number of influenced nodes on the network.

In the following experiments, we fix λ=0.05𝜆0.05\lambda=0.05italic_λ = 0.05 and T=10𝑇10T=10italic_T = 10 and look at the Reality and Hospital networks. All other hyper-parameters are the same as in the previous experiments. We consider four different methods, Low, Mid, High and Mix. For Low, Mid and High, the BO routine is carried out setting k=1,4superscript𝑘14k^{\prime}=1,4italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 1 , 4 and 7777, respectively. For Mix, we randomly sample ksuperscript𝑘k^{\prime}italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT from a uniform distribution on {1,2,7}127\{1,2\dots,7\}{ 1 , 2 … , 7 } at each stage in the initial fitting as well as the BO loop. For each method, we then compute the optimal seed set of size k𝑘kitalic_k for k=1,,7𝑘17k=1,\dots,7italic_k = 1 , … , 7. The results are repeated 25 times and averaged for each value of k𝑘kitalic_k. We compare the results with the original 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm where the same k𝑘kitalic_k is used in the training phase of the algorithm as well as the final influence spread evaluation.

Note that even when the network is trained on seed set of size ksuperscript𝑘k^{\prime}italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and then tested on sets of size k=k𝑘superscript𝑘k=k^{\prime}italic_k = italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, this is still not equivalent to the original 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm. In the original algorithm, the model is fit on one value of k𝑘kitalic_k throughout the B𝐵Bitalic_B BO iterations and then the set seed which corresponded to the largest influence spread is then chosen as the optimal set. In these robustness simulations, on the other hand, the final posterior distribution is used to select the optimal seed set. Therefore, Low, for example, is not equivalent to 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM even when k=k=1superscript𝑘𝑘1k^{\prime}=k=1italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = italic_k = 1.

Now, the results are in Figure 9. In the Reality network, for k=2𝑘2k=2italic_k = 2, Low comes closest to the original 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm. When k=3,4,5𝑘345k=3,4,5italic_k = 3 , 4 , 5, Mid performs well while High does the best for large k𝑘kitalic_k. Mix, however, yields seed sets with noticeably smaller influence spreads. In the Hospital network, Low, Mid and High all perform comparably well when k4𝑘4k\geq 4italic_k ≥ 4 with Low and Mid yielding slightly larger optimal spreads for smaller values of k𝑘kitalic_k. Again, Mix is clearly performing quite poorly.

These findings are promising for the performance of the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm. In particular, it means that even if we run the algorithm for one value of ksuperscript𝑘k^{\prime}italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, the results can be used to select seed sets of various sizes. Additionally and not surprisingly, these results also show that as |kk|superscript𝑘𝑘|k^{\prime}-k|| italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_k | increases, the performance seems to decrease slightly. Thus, the method will yield better seed sets when kksuperscript𝑘𝑘k^{\prime}\approx kitalic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≈ italic_k. This not only shows the generalizability of the method, but also can save computing time as the algorithm may not need to be re-run for each value of k𝑘kitalic_k. Finally, a single ksuperscript𝑘k^{\prime}italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT should be used in the algorithm’s model fitting as the Mix method consistently performed poorly.

Refer to caption
Figure 8: Robustness simulations where the size of the seed set in the model fitting is different from the final seed set.

Varying T𝑇Titalic_T

This sub-section is similar to the previous, except now we are looking at the robustness of the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm to different values of T𝑇Titalic_T, the number of temporal snapshots. In particular, we run the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm for some fixed Tsuperscript𝑇T^{\prime}italic_T start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and obtain the optimal seed set. This seed set is then used to compute the influence spread on the same network but now with T{2,3,,10}𝑇2310T\in\{2,3,\dots,10\}italic_T ∈ { 2 , 3 , … , 10 } snapshots. This scenario can be considered as model mis-specification as the number of snapshots is not correctly known.

We fix λ=0.05𝜆0.05\lambda=0.05italic_λ = 0.05 and k=3,4𝑘34k=3,4italic_k = 3 , 4 for the Reality and Hospital network, respectively. We now consider three different methods, Low, Mid and High. For these methods, we train the proposed method on the network using T=2,5superscript𝑇25T^{\prime}=2,5italic_T start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 2 , 5 and 10101010, respectively. We then use the resulting seed sets to compute the influence spread on the same network but with T=2,3,,10𝑇2310T=2,3,\dots,10italic_T = 2 , 3 , … , 10. The results are again averaged over 25 repetitions for each value of T𝑇Titalic_T and compare with the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm fit using the correct value of T𝑇Titalic_T.

The results are in Figure 9. For the Reality network, both Mid and High perform similarly and nearly approximate the correctly-specified algorithm. Low, however, yields seed sets with smaller influence spreads. In the Hospital network, Low, Mid and High all perform comparably well in relation to the original algorithm.

Again, these results are favorable for the 𝖡𝖮𝖯𝖨𝖬𝖡𝖮𝖯𝖨𝖬\mathsf{BOPIM}sansserif_BOPIM algorithm. They show that the algorithm can be fit on almost any value of Tsuperscript𝑇T^{\prime}italic_T start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and the resulting seed set will yield large influence spreads for nearly all T𝑇Titalic_T. Moreover, they imply that the optimal seed set is not very sensitive to the number of snapshots used to aggregate the temporal network. For example, if a node is influential with T=5𝑇5T=5italic_T = 5 snapshots, it is also likely to be influential with T=10𝑇10T=10italic_T = 10 snapshots. This is somewhat expected as central nodes should be influential regardless of the network aggregation scheme. Thus, the proposed algorithm is relatively robust to the choice of T𝑇Titalic_T.

Refer to caption
Figure 9: Robustness simulations where the number of temporal snap shots in the model fitting is different from influence spread calculation.