Multivariate cluster point process to quantify and explore multi-entity configurations: Application to biofilm image data - Supplementary Materials

Suman Majumder1,∗ , Brent A. Coull2, Jessica L. Mark Welch3    Patrick J. La Riviere4, Floyd E. Dewhirst3    Jacqueline R. Starr5,†,Kyu Ha Lee2,†

1 University of Missouri, Columbia, MO, USA
2 Harvard T.H. Chan School of Public Health, Boston, MA, USA
3 Forsyth Institute, Cambridge, MA, USA
4 University of Chicago, Chicago, IL, USA
5 Brigham and Women’s Hospital, Boston, MA, USA
Co-senior authors

A Neyman-Scott process

A Neyman-Scott process is a point process used for modeling parent-offspring clustering. In the simplest setting, consider the parent process C𝐶Citalic_C to be a homogeneous Poisson point process with intensity λCsuperscript𝜆𝐶\lambda^{C}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT. For each observation location cC𝑐𝐶c\in Citalic_c ∈ italic_C, the cluster of offspring Ycsubscript𝑌𝑐Y_{c}italic_Y start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT is an independent Poisson process with intensity αk(c,h)\alpha k(\cdot-c,h)italic_α italic_k ( ⋅ - italic_c , italic_h ), where k(c,h)k(\cdot-c,h)italic_k ( ⋅ - italic_c , italic_h ) is a probability distribution function parameterized by hhitalic_h that determines the spread and distribution of the offspring locations around the parent c𝑐citalic_c, and α>0𝛼0\alpha>0italic_α > 0 is the expected number of offspring per cluster. The Neyman-Scott process Y𝑌Yitalic_Y is the union of all these offspring cluster processes, namely, Y=cCYc𝑌subscript𝑐𝐶subscript𝑌𝑐Y=\bigcup_{c\in C}Y_{c}italic_Y = ⋃ start_POSTSUBSCRIPT italic_c ∈ italic_C end_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT. Further details can be found in illian2008statistical and chiu2013stochastic, for example.

B Images of the taxa from the human dental plaque biofilm data not visualized in the image included in the main text

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure S.1: RGB images of Neisseriaceae (top left), Capnocytophaga (top right), Actinomyces (middle left), Fusobacterium (middle right), Leptotrichia (bottom left) and Eubacterium (bottom right) in the dental plaque biofilm sample. Eubacterium denotes a probe for all oral bacteria. It is used for methodologic purposes to evaluate the completeness of the set of specific probes. Hence, it is omitted from analysis of community spatial structure. The genera shown here were modeled as homogeneous Poisson process in the data analysis.

C Computational details of the sampling algorithm

We use a Markov chain Monte Carlo (MCMC) method to draw samples from the joint posterior distribution of 𝜽𝜽\thetabold_italic_θ. In the MCMC scheme, parameters are updated either by exploiting conjugacies inherent to the proposed model or by using a Metropolis-Hastings algorithm.

C.1 Updating parameters associated with offspring densities

Let 𝜽(α)superscript𝜽𝛼\mbox{\boldmath$\theta$}^{-(\alpha)}bold_italic_θ start_POSTSUPERSCRIPT - ( italic_α ) end_POSTSUPERSCRIPT denote a set of parameters 𝜽𝜽\thetabold_italic_θ with α𝛼\alphaitalic_α removed. The full conditional distribution for αl,l=p+1,,p+qformulae-sequencesubscript𝛼𝑙𝑙𝑝1𝑝𝑞\alpha_{l}\,,\ l=p+1,\ldots,p+qitalic_α start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_l = italic_p + 1 , … , italic_p + italic_q is

αl|𝜽(αl) Gamma(aY+nl,bY+clClWkl(ucl,hl)𝑑u),similar-toconditionalsubscript𝛼𝑙superscript𝜽subscript𝛼𝑙 Gammasubscript𝑎𝑌subscript𝑛𝑙subscript𝑏𝑌subscriptsubscriptc𝑙subscript𝐶𝑙subscript𝑊subscript𝑘𝑙usubscriptc𝑙subscript𝑙differential-du\alpha_{l}|\mbox{\boldmath$\theta$}^{-(\alpha_{l})}\sim\mbox{ Gamma}(a_{Y}+n_{% l},b_{Y}+\sum_{\mbox{\bf c}_{l}\in C_{l}}\int_{W}k_{l}(\mbox{\bf u}-\mbox{\bf c% }_{l},h_{l})\,\ d\mbox{\bf u}),italic_α start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT | bold_italic_θ start_POSTSUPERSCRIPT - ( italic_α start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ∼ Gamma ( italic_a start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT + italic_n start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( u - c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_h start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ) italic_d u ) ,

where nlsubscript𝑛𝑙n_{l}italic_n start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT is the number of observations of taxon l𝑙litalic_l in the window.

C.2 Updating intensity parameters in homogeneous Poisson processes

Posterior conjugacy is also achieved in the full conditional distributions of intensity parameters, λvC,v=1,,pformulae-sequencesubscriptsuperscript𝜆𝐶𝑣𝑣1𝑝\lambda^{C}_{v}\,,\ v=1,\ldots,pitalic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT , italic_v = 1 , … , italic_p and λj,j=p+q+1,,mformulae-sequencesubscript𝜆𝑗𝑗𝑝𝑞1𝑚\lambda_{j}\,,\ j=p+q+1,\ldots,mitalic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j = italic_p + italic_q + 1 , … , italic_m, which are given by

λvC|𝜽(λvC) Gamma(aC+nv,bC+|𝒲|),v=1,,p; andformulae-sequencesimilar-toconditionalsubscriptsuperscript𝜆𝐶𝑣superscript𝜽subscriptsuperscript𝜆𝐶𝑣 Gammasubscript𝑎𝐶subscript𝑛𝑣subscript𝑏𝐶𝒲𝑣1𝑝 and\lambda^{C}_{v}|\mbox{\boldmath$\theta$}^{-(\lambda^{C}_{v})}\sim\mbox{ Gamma}% (a_{C}+n_{v},~{}b_{C}+\lvert\mathcal{W}\rvert)\,,\ v=1,\ldots,p;\mbox{ and}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT | bold_italic_θ start_POSTSUPERSCRIPT - ( italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ∼ Gamma ( italic_a start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT + italic_n start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT + | caligraphic_W | ) , italic_v = 1 , … , italic_p ; and
λj|𝜽(λj) Gamma(a+nj,b+|𝒲|),j=p+q+1,,m;formulae-sequencesimilar-toconditionalsubscript𝜆𝑗superscript𝜽subscript𝜆𝑗 Gamma𝑎subscript𝑛𝑗𝑏𝒲𝑗𝑝𝑞1𝑚\lambda_{j}|\mbox{\boldmath$\theta$}^{-(\lambda_{j})}\sim\mbox{ Gamma}(a+n_{j}% ,~{}b+\lvert\mathcal{W}\rvert)\,,\ j=p+q+1,\ldots,m;italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | bold_italic_θ start_POSTSUPERSCRIPT - ( italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ∼ Gamma ( italic_a + italic_n start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_b + | caligraphic_W | ) , italic_j = italic_p + italic_q + 1 , … , italic_m ;

where nvsubscript𝑛𝑣n_{v}italic_n start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT and njsubscript𝑛𝑗n_{j}italic_n start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are the numbers of observations for taxon v𝑣vitalic_v and taxon j𝑗jitalic_j within the window, respectively.

C.3 Updating bandwidth parameters

Since the full conditionals of the bandwidth parameters do not have standard forms, we use a random work Metropolis-Hastings step to update each of hl,l=1,,pformulae-sequencesubscript𝑙𝑙1𝑝h_{l}\,,\ l=1,\ldots,pitalic_h start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_l = 1 , … , italic_p. Denote hj(t)superscriptsubscript𝑗𝑡h_{j}^{(t)}italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT the sample for hj,j=p+1.,p+qformulae-sequencesubscript𝑗𝑗𝑝1𝑝𝑞h_{j}\,,\ j=p+1.\ldots,p+qitalic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j = italic_p + 1 . … , italic_p + italic_q from iteration t𝑡titalic_t. For iteration (t+1)𝑡1(t+1)( italic_t + 1 ), we propose a candidate sample hjsuperscriptsubscript𝑗h_{j}^{*}italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT as a random draw from N(hj(t),σprop2)𝑁superscriptsubscript𝑗𝑡subscriptsuperscript𝜎2𝑝𝑟𝑜𝑝N(h_{j}^{(t)},\sigma^{2}_{prop})italic_N ( italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_p italic_r italic_o italic_p end_POSTSUBSCRIPT ), where σprop2subscriptsuperscript𝜎2𝑝𝑟𝑜𝑝\sigma^{2}_{prop}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_p italic_r italic_o italic_p end_POSTSUBSCRIPT is the prespecified variance of the proposal density. The corresponding acceptance ratio computes to

R=exp(αlclClWk(ucl,hj)𝑑u)yYl(clClWk(ucl,hj))exp(hj2/2σ2)𝕀(hj>0)exp(αlclClWk(ucl,hj(t))𝑑u)yYl(clClWk(ucl,hj(t)))exp(hj(t)2/2σ2).𝑅subscript𝛼𝑙subscriptsubscriptc𝑙subscript𝐶𝑙subscript𝑊𝑘usubscriptc𝑙superscriptsubscript𝑗differential-dusubscriptproductysubscript𝑌𝑙subscriptsubscriptc𝑙subscript𝐶𝑙subscript𝑊𝑘usubscriptc𝑙superscriptsubscript𝑗superscriptsubscript𝑗absent22superscript𝜎2𝕀superscriptsubscript𝑗0subscript𝛼𝑙subscriptsubscriptc𝑙subscript𝐶𝑙subscript𝑊𝑘usubscriptc𝑙superscriptsubscript𝑗𝑡differential-dusubscriptproductysubscript𝑌𝑙subscriptsubscriptc𝑙subscript𝐶𝑙subscript𝑊𝑘usubscriptc𝑙superscriptsubscript𝑗𝑡superscriptsubscript𝑗𝑡22superscript𝜎2R=\frac{\exp\left(-\alpha_{l}\sum_{\mbox{\bf c}_{l}\in C_{l}}\int_{W}k(\mbox{% \bf u}-\mbox{\bf c}_{l},h_{j}^{*})\,\ d\mbox{\bf u}\right)\prod_{\mbox{\bf y}% \in Y_{l}}\left(\sum_{\mbox{\bf c}_{l}\in C_{l}}\int_{W}k(\mbox{\bf u}-\mbox{% \bf c}_{l},h_{j}^{*})\right)\exp\left(-h_{j}^{*2}/2\sigma^{2}\right)\mathbb{I}% (h_{j}^{*}>0)}{\exp\left(-\alpha_{l}\sum_{\mbox{\bf c}_{l}\in C_{l}}\int_{W}k(% \mbox{\bf u}-\mbox{\bf c}_{l},h_{j}^{(t)})\,\ d\mbox{\bf u}\right)\prod_{\mbox% {\bf y}\in Y_{l}}\left(\sum_{\mbox{\bf c}_{l}\in C_{l}}\int_{W}k(\mbox{\bf u}-% \mbox{\bf c}_{l},h_{j}^{(t)})\right)\exp\left(-h_{j}^{(t)2}/2\sigma^{2}\right)}.italic_R = divide start_ARG roman_exp ( - italic_α start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT italic_k ( u - c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) italic_d u ) ∏ start_POSTSUBSCRIPT y ∈ italic_Y start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( ∑ start_POSTSUBSCRIPT c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT italic_k ( u - c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ) roman_exp ( - italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ 2 end_POSTSUPERSCRIPT / 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) blackboard_I ( italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT > 0 ) end_ARG start_ARG roman_exp ( - italic_α start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT italic_k ( u - c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) italic_d u ) ∏ start_POSTSUBSCRIPT y ∈ italic_Y start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( ∑ start_POSTSUBSCRIPT c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT italic_k ( u - c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ) roman_exp ( - italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) 2 end_POSTSUPERSCRIPT / 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG .

Then, we accept the proposed candidate hjsuperscriptsubscript𝑗h_{j}^{*}italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT as hj(t+1)superscriptsubscript𝑗𝑡1h_{j}^{(t+1)}italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT with probability min{R,1}min𝑅1\mbox{min}\{R,1\}min { italic_R , 1 } or keep hj(t+1)=hj(t)superscriptsubscript𝑗𝑡1superscriptsubscript𝑗𝑡h_{j}^{(t+1)}=h_{j}^{(t)}italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT.

D Additional Tables from the simulation study

Here, we present additional details regarding the simulation scenarios (Table S.1) and results for the scenarios that included a taxon unrelated to the parent-offspring-type configurations of interest (Table S.2). The presence of an unrelated taxon (Table S.2) did not meaningfully affect the results (Table 1). Specifically, with or without this spatially unrelated taxon, the multivariate cluster point process (MCPP) performed better than the Neyman-Scott process (NSP) implementation in all aspects. The NSP often failed to converge, especially in scenarios where the bandwidth parameter was large.

Table S.1: A summary of twelve simulation scenarios considered in Section 4. The offspring density is controlled by setting (α2,α3)=(1.5,1)subscript𝛼2subscript𝛼31.51(\alpha_{2},\alpha_{3})=(1.5,1)( italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) = ( 1.5 , 1 ) for ‘Sparse’, (4,3)43(4,3)( 4 , 3 ) for ‘Dense’ and (4,1)41(4,1)( 4 , 1 ) for ‘Mixed’ densities. Bandwidth ‘Low’ sets (h2,h3)=(0.01,0.02)subscript2subscript30.010.02(h_{2},h_{3})=(0.01,0.02)( italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) = ( 0.01 , 0.02 ) and ‘High’ to (0.1,0.01)0.10.01(0.1,0.01)( 0.1 , 0.01 ). The setting “Unrelated taxon” refers to whether there exists a taxon in the data spatially unrelated to the multilayered arrangement.
Scenario Unrelated taxon Offspring density Bandwidth
1 Absent Sparse Low
2 Absent Sparse High
3 Absent Dense Low
4 Absent Dense High
5 Absent Mixed Low
6 Absent Mixed High
7 Present Sparse Low
8 Present Sparse High
9 Present Dense Low
10 Present Dense High
11 Present Mixed Low
12 Present Mixed High
Table S.2: The true value, estimates (EST), and uncertainty measures for the offspring density (α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT), bandwidth (h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT), and parent process (λ1Csubscriptsuperscript𝜆𝐶1\lambda^{C}_{1}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) parameters from the MCPP and NSP analyses in the last six simulated scenarios (those that included a spatially unrelated taxon). For the MCPP model, the estimates are the posterior means averaged over different datasets, the SD is computed by averaging the posterior standard deviation over different datasets, and the SDEST is computed as the standard deviation of the estimates over the datasets. For the NSP model, the estimates are the outputs of the minimum contrast method, and the SE is calculated similarly by using these estimates. The SD for the NSP model is not computed, as the method does not provide an uncertainty measure. The last column (%percent\%%F) refers to the percentage of datasets in which the NSP model failed to converge for a given scenario. There is no corresponding %percent\%%F column for the MCPP because all models converged.
True MCPP NSP
Scenario Value EST SD SDEST EST SE %percent\%%F
α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 1.50 1.53 0.10 0.10 1.46 0.33
α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 1.00 1.02 0.08 0.09 3.31 20.25
7 h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.01 0.01 <0.01absent0.01<0.01< 0.01 <0.01absent0.01<0.01< 0.01 0.01 <0.01absent0.01<0.01< 0.01 2
h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.02 0.02 <0.01absent0.01<0.01< 0.01 <0.01absent0.01<0.01< 0.01 0.04 0.09
λ1Csubscriptsuperscript𝜆𝐶1\lambda^{C}_{1}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 150.00 161.06 12.91 12.20 171.35 34.72
α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 1.50 1.48 0.11 0.09 198.46 283.69
α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 1.00 1.02 0.08 0.09 0.98 0.28
8 h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.10 0.08 0.01 0.01 10.30 28.72 36
h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.01 0.01 <0.01absent0.01<0.01< 0.01 <0.01absent0.01<0.01< 0.01 0.01 <0.01absent0.01<0.01< 0.01
λ1Csubscriptsuperscript𝜆𝐶1\lambda^{C}_{1}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 150.00 160.25 12.86 12.57 939.70 2777.40
α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 4.00 4.02 0.14 0.15 8.77 48.78
α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 3.00 3.05 0.13 0.13 2.91 0.77
9 h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.01 0.01 <0.01absent0.01<0.01< 0.01 <0.01absent0.01<0.01< 0.01 0.02 0.08 0
h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.02 0.02 <0.01absent0.01<0.01< 0.01 <0.01absent0.01<0.01< 0.01 0.02 <0.01absent0.01<0.01< 0.01
λ1Csubscriptsuperscript𝜆𝐶1\lambda^{C}_{1}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 150.00 202.78 14.38 14.29 208.52 39.37
α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 4.00 4.00 0.17 0.17 613.34 569.20
α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 3.00 3.02 0.13 0.14 2.93 0.53
10 h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.10 0.09 0.01 0.01 1.15 0.64 48
h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.01 0.01 <0.01absent0.01<0.01< 0.01 <0.01absent0.01<0.01< 0.01 0.01 <0.01absent0.01<0.01< 0.01
λ1Csubscriptsuperscript𝜆𝐶1\lambda^{C}_{1}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 150.00 200.49 14.26 14.48 13.30 28.78
α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 4.00 4.05 0.15 0.14 18.45 87.48
α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 1.00 1.00 0.07 0.08 2.05 10.04
11 h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.01 0.01 <0.01absent0.01<0.01< 0.01 <0.01absent0.01<0.01< 0.01 0.03 0.11 0
h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.02 0.02 <0.01absent0.01<0.01< 0.01 <0.01absent0.01<0.01< 0.01 0.47 4.41
λ1Csubscriptsuperscript𝜆𝐶1\lambda^{C}_{1}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 150.00 201.50 14.37 14.09 203.36 53.30
α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 4.00 4.02 0.17 0.17 547.61 553.61
α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 1.00 1.02 0.07 0.07 0.97 0.24
12 h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.10 0.09 0.01 0.01 1.04 0.82 70
h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.01 0.01 <0.01absent0.01<0.01< 0.01 <0.01absent0.01<0.01< 0.01 0.01 <0.01absent0.01<0.01< 0.01
λ1Csubscriptsuperscript𝜆𝐶1\lambda^{C}_{1}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 150.00 199.05 14.23 15.95 26.62 41.46

E Sensitivity analyses regarding choice of prior for the bandwidth parameters

As part of the simulation study described in Section 4, we also evaluated the MCPP method’s sensitivity to choice of prior distribution for the bandwidth parameters. We considered four different prior distributions; 1) half-normal, 2) uniform, 3) log-normal with a flat tail and high variance and 4) log-normal with a slim tail and higher peak. For the uniform prior, the lower and upper bounds were taken to be 00 and 0.20.20.20.2, respectively. Both the log-normal priors had μ=log0.05𝜇0.05\mu=\log 0.05italic_μ = roman_log 0.05; the flat-tailed prior had σ=1𝜎1\sigma=1italic_σ = 1, and the high-peaked prior had σ=0.1𝜎0.1\sigma=0.1italic_σ = 0.1 as the hyperparameter. The hyperparameter setting for the half-normal prior was the same as in Section 4. We compared performance of the MCPP for the different prior distributions in Scenario 5 and 6 (Table 1): both scenarios considered mixed offspring density (α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT=4 and α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT=1), one had low bandwidth (h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT=0.01 and h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT=0.02), and the other had high bandwidth (h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT=0.1 and h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT=0.01).

We report the mean absolute percentage bias for estimating the corresponding parameters in the two scenarios for the four different prior settings: i) Half-normal, ii) Uniform, iii) a flat Log-normal, and iv) a tight Log-normal. The half-normal prior-based MCPP model performed the best, and the performance was similar to that for the original model. When the true bandwidth was low, all the models—irrespective of prior choice—generally performed well and similarly to each other, with almost all biases <8%absentpercent8<8\%< 8 %. Differences in performance emerged when the true bandwidth was high, where the analyses with tighter priors produced much less biased estimates (<10%absentpercent10<10\%< 10 % except in one instance) than the analyses with flatter priors (4-137%; Table S.3). However, using an informative log-normal prior backfired even for the low-bandwidth scenario when the offspring density was also low, as for the second offspring process (similar-to\sim20-25%).

Table S.3: Results of MCPP based analysis of simulated data, comparing different priors for the bandwidth parameters. The true values for the offspring densities were α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT=4 and α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT=1. The true values for the bandwidth parameters were h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT=0.01 and h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT=0.02 under low bandwidth and h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT=0.1 and h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT=0.01 under high bandwidth. The parent process is denoted λCsuperscript𝜆𝐶\lambda^{C}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT. Results are presented as mean absolute percentage bias of the estimated parameter values based on posterior means of each of the 100100100100 simulated datasets. There were no other taxa unrelated to these multi-layered arrangements.
Log-normal Log-normal
Half-normal Uniform (flat) (tight)
α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.03 0.03 0.03 0.03
Low α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.07 0.07 0.07 0.07
bandwidth h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.02 0.02 0.02 0.05
h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.04 0.04 0.04 0.24
λ1Csubscriptsuperscript𝜆𝐶1\lambda^{C}_{1}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 0.06 0.06 0.06 0.06
α2subscript𝛼2\alpha_{2}italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.03 0.31 0.52 0.03
High α3subscript𝛼3\alpha_{3}italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.05 0.05 0.05 0.05
bandwidth h2subscript2h_{2}italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.09 0.99 1.37 0.10
h3subscript3h_{3}italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.03 0.03 0.03 0.20
λ1Csubscriptsuperscript𝜆𝐶1\lambda^{C}_{1}italic_λ start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 0.06 0.06 0.06 0.06

F Additional Figures and Tables for the Analysis of Human Microbiome Biofilm Image Data

Here, we present a visual representation of the four quadrants of the whole dental plaque data for subset analyses in Figure S.2 and the abundances of different taxa in the four quadrants of the subsetted data (Table S.4). The estimates for the intensity functions for the five taxa (Neisseriaceae, Capnocytophaga, Actinomyces, Fusobacterium, Leptotrichia) that have no visible spatial relationship with the parent-offspring-type configurations also varied across the quadrants (Table S.5). K-functions for the whole and subsetted analyses (Figures S.3 and S.4 through S.7) also varied noticeably by quadrant. The DIC estimates for the different models explored in Section 5.2 (Table S.6) indicate that the models with the Fusobacterium and Leptotrichia as an additional parent-offspring pair is a better fit to the data than the original model, while models fitting Streptococcus around Fusobacterium do not fit the data well.

Refer to caption
Figure S.2: Division of the dental plaque sample image into the first (bottom left), second (top left), third (bottom right) and fourth (top right) quadrants. Black space has been removed.
Table S.4: The abundance (counts) of bacterial taxa of interest in the human dental plaque sample image data and its four subdivided quadrants.
Taxon Quadrant Total
I II III IV
Actinomyces 119 280 154 223 776
Capnocytophaga 512 755 574 573 2414
Corynebacterium 58 219 186 245 708
Fusobacterium 92 250 141 173 656
Leptotrichia 191 411 234 339 1175
Neisseriaceae 339 479 402 491 1711
Pasteurellaceae 53 130 76 106 365
Porphyromonas 227 525 269 420 1441
Streptococcus 98 379 163 249 889
Table S.5: The posterior means of parameters associated with Neisseriaceae (λ5subscript𝜆5\lambda_{5}italic_λ start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT), Capnocytophaga (λ6subscript𝜆6\lambda_{6}italic_λ start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT), Actinomyces (λ7subscript𝜆7\lambda_{7}italic_λ start_POSTSUBSCRIPT 7 end_POSTSUBSCRIPT), Fusobacterium (λ8subscript𝜆8\lambda_{8}italic_λ start_POSTSUBSCRIPT 8 end_POSTSUBSCRIPT) and Leptotrichia (λ9subscript𝜆9\lambda_{9}italic_λ start_POSTSUBSCRIPT 9 end_POSTSUBSCRIPT) obtained by applying the proposed MCPP method on the entire image and on each of the four quadrants of the dental plaque sample image. All results are rounded to two decimal places. The posterior standard deviations were all smaller than 0.01 and are not reported separately.
λ5subscript𝜆5\lambda_{5}italic_λ start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT λ6subscript𝜆6\lambda_{6}italic_λ start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT λ7subscript𝜆7\lambda_{7}italic_λ start_POSTSUBSCRIPT 7 end_POSTSUBSCRIPT λ8subscript𝜆8\lambda_{8}italic_λ start_POSTSUBSCRIPT 8 end_POSTSUBSCRIPT λ9subscript𝜆9\lambda_{9}italic_λ start_POSTSUBSCRIPT 9 end_POSTSUBSCRIPT
Full Image 0.04 0.06 0.02 0.02 0.03
Segment 1 0.04 0.06 0.01 0.01 0.02
Segment 2 0.05 0.07 0.03 0.02 0.04
Segment 3 0.04 0.05 0.01 0.01 0.02
Segment 4 0.05 0.06 0.02 0.02 0.03
Refer to caption
Figure S.3: Border-corrected K𝐾Kitalic_K-functions (K^bord(r)\big{(}\hat{K}_{\text{bord}}(r)( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT bord end_POSTSUBSCRIPT ( italic_r ) or K^inhombord(r))\hat{K}_{\text{inhom}}^{\text{bord}}(r)\big{)}over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT inhom end_POSTSUBSCRIPT start_POSTSUPERSCRIPT bord end_POSTSUPERSCRIPT ( italic_r ) ) for the processes corresponding to Corynebacterium (top left), Streptococcus (top right), Porphyromonas (bottom left) and Pasteurellaceae (bottom right) in comparison to that of a homogeneous Poisson process (K^pois(r))subscript^𝐾pois𝑟\big{(}\hat{K}_{\text{pois}}(r)\big{)}( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT pois end_POSTSUBSCRIPT ( italic_r ) ).
Refer to caption
Figure S.4: Border-corrected K𝐾Kitalic_K-functions (K^bord(r)\big{(}\hat{K}_{\text{bord}}(r)( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT bord end_POSTSUBSCRIPT ( italic_r ) or K^inhombord(r))\hat{K}_{\text{inhom}}^{\text{bord}}(r)\big{)}over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT inhom end_POSTSUBSCRIPT start_POSTSUPERSCRIPT bord end_POSTSUPERSCRIPT ( italic_r ) ) for the processes corresponding to Corynebacterium (top left), Streptococcus (top right), Porphyromonas (bottom left) and Pasteurellaceae (bottom right) in comparison to that of a homogeneous Poisson process (K^pois(r))subscript^𝐾pois𝑟\big{(}\hat{K}_{\text{pois}}(r)\big{)}( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT pois end_POSTSUBSCRIPT ( italic_r ) ) in Segment 1
Refer to caption
Figure S.5: Border-corrected K𝐾Kitalic_K-functions (K^bord(r)\big{(}\hat{K}_{\text{bord}}(r)( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT bord end_POSTSUBSCRIPT ( italic_r ) or K^inhombord(r))\hat{K}_{\text{inhom}}^{\text{bord}}(r)\big{)}over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT inhom end_POSTSUBSCRIPT start_POSTSUPERSCRIPT bord end_POSTSUPERSCRIPT ( italic_r ) ) for the processes corresponding to Corynebacterium (top left), Streptococcus (top right), Porphyromonas (bottom left) and Pasteurellaceae (bottom right) in comparison to that of a homogeneous Poisson process (K^pois(r))subscript^𝐾pois𝑟\big{(}\hat{K}_{\text{pois}}(r)\big{)}( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT pois end_POSTSUBSCRIPT ( italic_r ) ) in Segment 2
Refer to caption
Figure S.6: Border-corrected K𝐾Kitalic_K-functions (K^bord(r)\big{(}\hat{K}_{\text{bord}}(r)( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT bord end_POSTSUBSCRIPT ( italic_r ) or K^inhombord(r))\hat{K}_{\text{inhom}}^{\text{bord}}(r)\big{)}over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT inhom end_POSTSUBSCRIPT start_POSTSUPERSCRIPT bord end_POSTSUPERSCRIPT ( italic_r ) ) for the processes corresponding to Corynebacterium (top left), Streptococcus (top right), Porphyromonas (bottom left) and Pasteurellaceae (bottom right) in comparison to that of a homogeneous Poisson process (K^pois(r))subscript^𝐾pois𝑟\big{(}\hat{K}_{\text{pois}}(r)\big{)}( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT pois end_POSTSUBSCRIPT ( italic_r ) ) in Segment 3
Refer to caption
Figure S.7: Border-corrected K𝐾Kitalic_K-functions (K^bord(r)\big{(}\hat{K}_{\text{bord}}(r)( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT bord end_POSTSUBSCRIPT ( italic_r ) or K^inhombord(r))\hat{K}_{\text{inhom}}^{\text{bord}}(r)\big{)}over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT inhom end_POSTSUBSCRIPT start_POSTSUPERSCRIPT bord end_POSTSUPERSCRIPT ( italic_r ) ) for the processes corresponding to Corynebacterium (top left), Streptococcus (top right), Porphyromonas (bottom left) and Pasteurellaceae (bottom right) in comparison to that of a homogeneous Poisson process (K^pois(r))subscript^𝐾pois𝑟\big{(}\hat{K}_{\text{pois}}(r)\big{)}( over^ start_ARG italic_K end_ARG start_POSTSUBSCRIPT pois end_POSTSUBSCRIPT ( italic_r ) ) in Segment 4
Table S.6: The parent-offspring relationships explored in different models and their DIC. The abbreviations C, S, Po, Pa, F and L are used for Corynebacterium, Streptococcus, Porphyromonas, Pasteurellaceae, Fusobacterium and Leptotrichia. The \rightarrow implies parent-offspring relationship with the arrow directed from the parent to the offspring(s).
Identifier Parent-Offspring Realtions Present DIC
1 CSPoSPa𝐶𝑆𝑃𝑜𝑆𝑃𝑎C\rightarrow SPo\vdots S\rightarrow Paitalic_C → italic_S italic_P italic_o ⋮ italic_S → italic_P italic_a 124985.4
2 CPoSPaFS𝐶𝑃𝑜𝑆𝑃𝑎𝐹𝑆C\rightarrow Po\vdots S\rightarrow Pa\vdots F\rightarrow Sitalic_C → italic_P italic_o ⋮ italic_S → italic_P italic_a ⋮ italic_F → italic_S 125531.1
3 CSPoFSSPa𝐶𝑆𝑃𝑜𝐹𝑆𝑆𝑃𝑎C\rightarrow SPo\vdots F\rightarrow S\vdots S\rightarrow Paitalic_C → italic_S italic_P italic_o ⋮ italic_F → italic_S ⋮ italic_S → italic_P italic_a 134007.1
4 CSPoSPaFL𝐶𝑆𝑃𝑜𝑆𝑃𝑎𝐹𝐿C\rightarrow SPo\vdots S\rightarrow Pa\vdots F\rightarrow Litalic_C → italic_S italic_P italic_o ⋮ italic_S → italic_P italic_a ⋮ italic_F → italic_L 124317.0
5 CSPoSPaLF𝐶𝑆𝑃𝑜𝑆𝑃𝑎𝐿𝐹C\rightarrow SPo\vdots S\rightarrow Pa\vdots L\rightarrow Fitalic_C → italic_S italic_P italic_o ⋮ italic_S → italic_P italic_a ⋮ italic_L → italic_F 124303.5
6 CSPoSPaFLS𝐶𝑆𝑃𝑜𝑆𝑃𝑎𝐹𝐿𝑆C\rightarrow SPo\vdots S\rightarrow Pa\vdots F\rightarrow LSitalic_C → italic_S italic_P italic_o ⋮ italic_S → italic_P italic_a ⋮ italic_F → italic_L italic_S 133357.2
7 CSPoSPaLFFS𝐶𝑆𝑃𝑜𝑆𝑃𝑎𝐿𝐹𝐹𝑆C\rightarrow SPo\vdots S\rightarrow Pa\vdots L\rightarrow F\vdots F\rightarrow Sitalic_C → italic_S italic_P italic_o ⋮ italic_S → italic_P italic_a ⋮ italic_L → italic_F ⋮ italic_F → italic_S 133345.2

We additionally present two images (Figures S.8 and S.9) to make the case for testing the different models as described in Sections 5.2.1 and 5.2.2 of the main manuscript.

Refer to caption
Figure S.8: In vitro corncob formation between (A) S. sanguis CC5A and B. matruchotti and (B) S. sanguis CC5A and F. nucleatum. The bar is 2μm2𝜇𝑚2\mu m2 italic_μ italic_m. Image taken from lancy1983corncob.
Refer to caption
Figure S.9: Fusobacterium (Yellow) and Leptotrichia (Blue) occupying space near each other in the dental plaque sample we analyze.
(a) Model 1
Refer to caption
(b) Model 2
Refer to caption
(c) Model 3
Refer to caption
(d) Model 4
Refer to caption
(e) Model 5
Refer to caption
(f) Model 6
Refer to caption
(g) Model 7
Refer to caption
Figure S.10: Cartoon images depicting the different relationships present in the seven different models in Section 5.2.2. These models involve Corynebacterium (magenta), Streptococcus (green), Porphyromonas (blue), Pasteurellaceae (Orange), Fusobacterium (yellow), and Leptotrichia (cyan) in different configurations. Note that for each model, we include here only one illustrative cluster example.

G Model Formulation for Given Parent-Offspring Relationships

In this section, we illustrate the formulation of the MCPP model for a given parent-offspring configuration, as described in Section 3.1 of the main manuscript. Using the notation introduced in Section 3.5, we assume data containing four taxa A,B,C,𝐴𝐵𝐶A,B,C,italic_A , italic_B , italic_C , and D𝐷Ditalic_D (m=4𝑚4m=4italic_m = 4). We detail the construction of the likelihood function according to equations (1)-(3) for the following four different configurations:

  1. a)

    ABCD𝐴𝐵𝐶𝐷A\rightarrow B\vdots C\rightarrow Ditalic_A → italic_B ⋮ italic_C → italic_D

  2. b)

    ABC𝐴𝐵𝐶A\rightarrow BCitalic_A → italic_B italic_C

  3. c)

    ABBC𝐴𝐵𝐵𝐶A\rightarrow B\vdots B\rightarrow Citalic_A → italic_B ⋮ italic_B → italic_C

  4. d)

    ABCCD𝐴𝐵𝐶𝐶𝐷A\rightarrow BC\vdots C\rightarrow Ditalic_A → italic_B italic_C ⋮ italic_C → italic_D

G.1 Modeling ABCD𝐴𝐵𝐶𝐷A\rightarrow B\vdots C\rightarrow Ditalic_A → italic_B ⋮ italic_C → italic_D

The model represents a configuration involving two parent processes (taxa A𝐴Aitalic_A and C𝐶Citalic_C, p=2𝑝2p=2italic_p = 2) and two offspring processes (taxa B𝐵Bitalic_B and D𝐷Ditalic_D, q=2𝑞2q=2italic_q = 2). That is, all taxa are interrelated within parent-offspring framework, with no taxa existing outside of these relationships (mpq=0𝑚𝑝𝑞0m-p-q=0italic_m - italic_p - italic_q = 0). Therefore, following equations (1)-(3), the corresponding model formulation can be written as follows:

λA(s)=λAλC(s)=λBλB(s)=αBcAkB(sc,hB)λD(s)=αDcCkD(sc,hD).subscript𝜆𝐴ssubscript𝜆𝐴subscript𝜆𝐶ssubscript𝜆𝐵subscript𝜆𝐵ssubscript𝛼𝐵subscriptc𝐴subscript𝑘𝐵scsubscript𝐵subscript𝜆𝐷ssubscript𝛼𝐷subscriptc𝐶subscript𝑘𝐷scsubscript𝐷\begin{split}\lambda_{A}(\mbox{\bf s})&=\lambda_{A}\\ \lambda_{C}(\mbox{\bf s})&=\lambda_{B}\\ \lambda_{B}(\mbox{\bf s})&=\alpha_{B}\sum_{\mbox{\bf c}\in A}k_{B}(\mbox{\bf s% }-\mbox{\bf c},h_{B})\\ \lambda_{D}(\mbox{\bf s})&=\alpha_{D}\sum_{\mbox{\bf c}\in C}k_{D}(\mbox{\bf s% }-\mbox{\bf c},h_{D}).\end{split}start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_λ start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_α start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_C end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ) . end_CELL end_ROW

It follows that

l(Y|𝜽)|𝒲||𝒲|λA|𝒲|λC+αBcA𝒲kB(uc,hB)𝑑u+αDcC𝒲kD(uc,hD)𝑑u+nAlogλA+nClogλC+sBlog(αBcAkB(sc,hB))+sDlog(αDcCkD(sc,hD)).proportional-to𝑙conditional𝑌𝜽𝒲𝒲subscript𝜆𝐴𝒲subscript𝜆𝐶subscript𝛼𝐵subscriptc𝐴subscript𝒲subscript𝑘𝐵ucsubscript𝐵differential-dusubscript𝛼𝐷subscriptc𝐶subscript𝒲subscript𝑘𝐷ucsubscript𝐷differential-dusubscript𝑛𝐴subscript𝜆𝐴subscript𝑛𝐶subscript𝜆𝐶subscripts𝐵subscript𝛼𝐵subscriptc𝐴subscript𝑘𝐵scsubscript𝐵subscripts𝐷subscript𝛼𝐷subscriptc𝐶subscript𝑘𝐷scsubscript𝐷\begin{split}l(Y|\mbox{\boldmath$\theta$})&\propto|\mathcal{W}|-|\mathcal{W}|% \lambda_{A}-|\mathcal{W}|\lambda_{C}\\ &\quad+\alpha_{B}\sum_{\mbox{\bf c}\in A}\int_{\mathcal{W}}k_{B}(\mbox{\bf u}-% \mbox{\bf c},h_{B})\,\ d\mbox{\bf u}+\alpha_{D}\sum_{\mbox{\bf c}\in C}\int_{% \mathcal{W}}k_{D}(\mbox{\bf u}-\mbox{\bf c},h_{D})\,\ d\mbox{\bf u}\\ &\quad+n_{A}\log\lambda_{A}+n_{C}\log\lambda_{C}\\ &\quad+\sum_{\mbox{\bf s}\in B}\log\left(\alpha_{B}\sum_{\mbox{\bf c}\in A}k_{% B}(\mbox{\bf s}-\mbox{\bf c},h_{B})\right)+\sum_{\mbox{\bf s}\in D}\log\left(% \alpha_{D}\sum_{\mbox{\bf c}\in C}k_{D}(\mbox{\bf s}-\mbox{\bf c},h_{D})\right% ).\end{split}start_ROW start_CELL italic_l ( italic_Y | bold_italic_θ ) end_CELL start_CELL ∝ | caligraphic_W | - | caligraphic_W | italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT - | caligraphic_W | italic_λ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( u - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) italic_d u + italic_α start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_C end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( u - c , italic_h start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ) italic_d u end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + italic_n start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT roman_log italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT + italic_n start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT roman_log italic_λ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + ∑ start_POSTSUBSCRIPT s ∈ italic_B end_POSTSUBSCRIPT roman_log ( italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) ) + ∑ start_POSTSUBSCRIPT s ∈ italic_D end_POSTSUBSCRIPT roman_log ( italic_α start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_C end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ) ) . end_CELL end_ROW

G.2 Modeling ABC𝐴𝐵𝐶A\rightarrow BCitalic_A → italic_B italic_C

In this model, we investigate the configuration where two offspring processes (taxa B𝐵Bitalic_B and C𝐶Citalic_C, q=2𝑞2q=2italic_q = 2) share the same parent process (taxon A𝐴Aitalic_A, p=1𝑝1p=1italic_p = 1). Additionally, the model includes a separate process for taxon D𝐷Ditalic_D, which remains uninvolved in any parent-offspring relationship (mpq=1𝑚𝑝𝑞1m-p-q=1italic_m - italic_p - italic_q = 1). The model formulation can be written as

λA(s)=λA,λB(s)=αBcAkB(sc,hB),λC(s)=αCcAkC(sc,hC),λD(s)=λD.formulae-sequencesubscript𝜆𝐴ssubscript𝜆𝐴formulae-sequencesubscript𝜆𝐵ssubscript𝛼𝐵subscriptc𝐴subscript𝑘𝐵scsubscript𝐵formulae-sequencesubscript𝜆𝐶ssubscript𝛼𝐶subscriptc𝐴subscript𝑘𝐶scsubscript𝐶subscript𝜆𝐷ssubscript𝜆𝐷\begin{split}\lambda_{A}(\mbox{\bf s})&=\lambda_{A},\\ \lambda_{B}(\mbox{\bf s})&=\alpha_{B}\sum_{\mbox{\bf c}\in A}k_{B}(\mbox{\bf s% }-\mbox{\bf c},h_{B}),\\ \lambda_{C}(\mbox{\bf s})&=\alpha_{C}\sum_{\mbox{\bf c}\in A}k_{C}(\mbox{\bf s% }-\mbox{\bf c},h_{C}),\\ \lambda_{D}(\mbox{\bf s})&=\lambda_{D}.\end{split}start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) , end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) , end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT . end_CELL end_ROW

It follows that

l(Y|𝜽)|𝒲||𝒲|λA|𝒲|λDαBcA𝒲kB(uc,hB)𝑑uαCcA𝒲kC(uc,hC)𝑑u+nAlogλA+nDlogλD+sBlog(αBcAkB(sc,hB))+sClog(αCcAkC(sc,hC)).proportional-to𝑙conditional𝑌𝜽𝒲𝒲subscript𝜆𝐴𝒲subscript𝜆𝐷subscript𝛼𝐵subscriptc𝐴subscript𝒲subscript𝑘𝐵ucsubscript𝐵differential-dusubscript𝛼𝐶subscriptc𝐴subscript𝒲subscript𝑘𝐶ucsubscript𝐶differential-dusubscript𝑛𝐴subscript𝜆𝐴subscript𝑛𝐷subscript𝜆𝐷subscripts𝐵subscript𝛼𝐵subscriptc𝐴subscript𝑘𝐵scsubscript𝐵subscripts𝐶subscript𝛼𝐶subscriptc𝐴subscript𝑘𝐶scsubscript𝐶\begin{split}l(Y|\mbox{\boldmath$\theta$})&\propto|\mathcal{W}|-|\mathcal{W}|% \lambda_{A}-|\mathcal{W}|\lambda_{D}\\ &\quad-\alpha_{B}\sum_{\mbox{\bf c}\in A}\int_{\mathcal{W}}k_{B}(\mbox{\bf u}-% \mbox{\bf c},h_{B})\,\ d\mbox{\bf u}-\alpha_{C}\sum_{\mbox{\bf c}\in A}\int_{% \mathcal{W}}k_{C}(\mbox{\bf u}-\mbox{\bf c},h_{C})\,\ d\mbox{\bf u}\\ &\quad+n_{A}\log\lambda_{A}+n_{D}\log\lambda_{D}\\ &\quad+\sum_{\mbox{\bf s}\in B}\log\left(\alpha_{B}\sum_{\mbox{\bf c}\in A}k_{% B}(\mbox{\bf s}-\mbox{\bf c},h_{B})\right)+\sum_{\mbox{\bf s}\in C}\log\left(% \alpha_{C}\sum_{\mbox{\bf c}\in A}k_{C}(\mbox{\bf s}-\mbox{\bf c},h_{C})\right% ).\end{split}start_ROW start_CELL italic_l ( italic_Y | bold_italic_θ ) end_CELL start_CELL ∝ | caligraphic_W | - | caligraphic_W | italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT - | caligraphic_W | italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL - italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( u - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) italic_d u - italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( u - c , italic_h start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) italic_d u end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + italic_n start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT roman_log italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT + italic_n start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT roman_log italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + ∑ start_POSTSUBSCRIPT s ∈ italic_B end_POSTSUBSCRIPT roman_log ( italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) ) + ∑ start_POSTSUBSCRIPT s ∈ italic_C end_POSTSUBSCRIPT roman_log ( italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) ) . end_CELL end_ROW

G.3 Modeling ABBC𝐴𝐵𝐵𝐶A\rightarrow B\vdots B\rightarrow Citalic_A → italic_B ⋮ italic_B → italic_C

This scenario presents a more complex relationship where taxon B𝐵Bitalic_B behaves both as an offspring to taxon A𝐴Aitalic_A and as a parent to taxon C𝐶Citalic_C. In this setup, there is only one process serving as a parent (taxon A𝐴Aitalic_A, p=1𝑝1p=1italic_p = 1), two processes functioning as offspring (taxa B𝐵Bitalic_B and C𝐶Citalic_C, q=2𝑞2q=2italic_q = 2), and a separate process for taxon D𝐷Ditalic_D, which is uninvolved in any parent-offspring relationship (mpq=1𝑚𝑝𝑞1m-p-q=1italic_m - italic_p - italic_q = 1). It is worth noting that p=1𝑝1p=1italic_p = 1 as the process for taxon B𝐵Bitalic_B is counted as an offspring process, while the locations of taxon B𝐵Bitalic_B will be used for modeling the process for taxon C𝐶Citalic_C. Therefore, the MCPP under the configuration can be written as

λA(s)=λA,λB(s)=αBcAkB(sc,hB),λC(s)=αCcBkC(sc,hC),λD(s)=λD.formulae-sequencesubscript𝜆𝐴ssubscript𝜆𝐴formulae-sequencesubscript𝜆𝐵ssubscript𝛼𝐵subscriptc𝐴subscript𝑘𝐵scsubscript𝐵formulae-sequencesubscript𝜆𝐶ssubscript𝛼𝐶subscriptc𝐵subscript𝑘𝐶scsubscript𝐶subscript𝜆𝐷ssubscript𝜆𝐷\begin{split}\lambda_{A}(\mbox{\bf s})&=\lambda_{A},\\ \lambda_{B}(\mbox{\bf s})&=\alpha_{B}\sum_{\mbox{\bf c}\in A}k_{B}(\mbox{\bf s% }-\mbox{\bf c},h_{B}),\\ \lambda_{C}(\mbox{\bf s})&=\alpha_{C}\sum_{\mbox{\bf c}\in B}k_{C}(\mbox{\bf s% }-\mbox{\bf c},h_{C}),\\ \lambda_{D}(\mbox{\bf s})&=\lambda_{D}.\end{split}start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) , end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_B end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) , end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT . end_CELL end_ROW

It follows that

l(Y|𝜽)|𝒲||𝒲|λA|𝒲|λDαB𝒲cAkB(uc,hB)duαC𝒲cAkC(uc,hC)du+nAlogλA+nDlogλD+sBlog(αBcAkB(sc,hB))+sClog(αCcAkC(sc,hC)).proportional-to𝑙conditional𝑌𝜽𝒲𝒲subscript𝜆𝐴𝒲subscript𝜆𝐷subscript𝛼𝐵subscript𝒲subscriptc𝐴subscript𝑘𝐵ucsubscript𝐵𝑑usubscript𝛼𝐶subscript𝒲subscriptc𝐴subscript𝑘𝐶ucsubscript𝐶𝑑usubscript𝑛𝐴subscript𝜆𝐴subscript𝑛𝐷subscript𝜆𝐷subscripts𝐵subscript𝛼𝐵subscriptc𝐴subscript𝑘𝐵scsubscript𝐵subscripts𝐶subscript𝛼𝐶subscriptc𝐴subscript𝑘𝐶scsubscript𝐶\begin{split}l(Y|\mbox{\boldmath$\theta$})&\propto|\mathcal{W}|-|\mathcal{W}|% \lambda_{A}-|\mathcal{W}|\lambda_{D}\\ &\quad-\alpha_{B}\int_{\mathcal{W}}\sum_{\mbox{\bf c}\in A}k_{B}(\mbox{\bf u}-% \mbox{\bf c},h_{B})\,\ d\mbox{\bf u}-\alpha_{C}\int_{\mathcal{W}}\sum_{\mbox{% \bf c}\in A}k_{C}(\mbox{\bf u}-\mbox{\bf c},h_{C})\,\ d\mbox{\bf u}\\ &\quad+n_{A}\log\lambda_{A}+n_{D}\log\lambda_{D}\\ &\quad+\sum_{\mbox{\bf s}\in B}\log\left(\alpha_{B}\sum_{\mbox{\bf c}\in A}k_{% B}(\mbox{\bf s}-\mbox{\bf c},h_{B})\right)+\sum_{\mbox{\bf s}\in C}\log\left(% \alpha_{C}\sum_{\mbox{\bf c}\in A}k_{C}(\mbox{\bf s}-\mbox{\bf c},h_{C})\right% ).\end{split}start_ROW start_CELL italic_l ( italic_Y | bold_italic_θ ) end_CELL start_CELL ∝ | caligraphic_W | - | caligraphic_W | italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT - | caligraphic_W | italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL - italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( u - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) italic_d u - italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( u - c , italic_h start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) italic_d u end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + italic_n start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT roman_log italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT + italic_n start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT roman_log italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + ∑ start_POSTSUBSCRIPT s ∈ italic_B end_POSTSUBSCRIPT roman_log ( italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) ) + ∑ start_POSTSUBSCRIPT s ∈ italic_C end_POSTSUBSCRIPT roman_log ( italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) ) . end_CELL end_ROW

G.4 Modeling ABCCD𝐴𝐵𝐶𝐶𝐷A\rightarrow BC\vdots C\rightarrow Ditalic_A → italic_B italic_C ⋮ italic_C → italic_D

In this model, we explore a more complex scenario where the two processes for taxa B𝐵Bitalic_B and C𝐶Citalic_C share the same parent process (for taxon A𝐴Aitalic_A). Furthermore, the process for taxon C𝐶Citalic_C not only functions as an offspring process with respect to that of taxon A𝐴Aitalic_A but also acts as a parent process for taxon D𝐷Ditalic_D. This configuration involves one process serving exclusively as a parent (taxon A𝐴Aitalic_A, p=1𝑝1p=1italic_p = 1) and three processes serving as offspring processes (taxa B,C𝐵𝐶B,Citalic_B , italic_C and D𝐷Ditalic_D, q=3𝑞3q=3italic_q = 3). Consequently, no taxon remains uninvolved in parent-offspring relationships (mpq=0𝑚𝑝𝑞0m-p-q=0italic_m - italic_p - italic_q = 0). As in Section G.3, it is important to note that p=1𝑝1p=1italic_p = 1 as the process for taxon C𝐶Citalic_C is counted as an offspring process, despite its role as a parent process for taxon D𝐷Ditalic_D. The corresponding model formulation can be written as follows:

λA(s)=λA,λB(s)=αBcAkB(sc,hB),λC(s)=αCcAkC(sc,hC),λD(s)=αDcCkD(sc,hD).formulae-sequencesubscript𝜆𝐴ssubscript𝜆𝐴formulae-sequencesubscript𝜆𝐵ssubscript𝛼𝐵subscriptc𝐴subscript𝑘𝐵scsubscript𝐵formulae-sequencesubscript𝜆𝐶ssubscript𝛼𝐶subscriptc𝐴subscript𝑘𝐶scsubscript𝐶subscript𝜆𝐷ssubscript𝛼𝐷subscriptc𝐶subscript𝑘𝐷scsubscript𝐷\begin{split}\lambda_{A}(\mbox{\bf s})&=\lambda_{A},\\ \lambda_{B}(\mbox{\bf s})&=\alpha_{B}\sum_{\mbox{\bf c}\in A}k_{B}(\mbox{\bf s% }-\mbox{\bf c},h_{B}),\\ \lambda_{C}(\mbox{\bf s})&=\alpha_{C}\sum_{\mbox{\bf c}\in A}k_{C}(\mbox{\bf s% }-\mbox{\bf c},h_{C}),\\ \lambda_{D}(\mbox{\bf s})&=\alpha_{D}\sum_{\mbox{\bf c}\in C}k_{D}(\mbox{\bf s% }-\mbox{\bf c},h_{D}).\end{split}start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) , end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) , end_CELL end_ROW start_ROW start_CELL italic_λ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( s ) end_CELL start_CELL = italic_α start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_C end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ) . end_CELL end_ROW

It follows that

l(Y|𝜽)|𝒲||𝒲|λAαB𝒲cAkB(uc,hB)duαC𝒲cAkC(uc,hC)duαD𝒲cCkD(uc,hD)du+nAlogλA+sBlog(αBcAkB(sc,hB))+sClog(αCcAkC(sc,hC))+sDlog(αDcCkD(sc,hD)).proportional-to𝑙Missing Operator𝒲𝒲subscript𝜆𝐴subscript𝛼𝐵subscript𝒲subscriptc𝐴subscript𝑘𝐵ucsubscript𝐵𝑑usubscript𝛼𝐶subscript𝒲subscriptc𝐴subscript𝑘𝐶ucsubscript𝐶𝑑usubscript𝛼𝐷subscript𝒲subscriptc𝐶subscript𝑘𝐷ucsubscript𝐷𝑑usubscript𝑛𝐴subscript𝜆𝐴subscripts𝐵subscript𝛼𝐵subscriptc𝐴subscript𝑘𝐵scsubscript𝐵subscripts𝐶subscript𝛼𝐶subscriptc𝐴subscript𝑘𝐶scsubscript𝐶subscripts𝐷subscript𝛼𝐷subscriptc𝐶subscript𝑘𝐷scsubscript𝐷\begin{split}l(Y|\mbox{\boldmath$\theta$})&\propto|\mathcal{W}|-|\mathcal{W}|% \lambda_{A}\\ &\quad-\alpha_{B}\int_{\mathcal{W}}\sum_{\mbox{\bf c}\in A}k_{B}(\mbox{\bf u}-% \mbox{\bf c},h_{B})\,\ d\mbox{\bf u}-\alpha_{C}\int_{\mathcal{W}}\sum_{\mbox{% \bf c}\in A}k_{C}(\mbox{\bf u}-\mbox{\bf c},h_{C})\,\ d\mbox{\bf u}\\ &\quad-\alpha_{D}\int_{\mathcal{W}}\sum_{\mbox{\bf c}\in C}k_{D}(\mbox{\bf u}-% \mbox{\bf c},h_{D})\,\ d\mbox{\bf u}+n_{A}\log\lambda_{A}+\sum_{\mbox{\bf s}% \in B}\log\left(\alpha_{B}\sum_{\mbox{\bf c}\in A}k_{B}(\mbox{\bf s}-\mbox{\bf c% },h_{B})\right)\\ &\quad+\sum_{\mbox{\bf s}\in C}\log\left(\alpha_{C}\sum_{\mbox{\bf c}\in A}k_{% C}(\mbox{\bf s}-\mbox{\bf c},h_{C})\right)+\sum_{\mbox{\bf s}\in D}\log\left(% \alpha_{D}\sum_{\mbox{\bf c}\in C}k_{D}(\mbox{\bf s}-\mbox{\bf c},h_{D})\right% ).\end{split}start_ROW start_CELL italic_l ( italic_Y | bold_italic_θ ) end_CELL start_CELL ∝ | caligraphic_W | - | caligraphic_W | italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL - italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( u - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) italic_d u - italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( u - c , italic_h start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) italic_d u end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL - italic_α start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_C end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( u - c , italic_h start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ) italic_d u + italic_n start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT roman_log italic_λ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT s ∈ italic_B end_POSTSUBSCRIPT roman_log ( italic_α start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ) ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + ∑ start_POSTSUBSCRIPT s ∈ italic_C end_POSTSUBSCRIPT roman_log ( italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_A end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) ) + ∑ start_POSTSUBSCRIPT s ∈ italic_D end_POSTSUBSCRIPT roman_log ( italic_α start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT c ∈ italic_C end_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ( s - c , italic_h start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ) ) . end_CELL end_ROW