Multivariate cluster point process to quantify and explore multi-entity configurations: Application to biofilm image data - Supplementary Materials

Suman Majumder¹,∗, Brent A. Coull², Jessica L. Mark Welch³, Patrick J. La Riviere⁴, Floyd E. Dewhirst³, Jacqueline R. Starr⁵,†, Kyu Ha Lee²,†

¹ University of Missouri, Columbia, MO, USA
² Harvard T.H. Chan School of Public Health, Boston, MA, USA
³ Forsyth Institute, Cambridge, MA, USA
⁴ University of Chicago, Chicago, IL, USA
⁵ Brigham and Women’s Hospital, Boston, MA, USA
† Co-senior authors

A Neyman-Scott process

A Neyman-Scott process is a point process used for modeling parent-offspring clustering. In the simplest setting, consider the parent process $C$ to be a homogeneous Poisson point process with intensity $\lambda^{C}$. For each parent location $c\in C$, the cluster of offspring $Y_{c}$ is an independent Poisson process with intensity $\alpha k(\cdot-c,h)$, where $k(\cdot-c,h)$ is a probability density function parameterized by $h$ that determines the spread and distribution of the offspring locations around the parent $c$, and $\alpha>0$ is the expected number of offspring per cluster. The Neyman-Scott process $Y$ is the union of all these offspring cluster processes, namely, $Y=\bigcup_{c\in C}Y_{c}$. Further details can be found in Illian et al. (2008) and Chiu et al. (2013), for example.
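As an illustration of the construction above, the following Python sketch simulates a Neyman-Scott process on the unit square. It assumes an isotropic Gaussian kernel for $k$ with standard deviation $h$ (the definition above does not fix the kernel family) and a unit-square window; `sample_poisson` and `simulate_neyman_scott` are hypothetical helper names, not functions from any package.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (fine for modest means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_neyman_scott(lambda_c, alpha, h, rng):
    """Simulate parents C and offspring Y on the unit square [0, 1]^2.

    Assumption: isotropic Gaussian kernel with standard deviation h.
    Parents are generated inside the window only, and offspring that fall
    outside the window are discarded, so clusters near the edge are thinned.
    """
    n_parents = sample_poisson(lambda_c * 1.0, rng)  # window area is 1
    parents = [(rng.random(), rng.random()) for _ in range(n_parents)]
    offspring = []
    for cx, cy in parents:
        # Each parent has a Poisson(alpha) number of offspring scattered around it.
        for _ in range(sample_poisson(alpha, rng)):
            x, y = rng.gauss(cx, h), rng.gauss(cy, h)
            if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
                offspring.append((x, y))
    return parents, offspring


rng = random.Random(1)
parents, offspring = simulate_neyman_scott(lambda_c=150, alpha=4, h=0.02, rng=rng)
print(len(parents), len(offspring))
```

The values `lambda_c=150`, `alpha=4`, and `h=0.02` mirror magnitudes used in the simulation tables and are chosen here only for illustration.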

B Images of the taxa from the human dental plaque biofilm data not visualized in the image included in the main text

Figure S.1: RGB images of Neisseriaceae (top left), Capnocytophaga (top right), Actinomyces (middle left), Fusobacterium (middle right), Leptotrichia (bottom left) and Eubacterium (bottom right) in the dental plaque biofilm sample. Eubacterium denotes a probe for all oral bacteria. It is used for methodologic purposes to evaluate the completeness of the set of specific probes. Hence, it is omitted from the analysis of community spatial structure. The genera shown here were modeled as homogeneous Poisson processes in the data analysis.

C Computational details of the sampling algorithm

We use a Markov chain Monte Carlo (MCMC) method to draw samples from the joint posterior distribution of $\theta$ . In the MCMC scheme, parameters are updated either by exploiting conjugacies inherent to the proposed model or by using a Metropolis-Hastings algorithm.

C.1 Updating parameters associated with offspring densities

Let $\boldsymbol{\theta}^{-(\alpha)}$ denote the set of parameters $\boldsymbol{\theta}$ with $\alpha$ removed. The full conditional distribution for $\alpha_{l}$, $l=p+1,\ldots,p+q$, is

$$\alpha_{l} \mid \boldsymbol{\theta}^{-(\alpha_{l})} \sim \text{Gamma}\left(a_{Y}+n_{l},\; b_{Y}+\sum_{\mathbf{c}_{l}\in C_{l}}\int_{\mathcal{W}}k_{l}(\mathbf{u}-\mathbf{c}_{l},h_{l})\,d\mathbf{u}\right),$$

where $n_{l}$ is the number of observations of taxon $l$ in the window.
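The conjugate update for $\alpha_{l}$ can be sketched in Python as follows. This is a minimal sketch under stated assumptions: an isotropic Gaussian kernel and a rectangular window, so that the kernel integral over the window factorizes into normal CDF differences. The helper names and the hyperparameter values in the usage lines are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kernel_mass_in_window(parent, h, window=(0.0, 1.0, 0.0, 1.0)):
    """Integral over a rectangular window of a Gaussian kernel centred at `parent`.

    Assumption: isotropic Gaussian kernel with standard deviation h.
    """
    x0, x1, y0, y1 = window
    cx, cy = parent
    mass_x = normal_cdf((x1 - cx) / h) - normal_cdf((x0 - cx) / h)
    mass_y = normal_cdf((y1 - cy) / h) - normal_cdf((y0 - cy) / h)
    return mass_x * mass_y


def draw_alpha(n_offspring, parents, h, a_y, b_y, rng):
    """Draw alpha_l from Gamma(a_Y + n_l, b_Y + sum of kernel integrals)."""
    shape = a_y + n_offspring
    rate = b_y + sum(kernel_mass_in_window(c, h) for c in parents)
    # random.gammavariate is parameterized by (shape, scale), so scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)


rng = random.Random(0)
parents = [(0.2, 0.3), (0.7, 0.6), (0.99, 0.5)]
print(draw_alpha(n_offspring=11, parents=parents, h=0.02, a_y=0.01, b_y=0.01, rng=rng))
```

A parent near the window edge, such as `(0.99, 0.5)`, contributes less than one to the rate because part of its kernel mass falls outside the window.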

C.2 Updating intensity parameters in homogeneous Poisson processes

Posterior conjugacy is also achieved in the full conditional distributions of intensity parameters, $\lambda^{C}_{v}\,,\ v=1,\ldots,p$ and $\lambda_{j}\,,\ j=p+q+1,\ldots,m$ , which are given by

$$\lambda^{C}_{v} \mid \boldsymbol{\theta}^{-(\lambda^{C}_{v})} \sim \text{Gamma}\left(a_{C}+n_{v},\; b_{C}+\lvert\mathcal{W}\rvert\right),\quad v=1,\ldots,p; \text{ and}$$

$$\lambda_{j} \mid \boldsymbol{\theta}^{-(\lambda_{j})} \sim \text{Gamma}\left(a+n_{j},\; b+\lvert\mathcal{W}\rvert\right),\quad j=p+q+1,\ldots,m,$$

where $n_{v}$ and $n_{j}$ are the numbers of observations for taxon $v$ and taxon $j$ within the window, respectively.
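A small Python sketch of this conjugate step is given below. The function names and the hyperparameter values are hypothetical; the only facts used are the Gamma shape `a + n` and rate `b + |W|` stated above.

```python
import random


def draw_intensity(n_points, window_area, a, b, rng):
    """Draw a homogeneous intensity from Gamma(a + n, b + |W|) (shape, rate)."""
    # random.gammavariate expects (shape, scale); scale is the reciprocal of the rate.
    return rng.gammavariate(a + n_points, 1.0 / (b + window_area))


def posterior_mean_intensity(n_points, window_area, a, b):
    """Mean of the Gamma full conditional: shape divided by rate."""
    return (a + n_points) / (b + window_area)


rng = random.Random(0)
draws = [draw_intensity(150, 1.0, 0.01, 0.01, rng) for _ in range(2000)]
print(sum(draws) / len(draws), posterior_mean_intensity(150, 1.0, 0.01, 0.01))
```

With weak hyperparameters the posterior mean is close to the empirical intensity $n/\lvert\mathcal{W}\rvert$.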

C.3 Updating bandwidth parameters

Since the full conditionals of the bandwidth parameters do not have standard forms, we use a random walk Metropolis-Hastings step to update each of $h_{l}$, $l=p+1,\ldots,p+q$. Let $h_{l}^{(t)}$ denote the sample for $h_{l}$ from iteration $t$. For iteration $(t+1)$, we propose a candidate sample $h_{l}^{*}$ as a random draw from $N(h_{l}^{(t)},\sigma^{2}_{prop})$, where $\sigma^{2}_{prop}$ is the prespecified variance of the proposal density. Because the proposal is symmetric, the acceptance ratio reduces to the ratio of the full conditional densities:

$$R=\frac{\exp\left(-\alpha_{l}\sum_{\mathbf{c}_{l}\in C_{l}}\int_{\mathcal{W}}k_{l}(\mathbf{u}-\mathbf{c}_{l},h_{l}^{*})\,d\mathbf{u}\right)\prod_{\mathbf{y}\in Y_{l}}\left(\sum_{\mathbf{c}_{l}\in C_{l}}k_{l}(\mathbf{y}-\mathbf{c}_{l},h_{l}^{*})\right)\exp\left(-h_{l}^{*2}/2\sigma^{2}\right)\mathbb{I}(h_{l}^{*}>0)}{\exp\left(-\alpha_{l}\sum_{\mathbf{c}_{l}\in C_{l}}\int_{\mathcal{W}}k_{l}(\mathbf{u}-\mathbf{c}_{l},h_{l}^{(t)})\,d\mathbf{u}\right)\prod_{\mathbf{y}\in Y_{l}}\left(\sum_{\mathbf{c}_{l}\in C_{l}}k_{l}(\mathbf{y}-\mathbf{c}_{l},h_{l}^{(t)})\right)\exp\left(-h_{l}^{(t)2}/2\sigma^{2}\right)},$$

where $\sigma^{2}$ is the hyperparameter of the half-normal prior on the bandwidth, distinct from $\sigma^{2}_{prop}$.

Then, we accept the proposed candidate $h_{l}^{*}$ as $h_{l}^{(t+1)}$ with probability $\min\{R,1\}$; otherwise, we keep $h_{l}^{(t+1)}=h_{l}^{(t)}$.
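The random walk update can be sketched in Python on the log scale, which avoids numerical underflow in the product over offspring. The sketch assumes an isotropic Gaussian kernel, a unit-square window, and a half-normal prior on the bandwidth. All function names, the toy data, and the values of `sigma_prior` and `sigma_prop` are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kernel_mass(parent, h):
    """Mass of a Gaussian kernel centred at `parent` inside the unit square."""
    cx, cy = parent
    mass_x = normal_cdf((1.0 - cx) / h) - normal_cdf(-cx / h)
    mass_y = normal_cdf((1.0 - cy) / h) - normal_cdf(-cy / h)
    return mass_x * mass_y


def kernel_density(dx, dy, h):
    """Isotropic bivariate Gaussian density at displacement (dx, dy)."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h * h)) / (2.0 * math.pi * h * h)


def log_target(h, alpha, parents, offspring, sigma_prior):
    """Log full conditional of a bandwidth, up to an additive constant."""
    if h <= 0.0:  # indicator I(h > 0)
        return -math.inf
    value = -alpha * sum(kernel_mass(c, h) for c in parents)
    for yx, yy in offspring:
        density = sum(kernel_density(yx - cx, yy - cy, h) for cx, cy in parents)
        if density <= 0.0:  # numerical underflow: treat as zero likelihood
            return -math.inf
        value += math.log(density)
    return value - h * h / (2.0 * sigma_prior ** 2)  # half-normal prior term


def mh_step(h_current, alpha, parents, offspring, sigma_prior, sigma_prop, rng):
    """One random walk Metropolis-Hastings update of a bandwidth parameter."""
    h_star = rng.gauss(h_current, sigma_prop)
    log_star = log_target(h_star, alpha, parents, offspring, sigma_prior)
    if log_star == -math.inf:
        return h_current  # candidate has zero posterior density
    log_r = log_star - log_target(h_current, alpha, parents, offspring, sigma_prior)
    if rng.random() < math.exp(min(0.0, log_r)):
        return h_star
    return h_current


rng = random.Random(0)
parents = [(0.5, 0.5)]
offspring = [(0.51, 0.50), (0.50, 0.49), (0.52, 0.51)]
h = 0.05
for _ in range(500):
    h = mh_step(h, 3.0, parents, offspring, sigma_prior=0.1, sigma_prop=0.01, rng=rng)
print(h)
```

Working with `log_r` instead of `R` is a numerical convenience only; the accept/reject rule is the same as accepting with probability $\min\{R,1\}$.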

D Additional Tables from the simulation study

Here, we present additional details regarding the simulation scenarios (Table S.1) and results for the scenarios that included a taxon unrelated to the parent-offspring-type configurations of interest (Table S.2). The presence of an unrelated taxon (Table S.2) did not meaningfully affect the results (Table 1). Specifically, with or without this spatially unrelated taxon, the multivariate cluster point process (MCPP) performed better than the Neyman-Scott process (NSP) implementation in all aspects. The NSP often failed to converge, especially in scenarios where the bandwidth parameter was large.

Table S.1: A summary of the twelve simulation scenarios considered in Section 4. The offspring density is controlled by setting $(\alpha_{2},\alpha_{3})=(1.5,1)$ for ‘Sparse’, $(4,3)$ for ‘Dense’, and $(4,1)$ for ‘Mixed’ densities. Bandwidth ‘Low’ sets $(h_{2},h_{3})=(0.01,0.02)$ and ‘High’ sets $(h_{2},h_{3})=(0.1,0.01)$. The setting “Unrelated taxon” refers to whether the data contain a taxon that is spatially unrelated to the multilayered arrangement.

Scenario	Unrelated taxon	Offspring density	Bandwidth
1	Absent	Sparse	Low
2	Absent	Sparse	High
3	Absent	Dense	Low
4	Absent	Dense	High
5	Absent	Mixed	Low
6	Absent	Mixed	High
7	Present	Sparse	Low
8	Present	Sparse	High
9	Present	Dense	Low
10	Present	Dense	High
11	Present	Mixed	Low
12	Present	Mixed	High

Table S.2: The true value, estimates (EST), and uncertainty measures for the offspring density ($\alpha_{2}$, $\alpha_{3}$), bandwidth ($h_{2}$, $h_{3}$), and parent process ($\lambda^{C}_{1}$) parameters from the MCPP and NSP analyses in the last six simulated scenarios (those that included a spatially unrelated taxon). For the MCPP model, the estimates are the posterior means averaged over different datasets, the SD is computed by averaging the posterior standard deviation over different datasets, and the SD_EST is computed as the standard deviation of the estimates over the datasets. For the NSP model, the estimates are the outputs of the minimum contrast method, and the SE is calculated similarly by using these estimates. The SD for the NSP model is not computed, as the method does not provide an uncertainty measure. The last column ($\%$F) refers to the percentage of datasets in which the NSP model failed to converge for a given scenario. There is no corresponding $\%$F column for the MCPP because all models converged.

		True	MCPP			NSP
Scenario		Value	EST	SD	SD_EST	EST	SE	$\%$ F
	$\alpha_{2}$	1.50	1.53	0.10	0.10	1.46	0.33
	$\alpha_{3}$	1.00	1.02	0.08	0.09	3.31	20.25
7	$h_{2}$	0.01	0.01	$<0.01$	$<0.01$	0.01	$<0.01$	2
	$h_{3}$	0.02	0.02	$<0.01$	$<0.01$	0.04	0.09
	$\lambda^{C}_{1}$	150.00	161.06	12.91	12.20	171.35	34.72
	$\alpha_{2}$	1.50	1.48	0.11	0.09	198.46	283.69
	$\alpha_{3}$	1.00	1.02	0.08	0.09	0.98	0.28
8	$h_{2}$	0.10	0.08	0.01	0.01	10.30	28.72	36
	$h_{3}$	0.01	0.01	$<0.01$	$<0.01$	0.01	$<0.01$
	$\lambda^{C}_{1}$	150.00	160.25	12.86	12.57	939.70	2777.40
	$\alpha_{2}$	4.00	4.02	0.14	0.15	8.77	48.78
	$\alpha_{3}$	3.00	3.05	0.13	0.13	2.91	0.77
9	$h_{2}$	0.01	0.01	$<0.01$	$<0.01$	0.02	0.08	0
	$h_{3}$	0.02	0.02	$<0.01$	$<0.01$	0.02	$<0.01$
	$\lambda^{C}_{1}$	150.00	202.78	14.38	14.29	208.52	39.37
	$\alpha_{2}$	4.00	4.00	0.17	0.17	613.34	569.20
	$\alpha_{3}$	3.00	3.02	0.13	0.14	2.93	0.53
10	$h_{2}$	0.10	0.09	0.01	0.01	1.15	0.64	48
	$h_{3}$	0.01	0.01	$<0.01$	$<0.01$	0.01	$<0.01$
	$\lambda^{C}_{1}$	150.00	200.49	14.26	14.48	13.30	28.78
	$\alpha_{2}$	4.00	4.05	0.15	0.14	18.45	87.48
	$\alpha_{3}$	1.00	1.00	0.07	0.08	2.05	10.04
11	$h_{2}$	0.01	0.01	$<0.01$	$<0.01$	0.03	0.11	0
	$h_{3}$	0.02	0.02	$<0.01$	$<0.01$	0.47	4.41
	$\lambda^{C}_{1}$	150.00	201.50	14.37	14.09	203.36	53.30
	$\alpha_{2}$	4.00	4.02	0.17	0.17	547.61	553.61
	$\alpha_{3}$	1.00	1.02	0.07	0.07	0.97	0.24
12	$h_{2}$	0.10	0.09	0.01	0.01	1.04	0.82	70
	$h_{3}$	0.01	0.01	$<0.01$	$<0.01$	0.01	$<0.01$
	$\lambda^{C}_{1}$	150.00	199.05	14.23	15.95	26.62	41.46
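The MCPP summary columns of Table S.2 (EST, SD, SD_EST) can be computed from per-dataset posterior draws along the following lines. This is a sketch with toy numbers; the function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the caption does not state which variant was used.

```python
import statistics


def summarize_replicates(posterior_draws_per_dataset):
    """Summarize one parameter across simulated datasets.

    EST    : posterior means averaged over datasets
    SD     : posterior standard deviations averaged over datasets
    SD_EST : standard deviation of the posterior means across datasets
    """
    post_means = [statistics.fmean(draws) for draws in posterior_draws_per_dataset]
    post_sds = [statistics.stdev(draws) for draws in posterior_draws_per_dataset]
    return {
        "EST": statistics.fmean(post_means),
        "SD": statistics.fmean(post_sds),
        "SD_EST": statistics.stdev(post_means),
    }


# Toy posterior draws for two datasets (illustrative values only).
toy_draws = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(summarize_replicates(toy_draws))
```

When SD and SD_EST are close, as in most MCPP rows of the table, the posterior spread is a reasonable reflection of the sampling variability of the estimates.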

E Sensitivity analyses regarding choice of prior for the bandwidth parameters

As part of the simulation study described in Section 4, we also evaluated the MCPP method’s sensitivity to the choice of prior distribution for the bandwidth parameters. We considered four different prior distributions: 1) half-normal, 2) uniform, 3) log-normal with a flat tail and high variance, and 4) log-normal with a slim tail and higher peak. For the uniform prior, the lower and upper bounds were taken to be $0$ and $0.2$, respectively. Both log-normal priors had $\mu=\log 0.05$; the flat-tailed prior had $\sigma=1$, and the high-peaked prior had $\sigma=0.1$ as the hyperparameter. The hyperparameter setting for the half-normal prior was the same as in Section 4. We compared the performance of the MCPP under the different prior distributions in Scenarios 5 and 6 (Table 1): both scenarios considered mixed offspring density ($\alpha_{2}=4$ and $\alpha_{3}=1$); one had low bandwidth ($h_{2}=0.01$ and $h_{3}=0.02$), and the other had high bandwidth ($h_{2}=0.1$ and $h_{3}=0.01$).
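For reference, the four candidate priors can be written as log-density functions, as in the Python sketch below. The uniform bounds and the log-normal hyperparameters are those stated above; the half-normal scale is set in Section 4 and is not restated here, so it is left as an argument, and the value `0.1` in the usage lines is purely a placeholder. The function names are hypothetical.

```python
import math


def log_half_normal(h, scale):
    """Log density of a half-normal prior with the given scale."""
    if h <= 0.0:
        return -math.inf
    return 0.5 * math.log(2.0 / math.pi) - math.log(scale) - h * h / (2.0 * scale * scale)


def log_uniform(h, lower=0.0, upper=0.2):
    """Log density of a uniform prior on (lower, upper)."""
    if lower < h < upper:
        return -math.log(upper - lower)
    return -math.inf


def log_lognormal(h, mu, sigma):
    """Log density of a log-normal prior with log-scale mean mu and sd sigma."""
    if h <= 0.0:
        return -math.inf
    z = (math.log(h) - mu) / sigma
    return -math.log(h * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


mu = math.log(0.05)
for h in (0.01, 0.05, 0.1):
    print(
        h,
        log_half_normal(h, scale=0.1),     # placeholder scale, not the paper's value
        log_uniform(h),
        log_lognormal(h, mu, sigma=1.0),   # flat log-normal
        log_lognormal(h, mu, sigma=0.1),   # tight log-normal
    )
```

Evaluating the tight log-normal at $h=0.1$ or $h=0.01$ shows how strongly it penalizes bandwidths away from $0.05$, which helps explain why an informative prior centered away from the truth can bias the estimates.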

We report the mean absolute percentage bias for estimating the corresponding parameters in the two scenarios for the four different prior settings: i) Half-normal, ii) Uniform, iii) a flat Log-normal, and iv) a tight Log-normal. The half-normal prior-based MCPP model performed the best, and the performance was similar to that for the original model. When the true bandwidth was low, all the models—irrespective of prior choice—generally performed well and similarly to each other, with almost all biases $<8\%$ . Differences in performance emerged when the true bandwidth was high, where the analyses with tighter priors produced much less biased estimates ( $<10\%$ except in one instance) than the analyses with flatter priors (4-137%; Table S.3). However, using an informative log-normal prior backfired even for the low-bandwidth scenario when the offspring density was also low, as for the second offspring process ( $\sim$ 20-25%).

Table S.3: Results of the MCPP-based analysis of simulated data, comparing different priors for the bandwidth parameters. The true values for the offspring densities were $\alpha_{2}=4$ and $\alpha_{3}=1$. The true values for the bandwidth parameters were $h_{2}=0.01$ and $h_{3}=0.02$ under low bandwidth and $h_{2}=0.1$ and $h_{3}=0.01$ under high bandwidth. The parent process is denoted $\lambda^{C}$. Results are presented as mean absolute percentage bias of the estimated parameter values based on posterior means of each of the $100$ simulated datasets. There were no other taxa unrelated to these multi-layered arrangements.

Bandwidth	Parameter	Half-normal	Uniform	Log-normal (flat)	Log-normal (tight)
Low	$\alpha_{2}$	0.03	0.03	0.03	0.03
Low	$\alpha_{3}$	0.07	0.07	0.07	0.07
Low	$h_{2}$	0.02	0.02	0.02	0.05
Low	$h_{3}$	0.04	0.04	0.04	0.24
Low	$\lambda^{C}_{1}$	0.06	0.06	0.06	0.06
High	$\alpha_{2}$	0.03	0.31	0.52	0.03
High	$\alpha_{3}$	0.05	0.05	0.05	0.05
High	$h_{2}$	0.09	0.99	1.37	0.10
High	$h_{3}$	0.03	0.03	0.03	0.20
High	$\lambda^{C}_{1}$	0.06	0.06	0.06	0.06
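The bias measure reported in Table S.3 can be sketched as follows. The exact formula is not spelled out in the caption, so this sketch assumes that the absolute relative error of each dataset's posterior mean is averaged over datasets and reported as a proportion (so that 1.37 corresponds to 137%). The function name and the toy estimates are hypothetical.

```python
def mean_absolute_relative_bias(estimates, true_value):
    """Average of |estimate - truth| / |truth| over simulated datasets."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    total = sum(abs(est - true_value) / abs(true_value) for est in estimates)
    return total / len(estimates)


# Toy posterior means of a bandwidth whose true value is 0.1 (illustrative only).
toy_posterior_means = [0.09, 0.11, 0.12, 0.10]
print(mean_absolute_relative_bias(toy_posterior_means, true_value=0.1))
```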

F Additional Figures and Tables for the Analysis of Human Microbiome Biofilm Image Data

Here, we present a visual representation of the four quadrants of the whole dental plaque image used for the subset analyses (Figure S.2) and the abundances of the different taxa in the four quadrants (Table S.4). The estimated intensities for the five taxa (Neisseriaceae, Capnocytophaga, Actinomyces, Fusobacterium, Leptotrichia) that have no visible spatial relationship with the parent-offspring-type configurations also varied across the quadrants (Table S.5). K-functions for the whole and subsetted analyses (Figures S.3 and S.4 through S.7) also varied noticeably by quadrant. The DIC estimates for the different models explored in Section 5.2 (Table S.6) indicate that the models with Fusobacterium and Leptotrichia as an additional parent-offspring pair fit the data better than the original model, whereas the models with Streptococcus as an offspring of Fusobacterium do not fit the data well.

Table S.4: The abundance (counts) of bacterial taxa of interest in the human dental plaque sample image data and its four subdivided quadrants.

Taxon	Quadrant				Total
	I	II	III	IV
Actinomyces	119	280	154	223	776
Capnocytophaga	512	755	574	573	2414
Corynebacterium	58	219	186	245	708
Fusobacterium	92	250	141	173	656
Leptotrichia	191	411	234	339	1175
Neisseriaceae	339	479	402	491	1711
Pasteurellaceae	53	130	76	106	365
Porphyromonas	227	525	269	420	1441
Streptococcus	98	379	163	249	889

Table S.5: The posterior means of parameters associated with Neisseriaceae ($\lambda_{5}$), Capnocytophaga ($\lambda_{6}$), Actinomyces ($\lambda_{7}$), Fusobacterium ($\lambda_{8}$), and Leptotrichia ($\lambda_{9}$), obtained by applying the proposed MCPP method to the entire image and to each of the four quadrants of the dental plaque sample image. All results are rounded to two decimal places. The posterior standard deviations were all smaller than 0.01 and are not reported separately.

	$\lambda_{5}$	$\lambda_{6}$	$\lambda_{7}$	$\lambda_{8}$	$\lambda_{9}$
Full Image	0.04	0.06	0.02	0.02	0.03
Segment 1	0.04	0.06	0.01	0.01	0.02
Segment 2	0.05	0.07	0.03	0.02	0.04
Segment 3	0.04	0.05	0.01	0.01	0.02
Segment 4	0.05	0.06	0.02	0.02	0.03

Table S.6: The parent-offspring relationships explored in different models and their DIC. The abbreviations C, S, Po, Pa, F, and L are used for Corynebacterium, Streptococcus, Porphyromonas, Pasteurellaceae, Fusobacterium, and Leptotrichia. The $\rightarrow$ denotes a parent-offspring relationship, with the arrow directed from the parent to the offspring(s).

Identifier	Parent-Offspring Relations Present	DIC
1	$C\rightarrow SPo\vdots S\rightarrow Pa$	124985.4
2	$C\rightarrow Po\vdots S\rightarrow Pa\vdots F\rightarrow S$	125531.1
3	$C\rightarrow SPo\vdots F\rightarrow S\vdots S\rightarrow Pa$	134007.1
4	$C\rightarrow SPo\vdots S\rightarrow Pa\vdots F\rightarrow L$	124317.0
5	$C\rightarrow SPo\vdots S\rightarrow Pa\vdots L\rightarrow F$	124303.5
6	$C\rightarrow SPo\vdots S\rightarrow Pa\vdots F\rightarrow LS$	133357.2
7	$C\rightarrow SPo\vdots S\rightarrow Pa\vdots L\rightarrow F\vdots F\rightarrow S$	133345.2
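To make the comparison in Table S.6 easier to read, the short Python sketch below ranks the models by DIC and reports each difference from the original model (identifier 1). The DIC values are copied from the table; the variable names are hypothetical.

```python
# DIC values from Table S.6, keyed by model identifier.
dic = {
    1: 124985.4,
    2: 125531.1,
    3: 134007.1,
    4: 124317.0,
    5: 124303.5,
    6: 133357.2,
    7: 133345.2,
}

baseline = dic[1]  # original model
ranking = sorted(dic, key=dic.get)  # smaller DIC indicates a better fit
delta = {model_id: dic[model_id] - baseline for model_id in dic}

for model_id in ranking:
    print(f"model {model_id}: DIC = {dic[model_id]:.1f}, difference = {delta[model_id]:+.1f}")
```

Models 4 and 5, which add a Fusobacterium-Leptotrichia pair, are the only ones with a negative difference from the original model.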

We additionally present two images (Figures S.8 and S.9) to make the case for testing the different models as described in Sections 5.2.1 and 5.2.2 of the main manuscript.

G Model Formulation for Given Parent-Offspring Relationships

In this section, we illustrate the formulation of the MCPP model for a given parent-offspring configuration, as described in Section 3.1 of the main manuscript. Using the notation introduced in Section 3.5, we assume data containing four taxa $A,B,C,$ and $D$ ( $m=4$ ). We detail the construction of the likelihood function according to equations (1)-(3) for the following four different configurations:

a) $A\rightarrow B\vdots C\rightarrow D$
b) $A\rightarrow BC$
c) $A\rightarrow B\vdots B\rightarrow C$
d) $A\rightarrow BC\vdots C\rightarrow D$
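The bookkeeping of $p$, $q$, and $m-p-q$ for these four configurations can be sketched in Python. The representation below, a dictionary mapping each offspring taxon to its parent taxon, is a hypothetical encoding chosen for illustration. Following the convention used in this section, a taxon that is both an offspring and a parent is counted as an offspring only.

```python
def classify_taxa(taxa, parent_of):
    """Return (p, q, m - p - q) for a parent-offspring configuration.

    `parent_of` maps each offspring taxon to its parent taxon.
    """
    offspring = set(parent_of)
    pure_parents = set(parent_of.values()) - offspring  # parents that are not offspring
    unrelated = set(taxa) - offspring - pure_parents
    return len(pure_parents), len(offspring), len(unrelated)


taxa = ["A", "B", "C", "D"]
configurations = {
    "a": {"B": "A", "D": "C"},            # A -> B ; C -> D
    "b": {"B": "A", "C": "A"},            # A -> BC
    "c": {"B": "A", "C": "B"},            # A -> B ; B -> C
    "d": {"B": "A", "C": "A", "D": "C"},  # A -> BC ; C -> D
}

for label, parent_of in configurations.items():
    p, q, unrelated = classify_taxa(taxa, parent_of)
    print(label, p, q, unrelated)
```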

G.1 Modeling $A\rightarrow B\vdots C\rightarrow D$

The model represents a configuration involving two parent processes (taxa $A$ and $C$, $p=2$) and two offspring processes (taxa $B$ and $D$, $q=2$). That is, all taxa are interrelated within the parent-offspring framework, with no taxa existing outside of these relationships ($m-p-q=0$). Therefore, following equations (1)-(3), the corresponding model formulation can be written as follows:

$$\begin{aligned}
\lambda_{A}(\mathbf{s})&=\lambda_{A},\\
\lambda_{C}(\mathbf{s})&=\lambda_{C},\\
\lambda_{B}(\mathbf{s})&=\alpha_{B}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{s}-\mathbf{c},h_{B}),\\
\lambda_{D}(\mathbf{s})&=\alpha_{D}\sum_{\mathbf{c}\in C}k_{D}(\mathbf{s}-\mathbf{c},h_{D}).
\end{aligned}$$

It follows that

$$\begin{aligned}
l(Y\mid\boldsymbol{\theta})&\propto|\mathcal{W}|-|\mathcal{W}|\lambda_{A}-|\mathcal{W}|\lambda_{C}\\
&\quad-\alpha_{B}\sum_{\mathbf{c}\in A}\int_{\mathcal{W}}k_{B}(\mathbf{u}-\mathbf{c},h_{B})\,d\mathbf{u}-\alpha_{D}\sum_{\mathbf{c}\in C}\int_{\mathcal{W}}k_{D}(\mathbf{u}-\mathbf{c},h_{D})\,d\mathbf{u}\\
&\quad+n_{A}\log\lambda_{A}+n_{C}\log\lambda_{C}\\
&\quad+\sum_{\mathbf{s}\in B}\log\left(\alpha_{B}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{s}-\mathbf{c},h_{B})\right)+\sum_{\mathbf{s}\in D}\log\left(\alpha_{D}\sum_{\mathbf{c}\in C}k_{D}(\mathbf{s}-\mathbf{c},h_{D})\right).
\end{aligned}$$

G.2 Modeling $A\rightarrow BC$

In this model, we investigate the configuration where two offspring processes (taxa $B$ and $C$ , $q=2$ ) share the same parent process (taxon $A$ , $p=1$ ). Additionally, the model includes a separate process for taxon $D$ , which remains uninvolved in any parent-offspring relationship ( $m-p-q=1$ ). The model formulation can be written as

$$\begin{aligned}
\lambda_{A}(\mathbf{s})&=\lambda_{A},\\
\lambda_{B}(\mathbf{s})&=\alpha_{B}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{s}-\mathbf{c},h_{B}),\\
\lambda_{C}(\mathbf{s})&=\alpha_{C}\sum_{\mathbf{c}\in A}k_{C}(\mathbf{s}-\mathbf{c},h_{C}),\\
\lambda_{D}(\mathbf{s})&=\lambda_{D}.
\end{aligned}$$

It follows that

$$\begin{aligned}
l(Y\mid\boldsymbol{\theta})&\propto|\mathcal{W}|-|\mathcal{W}|\lambda_{A}-|\mathcal{W}|\lambda_{D}\\
&\quad-\alpha_{B}\sum_{\mathbf{c}\in A}\int_{\mathcal{W}}k_{B}(\mathbf{u}-\mathbf{c},h_{B})\,d\mathbf{u}-\alpha_{C}\sum_{\mathbf{c}\in A}\int_{\mathcal{W}}k_{C}(\mathbf{u}-\mathbf{c},h_{C})\,d\mathbf{u}\\
&\quad+n_{A}\log\lambda_{A}+n_{D}\log\lambda_{D}\\
&\quad+\sum_{\mathbf{s}\in B}\log\left(\alpha_{B}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{s}-\mathbf{c},h_{B})\right)+\sum_{\mathbf{s}\in C}\log\left(\alpha_{C}\sum_{\mathbf{c}\in A}k_{C}(\mathbf{s}-\mathbf{c},h_{C})\right).
\end{aligned}$$

G.3 Modeling $A\rightarrow B\vdots B\rightarrow C$

This scenario presents a more complex relationship where taxon $B$ behaves both as an offspring to taxon $A$ and as a parent to taxon $C$ . In this setup, there is only one process serving as a parent (taxon $A$ , $p=1$ ), two processes functioning as offspring (taxa $B$ and $C$ , $q=2$ ), and a separate process for taxon $D$ , which is uninvolved in any parent-offspring relationship ( $m-p-q=1$ ). It is worth noting that $p=1$ as the process for taxon $B$ is counted as an offspring process, while the locations of taxon $B$ will be used for modeling the process for taxon $C$ . Therefore, the MCPP under the configuration can be written as

$$\begin{aligned}
\lambda_{A}(\mathbf{s})&=\lambda_{A},\\
\lambda_{B}(\mathbf{s})&=\alpha_{B}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{s}-\mathbf{c},h_{B}),\\
\lambda_{C}(\mathbf{s})&=\alpha_{C}\sum_{\mathbf{c}\in B}k_{C}(\mathbf{s}-\mathbf{c},h_{C}),\\
\lambda_{D}(\mathbf{s})&=\lambda_{D}.
\end{aligned}$$

It follows that

$$\begin{aligned}
l(Y\mid\boldsymbol{\theta})&\propto|\mathcal{W}|-|\mathcal{W}|\lambda_{A}-|\mathcal{W}|\lambda_{D}\\
&\quad-\alpha_{B}\int_{\mathcal{W}}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{u}-\mathbf{c},h_{B})\,d\mathbf{u}-\alpha_{C}\int_{\mathcal{W}}\sum_{\mathbf{c}\in B}k_{C}(\mathbf{u}-\mathbf{c},h_{C})\,d\mathbf{u}\\
&\quad+n_{A}\log\lambda_{A}+n_{D}\log\lambda_{D}\\
&\quad+\sum_{\mathbf{s}\in B}\log\left(\alpha_{B}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{s}-\mathbf{c},h_{B})\right)+\sum_{\mathbf{s}\in C}\log\left(\alpha_{C}\sum_{\mathbf{c}\in B}k_{C}(\mathbf{s}-\mathbf{c},h_{C})\right).
\end{aligned}$$

G.4 Modeling $A\rightarrow BC\vdots C\rightarrow D$

In this model, we explore a more complex scenario where the two processes for taxa $B$ and $C$ share the same parent process (for taxon $A$ ). Furthermore, the process for taxon $C$ not only functions as an offspring process with respect to that of taxon $A$ but also acts as a parent process for taxon $D$ . This configuration involves one process serving exclusively as a parent (taxon $A$ , $p=1$ ) and three processes serving as offspring processes (taxa $B,C$ and $D$ , $q=3$ ). Consequently, no taxon remains uninvolved in parent-offspring relationships ( $m-p-q=0$ ). As in Section G.3, it is important to note that $p=1$ as the process for taxon $C$ is counted as an offspring process, despite its role as a parent process for taxon $D$ . The corresponding model formulation can be written as follows:

$$\begin{aligned}
\lambda_{A}(\mathbf{s})&=\lambda_{A},\\
\lambda_{B}(\mathbf{s})&=\alpha_{B}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{s}-\mathbf{c},h_{B}),\\
\lambda_{C}(\mathbf{s})&=\alpha_{C}\sum_{\mathbf{c}\in A}k_{C}(\mathbf{s}-\mathbf{c},h_{C}),\\
\lambda_{D}(\mathbf{s})&=\alpha_{D}\sum_{\mathbf{c}\in C}k_{D}(\mathbf{s}-\mathbf{c},h_{D}).
\end{aligned}$$

It follows that

$$\begin{aligned}
l(Y\mid\boldsymbol{\theta})&\propto|\mathcal{W}|-|\mathcal{W}|\lambda_{A}\\
&\quad-\alpha_{B}\int_{\mathcal{W}}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{u}-\mathbf{c},h_{B})\,d\mathbf{u}-\alpha_{C}\int_{\mathcal{W}}\sum_{\mathbf{c}\in A}k_{C}(\mathbf{u}-\mathbf{c},h_{C})\,d\mathbf{u}\\
&\quad-\alpha_{D}\int_{\mathcal{W}}\sum_{\mathbf{c}\in C}k_{D}(\mathbf{u}-\mathbf{c},h_{D})\,d\mathbf{u}+n_{A}\log\lambda_{A}+\sum_{\mathbf{s}\in B}\log\left(\alpha_{B}\sum_{\mathbf{c}\in A}k_{B}(\mathbf{s}-\mathbf{c},h_{B})\right)\\
&\quad+\sum_{\mathbf{s}\in C}\log\left(\alpha_{C}\sum_{\mathbf{c}\in A}k_{C}(\mathbf{s}-\mathbf{c},h_{C})\right)+\sum_{\mathbf{s}\in D}\log\left(\alpha_{D}\sum_{\mathbf{c}\in C}k_{D}(\mathbf{s}-\mathbf{c},h_{D})\right).
\end{aligned}$$
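The four log-likelihoods in this section share one pattern, which the Python sketch below makes explicit: each taxon contributes either a homogeneous Poisson term or an offspring term driven by the locations of its parent taxon. This is a minimal sketch assuming an isotropic Gaussian kernel and a unit-square window; the additive constant $|\mathcal{W}|$ is dropped, and the function names, dictionary encoding, and toy data are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kernel_mass(parent, h):
    """Mass of a Gaussian kernel centred at `parent` inside the unit square."""
    cx, cy = parent
    mass_x = normal_cdf((1.0 - cx) / h) - normal_cdf(-cx / h)
    mass_y = normal_cdf((1.0 - cy) / h) - normal_cdf(-cy / h)
    return mass_x * mass_y


def kernel_density(dx, dy, h):
    """Isotropic bivariate Gaussian density at displacement (dx, dy)."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h * h)) / (2.0 * math.pi * h * h)


def mcpp_log_likelihood(points, parent_of, lam, alpha, h, window_area=1.0):
    """Log-likelihood (up to a constant) for a given parent-offspring configuration.

    points    : taxon -> list of (x, y) locations
    parent_of : offspring taxon -> parent taxon
    lam       : intensity for every taxon that is not an offspring
    alpha, h  : offspring density and bandwidth for every offspring taxon
    """
    total = 0.0
    for taxon, locations in points.items():
        if taxon in parent_of:
            parents = points[parent_of[taxon]]
            # Integrated intensity: alpha times the kernel mass inside the window.
            total -= alpha[taxon] * sum(kernel_mass(c, h[taxon]) for c in parents)
            for sx, sy in locations:
                intensity = alpha[taxon] * sum(
                    kernel_density(sx - cx, sy - cy, h[taxon]) for cx, cy in parents
                )
                if intensity <= 0.0:
                    return -math.inf
                total += math.log(intensity)
        else:
            # Homogeneous Poisson term: n log(lambda) - |W| lambda.
            total += len(locations) * math.log(lam[taxon]) - window_area * lam[taxon]
    return total


# Configuration c): A -> B and B -> C, with D unrelated (toy data).
points = {"A": [(0.5, 0.5)], "B": [(0.5, 0.5)], "C": [], "D": [(0.2, 0.2)]}
parent_of = {"B": "A", "C": "B"}
value = mcpp_log_likelihood(
    points, parent_of,
    lam={"A": 1.0, "D": 1.0},
    alpha={"B": 1.0, "C": 1.0},
    h={"B": 0.1, "C": 0.1},
)
print(value)
```

Switching between configurations a) through d) only requires changing `parent_of` and the corresponding parameter dictionaries; the likelihood code itself is unchanged.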
