# CoRpower’s Algorithms for Simulating Placebo Group and Baseline Immunogenicity Predictor Data

## Introduction

The CoRpower package assumes that $$P(Y^{\tau}(1)=Y^{\tau}(0))=1$$ for the biomarker sampling timepoint $$\tau$$. Under this assumption, the correlate of risk (CoR) parameter $$P(Y=1 \mid S=s_1, Z=1, Y^{\tau}=0)$$ equals $$P(Y=1 \mid S=s_1, Z=1, Y^{\tau}(1)=Y^{\tau}(0)=0)$$, which links the CoR and biomarker-specific treatment efficacy (TE) parameters. Estimation of the latter requires outcome data in placebo recipients, and some estimation methods additionally require a baseline immunogenicity predictor (BIP) of $$S(1)$$, the biomarker response at $$\tau$$ under assignment to treatment. To link power calculations for detecting a CoR and a correlate of TE (coTE), CoRpower can export the simulated data sets used in its calculations, extended to include placebo-group and BIP data for harmonized use by methods assessing biomarker-specific TE. This vignette describes CoRpower’s algorithms, and their underlying assumptions, for simulating placebo-group and BIP data. The exported data sets contain full rectangular data so that the user can consider various biomarker sub-sampling designs, e.g., different biomarker case:control sampling ratios, or case-control vs. case-cohort designs.

## Algorithms for Simulating Placebo Group Data

### Trichotomous $$\, X$$ and $$\, S(1)$$ Using Approach 1

1. Specify $$P^{lat}_0$$, $$P^{lat}_2$$, $$P_0$$, $$P_2$$, $$risk_0$$, $$n_{cases, 0}$$, $$n_{controls, 0}$$, $$K$$
• $$N_{complete, 0} = n_{cases, 0} + n_{controls, 0}$$
2. Specify $$Sens$$, $$Spec$$, $$FP^0$$, and $$FN^2$$
3. Number of observations in each latent subgroup: $$N_x = N_{complete, 0} P^{lat}_x$$
4. Simulate $$X$$ under the assumption of homogeneous risk in the placebo group:
• Cases: $$\left(n_{cases, 0}(0),n_{cases,0}(1),n_{cases,0}(2)\right) \sim \mathsf{Mult}(n_{cases,0},(p_0,p_1,p_2))$$, where \begin{align*} p_x=P(X=x|Y=1,Y^{\tau}=0,Z=0) &= P(X=x|Y(0)=1)\\ &= \frac{P(Y(0)=1|X=x)P(X=x)}{P(Y(0)=1)}\\ &= \frac{risk^{lat}_0(x)P^{lat}_{x}}{risk_0}\\ &= P^{lat}_{x} \quad \text{because } risk^{lat}_0(x)=risk_0 \end{align*}
• Controls: $$\left(n_{controls,0}(0),n_{controls,0}(1),n_{controls,0}(2)\right) \sim \mathsf{Mult}(n_{controls,0},(p_0,p_1,p_2))$$, where \begin{align*} p_x=P(X=x|Y=0,Y^{\tau}=0,Z=0) &= P(X=x|Y(0)=0)\\ &= \frac{P(Y(0)=0|X=x)P(X=x)}{P(Y(0)=0)}\\ &= \frac{(1-risk^{lat}_0(x))P^{lat}_{x}}{(1-risk_0)}\\ &= P^{lat}_{x} \quad \text{because } risk^{lat}_0(x)=risk_0 \end{align*}
• $$n_{controls,0}(x) = N_x - n_{cases,0}(x)$$
5. Simulate $$Y$$: Vector with $$n_{cases,0}(0)$$ 1’s, followed by $$n_{controls,0}(0)$$ 0’s, followed by $$n_{cases,0}(1)$$ 1’s, etc.
6. Simulate $$S(1)$$: For each of the $$N_x$$ subjects, generate $$S(1)$$ by a draw from $$\mathsf{Mult}(1,(p_0,p_1,p_2))$$, where $$p_k=P(S(1)=k|X=x)$$ is given by $$Sens, Spec$$, etc.
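
The steps above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not CoRpower's internal code: all object names and input values are hypothetical, and $$FN^1$$ and $$FP^1$$ are assumed to follow from the marginal constraints $$P_0 = Spec\,P^{lat}_0 + FN^1 P^{lat}_1 + FN^2 P^{lat}_2$$ and $$P_2 = Sens\,P^{lat}_2 + FP^1 P^{lat}_1 + FP^0 P^{lat}_0$$. Case and control counts per latent subgroup are both drawn from the multinomial distributions in Step 4.

```r
set.seed(1)

# Hypothetical inputs (illustrative values, not package defaults)
Plat0 <- 0.2; Plat2 <- 0.6; Plat1 <- 1 - Plat0 - Plat2
P0 <- 0.2; P2 <- 0.6
nCases0 <- 50; nControls0 <- 950
sens <- 0.8; spec <- 0.8; FP0 <- 0.05; FN2 <- 0.05

# Assumption: FN1 and FP1 are implied by the marginal constraints on P0 and P2
FN1 <- (P0 - spec * Plat0 - FN2 * Plat2) / Plat1
FP1 <- (P2 - sens * Plat2 - FP0 * Plat0) / Plat1

# Step 4: under homogeneous placebo risk, X has the same distribution
# (Plat0, Plat1, Plat2) in cases and in controls
platVec <- c(Plat0, Plat1, Plat2)
nCasesX <- as.vector(rmultinom(1, nCases0, platVec))
nControlsX <- as.vector(rmultinom(1, nControls0, platVec))

# Step 5: within each latent subgroup, cases (1's) precede controls (0's)
X <- rep(0:2, times = nCasesX + nControlsX)
Y <- unlist(lapply(1:3, function(i) {
  rep(c(1, 0), times = c(nCasesX[i], nControlsX[i]))
}))

# Step 6: row x + 1 holds P(S(1) = 0, 1, 2 | X = x)
classMat <- rbind(c(spec, 1 - spec - FP0, FP0),
                  c(FN1,  1 - FN1 - FP1,  FP1),
                  c(FN2,  1 - FN2 - sens, sens))
S1 <- vapply(X, function(x) sample(0:2, size = 1, prob = classMat[x + 1, ]),
             integer(1))

placebo <- data.frame(Y = Y, X = X, S1 = S1)
```

The rows of `classMat` sum to 1 by construction, so a quick check such as `rowSums(classMat)` can catch input combinations that imply a negative $$FN^1$$ or $$FP^1$$.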

### Trichotomous $$\, X$$ and $$\, S(1)$$ Using Approach 2

1. Specify $$P^{lat}_0$$, $$P^{lat}_2$$, $$P_0$$, $$P_2$$, $$risk_0$$, $$N_{complete,0}$$, $$n_{cases,0}$$, $$n^S_{cases}$$, $$K$$
2. Specify $$\rho$$ and $$\sigma^2_{obs}$$
3. Calculation of $$(Sens, Spec, FP^0, FP^1, FN^1, FN^2)$$:
   i. Assuming the classical measurement error model, where $$X^{\ast} \sim \mathsf{N}(0,\sigma^2_{tr})$$ with $$\sigma^2_{tr} = \rho\sigma^2_{obs}$$, solve $P^{lat}_0 = P(X^{\ast} \leq \theta_0) \quad \textrm{and} \quad P^{lat}_2 = P(X^{\ast} > \theta_2)$ for $$\theta_0$$ and $$\theta_2$$.
   ii. Generate $$B$$ realizations of $$X^{\ast}$$ and $$S^{\ast} = X^{\ast} + e$$, where $$e \sim \mathsf{N}(0,\sigma^2_{e})$$ with $$\sigma^2_{e} = (1-\rho)\sigma^2_{obs}$$, and $$X^{\ast}$$ is independent of $$e$$ ($$B = 20,000$$ by default).
   iii. Using $$\theta_0$$ and $$\theta_2$$ from Step i., define \begin{align*} Spec(\phi_0) &= P(S^{\ast} \leq \phi_0 \mid X^{\ast} \leq \theta_0)\\ FN^1(\phi_0) &= P(S^{\ast} \leq \phi_0 \mid X^{\ast} \in (\theta_0,\theta_2])\\ FN^2(\phi_0) &= P(S^{\ast} \leq \phi_0 \mid X^{\ast} > \theta_2)\\ Sens(\phi_2) &= P(S^{\ast} > \phi_2 \mid X^{\ast} > \theta_2)\\ FP^1(\phi_2) &= P(S^{\ast} > \phi_2 \mid X^{\ast} \in (\theta_0,\theta_2])\\ FP^0(\phi_2) &= P(S^{\ast} > \phi_2 \mid X^{\ast} \leq \theta_0) \end{align*}
      Estimate each quantity by its empirical counterpart among the $$B$$ realizations, e.g., estimate $$Spec(\phi_0)$$ by $\widehat{Spec}(\phi_0) = \frac{\#\{S^{\ast}_b \leq \phi_0, X^{\ast}_b \leq \theta_0\}}{\#\{X^{\ast}_b \leq \theta_0\}}\,$ etc.
   iv. Find $$\phi_0 = \phi^{\ast}_0$$ and $$\phi_2 = \phi^{\ast}_2$$ that numerically solve \begin{align*} P_0 &= \widehat{Spec}(\phi_0)P^{lat}_0 + \widehat{FN}^1(\phi_0)P^{lat}_1 + \widehat{FN}^2(\phi_0)P^{lat}_2\\ P_2 &= \widehat{Sens}(\phi_2)P^{lat}_2 + \widehat{FP}^1(\phi_2)P^{lat}_1 + \widehat{FP}^0(\phi_2)P^{lat}_0 \end{align*} and compute $Spec = \widehat{Spec}(\phi^{\ast}_0),\; Sens = \widehat{Sens}(\phi^{\ast}_2),\; \textrm{etc.}$
4. Follow Steps 3–6 under Approach 1
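
The calculation in Step 3 can be sketched in R as below. This is an illustrative sketch under assumptions, not the package's implementation: the object names and input values are hypothetical, $$\sigma^2_{tr} = \rho\sigma^2_{obs}$$ and $$\sigma^2_e = (1-\rho)\sigma^2_{obs}$$ are assumed, and `uniroot()` stands in for whatever numerical solver is actually used.

```r
set.seed(2)

# Hypothetical inputs (illustrative values)
Plat0 <- 0.2; Plat2 <- 0.6; Plat1 <- 1 - Plat0 - Plat2
P0 <- 0.2; P2 <- 0.6
rho <- 0.9; sigma2obs <- 1
sigma2tr <- rho * sigma2obs        # assumed variance of X*
sigma2e <- (1 - rho) * sigma2obs   # assumed variance of the error e
B <- 20000
platVec <- c(Plat0, Plat1, Plat2)

# Thresholds of X* that reproduce the latent subgroup probabilities
theta0 <- qnorm(Plat0, mean = 0, sd = sqrt(sigma2tr))
theta2 <- qnorm(1 - Plat2, mean = 0, sd = sqrt(sigma2tr))

# B realizations of X* and S* = X* + e
Xstar <- rnorm(B, mean = 0, sd = sqrt(sigma2tr))
Sstar <- Xstar + rnorm(B, mean = 0, sd = sqrt(sigma2e))

# Latent category 1, 2, 3 for (-Inf, theta0], (theta0, theta2], (theta2, Inf)
lat <- cut(Xstar, breaks = c(-Inf, theta0, theta2, Inf), labels = FALSE)

# Empirical P(S* <= phi | latent category) and P(S* > phi | latent category)
pLow <- function(phi) tapply(Sstar <= phi, lat, mean)
pHigh <- function(phi) tapply(Sstar > phi, lat, mean)

# Solve the two marginal equations for phi0 and phi2
searchInt <- range(Sstar) + c(-1, 1)
phi0 <- uniroot(function(phi) sum(pLow(phi) * platVec) - P0, searchInt)$root
phi2 <- uniroot(function(phi) sum(pHigh(phi) * platVec) - P2, searchInt)$root

spec <- pLow(phi0)[[1]];  FN1 <- pLow(phi0)[[2]];  FN2 <- pLow(phi0)[[3]]
FP0 <- pHigh(phi2)[[1]];  FP1 <- pHigh(phi2)[[2]]; sens <- pHigh(phi2)[[3]]
```

Because the empirical probabilities are step functions of $$\phi$$, the equations are solved only approximately; the residual shrinks as $$B$$ grows.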

### Continuous $$\, X^*$$ and $$\, S^*(1)$$

1. Specify $$P^{lat}_{lowestVE}$$, $$\rho$$, $$\sigma^2_{obs}$$, $$VE_{lowest}$$, $$risk_0$$, $$n_{cases,0}$$, $$n_{controls, 0}$$, $$n^S_{cases}$$, $$K$$
• $$N_{complete, 0} = n_{cases, 0} + n_{controls, 0}$$
2. Simulate $$Y$$ by creating a vector with $$n_{cases,0}$$ 1’s followed by $$n_{controls,0}$$ 0’s.
3. Simulate $$X^*$$ under the assumption of homogeneous risk in the placebo group:
• Cases: from a grid of values ranging from -3 to 3, sample $$n_{cases,0}$$ with replacement from: \begin{align*} f_{X^{\ast}}(x^{\ast}|Y=1,Y^{\tau}=0,Z=0) &= f_{X^{\ast}}(x^{\ast}|Y(0)=1)\\ &= \frac{P(Y(0)=1|X^*=x^*)f_{X^{\ast}}(x^{\ast})}{P(Y(0)=1)}\\ &= \frac{risk^{lat}_0(x^*)f_{X^{\ast}}(x^{\ast})}{risk_0}\\ &= f_{X^{\ast}}(x^{\ast}) \quad \text{because } risk^{lat}_0(x^*)=risk_0 \end{align*}
• Controls: from a grid of values ranging from -3 to 3, sample $$n_{controls,0}$$ with replacement from: \begin{align*} f_{X^{\ast}}(x^{\ast}|Y=0,Y^{\tau}=0,Z=0) &= f_{X^{\ast}}(x^{\ast}|Y(0)=0)\\ &= \frac{P(Y(0)=0|X^*=x^*)f_{X^{\ast}}(x^{\ast})}{P(Y(0)=0)}\\ &= \frac{(1-risk^{lat}_0(x^*))f_{X^{\ast}}(x^{\ast})}{1-risk_0}\\ &= f_{X^{\ast}}(x^{\ast}) \quad \text{because } risk^{lat}_0(x^*)=risk_0 \end{align*}
• $$f_{X^{\ast}}(x^{\ast})$$ is fully specified because $$X^* \sim N(0, \sigma^2_{tr})$$, where $$\sigma^2_{tr} = \rho\sigma^2_{obs}$$
4. Simulate $$S^*(1)$$: $$S^*(1)=X^*+\epsilon$$, where $$\epsilon \sim N(0, \sigma^2_e)$$ and $$\sigma_e^2=(1-\rho)\sigma^2_{obs}$$. The error $$\epsilon$$ is independent of $$X^*$$ and is simulated by `rnorm(Ncomplete, mean=0, sd=sqrt(sigma2e))`.
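
A minimal R sketch of Steps 2 to 4 follows. The object names, input values, and grid resolution (`by = 0.01`) are assumptions made for illustration, as is $$\sigma^2_{tr} = \rho\sigma^2_{obs}$$. Whether the package's grid is expressed in raw units or in standard deviations of $$X^*$$ is not stated above, so the grid here is taken literally as $$[-3, 3]$$.

```r
set.seed(3)

# Hypothetical inputs (illustrative values)
rho <- 0.9; sigma2obs <- 1
sigma2tr <- rho * sigma2obs        # assumed variance of X*
sigma2e <- (1 - rho) * sigma2obs   # variance of the error epsilon
nCases0 <- 50; nControls0 <- 950
Ncomplete <- nCases0 + nControls0

# Step 2: cases (1's) followed by controls (0's)
Y <- rep(c(1, 0), times = c(nCases0, nControls0))

# Step 3: under homogeneous placebo risk, cases and controls share the
# marginal density of X*, so both are sampled with the same grid weights
grid <- seq(-3, 3, by = 0.01)
w <- dnorm(grid, mean = 0, sd = sqrt(sigma2tr))
Xstar <- c(sample(grid, nCases0, replace = TRUE, prob = w),
           sample(grid, nControls0, replace = TRUE, prob = w))

# Step 4: add independent measurement error
Sstar <- Xstar + rnorm(Ncomplete, mean = 0, sd = sqrt(sigma2e))

placebo <- data.frame(Y = Y, Xstar = Xstar, Sstar = Sstar)
```

`sample()` normalizes the `prob` weights internally, so the density values do not need to be rescaled to sum to 1.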

## Algorithms for Simulating a Baseline Immunogenicity Predictor (BIP)

### Trichotomous $$\, X, S(1),$$ and $$\, BIP$$ Using Approach 1

1. The user specifies a classification rule defined by $$P(BIP=i \mid S(1)=j)$$, $$i,j=0,1,2$$.
2. For a subject with biomarker measurement $$S_k(1)$$, generate $$BIP_k$$ by a draw from $$\mathsf{Mult}(1, (q_0, q_1, q_2))$$, where $$q_i=P(BIP_k=i \mid S(1)=S_k(1))$$, $$i=0,1,2$$.
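
As an illustration of these two steps, the sketch below draws $$BIP$$ from a classification rule stored as a matrix. The rule's values, the object names, and the stand-in vector `S1` are all hypothetical; in practice `S1` would be the simulated $$S(1)$$.

```r
set.seed(4)

# Hypothetical rule: row j + 1 holds P(BIP = 0, 1, 2 | S(1) = j)
bipRule <- rbind(c(0.80, 0.15, 0.05),
                 c(0.10, 0.80, 0.10),
                 c(0.05, 0.15, 0.80))

# Stand-in for the simulated trichotomous biomarker S(1)
S1 <- sample(0:2, size = 200, replace = TRUE, prob = c(0.2, 0.2, 0.6))

# One multinomial draw per subject, using the row matching that subject's S(1)
BIP <- vapply(S1, function(s) sample(0:2, size = 1, prob = bipRule[s + 1, ]),
              integer(1))

# Cross-tabulation to inspect how closely BIP tracks S(1)
table(S1, BIP)
```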

### Trichotomous $$\, X, S(1),$$ and $$\, BIP$$ Using Approach 2

Note: All variables with * are continuous.

1. The user specifies $$\mathop{\mathrm{corr}}(BIP^*, S^*(1))$$.
2. Assuming that $$BIP^*$$ follows an additive measurement error model, i.e., $$BIP^* := S^*(1) + \delta$$, where $$\delta \sim N(0, \sigma^2_{\delta})$$ with an unknown $$\sigma^2_{\delta}$$, and $$\delta, \epsilon$$, and $$X^*$$ are independent, solve the following equation for $$\mathop{\mathrm{var}}\delta = \sigma^2_{\delta}$$: $\mathop{\mathrm{corr}}(BIP^*, S^*(1)) = \sqrt{\frac{\mathop{\mathrm{var}}X^* + \mathop{\mathrm{var}}\epsilon}{\mathop{\mathrm{var}}X^* + \mathop{\mathrm{var}}\epsilon + \mathop{\mathrm{var}}\delta}}$
3. For the fixed $$\phi^{\ast}_0$$ and $$\phi^{\ast}_2$$ derived above, define \begin{align*} Spec_{BIP}(\xi_0) &= P(BIP^{\ast} \leq \xi_0 \mid S^{\ast} \leq \phi^{\ast}_0)\\ FN^1_{BIP}(\xi_0) &= P(BIP^{\ast} \leq \xi_0 \mid S^{\ast} \in (\phi^{\ast}_0,\phi^{\ast}_2])\\ FN^2_{BIP}(\xi_0) &= P(BIP^{\ast} \leq \xi_0 \mid S^{\ast} > \phi^{\ast}_2)\\ Sens_{BIP}(\xi_2) &= P(BIP^{\ast} > \xi_2 \mid S^{\ast} > \phi^{\ast}_2)\\ FP^1_{BIP}(\xi_2) &= P(BIP^{\ast} > \xi_2 \mid S^{\ast} \in (\phi^{\ast}_0,\phi^{\ast}_2])\\ FP^0_{BIP}(\xi_2) &= P(BIP^{\ast} > \xi_2 \mid S^{\ast} \leq \phi^{\ast}_0) \end{align*}
4. Using the same technique as in the derivation of $$\phi^{\ast}_0$$ and $$\phi^{\ast}_2$$ above, find $$\xi_0=\xi^{\ast}_0$$ and $$\xi_2=\xi^{\ast}_2$$ that numerically solve \begin{align*} P_0 &= \widehat{Spec}_{BIP}(\xi_0)P_0 + \widehat{FN}_{BIP}^1(\xi_0)P_1 + \widehat{FN}_{BIP}^2(\xi_0)P_2\\ P_2 &= \widehat{Sens}_{BIP}(\xi_2)P_2 + \widehat{FP}_{BIP}^1(\xi_2)P_1 + \widehat{FP}_{BIP}^0(\xi_2)P_0 \end{align*} and compute $Spec_{BIP} = \widehat{Spec}_{BIP}(\xi^{\ast}_0),\; Sens_{BIP} = \widehat{Sens}_{BIP}(\xi^{\ast}_2),\; \textrm{etc.}$
5. For a subject with biomarker measurement $$S_k(1)$$, generate $$BIP_k$$ by a draw from $$\mathsf{Mult}(1, (q_0, q_1, q_2))$$, where $$q_i$$, $$i=0,1,2$$, are determined by $$Sens_{BIP}$$, $$Spec_{BIP}$$, etc. obtained in Step 4.
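
Steps 1 to 4 can be sketched in R as follows. The sketch is illustrative only and rests on several assumptions: object names and input values are hypothetical, $$\mathop{\mathrm{var}}X^* + \mathop{\mathrm{var}}\epsilon$$ is taken to equal $$\sigma^2_{obs}$$, $$\sigma^2_{\delta}$$ is obtained by squaring and rearranging the equation in Step 2, normal quantiles of $$S^*$$ stand in for the $$\phi^{\ast}_0$$ and $$\phi^{\ast}_2$$ derived in the placebo-group algorithm, and `uniroot()` stands in for the actual numerical solver.

```r
set.seed(5)

# Hypothetical inputs (illustrative values)
P0 <- 0.2; P2 <- 0.6; P1 <- 1 - P0 - P2
sigma2obs <- 1      # assumed to equal var(X*) + var(epsilon)
corrBIP <- 0.8      # user-specified corr(BIP*, S*(1))
B <- 20000
pVec <- c(P0, P1, P2)

# Step 2: rearranging corr^2 = sigma2obs / (sigma2obs + sigma2delta)
sigma2delta <- sigma2obs * (1 / corrBIP^2 - 1)

# Stand-ins for phi0* and phi2* (assumption; see lead-in)
phi0 <- qnorm(P0, mean = 0, sd = sqrt(sigma2obs))
phi2 <- qnorm(1 - P2, mean = 0, sd = sqrt(sigma2obs))

# Realizations of S* and BIP* = S* + delta
Sstar <- rnorm(B, mean = 0, sd = sqrt(sigma2obs))
BIPstar <- Sstar + rnorm(B, mean = 0, sd = sqrt(sigma2delta))
sCat <- cut(Sstar, breaks = c(-Inf, phi0, phi2, Inf), labels = FALSE)

# Step 3: empirical P(BIP* <= xi | S category) and P(BIP* > xi | S category)
qLow <- function(xi) tapply(BIPstar <= xi, sCat, mean)
qHigh <- function(xi) tapply(BIPstar > xi, sCat, mean)

# Step 4: solve the two marginal equations for xi0 and xi2
searchInt <- range(BIPstar) + c(-1, 1)
xi0 <- uniroot(function(xi) sum(qLow(xi) * pVec) - P0, searchInt)$root
xi2 <- uniroot(function(xi) sum(qHigh(xi) * pVec) - P2, searchInt)$root

# Rows: S(1) = 0, 1, 2; columns: P(BIP = 0), P(BIP = 1), P(BIP = 2)
bipRule <- cbind(qLow(xi0), 1 - qLow(xi0) - qHigh(xi2), qHigh(xi2))
```

Each row of `bipRule` then supplies the $$(q_0, q_1, q_2)$$ used for the multinomial draw in Step 5.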

### Continuous $$\, X^*, S^*(1),$$ and $$\, BIP^*$$

1. The user specifies $$\mathop{\mathrm{corr}}(BIP^*, S^*(1))$$.
2. Assuming that $$BIP^*$$ follows an additive measurement error model, i.e., $$BIP^* := S^*(1) + \delta$$, where $$\delta \sim N(0, \sigma^2_{\delta})$$ with an unknown $$\sigma^2_{\delta}$$, and $$\delta, \epsilon$$, and $$X^*$$ are independent, solve the following equation for $$\mathop{\mathrm{var}}\delta = \sigma^2_{\delta}$$: $\mathop{\mathrm{corr}}(BIP^*, S^*(1)) = \sqrt{\frac{\mathop{\mathrm{var}}X^* + \mathop{\mathrm{var}}\epsilon}{\mathop{\mathrm{var}}X^* + \mathop{\mathrm{var}}\epsilon + \mathop{\mathrm{var}}\delta}}$
3. For a subject with biomarker measurement $$S^*_k(1)$$, generate $$BIP^*_k$$ as $$BIP^*_k = S^*_k(1) + \delta$$ using $$\sigma^2_{\delta} = \mathop{\mathrm{var}}\delta$$ obtained in Step 2.
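
The equation in Step 2 has a closed-form solution, which may be useful for checking a numerical solver. Writing $$r = \mathop{\mathrm{corr}}(BIP^*, S^*(1))$$ and $$v = \mathop{\mathrm{var}}X^* + \mathop{\mathrm{var}}\epsilon$$ (both symbols are introduced here only for brevity), squaring both sides and rearranging gives \begin{align*} r^2 &= \frac{v}{v + \sigma^2_{\delta}}\\ r^2 \sigma^2_{\delta} &= v\,(1 - r^2)\\ \sigma^2_{\delta} &= v\,\frac{1 - r^2}{r^2} \end{align*}

For example, with the illustrative values $$v = 1$$ and $$r = 0.8$$, this gives $$\sigma^2_{\delta} = 0.36/0.64 = 0.5625$$. As $$r \to 1$$ the error variance tends to 0, so that $$BIP^*$$ coincides with $$S^*(1)$$.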