Guide for package DesignCTPB

Yitao Lu, Belaid, Xuekui Zhang



As a future trend in health care, personalized medicine tailors medical treatments to individual patients. It requires identifying the subset of patients with the best response to treatment. The subset can be defined by a biomarker (e.g. expression of a gene) and its cutoff value. However, designing clinical trials that make use of such discovered, and therefore uncertain, subsets or biomarkers is not trivial and is rarely discussed in the literature.

We formulate the clinical trial design problem as an optimization problem involving high-dimensional integration, and propose a novel computational solution based on Monte Carlo and smoothing methods. Our method uses general-purpose computing on graphics processing units (GPGPU) for large-scale parallel computing. Compared to a published method on three-dimensional problems, our approach is more accurate and 133 times faster, and this advantage grows as the dimensionality increases. Our method scales to higher-dimensional problems because the precision bound of our estimated study power is a finite number that is not affected by dimensionality.

This package helps researchers design clinical trials with nested sub-population effects and is easy to use. Before using the package, please check that CUDA and the CUDA toolkit are properly installed. The sections below walk through typical usage.
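As a quick first check of the installation, the following base-R sketch looks for two common CUDA command-line tools on the PATH. It is only a heuristic and an assumption about a typical setup: a toolkit installed through a Python environment may not expose `nvcc` on the PATH, so a negative result does not prove that CUDA is missing.

```r
# Look for the CUDA compiler and the NVIDIA driver utility on the PATH.
# Sys.which() returns "" for tools it cannot find.
nvcc_path <- unname(Sys.which("nvcc"))
smi_path  <- unname(Sys.which("nvidia-smi"))

cuda_tools_found <- nzchar(nvcc_path) && nzchar(smi_path)

if (cuda_tools_found) {
  message("Found nvcc at ", nvcc_path, " and nvidia-smi at ", smi_path)
} else {
  message("nvcc and/or nvidia-smi not found on PATH; check the CUDA installation")
}
```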

How to


Calculating the optimal alpha split for given proportions of each sub-population

library(DesignCTPB)
alpha.split()  # the default settings give the optimal result for the 3-dimensional case

Calculating the optimal alpha split for various settings of r (i.e. the sizes of the nested sub-populations), visualizing the results, and calculating the optimal proportion for each subset

In this guide, we reproduce the results presented in our paper, namely the simulation examples for the strong and weak biomarker effect conditions. The following chunk shows how to obtain them. m is the grid density for r (the proportion of each sub-population); n_dim denotes the dimension. N1 and N2 are fixed; we suggest not changing them, because doing so also requires changing the corresponding numbers of threads and blocks in the Python code. N3 can be changed but must be a multiple of 5. E is the total number of events in the clinical trial; if it is not specified, estimated information units are used instead (see formula (10) in our paper).

SIGMA is the matrix of standard deviations of each sub-population, which should match r_set or the default setting of each sub-population (i.e. each entry of each row corresponds to the matching entry in r_set). For simplicity, we apply \(\sigma_i = \frac{1}{\sqrt{20 r_i}}\), as explained in the paper. DELTA is the matrix of hazard reductions, which likewise corresponds to the r setting. Again for simplicity, we use a linear scheme for the hazard reduction, which means \(\Delta_i = 0.8 - 0.6 r_i\) in the example below.
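The two simplifying formulas can be evaluated directly in base R. The sketch below uses hypothetical proportions r and assumes that the two entries of delta_linear_bd are the hazard reduction at r = 1 and in the limit r → 0; that reading is inferred from the formula \(\Delta_i = 0.8 - 0.6 r_i\), not taken from the package documentation.

```r
# Hypothetical proportions: full population plus two nested sub-populations
r <- c(1, 0.4, 0.14)

sd_full <- 1 / sqrt(20)          # standard deviation in the full population (r = 1)
delta_linear_bd <- c(0.2, 0.8)   # ASSUMPTION: hazard reduction at r = 1 and as r -> 0

# sigma_i = 1 / sqrt(20 * r_i), which equals sd_full / sqrt(r_i)
sigma <- sd_full / sqrt(r)

# Delta_i = 0.8 - 0.6 * r_i: linear in r, from 0.8 (r -> 0) down to 0.2 (r = 1)
delta <- delta_linear_bd[2] - (delta_linear_bd[2] - delta_linear_bd[1]) * r

print(data.frame(r = r, sigma = sigma, delta = delta))
```

Smaller sub-populations thus get a larger standard deviation and a larger hazard reduction, which is the trade-off the optimization has to balance.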

res <- designCTPB(m=24, n_dim=3, sd_full=1/sqrt(20),delta_linear_bd=c(0.2,0.8))
res$plot_alpha # to see the 3-d rotatable plot of optimal alpha versus r2 and r3.
res$plot_power # to see the 3-d rotatable plot of optimal power versus r2 and r3.

Because this computation is time-consuming, we load pre-run data and show the results below.

Weak biomarker effect

Fitted TPS surface of the optimal power:

Fitted TPS surface of the optimal \(\alpha\):

data(ctpbw, package = "DesignCTPB")
#optimal choice of each population's proportion
#>        r2        r3 
#> 0.3028319 0.0000000
#the optimal power of the optimal design
#> [1] 0.787761
#the optimal alpha split of the optimal design
#>      alpha1      alpha2      alpha3 
#> 0.012543469 0.008444883 0.005990633

For the weak biomarker effect, we find that \(r_3 = 0\), which suggests considering only one sub-population instead of two and reduces the optimization to two dimensions. We then need to compute the optimal alpha split in two dimensions.

alpha.split(r=c(1,0.303),N3=100,sd_full=1/sqrt(20),delta_linear_bd = c(0.2,0.3))

Alternatively, we could redesign the clinical trial in two dimensions.

r2 <- seq(0.025,1,by=0.025)
# one row per r2 value, three columns to hold the output of alpha.split()
res_2dim <- matrix(rep(0,3*length(r2)), ncol=3)
for(ii in seq_along(r2)){
  res_2dim[ii,] <- alpha.split(r=c(1,r2[ii]),N3=100,sd_full=1/sqrt(20),delta_linear_bd = c(0.2,0.3))
}

One can fit a smoothing model to these results and find its maximum, but we could also simply take the maximizer over the grid directly.

power_value <- res_2dim[,3]  # third column holds the power
opt_r2 <- r2[which.max(power_value)]
opt_alpha <- res_2dim[which.max(power_value),1:2]  # first two columns hold the alpha split
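The smoothing route can be sketched as follows. This is an illustration only: the power values are a synthetic toy curve, not output of alpha.split(), and stats::smooth.spline (a one-dimensional cubic smoothing spline) is swapped in as a stand-in for whichever smoother one prefers.

```r
# Synthetic stand-in for the grid results; NOT output of alpha.split()
r2_grid   <- seq(0.025, 1, by = 0.025)
power_toy <- 0.78 - 0.3 * (r2_grid - 0.3)^2 + 0.001 * sin(40 * r2_grid)

# Fit a cubic smoothing spline to power as a function of r2
fit <- smooth.spline(r2_grid, power_toy)

# Evaluate the fitted curve on a finer grid and take its maximizer
r2_fine       <- seq(min(r2_grid), max(r2_grid), length.out = 1000)
power_smooth  <- predict(fit, r2_fine)$y
opt_r2_smooth <- r2_fine[which.max(power_smooth)]

# Compare with the maximizer taken directly on the coarse grid
opt_r2_grid <- r2_grid[which.max(power_toy)]
print(c(smooth = opt_r2_smooth, grid = opt_r2_grid))
```

Smoothing matters most when the Monte Carlo error in the estimated power is comparable to the differences between neighbouring grid points; otherwise the direct grid maximizer is usually adequate.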

Strong biomarker effect

Fitted TPS surface of the optimal power:

Fitted TPS surface of the optimal \(\alpha\):

data(ctpbs, package = "DesignCTPB")
#the optimal power of the optimal design
#> [1] 0.977925

Hence, for the strong biomarker effect condition, the simulation suggests that a 2-cutoff design is optimal, where the smallest sub-population makes up 13.9% of the full population and the larger one nearly 40%. The design therefore involves three populations: the full population and two nested sub-populations. The Type-I error rate allocated to each population is shown below.

#the optimal alpha split of the optimal design
#>      alpha1      alpha2      alpha3 
#> 0.001842201 0.014434377 0.012165240

The optimal proportions of the two sub-populations are:

#optimal choice of each population's proportion
#>        r2        r3 
#> 0.3973387 0.1393452
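To illustrate the kind of Monte Carlo integration involved, the following base-R sketch estimates the overall Type-I error rate implied by the strong-effect alpha split and proportions reported above. It is not the package's GPU implementation. It rests on a labeled assumption: under the null, the test statistics of nested populations are jointly normal with correlation \(\sqrt{r_j / r_i}\) for \(r_j \le r_i\); the exact correlation structure used in the paper may differ.

```r
set.seed(1)

# Proportions and alpha split copied from the strong-effect output above
r     <- c(1, 0.3973387, 0.1393452)
alpha <- c(0.001842201, 0.014434377, 0.012165240)

# ASSUMPTION: corr(Z_i, Z_j) = sqrt(r_j / r_i) for nested populations, r_j <= r_i
Sigma <- outer(r, r, function(a, b) sqrt(pmin(a, b) / pmax(a, b)))

# Draw correlated standard normal test statistics under the null hypothesis
n_sim <- 1e5
Z <- matrix(rnorm(n_sim * length(r)), ncol = length(r)) %*% chol(Sigma)

# Reject if any statistic exceeds its one-sided critical value
crit       <- qnorm(1 - alpha)
reject_any <- rowSums(sweep(Z, 2, crit, ">")) > 0
fwer_hat   <- mean(reject_any)

print(c(estimate = fwer_hat, bonferroni_bound = sum(alpha)))
```

Because the statistics are positively correlated, the estimate falls below the simple sum of the three alphas; it can be compared with the overall significance level chosen for the trial.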