. 2021 Mar 21;49(10):2447–2466. doi: 10.1080/02664763.2021.1904846

Stopping for efficacy in single-arm phase II clinical trials

Rezoanoor Rahman ¹, M Iftakhar Alam ¹

PMCID: PMC9225313 PMID: 35757036

Abstract

Phase II clinical trials investigate whether a new drug or treatment has sufficient evidence of effectiveness against the disease under study. Two-stage designs are popular for phase II since they can stop in the first stage if the drug is ineffective. Investigators often face difficulties in determining the target response rates, and adaptive designs can help to set the target response rate tested in the second stage based on the number of responses observed in the first stage. Popular adaptive designs consider two alternate response rates, and they generally minimise the expected sample size at the maximum uninteresting response rate. Moreover, these designs consider only futility as the reason for early stopping and have high expected sample sizes if the provided drug is effective. Motivated by this problem, we propose an adaptive design that enables us to terminate the single-arm trial at the first stage for efficacy and conclude which alternate response rate to choose. Comparing the proposed design with a popular adaptive design from the literature reveals that the expected sample size decreases notably if either of the two target response rates is correct. In contrast, the expected sample size remains almost the same under the null hypothesis.

Keywords: Phase II trial, two-stage design, optimal design, single-arm trial, sample size

1. Introduction

After obtaining the dose with an acceptable level of toxicity in phase I, we move to phase II for screening out the drugs that have little or no effect on the disease while minimising the number of patients exposed. Phase II trials can be further divided into single-arm or double-arm. The single-arm trials are often known as IIa trials, where the drug's efficacy is compared with the fixed standard response rate. Similarly, double-arm trials are known as phase IIb trials, where the experimental drug is compared with the other standard or experimental drugs so that the most promising one can be carried to the next phase for large scale evaluation [1]. Compared to phase IIa trials, phase IIb trials require a larger sample size. Since the paper is devoted to single-arm trials, we restrict ourselves mostly to phase IIa designs. Moreover, we exclusively use phase II to mean a phase IIa trial. Fleming [6] proposed a design for phase II that calculates critical values for testing the null hypothesis using the O'Brien and Fleming multiple testing procedure [19]. This design allowed early stopping under controlled type I and II error rates, and there was no attempt to be ‘optimal’ in terms of minimising the expected sample size. Multi-stage designs are more popular than the single-stage designs since they can stop the study early if the drug is ineffective. The very first two-stage design was proposed by Gehan [7]. This design was highly criticized as it has a high probability of going to the second stage even for an inferior performing drug, which contradicts the main idea of using multi-stage designs.

Simon [26] proposed two-stage designs, optimal and minimax, which minimise the expected sample size and maximum sample size, respectively, under the null hypothesis. The idea behind the two-stage implementation is that it is not ethical to proceed further if the drug is not active, so the study should be terminated at the first stage for futility. The problem arises when we have an efficacious drug, since we cannot stop early using Simon's designs as they do not consider efficacy as a stopping rule at the first stage. In that case, the average sample size approaches the maximum sample size since the probability of early termination due to futility becomes close to zero. One possible solution to this problem might be constructing designs that minimise the expected sample size under the alternate response rate. Nevertheless, for such designs, the expected sample size under the null response rate would be larger than that of Simon's optimal design. There have been several extensions to the Simon two-stage designs, including the optimal three-stage design [2], optimal three-stage design stopping for efficacy [3], and admissible designs that balance the optimisation criteria of expected sample size and maximum sample size [10]. The list also includes a predictive probability design [15], balanced two-stage designs [27], adaptive two-stage optimal design [23], etc. All these papers only consider the optimal design under the null response rate.

Mander and Thompson [17] showed that in situations where an agent is active, Simon's two-stage design is not optimal. The authors proposed designs that also consider efficacy as a reason for early termination. They showed that if a trial stops early for both futility and efficacy, then the expected sample size reduces in almost every case. The new early stopping rule generally increases the probability of early termination, which reduces the expected sample size. In designing clinical trials, especially in the early investigation of new treatments, researchers often face uncertainty in assuming the variability of the response variable and/or the magnitude of the treatment effects. A natural way to resolve this problem is to choose the alternate response rate with some flexibility. Lin and Shih [16] introduced an adaptive two-stage phase II design that addresses the specification of the alternative response rate and the associated power. They considered two alternate response rates and their associated pre-specified powers. The design takes a primary sample in the first stage and, based on the responses in the first stage, decides which alternate response rate is to be tested. Like Simon's design, this design also considers futility as the only reason for early termination.

Sambucini [20] used a Bayesian predictive strategy to derive an adaptive two-stage design, where the second stage sample size is not selected in advance but depends on the first stage responses. Englert and Kieser [4] considered the loss of power when transforming a continuous test statistic into a discrete test statistic and proposed a method based on the conditional error function principle that directly accounts for the discreteness of the outcome. Englert and Kieser [5] proposed a design that allows an arbitrary modification of the sample size of the second stage using the results of the interim analysis or external information while controlling the type I error rate. Shan et al. [22] proposed an adaptive design that used a branch-and-bound algorithm to find the optimal design with the smallest expected sample size under the null hypothesis. Kim and Wong [12] extended Lin and Shih [16] by developing a design that considers three alternative response rates with their associated powers. Sambucini [21] took efficacy and safety as bivariate binary outcomes and proposed a design using a Bayesian predictive strategy for interim monitoring. Jin and Yin [8] proposed the Bayesian enhancement two-stage design to strengthen the passing criterion to the second stage. Mander et al. [18] combined the maximum sample size and the two expected sample sizes under null and alternative hypotheses to produce an expected loss function to find admissible designs.

Jung [9] considered phase II trials randomising patients between a prospective control and an experimental therapy. This design is analogous to Simon's design for a single-arm trial. Lai et al. [13] expanded a randomised phase II study of response rate seamlessly into a randomised phase III study of time to failure. This approach is based on advances in group sequential designs and joint modeling of the response rate and time to the event. Shi and Yin [24] proposed a Bayesian two-stage design with changing hypothesis tests to bridge the single- and double-arm schemes in one phase II clinical trial. Shi and Yin [25] proposed a two-stage design in which the first stage makes a single-arm comparison of the experimental drug with the standard response rate, and the second stage imposes a two-arm comparison by adding an active control arm.

This paper is organised as follows. Section 2 presents the methodology of the work. A new design is proposed in Section 2.4 and its computational algorithm is presented in Section 2.5. The numerical results of the proposed design are available in Section 3. Finally, we conclude with a discussion in Section 6.

2. Methodology

Usually, the primary endpoint for a phase II clinical trial is categorised as a response or no response. For cancer trials, the clinical response is either a complete response (the patient is cured completely) or a partial response. Partial response is often defined as a $50\%$ or more tumor volume shrinkage based on a two-dimensional measurement. Assume that p is the true response rate for an experimental drug. Then the hypotheses to be tested are

\begin{aligned} H_{0} : p \leq p_{0} \\ H_{1} : p \geq p_{1}, \end{aligned}

where $p_{0}$ is the maximum uninteresting response rate and $p_{1}$ is the minimum desired response rate. Generally, the value of $p_{0}$ is below 0.3 and the improvement of the target response rate ( $p_{1} - p_{0}$ ) is between 0.1 and 0.3 [14].

2.1. Simon's two-stage design

According to Simon's design [26], $n_{1}$ patients are recruited in the first stage. Let x be the number of responses in the first stage. If $x \leq r_{1}$ , then the trial is stopped for futility, and the drug is identified as ineffective. If more than $r_{1}$ responses are observed in the first stage, then the study proceeds to the second stage, and $n_{2}$ ( $n_{1} + n_{2} = n$ ) more patients are recruited. If the total number of responses observed from the two stages is more than r, then $H_{0}$ is rejected. Simon's design is indexed by the four numbers $r_{1}$ , r, $n_{1}$ and n, and is referred to as a ‘ $r_{1} / n_{1}$ r/n’ design. The probability that the design would not proceed to the second stage, that is, the probability of early termination $(P E T)$ , is

P E T (p) = B (n_{1}, p, r_{1}) .

The probability of not recommending a drug to proceed further is

\bar{R} (p) = B (n_{1}, p, r_{1}) + \sum_{i = r_{1} + 1}^{\min (n_{1}, r)} b (n_{1}, p, i) \times B (n - n_{1}, p, r - i),

(1)

where $b (.)$ and $B (.)$ are the probability mass function and cumulative distribution function of the binomial distribution, respectively. The expected sample size is then obtained as

E (N | p) = n_{1} \times P E T (p) + n \times \{1 - P E T (p)\} .

(2)

Both $\bar{R} (p)$ and $E (N | p)$ are expressed as a function of the true response rate p. Let α and β be the type I and type II error probabilities, respectively. For predetermined $p_{0}$ , $p_{1}$ , α and β, an acceptable design should satisfy the error constraints $\bar{R} (p_{0}) \geq 1 - α$ and $\bar{R} (p_{1}) \leq β$ . Let Ω be the set of all such designs. Simon's optimal design is the one in Ω that has the minimum expected sample size $E (N | p_{0})$ under the null response rate. On the other hand, the minimax design is the one that has the smallest $E (N | p_{0})$ among those designs in Ω with the smallest n.
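The operating characteristics in Equations (1) and (2) can be evaluated directly from the binomial distribution. The following is a minimal Python sketch, not the authors' implementation (their code was written in R); the helper names `binom_pmf`, `binom_cdf` and `simon_characteristics` are hypothetical, and the design 0/9 2/17 is used purely as an illustrative input.

```python
from math import comb


def binom_pmf(n, p, k):
    """b(n, p, k): probability of exactly k responses among n patients."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(n, p, i) for i in range(min(k, n) + 1))


def simon_characteristics(p, r1, n1, r, n):
    """Return (PET, probability of not rejecting H0, E(N)) at response rate p."""
    pet = binom_cdf(n1, p, r1)  # stop for futility when x <= r1
    # Equation (1): futility at stage one, or too few responses overall.
    not_reject = pet + sum(
        binom_pmf(n1, p, i) * binom_cdf(n - n1, p, r - i)
        for i in range(r1 + 1, min(n1, r) + 1)
    )
    # Equation (2): n1 patients if stopped early, n patients otherwise.
    expected_n = n1 * pet + n * (1 - pet)
    return pet, not_reject, expected_n


# Illustrative (assumed) design r1/n1 = 0/9, r/n = 2/17, evaluated at p = 0.05.
pet, not_reject, expected_n = simon_characteristics(0.05, r1=0, n1=9, r=2, n=17)
print(round(pet, 3), round(not_reject, 3), round(expected_n, 2))
```

Evaluating the same function at $p_{0}$ and $p_{1}$ gives the quantities needed to check the constraints $\bar{R} (p_{0}) \geq 1 - α$ and $\bar{R} (p_{1}) \leq β$ for a candidate design.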

2.2. Mander and Thompson's design

Like Simon's design, Mander and Thompson's design [17] also recruits $n_{1}$ patients in the first stage and has a similar stopping rule in the second stage. The only exception is that the design stops for efficacy and rejects $H_{0}$ in the first stage if $x > r_{2}$ , where x is the number of responses in the first stage. The design proceeds to the second stage only if $r_{1} < x \leq r_{2}$ . It recruits $n_{2}$ patients in the second stage, where $n = n_{1} + n_{2}$ . This design is indexed by five numbers and will be referred to as a ‘ $(r_{1}$ $r_{2}) / n_{1}$ r/n’ design. The probability of early termination for this design is

P E T (p) = B (n_{1}, p, r_{1}) + 1 - B (n_{1}, p, r_{2}) .

The probability of not rejecting the null hypothesis is given as

\bar{R} (p) = B (n_{1}, p, r_{1}) + \sum_{i = r_{1} + 1}^{r_{2}} b (n_{1}, p, i) \times B (n - n_{1}, p, r - i) .

(3)

Then the expected number of patients is

E (N | p) = n_{1} \times P E T (p) + n \times \{1 - P E T (p)\} .

(4)

As before, we require the error probabilities to be constrained by $\bar{R} (p_{0}) \geq 1 - α$ and $\bar{R} (p_{1}) \leq β$ . Let $Ω_{E}$ be the set of all such designs. Mander and Thompson used four optimality criteria and named those designs as:

$H_{0}\text{-optimal}_{E}$ : $E (N | p_{0})$ is the smallest.
$H_{0}\text{-minimax}_{E}$ : $E (N | p_{0})$ is the smallest among those designs in $Ω_{E}$ with the smallest n.
$H_{1}\text{-optimal}_{E}$ : $E (N | p_{1})$ is the smallest.
$H_{1}\text{-minimax}_{E}$ : $E (N | p_{1})$ is the smallest among those designs in $Ω_{E}$ with the smallest n.
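To illustrate how the efficacy boundary $r_{2}$ changes the early-termination probability and the expected sample size, the sketch below evaluates Equations (3) and (4) in Python. It is only an illustration under assumed inputs: the function names are hypothetical and the design parameters are not taken from Mander and Thompson [17]. Setting $r_{2} = n_{1}$ switches off efficacy stopping and recovers the futility-only calculation.

```python
from math import comb


def binom_pmf(n, p, k):
    """b(n, p, k): probability of exactly k responses among n patients."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(n, p, i) for i in range(min(k, n) + 1))


def mander_thompson_characteristics(p, r1, r2, n1, r, n):
    """Return (PET, probability of not rejecting H0, E(N)) at response rate p."""
    futility = binom_cdf(n1, p, r1)      # x <= r1: stop, do not reject H0
    efficacy = 1 - binom_cdf(n1, p, r2)  # x > r2: stop, reject H0
    pet = futility + efficacy
    # Equation (3): only first-stage outcomes r1 < x <= r2 reach stage two.
    not_reject = futility + sum(
        binom_pmf(n1, p, i) * binom_cdf(n - n1, p, r - i)
        for i in range(r1 + 1, r2 + 1)
    )
    expected_n = n1 * pet + n * (1 - pet)  # Equation (4)
    return pet, not_reject, expected_n


# Assumed illustrative parameters, evaluated at an effective response rate.
futility_only = mander_thompson_characteristics(0.25, r1=0, r2=9, n1=9, r=2, n=17)
with_efficacy = mander_thompson_characteristics(0.25, r1=0, r2=3, n1=9, r=2, n=17)
print("futility only :", [round(v, 3) for v in futility_only])
print("with efficacy :", [round(v, 3) for v in with_efficacy])
```

In this particular example $r_{2} > r$ , so any first-stage outcome above $r_{2}$ would have led to rejection anyway; the probability of not rejecting $H_{0}$ is unchanged while the expected sample size under an effective drug drops.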

2.3. Lin and Shih's design

Lin and Shih [16] emphasised the uncertainty that investigators often face while choosing the target response rate. They proposed an adaptive two-stage design with two choices of the target response rates $p_{1}$ and $p_{2}$ ( $p_{0} < p_{1} < p_{2}$ ). This design has a fixed sample size $n_{1}$ in the first stage, like Simon's design, but the sample size in the second stage depends on the number of responses observed in the first stage. Let x be the number of responses observed in the first stage. Then Lin and Shih's design proceeds as follows:

If $x \leq s_{1}$ , stop the trial for futility.
If $s_{1} < x \leq r_{1}$ , power the study at $(1 - β_{1})$ for $p = p_{1}$ and enter $m_{2} = m - n_{1}$ additional patients into the study. Reject the null hypothesis ( $p \leq p_{0}$ ) if the total number of responses $> s$ out of m patients.
If $x > r_{1}$ , power the study at $(1 - β_{2})$ for $p = p_{2}$ and enter $n_{2} = n - n_{1}$ additional patients into the study. Reject the null hypothesis ( $p \leq p_{0}$ ) if the total number of responses $> r$ out of n patients.

This design is indexed by the seven numbers $s_{1}$ , $r_{1}$ , $n_{1}$ , s, m, r and n, and we refer to it as ( $s_{1} / r_{1} / n_{1}$ ) (s/m) (r/n). The probability of terminating the study at the first stage is

P E T (p) = B (n_{1}, p, s_{1}) .

The probability of not recommending the drug is

\begin{aligned} \bar{R} (p) & = B (n_{1}, p, s_{1}) + \sum_{i = s_{1} + 1}^{\min (r_{1}, s)} b (n_{1}, p, i) \times B (m - n_{1}, p, s - i) \\ & \quad + \sum_{i = r_{1} + 1}^{\min (r, n_{1})} b (n_{1}, p, i) \times B (n - n_{1}, p, r - i) . \end{aligned}

(5)

Finally, the expected sample size is given as

\begin{aligned} E (N | p) = n_{1} + (m - n_{1}) \times \{B (n_{1}, p, r_{1}) - B (n_{1}, p, s_{1})\} + (n - n_{1}) \times \{1 - B (n_{1}, p, r_{1})\} . \end{aligned}

(6)

For predetermined $p_{0}$ , $p_{1}$ , $p_{2}$ , α, $β_{1}$ and $β_{2}$ , an acceptable design must satisfy the following error constraints

\begin{aligned} \bar{R} (p_{0}) & \geq 1 - α, \\ \bar{R} (p_{1}) & \leq β_{1}, \\ \text{and} \quad \bar{R} (p_{2}) & \leq β_{2} . \end{aligned}

(7)

Although not required, it is reasonable to let $β_{1} \geq β_{2}$ in practice because we would like to have higher power for detecting more improvement of the new therapy ( $p_{2}$ versus $p_{0}$ ). From the feasibility aspect, we need to compromise the power somewhat for less improvement ( $p_{1}$ versus $p_{0}$ ) because the sample size cannot be too large for a phase II study. Let $Ω^{'}$ be the set of designs that satisfy the error constraints. Lin and Shih [16] proposed designs based on the following optimality conditions.

Optimality type 1 (O1): $E (N | p_{0})$ is the smallest.
Optimality type 2 (O2): $\max \{E (N | p_{i}); i = 0, 1, 2\}$ is the smallest.
Optimality type 3 (O3): $\max (n, m)$ is the smallest among all feasible solutions and $E (N | p_{0})$ is the smallest among such solutions.
Optimality type 4 (O4): $\max (n, m)$ is the smallest among all feasible solutions and $\max \{E (N | p_{i}); i = 0, 1, 2\}$ is the smallest among such solutions.

O1 and O3 are extensions of Simon's ‘optimal design’ and ‘minimax design’ criteria. That is, if $p_{1} = p_{2}$ and $β_{1} = β_{2}$ , then $r_{1} = s_{1}$ , s = r, m = n, and O1 and O3 reduce to Simon's optimal and minimax designs, respectively.
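Equations (5) and (6) can be checked numerically against the tabulated designs. The sketch below is a hedged Python illustration with hypothetical function names, not the original authors' code; it evaluates the LS design of optimality type 1 listed in Table 1 for $p_{0} = 0.05$ , namely (0/2/9) (3/31) (5/43).

```python
from math import comb


def binom_pmf(n, p, k):
    """b(n, p, k): probability of exactly k responses among n patients."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(n, p, i) for i in range(min(k, n) + 1))


def lin_shih_characteristics(p, s1, r1, n1, s, m, r, n):
    """Return (PET, probability of not rejecting H0, E(N)) at response rate p."""
    pet = binom_cdf(n1, p, s1)  # futility is the only early stop
    # Equation (5): futility, or failure in either second-stage branch.
    not_reject = (
        pet
        + sum(
            binom_pmf(n1, p, i) * binom_cdf(m - n1, p, s - i)
            for i in range(s1 + 1, min(r1, s) + 1)
        )
        + sum(
            binom_pmf(n1, p, i) * binom_cdf(n - n1, p, r - i)
            for i in range(r1 + 1, min(r, n1) + 1)
        )
    )
    # Equation (6): branch probabilities weight the two second-stage sizes.
    to_p1_branch = binom_cdf(n1, p, r1) - binom_cdf(n1, p, s1)
    to_p2_branch = 1 - binom_cdf(n1, p, r1)
    expected_n = n1 + (m - n1) * to_p1_branch + (n - n1) * to_p2_branch
    return pet, not_reject, expected_n


design = dict(s1=0, r1=2, n1=9, s=3, m=31, r=5, n=43)
pet0, not_reject0, en0 = lin_shih_characteristics(0.05, **design)
pet1, _, _ = lin_shih_characteristics(0.20, **design)
print(round(pet0, 3), round(1 - not_reject0, 3), round(en0, 2), round(pet1, 3))
```

The printed values can be compared with the corresponding Table 1 entries for $P E T (p_{0})$ , the true α, $E (N | p_{0})$ and $P E T (p_{1})$ .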

2.4. Proposed design

Like Simon's design, the design by Lin and Shih [16] considers only futility as a reason for early stopping in the first stage. The design has a larger expected sample size if the proposed drug is effective. We propose an adaptive design that considers both futility and efficacy as reasons for early stopping. Although it is easier to set the uninteresting rate $p_{0}$ , the same is not true for $p_{1}$ , the alternate response rate. The challenge in selecting a single alternate response rate can be reduced by specifying two alternate response rates instead. Therefore, as in Lin and Shih's design, the proposed design also considers two alternate response rates. Based on the responses in the first stage, it determines which target response rate is to be tested and enrols patients in the second stage accordingly. Let x be the number of responses in the first stage out of the $n_{1}$ patients enrolled. The proposed design then proceeds as follows:

If $x \leq s_{1}$ , stop the trial for futility.
If $s_{1} < x \leq r_{1}$ , power the study at $(1 - β_{1})$ for $p \geq p_{1}$ and enter $m_{2} = m - n_{1}$ additional patients into the study. Reject the null hypothesis ( $p \leq p_{0}$ ) if the total number of responses $> s$ out of m patients.
If $r_{1} < x \leq c_{1}$ , power the study at $(1 - β_{2})$ for $p \geq p_{2}$ and enter $n_{2} = n - n_{1}$ additional patients into the study. Reject the null hypothesis ( $p \leq p_{0}$ ) if the total number of responses $> r$ out of n patients.
If $c_{1} < x \leq c_{2}$ , stop for efficacy in the first stage and reject the null hypothesis for $H_{1}$ : $p \geq p_{1}$ .
If $x > c_{2}$ , stop for efficacy and reject the null hypothesis for $H_{1}$ : $p \geq p_{2}$ .

This design is indexed by the nine parameters $s_{1}$ , $r_{1}$ , $c_{1}$ , $c_{2}$ , $n_{1}$ , s, m, r and n, where $0 \leq s_{1} < r_{1} < c_{1} < c_{2} \leq n_{1}$ . It is referred to as ( $s_{1} / r_{1} / c_{1} / c_{2} / n_{1}$ ) (s/m) (r/n). The probability of early termination in the proposed design is

P E T (p) = B (n_{1}, p, s_{1}) + 1 - B (n_{1}, p, c_{1}) .

Then the probability of not rejecting $H_{0}$ is

\begin{aligned} \bar{R} (p) & = B (n_{1}, p, s_{1}) + \sum_{i = s_{1} + 1}^{\min (r_{1}, s)} b (n_{1}, p, i) \times B (m - n_{1}, p, s - i) \\ & \quad + \sum_{i = r_{1} + 1}^{\min (r, c_{1})} b (n_{1}, p, i) \times B (n - n_{1}, p, r - i) . \end{aligned}

(8)

Finally, the expected sample size is

\begin{aligned} E (N | p) & = n_{1} + (m - n_{1}) \times \{B (n_{1}, p, r_{1}) - B (n_{1}, p, s_{1})\} + (n - n_{1}) \times \{B (n_{1}, p, c_{1}) - B (n_{1}, p, r_{1})\} \\ & = w_{1} n_{1} + w_{2} m + w_{3} n, \end{aligned}

(9)

where $w_{1} = P E T (p)$ is the probability that the design would stop at the first stage, $w_{2} = \{B (n_{1}, p, r_{1}) - B (n_{1}, p, s_{1})\}$ is the probability that the design would proceed to the second stage and test for the target response rate $p_{1}$ and $w_{3} = \{B (n_{1}, p, c_{1}) - B (n_{1}, p, r_{1})\}$ is the probability that the design would proceed to the second stage and test for the target response rate $p_{2}$ . Note that $w_{1} + w_{2} + w_{3} = 1$ . For a predetermined ϕ= $(p_{0}, p_{1}, p_{2}, α, β_{1}, β_{2})$ , we find the optimal designs under the four optimality criteria proposed by Lin and Shih [16], and these designs satisfy the error constraints in (7). From Equation (8), we can see that the probability of not rejecting $H_{0}$ does not depend on $c_{2}$ . So the value of $c_{2}$ is determined based on the following hypotheses

\begin{aligned} H_{0} : p \leq p_{1}, \\ H_{1} : p \geq p_{2} . \end{aligned}
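As a numerical check on Equations (8) and (9), the following Python sketch computes the weights $w_{1}$ , $w_{2}$ , $w_{3}$ , the probability of not rejecting $H_{0}$ , and the expected sample size. The function names are hypothetical and this is not the authors' R code. The input is the proposed design of optimality type 1 from Table 1 for $p_{0} = 0.05$ , (0/1/2/3/10) (3/28) (4/38); $c_{2}$ is omitted because none of these quantities depends on it.

```python
from math import comb


def binom_pmf(n, p, k):
    """b(n, p, k): probability of exactly k responses among n patients."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(n, p, i) for i in range(min(k, n) + 1))


def proposed_characteristics(p, s1, r1, c1, n1, s, m, r, n):
    """Operating characteristics of the proposed design at response rate p."""
    # w1: stop at stage one, either for futility (x <= s1) or efficacy (x > c1).
    w1 = binom_cdf(n1, p, s1) + 1 - binom_cdf(n1, p, c1)
    w2 = binom_cdf(n1, p, r1) - binom_cdf(n1, p, s1)  # continue, test p1
    w3 = binom_cdf(n1, p, c1) - binom_cdf(n1, p, r1)  # continue, test p2
    # Equation (8): probability of not rejecting H0.
    not_reject = (
        binom_cdf(n1, p, s1)
        + sum(
            binom_pmf(n1, p, i) * binom_cdf(m - n1, p, s - i)
            for i in range(s1 + 1, min(r1, s) + 1)
        )
        + sum(
            binom_pmf(n1, p, i) * binom_cdf(n - n1, p, r - i)
            for i in range(r1 + 1, min(r, c1) + 1)
        )
    )
    expected_n = w1 * n1 + w2 * m + w3 * n  # Equation (9)
    return {"PET": w1, "not_reject": not_reject, "EN": expected_n,
            "weights": (w1, w2, w3)}


design = dict(s1=0, r1=1, c1=2, n1=10, s=3, m=28, r=4, n=38)
under_p0 = proposed_characteristics(0.05, **design)
under_p1 = proposed_characteristics(0.20, **design)
print(round(under_p0["PET"], 3), round(under_p0["EN"], 2))
print(round(under_p1["PET"], 3), round(under_p1["EN"], 2))
```

The values under $p_{0}$ can be compared with the $P E T (p_{0})$ and $E (N | p_{0})$ columns of Table 1 for this design.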

2.5. Algorithm

Now we discuss the algorithm to find the optimal solutions. For specified values of $p_{0}$ , $p_{1}$ , $p_{2}$ , α, $β_{1}$ and $β_{2}$ , feasible values of m and n are calculated first. The range of m is calculated as 0.85 to 1.5 times the sample size of the one-stage design for testing $p_{0}$ versus $p_{1}$ at the significance level α and power $(1 - β_{1})$ . This strategy is similar to that used by Simon [26] in his algorithm. To ensure that the limits of the range are integers, the lower limit is taken as the largest integer that is less than or equal to the calculated lower value. Similarly, the upper limit is taken as the smallest integer that is greater than or equal to the calculated upper value. The feasible range of n is calculated in the same way from the sample size of the one-stage design for testing $p_{0}$ versus $p_{2}$ at the significance level α and power $(1 - β_{2})$ .
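The range calculation can be sketched as follows. The text does not state how the one-stage sample size is obtained, so this Python sketch assumes an exact binomial one-stage test (the smallest n for which some critical value r meets both error constraints); the function names `one_stage_design` and `feasible_range` are hypothetical.

```python
from math import comb


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1)
    )


def one_stage_design(p0, p1, alpha, beta, n_max=1000):
    """Smallest (n, r) rejecting H0 when responses > r (assumed exact test)."""
    for n in range(1, n_max + 1):
        # Smallest critical value that controls the type I error; it also
        # gives the smallest type II error attainable for this n.
        r = next(k for k in range(n + 1) if 1 - binom_cdf(n, p0, k) <= alpha)
        if binom_cdf(n, p1, r) <= beta:
            return n, r
    raise ValueError("no one-stage design found up to n_max")


def feasible_range(n_single):
    """Integer range [floor(0.85 n), ceil(1.5 n)], using integer arithmetic."""
    lower = (85 * n_single) // 100
    upper = -((-3 * n_single) // 2)
    return lower, upper


n_single, r_single = one_stage_design(0.05, 0.20, alpha=0.05, beta=0.20)
m_lower, m_upper = feasible_range(n_single)
print(n_single, r_single, m_lower, m_upper)
```

Integer arithmetic is used for the floor and ceiling so that the limits are not affected by floating-point rounding of 0.85 and 1.5.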

For each combination of m, n and $n_{1}$ in the range $\{1, \min (m, n) - 1\}$ , $c_{2}$ is first calculated as the $[\{1 - (β_{1} - β_{2})\}$ th quantile of $n_{1}$ $- 1]$ . Following this, $s_{1}$ is taken in the range $(0,$ $c_{2})$ , $r_{1}$ in the range $(s_{1} + 1,$ $c_{2})$ , $c_{1}$ in the range $(r_{1} + 1,$ $c_{2})$ , s in the range $(s_{1} + 1,$ $m)$ , and r in the range $(r_{1} + 1,$ $m)$ . Then we find the designs with parameters $s_{1}$ , $r_{1}$ , $c_{1}$ , $c_{2}$ , $n_{1}$ , s, m, r, and n that satisfy the error constraints associated with $β_{1}$ and $β_{2}$ . Then we check which of the remaining designs fulfill the error constraint associated with α and call them the ‘set of feasible designs’. For every feasible design, the expected sample sizes $E (N | p_{0})$ , $E (N | p_{1})$ , and $E (N | p_{2})$ are calculated.

To find the designs of optimality type 1, we arrange the designs in order of $E (N | p_{0})$ . For optimality type 2, a vector of $\max \{E (N | p_{i}); i = 0, 1, 2\}$ is calculated and the designs are ordered accordingly. For finding the designs of optimality types 3 and 4, first a column containing $\max (m, n)$ is calculated. For optimality type 3, the designs are first ordered according to $E (N | p_{0})$ and then $\max (m, n)$ . Finally, for optimality type 4, the ordering is done first according to $\max \{E (N | p_{i}); i = 0, 1, 2\}$ and then $\max (m, n)$ . For every optimality criterion, the first ten ordered designs are kept and the first design is called the optimal design under that criterion. All the designs are obtained numerically using self-written code in R through parallel computation.
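The selection step can be illustrated with a small Python sketch. It follows the definitions of O1 to O4 given in Section 2.3 (for O3 and O4, the smallest $\max (m, n)$ takes priority and the expected sample size breaks ties), which is an interpretation of the ordering described above rather than the authors' R code. The three candidate rows are the distinct proposed designs listed in Table 1 for $p_{0} = 0.05$ ; the labels A, B, C and the field names are hypothetical.

```python
# Each candidate: second-stage totals m and n, and E(N | p_i) for i = 0, 1, 2.
candidates = [
    {"label": "A", "m": 28, "n": 38, "en": (17.76, 23.29, 21.26)},
    {"label": "B", "m": 28, "n": 26, "en": (18.84, 19.27, 17.28)},
    {"label": "C", "m": 27, "n": 27, "en": (18.60, 19.34, 17.38)},
]


def max_total(design):
    """max(m, n): the largest total sample size the design can reach."""
    return max(design["m"], design["n"])


# Sort keys; tuples give a lexicographic order (first entry has priority).
criteria = {
    "O1": lambda d: d["en"][0],                    # smallest E(N | p0)
    "O2": lambda d: max(d["en"]),                  # smallest max_i E(N | p_i)
    "O3": lambda d: (max_total(d), d["en"][0]),    # minimax, then E(N | p0)
    "O4": lambda d: (max_total(d), max(d["en"])),  # minimax, then max E(N)
}

# Rank the feasible set under each criterion and keep the best design.
best = {
    name: sorted(candidates, key=key)[0]["label"]
    for name, key in criteria.items()
}
print(best)
```

With these three rows the selected designs agree with the optimality types reported for them in Table 1; in a full search, `candidates` would hold every feasible design and `sorted(...)[:10]` would retain the first ten.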

3. Numerical results

3.1. Proposed design

We have considered $p_{0}$ from 0.05 to 0.50 in steps of 0.05, and the differences $(p_{1} - p_{0})$ and $(p_{2} - p_{0})$ are kept fixed at 0.15 and 0.20, respectively. The level of significance α is taken as $5\%$ and $10\%$ , and the maximum type II errors $β_{1}$ and $β_{2}$ are kept fixed at $20\%$ and $10\%$ for the target response rates $p_{1}$ and $p_{2}$ , respectively. The designs for $α = 0.05$ and $α = 0.10$ are shown in Tables 1 and 2, respectively.

Table 1.

Proposed and LS designs for $α = 0.05$ , $β_{1} = 0.20$ and $β_{2} = 0.10$ .

					First stage					Second stage				True						Expected sample size
$p_{0}$	$p_{1}$	$p_{2}$	Design	Optimal type	$n_{1}$	$s_{1}$	$r_{1}$	$c_{1}$	$c_{2}$	m	s	n	r	α	$β_{1}$	$β_{2}$	$P E T (p_{0})$	$P E T (p_{1})$	$P E T (p_{2})$	$E (N \| p_{0})$	$E (N \| p_{1})$	$E (N \| p_{2})$
0.05	0.20	0.25	Proposed	1	10	0	1	2	3	28	3	38	4	0.042	0.199	0.086	0.610	0.430	0.531	17.76	23.29	21.26
				2	12	0	1	2	3	28	3	26	3	0.049	0.197	0.080	0.560	0.510	0.640	18.84	19.27	17.28
				3	12	0	1	2	3	27	3	27	3	0.049	0.198	0.080	0.560	0.510	0.640	18.60	19.34	17.38
				4	12	0	1	2	3	27	3	27	3	0.049	0.198	0.080	0.560	0.510	0.640	18.60	19.34	17.38
			LS	1	9	0	2			31	3	43	5	0.049	0.200	0.094	0.630	0.134	0.075	17.23	31.19	34.14
				2	18	0	2			29	3	23	3	0.044	0.199	0.080	0.397	0.018	0.006	24.28	24.43	23.75
				3	21	0	1			26	2	26	3	0.047	0.197	0.076	0.341	0.009	0.002	24.30	25.95	25.99
				4	21	0	1			26	2	26	3	0.047	0.197	0.076	0.341	0.009	0.002	24.30	25.95	25.99
0.10	0.25	0.30	Proposed	1	18	2	3	5	6	37	6	50	8	0.049	0.200	0.078	0.740	0.420	0.530	24.13	34.41	31.82
				2	17	1	3	4	6	40	7	38	7	0.049	0.199	0.068	0.504	0.480	0.690	28.29	28.61	25.12
				3	23	2	4	6	7	38	7	37	6	0.050	0.199	0.067	0.598	0.395	0.576	28.97	31.70	29.06
				4	24	2	4	6	8	38	7	36	6	0.049	0.198	0.066	0.572	0.430	0.620	29.84	31.23	28.72
			LS	1	18	2	3			38	6	49	8	0.048	0.199	0.077	0.734	0.135	0.060	24.40	42.93	45.99
				2	21	2	4			44	8	29	5	0.048	0.199	0.068	0.648	0.075	0.027	28.30	32.80	31.35
				3	18	1	3			37	6	38	7	0.050	0.199	0.068	0.450	0.039	0.014	28.54	36.94	37.57
				4	26	2	4			38	7	35	6	0.050	0.198	0.067	0.511	0.026	0.007	31.54	35.24	35.14
0.15	0.30	0.35	Proposed	1	19	3	4	6	7	54	12	59	13	0.050	0.200	0.074	0.700	0.468	0.578	30.12	39.55	35.43
				2	25	4	6	7	9	57	13	41	10	0.050	0.199	0.065	0.708	0.580	0.700	33.65	35.74	31.65
				3	38	6	9	10	14	46	11	46	10	0.050	0.200	0.061	0.679	0.650	0.840	40.56	40.78	39.31
				4	38	6	9	10	14	46	11	46	10	0.050	0.200	0.061	0.679	0.650	0.840	40.56	40.78	39.31
			LS	1	19	3	6			55	12	46	10	0.050	0.199	0.074	0.684	0.133	0.059	30.22	47.20	48.20
				2	27	4	7			51	12	34	8	0.049	0.198	0.061	0.619	0.059	0.018	35.48	39.57	37.27
				3	38	6	9			46	11	46	10	0.050	0.200	0.061	0.659	0.036	0.008	40.73	45.71	45.94
				4	38	6	9			46	11	46	10	0.050	0.200	0.061	0.659	0.036	0.008	40.73	45.71	45.94
0.20	0.35	0.40	Proposed	1	23	5	6	9	10	56	15	69	19	0.049	0.199	0.068	0.704	0.390	0.500	34.74	49.46	45.19
				2	26	5	8	9	11	63	18	47	14	0.049	0.199	0.059	0.601	0.490	0.660	40.20	42.20	36.46
				3	28	5	6	11	12	52	15	53	15	0.050	0.200	0.058	0.505	0.290	0.460	40.18	45.79	41.48
				4	38	8	9	13	16	53	16	53	15	0.050	0.199	0.057	0.667	0.510	0.720	42.99	45.30	42.18
			LS	1	23	5	6			56	15	66	18	0.049	0.199	0.068	0.695	0.131	0.054	34.67	59.14	62.98
				2	35	8	11			60	17	40	12	0.049	0.196	0.058	0.745	0.089	0.026	40.69	45.81	43.25
				3	31	6	12			53	15	40	13	0.050	0.200	0.058	0.571	0.046	0.013	40.38	48.56	46.48
				4	31	6	12			53	15	40	13	0.050	0.200	0.058	0.571	0.046	0.013	40.38	48.56	46.48
0.25	0.40	0.45	Proposed	1	26	7	8	12	13	55	18	74	24	0.050	0.199	0.064	0.690	0.320	0.420	38.31	56.62	52.73
				2	34	9	10	13	16	87	29	61	21	0.050	0.200	0.055	0.692	0.580	0.730	45.58	46.97	41.38
				3	37	9	11	17	18	58	19	59	20	0.050	0.200	0.054	0.552	0.220	0.400	46.60	54.13	50.26
				4	37	8	14	15	18	59	20	57	19	0.050	0.200	0.054	0.411	0.420	0.650	49.92	49.54	44.50
			LS	1	23	6	7			59	19	74	24	0.049	0.197	0.066	0.654	0.124	0.051	38.41	65.98	70.44
				2	38	9	14			65	22	41	15	0.050	0.200	0.058	0.513	0.027	0.006	50.32	50.18	45.62
				3	37	9	11			58	19	59	20	0.050	0.200	0.054	0.550	0.035	0.008	46.64	58.14	58.79
				4	42	9	16			59	20	51	17	0.050	0.200	0.058	0.371	0.009	0.001	52.53	54.58	52.81
0.30	0.45	0.50	Proposed	1	28	9	10	14	15	57	22	80	30	0.050	0.199	0.061	0.690	0.350	0.470	41.21	59.45	54.50
				2	34	11	13	15	18	75	29	71	28	0.050	0.200	0.053	0.720	0.560	0.700	45.15	50.82	44.52
				3	39	12	14	19	21	63	24	64	25	0.050	0.200	0.051	0.622	0.310	0.510	48.22	56.02	51.16
				4	37	10	11	17	20	60	23	64	25	0.050	0.200	0.051	0.437	0.410	0.630	51.63	52.92	46.90
			LS	1	28	9	10			58	22	77	29	0.049	0.200	0.061	0.682	0.119	0.044	41.16	69.38	73.94
				2	42	14	18			70	27	45	19	0.050	0.200	0.056	0.743	0.085	0.022	48.54	53.94	49.90
				3	39	12	16			64	25	63	24	0.050	0.199	0.051	0.618	0.050	0.012	48.50	62.11	62.87
				4	37	9	17			64	25	45	18	0.049	0.200	0.051	0.289	0.008	0.001	55.95	56.42	52.02
0.35	0.50	0.55	Proposed	1	27	10	11	15	16	77	33	79	34	0.050	0.198	0.06	0.678	0.340	0.450	43.47	60.87	55.50
				2	34	12	16	17	20	77	34	59	27	0.050	0.200	0.05	0.616	0.490	0.660	50.02	53.36	45.77
				3	41	14	15	23	24	66	30	66	29	0.050	0.200	0.049	0.528	0.200	0.390	52.80	60.89	56.24
				4	47	17	18	25	27	66	30	66	29	0.050	0.199	0.049	0.634	0.320	0.550	53.95	59.93	55.55
			LS	1	27	10	11			69	30	80	34	0.050	0.198	0.06	0.670	0.124	0.046	43.10	72.37	76.97
				2	43	17	20			87	38	45	21	0.050	0.200	0.057	0.785	0.111	0.030	50.66	56.09	50.70
				3	42	15	22			66	29	65	29	0.050	0.200	0.049	0.608	0.044	0.009	51.41	64.62	65.20
				4	50	19	26			66	29	62	29	0.050	0.200	0.049	0.726	0.059	0.012	54.36	63.71	63.36
0.40	0.55	0.60	Proposed	1	28	12	14	17	18	83	40	82	39	0.050	0.199	0.06	0.703	0.350	0.450	44.23	63.39	57.92
				2	39	17	20	21	24	83	41	67	33	0.050	0.199	0.048	0.763	0.600	0.730	49.00	54.68	47.94
				3	37	15	19	22	23	70	34	70	35	0.049	0.199	0.047	0.602	0.290	0.480	50.13	60.28	54.24
				4	43	17	23	24	27	69	34	70	34	0.050	0.197	0.045	0.554	0.430	0.670	54.62	57.96	51.80
			LS	1	26	11	12			79	38	82	39	0.050	0.200	0.062	0.674	0.135	0.052	43.89	74.13	78.93
				2	40	17	21			81	40	45	23	0.050	0.199	0.049	0.689	0.077	0.019	51.36	57.51	51.75
				3	36	14	19			69	34	66	32	0.050	0.200	0.047	0.518	0.038	0.008	51.77	66.11	66.13
				4	39	15	22			69	34	45	23	0.050	0.200	0.046	0.491	0.028	0.005	53.95	59.29	53.98
0.45	0.60	0.65	Proposed	1	33	16	17	21	22	74	40	79	42	0.050	0.199	0.052	0.729	0.400	0.540	44.94	60.32	54.06
				2	40	19	23	24	27	77	42	57	31	0.050	0.199	0.044	0.704	0.510	0.700	50.53	55.40	48.60
				3	50	23	26	32	33	67	36	68	37	0.050	0.199	0.044	0.616	0.270	0.510	56.66	63.05	58.78
				4	45	19	26	28	30	68	37	67	36	0.050	0.200	0.044	0.420	0.340	0.600	58.32	59.96	54.00
			LS	1	31	15	16			74	39	79	42	0.050	0.200	0.054	0.713	0.128	0.042	44.23	72.38	76.75
				2	43	20	25			75	41	46	26	0.049	0.199	0.044	0.639	0.051	0.010	53.68	57.68	51.92
				3	38	16	18			68	36	68	37	0.050	0.200	0.044	0.425	0.019	0.003	55.26	67.42	67.90
				4	46	19	27			68	37	66	35	0.050	0.200	0.044	0.363	0.008	0.001	59.97	66.79	66.44
0.50	0.65	0.70	Proposed	1	30	16	18	21	22	76	44	80	47	0.050	0.199	0.051	0.716	0.350	0.470	43.45	61.57	55.94
				2	39	21	23	25	28	81	48	72	43	0.050	0.199	0.043	0.765	0.590	0.740	48.19	54.13	47.42
				3	41	21	24	29	30	67	40	68	40	0.050	0.199	0.042	0.625	0.220	0.410	50.87	61.76	56.86
				4	45	23	26	30	32	68	40	67	40	0.050	0.198	0.041	0.625	0.390	0.640	53.52	58.54	52.95
			LS	1	30	16	17			69	40	79	46	0.049	0.200	0.052	0.708	0.126	0.040	43.21	71.88	76.59
				2	43	23	27			77	46	46	28	0.049	0.197	0.041	0.729	0.079	0.016	51.20	56.84	51.39
				3	53	26	32			66	39	67	40	0.050	0.200	0.042	0.500	0.012	0.001	59.55	66.56	66.90
				4	38	16	22			67	40	66	39	0.050	0.200	0.042	0.209	0.003	0.000	60.82	66.13	66.07


Table 2.

Proposed and LS designs for $α = 0.10$ , $β_{1} = 0.20$ and $β_{2} = 0.10$ .

					First stage					Second stage				True						Expected sample size
$p_{0}$	$p_{1}$	$p_{2}$	Design	Optimal type	$n_{1}$	$s_{1}$	$r_{1}$	$c_{1}$	$c_{2}$	m	s	n	r	α	$β_{1}$	$β_{2}$	$P E T (p_{0})$	$P E T (p_{1})$	$P E T (p_{2})$	$E (N \| p_{0})$	$E (N \| p_{1})$	$E (N \| p_{2})$
0.05	0.20	0.25	Proposed	1	10	0	1	2	3	22	2	23	2	0.092	0.198	0.092	0.610	0.430	0.530	14.75	17.15	15.91
				2	12	0	1	2	3	22	2	20	2	0.082	0.194	0.086	0.560	0.51	0.641	16.20	16.33	15.13
				3	12	0	1	2	3	21	2	21	2	0.080	0.197	0.087	0.560	0.51	0.641	15.96	16.41	15.23
				4	12	0	1	2	3	21	2	21	2	0.080	0.197	0.087	0.560	0.51	0.641	15.96	16.41	15.23
			LS	1	9	0	1			28	2	27	2	0.092	0.199	0.100	0.630	0.134	0.075	14.46	23.38	24.75
				2	12	0	1			24	2	18	2	0.086	0.200	0.093	0.540	0.069	0.032	16.81	18.82	18.57
				3	17	0	1			18	1	20	2	0.091	0.200	0.087	0.418	0.023	0.008	18.00	19.74	19.89
				4	17	0	1			18	1	20	2	0.091	0.200	0.087	0.418	0.023	0.008	18.00	19.74	19.89
0.10	0.25	0.30	Proposed	1	13	1	2	3	4	32	5	34	5	0.097	0.200	0.088	0.656	0.542	0.643	19.74	22.2	20.22
				2	15	1	2	3	5	29	5	32	5	0.098	0.198	0.081	0.605	0.619	0.738	20.92	21.01	19.17
				3	20	1	3	4	7	28	5	27	4	0.099	0.200	0.080	0.435	0.609	0.77	24.43	22.93	21.71
				4	20	1	3	4	7	28	5	27	4	0.099	0.200	0.080	0.435	0.609	0.77	24.43	22.93	21.71
			LS	1	14	1	2			26	4	34	5	0.095	0.199	0.085	0.585	0.101	0.047	20.25	30.54	32.14
				2	17	1	3			32	5	22	4	0.091	0.200	0.084	0.482	0.05	0.019	23.95	24.78	23.73
				3	20	1	3			28	5	27	4	0.099	0.200	0.080	0.392	0.024	0.008	24.73	27.03	27.05
				4	20	1	3			28	5	27	4	0.099	0.200	0.080	0.392	0.024	0.008	24.73	27.03	27.05
0.15	0.30	0.35	Proposed	1	15	2	4	5	6	39	8	43	9	0.099	0.200	0.084	0.621	0.405	0.497	24.27	30.1	27.91
				2	20	3	4	5	8	41	9	40	9	0.100	0.197	0.074	0.715	0.691	0.799	25.88	26.32	24.09
				3	28	3	6	7	11	34	7	34	8	0.100	0.200	0.075	0.426	0.651	0.822	31.45	30.09	29.07
				4	28	3	6	7	11	34	7	34	8	0.100	0.200	0.075	0.426	0.651	0.822	31.45	30.09	29.07
			LS	1	15	2	3			34	7	45	9	0.098	0.199	0.085	0.604	0.127	0.062	24.47	39.32	41.93
				2	22	3	5			41	9	26	6	0.096	0.199	0.077	0.575	0.068	0.025	28.57	29.41	27.98
				3	26	2	5			34	8	33	7	0.100	0.200	0.075	0.230	0.007	0.001	31.98	33.11	33.05
				4	26	2	5			34	8	33	7	0.100	0.200	0.075	0.230	0.007	0.001	31.98	33.11	33.05
0.20	0.35	0.40	Proposed	1	17	3	4	6	7	42	11	48	13	0.100	0.198	0.074	0.587	0.484	0.600	28.56	32.20	28.97
				2	22	4	5	7	10	44	12	43	12	0.099	0.199	0.070	0.599	0.600	0.720	30.61	30.53	27.57
				3	23	4	5	8	10	39	10	40	11	0.100	0.199	0.070	0.528	0.470	0.630	30.83	31.95	29.24
				4	24	4	5	8	10	38	10	40	11	0.099	0.199	0.070	0.496	0.520	0.69	31.67	31.61	28.98
			LS	1	17	3	5			39	10	52	14	0.099	0.200	0.076	0.549	0.103	0.046	28.30	44.28	47.55
				2	25	5	7			53	15	29	8	0.100	0.194	0.068	0.617	0.083	0.029	33.11	34.03	31.86
				3	23	4	5			40	10	40	11	0.098	0.200	0.070	0.501	0.055	0.019	31.49	39.06	39.68
				4	26	4	8			40	11	33	9	0.099	0.200	0.069	0.383	0.024	0.007	34.22	35.53	34.49
0.25	0.40	0.45	Proposed	1	20	5	6	9	10	44	14	55	17	0.098	0.200	0.076	0.631	0.370	0.460	31.06	40.67	37.94
				2	21	5	7	8	10	58	19	42	14	0.099	0.199	0.07	0.623	0.570	0.700	33.78	34.05	29.88
				3	25	5	10	11	12	43	14	43	13	0.099	0.200	0.067	0.389	0.300	0.470	36.00	37.65	34.61
				4	31	7	8	12	15	41	13	43	14	0.100	0.199	0.067	0.502	0.510	0.710	36.66	36.76	34.50
			LS	1	20	5	6			43	13	54	17	0.100	0.199	0.075	0.617	0.126	0.055	31.16	48.36	51.30
				2	28	7	10			49	16	32	11	0.097	0.199	0.068	0.600	0.074	0.024	35.25	37.22	35.12
				3	25	5	10			43	14	39	12	0.100	0.200	0.067	0.378	0.029	0.009	36.07	40.81	40.38
				4	24	4	10			43	14	33	11	0.099	0.200	0.067	0.247	0.013	0.004	38.10	39.25	37.47
0.30	0.45	0.50	Proposed	1	20	6	9	10	11	55	20	59	22	0.100	0.200	0.075	0.625	0.380	0.470	33.24	42.36	39.27
				2	26	8	10	11	14	61	23	46	18	0.100	0.200	0.067	0.688	0.630	0.730	35.96	36.65	32.70
				3	29	8	11	15	15	46	17	47	18	0.100	0.199	0.064	0.483	0.220	0.370	37.92	42.75	40.26
				4	34	10	13	15	18	47	18	46	17	0.100	0.199	0.064	0.581	0.520	0.710	39.36	40.01	37.60
			LS	1	20	6	10			55	20	44	16	0.100	0.199	0.075	0.608	0.130	0.058	33.53	47.71	48.45
				2	32	10	13			53	20	35	14	0.097	0.199	0.066	0.644	0.082	0.025	38.23	40.07	37.87
				3	29	8	11			46	17	47	18	0.099	0.200	0.064	0.479	0.043	0.012	37.99	45.99	46.66
				4	34	10	13			47	18	46	17	0.098	0.200	0.064	0.554	0.047	0.012	39.68	45.66	45.96
0.35	0.50	0.55	Proposed	1	22	8	9	12	13	54	22	65	27	0.100	0.198	0.073	0.665	0.400	0.500	34.83	46.29	42.86
				2	24	8	10	12	14	61	26	48	21	0.099	0.200	0.064	0.568	0.500	0.640	38.15	38.65	33.98
				3	26	8	9	14	15	45	19	49	21	0.100	0.200	0.062	0.426	0.320	0.480	38.56	41.54	37.83
				4	32	11	12	16	19	49	21	49	21	0.100	0.198	0.061	0.578	0.490	0.670	39.17	40.75	37.63
			LS	1	20	7	8			54	22	60	25	0.099	0.2	0.076	0.601	0.132	0.058	34.99	54.02	57.24
				2	30	11	14			58	25	34	15	0.097	0.199	0.063	0.655	0.100	0.033	38.10	41.46	38.61
				3	30	10	15			49	21	39	17	0.100	0.199	0.062	0.508	0.049	0.014	39.05	43.78	42.29
				4	29	9	15			49	21	34	16	0.099	0.200	0.062	0.408	0.031	0.008	40.54	43.05	40.31
0.40	0.55	0.60	Proposed	1	24	10	11	15	15	50	23	62	29	0.100	0.200	0.069	0.658	0.310	0.380	35.36	49.03	46.78
				2	30	12	15	16	19	54	26	51	25	0.100	0.199	0.059	0.627	0.570	0.720	38.82	39.8	36.01
				3	27	10	15	16	17	50	24	49	23	0.100	0.200	0.06	0.472	0.310	0.470	39.13	42.73	38.99
				4	30	11	16	17	19	50	24	47	23	0.100	0.200	0.06	0.452	0.390	0.590	40.87	41.72	37.86
			LS	1	22	9	13			58	27	60	28	0.100	0.200	0.07	0.624	0.133	0.055	35.57	53.77	56.92
				2	33	14	17			56	27	37	18	0.100	0.198	0.06	0.681	0.101	0.031	39.11	42.43	40.20
				3	25	9	14			50	24	50	23	0.100	0.200	0.06	0.425	0.044	0.013	39.38	48.9	49.67
				4	30	11	17			50	24	35	18	0.100	0.200	0.059	0.431	0.033	0.008	41.06	43.94	41.16
0.45	0.60	0.65	Proposed	1	22	10	11	14	15	47	25	66	34	0.099	0.200	0.067	0.628	0.411	0.521	35.51	45.90	41.95
				2	27	12	13	16	18	69	36	52	28	0.100	0.199	0.059	0.603	0.530	0.670	39.39	39.88	35.20
				3	35	15	16	23	24	50	27	49	26	0.100	0.200	0.057	0.473	0.230	0.410	42.52	45.88	43.30
				4	38	17	23	24	26	49	26	50	27	0.100	0.200	0.057	0.562	0.330	0.540	42.83	45.49	43.17
			LS	1	22	10	11			48	25	60	31	0.100	0.200	0.066	0.604	0.121	0.047	35.25	54.13	57.48
				2	32	15	18			62	33	34	19	0.097	0.197	0.06	0.654	0.092	0.027	40.35	42.33	38.67
				3	35	15	16			50	27	49	26	0.100	0.200	0.057	0.469	0.030	0.006	42.57	48.61	48.92
				4	35	15	16			50	27	49	26	0.100	0.200	0.057	0.469	0.030	0.006	42.57	48.61	48.92
0.50	0.65	0.70	Proposed	1	25	13	15	17	18	54	31	58	33	0.100	0.200	0.061	0.677	0.430	0.560	34.75	42.78	39.07
				2	27	14	16	17	20	65	38	51	30	0.100	0.198	0.056	0.710	0.630	0.740	37.12	39.03	34.36
				3	28	14	15	19	20	50	29	50	29	0.100	0.199	0.055	0.593	0.380	0.550	36.96	41.58	37.94
				4	27	12	16	18	20	50	29	49	29	0.100	0.199	0.054	0.377	0.380	0.580	41.24	40.93	36.34
			LS	1	20	10	11			51	29	58	33	0.099	0.199	0.064	0.588	0.122	0.048	34.53	52.56	55.72
				2	30	15	19			53	31	34	20	0.099	0.200	0.052	0.572	0.065	0.017	38.90	41.85	38.73
				3	25	12	13			47	27	50	29	0.099	0.200	0.056	0.500	0.060	0.017	37.04	48.29	49.48
				4	36	19	23			50	29	39	24	0.099	0.199	0.055	0.691	0.088	0.022	39.97	43.34	41.59

Let us assume that the maximum uninteresting response rate ($p_{0}$) is 5%, and that we are not sure whether the target response rate is 20% or 25%. If the maximum type II error rates are taken as 20% and 10% for the target response rates 20% and 25%, respectively, then our proposed design for $α = 0.05$ is (0/1/2/3/10) (3/28) (4/38) under optimality criterion 1 (O1); see Table 1. Here the three groups are read as $(s_{1}/r_{1}/c_{1}/c_{2}/n_{1})$, $(s/m)$ and $(r/n)$, matching the column names of the design tables. That is, 10 patients would be recruited at the first stage. If none of the patients responds, the trial would be stopped for futility. If exactly one patient responds, the trial would proceed to the second stage to test the target response rate of 20% at a 20% maximum type II error rate, and 28−10 = 18 more patients would be recruited. If more than 3 respondents are observed at the second stage, the null hypothesis would be rejected in favour of the alternative response rate of 20%. If 2 respondents are observed at the first stage, the trial would proceed to the second stage, and 38−10 = 28 more patients would be recruited to test the target response rate of 25% at a 10% maximum type II error rate. If more than 4 respondents are observed, the null hypothesis would be rejected in favour of the alternative response rate of 25%. Otherwise, the drug would be identified as ineffective, and the trial would be stopped. However, if three respondents are observed in the first stage, the trial would be stopped for efficacy, and the null hypothesis would be rejected for the target response rate of 20%. Finally, if more than 3 respondents are observed at the first stage, the trial would be stopped, and the null hypothesis would be rejected for the target response rate of 25%.
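The first-stage rule in this worked example can be sketched in Python as follows. This is only an illustration and not the authors' implementation (which the paper says was written in R); the function name and the returned labels are assumptions introduced here.

```python
def first_stage_decision(x1, s1, r1, c1, c2):
    """Map the number of first-stage responses x1 to an action.

    The boundaries follow the worked example, where
    (s1, r1, c1, c2) = (0, 1, 2, 3). The returned labels are
    illustrative names, not terminology from the paper.
    """
    if x1 <= s1:
        return "stop_futility"
    if x1 <= r1:
        return "continue_test_p1"  # recruit m - n1 more patients
    if x1 <= c1:
        return "continue_test_p2"  # recruit n - n1 more patients
    if x1 <= c2:
        return "stop_efficacy_p1"
    return "stop_efficacy_p2"


# Walk through the possible outcomes of the worked example.
for x1 in range(6):
    print(x1, first_stage_decision(x1, s1=0, r1=1, c1=2, c2=3))
```

With the example boundaries, zero responses leads to a futility stop, one or two responses lead to the second stage (testing 20% or 25%, respectively), and three or more responses lead to an efficacy stop.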

The expected sample sizes are 17.76, 23.29 and 21.26 if the true response rates are 0.05, 0.20 and 0.25, respectively. Note that the expected sample size is higher if $p_{1}$ is the true response rate than if $p_{0}$ or $p_{2}$ is the true response rate. This trend is found for almost every design, with few exceptions. The reason is that the proposed design is less likely to proceed to the second stage if the drug is either futile or highly effective. If $p_{2}$ is true, the drug is more effective than under $p_{1}$, so an early stop for efficacy is more likely. The optimality criterion for the O2 design is to minimise $\max \{E(N \mid p_{i}); i = 0, 1, 2\}$, and in almost every case $E(N \mid p_{1})$ is the highest (one O2 design in Table 2, for $p_{0} = 0.20$, has a marginally larger $E(N \mid p_{0})$). So the proposed O2 design typically has the minimum $E(N \mid p_{1})$ among all the designs fulfilling the error constraints. Although not presented in the paper, the first ten designs under each optimality criterion were computed for every set of parameter values. For $ϕ = (0.25, 0.40, 0.45, 0.05, 0.20, 0.10)$, the O2 design is (9/10/13/16/34) (29/87) (21/61) with 45.58, 46.97 and 41.37 as the expected sample sizes under $p_{0}$, $p_{1}$ and $p_{2}$, respectively. The second-best design under the same optimality criterion is (9/11/13/16/34) (23/68) (22/64), where the expected sample sizes are 44.1, 47.17 and 41.73. The O2 design has $\max(m, n) = 87$, whereas the second design has $\max(m, n) = 68$. Since the expected sample sizes are very similar in this case, the second design may be more attractive to some investigators than the optimal one because of its lower $\max(m, n)$.
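The expected sample sizes quoted for the (0/1/2/3/10) (3/28) (4/38) design can be checked with a short Python sketch. It assumes that the number of first-stage responses is binomial, that the trial continues to total size $m$ when $s_{1} < X_{1} \le r_{1}$ and to total size $n$ when $r_{1} < X_{1} \le c_{1}$, and that it stops otherwise. The helper names are hypothetical, and this is not the authors' R code.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def pet_and_expected_n(p, s1, r1, c1, n1, m, n):
    """Return (PET, E(N)) for a design with futility and efficacy stopping.

    c2 is not needed here: it only decides which alternative is
    accepted at an efficacy stop, not whether the trial stops.
    """
    prob_m = sum(binom_pmf(k, n1, p) for k in range(s1 + 1, r1 + 1))
    prob_n = sum(binom_pmf(k, n1, p) for k in range(r1 + 1, c1 + 1))
    pet = 1.0 - prob_m - prob_n  # stop for futility or for efficacy
    expected_n = n1 + (m - n1) * prob_m + (n - n1) * prob_n
    return pet, expected_n


for p in (0.05, 0.20, 0.25):
    pet, en = pet_and_expected_n(p, s1=0, r1=1, c1=2, n1=10, m=28, n=38)
    print(f"p = {p:.2f}: PET = {pet:.3f}, E(N) = {en:.2f}")
```

Under these assumptions the computed expected sample sizes round to 17.76, 23.29 and 21.26, in agreement with the values reported in the text.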

The O1 designs have some common features that can be observed from Tables 1 and 2. $E(N \mid p_{0})$ is the lowest among the designs under the four optimality criteria. This is expected, as the optimality criterion for these designs minimises the expected sample size under the null hypothesis. The maximum difference in $E(N \mid p_{0})$ between the O2 and O1 designs is observed for $ϕ = (0.25, 0.40, 0.45, 0.05, 0.20, 0.10)$, which is 45.58−38.31 = 7.27. This difference is higher for designs at the 5% significance level than for designs at the 10% significance level. For the same design parameters but at the 10% significance level, the difference is 33.78−31.06 = 2.72. This is because the first-stage sample size $n_{1}$ and the total sample sizes $m$ and $n$ are higher at the smaller significance level. Also, the probability of early termination if the null hypothesis is true, $PET(p_{0})$, is the highest, and the first-stage sample size ($n_{1}$) is the lowest, for designs under optimality criterion 1. If $PET(p_{0})$ is high, the design is less likely to proceed to the second stage when the drug is ineffective. In Equation (9), we have expressed the expected sample size as a weighted sum of $n_{1}$, $m$ and $n$. For O1 designs, $E(N \mid p_{0})$ is the lowest because $w_{1}$, that is $PET(p_{0})$, is the highest, so the expected sample size is dominated by $n_{1}$. Generally, $\max(m, n)$ is higher for the O1 designs than for the other three designs.
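Equation (9) is not reproduced in this section. A form consistent with the description above and with the tabulated values is sketched below; the weight symbols $w_{2}$ and $w_{3}$ and the exact inequalities are assumptions inferred from the worked example, with $X_{1}$ denoting the number of first-stage responses.

$$E(N \mid p) = w_{1} n_{1} + w_{2} m + w_{3} n, \qquad w_{1} = PET(p) = P(X_{1} \le s_{1}) + P(X_{1} > c_{1}), \quad w_{2} = P(s_{1} < X_{1} \le r_{1}), \quad w_{3} = P(r_{1} < X_{1} \le c_{1}).$$

As a check, take the first O1 row of Table 2 ($n_{1} = 10$, $s_{1} = 0$, $r_{1} = 1$, $c_{1} = 2$, $m = 22$, $n = 23$) at $p_{0} = 0.05$. With $X_{1} \sim \text{Bin}(10, 0.05)$, the weights are approximately $w_{1} = 0.6102$, $w_{2} = 0.3151$ and $w_{3} = 0.0746$, so that

$$E(N \mid p_{0}) \approx 0.6102 \times 10 + 0.3151 \times 22 + 0.0746 \times 23 \approx 14.75,$$

which agrees with the tabulated $PET(p_{0}) = 0.610$ and $E(N \mid p_{0}) = 14.75$.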

O2 designs generally have larger $n_{1}$ than the O1 designs but smaller $n_{1}$ than the O3 and O4 designs. Only two exceptions are observed in Tables 1 and 2. For $ϕ = (0.10, 0.25, 0.30, 0.05, 0.20, 0.10)$, $n_{1}$ for the O2 design is 17, which is smaller than $n_{1}$ for the O1 design ($= 18$). For this set of parameters, $PET(p_{0})$ for the O1 design is 0.740, which is the highest among the designs we have computed. The other exception is observed for $ϕ = (0.40, 0.55, 0.60, 0.10, 0.20, 0.10)$, where $n_{1}$ for O2 is 30, which is larger than $n_{1}$ for O3 ($= 27$). Moreover, the O2 designs have the highest $\min \{PET(p_{i}); i = 0, 1, 2\}$ and the smallest $\max \{E(N \mid p_{i}); i = 0, 1, 2\}$ among the four optimal designs. Between the O3 and O4 designs, the O3 designs generally have the smaller $n_{1}$, except where it is the same for both designs. Tables 1 and 2 show that in many cases the first-stage sample sizes are the same under both criteria, and in some cases the two designs coincide. At the 10% significance level (Table 2), the O3 and O4 designs coincide in the first three cases. At the 5% significance level, the two designs coincide in the first and third cases. Of the two, the O3 designs have the larger $PET(p_{0})$, while the O4 designs have the larger $\min \{PET(p_{i}); i = 0, 1, 2\}$.

3.2. Comparison with Lin and Shih's design

We now compare the proposed design with Lin and Shih's (LS) design. The expected sample sizes are notably smaller for the proposed design if either target response rate ($p_{1}$ or $p_{2}$) is true, and the difference is larger if $p_{2}$ is the true response rate. For $ϕ = (0.05, 0.20, 0.25, 0.05, 0.20, 0.10)$, the LS design under optimality criterion O1 is (0/2/9) (3/31) (5/43), and $E(N \mid p_{0})$, $E(N \mid p_{1})$ and $E(N \mid p_{2})$ are 17.23, 31.19 and 34.14, respectively. For the O1 designs, the difference in $E(N \mid p_{1})$ is 31.19−23.29 = 7.90, while the difference in $E(N \mid p_{2})$ is 34.14−21.26 = 12.88. The largest differences in $E(N \mid p_{1})$ and $E(N \mid p_{2})$ are observed for the O1 designs with $ϕ = (0.45, 0.60, 0.65, 0.05, 0.20, 0.10)$, which are 72.38−60.32 = 12.06 and 76.75−54.06 = 22.69, respectively. The smallest differences are observed for the O4 designs with $ϕ = (0.40, 0.55, 0.60, 0.05, 0.20, 0.10)$, which are 59.29−57.96 = 1.33 and 53.98−51.80 = 2.18, respectively.

If $p_{0}$ is the true response rate, the expected sample size of the LS design is generally smaller than that of the proposed design, but the difference is tiny. For $ϕ = (0.05, 0.20, 0.25, 0.05, 0.20, 0.10)$, $E(N \mid p_{0})$ is 17.23 for the LS design, slightly smaller than for the proposed design ($= 17.76$). This happens because the probabilities of early termination under the null hypothesis, $PET(p_{0})$, are high for both the LS and the proposed designs. However, $PET(p_{1})$ and $PET(p_{2})$ are very low for the LS design, which means that it has a very high chance of proceeding to the second stage and therefore a larger expected sample size. For the proposed design, $PET(p_{1})$ and $PET(p_{2})$ are notably larger than for the LS design.

Figure 1 shows $PET(p)$ and $E(N \mid p)$ versus $p$ for $ϕ = (0.05, 0.20, 0.25, 0.05, 0.20, 0.10)$. For the LS design, PET decreases as the true response rate increases. For the proposed design, however, PET starts increasing after reaching a minimum value. Note that the curves for the O3 and O4 designs are not shown since the main goal of those designs is to minimise the maximum sample size. For the LS O1 design, the expected sample size is a monotonic non-decreasing curve that eventually approaches $\max(m, n)$. For the LS O2 design, there is no common trend because of the optimality criterion, but the curve also approaches $\max(m, n)$. For our proposed design, the expected sample size first increases with the true response rate, then decreases after a certain point and eventually reaches the first-stage sample size ($n_{1}$). As stated earlier, $E(N \mid p_{0})$ is smaller for the LS design in this case. Under optimality criterion 1, the expected sample sizes for the LS and proposed designs are very similar if the true response rate is close to $p_{0}$. After a small increase in $p$, the expected sample size of the proposed design becomes much lower than that of the LS design.
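The qualitative shape of these curves can be reproduced with a small Python sketch for the two O1 designs quoted in Section 3: the proposed design (0/1/2/3/10) (3/28) (4/38) and the LS design (0/2/9) (3/31) (5/43). The continuation rule assumed for the LS design (total size $m$ when $s_{1} < X_{1} \le r_{1}$, total size $n$ when $X_{1} > r_{1}$) is an assumption that reproduces the reported 17.23, 31.19 and 34.14. The function names are hypothetical and this is not the code used to draw Figure 1.

```python
from math import comb


def pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def proposed_curve(p, s1=0, r1=1, c1=2, n1=10, m=28, n=38):
    """Return (PET, E(N)) for the proposed O1 design of the worked example."""
    w_m = sum(pmf(k, n1, p) for k in range(s1 + 1, r1 + 1))
    w_n = sum(pmf(k, n1, p) for k in range(r1 + 1, c1 + 1))
    return 1.0 - w_m - w_n, n1 + (m - n1) * w_m + (n - n1) * w_n


def ls_curve(p, s1=0, r1=2, n1=9, m=31, n=43):
    """Return (PET, E(N)) for the LS O1 design; futility is the only early stop."""
    pet = sum(pmf(k, n1, p) for k in range(0, s1 + 1))
    w_m = sum(pmf(k, n1, p) for k in range(s1 + 1, r1 + 1))
    w_n = 1.0 - pet - w_m  # assumed: more than r1 responses continue to n
    return pet, n1 + (m - n1) * w_m + (n - n1) * w_n


grid = [i / 100 for i in range(101)]
proposed_pet = [proposed_curve(p)[0] for p in grid]
p_at_min = grid[proposed_pet.index(min(proposed_pet))]
print("proposed PET is smallest near p =", p_at_min)

for p in (0.05, 0.20, 0.25, 1.0):
    print(p, round(proposed_curve(p)[1], 2), round(ls_curve(p)[1], 2))
```

In this sketch the PET of the LS design falls steadily with $p$ while the PET of the proposed design has an interior minimum. At $p = 1$ the expected sample size equals $n_{1} = 10$ for the proposed design and $\max(m, n) = 43$ for the LS design, matching the limits described above.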

4. Application on Lin and Shih's VBG study

A study was conducted by Lin and Shih [16] to investigate the efficacy of a combination therapy of vinorelbine, bleomycin and gemcitabine (VBG) for treating patients with recurrent or refractory Hodgkin disease. In their study, the maximum uninteresting response rate $p_{0}$ is taken as 40%, and the target response rate may vary from 50% to 60%. For $α = 0.05$, Lin and Shih considered two target response rates, 55% and 60%, at $β_{1} = 0.20$ and $β_{2} = 0.10$. Table 1 shows that $E(N \mid p)$ for the LS design is 43.89 under O1 if the true response rate is 40%. However, if the true response rate is 55% or 60%, then $E(N \mid p)$ is 74.13 or 78.93, respectively, which is considerably higher than the expected sample size when the true response rate is 40%. Under the same setup, the proposed design is (12/14/17/18/28) (40/83) (39/82) under O1. $E(N \mid p)$ for the proposed design is 44.23 when the true response rate is 40%, which is almost the same as for the LS design. But when the true response rate is 55% or 60%, $E(N \mid p)$ is 63.39 or 57.92, respectively, which is notably lower than for the LS design. The same holds for the other three optimality criteria; see Table 1.

5. Comparison with Mander and Thompson's design

The main difference between these two designs lies in the number of target response rates being considered. In the proposed design, against one maximum uninteresting response rate $p_{0}$, we consider two target response rates $p_{1}$ and $p_{2}$, whereas Mander and Thompson's design [17] considers only one target response rate, $p_{1}$. If $p_{1} = p_{2}$ and $β_{1} = β_{2}$, then $s_{1} = r_{1}$, $c_{1} = c_{2}$ and $m = n$, and the proposed design becomes Mander and Thompson's design. Continuing the VBG example of Lin and Shih's study presented in Section 4, it is not possible to test two different target response rates (55% and 60%) against one maximum uninteresting response rate (40%) at the same time by using Mander and Thompson's design. One possible solution may be to conduct two separate tests, 40% vs. 55% and 40% vs. 60%. Appropriate designs for these separate tests are given in Table 3. These designs are computed by a grid search over different combinations of $n$, $n_{1}$, $r$, $r_{1}$ and $r_{2}$ using self-written code in R.

Table 3.

Mander and Thompson's design for $p_{0} = 0.40$ and $p_{1} = 0.55$ or $0.60$ at $β = 0.20$ and $0.10$, respectively.

			First stage			Second stage				True		Expected sample size
$p_{0}$	$p_{1}$	β	$n_{1}$	$r_{1}$	$r_{2}$	n	r	$PET(p_{0})$	$PET(p_{1})$	α	β	$E(N \mid p_{0})$	$E(N \mid p_{1})$	Comment
0.40	0.55	0.20	26	11	17	84	40	0.676	0.237	0.050	0.194	44.78	70.23	$H_{0}\text{-optimal}_{E}$
			41	16	23	69	34	0.530	0.414	0.050	0.199	54.17	57.41	$H_{0}\text{-minimax}_{E}$
			44	19	23	80	40	0.759	0.663	0.049	0.200	52.69	56.12	$H_{1}\text{-optimal}_{E}$
			41	16	23	69	34	0.530	0.414	0.050	0.199	54.17	57.41	$H_{1}\text{-minimax}_{E}$
0.40	0.60	0.10	25	11	17	66	32	0.733	0.231	0.049	0.098	35.93	56.51	$H_{0}\text{-optimal}_{E}$
			29	12	19	54	27	0.639	0.248	0.049	0.099	38.03	47.81	$H_{0}\text{-minimax}_{E}$
			27	10	15	62	32	0.492	0.626	0.048	0.099	44.77	40.09	$H_{1}\text{-optimal}_{E}$
			36	16	21	54	27	0.772	0.561	0.050	0.098	40.10	43.91	$H_{1}\text{-minimax}_{E}$

For testing 40% vs. 55%, the $H_{0}\text{-optimal}_{E}$ design will be (11 17)/26 40/84. That means 26 patients will be recruited at the first stage. If 11 or fewer patients respond, the study will be stopped for futility, and if more than 17 respond, the study will be stopped for efficacy and the null hypothesis rejected for the 55% target response rate. However, if the number of responses is more than 11 and at most 17, we will proceed to the second stage, recruit 58 additional patients, and reject the null hypothesis only if the total number of responses is more than 40. Similarly, for testing 40% vs. 60%, the $H_{0}\text{-optimal}_{E}$ design will be (11 17)/25 32/66. Although Mander and Thompson's design can stop the study early for both futility and efficacy, it cannot mitigate the uncertainty that arises while selecting the target response rate.
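The (11 17)/26 40/84 rule and its operating characteristics under $p_{0} = 0.40$ can be sketched in Python as below. The sketch assumes binomial first-stage responses and uses hypothetical function names; it is not the authors' R grid-search code, and it evaluates one given design rather than searching for an optimal one.

```python
from math import comb


def pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def mt_first_stage(x1, r1, r2):
    """First-stage action for a single-alternative two-stage design."""
    if x1 <= r1:
        return "stop_futility"
    if x1 > r2:
        return "stop_efficacy"
    return "continue"  # recruit n - n1 more patients


def mt_pet_and_expected_n(p, n1, r1, r2, n):
    """Return (PET, E(N)) when the trial continues only if r1 < X1 <= r2."""
    p_continue = sum(pmf(k, n1, p) for k in range(r1 + 1, r2 + 1))
    return 1.0 - p_continue, n1 + (n - n1) * p_continue


pet, expected_n = mt_pet_and_expected_n(0.40, n1=26, r1=11, r2=17, n=84)
print(f"PET(p0) = {pet:.3f}, E(N | p0) = {expected_n:.2f}")
```

Under these assumptions the result is close to the first row of Table 3 ($PET(p_{0}) = 0.676$ and $E(N \mid p_{0}) = 44.78$).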

6. Discussion

A phase II clinical trial aims to determine whether a drug is effective and to screen out ineffective drugs. Phase II is an early phase of clinical development, and recruiting fewer patients is desirable. The adaptive phase II design by Lin and Shih [16] considers only futility as a reason to stop early and has a large expected sample size if the proposed drug is effective. In this paper, we have discussed why efficacy should also be considered as a reason for early termination. A design has been proposed for a single-arm phase II clinical trial that considers efficacy, along with futility, as a reason to stop early. The proposed design can achieve a notable reduction in the expected sample size if the drug is effective, with almost no change in the expected sample size when the drug is ineffective.

One difficulty is that the proposed design is computationally expensive to find. Designs at the 10% significance level take less time than designs at the 5% significance level, because the sample sizes in both stages are notably larger at the 5% level. Designs at the 1% significance level would take even longer. The other difficulty is calculating the values of $c_{2}$. As discussed at the beginning, Kim and Wong [12] proposed an adaptive phase II clinical trial design that allows three target response rates against one null response rate. However, they did not allow early stopping for efficacy and considered futility as the only reason for stopping early. The authors used the particle swarm optimisation (PSO) technique introduced by Kennedy and Eberhart [11] to find the parameters of their optimal design. One possible extension of the proposed design is to use PSO to find the solutions. The design can also be extended to three or more target response rates and their associated type II error rates against one maximum uninteresting response rate. Finally, the findings of the paper should encourage stopping early for both futility and efficacy in two-stage adaptive designs for phase II trials.

Acknowledgments

The authors would like to thank the reviewers for their useful suggestions to improve the paper. The first author also would like to thank the Ministry of Science and Technology, Government of Bangladesh, for providing him the National Science and Technology Fellowship during this work.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Berry S.M., Carlin B.P., Lee J.J., and Muller P., Bayesian Adaptive Methods for Clinical Trials, CRC Press, New York, 2010.
2. Chen T.T., Optimal three-stage designs for phase II cancer clinical trials, Stat. Med. 16 (1998), pp. 2701–2711.
3. Chen K. and Shan M., Optimal and minimax three-stage designs for phase II oncology clinical trials, Contemp. Clin. Trials 29 (2008), pp. 32–41.
4. Englert S. and Kieser M., Improving the flexibility and efficiency of phase II designs for oncology trials, Biometrics 68 (2011), pp. 886–892.
5. Englert S. and Kieser M., Adaptive designs for single-arm phase II trials in oncology, Pharm. Stat. 11 (2012), pp. 241–249.
6. Fleming T.R., One-sample multiple testing procedure for phase II clinical trials, Biometrics 38 (1982), pp. 143–151.
7. Gehan E.A., The determination of the number of patients required in a preliminary and a follow-up trial of a new chemotherapeutic agent, J. Chronic Dis. 13 (1961), pp. 346–353.
8. Jin H. and Yin G., Bayesian enhancement two-stage design with error control for phase II clinical trials, Stat. Med. 39 (2020), pp. 4452–4465.
9. Jung S.-H., Randomized phase II trials with a prospective control, Stat. Med. 27 (2008), pp. 568–583.
10. Jung S.-H., Lee T., Kim K., and George S.L., Admissible two-stage designs for phase II cancer clinical trials, Stat. Med. 23 (2004), pp. 561–569.
11. Kennedy J. and Eberhart R., Particle swarm optimization, Proc. Int. Conf. Neural Networks 4 (1995), pp. 1942–1948.
12. Kim S. and Wong W.K., Extended two-stage adaptive design with three target responses for phase II clinical trial, Stat. Methods Med. Res. 27 (2017), pp. 3628–3642.
13. Lai T.L., Lavori P.W., and Shih M.-C., Sequential design of phase II-III cancer trials, Stat. Med. 31 (2012), pp. 1944–1960.
14. Lee J.J. and Feng L., Randomized phase II designs in cancer clinical trials: Current status and future directions, J. Clin. Oncol. 23 (2005), pp. 4450–4457.
15. Lee J.J. and Liu D.D., A predictive probability design for phase II cancer clinical trials, Clin. Trials 5 (2008), pp. 93–106.
16. Lin Y. and Shih W.J., Adaptive two-stage designs for single-arm phase IIA cancer clinical trials, Biometrics 60 (2004), pp. 482–490.
17. Mander A.P. and Thompson S.G., Two-stage designs optimal under the alternative hypothesis for phase II cancer clinical trials, Contemp. Clin. Trials 31 (2010), pp. 572–578.
18. Mander A.P., Wason J.M.S., Sweeting M.J., and Thompson S.G., Admissible two-stage designs for phase II cancer clinical trials that incorporate the expected sample size under the alternative hypothesis, Pharm. Stat. 11 (2012), pp. 91–96.
19. O'Brien P.C. and Fleming T.R., A multiple testing procedure for clinical trials, Biometrics 35 (1979), pp. 549–556.
20. Sambucini V., A Bayesian predictive strategy for an adaptive two-stage design in phase II clinical trials, Stat. Med. 29 (2010), pp. 1430–1442.
21. Sambucini V., Bayesian predictive monitoring with bivariate binary outcomes in phase II clinical trials, Comput. Stat. Data Anal. 132 (2019), pp. 18–30.
22. Shan G., Wilding G.E., Hutson A.D., and Gerstenberger S., Optimal adaptive two-stage designs for early phase II clinical trials, Stat. Med. 35 (2015), pp. 1257–1266.
23. Shan G., Zhang H., and Jiang T., Adaptive two-stage optimal designs for phase II clinical studies that allow early futility stopping, Seq. Anal. 38 (2019), pp. 199–213.
24. Shi H. and Yin G., Bayesian two-stage design for phase II clinical trials with switching hypothesis tests, Bayesian Anal. 12 (2017), pp. 31–51.
25. Shi H. and Yin G., Two-stage seamless transition design from open-label single-arm to randomized double-arm clinical trials, Stat. Methods Med. Res. 27 (2018), pp. 158–171.
26. Simon R., Optimal two-stage designs for phase II clinical trials, Control. Clin. Trials 10 (1989), pp. 1–10.
27. Ye F. and Shyr Y., Balanced two-stage designs for phase II clinical trials, Clin. Trials 4 (2007), pp. 514–524.
