Journal of Applied Statistics. 2021 Mar 21;49(10):2447–2466. doi: 10.1080/02664763.2021.1904846

Stopping for efficacy in single-arm phase II clinical trials

Rezoanoor Rahman 1, M Iftakhar Alam 1
PMCID: PMC9225313  PMID: 35757036

Abstract

Phase II clinical trials investigate whether a new drug or treatment shows sufficient evidence of effectiveness against the disease under study. Two-stage designs are popular for phase II since they can stop in the first stage if the drug is ineffective. Investigators often face difficulties in determining the target response rate, and adaptive designs can help by setting the target response rate tested in the second stage according to the number of responses observed in the first stage. Popular adaptive designs consider two alternate response rates, and they generally minimise the expected sample size at the maximum uninteresting response rate. Moreover, these designs consider only futility as the reason for early stopping and have high expected sample sizes if the drug is effective. Motivated by this problem, we propose an adaptive design that can terminate a single-arm trial at the first stage for efficacy and conclude which alternate response rate to choose. Comparing the proposed design with a popular adaptive design from the literature reveals that the expected sample size decreases notably if either of the two target response rates is correct. In contrast, the expected sample size remains almost the same under the null hypothesis.

Keywords: Phase II trial, two-stage design, optimal design, single-arm trial, sample size

1. Introduction

After obtaining the dose with an acceptable level of toxicity in phase I, we move to phase II to screen out the drugs that have little or no effect on the disease while minimising the number of patients exposed. Phase II trials can be further divided into single-arm and double-arm trials. Single-arm trials are often known as phase IIa trials, where the drug's efficacy is compared with a fixed standard response rate. Similarly, double-arm trials are known as phase IIb trials, where the experimental drug is compared with other standard or experimental drugs so that the most promising one can be carried to the next phase for large-scale evaluation [1]. Compared to phase IIa trials, phase IIb trials require a larger sample size. Since the paper is devoted to single-arm trials, we restrict ourselves mostly to phase IIa designs, and we use phase II exclusively to mean a phase IIa trial. Fleming [6] proposed a design for phase II that calculates critical values for testing the null hypothesis using the O'Brien and Fleming multiple testing procedure [19]. This design allowed early stopping under controlled type I and II error rates, but there was no attempt to be ‘optimal’ in terms of minimising the expected sample size. Multi-stage designs are more popular than single-stage designs since they can stop the study early if the drug is ineffective. The very first two-stage design was proposed by Gehan [7]. This design was heavily criticised because it has a high probability of going to the second stage even for a poorly performing drug, which contradicts the main idea of using multi-stage designs.

Simon [26] proposed two-stage designs, optimal and minimax, which minimise the expected sample size and the maximum sample size, respectively, under the null hypothesis. The idea behind the two-stage implementation is that it is unethical to proceed further if the drug is not active, so the study should be terminated at the first stage for futility. A problem arises when the drug is efficacious, since Simon's designs do not include efficacy as a stopping rule at the first stage and therefore cannot stop early. The average sample size then approaches the maximum sample size, since the probability of early termination due to futility becomes close to zero. One possible solution to this problem might be constructing designs that minimise the expected sample size under the alternate response rate. Nevertheless, for such designs, the expected sample size under the null response rate would be larger than that of Simon's optimal design. There have been several extensions to Simon's two-stage designs, including the optimal three-stage design [2], the optimal three-stage design stopping for efficacy [3], and admissible designs that balance the optimisation criteria of expected sample size and maximum sample size [10]. The list also includes a predictive probability design [15], balanced two-stage designs [27], an adaptive two-stage optimal design [23], etc. All these papers only consider the optimal design under the null response rate.

Mander and Thompson [17] showed that in situations where an agent is active, Simon's two-stage design is not optimal. The authors proposed designs that also consider efficacy as a reason for early termination. They showed that if a trial can stop early for both futility and efficacy, then the expected sample size reduces in almost every case. The new early stopping rule generally increases the probability of early termination, which reduces the expected sample size. In designing clinical trials, especially in the early investigation of new treatments, researchers often face uncertainty about the variability of the response variable and/or the magnitude of the treatment effect. A natural way to resolve this problem is to choose the alternate response rate with some flexibility. Lin and Shih [16] introduced an adaptive two-stage phase II design that addresses the specification of the alternative response rate and the associated power. They considered two alternate response rates and their associated pre-specified powers. The design takes a primary sample in the first stage and, based on the responses in the first stage, decides which alternate response rate is to be tested. Like Simon's design, this design also considers futility as the only reason for early termination.

Sambucini [20] used a Bayesian predictive strategy to derive an adaptive two-stage design, where the second-stage sample size is not selected in advance but depends on the first-stage responses. Englert and Kieser [4] considered the loss of power incurred when a continuous test statistic is transformed into a discrete one and proposed a method based on the conditional error function principle that directly accounts for the discreteness of the outcome. Englert and Kieser [5] proposed a design that allows an arbitrary modification of the sample size of the second stage using the results of the interim analysis or external information while controlling the type I error rate. Shan et al. [22] proposed an adaptive design that uses a branch-and-bound algorithm to find the optimal design with the smallest expected sample size under the null hypothesis. Kim and Wong [12] developed a design that considers three alternative response rates with their associated powers; it extends Lin and Shih [16] from two alternative response rates to three. Sambucini [21] took efficacy and safety as bivariate binary outcomes and proposed a design using a Bayesian predictive strategy for interim monitoring. Jin and Yin [8] proposed the Bayesian enhancement two-stage design to strengthen the passing criterion to the second stage. Mander et al. [18] combined the maximum sample size and the two expected sample sizes under the null and alternative hypotheses into an expected loss function to find admissible designs.

Jung [9] considered phase II trials randomising patients between a prospective control and an experimental therapy. This design is analogous to Simon's design for a single-arm trial. Lai et al. [13] expanded a randomised phase II study of response rate seamlessly into a randomised phase III study of time to failure. This approach is based on advances in group sequential designs and joint modelling of the response rate and time to the event. Shi and Yin [24] proposed a Bayesian two-stage design with changing hypothesis tests to bridge the single- and double-arm schemes in one phase II clinical trial. Shi and Yin [25] proposed a two-stage design in which the first stage is a single-arm comparison of the experimental drug with the standard response rate, and the second stage imposes a two-arm comparison by adding an active control arm.

This paper is organised as follows. Section 2 presents the methodology of the work. A new design is proposed in Section 2.4 and its computational algorithm is presented in Section 2.5. The numerical results of the proposed design are given in Section 3. Finally, we conclude with a discussion in Section 6.

2. Methodology

Usually, the primary endpoint for a phase II clinical trial is categorised as a response or no response. For cancer trials, the clinical response is either a complete response (the patient is cured completely) or a partial response. Partial response is often defined as a 50% or more tumor volume shrinkage based on a two-dimensional measurement. Assume that p is the true response rate for an experimental drug. Then the hypotheses to be tested are

$$H_0: p \le p_0, \qquad H_1: p \ge p_1,$$

where p0 is the maximum uninteresting response rate and p1 is the minimum desired response rate. Generally, the value of p0 is below 0.3 and the improvement of the target response rate (p1 − p0) is between 0.1 and 0.3 [14].

2.1. Simon's two-stage design

According to Simon's design [26], n1 patients are recruited in the first stage. Let x be the number of responses in the first stage. If x ≤ r1, then the trial is stopped for futility, and the drug is identified as ineffective. If more than r1 responses are observed in the first stage, then the study proceeds to the second stage, and n2 (n1 + n2 = n) more patients are recruited. If the total number of responses observed from the two stages is more than r, then H0 is rejected. Simon's design is indexed by the four numbers r1, r, n1 and n, and is referred to as an ‘r1/n1 r/n’ design. The probability that the design does not proceed to the second stage, that is, the probability of early termination (PET), is

$$\mathrm{PET}(p) = B(n_1, p, r_1).$$

The probability of not recommending a drug to proceed further is

$$\bar{R}(p) = B(n_1, p, r_1) + \sum_{i=r_1+1}^{\min(n_1, r)} b(n_1, p, i) \times B(n - n_1, p, r - i), \tag{1}$$

where b(.) and B(.) are the probability mass function and cumulative distribution function of the binomial distribution, respectively. The expected sample size is then obtained as

$$E(N \mid p) = n_1 \times \mathrm{PET}(p) + n \times \{1 - \mathrm{PET}(p)\}. \tag{2}$$

Both R̄(p) and E(N|p) are expressed as functions of the true response rate p. Let α and β be the type I and type II error probabilities, respectively. For predetermined p0, p1, α and β, an acceptable design should satisfy the error constraints R̄(p0) ≥ 1 − α and R̄(p1) ≤ β. Let Ω be the set of all such designs. Simon's optimal design is the one in Ω that has the minimum expected sample size E(N|p0) under the null response rate. On the other hand, the minimax design is the one that has the smallest E(N|p0) among those designs in Ω with the smallest n.
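
As a minimal sketch of how Equations (1) and (2) can be evaluated, the following Python snippet computes PET, the probability of not recommending the drug, and the expected sample size for a given ‘r1/n1 r/n’ design. The function names are hypothetical, and the design values (1/10 5/29 with p0 = 0.10 and p1 = 0.30) are chosen only for illustration; they are not taken from the tables of this paper.

```python
from math import comb


def binom_pmf(n, p, i):
    """b(n, p, i): probability of exactly i responses among n patients."""
    if i < 0 or i > n:
        return 0.0
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(n, p, i) for i in range(min(k, n) + 1))


def simon_characteristics(r1, n1, r, n, p):
    """Return (PET, R_bar, E[N]) of an 'r1/n1 r/n' design at response rate p."""
    pet = binom_cdf(n1, p, r1)
    # Equation (1): stop for futility, or continue and fail to exceed r in total.
    r_bar = pet + sum(
        binom_pmf(n1, p, i) * binom_cdf(n - n1, p, r - i)
        for i in range(r1 + 1, min(n1, r) + 1)
    )
    # Equation (2): n1 patients if stopped early, n patients otherwise.
    expected_n = n1 * pet + n * (1 - pet)
    return pet, r_bar, expected_n


# Illustrative design values (an assumption, not from this paper's tables).
pet0, r_bar0, en0 = simon_characteristics(r1=1, n1=10, r=5, n=29, p=0.10)
_, r_bar1, _ = simon_characteristics(r1=1, n1=10, r=5, n=29, p=0.30)
print(f"PET(p0) = {pet0:.3f}, E(N|p0) = {en0:.2f}")
print(f"type I error = {1 - r_bar0:.3f}, type II error = {r_bar1:.3f}")
```

A design belongs to Ω when `1 - r_bar0` does not exceed α and `r_bar1` does not exceed β, which is how the two printed error rates would be checked.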

2.2. Mander and Thompson's design

Like Simon's design, Mander and Thompson's design [17] also recruits n1 patients in the first stage and has a similar stopping rule in the second stage. The only exception is that the design stops for efficacy, and rejects H0, if x > r2, where x is the number of responses in the first stage. It proceeds to the second stage only if r1 < x ≤ r2, in which case it recruits n2 further patients, where n = n1 + n2. This design is indexed by five numbers and will be referred to as an ‘(r1 r2)/n1 r/n’ design. The probability of early termination for this design is

$$\mathrm{PET}(p) = B(n_1, p, r_1) + 1 - B(n_1, p, r_2).$$

The probability of not rejecting the null hypothesis is given as

$$\bar{R}(p) = B(n_1, p, r_1) + \sum_{i=r_1+1}^{r_2} b(n_1, p, i) \times B(n - n_1, p, r - i). \tag{3}$$

Then the expected number of patients is

$$E(N \mid p) = n_1 \times \mathrm{PET}(p) + n \times \{1 - \mathrm{PET}(p)\}. \tag{4}$$

As before, we require the error probabilities to be constrained by R̄(p0) ≥ 1 − α and R̄(p1) ≤ β. Let ΩE be the set of all such designs. Mander and Thompson used four optimality criteria and named the resulting designs as follows:

  1. H0-optimalE: E(N|p0) is the smallest.

  2. H0-minimaxE: E(N|p0) is the smallest among those designs in ΩE with the smallest n.

  3. H1-optimalE: E(N|p1) is the smallest.

  4. H1-minimaxE: E(N|p1) is the smallest among those designs in ΩE with the smallest n.
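
The effect of the efficacy boundary r2 on PET and on the expected sample size can be illustrated with a short Python sketch of Equations (3) and (4). The function names and the design values below are hypothetical and serve only to exercise the formulas; setting r2 = n1 switches the efficacy stop off, which gives a futility-only design for comparison.

```python
from math import comb


def binom_pmf(n, p, i):
    """b(n, p, i): probability of exactly i responses among n patients."""
    if i < 0 or i > n:
        return 0.0
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(n, p, i) for i in range(min(k, n) + 1))


def mander_thompson_characteristics(r1, r2, n1, r, n, p):
    """Return (PET, R_bar, E[N]) of an '(r1 r2)/n1 r/n' design at rate p."""
    # Early termination: futility (x <= r1) or efficacy (x > r2).
    pet = binom_cdf(n1, p, r1) + 1 - binom_cdf(n1, p, r2)
    # Equation (3): only first-stage counts r1 < i <= r2 reach stage two.
    r_bar = binom_cdf(n1, p, r1) + sum(
        binom_pmf(n1, p, i) * binom_cdf(n - n1, p, r - i)
        for i in range(r1 + 1, r2 + 1)
    )
    # Equation (4).
    expected_n = n1 * pet + n * (1 - pet)
    return pet, r_bar, expected_n


# Hypothetical design values, evaluated at an assumed active rate p = 0.30.
with_efficacy = mander_thompson_characteristics(1, 4, 10, 5, 29, p=0.30)
futility_only = mander_thompson_characteristics(1, 10, 10, 5, 29, p=0.30)
print(f"PET with efficacy stop: {with_efficacy[0]:.3f}")
print(f"PET futility only:      {futility_only[0]:.3f}")
print(f"E(N) with / without:    {with_efficacy[2]:.2f} / {futility_only[2]:.2f}")
```

In this sketch the efficacy stop raises PET at the active response rate, so the expected sample size falls, which is the behaviour described for designs that stop for both futility and efficacy.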

2.3. Lin and Shih's design

Lin and Shih [16] emphasised the uncertainty that investigators often face while choosing the target response rate. They proposed an adaptive two-stage design with two choices of the target response rates p1 and p2 ( p0<p1<p2). This design has a fixed sample size n1 in the first stage, like Simon's design, but the sample size in the second stage depends on the number of responses observed in the first stage. Let x be the number of responses observed in the first stage. Then Lin and Shih's design proceeds as follows:

  1. If x ≤ s1, stop the trial for futility.

  2. If s1 < x ≤ r1, power the study at (1 − β1) for p = p1 and enter m2 = m − n1 additional patients into the study. Reject the null hypothesis (p ≤ p0) if the total number of responses > s out of m patients.

  3. If x > r1, power the study at (1 − β2) for p = p2 and enter n2 = n − n1 additional patients into the study. Reject the null hypothesis (p ≤ p0) if the total number of responses > r out of n patients.

This design is indexed by the seven numbers s1, r1, n1, s, m, r and n, and we refer to it as (s1/r1/n1) (s/m) (r/n). The probability of terminating the study at the first stage is

$$\mathrm{PET}(p) = B(n_1, p, s_1).$$

The probability of not recommending the drug is

$$\bar{R}(p) = B(n_1, p, s_1) + \sum_{i=s_1+1}^{\min(r_1, s)} b(n_1, p, i) \times B(m - n_1, p, s - i) + \sum_{i=r_1+1}^{\min(r, n_1)} b(n_1, p, i) \times B(n - n_1, p, r - i). \tag{5}$$

Finally, the expected sample size is given as

$$E(N \mid p) = n_1 + (m - n_1) \times \{B(n_1, p, r_1) - B(n_1, p, s_1)\} + (n - n_1) \times \{1 - B(n_1, p, r_1)\}. \tag{6}$$

For predetermined p0, p1, p2, α, β1 and β2, an acceptable design must satisfy the following error constraints

$$\bar{R}(p_0) \ge 1 - \alpha, \qquad \bar{R}(p_1) \le \beta_1, \qquad \text{and} \qquad \bar{R}(p_2) \le \beta_2. \tag{7}$$

Although not required, it is reasonable to let β1 ≥ β2 in practice because we would like to have higher power for detecting more improvement of the new therapy (p2 versus p0). From the feasibility aspect, we need to compromise the power somewhat for less improvement (p1 versus p0) because the sample size cannot be too large for a phase II study. Let Ω be the set of designs that satisfy the error constraints. Lin and Shih [16] proposed designs based on the following optimality conditions.

  1. Optimality type 1 (O1): E(N|p0) is the smallest.

  2. Optimality type 2 (O2): max{E(N|pi);i=0,1,2} is the smallest.

  3. Optimality type 3 (O3): max(n,m) is the smallest among all feasible solutions and E(N|p0) is the smallest among such solutions.

  4. Optimality type 4 (O4): max(n,m) is the smallest among all feasible solutions and max{E(N|pi);i=0,1,2} is the smallest among such solutions.

O1 and O3 are extensions of Simon's ‘optimal design’ and ‘minimax design’ criteria. That is, if p1 = p2 and β1 = β2, then r1 = s1, s = r, m = n, and O1 and O3 reduce to Simon's optimal and minimax designs, respectively.
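
To make Equations (5) and (6) concrete, the sketch below evaluates them in Python for the LS design of optimality type 1 listed in Table 1 for p0 = 0.05, namely (0/2/9) (3/31) (5/43). The function names are hypothetical; only the design values come from the table.

```python
from math import comb


def binom_pmf(n, p, i):
    """b(n, p, i): probability of exactly i responses among n patients."""
    if i < 0 or i > n:
        return 0.0
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(n, p, i) for i in range(min(k, n) + 1))


def lin_shih_characteristics(s1, r1, n1, s, m, r, n, p):
    """Return (PET, R_bar, E[N]) of an (s1/r1/n1) (s/m) (r/n) design at rate p."""
    pet = binom_cdf(n1, p, s1)
    # Equation (5): futility stop, or failure in either second-stage branch.
    r_bar = (
        pet
        + sum(
            binom_pmf(n1, p, i) * binom_cdf(m - n1, p, s - i)
            for i in range(s1 + 1, min(r1, s) + 1)
        )
        + sum(
            binom_pmf(n1, p, i) * binom_cdf(n - n1, p, r - i)
            for i in range(r1 + 1, min(r, n1) + 1)
        )
    )
    # Equation (6): extra patients weighted by the probability of each branch.
    go_p1 = binom_cdf(n1, p, r1) - binom_cdf(n1, p, s1)
    go_p2 = 1 - binom_cdf(n1, p, r1)
    expected_n = n1 + (m - n1) * go_p1 + (n - n1) * go_p2
    return pet, r_bar, expected_n


# LS design, optimality type 1, p0 = 0.05 (values read from Table 1).
pet0, r_bar0, en0 = lin_shih_characteristics(0, 2, 9, 3, 31, 5, 43, p=0.05)
print(f"PET(p0) = {pet0:.3f}, true alpha = {1 - r_bar0:.3f}, E(N|p0) = {en0:.2f}")
```

The printed values can be compared with the corresponding Table 1 entries for PET(p0), the true α and E(N|p0).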

2.4. Proposed design

Like Simon's design, the design by Lin and Shih [16] considers only futility as a reason for early stopping in the first stage. The design therefore has a larger expected sample size if the proposed drug is effective. We propose an adaptive design that considers both futility and efficacy as reasons for early stopping. Although it is relatively easy to set the uninteresting rate p0, the same is not true for p1, the alternate response rate. The difficulty of selecting a single alternate response rate can be reduced by specifying two alternate response rates instead. Therefore, as in Lin and Shih's design, the proposed design also considers two alternate response rates. Based on the responses in the first stage, it determines which target response rate is to be tested and engages patients in the second stage accordingly. Let x be the number of responses in the first stage out of the n1 patients enrolled. The proposed design then proceeds as follows:

  1. If x ≤ s1, stop the trial for futility.

  2. If s1 < x ≤ r1, power the study at (1 − β1) for p ≥ p1 and enter m2 = m − n1 additional patients into the study. Reject the null hypothesis (p ≤ p0) if the total number of responses > s out of m patients.

  3. If r1 < x ≤ c1, power the study at (1 − β2) for p ≥ p2 and enter n2 = n − n1 additional patients into the study. Reject the null hypothesis (p ≤ p0) if the total number of responses > r out of n patients.

  4. If c1 < x ≤ c2, stop for efficacy in the first stage and reject the null hypothesis for H1: p ≥ p1.

  5. If x > c2, stop for efficacy and reject the null hypothesis for H1: p ≥ p2.

This design is indexed by the nine parameters s1, r1, c1, c2, n1, s, m, r and n, where 0 ≤ s1 < r1 < c1 < c2 ≤ n1. It is referred to as (s1/r1/c1/c2/n1) (s/m) (r/n). The probability of early termination in the proposed design is

$$\mathrm{PET}(p) = B(n_1, p, s_1) + 1 - B(n_1, p, c_1).$$

Then the probability of not rejecting H0 is

$$\bar{R}(p) = B(n_1, p, s_1) + \sum_{i=s_1+1}^{\min(r_1, s)} b(n_1, p, i) \times B(m - n_1, p, s - i) + \sum_{i=r_1+1}^{\min(r, c_1)} b(n_1, p, i) \times B(n - n_1, p, r - i). \tag{8}$$

Finally, the expected sample size is

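$$\begin{aligned} E(N \mid p) &= n_1 + (m - n_1) \times \{B(n_1, p, r_1) - B(n_1, p, s_1)\} + (n - n_1) \times \{B(n_1, p, c_1) - B(n_1, p, r_1)\} \\ &= w_1 n_1 + w_2 m + w_3 n, \end{aligned} \tag{9}$$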

where w1 = PET(p) is the probability that the design stops at the first stage, w2 = B(n1, p, r1) − B(n1, p, s1) is the probability that the design proceeds to the second stage and tests for the target response rate p1, and w3 = B(n1, p, c1) − B(n1, p, r1) is the probability that the design proceeds to the second stage and tests for the target response rate p2. Note that w1 + w2 + w3 = 1. For a predetermined ϕ = (p0, p1, p2, α, β1, β2), we find the optimal designs under the four optimality criteria proposed by Lin and Shih [16], and these designs satisfy the error constraints in (7). From Equation (8), we can see that the probability of not rejecting H0 does not depend on c2. The value of c2 is therefore determined from the following hypotheses:

$$H_0: p \le p_1, \qquad H_1: p \ge p_2.$$
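
The operating characteristics of the proposed design can be sketched in Python along the lines of Equations (8) and (9). The function names are hypothetical. The design values are those of the proposed design of optimality type 1 in Table 1 for p0 = 0.05, that is (0/1/2/3/10) (3/28) (4/38); c2 is omitted from the function because none of PET, R̄ and E(N|p) depends on it.

```python
from math import comb


def binom_pmf(n, p, i):
    """b(n, p, i): probability of exactly i responses among n patients."""
    if i < 0 or i > n:
        return 0.0
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(n, p, i) for i in range(min(k, n) + 1))


def proposed_characteristics(s1, r1, c1, n1, s, m, r, n, p):
    """Return (PET, R_bar, E[N], (w1, w2, w3)) at response rate p."""
    b_s1 = binom_cdf(n1, p, s1)
    b_r1 = binom_cdf(n1, p, r1)
    b_c1 = binom_cdf(n1, p, c1)
    # Stop at stage one for futility (x <= s1) or efficacy (x > c1).
    pet = b_s1 + 1 - b_c1
    # Equation (8): efficacy stops reject H0, so they do not enter R_bar.
    r_bar = (
        b_s1
        + sum(
            binom_pmf(n1, p, i) * binom_cdf(m - n1, p, s - i)
            for i in range(s1 + 1, min(r1, s) + 1)
        )
        + sum(
            binom_pmf(n1, p, i) * binom_cdf(n - n1, p, r - i)
            for i in range(r1 + 1, min(r, c1) + 1)
        )
    )
    # Equation (9) in its weighted form w1*n1 + w2*m + w3*n.
    w1, w2, w3 = pet, b_r1 - b_s1, b_c1 - b_r1
    expected_n = w1 * n1 + w2 * m + w3 * n
    return pet, r_bar, expected_n, (w1, w2, w3)


# Proposed design, optimality type 1, p0 = 0.05 (values read from Table 1).
pet0, r_bar0, en0, weights0 = proposed_characteristics(
    s1=0, r1=1, c1=2, n1=10, s=3, m=28, r=4, n=38, p=0.05
)
print(f"PET(p0) = {pet0:.3f}, true alpha = {1 - r_bar0:.3f}, E(N|p0) = {en0:.2f}")
```

The three weights returned by the function sum to one, which gives a quick consistency check on any candidate design.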

2.5. Algorithm

Now we discuss the algorithm to find the optimal solutions. For specified values of p0, p1, p2, α, β1 and β2, feasible values of m and n are calculated first. The range of m is calculated as 0.85 to 1.5 times the sample size of the one-stage design for testing p0 versus p1 at the significance level α and power (1 − β1). This strategy is similar to the one used by Simon [26] in his algorithm. To ensure that the limits of the range are integers, the lower limit is taken as the largest integer that is less than or equal to the calculated lower value. Similarly, the upper limit is taken as the smallest integer that is greater than or equal to the calculated upper value. The feasible range of n is calculated in the same way from the sample size of the one-stage design for testing p0 versus p2 at the significance level α and power (1 − β2).
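
A minimal Python sketch of this range calculation is given below. The text does not state how the one-stage sample size is obtained, so the exact binomial search used here is an assumption, and the function names are hypothetical. The range limits use integer arithmetic so that the floor and ceiling are not affected by floating-point rounding.

```python
from math import comb


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))


def one_stage_design(p0, p1, alpha, beta, n_max=500):
    """Smallest exact one-stage design (n, r) that rejects H0 when X > r.

    Assumption: exact binomial test; other one-stage formulas would
    shift the resulting range slightly.
    """
    for n in range(1, n_max + 1):
        # Smallest cutoff r that keeps the type I error at or below alpha.
        for r in range(n + 1):
            if 1 - binom_cdf(n, p0, r) <= alpha:
                break
        # Accept this n if the power requirement is also met.
        if 1 - binom_cdf(n, p1, r) >= 1 - beta:
            return n, r
    raise ValueError("no one-stage design found up to n_max")


def feasible_range(n_one_stage):
    """Return (floor(0.85 * n), ceil(1.5 * n)) using integer arithmetic."""
    lower = (85 * n_one_stage) // 100
    upper = -((-3 * n_one_stage) // 2)
    return lower, upper


# Illustrative inputs: range of m for p0 = 0.10 versus p1 = 0.30.
n_one, r_one = one_stage_design(0.10, 0.30, alpha=0.05, beta=0.20)
print(n_one, r_one, feasible_range(n_one))
```

The same two functions would be called with p2 and β2 to obtain the range of n.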

For each combination of m, n and n1 in the range {1, min(m, n) − 1}, c2 is first calculated as the {1 − (β1 − β2)}th quantile of the binomial distribution for the n1 first-stage patients, minus 1. Following this, s1 is taken in the range (0, c2), r1 in the range (s1+1, c2), c1 in the range (r1+1, c2), s in the range (s1+1, m), and r in the range (r1+1, m). We then find the designs with parameters s1, r1, c1, c2, n1, s, m, r and n that satisfy the error constraints associated with β1 and β2. Among these, the designs that also fulfil the error constraint associated with α form the ‘set of feasible designs’. For every feasible design, the expected sample sizes E(N|p0), E(N|p1) and E(N|p2) are calculated.
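
The calculation of c2 and the enumeration of first-stage cutoffs can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the quantile is taken from the binomial distribution with n1 trials and response rate p1 (suggested by the hypothesis H0: p ≤ p1 used to set c2), and the cutoffs are enumerated under the strict ordering 0 ≤ s1 < r1 < c1 < c2. With these choices the sketch reproduces the c2 entries of the Table 1 rows used in the example. All function names are hypothetical.

```python
from math import comb


def binom_cdf(n, p, k):
    """B(n, p, k): probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))


def binom_quantile(q, n, p):
    """Smallest k with B(n, p, k) >= q."""
    for k in range(n + 1):
        if binom_cdf(n, p, k) >= q:
            return k
    return n


def critical_c2(n1, p1, beta1, beta2):
    """c2 = {1 - (beta1 - beta2)}th quantile minus 1 (binomial(n1, p1) assumed)."""
    return binom_quantile(1 - (beta1 - beta2), n1, p1) - 1


def first_stage_cutoffs(c2):
    """All (s1, r1, c1) with 0 <= s1 < r1 < c1 < c2 (assumed strict ordering)."""
    return [
        (s1, r1, c1)
        for s1 in range(0, c2)
        for r1 in range(s1 + 1, c2)
        for c1 in range(r1 + 1, c2)
    ]


# First-stage sizes and p1 values taken from rows of Table 1.
for n1, p1 in [(10, 0.20), (12, 0.20), (18, 0.25)]:
    print(n1, p1, critical_c2(n1, p1, beta1=0.20, beta2=0.10))
print(first_stage_cutoffs(3))
```

Each enumerated triple would then be combined with candidate values of s and r and screened against the error constraints.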

To find the designs of optimality type 1, we arrange the designs in order of E(N|p0). For optimality type 2, a vector of max{E(N|pi);i=0,1,2} is calculated and the designs are ordered accordingly. For finding the designs of optimality types 3 and 4, a column containing max(m,n) is first calculated. For optimality type 3, the designs are first ordered according to E(N|p0) and then max(m,n). Finally, for optimality type 4, the ordering is done first according to max{E(N|pi);i=0,1,2} and then max(m,n). For every optimality criterion, the first ten ordered designs are kept and the first design is called the optimal design under that criterion. All the designs are obtained numerically using self-written code in R with parallel computation.
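
The four orderings can be sketched in Python with sort keys. The three candidates below are the proposed designs reported in Table 1 for p0 = 0.05, used here as a small illustrative pool rather than a full feasible set; the labels and function name are hypothetical. For types 3 and 4 the key places max(m, n) first and the expected-sample-size criterion second, which is the result of sorting by the criterion and then stably by max(m, n).

```python
# Candidate designs: second-stage totals m and n, and (E(N|p0), E(N|p1), E(N|p2)).
candidates = [
    {"label": "A", "m": 28, "n": 38, "en": (17.76, 23.29, 21.26)},
    {"label": "B", "m": 28, "n": 26, "en": (18.84, 19.27, 17.28)},
    {"label": "C", "m": 27, "n": 27, "en": (18.60, 19.34, 17.38)},
]

# One sort key per optimality type described in the text.
SORT_KEYS = {
    1: lambda d: d["en"][0],
    2: lambda d: max(d["en"]),
    3: lambda d: (max(d["m"], d["n"]), d["en"][0]),
    4: lambda d: (max(d["m"], d["n"]), max(d["en"])),
}


def rank_designs(designs, optimality_type, keep=10):
    """Order designs under the given optimality type and keep the first few."""
    return sorted(designs, key=SORT_KEYS[optimality_type])[:keep]


best = {k: rank_designs(candidates, k)[0]["label"] for k in SORT_KEYS}
print(best)
```

With this pool, each criterion selects the design that Table 1 lists under the same optimality type.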

3. Numerical results

3.1. Proposed design

We have considered p0 from 0.05 to 0.50 in steps of 0.05, and the differences (p1 − p0) and (p2 − p0) are kept fixed at 0.15 and 0.20, respectively. The level of significance α is taken as 5% and 10%, and the maximum type II errors β1 and β2 are kept fixed at 20% and 10% for the target response rates p1 and p2, respectively. The designs for α=0.05 and α=0.10 are shown in Tables 1 and 2, respectively.
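
For reference, the grid of scenarios described above can be generated with a few lines of Python. The dictionary keys are hypothetical names; the numerical values follow the description in the text.

```python
# Build the scenario grid: ten p0 values for each of the two significance levels.
scenarios = []
for alpha in (0.05, 0.10):
    for step in range(1, 11):
        p0 = round(0.05 * step, 2)
        scenarios.append(
            {
                "p0": p0,
                "p1": round(p0 + 0.15, 2),  # p1 - p0 fixed at 0.15
                "p2": round(p0 + 0.20, 2),  # p2 - p0 fixed at 0.20
                "alpha": alpha,
                "beta1": 0.20,
                "beta2": 0.10,
            }
        )

print(len(scenarios))
print(scenarios[0])
```

Rounding to two decimals keeps the response rates free of floating-point noise when they are printed or compared.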

Table 1.

Proposed and LS designs for α=0.05, β1=0.20 and β2=0.10.

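Column groups: first stage (n1, s1, r1, c1, c2); second stage (m, s, n, r); true error rates (α, β1, β2); probability of early termination (PET); expected sample size (E(N|p)). Blank leading cells repeat the entry from the row above, and LS designs have no c1 or c2.
p0 p1 p2 Design Optimality type n1 s1 r1 c1 c2 m s n r α β1 β2 PET(p0) PET(p1) PET(p2) E(N|p0) E(N|p1) E(N|p2)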
0.05 0.20 0.25 Proposed 1 10 0 1 2 3 28 3 38 4 0.042 0.199 0.086 0.610 0.430 0.531 17.76 23.29 21.26
        2 12 0 1 2 3 28 3 26 3 0.049 0.197 0.080 0.560 0.510 0.640 18.84 19.27 17.28
        3 12 0 1 2 3 27 3 27 3 0.049 0.198 0.080 0.560 0.510 0.640 18.60 19.34 17.38
        4 12 0 1 2 3 27 3 27 3 0.049 0.198 0.080 0.560 0.510 0.640 18.60 19.34 17.38
      LS 1 9 0 2     31 3 43 5 0.049 0.200 0.094 0.630 0.134 0.075 17.23 31.19 34.14
        2 18 0 2     29 3 23 3 0.044 0.199 0.080 0.397 0.018 0.006 24.28 24.43 23.75
        3 21 0 1     26 2 26 3 0.047 0.197 0.076 0.341 0.009 0.002 24.30 25.95 25.99
        4 21 0 1     26 2 26 3 0.047 0.197 0.076 0.341 0.009 0.002 24.30 25.95 25.99
0.10 0.25 0.30 Proposed 1 18 2 3 5 6 37 6 50 8 0.049 0.200 0.078 0.740 0.420 0.530 24.13 34.41 31.82
        2 17 1 3 4 6 40 7 38 7 0.049 0.199 0.068 0.504 0.480 0.690 28.29 28.61 25.12
        3 23 2 4 6 7 38 7 37 6 0.050 0.199 0.067 0.598 0.395 0.576 28.97 31.70 29.06
        4 24 2 4 6 8 38 7 36 6 0.049 0.198 0.066 0.572 0.430 0.620 29.84 31.23 28.72
      LS 1 18 2 3     38 6 49 8 0.048 0.199 0.077 0.734 0.135 0.060 24.40 42.93 45.99
        2 21 2 4     44 8 29 5 0.048 0.199 0.068 0.648 0.075 0.027 28.30 32.80 31.35
        3 18 1 3     37 6 38 7 0.050 0.199 0.068 0.450 0.039 0.014 28.54 36.94 37.57
        4 26 2 4     38 7 35 6 0.050 0.198 0.067 0.511 0.026 0.007 31.54 35.24 35.14
0.15 0.30 0.35 Proposed 1 19 3 4 6 7 54 12 59 13 0.050 0.200 0.074 0.700 0.468 0.578 30.12 39.55 35.43
        2 25 4 6 7 9 57 13 41 10 0.050 0.199 0.065 0.708 0.580 0.700 33.65 35.74 31.65
        3 38 6 9 10 14 46 11 46 10 0.050 0.200 0.061 0.679 0.650 0.840 40.56 40.78 39.31
        4 38 6 9 10 14 46 11 46 10 0.050 0.200 0.061 0.679 0.650 0.840 40.56 40.78 39.31
      LS 1 19 3 6     55 12 46 10 0.050 0.199 0.074 0.684 0.133 0.059 30.22 47.20 48.20
        2 27 4 7     51 12 34 8 0.049 0.198 0.061 0.619 0.059 0.018 35.48 39.57 37.27
        3 38 6 9     46 11 46 10 0.050 0.200 0.061 0.659 0.036 0.008 40.73 45.71 45.94
        4 38 6 9     46 11 46 10 0.050 0.200 0.061 0.659 0.036 0.008 40.73 45.71 45.94
0.20 0.35 0.40 Proposed 1 23 5 6 9 10 56 15 69 19 0.049 0.199 0.068 0.704 0.390 0.500 34.74 49.46 45.19
        2 26 5 8 9 11 63 18 47 14 0.049 0.199 0.059 0.601 0.490 0.660 40.20 42.20 36.46
        3 28 5 6 11 12 52 15 53 15 0.050 0.200 0.058 0.505 0.290 0.460 40.18 45.79 41.48
        4 38 8 9 13 16 53 16 53 15 0.050 0.199 0.057 0.667 0.510 0.720 42.99 45.30 42.18
      LS 1 23 5 6     56 15 66 18 0.049 0.199 0.068 0.695 0.131 0.054 34.67 59.14 62.98
        2 35 8 11     60 17 40 12 0.049 0.196 0.058 0.745 0.089 0.026 40.69 45.81 43.25
        3 31 6 12     53 15 40 13 0.050 0.200 0.058 0.571 0.046 0.013 40.38 48.56 46.48
        4 31 6 12     53 15 40 13 0.050 0.200 0.058 0.571 0.046 0.013 40.38 48.56 46.48
0.25 0.40 0.45 Proposed 1 26 7 8 12 13 55 18 74 24 0.050 0.199 0.064 0.690 0.320 0.420 38.31 56.62 52.73
        2 34 9 10 13 16 87 29 61 21 0.050 0.200 0.055 0.692 0.580 0.730 45.58 46.97 41.38
        3 37 9 11 17 18 58 19 59 20 0.050 0.200 0.054 0.552 0.220 0.400 46.60 54.13 50.26
        4 37 8 14 15 18 59 20 57 19 0.050 0.200 0.054 0.411 0.420 0.650 49.92 49.54 44.50
      LS 1 23 6 7     59 19 74 24 0.049 0.197 0.066 0.654 0.124 0.051 38.41 65.98 70.44
        2 38 9 14     65 22 41 15 0.050 0.200 0.058 0.513 0.027 0.006 50.32 50.18 45.62
        3 37 9 11     58 19 59 20 0.050 0.200 0.054 0.550 0.035 0.008 46.64 58.14 58.79
        4 42 9 16     59 20 51 17 0.050 0.200 0.058 0.371 0.009 0.001 52.53 54.58 52.81
0.30 0.45 0.50 Proposed 1 28 9 10 14 15 57 22 80 30 0.050 0.199 0.061 0.690 0.350 0.470 41.21 59.45 54.50
        2 34 11 13 15 18 75 29 71 28 0.050 0.200 0.053 0.720 0.560 0.700 45.15 50.82 44.52
        3 39 12 14 19 21 63 24 64 25 0.050 0.200 0.051 0.622 0.310 0.510 48.22 56.02 51.16
        4 37 10 11 17 20 60 23 64 25 0.050 0.200 0.051 0.437 0.410 0.630 51.63 52.92 46.90
      LS 1 28 9 10     58 22 77 29 0.049 0.200 0.061 0.682 0.119 0.044 41.16 69.38 73.94
        2 42 14 18     70 27 45 19 0.050 0.200 0.056 0.743 0.085 0.022 48.54 53.94 49.90
        3 39 12 16     64 25 63 24 0.050 0.199 0.051 0.618 0.050 0.012 48.50 62.11 62.87
        4 37 9 17     64 25 45 18 0.049 0.200 0.051 0.289 0.008 0.001 55.95 56.42 52.02
0.35 0.50 0.55 Proposed 1 27 10 11 15 16 77 33 79 34 0.050 0.198 0.06 0.678 0.340 0.450 43.47 60.87 55.50
        2 34 12 16 17 20 77 34 59 27 0.050 0.200 0.05 0.616 0.490 0.660 50.02 53.36 45.77
        3 41 14 15 23 24 66 30 66 29 0.050 0.200 0.049 0.528 0.200 0.390 52.80 60.89 56.24
        4 47 17 18 25 27 66 30 66 29 0.050 0.199 0.049 0.634 0.320 0.550 53.95 59.93 55.55
      LS 1 27 10 11     69 30 80 34 0.050 0.198 0.06 0.670 0.124 0.046 43.10 72.37 76.97
        2 43 17 20     87 38 45 21 0.050 0.200 0.057 0.785 0.111 0.030 50.66 56.09 50.70
        3 42 15 22     66 29 65 29 0.050 0.200 0.049 0.608 0.044 0.009 51.41 64.62 65.20
        4 50 19 26     66 29 62 29 0.050 0.200 0.049 0.726 0.059 0.012 54.36 63.71 63.36
0.40 0.55 0.60 Proposed 1 28 12 14 17 18 83 40 82 39 0.050 0.199 0.06 0.703 0.350 0.450 44.23 63.39 57.92
        2 39 17 20 21 24 83 41 67 33 0.050 0.199 0.048 0.763 0.600 0.730 49.00 54.68 47.94
        3 37 15 19 22 23 70 34 70 35 0.049 0.199 0.047 0.602 0.290 0.480 50.13 60.28 54.24
        4 43 17 23 24 27 69 34 70 34 0.050 0.197 0.045 0.554 0.430 0.670 54.62 57.96 51.80
      LS 1 26 11 12     79 38 82 39 0.050 0.200 0.062 0.674 0.135 0.052 43.89 74.13 78.93
        2 40 17 21     81 40 45 23 0.050 0.199 0.049 0.689 0.077 0.019 51.36 57.51 51.75
        3 36 14 19     69 34 66 32 0.050 0.200 0.047 0.518 0.038 0.008 51.77 66.11 66.13
        4 39 15 22     69 34 45 23 0.050 0.200 0.046 0.491 0.028 0.005 53.95 59.29 53.98
0.45 0.60 0.65 Proposed 1 33 16 17 21 22 74 40 79 42 0.050 0.199 0.052 0.729 0.400 0.540 44.94 60.32 54.06
        2 40 19 23 24 27 77 42 57 31 0.050 0.199 0.044 0.704 0.510 0.700 50.53 55.40 48.60
        3 50 23 26 32 33 67 36 68 37 0.050 0.199 0.044 0.616 0.270 0.510 56.66 63.05 58.78
        4 45 19 26 28 30 68 37 67 36 0.050 0.200 0.044 0.420 0.340 0.600 58.32 59.96 54.00
      LS 1 31 15 16     74 39 79 42 0.050 0.200 0.054 0.713 0.128 0.042 44.23 72.38 76.75
        2 43 20 25     75 41 46 26 0.049 0.199 0.044 0.639 0.051 0.010 53.68 57.68 51.92
        3 38 16 18     68 36 68 37 0.050 0.200 0.044 0.425 0.019 0.003 55.26 67.42 67.90
        4 46 19 27     68 37 66 35 0.050 0.200 0.044 0.363 0.008 0.001 59.97 66.79 66.44
0.50 0.65 0.70 Proposed 1 30 16 18 21 22 76 44 80 47 0.050 0.199 0.051 0.716 0.350 0.470 43.45 61.57 55.94
        2 39 21 23 25 28 81 48 72 43 0.050 0.199 0.043 0.765 0.590 0.740 48.19 54.13 47.42
        3 41 21 24 29 30 67 40 68 40 0.050 0.199 0.042 0.625 0.220 0.410 50.87 61.76 56.86
        4 45 23 26 30 32 68 40 67 40 0.050 0.198 0.041 0.625 0.390 0.640 53.52 58.54 52.95
      LS 1 30 16 17     69 40 79 46 0.049 0.200 0.052 0.708 0.126 0.040 43.21 71.88 76.59
        2 43 23 27     77 46 46 28 0.049 0.197 0.041 0.729 0.079 0.016 51.20 56.84 51.39
        3 53 26 32     66 39 67 40 0.050 0.200 0.042 0.500 0.012 0.001 59.55 66.56 66.90
        4 38 16 22     67 40 66 39 0.050 0.200 0.042 0.209 0.003 0.000 60.82 66.13 66.07

Table 2.

Proposed and LS designs for α=0.10, β1=0.20 and β2=0.10.

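Column groups: first stage (n1, s1, r1, c1, c2); second stage (m, s, n, r); true error rates (α, β1, β2); probability of early termination (PET); expected sample size (E(N|p)). Blank leading cells repeat the entry from the row above, and LS designs have no c1 or c2.
p0 p1 p2 Design Optimality type n1 s1 r1 c1 c2 m s n r α β1 β2 PET(p0) PET(p1) PET(p2) E(N|p0) E(N|p1) E(N|p2)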
0.05 0.20 0.25 Proposed 1 10 0 1 2 3 22 2 23 2 0.092 0.198 0.092 0.610 0.430 0.530 14.75 17.15 15.91
        2 12 0 1 2 3 22 2 20 2 0.082 0.194 0.086 0.560 0.51 0.641 16.20 16.33 15.13
        3 12 0 1 2 3 21 2 21 2 0.080 0.197 0.087 0.560 0.51 0.641 15.96 16.41 15.23
        4 12 0 1 2 3 21 2 21 2 0.080 0.197 0.087 0.560 0.51 0.641 15.96 16.41 15.23
      LS 1 9 0 1     28 2 27 2 0.092 0.199 0.100 0.630 0.134 0.075 14.46 23.38 24.75
        2 12 0 1     24 2 18 2 0.086 0.200 0.093 0.540 0.069 0.032 16.81 18.82 18.57
        3 17 0 1     18 1 20 2 0.091 0.200 0.087 0.418 0.023 0.008 18.00 19.74 19.89
        4 17 0 1     18 1 20 2 0.091 0.200 0.087 0.418 0.023 0.008 18.00 19.74 19.89
0.10 0.25 0.30 Proposed 1 13 1 2 3 4 32 5 34 5 0.097 0.200 0.088 0.656 0.542 0.643 19.74 22.2 20.22
        2 15 1 2 3 5 29 5 32 5 0.098 0.198 0.081 0.605 0.619 0.738 20.92 21.01 19.17
        3 20 1 3 4 7 28 5 27 4 0.099 0.200 0.080 0.435 0.609 0.77 24.43 22.93 21.71
        4 20 1 3 4 7 28 5 27 4 0.099 0.200 0.080 0.435 0.609 0.77 24.43 22.93 21.71
      LS 1 14 1 2     26 4 34 5 0.095 0.199 0.085 0.585 0.101 0.047 20.25 30.54 32.14
        2 17 1 3     32 5 22 4 0.091 0.200 0.084 0.482 0.05 0.019 23.95 24.78 23.73
        3 20 1 3     28 5 27 4 0.099 0.200 0.080 0.392 0.024 0.008 24.73 27.03 27.05
        4 20 1 3     28 5 27 4 0.099 0.200 0.080 0.392 0.024 0.008 24.73 27.03 27.05
0.15 0.30 0.35 Proposed 1 15 2 4 5 6 39 8 43 9 0.099 0.200 0.084 0.621 0.405 0.497 24.27 30.1 27.91
        2 20 3 4 5 8 41 9 40 9 0.100 0.197 0.074 0.715 0.691 0.799 25.88 26.32 24.09
        3 28 3 6 7 11 34 7 34 8 0.100 0.200 0.075 0.426 0.651 0.822 31.45 30.09 29.07
        4 28 3 6 7 11 34 7 34 8 0.100 0.200 0.075 0.426 0.651 0.822 31.45 30.09 29.07
      LS 1 15 2 3     34 7 45 9 0.098 0.199 0.085 0.604 0.127 0.062 24.47 39.32 41.93
        2 22 3 5     41 9 26 6 0.096 0.199 0.077 0.575 0.068 0.025 28.57 29.41 27.98
        3 26 2 5     34 8 33 7 0.100 0.200 0.075 0.230 0.007 0.001 31.98 33.11 33.05
        4 26 2 5     34 8 33 7 0.100 0.200 0.075 0.230 0.007 0.001 31.98 33.11 33.05
0.20 0.35 0.40 Proposed 1 17 3 4 6 7 42 11 48 13 0.100 0.198 0.074 0.587 0.484 0.600 28.56 32.20 28.97
        2 22 4 5 7 10 44 12 43 12 0.099 0.199 0.070 0.599 0.600 0.720 30.61 30.53 27.57
        3 23 4 5 8 10 39 10 40 11 0.100 0.199 0.070 0.528 0.470 0.630 30.83 31.95 29.24
        4 24 4 5 8 10 38 10 40 11 0.099 0.199 0.070 0.496 0.520 0.69 31.67 31.61 28.98
      LS 1 17 3 5     39 10 52 14 0.099 0.200 0.076 0.549 0.103 0.046 28.30 44.28 47.55
        2 25 5 7     53 15 29 8 0.100 0.194 0.068 0.617 0.083 0.029 33.11 34.03 31.86
        3 23 4 5     40 10 40 11 0.098 0.200 0.070 0.501 0.055 0.019 31.49 39.06 39.68
        4 26 4 8     40 11 33 9 0.099 0.200 0.069 0.383 0.024 0.007 34.22 35.53 34.49
0.25 0.40 0.45 Proposed 1 20 5 6 9 10 44 14 55 17 0.098 0.200 0.076 0.631 0.370 0.460 31.06 40.67 37.94
        2 21 5 7 8 10 58 19 42 14 0.099 0.199 0.07 0.623 0.570 0.700 33.78 34.05 29.88
        3 25 5 10 11 12 43 14 43 13 0.099 0.200 0.067 0.389 0.300 0.470 36.00 37.65 34.61
        4 31 7 8 12 15 41 13 43 14 0.100 0.199 0.067 0.502 0.510 0.710 36.66 36.76 34.50
      LS 1 20 5 6     43 13 54 17 0.100 0.199 0.075 0.617 0.126 0.055 31.16 48.36 51.30
        2 28 7 10     49 16 32 11 0.097 0.199 0.068 0.600 0.074 0.024 35.25 37.22 35.12
        3 25 5 10     43 14 39 12 0.100 0.200 0.067 0.378 0.029 0.009 36.07 40.81 40.38
        4 24 4 10     43 14 33 11 0.099 0.200 0.067 0.247 0.013 0.004 38.10 39.25 37.47
0.30 0.45 0.50 Proposed 1 20 6 9 10 11 55 20 59 22 0.100 0.200 0.075 0.625 0.380 0.470 33.24 42.36 39.27
        2 26 8 10 11 14 61 23 46 18 0.100 0.200 0.067 0.688 0.630 0.730 35.96 36.65 32.70
        3 29 8 11 15 15 46 17 47 18 0.100 0.199 0.064 0.483 0.220 0.370 37.92 42.75 40.26
        4 34 10 13 15 18 47 18 46 17 0.100 0.199 0.064 0.581 0.520 0.710 39.36 40.01 37.60
      LS 1 20 6 10     55 20 44 16 0.100 0.199 0.075 0.608 0.130 0.058 33.53 47.71 48.45
        2 32 10 13     53 20 35 14 0.097 0.199 0.066 0.644 0.082 0.025 38.23 40.07 37.87
        3 29 8 11     46 17 47 18 0.099 0.200 0.064 0.479 0.043 0.012 37.99 45.99 46.66
        4 34 10 13     47 18 46 17 0.098 0.200 0.064 0.554 0.047 0.012 39.68 45.66 45.96
0.35 0.50 0.55 Proposed 1 22 8 9 12 13 54 22 65 27 0.100 0.198 0.073 0.665 0.400 0.500 34.83 46.29 42.86
        2 24 8 10 12 14 61 26 48 21 0.099 0.200 0.064 0.568 0.500 0.640 38.15 38.65 33.98
        3 26 8 9 14 15 45 19 49 21 0.100 0.200 0.062 0.426 0.320 0.480 38.56 41.54 37.83
        4 32 11 12 16 19 49 21 49 21 0.100 0.198 0.061 0.578 0.490 0.670 39.17 40.75 37.63
      LS 1 20 7 8     54 22 60 25 0.099 0.200 0.076 0.601 0.132 0.058 34.99 54.02 57.24
        2 30 11 14     58 25 34 15 0.097 0.199 0.063 0.655 0.100 0.033 38.10 41.46 38.61
        3 30 10 15     49 21 39 17 0.100 0.199 0.062 0.508 0.049 0.014 39.05 43.78 42.29
        4 29 9 15     49 21 34 16 0.099 0.200 0.062 0.408 0.031 0.008 40.54 43.05 40.31
0.40 0.55 0.60 Proposed 1 24 10 11 15 15 50 23 62 29 0.100 0.200 0.069 0.658 0.310 0.380 35.36 49.03 46.78
        2 30 12 15 16 19 54 26 51 25 0.100 0.199 0.059 0.627 0.570 0.720 38.82 39.80 36.01
        3 27 10 15 16 17 50 24 49 23 0.100 0.200 0.060 0.472 0.310 0.470 39.13 42.73 38.99
        4 30 11 16 17 19 50 24 47 23 0.100 0.200 0.060 0.452 0.390 0.590 40.87 41.72 37.86
      LS 1 22 9 13     58 27 60 28 0.100 0.200 0.070 0.624 0.133 0.055 35.57 53.77 56.92
        2 33 14 17     56 27 37 18 0.100 0.198 0.060 0.681 0.101 0.031 39.11 42.43 40.20
        3 25 9 14     50 24 50 23 0.100 0.200 0.060 0.425 0.044 0.013 39.38 48.90 49.67
        4 30 11 17     50 24 35 18 0.100 0.200 0.059 0.431 0.033 0.008 41.06 43.94 41.16
0.45 0.60 0.65 Proposed 1 22 10 11 14 15 47 25 66 34 0.099 0.200 0.067 0.628 0.411 0.521 35.51 45.90 41.95
        2 27 12 13 16 18 69 36 52 28 0.100 0.199 0.059 0.603 0.530 0.670 39.39 39.88 35.20
        3 35 15 16 23 24 50 27 49 26 0.100 0.200 0.057 0.473 0.230 0.410 42.52 45.88 43.30
        4 38 17 23 24 26 49 26 50 27 0.100 0.200 0.057 0.562 0.330 0.540 42.83 45.49 43.17
      LS 1 22 10 11     48 25 60 31 0.100 0.200 0.066 0.604 0.121 0.047 35.25 54.13 57.48
        2 32 15 18     62 33 34 19 0.097 0.197 0.060 0.654 0.092 0.027 40.35 42.33 38.67
        3 35 15 16     50 27 49 26 0.100 0.200 0.057 0.469 0.030 0.006 42.57 48.61 48.92
        4 35 15 16     50 27 49 26 0.100 0.200 0.057 0.469 0.030 0.006 42.57 48.61 48.92
0.50 0.65 0.70 Proposed 1 25 13 15 17 18 54 31 58 33 0.100 0.200 0.061 0.677 0.430 0.560 34.75 42.78 39.07
        2 27 14 16 17 20 65 38 51 30 0.100 0.198 0.056 0.710 0.630 0.740 37.12 39.03 34.36
        3 28 14 15 19 20 50 29 50 29 0.100 0.199 0.055 0.593 0.380 0.550 36.96 41.58 37.94
        4 27 12 16 18 20 50 29 49 29 0.100 0.199 0.054 0.377 0.380 0.580 41.24 40.93 36.34
      LS 1 20 10 11     51 29 58 33 0.099 0.199 0.064 0.588 0.122 0.048 34.53 52.56 55.72
        2 30 15 19     53 31 34 20 0.099 0.200 0.052 0.572 0.065 0.017 38.90 41.85 38.73
        3 25 12 13     47 27 50 29 0.099 0.200 0.056 0.500 0.060 0.017 37.04 48.29 49.48
        4 36 19 23     50 29 39 24 0.099 0.199 0.055 0.691 0.088 0.022 39.97 43.34 41.59

Let us assume that the maximum uninteresting response rate (p0) is 5%, and we are not sure whether the target response rate is 20% or 25%. If the maximum type II error rates are taken as 20% and 10% for the target response rates 20% and 25%, respectively, then our proposed design for α=0.05 is (0/1/2/3/10) (3/28) (4/38) under optimality criterion 1 (O1): see Table 1. That is, 10 patients would be recruited at the first stage. If none of the patients respond, the trial would be stopped for futility. If one patient responds, the trial will proceed to the second stage to test the target response rate of 20% at a 20% maximum type II error rate, and 28−10 = 18 more patients would be recruited. If more than 3 respondents are observed at the second stage, the null hypothesis would be rejected for the alternate response rate 20%. If 2 respondents are observed at the first stage, the trial will proceed to the second stage, and 38−10 = 28 more patients would be recruited to test the target response rate of 25% at a 10% maximum type II error rate. If more than 4 respondents are observed, the null hypothesis would be rejected for the alternate response rate 25%. Otherwise, the drug would be identified as ineffective, and the trial would be stopped. However, if three respondents are observed in the first stage, the trial would be stopped for efficacy, and the null hypothesis would be rejected for the target response rate 20%. Finally, if more than 3 respondents are observed at the first stage, the trial would be stopped, and the null hypothesis would be rejected for the target response rate 25%.
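The first-stage rule described above can be sketched as a small lookup. This is only an illustrative sketch: the function name and the argument name `cutoffs` are hypothetical, and reading the four leading numbers of (0/1/2/3/10) as "less than or equal to" boundaries is an assumption that matches the worked example but is not spelled out in this section.

```python
def first_stage_decision(x1, cutoffs=(0, 1, 2, 3)):
    """Map the number of first-stage responses x1 to an action.

    cutoffs holds the four first-stage boundaries of the design
    (0/1/2/3/10); treating them as inclusive upper limits is an
    assumption made for this sketch.
    """
    a, b, c, d = cutoffs
    if x1 <= a:
        return "stop for futility"
    if x1 <= b:
        return "continue to stage 2: test p1 = 0.20 with m = 28"
    if x1 <= c:
        return "continue to stage 2: test p2 = 0.25 with n = 38"
    if x1 <= d:
        return "stop for efficacy: reject H0 for p1 = 0.20"
    return "stop for efficacy: reject H0 for p2 = 0.25"


# Walk through every possible outcome of the 10 first-stage patients.
for x1 in range(0, 11):
    print(x1, "->", first_stage_decision(x1))
```

Only one or two first-stage responses lead to further recruitment under this design; every other outcome ends the trial after 10 patients.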

The expected sample sizes are 17.76, 23.29 and 21.26 if the true response rates are 0.05, 0.20 and 0.25, respectively. Note that the expected sample size is higher if p1 is the true response rate than if p0 or p2 is the true response rate. This trend is found for almost every design, with few exceptions. The reason is that the proposed design is less likely to proceed to the second stage if the drug is either futile or highly effective. If p2 is true, then we can say that the drug is more effective than it was at p1. The optimality criterion for the O2 design is to minimise max {E(N|pi);i=0,1,2}, and in every case E(N|p1) is the highest. So we can say that our proposed O2 design has the minimum E(N|p1) among all the designs fulfilling the error constraints. Although not presented in the paper, the first ten designs under each optimality criterion are computed for every set of parameter values. For ϕ =(0.25,0.40,0.45,0.05,0.20,0.10), the O2 design is (9/10/13/16/34) (29/87) (21/61) with 45.58, 46.97 and 41.37 as the expected sample sizes under p0, p1 and p2, respectively. Though not presented here, the second design under the same optimality criterion is (9/11/13/16/34) (23/68) (22/64), where the expected sample sizes are 44.1, 47.17 and 41.73. The O2 design has max(m,n)=87, and for the second design, max(m,n)=68. The second design may be more attractive to some investigators than the optimal one because of its lower max(m,n). Note that the expected sample sizes are very similar in this case.
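As a check on the arithmetic, the expected sample sizes of the O1 design (0/1/2/3/10) (3/28) (4/38) can be recomputed from first-stage binomial probabilities. The sketch below assumes that the trial continues with m = 28 only when exactly one response is seen and with n = 38 only when exactly two are seen, as in the description of this design; the function and variable names are illustrative and not the authors' code.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_sample_size(p, n1=10, m=28, n=38, cutoffs=(0, 1, 2, 3)):
    """Return (E(N|p), PET(p)) as a weighted sum of n1, m and n.

    The weights are the probabilities of stopping at stage 1 (PET),
    continuing to a total of m patients, and continuing to n patients.
    """
    a, b, c, _ = cutoffs
    w_m = sum(binom_pmf(k, n1, p) for k in range(a + 1, b + 1))
    w_n = sum(binom_pmf(k, n1, p) for k in range(b + 1, c + 1))
    pet = 1 - w_m - w_n
    return n1 * pet + m * w_m + n * w_n, pet


for p in (0.05, 0.20, 0.25):
    en, pet = expected_sample_size(p)
    print(f"p = {p:.2f}: E(N) = {en:.2f}, PET = {pet:.3f}")
```

Under these assumptions the three expected sample sizes come out as 17.76, 23.29 and 21.26, in agreement with the values quoted above.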

The O1 designs have some common features that can be observed from Tables 1 and 2. E(N|p0) is the lowest among the designs under the four optimality criteria. This is expected, as the optimality criterion for these designs ensures the smallest expected sample size under the null hypothesis. The maximum difference in E(N|p0) between O2 and O1 is observed for ϕ=(0.25,0.40,0.45,0.05,0.20,0.10), which is 45.58−38.31 = 7.27. This difference is higher for designs at the 5% significance level than for designs at the 10% significance level. For the same values of the design parameters but at the 10% significance level, the difference is 33.78−31.06 = 2.72. This is because the first-stage sample size n1 and the total sample sizes m and n are higher at the smaller significance level. Also, the probability of early termination if the null hypothesis is true, PET(p0), is the highest, and the sample size in the first stage (n1) is the lowest for designs under optimality criterion 1. If PET(p0) is high, then the design is less likely to proceed to the second stage when the drug is ineffective. In Equation (9), we have expressed the expected sample size as a weighted sum of n1, m and n. For O1 designs, E(N|p0) is the lowest because w1, or PET(p0), is the highest and the expected sample size is dominated by n1. Generally, max(m,n) for O1 designs is higher than for the other three designs.

O2 designs generally have larger n1 than the O1 designs but smaller n1 than the O3 and O4 designs. Only two exceptions are observed in Tables 1 and 2. For ϕ=(0.10,0.25,0.30,0.05,0.20,0.10), n1 for the O2 design is 17, which is smaller than n1 for the O1 design (=18). For this set of parameters, PET(p0) for the O1 design is 0.740, which is the highest for the designs we have computed. Another exception is observed for ϕ=(0.40,0.55,0.60,0.10,0.20,0.10), where n1 for O2 is 30, which is larger than n1 for O3 (=27). Moreover, the O2 designs have the highest min{PET(pi);i=0,1,2} and the smallest max {E(N|pi);i=0,1,2} among the four optimal designs. Between the O3 and O4 designs, O3 designs generally have smaller n1, except in the cases where it is the same for both designs. Tables 1 and 2 show that in many cases the first-stage sample sizes are the same under both criteria, and in some cases the two designs coincide. At the 10% significance level in Table 2, the O3 and O4 designs coincide in the first three cases. At the 5% significance level, these two designs coincide in the first and third cases. Of these two, the O3 designs have larger PET(p0) while the O4 designs have larger min{PET(pi);i=0,1,2}.

3.2. Comparison with Lin and Shih's design

We now compare the proposed design with Lin and Shih's (LS) design. The expected sample sizes are notably smaller for the proposed design if one of the target response rates (p1 or p2) is true. The difference is higher if p2 is the true response rate. For ϕ= (0.05, 0.20, 0.25, 0.05, 0.20, 0.10), the LS design under the optimality criterion O1 is (0/2/9) (3/31) (5/43), and E(N|p0), E(N|p1) and E(N|p2) are 17.23, 31.19 and 34.14, respectively. The difference in E(N|p1) for the O1 design is 31.19−23.29 = 7.90, while the difference in E(N|p2) is 34.14−21.26 = 12.88. The highest differences in E(N|p1) and E(N|p2) are observed for the O1 design with ϕ= (0.45, 0.60, 0.65, 0.05, 0.20, 0.10), which are 72.38−60.32 = 12.06 and 76.75−54.06 = 22.69, respectively. The lowest differences are observed for the O4 designs with ϕ= (0.40, 0.55, 0.60, 0.05, 0.20, 0.10), which are 59.29−57.96 = 1.33 and 53.98−51.80 = 2.18, respectively.

If p0 is the true response rate, the expected sample size of the LS design is generally smaller than that of the proposed design, but the difference is small. For ϕ= (0.05, 0.20, 0.25, 0.05, 0.20, 0.10), E(N|p0) for the LS design is 17.23, which is only slightly smaller than that of the proposed design (=17.76). This happens because the probability of early termination under the null hypothesis, PET(p0), is high for both the LS and the proposed designs. However, PET(p1) and PET(p2) for the LS design are very low, which means that this design has a very high chance of proceeding to the second stage and therefore results in larger expected sample sizes. For the proposed design, PET(p1) and PET(p2) are notably larger than for the LS design.

Figure 1 shows PET(p) and E(N|p) versus p for ϕ= (0.05, 0.20, 0.25, 0.05, 0.20, 0.10). For the LS design, PET decreases as the true response rate increases. For the proposed design, however, PET starts increasing after reaching a minimum value. Note that the curves for the O3 and O4 designs are not shown since the main goal of those designs is to minimise the maximum sample size. For the LS O1 design, the expected sample size is a monotonic non-decreasing curve that eventually approaches max(m,n). For the LS O2 design, there is no common trend because of the optimality criterion, but the curve also approaches max(m,n). For our proposed design, the expected sample size curve first increases as the true response rate increases. However, after a certain point, it starts decreasing and eventually reaches the sample size at the first stage (n1). As stated earlier, E(N|p0) is smaller for the LS design in this case. Under optimality criterion 1, the expected sample sizes for the LS and proposed designs are very similar if the true response rate is close to p0. After a small increase in p, the expected sample size of the proposed design becomes much lower than that of the LS design.
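The qualitative behaviour of the two O1 curves can be reproduced numerically. The following sketch assumes the O1 designs quoted in the text, (0/1/2/3/10) (3/28) (4/38) for the proposed design and (0/2/9) (3/31) (5/43) for the LS design, and assumes that the LS design stops early only when no response is observed, moves to m = 31 after one or two responses and to n = 43 otherwise. Function names are illustrative.

```python
from math import comb


def pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def proposed(p, n1=10, m=28, n=38):
    """(PET, E(N)) for the proposed O1 design: only x1 = 1 or 2 continues."""
    w_m, w_n = pmf(1, n1, p), pmf(2, n1, p)
    pet = 1 - w_m - w_n
    return pet, n1 * pet + m * w_m + n * w_n


def lin_shih(p, n1=9, m=31, n=43):
    """(PET, E(N)) for the LS O1 design: futility stopping only."""
    pet = pmf(0, n1, p)
    w_m = pmf(1, n1, p) + pmf(2, n1, p)
    w_n = 1 - pet - w_m
    return pet, n1 * pet + m * w_m + n * w_n


grid = [i / 20 for i in range(21)]  # true response rates 0.00, 0.05, ..., 1.00
print("   p   PET_prop  EN_prop   PET_LS   EN_LS")
for p in grid:
    pet_a, en_a = proposed(p)
    pet_b, en_b = lin_shih(p)
    print(f"{p:4.2f}  {pet_a:8.3f} {en_a:8.2f} {pet_b:8.3f} {en_b:8.2f}")
```

On this grid the LS PET falls steadily towards zero and its expected sample size rises towards max(m,n) = 43, whereas the PET of the proposed design dips and then returns to one, so that its expected sample size falls back to n1 = 10 for very high response rates.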

Figure 1.

Probability of early termination and expected sample size against the true response rate for the proposed and LS designs under optimality criteria 1 and 2 if ϕ= (0.05,0.20,0.25,0.05,0.20,0.10).

4. Application on Lin and Shih's VBG study

A study was conducted by Lin and Shih [16] to investigate the efficacy of combination therapy with vinorelbine, bleomycin and gemcitabine (VBG) for treating patients with recurrent or refractory Hodgkin disease. In their study, the maximum uninteresting response rate p0 is taken as 40%, and the target response rate may vary from 50% to 60%. For α=0.05, Lin and Shih considered two target response rates, 55% and 60%, at β1=0.20 and β2=0.10. Table 1 shows that E(N|p) for the LS design is 43.89 under O1 if the true response rate is 40%. However, if the true response rate is 55% or 60%, then E(N|p) is 74.13 or 78.93, respectively, which is considerably higher than the expected sample size when the true response rate is 40%. Under the same setup, the proposed design is (12/14/17/18/28) (40/83) (39/82) under O1. E(N|p) for the proposed design is 44.23 when the true response rate is 40%, which is almost the same as for the LS design. But when the true response rate is 55% or 60%, E(N|p) is 63.39 or 57.92, respectively, which is notably lower than for the LS design. The same holds for the other three optimality criteria: see Table 1.

5. Comparison with Mander and Thompson's design

The main difference between the proposed design and Mander and Thompson's design [17] lies in the number of target response rates being considered. In the proposed design, against one maximum uninteresting response rate p0, we consider two target response rates p1 and p2, whereas Mander and Thompson's design considers only one target response rate p1. If p1=p2 and β1=β2 then s1=r1, c1=c2 and m = n, and the proposed design becomes Mander and Thompson's design. Continuing the VBG example of Lin and Shih's study presented in Section 4, it is not possible to test two different target response rates (55% and 60%) against one maximum uninteresting response rate (40%) at the same time by using Mander and Thompson's design. One possible solution may be to conduct two separate tests, 40% vs. 55% and 40% vs. 60%. Appropriate designs for these separate tests are given in Table 3. These designs are computed by a grid search over different combinations of n, n1, r, r1 and r2 using self-written R code.

Table 3.

Mander and Thompson's design for p0=0.40 and p1=0.55 or 0.60 at β=0.20 and 0.10.

                  First stage   Second stage                    True          Expected sample size
p0    p1    β     n1  r1  r2    n     r       PET(p0)  PET(p1)  α      β      E(N|p0)  E(N|p1)  Comment
0.40  0.55  0.20  26  11  17    84    40      0.676    0.237    0.050  0.194  44.78    70.23    H0optimalE
                  41  16  23    69    34      0.530    0.414    0.050  0.199  54.17    57.41    H0minimaxE
                  44  19  23    80    40      0.759    0.663    0.049  0.200  52.69    56.12    H1optimalE
                  41  16  23    69    34      0.530    0.414    0.050  0.199  54.17    57.41    H1minimaxE
0.40  0.60  0.10  25  11  17    66    32      0.733    0.231    0.049  0.098  35.93    56.51    H0optimalE
                  29  12  19    54    27      0.639    0.248    0.049  0.099  38.03    47.81    H0minimaxE
                  27  10  15    62    32      0.492    0.626    0.048  0.099  44.77    40.09    H1optimalE
                  36  16  21    54    27      0.772    0.561    0.050  0.098  40.10    43.91    H1minimaxE

For testing 40% vs. 55%, H0optimalE will be (11 17)/26 40/84. That means 26 patients will be recruited at the first stage. If 11 or fewer patients respond, the study will be stopped due to futility, and if more than 17 respond, we will stop the study because of efficacy and reject the null hypothesis for the 55% target response rate. However, if the number of responses is more than 11 and at most 17, we will proceed to the second stage, recruit 58 additional patients, and reject the null hypothesis only if the total number of responses is more than 40. Similarly, for testing 40% vs. 60%, the H0optimalE design will be (11 17)/25 32/66. Although it is possible to stop the study early for both futility and efficacy in Mander and Thompson's design, it is impossible to mitigate the uncertainty that arises while selecting the target response rate.
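The operating characteristics of such a two-stage design with both futility and efficacy stopping follow directly from the decision rule just described. The sketch below evaluates the H0optimalE design (11 17)/26 40/84 under p0 = 0.40 and p1 = 0.55; it is written in Python for illustration and is not the R code used for Table 3, and the function name is hypothetical.

```python
from math import comb


def pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_stage_oc(p, n1=26, r1=11, r2=17, n=84, r=40):
    """PET, expected sample size and rejection probability at response rate p.

    Stage 1: stop for futility if x1 <= r1, stop for efficacy if x1 > r2.
    Stage 2: reject H0 if the total number of responses exceeds r.
    """
    n2 = n - n1
    stop_fut = sum(pmf(k, n1, p) for k in range(0, r1 + 1))
    stop_eff = sum(pmf(k, n1, p) for k in range(r2 + 1, n1 + 1))
    pet = stop_fut + stop_eff
    reject = stop_eff
    for x1 in range(r1 + 1, r2 + 1):
        need = max(r + 1 - x1, 0)  # stage-2 responses needed for a total above r
        tail = sum(pmf(k, n2, p) for k in range(need, n2 + 1))
        reject += pmf(x1, n1, p) * tail
    return {"PET": pet, "EN": n1 + n2 * (1 - pet), "reject": reject}


null = two_stage_oc(0.40)  # 'reject' is the type I error rate here
alt = two_stage_oc(0.55)   # 'reject' is the power here
print({k: round(v, 3) for k, v in null.items()})
print({k: round(v, 3) for k, v in alt.items()})
```

Evaluated this way, the rejection probability under p0 plays the role of the true α in Table 3, and one minus the rejection probability under p1 plays the role of the true β; a grid search would repeat this evaluation over candidate values of n, n1, r, r1 and r2 and keep the designs that satisfy the error constraints.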

6. Discussion

The phase II clinical trial aims to determine whether a drug is effective and screens out the ineffective drugs. Phase II is an early phase of a clinical trial, and recruiting fewer patients is desirable. The adaptive phase II design by Lin and Shih [16] only considers the futility to stop early and has a large expected sample size if the proposed drug is effective. In this paper, we have discussed why efficacy should also be considered as a reason for early termination. A design has been proposed for a single-arm phase II clinical trial that, along with futility, also considers efficacy to stop early. The proposed design can achieve a notable reduction in the expected sample size if the drug is effective without affecting the sample size when the drug is ineffective.

One difficulty is that the proposed design takes a long time to compute. Designs at the 10% significance level take less time than designs at the 5% significance level because the sample sizes in both stages are notably larger at the 5% level. It will take even more time if we consider designs at the 1% significance level. The other difficulty is calculating the values of c2. As discussed at the beginning, Kim and Wong [12] proposed an adaptive phase II clinical trial design that allows three target response rates against one null response rate. However, they did not allow early stopping for efficacy and considered futility as the only reason for stopping early. The authors used the Particle Swarm Optimisation (PSO) technique introduced by Kennedy and Eberhart [11] to find the parameters of their optimal design. One possible extension of the proposed design could be the use of PSO to find the solutions. The design can also be extended to three or more target response rates and their associated type II error rates against one maximum uninteresting response rate. Finally, the paper's findings should encourage stopping early for both futility and efficacy in two-stage adaptive designs for phase II trials.

Acknowledgments

The authors would like to thank the reviewers for their useful suggestions to improve the paper. The first author also would like to thank the Ministry of Science and Technology, Government of Bangladesh, for providing him the National Science and Technology Fellowship during this work.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  1. Berry S.M., Carlin B.P., Lee J.J., and Muller P., Bayesian Adaptive Methods for Clinical Trials, CRC Press, New York, 2010.
  2. Chen T.T., Optimal three-stage designs for phase II cancer clinical trials, Stat. Med. 16 (1998), pp. 2701–2711.
  3. Chen K. and Shan M., Optimal and minimax three-stage designs for phase II oncology clinical trials, Contemp. Clin. Trials 29 (2008), pp. 32–41.
  4. Englert S. and Kieser M., Improving the flexibility and efficiency of phase II designs for oncology trials, Biometrics 68 (2011), pp. 886–892.
  5. Englert S. and Kieser M., Adaptive designs for single-arm phase II trials in oncology, Pharm. Stat. 11 (2012), pp. 241–249.
  6. Fleming T.R., One-sample multiple testing procedure for phase II clinical trials, Biometrics 38 (1982), pp. 143–151.
  7. Gehan E.A., The determination of the number of patients required in a preliminary and a follow-up trial of a new chemotherapeutic agent, J. Chronic Dis. 13 (1961), pp. 346–353.
  8. Jin H. and Yin G., Bayesian enhancement two-stage design with error control for phase II clinical trials, Stat. Med. 39 (2020), pp. 4452–4465.
  9. Jung S.-H., Randomized phase II trials with a prospective control, Stat. Med. 27 (2008), pp. 568–583.
  10. Jung S.-H., Lee T., Kim K., and George S.L., Admissible two-stage designs for phase II cancer clinical trials, Stat. Med. 23 (2004), pp. 561–569.
  11. Kennedy J. and Eberhart R., Particle swarm optimization, Proc. Int. Conf. Neural Networks 4 (1995), pp. 1942–1948.
  12. Kim S. and Wong W.K., Extended two-stage adaptive design with three target responses for phase II clinical trial, Stat. Methods Med. Res. 27 (2017), pp. 3628–3642.
  13. Lai T.L., Lavori P.W., and Shih M.-C., Sequential design of phase II-III cancer trials, Stat. Med. 31 (2012), pp. 1944–1960.
  14. Lee J.J. and Feng L., Randomized phase II designs in cancer clinical trials: Current status and future directions, J. Clin. Oncol. 23 (2005), pp. 4450–4457.
  15. Lee J.J. and Liu D.D., A predictive probability design for phase II cancer clinical trials, Clin. Trials 5 (2008), pp. 93–106.
  16. Lin Y. and Shih W.J., Adaptive two-stage designs for single-arm phase IIA cancer clinical trials, Biometrics 60 (2004), pp. 482–490.
  17. Mander A.P. and Thompson S.G., Two-stage designs optimal under the alternative hypothesis for phase II cancer clinical trials, Contemp. Clin. Trials 31 (2010), pp. 572–578.
  18. Mander A.P., Wason J.M.S., Sweeting M.J., and Thompson S.G., Admissible two-stage designs for phase II cancer clinical trials that incorporate the expected sample size under the alternative hypothesis, Pharm. Stat. 11 (2012), pp. 91–96.
  19. O'Brien P.C. and Fleming T.R., A multiple testing procedure for clinical trials, Biometrics 35 (1979), pp. 549–556.
  20. Sambucini V., A Bayesian predictive strategy for an adaptive two-stage design in phase II clinical trials, Stat. Med. 29 (2010), pp. 1430–1442.
  21. Sambucini V., Bayesian predictive monitoring with bivariate binary outcomes in phase II clinical trials, Comput. Stat. Data Anal. 132 (2019), pp. 18–30.
  22. Shan G., Wilding G.E., Hutson A.D., and Gerstenberger S., Optimal adaptive two-stage designs for early phase II clinical trials, Stat. Med. 35 (2015), pp. 1257–1266.
  23. Shan G., Zhang H., and Jiang T., Adaptive two-stage optimal designs for phase II clinical studies that allow early futility stopping, Seq. Anal. 38 (2019), pp. 199–213.
  24. Shi H. and Yin G., Bayesian two-stage design for phase II clinical trials with switching hypothesis tests, Bayesian Anal. 12 (2017), pp. 31–51.
  25. Shi H. and Yin G., Two-stage seamless transition design from open-label single-arm to randomized double-arm clinical trials, Stat. Methods Med. Res. 27 (2018), pp. 158–171.
  26. Simon R., Optimal two-stage designs for phase II clinical trials, Control. Clin. Trials 10 (1989), pp. 1–10.
  27. Ye F. and Shyr Y., Balanced two-stage designs for phase II clinical trials, Clin. Trials 4 (2007), pp. 514–524.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
