An adaptive trial should be like a driverless car – boring and predictable

On the BBC World Service in 2013, two technology journalists gave an account of going for a ride in the Google “driverless car”. They recounted their emotional journey as the Google engineer, on a Californian freeway in the middle of the day, leant forward and turned on the “driverless” system, then sat back with arms folded. The journalists said they went through three stages of emotion:

  1. First, complete panic: were they going to crash?
  2. Second, after a few seconds in which everything had gone smoothly, intense curiosity as they watched how it changed lanes, dropped back when cars cut in, and took exits.
  3. Finally, boredom, as everything continued to go smoothly and they realised how cautiously it drove.

That is the rule rather than the exception with automated control: safety first, no sudden adjustments, keep safe distances, and anticipate required course corrections.

Dose-response adaptive trials are designed along similar lines:

  • Only make adjustments warranted by the data,
  • Make frequent small course adjustments rather than a few abrupt “handbrake turns”,
  • If things break down, have a reasonable default behaviour.

Watching a dose-response adaptive trial run, whilst fascinating because of what is at stake, is not a thrill-a-minute ride. Rather, we see data accumulate, cautious adjustments made following the pre-defined algorithm and, in the end, a sense of inevitability around the final outcome.

Here, for example, is a snapshot from a simulation of a phase 2 dose-ranging study in neuropathic pain. The plot below shows the raw data and the fitted treatment response at the 4th interim analysis, 20 weeks into the study, with 125 subjects recruited, 73 of whom have completed at this point.

Image 1: raw data and fitted dose-response curve at the 4th interim (week 20)

Notice how the fitted response has smoothed out the strong responses seen on the 4th and 6th treatment arms. Fitting a dose-response model is a far more efficient method of analysis than pair-wise comparison, in which the response on each dose is compared in isolation against the control. Yet, shockingly, pair-wise comparison is still the standard method of analysis, and its use contributes to the prevalent error in drug development of studying too few doses and too small a dose range in phase 2[1]. One of the obstacles to studying more than 3 dose strengths is the statistical problem of “multiplicity”: the more things tested in an experiment, the greater the opportunity for error, and to counter this the analysis has to be made more conservative. A conventional study of 3 doses in this setting would need to test 360 patients; to study 7 doses it would need to test 870.
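To make the contrast concrete, here is a minimal sketch of the model-based approach, fitting a standard Emax dose-response curve to simulated data with scipy. The Emax form, dose levels and noise values here are illustrative assumptions, not the model or data from the trial described in this post.

```python
import numpy as np
from scipy.optimize import curve_fit

def emax(dose, e0, emax_eff, ed50):
    """Standard Emax dose-response model."""
    return e0 + emax_eff * dose / (ed50 + dose)

rng = np.random.default_rng(42)
doses = np.array([0, 1, 2, 4, 8, 16, 32, 64])   # control plus 7 dose strengths
dose_obs = np.repeat(doses, 10)                 # 10 subjects per arm
# Simulate noisy individual responses around a "true" Emax curve
y_obs = emax(dose_obs, 1.0, 3.0, 8.0) + rng.normal(0.0, 1.5, dose_obs.size)

# One fitted curve borrows strength across all arms, smoothing out
# arms whose small samples happen to be randomly high or low
params, _ = curve_fit(emax, dose_obs, y_obs, p0=[0.0, 2.0, 5.0])
print("fitted (E0, Emax, ED50):", np.round(params, 2))
```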

This design, fitting a dose-response curve as part of the analysis, studies 7 doses whilst testing no more than 360 patients, and it has the same type-1 error and better power than the traditional fixed design. It also has a high probability of stopping early, testing fewer than 360 patients and delivering its conclusion sooner.
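Claims like “same type-1 error, better power” are established by simulating the design many times under null and alternative scenarios. The toy sketch below (a drastically simplified, non-adaptive stand-in for a real trial simulator, with assumed sample sizes and noise) shows the idea: simulate under a null of no drug effect and count false successes. In practice the success threshold is calibrated by exactly this kind of run.

```python
import numpy as np
from scipy.stats import norm

def trial_succeeds(true_means, n_per_arm=45, sigma=1.0, threshold=0.985,
                   rng=None):
    """One simulated fixed-allocation trial. Success: the best-looking
    arm beats control with (approximate) posterior probability above
    `threshold`, using a normal approximation and a flat prior."""
    rng = rng or np.random.default_rng()
    se = sigma / np.sqrt(n_per_arm)
    obs = rng.normal(true_means, se)        # observed arm means
    best = 1 + int(np.argmax(obs[1:]))      # pick the best non-control arm
    pr_beat = norm.cdf((obs[best] - obs[0]) / (se * np.sqrt(2)))
    return pr_beat > threshold

rng = np.random.default_rng(7)
null = np.zeros(8)  # no drug effect on any arm
hits = sum(trial_succeeds(null, rng=rng) for _ in range(2000))
print("simulated type-1 error:", hits / 2000)
```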

But I digress. The question is: where would we want the next 15-20 subjects allocated over the next 2 weeks (until the next interim), given that we are trying to find the dose with the maximum response? Well, a fixed proportion (25%) to the control (arm 0), perhaps a similar amount to arm 6, which seems to have the best effect, and the remainder spread over the nearby arms 4, 5 and 7, which are also doing well; there is no need to allocate to arms 1, 2 or 3, where the response is little better than control. Seems reasonable?

That’s exactly what our adaptive design does, using the rule of allocating to each of the study arms in proportion to the probability that the arm has the maximum response, scaled by how much information another subject on that arm will provide.
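A minimal sketch of such a rule, assuming we have posterior draws of the mean response on each arm from whatever Bayesian model fit is in use. Scaling Pr(max) by the posterior variance over the completed count is my stand-in for “how much information another subject provides”; the trial’s actual weighting may differ.

```python
import numpy as np

def allocation_probs(post_draws, n_complete, control_frac=0.25):
    """Response-adaptive allocation over the non-control arms.

    post_draws   : (n_samples, n_arms) posterior draws of mean response,
                   column 0 being control
    n_complete   : completed-subject count per arm
    control_frac : fixed share reserved for the control arm
    """
    n_arms = post_draws.shape[1]
    # Pr(arm has the maximum response), estimated over the active arms
    is_max = post_draws[:, 1:].argmax(axis=1) + 1
    pr_max = np.bincount(is_max, minlength=n_arms)[1:] / post_draws.shape[0]
    # Scale by how informative one more subject on the arm would be;
    # posterior variance over (n + 1) is one simple proxy for that
    info = post_draws[:, 1:].var(axis=0) / (np.asarray(n_complete)[1:] + 1)
    w = np.sqrt(pr_max * info)
    active = (1 - control_frac) * w / w.sum()
    return np.concatenate([[control_frac], active])

# Hypothetical posterior draws for control plus 7 arms
rng = np.random.default_rng(0)
means = np.array([0.0, 0.1, 0.2, 0.3, 1.0, 1.2, 1.5, 1.1])
draws = rng.normal(means, 0.4, size=(4000, 8))
print(np.round(allocation_probs(draws, [20, 8, 8, 8, 10, 9, 12, 8]), 3))
```

Reserving a fixed 25% for control keeps the comparator arm well estimated however the active allocation shifts.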

This plot shows how many subjects had been allocated to each arm so far, how many had completed, and what proportion of the incoming subjects should be randomised to that arm (Pr(allocation)):

Image 2: per-arm subjects allocated, subjects completed, and Pr(allocation) for incoming subjects

What, though, if Arm 6 was a mistake? What if the so-far relatively small sample of patients on that dose had had randomly high responses, or the responses on one of the other arms had been randomly low? Well, in this simulation that is exactly the case and, somewhat like a long horse race, Arm 5, though lagging until the 7th interim (week 26), slowly overhauled Arm 6.

At the 10th interim (week 32, 247 subjects recruited, 176 complete) the data looks like this:

Image 3: raw data and fitted dose-response curve at the 10th interim (week 32)

Now we have a stand-out candidate for the best dose: the probability that Arm 5 has the maximum response is over 55%, with no other dose having a probability above 20%; perhaps a bigger margin than one would expect from looking at the graph. We have also met our criteria for stopping early for success: at least 30 patients have been tested on our selected dose, and the probability that the response of patients on the selected dose is greater than that of patients on the control arm is over 98.5%.
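In code, that early-success check might look something like the following sketch. The 30-subject minimum and the 98.5% threshold come from the text above; the posterior-draw representation and the example numbers are my assumptions.

```python
import numpy as np

def stop_for_success(post_draws, n_complete, min_n=30, threshold=0.985):
    """Early-success check: stop if the most-likely-best arm has at
    least `min_n` completed subjects and Pr(its response > control)
    exceeds `threshold`. Column 0 of post_draws is control."""
    pr_max = np.bincount(post_draws[:, 1:].argmax(axis=1) + 1,
                         minlength=post_draws.shape[1]) / post_draws.shape[0]
    best = int(pr_max.argmax())                       # selected dose arm
    pr_beats_control = np.mean(post_draws[:, best] > post_draws[:, 0])
    return n_complete[best] >= min_n and pr_beats_control > threshold, best

# Hypothetical posterior draws at the 10th interim (control plus 7 arms)
rng = np.random.default_rng(1)
means = np.array([0.0, 0.1, 0.2, 0.3, 1.0, 1.6, 1.3, 1.1])
draws = rng.normal(means, 0.3, size=(4000, 8))
stop, best = stop_for_success(draws, [40, 10, 10, 10, 25, 35, 30, 16])
print(f"stop early: {stop}, selected arm: {best}")
```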

In summary, the trial has convincingly shown a dose response, has a strong candidate for ‘best dose’, has stopped 12 weeks early and has data on over 100 subjects at the ‘best dose’ or one adjacent to it. All by being boring and cautious, just like the driverless car.

1. Neal Thomas, Kevin Sweeney, Veena Somayaji, “Meta-Analysis of Clinical Dose–Response in a Large Drug Development Portfolio”, Statistics in Biopharmaceutical Research, Vol. 6, Iss. 4, 2014.

Tom Parke


Tom Parke has been working at Tessella for over ten years. For a large part of that time he has ...
