GIST Support Wiki

Marina's Tutorial on Bayesian Statistics and Clinical Trials

Subject: Bayesian Statistics and Clinical trials... I don't understand statistics. Bayesian statistics is mentioned frequently as an alternative approach in editorial articles criticizing the current phase III randomized trials/placebo situations.

MARINA'S lesson about picking up socks is below...

FDA Shows Interest in 18th Century Presbyterian Minister

Bayesian statistics may help improve drug development

Derek Lowe
Medical Progress Today
January 19, 2006

Not many ideas of 18th-century Presbyterian ministers attract the interest of the pharmaceutical industry. But the works of Rev. Thomas Bayes have improved greatly with age. The paper that made his name was published in 1763 (two years after his death); in it he proposed a method to decide the likelihood of an event while taking into account one's prior knowledge of what might occur. This idea bounced around through the mathematical literature for the next century or two, but it fell out of favor in the 1930s with the advent of the statistical methods that have been used ever since. For decades, no one heard very much about Bayesian statistics at all. One reason was that Bayesian methods are much more computationally demanding, which was a real handicap until fairly recently.

But things are changing. The pharmaceutical and medical device companies have been especially interested, since the Bayesian approach could - if you believe its cheering section - make clinical trials easier to run and more meaningful at the same time. The FDA is also exploring Bayesian statistics as part of its Critical Path toolkit. Others see the whole idea as an invitation to self-deception. They might both be right, actually. Bayesian and standard "frequentist" statistics are in many ways mirror images of each other, and there are mistakes to be made each way.

The frequentist approach, familiar to anyone who follows the news of clinical trials, measures the likelihood of an observed result having occurred by chance. That "just by accident" possibility is the null hypothesis, and is usually realized in clinical studies by giving a placebo to some of the trial participants for comparison. Results are expressed as a "P value", with (for example) a P of 0.01 meaning that if the drug really did nothing and the trial were repeated over and over, only one per cent of those studies would show a difference between drug and placebo at least as large as the one observed.
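The "repeated over and over" idea can be made concrete with a small simulation. This is only a sketch under assumed numbers: the arm size, the shared response rate, and the observed difference are all invented for illustration, and `simulated_p_value` is a hypothetical helper, not anything from a real trial package.

```python
import random

def simulated_p_value(n_per_arm, response_rate, observed_diff,
                      n_trials=10000, seed=0):
    """Estimate a one-sided P value by brute force.

    Simulates many trials in which the null hypothesis is true (drug and
    placebo share the same response rate) and returns the fraction whose
    drug-minus-placebo responder count is at least `observed_diff`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        drug = sum(rng.random() < response_rate for _ in range(n_per_arm))
        placebo = sum(rng.random() < response_rate for _ in range(n_per_arm))
        if drug - placebo >= observed_diff:
            hits += 1
    return hits / n_trials

# Assumed scenario: 50 patients per arm, 30% respond regardless of arm,
# and the real trial saw 10 more responders on drug than on placebo.
p = simulated_p_value(n_per_arm=50, response_rate=0.3, observed_diff=10)
print(f"estimated one-sided P value: {p:.3f}")
```

A small result here says only that a do-nothing drug would rarely produce a gap that big. It does not say how likely the drug is to work.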

Bayesian statistics, though, don't address the likelihood that your observed results might have come out by random chance, but rather give you a likelihood of whether your initial hypothesis is true. (Ironically, that's what many lay people think the standard approach does.) That likelihood is compared to some initial hypothesis, which doesn't have to be the just-by-accident null one. In fact, you can start with more than one hypothesis and compare things as you go along. One consequence of that setup is that Bayesian trial designs allow you to use the data that comes in to modify the trial while it's still going on. That's basically forbidden under the standard statistical approach, where the design and end points of the study have to be decided up front.
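One way to picture "using the data that comes in to modify the trial" is response-adaptive allocation. The sketch below uses Thompson sampling, which is a technique swapped in for illustration rather than anything the article or the FDA prescribes. The two response rates, the patient count, and the function name `adaptive_allocation` are all assumptions.

```python
import random

def adaptive_allocation(true_rates, n_patients, seed=0):
    """Assign patients one at a time, favoring the arm that looks better so far.

    Each arm keeps a Beta(1 + successes, 1 + failures) posterior for its
    response rate. For every new patient we draw one plausible rate per arm
    from those posteriors and treat with the arm whose draw is highest
    (Thompson sampling). Returns the number of patients given each arm.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_patients):
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))
        # The true rates are hidden from the allocation rule; they are used
        # here only to simulate whether this patient responds.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

# Assumed scenario: arm 0 truly helps 20% of patients, arm 1 helps 70%.
counts = adaptive_allocation(true_rates=[0.2, 0.7], n_patients=300)
print("patients per arm:", counts)
```

As results accumulate, most patients end up on the arm that is actually performing better. A fixed design, with its allocation set up front, cannot do that.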

Here's an illustration: Imagine a drawer full of socks, some white and some black. If you think you have a technique for picking out black socks without looking, you could test that by first closing your eyes and taking out ten socks in a row totally at random. That would give you your null-hypothesis data set. Then you would mix the socks up again, close your eyes, pick ten according to your best black-sock-selecting technique, then open your eyes and see how well you did.

That's a frequentist approach to the problem, run just like a basic clinical trial would be. Each difference you might find between your selected group and the random group would have a different P value (the bigger the difference, the lower the value, naturally). If you thought you were on to something, you could run a similar trial again picking a larger number of socks, because a larger sample size will tend to give you better P values all by itself. (That effect is one of the objections that the Bayes camp has to frequentist statistics in general, by the way).
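For the sock version, the P value can be worked out exactly, since blind draws without replacement follow the hypergeometric distribution. The drawer below (100 socks, 50 of them black) and the two results being compared are assumed numbers chosen for illustration; `p_value_at_least` is a hypothetical helper name.

```python
from math import comb

def p_value_at_least(k_black, n_picked, black_in_drawer, drawer_size):
    """Chance of at least k_black black socks in n_picked blind draws.

    Draws are without replacement, so this is the upper tail of a
    hypergeometric distribution: the null hypothesis that the picker
    has no technique at all.
    """
    white_in_drawer = drawer_size - black_in_drawer
    total_ways = comb(drawer_size, n_picked)
    most_black_possible = min(n_picked, black_in_drawer)
    ways = sum(
        comb(black_in_drawer, j) * comb(white_in_drawer, n_picked - j)
        for j in range(k_black, most_black_possible + 1)
    )
    return ways / total_ways

# Same 70% hit rate, two sample sizes, in an assumed 50-50 drawer of 100.
p_small = p_value_at_least(7, 10, black_in_drawer=50, drawer_size=100)
p_large = p_value_at_least(28, 40, black_in_drawer=50, drawer_size=100)
print(f"7 black of 10:  P = {p_small:.4f}")
print(f"28 black of 40: P = {p_large:.6f}")
```

The hit rate is identical in both runs, but the larger run yields a far smaller P value. That is the sample-size effect mentioned above.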

But a Bayesian trial of your sock-picking method would work quite differently. You could start just as before with a random set, and then start trying to pick black socks. This time, though, you'd open your eyes after each pick to see how you did, and each black sock would be added to the calculation of how likely it is that you've got a black-sock-picking method worked out. Depending on what your random sample looked like (that is, depending on what the ratio of black and white socks was in the drawer itself), you might be able to say that you've proved your hypothesis after pulling out five or six black socks in a row.
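A minimal sketch of that pick-by-pick updating follows, under simplifying assumptions that are not in the article. There are only two competing hypotheses, a working method is assumed to find black socks 90% of the time, the starting belief is 50-50, and the drawer is treated as large enough that its mix does not change as socks come out.

```python
def posterior_history(picks, base_rate, hit_rate=0.9, prior=0.5):
    """Track P(the method works) after each pick using Bayes' rule.

    picks     -- sequence of booleans, True for a black sock
    base_rate -- fraction of black socks in the drawer (chance level),
                 e.g. as estimated from the initial random sample
    hit_rate  -- assumed black-sock rate if the method really works
    prior     -- belief that the method works before any picks
    """
    p = prior
    history = []
    for is_black in picks:
        like_method = hit_rate if is_black else 1 - hit_rate
        like_chance = base_rate if is_black else 1 - base_rate
        p = p * like_method / (p * like_method + (1 - p) * like_chance)
        history.append(p)
    return history

six_black = [True] * 6
for base_rate in (0.3, 0.5):
    history = posterior_history(six_black, base_rate)
    print(f"drawer {base_rate:.0%} black:", [round(p, 3) for p in history])
```

Six black socks in a row count for more when black socks are rare in the drawer. That is why the ratio found in the random sample decides how soon you can stop.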

Another way to run the Bayesian sock-picking test would be to skip the random drawing, and go right to picking socks. You could calculate your results against (for example) three "priors": one where you assume that the drawer was a 50-50 mix, and two where you assume that the drawer was either largely white or largely black. Then if you started pulling mostly black socks out, you'd quickly be able to calculate a strong probability against the mostly-white-sock prior. It would take longer (or require a stronger run of black socks) to reach that level of confidence against the neutral prior, and longer still to reach it against the mostly-black-sock one. And if you had two sock-picking methods that you were trying to compare against each other, you could actually change course during the evaluation and do more runs of the one that looked like it was performing better, in order to get better statistics on it.
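The three-drawer comparison can be sketched by asking how long an unbroken run of black socks is needed before the evidence is strong under each assumed mix. Everything numeric here is an assumption for illustration: the mixes (20%, 50%, and 80% black), the 90% hit rate for a working method, the 50-50 starting belief, and the 99% confidence target.

```python
def picks_needed(base_rate, hit_rate=0.9, prior=0.5, target=0.99,
                 max_picks=1000):
    """Count consecutive black socks needed to reach the target confidence.

    base_rate is the assumed fraction of black socks in the drawer, i.e.
    how often a pick would be black by chance alone. Returns the number of
    black socks in a row after which P(the method works) first reaches
    `target`, or None if that does not happen within max_picks.
    """
    p = prior
    for n in range(1, max_picks + 1):
        # Bayes' rule for one more black sock.
        p = p * hit_rate / (p * hit_rate + (1 - p) * base_rate)
        if p >= target:
            return n
    return None

assumed_mixes = {"mostly white": 0.2, "50-50": 0.5, "mostly black": 0.8}
for label, base_rate in assumed_mixes.items():
    print(f"{label:>12}: {picks_needed(base_rate)} black socks in a row")
```

The required run grows as the assumed drawer gets blacker, because a black sock is weak evidence of skill when nearly every sock is black anyway.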

Medical device makers have been especially quick to take Rev. Bayes up on his suggestions, because their clinical trials have several features that make them a good fit. For one thing, it can be impossible to run a placebo group (sham surgeries are often done in animal studies, but it's a hard sell in human trials). It can also be hard to get the sample sizes that would help to ensure reasonable P values under a standard statistical regime. And (importantly) experience with previous similar devices can provide a meaningful prior hypothesis to test against. About 10% of the trials the FDA evaluates in this area are run under Bayesian protocols, up from essentially zero before the 1990s.

There are downsides to the ways of Bayes. One major problem is the choice of a prior hypothesis. If you choose an unrealistic one, you can prove "significance" versus something that doesn't really matter. (My industry definitely does not need new technologies with which to fool ourselves!) A sociopolitical problem is the shortage of people trained in Bayesian methods - there are whole companies in the drug industry that have no one who's truly up to speed. Then there's the fact that some traditional statisticians are quite suspicious of the whole idea, and sometimes openly hostile. Finally, there are operational difficulties, such as a lack of good off-the-shelf software.

As far as I know, no pharma company has yet taken a fully Bayesian clinical package to the FDA for a drug approval. There have been a few dose-finding trials in the cancer area, and Pfizer's research arm in England ran a large trial of a novel stroke therapy under Bayesian protocols. The drug turned out not to be efficacious, but Pfizer claims that they were able to determine this more quickly and with fewer patients than would have been the case otherwise. Interestingly, they reported that one of the problems with the trial was that the clinical research centers were too enthusiastic: they recruited patients too quickly, impairing the ability to learn what was happening while the trial was going on.

Someone's going to take the chance eventually, though, because the potential benefits are too attractive to pass up. The most likely candidate is a smaller company (one with fewer entrenched statisticians of the other school), evaluating a drug that has a small expected patient population and a fairly quick clinical endpoint. That'll allow for reasonably fast changes in the trial direction if needed. If the drug is an improvement on an existing therapy, so much the better, since that will help in generating the prior hypothesis. It's going to be a real learning experience for the industry and for the FDA, but it should be worth it.