When to use Bayesian vs. frequentist — a practical guide for clinical researchers
The choice isn't philosophical. It depends on your prior information, regulatory context, and what you're trying to communicate to a decision-maker.
Micah Thornton, MS — Thornton Statistical Consulting
The real question
Most introductions to this debate start with probability. Frequentists say probability is the long-run frequency of events in repeated trials. Bayesians say it's a degree of belief, updated as evidence accumulates. Both framings are coherent. Neither tells you which one to use on your next study.
The practical question is different: given your specific situation — your data, your priors, your audience, and your regulatory environment — which framework gives you answers that are honest, defensible, and useful?
The answer is almost never "always one or the other." It depends on four things: what you know before the study starts, how you plan to use the results, who you need to convince, and how much flexibility you need during the study.
What frequentist inference actually says
The frequentist framework asks: if the null hypothesis were true and we repeated this experiment many times, how often would we see a result at least this extreme? That frequency is the p-value.
A 95% confidence interval means: if we constructed this interval using the same procedure on many repeated samples, 95% of the intervals would contain the true parameter. It does not mean there is a 95% probability that the true value lies in this particular interval.
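The coverage guarantee described above can be checked directly by simulation. A minimal sketch, using arbitrary illustrative values for the true mean, SD, and sample size (and a known-sigma interval to keep the code short):

```python
# Simulation sketch: "95% coverage" means ~95% of intervals built this way
# contain the true parameter. All numbers below are illustrative choices.
import random
import math

random.seed(42)
TRUE_MEAN, SIGMA, N, TRIALS = 10.0, 2.0, 50, 10_000
z = 1.96  # approximate 97.5th percentile of the standard normal

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    half_width = z * SIGMA / math.sqrt(N)  # known-sigma interval for simplicity
    if xbar - half_width <= TRUE_MEAN <= xbar + half_width:
        covered += 1

print(f"Empirical coverage: {covered / TRIALS:.3f}")
```

The empirical coverage lands near 0.95 — but any single interval either contains the true mean or it doesn't; the 95% describes the procedure, not the interval in front of you.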
Frequentist methods work well when: you have no credible prior information to incorporate, your goal is to control error rates in a repeated-testing framework (Type I and Type II errors), your results will be used for regulatory submission, or you need a procedure that requires no judgment calls that reviewers could contest.
What Bayesian inference actually says
The Bayesian framework asks: given the data I observed and what I knew before, what should I now believe about this parameter? It produces a posterior distribution — a full probability distribution over plausible parameter values — from which you can extract credible intervals, probabilities of treatment benefit, and expected utility.
A 95% credible interval means exactly what most people mistakenly think a confidence interval means: there is a 95% posterior probability that the parameter lies in this range, given the data and the prior.
Bayesian methods work well when: you have genuine prior information from previous trials or historical controls, you want probability statements about the treatment effect rather than about hypothetical repetitions, you need to make a decision under uncertainty rather than test a hypothesis, or your study design is adaptive and requires sequential analysis.
A direct comparison
| | Frequentist | Bayesian |
|---|---|---|
| Core output | p-value, confidence interval, test statistic | Posterior distribution, credible interval, posterior probability |
| Probability means | Long-run frequency across repeated experiments | Degree of belief, updated by evidence |
| Prior information | Not incorporated (all information comes from the current data) | Explicitly incorporated via the prior distribution |
| Sequential looks | Requires adjustment (alpha spending) to maintain error rates | Naturally handles interim analyses without penalty |
| Regulatory acceptance | Standard; FDA/EMA guidance is built around it | Accepted for devices, adaptive designs, and secondary analyses; less common for pivotal drug trials |
| Communication | p=0.03; 95% CI [0.1, 0.9], routinely misread as probability statements about the effect | 91% probability the treatment reduces the primary endpoint by ≥15% |
| Main vulnerability | Does not answer the question most people want answered | Prior choice is contestable and must be pre-specified |
Where prior information changes everything
Suppose you are designing a Phase II oncology trial. You have results from two completed Phase I studies showing a dose-response relationship and preliminary efficacy signals. You also have a mechanistic model suggesting the drug should reduce tumor volume by roughly 20–35% at the proposed dose.
In a frequentist analysis, that information is background context — it informs your power calculation and your choice of effect size, but it does not enter the inference. The p-value treats your study as if it were conducted in a vacuum.
In a Bayesian analysis, you can encode that prior knowledge as a skeptical or enthusiastic prior on the treatment effect. A skeptical prior — centered near zero with modest variance — means the data need to work harder to move the posterior toward a meaningful effect. An enthusiastic prior — centered on your Phase I estimates — means a smaller Phase II can be informative.
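The effect of prior choice can be made concrete with a conjugate normal-normal update. The observed estimate, its standard error, and both prior settings below are hypothetical numbers chosen to match the tumor-volume scenario above, not values from any real trial:

```python
# Sketch: skeptical vs. enthusiastic normal priors on a treatment effect
# (% tumor-volume reduction). All numbers are hypothetical illustrations.
# Conjugate normal-normal update with a known standard error.

def posterior(prior_mean, prior_sd, est, se):
    """Posterior mean/sd for a normal prior and a normal likelihood."""
    w_prior = 1 / prior_sd**2   # precision contributed by the prior
    w_data = 1 / se**2          # precision contributed by the data
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est)
    return post_mean, post_var**0.5

est, se = 25.0, 8.0  # hypothetical Phase II estimate and its standard error

skeptical = posterior(0.0, 10.0, est, se)     # centered at no effect
enthusiastic = posterior(27.5, 7.0, est, se)  # centered on the Phase I range

print(f"skeptical posterior:    mean={skeptical[0]:.1f}, sd={skeptical[1]:.1f}")
print(f"enthusiastic posterior: mean={enthusiastic[0]:.1f}, sd={enthusiastic[1]:.1f}")
```

The same data pull the skeptical posterior only partway toward the observed 25% reduction, while the enthusiastic posterior sits close to it — which is exactly why the prior must be pre-specified and defensible.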
The regulatory reality
For pivotal drug trials submitted to the FDA or EMA, the frequentist framework is de facto standard. Both agencies have extensive guidance built around Type I error control, pre-specified hypotheses, and confirmatory evidence. This isn't because frequentist inference is philosophically superior — it's because the framework is auditable, reproducible, and hard to game after the fact.
The FDA has approved Bayesian adaptive designs for medical devices since 2010 (FDA Guidance on Bayesian Statistics in Medical Device Clinical Trials, 2010) and has issued draft guidance for adaptive designs in drug trials that accommodate Bayesian elements. Bayesian methods are increasingly common for:
- Platform trials and master protocols
- Borrowing strength from historical controls
- Adaptive randomization and dose-finding
- Pediatric extrapolation from adult data
- Subgroup analyses where prior evidence exists
For non-regulatory work — internal decision-making, go/no-go decisions at Phase II, budget allocation, academic publications — you have much more flexibility and Bayesian methods often communicate better.
Sequential analysis: where frequentism gets expensive
One of the most practically important differences involves interim analyses. In a frequentist framework, every time you look at accumulating data and consider stopping the trial, you spend Type I error. If you test at the standard α=0.05 threshold at each of five analyses of accumulating data, your overall Type I error rate inflates to roughly 14%. The solution — alpha spending functions like O'Brien-Fleming or Pocock boundaries — works, but it requires careful pre-specification and costs power.
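The inflation figure is easy to verify by simulation. The sketch below assumes normal outcomes with known SD and five equally spaced, unadjusted looks under a true null — the look schedule and trial count are arbitrary choices for illustration:

```python
# Simulation sketch: Type I error inflation from repeated unadjusted looks.
# The null is true throughout; five equally spaced analyses at alpha = 0.05.
# Look sizes, SD, and trial count are illustrative assumptions.
import math
import random

random.seed(1)
LOOKS = [20, 40, 60, 80, 100]  # cumulative sample sizes at each analysis
TRIALS = 20_000
Z_CRIT = 1.96

false_positives = 0
for _ in range(TRIALS):
    data = [random.gauss(0, 1) for _ in range(LOOKS[-1])]  # null: mean 0, sd 1
    for n in LOOKS:
        z = (sum(data[:n]) / n) * math.sqrt(n)  # z-test with known sd = 1
        if abs(z) > Z_CRIT:  # stop and "declare significance" at this look
            false_positives += 1
            break

print(f"Overall Type I error with 5 unadjusted looks: {false_positives / TRIALS:.3f}")
```

The empirical rate comes out well above the nominal 0.05, in the neighborhood of the ~14% figure quoted above.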
Bayesian analysis doesn't have this problem. You can look at the posterior as often as you want. The posterior after 100 patients is simply updated when patient 101 arrives. There's no multiple-testing correction because you're not computing p-values — you're updating a distribution.
Note that this freedom comes with a different responsibility: your decision rules (e.g., "stop for futility if P(effect > threshold | data) < 0.10") must be pre-specified and their operating characteristics — how often they trigger under the null and under various alternatives — must be validated by simulation.
Decision analysis: beyond hypothesis testing
Both frameworks can test hypotheses, but only one naturally supports decision analysis. If you want to answer "should we proceed to Phase III?" or "which dose should we carry forward?" you need expected utility — and expected utility requires a posterior distribution.
A frequentist analysis can tell you p=0.04. It cannot tell you how likely the drug is to be effective, nor how that likelihood should be weighted against the cost and risk of a larger trial. Those calculations require a probability over the parameter — which is the posterior.
For sponsors making go/no-go decisions under resource constraints, a well-calibrated posterior probability of success is often more useful than a p-value that was significant by a margin the team can't easily interpret.
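One common way to operationalize this is "assurance": expected Phase III power averaged over the posterior for the true effect. The sketch below is a hypothetical illustration — the posterior, Phase III design, and payoff numbers are all assumed, and the power formula is a one-sided normal approximation:

```python
# Sketch: turning a posterior into a go/no-go quantity ("assurance") and a
# simple expected utility. All numbers are hypothetical illustrations.
import math
import random

random.seed(7)

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

post_mean, post_sd = 0.87, 0.44  # posterior for the effect, e.g. from a pilot
n3, sigma = 250, 2.05            # assumed Phase III per-arm size and outcome SD
se3 = sigma * math.sqrt(2 / n3)  # standard error of the Phase III estimate

# Assurance: Phase III power (one-sided normal approximation) averaged over
# posterior draws of the true effect.
draws = 100_000
assurance = sum(
    1 - norm_cdf(1.96 - random.gauss(post_mean, post_sd) / se3)
    for _ in range(draws)
) / draws

value_if_success, trial_cost = 100.0, 20.0  # hypothetical payoffs, $M
expected_utility = assurance * value_if_success - trial_cost
print(f"assurance = {assurance:.2f}, expected utility = {expected_utility:.1f} $M")
```

Unlike conventional power, which conditions on one assumed effect size, assurance accounts for remaining uncertainty about the effect — which is what a go/no-go decision actually faces.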
A worked example: same data, two frameworks
You run a randomized pilot study comparing a new analgesic to placebo. 40 patients per arm. The pain reduction (0–10 VAS) in the treatment arm is 2.8 points (SD 2.1); in placebo it is 1.9 points (SD 2.0). Difference: 0.9 points (95% CI −0.01 to 1.81), p=0.053.
Frequentist read:
The null hypothesis of no difference is not rejected at α=0.05. The confidence interval includes zero. The study is "not statistically significant." Most team members hear "didn't work."
Bayesian read (weakly informative prior):
Posterior mean treatment effect: 0.87 points. 95% credible interval: 0.01 to 1.73. Posterior probability of any benefit: 97%. Posterior probability of a clinically meaningful benefit (≥1.5 points): roughly 8%.
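Both reads can be reproduced from the summary statistics alone. The exact prior behind the Bayesian figures above isn't specified, so the N(0, 1.5²) weakly informative prior below is an assumption, and the outputs approximate rather than exactly match the quoted numbers (the frequentist side also uses a z approximation rather than a t-test):

```python
# Sketch: both reads of the pilot study from its summary statistics.
# The N(0, 1.5^2) prior is an assumed weakly informative choice, so the
# Bayesian outputs approximate (not exactly match) the figures in the text.
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, m_t, sd_t, m_c, sd_c = 40, 2.8, 2.1, 1.9, 2.0
diff = m_t - m_c
se = math.sqrt(sd_t**2 / n + sd_c**2 / n)

# Frequentist read: z approximation to the two-sample test
z = diff / se
p_two_sided = 2 * (1 - norm_cdf(abs(z)))
print(f"difference = {diff:.2f}, SE = {se:.3f}, p ~ {p_two_sided:.3f}")

# Bayesian read: conjugate normal update under the assumed prior
prior_sd = 1.5
post_var = 1 / (1 / prior_sd**2 + 1 / se**2)
post_mean = post_var * (diff / se**2)
post_sd = math.sqrt(post_var)
p_benefit = 1 - norm_cdf((0 - post_mean) / post_sd)      # P(effect > 0)
p_meaningful = 1 - norm_cdf((1.5 - post_mean) / post_sd)  # P(effect >= 1.5)
print(f"posterior mean = {post_mean:.2f}, sd = {post_sd:.2f}")
print(f"P(benefit > 0) = {p_benefit:.2f}, P(benefit >= 1.5) = {p_meaningful:.2f}")
```

Same data, two honest summaries: one controls error rates over hypothetical repetitions; the other states what the evidence, combined with a stated prior, implies about the effect.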
Common mistakes
Mistake 1: Treating p=0.05 as a probability.
"There is a 95% chance the drug works" is never a valid frequentist conclusion. If you need that sentence, use Bayesian methods and earn it with a prior and a posterior.
Mistake 2: Using a flat prior and calling it "objective."
An uninformative prior is a prior. A uniform distribution on [0, ∞) implies you think a treatment effect of 1,000 units is just as plausible as an effect of 1 unit. That's rarely true. A weakly informative prior — wide but centered on realistic values — is almost always more honest than a flat one.
Mistake 3: Choosing the framework after seeing the data.
Switching from frequentist to Bayesian because p=0.06 and you want to show 93% posterior probability of benefit is data dredging with extra steps. Framework choice must be pre-specified and justified on scientific grounds, not chosen to produce a more favorable-looking result.
Mistake 4: Ignoring operating characteristics.
Adaptive Bayesian designs sound flexible, but their actual Type I error rates and power under realistic assumptions must be validated by simulation. A "Bayesian" design that hasn't been simulated is a design that hasn't been tested.
Practical decision guide
Run through these questions before your next protocol:
1. Do you have credible prior data? If yes, consider Bayesian. If no — or if the prior would be contested — frequentist keeps the focus on the current study.
2. Is this a regulatory submission? If yes, default to frequentist unless you have a strong scientific case and early FDA/EMA engagement for a Bayesian adaptive design.
3. Do you need interim looks without paying an alpha penalty? Bayesian adaptive designs handle this cleanly. Frequentist approaches work too but require pre-specified alpha spending.
4. Do you need to communicate a probability of benefit? Only Bayesian inference gives you that directly and honestly. Frequentist p-values will be misread as probabilities regardless of what you write in the methods section.
5. Are you making a go/no-go decision? Bayesian expected utility or posterior probability of success is often more useful than a p-value for internal decisions.
Bottom line
Frequentist methods are the right default when you need reproducible error-rate control, regulatory acceptance, and no controversial prior assumptions. They are the language regulators speak and the language most reviewers expect.
Bayesian methods are the right choice when you have prior information worth using, need probability statements about parameters, are running an adaptive design, or are making a decision rather than testing a hypothesis. The price is a prior that must be defensible and pre-specified.
If you're uncertain which framework is right for your next study, that's a good reason to get a statistician involved before the protocol is written — not after the data are locked.
Need help choosing the right framework for your study?
I work with clinical researchers and sponsors on statistical design, analysis plans, and regulatory submissions.