When to use Bayesian vs. frequentist — a practical guide for clinical researchers
The choice isn't philosophical. It depends on your prior information, regulatory context, and what you're trying to communicate to a decision-maker.
Micah Thornton, MS — Thornton Statistical Consulting
The real question
Most introductions to this debate start with probability. Frequentists say probability is the long-run frequency of events in repeated trials. Bayesians say it's a degree of belief, updated as evidence accumulates. Both framings are coherent. Neither tells you which one to use on your next study.
The practical question is different: given your specific situation — your data, your priors, your audience, and your regulatory environment — which framework gives you answers that are honest, defensible, and useful?
The answer is almost never "always one or the other." It depends on four things: what you know before the study starts, how you plan to use the results, who you need to convince, and how much flexibility you need during the study.
What frequentist inference actually says
The frequentist framework asks: if the null hypothesis were true and we repeated this experiment many times, how often would we see a result at least this extreme? That frequency is the p-value.
A 95% confidence interval means: if we constructed this interval using the same procedure on many repeated samples, 95% of the intervals would contain the true parameter. It does not mean there is a 95% probability that the true value lies in this particular interval.
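The coverage guarantee described above can be checked directly by simulation. A minimal sketch, using arbitrary illustrative values for the true mean, SD, and sample size (and a known-sigma interval to keep the code short):

```python
# Simulation sketch: "95% coverage" means ~95% of intervals built this way
# contain the true parameter. All numbers below are illustrative choices.
import random
import math

random.seed(42)
TRUE_MEAN, SIGMA, N, TRIALS = 10.0, 2.0, 50, 10_000
z = 1.96  # approximate 97.5th percentile of the standard normal

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    half_width = z * SIGMA / math.sqrt(N)  # known-sigma interval for simplicity
    if xbar - half_width <= TRUE_MEAN <= xbar + half_width:
        covered += 1

print(f"Empirical coverage: {covered / TRIALS:.3f}")
```

The empirical coverage lands near 0.95 — but any single interval either contains the true mean or it doesn't; the 95% describes the procedure, not the interval in front of you.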
Frequentist methods work well when: you have no credible prior information to incorporate, your goal is to control error rates in a repeated-testing framework (Type I and Type II errors), your results will be used for regulatory submission, or you need a procedure that requires no judgment calls that reviewers could contest.
What Bayesian inference actually says
The Bayesian framework asks: given the data I observed and what I knew before, what should I now believe about this parameter? It produces a posterior distribution — a full probability distribution over plausible parameter values — from which you can extract credible intervals, probabilities of treatment benefit, and expected utility.
A 95% credible interval means exactly what most people mistakenly think a confidence interval means: there is a 95% posterior probability that the parameter lies in this range, given the data and the prior.
Bayesian methods work well when: you have genuine prior information from previous trials or historical controls, you want probability statements about the treatment effect rather than about hypothetical repetitions, you need to make a decision under uncertainty rather than test a hypothesis, or your study design is adaptive and requires sequential analysis.
A direct comparison
| | Frequentist | Bayesian |
|---|---|---|
| Core output | p-value, confidence interval, test statistic | Posterior distribution, credible interval, posterior probability |
| Probability means | Long-run frequency across repeated experiments | Degree of belief, updated by evidence |
| Prior information | Not incorporated (all information comes from the current data) | Explicitly incorporated via the prior distribution |
| Sequential looks | Requires adjustment (alpha spending) to maintain error rates | Naturally handles interim analyses without penalty |
| Regulatory acceptance | Standard; FDA/EMA guidance is built around it | Accepted for devices, adaptive designs, and secondary analyses; less common for pivotal drug trials |
| Communication | p=0.03; 95% CI [0.1, 0.9], routinely misread as probability statements about the effect | 91% probability the treatment reduces the primary endpoint by ≥15% |
| Main vulnerability | Does not answer the question most people want answered | Prior choice is contestable and must be pre-specified |
Where prior information changes everything
Suppose you are designing a Phase II oncology trial. You have results from two completed Phase I studies showing a dose-response relationship and preliminary efficacy signals. You also have a mechanistic model suggesting the drug should reduce tumor volume by roughly 20–35% at the proposed dose.
In a frequentist analysis, that information is background context — it informs your power calculation and your choice of effect size, but it does not enter the inference. The p-value treats your study as if it were conducted in a vacuum.
In a Bayesian analysis, you can encode that prior knowledge as a skeptical or enthusiastic prior on the treatment effect. A skeptical prior — centered near zero with modest variance — means the data need to work harder to move the posterior toward a meaningful effect. An enthusiastic prior — centered on your Phase I estimates — means a smaller Phase II can be informative.
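The effect of prior choice can be made concrete with a conjugate normal-normal update. The observed estimate, its standard error, and both prior settings below are hypothetical numbers chosen to match the tumor-volume scenario above, not values from any real trial:

```python
# Sketch: skeptical vs. enthusiastic normal priors on a treatment effect
# (% tumor-volume reduction). All numbers are hypothetical illustrations.
# Conjugate normal-normal update with a known standard error.

def posterior(prior_mean, prior_sd, est, se):
    """Posterior mean/sd for a normal prior and a normal likelihood."""
    w_prior = 1 / prior_sd**2   # precision contributed by the prior
    w_data = 1 / se**2          # precision contributed by the data
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est)
    return post_mean, post_var**0.5

est, se = 25.0, 8.0  # hypothetical Phase II estimate and its standard error

skeptical = posterior(0.0, 10.0, est, se)     # centered at no effect
enthusiastic = posterior(27.5, 7.0, est, se)  # centered on the Phase I range

print(f"skeptical posterior:    mean={skeptical[0]:.1f}, sd={skeptical[1]:.1f}")
print(f"enthusiastic posterior: mean={enthusiastic[0]:.1f}, sd={enthusiastic[1]:.1f}")
```

The same data pull the skeptical posterior only partway toward the observed 25% reduction, while the enthusiastic posterior sits close to it — which is exactly why the prior must be pre-specified and defensible.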
The regulatory reality
For pivotal drug trials submitted to the FDA or EMA, the frequentist framework is de facto standard. Both agencies have extensive guidance built around Type I error control, pre-specified hypotheses, and confirmatory evidence. This isn't because frequentist inference is philosophically superior — it's because the framework is auditable, reproducible, and hard to game after the fact.
The FDA has approved Bayesian adaptive designs for medical devices since 2010 (FDA Guidance on Bayesian Statistics in Medical Device Clinical Trials, 2010) and has issued draft guidance for adaptive designs in drug trials that accommodate Bayesian elements. Bayesian methods are increasingly common for:
- Platform trials and master protocols
- Borrowing strength from historical controls
- Adaptive randomization and dose-finding
- Pediatric extrapolation from adult data
- Subgroup analyses where prior evidence exists
For non-regulatory work — internal decision-making, go/no-go decisions at Phase II, budget allocation, academic publications — you have much more flexibility and Bayesian methods often communicate better.
Sequential analysis: where frequentism gets expensive
One of the most practically important differences involves interim analyses. In a frequentist framework, every time you look at accumulating data and consider stopping the trial, you spend Type I error. If you test at the standard α=0.05 threshold at each of five analyses of accumulating data, your overall Type I error rate inflates to roughly 14%. The solution — alpha spending functions like O'Brien-Fleming or Pocock boundaries — works, but it requires careful pre-specification and costs power.
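The inflation figure is easy to verify by simulation. The sketch below assumes normal outcomes with known SD and five equally spaced, unadjusted looks under a true null — the look schedule and trial count are arbitrary choices for illustration:

```python
# Simulation sketch: Type I error inflation from repeated unadjusted looks.
# The null is true throughout; five equally spaced analyses at alpha = 0.05.
# Look sizes, SD, and trial count are illustrative assumptions.
import math
import random

random.seed(1)
LOOKS = [20, 40, 60, 80, 100]  # cumulative sample sizes at each analysis
TRIALS = 20_000
Z_CRIT = 1.96

false_positives = 0
for _ in range(TRIALS):
    data = [random.gauss(0, 1) for _ in range(LOOKS[-1])]  # null: mean 0, sd 1
    for n in LOOKS:
        z = (sum(data[:n]) / n) * math.sqrt(n)  # z-test with known sd = 1
        if abs(z) > Z_CRIT:  # stop and "declare significance" at this look
            false_positives += 1
            break

print(f"Overall Type I error with 5 unadjusted looks: {false_positives / TRIALS:.3f}")
```

The empirical rate comes out well above the nominal 0.05, in the neighborhood of the ~14% figure quoted above.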
Bayesian analysis doesn't have this problem. You can look at the posterior as often as you want. The posterior after 100 patients is simply updated when patient 101 arrives. There's no multiple-testing correction because you're not computing p-values — you're updating a distribution.
Note that this freedom comes with a different responsibility: your decision rules (e.g., "stop for futility if P(effect > threshold | data) < 0.10") must be pre-specified and their operating characteristics — how often they trigger under the null and under various alternatives — must be validated by simulation.
Decision analysis: beyond hypothesis testing
Both frameworks can test hypotheses, but only one naturally supports decision analysis. If you want to answer "should we proceed to Phase III?" or "which dose should we carry forward?" you need expected utility — and expected utility requires a posterior distribution.
A frequentist analysis can tell you p=0.04. It cannot tell you how likely the drug is to be effective, nor how that likelihood should be weighted against the cost and risk of a larger trial. Those calculations require a probability over the parameter — which is the posterior.
For sponsors making go/no-go decisions under resource constraints, a well-calibrated posterior probability of success is often more useful than a p-value that was significant by a margin the team can't easily interpret.
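One common way to operationalize this is "assurance": expected Phase III power averaged over the posterior for the true effect. The sketch below is a hypothetical illustration — the posterior, Phase III design, and payoff numbers are all assumed, and the power formula is a one-sided normal approximation:

```python
# Sketch: turning a posterior into a go/no-go quantity ("assurance") and a
# simple expected utility. All numbers are hypothetical illustrations.
import math
import random

random.seed(7)

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

post_mean, post_sd = 0.87, 0.44  # posterior for the effect, e.g. from a pilot
n3, sigma = 250, 2.05            # assumed Phase III per-arm size and outcome SD
se3 = sigma * math.sqrt(2 / n3)  # standard error of the Phase III estimate

# Assurance: Phase III power (one-sided normal approximation) averaged over
# posterior draws of the true effect.
draws = 100_000
assurance = sum(
    1 - norm_cdf(1.96 - random.gauss(post_mean, post_sd) / se3)
    for _ in range(draws)
) / draws

value_if_success, trial_cost = 100.0, 20.0  # hypothetical payoffs, $M
expected_utility = assurance * value_if_success - trial_cost
print(f"assurance = {assurance:.2f}, expected utility = {expected_utility:.1f} $M")
```

Unlike conventional power, which conditions on one assumed effect size, assurance accounts for remaining uncertainty about the effect — which is what a go/no-go decision actually faces.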
A worked example: same data, two frameworks
You run a randomized pilot study comparing a new analgesic to placebo. 40 patients per arm. The pain reduction (0–10 VAS) in the treatment arm is 2.8 points (SD 2.1); in placebo it is 1.9 points (SD 2.0). Difference: 0.9 points (95% CI −0.01 to 1.81), p=0.053.
Frequentist read:
The null hypothesis of no difference is not rejected at α=0.05. The confidence interval includes zero. The study is "not statistically significant." Most team members hear "didn't work."
Bayesian read (weakly informative prior):
Posterior mean treatment effect: 0.87 points. 95% credible interval: 0.01 to 1.73. Posterior probability of any benefit: 97%. Posterior probability of a clinically meaningful benefit (≥1.5 points): roughly 8%.
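Both reads can be reproduced from the summary statistics alone. The exact prior behind the Bayesian figures above isn't specified, so the N(0, 1.5²) weakly informative prior below is an assumption, and the outputs approximate rather than exactly match the quoted numbers (the frequentist side also uses a z approximation rather than a t-test):

```python
# Sketch: both reads of the pilot study from its summary statistics.
# The N(0, 1.5^2) prior is an assumed weakly informative choice, so the
# Bayesian outputs approximate (not exactly match) the figures in the text.
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, m_t, sd_t, m_c, sd_c = 40, 2.8, 2.1, 1.9, 2.0
diff = m_t - m_c
se = math.sqrt(sd_t**2 / n + sd_c**2 / n)

# Frequentist read: z approximation to the two-sample test
z = diff / se
p_two_sided = 2 * (1 - norm_cdf(abs(z)))
print(f"difference = {diff:.2f}, SE = {se:.3f}, p ~ {p_two_sided:.3f}")

# Bayesian read: conjugate normal update under the assumed prior
prior_sd = 1.5
post_var = 1 / (1 / prior_sd**2 + 1 / se**2)
post_mean = post_var * (diff / se**2)
post_sd = math.sqrt(post_var)
p_benefit = 1 - norm_cdf((0 - post_mean) / post_sd)      # P(effect > 0)
p_meaningful = 1 - norm_cdf((1.5 - post_mean) / post_sd)  # P(effect >= 1.5)
print(f"posterior mean = {post_mean:.2f}, sd = {post_sd:.2f}")
print(f"P(benefit > 0) = {p_benefit:.2f}, P(benefit >= 1.5) = {p_meaningful:.2f}")
```

Same data, two honest summaries: one controls error rates over hypothetical repetitions; the other states what the evidence, combined with a stated prior, implies about the effect.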
Common mistakes
Mistake 1: Treating p=0.05 as a probability.
"There is a 95% chance the drug works" is never a valid frequentist conclusion. If you need that sentence, use Bayesian methods and earn it with a prior and a posterior.
Mistake 2: Using a flat prior and calling it "objective."
An uninformative prior is a prior. A uniform distribution on [0, ∞) implies you think a treatment effect of 1,000 units is just as plausible as an effect of 1 unit. That's rarely true. A weakly informative prior — wide but centered on realistic values — is almost always more honest than a flat one.
Mistake 3: Choosing the framework after seeing the data.
Switching from frequentist to Bayesian because p=0.06 and you want to show 93% posterior probability of benefit is data dredging with extra steps. Framework choice must be pre-specified and justified on scientific grounds, not chosen to produce a more favorable-looking result.
Mistake 4: Ignoring operating characteristics.
Adaptive Bayesian designs sound flexible, but their actual Type I error rates and power under realistic assumptions must be validated by simulation. A "Bayesian" design that hasn't been simulated is a design that hasn't been tested.
Practical decision guide
Run through these questions before your next protocol:
1. Do you have credible prior data? If yes, consider Bayesian. If no — or if the prior would be contested — frequentist keeps the focus on the current study.
2. Is this a regulatory submission? If yes, default to frequentist unless you have a strong scientific case and early FDA/EMA engagement for a Bayesian adaptive design.
3. Do you need interim looks without paying an alpha penalty? Bayesian adaptive designs handle this cleanly. Frequentist approaches work too but require pre-specified alpha spending.
4. Do you need to communicate a probability of benefit? Only Bayesian inference gives you that directly and honestly. Frequentist p-values will be misread as probabilities regardless of what you write in the methods section.
5. Are you making a go/no-go decision? Bayesian expected utility or posterior probability of success is often more useful than a p-value for internal decisions.
Bottom line
Frequentist methods are the right default when you need reproducible error-rate control, regulatory acceptance, and no controversial prior assumptions. They are the language regulators speak and the language most reviewers expect.
Bayesian methods are the right choice when you have prior information worth using, need probability statements about parameters, are running an adaptive design, or are making a decision rather than testing a hypothesis. The price is a prior that must be defensible and pre-specified.
If you're uncertain which framework is right for your next study, that's a good reason to get a statistician involved before the protocol is written — not after the data are locked.
Need help choosing the right framework for your study?
I work with clinical researchers and sponsors on statistical design, analysis plans, and regulatory submissions.