Strategy · May 2026 · 15 min read

What to put in your statistical analysis plan (and what to leave out)

An SAP that's too vague gets you in trouble with reviewers. One that's too rigid ties your hands mid-study. Here's how to write one that's actually useful.

Micah Thornton, MS — Thornton Statistical Consulting


Why the SAP exists

The statistical analysis plan is a pre-commitment device. You write it before you see the unblinded data, before you know whether your trial worked, and before you know which analytical choices would favor which conclusion. Done well, it makes your results credible: a reviewer reading the SAP and the report can verify that you did what you said you would do, not what the data suggested after the fact.

Done poorly — which usually means done too vaguely — the SAP becomes a regulatory liability. If you specified "a mixed model for the primary endpoint" but your statistical analysis report (SAR) describes a mixed model with a specific covariance structure, site as a random effect, and a Kenward-Roger denominator degrees-of-freedom approximation, a reviewer will ask: where is that specified? If the answer is "it's standard practice," you've invited a data integrity question you didn't need.

The SAP is not just documentation. It is the pre-registration of your analysis. Its credibility rests entirely on the fact that it was finalized — dated, version-controlled, and ideally filed with the regulatory authority — before anyone saw outcomes data. Vagueness after the fact looks like wiggle room.

The other failure mode is over-specification: an SAP so detailed that it prescribes the exact SAS macro call, the font size in tables, or the precise format of every confidence interval. That document can't survive a data cleaning finding without an amendment. The goal is pre-specifying the analytical decisions that matter, not producing a programming specification.

When to write it

The credibility of an SAP depends entirely on when it was finalized relative to when blinding was broken. For a randomized controlled trial, the SAP should be finalized and locked before database lock — meaning before the treatment assignments are revealed to the analysis team. Not before enrollment, not before the last patient visit, but before unblinding.

In practice, many teams write a "draft" SAP early and finalize it during the data cleaning phase, while the treatment assignments are still blinded. That is appropriate practice. It means you can incorporate things you learn during cleaning — variables that ended up encoded differently than expected, missing data patterns that inform the choice of imputation strategy — while still committing before you see the outcomes.

For FDA and EMA submissions, ICH E9 — and, since 2019, its E9(R1) addendum — is the governing guidance, though each agency layers its own template and filing expectations on top. For academic journals, some require SAP registration (e.g., on ClinicalTrials.gov or OSF) as a condition of submission. Know which standard applies to your submission before you structure the document.

For observational studies and secondary database analyses — where there is no database lock and no blinding — the equivalent is a pre-registered analysis protocol, filed with a registry before accessing the outcomes data. The principle is the same: you commit before you can optimize.

Amendments are normal and expected. An amendment to address a protocol change, a database finding, or a change in regulatory guidance is legitimate. An amendment that recodes the primary endpoint after a blinded interim review that gave you outcome-adjacent information is a different matter. Document the timing and rationale of every amendment, and be prepared to explain it.

What must be in every SAP

These sections are not optional. A regulatory reviewer will look for each of them. If one is absent, the reviewer will either ask for it in a query or assume you didn't pre-specify it.

Study objectives and endpoints

State the primary objective and the single primary endpoint that operationalizes it. Be precise: not "change from baseline in pain" but "change from baseline to Week 12 in the NRS pain score, scored 0–10, where higher scores indicate greater pain." State the direction of the effect you expect (or are testing against). Specify secondary endpoints in the order of their pre-specified hierarchical importance.

Every endpoint in the SAP should also appear in the protocol. If there is an endpoint in the SAP that is not in the protocol — even a "post-hoc exploratory" one — a reviewer will ask why it was added. Add endpoints to the protocol first, then reflect them in the SAP.

Analysis populations

Define each analysis population precisely enough that a programmer can construct the indicator variable without asking the statistician for clarification. The intent-to-treat (ITT) population typically means all randomized patients regardless of what happened after randomization. The per-protocol (PP) population typically excludes major protocol deviators. Define what constitutes a major protocol deviation — by category, not by name — before unblinding.

State which population is the primary analysis population for each endpoint. If the primary population for the primary endpoint is the full analysis set (FAS), say so. If secondary analyses use a different population, specify them.
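To make the bar concrete, the flag logic should be writable directly from the SAP text. A minimal sketch with pandas, using hypothetical variable names (subjid, randomized, dosed, major_pd are illustrative, not CDISC names):

```python
import pandas as pd

# Hypothetical subject-level data; all variable names are illustrative.
subjects = pd.DataFrame({
    "subjid":     [101, 102, 103, 104],
    "randomized": [True, True, True, False],   # was the patient randomized?
    "dosed":      [True, True, False, False],  # received at least one dose
    "major_pd":   [False, True, False, False], # any major protocol deviation
})

# ITT: all randomized patients, regardless of post-randomization events
subjects["itt"] = subjects["randomized"]

# Safety: all patients who received at least one dose of study drug
subjects["safety"] = subjects["dosed"]

# PP: randomized, dosed, and free of major protocol deviations
# (deviation categories must be defined in the SAP before unblinding)
subjects["pp"] = subjects["itt"] & subjects["dosed"] & ~subjects["major_pd"]

print(subjects[["subjid", "itt", "safety", "pp"]])
```

If any of these three flag lines could not be written without asking the statistician a question, the SAP definition is not precise enough.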

Primary analysis model

This is the most important section in the SAP. Specify the model completely enough that two statisticians working independently would implement it the same way. For a mixed model for repeated measures (MMRM):

  • The response variable (and any transformation applied to it)
  • Fixed effects: treatment, visit, treatment × visit interaction, baseline value, covariates specified in the protocol
  • Random effects (if any — most regulatory MMRM implementations have none)
  • Covariance structure (unstructured, compound symmetry, Toeplitz — and the rationale for the choice)
  • The estimand: treatment effect at which visit, under which missing data assumption
  • How degrees of freedom will be approximated (Kenward-Roger is standard; say so)
  • What will happen if the model fails to converge (fallback structure, fallback estimator)

If your primary analysis is a logistic regression, a time-to-event analysis, a Wilcoxon rank-sum test, or a Bayesian hierarchical model, the same principle applies: specify the model precisely enough that the implementation is determined.
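One way to check completeness is to write each analytical decision down as a named field, so an unfilled decision is visible at a glance. A sketch of that idea (the class and field names are illustrative, not any standard):

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class MMRMSpec:
    """Pre-specified decisions for a primary MMRM (illustrative fields)."""
    response: str              # endpoint variable and any transformation
    fixed_effects: tuple       # treatment, visit, interaction, baseline, covariates
    random_effects: tuple      # usually empty for a regulatory MMRM
    covariance: str            # e.g. "unstructured", with rationale in the SAP text
    estimand_visit: str        # visit at which the treatment effect is estimated
    ddf_method: str            # denominator degrees-of-freedom approximation
    convergence_fallback: str  # covariance structure(s) to try if the model fails

primary = MMRMSpec(
    response="change from baseline in NRS pain score (no transformation)",
    fixed_effects=("treatment", "visit", "treatment:visit", "baseline NRS", "region"),
    random_effects=(),
    covariance="unstructured",
    estimand_visit="Week 12",
    ddf_method="Kenward-Roger",
    convergence_fallback="heterogeneous Toeplitz, then compound symmetry",
)

# Every decision must be filled in; a blank string is an underspecification.
missing = [f.name for f in fields(primary) if getattr(primary, f.name) == ""]
assert not missing, f"Underspecified decisions: {missing}"
```

The point is not the data structure; it is that every field corresponds to a question a second statistician would otherwise have to ask.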

Missing data strategy

Missing data is not an afterthought. Since ICH E9(R1), regulators expect you to define your estimand — including the treatment policy, composite, or hypothetical strategy for handling intercurrent events — and align your missing data approach with that estimand.

For a treatment policy estimand (the most common regulatory choice), the primary analysis should use data from patients regardless of treatment discontinuation. MMRM under missing at random (MAR) is standard. But you also need to pre-specify sensitivity analyses that stress-test the MAR assumption: controlled multiple imputation (delta adjustment), tipping-point analysis, or pattern mixture models. Specify the sensitivity analyses in the SAP — they are expected, and the direction of each sensitivity analysis matters for interpretation.

Multiplicity control

If you are testing more than one hypothesis with a family-wise Type I error rate that matters for labeling claims, you need a pre-specified multiplicity procedure. State the procedure (Bonferroni, Holm, hierarchical testing with fallback, Hochberg, Hommel, gatekeeping for multiple families), the family-wise error rate, and which endpoints are in each family.

If your study has a single primary endpoint and secondary endpoints that are clearly exploratory, you can state that secondary endpoints will be assessed at a 0.05 significance level but not used to make labeling claims, and no alpha adjustment is performed. That is a legitimate approach — but state it explicitly. "We did not adjust for multiplicity" without explanation looks like an oversight.
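Of the procedures named above, Holm's step-down method is easy to state exactly. A generic sketch, not tied to any particular trial:

```python
def holm(pvalues, alpha=0.05):
    """Holm step-down: returns a list of booleans (reject H0?) in input order.

    Hypotheses are tested smallest-p first at levels alpha/m, alpha/(m-1), ...;
    once one test fails, all remaining hypotheses are retained.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    m = len(pvalues)
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject

# Three endpoints, FWER 0.05: thresholds 0.05/3, 0.05/2, 0.05/1 in p-value order
print(holm([0.012, 0.060, 0.009]))  # → [True, False, True]
```

Note the middle endpoint fails at its 0.05/1 threshold even though 0.060 would look "almost significant" unadjusted — exactly the kind of outcome the SAP should settle in advance.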

Subgroup analyses

Pre-specify all subgroup analyses — including the subgroup definition, the analysis method, and the estimand — before unblinding. Post-hoc subgroups are never primary evidence of efficacy. Pre-specified subgroups are never definitive evidence of efficacy either, but they are at least interpretable.

If you will run, say, a five-subgroup analysis with the standard treatment-by-subgroup interaction test, state that explicitly. If you will present forest plots of subgroup effects, state what will be shown. If you will test the interaction formally, state the threshold at which you would interpret the interaction as meaningful (usually 0.10 or 0.05, with the recognition that interaction tests are underpowered in most trials).

The include / leave-out summary

A useful way to calibrate the level of detail: if two competent statisticians would make different analytical choices given what's written, it needs to be more specific. If two programmers would write different code given what's written, that's a programming spec, not an SAP.

  • Primary model. Include: model structure, covariates, covariance assumption, estimand, convergence fallback. Leave out: SAS PROC MIXED syntax, macro names, output formatting.
  • Missing data. Include: primary assumption (MAR/MNAR), imputation method if applicable, sensitivity analyses and their direction. Leave out: number of imputation datasets (unless pre-specifying it matters), technical imputation algorithm.
  • Populations. Include: exact inclusion/exclusion rules with variable names and values, primary population for each endpoint. Leave out: how the flag variable will be named in the SAS dataset.
  • Subgroups. Include: all pre-specified subgroups with definitions, interaction test approach, interpretation threshold. Leave out: post-hoc subgroups suggested during data cleaning.
  • Multiplicity. Include: full procedure with family assignments, FWER target, and order of testing. Leave out: software implementation details.
  • Safety. Include: summary approach, TEAE definition, grading scheme (CTCAE version), analysis periods, special-interest AEs. Leave out: every cross-tabulation and listing — those belong in the shells document.
  • TLF shells. Include: a reference to the shells document (do not embed table mockups inline). Leave out: column widths, page orientation, font specifications.

Safety analysis: a common SAP gap

Many SAPs have a thorough efficacy section and a cursory safety section: "Safety will be summarized descriptively. Adverse events will be coded using MedDRA and graded per CTCAE v5.0." That is a starting point, not a specification.

A regulator reviewing a safety concern will want to know: what was the pre-specified treatment-emergent adverse event (TEAE) window? What is the analysis period — on-treatment only, or through a 30-day follow-up? If an event occurs during follow-up after dose reduction, how is it counted? Which adverse events were identified as special interest before unblinding?

Adverse events of special interest (AESIs) should be pre-specified in the SAP, not identified after reviewing the blinded data. If your compound has a known class effect (hepatotoxicity, QTc prolongation, bone marrow suppression), the enhanced monitoring and analysis plan for that signal belongs in the SAP, not in a post-hoc addendum.

For the safety section, specify at minimum:

  • TEAE definition: onset on or after first dose, through [X] days after last dose
  • Grading system and version (CTCAE v5.0 for oncology, custom scales for disease-specific endpoints)
  • MedDRA version to be used for coding
  • Analysis period (on-treatment, on-treatment + 30 days, full follow-up)
  • AESIs and the enhanced summaries that will be produced for each
  • Laboratory shift table approach (notable ranges, CTCAE grading of lab values)
  • Vital signs and ECG analysis approach
  • Exposure summaries (duration, dose intensity, reasons for discontinuation)

Sensitivity and supportive analyses

The primary analysis is a point estimate with confidence interval and hypothesis test under a set of assumptions. Sensitivity analyses test what happens when those assumptions are violated. Supportive analyses use different methods to address the same question from another angle. The SAP should pre-specify both.

For the missing data assumption: if the primary analysis assumes MAR, a sensitivity analysis under MNAR (e.g., a controlled imputation that adds a constant δ to imputed post-dropout values in the active arm, varying δ to find the tipping point) pre-specifies the logic without pre-specifying the answer. That is exactly what regulators want to see.

For the population assumption: if the primary analysis is ITT, pre-specify a per-protocol analysis as a supportive analysis, and state how agreement between the two will be interpreted. Close agreement supports the robustness of the primary result; substantial divergence is a signal that protocol deviations are informative and needs to be investigated and explained.

The number of sensitivity analyses is not a quality signal. Three well-chosen sensitivity analyses that probe the key assumptions of the primary model are more valuable — and more credible — than twelve sensitivity analyses that test every possible variation. Pre-specify the ones that would change your interpretation of the primary result if they differed.

Exploratory analyses are legitimate and should be labeled as such in the SAP. The label "exploratory" means: we will look at this, we will report it honestly, and we will not draw confirmatory conclusions from it. It does not mean unplanned.

Interim analyses and DMC charters

If your trial includes one or more interim analyses — for efficacy stopping, futility stopping, safety monitoring, or adaptive sample size re-estimation — every detail of those interim analyses must be in the SAP.

For efficacy stopping: the boundary (O'Brien-Fleming, Pocock, Haybittle-Peto), the alpha spending function, the information fractions at which interim looks will occur, and who has decision authority. For futility: whether the futility boundary is binding, and the conditional power threshold. For adaptive designs: the adaptation rule, the data used to trigger adaptation, and the Type I error control method.

The Data Monitoring Committee charter is a separate document, but it should be consistent with the SAP. The charter describes governance (who the DMC members are, how they communicate, what information they receive); the SAP describes the statistical procedures. A conflict between the two documents — a different spending function, different information fractions — will surface in a regulatory review.

The most common interim analysis error: the DMC charter specifies a spending function that the SAP never mentions. Regulators want to see the statistical procedure specified in the SAP, with the charter serving as the operational governance layer.

The estimand framework: ICH E9(R1)

Since ICH E9(R1) became final in 2019, regulatory submissions are increasingly expected to define a precise estimand — the exact population-level quantity the trial is designed to estimate. The estimand framework has five components:

  1. Population. Which patients, defined by the eligibility criteria.
  2. Variable (endpoint). The outcome measure, defined completely.
  3. Intercurrent events. Events after randomization that affect interpretation (treatment discontinuation, rescue medication, death). Specify the handling strategy for each: treatment policy, composite, hypothetical, while-on-treatment, or principal stratum.
  4. Population-level summary. How you summarize the outcome over the population: mean difference, risk ratio, median survival, etc.
  5. Causal contrast. Implicit in a randomized trial: the comparison between arms at the same point in time under the randomized allocation.

For most trials, the primary estimand will be a treatment policy strategy for discontinuation (include all data regardless of discontinuation), and the SAP should say so explicitly. The estimand then determines which missing data strategy is appropriate — not the other way around.

This shift in framing — estimand first, analysis method second — is one of the most important changes in regulatory statistics in the last decade. Teams that have not adopted it are starting to see queries that ask for the estimand specification where it is absent.

Common SAP mistakes

Mistake 1: The primary analysis is underspecified.

"Analysis of covariance with baseline as covariate" leaves open: which baseline? Measured how? Transformed? What covariates beyond baseline? What happens with a missing baseline? Over-specification is rare in the primary analysis section. Underspecification is the norm.

Mistake 2: No pre-specified sensitivity analyses.

A primary analysis with no pre-specified sensitivity analyses signals that the team hasn't thought about what assumptions their primary analysis rests on. Regulators always probe the assumptions. If they find them unaddressed in the SAP, they will ask for post-hoc analyses — which are less credible than pre-specified ones and more work.

Mistake 3: The SAP was written after unblinding.

This happens more often than it should, particularly in academic trials and early-phase industry work. An undated SAP, or an SAP dated after the last patient's last visit, raises immediate questions about analytical integrity. The date stamp and the version history are the evidence that the SAP preceded the analysis. Keep them clean.

Mistake 4: Protocol and SAP are inconsistent.

The most common source of inconsistency: the protocol specifies one primary endpoint definition, and the SAP operationalizes it differently. A secondary endpoint listed in the protocol doesn't appear in the SAP, or appears with a different name. Every endpoint that will be analyzed must appear in both documents, consistently defined.

Mistake 5: Analysis population definitions rely on implicit knowledge.

"The safety analysis set includes all patients who received at least one dose of study drug." Clear. "The per-protocol analysis set excludes major protocol deviators." Not clear — what constitutes a major protocol deviation? This needs to be defined by category (significant dosing errors, use of prohibited concomitant medication, failure to meet an eligibility criterion) before the data are unblinded and before anyone knows which patients would be excluded.

Mistake 6: The SAP is the programming spec.

An SAP that specifies SAS macro names, dataset naming conventions, table format details, and output file paths is a programming specification masquerading as a statistical document. Every trivial programming change requires a formal SAP amendment. Keep the SAP at the level of analytical decisions; put the programming specification in a separate shells document or a statistical programming plan.

The amendment process

An SAP amendment is not an admission of failure. It is the appropriate mechanism for updating the statistical plan when something material changes — a protocol amendment, a data management finding that requires a different approach, or a regulatory guidance document issued after the original SAP was filed.

Every amendment should include: the version number, the date, the author, a change log with specific sections modified, and the rationale for each change. The rationale matters. "Revised imputation strategy based on observed missing data pattern in blinded data" is a legitimate amendment. "Revised primary endpoint definition after reviewing interim efficacy summaries" is a serious concern.

Keep a clean version history. If a regulator asks for the SAP that was in effect at the time of the interim analysis, you need to be able to produce it — not reconstruct it from memory. Every version, dated and signed, should be archived with the trial master file.

For minor changes — typographical corrections, corrections to cross-references, clarifications that do not change the analysis — a version note is sufficient. For substantive changes — a new covariate, a different covariance structure, a revised population definition — a formal amendment with regulatory filing is appropriate.

Practical SAP checklist

Before locking the SAP, verify each of these:

  1. Every endpoint in the SAP is in the protocol, defined consistently. Cross-reference section numbers.
  2. The primary analysis model is fully specified. A second statistician should be able to implement it without asking a single question.
  3. Each analysis population is defined precisely enough to construct a flag variable. Variable names and values, not just prose descriptions.
  4. Missing data: primary assumption stated, ICH E9(R1) estimand strategy specified, sensitivity analyses pre-specified. At least one MNAR sensitivity analysis for a primary continuous or binary endpoint.
  5. Multiplicity: procedure stated with family assignments and FWER target. Or an explicit statement that secondary endpoints are exploratory and no alpha adjustment is made.
  6. All pre-specified subgroups are listed with their definitions. No subgroups should appear in the CSR that are not in the SAP.
  7. Safety section covers TEAE window, AESIs, lab grading, and exposure summaries. Not just "safety will be summarized descriptively."
  8. Interim analyses fully specified if applicable. Boundary, spending function, information fractions, decision rules, and who decides.
  9. Version, date, and author recorded. Dated before database lock. If using a filing system, confirm the filing timestamp.
  10. SAP version history clean and archived. Every prior version recoverable if a regulator asks.

Bottom line

A good SAP pre-commits you to analytical choices that matter and stays silent on implementation details that don't. It is specific enough that no two statisticians would implement it differently, and flexible enough that a data management finding doesn't require an amendment to address a typo.

The discipline of writing the SAP is half the value. Forcing yourself to specify the estimand, the missing data strategy, and the multiplicity procedure before you see the data surfaces assumptions you hadn't articulated. Those are exactly the assumptions a reviewer will probe.

The question to ask of every section: if this analysis yielded a surprising result, would someone reviewing this document believe the analysis was pre-specified and not post-hoc? If the answer is no, the section needs more specificity. The SAP's job is to make the answer an unambiguous yes.

If you are inheriting an existing trial that already has a weak SAP, the priority is a prospective amendment that addresses the most critical underspecifications before the database is locked — not a retrospective explanation after the fact. It's not ideal, but a well-reasoned, timely amendment is significantly more defensible than silence.


Need help writing or reviewing a statistical analysis plan?

I draft SAPs for Phase I–III trials, observational studies, and regulatory submissions — and I review existing SAPs before database lock while there's still time to fix problems.