A distribution that models the Dunning-Kruger effect: the tetration distribution
The Dunning-Kruger effect has been described verbally, plotted empirically, and disputed methodologically. What it has not had is a formal distributional model with interpretable parameters. Iterated exponentiation — tetration — turns out to be exactly the right mathematical tool.
Micah Thornton, MS — Thornton Statistical Consulting
What the DK effect actually says
The original Kruger and Dunning (1999) finding is more precise than the popular summary. Participants who scored in the bottom quartile on tests of logical reasoning, grammar, and humor not only performed poorly — they also estimated their performance to be well above average, near the 60th percentile. Participants who scored in the top quartile estimated their performance to be near the 70th percentile: still above average, but substantially lower than their actual rank. The bottom quartile overestimated. The top quartile underestimated. Both errors are systematic, not random.
Kruger and Dunning's explanation was metacognitive: the same skills required to perform a task are required to evaluate performance on that task. Incompetence creates a double deficit — poor performance and an inability to recognize it. Expertise creates a different distortion — high performers assume that tasks which are easy for them are easy for others, leading them to underestimate their relative position.
The popular cartoon of the effect — Mount Stupid, the valley of despair, the plateau of sustained competence — is a later invention, more motivational psychology than empirical finding. The actual DK result is simpler and, statistically speaking, more interesting: self-assessment is anchored toward the population mean, and the degree of anchoring is not constant across the skill distribution.
For our purposes the mechanistic debate is secondary. Whatever its cause, the empirical pattern is robust: self-assessed competence S and actual competence K are related in a specific non-linear way. Low K is associated with S substantially above K. High K is associated with S somewhat below K. The relationship passes through near-calibration somewhere in the middle of the skill distribution. Any serious distributional model for this phenomenon must reproduce that shape.
Why no one has written this down as a distribution
The DK literature is large but almost entirely non-parametric. Researchers compute quartile means, plot percentile estimates against actual percentile scores, and describe the curves qualitatively. A small number of papers have fit regression models to the self-assessment data, but these are typically simple linear regressions of estimated percentile on actual percentile — which can capture the compression toward the mean but not the non-linearity in the variance structure.
Part of the reason is that the natural model turns out to require an unusual parameterization. The conditional distribution of S given K is not Gaussian, not Beta in the standard parameterization, not anything from the common exponential family with a straightforward link function. The non-linearity in the mean function and in the precision as a function of K are coupled in a way that requires the shape parameters themselves to vary as functions of K — and the natural functional form for that variation involves iterated exponentiation.
Tetration — the operation of exponentiating a number by itself repeatedly — turns out to produce exactly the right curvature. The tetrate x↑↑n (x to the x, to the x, n times) on the unit interval has a minimum interior to (0,1), approaches 1 at both endpoints, and decays toward that minimum at a rate that increases with n. That combination of properties — interior minimum, boundary concentration, tunable curvature — is what the DK calibration curve requires.
Tetration: the necessary background
Tetration is the fourth hyperoperation in the standard sequence: addition, multiplication, exponentiation, tetration. While addition, multiplication, and exponentiation each appear throughout applied statistics, tetration is almost never discussed in statistical contexts. The unfamiliarity is understandable — the operation grows so rapidly for x > 1 that it has few natural uses in quantitative modeling. But on the unit interval (0, 1), tetration behaves quite differently, and that is the domain we care about for skill modeling.
Define the n-th tetrate of x, written ⁿx, recursively:

¹x = x,   ⁿx = x^(ⁿ⁻¹x) for n ≥ 2,

so that ²x = x^x, ³x = x^(x^x), and so on, with the tower evaluated from the top down.
On the unit interval the behavior is counterintuitive. For x ∈ (0, 1), x^x = exp(x ln x). Since ln x < 0, this is exp(negative), so x^x < 1 for all x ∈ (0,1). Specifically, the function x ↦ x^x achieves its minimum at x = 1/e ≈ 0.368, where (1/e)^(1/e) = e^(−1/e) ≈ 0.692. At both endpoints: as x → 0⁺, x^x → 1 (since x ln x → 0); at x = 1, x^x = 1. So ²x maps (0, 1] to [e^(−1/e), 1] ≈ [0.692, 1] with an interior minimum.
For the infinite power tower — the limit as n → ∞ — Euler showed in 1783 that the sequence converges for x ∈ [e^(−e), e^(1/e)] ≈ [0.066, 1.444]. The limit is given by the Lambert W function:

ⁿx → W(−ln x) / (−ln x)  as n → ∞,

which for x ∈ (0, 1) is the unique solution y of y = x^y.
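A few lines of R are enough to check these quantities numerically. This is a minimal sketch; `tetrate` and `power_tower` are illustrative helper names, not functions from any package.

```r
# n-th tetrate of x, evaluated from the top down: tetrate(x, 3) = x^(x^x)
tetrate <- function(x, n) {
  t <- x
  if (n >= 2) for (i in 2:n) t <- x^t
  t
}

# Infinite power tower by fixed-point iteration of y <- x^y;
# converges for x in [exp(-e), exp(1/e)] = [0.066, 1.444]
power_tower <- function(x, iters = 2000) {
  t <- x
  for (i in seq_len(iters)) t <- x^t
  t
}

tetrate(0.5, 2)          # 0.5^0.5 = 0.707
tetrate(1 / exp(1), 2)   # exp(-1/e) = 0.692, the minimum of x^x
power_tower(0.5)         # fixed point of y = 0.5^y, about 0.641
```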
The table below shows ⁿx for selected x ∈ (0,1) and n = 1, 2, 3, ∞. Notice that for x near 1 the tower stabilizes after only a couple of iterations, while for smaller x the iterates move around more before settling.
| n | x = 0.1 | x = 0.3 | x = 0.5 | x = 0.7 | x = 0.9 |
|---|---|---|---|---|---|
| 1 (identity) | 0.100 | 0.300 | 0.500 | 0.700 | 0.900 |
| 2 (x^x) | 0.794 | 0.697 | 0.707 | 0.779 | 0.910 |
| 3 (x^x^x) | 0.833 | 0.741 | 0.615 | 0.762 | 0.900 |
| ∞ (tower) | 0.883 | 0.524 | 0.641 | 0.766 | 0.909 |
The non-monotone structure at n = 2 — where both x = 0.1 and x = 0.9 have higher tetrate values than x = 0.5 — is precisely the mathematical signature of the DK effect: both extremes of the skill distribution show a different relationship to their self-image than the middle of the distribution does.
The Tetration Distribution: formal definition
We now construct a parametric model for metacognitive calibration. Let K ∈ (0, 1) denote latent actual skill and S ∈ (0, 1) denote self-assessed skill, both on a normalized percentile scale. The model has two parameters: n ≥ 1 (tetration order, the primary shape parameter) and λ > 0 (precision). We add a small regularization constant ε > 0 (typically ε = 10⁻³) to prevent degenerate Beta shape parameters near the boundaries.
Definition (Tetration Calibration Model).
Conditional on K, self-assessed skill S follows a Beta distribution:

S | K ~ Beta(a, b),   a = λ·ⁿK + ε,   b = λ·ⁿ(1−K) + ε.
The conditional mean is:

E[S | K] = (λ·ⁿK + ε) / (λ(ⁿK + ⁿ(1−K)) + 2ε) ≈ φₙ(K) := ⁿK / (ⁿK + ⁿ(1−K)),

with the approximation exact as ε → 0.
The function φₙ(K) has several properties worth establishing. First, it is symmetric around K = 0.5: φₙ(K) + φₙ(1−K) = 1 for all n, K. This means the mean bias is antisymmetric — the average overconfidence at K is equal in magnitude to the average underconfidence at 1−K. Second, for n = 1, the tetrate is the identity: ¹K = K, so φ₁(K) = K/(K + (1−K)) = K. The n = 1 case produces a perfectly calibrated model — self-assessed skill equals actual skill on average, for all K. The DK effect emerges only for n ≥ 2.
For n = 2:

φ₂(K) = K^K / (K^K + (1−K)^(1−K)).
A person at the 10th percentile of actual skill expects to be near the 47th percentile of self-assessment — nearly five times their actual rank. A person at the 90th percentile of actual skill expects to be near the 53rd percentile — substantially below their actual rank. Both errors have the same magnitude by the antisymmetry property. This is qualitatively the Kruger and Dunning (1999) pattern, with gross overestimation at the bottom and underestimation at the top, falling directly out of the tetration parameterization.
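These two numbers are easy to verify with the `tetrate` helper sketched earlier; `phi_n` is an illustrative name for the calibration curve.

```r
# Calibration curve: phi_n(K) = tetrate(K, n) / (tetrate(K, n) + tetrate(1 - K, n))
phi_n <- function(K, n) {
  tK <- tetrate(K, n)
  tK / (tK + tetrate(1 - K, n))
}

phi_n(0.1, 2)   # about 0.466: a 10th-percentile performer self-assesses near the 47th
phi_n(0.9, 2)   # about 0.534: a 90th-percentile performer self-assesses near the 53rd
```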
The marginal distribution and population DK
The Tetration Calibration Model specifies S conditional on K. To obtain the marginal distribution of the metacognitive gap G = S − K over a population, we integrate out K. Assume K ~ Beta(α, β) for some population skill distribution (the uniform case K ~ Beta(1,1) is the natural starting point; α > β puts most of the mass at high skill values, a left-skewed population in which most people sit above the midpoint of the skill scale).
The marginal mean of G is:

E[G] = ∫₀¹ (φₙ(k) − k) f(k; α, β) dk,

where f(k; α, β) is the Beta(α, β) density.
The marginal variance of G follows from the law of total variance:

Var[G] = E_K[Var(S | K)] + Var_K(φₙ(K) − K),

since, conditional on K, the gap G has mean φₙ(K) − K and variance Var(S | K).
There is no closed form for these integrals in general, but they are straightforward to evaluate numerically. The marginal distribution of G, which we call the Metacognitive Tetration Distribution MTD(n, λ), can be sampled directly via: draw K ~ Beta(α, β); draw S | K ~ Beta(λ·ⁿK + ε, λ·ⁿ(1−K) + ε); return G = S − K.
The shape of MTD(n, λ) changes qualitatively with n. For n = 1, the distribution collapses to a point mass at zero (up to sampling noise from finite λ). For n = 2, MTD is unimodal and symmetric around zero, with heavier tails than a Gaussian of matched variance. For large n, MTD becomes bimodal: the distribution places substantial mass near G = +0.4 (chronic overconfidence) and G = −0.4 (systematic underconfidence), with a trough near G = 0. A population modeled by large n contains few well-calibrated individuals — almost everyone is either significantly overconfident or significantly underconfident.
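A minimal sampling sketch in R, reusing `tetrate` from above; `rmtd` is an illustrative name and the parameter values are arbitrary.

```r
# Draw N values of the gap G = S - K from MTD(n, lambda), with K ~ Beta(alpha, beta)
rmtd <- function(N, n, lambda, alpha = 1, beta = 1, eps = 1e-3) {
  K <- rbeta(N, alpha, beta)
  S <- rbeta(N, lambda * tetrate(K, n) + eps, lambda * tetrate(1 - K, n) + eps)
  S - K
}

set.seed(1)
G <- rmtd(1e4, n = 2, lambda = 10)
mean(G)                                    # near 0, by antisymmetry with uniform K
hist(G, breaks = 60, main = "MTD(2, 10)", xlab = "G = S - K")
```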
The calibration curves: n = 1 through ∞
The calibration curve φₙ(K) = E[S|K] is the central object of the model. Below are values for several tetration orders, showing how the DK distortion strengthens with n.
| K (actual) | n=1 (φ=K) | n=2 | n=3 | n=∞ |
|---|---|---|---|---|
| 0.05 | 0.050 | 0.512 | 0.531 | 0.564 |
| 0.10 | 0.100 | 0.466 | 0.489 | 0.524 |
| 0.20 | 0.200 | 0.409 | 0.428 | 0.453 |
| 0.30 | 0.300 | 0.454 | 0.448 | 0.408 |
| 0.40 | 0.400 | 0.490 | 0.481 | 0.468 |
| 0.50 | 0.500 | 0.500 | 0.500 | 0.500 |
| 0.60 | 0.600 | 0.510 | 0.519 | 0.532 |
| 0.70 | 0.700 | 0.546 | 0.552 | 0.592 |
| 0.80 | 0.800 | 0.591 | 0.572 | 0.547 |
| 0.90 | 0.900 | 0.534 | 0.511 | 0.476 |
| 0.95 | 0.950 | 0.488 | 0.469 | 0.436 |
At K = 0.10 and n = 2, a person at the 10th percentile of actual competence expects to perform near the 47th percentile — roughly a 37-point overestimation. At K = 0.90 and n = 2, a top-decile performer expects to place near the 53rd percentile — roughly a 37-point underestimation, equal in magnitude by antisymmetry. Both distortions grow with n. The distortion is larger near the boundaries (K close to 0 or 1) than near the middle, and it strengthens as n increases — the distribution becomes more polarized.
Moment structure
For the conditional distribution S | K ~ Beta(a, b) with a = λ · ⁿK + ε and b = λ · ⁿ(1−K) + ε:
| Property | Value |
|---|---|
| Conditional mean E[S|K] | φₙ(K) = ⁿK / (ⁿK + ⁿ(1−K)) |
| Conditional variance Var[S|K] | φₙ(K)(1 − φₙ(K)) / (λ(ⁿK + ⁿ(1−K)) + 1) |
| Marginal mean E[S] | 0.5 (by antisymmetry, for K ~ Uniform(0,1)) |
| Marginal mean of gap E[G] | 0 (antisymmetry, symmetric population) |
| Conditional skewness of S|K | 2(1 − 2φₙ(K))·√(λΣ + 1) / ((λΣ + 2)·√(φₙ(K)(1−φₙ(K)))) where Σ = ⁿK + ⁿ(1−K), ignoring ε |
| n = 1 limiting case | E[S | K] = K for all K (calibrated on average); spread around K still governed by λ |
| λ → ∞ limit | S → φₙ(K) deterministically — DK bias without noise |
| λ → 0 limit | S becomes independent of K as both shape parameters collapse to ε — completely uninformative self-assessment |
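The first two rows are easy to confirm by Monte Carlo. The sketch below reuses the earlier helpers; the values of K, n, and λ are arbitrary illustrations.

```r
set.seed(7)
K <- 0.3; n <- 2; lambda <- 10; eps <- 1e-3
a <- lambda * tetrate(K, n) + eps
b <- lambda * tetrate(1 - K, n) + eps
S <- rbeta(1e5, a, b)

Sigma <- tetrate(K, n) + tetrate(1 - K, n)
c(simulated = mean(S), formula = phi_n(K, n))                          # both about 0.472
c(simulated = var(S),
  formula = phi_n(K, n) * (1 - phi_n(K, n)) / (lambda * Sigma + 1))    # both about 0.016
```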
The precision parameter λ and the tetration order n have different roles. The n parameter determines the shape of the calibration curve — the systematic bias function φₙ(K). The λ parameter determines how much individual-to-individual variation exists around that bias curve. A high-λ, high-n domain is one where everyone is consistently miscalibrated in the same direction. A low-λ, low-n domain is one where self-assessments are noisy but on average accurate. The empirically relevant case is probably intermediate n (2–4) with moderate-to-large λ, producing systematic bias with meaningful individual variation.
Estimation from data
Given N paired observations (K_i, S_i) — actual competence and self-assessed competence for N individuals — we wish to estimate the parameters (n, λ). Because the conditional likelihood is Beta, and because n need not be an integer in the smooth extension of the model (using continuous tetration via the super-logarithm), we can write the log-likelihood directly.
For integer n, the log-likelihood is:

ℓ(n, λ) = Σᵢ [ (aᵢ − 1) log Sᵢ + (bᵢ − 1) log(1 − Sᵢ) − log B(aᵢ, bᵢ) ],   aᵢ = λ·ⁿKᵢ + ε,   bᵢ = λ·ⁿ(1 − Kᵢ) + ε,

where B(·, ·) is the Beta function and the sum runs over the N individuals.
In practice, integer n from 1 to 5 is sufficient for most applications — values beyond n = 4 describe almost pathological domains of metacognitive distortion that are difficult to distinguish empirically without very large samples and near-perfect measurement of K.
A method-of-moments estimator is simpler and often sufficient. Match the observed covariance Cov(S, K) and E[(S − K)²] to their theoretical values under the model, and solve for (n, λ). Taking K ~ Uniform(0, 1), the theoretical covariance is:

Cov(S, K) = Cov(φₙ(K), K) = ∫₀¹ k·φₙ(k) dk − 1/4,

using E[K] = 1/2 and E[φₙ(K)] = 1/2 (antisymmetry).
This integral depends only on n and can be tabulated once. Given an observed Cov(S, K), you solve for n by inversion of the table. Given n, you solve for λ from the observed mean squared gap E[(S − K)²]. This is approximately a two-line computation in R.
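A sketch of that computation under K ~ Uniform(0, 1), reusing the helpers above. The names `cov_sk`, `cov_table`, and `fit_mom` are illustrative, and the integral is approximated on a grid.

```r
# Model-implied Cov(S, K) = integral of k * phi_n(k) dk - 1/4, tabulated over n
cov_sk <- function(n, grid = seq(0.0005, 0.9995, length.out = 2000)) {
  mean(grid * phi_n(grid, n)) - 0.25
}
cov_table <- sapply(1:5, cov_sk)

fit_mom <- function(K, S) {
  # step 1: pick the integer n whose implied covariance is closest to the observed one
  n_hat <- which.min(abs(cov(S, K) - cov_table))
  # step 2: choose lambda so the model's E[(S - K)^2] matches the observed value,
  # using E[(S - K)^2 | K] = Var(S | K) + (phi_n(K) - K)^2 averaged over the observed K
  msg_obs <- mean((S - K)^2)
  mu  <- phi_n(K, n_hat)
  Sig <- tetrate(K, n_hat) + tetrate(1 - K, n_hat)
  obj <- function(lambda)
    (mean(mu * (1 - mu) / (lambda * Sig + 1) + (mu - K)^2) - msg_obs)^2
  list(n = n_hat, lambda = optimize(obj, c(0.1, 500))$minimum)
}
```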
R implementation
The following R code implements simulation from the Tetration Calibration Model, estimation via profile likelihood over integer n, and plotting of the calibration curves.
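Below is a minimal, self-contained sketch of what such an implementation could look like. The name fit_tcm matches the function referenced in the next paragraph; rtcm, loglik_n, and the helpers are illustrative names. Only integer n is profiled, and λ is maximized on the log scale with optimize.

```r
# --- Tetration Calibration Model: simulation, fitting, calibration curves ---

# n-th tetrate, evaluated top-down
tetrate <- function(x, n) {
  t <- x
  if (n >= 2) for (i in 2:n) t <- x^t
  t
}

# calibration curve E[S | K]
phi_n <- function(K, n) {
  tK <- tetrate(K, n)
  tK / (tK + tetrate(1 - K, n))
}

# simulate N (K, S) pairs with K ~ Beta(alpha, beta)
rtcm <- function(N, n, lambda, alpha = 1, beta = 1, eps = 1e-3) {
  K <- rbeta(N, alpha, beta)
  S <- rbeta(N, lambda * tetrate(K, n) + eps, lambda * tetrate(1 - K, n) + eps)
  data.frame(K = K, S = S)
}

# profile log-likelihood: for fixed integer n, maximize the Beta likelihood over lambda
loglik_n <- function(n, K, S, eps = 1e-3) {
  nll <- function(log_lambda) {
    lambda <- exp(log_lambda)
    a <- lambda * tetrate(K, n) + eps
    b <- lambda * tetrate(1 - K, n) + eps
    -sum(dbeta(S, a, b, log = TRUE))
  }
  opt <- optimize(nll, c(-5, 8))            # lambda between exp(-5) and exp(8)
  c(loglik = -opt$objective, lambda = exp(opt$minimum))
}

# fit by profiling over n = 1..n_max and ranking by AIC (2 free parameters per model)
fit_tcm <- function(K, S, n_max = 5) {
  prof <- t(sapply(1:n_max, loglik_n, K = K, S = S))
  aic  <- -2 * prof[, "loglik"] + 2 * 2
  best <- which.min(aic)
  list(n = best, lambda = unname(prof[best, "lambda"]),
       profile = data.frame(n = 1:n_max, loglik = prof[, "loglik"],
                            lambda = prof[, "lambda"], AIC = aic))
}

# example: simulate from n = 2, lambda = 20 and try to recover the order
set.seed(42)
dat <- rtcm(500, n = 2, lambda = 20)
fit <- fit_tcm(dat$K, dat$S)
fit$n          # 2 in most replicates
fit$profile

# calibration curves for n = 1..4
kk <- seq(0.01, 0.99, length.out = 200)
matplot(kk, sapply(1:4, function(n) phi_n(kk, n)), type = "l", lty = 1,
        xlab = "actual skill K", ylab = "E[S | K]")
abline(0, 1, lty = 2)    # the perfectly calibrated diagonal
```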
The fit_tcm function recovers the generating n = 2 with high reliability for samples of N ≥ 200, provided that the K values are measured with reasonable precision. The AIC criterion clearly distinguishes n = 2 from n = 1 (no bias) in most simulation replicates.
Connections to other distributions
The Tetration Calibration Model is not a distribution in the usual sense — it is a parametric family for a joint distribution (K, S) where the conditional S|K has a known parametric form. It belongs to a broader class of Beta regression models with structured precision and mean functions. The novelty is entirely in the parameterization: using iterated exponentials to drive both the mean and precision functions simultaneously.
Several limiting and related cases are worth noting.
n = 1 (calibrated Beta regression). The model reduces to S | K ~ Beta(λK, λ(1−K)), a Beta regression with identity mean function E[S | K] = K and constant precision λ. This is the null model for DK research: it says people, on average, know where they stand.
Symmetric Beta at fixed K. For any fixed K, the marginal distribution of S is Beta with tetration-derived parameters. The mean is φₙ(K) and the variance decreases in λ. As λ → ∞, S converges in probability to φₙ(K) — a degenerate distribution at the DK bias point.
Generalized power-mean calibration. The function φₙ(K) = ⁿK / (ⁿK + ⁿ(1−K)) is a generalized mean ratio — the ratio of the n-th tetrate of K to the n-th tetrate of K and its complement. For n = 1 this is the identity; for n = 2 it is related to the self-power mean ratio; for n → ∞ it approaches a step function. This connects the model to the literature on power means and mean aggregation functions, though tetration has not previously appeared in that context to the author's knowledge.
The gap distribution and logistic regression. The marginal distribution of G = S − K is zero-mean and symmetric under K ~ Uniform(0, 1). The shape of the gap distribution — from near-Gaussian at n = 1 to bimodal at large n — resembles the family of scaled and shifted logistic distributions in the middle range n ≈ 2–3. This is not coincidental: the logistic function σ(K) = 1/(1+e^(−K)) and the tetration calibration curve φ₂(K) = K^K/(K^K + (1−K)^(1−K)) have similar functional forms, though their derivations are unrelated.
Domain applications and the n parameter in practice
If the model is correct, then different skill domains should yield different estimates of n. Here are plausible qualitative predictions, based on the metacognitive properties of each domain:
| Domain | Predicted n | Rationale |
|---|---|---|
| Chess / competitive gaming | n ≈ 1–2 | Immediate objective feedback through win/loss record; rapid calibration |
| Logical reasoning tests | n ≈ 2–3 | Original DK domain; feedback is rare in daily life |
| Medical diagnosis | n ≈ 2–3 | Limited feedback on diagnostic accuracy; outcome attribution is noisy |
| Statistical analysis | n ≈ 3–4 | Highly technical; novices lack framework to recognize their errors |
| Interpersonal skills / EQ | n ≈ 3–4 | Feedback is ambiguous and socially filtered; no objective scoring |
| Software engineering | n ≈ 2 | Code either works or does not; moderate feedback loop |
| Creative writing / aesthetics | n ≈ 4–5 | No objective ground truth; peer assessment is noisy and rare |
The pattern is interpretable: domains with tight, objective, rapid feedback loops calibrate self-assessment quickly (low n). Domains where feedback is ambiguous, delayed, or socially mediated produce higher n. Crucially, the n parameter is an empirically estimable quantity — you do not need to assign it from a theoretical prior. Given enough (K, S) pairs with reliable K measurement, you fit the model and read off n.
The precision parameter λ interacts with n in an interpretable way. A domain with n = 3 and λ = 5 has severe systematic bias but high individual-to-individual variability: on average, novices are wildly overconfident, but some novices happen to be accurate and some happen to be even more overconfident than the mean curve predicts. A domain with n = 3 and λ = 50 has the same systematic bias but almost no individual variation: everyone at a given K level falls tightly on the DK calibration curve. The latter is a more extreme form of the phenomenon — it implies the metacognitive bias is nearly deterministic at a given skill level.
What this model cannot do
The Tetration Calibration Model is a marginal model for (K, S) averaged over time. It says nothing about the dynamics of calibration — how an individual's self-assessment changes as their actual competence improves. A longitudinal extension would require a model for the trajectory of K(t) and the lag in S(t) tracking K(t). The tetration parameterization generalizes naturally to the dynamic case by making n a decreasing function of experience, for example n(t) = 1 + (n₀ − 1)·g(t) with g(0) = 1 and g(t) → 0 as experience accumulates, so that n(t) falls from n₀ toward the calibrated value of 1. But this extension is speculative and untested.
The model also assumes that K is observable and measured without error. In reality, K is a latent variable estimated from test scores, performance ratings, or expert evaluations — all of which are noisy. Classical measurement error in K will bias estimates of n downward (toward 1), since the apparent DK effect will be partially absorbed into the measurement noise. A structural equation model with latent K and multiple indicators would be necessary for unbiased estimation in the presence of K-measurement error.
Finally, the model is symmetric: it treats overconfidence at low K and underconfidence at high K as exact mirror images. Empirically, the overconfidence at low K is typically larger in absolute magnitude than the underconfidence at high K — the bottom quartile is more wrong about their performance than the top quartile. An asymmetric extension would replace the single n parameter with separate parameters n_low and n_high governing the two halves of the skill distribution. This would increase flexibility at the cost of identifiability.
Further reading
Kruger, J. and Dunning, D. (1999). "Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments." Journal of Personality and Social Psychology, 77(6), 1121–1134. The original paper. Clearer and more careful than the popular summary.
Krueger, J. and Mueller, R. A. (2002). "Unskilled, unaware, or both? The better-than-average effect and statistical regression predict errors in estimates of own performance." Journal of Personality and Social Psychology, 82(2), 180–188. The regression-to-the-mean critique. Worth reading alongside the original.
Nuhfer, E., Fleisher, S., Goodman, A., Delcore, H., and Wheeler, G. (2016). "How random noise and a graphical convention subverted behavioral scientists' explanations of self-assessment data: Numeracy underlies better alternatives." Numeracy, 10(1), 4. A careful empirical treatment using randomized controls to isolate the genuine DK effect from measurement artifact.
Gignac, G. E. and Zajenkowski, M. (2020). "The Dunning-Kruger effect is (mostly) a statistical artefact: Valid approaches to testing the hypothesis with individual differences data." Intelligence, 80, 101449. The most rigorous recent methodological treatment. Makes the case for within-person analysis over quartile-plot visualization.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. Chapter 6.3 on activation functions includes a brief discussion of power-tower functions in the context of vanishing gradients — the mathematical origin of the super-exponential growth that motivates our construction.
Tetration and the Lambert W function: Corless, R. M., Gonnet, G. H., Hare, D. E. G., Jeffrey, D. J., and Knuth, D. E. (1996). "On the Lambert W Function." Advances in Computational Mathematics, 5, 329–359. The essential reference for the infinite power tower convergence result.