A distribution that models the Dunning-Kruger effect: the tetration distribution
The Dunning-Kruger effect has been described verbally, plotted empirically, and disputed methodologically. What it has not had is a formal distributional model with interpretable parameters. Iterated exponentiation — tetration — turns out to be exactly the right mathematical tool.
Micah Thornton, MS — Thornton Statistical Consulting
What the DK effect actually says
The original Kruger and Dunning (1999) finding is more precise than the popular summary. Participants who scored in the bottom quartile on tests of logical reasoning, grammar, and humor not only performed poorly — they also estimated their performance to be well above average, near the 60th percentile. Participants who scored in the top quartile estimated their performance to be near the 70th percentile: still above average, but substantially lower than their actual rank. The bottom quartile overestimated. The top quartile underestimated. Both errors are systematic, not random.
Kruger and Dunning's explanation was metacognitive: the same skills required to perform a task are required to evaluate performance on that task. Incompetence creates a double deficit — poor performance and an inability to recognize it. Expertise creates a different distortion — high performers assume that tasks which are easy for them are easy for others, leading them to underestimate their relative position.
The popular cartoon of the effect — Mount Stupid, the valley of despair, the plateau of sustained competence — is a later invention, more motivational psychology than empirical finding. The actual DK result is simpler and, statistically speaking, more interesting: self-assessment is anchored toward the population mean, and the degree of anchoring is not constant across the skill distribution.
For our purposes the mechanistic debate is secondary. Whatever its cause, the empirical pattern is robust: self-assessed competence S and actual competence K are related in a specific non-linear way. Low K is associated with S substantially above K. High K is associated with S somewhat below K. The relationship passes through near-calibration somewhere in the middle of the skill distribution. Any serious distributional model for this phenomenon must reproduce that shape.
Why no one has written this down as a distribution
The DK literature is large but almost entirely non-parametric. Researchers compute quartile means, plot percentile estimates against actual percentile scores, and describe the curves qualitatively. A small number of papers have fit regression models to the self-assessment data, but these are typically simple linear regressions of estimated percentile on actual percentile — which can capture the compression toward the mean but not the non-linearity in the variance structure.
Part of the reason is that the natural model turns out to require an unusual parameterization. The conditional distribution of S given K is not Gaussian, not Beta in the standard parameterization, not anything from the common exponential family with a straightforward link function. The non-linearity in the mean function and in the precision as a function of K are coupled in a way that requires the shape parameters themselves to vary as functions of K — and the natural functional form for that variation involves iterated exponentiation.
Tetration — the operation of exponentiating a number by itself repeatedly — turns out to produce exactly the right curvature. The tetrate x↑↑n (x to the x, to the x, n times) on the unit interval has a minimum interior to (0,1), approaches 1 at both endpoints, and decays toward that minimum at a rate that increases with n. That combination of properties — interior minimum, boundary concentration, tunable curvature — is what the DK calibration curve requires.
Tetration: the necessary background
Tetration is the fourth hyperoperation in the standard sequence: addition, multiplication, exponentiation, tetration. While addition, multiplication, and exponentiation each appear throughout applied statistics, tetration is almost never discussed in statistical contexts. The unfamiliarity is understandable — the operation grows so rapidly for x > 1 that it has few natural uses in quantitative modeling. But on the unit interval (0, 1), tetration behaves quite differently, and that is the domain we care about for skill modeling.
Define the n-th tetrate of x, written ⁿx, recursively:

¹x = x,   ⁿx = x^(ⁿ⁻¹x) for n ≥ 2,

so that ²x = x^x, ³x = x^(x^x), and so on, with the tower evaluated from the top down.
On the unit interval the behavior is counterintuitive. For x ∈ (0, 1), x^x = exp(x ln x). Since ln x < 0, this is exp(negative), so x^x < 1 for all x ∈ (0,1). Specifically, the function x ↦ x^x achieves its minimum at x = 1/e ≈ 0.368, where (1/e)^(1/e) = e^(−1/e) ≈ 0.692. At both endpoints: as x → 0⁺, x^x → 1 (since x ln x → 0); at x = 1, x^x = 1. So ²x maps (0, 1] to [e^(−1/e), 1] ≈ [0.692, 1] with an interior minimum.
For the infinite power tower — the limit as n → ∞ — Euler showed in 1783 that the sequence converges for x ∈ [e^(−e), e^(1/e)] ≈ [0.066, 1.444]. The limit is given by the Lambert W function:

ⁿx → W(−ln x) / (−ln x)  as n → ∞,

which for x ∈ (0, 1) is the unique solution y of y = x^y.
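A few lines of R are enough to check these quantities numerically. This is a minimal sketch; `tetrate` and `power_tower` are illustrative helper names, not functions from any package.

```r
# n-th tetrate of x, evaluated from the top down: tetrate(x, 3) = x^(x^x)
tetrate <- function(x, n) {
  t <- x
  if (n >= 2) for (i in 2:n) t <- x^t
  t
}

# Infinite power tower by fixed-point iteration of y <- x^y;
# converges for x in [exp(-e), exp(1/e)] = [0.066, 1.444]
power_tower <- function(x, iters = 2000) {
  t <- x
  for (i in seq_len(iters)) t <- x^t
  t
}

tetrate(0.5, 2)          # 0.5^0.5 = 0.707
tetrate(1 / exp(1), 2)   # exp(-1/e) = 0.692, the minimum of x^x
power_tower(0.5)         # fixed point of y = 0.5^y, about 0.641
```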
The table below shows ⁿx for selected x ∈ (0,1) and n = 1, 2, 3, ∞. Notice that for x near 1 the tower stabilizes after only a couple of iterations, while for smaller x the iterates move around more before settling.
| n | x = 0.1 | x = 0.3 | x = 0.5 | x = 0.7 | x = 0.9 |
|---|---|---|---|---|---|
| 1 (identity) | 0.100 | 0.300 | 0.500 | 0.700 | 0.900 |
| 2 (x^x) | 0.794 | 0.697 | 0.707 | 0.779 | 0.910 |
| 3 (x^x^x) | 0.833 | 0.741 | 0.615 | 0.762 | 0.900 |
| ∞ (tower) | 0.883 | 0.524 | 0.641 | 0.766 | 0.909 |
The non-monotone structure at n = 2 — where both x = 0.1 and x = 0.9 have higher tetrate values than x = 0.5 — is precisely the mathematical signature of the DK effect: both extremes of the skill distribution show a different relationship to their self-image than the middle of the distribution does.
The Tetration Distribution: formal definition
We now construct a parametric model for metacognitive calibration. Let K ∈ (0, 1) denote latent actual skill and S ∈ (0, 1) denote self-assessed skill, both on a normalized percentile scale. The model has two parameters: n ≥ 1 (tetration order, the primary shape parameter) and λ > 0 (precision). We add a small regularization constant ε > 0 (typically ε = 10⁻³) to prevent degenerate Beta shape parameters near the boundaries.
Definition (Tetration Calibration Model).
Conditional on K, self-assessed skill S follows a Beta distribution:

S | K ~ Beta(a, b),   a = λ·ⁿK + ε,   b = λ·ⁿ(1−K) + ε.
The conditional mean is:

E[S | K] = (λ·ⁿK + ε) / (λ(ⁿK + ⁿ(1−K)) + 2ε) ≈ φₙ(K) := ⁿK / (ⁿK + ⁿ(1−K)),

with the approximation exact as ε → 0.
The function φₙ(K) has several properties worth establishing. First, it is symmetric around K = 0.5: φₙ(K) + φₙ(1−K) = 1 for all n, K. This means the mean bias is antisymmetric — the average overconfidence at K is equal in magnitude to the average underconfidence at 1−K. Second, for n = 1, the tetrate is the identity: ¹K = K, so φ₁(K) = K/(K + (1−K)) = K. The n = 1 case produces a perfectly calibrated model — self-assessed skill equals actual skill on average, for all K. The DK effect emerges only for n ≥ 2.
For n = 2:

φ₂(K) = K^K / (K^K + (1−K)^(1−K)).
A person at the 10th percentile of actual skill expects to be near the 47th percentile of self-assessment — nearly five times their actual rank. A person at the 90th percentile of actual skill expects to be near the 53rd percentile — substantially below their actual rank. Both errors have the same magnitude by the antisymmetry property. This is qualitatively the Kruger and Dunning (1999) pattern, with gross overestimation at the bottom and underestimation at the top, falling directly out of the tetration parameterization.
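These two numbers are easy to verify with the `tetrate` helper sketched earlier; `phi_n` is an illustrative name for the calibration curve.

```r
# Calibration curve: phi_n(K) = tetrate(K, n) / (tetrate(K, n) + tetrate(1 - K, n))
phi_n <- function(K, n) {
  tK <- tetrate(K, n)
  tK / (tK + tetrate(1 - K, n))
}

phi_n(0.1, 2)   # about 0.466: a 10th-percentile performer self-assesses near the 47th
phi_n(0.9, 2)   # about 0.534: a 90th-percentile performer self-assesses near the 53rd
```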
The marginal distribution and population DK
The Tetration Calibration Model specifies S conditional on K. To obtain the marginal distribution of the metacognitive gap G = S − K over a population, we integrate out K. Assume K ~ Beta(α, β) for some population skill distribution (the uniform case K ~ Beta(1,1) is the natural starting point; α > β puts most of the mass at high skill values, a left-skewed population in which most people sit above the midpoint of the skill scale).
The marginal mean of G is:

E[G] = ∫₀¹ (φₙ(k) − k) f(k; α, β) dk,

where f(k; α, β) is the Beta(α, β) density.
The marginal variance of G follows from the law of total variance:

Var[G] = E_K[Var(S | K)] + Var_K(φₙ(K) − K),

since, conditional on K, the gap G has mean φₙ(K) − K and variance Var(S | K).
There is no closed form for these integrals in general, but they are straightforward to evaluate numerically. The marginal distribution of G, which we call the Metacognitive Tetration Distribution MTD(n, λ), can be sampled directly via: draw K ~ Beta(α, β); draw S | K ~ Beta(λ·ⁿK + ε, λ·ⁿ(1−K) + ε); return G = S − K.
The shape of MTD(n, λ) changes qualitatively with n. For n = 1, the distribution collapses to a point mass at zero (up to sampling noise from finite λ). For n = 2, MTD is unimodal and symmetric around zero, with heavier tails than a Gaussian of matched variance. For large n, MTD becomes bimodal: the distribution places substantial mass near G = +0.4 (chronic overconfidence) and G = −0.4 (systematic underconfidence), with a trough near G = 0. A population modeled by large n contains few well-calibrated individuals — almost everyone is either significantly overconfident or significantly underconfident.
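A minimal sampling sketch in R, reusing `tetrate` from above; `rmtd` is an illustrative name and the parameter values are arbitrary.

```r
# Draw N values of the gap G = S - K from MTD(n, lambda), with K ~ Beta(alpha, beta)
rmtd <- function(N, n, lambda, alpha = 1, beta = 1, eps = 1e-3) {
  K <- rbeta(N, alpha, beta)
  S <- rbeta(N, lambda * tetrate(K, n) + eps, lambda * tetrate(1 - K, n) + eps)
  S - K
}

set.seed(1)
G <- rmtd(1e4, n = 2, lambda = 10)
mean(G)                                    # near 0, by antisymmetry with uniform K
hist(G, breaks = 60, main = "MTD(2, 10)", xlab = "G = S - K")
```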
The calibration curves: n = 1 through ∞
The calibration curve φₙ(K) = E[S|K] is the central object of the model. Below are values for several tetration orders, showing how the DK distortion strengthens with n.
| K (actual) | n=1 (φ=K) | n=2 | n=3 | n=∞ |
|---|---|---|---|---|
| 0.05 | 0.050 | 0.512 | 0.531 | 0.564 |
| 0.10 | 0.100 | 0.466 | 0.489 | 0.524 |
| 0.20 | 0.200 | 0.409 | 0.428 | 0.453 |
| 0.30 | 0.300 | 0.454 | 0.448 | 0.408 |
| 0.40 | 0.400 | 0.490 | 0.481 | 0.468 |
| 0.50 | 0.500 | 0.500 | 0.500 | 0.500 |
| 0.60 | 0.600 | 0.510 | 0.519 | 0.532 |
| 0.70 | 0.700 | 0.546 | 0.552 | 0.592 |
| 0.80 | 0.800 | 0.591 | 0.572 | 0.547 |
| 0.90 | 0.900 | 0.534 | 0.511 | 0.476 |
| 0.95 | 0.950 | 0.488 | 0.469 | 0.436 |
At K = 0.10 and n = 2, a person at the 10th percentile of actual competence expects to perform near the 47th percentile — roughly a 37-point overestimation. At K = 0.90 and n = 2, a top-decile performer expects to place near the 53rd percentile — roughly a 37-point underestimation, equal in magnitude by antisymmetry. Both distortions grow with n. The distortion is larger near the boundaries (K close to 0 or 1) than near the middle, and it strengthens as n increases — the distribution becomes more polarized.
Moment structure
For the conditional distribution S | K ~ Beta(a, b) with a = λ · ⁿK + ε and b = λ · ⁿ(1−K) + ε:
| Property | Value |
|---|---|
| Conditional mean E[S|K] | φₙ(K) = ⁿK / (ⁿK + ⁿ(1−K)) |
| Conditional variance Var[S|K] | φₙ(K)(1 − φₙ(K)) / (λ(ⁿK + ⁿ(1−K)) + 1) |
| Marginal mean E[S] | 0.5 (by antisymmetry, for K ~ Uniform(0,1)) |
| Marginal mean of gap E[G] | 0 (antisymmetry, symmetric population) |
| Conditional skewness of S|K | 2(1 − 2φₙ(K))·√(λΣ + 1) / ((λΣ + 2)·√(φₙ(K)(1−φₙ(K)))) where Σ = ⁿK + ⁿ(1−K), ignoring ε |
| n = 1 limiting case | E[S | K] = K for all K (calibrated on average); spread around K still governed by λ |
| λ → ∞ limit | S → φₙ(K) deterministically — DK bias without noise |
| λ → 0 limit | S becomes independent of K as both shape parameters collapse to ε — completely uninformative self-assessment |
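The first two rows are easy to confirm by Monte Carlo. The sketch below reuses the earlier helpers; the values of K, n, and λ are arbitrary illustrations.

```r
set.seed(7)
K <- 0.3; n <- 2; lambda <- 10; eps <- 1e-3
a <- lambda * tetrate(K, n) + eps
b <- lambda * tetrate(1 - K, n) + eps
S <- rbeta(1e5, a, b)

Sigma <- tetrate(K, n) + tetrate(1 - K, n)
c(simulated = mean(S), formula = phi_n(K, n))                          # both about 0.472
c(simulated = var(S),
  formula = phi_n(K, n) * (1 - phi_n(K, n)) / (lambda * Sigma + 1))    # both about 0.016
```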
The precision parameter λ and the tetration order n have different roles. The n parameter determines the shape of the calibration curve — the systematic bias function φₙ(K). The λ parameter determines how much individual-to-individual variation exists around that bias curve. A high-λ, high-n domain is one where everyone is consistently miscalibrated in the same direction. A low-λ, low-n domain is one where self-assessments are noisy but on average accurate. The empirically relevant case is probably intermediate n (2–4) with moderate-to-large λ, producing systematic bias with meaningful individual variation.
Estimation from data
Given N paired observations (K_i, S_i) — actual competence and self-assessed competence for N individuals — we wish to estimate the parameters (n, λ). Because the conditional likelihood is Beta, and because n need not be an integer in the smooth extension of the model (using continuous tetration via the super-logarithm), we can write the log-likelihood directly.
For integer n, the log-likelihood is:

ℓ(n, λ) = Σᵢ [ (aᵢ − 1) log Sᵢ + (bᵢ − 1) log(1 − Sᵢ) − log B(aᵢ, bᵢ) ],   aᵢ = λ·ⁿKᵢ + ε,   bᵢ = λ·ⁿ(1 − Kᵢ) + ε,

where B(·, ·) is the Beta function and the sum runs over the N individuals.
In practice, integer n from 1 to 5 is sufficient for most applications — values beyond n = 4 describe almost pathological domains of metacognitive distortion that are difficult to distinguish empirically without very large samples and near-perfect measurement of K.
A method-of-moments estimator is simpler and often sufficient. Match the observed covariance Cov(S, K) and E[(S − K)²] to their theoretical values under the model, and solve for (n, λ). Taking K ~ Uniform(0, 1), the theoretical covariance is:

Cov(S, K) = Cov(φₙ(K), K) = ∫₀¹ k·φₙ(k) dk − 1/4,

using E[K] = 1/2 and E[φₙ(K)] = 1/2 (antisymmetry).
This integral depends only on n and can be tabulated once. Given an observed Cov(S, K), you solve for n by inversion of the table. Given n, you solve for λ from the observed mean squared gap E[(S − K)²]. This is approximately a two-line computation in R.
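A sketch of that computation under K ~ Uniform(0, 1), reusing the helpers above. The names `cov_sk`, `cov_table`, and `fit_mom` are illustrative, and the integral is approximated on a grid.

```r
# Model-implied Cov(S, K) = integral of k * phi_n(k) dk - 1/4, tabulated over n
cov_sk <- function(n, grid = seq(0.0005, 0.9995, length.out = 2000)) {
  mean(grid * phi_n(grid, n)) - 0.25
}
cov_table <- sapply(1:5, cov_sk)

fit_mom <- function(K, S) {
  # step 1: pick the integer n whose implied covariance is closest to the observed one
  n_hat <- which.min(abs(cov(S, K) - cov_table))
  # step 2: choose lambda so the model's E[(S - K)^2] matches the observed value,
  # using E[(S - K)^2 | K] = Var(S | K) + (phi_n(K) - K)^2 averaged over the observed K
  msg_obs <- mean((S - K)^2)
  mu  <- phi_n(K, n_hat)
  Sig <- tetrate(K, n_hat) + tetrate(1 - K, n_hat)
  obj <- function(lambda)
    (mean(mu * (1 - mu) / (lambda * Sig + 1) + (mu - K)^2) - msg_obs)^2
  list(n = n_hat, lambda = optimize(obj, c(0.1, 500))$minimum)
}
```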
R implementation
The following R code implements simulation from the Tetration Calibration Model, estimation via profile likelihood over integer n, and plotting of the calibration curves.
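Below is a minimal, self-contained sketch of what such an implementation could look like. The name fit_tcm matches the function referenced in the next paragraph; rtcm, loglik_n, and the helpers are illustrative names. Only integer n is profiled, and λ is maximized on the log scale with optimize.

```r
# --- Tetration Calibration Model: simulation, fitting, calibration curves ---

# n-th tetrate, evaluated top-down
tetrate <- function(x, n) {
  t <- x
  if (n >= 2) for (i in 2:n) t <- x^t
  t
}

# calibration curve E[S | K]
phi_n <- function(K, n) {
  tK <- tetrate(K, n)
  tK / (tK + tetrate(1 - K, n))
}

# simulate N (K, S) pairs with K ~ Beta(alpha, beta)
rtcm <- function(N, n, lambda, alpha = 1, beta = 1, eps = 1e-3) {
  K <- rbeta(N, alpha, beta)
  S <- rbeta(N, lambda * tetrate(K, n) + eps, lambda * tetrate(1 - K, n) + eps)
  data.frame(K = K, S = S)
}

# profile log-likelihood: for fixed integer n, maximize the Beta likelihood over lambda
loglik_n <- function(n, K, S, eps = 1e-3) {
  nll <- function(log_lambda) {
    lambda <- exp(log_lambda)
    a <- lambda * tetrate(K, n) + eps
    b <- lambda * tetrate(1 - K, n) + eps
    -sum(dbeta(S, a, b, log = TRUE))
  }
  opt <- optimize(nll, c(-5, 8))            # lambda between exp(-5) and exp(8)
  c(loglik = -opt$objective, lambda = exp(opt$minimum))
}

# fit by profiling over n = 1..n_max and ranking by AIC (2 free parameters per model)
fit_tcm <- function(K, S, n_max = 5) {
  prof <- t(sapply(1:n_max, loglik_n, K = K, S = S))
  aic  <- -2 * prof[, "loglik"] + 2 * 2
  best <- which.min(aic)
  list(n = best, lambda = unname(prof[best, "lambda"]),
       profile = data.frame(n = 1:n_max, loglik = prof[, "loglik"],
                            lambda = prof[, "lambda"], AIC = aic))
}

# example: simulate from n = 2, lambda = 20 and try to recover the order
set.seed(42)
dat <- rtcm(500, n = 2, lambda = 20)
fit <- fit_tcm(dat$K, dat$S)
fit$n          # 2 in most replicates
fit$profile

# calibration curves for n = 1..4
kk <- seq(0.01, 0.99, length.out = 200)
matplot(kk, sapply(1:4, function(n) phi_n(kk, n)), type = "l", lty = 1,
        xlab = "actual skill K", ylab = "E[S | K]")
abline(0, 1, lty = 2)    # the perfectly calibrated diagonal
```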
The fit_tcm function recovers the generating n = 2 with high reliability for samples of N ≥ 200, provided that the K values are measured with reasonable precision. The AIC criterion clearly distinguishes n = 2 from n = 1 (no bias) in most simulation replicates.
Connections to other distributions
The Tetration Calibration Model is not a distribution in the usual sense — it is a parametric family for a joint distribution (K, S) where the conditional S|K has a known parametric form. It belongs to a broader class of Beta regression models with structured precision and mean functions. The novelty is entirely in the parameterization: using iterated exponentials to drive both the mean and precision functions simultaneously.
Several limiting and related cases are worth noting.
n = 1 (calibrated Beta regression). The model reduces to S | K ~ Beta(λK, λ(1−K)), a Beta regression with identity mean function E[S | K] = K and constant precision λ. This is the null model for DK research: it says people, on average, know where they stand.
Symmetric Beta at fixed K. For any fixed K, the marginal distribution of S is Beta with tetration-derived parameters. The mean is φₙ(K) and the variance decreases in λ. As λ → ∞, S converges in probability to φₙ(K) — a degenerate distribution at the DK bias point.
Generalized power-mean calibration. The function φₙ(K) = ⁿK / (ⁿK + ⁿ(1−K)) is a generalized mean ratio — the ratio of the n-th tetrate of K to the n-th tetrate of K and its complement. For n = 1 this is the identity; for n = 2 it is related to the self-power mean ratio; for n → ∞ it approaches a step function. This connects the model to the literature on power means and mean aggregation functions, though tetration has not previously appeared in that context to the author's knowledge.
The gap distribution and logistic regression. The marginal distribution of G = S − K is zero-mean and symmetric under K ~ Uniform(0, 1). The shape of the gap distribution — from near-Gaussian at n = 1 to bimodal at large n — resembles the family of scaled and shifted logistic distributions in the middle range n ≈ 2–3. This is not coincidental: the logistic function σ(K) = 1/(1+e^(−K)) and the tetration calibration curve φ₂(K) = K^K/(K^K + (1−K)^(1−K)) have similar functional forms, though their derivations are unrelated.
Domain applications and the n parameter in practice
If the model is correct, then different skill domains should yield different estimates of n. Here are plausible qualitative predictions, based on the metacognitive properties of each domain:
| Domain | Predicted n | Rationale |
|---|---|---|
| Chess / competitive gaming | n ≈ 1–2 | Immediate objective feedback through win/loss record; rapid calibration |
| Logical reasoning tests | n ≈ 2–3 | Original DK domain; feedback is rare in daily life |
| Medical diagnosis | n ≈ 2–3 | Limited feedback on diagnostic accuracy; outcome attribution is noisy |
| Statistical analysis | n ≈ 3–4 | Highly technical; novices lack framework to recognize their errors |
| Interpersonal skills / EQ | n ≈ 3–4 | Feedback is ambiguous and socially filtered; no objective scoring |
| Software engineering | n ≈ 2 | Code either works or does not; moderate feedback loop |
| Creative writing / aesthetics | n ≈ 4–5 | No objective ground truth; peer assessment is noisy and rare |
The pattern is interpretable: domains with tight, objective, rapid feedback loops calibrate self-assessment quickly (low n). Domains where feedback is ambiguous, delayed, or socially mediated produce higher n. Crucially, the n parameter is an empirically estimable quantity — you do not need to assign it from a theoretical prior. Given enough (K, S) pairs with reliable K measurement, you fit the model and read off n.
The precision parameter λ interacts with n in an interpretable way. A domain with n = 3 and λ = 5 has severe systematic bias but high individual-to-individual variability: on average, novices are wildly overconfident, but some novices happen to be accurate and some happen to be even more overconfident than the mean curve predicts. A domain with n = 3 and λ = 50 has the same systematic bias but almost no individual variation: everyone at a given K level falls tightly on the DK calibration curve. The latter is a more extreme form of the phenomenon — it implies the metacognitive bias is nearly deterministic at a given skill level.
What this model cannot do
The Tetration Calibration Model is a marginal model for (K, S) averaged over time. It says nothing about the dynamics of calibration — how an individual's self-assessment changes as their actual competence improves. A longitudinal extension would require a model for the trajectory of K(t) and the lag in S(t) tracking K(t). The tetration parameterization generalizes naturally to the dynamic case by making n a decreasing function of experience, for example n(t) = 1 + (n₀ − 1)·g(t) with g(0) = 1 and g(t) → 0 as experience accumulates, so that n(t) falls from n₀ toward the calibrated value of 1. But this extension is speculative and untested.
The model also assumes that K is observable and measured without error. In reality, K is a latent variable estimated from test scores, performance ratings, or expert evaluations — all of which are noisy. Classical measurement error in K will bias estimates of n downward (toward 1), since the apparent DK effect will be partially absorbed into the measurement noise. A structural equation model with latent K and multiple indicators would be necessary for unbiased estimation in the presence of K-measurement error.
Finally, the model is symmetric: it treats overconfidence at low K and underconfidence at high K as exact mirror images. Empirically, the overconfidence at low K is typically larger in absolute magnitude than the underconfidence at high K — the bottom quartile is more wrong about their performance than the top quartile. An asymmetric extension would replace the single n parameter with separate parameters n_low and n_high governing the two halves of the skill distribution. This would increase flexibility at the cost of identifiability.
Further reading
Kruger, J. and Dunning, D. (1999). "Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments." Journal of Personality and Social Psychology, 77(6), 1121–1134. The original paper. Clearer and more careful than the popular summary.
Krueger, J. and Mueller, R. A. (2002). "Unskilled, unaware, or both? The better-than-average effect and statistical regression predict errors in estimates of own performance." Journal of Personality and Social Psychology, 82(2), 180–188. The regression-to-the-mean critique. Worth reading alongside the original.
Nuhfer, E., Fleisher, S., Goodman, A., Delcore, H., and Wheeler, G. (2016). "How random noise and a graphical convention subverted behavioral scientists' explanations of self-assessment data: Numeracy underlies better alternatives." Numeracy, 10(1), 4. A careful empirical treatment using randomized controls to isolate the genuine DK effect from measurement artifact.
Gignac, G. E. and Zajenkowski, M. (2020). "The Dunning-Kruger effect is (mostly) a statistical artefact: Valid approaches to testing the hypothesis with individual differences data." Intelligence, 80, 101449. The most rigorous recent methodological treatment. Makes the case for within-person analysis over quartile-plot visualization.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. Chapter 6.3 on activation functions includes a brief discussion of power-tower functions in the context of vanishing gradients — the mathematical origin of the super-exponential growth that motivates our construction.
Tetration and the Lambert W function: Corless, R. M., Gonnet, G. H., Hare, D. E. G., Jeffrey, D. J., and Knuth, D. E. (1996). "On the Lambert W Function." Advances in Computational Mathematics, 5, 329–359. The essential reference for the infinite power tower convergence result.