Statistical Significance Is Not What You Think It Means
There was a moment in my first year of medical school when I stopped taking notes mid-lecture.
My biostatistics professor had just said something. Not dramatically—almost in passing, the way you mention rain in the forecast. He moved to the next slide. My classmates kept writing.
And I just sat there, hand frozen, because I’d realized something that made me quietly furious.
I had paid over $200,000 for an undergraduate education at a good school. Took statistics. Got solid grades. Learned p-values, confidence intervals, hypothesis testing. The whole apparatus.
And somehow, nobody had ever told me the most important thing about any of it.
The Gap You Don’t Notice Until You Do
Here’s what every college stats class teaches you: how to determine if a result is “statistically significant.” You learn the threshold (p < 0.05), you learn the tests, you learn to calculate whether something is probably real versus probably just chance.
Here’s what they don’t teach you: that “real” and “meaningful” are completely different questions.
My professor said it plainly: “Statistical significance tells you if an effect exists. Clinical significance tells you if it matters.”
Then he moved on. Like this wasn’t the entire point.
I raised my hand. “Can you go back to that slide?”
After class, I approached him. “That distinction—that’s the most important thing you said today, isn’t it?”
He smiled, a little wearily. “I hope so. I try to emphasize it.”
“I don’t think it landed. You said it the same way you said everything else.”
“You’re probably right.”
When Truth Becomes Meaningless
Consider what this actually means.
You can prove something works—mathematically, rigorously, published in peer-reviewed journals—and it can still be essentially useless.
A pharmaceutical company studies a new weight loss supplement. They recruit 10,000 people, follow them for six months. Results: the supplement group loses 0.8 pounds more than placebo.
With 10,000 participants, that’s statistically significant. It’s real. Not luck, not noise, not measurement error. The supplement genuinely causes slightly more weight loss.
The company’s marketing: “Clinically proven! Statistically significant results!”
Both statements are true.
Also true: 0.8 pounds over six months is nothing. You could lose that by skipping dessert twice.
Statistical significance: Yes, the effect is real. Clinical significance: No, the effect doesn’t matter.
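To see how a 0.8-pound difference can clear the significance bar, here's a minimal sketch using a normal approximation to the two-sample t-test. The standard deviation of 10 lb and the 5,000-per-arm split are my assumptions for illustration, not figures from any actual trial:

```python
import math

def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in group means, using a
    normal approximation to the two-sample t-test with equal SDs."""
    se = sd * math.sqrt(2.0 / n_per_group)     # standard error of the difference
    z = mean_diff / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability

# Assumed scenario: 0.8 lb difference, SD of 10 lb, 5,000 people per arm.
p = two_sample_z_pvalue(0.8, 10.0, 5000)
print(f"p = {p:.1e}")  # comfortably below 0.05, despite the trivial effect
```

Run the same numbers with 50 people per arm and the p-value lands well above 0.05: the effect didn't change, only the sample size did.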
My $200,000 education taught me how to detect the first. It never taught me to ask about the second.
The Math Makes It Worse
Here’s the mechanism that breaks people’s brains once they see it:
With enough participants, you can prove anything has an effect, no matter how small.
Sample size and detectable effect size are inversely related. Study 100 people, you can only detect large effects. Study 10,000 people, you can detect tiny effects. Study 100,000 people, you can prove that almost anything does something.
The math doesn’t care if that something matters.
This means every headline screaming “Study Proves X!” might be technically accurate while being practically meaningless. Not fraud. Not bad science. Just the natural consequence of how statistical testing works.
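The inverse relationship can be made concrete with the standard power-analysis formula: at a fixed significance level and power, the smallest detectable mean difference shrinks like 1 over the square root of the sample size. A sketch, again assuming an SD of 10 for illustration:

```python
import math

def min_detectable_diff(sd, n_per_group, z_alpha=1.96, z_beta=0.84):
    """Smallest mean difference detectable with ~80% power at a
    two-sided alpha of 0.05 (standard normal-approximation formula)."""
    return (z_alpha + z_beta) * sd * math.sqrt(2.0 / n_per_group)

for n in (100, 10_000, 100_000):
    print(f"n = {n:>7,} per group -> detectable difference ~ "
          f"{min_detectable_diff(sd=10.0, n_per_group=n):.2f}")
```

With 100 people per group you can only find differences around 4 units; with 100,000 you can "prove" differences of about 0.13, an effect no one would ever feel.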
What This Looks Like in Practice
Your doctor tells you to start taking a statin for cholesterol. “Studies show it reduces heart attack risk.”
The studies are real. The effect is real. Statistically significant, replicated, solid evidence.
What the studies show: for people with no prior heart disease, you need to treat 100-300 people with statins for five years to prevent one heart attack.
Not one death. One heart attack.
So 100-300 people take daily medication for five years. They deal with potential side effects—muscle pain, fatigue, whatever else. They spend money on pills and doctor visits. They turn themselves into patients.
And one person benefits. The other 99-299 were going to be fine anyway.
Is this good medicine?
I genuinely don’t know. It depends entirely on how you weigh the tradeoffs. If you’re terrified of heart attacks and don’t mind pills, maybe it’s worth it. If you’d rather not medicalize your life for a 1-in-200 shot, maybe not.
Both positions are defensible. The statistics don’t resolve this—they just quantify the situation so you can disagree more precisely.
But here’s what makes me want to flip tables: your education taught you to trust “statistically significant” as a quality seal without teaching you that significance says nothing about magnitude.
The Question That Changes Everything
In medicine, we have a metric called Number Needed to Treat (NNT). It asks: how many people do you need to treat to help one person?
Some calibration:
Antibiotics for strep throat: NNT ≈ 4
Blood pressure meds after a stroke: NNT ≈ 11
Aspirin during a heart attack: NNT ≈ 40
Intensive BP control for healthy people: NNT ≈ 61
Statins for primary prevention: NNT ≈ 100-300
All of these interventions are “statistically significant.” They all work, in the technical sense.
But NNT of 4 versus NNT of 300 represents wildly different clinical realities.
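The arithmetic behind NNT is simple: it's the reciprocal of the absolute risk reduction. A sketch with made-up risks chosen to roughly echo the calibration list above (not figures from any specific trial):

```python
def nnt(control_risk, treated_risk):
    """Number Needed to Treat = 1 / absolute risk reduction."""
    arr = control_risk - treated_risk  # absolute risk reduction
    if arr <= 0:
        raise ValueError("treatment shows no absolute benefit")
    return 1.0 / arr

# Illustrative (made-up) event risks over the treatment period:
print(round(nnt(0.28, 0.03)))    # large ARR -> NNT of 4 (strep-like)
print(round(nnt(0.030, 0.025)))  # ARR of half a point -> NNT of 200 (statin-like)
```

Both drugs "work." One helps a quarter of the people who take it; the other helps one in two hundred.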
I don’t know if other fields have an equivalent metric. They should. Because this is the only question that matters: How much do you have to do to get one unit of the thing you actually care about?
Why Nobody Teaches This First
I’ve thought a lot about why this gap exists.
Statistics courses teach you the machinery: how to run tests, calculate p-values, interpret confidence intervals. The mechanics are complex enough that they fill a semester. By the time you’ve learned how to determine if something is significant, there’s no time left to ask what significance actually means.
Or maybe it’s this: teaching mechanical procedures is easier than teaching judgment. You can test whether students can calculate a p-value. You can’t easily test whether they can weigh the clinical or practical importance of a finding.
But the result is that people leave college knowing how to detect effects without knowing how to evaluate whether those effects matter.
That’s not a gap. That’s a chasm.
The Broader Pattern
This isn’t unique to statistics.
We teach people to write code without teaching them to think about whether the code should exist.
We teach financial modeling without teaching when models mislead more than they illuminate.
We teach argumentation without teaching when you should change your mind.
The technical skills are easier to package, test, and grade. The judgment is harder to systematize.
But the judgment is the entire point.
What I’m Left With
I’m now $600,000 deep into my education—undergrad plus medical school. And the single most valuable thing I’ve learned might be this distinction between statistical and clinical significance.
Not anatomy. Not biochemistry. Not diagnostic algorithms. The idea that numbers can be true without being meaningful.
That should’ve been week one of intro statistics.
Instead, it was an aside in a medical school lecture that most people missed because it was delivered in the same tone as everything else.
The professor knew it mattered. He tried to emphasize it. But knowing something is important and making people feel its importance are different skills.
I caught it because I happened to be paying attention at the right moment. Most of my classmates didn’t. Not because they’re not smart—because nothing in the delivery signaled that this was the point.
What You Can Do With This
You probably don’t care about medical studies specifically. But you do encounter claims: supplements that “work,” diets that are “proven,” products that are “scientifically validated.”
Your new filter:
What’s the actual effect size? Not whether it’s significant, but how big it is. Going from 4% risk to 3% is different from 40% to 30%, even if both are “25% reductions.”
How many people were studied? Bigger studies detect smaller effects. A massive study proving a tiny effect is not the same as a small study proving a big effect.
What’s being compared? New thing versus placebo? Versus doing nothing? Versus best available alternative? Each comparison tells you something different.
Would this matter to me? A statistically significant improvement in something I don’t care about is worthless.
You don’t need to understand the formulas. You just need to know these questions exist.
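That said, the first filter question takes about five lines to make concrete. Here's a sketch of the 4%-versus-40% example, showing how the same "25% reduction" headline can describe very different realities:

```python
def risk_reduction(baseline_risk, treated_risk):
    """Return (absolute, relative) risk reduction for a pair of risks."""
    arr = baseline_risk - treated_risk  # absolute: percentage points gained
    rrr = arr / baseline_risk           # relative: the headline number
    return arr, rrr

# Both of these advertise a "25% risk reduction":
for baseline, treated in [(0.04, 0.03), (0.40, 0.30)]:
    arr, rrr = risk_reduction(baseline, treated)
    print(f"{baseline:.0%} -> {treated:.0%}: "
          f"relative {rrr:.0%}, absolute {arr:.0%} points")
```

Same relative reduction, a tenfold difference in absolute benefit. The marketing copy only ever quotes the first number.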
The Expensive Lesson
I paid a lot of money to learn that truth and importance are separate dimensions.
You can have things that are true but unimportant. You can have things that are important but uncertain. The overlap is smaller than you’d think.
Education taught me to find truth. It didn’t teach me to evaluate importance.
That’s the gap.
And closing that gap—learning to ask not just “is this real?” but “does this matter?”—might be worth more than everything else combined.
It shouldn’t cost $600,000 to figure that out.
But apparently, it did.

Here are some solid sources if you want to dig deeper into this topic:
On statin NNT specifically:
1. TheNNT.com - "Statins for Heart Disease Prevention (Without Prior Heart Disease)"
This is probably the most accessible resource. They break down benefits and harms in plain language with actual numbers. Shows NNT for mortality, heart attacks, etc.
Website: thennt.com
2. Circulation: Cardiovascular Quality and Outcomes - "Number Needed to Treat With Rosuvastatin to Prevent First Cardiovascular Events"
This is from the JUPITER trial, one of the major statin studies. Shows 5-year NNT ranging from about 20-50 depending on the subgroup and outcome measured.
It's where a lot of the "statins work!" headlines came from, but the NNT tells a more nuanced story.
3. Family Practice (Oxford Academic) - "Number of patients needed to prescribe statins in primary cardiovascular prevention: mirage and reality"
Great title. Shows how NNT changes dramatically based on risk level, and how persistence/adherence makes the real-world numbers even higher.
Published 2018, so relatively recent.
4. Clinical Pharmacology & Therapeutics - "Effectiveness of Statins as Primary Prevention in People With Different Cardiovascular Risk"
Shows NNT of 470 for very low risk patients versus 62 for higher risk patients. Same drug, wildly different clinical significance.
On the broader statistical significance vs clinical significance issue:
1. JAMA Cardiology - "Statin Use in Primary Prevention of Atherosclerotic Cardiovascular Disease According to 5 Major Guidelines"
Compares how different guidelines recommend statins and what the NNTs are for each guideline's threshold.
2. British Journal of General Practice - "Statins for primary prevention of cardiovascular disease: modelling guidelines and patient preferences"
This one's fascinating because it looks at what NNT patients themselves say they'd find acceptable. Spoiler: most people want much lower NNTs than what guidelines often recommend.
For the general concept:
The best accessible resource is honestly just searching "NNT" + any treatment you're curious about. TheNNT.com has reviews of lots of common interventions beyond just statins.
For the statistical vs clinical significance concept more broadly, there isn't one great public-facing resource I've found—which is kind of the problem. It's taught in medical school and grad-level stats, but not really explained well for general audiences. That gap is part of why I wanted to write this essay.
🙏🏼🙏🏼🙏🏼🙏🏼 Training to be the most pragmatic doctor in the world! Questioning every status quo and rewriting them!