
Last installment, we found that Bayesian hypothesis testing freed us from pitting the null hypothesis against its mirror image. That’s incredibly important, because most of the time we don’t want to put two perfectly-opposed hypotheses in opposition.

The “non-random” hypothesis of previous installments, which I’ll dub H1 from now on, includes the possibility of someone’s precognition being consistently wrong. In real life, that would never happen: said person would quickly learn to do the opposite of whatever their instinct demanded. So let’s propose H2: precognition exists, but in practice only does better than chance.

This change breaks some assumptions. Bernstein was using a frequentist model, which neatly carved up the probability space, so the odds of precognition existing plus the odds of it being bupkis neatly summed to 1. With H2, though, we’ve created a gap in that space and can’t get away with just a subtract.

One approach is to retreat to frequentism, and fill in the gap with another hypothesis (which I’ll dub H2′). Bayes’ Theorem can easily handle an arbitrary number of hypotheses, after all…

[Equation: Bayes’ Theorem, when applied to multiple hypotheses]

… but now we’ve introduced a nuisance hypothesis. We don’t really care about H2′, because we suspect its probability is vanishingly small, yet we have to drag it along anyway just to answer the question.
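For reference, the multi-hypothesis form of Bayes’ Theorem is usually written as below. This is the textbook version, not a reconstruction of any particular figure; the symbols E (the evidence) and H_i (the i-th hypothesis) are my labels, and the formula assumes the hypotheses are mutually exclusive and together cover every possibility, which is exactly why H2′ has to come along for the ride.

```latex
P(H_i \mid E) \;=\; \frac{P(E \mid H_i)\, P(H_i)}{\sum_{j} P(E \mid H_j)\, P(H_j)},
\qquad H_j \in \{H_0,\; H_2,\; H_2'\}
```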

The other approach is to abandon frequentism completely. The Bayesian interpretation is that we’re looking at degrees of belief, not partitions. We Bayesians don’t need to colour within the lines: the certainties of two, three, or even all of our hypotheses don’t have to add up to one. Theoretically they don’t even need to stay bound between 0 and 1, because all we really care about is the proportion of relative belief among competing hypotheses. This approach offers a lot of flexibility, for good or ill.

Naturally, I’ll opt for the second approach.

While we’re at it, let’s incorporate some of Bem’s other data in that paper. Being proper statisticians, after all, we should always look at the sum total of all relevant evidence.

Successes  Trials  Experiment
828        1560    Experiment 1: erotic images
238        480     Experiment 1: neutral images
246        480     Experiment 1: negative images
536        1080    Experiment 1: positive images
2790       5400    Experiment 2: negative images
1274       2400    Experiment 5: negative images
1186       2400    Experiment 5: neutral images
1251       2496    Experiment 7: boredom, non-stimulus seekers
1105       2304    Experiment 7: boredom, stimulus seekers

This isn’t every experiment or case; I chose this subset because each could be precisely described by the binomial distribution. We’ll need to upgrade our program to handle this extra data. To switch from H1 to H2, change this line…

prob := random.Float64()    // H1: not random

… to this …

prob := .5 + random.Float64()*.5    // H2: not random, but always better than chance

… and punch “Run.” Here’s what I get, for both H1 and H2:

Success/Trials        H1/H0            H2/H0
828/1560              0.588416         1.211814
238/480               0.057852         0.049696
246/480               0.066529         0.092845
536/1080              0.039115         0.032298
2790/5400             0.340399         0.678042
1274/2400             2.448912         4.912015
1186/2400             0.03033          0.017051
1251/2496             0.025082         0.02749
1105/2304             0.178838         0.008695
overall Bayes Factor  0                0
1/overall             99536693.052594  407936437.373478

That wasn’t expected! Let’s do some quick math so we can compare H1 and H2 directly:

[Equation: comparing different Bayes Factors, in this case H1 and H2]

So H1 is the better hypothesis, with about four times the certainty of H2. Looking over the individual factors, H2 did better as the success rate climbed above chance, but lost ground when the success rate hovered around neutrality and took a major blow when faced with worse-than-chance results. H1’s inclusion of mildly-worse cases was actually a boon, but nowhere near enough to stand up to the random chance hypothesis (henceforth H0).
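Spelled out, the comparison is just a ratio of the two overall factors, since the shared H0 term cancels. The sketch below uses D for the pooled data (my label) and plugs in the "1/overall" figures from the results table.

```latex
\frac{P(D \mid H_1)}{P(D \mid H_2)}
  \;=\; \frac{P(D \mid H_1) \,/\, P(D \mid H_0)}{P(D \mid H_2) \,/\, P(D \mid H_0)}
  \;=\; \frac{1 / 99536693.05}{1 / 407936437.37}
  \;=\; \frac{407936437.37}{99536693.05}
  \;\approx\; 4.1
```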

We need to do better. So let’s narrow down our hypothesis by adding some more premises.

If a change is too small, for instance, human beings won’t notice it. Irving Good put the minimal odds ratio we can typically detect at 5:4, or about 55%.[1] Some people claim to have seen or experienced precognition, so if it genuinely exists some fraction of the public must have a success rate greater than 55%. By the same token, since some people claim they haven’t, some fraction must succeed less than 55% of the time. Therefore, precognitive abilities must vary across people. We have no idea how this ability would vary, but if it’s controlled by a small number of factors, like most other things, it probably follows a Gaussian distribution.

I couldn’t find any polls asking how many people thought they were precognitive, but I did find a few asking if precognition existed; about 26% thought it did, within a margin of 3% (which I’ll round up to 5%, to be generous). If we treat the two as equivalent, then we’ve got a constraint on those Gaussian distributions: the population with precognitive strength greater than 55% or less than 45% is between 21 and 31% of the total.

Figuring out exactly which distributions meet those requirements is a pain, but with a little thought we can rough out some bounds. The average value of a Gaussian distribution sits at the halfway point, so if the average success ratio is greater than 55% then over half of all people will have a success rate greater than that. That violates the constraint, so the average of the distribution must be less than 55%. Similar arguments set a floor of 45%, giving us a range for the mean.

About 68% of a Gaussian distribution’s values are within a standard deviation of the average. But this means 32% are not, and that’s pretty close to 26±5%. So a Gaussian in the middle of the 45-55 range can’t have a standard deviation greater than 5 percentage points or so (that’s half of 55 minus 45), otherwise we’d again violate the constraint. Others in that range must have deviations lower than that, as more of their values would leak over the line, and no Gaussian can have a standard deviation below 0.

Tah-da, we have our distribution: precognition in the general public follows a Gaussian distribution with an average success rate between 45 and 55%, and a standard deviation between 0 and 5 points. As with H2, we’ll argue worse-than-chance odds don’t make sense, but this time we’ll handle things by mirroring rates below 45% above the 50/50 line, as those would be noticed, and conveniently leave those mildly-worse possibilities intact.

If we assume test results exactly mirror the population, then we have a new hypothesis too! But notice the progression here:

H1: precognition exists.
H2: precognition exists AND it’s never worse than chance.
H3: precognition exists AND if noticeable it’s never worse than chance AND across the population it has a Gaussian distribution AND the average value of said distribution is too small to be noticed AND the standard deviation is small enough that only a certain percentage of people notice.

These theories are steadily getting more complicated. Each component of a theory carries a certainty value, established from our background knowledge, and like any chain of probabilities they’d multiply together. Since each certainty is at most 100%, the net result is that as we add assumptions to a theory we lower its… background certainty…
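As a sketch of that chain, write a_1 through a_n for the individual assumptions of a theory (my notation). Strictly speaking each link is conditional on the ones before it, but since no link can exceed 1, piling on another assumption can never raise the total.

```latex
P(a_1 \wedge a_2 \wedge \dots \wedge a_n)
  \;=\; \prod_{i=1}^{n} P(a_i \mid a_1, \dots, a_{i-1})
  \;\le\; P(a_1 \wedge a_2 \wedge \dots \wedge a_{n-1})
```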

Hey, that’s Ockham’s Razor, isn’t it? It’s amazing what a little statistics can get you. I can’t help but think I’ve forgotten to do something, though…

[HJH 2015-05-28: Updated a link, minor edits for clarity.]

[1] Good, Irving J. “Studies in the History of Probability and Statistics. XXXVII AM Turing’s Statistical Work in World War II.” Biometrika, 1979, 393–96.