
Oh whoops, we forgot to actually run the numbers for the hypothesis we developed last time, H3. To do that in our little helper, swap out:

`prob := random.Float64()                        // H1: not random`

for:

```
prob := (random.NormFloat64()*random.Float64()*0.05) + random.Float64()*0.1 + 0.45
if prob < 0.45 { prob = 1 - prob }              // H3: plausible precog, but sloppy sampling
```

And tack the results up next to our prior runs.

| Success/Trials | H1/H0 | H2/H0 | H3/H0 |
| --- | --- | --- | --- |
| 828/1560 | 0.588416 | 1.211814 | 4.775636 |
| 238/480 | 0.057852 | 0.049696 | 0.485005 |
| 246/480 | 0.066529 | 0.092845 | 0.552351 |
| 536/1080 | 0.039115 | 0.032298 | 0.345376 |
| 2790/5400 | 0.340399 | 0.678042 | 2.97905 |
| 1274/2400 | 2.448912 | 4.912015 | 19.251378 |
| 1186/2400 | 0.03033 | 0.017051 | 0.268514 |
| 1251/2496 | 0.025082 | 0.02749 | 0.226681 |
| 1105/2304 | 0.178838 | 0.008695 | 1.499272 |
| overall Bayes Factor | 0 | 0 | 2.312523 |
| 1/BF | 99536693.052594 | 407936437.373478 | 0.432428 |

Wow, that’s quite an improvement! Jettisoning those nonsense values really cranked the odds, if by “cranked” we mean “reached the ‘barely noteworthy’ level.” Even then, the results are tenuous; remove the best study, and the Bayes Factor swings against precognition.

But the improvement comes with a cost. Here’s how H3 claims precognitive ability is spread across all humans. The vertical axis is the raw count of people, on a logarithmic scale, while the horizontal axis is the success rate. Black areas represent certain precognitive ability, white areas indicate no precognition, and gray areas indicate plausible levels of precognition under the model.

In real terms, some 490,000 Canadians can see what’s coming with greater than 66% accuracy at best, but the expected number of the model is around 9,000. Look at a success rate over 75%, and the numbers decline to 1,120 and 6, respectively. We’ve cranked the odds of precognition, but only by watering down what we mean by “precognition.” Those spectacular extremes don’t exist in this model.

It doesn’t help that H3 does a poor job of representing its assumptions.

H3 samples every point within the bounds of this graph, yet only points on that fuzzy boomerang should count towards precognitive ability. With a little math and curve-fitting, we can craft an H4 that does a much better job of grasping the boomerang; on this chart, it only samples the area between those two red lines. See if you can spot the difference between H3’s predictions for the population, and H4’s:

We should also stop equivocating between the variation in the general population and the variation between studies; while the two are related, they’re not the same. Say we randomly pluck two hundred people from the general public and calculate the average precognitive ability of the group. While this number is a decent estimate of the population average, the random variance within the population means it’s almost certain the sample’s average will be slightly better or worse than the population average. As you increase the sample size, this difference will gradually fade away.

But if you do multiple small studies instead of one bigger one, you’ll find their averages dance around the population average, and the distribution of that dance will be strongly correlated with the distribution in the population. Since these averages are, well, averages, the variance between them will be muted relative to the population’s. In the case of multiple two-hundred-sample studies, for instance, the standard deviation between their means will be about 7% of the population standard deviation. Compensating for this effect will toss out even more garbage samples, and further improve the numbers for precognition.

Tightening up the sampling, however, means that H4 requires substantially more code than H3.

```
mean := random.Float64()*.1 + .45               // H4: plausible precog, great sampling
sm := mean - 0.5
// curve-fit polynomial approximating the boomerang's spread at this mean
stdev := sm*sm*(-11.3743730736561 + sm*sm*(-6207.21103956717 + sm*sm*1409626.43860565)) + .045
variation := .38*(.05 - math.Abs(sm))
if variation > .0065 { variation = .0065 }
// shrink the study-to-study spread by the square root of this study's trial count
prob := (random.NormFloat64()*(stdev + 2*variation*random.Float64() - variation))/math.Sqrt(trials[slot]) + mean
if prob < 0.45 { prob = 1 - prob }
```

Whew! Figuring all that out was a pain, but it has a good effect on the numbers.

| Success/Trials | H1/H0 | H2/H0 | H3/H0 | H4/H0 |
| --- | --- | --- | --- | --- |
| 828/1560 | 0.588416 | 1.211814 | 4.775636 | 5.68318 |
| 238/480 | 0.057852 | 0.049696 | 0.485005 | 0.561392 |
| 246/480 | 0.066529 | 0.092845 | 0.552351 | 0.627585 |
| 536/1080 | 0.039115 | 0.032298 | 0.345376 | 0.391712 |
| 2790/5400 | 0.340399 | 0.678042 | 2.97905 | 3.431167 |
| 1274/2400 | 2.448912 | 4.912015 | 19.251378 | 23.724679 |
| 1186/2400 | 0.03033 | 0.017051 | 0.268514 | 0.301883 |
| 1251/2496 | 0.025082 | 0.02749 | 0.226681 | 0.25188 |
| 1105/2304 | 0.178838 | 0.008695 | 1.499272 | 1.768971 |
| overall Bayes Factor | 0 | 0 | 2.312523 | 8.588008 |
| 1/BF | 99536693.052594 | 407936437.373478 | 0.432428 | 0.116441 |

We’ve finally entered the realm of “positive” results, as per Kass and Raftery, and our “hero” study from last time is less important (though still a deal-breaker if removed). But can we get numbers even more favorable to precognition?

We could always cheat and shape our hypothesis to fit the data, then marvel when the two are a near-perfect match. Feeding these nine experiments into a spreadsheet, we find they result in a weighted average success rate of 50.8%, with a weighted standard deviation of 1.08 percentage points. This is a twisted result; assuming the population distribution is Gaussian and reflects the deviation of the study means, some numeric simulations suggest the population’s standard deviation is 65.3 percentage points. This would mean about nineteen in twenty people would have a noticeable level of precognition, and at the extremes 1.95 million Canadians would have a success rate greater than 95%. The priors on this H5 are subterranean.

Still, it makes a handy benchmark; if this doesn’t show a stronger signal in favor of precognition, something’s gone horribly wrong. So let’s swap in this…

```
// H5: cheating, drawn from experiments with certain values
prob := (random.NormFloat64()*0.0176074877) + 0.5082795699
```

… and see what we get.

| Success/Trials | H1/H0 | H2/H0 | H3/H0 | H4/H0 | H5/H0 |
| --- | --- | --- | --- | --- | --- |
| 828/1560 | 0.588416 | 1.211814 | 4.775636 | 5.68318 | 6.554058 |
| 238/480 | 0.057852 | 0.049696 | 0.485005 | 0.561392 | 0.733906 |
| 246/480 | 0.066529 | 0.092845 | 0.552351 | 0.627585 | 0.909321 |
| 536/1080 | 0.039115 | 0.032298 | 0.345376 | 0.391712 | 0.589348 |
| 2790/5400 | 0.340399 | 0.678042 | 2.97905 | 3.431167 | 6.562676 |
| 1274/2400 | 2.448912 | 4.912015 | 19.251378 | 23.724679 | 26.016863 |
| 1186/2400 | 0.03033 | 0.017051 | 0.268514 | 0.301883 | 0.464107 |
| 1251/2496 | 0.025082 | 0.02749 | 0.226681 | 0.25188 | 0.467214 |
| 1105/2304 | 0.178838 | 0.008695 | 1.499272 | 1.768971 | 1.295836 |
| overall Bayes Factor | 0 | 0 | 2.312523 | 8.588008 | 123.66863 |
| 1/BF | 99536693.052594 | 407936437.373478 | 0.432428 | 0.116441 | 0.008086 |

Now we’re up to “strongly in favor,” but it only takes two datasets to flip the Bayes Factor. Still not impressive.

Maybe we need more data. There are eleven more studies and datasets mentioned in Bem’s paper, three of which are by other authors. Some of them are too poorly described to be sure of the numbers; others are better described by ANOVA or other statistical methods. However, all of them have a success rate and p-value attached, which can be used to reverse-engineer the total sample size for a binomial test. So we can throw them on the pile, too.

| Success | Trials | Experiment |
| --- | --- | --- |
| 59 | 97 | Experiment 3: retro prime, 0.25s < x < 1.5s |
| 58 | 99 | Experiment 4: retro prime, 0.25s < x < 1.5s |
| 1242 | 2400 | Experiment 6: negative |
| 1153 | 2400 | Experiment 6: erotic |
| 109 | 174 | Experiment 8: word recall, stimulus seeking |
| 30 | 66 | Experiment 8: word recall, not s.s. |
| 32 | 53 | Experiment 9: word recall, stimulus seeking |
| 65 | 108 | Experiment 9: word recall, not s.s. |
| 257 | 479 | Savva 2004: spiders |
| 300 | 557 | Parker 2010: habituation |
| 79 | 176 | Parker 2010: non-habituation |

And thanks to the magic of computers, updating all our Bayes Factors is quite painless.

| Success/Trials | H1/H0 | H2/H0 | H3/H0 | H4/H0 | H5/H0 |
| --- | --- | --- | --- | --- | --- |
| 828/1560 | 0.606666 | 1.198943 | 4.784953 | 5.705459 | 6.572219 |
| 238/480 | 0.057603 | 0.04947 | 0.484612 | 0.562555 | 0.733823 |
| 246/480 | 0.065462 | 0.094775 | 0.551628 | 0.629002 | 0.908678 |
| 536/1080 | 0.039285 | 0.031954 | 0.345834 | 0.393145 | 0.58999 |
| 2790/5400 | 0.333243 | 0.680725 | 3.003229 | 3.448837 | 6.588456 |
| 1274/2400 | 2.434773 | 4.900013 | 19.316269 | 23.771786 | 26.150746 |
| 1186/2400 | 0.030896 | 0.016999 | 0.269643 | 0.300326 | 0.464338 |
| 1251/2496 | 0.025171 | 0.027394 | 0.225731 | 0.253144 | 0.465799 |
| 1105/2304 | 0.174386 | 0.008942 | 1.500294 | 1.766644 | 1.297245 |
| 59/97 | 1.212351 | 2.408451 | 2.59965 | 1.548541 | 1.631562 |
| 58/99 | 0.532767 | 1.018189 | 1.762424 | 1.269398 | 1.407962 |
| 1242/2400 | 0.111135 | 0.216742 | 0.955962 | 1.111527 | 1.959573 |
| 1153/2400 | 0.162183 | 0.009154 | 1.372023 | 1.617248 | 1.232864 |
| 109/174 | 25.427308 | 50.621886 | 23.055238 | 5.487296 | 4.325966 |
| 30/66 | 0.20185 | 0.09183 | 0.816047 | 0.948961 | 0.887938 |
| 32/53 | 0.522674 | 0.972936 | 1.409404 | 1.101852 | 1.226789 |
| 65/108 | 1.110218 | 2.211848 | 2.657623 | 1.598246 | 1.662689 |
| 257/479 | 0.203611 | 0.392438 | 1.480858 | 1.488709 | 1.762783 |
| 300/557 | 0.275613 | 0.542966 | 1.987574 | 1.970612 | 2.206632 |
| 79/176 | 0.239404 | 0.041192 | 0.917414 | 1.149373 | 0.82113 |
| Overall BF | 0 | 0 | 2679.759382 | 954.921232 | 17359.565519 |
| 1/Overall | 217082215663.106 | 938221934045.138 | 0.000373 | 0.001047 | 0.000058 |

Interestingly, the additional data didn’t benefit H4 as much as it benefited H3, and even H5 can lag behind the other hypotheses in some circumstances. How does that work? Let’s take a look at Experiment 9, non-stimulus-seeking, which had an estimated 65 successes in 108 trials. Some poking on the pocket calculator reveals that’s a success ratio of 60%. Scroll back up to the graphs of H3 and H4; notice how H3 has a bit more height at the 60% mark than H4? Thanks to its sloppy sampling, it puts more weight on values in that range than its sharper cousin, so it does better on studies with that success rate. H5 places much greater emphasis on lower success ratios, so it too doesn’t fare as well as H3.

If data comes up that doesn’t square well with a hypothesis, its certainty takes a hit. But if we’re comparing it to another hypothesis that also doesn’t predict the data, the Bayes Factor will remain close to 1 and our certainties won’t shift much at all. Likewise, if both hypotheses strongly predict the data, the Factor again stays close to 1. If we’re looking to really shift our certainty around, we need a big Bayes Factor, which means we need to find scenarios where one hypothesis strongly predicts the data while the other strongly predicts this data shouldn’t happen.
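To make that concrete, here’s a toy comparison of two point hypotheses, chance (p = 0.5) against a mild effect (p = 0.6); the datasets are my own illustrative numbers. 55/100 sits between the two predictions, so both account for it about equally well and the Bayes Factor barely budges; 60/100 is squarely predicted by one and not the other, and the Factor moves sharply.

```go
package main

import (
	"fmt"
	"math"
)

// binomPMF returns P(k successes in n trials | p).
func binomPMF(k, n int, p float64) float64 {
	lg := func(x float64) float64 { v, _ := math.Lgamma(x); return v }
	return math.Exp(lg(float64(n+1)) - lg(float64(k+1)) - lg(float64(n-k+1)) +
		float64(k)*math.Log(p) + float64(n-k)*math.Log(1-p))
}

func main() {
	// Compare how strongly each dataset favors p = 0.6 over p = 0.5.
	for _, k := range []int{55, 60} {
		bf := binomPMF(k, 100, 0.6) / binomPMF(k, 100, 0.5)
		fmt.Printf("%d/100: BF(0.6 vs 0.5) = %.2f\n", k, bf)
	}
	// 55/100 gives a BF of about 0.99; 60/100 gives about 7.49.
}
```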

Or, in other words, we should look for situations where one theory is… false. That sounds an awful lot like falsification! Wow, is there anything Bayes’ Theorem can’t do?

Surprisingly, it also points out why the capacity to be proven wrong is so valuable in a theory. Not seeing it? Let’s pit Aristotelian gravity (things seek out their natural place) against Newtonian gravity (objects attract one another proportionate to their mass and inversely proportionate to the square of their distance). Both predict that if I drop an object, it’ll hit Earth and come to rest. I drop an object. It hits Earth and comes to rest. The Bayes Factor between both hypotheses sits at 1, meaning I have equal certainty about both. I repeat the experiment a lot, and the Bayes Factor remains stuck at 1. Both theories describe this situation equally well, and I’m fully justified in using either.

I need a tie breaker, some situation where one theory falls flat while the other remains strong. The obvious one is to ask a different question: how does that object fall? There are an infinite number of possibilities to choose from; it might move at any number of constant speeds, or it may accelerate or do all sorts of weird tricks. Newtonian gravity gives me a very precise answer: the object will accelerate at about 9.81 metres per second squared until it strikes the Earth. That’s just one possibility out of infinitely many, so on its face it’s highly unlikely and very easy to prove wrong. This makes it similar to H0, which is also just one possibility out of an infinite number, but one that also carries a substantial prior probability.

In contrast, Aristotelian gravity doesn’t say how the object falls at all. Every possibility remains, so like H1 we have to integrate over all of them. And just like the battle between H0 and H1, this lack of specificity kills its relative certainty when we run the experiment again. The Bayes Factor comes out decisively in favor of Newtonian gravity, and so our certainty shifts away from Aristotelian and towards Newtonian.

The moral of the story? Theories that make specific predictions are easy to prove wrong, but they’ll always trump ones that make vague or no predictions if the evidence happens to support the more brittle theory.

But by now I’m sure you’re asking: what does all this mean?