
[testing the waters]

There was that bit about cheap computation, after all. And the fact that some, maybe even most of you, will disagree with my numbers. That’s easily solved: the spreadsheet I linked to earlier not only allows you to double-check my work from last instalment, it allows you to tweak the numbers. Download a copy for yourself, plug different values in, cross-check the formulas or even change them if necessary.

I’ll give you a hand with your exploration, too. I whipped up a quick computer program that figures out the sensitivity of each variable in the analysis relative to the numbers I picked. Higher numbers have a greater impact, proportional to the rest; negative numbers mean bumping up a variable increases the certainty, instead of decreasing it.
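To make the approach concrete, here’s a minimal sketch of how such a sensitivity table can be computed: bump each variable by a small fraction and record the relative change in the final figure. The `model` function and variable names below are placeholders standing in for the actual spreadsheet formulas, not the analysis itself.

```python
def model(v):
    # Stand-in for the full certainty calculation in the spreadsheet.
    return v["false_report_rate"] * v["nesting_odds"] / v["claim_invented_odds"]

baseline = {"false_report_rate": 0.05, "nesting_odds": 0.3, "claim_invented_odds": 0.01}

def sensitivities(model, values, bump=0.01):
    """Relative change in the output per relative change in each input."""
    base = model(values)
    result = {}
    for name in values:
        tweaked = dict(values)
        tweaked[name] *= 1 + bump   # nudge one variable up by 1%
        result[name] = (model(tweaked) - base) / (base * bump)
    return result

# Print variables from most to least influential, signs preserved.
for name, s in sorted(sensitivities(model, baseline).items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:22s} {s:+.2f}")
```

The sign convention matches the table: a negative value means bumping that variable up pushes the final figure the other way.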

[A table that charts how sensitive the final result is to each value.]

Notice that the variables that have the greatest tug on the final value all stem from numbers pulled from scientific studies or government-run surveys. This is a good sign, as it suggests that none of the numbers related to human motivation are critical to my analysis. To get specific, whether the odds of Myerson inventing a claim are 0.1%, 1%, or 10% doesn’t have much effect on the final number, so I don’t need to be that exact when assigning it a value. I can be off by an order of magnitude or more, and still get a robust answer.

Still don’t believe me? Fine, you do have reason to: all the above sensitivity values are relative to the values I put in place. Maybe by tweaking a few variables, those sensitivities will change and the human motivation ones will turn out to be crucial?

I’ve got that covered, too. The aforementioned computer program also performs a greedy search: it tweaks each variable in turn and notes how much impact the tweak had on the final figure. The change with the greatest impact becomes permanent, and the process repeats, so the ever-shifting sensitivities are incrementally compensated for. Punch run on that program, however, and you’ll find that only two variables ever change.
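The greedy loop just described can be sketched in a few lines. As before, the `model` and starting values are hypothetical placeholders for the real analysis; only the search procedure is the point here.

```python
def model(v):
    # Stand-in for the full certainty calculation.
    return v["false_report_rate"] * v["nesting_odds"] / v["claim_invented_odds"]

start = {"false_report_rate": 0.05, "nesting_odds": 0.3, "claim_invented_odds": 0.01}

def greedy_search(model, values, step=0.05, rounds=20):
    """Each round, try a fixed up/down tweak on every variable and make
    the single change with the greatest impact permanent."""
    values = dict(values)
    history = []
    for _ in range(rounds):
        base = model(values)
        best = None
        for name in values:
            for factor in (1 + step, 1 - step):
                trial = dict(values)
                trial[name] *= factor
                impact = abs(model(trial) - base)
                if best is None or impact > best[0]:
                    best = (impact, name, factor)
        _, name, factor = best
        values[name] *= factor   # the winning tweak becomes permanent
        history.append(name)
    return values, history

final, history = greedy_search(model, start)
print(history[:5])   # which variables keep winning?
```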

[The results of doing a greedy search of variables; only the false reporting rate and prevalence of nesting/attempts have an impact.]

[HJH 2015-07-11] One big problem with the single-variable approach, not surprisingly, is that it only tweaks a single variable; while other variables may have less influence, it may not be that much less. A greedy one-at-a-time approach might overdrive a variable and miss out on more balanced changes, ones that keep the variables within reason.

That’s easily fixed. This time the odds of nesting take the greatest hit, but otherwise there are no surprises; again we find the human motivation variables change little, and the ones with the best evidence are pushed outside of reason.
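One way to sketch that more balanced refinement: instead of committing one big winner per round, nudge every variable a little in whichever direction moves the final figure, while clamping each to a plausible range. The model, starting values, and bounds below are all placeholders, not the actual analysis.

```python
def model(v):
    # Stand-in for the full certainty calculation.
    return v["false_report_rate"] * v["nesting_odds"] / v["claim_invented_odds"]

start = {"false_report_rate": 0.05, "nesting_odds": 0.3, "claim_invented_odds": 0.01}
bounds = {"false_report_rate": (0.02, 0.08),
          "nesting_odds": (0.1, 0.5),
          "claim_invented_odds": (0.001, 0.1)}

def refine(model, values, bounds, step=0.05, rounds=100):
    """Coordinate descent: each round, move every variable by a small
    multiplicative step if that lowers the final figure, staying in bounds."""
    values = dict(values)
    for _ in range(rounds):
        for name, (lo, hi) in bounds.items():
            best = values
            for factor in (1 - step, 1 + step):
                trial = dict(values)
                trial[name] = min(hi, max(lo, values[name] * factor))
                if model(trial) < model(best):
                    best = trial
            values = best
    return values

result = refine(model, start, bounds)
print(result)
```

Because every variable moves in small, bounded steps, no single value gets overdriven the way the one-at-a-time search allows.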

[The top five variable changes when doing multivariate optimization. Nothing surprising, overall.]

What are we to do with this information, though?

The same thing we always do, every moment of our lives. We constantly plug certainties into risk/benefit calculations, and use those to decide which action to take. If I’m certain I have milk in the fridge, I don’t buy it when I’m at the grocery store. If it later turns out I didn’t have any, that isn’t a big deal. Well, usually it isn’t; if I need milk to cook an important dinner, I might buy some even if I’m pretty certain I already have it. I don’t know what actions you’re considering in light of this analysis, nor what level of risk you’re willing to live with, so I have to cover as many bases as possible and focus more on handing you a methodology instead of a number.

I can further help you come to a decision by sharing certainty values other people have marked as “convincing” for some action. In the scientific realm, for instance, you’re justified in saying a hypothesis is refuted once you cross a threshold. The exact threshold varies depending on the branch of science, roughly in proportion to the number of independent observations you can gather. In physics, where trillions of observations can happen, a confidence level of 99.99997% is sometimes necessary; in the social sciences or medicine, where finding a hundred cases may be impossible, even 90% can be difficult to reach. The typical threshold for social science is 95%. Strictly speaking these significance thresholds are not certainties, though we tend to mistakenly treat them as such. The equivocation is relevant, even if it isn’t correct.
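Those thresholds correspond to tail areas of the normal distribution; assuming the physics figure refers to the usual one-sided five-sigma convention, it can be reproduced with nothing but the standard library:

```python
from math import erf, sqrt

def one_sided_confidence(sigma):
    """Fraction of a normal distribution lying below +sigma standard deviations."""
    return 0.5 * (1 + erf(sigma / sqrt(2)))

print(f"{one_sided_confidence(5):.5%}")     # physics' five-sigma -> 99.99997%
print(f"{one_sided_confidence(1.645):.1%}") # roughly the social-science 95%
```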

Things are even murkier in the legal realm.

7.15 Jurors were typically told on a number of occasions throughout the trial that the burden of proof was on the prosecution to prove all the ingredients of the offence “beyond reasonable doubt”. The judge invariably included this in the summing-up. However, in conformity with appellate court direction, judges did little to elaborate on this or explain what it meant, assuming that “beyond reasonable doubt” was a well understood term which juries would apply in a common sense fashion.

7.16 However, many jurors said that they, and the jury as a whole, were uncertain what “beyond reasonable doubt” meant. They generally thought in terms of percentages, and debated and disagreed with each other about the percentage certainty required for “beyond reasonable doubt”, variously interpreting it as 100 per cent, 95 per cent, 75 per cent, and even 50 per cent. Occasionally this produced profound misunderstandings about the standard of proof.

7.17 It cannot be said, however, that this problem produced questionable or perverse verdicts. Individual jurors sometimes agreed to a verdict on the basis of a distorted perception of the standard which they were to apply (and thus made their decision on an incorrect basis), but there is little evidence to suggest that any perverse or questionable verdicts resulted from the application of an incorrect standard by the jury collectively.

This is common in the English-speaking world; in the United States, for instance, there is no standard description of what “reasonable doubt” means (with Justice O’Connor claiming “it defies easy explication”), and in Britain one judge was admonished by another for merely mentioning the term to the jury (the judgement itself stating that a question on certainty levels “is one that most judges dread”). Canada’s courts seem the best here, defining “beyond a reasonable doubt” as being “much closer to absolute certainty than to proof on a balance of probabilities”, but at the same time saying it’s nothing like a juror’s conventional day-to-day judgements. So how does a juror know what “much closer” means, if they have no frame of reference? Answering this is critical, as confusion over “reasonable doubt” has led to mistrials.

We want numbers here, not vague statements like “much closer.” One study trying to quantify “reasonable doubt” split 172 university students into two groups: one was simply told to render a guilty verdict if they were confident beyond a reasonable doubt; the other was told that, plus given a reassurance that absolute certainty wasn’t necessary. Half of the first group pronounced a guilty verdict at 77% certainty or less; half of the second, at 63% certainty or less. When you look at judges and legal scholars instead, they tend to endorse values above 80 or 90%. As Justice Weinstein put it,

We personally favour burden of proof in the realm of 95+% probability of guilt. Yet, if all our jurors had been given this quantitative definition of the standard, we doubt that the result would have changed in more than in the order of 1% of our jury-tried cases—and the increased acquittals would not necessarily be in cases where the defendant was innocent.

But a 2-8% false report rate implies that, absent any further evidence, we can be 92-98% confident that a claim is genuine. So if we’re willing to temporarily strip someone of their rights with that level of certainty, shouldn’t we be willing to merely believe someone who claims to have been assaulted? This vindicates Caitlin Carmody’s stance:

One of the bright, glaring, non-negotiable truths I have learned, though, is to believe survivors. Believe them, even if they don’t remember everything. Believe them, even if they remember almost nothing. Believe them, even if the person they say raped them seems like the nicest person in the world to you. Believe them, even if it shatters your whole world to do so. Believe them, even if they don’t want to share details, or press charges, or ever talk about it again. Believe them, even if their story sounds implausible to you. Believe them, even if you don’t want to, even if it breaks your heart.

Every skeptic should act according to their certainty, developed from a thorough analysis of the evidence, even when the conclusion seems counter-intuitive. If we disagree over the certainty or evidence, we debate and discuss the details until we agree.

What we don’t do is refuse to debate without a damn good reason. If it makes sense to say “there’s a 1-2% chance you have red hair” after we estimate 1-2% of the world has red hair, how can we not make a similar claim about sexual assault? If I can do a thorough analysis of whether or not a certain bird species occupies a nest, why can’t I do the same for a sexual assault case?

I can, and I just did. Arguing that’s impossible means declaring that a subset of human behaviour is beyond the scientific method, and that’s just religion without the funny hats.

Having said all that, I’ve been a bit unfair towards 84744 M.S. ….

[HJH 2015-07-11: Added a multivariate iterated refinement to the greedy search, and updated the program.]
[HJH 2015-10-05: Forgot to link to part six.]