
Combining multiple polls is an old idea, but in our interconnected and data-rich age it’s become a popular pastime. The current US election in particular has spawned a dozen or so aggregators, and nearly all of them give Clinton an 80-90% chance of winning.

Two outliers are interesting, though. Nate Silver’s model at Five Thirty Eight has recently put Clinton’s chances in the 60-70% range. This has caused some nervous tuttering; Silver kicked off the modern era of poll aggregators and became the standard that everyone compares themselves to, so many are uneasy to see his organization’s predictions fall out of line with the crowd.

At the other extreme, Sam Wang’s model at the Princeton Election Consortium gave Clinton a 100% chance of victory. In a post, Wang explained that his code had rounded a very high probability to 100%, and that he would code a new version which instead presented the odds as “>99%.” Wang is combative about his model, praising its simplicity over Five Thirty Eight’s much more complex one. Not too surprisingly, the two are longtime foes.

I don’t have the time to weigh in on this feud in enough detail, so I’ll only give a quick take: Silver’s approach is better. Wang’s very frequentist approach tosses out a lot of data, and on top of that his model ignores historical data and assumes there is no systematic poll bias, both of which lead to unjustified predictions. Since those predictions usually aren’t far out of line with other aggregators’, though, these flaws are easy to overlook, leading to a false sense of confidence.

At any rate, we’re just over a day away from getting a proper evaluation of both models. That might seem tough to do: if Five Thirty Eight is 68% confident in Clinton while Princeton is 99.6%, and Clinton wins, aren’t both of them right? There are key tells that show how right each one is, though.

Let’s say Wang’s model is bang-on. Since the confidence level is so high, the model should get almost every state correct, save maybe one. There’s a strong chance Silver’s model will get every state right too, and that’s a bad thing. Think about it: if you believe a coin is unbiased, but each time you toss it you wind up with heads, you’ve got good reason to think the probabilities weren’t 50-50 after all. Likewise, if states where the Five Thirty Eight model predicts 55-45 splits all break towards the 55 side, Silver’s model was too uncertain.
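To put a rough number on the coin-toss intuition, here is a minimal sketch. It assumes each state is an independent toss, which real aggregators don't assume (polling errors tend to be correlated across states), and the ten 55-45 states are made-up numbers, not anything pulled from either model.

```python
from math import prod


def prob_all_favorites_win(favorite_probs):
    """Chance that every favorite wins, assuming the stated probabilities
    are accurate and the states are independent of one another."""
    return prod(favorite_probs)


# Hypothetical example: ten toss-up states, each forecast as a 55-45 split.
tossups = [0.55] * 10
print(f"{prob_all_favorites_win(tossups):.4f}")
```

Under those assumptions a clean sweep of ten such states would happen only about a quarter of a percent of the time, so seeing one would be decent evidence that the 55-45 forecasts were underconfident.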

Conversely, if the Five Thirty Eight model is right about the level of uncertainty, it’ll get a few states wrong. The ones it flubs should be states with a high level of uncertainty attached, not ones where Clinton had a 90%+ chance of winning. There might be a systematic bias to these errors, for instance if pollsters were undercounting Latino votes, but that’s not assured. The Princeton model will also get several states wrong, which is a bad thing even if it has a better hit ratio than Five Thirty Eight’s. If you think a coin will always come up heads, flipping a tail will refute that belief even if you’ve flipped heads a hundred times before.
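One common way to make "a miss hurts more when you were more confident" concrete is a logarithmic scoring rule. The sketch below is my own illustration, not something either aggregator publishes; the function name and the four-state forecasts are hypothetical.

```python
import math


def log_loss(forecasts, outcomes):
    """Average negative log-probability assigned to what actually happened.

    forecasts: probability given to the favorite in each state.
    outcomes:  True where the favorite won, False where the favorite lost.
    Lower is better; a forecast of certainty that fails scores infinity.
    """
    total = 0.0
    for p, favorite_won in zip(forecasts, outcomes):
        p_observed = p if favorite_won else 1.0 - p
        if p_observed == 0.0:
            return math.inf  # claimed certainty, and was wrong
        total -= math.log(p_observed)
    return total / len(forecasts)


# Hypothetical forecasts for four states, with one upset.
outcomes = [True, True, True, False]
confident = [0.99] * 4
cautious = [0.70] * 4

print(log_loss(confident, outcomes))
print(log_loss(cautious, outcomes))
print(log_loss([1.0] * 4, outcomes))
```

With these made-up numbers both forecasters call three of four states correctly, yet the cautious one scores better because its single miss was a state it had flagged as uncertain. The always-heads forecaster in the last line gets an infinite penalty from the one tail.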

I’ve gotta get back to work, but I’ll be eagerly following both aggregators on election day. May the better model win.