
[HJH 2015-09-11: Bah, I screwed up some of the math here. I’ve edited this post accordingly, and you can read the details over here.]

So far, so good. Sony figured the Ghostbusters reboot would earn about $39 million on its opening weekend, while a number of independent sources guessed around $45-$54 million, and the final number was $46.5 million. It beat expectations, set personal bests for Feig, McCarthy, and the rest, and made Sony happy. To be considered profitable it’ll have to hit the $150 million mark domestically, though, and as luck would have it Ghostbusters is stuck between a rock (Star Trek Beyond) and a hard place (The Secret Life of Pets). Fortunately, you don’t need to be profitable to earn a sequel.

Beyond the box office, the reviews have been generally positive. Rotten Tomatoes pegs it at around 73% approval, Metacritic is a bit more sour at 60%, and the only real blight was IMDb, where Ghostbusters currently sits at 52%, though it’s been floating upwards in the last few days. Last time, I’d dismissed the IMDb result because the “wisdom” of the crowds can be easily coloured by a vocal mob or human bias.

But then I looked more carefully at these graphs…

The IMDb rankings of four remakes: The Thing, Evil Dead, RoboCop, and Ghostbusters.

… and saw the hate mob behaved in a predictable way, one that was fairly easy to see and describe mathematically. What if I tried peeling back their influence, leaving only the people giving honest reviews? Would I get substantially different numbers?

[HJH 2016-07-24: If you just want a quick summary of the results, click here.]

There was only one good way to find out: build a model of IMDb users, and feed it into a Bayesian analysis. My model wound up with four components:

1. The Haters. These people are protest voting by leaving a low rating. Their votes follow a power law distribution, clumping at 1 star but sometimes leaving a 2. Since their votes are motivated by something other than the quality of the movie, they shouldn’t be included in the movie’s score.

2. The Idolators or “Idols.” There’s another group of voters that behave like the haters, however. They too are blinded by politics or self-interest and don’t leave an honest rating, but unlike the haters they tend to hand out 10-star ratings. This also skews the average, and unfortunately it tends to have a greater effect than the haters.

3. Uncertainty. Picture a group of people who pick their rating out of a hat. Ridiculous, isn’t it? Why would someone go to the time and effort to fire up IMDb and navigate to the polling page, then pick a random value? I wasn’t planning on inserting this group; in addition to the absurdity of random voting, this category is also artificially buoyed by the two previous groups of voters, muddying what it stands for, and I already have uncertainty handled via credible intervals on every variable.

When I did runs of the three-component model, however, I noticed that the most probable distributions tended to have very shallow exponents for either Haters or Idols. The system was trying to add in this “uncertainty” category, by warping the Hater or Idol component to be somewhat flat across all the star rankings. This muddied what that component represented and removed my ability to honestly compare Haters and Idols. So I bowed to the system’s wishes, and explicitly modeled uncertainty.

4. The Dispassionates. The remaining people are trying to give an honest opinion of the movie. While there’s quite a bit of variety in how they vote, any shared experience that they have would tug them towards a shared rating. The result is a binomial distribution around that global rating.
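The four components above can be sketched as a mixture over the ten IMDb star values. This is a minimal sketch under assumptions of my own, not the exact parameterization used in the fit: Haters follow a normalized power law s^(-a) that peaks at 1 star, Idols mirror it so they peak at 10, uncertain voters are uniform, and Dispassionates give 1 + Binomial(9, r) stars for a global rating r between 0 and 1. The function names, and the choice to make the Dispassionate strength whatever is left over, are hypothetical.

```python
from math import comb

STARS = range(1, 11)  # IMDb ratings run from 1 to 10 stars


def power_law(exponent, reverse=False):
    """Normalized power law over 1..10 stars, peaking at 1 star (or 10 if reversed)."""
    weights = [(11 - s if reverse else s) ** -exponent for s in STARS]
    total = sum(weights)
    return [w / total for w in weights]


def dispassionate(rating):
    """1 + Binomial(9, rating) stars, where rating is the global rating in [0, 1]."""
    return [comb(9, s - 1) * rating ** (s - 1) * (1 - rating) ** (10 - s) for s in STARS]


def mixture(hater_w, hater_exp, idol_w, idol_exp, uncertain_w, rating):
    """Probability of each star rating under the hypothetical four-component model.

    The Dispassionate strength is whatever is left after the other three strengths,
    so the six free parameters are the five arguments besides it plus `rating`.
    """
    dispassionate_w = 1.0 - hater_w - idol_w - uncertain_w
    if dispassionate_w < 0:
        raise ValueError("component strengths must sum to at most 1")
    haters = power_law(hater_exp)
    idols = power_law(idol_exp, reverse=True)  # mirror image: peaks at 10 stars
    honest = dispassionate(rating)
    return [hater_w * h + idol_w * i + uncertain_w / 10 + dispassionate_w * d
            for h, i, d in zip(haters, idols, honest)]


# Illustrative parameters only, not fitted values.
probs = mixture(0.2, 3.0, 0.1, 3.0, 0.05, 0.7)
print([round(p, 3) for p in probs])
```

Because every component is itself a normalized distribution, the mixture always sums to 1, which is what keeps the strength parameters comparable across movies.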

The model of different IMDb voters, pulled apart.

Here are all four components pulled apart, after a Bayesian fit to 783 randomly drawn IMDb entries. Those four require six variables: a strength and shape parameter for both the Hater and Idol parts, a strength parameter for uncertainty, and the global rating of the Dispassionates. The math has been adjusted so the strength parameters of each component are always proportional, allowing comparisons. I’ve tried out a few other models, and nothing else did as good a job; one that was just uncertainty plus Dispassionates, for instance, was about 10^42,191 times less likely to be the one true model.
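The post doesn’t spell out how that likelihood ratio was computed. One simple way to express such a ratio, assuming each model has already been fitted to every movie and we compare the likelihoods of those fits (a full Bayesian comparison would instead integrate over each model’s parameters), is to add up log-likelihoods across movies and convert to base ten. Working in logs matters here, since a number like 10^42,191 overflows any floating-point type. The function names are hypothetical.

```python
from math import log


def log_likelihood(counts, probs):
    """Log-likelihood of a histogram of star counts under per-star probabilities.

    The multinomial coefficient is dropped: it is identical for every model
    and cancels when two models are compared on the same data.
    """
    return sum(c * log(p) for c, p in zip(counts, probs) if c > 0)


def log10_times_less_likely(histograms, best_probs, alt_probs):
    """How many powers of ten less likely the alternative model is.

    `histograms` holds one list of star counts per movie; `best_probs` and
    `alt_probs` hold each model's fitted per-star probabilities for those movies.
    Log-likelihoods add across independently rated movies.
    """
    diff = sum(log_likelihood(h, b) - log_likelihood(h, a)
               for h, b, a in zip(histograms, best_probs, alt_probs))
    return diff / log(10)


# Toy example with two-bin "histograms", purely to show the mechanics.
print(log10_times_less_likely([[5, 5]], [[0.5, 0.5]], [[0.9, 0.1]]))
```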

The values in the above chart represent the typical rating you’ll find on IMDb. Surprisingly, IMDb users are an optimistic lot, both skewing their votes above average and including more Idolators than Haters. So what did they think of Ghostbusters? That’s best answered by pitting the score the Dispassionates gave it against other reboots or remakes, plus the two previous Ghostbusters films. These scores only make sense relative to one another, after all.

A list of movie ratings, pulled from the Bayesian model in the text, as of July 20th. Click the link for a copy-paste friendly version.

Wild, the Ghostbusters remake is pretty good! I didn’t expect it to beat King Kong, Dredd, or the first Mission Impossible movie. This score also fits nicely with a common refrain from reviewers, that the remake isn’t as good as the first film but better than the sequel.

I know, you’re probably a bit skeptical of this rating. While it may seem out of left field, it’s really just the average of the Dispassionates’ distribution. You can even see that distribution in the data yourself; scroll up to the bar charts and look for the “hill” in the scale, and you’ll see the Dispassionates’ average falls right around there (though Idols tend to skew the hill upwards). One other thing that reassures me is how stable this rating is:

A time series of how the model has performed over the last several days. After two days of instability, it found and stuck to a consistent value.

While the mean and median have been skating all over the place, after two days of instability the model was quite consistent in where it ranked Ghostbusters. A thousand votes or so should be enough to converge on a stable result, so the fact that the mean and median don’t converge tells us they’re lousy metrics here. The “eyeball test” also shows that middle peak remaining consistent over time.
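To see why the mean and median wander while the middle peak holds still, here is a small sketch using a made-up histogram (vote counts for 1 through 10 stars), not real IMDb data. The function names are hypothetical, and `middle_hill` is only a crude stand-in for the “eyeball test”: it finds the most popular rating between 2 and 9 stars.

```python
def histogram_mean(counts):
    """Mean star rating, where counts[i] is the number of (i + 1)-star votes."""
    total = sum(counts)
    return sum(star * c for star, c in enumerate(counts, start=1)) / total


def histogram_median(counts):
    """Lower median star rating from the same kind of histogram."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no votes")
    running = 0
    for star, c in enumerate(counts, start=1):
        running += c
        if 2 * running >= total:
            return star


def middle_hill(counts):
    """Most popular rating between 2 and 9 stars, ignoring the 1 and 10 extremes."""
    middle = counts[1:9]
    return 2 + middle.index(max(middle))


# Made-up histogram: a hill at 7 stars plus spikes at 1 and 10.
before = [400, 20, 20, 30, 50, 80, 120, 90, 40, 300]
# The same movie after 300 extra 1-star protest votes.
after = [700] + before[1:]

for label, counts in (("before", before), ("after", after)):
    print(label, round(histogram_mean(counts), 2),
          histogram_median(counts), middle_hill(counts))
```

In this toy case the extra protest votes drag the mean down by almost a full star and drop the median from 6 to 3, while the hill stays at 7.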

The instabilities in the model are partly my fault, too; I’m using flat priors for all the variables, which means the optimizer is free to overlap the Dispassionate voters with either the Idols or Haters if there isn’t a strong third peak. More informative priors that favor the centre should lead to stronger convergence.
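One way to encode a prior that favors the centre, offered as an assumption rather than what the post actually used, is a Beta(a, b) density on the Dispassionate rating r (taken here as a fraction between 0 and 1). With a = b = 1 it reduces to the flat prior described above; a = b = 2 gently penalizes ratings near either extreme, which discourages the Dispassionate peak from sliding into the Idol or Hater spikes. The function names and default values are hypothetical.

```python
from math import lgamma, log


def log_beta_prior(rating, a=2.0, b=2.0):
    """Log density of a Beta(a, b) prior on the Dispassionate rating in (0, 1).

    a = b = 1 is the flat prior; a = b > 1 pulls the rating toward the centre.
    """
    if not 0.0 < rating < 1.0:
        return float("-inf")
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return log_norm + (a - 1) * log(rating) + (b - 1) * log(1 - rating)


def log_posterior(log_likelihood_value, rating, a=2.0, b=2.0):
    """Unnormalized log posterior: the fit's log-likelihood plus the log prior."""
    return log_likelihood_value + log_beta_prior(rating, a, b)


# The flat prior is indifferent; Beta(2, 2) penalizes ratings near the extremes.
print(log_beta_prior(0.05, 1, 1), log_beta_prior(0.05), log_beta_prior(0.5))
```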

Speaking of those two voting blocs, though, how big an influence do they have?

A comparison of the relative strength of Idols and Haters for all three Ghostbusters films, plus 34 remakes and reboots.

Wow, Ghostbusters is a big outlier! While there are remakes that come close to having the number of Idols or Haters that it does, none have high levels of both at the same time. There just isn’t anything like it in this pool of remakes/reboots. Maybe we can find some similarities in that pool of random IMDb ratings? Let’s only look at movies with at least 1,000 votes to their name, to filter out some noise.

The most polarized films in the sample, determined by (haters * idols). Click through for a text-friendly spreadsheet.

OK, so we have a smattering of comedies and horror films, a movie that polarized viewers over race, a number of Bollywood flicks, two remakes, a kids’ movie, something from Uwe Boll, a polemical drama, and Ghostbusters again in the outlier spot. I’m not seeing any pattern here, which leaves us grasping for theories.
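The ranking behind that table is simple once each movie has fitted Hater and Idol strengths: drop anything under 1,000 votes, then sort by the product of the two. The `Fit` record, its field names, and the sample titles and numbers below are hypothetical placeholders, not real fitted values.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    title: str
    votes: int
    haters: float  # fitted Hater strength
    idols: float   # fitted Idol strength


def most_polarized(fits, min_votes=1000, top=10):
    """Movies with enough votes, sorted by haters * idols, most polarized first."""
    eligible = [f for f in fits if f.votes >= min_votes]
    eligible.sort(key=lambda f: f.haters * f.idols, reverse=True)
    return eligible[:top]


sample = [
    Fit("Movie A", 5000, 0.30, 0.30),
    Fit("Movie B", 500, 0.90, 0.90),    # too few votes, filtered out
    Fit("Movie C", 2000, 0.10, 0.50),
    Fit("Movie D", 12000, 0.40, 0.02),  # many Haters but almost no Idols
]
print([f.title for f in most_polarized(sample)])
```

Using a product rather than a sum means a movie only scores high when it has both blocs at once, which matches the earlier observation that no other remake had high levels of both.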

One possibility is that the Haters arrived in force, causing a group of Idols to form and counter-vote to compensate. A quick skim of the user reviews shows a lot of people expressing shock or dismay at the poor reviews of others, and a few chiming in to right that wrong. A glance at the ratings for men and women seems to confirm this, too.

Comparing the votes for Ghostbusters (2016), split by gender.

But looks can be deceiving. While the charts seem to show a massive number of female Idols, the total number of women who voted is much lower than the total number of men. We need to factor that in when doing a proper gender breakdown, and we can’t just look at a single data point, either.
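Here is one way to factor that in, assuming we already have an estimated number of female and male votes in a bloc along with the totals for all voters: compare the bloc’s female share to the female share of everyone who voted. The function name and the numbers in the example are made up for illustration.

```python
def female_overrepresentation(bloc_female, bloc_male, all_female, all_male):
    """Female share of a voting bloc divided by the female share of all voters.

    1.0 means the bloc mirrors the electorate; above 1.0, women are over-represented.
    """
    bloc_share = bloc_female / (bloc_female + bloc_male)
    overall_share = all_female / (all_female + all_male)
    return bloc_share / overall_share


# Made-up example: men still cast most of this bloc's votes (700 of 1,000),
# yet women are twice as common here as among voters overall (30% vs. 15%).
print(female_overrepresentation(300, 700, 1500, 8500))
```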

The gender breakdown of the three main voting blocs, plus the overall breakdown, sorted by Idols. Click the link for a text-friendly version.

The big gender split isn’t isolated to Ghostbusters; it’s actually a general trend among Idolators. Maybe women on IMDb know they’re a minority, so when they see a score they disagree with they’re more likely to compensate with a protest vote? That doesn’t explain why men skew to be Haters, though. It’s also notable that the gender breakdown of the Dispassionates closely tracks the overall breakdown with a subtle male bias… with one exception, as usual.

Gender ratio for IMDb reviews, comparing the overall split to the Dispassionate split.

Women are more likely to be Dispassionate voters, when it comes to Ghostbusters? Tsk, those emotional men…

But hold on, this theory has a few holes. Both Fury Road and The Force Awakens faced a storm of controversy by featuring well-rounded women in starring roles, in franchises that were regarded as male-centric, so where are their Haters?

Ghostbusters vs. Mad Max vs. Star Wars. Only the first has a substantial number of Haters.

It’s tempting to argue that since Ghostbusters is a recent release, the Idols will eventually drown out the Haters, but look at the vote counts: in less than a week, Ghostbusters has had more 1-star reviews than The Force Awakens earned in six months, or Fury Road picked up in a year! The Haters may be fading away, but they’ll be a significant part of the ratings for a very long time. In the meantime, we’re still left struggling to explain the lack of Haters for the other two. Shouldn’t a flood of Idols trigger a flood of Haters as well? Getting sick of popular things is a popular thing itself.

Maybe, just maybe, the people hating on those other two were more bark than bite. Members of 4chan have done some impressive vote-rigging in the past, but that’s kind of the point; had they rigged the vote for Ghostbusters, the average score would have been zero. No, maybe the large number of Haters isn’t due to external groups trying to muck up the ratings; maybe they’re protest votes from authentic IMDb users. Maybe the huge Hater and Idol totals for Ghostbusters are legitimate reflections of protest…

The voting layout for three cult classic films. All three have a small but noticeable Idol bloc.

… or of interest. A “cult classic” movie is one that attracts a loyal following of fans. These movies tend to have a higher Idol turnout than is typical, with few Haters…

More cult classics, this time of movies that are so bad they’re celebrated.

… but there are exceptions. Yes, it’s a bit of an insult to put Ghostbusters and Manos in the same category, but see if you can spot a Dispassionate voting bloc in Manos’ IMDb rating. Reviewers are quick to call out so-bad-it’s-good movies, and so far I haven’t seen a single person claim that for Ghostbusters. Instead, in my social circles I see an endless stream of people gushing over the movie and planning to rewatch it. Ghostbusters is legitimately good, unlike, say, The Room, but both their ratings may suggest they’ve earned “cult classic” status and will be talked about for decades to come.

There’s a way to double-check this. IMDb is famous for their user ratings, and provides excellent demographic information; Rotten Tomatoes is famous for aggregating professional critic reviews, but they also collect user reviews on the side. These are only presented as an average, with no demographic detail, so it isn’t as attractive a target for ballot stuffing. We’d expect there to be less extreme behavior over there.

A comparison of how Rotten Tomatoes and IMDb users rank the same movies.

Nope, user reviews on Rotten Tomatoes tell the same basic tale of Haters, Idols, and Dispassionates. The Haters are more tightly clustered on Rotten Tomatoes for Ghostbusters, and Plan 9 from Outer Space is favored more over there, but I don’t see any sign of mass tampering or outside influence. If anything, the Dispassionates rank Ghostbusters slightly higher at Rotten Tomatoes (78.8% vs. 71.6%).

But this theory isn’t without issues, either. If Ghostbusters will become a cult classic, why don’t “Plump Fiction” or “Thomas and the Magic Railroad” have that status now? There’s more to a classic than just polarizing viewers; they usually start off obscure, then get promoted by a small group of people who cultivate a taste in bad movies until someone with a platform bumps up that movie’s popularity. Ghostbusters has none of that. Also, the movie’s been out for a week! Let’s not get too carried away here.

But back to the gender breakdown. Given the demographic differences, you’d expect male and female Dispassionate reviewers for Ghostbusters to come to wildly different conclusions.

The scores of male and female Dispassionate voters, compared and with error bars.

Not really. Women tend to hand out higher scores than men, overall, and Ghostbusters doesn’t stray too far from that. Here at last, it’s an outlier with a lot of company.

The bigger story is that men and women largely agree on movie ratings, within a star ranking of each other most of the time. There’s no clear sign of chick or dick flicks. The exceptions can easily be waved away as artifacts of small voter turnouts or flat priors.
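To connect percentage ratings with “within a star,” assume, purely for illustration, that a Dispassionate voter’s score is S = 1 + K, with K drawn from a binomial distribution over nine trials whose success probability is the global rating r expressed as a fraction. Under that assumption the average score, and the gap between two groups, follow directly; plugging in the Ghostbusters Dispassionate ratings quoted in the summary (66.6% for men, 75.5% for women) gives a gap of well under one star:

$$
\mathbb{E}[S] = 1 + 9r, \qquad
\Delta\mathbb{E}[S] = 9\,(r_{\text{f}} - r_{\text{m}}) = 9\,(0.755 - 0.666) = 9 \times 0.089 \approx 0.80 \text{ stars}.
$$

On the same assumption, a ten-percentage-point gap works out to 0.9 stars.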

This strains credulity; Hollywood has been a heavy user of marketing since it began. In the last few decades, one of the most popular ideas in marketing has been market segmentation: divide up your products so that they appeal to different non-overlapping markets, so when you spend funds to promote one product you won’t be harming another. This is why marketers started pushing gendered toys so hard in recent decades, as it promotes buying two toys instead of one, and why Hollywood markets differently to women. They wouldn’t go to these lengths if these differences didn’t exist, right?

Let’s try something more targeted, and gather a pool of movies that should be gendered. If you tally up who’s voting, you can see a clear difference by gender…

Comparing voter demographics for some movies that should be heavily gendered. Click through to get the raw tables.

… yet if you look at ratings and sand off the Haters and Idols, the distance dramatically narrows.

Comparing Dispassionate rankings, by gender. Click through for the raw tables.

This isn’t hard to explain, as for a good decade there’s been strong evidence that men and women are more similar than different. Any intrinsic difference is easily swamped by the forces of variation and socialization. Some women love watching gunfire and mayhem, some men love a good cry over a bittersweet romance, and while they’ll claim otherwise when you put them on the spot, when safely hidden behind statistical aggregation they’ll open up. There’s little evidence that market segments have genuine differences, beyond those artificially created by advertisers.

This also implies men have no intrinsic problems identifying with female leads; it’s culture that turns them into Haters. As an industry with a disproportionate influence on culture, Hollywood has an obligation to step up and correct that, instead of reinforcing it.

[HJH 2016-07-24] Well, that was a deep dive. What have we learned from it?

  • User rankings can be useful after all, so long as you have a good model for how they vote.
  • Once you factor out the protest votes, Ghostbusters jumps from a rating of 5.3/10 to about 71% on IMDb (it was 71.6% when I wrote the above, and stands at 70.4% as I’m typing this). This brings it in line with the Rotten Tomatoes score of 73%.
  • These protest votes don’t seem to be due to external vote rigging, but to legitimate IMDb users voicing their opinions. They might indicate Ghostbusters is a cult classic in the making.
  • Once those protest votes are factored out, the huge-looking gulf between how men and women ranked Ghostbusters drastically shrinks (the data that gives 4.7 vs. 8.0 averages for male and female IMDb users, respectively, translates into rankings of 66.6% vs. 75.5%).
  • Men and women generally give movies the same rating, in the extremes differing by ten percentage points or less. There is no such thing as a “chick flick.”