Is a Miss as Good as a Mile (in Senate Polling)?

The folks at fivethirtyeight published their weekly polling recap, which included a discussion of how badly Senate polls tend to miss their targets.  The context, in this case, is a recent spate of polls that put Beto O'Rourke within spitting distance of the execrable Ted Cruz in the Texas Senate race.  I'll let you judge for yourself, but I, for one, strongly disagree with the approach that 538 puts forward.  Essentially, they look at a bunch of historical polls and take the absolute value of each poll's miss relative to the final outcome.

But we already know that when it comes to polls, you need to look at averages. So I'm taking a different approach, and fortunately, I can do so, because 538 makes its data publicly available.

So here's what I did.  First, I collected all Senate races in the last 20 years that had at least 5 late (September or later) polls, a total of 142 races.  For each race, I averaged the polls, weighted by number of respondents, and compared that average to the final outcome.  The result is the figure at top.  As you can see, there is a fair amount of scatter, and no clear trendline.
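The averaging step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout `(race_id, sample_size, margin)`, the function names, and the example numbers are all made up here and are not the schema of the 538 data. It also assumes the polls have already been filtered to September or later, and that `margin` means Dem minus GOP in points, so that a positive miss means Dems outperformed the polls.

```python
from collections import defaultdict

MIN_LATE_POLLS = 5  # threshold taken from the text


def weighted_poll_margins(polls):
    """Respondent-weighted mean margin per race.

    `polls` is an iterable of (race_id, sample_size, margin) tuples.
    This layout is a hypothetical stand-in, not the real 538 format.
    Races with fewer than MIN_LATE_POLLS polls are dropped.
    """
    totals = defaultdict(lambda: [0.0, 0, 0])  # weighted sum, respondents, poll count
    for race_id, sample_size, margin in polls:
        entry = totals[race_id]
        entry[0] += sample_size * margin
        entry[1] += sample_size
        entry[2] += 1
    return {
        race_id: weighted_sum / respondents
        for race_id, (weighted_sum, respondents, n_polls) in totals.items()
        if n_polls >= MIN_LATE_POLLS
    }


def polling_misses(polls, results):
    """Actual margin minus polled margin for every race that qualifies."""
    averages = weighted_poll_margins(polls)
    return {race: results[race] - avg for race, avg in averages.items()}


# Invented numbers, for illustration only.
example_polls = [
    ("race-A", 500, 4.0), ("race-A", 1000, 2.0), ("race-A", 500, 3.0),
    ("race-A", 1000, 1.0), ("race-A", 1000, 5.0),
    ("race-B", 800, -6.0),  # only one late poll, so this race is dropped
]
example_results = {"race-A": 1.0, "race-B": -9.0}
misses = polling_misses(example_polls, example_results)
```

Weighting by sample size is the simplest choice that matches the description; it ignores pollster quality and how recent each poll is.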

We can also look at the histogram of results:
It looks very much like a bell-shaped curve.  That is, most Senate races were within a few points of the final result.  Some were not.  Averaged over all races, the signed miss was only 0.2%, which means that, on average, we can't say that either the Dems or the Republicans typically outperform the polls.

But we can also break this up year by year, and we find:
[Table: average polling miss by year]
(positive numbers mean Dems outperformed the polls)
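For anyone who wants to redo the year-by-year breakdown, here is a rough sketch. It assumes the per-race misses are available as `(year, miss)` pairs, which is a layout invented for this example, and the numbers below are made up. It also uses the sample standard deviation for the spread of the yearly averages; that is one reasonable choice, not necessarily the one behind the figures quoted in this post.

```python
from collections import defaultdict
from statistics import mean, stdev


def yearly_summary(misses):
    """Group (year, miss) pairs into {year: (number_of_races, mean_miss)}."""
    by_year = defaultdict(list)
    for year, miss in misses:
        by_year[year].append(miss)
    return {year: (len(vals), mean(vals)) for year, vals in sorted(by_year.items())}


def global_swing(summary):
    """Spread (sample standard deviation) of the yearly mean misses."""
    yearly_means = [mean_miss for _, mean_miss in summary.values()]
    return stdev(yearly_means)


# Invented misses, for illustration only (positive = Dems beat the polls).
example_misses = [
    (2000, 1.0), (2000, 3.0),
    (2002, -2.0), (2002, -4.0),
    (2004, 1.0),
]
summary = yearly_summary(example_misses)
swing = global_swing(summary)
```

Each year counts equally in `global_swing`, regardless of how many races it had; weighting years by race count would be a defensible alternative.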
So what does this tell us?
  1. The overall distribution seemed unbiased. That is, the mean residual was only 0.2% (Dems technically did slightly better than expected, but only barely).
  2. The standard deviation is almost exactly 5%. Put another way, historically, if candidate A is leading in the polls by 5%, then they have about an 84% chance (assuming a Gaussian distribution) of ultimately winning. It also means that leads of more than 10 points are pretty safe.

    FWIW, there are currently 6 Senate races with polling averages within 5 points of even:
    NV, TN, MO, ND, FL, and TX, 3 apiece currently held by Dems and the GOP. The only one outside that range that's an expected flip is AZ.  You can play around with these numbers at my senate tracker.
  3. From year to year, there is some overall error that random polling error alone can't explain. Last year, in the 16 races that met the criteria, Republicans did 5 points better than expected _on average_. In 1998, Dems did 6.5 points better. Those were the most extreme years.

    Bear in mind that this is a combination of random polling error, unknown systematics that change from year to year, and last-minute changes in the opinions of the electorate. Last year, for instance, there's evidence that, nationally, the polls moved by about 2 points in the final 11 days. For some reason. That's not all of the polling miss, but it's a fair part of it.
  4. In a typical year (again, looking at the standard deviation), the global swing (the amount by which that year's average miss differs from zero) is about 3%.
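The 84% figure in point 2 is just the normal distribution evaluated one standard deviation above zero. Here is a minimal sketch, assuming the final margin is Gaussian, centered on the polling lead, with the 5-point standard deviation found above. The function name `win_probability` is made up for this example.

```python
import math


def win_probability(lead, sigma=5.0):
    """Chance the final margin is positive if it is Normal(lead, sigma).

    `lead` and `sigma` are in percentage points. The Gaussian shape is an
    assumption; real polling misses may have fatter tails.
    """
    return 0.5 * (1.0 + math.erf(lead / (sigma * math.sqrt(2.0))))


for lead in (0, 5, 10):
    print(f"lead of {lead:>2} points: {win_probability(lead):.1%} chance of winning")
```

A 5-point lead comes out at about 84%, and a 10-point lead at just under 98%, which is why double-digit leads look pretty safe.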
The upshot is that, by and large, these polls actually do a pretty good job of predicting the state of the race, but as always, you need to keep an eye on the errors.