I'm going to make a pitch, and I (for once) am going to try to make it quickly: Things aren't changing very quickly in terms of public opinion, but there's obviously a strong incentive from some pollsters to make things seem exciting at all times. Your daily or weekly trackers give you almost no information and most of them have an agenda. And some (as I've noted before) are pretty awful.
Instead, focus on the high-quality pollsters (as identified by 538). These are pollsters that conduct live interviews and sample cell phones as well as landlines. They tend to produce robust statistics. Here's what you get if you average Trump's approval (circles) and disapproval (diamonds) from the top 10 pollsters:
Did you know that 538 allows you to download their entire database of polls? If not, you should check it out!
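If you do grab it, whittling the export down to the top-shelf pollsters takes a few lines of pandas. This is only a sketch: the column names (`pollster`, `grade`, `approve`, `disapprove`) and the inline data are stand-ins, so check the actual CSV header before relying on them.

```python
# Sketch: filter a 538-style poll export to the high-grade pollsters and
# average their approval numbers. All column names and values are made up.
import pandas as pd

# Tiny stand-in for the real downloaded CSV.
polls = pd.DataFrame({
    "pollster":   ["Quinnipiac", "Fox News", "Rasmussen", "Gallup"],
    "grade":      ["A-",         "A",        "C+",        "B"],
    "approve":    [38.0,         42.0,       45.0,        39.0],
    "disapprove": [55.0,         53.0,       52.0,        56.0],
})

high_quality = polls[polls["grade"].isin(["A+", "A", "A-"])]
print(high_quality["approve"].mean())      # 40.0 with this toy data
print(high_quality["disapprove"].mean())   # 54.0
```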
There's some good news and some bad news in there:
- Bad news: Even after discounting the noise and various "house effects," Trump's approval really has improved over the last couple of months, by about 2 points.
- Very good news: His net approval has still dropped from -6 (last January) to -15, even with this "bump."
- More good news: High-quality pollsters (comparing only like to like) put his current approval at 39.3, a full point below 538's average.
- Neutral, but wonky news: The scatter in the polling (after corrections) is entirely attributable to noise.
- In-the-weeds news: 538 seems to dramatically underestimate the "partisan lean" of pollsters. As it happens, Fox seems to be about 3 points more favorable to Trump than the average, as does PPP. 538 suggests they're each only about 1 point more favorable to Trump, and corrects accordingly.
This looks a lot like the plot above, but you'll notice that the scatter is a bit larger. Why? Because each pollster has its own secret sauce. Each is biased (and I use that in the statistical, rather than the pejorative, sense) in its own way. What's more, they aren't all measuring the same thing. Most are measuring "Adults," while Quinnipiac, Fox, and PPP measure registered voters, who typically lean a few points more Republican.
It's impossible to figure out which poll is right. Indeed, different pollsters will be right on different days, and being right depends on what you'd like to do with the data (gauge general contentment, predict response to a policy, or estimate the results of an election). My working model will simply be that the average of the 10 high-quality pollsters produces an unbiased sample. I assume that because I have to assume something. Consider Quinnipiac (an A- pollster):
I've fit the Q data (approve and disapprove) to a 6th order polynomial (basically, like fitting a straight line, but with more bumps and wiggles, as you can see). In that way we're able to ignore little variations between polls, as well as model the data on every day.
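The fitting step is nothing exotic. Here's a minimal sketch with invented data, using numpy's `Polynomial.fit` (which rescales the x-axis internally, keeping a 6th-order fit numerically well behaved):

```python
# Sketch: fit a 6th-order polynomial to noisy (day, approval) data and
# evaluate it on every day. The data here is synthetic, not real polling.
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(0, 400, 10)                      # one poll every 10 days
approve = 40 + 2 * np.sin(days / 60) + rng.normal(0, 1.5, days.size)

fit = np.polynomial.Polynomial.fit(days, approve, deg=6)

every_day = np.arange(400)
smoothed = fit(every_day)                         # daily approval estimate
print(smoothed.shape)                             # (400,)
```

The polynomial smooths over poll-to-poll jitter while still giving an estimate for every single day, which is what lets us compute a residual for each poll.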
Here's a similar analysis with ABC/Washington Post (A+). WaPo seems to vary a bit more, but it also has fewer data points, so when we average everything, it gets weighted less.
Each poll also has its own bias. And while, as I said, we can't tell which one is "right," at the very least I can figure out how each one varies from the average. A positive number means that the quantity (approval or disapproval) runs higher than average:
| Pollster | Approval bias | Disapproval bias |
| --- | --- | --- |
| Quinnipiac | -1.3 | 1.6 |
| ABC/WaPo | 0.3 | 1.0 |
| CBS | -0.5 | 0.0 |
| NBC/WSJ | 1.9 | -0.4 |
| Fox News | 4.3 | -2.5 |
| Monmouth | 1.2 | -4.1 |
| Pew | -1.6 | 3.1 |
| Marist | -0.8 | -2.8 |
| IBD/TIPP | -2.0 | 1.8 |
| PPP | 3.1 | -2.4 |
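For anyone who wants to reproduce a table like this, each house effect is just the average gap between a pollster's readings and the smoothed cross-pollster model on the same days. A toy sketch (every number here is invented):

```python
# Sketch: estimate each pollster's bias as the mean difference between its
# approval readings and the cross-pollster model. All numbers are made up.

model = {0: 40.0, 10: 40.5, 20: 41.0}     # smoothed average: day -> approval

polls = [                                  # (pollster, day, approval)
    ("Fox News", 0, 44.5), ("Fox News", 20, 45.0),
    ("IBD/TIPP", 0, 38.2), ("IBD/TIPP", 10, 38.3),
]

bias = {}
for name in {p[0] for p in polls}:
    resid = [appr - model[day] for (who, day, appr) in polls if who == name]
    bias[name] = sum(resid) / len(resid)

print(bias)   # Fox runs about +4 points hot, IBD about -2 points cold
```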
Remember, bias isn't a bad word. Fox may be correct, or IBD may be, but in any given week their margins may differ by more than 10 points (with IBD as the more favorable) and they'd still be telling the same story.
I just undo the biases and draw the best-fit model, and, tada, we have our own well-behaved tracker. But, you might be tempted to ask, how do you know that the smoothed, corrected model works? For each model, we can then look at the residuals, how much each polling measure differs from the model:
$${residual}_i=y_i-model_i$$
Remember that if you interview $N$ people, then there is a sampling error in the approval of $$\sigma=\frac{0.5}{\sqrt{N}}$$.
Those are your error bars. And if you believe your errors, you should expect the measured value to fall within one error bar of the true model only about 68% of the time. About half the time the residuals should be above the model, and about half the time below. And there should be no obvious pattern or correlation between them.
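Both the formula and the 68% figure are easy to sanity-check with a simulation (the true approval of 40% and the 1000-person sample here are arbitrary choices):

```python
# Sketch: verify that sigma = 0.5/sqrt(N) covers the truth ~68% of the time.
import math
import random

N = 1000
sigma = 0.5 / math.sqrt(N)
print(round(100 * sigma, 2))       # 1.58 points for a 1000-person poll

rng = random.Random(42)
true_p = 0.40                      # assumed "true" approval
trials = 2000
inside = 0
for _ in range(trials):
    measured = sum(rng.random() < true_p for _ in range(N)) / N
    if abs(measured - true_p) <= sigma:
        inside += 1
print(inside / trials)             # should land near 0.68
```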
Quinnipiac is just an example, but they all look this way. By the way, 14/27 = 52% of the points intersect the model. Not quite 68% (so we're potentially slightly underestimating the error bars), but good enough for government work.
Taking all of the residuals from all 10 polls and making a histogram:
The smoothed line is a bell curve: it's the theoretical distribution we'd expect if the polls differed from the model only due to white noise, and it's not hard to imagine that as more and more polls are accumulated, the histogram will look more and more like the bell curve. The $\sigma$ of 1.9%, incidentally, is just about what you'd expect from combining a bunch of 1000-person polls.
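To check that last claim, note that the pooled sigma for a batch of polls is the root-mean-square of the per-poll sigmas. With sample sizes in the several-hundred-to-1000 range (the sizes below are invented), you land right in that neighborhood:

```python
# Sketch: pooled residual sigma expected from pure sampling noise.
# The sample sizes are invented stand-ins for typical poll sizes.
import math

sample_sizes = [600, 700, 800, 900, 1000]
per_poll_var = [0.25 / n for n in sample_sizes]        # sigma^2 = (0.5/sqrt(N))^2
pooled = math.sqrt(sum(per_poll_var) / len(per_poll_var))
print(round(100 * pooled, 2))                          # about 1.8 points
```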
All of this is to say that the good pollsters produce good polls, and taking them about once a month is probably just fine.
I'd like to finish up with a quick look at the cheaper, less reliable pollsters. We've talked about them before. Doing the same analysis (plot them all, fit with a curve, average the curves, and figure out the systematic biases) gives a much more punctuated plot (see especially around September):
The low-quality pollsters produce a much bigger overall swing, from +7 in January 2017 to -14 now, as opposed to the high-quality pollsters, which went from -6 to -15. Some of this may be due to samples (including Rasmussen's "likely voter" model), but some is really just an overresponse to noise.
I bring up Ras again to point out that there is something quite bizarre about their polling. Looking at the residuals compared to the model, you'll find that the errors are highly correlated. Some of this is to be expected: since they poll in 3-day windows, points within 3 days will necessarily correlate. However, it's also easy to see periods of several weeks where the approval (even after applying the bias correction) remains well above or below the smoothed model. It is hard to reconcile this with the idea that the deviations from other polls arise entirely organically, as opposed to, say, fixing the party preferences within the model.
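"Highly correlated" can be made precise with the lag-1 autocorrelation of the residual series: white noise sits near zero, while residuals that run hot or cold for weeks sit well above it. A sketch with simulated residuals (the 0.9 memory coefficient is purely an illustrative choice):

```python
# Sketch: lag-1 autocorrelation distinguishes white-noise residuals from
# residuals with memory. Both series here are simulated, not real poll data.
import random

def lag1_autocorr(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var

rng = random.Random(1)
white = [rng.gauss(0, 1.6) for _ in range(200)]    # well-behaved residuals

sticky, level = [], 0.0
for _ in range(200):                               # residuals with memory
    level = 0.9 * level + rng.gauss(0, 0.7)
    sticky.append(level)

print(round(lag1_autocorr(white), 2))    # near 0
print(round(lag1_autocorr(sticky), 2))   # well above 0
```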
Bottom line: Tracking polls seem to be for suckers.