The midterms are over, and if you didn't know, the Democrats won, and won big, especially in the House. Moreover, this was a big success for pollsters, who needed a win after 2016. As of right now, the final outcomes seem to be:

- House: 233 (D+38)
- Senate: 47 (D-2)

> Final House estimate: Dems take the House with 90% win probability, and a median of 240 seats, 229 in the "winner takes all" estimate. https://t.co/WD68UXiXIv
>
> The "model" will be updated as results come in, converging on the final result.
>
> — Dave Goldberg (@askaphysicist) November 6, 2018

Plugging the final result into my prediction gives:

Not bad, just shy of the median.

Of course, a lot of prognosticators seem to have bragging rights. As I'm sure you've seen, most modelers were more or less in this range. FiveThirtyEight, for instance, though a bit more bearish (86% probability of the Dems winning the House), estimated a median of 231 seats. Others did similarly. In that sense, it almost didn't matter whom you followed. After 2016, there was a lot of popular dismissal of the polls, but it turned out that the pollsters (especially the NY Times's Nate Cohn, and the Siena/Upshot live polls, which provided nearly half of the data) were the real heroes.

My deeper concern with other modelers was that they were too dependent on "secret sauce" to do their modeling. I wanted something much simpler and a lot more transparent.

My approach was a lot simpler than most. I describe it in more detail here and here, but the upshot is:

- Each district can be described by a lean, and the 2012 and 2016 presidential outcomes give you a good sense of those leans.
- For incumbents who also ran in 2016, their over- or under-performance compared to the presidential candidates is included.

On top of the district leans, the model has three global parameters:

- Incumbency Advantage – How much just being an incumbent matters
- The "swing" of the election – Is this election like 2016 or 2012?
- The generic congressional ballot – A number that just gets added across the board.

To fit these, I *only* use district-level data and, in particular, only non-partisan polls with a rating of B- or higher. No national polls. No corrections for scandals. No tracking of money. Nothing.
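To make the setup concrete, here's a minimal sketch of a lean-based district prediction. The function, the direction of the swing parameter, and the example district are my own illustration (evaluated at the final fitted values quoted at the end of the post), not the actual code:

```python
def predicted_margin(lean_2012, lean_2016, swing, generic_ballot,
                     incumbent=0, incumbency_advantage=3.0):
    """Predicted Dem margin (in points) for a district.

    lean_2012, lean_2016 : Dem-minus-Rep presidential margin in the district
    swing          : 0 -> map looks like 2016, 1 -> looks like 2012 (my guess
                     at the direction; the fitted 0.14 reads as 2016-like)
    generic_ballot : national shift added across the board
    incumbent      : +1 Dem incumbent, -1 Republican incumbent, 0 open seat
    """
    base = swing * lean_2012 + (1 - swing) * lean_2016
    return base + generic_ballot + incumbent * incumbency_advantage

# Hypothetical district: D+2 in 2012, R+1 in 2016, Republican incumbent,
# evaluated at the final fitted parameters
print(predicted_margin(2.0, -1.0, swing=0.14, generic_ballot=7.6, incumbent=-1))
```

Note that everything here is a district-level or national quantity; nothing about the candidates themselves enters.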

I took the polls, assumed a potential range of systematic errors (about 3%), and ran thousands of random realizations of the races. By the way, here is the one bit of historical data that I needed to include: how much, on average, are House polls typically off by? Looking at the systematic error in House polls over the last 10 cycles:

the actual range is about 2.1%, less than I used and *much* less than what other modelers assumed. If anything, my 90% likelihood was expected to be *too* cautious.

Then, during election night, I added results as they came in, and the model slowly converged to the "correct" outcome. In other words, my model wasn't terribly bothered as individual races came in, unless they differed tremendously from the prediction (which they didn't) or Republican-leaning seats were called early. But other modelers freaked the hell out.
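The simulate-then-update logic can be sketched in a few lines. This is a toy version under my own simplifying assumptions (Gaussian systematic and per-race errors, and conditioning done by simply discarding realizations that contradict called races), not the post's actual code:

```python
import random

def simulate_seats(margins, n_sims=10000, sys_sigma=3.0, race_sigma=6.0,
                   called=None, seed=0):
    """Monte Carlo over House outcomes.

    margins : predicted Dem margin (points) in each contested district
    called  : optional {district_index: True/False} of races already called
              for the Dem; realizations contradicting a call are discarded
    Returns Dem seat totals across the retained realizations.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        shift = rng.gauss(0, sys_sigma)   # systematic error shared by all races
        wins = [m + shift + rng.gauss(0, race_sigma) > 0 for m in margins]
        if called and any(wins[i] != res for i, res in called.items()):
            continue                      # inconsistent with an actual result
        totals.append(sum(wins))
    return totals

toy = [5.0, -2.0, 1.0, 12.0, -8.0]            # made-up district margins
before = simulate_seats(toy)
after = simulate_seats(toy, called={3: True})  # district 3 called for the Dem
```

As races get called, the surviving realizations narrow in on the final outcome; with five toy districts this is trivial, but the same filtering over 392 seats is what the slow convergence on election night looks like.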

Indeed, around 9:00 EST, 538's model was predicting that the House race was around 50-50, because they were fooled by early results.

That red plateau in the middle is an overreaction in 538's model (which was mirrored in the betting markets and by many other modelers). Meanwhile, I couldn't quite understand the fuss. My model never wavered, and as results came in suggesting that the polling was more or less correct, it became *more* certain of the outcome. To wit:

> Hey @FiveThirtyEight, what's going on with your house predictor? Given that Dems are winning in all of their safe and lean seats, and half the tossups, how is it down to 40%? — Dave Goldberg (@askaphysicist) November 7, 2018

> I don't understand, looking at the actual house results, why people are freaking out. — Dave Goldberg (@askaphysicist) November 7, 2018

> If you haven't been following my house tracker: https://t.co/S42CqwpmNn — Dave Goldberg (@askaphysicist) November 7, 2018

I've been continuously updating, and unlike the Needle or 538, I've never gone below 230 seats or 90%.

This last was in the late evening, when the consensus swung back to the Dems easily taking the House.

### District by District

The thing that's interesting about modeling the House is that, in some sense, any *given* race is going to be governed by uncertainty and random error. To begin, it's worth reiterating that the pollsters did a *phenomenal* job. Using my polling selection criteria (non-partisan, B- or better): this result doesn't just *look* like a good match, it is. There are 103 distinct polled districts, and the average error is only 1% (Republicans did about a point better than expected, at least in this group), which, if you look above, is actually less than the typical miss (on average, about 2-3 points).

Also, the scatter ($\sigma$) is about 6.6%. Since a typical House district poll is about 400 people, the expected error due to sampling alone is about 5%. This is barely more than that.
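As a sanity check on that 5% figure, here's the back-of-the-envelope binomial sampling error for a 400-person poll of a near-even race:

```python
import math

def margin_sampling_sigma(n, p=0.5):
    """One-sigma sampling error, in points, on the Dem-minus-Rep margin of an
    n-person poll. The margin is 2p - 1, so its sigma is twice that of p."""
    return 2 * math.sqrt(p * (1 - p) / n) * 100

print(round(margin_sampling_sigma(400), 2))   # -> 5.0 points for n = 400
```

So a 6.6% scatter leaves only a couple of points of error attributable to anything other than pure sampling noise.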

103 districts is a lot, but it's not all of them. Overall, there were 392 contested seats, and I modeled all of them using the approach above. If a district was also polled, then I combined the polling and the model to make a prediction. And the seat-by-seat predictions were also really good:
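For districts with both a poll and a model estimate, one standard way to blend the two is inverse-variance weighting. The post doesn't spell out its exact combination rule, so take this as an illustrative assumption rather than the method actually used:

```python
def combine(poll, poll_sigma, model, model_sigma):
    """Inverse-variance weighted blend of a district poll and a model
    prediction; the result sits between the two inputs, closer to
    whichever is more precise."""
    w_poll = 1.0 / poll_sigma ** 2
    w_model = 1.0 / model_sigma ** 2
    estimate = (w_poll * poll + w_model * model) / (w_poll + w_model)
    sigma = (w_poll + w_model) ** -0.5   # always tighter than either input
    return estimate, sigma

# Poll says D+4 (sigma ~5 points of sampling error); model says D+1 (sigma ~7)
estimate, sigma = combine(4.0, 5.0, 1.0, 7.0)
```

The blended estimate lands between D+1 and D+4, weighted toward the (tighter) poll, with a smaller uncertainty than either input alone.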

Note that the results cover a *much* wider range of territory. There were a few big outliers (IL-03, NC-04, NC-06, NC-12, WI-08, and WV-02 all had errors of 20 points or more), but only two of them actually predicted the wrong winner (WV-02 incorrectly picked the Dem, while NC-06 incorrectly picked the Republican). There were lots of close races where I nominally picked the wrong winner, but that's actually the point. I only expected that this or that race was likely to go Dem 55% of the time, for instance. But with hundreds of races, those expectations will average out.

Finally, we can estimate from just the model alone. That is, what happens if we use the polls to fit the model, but then just estimate the districts independently (no polls within individual districts)? Again, we do quite well:

Roughly the same scatter, and we have trouble in the same districts.
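The "those expectations will average out" point above is just linearity of expectation. A toy illustration, with made-up per-race probabilities:

```python
import math

# Made-up per-race Dem win probabilities: many near-tossups plus safer seats
probs = [0.55] * 100 + [0.90] * 50 + [0.10] * 50

expected_seats = sum(probs)                          # linearity of expectation
sigma = math.sqrt(sum(p * (1 - p) for p in probs))   # if races were independent
print(expected_seats, sigma)   # about 105 seats, give or take ~6
```

Individually, a 55% race is nearly a coin flip, so "wrong" calls in half of them are expected; yet the seat total is pinned down to within a handful of seats (before the correlated systematic error, which widens this, is added back in).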

### The Big Takeaways

There is a rather surprising result from all of this, and it's kind of startling. The scatter between my simple model and the actual district-by-district outcomes is only 7 points. That means that, to leading order:

**The actual candidates barely matter** – at least in the House.

Think about that. It's nuts. But in district after district, by far the biggest effects were the overall shift of the national environment, whether there was an incumbent running, and how that district voted historically, and **that's it.** The plot above knows *nothing* about the actual person running for office – only national trends.

All of the ineffable qualities of a candidate or their opponent, all of the funding, the ground game, the weather on the day of the election, etc.; those things add up to about 7 points. Which means that if you're in a district which you expect to be at more than, say, 15 points in a normal year (allowing for a big national swing), no amount of awesomeness will swing the race.

And what are the global numbers? We can look at the final model to estimate them:

- Incumbency Advantage: 3.0% – This is, in some sense, to be expected. The incumbency advantage in 2016 was 7.5%, so the fact that it's lower shouldn't surprise us. Even so, one of the biggest things the Dems had going for them was that 40 Republican incumbents opted not to run this time around.
- The swing parameter: 0.14 – This means, in essence, that in terms of the overall lay of the land, the map more resembles 2016 than 2012, albeit significantly shifted toward the Dems. There is an essay to be written about the non-college-educated white voters who aren't coming back to the Dems, but I'm not going to be the one to write it.
- The Generic Congressional Ballot: 7.6% – This is huge, even by historical standards, and was pretty much necessary to flip the House. It's roughly in line with the popular-vote advantage (about 7.3%); the difference is due to a few things, among them the 40 uncontested D seats versus only 3 uncontested R's.
