What happened?

The midterms are over, and if you didn't know, the Democrats won, and won big time, especially in the House.  Moreover, this was a big success for pollsters who needed a win after 2016.  As of right now, the final outcome seems to be:
  • House: 233 (D+38)
  • Senate: 47 (D-2)
We'll get to the Senate eventually, but for today, let's focus on the House.  I'd like to begin with a bit of bragging.  Here's my tweet from the morning of Election Day:

Plugging the final result into my prediction gives:

Not bad, just shy of the median.

Of course, a lot of prognosticators seem to have bragging rights. As I'm sure you've seen, most modelers were more or less in this range.  FiveThirtyEight, for instance, though a bit more bearish (86% probability of winning the House), estimated a median of 231 seats.  Others did similarly.  In that sense, it almost didn't matter who you followed.  After 2016, there was a lot of popular dismissal of the polls, but it turned out that the pollsters (especially the NY Times's Nate Cohn, and the Siena Upshot live polls, which provided nearly half of the data) were the real heroes.

My deeper concern with other modelers was that they were too dependent on "secret sauce" to do their modeling. I wanted something much simpler and a lot more transparent. 

My approach was a lot simpler than most.  I describe it in more detail here and here, but the upshot is:
  • Each district can be described by a lean, and the 2012 and 2016 presidential outcomes give you a good sense of those leans.
  • For incumbents who also ran in 2016, their over- or under- performance compared to the presidential candidates is included.
That's it.  That's the baseline.  Each district is then adjusted by 3 numbers (none of which have anything to do with the actual person on the ballot):
  1. Incumbency Advantage – How much just being an incumbent matters
  2. The "swing" of the election – Is this election like 2016 or 2012?
  3. The generic congressional ballot – A number that just gets added across the board.
Most importantly, I only use district-level data, and in particular, only non-partisan polls with a rating of B- or higher.

No national polls.  No corrections for scandals.  No tracking of money. Nothing.
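To make the structure concrete, here is a minimal sketch of how a baseline plus three global numbers could turn into an expected margin for one district. This is an illustration, not the actual model code: the `District` class, the field names, the sign conventions, and especially the linear blend between the 2012 and 2016 leans are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class District:
    # All margins are Dem-minus-Rep, in percentage points (assumed convention).
    lean_2012: float              # district lean from the 2012 presidential result
    lean_2016: float              # district lean from the 2016 presidential result
    incumbent: int                # +1 Dem incumbent, -1 Rep incumbent, 0 open seat
    overperformance: float = 0.0  # incumbent's 2016 over/under-performance, if any


def expected_margin(d: District, incumbency: float, swing: float, generic: float) -> float:
    """Expected Dem margin for one district from the three global numbers.

    ASSUMPTION: 'swing' is treated as a weight between the two presidential
    maps, with swing=0 meaning "looks like 2016" and swing=1 meaning
    "looks like 2012".
    """
    lean = swing * d.lean_2012 + (1.0 - swing) * d.lean_2016
    return lean + d.overperformance + d.incumbent * incumbency + generic


# Hypothetical Republican-held district, not real data.
example = District(lean_2012=-2.0, lean_2016=-6.0, incumbent=-1)
margin = expected_margin(example, incumbency=3.0, swing=0.14, generic=7.6)
print(f"Expected Dem margin: {margin:+.2f} points")
```

Nothing in the function refers to the candidate by name, which is the point: only the district's history, the incumbency flag, and the national numbers enter.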

I took the polls, assumed a potential range in systematic errors (about 3%), and ran thousands of random realizations of the races.  By the way, here is the one bit of historical data that I needed to include: How much, on average, are House polls typically off by?  Looking at systematic error in House polls over the last 10 cycles:

the actual range is about 2.1%, less than I used and much less than other modelers.  If anything, my 90% likelihood was expected to be too cautious.
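The "thousands of random realizations" step can be sketched roughly as follows. This is a toy version under stated assumptions, not the model itself: the systematic error is drawn as a single Gaussian shift shared by every district in a given realization, each district gets its own independent Gaussian noise, and the function names, the `district_sd` value, and the `safe_dem_seats` parameter are hypothetical.

```python
import random


def simulate_house(margins, safe_dem_seats=0, systematic_sd=3.0,
                   district_sd=5.0, n_sims=5000, seed=0):
    """Return a list of simulated Dem seat totals.

    margins: expected Dem margins (points) for the contested districts.
    systematic_sd: spread of the shared polling miss (ASSUMED Gaussian).
    district_sd: spread of independent per-district noise (ASSUMED value).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        # One shared shift per realization: if the polls are off,
        # they are off in the same direction everywhere.
        shift = rng.gauss(0.0, systematic_sd)
        seats = safe_dem_seats
        for m in margins:
            if m + shift + rng.gauss(0.0, district_sd) > 0:
                seats += 1
        totals.append(seats)
    return totals


def win_probability(totals, majority=218):
    """Fraction of realizations in which Dems reach a majority."""
    return sum(t >= majority for t in totals) / len(totals)


# Toy input: 435 made-up margins spread evenly from -30 to +32 points.
toy_margins = [-30.0 + 62.0 * i / 434 for i in range(435)]
totals = simulate_house(toy_margins)
print("P(Dem majority) in the toy example:", win_probability(totals))
```

The shared shift is what keeps the seat distribution wide: without it, hundreds of independent errors would average out and the forecast would look far more certain than it should.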

Then, during election night, I add results as they come in, and the model slowly converges to the "correct" outcome.  In other words, my model wasn't terribly bothered when individual races came in unless they differed tremendously from prediction (which they didn't) or when Republican-leaning seats were called early. But other modelers freaked the hell out.

Indeed, around 9:00 EST, 538's model was predicting that the House race was around 50-50 because they were fooled by early results.

That red plateau in the middle is an overreaction in 538's model (which was mirrored in the betting markets, and many other modelers).  Meanwhile, I couldn't quite understand the fuss.  My model never wavered, and as results came in that suggested that the polling was more or less correct, it became more certain of the outcome.  To wit:

 this last in the late evening, when the consensus result swung back to Dems easily taking the House.

District by District

The thing that's interesting about modeling the House is that in some sense, any given race is going to be governed by uncertainty and random error.  To begin, it's worth reiterating that the pollsters did a phenomenal job.  Using the polling selection criteria (non-partisan, B- or better):

This result doesn't just look like a good match, it is.  There are 103 distinct polled districts, and the average of errors is only 1% (Republicans did about a point better than expected, at least in this group), but if you look above, that's actually less than the typical miss (on average, about 2-3 points).

Also, the scatter ($\sigma$) is about 6.6%.  Since a typical House district poll is about 400 people, the expected error due to sampling alone is about 5%.  This is barely more than that.
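The 5% figure follows from simple binomial sampling, and the arithmetic can be checked in a few lines. The sketch below assumes a two-candidate race near 50-50, so that the margin is twice the Dem share minus one and its standard error is twice that of the share. The second function assumes that sampling error and any remaining error are independent and add in quadrature, which is an assumption of this example rather than something the polling data establishes.

```python
import math


def margin_sampling_error(n, dem_share=0.5):
    """Standard error (in points) of the Dem-minus-Rep margin for a poll of n people.

    ASSUMPTION: two-candidate race, simple random sample.
    """
    se_share = math.sqrt(dem_share * (1.0 - dem_share) / n)
    return 2.0 * se_share * 100.0


def excess_scatter(observed_sd, sampling_sd):
    """Scatter left over after removing sampling error, assuming quadrature addition."""
    return math.sqrt(observed_sd ** 2 - sampling_sd ** 2)


sampling = margin_sampling_error(400)
print(f"Sampling error on the margin: {sampling:.1f} points")
print(f"Leftover scatter: {excess_scatter(6.6, sampling):.1f} points")
```

Under those assumptions, a 6.6-point observed scatter leaves only about 4 points for everything other than sample size.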

103 districts is a lot, but it's not all of them.  Overall, there were 392 contested seats, and I modeled all of them using the approach above.  If a district was also polled, then I combined the polling and the model, to make a prediction.  And the seat-by-seat predictions were also really good:

Note that the results cover a much wider range of territory.  There were a few big outliers, including IL-03, NC-04, NC-06, NC-12, WI-08, and WV-02, which all had errors of 20 points or more, but only two of them actually predicted the wrong winner (WV-02 incorrectly picked the Dem, while NC-06 incorrectly picked the Republican).  There were lots of close races where I nominally picked the wrong winner, but that's actually the point.  I only expected that this or that race was likely to be Dem 55% of the time, for instance.  But with hundreds of races, those expectations will average out.

Finally, we can estimate from just the model alone.  That is, what happens if we use the polls to make the model, but then just estimate the districts independently (no polls within individual districts)?  Again, we do quite well:

Roughly the same scatter, and we have trouble in the same districts.

The Big Takeaways

There is a rather startling result from all of this.  The scatter between my simple model and the actual district-by-district outcomes is only 7 points.  That means that, to leading order:
The Actual Candidates Barely Matter – at least in the House.
Think about that.  It's nuts.  But in district after district, by far the biggest effects were the overall shift of the national environment, whether there was an incumbent running, and how that district voted historically, and that's it. The plot above knows nothing about the actual person running for office – only national trends.

All of the ineffable qualities of a candidate or their opponent, all of the funding, the ground game, the weather on the day of the election, etc.: those things add up to about 7 points.  Which means that if you're in a district that you expect to lean against you by more than, say, 15 points in a normal year (allowing for a big national swing), no amount of awesomeness will swing the race.
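To put a rough number on that, suppose the 7-point scatter behaves like the standard deviation of a normal distribution (an assumption; the real residuals have a few fat-tailed outliers). Then the chance that candidate-specific effects alone overcome a 15-point deficit is a one-sided tail probability, which the hypothetical helper below computes.

```python
import math


def chance_of_overcoming(deficit, scatter_sd=7.0):
    """One-sided normal tail probability that a random effect exceeds 'deficit'.

    ASSUMPTION: candidate-level effects are Gaussian with standard deviation
    scatter_sd (in points).
    """
    z = deficit / scatter_sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for deficit in (5, 10, 15, 20):
    print(f"{deficit:>2}-point deficit: {100 * chance_of_overcoming(deficit):.1f}% chance")
```

Under that assumption a 15-point deficit is a bit more than two standard deviations away, so it gets overcome only about 1.6% of the time.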

And what are the global numbers? We can look at the final model to estimate them:
  1.  Incumbency Advantage: 3.0%
    This is, in some sense, to be expected.  The incumbency advantage in 2016 was 7.5%, so the fact that it's lower shouldn't surprise us. Even so, one of the biggest things Dems had going for them was that 40 Republican incumbents opted not to run this time around.
  2. The swing parameter: 0.14
    This means, in essence, that in terms of the overall lay of the land, the map more resembles 2016 than 2012, albeit significantly shifted toward the Dems. There is an essay to be written about how there are non-college-educated white voters who aren't coming back to the Dems, but I'm not going to be the one to write it.
  3. The Generic Congressional Ballot: 7.6%
    This is huge, even by historical standards, and was pretty much necessary to flip the House.  It is roughly in line with the popular vote advantage (about 7.3%), though that is due to a few things.  For one, there were 40 uncontested D seats and only 3 uncontested R seats.
There is another piece of good news.  Even with a more modest incumbency advantage (which may revert to historical norms), now that Dems hold the House, keeping the House in 2020 is a hell of a lot easier.  Which is especially good, as I'd like to put my efforts into flipping the White House and the Senate.  And one final, final note (and caution): what's true for the House need not be true for the Senate.  Dems won in West Virginia and Montana, in seats that Trump won by 42 and 20, respectively.  The same rules clearly don't apply in the Senate.