A quick pitch for my house modeler

On election night, I'll be live-updating my House Model, my latest, greatest widget. With minimal assumptions (just high-quality House race polls), it makes a pretty robust prediction about which seats will flip, what the probability of each flip is, and what the distribution of the final outcome will be.

Here's the current snapshot:
As you can see, it's pretty optimistic: 227 seats for the Dems if you simply give them every seat with a >50% probability, and, on the whole, an expectation of about 233 seats, with an 89% certainty of winning the chamber. That's a fair amount higher than the estimates at places like 538, which are closer to 70%.
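For the curious, going from per-seat win probabilities to an expected seat count and an overall win probability is a short Monte Carlo exercise. Here's a minimal sketch, with made-up seat probabilities standing in for the model's real ones, and each race drawn independently (the real model handles correlated errors, discussed below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up per-seat Dem win probabilities standing in for the model's real ones
seat_probs = rng.uniform(0.02, 0.98, size=435)

n_sims = 20_000
# Each row is one simulated election night: draw every race independently
wins = rng.random((n_sims, seat_probs.size)) < seat_probs
dem_seats = wins.sum(axis=1)

print("likely seats (>50%):", (seat_probs > 0.5).sum())
print("expected seats:     ", round(dem_seats.mean(), 1))
print("P(majority):        ", (dem_seats >= 218).mean())
```

The "likely seats" count (every seat above 50%) and the expectation generally differ, just as 227 and 233 do above, because the expectation also collects fractional credit from all the long-shot seats.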

The interesting thing about this model is that I can feed in election data as it comes in, and it will automatically readjust itself. As a test, I fed in the 2016 House results and found that by the time I'd put in about 8 races, the model had already come to the sad conclusion that the Dems were toast.
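One simple way such live updating could work (my assumption; the post doesn't spell out the machinery) is a conjugate normal update on the national swing: each called race measures how far off the model's prediction was, and the swing estimate shifts toward the data while its uncertainty shrinks. A toy sketch with made-up race results:

```python
# Prior on the national swing (the Generic Ballot number), in points
mu, sigma = 10.5, 4.0        # prior mean and prior uncertainty
race_noise = 8.0             # per-district scatter around the model

# Hypothetical called races: (predicted Dem margin at swing=mu, actual margin)
results = [(-3.0, -9.0), (2.0, -4.0), (5.0, -1.0)]

for predicted, actual in results:
    # Each race is a noisy measurement of the true swing
    obs = mu + (actual - predicted)
    # Conjugate normal update: precision-weighted average of prior and observation
    post_prec = 1 / sigma**2 + 1 / race_noise**2
    mu = (mu / sigma**2 + obs / race_noise**2) / post_prec
    sigma = post_prec ** -0.5

print(f"Updated swing: {mu:.1f} +/- {sigma:.1f}")
```

With every race underperforming the prediction by about 6 points, the swing estimate drops and the error bar tightens, which is exactly the "8 races and the Dems were toast" behavior: a handful of consistent misses is enough to move the whole forecast.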

My model is much, much simpler than most. It simply looks at the historical presidential performance by district and fits just 3 parameters (current values included):
  • Incumbency Advantage: 4.2% (a little lower than the 7.5% in the last election, but not unrealistic)
  • "Trend": 0.3 (for each point a district moved redward relative to the national average in 2016, this model predicts it'll move another 0.3%, and likewise blueward).  ¯\_(ツ)_/¯
  • Swing: 10.5% (also known as the "Generic Congressional Ballot")
That last number may seem high, but only because the news tends to report a lot of low-quality pollsters. When you look at the high-quality pollsters only, and compare trends only within each pollster's own series (as I do in my tracker), you get something like this:

Tada! I get about 10.5%.  In other words, even without using national polls at all, we get a similar result.
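Putting the three parameters together, the per-district prediction is, roughly, the district's partisan lean, plus a trend extrapolation, plus an incumbency bump, plus the national swing. This is my reading of the setup, not the exact code; all the inputs below are illustrative:

```python
def predicted_margin(lean_2016, lean_2012, incumbent=0,
                     incumbency=4.2, trend=0.3, swing=10.5):
    """Sketch of the three-parameter prediction (all values in points).

    lean_2016, lean_2012: district presidential margin minus the national
        margin in each year (positive = bluer than the nation).
    incumbent: +1 for a Dem incumbent, -1 for a Republican, 0 for an open seat.
    """
    movement = lean_2016 - lean_2012        # how the district trended 2012 -> 2016
    baseline = lean_2016 + trend * movement # extrapolate 0.3 of that movement
    return baseline + incumbency * incumbent + swing

# e.g. a district 5 points redder than the nation that moved 4 more points
# redward, held by a GOP incumbent: roughly a toss-up under a 10.5-point swing
print(predicted_margin(lean_2016=-5.0, lean_2012=-1.0, incumbent=-1))
```

The nice thing about a formulation like this is that it needs no district polls at all to produce a margin for all 435 seats; the polls only enter when fitting the three parameters.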

Now, I know what you're thinking.  "The polls were wrong in 2016."  Well, they were, but at the state level, the average error was only about 3 points. Yes, it made a huge difference, but the error was still pretty small.  My default settings assume that the polls might be off – all of them – by 4 points or more.  That number then shrinks as election night data rolls in.  In other words, the errors would need to be much larger than in 2016 for my model to be terribly surprised by the outcome.

But there's another test we can do now.  This is the data from all of the "high quality" pollsters (B- or better, according to 538), compared to their expected neutral margins (that is, the average of the presidential margins in their districts in 2012 and 2016, with the national averages subtracted off).
Dems are overperforming (they're above the line), and by a lot. But now, let's model. By default, I only use public polls to get the numbers above; internal polls (polls commissioned by a campaign) carry too much risk of bias (you'll only release polls where you look good).

But we can compare both the public polls (used to make the model) and internal polls (which are totally independent) to the poll values that the model predicts, and they do a very good job:

The two "x"s are identified by the code as outliers (WV-03 and NM-01, if you must know) and are discarded; the others are used to make the fit. Even the red points, the internal polls, are fit really well. The result is unbiased, with a scatter of about 8 points. That isn't too surprising: even the best model won't be perfect for any given district, and the polls themselves have sampling errors of about 5 points. The code takes this scatter into account.
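One standard way to flag outliers like those two "x"s (my assumption; the post doesn't say which method the code uses) is to compute poll-minus-model residuals and clip anything far outside a robust scale estimate, so that the outliers can't inflate the very scatter used to reject them. A toy version with fake data:

```python
import numpy as np

rng = np.random.default_rng(0)
predicted = rng.normal(0.0, 10.0, size=40)             # model's predicted poll margins
observed = predicted + rng.normal(0.0, 8.0, size=40)   # hypothetical poll results
observed[[3, 17]] += 60.0                              # plant two gross outliers

resid = observed - predicted
# Robust scale estimate (median absolute deviation), immune to the outliers
center = np.median(resid)
mad = np.median(np.abs(resid - center))
sigma = 1.4826 * mad            # MAD -> stddev equivalent for a normal distribution
keep = np.abs(resid - center) < 3 * sigma

print("flagged as outliers:", np.where(~keep)[0])
print("scatter of kept points: %.1f points" % resid[keep].std())
```

Using the median-based scale rather than the raw standard deviation is what makes this safe: two wild polls barely move the threshold, so they get flagged while the honest ~8-point scatter survives to feed the fit.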

Time will tell if I'm too optimistic, but at the very least, this produces self-consistent results from simple assumptions, which is a good start.