Predictions and Postdictions in the Philly Primaries!

My projections for the 2019 Philadelphia judicial elections, as the results came in. These are the top 6 candidates (out of 25 running). They're the ones who ultimately get to become judges.
Tuesday was Election Day here in Philadelphia, and as we're more or less a one-party town, the primary essentially is the election.  I had the pleasure of working with 3 great first-time candidates: Tiffany Palmer (who ran successfully for judge), Jen Devor (who lost in a tough race for election commissioner), and Eryn Santamoor (who missed by a couple of spots in a 30-way race for city council at large).  Following the election, I have some gripes about the excessive influence of the party machine, but now's not the time.

I did some quantitative modeling and analysis for their campaigns throughout, but on election night, I set up a war room and made real-time predictions of the final outcomes.  And as with the midterms, my models were remarkably stable and converged very early on.  They were also surprisingly simple:
  1. I looked at the total number of voters and relative historical turnout division by division (you may know divisions as "precincts").
  2. Looking at some historical candidates, I made templates for how different types of candidates could be expected to perform in each division.  For instance:


    where the templates are for historical progressive, African American, and White "establishment" candidates. I also allowed a possibility for uniform performance across the city. Different modelers might make different choices.
  3. Then, as the data came in, my code did 2 things.  First, it figured out which combination of these 4 templates modeled each candidate in the race best.  And second, it estimated overall numbers (like total turnout, vote share for each candidate, and so on).  I also included a few corrections for things like ward endorsements, but other than that, everything was on autopilot.
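
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code that ran on election night: the template numbers are made up, each template is assumed to already be expressed as expected votes per division, and the fit is an ordinary least-squares blend of templates (the post doesn't say which fitting method was actually used). The names `fit_template_weights` and `project_total` are hypothetical.

```python
from typing import Dict, List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination.

    Uses partial pivoting. Raises ZeroDivisionError if the system is
    singular, e.g. when fewer divisions have reported than there are
    templates, or when two templates coincide on the reported divisions.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_template_weights(templates: List[List[float]],
                         reported: Dict[int, float]) -> List[float]:
    """Least-squares weights w so that sum_k w[k] * templates[k][d]
    matches the reported votes in every division d that has come in.
    Solved through the normal equations (A^T A) w = A^T b.
    """
    k = len(templates)
    divs = list(reported)
    ata = [[sum(templates[i][d] * templates[j][d] for d in divs)
            for j in range(k)] for i in range(k)]
    atb = [sum(templates[i][d] * reported[d] for d in divs)
           for i in range(k)]
    return solve_linear(ata, atb)


def project_total(templates: List[List[float]], weights: List[float],
                  reported: Dict[int, float]) -> float:
    """Citywide projection: actual votes where a division has reported,
    the fitted template blend everywhere else."""
    total = 0.0
    for d in range(len(templates[0])):
        if d in reported:
            total += reported[d]
        else:
            total += sum(w * t[d] for w, t in zip(weights, templates))
    return total


# Toy data (assumed, six divisions): expected votes per division
# for each candidate "type".
progressive = [120.0, 80.0, 20.0, 10.0, 60.0, 90.0]
establishment = [10.0, 30.0, 100.0, 90.0, 40.0, 20.0]
uniform = [50.0] * 6
templates = [progressive, establishment, uniform]

# Four of the six divisions have reported so far.
reported = {0: 70.0, 2: 20.0, 3: 15.0, 4: 40.0}

weights = fit_template_weights(templates, reported)
projected = project_total(templates, weights, reported)
print([round(w, 3) for w in weights], round(projected, 1))
```

In this toy case the candidate is built as half "progressive" plus a fifth "uniform", so the fit recovers those weights from four divisions and fills in the other two. A real version would probably want non-negative weights and the endorsement corrections mentioned above, neither of which is modeled here.
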
It was very stable and very successful.  For instance, here are the projected vote shares for the top 5 candidates in the judge race as they came in:
By the time about 5% of the vote (79 divisions) had come in, the code had more or less nailed both the vote shares (generally to within half a percent) and the rank order of the candidates.  And while there were certainly some that were too close to call (see Crumlish and Jacquinto), by about 8:30 on election night, I felt pretty confident making the call for Tiffany Palmer.

I should note that we got even greater stability in the city council (top 5 get to serve):

and commissioner (top 2) races:

Part of the reason that this works is that voter behavior tends to be highly correlated.  For instance, very early on, my code identified Tiffany as a "progressive" candidate. Consider her final map:

and compare to the progressive map, above.

But it goes even deeper than that.  Here's a plot of her projected votes, division by division, when only 5% of the vote was in:
Yellow points are divisions where the data is already in, so the final vote should equal the estimated vote.  In some cases, data was only partially in, hence the occasional mismatch.  With the blue points, the y-value is the number of votes estimated by the model at that point, and the x-value is the actual final count.  The correlation coefficient between the two was 0.9, with a scatter around "perfect" of about 20 votes.
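
For anyone who wants to run the same check on their own projections, here is a minimal sketch of the two numbers quoted above. It assumes "scatter around perfect" means the root-mean-square difference between estimated and actual votes (the post doesn't define it), and the four data points are made up for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rms_scatter(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """RMS distance of the points from the 'perfect' line y = x, in votes."""
    n = len(actual)
    return sqrt(sum((e - a) ** 2 for a, e in zip(actual, estimated)) / n)


# Made-up per-division numbers: actual final votes (x) vs. model estimate (y).
actual = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]

r = pearson_r(actual, estimated)
scatter = rms_scatter(actual, estimated)
print(f"r = {r:.3f}, scatter = {scatter:.2f} votes")
```

Only the divisions that had not yet reported (the blue points) are informative here, since the reported ones sit on the line by construction.
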
There are some obvious advantages to this level of predictability.  It tells us how to allocate resources, for one. In some of the campaigns, it helped to figure out where to send mailers, or where to place poll volunteers. It also allows us to set expectations, identify targets for fundraising, or even make a pitch about electability.