Hi Folks, just a quickie on how polls could be tallied with significantly reduced noise.
I'd be very interested in hearing your thoughts (especially if your thoughts consist of: this is already done, dummy, and it's called X).
Oh, and apologies to those on mobile devices for which the LaTeX parsing seems not to be working.
Here goes:
I've been thinking that an effective way to poll people who've participated in past elections is to ask them who they voted for (or if they voted) in the last election and then to ask who or which party they'll vote for now.
Say the last race was a nearly 50-50 split. If $f_{AB} \ll 1$ is the fraction of A voters switching parties ($A\rightarrow B$) and $f_{BA} \ll 1$ is the fraction going from $B\rightarrow A$, then the vote share for party A will be:
$$p_A=0.5\times (1-f_{AB}+f_{BA})$$
But the uncertainty* in this will be:
$$\sigma_A=\sqrt{\sigma_{AB}^2+\sigma_{BA}^2}$$
where
$$\sigma_{AB}^2\simeq \frac{0.5\times f_{AB}(1-f_{AB})}{N}$$
So, taking $1-f_{AB}\simeq 1$ we get:
$$\sigma_A\simeq\sqrt{\frac{f_{AB}+f_{BA}}{2N}}\simeq 0.71\times\frac{\sqrt{f_{AB}+f_{BA}}}{\sqrt{N}}$$
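To make the formula concrete, here's a minimal Monte Carlo check (just a sketch in NumPy; the function names and the assumption that the sample splits exactly 50-50 by past vote are mine): simulate many polls, form the estimate of $p_A$ from the observed switch fractions, and compare its scatter with the analytic $\sigma_A$.

```python
import numpy as np

rng = np.random.default_rng(0)

def poll_sigma(f_ab, f_ba, n):
    """Analytic standard error of p_A for the switch-based estimator,
    assuming a known 50-50 previous result and n total respondents."""
    return np.sqrt(0.5 * (f_ab * (1 - f_ab) + f_ba * (1 - f_ba)) / n)

def simulated_sigma(f_ab, f_ba, n, trials=200_000):
    """Monte Carlo: half the sample are past-A voters, half past-B voters;
    each switches with its party's probability, and we form
    p_A_hat = 0.5 * (1 - f_ab_hat + f_ba_hat)."""
    n_a, n_b = n // 2, n - n // 2
    f_ab_hat = rng.binomial(n_a, f_ab, size=trials) / n_a
    f_ba_hat = rng.binomial(n_b, f_ba, size=trials) / n_b
    return (0.5 * (1 - f_ab_hat + f_ba_hat)).std()

f_ab, f_ba, n = 0.04, 0.0, 1000
print(poll_sigma(f_ab, f_ba, n))       # ~0.0044
print(simulated_sigma(f_ab, f_ba, n))  # agrees closely
```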
This $\sigma_A$ is to be compared with the normal error bars for simply asking "who will you vote for?", which yields
$$\sigma_{A,traditional}=\frac{0.5}{\sqrt{N}}$$
So consider a district which went exactly 50-50 in the 2016 election, but in which approximately 4% of Trump voters now feel regrets and would switch to voting Dem (and no Clinton voters switch). This means that a perfect poll would produce a 52-48 result in favor of the Dem. The error bars are reduced by approximately a factor of:
$$\frac{\sigma_A}{\sigma_{A,traditional}}\simeq \sqrt{2(f_{AB}+f_{BA})}=\sqrt{0.08}\simeq 0.28$$
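A few lines of Python reproduce these numbers (the variable names are mine):

```python
import numpy as np

f_ab, f_ba = 0.04, 0.0   # 4% of past Trump voters switch; no Clinton voters do

# What a perfect poll would find
p_trump = 0.5 * (1 - f_ab + f_ba)
print(f"Trump {p_trump:.0%}, Dem {1 - p_trump:.0%}")   # Trump 48%, Dem 52%

# Error-bar ratio relative to the traditional "who will you vote for?" poll
print(np.sqrt(2 * (f_ab + f_ba)))                      # ~0.28
```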
Put another way, you'd get roughly the same error bars by interviewing 80 people under the new approach as you would by interviewing 1000 people under the old.
Or consider a 400-person survey. Under the traditional approach, your formal uncertainty would be $\sigma_A=2.5\%$. Under the new approach, you'd expect (with 4% of Republican "switchers") about 8 respondents to tell you that they will switch. Those are the only ones you're looking for. In this case, you'd get a formal error of only about $0.7\%$.
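The same arithmetic as a quick sketch (again with my own variable names):

```python
import numpy as np

f_ab, f_ba = 0.04, 0.0

# Interviews needed to match a 1000-person traditional poll's error bars
print(2 * (f_ab + f_ba) * 1000)            # ~80 people

# A 400-person survey
n = 400
print(0.5 / np.sqrt(n))                    # traditional: 2.5%
print(np.sqrt((f_ab + f_ba) / (2 * n)))    # switch-based: ~0.7%
print(0.5 * n * f_ab)                      # ~8 expected switchers
```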
You'd get a similar result for the third option (non-voters) by asking, for both the previous and the next election, whether they voted or intend to vote.
Now, before you jump in with every conceivable objection, I realize that a major issue is that people may simply lie about who they voted for (because of virtue signalling, or to skew the results). This, indeed, was the fatal flaw of the notorious LA Times poll from the last election (which used a panel, missed by 5 points, and predicted a significant popular-vote victory for Trump). For instance, if $L_A$ is the probability of falsely saying that one previously voted for $A$, and $L_B$ likewise for $B$, then $L_A -L_B$ would produce the same effect on the calculation as $f_{AB}-f_{BA}$. It's possible that you'd need to correct for that using some sort of Bayesian prior, but at the moment, I don't have deep thoughts about how that would be done.
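To make the concern concrete, here's a toy simulation under one entirely assumed model of lying (my construction, not anything drawn from the LA Times methodology): nobody actually switches, but a true B voter falsely claims a past A vote with probability $L_A$, and a true A voter falsely claims a past B vote with probability $L_B$. Under that model, the lying rates feed straight into the apparent switch rates:

```python
import numpy as np

rng = np.random.default_rng(1)

def misreport_poll(n, L_A, L_B, trials=50_000):
    """Toy model: no real switching, but some respondents misreport their
    past vote. Returns the mean switch-based estimate of p_A."""
    n_a, n_b = n // 2, n - n // 2
    lie_a = rng.binomial(n_b, L_A, size=trials)  # B voters claiming a past A vote
    lie_b = rng.binomial(n_a, L_B, size=trials)  # A voters claiming a past B vote
    # Apparent switch fractions within the *reported* past-vote groups
    f_ab_hat = lie_a / (n_a - lie_b + lie_a)     # "voted A, now voting B"
    f_ba_hat = lie_b / (n_b - lie_a + lie_b)     # "voted B, now voting A"
    return (0.5 * (1 - f_ab_hat + f_ba_hat)).mean()

# The true race is still 50-50, but differential lying skews the estimate
print(misreport_poll(1000, L_A=0.04, L_B=0.00))  # ~0.48, as if f_AB were ~4%
```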
But there are advantages to this approach beyond reducing the formal error bars. Since election polling is essentially looking for changes in behavior at or near the margins, this approach focuses directly on those changes. What's more, it's less sensitive to sampling the parties in the wrong proportions. Suppose you inadvertently poll too many Dems, for instance. Traditional polling would overestimate the Dem result in the next election, but this approach won't, since the switch rates are measured within each past-vote group.
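Here's a rough simulation of that last point (the numbers and the sampling model are my own assumptions): the district truly went 50-50 and 4% of past Trump voters are switching, but the sample is skewed to 60% past Dem voters.

```python
import numpy as np

rng = np.random.default_rng(2)

def one_poll(n, dem_frac, f_rep_to_dem=0.04, f_dem_to_rep=0.0):
    """One poll of n people, with a fraction dem_frac of past Dem voters."""
    n_dem = rng.binomial(n, dem_frac)
    n_rep = n - n_dem
    dem_switch = rng.binomial(n_dem, f_dem_to_rep)
    rep_switch = rng.binomial(n_rep, f_rep_to_dem)

    # Traditional estimate: fraction of the sample intending to vote Dem
    traditional = ((n_dem - dem_switch) + rep_switch) / n

    # Switch-based estimate: anchored to the known 50-50 previous result
    switch_based = 0.5 * (1 - dem_switch / n_dem + rep_switch / n_rep)
    return traditional, switch_based

results = np.array([one_poll(1000, dem_frac=0.60) for _ in range(20_000)])
print("true Dem share:      0.520")
print("traditional mean:    %.3f" % results[:, 0].mean())   # ~0.62, badly biased
print("switch-based mean:   %.3f" % results[:, 1].mean())   # ~0.52
```

With the skewed sample, the traditional estimate inherits the skew almost one-for-one, while the switch-based estimate only needs each party's switch rate to be measured from whoever happens to be in the sample.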
* Most reporting gives the "margin of error" (MOE), which is $2\sigma$, corresponding roughly to a 95% confidence range.