About Aggregation, a plea for transparency

Yesterday, Silver’s basic model in Florida went haywire. Clinton had gained in the previous four polls, yet the model's probability of a Clinton win decreased. Poll Delta was positive for Clinton while Model Alpha (coefficient of victory) was negative. This means that other factors were outweighing the new polling information. These could legitimately include lagged weighting, although a world where positive polls for a candidate decrease that candidate's probability of winning should be quite rare. Combined with earlier misreads of the election and process, such as arguing that positive polls for Trump were a bad sign for his candidacy, this led people to question the model. My best guess is that this was some combination of lag-weighting and salting state polls with national data. A low Clinton delta combined with an artificial salt score and a lag weight for some older state polls could explain it.

By morning his models had reset to track more closely with actual polling data. Clinton appears to be well positioned to win the election. Sam Wang, a neuroscientist and poll aggregator, has been clear that this is not a particularly turbulent election. The polls have been remarkably consistent, as has his model.

Here are a few important takeaways:

A. Aggregated poll methods are an improvement over simplistic horse-race coverage, especially when the polls are matched to the actual unit of measurement, state votes.

B. Good modeling methods are not dramatic. The model won’t swing wildly based on a single new poll. If 100 polls show Timmy winning, one poll with Billy winning won’t switch it back to 50/50.

C. This process is out of phase with the news cycle. Once the news cycle starts driving model runs, the model likely will be pushed to the breaking point.
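The Timmy and Billy point in item B can be checked with simple arithmetic. The sketch below assumes a plain unweighted average and made-up margins; real aggregators weight polls in various ways, so treat this as an illustration of the principle only.

```python
def average_margin(margins):
    """Unweighted mean of poll margins (Timmy minus Billy, in points)."""
    return sum(margins) / len(margins)


# Hypothetical data: 100 polls, each with Timmy ahead by 4 points.
polls = [4.0] * 100
before = average_margin(polls)

# One new outlier poll with Billy ahead by 6 points.
after = average_margin(polls + [-6.0])
shift = after - before

print(f"{before:.2f} -> {after:.2f}")  # prints "4.00 -> 3.90"
```

A ten-point outlier moves the average by about a tenth of a point. A model that lurches further than that on one poll is being driven by something other than the poll.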

Poll aggregation, like many predictive methods, has instead become fodder for news on the day of the event. This is not optimal. The domain of near-future news is interesting, but should be restricted to the best methods. Sometimes there is no news. That is perfectly alright. The Presidential election algorithm is pretty straightforward. Basically, this election has not been particularly volatile, so why run the model after every poll?

Another Note about Policy Debate

One of the factors that keeps my peer group returning to Silver is his history as a high school debater. I debated in the same era, and back then a popular argument was called the politics disadvantage. You would argue that the work Clinton would have to do to pass your bill would preclude the passage of some other bill, like the African Growth and Opportunity Act, Debt Relief, KEDO, PNTR with China, or the CTBT. These are all important, weighty matters. To get to this point one needs to win a few specific arguments: uniqueness – X will pass now, link – your plan would require substantial capital to pass, internal link – decreased capital means no passage of the other issue, impact – X is really good.

Of course, one could argue defensively that the disadvantage is silly for any number of reasons. Offensively, one can turn the initial link (the plan increases capital/is popular), the impact (CTBT is bad), or the internal link by answering the theory of politics posed by the disadvantage. Some teams carried entire files devoted to responding exclusively on the internal link level. So if you argued political capital (or losers-lose), they could then read cards about Presidential losses causing policy wins (losers-win), or if the disadvantage was reversed they would read (winners-lose) or (winners-win) to argue that the act of spending capital makes more capital. Teams could plug and play the different internal link scenarios. A team with great internal link blocks could manage their research burden and exposure to risk by debating a known quantity.

Unlike an affirmative response to a politics disadvantage, there are no plug-and-play political theories that can make polls go backwards.
