Ad verba per numeros

Tuesday, December 17, 2013, 11:19 PM
If you have reached this post, you are probably interested in the feasibility of predicting elections from social media data. If that's the case, please let me clarify what I mean by predicting an election:

"To publicly announce the actual results of the election before the election takes place. Those results should be the actual vote share (i.e. percentage) received by each party standing in the election."

I feel urged to clarify that because I've been told by a few people that I'm confusing "predicting" with "forecasting", and that talking about "predictions" in social science is not actually about making forecasts.

I can, however, accept I'm wrong in that regard; after all, Campbell (2004) and Lewis-Beck (2005) used "forecast" instead of "prediction". Nevertheless, since in common English "predict" and "forecast" are used almost interchangeably, I cannot accept "prediction" in the sense of post-facto explanation; I accept it only as a statement about a future outcome.

That said, we have to establish a way to determine whether the forecast/prediction was accurate. Forecasting/predicting just the winner is tempting, but it won't do. If you won't take my word for it, please take Campbell's (2004). Hence, you must compare your forecasts for vote shares against the actual results. To that end, MAE (Mean Absolute Error) is commonly used, although you can try a number of different measures (see Lewis-Beck 2005 for alternatives). However, MAE is just a number and, on its own, it cannot tell you whether it is small (which is good) or large (which is bad). You have to compare your MAE against other MAEs to tell whether your forecasting/predictive method is good enough.
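To make the MAE idea concrete, here is a minimal sketch in Python. The party labels and all the vote shares are made up for illustration, and the function name `mean_absolute_error` is my own choice, not something taken from the cited papers.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap (in percentage points) between predicted
    and actual vote shares, taken over all parties."""
    if predicted.keys() != actual.keys():
        raise ValueError("predicted and actual must cover the same parties")
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


# Hypothetical vote shares (percent); none of these are real results.
actual = {"A": 48.0, "B": 45.0, "C": 7.0}
method_1 = {"A": 50.0, "B": 44.0, "C": 6.0}
method_2 = {"A": 44.0, "B": 50.0, "C": 6.0}

mae_1 = mean_absolute_error(method_1, actual)  # (2 + 1 + 1) / 3
mae_2 = mean_absolute_error(method_2, actual)  # (4 + 5 + 1) / 3

# Neither number means much alone; the comparison is what matters.
print(f"method 1: {mae_1:.2f}  method 2: {mae_2:.2f}")
```

In this toy case method 1 has the smaller MAE, so it would be the better of the two, which still says nothing about whether either is good in absolute terms.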

Certainly, you can try to compare against traditional polls, but it could be unfair to compare a method that predicts/forecasts the future with a method that asks people about their future behavior.

One option is to compare your method against traditional electoral forecasting methods, but they can be quite cumbersome. [Oh, you didn't know? Go figure... Political scientists have been trying to forecast/predict elections since well before Twitter and Facebook, in fact since the early 1990s; cf. Lewis-Beck & Rice (1992) and Campbell & Garand (2000).]

Another option is choosing a reasonable baseline. Here I must clarify what a baseline is. A baseline is not a straw-man method so simplistic that it's almost impossible not to beat it. Nope. A baseline is a reasonably simple (but not simplistic) method to produce acceptable results for the problem at hand if no other method is available.

What could be a reasonable baseline for predicting/forecasting elections? Assuming random results? Obviously not. The most reasonable baseline is to assume the future will repeat the past: that is, that in every election in every place the results will be exactly the same as in the prior election.
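The "future repeats the past" baseline is simple enough to sketch in a few lines of Python. Everything here is hypothetical: the race names, the party labels, the vote shares and the helper names `persistence_baseline` and `mae_over_races` are assumptions made for illustration only.

```python
def persistence_baseline(previous_results):
    """Forecast every race by copying the prior election's vote shares."""
    return {race: dict(shares) for race, shares in previous_results.items()}


def mae_over_races(predicted, actual):
    """MAE over every (race, party) pair found in the actual results.
    A party absent from the forecast counts as a predicted share of 0."""
    errors = [
        abs(predicted[race].get(party, 0.0) - share)
        for race, shares in actual.items()
        for party, share in shares.items()
    ]
    return sum(errors) / len(errors)


# Hypothetical prior and new results (percent); not real data.
previous = {
    "State X": {"Dem": 52.0, "Rep": 46.0},
    "State Y": {"Dem": 40.0, "Rep": 58.0},
}
actual = {
    "State X": {"Dem": 49.0, "Rep": 50.0},
    "State Y": {"Dem": 43.0, "Rep": 55.0},
}

baseline = persistence_baseline(previous)
baseline_mae = mae_over_races(baseline, actual)  # (3 + 4 + 3 + 3) / 4
print(f"baseline MAE: {baseline_mae:.2f}")
```

Any method based on social media data would then have to produce a lower MAE than `baseline_mae` on the same races to claim it beats the baseline.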

And here we are. I guess that some teams worldwide are preparing for the many elections that are going to take place in different countries. Since I cannot prepare baselines for all of them, I'm guessing (not predicting or forecasting) that the US elections are going to attract some interest. Therefore, I've collected (from Wikipedia) the prior results for gubernatorial, Senate and House of Representatives elections, and they are my prediction/forecast for 2014.

You can access them here.

The gubernatorial and Senate races are rather simple. For each race I provide the winner (party), the winner's vote share and, as a summary, the number of races won by each party.

The House of Representatives is trickier since most of the states are divided into many districts. Therefore, I've provided the "winner" in each state, the vote share in each state and the number of seats for each party in each state. Please note that forecasting seats is not a good choice and, hence, vote share should be predicted for each district. However, to be honest, it is too much effort for me at this point to prepare all of this one year before the elections and almost two years before papers predicting/forecasting those elections are published ;)

What's the plan ahead?

November 5, 2014: I'll fill in the actual results in the spreadsheet and compute the MAE for each kind of election. I'll update the post with comments regarding any forecast/prediction made around that date using social media data.

Feel free to comment on the data or the post itself. You can find me on Twitter: @PFCdgayo.


  1. Campbell, James E. "Introduction — The 2004 presidential election forecasts." Political Science and Politics 37.04 (2004): 733-735.
  2. Campbell, James E., and James C. Garand. "Forecasting US National Elections." Before the Vote: Forecasting American National Elections (2000): 3-16.
  3. Lewis-Beck, Michael S. "Election forecasting: principles and practice." The British Journal of Politics & International Relations 7.2 (2005): 145-164.
  4. Lewis-Beck, Michael S., and Tom W. Rice. Forecasting elections. CQ Press, 1992.
