Forecasting Models

This package implements four different election forecasting models, each with different approaches to handling polling data.

Model Overview

Model

Approach

Complexity

Performance

Poll Average

Weighted average of recent polls

Low

Baseline (yet second best)

Kalman Diffusion

Brownian motion with Kalman filter

Medium

Good

Improved Kalman

Kalman filter with drift and regularization

Medium

Better

Hierarchical Bayes

Bayesian ensemble with bias correction

Highest

Best

1. Poll Average Model

The simplest baseline model that computes a weighted average of recent polls.

Key Features:

  • Uses 14-day polling window

  • Weights polls by sample size

  • Empirical uncertainty estimation

  • Simple horizon adjustment for days until election (decrease uncertainty as we approach election)

File: src/models/poll_average.py

Use Case: Quick baseline forecast, minimal computational cost

2. Kalman Diffusion Model

Implements a Brownian motion model using Kalman filter with Rauch-Tung-Striebel (RTS) smoother.

Key Features:

  • Brownian motion with drift dynamics

  • Forward Kalman filter + backward RTS smoother

  • EM algorithm for parameter estimation

  • Handles irregular time spacing in polls

Mathematical Model:

\[ \begin{align}\begin{aligned}x_{t+1} = x_t + \mu \cdot dt + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2 \cdot dt)\\y_t = x_t + v_t, \quad v_t \sim N(0, R_t)\end{aligned}\end{align} \]

where \(\mu\) is drift, \(\sigma^2\) is diffusion variance, and \(R_t\) is observation noise.

File: src/models/kalman_diffusion.py

3. Improved Kalman Model

Enhanced version of Kalman Diffusion with stronger regularization and better uncertainty quantification.

Key Features:

  • Increased minimum diffusion variance

  • Regularized pollster bias estimates

  • Conservative probability clipping

  • Reduced forecast horizon uncertainty

Improvements over Basic Kalman:

  • Minimum diffusion variance: \(\sigma^2_{\min} = 0.003^2\)

  • Pollster bias regularization parameter: \(\lambda = 5.0\)

  • Probability clipping: [0.02, 0.98] instead of [0.01, 0.99]

File: src/models/improved_kalman.py

4. Hierarchical Bayes Model (Best)

The most sophisticated model combining multiple information sources with systematic bias correction.

Key Features:

  1. Fundamentals Prior: Weighted average of 2012 (70%) and 2008 (30%) election results

  2. Kalman-Filtered Polls: RTS smoother on recent polling data

  3. House Effects: Hierarchical shrinkage estimation of pollster biases

  4. Systematic Bias Correction: Adaptive correction for polling errors

  5. Proper Uncertainty Quantification: Combines multiple uncertainty sources

Model Components:

\[ \begin{align}\begin{aligned}\text{Forecast} = w_{\text{prior}} \cdot \mu_{\text{fund}} + w_{\text{polls}} \cdot \mu_{\text{polls}} - \text{bias}\\\text{Total Variance} = \sigma^2_{\text{combined}} + \sigma^2_{\text{evolution}} + \sigma^2_{\text{bias}}\end{aligned}\end{align} \]

House Effects Estimation:

\[h_p = \frac{n_p}{n_p + \lambda} \cdot \bar{r}_p\]

where \(h_p\) is house effect for pollster \(p\), \(n_p\) is number of polls, \(\lambda\) is shrinkage parameter, and \(\bar{r}_p\) is mean residual. Essentially, if a pollster tends to have very different results compared to other pollsters in a similar time period, shrink their effect.

File: src/models/hierarchical_bayes.py

Adding New Models

We have made all scripts scalable so that all one needs to do to add a new forecasting mode is:

  1. Create new file in src/models/

  2. Inherit from ElectionForecastModel base class

  3. Implement the fit_and_forecast() method, which should return:

    • win_probability: float in [0,1]

    • predicted_margin: float (Democratic margin)

    • margin_std: float (standard deviation)

Example:

from src.models.base_model import ElectionForecastModel

class NewModel(ElectionForecastModel):
    def __init__(self):
        super().__init__("new_model")

    def fit_and_forecast(self, state_polls, forecast_date,
                        election_date, actual_margin):
        # Your forecasting logic here
        return {
            "win_probability": p,
            "predicted_margin": m,
            "margin_std": s,
        }