API Reference

Models

Base Model

Base class for election forecasting models

class src.models.base_model.ElectionForecastModel(name: str, seed: int | None = None)

Bases: ABC

Abstract base class for election forecasting models

__init__(name: str, seed: int | None = None) -> None

Initialize the model

Parameters:
  • name – Model name

  • seed – Random seed for reproducibility (default: None for non-deterministic)

abstractmethod fit_and_forecast(state_polls: DataFrame, forecast_date: Timestamp, election_date: Timestamp, actual_margin: float, rng: Generator | None = None) -> Dict[str, float]

Fit model on polls up to forecast_date and predict election outcome.

Parameters:
  • state_polls (pd.DataFrame) – DataFrame with columns [middate, dem_proportion, margin, samplesize, pollster]

  • forecast_date (pd.Timestamp) – Date to make forecast from

  • election_date (pd.Timestamp) – Election day

  • actual_margin (float) – Actual two-party margin (for evaluation)

  • rng (np.random.Generator, optional) – NumPy random generator for reproducibility (default: None)

Returns:

Dictionary with keys: win_probability, predicted_margin, margin_std

Return type:

dict

load_data() -> Tuple[DataFrame, Dict[str, float]]

Load polling and election results data

Returns:

tuple of (polls DataFrame, actual_margin dict)

run_forecast(forecast_dates: List[Timestamp] | None = None, min_polls: int = 10, verbose: bool = False, n_workers: int | None = None) -> DataFrame

Run forecast across multiple dates and states

Parameters:
  • forecast_dates – List of pd.Timestamp dates to forecast from (default: 4 dates in Oct-Nov 2016)

  • min_polls – Minimum number of polls required to forecast a state

  • verbose – If True, print processing status for each state

  • n_workers – Number of parallel workers (default: None for sequential, >1 for parallel)

Returns:

DataFrame with columns: state, forecast_date, win_probability, predicted_margin, margin_std, actual_margin

save_results() -> DataFrame

Save predictions and metrics to CSV and text files

Creates predictions/{model_name}.csv and metrics/{model_name}.txt

Returns:

DataFrame with columns: forecast_date, n_states, brier_score, log_loss, mae_margin

plot_state(state: str) -> None

Create time-series plot for a specific state showing model predictions over time

Parameters:

state – Two-letter state code (e.g., 'FL', 'PA')

Saves:

PNG file to plots/{model_name}/{state}.png

Poll Average Model

Simple Poll-of-Polls Average Model

Weighted average of recent polls with empirical uncertainty

class src.models.poll_average.PollAverageModel(seed=None)

Bases: ElectionForecastModel

Simple weighted poll average baseline

__init__(seed=None)

Initialize the model

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic)

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)

Compute weighted poll average with empirical uncertainty
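The entry above does not spell out the weighting scheme or the uncertainty estimate. As a rough illustration only, the sketch below weights polls by sample size, uses the weighted spread of the polls as the empirical uncertainty, and converts the result to a win probability with a normal approximation. All three choices, and the helper names weighted_poll_average and win_probability, are assumptions made for this example rather than the library's implementation.

```python
import math
from statistics import NormalDist


def weighted_poll_average(margins, sample_sizes):
    """Return (mean, std) of poll margins, weighted by sample size.

    Sample-size weighting is an assumption made for this sketch.
    """
    total = sum(sample_sizes)
    mean = sum(m * n for m, n in zip(margins, sample_sizes)) / total
    var = sum(n * (m - mean) ** 2 for m, n in zip(margins, sample_sizes)) / total
    return mean, math.sqrt(var)


def win_probability(mean, std, floor=1e-6):
    """P(margin > 0) under a normal approximation (also an assumption)."""
    return 1.0 - NormalDist(mean, max(std, floor)).cdf(0.0)


# Made-up poll margins and sample sizes, for illustration only.
margins = [0.02, 0.04, -0.01]
sizes = [800, 1200, 500]
mean, std = weighted_poll_average(margins, sizes)

# Same keys as the dictionary documented for fit_and_forecast.
forecast = {
    "win_probability": win_probability(mean, std),
    "predicted_margin": mean,
    "margin_std": std,
}
```

The floor on the standard deviation keeps the normal approximation defined when all polls agree exactly.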

Kalman Diffusion Model

Kalman Filter Diffusion Model with Improved Regularization

Brownian motion with drift + pollster biases + fundamentals prior

class src.models.kalman_diffusion.KalmanDiffusionModel(seed=None)

Bases: ElectionForecastModel

Improved diffusion model with Kalman filter/RTS smoother

__init__(seed=None)

Initialize the model

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic)

kalman_filter_smoother(dates, observations, obs_variance, mu, sigma2)

Kalman filter + RTS smoother for Brownian motion with drift

Parameters:
  • dates – Array of time points (in days)

  • observations – Array of poll margins

  • obs_variance – Array of observation variances

  • mu – Drift parameter

  • sigma2 – Diffusion variance

Returns:

smoothed state estimates and variances

Return type:

tuple of (x_smooth, P_smooth)
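To make the parameters above concrete, here is a minimal scalar sketch of a Kalman filter followed by an RTS backward pass for the state model x[t] = x[t-1] + mu*dt + noise with variance sigma2*dt. It is not the library's code: the initial prior (x0, P0) is a hypothetical addition, and the sketch assumes dates are sorted in ascending order.

```python
import numpy as np


def kalman_filter_smoother(dates, observations, obs_variance, mu, sigma2,
                           x0=0.0, P0=1.0):
    """Scalar Kalman filter + RTS smoother for Brownian motion with drift.

    x0 and P0 (prior mean and variance) are assumptions of this sketch.
    """
    dates = np.asarray(dates, dtype=float)
    y = np.asarray(observations, dtype=float)
    R = np.asarray(obs_variance, dtype=float)
    n = len(y)
    x_pred, P_pred = np.empty(n), np.empty(n)
    x_filt, P_filt = np.empty(n), np.empty(n)

    # Forward pass: predict with drift, then update with each poll.
    for t in range(n):
        if t == 0:
            x_pred[t], P_pred[t] = x0, P0
        else:
            dt = dates[t] - dates[t - 1]
            x_pred[t] = x_filt[t - 1] + mu * dt
            P_pred[t] = P_filt[t - 1] + sigma2 * dt
        K = P_pred[t] / (P_pred[t] + R[t])  # Kalman gain
        x_filt[t] = x_pred[t] + K * (y[t] - x_pred[t])
        P_filt[t] = (1.0 - K) * P_pred[t]

    # Backward pass: RTS smoother (state transition coefficient is 1).
    x_smooth, P_smooth = x_filt.copy(), P_filt.copy()
    for t in range(n - 2, -1, -1):
        C = P_filt[t] / P_pred[t + 1]
        x_smooth[t] = x_filt[t] + C * (x_smooth[t + 1] - x_pred[t + 1])
        P_smooth[t] = P_filt[t] + C ** 2 * (P_smooth[t + 1] - P_pred[t + 1])
    return x_smooth, P_smooth


# Made-up inputs: four polls on consecutive days.
x_s, P_s = kalman_filter_smoother(
    dates=[0, 1, 2, 3],
    observations=[0.02, 0.03, 0.01, 0.04],
    obs_variance=[0.001] * 4,
    mu=0.0,
    sigma2=1e-4,
)
```

The last smoothed estimate equals the last filtered one, since no later observations exist to refine it.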

fit_state_diffusion(state_polls, prior_mean=0.0, max_iter=10)

Fit diffusion model with EM algorithm

Parameters:
  • state_polls – DataFrame of polls for a single state

  • prior_mean – Prior mean for fundamentals

  • max_iter – Maximum number of EM iterations

Returns:

tuple of (mu, sigma2, pollster_bias, x_smooth, P_smooth, dates)

simulate_forward(x_start, P_start, mu, sigma2, days, N=2000, rng=None)

Simulate forward with Euler-Maruyama method

Parameters:
  • x_start – Initial state estimate

  • P_start – Initial state variance

  • mu – Drift parameter

  • sigma2 – Diffusion variance

  • days – Number of days to simulate forward

  • N – Number of simulation samples

  • rng – NumPy random generator (default: None uses default_rng)

Returns:

Array of final margin values (length N)
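A forward simulation with these parameters might look like the sketch below. It draws N starting points from a normal distribution with mean x_start and variance P_start, then takes one Euler-Maruyama step per day. The one-day step size and the way the initial uncertainty is sampled are assumptions of this example, not details confirmed by the reference.

```python
import numpy as np


def simulate_forward(x_start, P_start, mu, sigma2, days, N=2000, rng=None):
    """Simulate N Brownian-motion-with-drift paths and return final values."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample initial states so the state uncertainty carries into the forecast
    # (an assumption of this sketch).
    x = rng.normal(x_start, np.sqrt(P_start), size=N)
    dt = 1.0  # one step per day (assumed)
    for _ in range(int(days)):
        # Euler-Maruyama step: drift term plus scaled Gaussian increment.
        x = x + mu * dt + np.sqrt(sigma2 * dt) * rng.standard_normal(N)
    return x


# Made-up parameters; a seeded generator makes the run reproducible.
finals = simulate_forward(0.02, 1e-4, 0.0, 1e-5, days=30, N=5000,
                          rng=np.random.default_rng(0))
win_prob = float(np.mean(finals > 0))
```

With constant mu and sigma2 the final distribution is also available in closed form, so a simulation like this matters mainly when the dynamics are later made more complex.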

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)

Fit Kalman diffusion and forecast election outcome

Improved Kalman Model

Improved Kalman Diffusion Model

Key improvements over basic Kalman:
  • Increased minimum diffusion variance

  • Better regularized pollster biases

  • Smaller forecast horizon uncertainty

  • More conservative probability clipping

class src.models.improved_kalman.ImprovedKalmanModel(seed=None)

Bases: ElectionForecastModel

Improved Kalman filter diffusion model

__init__(seed=None)

Initialize the model

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic)

kalman_filter_rts(dates, observations, obs_variance, mu, sigma2)

Kalman filter + RTS smoother

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)

Fit improved Kalman diffusion and forecast

simulate_forward(x_start, P_start, mu, sigma2, days, N=2000, rng=None)

Simulate forward with Euler-Maruyama

Parameters:
  • x_start – Initial state estimate

  • P_start – Initial state variance

  • mu – Drift parameter

  • sigma2 – Diffusion variance

  • days – Number of days to simulate forward

  • N – Number of simulation samples

  • rng – NumPy random generator (default: None uses default_rng)

Returns:

Array of final margin values (length N)

Hierarchical Bayes Model

Hierarchical Bayesian Ensemble with Systematic Bias Adjustment (HBE-SBA)

Combines:
  1. Fundamentals prior from historical results

  2. Kalman-filtered polls with house effects

  3. Adaptive systematic bias correction

  4. Proper uncertainty quantification

class src.models.hierarchical_bayes.HierarchicalBayesModel(seed=None)

Bases: ElectionForecastModel

Hierarchical Bayesian ensemble with bias correction

__init__(seed=None)

Initialize the model

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic)

estimate_house_effects(all_polls, lambda_shrink=10)

Estimate pollster house effects with hierarchical shrinkage

Parameters:
  • all_polls – DataFrame of all polling data

  • lambda_shrink – Shrinkage parameter (higher = more shrinkage to zero)

Returns:

dict mapping pollster name to estimated house effect
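One common way to get shrinkage of this kind is to divide each pollster's summed residual by (n + lambda_shrink), so pollsters with few polls are pulled strongly toward zero. The sketch below illustrates that idea on precomputed residuals (poll margin minus some consensus estimate). The helper name shrunken_house_effects, its input format, and the estimator itself are assumptions for illustration; the reference does not say which formula estimate_house_effects uses.

```python
from collections import defaultdict


def shrunken_house_effects(residuals, lambda_shrink=10):
    """Map pollster -> shrunken mean residual.

    residuals: iterable of (pollster, residual) pairs (hypothetical format).
    Higher lambda_shrink pulls every estimate closer to zero.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pollster, resid in residuals:
        sums[pollster] += resid
        counts[pollster] += 1
    return {p: sums[p] / (counts[p] + lambda_shrink) for p in sums}


# Made-up residuals: pollster A has ten polls, pollster B only one.
residuals = [("A", 0.02)] * 10 + [("B", 0.02)]
effects = shrunken_house_effects(residuals, lambda_shrink=10)
```

Both pollsters show the same raw lean of 0.02, but A's estimate is halved to about 0.01 while B's shrinks to roughly 0.0018, reflecting how little evidence a single poll provides.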

kalman_filter_rts(dates, observations, obs_variance, mu, sigma2)

Kalman filter with Rauch-Tung-Striebel (RTS) backward smoother

Parameters:
  • dates – Array of time points (in days)

  • observations – Array of poll margins

  • obs_variance – Array of observation variances

  • mu – Drift parameter

  • sigma2 – Diffusion variance

Returns:

smoothed state estimates and variances

Return type:

tuple of (x_smooth, P_smooth)

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)

Hierarchical Bayesian forecast with bias correction

Utilities

Data Utilities

Shared data loading and preprocessing utilities

src.utils.data_utils.load_polling_data() -> DataFrame

Load and preprocess 2016 polling data from FiveThirtyEight

Returns:

DataFrame with columns: middate, dem, rep, margin, dem_proportion, samplesize, pollster, state_code

src.utils.data_utils.load_election_results() -> Dict[str, float]

Load actual 2016 election results from MIT Election Lab

Returns:

dict mapping state code to actual Democratic margin

src.utils.data_utils.load_fundamentals() -> Dict[str, Dict[str, float]]

Load historical election results for fundamentals prior

Computes weighted average of 2012 (70%) and 2008 (30%) results

Returns:

dict mapping state code to fundamentals dict with keys: margin, margin_2012, margin_2008
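The 70/30 weighting can be written out for a single state as below. The helper name fundamentals_prior and the input margins are made up for illustration; only the weights and the three dictionary keys come from the description above.

```python
def fundamentals_prior(margin_2012, margin_2008, w_2012=0.7):
    """Blend two past margins into a prior: 70% 2012, 30% 2008 by default."""
    margin = w_2012 * margin_2012 + (1.0 - w_2012) * margin_2008
    return {
        "margin": margin,
        "margin_2012": margin_2012,
        "margin_2008": margin_2008,
    }


# Illustrative inputs, not real election results.
prior = fundamentals_prior(margin_2012=0.01, margin_2008=0.03)
# 0.7 * 0.01 + 0.3 * 0.03 = 0.016
```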

src.utils.data_utils.get_state_list(polls: DataFrame, actual_results: Dict[str, float]) -> List[str]

Get list of states with sufficient polling data

Parameters:
  • polls – DataFrame of polling data

  • actual_results – dict of actual election results

Returns:

list of state codes

src.utils.data_utils.compute_metrics(predictions_df: DataFrame) -> DataFrame

Compute evaluation metrics from predictions

Parameters:

predictions_df – DataFrame with columns: forecast_date, win_probability, predicted_margin, actual_margin

Returns:

DataFrame with columns: forecast_date, n_states, brier_score, log_loss, mae_margin
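For readers unfamiliar with these metrics, the sketch below computes them for the states of a single forecast date using their standard definitions. It assumes that win_probability refers to the same side whose margin is positive in actual_margin, and it clips probabilities before taking logs. The function name score_forecasts and the clipping constant are hypothetical, and the library may differ in details.

```python
import math


def score_forecasts(win_probs, predicted_margins, actual_margins, eps=1e-6):
    """Brier score, log loss, and margin MAE for one forecast date."""
    n = len(win_probs)
    # Outcome is 1 when the actual margin is positive (assumed convention).
    outcomes = [1.0 if m > 0 else 0.0 for m in actual_margins]
    # Clip probabilities so log(0) cannot occur.
    clipped = [min(max(p, eps), 1.0 - eps) for p in win_probs]
    brier = sum((p - y) ** 2 for p, y in zip(win_probs, outcomes)) / n
    log_loss = -sum(
        y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
        for p, y in zip(clipped, outcomes)
    ) / n
    mae = sum(abs(a - b) for a, b in zip(predicted_margins, actual_margins)) / n
    return {"n_states": n, "brier_score": brier,
            "log_loss": log_loss, "mae_margin": mae}


# Two made-up states: one predicted win that happened, one loss that happened.
scores = score_forecasts(
    win_probs=[0.8, 0.3],
    predicted_margins=[0.02, -0.01],
    actual_margins=[0.01, -0.03],
)
```

Lower is better for all three metrics; the Brier score and log loss judge the probabilities, while mae_margin judges the point forecast.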

Scripts

Run All Models

src.scripts.run_all_models.discover_models()

Auto-discover all model classes using importlib.resources

Returns:

List of tuples (model_class_name, model_class) sorted by name

src.scripts.run_all_models.generate_forecast_dates(n_dates, election_date='2016-11-08', start_date='2016-09-01')

Generate n evenly-spaced forecast dates between start_date and election_date

Parameters:
  • n_dates – Number of forecast dates to generate

  • election_date – Election day

  • start_date – Earliest date to start forecasting from

Returns:

List of pd.Timestamp forecast dates
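Evenly spaced dates like these can be produced with pandas.date_range, as in the sketch below. Whether the real function includes both endpoints (in particular election day itself) is not stated in this reference, so the inclusive behaviour shown here is an assumption, and evenly_spaced_dates is a hypothetical stand-in name.

```python
import pandas as pd


def evenly_spaced_dates(n_dates, election_date="2016-11-08",
                        start_date="2016-09-01"):
    """Return n_dates Timestamps spaced evenly, both endpoints included."""
    return list(pd.date_range(start=start_date, end=election_date,
                              periods=n_dates))


dates = evenly_spaced_dates(3)
# Start, midpoint, and end of the 68-day window.
```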

src.scripts.run_all_models.main()

Compare Models

Compare all forecasting models

Generates comparison tables, rankings, and plots for all models

src.scripts.compare_models.parse_metrics(filename)

Parse metrics from text file

Parameters:

filename – Path to metrics text file

Returns:

DataFrame with columns: date, brier, log_loss, mae

src.scripts.compare_models.main()

Load all model metrics, compare performance, and generate visualizations

Generate Plots

Generate state-level plots for all models

Usage:

    election-plot                        # Default: plot key swing states
    election-plot --all                  # Plot all states with sufficient data
    election-plot --states FL PA MI WI   # Plot specific states

src.scripts.generate_plots.discover_models()

Auto-discover all model classes using importlib.resources

src.scripts.generate_plots.main()

Run All Pipeline

src.scripts.run_all.run_with_temp_argv(argv, func)

Temporarily override sys.argv to call a subcommand.
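The usual pattern for this kind of helper is to swap sys.argv, call the function, and restore the original value in a finally block so it is reset even if the call raises. The sketch below shows that pattern; returning the callee's result is an assumption, since the reference does not document a return value.

```python
import sys


def run_with_temp_argv(argv, func):
    """Call func() with sys.argv temporarily replaced by argv."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        return func()  # returning the result is an assumption of this sketch
    finally:
        # Always restore the caller's argv, even when func raises.
        sys.argv = saved


original = sys.argv
seen = run_with_temp_argv(["election-plot", "--all"], lambda: list(sys.argv))
```

This lets a pipeline script reuse each subcommand's argparse-based main() without spawning a subprocess.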

src.scripts.run_all.run_step(step_number, title, func, argv=None)

Run a step with a spinner, timing, and pretty output.

src.scripts.run_all.main()