API Reference

Models

Base Model

Base class for election forecasting models.

This version is generalized so that the election year and election date are not hard-coded to 2016. The date is taken from src.utils.data_utils.get_current_election_date(), which in turn is controlled by set_election_config(…).

class src.models.base_model.ElectionForecastModel(name: str, seed: int | None = None)[source]

Bases: ABC

Abstract base class for election forecasting models.

__init__(name: str, seed: int | None = None) None[source]

Initialize the model.

Parameters:
  • name – Model name.

  • seed – Random seed for reproducibility (default: None for non-deterministic).

abstractmethod fit_and_forecast(state_polls: DataFrame, forecast_date: Timestamp, election_date: Timestamp, actual_margin: float, rng: Generator | None = None) Dict[str, float][source]

Fit model on polls up to forecast_date and predict election outcome.

Must return a dict with keys:
  • "win_probability"

  • "predicted_margin"

  • optionally "margin_std"
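
As a rough illustration of the return contract above, the sketch below packages a point estimate and an optional standard deviation into a dict of that shape. The helper name make_forecast_result and the normal approximation for the win probability are assumptions made for illustration; they are not part of the documented API.

```python
import math
from typing import Dict, Optional


def make_forecast_result(predicted_margin: float,
                         margin_std: Optional[float] = None) -> Dict[str, float]:
    """Hypothetical helper: build a dict with the documented keys."""
    if margin_std is None or margin_std <= 0:
        # No usable uncertainty: fall back to a hard call on the margin's sign.
        win_probability = 1.0 if predicted_margin > 0 else 0.0
        return {"win_probability": win_probability,
                "predicted_margin": predicted_margin}
    # Assumption: margin ~ Normal(predicted_margin, margin_std**2), so
    # P(margin > 0) is the standard normal CDF evaluated at the z-score.
    z = predicted_margin / margin_std
    win_probability = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return {"win_probability": win_probability,
            "predicted_margin": predicted_margin,
            "margin_std": margin_std}
```

A concrete subclass could return such a dict from fit_and_forecast; the "margin_std" key is simply left out when no uncertainty estimate is available.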

load_data() Tuple[DataFrame, Dict[str, float]][source]

Load polling and election results data.

Returns:

tuple of (polls DataFrame, actual_margin dict)

run_forecast(forecast_dates: List[Timestamp] | None = None, min_polls: int = 10, verbose: bool = False, n_workers: int | None = None) DataFrame[source]

Run forecast across multiple dates and states.

Parameters:
  • forecast_dates – List of forecast dates. If None, use four default dates in October/November of the election year.

  • min_polls – Minimum number of polls required to forecast a state.

  • verbose – If True, log per-state progress.

  • n_workers – If None or <=1, run sequentially; otherwise use ProcessPoolExecutor with the given number of workers.

Returns:

DataFrame of predictions with columns: state, forecast_date, win_probability, predicted_margin, margin_std, actual_margin

save_results() DataFrame[source]

Save predictions and metrics to CSV and text files.

Returns:

DataFrame of metrics as returned by compute_metrics().

plot_state(state: str) None[source]

Create time-series plot for a specific state showing model predictions over time.

Saves the PNG to both:
  • plots/{model_name}/{state}.png (legacy path, used by tests)

  • plots/{model_name}/{election_year}/{state}.png (year-specific path)
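
The two output locations can be sketched with pathlib as follows. The function name plot_output_paths and the root argument are hypothetical; only the path patterns come from the description above.

```python
from pathlib import Path
from typing import Tuple


def plot_output_paths(model_name: str, state: str, election_year: int,
                      root: str = "plots") -> Tuple[Path, Path]:
    """Hypothetical helper: return (legacy path, year-specific path)."""
    base = Path(root) / model_name
    legacy = base / f"{state}.png"
    by_year = base / str(election_year) / f"{state}.png"
    return legacy, by_year
```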

Poll Average Model

Simple Poll-of-Polls Average Model

Weighted average of recent polls with empirical uncertainty.

class src.models.poll_average.PollAverageModel(seed=None)[source]

Bases: ElectionForecastModel

Simple weighted poll average baseline

__init__(seed=None)[source]

Initialize the model.

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic).

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)[source]

Compute weighted poll average with empirical uncertainty
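
A minimal sketch of a recency-weighted poll average is shown below. The exponential half-life weighting, the 14-day default, and the use of the weighted standard deviation as the "empirical uncertainty" are all assumptions; this reference does not state the actual weighting scheme.

```python
import numpy as np


def weighted_poll_average(margins, ages_days, half_life_days=14.0):
    """Hypothetical helper: recency-weighted mean and spread of poll margins.

    margins   -- poll margins (e.g. 0.03 for a 3-point lead)
    ages_days -- age of each poll in days relative to the forecast date
    """
    margins = np.asarray(margins, dtype=float)
    ages = np.asarray(ages_days, dtype=float)
    # Assumption: a poll loses half its weight every `half_life_days` days.
    weights = 0.5 ** (ages / half_life_days)
    mean = np.sum(weights * margins) / np.sum(weights)
    # Weighted standard deviation of the polls around the weighted mean.
    var = np.sum(weights * (margins - mean) ** 2) / np.sum(weights)
    return float(mean), float(np.sqrt(var))
```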

Kalman Diffusion Model

Kalman Filter Diffusion Model with Improved Regularization

Brownian motion with drift + pollster biases + fundamentals prior.

class src.models.kalman_diffusion.KalmanDiffusionModel(seed=None)[source]

Bases: ElectionForecastModel

Improved diffusion model with Kalman filter/RTS smoother

__init__(seed=None)[source]

Initialize the model.

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic).

kalman_filter_smoother(dates, observations, obs_variance, mu, sigma2)[source]

Kalman filter + RTS smoother for Brownian motion with drift

Parameters:
  • dates – Array of time points (in days)

  • observations – Array of poll margins

  • obs_variance – Array of observation variances

  • mu – Drift parameter

  • sigma2 – Diffusion variance

Returns:

smoothed state estimates and variances

Return type:

tuple of (x_smooth, P_smooth)
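
For orientation, here is a minimal scalar sketch of a Kalman filter followed by an RTS smoother for Brownian motion with drift. It assumes the state evolves as x_k = x_{k-1} + mu*dt + noise with variance sigma2*dt, and that each poll observes the state with its own variance. The function name and the initial state x0, P0 are hypothetical; the documented method may initialize and regularize differently.

```python
import numpy as np


def kalman_rts_sketch(dates, observations, obs_variance, mu, sigma2,
                      x0=0.0, P0=1.0):
    """Scalar Kalman filter + RTS smoother (illustrative only)."""
    t = np.asarray(dates, dtype=float)
    y = np.asarray(observations, dtype=float)
    R = np.asarray(obs_variance, dtype=float)  # assumed strictly positive
    n = len(y)
    x_pred, P_pred = np.empty(n), np.empty(n)
    x_filt, P_filt = np.empty(n), np.empty(n)

    # Forward pass: predict with drift and diffusion, then update on the poll.
    for k in range(n):
        if k == 0:
            x_pred[k], P_pred[k] = x0, P0
        else:
            dt = t[k] - t[k - 1]
            x_pred[k] = x_filt[k - 1] + mu * dt
            P_pred[k] = P_filt[k - 1] + sigma2 * dt
        gain = P_pred[k] / (P_pred[k] + R[k])
        x_filt[k] = x_pred[k] + gain * (y[k] - x_pred[k])
        P_filt[k] = (1.0 - gain) * P_pred[k]

    # Backward pass: Rauch-Tung-Striebel smoothing.
    x_smooth, P_smooth = x_filt.copy(), P_filt.copy()
    for k in range(n - 2, -1, -1):
        C = P_filt[k] / P_pred[k + 1] if P_pred[k + 1] > 0 else 0.0
        x_smooth[k] = x_filt[k] + C * (x_smooth[k + 1] - x_pred[k + 1])
        P_smooth[k] = P_filt[k] + C ** 2 * (P_smooth[k + 1] - P_pred[k + 1])
    return x_smooth, P_smooth
```

The smoothed variances are never larger than the filtered ones, because each smoothed estimate also uses the polls that come after it.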

fit_state_diffusion(state_polls, prior_mean=0.0, max_iter=10)[source]

Fit diffusion model with EM algorithm

Parameters:
  • state_polls – DataFrame of polls for a single state

  • prior_mean – Prior mean for fundamentals

  • max_iter – Maximum number of EM iterations

Returns:

tuple of (mu, sigma2, pollster_bias, x_smooth, P_smooth, dates)

simulate_forward(x_start, P_start, mu, sigma2, days, N=2000, rng=None)[source]

Simulate forward with Euler-Maruyama method

Parameters:
  • x_start – Initial state estimate

  • P_start – Initial state variance

  • mu – Drift parameter

  • sigma2 – Diffusion variance

  • days – Number of days to simulate forward

  • N – Number of simulation samples

  • rng – NumPy random generator (default: None uses default_rng)

Returns:

Array of final margin values (length N), clipped to [-1, 1]
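
The forward simulation can be sketched as below, assuming a one-day Euler-Maruyama step and an initial state drawn from Normal(x_start, P_start). The function name carries a _sketch suffix to make clear it is not the documented method, and the step size is an assumption.

```python
import numpy as np


def simulate_forward_sketch(x_start, P_start, mu, sigma2, days, N=2000, rng=None):
    """Euler-Maruyama simulation of dX = mu dt + sqrt(sigma2) dW (illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample the uncertain starting state for each of the N paths.
    x = x_start + np.sqrt(P_start) * rng.standard_normal(N)
    dt = 1.0  # assumption: one step per day
    for _ in range(int(days)):
        x = x + mu * dt + np.sqrt(sigma2 * dt) * rng.standard_normal(N)
    # Margins are fractions, so keep them inside [-1, 1].
    return np.clip(x, -1.0, 1.0)
```

Under this sketch, a win probability could then be estimated as the share of simulated final margins above zero.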

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)[source]

Fit Kalman diffusion and forecast election outcome

Improved Kalman Model

Improved Kalman Diffusion Model

Key improvements over basic Kalman:
  • Increased minimum diffusion variance

  • Better regularized pollster biases

  • Smaller forecast horizon uncertainty

  • More conservative probability clipping

class src.models.improved_kalman.ImprovedKalmanModel(seed=None)[source]

Bases: ElectionForecastModel

Improved Kalman filter diffusion model

__init__(seed=None)[source]

Initialize the model.

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic).

kalman_filter_rts(dates, observations, obs_variance, mu, sigma2)[source]

Kalman filter + RTS smoother

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)[source]

Fit improved Kalman diffusion and forecast

simulate_forward(x_start, P_start, mu, sigma2, days, N=2000, rng=None)[source]

Simulate forward with Euler-Maruyama

Parameters:
  • x_start – Initial state estimate

  • P_start – Initial state variance

  • mu – Drift parameter

  • sigma2 – Diffusion variance

  • days – Number of days to simulate forward

  • N – Number of simulation samples

  • rng – NumPy random generator (default: None uses default_rng)

Returns:

Array of final margin values (length N)

Hierarchical Bayes Model

Hierarchical Bayesian Ensemble with Systematic Bias Adjustment (HBE-SBA)

Combines:
  1. Fundamentals prior from historical results

  2. Kalman-filtered polls with house effects

  3. Adaptive systematic bias correction

  4. Proper uncertainty quantification

class src.models.hierarchical_bayes.HierarchicalBayesModel(seed=None)[source]

Bases: ElectionForecastModel

Hierarchical Bayesian ensemble with bias correction

__init__(seed=None)[source]

Initialize the model.

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic).

estimate_house_effects(all_polls, lambda_shrink=10)[source]

Estimate pollster house effects with hierarchical shrinkage

Parameters:
  • all_polls – DataFrame of all polling data

  • lambda_shrink – Shrinkage parameter (higher = more shrinkage to zero)

Returns:

dict mapping pollster name to estimated house effect
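
One common way to shrink house effects toward zero is a ridge-style estimate: sum each pollster's residuals and divide by (number of polls + lambda_shrink). The sketch below uses that formula as an assumption. The column names "pollster", "state", and "margin" and the use of the state mean as the baseline are also hypothetical; the documented method may differ.

```python
import pandas as pd


def estimate_house_effects_sketch(all_polls: pd.DataFrame,
                                  lambda_shrink: float = 10.0) -> dict:
    """Shrunken per-pollster bias relative to each state's mean margin."""
    # Residual of every poll against the average margin in its state.
    state_mean = all_polls.groupby("state")["margin"].transform("mean")
    resid = all_polls["margin"] - state_mean
    grouped = resid.groupby(all_polls["pollster"])
    # Pollsters with few polls are pulled strongly toward zero.
    shrunk = grouped.sum() / (grouped.count() + lambda_shrink)
    return shrunk.to_dict()
```

With lambda_shrink = 0 this reduces to the plain mean residual per pollster; larger values pull sparse pollsters harder toward zero.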

kalman_filter_rts(dates, observations, obs_variance, mu, sigma2)[source]

Kalman filter with Rauch-Tung-Striebel (RTS) backward smoother

Parameters:
  • dates – Array of time points (in days)

  • observations – Array of poll margins

  • obs_variance – Array of observation variances

  • mu – Drift parameter

  • sigma2 – Diffusion variance

Returns:

smoothed state estimates and variances

Return type:

tuple of (x_smooth, P_smooth)

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)[source]

Hierarchical Bayesian forecast with bias correction

Utilities

Data Utilities

Shared data loading and preprocessing utilities.

This version supports multiple election cycles (e.g. 2012, 2016, 2020) and both the original 2016 timeseries file and FiveThirtyEight-style long polls files (like 2020_president_polls.csv).

src.utils.data_utils.set_election_config(year: int = 2016, polls_file: str | None = None) None[source]

Configure which election cycle the rest of the module should use.

Parameters:
  • year – Election year (e.g. 2012, 2016, 2020).

  • polls_file – Optional path to a FiveThirtyEight-style polls CSV. If None:

    • use the original 2016 timeseries file for year=2016

    • otherwise fall back to f"data/polls/{year}_president_polls.csv"

src.utils.data_utils.get_election_date(year: int) str[source]

Return the election day (YYYY-MM-DD) for a given year.
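
US general elections fall on the Tuesday after the first Monday in November, so the date can be computed rather than looked up. The sketch below uses that rule; whether get_election_date computes the date or uses a lookup table is not stated here, and the name us_election_day is hypothetical.

```python
import datetime as dt


def us_election_day(year: int) -> str:
    """Return the Tuesday after the first Monday in November as YYYY-MM-DD."""
    nov1 = dt.date(year, 11, 1)
    # date.weekday(): Monday is 0, so this is the offset to the first Monday.
    days_to_monday = (0 - nov1.weekday()) % 7
    first_monday = nov1 + dt.timedelta(days=days_to_monday)
    return (first_monday + dt.timedelta(days=1)).isoformat()


print(us_election_day(2016))  # 2016-11-08
print(us_election_day(2020))  # 2020-11-03
```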

src.utils.data_utils.get_current_election_date() str[source]

Convenience wrapper that uses the currently configured election year.

src.utils.data_utils.load_polling_data() DataFrame[source]

Load polling data for the currently configured election.

Behaviour:
  • If CURRENT_ELECTION_YEAR == 2016 and CURRENT_POLLS_FILE is None, this uses the original _load_polling_data_2016() to preserve backwards compatibility.

  • Otherwise, it expects a FiveThirtyEight-style CSV (either provided via CURRENT_POLLS_FILE or inferred as data/polls/{year}_president_polls.csv) and parses it with _load_polling_data_fte_long.
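
The dispatch described above can be sketched as a small pure function. The name resolve_polls_source and the returned tags are hypothetical. The path of the original 2016 timeseries file is not given in this reference, so the sketch returns None for it rather than guessing.

```python
from typing import Optional, Tuple


def resolve_polls_source(year: int = 2016,
                         polls_file: Optional[str] = None
                         ) -> Tuple[str, Optional[str]]:
    """Return (loader tag, path) following the documented fallback order."""
    if polls_file is not None:
        # An explicit file always goes through the FiveThirtyEight-style parser.
        return ("fte_long", polls_file)
    if year == 2016:
        # Backwards-compatible 2016 loader; its file path is not documented here.
        return ("legacy_2016", None)
    return ("fte_long", f"data/polls/{year}_president_polls.csv")
```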

src.utils.data_utils.load_election_results() Dict[str, float][source]

Public wrapper used by models to load election results for the currently configured election year.

src.utils.data_utils.load_fundamentals() Dict[str, Dict[str, float]][source]

Load historical election results for fundamentals prior.

Computes weighted average of 2012 (70%) and 2008 (30%) results.

NOTE: This is still the same 2016-oriented prior as in the original project. If you want a 2012 or 2020-specific fundamentals prior, you can generalise this function further (e.g. use (2008, 2004) for 2012, or (2016, 2012) for 2020).

Returns:

dict mapping state code to a fundamentals dict with keys: margin, margin_2012, margin_2008
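
The 70/30 blend can be written out as follows. The function name blend_fundamentals and its dict inputs are hypothetical; only the weights and the output keys come from the description above.

```python
from typing import Dict


def blend_fundamentals(margin_2012: Dict[str, float],
                       margin_2008: Dict[str, float],
                       w_2012: float = 0.7,
                       w_2008: float = 0.3) -> Dict[str, Dict[str, float]]:
    """Weighted average of two past margins per state (illustrative)."""
    out = {}
    for state, m12 in margin_2012.items():
        if state not in margin_2008:
            continue  # assumption: skip states missing from either year
        m08 = margin_2008[state]
        out[state] = {
            "margin": w_2012 * m12 + w_2008 * m08,
            "margin_2012": m12,
            "margin_2008": m08,
        }
    return out
```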

src.utils.data_utils.get_state_list(polls: DataFrame, actual_results: Dict[str, float]) List[str][source]

Get list of states with sufficient polling data.

Parameters:
  • polls – DataFrame of polling data

  • actual_results – dict of actual election results

Returns:

list of state codes

src.utils.data_utils.compute_metrics(predictions_df: DataFrame) DataFrame[source]

Compute evaluation metrics from predictions.

Parameters:

predictions_df – DataFrame with columns: forecast_date, win_probability, predicted_margin, actual_margin

Returns:

DataFrame with columns: forecast_date, n_states, brier_score, log_loss, mae_margin
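
A sketch of how such metrics could be computed per forecast date is given below. It assumes the binary outcome is "actual_margin > 0" and clips probabilities before taking logarithms; both choices, the eps value, and the function name are assumptions rather than documented behaviour.

```python
import numpy as np
import pandas as pd


def compute_metrics_sketch(predictions_df: pd.DataFrame,
                           eps: float = 1e-6) -> pd.DataFrame:
    """Brier score, log loss and margin MAE grouped by forecast_date."""
    df = predictions_df.copy()
    # Assumption: a positive actual margin counts as a win (outcome = 1).
    df["outcome"] = (df["actual_margin"] > 0).astype(float)
    p = df["win_probability"].clip(eps, 1 - eps)  # avoid log(0)
    df["brier"] = (df["win_probability"] - df["outcome"]) ** 2
    df["logloss"] = -(df["outcome"] * np.log(p)
                      + (1 - df["outcome"]) * np.log(1 - p))
    df["abs_err"] = (df["predicted_margin"] - df["actual_margin"]).abs()
    return df.groupby("forecast_date").agg(
        n_states=("outcome", "size"),
        brier_score=("brier", "mean"),
        log_loss=("logloss", "mean"),
        mae_margin=("abs_err", "mean"),
    ).reset_index()
```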

Scripts

Run All Models

src.scripts.run_all_models.discover_models()[source]

Auto-discover all model classes using importlib.resources.

Returns:

List of tuples (model_class_name, model_class) sorted by name.

src.scripts.run_all_models.generate_forecast_dates(n_dates: int, election_date: str | None = None, start_date: str | None = None) List[Timestamp][source]

Generate n evenly-spaced forecast dates between start_date and election_date.

Parameters:
  • n_dates – Number of forecast dates to generate.

  • election_date – Election day as a string (YYYY-MM-DD). If None, use the currently configured election date.

  • start_date – Earliest date to start forecasting from. If None, default to September 1 of the election year.

Returns:

List of pd.Timestamp forecast dates.
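
Evenly spaced dates can be produced with pandas.date_range and its periods argument. In this sketch election_date is required, since the configured election date is not available outside the package, and both endpoints are included; whether the real function includes election day itself is an assumption.

```python
from typing import List, Optional

import pandas as pd


def generate_forecast_dates_sketch(n_dates: int,
                                   election_date: str,
                                   start_date: Optional[str] = None
                                   ) -> List[pd.Timestamp]:
    """Return n_dates evenly spaced timestamps from start_date to election_date."""
    end = pd.Timestamp(election_date)
    if start_date is None:
        # Documented default: September 1 of the election year.
        start = pd.Timestamp(year=end.year, month=9, day=1)
    else:
        start = pd.Timestamp(start_date)
    return list(pd.date_range(start=start, end=end, periods=n_dates))
```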

src.scripts.run_all_models.main()[source]

Compare Models

Compare all forecasting models

Generates comparison tables, rankings, and plots for all models

src.scripts.compare_models.parse_metrics(filename)[source]

Parse metrics from text file

Parameters:

filename – Path to metrics text file

Returns:

DataFrame with columns: date, brier, log_loss, mae

src.scripts.compare_models.main()[source]

Load all model metrics, compare performance, and generate visualizations

Generate Plots

Generate state-level plots for all models

Usage:

    election-plot                        # Default: plot key swing states (for 2016)
    election-plot --all                  # Plot all states with sufficient data
    election-plot --states FL PA MI WI   # Plot specific states
    election-plot --year 2020 --all      # Plot all states for 2020
    election-plot --year 2020 --polls-file data/polls/2020_president_polls.csv --all

src.scripts.generate_plots.discover_models()[source]

Auto-discover all model classes using importlib.resources.

src.scripts.generate_plots.main()[source]

Run All Pipeline

src.scripts.run_all.run_with_temp_argv(argv, func)[source]

Temporarily override sys.argv to call a subcommand.
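
A minimal sketch of this pattern, assuming func takes no arguments and reads sys.argv itself (as an argparse-based main() typically does). The _sketch suffix marks it as illustrative rather than the project's implementation.

```python
import sys


def run_with_temp_argv_sketch(argv, func):
    """Call func() with sys.argv replaced, then restore the original."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        return func()
    finally:
        # Restore even if func raises or calls sys.exit().
        sys.argv = saved
```

The try/finally matters because command-line entry points often exit via SystemExit; without it, a failing step would leave the overridden arguments in place for the next one.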

src.scripts.run_all.run_step(step_number, title, func, argv=None)[source]

Run a step with a spinner, timing, and pretty output.

src.scripts.run_all.main()[source]