API Reference

Models

Base Model

Base class for election forecasting models

class src.models.base_model.ElectionForecastModel(name: str, seed: int | None = None)[source]

Bases: ABC

Abstract base class for election forecasting models

__init__(name: str, seed: int | None = None) → None[source]

Initialize the model

Parameters:

name – Model name
seed – Random seed for reproducibility (default: None for non-deterministic)

abstractmethod fit_and_forecast(state_polls: DataFrame, forecast_date: Timestamp, election_date: Timestamp, actual_margin: float, rng: Generator | None = None) → Dict[str, float][source]

Fit model on polls up to forecast_date and predict election outcome.

Parameters:

state_polls (pd.DataFrame) – DataFrame with columns [middate, dem_proportion, margin, samplesize, pollster]
forecast_date (pd.Timestamp) – Date to make forecast from
election_date (pd.Timestamp) – Election day
actual_margin (float) – Actual two-party margin (for evaluation)
rng (np.random.Generator, optional) – NumPy random generator for reproducibility (default: None)

Returns:

Dictionary with keys: win_probability, predicted_margin, margin_std

Return type:

dict
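The three returned keys can be illustrated with a minimal sketch. It assumes a normal approximation for the election-day margin; this is an illustrative assumption, not necessarily how any model documented here derives win_probability, and the helper name summarize_forecast is hypothetical.

```python
from statistics import NormalDist
from typing import Dict


def summarize_forecast(predicted_margin: float, margin_std: float) -> Dict[str, float]:
    """Build a result dict with the three documented keys (hypothetical helper).

    Assumption: the margin is approximately normal, so the win probability
    is P(margin > 0) under N(predicted_margin, margin_std**2).
    """
    if margin_std <= 0:
        raise ValueError("margin_std must be positive")
    win_probability = 1.0 - NormalDist(predicted_margin, margin_std).cdf(0.0)
    return {
        "win_probability": win_probability,
        "predicted_margin": predicted_margin,
        "margin_std": margin_std,
    }


result = summarize_forecast(predicted_margin=0.02, margin_std=0.04)
```

A positive predicted margin yields a win probability above 0.5, and a margin of exactly zero yields 0.5.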

load_data() → Tuple[DataFrame, Dict[str, float]][source]

Load polling and election results data

Returns:

tuple of (polls DataFrame, actual_margin dict)

run_forecast(forecast_dates: List[Timestamp] | None = None, min_polls: int = 10, verbose: bool = False, n_workers: int | None = None) → DataFrame[source]

Run forecast across multiple dates and states

Parameters:

forecast_dates – List of pd.Timestamp dates to forecast from (default: 4 dates in Oct-Nov 2016)
min_polls – Minimum number of polls required to forecast a state
verbose – If True, print processing status for each state
n_workers – Number of parallel workers (default: None for sequential, >1 for parallel)

Returns:

DataFrame with columns: state, forecast_date, win_probability, predicted_margin, margin_std, actual_margin
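The loop over dates and states might look roughly like the sketch below. It uses plain Python containers instead of DataFrames, and it assumes that min_polls is checked against the polls available on or before each forecast date; the source does not say where that check happens. The names run_forecast_sketch and forecast_fn are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Poll = Tuple[date, float]  # (middate, margin); a simplified stand-in for a poll row


def run_forecast_sketch(
    polls_by_state: Dict[str, List[Poll]],
    forecast_dates: List[date],
    forecast_fn: Callable[[List[Poll]], Dict[str, float]],
    min_polls: int = 10,
) -> List[Dict[str, object]]:
    """Return one result row per (forecast_date, state) with enough polls."""
    rows: List[Dict[str, object]] = []
    for forecast_date in forecast_dates:
        for state, polls in sorted(polls_by_state.items()):
            # Only polls conducted on or before the forecast date are visible.
            visible = [p for p in polls if p[0] <= forecast_date]
            if len(visible) < min_polls:
                continue  # assumption: under-polled states are skipped
            row: Dict[str, object] = {"state": state, "forecast_date": forecast_date}
            row.update(forecast_fn(visible))
            rows.append(row)
    return rows


def mean_margin(polls: List[Poll]) -> Dict[str, float]:
    """Toy forecaster used only to exercise the loop."""
    return {"predicted_margin": sum(m for _, m in polls) / len(polls)}


demo_polls = {
    "AA": [(date(2016, 10, 1), 0.02), (date(2016, 10, 20), 0.04)],
    "BB": [(date(2016, 10, 25), -0.01)],
}
demo_rows = run_forecast_sketch(
    demo_polls, [date(2016, 10, 10), date(2016, 10, 30)], mean_margin, min_polls=2
)
```

In this toy input, only state "AA" on the later date has two visible polls, so a single row is produced.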

save_results() → DataFrame[source]

Save predictions and metrics to CSV and text files

Creates predictions/{model_name}.csv and metrics/{model_name}.txt

Returns:

DataFrame with columns: forecast_date, n_states, brier_score, log_loss, mae_margin

plot_state(state: str) → None[source]

Create time-series plot for a specific state showing model predictions over time

Parameters:

state – Two-letter state code (e.g., 'FL', 'PA')

Saves:

PNG file to plots/{model_name}/{state}.png

Poll Average Model

Simple Poll-of-Polls Average Model

Weighted average of recent polls with empirical uncertainty

class src.models.poll_average.PollAverageModel(seed=None)[source]

Bases: ElectionForecastModel

Simple weighted poll average baseline

__init__(seed=None)[source]

Initialize the model

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic)

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)[source]

Compute weighted poll average with empirical uncertainty
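A weighted poll average with an empirical spread can be sketched as follows. The weighting scheme (square root of sample size times an exponential recency decay) and the half_life_days parameter are assumptions chosen for illustration; the source does not specify the weights PollAverageModel uses.

```python
import math
from typing import List, Tuple


def weighted_poll_average(
    margins: List[float],
    sample_sizes: List[float],
    ages_days: List[float],
    half_life_days: float = 14.0,
) -> Tuple[float, float]:
    """Return (weighted mean margin, weighted standard deviation).

    Assumed weighting: sqrt(sample size) times a recency decay that halves
    every half_life_days. Both choices are hypothetical.
    """
    if not margins or not (len(margins) == len(sample_sizes) == len(ages_days)):
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [
        math.sqrt(n) * 0.5 ** (age / half_life_days)
        for n, age in zip(sample_sizes, ages_days)
    ]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, margins)) / total
    # Weighted spread of the polls around their weighted mean.
    variance = sum(w * (m - mean) ** 2 for w, m in zip(weights, margins)) / total
    return mean, math.sqrt(variance)


mean_equal, std_equal = weighted_poll_average([0.02, 0.04], [400, 400], [0, 0])
mean_recent, _ = weighted_poll_average([0.0, 0.10], [400, 400], [28, 0])
```

With equal weights the result is the plain mean; when one poll is two half-lives old, it receives a quarter of the weight of a fresh poll of the same size.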

Kalman Diffusion Model

Kalman Filter Diffusion Model with Improved Regularization

Brownian motion with drift + pollster biases + fundamentals prior

class src.models.kalman_diffusion.KalmanDiffusionModel(seed=None)[source]

Bases: ElectionForecastModel

Improved diffusion model with Kalman filter/RTS smoother

__init__(seed=None)[source]

Initialize the model

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic)

kalman_filter_smoother(dates, observations, obs_variance, mu, sigma2)[source]

Kalman filter + RTS smoother for Brownian motion with drift

Parameters:

dates – Array of time points (in days)
observations – Array of poll margins
obs_variance – Array of observation variances
mu – Drift parameter
sigma2 – Diffusion variance

Returns:

smoothed state estimates and variances

Return type:

tuple of (x_smooth, P_smooth)
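For orientation, a scalar Kalman filter with an RTS backward pass for Brownian motion with drift can be sketched in pure Python. The initialisation from the first observation, the assumption of sorted dates, and the name kalman_rts_sketch are choices made for this example; the documented method may handle these details differently.

```python
from typing import List, Sequence, Tuple


def kalman_rts_sketch(
    dates: Sequence[float],
    observations: Sequence[float],
    obs_variance: Sequence[float],
    mu: float,
    sigma2: float,
) -> Tuple[List[float], List[float]]:
    """Scalar Kalman filter + RTS smoother for dx = mu*dt + sqrt(sigma2)*dW.

    Assumptions: dates are sorted, every obs_variance is positive, and the
    first observation initialises the state (equivalent to a diffuse prior).
    """
    n = len(dates)
    if n == 0 or not (n == len(observations) == len(obs_variance)):
        raise ValueError("inputs must be non-empty and of equal length")
    x_pred, P_pred = [0.0] * n, [0.0] * n
    x_filt, P_filt = [0.0] * n, [0.0] * n

    # Forward pass: predict with drift and diffusion, then update with the poll.
    x_pred[0] = x_filt[0] = observations[0]
    P_pred[0] = P_filt[0] = obs_variance[0]
    for k in range(1, n):
        dt = dates[k] - dates[k - 1]
        x_pred[k] = x_filt[k - 1] + mu * dt
        P_pred[k] = P_filt[k - 1] + sigma2 * dt
        gain = P_pred[k] / (P_pred[k] + obs_variance[k])
        x_filt[k] = x_pred[k] + gain * (observations[k] - x_pred[k])
        P_filt[k] = (1.0 - gain) * P_pred[k]

    # Backward pass (RTS): refine each filtered estimate using later polls.
    x_smooth, P_smooth = x_filt[:], P_filt[:]
    for k in range(n - 2, -1, -1):
        c = P_filt[k] / P_pred[k + 1]
        x_smooth[k] = x_filt[k] + c * (x_smooth[k + 1] - x_pred[k + 1])
        P_smooth[k] = P_filt[k] + c * c * (P_smooth[k + 1] - P_pred[k + 1])
    return x_smooth, P_smooth


x_s, P_s = kalman_rts_sketch([0, 1], [0.02, 0.04], [0.01, 0.01], mu=0.0, sigma2=0.0)
```

With no drift and no diffusion the state is static, so both smoothed estimates collapse to the mean of the two polls, with half the single-poll variance.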

fit_state_diffusion(state_polls, prior_mean=0.0, max_iter=10)[source]

Fit diffusion model with EM algorithm

Parameters:

state_polls – DataFrame of polls for a single state
prior_mean – Prior mean for fundamentals
max_iter – Maximum number of EM iterations

Returns:

tuple of (mu, sigma2, pollster_bias, x_smooth, P_smooth, dates)

simulate_forward(x_start, P_start, mu, sigma2, days, N=2000, rng=None)[source]

Simulate forward with Euler-Maruyama method

Parameters:

x_start – Initial state estimate
P_start – Initial state variance
mu – Drift parameter
sigma2 – Diffusion variance
days – Number of days to simulate forward
N – Number of simulation samples
rng – NumPy random generator (default: None uses default_rng)

Returns:

Array of final margin values (length N)
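The forward simulation can be sketched with a one-day Euler-Maruyama step. This version uses the standard library's random.Random in place of a NumPy Generator so that it runs standalone, and it assumes the starting state is drawn from N(x_start, P_start); both are assumptions of the example, and simulate_forward_sketch is a hypothetical name.

```python
import math
import random
from typing import List, Optional


def simulate_forward_sketch(
    x_start: float,
    P_start: float,
    mu: float,
    sigma2: float,
    days: int,
    N: int = 2000,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Simulate N margin paths forward and return the final value of each."""
    rng = rng if rng is not None else random.Random()
    start_sd = math.sqrt(P_start)
    step_sd = math.sqrt(sigma2)  # dt = 1 day, so the step variance is sigma2
    finals = []
    for _ in range(N):
        x = rng.gauss(x_start, start_sd)  # uncertainty in the current state
        for _ in range(days):
            # Euler-Maruyama step: drift plus a scaled Gaussian increment.
            x += mu + step_sd * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals


finals = simulate_forward_sketch(0.0, 0.0004, 0.0, 0.0001, days=9, N=2000,
                                 rng=random.Random(0))
share_positive = sum(f > 0 for f in finals) / len(finals)
```

The share of simulated final margins above zero is one way such samples could be turned into a win probability.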

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)[source]

Fit Kalman diffusion and forecast election outcome

Improved Kalman Model

Improved Kalman Diffusion Model

Key improvements over basic Kalman:

- Increased minimum diffusion variance
- Better regularized pollster biases
- Smaller forecast horizon uncertainty
- More conservative probability clipping

class src.models.improved_kalman.ImprovedKalmanModel(seed=None)[source]

Bases: ElectionForecastModel

Improved Kalman filter diffusion model

__init__(seed=None)[source]

Initialize the model

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic)

kalman_filter_rts(dates, observations, obs_variance, mu, sigma2)[source]

Kalman filter + RTS smoother

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)[source]

Fit improved Kalman diffusion and forecast

simulate_forward(x_start, P_start, mu, sigma2, days, N=2000, rng=None)[source]

Simulate forward with Euler-Maruyama

Parameters:

x_start – Initial state estimate
P_start – Initial state variance
mu – Drift parameter
sigma2 – Diffusion variance
days – Number of days to simulate forward
N – Number of simulation samples
rng – NumPy random generator (default: None uses default_rng)

Returns:

Array of final margin values (length N)

Hierarchical Bayes Model

Hierarchical Bayesian Ensemble with Systematic Bias Adjustment (HBE-SBA)

Combines:

1. Fundamentals prior from historical results
2. Kalman-filtered polls with house effects
3. Adaptive systematic bias correction
4. Proper uncertainty quantification

class src.models.hierarchical_bayes.HierarchicalBayesModel(seed=None)[source]

Bases: ElectionForecastModel

Hierarchical Bayesian ensemble with bias correction

__init__(seed=None)[source]

Initialize the model

Parameters:

seed – Random seed for reproducibility (default: None for non-deterministic)

estimate_house_effects(all_polls, lambda_shrink=10)[source]

Estimate pollster house effects with hierarchical shrinkage

Parameters:

all_polls – DataFrame of all polling data
lambda_shrink – Shrinkage parameter (higher = more shrinkage to zero)

Returns:

dict mapping pollster name to estimated house effect
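One common way to shrink house effects toward zero is to divide each pollster's summed residuals by (number of polls + lambda_shrink). The sketch below assumes residuals are measured against the unweighted mean of all polls in the same state; that reference point, the shrinkage formula, and the name shrunken_house_effects are assumptions for illustration, not confirmed details of estimate_house_effects.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shrunken_house_effects(
    polls: List[Tuple[str, str, float]],
    lambda_shrink: float = 10.0,
) -> Dict[str, float]:
    """Estimate per-pollster effects from (state, pollster, margin) rows."""
    by_state: Dict[str, List[float]] = defaultdict(list)
    for state, _, margin in polls:
        by_state[state].append(margin)
    state_mean = {s: sum(v) / len(v) for s, v in by_state.items()}

    resid_sum: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for state, pollster, margin in polls:
        resid_sum[pollster] += margin - state_mean[state]
        counts[pollster] += 1

    # Ridge-style shrinkage: pollsters with few polls are pulled harder to zero.
    return {p: resid_sum[p] / (counts[p] + lambda_shrink) for p in counts}


toy_polls = [
    ("AA", "Pollster A", 0.04), ("AA", "Pollster B", 0.00),
    ("BB", "Pollster A", 0.06), ("BB", "Pollster B", 0.02),
]
effects = shrunken_house_effects(toy_polls, lambda_shrink=2.0)
```

In this toy input, Pollster A runs 0.02 above the state mean in both states; with lambda_shrink=2 and two polls, the estimate is halved to 0.01.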

kalman_filter_rts(dates, observations, obs_variance, mu, sigma2)[source]

Kalman filter with Rauch-Tung-Striebel (RTS) backward smoother

Parameters:

dates – Array of time points (in days)
observations – Array of poll margins
obs_variance – Array of observation variances
mu – Drift parameter
sigma2 – Diffusion variance

Returns:

smoothed state estimates and variances

Return type:

tuple of (x_smooth, P_smooth)

fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)[source]

Hierarchical Bayesian forecast with bias correction

Utilities

Data Utilities

Shared data loading and preprocessing utilities

src.utils.data_utils.load_polling_data() → DataFrame[source]

Load and preprocess 2016 polling data from FiveThirtyEight

Returns:

DataFrame with columns: middate, dem, rep, margin, dem_proportion, samplesize, pollster, state_code

src.utils.data_utils.load_election_results() → Dict[str, float][source]

Load actual 2016 election results from MIT Election Lab

Returns:

dict mapping state code to actual Democratic margin

src.utils.data_utils.load_fundamentals() → Dict[str, Dict[str, float]][source]

Load historical election results for fundamentals prior

Computes weighted average of 2012 (70%) and 2008 (30%) results

Returns:

dict mapping state code to fundamentals dict with keys: margin, margin_2012, margin_2008
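The 70/30 blend described above amounts to the following sketch. The function name blend_fundamentals and the placeholder state code "XX" are hypothetical, and the input numbers are made up for illustration rather than taken from real results.

```python
from typing import Dict


def blend_fundamentals(
    margin_2012: Dict[str, float],
    margin_2008: Dict[str, float],
    weight_2012: float = 0.7,
) -> Dict[str, Dict[str, float]]:
    """Blend two prior-election margins per state (hypothetical helper)."""
    blended: Dict[str, Dict[str, float]] = {}
    # Only states present in both inputs can be blended.
    for state in sorted(margin_2012.keys() & margin_2008.keys()):
        m12, m08 = margin_2012[state], margin_2008[state]
        blended[state] = {
            "margin": weight_2012 * m12 + (1.0 - weight_2012) * m08,
            "margin_2012": m12,
            "margin_2008": m08,
        }
    return blended


fundamentals = blend_fundamentals({"XX": 0.10}, {"XX": 0.00})
```

Here a 2012 margin of 0.10 and a 2008 margin of 0.00 blend to 0.07.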

src.utils.data_utils.get_state_list(polls: DataFrame, actual_results: Dict[str, float]) → List[str][source]

Get list of states with sufficient polling data

Parameters:

polls – DataFrame of polling data
actual_results – dict of actual election results

Returns:

list of state codes

src.utils.data_utils.compute_metrics(predictions_df: DataFrame) → DataFrame[source]

Compute evaluation metrics from predictions

Parameters:

predictions_df – DataFrame with columns: forecast_date, win_probability, predicted_margin, actual_margin

Returns:

DataFrame with columns: forecast_date, n_states, brier_score, log_loss, mae_margin
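The metrics for a single forecast date can be sketched with the standard definitions of Brier score, log loss, and mean absolute error. The sketch assumes the outcome is 1 when actual_margin is positive and that probabilities are clipped before taking logarithms; the clipping constant and the name score_forecasts are assumptions of this example.

```python
import math
from typing import Dict, List


def score_forecasts(rows: List[Dict[str, float]], eps: float = 1e-15) -> Dict[str, float]:
    """Score the predictions for one forecast date (hypothetical helper).

    Each row needs win_probability, predicted_margin and actual_margin.
    Assumption: the outcome is 1 when actual_margin > 0, else 0.
    """
    if not rows:
        raise ValueError("rows must be non-empty")
    n = len(rows)
    brier = log_loss = abs_err = 0.0
    for row in rows:
        outcome = 1.0 if row["actual_margin"] > 0 else 0.0
        p = row["win_probability"]
        brier += (p - outcome) ** 2
        # Clip so log() never sees exactly 0 or 1.
        p_clipped = min(max(p, eps), 1.0 - eps)
        log_loss -= outcome * math.log(p_clipped) + (1.0 - outcome) * math.log(1.0 - p_clipped)
        abs_err += abs(row["predicted_margin"] - row["actual_margin"])
    return {
        "n_states": n,
        "brier_score": brier / n,
        "log_loss": log_loss / n,
        "mae_margin": abs_err / n,
    }


scores = score_forecasts([
    {"win_probability": 0.8, "predicted_margin": 0.02, "actual_margin": 0.03},
    {"win_probability": 0.3, "predicted_margin": -0.01, "actual_margin": -0.05},
])
```

For these two illustrative rows the Brier score is (0.04 + 0.09) / 2 = 0.065 and the margin MAE is (0.01 + 0.04) / 2 = 0.025.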

Scripts

Run All Models

src.scripts.run_all_models.discover_models()[source]

Auto-discover all model classes using importlib.resources

Returns:

List of tuples (model_class_name, model_class) sorted by name

src.scripts.run_all_models.generate_forecast_dates(n_dates, election_date='2016-11-08', start_date='2016-09-01')[source]

Generate n evenly-spaced forecast dates between start_date and election_date

Parameters:

n_dates – Number of forecast dates to generate
election_date – Election day
start_date – Earliest date to start forecasting from

Returns:

List of pd.Timestamp forecast dates
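Evenly spaced dates can be generated as in the sketch below, which uses datetime.date instead of pd.Timestamp. It assumes the first date is start_date and the last is the day before election_date; the source does not state how the endpoints are treated, so this is only one plausible reading. The name evenly_spaced_dates is hypothetical.

```python
from datetime import date, timedelta
from typing import List


def evenly_spaced_dates(
    n_dates: int,
    election_date: str = "2016-11-08",
    start_date: str = "2016-09-01",
) -> List[date]:
    """Return n_dates dates spread evenly from start_date to election eve."""
    if n_dates < 1:
        raise ValueError("n_dates must be at least 1")
    start = date.fromisoformat(start_date)
    last = date.fromisoformat(election_date) - timedelta(days=1)  # assumed endpoint
    if last < start:
        raise ValueError("start_date must be before election_date")
    if n_dates == 1:
        return [last]
    span = (last - start).days
    # Round each offset to a whole day so every result is a calendar date.
    return [start + timedelta(days=round(i * span / (n_dates - 1))) for i in range(n_dates)]


dates = evenly_spaced_dates(4)
```

With the default arguments and n_dates=4, this yields dates from 2016-09-01 through 2016-11-07, roughly 22 days apart.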

src.scripts.run_all_models.main()[source]

Compare Models

Compare all forecasting models

Generates comparison tables, rankings, and plots for all models

src.scripts.compare_models.parse_metrics(filename)[source]

Parse metrics from text file

Parameters:

filename – Path to metrics text file

Returns:

DataFrame with columns: date, brier, log_loss, mae

src.scripts.compare_models.main()[source]

Load all model metrics, compare performance, and generate visualizations

Generate Plots

Generate state-level plots for all models

Usage:

    election-plot                        # Default: plot key swing states
    election-plot --all                  # Plot all states with sufficient data
    election-plot --states FL PA MI WI   # Plot specific states

src.scripts.generate_plots.discover_models()[source]

Auto-discover all model classes using importlib.resources

src.scripts.generate_plots.main()[source]

Run All Pipeline

src.scripts.run_all.run_with_temp_argv(argv, func)[source]

Temporarily override sys.argv to call a subcommand.
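A temporary sys.argv override is typically written with try/finally so the original arguments are restored even if the subcommand raises. The sketch below shows that pattern under the assumption that func takes no arguments and reads sys.argv itself; the name run_with_temp_argv_sketch is hypothetical and the real helper may differ.

```python
import sys
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_temp_argv_sketch(argv: List[str], func: Callable[[], T]) -> T:
    """Call func() while sys.argv is temporarily replaced by argv."""
    saved = sys.argv
    sys.argv = list(argv)  # copy so the caller's list is never mutated
    try:
        return func()
    finally:
        sys.argv = saved  # restored on success and on error


original_argv = sys.argv
seen = run_with_temp_argv_sketch(["election-plot", "--all"], lambda: list(sys.argv))
```

After the call, sys.argv is the same object as before, while func saw the substituted arguments.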

src.scripts.run_all.run_step(step_number, title, func, argv=None)[source]

Run a step with a spinner, timing, and pretty output.

src.scripts.run_all.main()[source]