API Reference
Models
Base Model
Base class for election forecasting models
- class src.models.base_model.ElectionForecastModel(name: str, seed: int | None = None)[source]
Bases: ABC
Abstract base class for election forecasting models
- __init__(name: str, seed: int | None = None) None[source]
Initialize the model
- Parameters:
name – Model name
seed – Random seed for reproducibility (default: None for non-deterministic)
- abstractmethod fit_and_forecast(state_polls: DataFrame, forecast_date: Timestamp, election_date: Timestamp, actual_margin: float, rng: Generator | None = None) Dict[str, float][source]
Fit model on polls up to forecast_date and predict election outcome.
- Parameters:
state_polls (pd.DataFrame) – DataFrame with columns [middate, dem_proportion, margin, samplesize, pollster]
forecast_date (pd.Timestamp) – Date to make forecast from
election_date (pd.Timestamp) – Election day
actual_margin (float) – Actual two-party margin (for evaluation)
rng (np.random.Generator, optional) – NumPy random generator for reproducibility (default: None)
- Returns:
Dictionary with keys: win_probability, predicted_margin, margin_std
- Return type:
Dict[str, float]
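The shape of the returned dictionary can be illustrated with a minimal sketch. This is not the package's implementation: the helper name `summarize_polls`, the sample-size weighting, and the normal approximation for the win probability are all assumptions chosen for illustration, and plain lists stand in for the DataFrame columns.

```python
from math import sqrt
from statistics import NormalDist


def summarize_polls(margins, sample_sizes):
    """Hypothetical helper: build a forecast dict from poll margins.

    Weights each poll by its sample size (an assumption, not the
    documented behaviour of any model in this package).
    """
    total = sum(sample_sizes)
    mean = sum(m * n for m, n in zip(margins, sample_sizes)) / total
    # Weighted variance of the poll margins around the weighted mean
    var = sum(n * (m - mean) ** 2 for m, n in zip(margins, sample_sizes)) / total
    std = max(sqrt(var), 1e-6)  # guard against a zero spread
    # P(margin > 0) under a normal approximation
    win_prob = 1.0 - NormalDist(mean, std).cdf(0.0)
    return {
        "win_probability": win_prob,
        "predicted_margin": mean,
        "margin_std": std,
    }


# Illustrative inputs only, not real polling data
forecast = summarize_polls([0.02, 0.04, -0.01], [800, 1200, 600])
```

A concrete subclass would return a dictionary with these same three keys from fit_and_forecast.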
- load_data() Tuple[DataFrame, Dict[str, float]][source]
Load polling and election results data
- Returns:
tuple of (polls DataFrame, actual_margin dict)
- run_forecast(forecast_dates: List[Timestamp] | None = None, min_polls: int = 10, verbose: bool = False, n_workers: int | None = None) DataFrame[source]
Run forecast across multiple dates and states
- Parameters:
forecast_dates – List of pd.Timestamp dates to forecast from (default: 4 dates in Oct-Nov 2016)
min_polls – Minimum number of polls required to forecast a state
verbose – If True, print processing status for each state
n_workers – Number of parallel workers (default: None for sequential, >1 for parallel)
- Returns:
DataFrame with columns: state, forecast_date, win_probability, predicted_margin, margin_std, actual_margin
Poll Average Model
Simple Poll-of-Polls Average Model
Weighted average of recent polls with empirical uncertainty
- class src.models.poll_average.PollAverageModel(seed=None)[source]
Bases: ElectionForecastModel
Simple weighted poll average baseline
Kalman Diffusion Model
Kalman Filter Diffusion Model with Improved Regularization
Brownian motion with drift + pollster biases + fundamentals prior
- class src.models.kalman_diffusion.KalmanDiffusionModel(seed=None)[source]
Bases: ElectionForecastModel
Improved diffusion model with Kalman filter/RTS smoother
- __init__(seed=None)[source]
Initialize the model
- Parameters:
seed – Random seed for reproducibility (default: None for non-deterministic)
- kalman_filter_smoother(dates, observations, obs_variance, mu, sigma2)[source]
Kalman filter + RTS smoother for Brownian motion with drift
- Parameters:
dates – Array of time points (in days)
observations – Array of poll margins
obs_variance – Array of observation variances
mu – Drift parameter
sigma2 – Diffusion variance
- Returns:
smoothed state estimates and variances
- Return type:
tuple of (x_smooth, P_smooth)
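For readers unfamiliar with the technique, the following is a minimal scalar sketch of a Kalman filter followed by an RTS backward pass for Brownian motion with drift. It assumes the state evolves as x(t+dt) = x(t) + mu*dt + noise with variance sigma2*dt, and that each poll observes the state directly. The function name `kalman_rts_sketch` and the prior parameters `x0` and `P0` are hypothetical and do not appear in the documented signature.

```python
def kalman_rts_sketch(dates, observations, obs_variance, mu, sigma2,
                      x0=0.0, P0=1.0):
    """Scalar Kalman filter + RTS smoother (illustrative sketch only)."""
    n = len(dates)
    x_pred, P_pred = [0.0] * n, [0.0] * n
    x_filt, P_filt = [0.0] * n, [0.0] * n

    # Forward pass: predict with drift/diffusion, then update with the poll
    for k in range(n):
        if k == 0:
            xp, Pp = x0, P0  # assumed prior on the first state
        else:
            dt = dates[k] - dates[k - 1]
            xp = x_filt[k - 1] + mu * dt
            Pp = P_filt[k - 1] + sigma2 * dt
        gain = Pp / (Pp + obs_variance[k])
        x_pred[k], P_pred[k] = xp, Pp
        x_filt[k] = xp + gain * (observations[k] - xp)
        P_filt[k] = (1.0 - gain) * Pp

    # Backward pass (Rauch-Tung-Striebel): refine with later observations
    x_smooth, P_smooth = x_filt[:], P_filt[:]
    for k in range(n - 2, -1, -1):
        c = P_filt[k] / P_pred[k + 1]
        x_smooth[k] = x_filt[k] + c * (x_smooth[k + 1] - x_pred[k + 1])
        P_smooth[k] = P_filt[k] + c * c * (P_smooth[k + 1] - P_pred[k + 1])
    return x_smooth, P_smooth


# Illustrative inputs only
x_s, P_s = kalman_rts_sketch(
    dates=[0, 1, 2, 3],
    observations=[0.01, 0.02, 0.03, 0.04],
    obs_variance=[0.001] * 4,
    mu=0.0,
    sigma2=0.0001,
)
```

The smoothed variances are never larger than the filtered ones, which is why the backward pass is useful when estimating the trajectory on earlier poll dates.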
- fit_state_diffusion(state_polls, prior_mean=0.0, max_iter=10)[source]
Fit diffusion model with EM algorithm
- Parameters:
state_polls – DataFrame of polls for a single state
prior_mean – Prior mean for fundamentals
max_iter – Maximum number of EM iterations
- Returns:
tuple of (mu, sigma2, pollster_bias, x_smooth, P_smooth, dates)
- simulate_forward(x_start, P_start, mu, sigma2, days, N=2000, rng=None)[source]
Simulate forward with Euler-Maruyama method
- Parameters:
x_start – Initial state estimate
P_start – Initial state variance
mu – Drift parameter
sigma2 – Diffusion variance
days – Number of days to simulate forward
N – Number of simulation samples
rng – NumPy random generator (default: None uses default_rng)
- Returns:
Array of final margin values (length N)
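An Euler-Maruyama forward simulation of this kind can be sketched as follows. This version uses the standard library `random.Random` rather than a NumPy generator, takes one step per day, and draws the starting point from a normal distribution with mean x_start and variance P_start. All three choices are assumptions, and the name `simulate_forward_sketch` is hypothetical.

```python
import random
from math import sqrt


def simulate_forward_sketch(x_start, P_start, mu, sigma2, days, N=2000, seed=None):
    """Euler-Maruyama paths for dX = mu dt + sqrt(sigma2) dW (sketch only)."""
    rng = random.Random(seed)
    dt = 1.0  # assumed step size: one day
    finals = []
    for _ in range(N):
        # Sample the uncertain starting state
        x = rng.gauss(x_start, sqrt(P_start))
        for _ in range(days):
            # Drift term plus diffusion term scaled by sqrt(dt)
            x += mu * dt + sqrt(sigma2 * dt) * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals


# Illustrative parameters only
samples = simulate_forward_sketch(0.02, 1e-4, 0.001, 1e-4, days=10, N=2000, seed=42)
win_probability = sum(s > 0 for s in samples) / len(samples)
```

The fraction of simulated final margins above zero gives a Monte Carlo estimate of the win probability.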
Improved Kalman Model
Improved Kalman Diffusion Model
Key improvements over basic Kalman:
- Increased minimum diffusion variance
- Better regularized pollster biases
- Smaller forecast horizon uncertainty
- More conservative probability clipping
- class src.models.improved_kalman.ImprovedKalmanModel(seed=None)[source]
Bases: ElectionForecastModel
Improved Kalman filter diffusion model
- __init__(seed=None)[source]
Initialize the model
- Parameters:
seed – Random seed for reproducibility (default: None for non-deterministic)
- kalman_filter_rts(dates, observations, obs_variance, mu, sigma2)[source]
Kalman filter + RTS smoother
- fit_and_forecast(state_polls, forecast_date, election_date, actual_margin, rng=None)[source]
Fit improved Kalman diffusion and forecast
- simulate_forward(x_start, P_start, mu, sigma2, days, N=2000, rng=None)[source]
Simulate forward with Euler-Maruyama
- Parameters:
x_start – Initial state estimate
P_start – Initial state variance
mu – Drift parameter
sigma2 – Diffusion variance
days – Number of days to simulate forward
N – Number of simulation samples
rng – NumPy random generator (default: None uses default_rng)
- Returns:
Array of final margin values (length N)
Hierarchical Bayes Model
Hierarchical Bayesian Ensemble with Systematic Bias Adjustment (HBE-SBA)
Combines:
1. Fundamentals prior from historical results
2. Kalman-filtered polls with house effects
3. Adaptive systematic bias correction
4. Proper uncertainty quantification
- class src.models.hierarchical_bayes.HierarchicalBayesModel(seed=None)[source]
Bases: ElectionForecastModel
Hierarchical Bayesian ensemble with bias correction
- __init__(seed=None)[source]
Initialize the model
- Parameters:
seed – Random seed for reproducibility (default: None for non-deterministic)
- estimate_house_effects(all_polls, lambda_shrink=10)[source]
Estimate pollster house effects with hierarchical shrinkage
- Parameters:
all_polls – DataFrame of all polling data
lambda_shrink – Shrinkage parameter (higher = more shrinkage to zero)
- Returns:
dict mapping pollster name to estimated house effect
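One common way to shrink house effects toward zero is to divide each pollster's summed residual by its poll count plus the shrinkage parameter. Whether estimate_house_effects uses exactly this formula is not stated here, so the sketch below is an assumption. The helper name `shrunken_house_effects` and the input format, (pollster, residual) pairs, are hypothetical.

```python
from collections import defaultdict


def shrunken_house_effects(residuals, lambda_shrink=10):
    """Ridge-style shrinkage of per-pollster mean residuals (sketch only).

    residuals: iterable of (pollster, residual) pairs, where residual is
    the poll margin minus some consensus estimate (assumed input format).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pollster, residual in residuals:
        sums[pollster] += residual
        counts[pollster] += 1
    # effect = sum / (n + lambda): few polls means strong pull toward zero
    return {p: sums[p] / (counts[p] + lambda_shrink) for p in sums}


# Illustrative data: both pollsters lean 0.02, but B has only one poll
data = [("A", 0.02)] * 10 + [("B", 0.02)]
effects = shrunken_house_effects(data, lambda_shrink=10)
```

With these inputs, pollster A keeps half of its raw lean while pollster B, with a single poll, keeps about one eleventh, which shows how a higher lambda_shrink or a smaller sample increases shrinkage.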
- kalman_filter_rts(dates, observations, obs_variance, mu, sigma2)[source]
Kalman filter with Rauch-Tung-Striebel (RTS) backward smoother
- Parameters:
dates – Array of time points (in days)
observations – Array of poll margins
obs_variance – Array of observation variances
mu – Drift parameter
sigma2 – Diffusion variance
- Returns:
smoothed state estimates and variances
- Return type:
tuple of (x_smooth, P_smooth)
Utilities
Data Utilities
Shared data loading and preprocessing utilities
- src.utils.data_utils.load_polling_data() DataFrame[source]
Load and preprocess 2016 polling data from FiveThirtyEight
- Returns:
DataFrame with columns: middate, dem, rep, margin, dem_proportion, samplesize, pollster, state_code
- src.utils.data_utils.load_election_results() Dict[str, float][source]
Load actual 2016 election results from MIT Election Lab
- Returns:
dict mapping state code to actual Democratic margin
- src.utils.data_utils.load_fundamentals() Dict[str, Dict[str, float]][source]
Load historical election results for fundamentals prior
Computes weighted average of 2012 (70%) and 2008 (30%) results
- Returns:
dict mapping state code to fundamentals dict with keys: margin, margin_2012, margin_2008
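The 70/30 weighting described above amounts to the following arithmetic. The helper name `fundamentals_prior` and the input margins are hypothetical, and the numbers are illustrative rather than actual election results.

```python
def fundamentals_prior(margin_2012, margin_2008, weight_2012=0.7):
    """Blend two past margins into one prior (sketch of the 70/30 weighting)."""
    margin = weight_2012 * margin_2012 + (1.0 - weight_2012) * margin_2008
    return {
        "margin": margin,
        "margin_2012": margin_2012,
        "margin_2008": margin_2008,
    }


# Illustrative margins only: 0.7 * 0.10 + 0.3 * 0.20 = 0.13
prior = fundamentals_prior(margin_2012=0.10, margin_2008=0.20)
```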
- src.utils.data_utils.get_state_list(polls: DataFrame, actual_results: Dict[str, float]) List[str][source]
Get list of states with sufficient polling data
- Parameters:
polls – DataFrame of polling data
actual_results – dict of actual election results
- Returns:
list of state codes
- src.utils.data_utils.compute_metrics(predictions_df: DataFrame) DataFrame[source]
Compute evaluation metrics from predictions
- Parameters:
predictions_df – DataFrame with columns: forecast_date, win_probability, predicted_margin, actual_margin
- Returns:
DataFrame with columns: forecast_date, n_states, brier_score, log_loss, mae_margin
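The three metrics can be sketched for a single forecast date as follows. This assumes the binary outcome is "actual_margin > 0" and that probabilities are clipped before taking logarithms; the reference does not state either detail. The name `metrics_sketch` is hypothetical, and a list of dicts stands in for the DataFrame rows.

```python
from math import log


def metrics_sketch(rows, eps=1e-15):
    """Brier score, log loss, and margin MAE over a list of row dicts."""
    n = len(rows)
    brier = loss = mae = 0.0
    for r in rows:
        outcome = 1.0 if r["actual_margin"] > 0 else 0.0  # assumed win rule
        p = min(max(r["win_probability"], eps), 1.0 - eps)  # avoid log(0)
        brier += (r["win_probability"] - outcome) ** 2
        loss -= outcome * log(p) + (1.0 - outcome) * log(1.0 - p)
        mae += abs(r["predicted_margin"] - r["actual_margin"])
    return {
        "n_states": n,
        "brier_score": brier / n,
        "log_loss": loss / n,
        "mae_margin": mae / n,
    }


# Illustrative rows only, not real forecasts
rows = [
    {"win_probability": 0.8, "predicted_margin": 0.03, "actual_margin": 0.01},
    {"win_probability": 0.3, "predicted_margin": -0.02, "actual_margin": -0.05},
]
m = metrics_sketch(rows)
```

Lower values are better for all three metrics.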
Scripts
Run All Models
- src.scripts.run_all_models.discover_models()[source]
Auto-discover all model classes using importlib.resources
- Returns:
List of tuples (model_class_name, model_class) sorted by name
- src.scripts.run_all_models.generate_forecast_dates(n_dates, election_date='2016-11-08', start_date='2016-09-01')[source]
Generate n evenly-spaced forecast dates between start_date and election_date
- Parameters:
n_dates – Number of forecast dates to generate
election_date – Election day
start_date – Earliest date to start forecasting from
- Returns:
List of pd.Timestamp forecast dates
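Evenly spaced dates can be generated along these lines. The sketch uses the standard library `datetime.date` instead of pd.Timestamp and assumes both endpoints are included; the actual function may treat the endpoints differently. The name `evenly_spaced_dates` is hypothetical.

```python
from datetime import date


def evenly_spaced_dates(n_dates, election_date="2016-11-08", start_date="2016-09-01"):
    """Return n_dates dates from start_date to election_date (sketch only)."""
    if n_dates < 1:
        raise ValueError("n_dates must be at least 1")
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(election_date)
    if n_dates == 1:
        return [end]
    # timedelta / int keeps fractional days; date + timedelta uses whole days
    step = (end - start) / (n_dates - 1)
    return [start + step * i for i in range(n_dates)]


dates = evenly_spaced_dates(5)
```

With the default arguments the span is 68 days, so five dates fall 17 days apart.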
Compare Models
Compare all forecasting models
Generates comparison tables, rankings, and plots for all models
Generate Plots
Generate state-level plots for all models
- Usage:
election-plot                        # Default: plot key swing states
election-plot --all                  # Plot all states with sufficient data
election-plot --states FL PA MI WI   # Plot specific states
Run All Pipeline
- src.scripts.run_all.run_with_temp_argv(argv, func)[source]
Temporarily override sys.argv to call a subcommand.
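A temporary sys.argv override is usually written with try/finally so the original arguments are restored even if the subcommand raises. The sketch below shows that pattern. Whether the real function returns the callee's result or copies the argument list is an assumption, and the name `run_with_temp_argv_sketch` is hypothetical.

```python
import sys


def run_with_temp_argv_sketch(argv, func):
    """Call func() with sys.argv replaced, then restore it (sketch only)."""
    saved = sys.argv
    sys.argv = list(argv)  # copy so the caller's list is not shared
    try:
        return func()
    finally:
        # Runs on success and on exceptions alike
        sys.argv = saved


original = sys.argv
seen = run_with_temp_argv_sketch(
    ["election-plot", "--all"],
    lambda: list(sys.argv),  # stand-in for a subcommand's main()
)
```

This lets a pipeline script invoke argparse-based entry points in-process without spawning a subprocess.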