laser.cholera package

laser.cholera.compute(args)

Subpackages

laser.cholera.metapop package

Submodules

laser.cholera.calc_log_likelihood_distributions module

Log-likelihood functions for Beta, Binomial, Gamma, NegBin, Normal, and Poisson.

Translated from calc_log_likelihood_distributions.R. Each function:

- Removes NaN/non-finite entries across observed, estimated, and weights before computing.
- Defaults all weights to 1 when not supplied.
- Logs diagnostics at INFO level when verbose=True.

R mapping notes:

R var(x) / sd(x) use ddof=1; mapped to np.var(x, ddof=1) / np.std(x, ddof=1).
R dbeta/dbinom/dgamma/dnorm/dpois(x, …, log=TRUE) map to scipy.stats.*.logpdf / scipy.stats.*.logpmf.
R dnbinom(x, size=k, mu=mu, log=TRUE) maps to scipy.stats.nbinom.logpmf(x, n=k, p=k/(k+mu)).
R shapiro.test(x) maps to scipy.stats.shapiro(x).
R NA_real_ maps to float("nan").
R message(…) maps to logger.info(…); R warning(…) maps to logger.warning(…).
R is.na(x) check uses ~np.isfinite(x) (catches both NaN and Inf).
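The shared preprocessing described above (drop positions that are non-finite in any input, default weights to ones) can be sketched in plain Python. This is an illustration only: the helper name drop_non_finite is hypothetical, and the module itself is described as using NumPy masks rather than list comprehensions.

```python
import math


def drop_non_finite(observed, estimated, weights=None):
    """Keep only positions where observed, estimated, and weight are all finite.

    Hypothetical helper for illustration; not part of laser.cholera.
    """
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated must have the same length")
    if weights is None:
        # Default: every observation counts equally.
        weights = [1.0] * len(observed)
    kept = [
        (o, e, w)
        for o, e, w in zip(observed, estimated, weights)
        if math.isfinite(o) and math.isfinite(e) and math.isfinite(w)
    ]
    if not kept:
        return [], [], []
    obs, est, wts = (list(col) for col in zip(*kept))
    return obs, est, wts


# Position 1 has a NaN observation, position 2 an infinite estimate.
obs, est, wts = drop_non_finite(
    [1.0, float("nan"), 3.0, 4.0],
    [1.1, 2.0, float("inf"), 3.9],
)
```

Filtering on finiteness rather than on NaN alone mirrors the note that ~np.isfinite(x) catches both NaN and Inf.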

laser.cholera.calc_log_likelihood_distributions.calc_log_likelihood_beta(observed: ndarray, estimated: ndarray, mean_precision: bool = True, weights: ndarray | None = None, verbose: bool = True) → float

Calculate log-likelihood for Beta-distributed proportions.

Computes the total log-likelihood for proportion data under the Beta distribution. Supports either the mean-precision parameterization (default) or the standard shape parameterization. Shape parameters are estimated from the data via method of moments.

Parameters:

observed – Observed values strictly in (0, 1).
estimated – Model-predicted values strictly in (0, 1).
mean_precision – If True (default), use mean-precision parameterization where phi = mu*(1-mu)/Var(residuals) - 1 and shape_1 = estimated*phi, shape_2 = (1-estimated)*phi. If False, estimate shape parameters directly from the observed vector.
weights – Non-negative weights, same length as observed. Defaults to ones.
verbose – If True, logs shape parameter estimates and total log-likelihood.

Returns:

Scalar log-likelihood. Returns float("nan") if all inputs are non-finite.

Raises:

ValueError – If lengths of observed and estimated do not match, any weights are negative, all weights are zero, observed or estimated values fall outside (0, 1), residual variance is non-positive, phi is non-positive, or estimated shape parameters are non-positive.

Examples

>>> import numpy as np
>>> from laser.cholera.calc_log_likelihood_distributions import calc_log_likelihood_beta
>>> calc_log_likelihood_beta(
...     np.array([0.2, 0.6, 0.4]), np.array([0.25, 0.55, 0.35]),
...     verbose=False,
... )
4.770704709814893

laser.cholera.calc_log_likelihood_distributions.calc_log_likelihood_binomial(observed: ndarray, estimated: ndarray, trials: ndarray, weights: ndarray | None = None, verbose: bool = True) → float

Calculate log-likelihood for Binomial-distributed count data.

Computes the total weighted log-likelihood for integer counts of successes under the Binomial distribution.

Parameters:

observed – Integer counts of successes (non-negative, <= trials).
estimated – Expected success probabilities in (0, 1), same length as observed.
trials – Total trial counts (positive integers), same length as observed.
weights – Non-negative weights, same length as observed. Defaults to ones.
verbose – If True, logs total log-likelihood.

Returns:

Scalar log-likelihood. Returns float("nan") if all inputs are non-finite.

Raises:

ValueError – If lengths of observed and estimated do not match, any weights are negative, all weights are zero, observed values are not integer counts in [0, trials], trials are not positive integers, or estimated probabilities are not in (0, 1).

Examples

>>> import numpy as np
>>> from laser.cholera.calc_log_likelihood_distributions import calc_log_likelihood_binomial
>>> calc_log_likelihood_binomial(
...     np.array([3, 4, 2]), np.array([0.3, 0.5, 0.25]), np.array([10, 10, 8]),
...     verbose=False,
... )
-4.071992199424135

laser.cholera.calc_log_likelihood_distributions.calc_log_likelihood_gamma(observed: ndarray, estimated: ndarray, weights: ndarray | None = None, verbose: bool = True) → float

Calculate log-likelihood for Gamma-distributed positive continuous data.

The shape parameter alpha is estimated via method of moments from the observed values (alpha = mean^2 / var). The scale parameter is per-observation: scale_i = estimated_i / alpha.

Parameters:

observed – Positive observed values.
estimated – Positive expected means from the model, same length as observed.
weights – Non-negative weights, same length as observed. Defaults to ones.
verbose – If True, logs estimated shape and total log-likelihood.

Returns:

Scalar log-likelihood. Returns float("nan") if all inputs are non-finite.

Raises:

ValueError – If lengths of observed and estimated do not match, any weights are negative, all weights are zero, or any observed or estimated values are non-positive.

Examples

>>> import numpy as np
>>> from laser.cholera.calc_log_likelihood_distributions import calc_log_likelihood_gamma
>>> calc_log_likelihood_gamma(
...     np.array([2.5, 3.2, 1.8]), np.array([2.4, 3.0, 2.0]),
...     verbose=False,
... )
-1.731035287031648
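The method-of-moments shape and per-observation scale described for this function can be written out by hand. The sketch below is unweighted, skips validation, and uses only the standard library; gamma_loglik is a hypothetical name, not the package function. With the inputs from the example it should come out near -1.731.

```python
import math
from statistics import mean, variance  # variance() divides by n - 1, like R's var()


def gamma_loglik(observed, estimated):
    """Unweighted Gamma log-likelihood with a method-of-moments shape.

    Illustrative sketch only; assumes all inputs are positive and finite.
    """
    # Shape from the observed values: alpha = mean^2 / var.
    alpha = mean(observed) ** 2 / variance(observed)
    total = 0.0
    for x, mu in zip(observed, estimated):
        scale = mu / alpha  # per-observation scale so that E[X_i] = mu
        total += (
            (alpha - 1.0) * math.log(x)
            - x / scale
            - alpha * math.log(scale)
            - math.lgamma(alpha)
        )
    return total


ll = gamma_loglik([2.5, 3.2, 1.8], [2.4, 3.0, 2.0])
```

Because the shape comes from the observed vector alone, changing the model estimates only moves the per-observation scales.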

laser.cholera.calc_log_likelihood_distributions.calc_log_likelihood_negbin(observed: ndarray, estimated: ndarray, k: float | None = None, k_min: float = 3, weights: ndarray | None = None, verbose: bool = True) → float

Calculate log-likelihood for Negative Binomial-distributed count data.

Computes the total weighted log-likelihood for count data under the Negative Binomial distribution. When estimated <= 0 and observed > 0, a proportional penalty (-observed * log(1e6)) is applied instead of -Inf.

Parameters:

observed – Non-negative integer counts (rounded internally for float safety).
estimated – Expected means from the model, same length as observed.
k – NB dispersion (size) parameter. If None, estimated via method of moments as mu^2 / (s^2 - mu); falls back to Inf (Poisson) when s^2 <= mu.
k_min – Minimum dispersion floor applied when k is finite. Defaults to 3. Pass 0 to disable flooring.
weights – Non-negative weights, same length as observed. Defaults to ones.
verbose – If True, logs k estimation details and total log-likelihood.

Returns:

Scalar log-likelihood. Returns float("nan") if all inputs are non-finite.

Raises:

ValueError – If lengths of observed and estimated do not match, any weights are negative, all weights are zero, or any observed values are negative after rounding.

Examples

>>> import numpy as np
>>> from laser.cholera.calc_log_likelihood_distributions import calc_log_likelihood_negbin
>>> calc_log_likelihood_negbin(np.array([0, 5, 9]), np.array([3, 4, 5]),
...                            verbose=False)
-7.540078861809464
>>> calc_log_likelihood_negbin(np.array([0, 5, 9]), np.array([3, 4, 5]),
...                            k=1.2, verbose=False)
-7.540078861809464
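Both calls above return the same value, which is consistent with the k_min floor: the method-of-moments estimate for these counts is about 1.39, and the explicit k=1.2 is also below 3, so both appear to be raised to 3. A minimal standard-library sketch of that logic follows. It is unweighted, omits the penalty rule for non-positive estimates, and the names nb_logpmf, poisson_logpmf, and negbin_loglik are hypothetical.

```python
import math
from statistics import mean, variance


def nb_logpmf(x, k, mu):
    """NB log-PMF in R's (size=k, mu) form, using p = k / (k + mu)."""
    p = k / (k + mu)
    return (
        math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
        + k * math.log(p) + x * math.log1p(-p)
    )


def poisson_logpmf(x, mu):
    """Poisson log-PMF, used when k is infinite."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def negbin_loglik(observed, estimated, k=None, k_min=3.0):
    """Unweighted NB log-likelihood with a k_min floor (illustrative sketch)."""
    if k is None:
        # Method of moments: k = mu^2 / (s^2 - mu), Poisson fallback if s^2 <= mu.
        m, s2 = mean(observed), variance(observed)
        k = m * m / (s2 - m) if s2 > m else math.inf
    if math.isfinite(k):
        k = max(k, k_min)  # the floor only applies to finite k
    if math.isinf(k):
        return sum(poisson_logpmf(x, mu) for x, mu in zip(observed, estimated))
    return sum(nb_logpmf(x, k, mu) for x, mu in zip(observed, estimated))


ll_default = negbin_loglik([0, 5, 9], [3, 4, 5])
ll_fixed_k = negbin_loglik([0, 5, 9], [3, 4, 5], k=1.2)
```

Passing k_min=0 in this sketch leaves a finite k untouched, which matches the documented way to disable flooring.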

laser.cholera.calc_log_likelihood_distributions.calc_log_likelihood_normal(observed: ndarray, estimated: ndarray, weights: ndarray | None = None, verbose: bool = True) → float

Calculate log-likelihood for Normally-distributed continuous data.

The residual standard deviation sigma is estimated from residuals (observed - estimated) using ddof=1. A Shapiro-Wilk normality test is run when n <= 5000, with a warning logged when p < 0.05.

Parameters:

observed – Continuous observed values.
estimated – Model-predicted means, same length as observed.
weights – Non-negative weights, same length as observed. Defaults to ones.
verbose – If True, logs estimated sigma, Shapiro-Wilk p-value, and total log-likelihood.

Returns:

Scalar log-likelihood. Returns float("nan") if all inputs are non-finite.

Raises:

ValueError – If lengths of observed and estimated do not match, fewer than 3 non-missing observations exist, any weights are negative, all weights are zero, or the residual standard deviation is non-positive.

Examples

>>> import numpy as np
>>> from laser.cholera.calc_log_likelihood_distributions import calc_log_likelihood_normal
>>> ll = calc_log_likelihood_normal(
...     np.array([1.2, 2.8, 3.1]), np.array([1.0, 3.0, 3.2]),
...     verbose=False,
... )

laser.cholera.calc_log_likelihood_distributions.calc_log_likelihood_poisson(observed: ndarray, estimated: ndarray, weights: ndarray | None = None, zero_buffer: bool = True, verbose: bool = True) → float

Calculate log-likelihood for Poisson-distributed count data.

When estimated <= 0 and observed > 0, a proportional penalty (-observed * log(1e6)) is applied. When zero_buffer=True, observed values are rounded and estimated values are floored to 1e-10.

Parameters:

observed – Non-negative integer counts.
estimated – Expected values from the model, same length as observed.
weights – Non-negative weights, same length as observed. Defaults to ones.
zero_buffer – If True (default), rounds observed to integers and floors estimated to 1e-10. If False, enforces strict integer requirements.
verbose – If True, logs overdispersion warnings and total log-likelihood.

Returns:

Scalar log-likelihood. Returns float("nan") if all inputs are non-finite.

Raises:

ValueError – If lengths of observed and estimated do not match, any weights are negative, all weights are zero, or (when zero_buffer=False) observed contains non-integer or negative values.

Examples

>>> import numpy as np
>>> from laser.cholera.calc_log_likelihood_distributions import calc_log_likelihood_poisson
>>> calc_log_likelihood_poisson(
...     np.array([2, 3, 4]), np.array([2.2, 2.9, 4.1]),
...     verbose=False,
... )
-4.447965653589073
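The weighted Poisson sum and the proportional penalty can be sketched without SciPy. The function name poisson_loglik is hypothetical, rounding and the 1e-10 floor from zero_buffer are left out, and treating a zero count against a non-positive mean as contributing 0 is an assumption of this sketch rather than documented behaviour.

```python
import math

PENALTY_PER_COUNT = math.log(1e6)  # penalty scale from the description above


def poisson_loglik(observed, estimated, weights=None):
    """Weighted Poisson log-likelihood with a finite penalty (illustrative sketch)."""
    if weights is None:
        weights = [1.0] * len(observed)
    total = 0.0
    for x, lam, w in zip(observed, estimated, weights):
        if lam <= 0:
            # Documented rule: a positive count against a non-positive mean gets
            # -observed * log(1e6) instead of -Inf.
            # Assumption: a zero count in that case contributes nothing.
            term = -x * PENALTY_PER_COUNT if x > 0 else 0.0
        else:
            term = x * math.log(lam) - lam - math.lgamma(x + 1)
        total += w * term
    return total


ll = poisson_loglik([2, 3, 4], [2.2, 2.9, 4.1])
penalised = poisson_loglik([2], [0.0])
```

The penalty keeps the total finite, so a single impossible observation lowers the score in proportion to its count instead of collapsing the whole likelihood to -Inf.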

laser.cholera.calc_model_likelihood module

Likelihood functions for scoring cholera model fits against observed data.

Translated from calc_model_likelihood.R. Scores model fits using Negative Binomial (NB) time-series log-likelihood per location and outcome (cases, deaths) with a weighted method-of-moments dispersion estimate and a k_min floor.

Optional shape terms are enabled by setting their weight > 0: peak timing (Normal), peak magnitude (log-Normal with adaptive sigma), cumulative progression (NB at cumulative fractions), and Weighted Interval Score (WIS). All weights default to 0.

Shape terms are internally T-normalized so that weight parameters share a common scale: weight=0.25 means the term contributes roughly 25% as much as the NB core.

The peak shape terms in the main function take an explicit epidemic_peaks DataFrame argument (columns iso_code, peak_date, loc_idx) together with date_start / date_stop. Peak rows whose peak_date falls outside [date_start, date_stop] are dropped before index assignment, so out-of-window calendar peaks contribute nothing rather than getting clamped to the simulation endpoints. The legacy helpers calc_multi_peak_timing_ll / calc_multi_peak_magnitude_ll accept the same DataFrame via their epidemic_peaks argument, apply the same in-window filter, and dispatch by iso_code; they do not require the loc_idx column.

Key design decisions:

Indexing: R uses 1-based indices; Python uses 0-based. All peak_indices stored and passed as 0-based. Window slices use [w_start:w_end] with w_end = peak_idx + 15 (exclusive) to match R’s (peak_idx-14):(peak_idx+14) inclusive range.

NB distribution: R’s dnbinom(x, size=k, mu=mu) → scipy.stats.nbinom.logpmf(x, n=k, p=k/(k+mu)). The p conversion is applied wherever NB distributions are evaluated.

MOSAIC::calc_log_likelihood: Implemented locally as _calc_log_likelihood_nb since it’s not in the provided R source.

MOSAIC::epidemic_peaks: Replaced with an explicit pandas DataFrame argument, epidemic_peaks, in both the main function and the legacy helpers.
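The indexing note can be checked with a small sketch: a 0-based slice [peak_idx - 14 : peak_idx + 15] covers the same 29 elements as R's inclusive (peak_idx-14):(peak_idx+14) around the corresponding 1-based peak. The helper name peak_window and the clamping at the array ends are assumptions made for illustration.

```python
def peak_window(series, peak_idx, half_width=14):
    """Return the ±half_width window around a 0-based peak index.

    Hypothetical helper; clamping to the series bounds is an assumption.
    """
    w_start = max(0, peak_idx - half_width)
    # Exclusive upper bound: +1 converts R's inclusive end to a Python slice end.
    w_end = min(len(series), peak_idx + half_width + 1)
    return series[w_start:w_end]


series = list(range(100))            # value == 0-based position
window = peak_window(series, 40)     # R equivalent: peak 41, rows 27:55
edge_window = peak_window(series, 5)  # window truncated at the start
```

A full window holds 2 * 14 + 1 = 29 steps; windows near either end of the series are shorter under the clamping assumed here.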

laser.cholera.calc_model_likelihood.calc_model_likelihood(obs_cases: ndarray, est_cases: ndarray, obs_deaths: ndarray, est_deaths: ndarray, weight_cases: float = 1.0, weight_deaths: float = 1.0, weights_location: ndarray | None = None, weights_time: ndarray | None = None, weight_peak_timing: float = 0, weight_peak_magnitude: float = 0, weight_cumulative_total: float = 0, weight_wis: float = 0, sigma_peak_time: float = 1, sigma_peak_log: float = 0.5, epidemic_peaks: DataFrame | None = None, date_start: datetime | None = None, date_stop: datetime | None = None, wis_quantiles: ndarray = array([0.025, 0.25, 0.5, 0.75, 0.975]), cumulative_timepoints: ndarray = array([0.25, 0.5, 0.75, 1.]), nb_k_min_cases: float = 3, nb_k_min_deaths: float = 3) → float

Compute total model log-likelihood against observed cases and deaths.

Scores model fits using a weighted Negative Binomial time-series log-likelihood per location and outcome. The NB dispersion k is estimated from observed data via weighted method-of-moments with a k_min floor, making it a property of the observation process rather than the model fit.

Optional shape terms (all off by default) are T-normalized so that a weight of 0.25 contributes roughly 25% as much as the NB core:

Peak timing: Normal(0, sigma_peak_time) on the timing offset in weeks.
Peak magnitude: log-Normal with adaptive sigma on the observed/estimated peak ratio.
Cumulative progression: NB on cumulative sums at fractional timepoints.
WIS: Negated Weighted Interval Score using NB quantile functions.

The peak shape terms require an epidemic_peaks DataFrame (with the loc_idx column identifying the simulation row each peak belongs to) and the simulation calendar bounds date_start and date_stop. If any of the three is None (or no peak weights are set), the peak terms are skipped.

Assembly formula per location j:

ll_loc = wc * NB_cases + wd * NB_deaths
       + (N_obs/N_peaks)    * w_pt  * (wc*pt_c  + wd*pt_d)
       + (N_obs/N_peaks)    * w_pm  * (wc*pm_c  + wd*pm_d)
       + (N_obs/N_eval_pts) * w_cum * (wc*cum_c + wd*cum_d)
       + (N_obs/N_quant)    * w_wis * (wc*wis_c + wd*wis_d)
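The assembly formula can be read as a core term plus a loop over optional shape terms, each rescaled by N_obs divided by its own unit count. The sketch below restates that arithmetic; assemble_location_ll and its argument layout are hypothetical, and skipping terms with zero weight or zero units is an assumption of the sketch.

```python
def assemble_location_ll(nb, n_obs, wc=1.0, wd=1.0, shape_terms=()):
    """Combine per-location terms following the assembly formula.

    nb          -- (NB_cases, NB_deaths) core log-likelihoods
    shape_terms -- iterable of (weight, n_units, (term_cases, term_deaths)),
                   where n_units is N_peaks, N_eval_pts, or N_quant
    Hypothetical helper for illustration only.
    """
    total = wc * nb[0] + wd * nb[1]
    for weight, n_units, (term_c, term_d) in shape_terms:
        if weight > 0 and n_units > 0:
            # T-normalization: rescale the shape term to the size of the NB core.
            total += (n_obs / n_units) * weight * (wc * term_c + wd * term_d)
    return total


core_only = assemble_location_ll((-100.0, -40.0), n_obs=52)
with_peak_timing = assemble_location_ll(
    (-100.0, -40.0),
    n_obs=52,
    shape_terms=[(0.25, 2, (-1.0, -2.0))],  # w_pt=0.25, N_peaks=2
)
```

In the second call the peak timing term adds (52 / 2) * 0.25 * (-3.0) = -19.5 to the core value of -140.0.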

Parameters:

obs_cases – Observed case counts, shape (n_locations, n_time_steps).
est_cases – Estimated case counts, shape (n_locations, n_time_steps).
obs_deaths – Observed death counts, shape (n_locations, n_time_steps).
est_deaths – Estimated death counts, shape (n_locations, n_time_steps).
weight_cases – Scalar weight multiplier for all case components. Defaults to 1.
weight_deaths – Scalar weight multiplier for all death components. Defaults to 1.
weights_location – Non-negative location weights, length n_locations. Defaults to ones. Must contain at least one positive entry; an all-zero vector raises ValueError (see Raises).
weights_time – Non-negative time weights, length n_time_steps. Defaults to ones. Must contain at least one positive entry; an all-zero vector raises ValueError (see Raises).
weight_peak_timing – Weight for peak timing term (T-normalized). Defaults to 0.
weight_peak_magnitude – Weight for peak magnitude term (T-normalized). Defaults to 0.
weight_cumulative_total – Weight for cumulative progression term. Defaults to 0.
weight_wis – Weight for WIS term (T-normalized). Defaults to 0.
sigma_peak_time – SD in weeks for the peak timing Normal prior. Defaults to 1.
sigma_peak_log – Base SD on log-scale for peak magnitude prior. Defaults to 0.5.
epidemic_peaks – Optional pandas DataFrame of epidemic peaks with columns iso_code, peak_date, and loc_idx (0-based row index into obs/est arrays). When None, peak shape terms are skipped regardless of their weights.
date_start – Calendar date of time-step 0 (any value pandas can promote to a Timestamp). Required for the peak shape terms. Defaults to None.
date_stop – Calendar date of the final time-step. Used together with date_start to build the daily/weekly index lookup. Defaults to None.
wis_quantiles – Quantile levels for WIS scoring. Defaults to [0.025, 0.25, 0.5, 0.75, 0.975].
cumulative_timepoints – Fractional timepoints for cumulative progression. Defaults to [0.25, 0.5, 0.75, 1.0].
nb_k_min_cases – Minimum NB dispersion floor for cases. Defaults to 3.
nb_k_min_deaths – Minimum NB dispersion floor for deaths. Defaults to 3.

Returns:

Scalar total log-likelihood. Returns -np.inf if the total is non-finite and np.nan if all locations contribute NA (e.g., all have too few observations).

Raises:

ValueError – If any input is not a 2-D array, dimensions are inconsistent, estimated values are negative, weights are negative, or weight vectors sum to zero.

Example

Compute the core NB log-likelihood for a small two-location, four-timestep toy problem with no shape terms:

>>> import numpy as np
>>> from laser.cholera.calc_model_likelihood import calc_model_likelihood
>>> obs_cases = np.array([[5, 8, 12, 7], [3, 6, 9, 4]], dtype=float)
>>> est_cases = np.array([[6, 9, 11, 7], [4, 6, 8, 5]], dtype=float)
>>> obs_deaths = np.zeros_like(obs_cases)
>>> est_deaths = np.zeros_like(est_cases)
>>> ll = calc_model_likelihood(obs_cases, est_cases, obs_deaths, est_deaths)
>>> ll < 0
True

laser.cholera.calc_model_likelihood.calc_multi_peak_magnitude_ll(obs_vec: ndarray, est_vec: ndarray, iso_code: str | None = None, date_start=None, date_stop=None, sigma_peak_log: float = 0.5, epidemic_peaks=None) → float

Compute peak magnitude log-likelihood using epidemic peaks data (legacy interface).

Matches epidemic peak dates to the time series via a date sequence, then scores estimated peak magnitudes within ±14-step windows using an adaptive log-Normal prior. Unlike the main calc_model_likelihood function, this helper dispatches by iso_code against the supplied DataFrame and does not require a loc_idx column.

Parameters:

obs_vec – Observed time series for one location (1-D array).
est_vec – Estimated time series for one location (1-D array).
iso_code – ISO country code used to look up peaks in epidemic_peaks.
date_start – Start date of the time series (string or date-like).
date_stop – End date of the time series (string or date-like).
sigma_peak_log – Base SD on the log scale. Defaults to 0.5.
epidemic_peaks – pandas DataFrame with at least iso_code and peak_date columns. A loc_idx column, if present, is ignored. Returns 0.0 if None.

Returns:

Sum of Normal log-PDFs for log-ratio peak magnitudes. Returns 0.0 if required inputs are missing, no peaks are found, or the date sequence cannot be built.

Raises:

KeyError – If epidemic_peaks is supplied but lacks an iso_code or peak_date column; the error propagates from pandas indexing.

laser.cholera.calc_model_likelihood.calc_multi_peak_timing_ll(obs_vec: ndarray, est_vec: ndarray, iso_code: str | None = None, date_start=None, date_stop=None, sigma_peak_time: float = 1, epidemic_peaks=None) → float

Compute peak timing log-likelihood using epidemic peaks data (legacy interface).

Matches epidemic peak dates to the time series via a date sequence, then scores estimated peak timing within ±14-step windows using a Normal prior. Unlike the main calc_model_likelihood function, this helper dispatches by iso_code against the supplied DataFrame and does not require a loc_idx column.

Parameters:

obs_vec – Observed time series for one location (1-D array).
est_vec – Estimated time series for one location (1-D array).
iso_code – ISO country code used to look up peaks in epidemic_peaks.
date_start – Start date of the time series (string or date-like).
date_stop – End date of the time series (string or date-like).
sigma_peak_time – SD in weeks for the Normal timing prior. Defaults to 1.
epidemic_peaks – pandas DataFrame with at least iso_code and peak_date columns. A loc_idx column, if present, is ignored. Returns 0.0 if None.

Returns:

Sum of Normal log-PDFs for timing offsets. Returns 0.0 if required inputs are missing, no peaks are found, or the date sequence cannot be built.

Raises:

KeyError – If epidemic_peaks is supplied but lacks an iso_code or peak_date column; the error propagates from pandas indexing.

laser.cholera.calc_model_likelihood.compute_wis_parametric_row(y: ndarray, est: ndarray, w_time: ndarray, probs: ndarray, k_use: float) → float

Compute Weighted Interval Score (WIS) for a single time-series row.

Uses NB (or Poisson when k_use is infinite) quantile functions evaluated at each time step to score the observed series against the estimated series. The final score is the weighted average over time of interval scores across all symmetric quantile pairs, plus a median absolute error term.

Parameters:

y – Observed time series (1-D array).
est – Estimated means (1-D array, same length as y).
w_time – Per-timestep weights (non-negative).
probs – Quantile levels. Symmetric pairs (p, 1-p) are matched for interval scoring; the 0.5 quantile is used for the median AE term.
k_use – NB dispersion. Use np.inf to fall back to Poisson.

Returns:

WIS score (lower is better). Returns np.nan if all observations are non-finite or total weight is zero.
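For orientation, the interval-score arithmetic for a single time step can be sketched as follows. It uses the commonly cited WIS definition (interval scores weighted by alpha/2, plus half the median absolute error, divided by the number of intervals plus one half) and takes precomputed quantiles as input instead of evaluating NB quantile functions. The function names are hypothetical, and the package's exact normalization and time weighting may differ.

```python
import math


def interval_score(y, lower, upper, alpha):
    """Interval score of a central (1 - alpha) prediction interval."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score


def wis_single(y, probs, quantiles):
    """WIS for one observation from symmetric quantile pairs plus the median.

    Illustrative sketch; assumes an odd number of levels centred on 0.5.
    """
    pairs = sorted(zip(probs, quantiles))
    n = len(pairs)
    total = 0.0
    for i in range(n // 2):
        (p_lo, q_lo), (p_hi, q_hi) = pairs[i], pairs[n - 1 - i]
        if not math.isclose(p_lo + p_hi, 1.0):
            raise ValueError("quantile levels must form symmetric pairs")
        alpha = 2.0 * p_lo
        total += (alpha / 2.0) * interval_score(y, q_lo, q_hi, alpha)
    _, median = pairs[n // 2]
    total += 0.5 * abs(y - median)  # median absolute error term
    return total / (n // 2 + 0.5)


probs = [0.025, 0.25, 0.5, 0.75, 0.975]
quantiles = [1.0, 3.0, 5.0, 7.0, 9.0]
inside = wis_single(5.0, probs, quantiles)    # observation at the median
outside = wis_single(12.0, probs, quantiles)  # observation above every interval
```

An observation outside the intervals is penalized in proportion to its distance from the nearest bound, so the second score is much larger than the first.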

laser.cholera.calc_model_likelihood.ll_cumulative_progressive_nb(obs_vec: ndarray, est_vec: ndarray, timepoints: ndarray = array([0.25, 0.5, 0.75, 1.]), k_data: float | None = None, weights_time: ndarray | None = None, k_fallback: float = 10.0) → float

Compute cumulative-progression NB log-likelihood at fractional timepoints.

Evaluates the NB log-PMF at cumulative sums of obs/est at each fractional timepoint. The NB size is scaled proportionally to the number of summed timesteps (k * end_idx), reflecting variance scaling of summed NB variables. Each timepoint contribution is normalized by end_idx to yield a per-observation LL, making it scale-compatible with other shape components at assembly.

Parameters:

obs_vec – Observed count time series (1-D array).
est_vec – Estimated count time series (1-D array).
timepoints – Fractional timepoints at which cumulative sums are evaluated. Defaults to [0.25, 0.5, 0.75, 1.0].
k_data – Data-driven NB dispersion from nb_size_from_obs_weighted. If None or non-finite, falls back to k_fallback.
weights_time – Retained for API compatibility with the R version; not used in the cumulative sum computation.
k_fallback – Fallback k when k_data is unavailable. Defaults to 10.0.

Returns:

Mean per-observation LL across valid timepoints. Returns 0.0 if no valid timepoints exist.
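A rough sketch of the cumulative-progression idea follows: sum both series up to each fractional timepoint, score the observed sum with an NB whose size is k * end_idx, divide by end_idx, and average. How end_idx is derived from a fraction (ceil here), the rounding of the observed sum, and the handling of non-positive cumulative estimates are assumptions; the function names are hypothetical.

```python
import math


def nb_logpmf(x, k, mu):
    """NB log-PMF in (size=k, mu) form, with p = k / (k + mu)."""
    p = k / (k + mu)
    return (
        math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
        + k * math.log(p) + x * math.log1p(-p)
    )


def cumulative_progress_ll(obs, est, timepoints=(0.25, 0.5, 0.75, 1.0), k=10.0):
    """Mean per-observation NB log-likelihood of cumulative sums (sketch)."""
    n_steps = len(obs)
    contributions = []
    for frac in timepoints:
        # Assumption: the fraction maps to an index by rounding up.
        end_idx = max(1, min(n_steps, math.ceil(frac * n_steps)))
        cum_obs = sum(obs[:end_idx])
        cum_est = sum(est[:end_idx])
        if cum_est <= 0:
            continue  # assumption: skip timepoints with no expected counts
        # Size scales with the number of summed steps; dividing by end_idx
        # puts the contribution on a per-observation scale.
        ll = nb_logpmf(round(cum_obs), k * end_idx, cum_est)
        contributions.append(ll / end_idx)
    return sum(contributions) / len(contributions) if contributions else 0.0


ll = cumulative_progress_ll([1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 5, 5, 6, 6, 8])
empty = cumulative_progress_ll([0, 0], [0, 0])
```

Scaling the size by end_idx reflects that a sum of end_idx independent NB variables with a common p has size k * end_idx.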

laser.cholera.calc_model_likelihood.mask_weights(w: ndarray, obs_vec: ndarray, est_vec: ndarray | None = None) → ndarray

Zero out weights where observations or estimates are non-finite.

Parameters:

w – Weight vector, same length as obs_vec.
obs_vec – Observed values.
est_vec – Optional estimated values; non-finite entries also zero out weights.

Returns:

Copy of w with weights zeroed where obs_vec (or est_vec) is non-finite.
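The masking behaviour is simple enough to sketch with lists. The name mask_weights_sketch is hypothetical, and the real function operates on NumPy arrays.

```python
import math


def mask_weights_sketch(w, obs_vec, est_vec=None):
    """Return a copy of w with zeros wherever obs (or est) is non-finite."""
    masked = list(w)  # copy, so the caller's weights are left untouched
    for i, obs in enumerate(obs_vec):
        bad_obs = not math.isfinite(obs)
        bad_est = est_vec is not None and not math.isfinite(est_vec[i])
        if bad_obs or bad_est:
            masked[i] = 0.0
    return masked


weights = [1.0, 2.0, 1.0, 0.5]
masked = mask_weights_sketch(
    weights,
    [3.0, float("nan"), 4.0, 2.0],
    [2.5, 3.0, float("inf"), 2.0],
)
```

Zeroing the weight rather than dropping the entry keeps every vector aligned with the original time axis.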

laser.cholera.calc_model_likelihood.nb_size_from_obs_weighted(x: ndarray, w: ndarray, k_min: float = 3, k_max: float = 100000.0) → float

Estimate NB dispersion (size) from observed data via weighted method-of-moments.

Uses Bessel-corrected weighted variance (V1² / (V1² − V2) normalisation, where V1 = Σw and V2 = Σw²) to avoid underestimating variance with small or unequal-weight samples.

Parameters:

x – Observed count time series (1-D array).
w – Non-negative weights, same length as x.
k_min – Minimum dispersion floor. Defaults to 3.
k_max – Maximum dispersion cap. Defaults to 1e5.

Returns:

Estimated NB size parameter k, clipped to [k_min, k_max]. Returns np.inf if fewer than two finite, positive-weight observations exist or if the variance does not exceed the mean (Poisson / sub-Poisson regime).
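The weighted method-of-moments estimate can be sketched as below, assuming the Bessel-style correction multiplies the weighted population variance by V1² / (V1² − V2). With unit weights this reduces to the usual n - 1 denominator. The name nb_size_weighted is hypothetical.

```python
import math


def nb_size_weighted(x, w, k_min=3.0, k_max=1e5):
    """Weighted method-of-moments NB size, clipped to [k_min, k_max] (sketch)."""
    pairs = [
        (xi, wi)
        for xi, wi in zip(x, w)
        if math.isfinite(xi) and math.isfinite(wi) and wi > 0
    ]
    if len(pairs) < 2:
        return math.inf
    v1 = sum(wi for _, wi in pairs)
    v2 = sum(wi * wi for _, wi in pairs)
    denom = v1 * v1 - v2
    if denom <= 0:
        return math.inf
    m = sum(wi * xi for xi, wi in pairs) / v1
    biased_var = sum(wi * (xi - m) ** 2 for xi, wi in pairs) / v1
    var = biased_var * v1 * v1 / denom  # Bessel-style correction
    if var <= m:
        return math.inf  # Poisson or sub-Poisson regime
    k = m * m / (var - m)
    return min(max(k, k_min), k_max)


k_mid = nb_size_weighted([6, 10, 14, 4, 16], [1.0] * 5)  # mean 10, variance 26
k_floored = nb_size_weighted([0, 5, 9], [1.0] * 3)       # raw estimate below k_min
k_poisson = nb_size_weighted([3, 3, 3], [1.0] * 3)       # zero variance
```

For the first series the estimate is 10² / (26 − 10) = 6.25, which lies inside the clipping range and is returned unchanged.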

laser.cholera.cli module

Module that contains the command line app.

Why does this file exist, and why not put this in __main__?

You might be tempted to import things from __main__ later, but that will cause problems: the code will get executed twice:

When you run python -mlaser_cholera, Python will execute __main__.py as a script. That means there will not be any laser_cholera.__main__ in sys.modules.

When you import __main__, it will get executed again (as a module) because there's no laser_cholera.__main__ in sys.modules.

Also see (1) from https://click.palletsprojects.com/en/stable/setuptools/

laser.cholera.core module

laser.cholera.core.compute(args)

laser.cholera.iso_codes module

laser.cholera.likelihood module

laser.cholera.likelihood.calc_log_likelihood(observed, simulated, family, weights=None, **kwargs)

Calculate the log-likelihood of the observed data given the simulated data.

Parameters:

observed (np.ndarray) – Observed data.
simulated (np.ndarray) – Simulated data.
family (str) – The family of the distribution (e.g., "poisson", "negbin").
weights (np.ndarray, optional) – Weights for the data. If None, all weights are set to 1.
**kwargs (dict) – Additional arguments for the likelihood calculation.

Returns: