Fitting a Model¶

The `fit()` Pipeline¶

Calling w.fit() runs three steps in sequence:

1. Cross-validation: a leave-one-out jackknife that adds yHat, SE, and ConcHat to w.sample
2. Surface estimation: fits the censored regression at every grid point
3. Daily estimation: interpolates the surfaces onto the daily record and flow-normalises

```python
w = WRTDS(daily, sample)
w.fit()
```

All three steps use the same window parameters, which can be customised:

```python
w.fit(
    window_y=7.0,      # Time half-window in years
    window_q=2.0,      # Discharge half-window in log units
    window_s=0.5,      # Season half-window in years
    min_num_obs=100,   # Minimum total observations per regression
    min_num_uncen=50,  # Minimum uncensored observations per regression
    edge_adjust=True,  # Double time window near record boundaries
)
```

Window Parameters¶

Time Window (`window_y`)¶

Controls how quickly the weight falls off for observations from other years. With the default half-window of 7 years, observations 7 or more years away from the target date get zero weight. Larger values produce smoother trends.
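To illustrate how a half-window turns a time distance into a weight, here is a minimal sketch that assumes the tricube kernel described in the WRTDS literature. The function name `tricube_weight` and the sample distances are illustrative only and are not part of this package's API.

```python
import numpy as np

def tricube_weight(distance, half_window):
    """Tricube kernel: 1 at zero distance, falling to 0 at the half-window."""
    scaled = np.abs(np.asarray(distance, dtype=float)) / half_window
    # Outside the half-window the weight is exactly zero
    return np.where(scaled < 1.0, (1.0 - scaled**3) ** 3, 0.0)

# Years between each observation and the target date (illustrative values)
years_away = np.array([0.0, 3.5, 7.0, 10.0])
weights = tricube_weight(years_away, half_window=7.0)
```

With window_y=7.0, the observation 3.5 years away keeps roughly two thirds of its weight, while those at 7 and 10 years contribute nothing.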

Discharge Window (`window_q`)¶

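Controls sensitivity to flow conditions. With the default half-window of 2 log units, observations whose log discharge differs from the target's by 2 or more get zero weight. Smaller values restrict each regression to flows closer to the target, which makes the model more flow-sensitive.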
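A short worked example may help make "log units" concrete. It assumes natural logarithms of discharge, which is the usual WRTDS convention but is not stated on this page. The helper name and flow values are hypothetical.

```python
import math

def log_discharge_distance(q_obs, q_target):
    """Absolute difference in natural-log discharge (assumed convention)."""
    return abs(math.log(q_obs) - math.log(q_target))

window_q = 2.0

# A flow half as large as the target is ln(2), about 0.69 log units away
d_close = log_discharge_distance(50.0, 100.0)
# A flow ten times smaller is ln(10), about 2.30 log units away
d_far = log_discharge_distance(10.0, 100.0)

inside_close = d_close < window_q  # within the half-window
inside_far = d_far < window_q      # outside, so it would get zero weight
```

Under that assumption, the default window spans flows from about 1/7.4 to 7.4 times the target discharge, since e^2 is roughly 7.4.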

Season Window (`window_s`)¶

Controls seasonal smoothing. The default of 0.5 years gives zero weight to observations from the opposite season.
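Seasonal distance wraps around the calendar, so late December and early January are close together even when the dates are years apart. The sketch below assumes dates are expressed as decimal years. The function name is illustrative and not taken from the package.

```python
def seasonal_distance(dec_year_obs, dec_year_target):
    """Distance in years between two times of year, ignoring which year."""
    diff = abs(dec_year_obs - dec_year_target) % 1.0
    # Go around the calendar whichever way is shorter
    return min(diff, 1.0 - diff)

# Late December 2001 vs. mid January 2010: close in season
d_winter = seasonal_distance(2001.95, 2010.05)
# April vs. October of the same year: opposite seasons
d_opposite = seasonal_distance(2000.25, 2000.75)
```

Because this distance can never exceed 0.5 years, a window_s of 0.5 excludes only observations from the exactly opposite time of year.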

Minimum Observations¶

If a target grid point does not have enough observations within its windows, all three windows are widened by 10% repeatedly until both the min_num_obs (total) and min_num_uncen (uncensored) thresholds are met.
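The expansion rule can be sketched as a simple loop. This is an illustration under stated assumptions, not the package's implementation. The function name, the way distances are passed in, and the `max_iter` safeguard are all hypothetical.

```python
import numpy as np

def expand_windows(dist_y, dist_q, dist_s, uncensored,
                   window_y=7.0, window_q=2.0, window_s=0.5,
                   min_num_obs=100, min_num_uncen=50, max_iter=50):
    """Widen all three half-windows by 10% until both thresholds are met."""
    dist_y = np.abs(np.asarray(dist_y, dtype=float))
    dist_q = np.abs(np.asarray(dist_q, dtype=float))
    dist_s = np.abs(np.asarray(dist_s, dtype=float))
    uncensored = np.asarray(uncensored, dtype=bool)
    for _ in range(max_iter):
        # An observation counts only if it lies inside all three windows
        inside = (dist_y < window_y) & (dist_q < window_q) & (dist_s < window_s)
        enough_total = inside.sum() >= min_num_obs
        enough_uncen = (inside & uncensored).sum() >= min_num_uncen
        if enough_total and enough_uncen:
            return window_y, window_q, window_s, inside
        window_y, window_q, window_s = 1.1 * window_y, 1.1 * window_q, 1.1 * window_s
    raise ValueError("thresholds not met after max_iter expansions")

# Toy data: 20 observations one year apart, identical flow and season
dist_y = np.arange(20.0)
zeros = np.zeros(20)
wy, wq, ws, inside = expand_windows(
    dist_y, zeros, zeros, np.ones(20, dtype=bool),
    min_num_obs=10, min_num_uncen=5,
)
```

In this toy case the time window starts with 7 observations and needs three expansions (7.0 to about 9.3 years) before 10 fall inside.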

Edge Adjustment¶

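When edge_adjust=True, the time half-window for target points near the start or end of the record is set to twice window_y minus the distance to the nearest record boundary. Widening the window this way compensates for the absence of observations beyond the boundary and reduces edge bias.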
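The adjustment can be written out as a small function. The sketch assumes "near" means closer to a boundary than window_y, so that the adjusted window joins the unadjusted one continuously. That threshold and the function name are assumptions, not confirmed by this page.

```python
def edge_adjusted_window(target, record_start, record_end, window_y=7.0):
    """Time half-window (years) for a target date given in decimal years."""
    dist_to_edge = min(target - record_start, record_end - target)
    if dist_to_edge >= window_y:
        # Far from both boundaries: no adjustment
        return window_y
    # Near a boundary: twice the window minus the distance to that boundary
    return 2.0 * window_y - dist_to_edge

# Record spanning 1990 to 2020
mid_record = edge_adjusted_window(2005.0, 1990.0, 2020.0)  # unadjusted
near_start = edge_adjusted_window(1992.0, 1990.0, 2020.0)  # 2 years from the edge
at_start = edge_adjusted_window(1990.0, 1990.0, 2020.0)    # on the boundary
```

Under these assumptions the half-window grows from 7 years in mid-record to 12 years two years from the boundary, and reaches the full 14 years at the boundary itself.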

Calling Sub-Steps Individually¶

You can run the pipeline steps separately for more control:

```python
w = WRTDS(daily, sample)

# Step 1: Cross-validation only
w.cross_validate()

# Step 2: Surface estimation (independent of cross-validation)
w.estimate_surfaces()

# Step 3: Daily estimation + flow normalization (requires surfaces)
w.estimate_daily()
```

Warning

estimate_daily() requires estimate_surfaces() to have been called first. The fit() method handles this ordering automatically.

What `fit()` Produces¶

After fitting, the following columns are available:

On `w.sample` (from cross-validation)¶

Column	Description
`yHat`	Predicted log-concentration (leave-one-out)
`SE`	Standard error of prediction
`ConcHat`	Bias-corrected predicted concentration

On `w.daily` (from daily estimation + flow normalization)¶

Column	Description
`yHat`	Predicted log-concentration
`SE`	Standard error
`ConcDay`	Daily estimated concentration (mg/L)
`FluxDay`	Daily estimated flux (kg/day)
`FNConc`	Flow-normalised concentration
`FNFlux`	Flow-normalised flux
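The units of ConcDay (mg/L) and FluxDay (kg/day) are linked by discharge. As a worked example, assuming discharge is in cubic metres per second (the page does not state its units), the conversion factor is 86.4. The function name is illustrative.

```python
def flux_kg_per_day(conc_mg_per_l, q_m3_per_s):
    """Convert concentration (mg/L) and discharge (m^3/s) to flux (kg/day).

    1 mg/L equals 1 g/m^3, so conc * Q is in g/s.
    Multiplying by 86,400 s/day and dividing by 1,000 g/kg gives 86.4.
    """
    return conc_mg_per_l * q_m3_per_s * 86.4

# 2 mg/L at 10 m^3/s is about 1,728 kg/day
example_flux = flux_kg_per_day(2.0, 10.0)
```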

On `w` (stored attributes)¶

Attribute	Description
`surfaces`	3-D numpy array of shape `(n_logq, n_year, 3)`
`surface_index`	Dict with grid parameters (`bottom_logq`, `top_logq`, `step_logq`, etc.)
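As a rough illustration of how the grid parameters relate to the first axis of surfaces, the sketch below maps a log discharge to its nearest grid row. It assumes an evenly spaced grid from bottom_logq to top_logq. The dictionary values, the placeholder array, and the helper name are stand-ins rather than real output.

```python
import numpy as np

# Stand-in grid description; the real dict holds more entries than these three
surface_index = {"bottom_logq": 0.0, "top_logq": 6.5, "step_logq": 0.5}

def nearest_logq_row(logq, index):
    """Return the row of the first surfaces axis closest to a log discharge."""
    span = index["top_logq"] - index["bottom_logq"]
    n_rows = int(round(span / index["step_logq"])) + 1
    row = int(round((logq - index["bottom_logq"]) / index["step_logq"]))
    # Clamp flows outside the grid to the first or last row
    return min(max(row, 0), n_rows - 1)

surfaces = np.zeros((14, 40, 3))        # placeholder with shape (n_logq, n_year, 3)
row = nearest_logq_row(1.2, surface_index)
layers_at_row = surfaces[row, :, :]     # shape (n_year, 3)
```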

Performance¶

Surface estimation is the computational bottleneck — it requires solving one MLE regression per grid point (typically ~7,000 solves). Cross-validation adds another n solves (one per sample observation). A dataset with 500 samples and 30 years of daily data typically takes a few minutes to fit.
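The solve count can be estimated with simple arithmetic. The sketch assumes a grid of 14 log-discharge levels and 16 time steps per year, which is the resolution commonly used in WRTDS implementations. This page does not confirm those values for this package, so treat them as placeholders.

```python
def estimated_solves(n_years, n_samples, n_logq=14, steps_per_year=16):
    """Rough count of regressions: one per grid point plus one per sample."""
    # Assumed grid: n_logq discharge levels by (years * steps + 1) time points
    grid_points = n_logq * (n_years * steps_per_year + 1)
    return grid_points + n_samples

# 30 years of record and 500 samples
total = estimated_solves(n_years=30, n_samples=500)
grid_only = estimated_solves(n_years=30, n_samples=0)
```

Under these assumptions a 30-year record gives 6,734 grid-point solves, consistent with the figure of roughly 7,000 quoted above, plus 500 for cross-validation.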
