Fitting a Model¶
The fit() Pipeline¶
Calling w.fit() runs three steps in sequence:
- Cross-validation: leave-one-out jackknife that adds yHat, SE, and ConcHat to w.sample
- Surface estimation: fit the censored regression at every grid point
- Daily estimation: interpolate surfaces onto the daily record and flow-normalise
All three steps use the same window parameters, which can be customised:
w.fit(
    window_y=7.0,       # Time half-window in years
    window_q=2.0,       # Discharge half-window in log units
    window_s=0.5,       # Season half-window in years
    min_num_obs=100,    # Minimum total observations per regression
    min_num_uncen=50,   # Minimum uncensored observations per regression
    edge_adjust=True,   # Widen time window near record boundaries
)
Window Parameters¶
Time Window (window_y)¶
Controls how much weight is given to observations from other years. The default of 7 years means observations 7+ years away from the target get zero weight. Larger values produce smoother trends.
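To make the "zero weight at 7+ years" behaviour concrete, here is a minimal sketch that assumes the tricube weight function of the original WRTDS method. The function name tricube_weight is hypothetical and is not part of this package's API.

```python
import numpy as np

def tricube_weight(distance, half_window):
    """Tricube weight: 1 at distance 0, falling to 0 at the half-window."""
    d = np.abs(np.asarray(distance, dtype=float)) / half_window
    return np.where(d < 1.0, (1.0 - d**3) ** 3, 0.0)

# Distances (in years) between a target date and five observations
years_away = np.array([0.0, 2.0, 5.0, 7.0, 10.0])
weights = tricube_weight(years_away, half_window=7.0)
print(weights)  # weights shrink with distance and are 0 at 7 years and beyond
```

A larger half_window flattens this curve, so distant years contribute more and the fitted trend becomes smoother.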
Discharge Window (window_q)¶
Controls sensitivity to flow conditions. With the default half-window of 2 log units, only observations whose log-discharge lies within 2 units of the target contribute to the regression. Smaller values make the model more flow-sensitive.
Season Window (window_s)¶
Controls seasonal smoothing. The default of 0.5 years gives zero weight to observations from the opposite season.
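The seasonal distance ignores the calendar year, so it never exceeds half a year. The sketch below assumes dates are stored as decimal years and that the distance is circular, as in the original WRTDS method. The helper season_distance is hypothetical.

```python
import numpy as np

def season_distance(dec_year, target_dec_year):
    """Circular distance in years between two times of year (0 to 0.5)."""
    frac = np.abs(np.asarray(dec_year, dtype=float) - target_dec_year) % 1.0
    return np.minimum(frac, 1.0 - frac)

# Target is 1 January 2005; observations fall in different years and seasons
obs = np.array([2001.0, 2003.25, 2010.5])
print(season_distance(obs, 2005.0))  # same season, a quarter year off, opposite season
```

An observation from the opposite season sits exactly 0.5 years away, which is where a half-window of 0.5 reaches zero weight.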
Minimum Observations¶
If a target grid point does not have enough observations within its windows, all three
windows are expanded by 10% iteratively until min_num_obs (total) and min_num_uncen
(uncensored) thresholds are met.
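The expansion loop can be sketched as follows. This is an illustration on synthetic distances, not the package's implementation. The function expand_until_enough, the inclusion rule (distance no greater than the half-window), and the max_iter guard are all assumptions.

```python
import numpy as np

def expand_until_enough(dist_y, dist_q, dist_s, uncensored,
                        window_y=7.0, window_q=2.0, window_s=0.5,
                        min_num_obs=100, min_num_uncen=50, max_iter=100):
    """Widen all three half-windows by 10% per pass until both thresholds are met."""
    for _ in range(max_iter):
        # Observations inside all three windows (hypothetical inclusion rule)
        inside = (dist_y <= window_y) & (dist_q <= window_q) & (dist_s <= window_s)
        enough_total = inside.sum() >= min_num_obs
        enough_uncen = (inside & uncensored).sum() >= min_num_uncen
        if enough_total and enough_uncen:
            break
        window_y, window_q, window_s = 1.1 * window_y, 1.1 * window_q, 1.1 * window_s
    return window_y, window_q, window_s, inside

# Synthetic distances from one target grid point to 300 observations
rng = np.random.default_rng(0)
dist_y = rng.uniform(0.0, 20.0, 300)
dist_q = rng.uniform(0.0, 3.0, 300)
dist_s = rng.uniform(0.0, 0.5, 300)
uncensored = rng.random(300) < 0.8

wy, wq, ws, inside = expand_until_enough(dist_y, dist_q, dist_s, uncensored, window_y=2.0)
print(f"final half-windows: {wy:.2f} y, {wq:.2f} log units, {ws:.2f} y; {inside.sum()} obs")
```

Because all three windows grow by the same factor, their relative sizes are preserved while the regression gains enough data to be estimable.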
Edge Adjustment¶
When edge_adjust=True, target points near the start or end of the record use a wider
time half-window: twice window_y minus the distance to the nearest record boundary.
This compensates for the lack of observations beyond the boundary and reduces edge bias.
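As a worked example of the rule, the sketch below computes the adjusted half-window for a few target dates. The function edge_adjusted_window is hypothetical. The choice to apply the adjustment only when the target lies within window_y of a boundary is an assumption borrowed from the original WRTDS method.

```python
def edge_adjusted_window(target, record_start, record_end, window_y=7.0):
    """Time half-window after edge adjustment (all arguments in decimal years)."""
    dist = min(target - record_start, record_end - target)
    if dist >= window_y:
        # Far from both boundaries: keep the nominal half-window (assumed rule)
        return window_y
    return 2.0 * window_y - dist

# Record spans 1990 to 2020
print(edge_adjusted_window(2005.0, 1990.0, 2020.0))  # 7.0  (interior point)
print(edge_adjusted_window(1992.0, 1990.0, 2020.0))  # 12.0 (2 years from the start)
print(edge_adjusted_window(1990.0, 1990.0, 2020.0))  # 14.0 (at the boundary)
```

The half-window grows smoothly from window_y to twice window_y as the target approaches the boundary, so there is no jump in the amount of data used.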
Calling Sub-Steps Individually¶
You can run the pipeline steps separately for more control:
w = WRTDS(daily, sample)
# Step 1: Cross-validation only
w.cross_validate()
# Step 2: Surface estimation (independent of cross-validation)
w.estimate_surfaces()
# Step 3: Daily estimation + flow normalization (requires surfaces)
w.estimate_daily()
Warning
estimate_daily() requires estimate_surfaces() to have been called first.
The fit() method handles this ordering automatically.
What fit() Produces¶
After fitting, the following columns are available:
On w.sample (from cross-validation)¶
| Column | Description |
|---|---|
| yHat | Predicted log-concentration (leave-one-out) |
| SE | Standard error of prediction |
| ConcHat | Bias-corrected predicted concentration |
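The relationship between yHat, SE, and ConcHat can be illustrated with the lognormal bias correction used in the original WRTDS formulation, ConcHat = exp(yHat + SE²/2). Whether this package applies exactly this expression is an assumption. The values below are made up.

```python
import numpy as np

# Made-up predictions in log space (natural log of concentration)
y_hat = np.array([-1.2, 0.0, 0.8])
se = np.array([0.3, 0.5, 0.4])

conc_naive = np.exp(y_hat)              # back-transform without correction
conc_hat = np.exp(y_hat + se**2 / 2.0)  # assumed lognormal bias correction

print(conc_hat / conc_naive)  # correction factor grows with SE
```

Exponentiating yHat alone gives the median of a lognormal distribution rather than its mean, which is why the corrected value is always the larger of the two.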
On w.daily (from daily estimation + flow normalization)¶
| Column | Description |
|---|---|
| yHat | Predicted log-concentration |
| SE | Standard error |
| ConcDay | Daily estimated concentration (mg/L) |
| FluxDay | Daily estimated flux (kg/day) |
| FNConc | Flow-normalised concentration |
| FNFlux | Flow-normalised flux |
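The units of ConcDay and FluxDay are linked by a fixed conversion factor. The sketch below assumes discharge is in m³/s, which this page does not state. The function name flux_kg_per_day is hypothetical.

```python
def flux_kg_per_day(conc_mg_per_l, q_m3_per_s):
    """Convert concentration (mg/L) and discharge (m^3/s) to flux (kg/day).

    mg/L equals g/m^3, so conc * Q is in g/s;
    multiplying by 86400 s/day and dividing by 1000 g/kg gives the factor 86.4.
    """
    return conc_mg_per_l * q_m3_per_s * 86.4

# 2.5 mg/L at 40 m^3/s is about 8640 kg/day
print(flux_kg_per_day(2.5, 40.0))
```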
On w (stored attributes)¶
| Attribute | Description |
|---|---|
| surfaces | 3-D NumPy array of shape (n_logq, n_year, 3) |
| surface_index | Dict with grid parameters (bottom_logq, top_logq, step_logq, etc.) |
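To show how a regular grid of this shape can be evaluated at an arbitrary discharge and date, here is a sketch of bilinear interpolation on a synthetic array. It is not the package's own routine. The parameters bottom_year and step_year are hypothetical stand-ins for whatever time-axis entries surface_index actually holds, and the function interpolate_surface is likewise hypothetical.

```python
import numpy as np

def interpolate_surface(layer, logq, year, bottom_logq, step_logq, bottom_year, step_year):
    """Bilinear interpolation of one 2-D surface layer at (logq, year)."""
    # Fractional grid coordinates, clipped to the grid extent
    i = np.clip((logq - bottom_logq) / step_logq, 0, layer.shape[0] - 1)
    j = np.clip((year - bottom_year) / step_year, 0, layer.shape[1] - 1)
    # Lower-left corner of the enclosing cell
    i0 = min(int(i), layer.shape[0] - 2)
    j0 = min(int(j), layer.shape[1] - 2)
    fi, fj = i - i0, j - j0
    return ((1 - fi) * (1 - fj) * layer[i0, j0] + fi * (1 - fj) * layer[i0 + 1, j0]
            + (1 - fi) * fj * layer[i0, j0 + 1] + fi * fj * layer[i0 + 1, j0 + 1])

# Synthetic stand-in for w.surfaces: 14 log-discharge levels, 33 time steps, 3 layers
bottom_logq, step_logq = 0.0, 0.25
bottom_year, step_year = 2000.0, 0.0625
logq_grid = bottom_logq + step_logq * np.arange(14)
year_grid = bottom_year + step_year * np.arange(33)
surfaces = np.zeros((14, 33, 3))
surfaces[:, :, 0] = 2.0 * logq_grid[:, None] + 0.1 * (year_grid[None, :] - 2000.0)

value = interpolate_surface(surfaces[:, :, 0], 1.23, 2001.03,
                            bottom_logq, step_logq, bottom_year, step_year)
print(value)
```

Points outside the grid are clipped to the nearest edge in this sketch. A real implementation may treat out-of-range discharges differently.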
Performance¶
Surface estimation is the computational bottleneck — it requires solving one MLE regression per grid point (typically ~7,000 solves). Cross-validation adds another n solves (one per sample observation). A dataset with 500 samples and 30 years of daily data typically takes a few minutes to fit.
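The ~7,000 figure can be reproduced with a back-of-envelope count. The grid dimensions below (14 log-discharge levels and 16 time steps per year) are the defaults of the original WRTDS method and are assumed here, not confirmed for this package.

```python
# Assumed grid resolution (original WRTDS defaults, not confirmed for this package)
n_logq = 14
steps_per_year = 16

# Example dataset from the text above
record_years = 30
n_samples = 500

n_time = record_years * steps_per_year + 1  # grid columns, both endpoints included
surface_solves = n_logq * n_time            # one regression per grid point
cv_solves = n_samples                       # one regression per left-out sample
total_solves = surface_solves + cv_solves

print(surface_solves, cv_solves, total_solves)  # 6734 500 7234
```

Under these assumptions the surface grid accounts for more than 90% of the regressions, so fit time scales mainly with record length, not with sample count.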