Fastlad
Modern FastLAD techniques aim to bring LAD fitting into the realm of seconds or minutes, comparable to OLS on the same hardware.
| Situation | Recommendation |
|-----------|----------------|
| | Use coordinate‑descent with sparse matrix support (`scipy.sparse`). Pre‑scale columns to unit ℓ₁‑norm to improve conditioning. |
| Streaming or online learning | Adopt stochastic sub‑gradient or online ADMM; keep a running estimate of the median residual for step‑size adaptation. |
| Mixed numeric‑categorical predictors | Encode categoricals with one‑hot (but watch dimensionality) or target encoding; LAD is linear, so interactions must be added manually if needed. |
| Ill‑conditioned design matrix | Standardize each column (mean‑center + unit variance) before fitting; many FastLAD solvers do this internally. |
| Need confidence intervals | Classic LAD does not provide easy analytic SEs. Use bootstrapping (e.g., 1,000 resamples) or asymptotic normality under Laplace errors (requires large n). |
| Comparing to OLS | Run OLS first as a sanity check. Large discrepancies in coefficients usually signal outliers that LAD will down‑weight. |
| Parallel / GPU usage | Choose an ADMM implementation that exposes an `n_jobs` or `device='cuda'` argument. Make sure data fits in GPU memory (often ≤ 2 GB for dense matrices). |
| Choosing tolerance | A relative tolerance of `1e-4` is usually sufficient for prediction; tighten (`1e-6`) only when the model is used for inference on small samples. |
| Regularization | If you also need sparsity, combine L1 loss with an L1 penalty → Robust LASSO (quantile regression at τ = 0.5 with `alpha > 0`). Many FastLAD libraries have a `penalty` argument. |
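
The "Comparing to OLS" and "Need confidence intervals" rows can be illustrated with a minimal sketch. It uses only NumPy and fits LAD by iteratively reweighted least squares (IRLS), a simple approximate stand-in chosen for brevity rather than the coordinate-descent or ADMM solvers recommended above. The names `lad_irls` and `bootstrap_ci` are hypothetical helpers, not the API of any FastLAD library, and the data are synthetic.

```python
import numpy as np

def lad_irls(X, y, tol=1e-6, max_iter=100, eps=1e-6):
    """Approximate LAD fit via IRLS (illustrative, not a FastLAD solver)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS solution
    for _ in range(max_iter):
        r = y - X @ beta
        # Weights 1/|r| turn the L1 loss into a weighted L2 problem;
        # eps guards against division by (near-)zero residuals.
        w = 1.0 / np.maximum(np.abs(r), eps)
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        # Relative change in the coefficients as the stopping rule.
        if np.linalg.norm(beta_new - beta) <= tol * (1.0 + np.linalg.norm(beta)):
            return beta_new
        beta = beta_new
    return beta

def bootstrap_ci(X, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap intervals from resampling (x_i, y_i) pairs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        draws[b] = lad_irls(X[idx], y[idx])
    alpha = 1.0 - level
    low, high = np.percentile(
        draws, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0
    )
    return low, high

# Synthetic data: y = 1 + 2x + Laplace noise, with 10 gross outliers.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.laplace(scale=0.5, size=n)
y[:10] += 50.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # sanity-check fit
beta_lad = lad_irls(X, y)
# 200 resamples keeps the demo quick; the table suggests about 1,000.
ci_low, ci_high = bootstrap_ci(X, y, n_boot=200)

print("OLS coefficients:", beta_ols)
print("LAD coefficients:", beta_lad)
print("LAD 95% bootstrap CI, lower:", ci_low)
print("LAD 95% bootstrap CI, upper:", ci_high)
```

In this setup the outliers pull the OLS intercept well away from its true value while the LAD estimate stays close, which is the kind of discrepancy the table treats as an outlier signal. For production use, an exact solver (a linear program, coordinate descent, or ADMM) would be preferable to IRLS, which converges slowly near zero residuals.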