Roadmap
We welcome proposals. To submit one, open a GitHub issue that describes your use case or includes an RFC.
Planned
- API for linear models
- Diffusion model support
- API for multimodal models
- Inferlet profiler
Previous Releases
v0.3.0 (2026-05-05) — Native C++ driver implementations, PEFT adapters API, market-based contention management, API redesign, and Bakery for inferlet distribution.
v0.2.0 (2026-01-20) — Multi-GPU serving (DP + TP), Apple Metal backend, weight quantization (float8 / int8 / int4), and new model support.
v0.1.0 (2025-10-13) — First public release. Paper artifact at SOSP '25.