Roadmap

We welcome proposals: open a GitHub issue describing your use case, or submit an RFC.

Planned

  • API for linear models
  • Diffusion model support
  • API for multimodal models
  • Inferlet profiler

Previous Releases

v0.3.0 (2026-05-05) — Native C++ driver implementations, PEFT adapters API, market-based contention management, API redesign, and Bakery for inferlet distribution.

v0.2.0 (2026-01-20) — Multi-GPU serving (DP + TP), Apple Metal backend, weight quantization (float8 / int8 / int4), and new model support.

v0.1.0 (2025-10-13) — First public release. Paper artifact at SOSP '25.