Roadmap
This roadmap is directional. Priorities may evolve based on user feedback and research findings.
Released in 0.2.0 (2026-01-20)
- Apple Metal support [runtime]: Enable GPU execution on Apple silicon via Metal for laptop development and demos.
- Multi-GPU support with mixed DP and TP [performance]: Scale model size and throughput across devices by combining Data Parallelism (DP) and Tensor Parallelism (TP).
- Quantization support (float8/int8/float4) [performance]: Reduce VRAM usage while preserving quality, with per-layer policies.
- New model support [runtime]: Support for gpt-oss, ministral3, gemma2, gemma3, and olmo3.
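For a rough sense of what the quantization item buys, here is a minimal sketch. It is not this project's implementation: the function names, the 7B parameter count, and the choice of symmetric per-tensor int8 are illustrative assumptions. It shows the weight-memory arithmetic for different bit widths and a basic quantize/dequantize round trip.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Approximate weight storage, ignoring scales and other metadata."""
    return num_params * bits_per_weight // 8


def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: returns (codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model
    for name, bits in [("float16", 16), ("float8/int8", 8), ("float4", 4)]:
        print(f"{name}: {weight_bytes(params, bits) / 1e9:.1f} GB")

    weights = [0.5, -1.27, 0.03, 1.0]
    codes, scale = quantize_int8(weights)
    print(codes, [round(x, 3) for x in dequantize_int8(codes, scale)])
```

Under these assumptions the weights alone take about 14.0 GB at 16 bits, 7.0 GB at 8 bits, and 3.5 GB at 4 bits. Activations and the KV cache are not counted. A per-layer policy would amount to choosing `bits_per_weight` (and the scaling scheme) separately for each layer rather than once for the whole model.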
In progress
- PEFT adapters API (e.g., LoRA) [api]: Hot-swap adapters per request or session with safe isolation.
- KV-cache quantization [performance]: Larger batches and longer contexts via cache compression.
- Inferlet standard libraries (Python / JS) [tooling]: Ergonomic, batteries-included helpers for common patterns.
- Inferlet package manager [tooling]: Discover, version, and share inferlets and libraries.
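To make the adapter item concrete, here is a minimal sketch of per-request LoRA selection. It does not reflect the planned API: `LoraAdapter`, `ADAPTERS`, `handle_request`, and the tenant names are all hypothetical. It assumes the standard LoRA formulation, where the adapted output is the base projection plus a scaled low-rank update, and it keeps the shared base weights read-only so one request's adapter cannot affect another.

```python
from dataclasses import dataclass


def matvec(m, v):
    """Multiply a row-major matrix (sequence of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass(frozen=True)
class LoraAdapter:
    """Hypothetical low-rank adapter: delta_W = (alpha / r) * B @ A."""
    a: tuple  # shape r x d_in
    b: tuple  # shape d_out x r
    alpha: float

    @property
    def rank(self) -> int:
        return len(self.a)


def forward(base_w, x, adapter=None):
    """Base projection plus an optional low-rank update; base_w is never modified."""
    y = matvec(base_w, x)
    if adapter is None:
        return y
    low = matvec(adapter.a, x)      # project down to rank r
    delta = matvec(adapter.b, low)  # project back up to d_out
    scale = adapter.alpha / adapter.rank
    return [yi + scale * di for yi, di in zip(y, delta)]


BASE_W = ((1.0, 0.0), (0.0, 1.0))  # shared, immutable 2x2 base weights
ADAPTERS = {  # hypothetical registry keyed by tenant
    "tenant-a": LoraAdapter(a=((1.0, 1.0),), b=((1.0,), (0.0,)), alpha=1.0),
    "tenant-b": LoraAdapter(a=((1.0, -1.0),), b=((0.0,), (2.0,)), alpha=1.0),
}


def handle_request(x, adapter_name=None):
    """Pick the adapter for this request only; unknown names raise KeyError."""
    adapter = ADAPTERS[adapter_name] if adapter_name is not None else None
    return forward(BASE_W, x, adapter)
```

Calling `handle_request([1.0, 2.0], "tenant-a")` and then `handle_request([1.0, 2.0])` gives the adapted and the plain base output respectively. Because the adapter is applied as a separate additive term rather than merged into `BASE_W`, swapping it is a dictionary lookup and leaves no state behind.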
We welcome proposals. For feature ideas or RFCs, please file a GitHub issue describing your use case, the problem, constraints, and expected API shape.
If something here would unblock you, let us know. Real workloads get priority.