Pie 0.4.0: Eliminating the programmability tax, and becoming multimodal
· 5 min read
Pie 0.4.0 is out. Two changes lead the release. Pie's new speculative execution hides the runtime's per-token overhead behind GPU compute, effectively removing 90% of the overhead of programmable serving. A new model-agnostic media API makes Pie multimodal: images, audio, and video go in, and audio comes out.
Beyond those two, 0.4.0 adds support for new frontier models, including GLM 5.1 and Nemotron H; native multi-token-prediction (MTP) support for recent Qwen and Gemma models; and KV cache quantization.