Serve programs, not prompts
In conventional serving systems, the inference workflow is baked into the engine. In Pie, you write it.
Conventional serving systems
Every request runs through the same fixed pipeline. Branching and tool calls live outside the engine.
Programmable serving system - Pie
Each inferlet runs its own workflow inside the engine. It controls the forward pass, KV cache, and I/O directly.
01
Program the forward pass
The decoder loop, sampler, and constraint matcher are exposed to user code. Write your own drafter, sampling rule, or grammar.
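To make this concrete, here is a minimal sketch of what a user-written decode loop could look like. Everything in it is a stand-in: `forward` and `allowed` are stub functions playing the role of the engine's forward pass and constraint matcher (Pie's real API names are not shown on this page), so the sketch runs standalone.

```python
# Hypothetical sketch of a user-owned decode loop with a custom
# sampling rule and a toy constraint matcher. All names are
# assumptions; `forward` is a stub, not Pie's actual API.

VOCAB = ["the", "cat", "sat", "<eos>"]

def forward(tokens):
    """Stub forward pass: favor the next token in sequence, then <eos>."""
    nxt = min(len(tokens), len(VOCAB) - 1)
    return [1.0 if i == nxt else 0.0 for i in range(len(VOCAB))]

def allowed(token_id, tokens):
    """Toy constraint: never emit the same token twice in a row."""
    return not tokens or tokens[-1] != token_id

def decode(max_steps=8):
    tokens = [0]  # start with "the"
    for _ in range(max_steps):
        logits = forward(tokens)
        # Custom sampling rule: greedy over constraint-allowed tokens.
        candidates = [i for i in range(len(VOCAB)) if allowed(i, tokens)]
        nxt = max(candidates, key=lambda i: logits[i])
        tokens.append(nxt)
        if VOCAB[nxt] == "<eos>":
            break
    return [VOCAB[t] for t in tokens]
```

Because the loop is ordinary user code, swapping in a drafter, a different sampler, or a grammar-backed `allowed` is just an edit to this function, not a change to the engine.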
02
Program the KV cache
Branch, rewind, persist, and reopen the cache by calling engine APIs from user code.
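The shape of those cache operations can be sketched with a toy model. `KVCache` below is an illustrative stand-in that models the cache as a token list; the real engine manages device memory and prefix sharing, and its actual API is not shown here.

```python
# Hypothetical sketch of branch/rewind on a KV cache from user code.
# `KVCache` is a toy stand-in for the engine-managed cache; names
# and semantics are assumptions for illustration only.

class KVCache:
    def __init__(self, tokens=None):
        self.tokens = list(tokens or [])

    def append(self, token):
        self.tokens.append(token)

    def branch(self):
        # Fork a cache from the current prefix (copied here for simplicity;
        # a real engine would share the prefix instead of copying it).
        return KVCache(self.tokens)

    def rewind(self, n):
        # Drop the last n entries, e.g. to retry a rejected draft.
        if n:
            self.tokens = self.tokens[:-n]

base = KVCache()
for t in ["sys", "user"]:
    base.append(t)

# Branch two speculative continuations from the shared prefix.
a, b = base.branch(), base.branch()
a.append("draft-1")
b.append("draft-2")

# Rewind branch a after its draft is rejected; base and b are untouched.
a.rewind(1)
```

The point of the sketch is the control flow: branching, rewinding, and persisting are API calls the inferlet makes mid-workflow, not engine-internal policies.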
03
Program the I/O
Tool clients, persistent files, and message brokers run inside the engine. User code calls them like a library.
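As a rough illustration of the library-call style, here is a self-contained sketch of a tool dispatch. The registry and `call_tool` helper are invented for this example and the tool is a local stub, so nothing here reflects Pie's actual I/O API.

```python
# Hypothetical sketch: an inferlet invoking a tool as a library call.
# TOOLS and call_tool are illustrative assumptions, not Pie's API;
# the calculator is a sandboxed local stub so the sketch runs alone.

TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def call_tool(name, payload):
    """Dispatch a tool call, standing in for an engine-hosted tool client."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](payload)

# The inferlet interleaves generation with tool calls directly,
# instead of round-tripping through an external orchestrator.
result = call_tool("calculator", "6 * 7")
```

The design point is locality: because tool clients run inside the engine, a tool call is a function call in the inferlet's control flow rather than a hand-off to an outer application loop.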
Ready to write the loop?
Install the engine, ship your first inferlet, and program the forward pass, KV cache, and I/O from your own code.