
A programmable LLM serving system.

A high-performance inference engine where you write the loop.
Forward passes are library calls in your inferlet.

Serve programs, not prompts

In existing serving systems, the inference workflow is baked into the engine. In Pie, you write it.

Conventional serving systems

A conventional serving system. Prompts from users enter the engine and pass through a fixed pipeline of batch, embed, prefill or decode, and sample stages, with one global autoregressive loop.

Every request runs through the same fixed pipeline. Branching and tool calls live outside the engine.

Programmable serving system - Pie

Pie's serving model. Each application runs as an inferlet inside the engine, calling into the model's KV cache and forward pass through a control layer.

Each inferlet runs its own workflow inside the engine. It controls the KV cache, forward pass, and tool calls directly.
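Concretely, that means an inferlet can drive decoding one forward pass at a time instead of asking the engine for a finished completion. The sketch below is illustrative only: prefill, decode_step, is_eos, and text are hypothetical names for the KV-cache and forward-pass calls the diagram alludes to, not APIs documented on this page.

use inferlet::{Context, Result, model::Model, runtime, sample::Sampler};

#[inferlet::main]
async fn main(prompt: String) -> Result<String> {
    let model = Model::load(runtime::models().first().ok_or("no models")?)?;
    let mut ctx = Context::new(&model)?;

    // Hypothetical lower-level calls; the real inferlet API may differ.
    ctx.prefill(&prompt).await?;          // populate the KV cache once
    let sampler = Sampler::TopP { temperature: 0.6, p: 0.95 };

    let mut out = String::new();
    loop {
        let tok = ctx.decode_step(&sampler).await?; // one forward pass
        if tok.is_eos() || out.len() >= 4096 {      // the stop rule is yours
            break;
        }
        out.push_str(tok.text());
    }
    Ok(out)
}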

More to optimize, more to customize

Pie opens up opportunities for both optimization and custom model behavior.
Each example below is an inferlet that customizes the inference loop in a different way.

use inferlet::{Context, Result, model::Model, runtime, sample::Sampler};

#[inferlet::main]
async fn main(prompt: String) -> Result<String> {
    // Load the first model the runtime advertises.
    let model = Model::load(runtime::models().first().ok_or("no models")?)?;
    let mut ctx = Context::new(&model)?;

    // Build the chat prefix and cue the assistant turn.
    ctx.system("You are a helpful assistant.")
        .user(&prompt)
        .cue();

    // Decode up to 256 tokens with nucleus (top-p) sampling.
    ctx.generate(Sampler::TopP { temperature: 0.6, p: 0.95 })
        .max_tokens(256)
        .collect_text()
        .await
}
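As a sketch of the kind of customization other inferlets might apply, the variant below samples two candidate answers in parallel from a shared prefix. fork is a hypothetical call for cloning a context while reusing its KV cache, and futures::join! stands in for whatever concurrency primitive the runtime provides; only the calls used in the example above appear on this page.

use inferlet::{Context, Result, model::Model, runtime, sample::Sampler};

#[inferlet::main]
async fn main(prompt: String) -> Result<String> {
    let model = Model::load(runtime::models().first().ok_or("no models")?)?;
    let mut ctx = Context::new(&model)?;

    ctx.system("You are a helpful assistant.")
        .user(&prompt)
        .cue();

    // Hypothetical: fork() clones the context but shares the cached
    // prefix, so the prompt is prefilled only once.
    let mut branch = ctx.fork()?;

    // Run a cautious and an exploratory branch concurrently.
    let (a, b) = futures::join!(
        ctx.generate(Sampler::TopP { temperature: 0.2, p: 0.95 })
            .max_tokens(256)
            .collect_text(),
        branch
            .generate(Sampler::TopP { temperature: 0.9, p: 0.95 })
            .max_tokens(256)
            .collect_text(),
    );

    // A real inferlet might rerank the branches; here we return both.
    Ok(format!("A: {}\n\nB: {}", a?, b?))
}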