Serve programs, not prompts
In existing serving systems, the inference workflow is baked into the engine. In Pie, you write it.
Conventional serving systems
Every request runs through the same fixed pipeline. Branching and tool calls live outside the engine.
Programmable serving systems: Pie
Each inferlet runs its own workflow inside the engine. It controls the KV cache, forward pass, and tool calls directly.
More to optimize, more to customize
By exposing the inference loop to user code, Pie unlocks optimizations and custom model behavior that a fixed pipeline cannot express.
The example below is an inferlet that customizes the inference loop.
use inferlet::{Context, Result, model::Model, runtime, sample::Sampler};

#[inferlet::main]
async fn main(prompt: String) -> Result<String> {
    // Bind to the first model the runtime has available.
    let model = Model::load(runtime::models().first().ok_or("no models")?)?;
    let mut ctx = Context::new(&model)?;

    // Build the prompt: system message, user turn, then cue the assistant reply.
    ctx.system("You are a helpful assistant.")
        .user(&prompt)
        .cue();

    // Decode up to 256 tokens with top-p sampling and return the generated text.
    ctx.generate(Sampler::TopP { temperature: 0.6, p: 0.95 })
        .max_tokens(256)
        .collect_text()
        .await
}
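To make the `Sampler::TopP { temperature, p }` parameters concrete, here is a minimal standalone sketch of top-p (nucleus) sampling in plain Rust. This is an illustration of the sampling strategy itself, not the inferlet API: temperature rescales the logits before softmax, and `p` bounds the cumulative probability mass of the tokens kept for sampling.

```rust
/// Apply temperature, softmax the logits, then keep the smallest set of
/// tokens whose cumulative probability reaches `p`, renormalized.
/// Returns (token index, probability) pairs, highest probability first.
fn top_p_filter(logits: &[f64], temperature: f64, p: f64) -> Vec<(usize, f64)> {
    // Softmax with temperature (subtract the max for numerical stability).
    let scaled: Vec<f64> = logits.iter().map(|&l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scaled.iter().map(|&l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    let mut probs: Vec<(usize, f64)> =
        exps.iter().enumerate().map(|(i, &e)| (i, e / sum)).collect();

    // Sort by probability, descending.
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // Keep tokens until the cumulative mass reaches p.
    let mut kept = Vec::new();
    let mut cum = 0.0;
    for (i, pr) in probs {
        kept.push((i, pr));
        cum += pr;
        if cum >= p {
            break;
        }
    }

    // Renormalize the surviving distribution so it sums to 1.
    let total: f64 = kept.iter().map(|&(_, pr)| pr).sum();
    kept.into_iter().map(|(i, pr)| (i, pr / total)).collect()
}

fn main() {
    // A tiny 4-token vocabulary; token 0 dominates the distribution.
    let kept = top_p_filter(&[4.0, 3.0, 1.0, 0.0], 0.6, 0.95);
    for (tok, pr) in &kept {
        println!("token {tok}: {pr:.3}");
    }
}
```

Lower temperatures sharpen the distribution (fewer tokens survive the cutoff); higher `p` keeps a longer tail and yields more varied output.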