Your first inferlet
This page walks you from a fresh shell to a custom inferlet that loads a model, generates text, and returns a string. It uses Rust, the canonical SDK. After you have this working, the rest of the Guide shows what else inferlets can do.
Prerequisites
- Pie installed and a model downloaded. See Install & first run.
- Rust 1.85+ with the WebAssembly target:

  ```shell
  rustup target add wasm32-wasip2
  ```

- The bakery CLI. It ships with Pie:

  ```shell
  bakery --version
  ```
Scaffold
```shell
bakery create my-inferlet
cd my-inferlet
```
`bakery create` lays down a Rust crate with two manifests and one source file:
```
my-inferlet/
├── Cargo.toml
├── Pie.toml
└── src/
    └── lib.rs
```
`Cargo.toml` declares the crate as a WebAssembly component (`crate-type = ["cdylib"]`) and depends on the `inferlet` crate.

`Pie.toml` is the inferlet manifest. The `inferlet::main` macro reads it at compile time, and `pie run` reads it at launch.
```toml
[package]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
authors = ["You <you@example.com>"]

[runtime]
core = "^0.2.0"

[parameters]
prompt = { type = "string", description = "User prompt" }
max_tokens = { type = "int", optional = true }
```
The `[parameters]` block documents the JSON input the inferlet expects. `pie run -- --prompt "..."` folds CLI flags into the input dict; your inferlet's `Input` type performs the actual validation.
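For example, a run invoked with `--prompt "Hi" --max-tokens 64` would hand the inferlet an input dict along these lines (illustrative; dashes in flag names become underscores in keys):

```json
{
  "prompt": "Hi",
  "max_tokens": 64
}
```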
The code
`src/lib.rs`:

```rust
use inferlet::{Context, Result, model::Model, runtime, sample::Sampler};
use serde::Deserialize;

#[derive(Deserialize)]
struct Input {
    prompt: String,
    #[serde(default)]
    max_tokens: Option<usize>,
}

#[inferlet::main]
async fn main(input: Input) -> Result<String> {
    let model = Model::load(runtime::models().first().ok_or("no models")?)?;
    let mut ctx = Context::new(&model)?;

    ctx.system("You are a helpful assistant.")
        .user(&input.prompt)
        .cue();

    ctx.generate(Sampler::TopP { temperature: 0.6, p: 0.95 })
        .max_tokens(input.max_tokens.unwrap_or(256))
        .collect_text()
        .await
}
```
What each piece does:
- `#[inferlet::main]` generates the WebAssembly entry point and a JSON bridge. Because `Input` is not `String`, the macro wires up `serde_json::from_str` for you.
- `Model::load` binds to a model the engine has loaded. `runtime::models()` lists the names from `~/.pie/config.toml`.
- `Context::new` allocates a fresh KV cache context.
- `ctx.system(...).user(...).cue()` formats a chat conversation. `cue()` marks the position where the model takes over.
- `ctx.generate(sampler).max_tokens(n).collect_text().await` runs the autoregressive loop and returns the decoded reply.
- `Result<String>` aliases `Result<String, String>`. Returning `Err(s)` becomes the `Error` event the client receives.
Build
```shell
bakery build . -o my-inferlet.wasm
```
This produces a `my-inferlet.wasm` next to your `Pie.toml`.
If you prefer raw cargo:
```shell
cargo build --target wasm32-wasip2 --release
# output: target/wasm32-wasip2/release/my_inferlet.wasm
```
Both paths produce the same artifact.
Run
```shell
pie run \
  --path ./my-inferlet.wasm \
  --manifest ./Pie.toml \
  -- \
  --prompt "What is the capital of France?" \
  --max-tokens 64
```
You see something like:
```
╭─ Pie Run ───────────────────────────────────────╮
│ Inferlet  my-inferlet@0.1.0                     │
│ Model     default (Qwen/Qwen3-0.6B)             │
│ Driver    cuda_native                           │
│ Device    cuda:0                                │
╰─────────────────────────────────────────────────╯
The capital of France is Paris.
```
CLI flags after `--` are folded into `Input` (with int / float / bool inference). `--prompt "..."` becomes `{"prompt": "..."}`. `--max-tokens 64` becomes `{"max_tokens": 64}`.
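The type inference can be sketched as follows. This is a hypothetical std-only illustration of the folding rule, not the actual `pie run` implementation, which may differ in details such as escaping:

```rust
/// Coerce a raw CLI flag value into a JSON value literal:
/// booleans and numbers stay unquoted, everything else becomes a string.
fn infer_json_value(raw: &str) -> String {
    if raw == "true" || raw == "false" {
        raw.to_string() // boolean literal
    } else if raw.parse::<i64>().is_ok() || raw.parse::<f64>().is_ok() {
        raw.to_string() // int or float literal, emitted unquoted
    } else {
        // JSON string: escape backslashes and double quotes
        format!("\"{}\"", raw.replace('\\', "\\\\").replace('"', "\\\""))
    }
}

fn main() {
    // --max-tokens 64  ->  "max_tokens": 64
    assert_eq!(infer_json_value("64"), "64");
    assert_eq!(infer_json_value("0.95"), "0.95");
    assert_eq!(infer_json_value("true"), "true");
    // --prompt Paris   ->  "prompt": "Paris"
    assert_eq!(infer_json_value("Paris"), "\"Paris\"");
}
```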
Iterate
The dev loop is edit, build, run.
```shell
# After editing src/lib.rs:
bakery build . -o my-inferlet.wasm
pie run --path ./my-inferlet.wasm --manifest ./Pie.toml -- --prompt "Summarize KV caching in one sentence."
```
The first build is slow because the toolchain pulls dependencies. Subsequent rebuilds are incremental.
Notes
Error handling
`Result<T>` aliases `Result<T, String>`. Any `Err(s)` you return becomes the `Error` event the client receives.
```rust
let model = Model::load("default")?; // propagates Err("...") on failure
```
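Because the error type is plain `String`, the `?` operator needs no custom error enum. A minimal std-only sketch of this convention (`load_model` is a stand-in, not the real inferlet API):

```rust
// Mirrors the SDK's alias: Result<T> = Result<T, String>.
type Result<T> = std::result::Result<T, String>;

// Hypothetical fallible operation in place of Model::load.
fn load_model(name: &str) -> Result<String> {
    match name {
        "default" => Ok(format!("handle:{name}")),
        _ => Err(format!("no model named '{name}'")),
    }
}

fn run() -> Result<String> {
    // `?` forwards the Err(String) to the caller, which the client
    // ultimately observes as an Error event.
    let model = load_model("missing")?;
    Ok(model)
}

fn main() {
    assert!(load_model("default").is_ok());
    assert_eq!(run(), Err("no model named 'missing'".to_string()));
}
```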
Debugging
`eprintln!` writes to stderr, which `pie run` and clients receive as `Stderr` events.
```rust
eprintln!("[debug] loaded model {name}");
```
`pie run` prints stderr alongside stdout in the terminal.
Next steps
- Generator: samplers, collectors, per-step control, multi-turn.
- Forking and saving: branching and prefix caching for fan-out workflows.
- Build and publish: publish your inferlet to the registry.