Your first inferlet

This page walks you from a fresh shell to a custom inferlet that loads a model, generates text, and returns a string. It uses Rust, the canonical SDK. After you have this working, the rest of the Guide shows what else inferlets can do.

Prerequisites

  • Pie installed and a model downloaded. See Install & first run.

  • Rust 1.85+ with the WebAssembly target:

    rustup target add wasm32-wasip2
  • The bakery CLI. It ships with Pie:

    bakery --version

Scaffold

bakery create my-inferlet
cd my-inferlet

bakery create lays down a Rust crate with two manifests and one source file:

my-inferlet/
├── Cargo.toml
├── Pie.toml
└── src/
    └── lib.rs

Cargo.toml declares the crate as a WebAssembly component (crate-type = ["cdylib"]) and depends on the inferlet crate.

Pie.toml is the inferlet manifest. The inferlet::main macro reads it at compile time, and pie run reads it at launch.

[package]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
authors = ["You <you@example.com>"]

[runtime]
core = "^0.2.0"

[parameters]
prompt = { type = "string", description = "User prompt" }
max_tokens = { type = "int", optional = true }

The [parameters] block documents the JSON input the inferlet expects. pie run -- --prompt "..." folds CLI flags into the input dict; your inferlet's Input type performs the actual validation.

The code

src/lib.rs:

use inferlet::{Context, Result, model::Model, runtime, sample::Sampler};
use serde::Deserialize;

#[derive(Deserialize)]
struct Input {
    prompt: String,
    #[serde(default)]
    max_tokens: Option<usize>,
}

#[inferlet::main]
async fn main(input: Input) -> Result<String> {
    let model = Model::load(runtime::models().first().ok_or("no models")?)?;
    let mut ctx = Context::new(&model)?;

    ctx.system("You are a helpful assistant.")
        .user(&input.prompt)
        .cue();

    ctx.generate(Sampler::TopP { temperature: 0.6, p: 0.95 })
        .max_tokens(input.max_tokens.unwrap_or(256))
        .collect_text()
        .await
}

What each piece does:

  1. #[inferlet::main] generates the WebAssembly entry point and a JSON bridge. Because Input is not String, the macro wires up serde_json::from_str for you.
  2. Model::load binds to a model the engine has loaded. runtime::models() lists the names from ~/.pie/config.toml.
  3. Context::new allocates a fresh KV cache context.
  4. ctx.system(...).user(...).cue() formats a chat conversation. cue() marks the position where the model takes over.
  5. ctx.generate(sampler).max_tokens(n).collect_text().await runs the autoregressive loop and returns the decoded reply.
  6. Result<String> aliases Result<String, String>. Returning Err(s) becomes the Error event the client receives.
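The alias in step 6 and the ok_or in step 2 are ordinary Rust. Here is a std-only sketch of that pattern; the alias name mirrors the SDK's, but this snippet is illustrative, not the actual inferlet source:

```rust
// Std-only sketch of how a Result<T, String> alias and Option::ok_or interact.
// The real definitions in the inferlet crate may differ.
type Result<T> = std::result::Result<T, String>;

fn first_model(models: &[String]) -> Result<&String> {
    // A missing value becomes an Err, which the client sees as an Error event.
    models.first().ok_or_else(|| "no models".to_string())
}

fn main() {
    let models = vec!["default".to_string()];
    assert_eq!(first_model(&models).unwrap(), "default");
    assert!(first_model(&Vec::new()).is_err());
}
```

The ? after ok_or("no models") in the listing relies on the same mechanism: the &str error converts into a String as it propagates.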

Build

bakery build . -o my-inferlet.wasm

You get a my-inferlet.wasm next to your Pie.toml.

If you prefer raw cargo:

cargo build --target wasm32-wasip2 --release
# output: target/wasm32-wasip2/release/my_inferlet.wasm

Both paths produce the same artifact.

Run

pie run \
  --path ./my-inferlet.wasm \
  --manifest ./Pie.toml \
  -- \
  --prompt "What is the capital of France?" \
  --max-tokens 64

You see something like:

╭─ Pie Run ───────────────────────────────────────╮
│ Inferlet  my-inferlet@0.1.0                     │
│ Model     default (Qwen/Qwen3-0.6B)             │
│ Driver    cuda_native                           │
│ Device    cuda:0                                │
╰─────────────────────────────────────────────────╯

The capital of France is Paris.

CLI flags after -- are folded into Input, with int / float / bool inference and dashes in flag names mapped to underscores: --prompt "..." becomes {"prompt": "..."}, and --max-tokens 64 becomes {"max_tokens": 64}.
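The type inference behind this folding can be sketched in plain Rust. This is a hypothetical reimplementation for illustration, not Pie's actual code:

```rust
// Hypothetical sketch of folding a raw CLI flag value into a typed
// JSON-like value. Pie's real implementation may differ in details.
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

fn fold(raw: &str) -> Value {
    // Try the narrowest interpretation first, falling back to a string.
    if let Ok(b) = raw.parse::<bool>() {
        Value::Bool(b)
    } else if let Ok(i) = raw.parse::<i64>() {
        Value::Int(i)
    } else if let Ok(f) = raw.parse::<f64>() {
        Value::Float(f)
    } else {
        Value::Str(raw.to_string())
    }
}

fn main() {
    assert_eq!(fold("64"), Value::Int(64));      // --max-tokens 64
    assert_eq!(fold("0.95"), Value::Float(0.95));
    assert_eq!(fold("true"), Value::Bool(true));
    assert_eq!(
        fold("What is the capital of France?"),
        Value::Str("What is the capital of France?".to_string())
    );
}
```

Trying i64 before f64 matters: "64" parses as both, and the integer interpretation should win.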

Iterate

The dev loop is edit, build, run.

# After editing src/lib.rs:
bakery build . -o my-inferlet.wasm
pie run --path ./my-inferlet.wasm --manifest ./Pie.toml -- --prompt "Summarize KV caching in one sentence."

The first build is slow because the toolchain pulls dependencies. Subsequent rebuilds are incremental.

Notes

Error handling

Result<T> aliases Result<T, String>. Any Err(s) you return becomes the Error event the client receives.

let model = Model::load("default")?; // propagates Err("...") on failure
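Errors from other APIs do not turn into String automatically. One common pattern, shown here with std types only as an illustration, is map_err before the ?:

```rust
// Illustrative: adapt a std error into the String error type before `?`.
fn parse_max_tokens(raw: &str) -> Result<usize, String> {
    raw.parse::<usize>()
        .map_err(|e| format!("invalid max_tokens {raw:?}: {e}"))
}

fn main() {
    assert_eq!(parse_max_tokens("64"), Ok(64));
    assert!(parse_max_tokens("lots").is_err());
}
```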

Debugging

eprintln! writes to stderr, which pie run and clients receive as Stderr events.

eprintln!("[debug] loaded model {name}");

pie run prints stderr alongside stdout in the terminal.

Next steps