Your First Inferlet

This guide walks you through writing, building, and running your first inferlet — a lightweight WebAssembly program that controls inference logic inside Pie.

Prerequisites

  • Pie installed and a model downloaded (see Installation)
  • Bakery installed (the inferlet build tool): pip install -e sdk/tools/bakery
  • Rust toolchain installed
  • WebAssembly target added: rustup target add wasm32-wasip2 (a quick check follows below)
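
You can confirm the WebAssembly target is installed with a standard rustup command:

rustup target list --installed | grep wasm32-wasip2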

Project Structure

You can scaffold a new project with bakery create (supports Rust and TypeScript; an example follows the tree below), or create the files manually. The layout depends on your language; for a Rust inferlet it looks like this:

my-inferlet/
├── Pie.toml
├── Cargo.toml
└── src/
    └── lib.rs
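
If you use the scaffolder instead of creating the files by hand, a minimal sketch of the invocation is shown below (the exact argument form is an assumption; check bakery create --help):

bakery create my-inferlet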

Pie.toml — every inferlet needs this manifest:

[package]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
authors = ["Your Name"]

[runtime]
core = "^0.2.0"
mcp = "^0.2.0"

Cargo.toml — standard Rust crate config:

[package]
name = "my-inferlet"
version = "0.1.0"
edition = "2024"

[lib]
crate-type = ["cdylib"]

[dependencies]
inferlet.workspace = true
inferlet-macros.workspace = true
Note: inferlet.workspace = true works when building inside the Pie repository. For standalone projects, use path dependencies or published crate versions instead.
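
For a standalone project, the [dependencies] section of Cargo.toml might use path dependencies instead; the paths below are hypothetical and should point at your local Pie checkout:

[dependencies]
# Hypothetical paths; adjust to where your Pie checkout lives
inferlet = { path = "../pie/sdk/rust/inferlet" }
inferlet-macros = { path = "../pie/sdk/rust/inferlet-macros" }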

Write Your First Inferlet

The code below creates a simple text-completion inferlet. It reads a prompt from the command-line arguments, fills a chat context, and generates a response.

src/lib.rs
use inferlet::stop_condition::{self, StopCondition};
use inferlet::{Args, Result, Sampler};

#[inferlet::main]
async fn main(mut args: Args) -> Result<()> {
    // Parse command-line arguments with defaults
    let prompt: String = args
        .value_from_str(["-p", "--prompt"])
        .unwrap_or("What is the capital of France?".to_string());
    let max_tokens: usize = args.value_from_str(["-n", "--max-tokens"]).unwrap_or(256);

    // Load the model and create a generation context
    let model = inferlet::get_auto_model();
    let mut ctx = model.create_context();

    // Build the conversation (tokens are buffered, not yet processed)
    ctx.fill_system("You are a helpful, respectful and honest assistant.");
    ctx.fill_user(&prompt);

    // Configure when to stop: max tokens or end-of-sequence
    let stop = stop_condition::max_len(max_tokens)
        .or(stop_condition::ends_with_any(model.eos_tokens()));

    // Generate with nucleus sampling and print the result
    let output = ctx.generate(Sampler::top_p(0.6, 0.95), stop).await;
    println!("{}", output);

    Ok(())
}

Build

Compile your inferlet to WebAssembly using Bakery:

bakery build ./my-inferlet -o my-inferlet.wasm

Bakery detects the language from your project structure and produces a single .wasm file.
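
To sanity-check the artifact, you can inspect it with the standard Unix file utility (output wording varies by version):

file my-inferlet.wasm
# my-inferlet.wasm: WebAssembly (wasm) binary module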

Run

Launch your inferlet on a running Pie engine:

pie run --path ./my-inferlet.wasm -- --prompt "What is the capital of France?"

You should see the model's response printed to stdout.
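
Output varies by model and sampling settings, but an illustrative run might print something like:

The capital of France is Paris.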

Next Steps

  • Core Concepts — understand contexts, KV cache control, forking, and custom sampling in depth.
  • Examples — explore real-world patterns like prefix caching, parallel generation, and constrained decoding.