Your First Inferlet
This guide walks you through writing, building, and running your first inferlet — a lightweight WebAssembly program that controls inference logic inside Pie.
Prerequisites
- Pie installed and a model downloaded (see Installation)
- Bakery installed (the inferlet build tool):
pip install -e sdk/tools/bakery
Language-specific requirements:
- Rust: the Rust toolchain installed, with the WebAssembly target added:
rustup target add wasm32-wasip2
- Python: Python 3.x installed, with componentize-py:
uv tool install componentize-py
- JavaScript: Node.js v18+ installed
Project Structure
You can scaffold a new project with bakery create (supports Rust and TypeScript), or create the files manually. The layout depends on your language; the per-language layouts are shown below.
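A minimal scaffolding invocation might look like the following (the exact arguments are an assumption; check bakery's help output for the supported options):
bakery create my-inferlet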
Rust:
my-inferlet/
├── Pie.toml
├── Cargo.toml
└── src/
└── lib.rs
Pie.toml — every inferlet needs this manifest:
[package]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
authors = ["Your Name"]
[runtime]
core = "^0.2.0"
mcp = "^0.2.0"
Cargo.toml — standard Rust crate config:
[package]
name = "my-inferlet"
version = "0.1.0"
edition = "2024"
[lib]
crate-type = ["cdylib"]
[dependencies]
inferlet.workspace = true
inferlet-macros.workspace = true
inferlet.workspace = true works when building inside the Pie repository. For standalone projects, use path dependencies or published crate versions instead.
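For instance, a standalone Cargo.toml could reference the SDK crates by path (the paths below are placeholders; adjust them to wherever the Pie repository is checked out):
[dependencies]
inferlet = { path = "../pie/sdk/rust/inferlet" }
inferlet-macros = { path = "../pie/sdk/rust/inferlet-macros" }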
Python:
bakery create does not yet support Python. Create the files manually.
my-inferlet/
├── Pie.toml
├── pyproject.toml
└── main.py
Pie.toml — Python inferlets need the python-runtime dependency:
[package]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
authors = ["Your Name"]
[runtime]
core = "^0.2.0"
mcp = "^0.2.0"
python-runtime = "^0.3.0"
pyproject.toml — standard Python project config:
[project]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
requires-python = ">=3.10"
JavaScript:
my-inferlet/
├── Pie.toml
├── package.json
└── index.ts
Pie.toml:
[package]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
authors = ["Your Name"]
[runtime]
core = "^0.2.0"
mcp = "^0.2.0"
Write Your First Inferlet
The code below creates a simple text-completion inferlet. It reads a prompt from the command-line arguments, fills a chat context, and generates a response.
Rust:
use inferlet::stop_condition::{self, StopCondition};
use inferlet::{Args, Result, Sampler};

#[inferlet::main]
async fn main(mut args: Args) -> Result<()> {
    // Parse command-line arguments with defaults
    let prompt: String = args
        .value_from_str(["-p", "--prompt"])
        .unwrap_or("What is the capital of France?".to_string());
    let max_tokens: usize = args.value_from_str(["-n", "--max-tokens"]).unwrap_or(256);

    // Load the model and create a generation context
    let model = inferlet::get_auto_model();
    let mut ctx = model.create_context();

    // Build the conversation (tokens are buffered, not yet processed)
    ctx.fill_system("You are a helpful, respectful and honest assistant.");
    ctx.fill_user(&prompt);

    // Configure when to stop: max tokens or end-of-sequence
    let stop = stop_condition::max_len(max_tokens)
        .or(stop_condition::ends_with_any(model.eos_tokens()));

    // Generate with nucleus sampling and print the result
    let output = ctx.generate(Sampler::top_p(0.6, 0.95), stop).await;
    println!("{}", output);
    Ok(())
}
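Note how the stop conditions compose: .or() combines max_len and ends_with_any, so generation halts as soon as either the token budget is exhausted or an end-of-sequence token appears.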
Python:
from inferlet import Context, get_auto_model, get_arguments, set_return

def main() -> None:
    # Parse arguments passed via `pie run ... -- --prompt "..."`
    args = get_arguments()
    prompt = args.get("prompt", "What is the capital of France?")
    max_tokens = int(args.get("max_tokens", 256))

    # Load the model configured on the server
    model = get_auto_model()

    # Context manager handles resource cleanup automatically
    with Context(model) as ctx:
        # Build the conversation
        ctx.system("You are a helpful assistant.")
        ctx.user(prompt)

        # Generate and return the result to the client
        result = ctx.generate(max_tokens=max_tokens, stream=False)
        set_return(result.text)

if __name__ == "__main__":
    main()
JavaScript:
import { Context, getAutoModel, getArguments, send } from 'inferlet';

// Parse arguments passed via `pie run ... -- --prompt "..."`
const args = getArguments();
const prompt = (args.prompt as string) ?? 'What is the capital of France?';
const maxTokens = Number(args.maxTokens ?? 256);

// Load model and create a generation context
const model = getAutoModel();
const ctx = new Context(model);

// Build the conversation
ctx.fillSystem('You are a helpful, respectful and honest assistant.');
ctx.fillUser(prompt);

// Generate with nucleus sampling and send the result to the client
const result = await ctx.generate({
  sampling: { topP: 0.95, temperature: 0.6 },
  stop: { maxTokens, sequences: model.eosTokens }
});
send(result);
Build
Compile your inferlet to WebAssembly using Bakery:
bakery build ./my-inferlet -o my-inferlet.wasm
Bakery detects the language from your project structure and produces a single .wasm file.
Run
Launch your inferlet on a running Pie engine:
pie run --path ./my-inferlet.wasm -- --prompt "What is the capital of France?"
You should see the model's response. The Rust example prints it to stdout; the Python and JavaScript versions return it to the client via set_return and send.
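Arguments after the -- separator go to the inferlet itself. The Rust example above also parses -n/--max-tokens, so you could cap the response length, for instance:
pie run --path ./my-inferlet.wasm -- --prompt "What is the capital of France?" --max-tokens 64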
Next Steps
- Core Concepts — understand contexts, KV cache control, forking, and custom sampling in depth.
- Examples — explore real-world patterns like prefix caching, parallel generation, and constrained decoding.