Your First Inferlet
This guide walks you through writing, building, and running your first inferlet — a lightweight WebAssembly program that controls inference logic inside Pie.
Prerequisites
- Pie installed and a model downloaded (see Installation)
- Bakery installed (the inferlet build tool):
pip install -e sdk/tools/bakery
Language-specific requirements:
- Rust: the Rust toolchain installed, with the WebAssembly target added:
rustup target add wasm32-wasip2
- Python: Python 3.x installed, with componentize-py:
uv tool install componentize-py
- JavaScript: Node.js v18+ installed
Project Structure
You can scaffold a new project with bakery create (supports Rust and TypeScript), or create the files manually. The layout depends on your language; the per-language layouts are shown below.
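A minimal scaffolding invocation might look like the following (the exact arguments are an assumption; check bakery's help output for the supported options):
bakery create my-inferlet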
Rust:
my-inferlet/
├── Pie.toml
├── Cargo.toml
└── src/
└── lib.rs
Pie.toml — every inferlet needs this manifest:
[package]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
authors = ["Your Name"]
[runtime]
core = "^0.2.0"
mcp = "^0.2.0"
Cargo.toml — standard Rust crate config:
[package]
name = "my-inferlet"
version = "0.1.0"
edition = "2024"
[lib]
crate-type = ["cdylib"]
[dependencies]
inferlet.workspace = true
inferlet-macros.workspace = true
inferlet.workspace = true works when building inside the Pie repository. For standalone projects, use path dependencies or published crate versions instead.
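For instance, a standalone Cargo.toml could reference the SDK crates by path (the paths below are placeholders; adjust them to wherever the Pie repository is checked out):
[dependencies]
inferlet = { path = "../pie/sdk/rust/inferlet" }
inferlet-macros = { path = "../pie/sdk/rust/inferlet-macros" }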
Python:
bakery create does not yet support Python. Create the files manually.
my-inferlet/
├── Pie.toml
├── pyproject.toml
└── main.py
Pie.toml — Python inferlets need the python-runtime dependency:
[package]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
authors = ["Your Name"]
[runtime]
core = "^0.2.0"
mcp = "^0.2.0"
python-runtime = "^0.3.0"
pyproject.toml — standard Python project config:
[project]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
requires-python = ">=3.10"
JavaScript:
my-inferlet/
├── Pie.toml
├── package.json
└── index.ts
Pie.toml:
[package]
name = "my-inferlet"
version = "0.1.0"
description = "My first inferlet"
authors = ["Your Name"]
[runtime]
core = "^0.2.0"
mcp = "^0.2.0"
Write Your First Inferlet
The code below creates a simple text-completion inferlet. It reads a prompt from the command-line arguments, fills a chat context, and generates a response.
Rust:
use inferlet::stop_condition::{self, StopCondition};
use inferlet::{Args, Result, Sampler};

#[inferlet::main]
async fn main(mut args: Args) -> Result<()> {
    // Parse command-line arguments with defaults
    let prompt: String = args
        .value_from_str(["-p", "--prompt"])
        .unwrap_or("What is the capital of France?".to_string());
    let max_tokens: usize = args.value_from_str(["-n", "--max-tokens"]).unwrap_or(256);

    // Load the model and create a generation context
    let model = inferlet::get_auto_model();
    let mut ctx = model.create_context();

    // Build the conversation (tokens are buffered, not yet processed)
    ctx.fill_system("You are a helpful, respectful and honest assistant.");
    ctx.fill_user(&prompt);

    // Configure when to stop: max tokens or end-of-sequence
    let stop = stop_condition::max_len(max_tokens)
        .or(stop_condition::ends_with_any(model.eos_tokens()));

    // Generate with nucleus sampling and print the result
    let output = ctx.generate(Sampler::top_p(0.6, 0.95), stop).await;
    println!("{}", output);
    Ok(())
}
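Note how the stop conditions compose: .or() combines max_len and ends_with_any, so generation halts as soon as either the token budget is exhausted or an end-of-sequence token appears.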
Python:
from inferlet import Context, get_auto_model, get_arguments, set_return

def main() -> None:
    # Parse arguments passed via `pie run ... -- --prompt "..."`
    args = get_arguments()
    prompt = args.get("prompt", "What is the capital of France?")
    max_tokens = int(args.get("max_tokens", 256))

    # Load the model configured on the server
    model = get_auto_model()

    # Context manager handles resource cleanup automatically
    with Context(model) as ctx:
        # Build the conversation
        ctx.system("You are a helpful assistant.")
        ctx.user(prompt)

        # Generate and return the result to the client
        result = ctx.generate(max_tokens=max_tokens, stream=False)
        set_return(result.text)

if __name__ == "__main__":
    main()
JavaScript:
import { Context, getAutoModel, getArguments, send } from 'inferlet';

// Parse arguments passed via `pie run ... -- --prompt "..."`
const args = getArguments();
const prompt = (args.prompt as string) ?? 'What is the capital of France?';
const maxTokens = Number(args.maxTokens ?? 256);

// Load model and create a generation context
const model = getAutoModel();
const ctx = new Context(model);

// Build the conversation
ctx.fillSystem('You are a helpful, respectful and honest assistant.');
ctx.fillUser(prompt);

// Generate with nucleus sampling and send the result to the client
const result = await ctx.generate({
  sampling: { topP: 0.95, temperature: 0.6 },
  stop: { maxTokens, sequences: model.eosTokens }
});
send(result);
Build
Compile your inferlet to WebAssembly using Bakery:
bakery build ./my-inferlet -o my-inferlet.wasm
Bakery detects the language from your project structure and produces a single .wasm file.
Run
Launch your inferlet on a running Pie engine:
pie run --path ./my-inferlet.wasm -- --prompt "What is the capital of France?"
You should see the model's response. The Rust example prints it to stdout; the Python and JavaScript versions return it to the client via set_return and send.
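Arguments after the -- separator go to the inferlet itself. The Rust example above also parses -n/--max-tokens, so you could cap the response length, for instance:
pie run --path ./my-inferlet.wasm -- --prompt "What is the capital of France?" --max-tokens 64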
Next Steps
- Core Concepts — understand contexts, KV cache control, forking, and custom sampling in depth.
- Examples — explore real-world patterns like prefix caching, parallel generation, and constrained decoding.