Initial setup

This page walks through configuring Pie after installation and running a sample prompt to verify the engine boots end-to-end. Read this after Installation.

Initialize the config

pie config init

pie config init writes a default ~/.pie/config.toml and downloads the embedded Python runtime into ~/.pie/ (used to host Python inferlets). The TOML it writes:

[server]
host = "127.0.0.1"
port = 8080

[auth]
enabled = false

[runtime]
# Tokio + wasmtime tuning, filesystem / network policy. Defaults are pinned by pie.

[[model]]
name = "default"
hf_repo = "Qwen/Qwen3-0.6B"

[model.driver]
type = "cuda_native" # or "portable" / "dummy" depending on the installed build
device = ["cuda:0"] # portable defaults to ["cpu"]

The default config is generated from the drivers compiled into your pie binary. CUDA builds prefer cuda_native; portable-only builds use portable; dummy is always available for smoke tests. See Configuration for the full schema.

Download a model

The default config points at Qwen/Qwen3-0.6B, a 600M-parameter model that fits on most GPUs and on Apple Silicon laptops. Download it:

pie model download Qwen/Qwen3-0.6B

Other compatible repos:

pie model download Qwen/Qwen2.5-7B-Instruct
pie model download meta-llama/Llama-3.2-1B-Instruct

pie model list reports compatibility: architectures supported by the installed drivers are marked as compatible. The full list of supported architectures is on the CUDA and Portable driver pages.

To point the config at a different model:

pie config set model.0.hf_repo Qwen/Qwen2.5-7B-Instruct # update the first [[model]] entry

Run your first prompt

pie run text-completion -- --prompt "The capital of France is"
╭─ Pie Run ───────────────────────────────────────╮
│ Inferlet  text-completion@0.1.0                 │
│ Model     default (Qwen/Qwen3-0.6B)             │
│ Driver    cuda_native                           │
│ Device    cuda:0                                │
╰─────────────────────────────────────────────────╯

The capital of France is Paris, which is located in…

pie run boots a one-shot engine, runs the inferlet, prints its output, and exits. The first run is slower because of JIT and kernel warmup.

What the command does

Part               Meaning
pie run            Boot a one-shot engine, run an inferlet, exit.
text-completion    The inferlet to run. Resolved to text-completion@<latest> via the registry.
-- --prompt "…"    Becomes {"prompt": "…"} in the inferlet's input JSON.

Flags after -- fold into the input dict. Type inference handles ints, floats, and booleans automatically:

pie run text-completion \
-- \
--prompt "Write a haiku about pie" \
--max-tokens 64 \
--temperature 0.7
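The folding behavior can be pictured in a few lines. This is a hypothetical sketch, not Pie's actual parser, and it assumes hyphenated flags map to underscored keys:

```python
def fold_flags(args: list[str]) -> dict:
    """Fold `--key value` pairs into a dict, inferring ints, floats, and booleans."""
    def infer(value: str):
        if value.lower() in ("true", "false"):
            return value.lower() == "true"
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                pass
        return value  # fall back to a plain string

    out = {}
    it = iter(args)
    for flag in it:
        key = flag.removeprefix("--").replace("-", "_")
        out[key] = infer(next(it))
    return out

print(fold_flags(["--prompt", "Write a haiku about pie",
                  "--max-tokens", "64", "--temperature", "0.7"]))
# {'prompt': 'Write a haiku about pie', 'max_tokens': 64, 'temperature': 0.7}
```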

Pin a version

pie run text-completion@0.1.0 -- --prompt "hi"

Without @, Pie resolves to the latest version in the registry.
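"Latest" resolution amounts to picking the highest semantic version among published releases. A toy sketch under that assumption (the registry lookup itself is hypothetical):

```python
def resolve_latest(versions: list[str]) -> str:
    """Pick the highest major.minor.patch version from a list of releases."""
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))

print(resolve_latest(["0.1.0", "0.2.1", "0.2.0"]))
# 0.2.1
```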

Inspect or edit the config

pie config show # pretty-print ~/.pie/config.toml
pie config set server.port 9090
pie config set model.0.hf_repo Qwen/Qwen2.5-3B-Instruct
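The dotted paths address nested tables, with numeric segments indexing array-of-table entries (model.0.hf_repo is the first [[model]] entry). A minimal sketch of that resolution rule, assuming this is how the paths behave:

```python
def set_path(cfg: dict, path: str, value) -> dict:
    """Set a value at a dotted path; numeric segments index into arrays."""
    *parents, last = path.split(".")
    node = cfg
    for key in parents:
        node = node[int(key)] if key.isdigit() else node[key]
    node[int(last) if last.isdigit() else last] = value
    return cfg

cfg = {"server": {"port": 8080},
       "model": [{"name": "default", "hf_repo": "Qwen/Qwen3-0.6B"}]}
set_path(cfg, "server.port", 9090)
set_path(cfg, "model.0.hf_repo", "Qwen/Qwen2.5-3B-Instruct")
print(cfg["server"]["port"], cfg["model"][0]["hf_repo"])
# 9090 Qwen/Qwen2.5-3B-Instruct
```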

Diagnose with doctor

If something looks wrong, run pie doctor:

pie doctor

It reports your platform, GPU visibility, compiled-in embedded drivers, Python driver venvs, and whether the config and default model are present.
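The same kinds of facts can be gathered by hand when comparing against doctor's output. This sketch is illustrative only (it is not Pie's code, and it checks GPU visibility merely by looking for nvidia-smi on PATH):

```python
import platform
import shutil
from pathlib import Path

def environment_report() -> dict:
    """Collect facts of the kind `pie doctor` reports (illustrative only)."""
    return {
        "platform": f"{platform.system()} {platform.machine()}",
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "config_present": (Path.home() / ".pie" / "config.toml").exists(),
    }

print(environment_report())
```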

Next