# Initial setup
This page configures Pie after installation and runs a sample prompt to verify the engine boots end-to-end. Read this after Installation.
## Initialize the config

```shell
pie config init
```

`pie config init` writes a default `~/.pie/config.toml` and downloads the embedded Python runtime into `~/.pie/` (used to host Python inferlets). The TOML it writes:
```toml
[server]
host = "127.0.0.1"
port = 8080

[auth]
enabled = false

[runtime]
# Tokio + wasmtime tuning, filesystem / network policy. Defaults are pinned by pie.

[[model]]
name = "default"
hf_repo = "Qwen/Qwen3-0.6B"

[model.driver]
type = "cuda_native"  # or "portable" / "dummy" depending on the installed build
device = ["cuda:0"]   # portable defaults to ["cpu"]
```
The default config is generated from the drivers compiled into your `pie` binary. CUDA builds prefer `cuda_native`; portable-only builds use `portable`; `dummy` is always available for smoke tests. See Configuration for the full schema.
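For example, on a portable-only build the generated driver block would plausibly look like the following (an illustration pieced together from the comments above; the exact defaults depend on your build):

```toml
[model.driver]
type = "portable"
device = ["cpu"]  # portable defaults to the CPU
```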
## Download a model

The default config points at `Qwen/Qwen3-0.6B`, a 600M-parameter model that fits on most GPUs and on Apple Silicon laptops. Download it:

```shell
pie model download Qwen/Qwen3-0.6B
```
Other compatible repos:

```shell
pie model download Qwen/Qwen2.5-7B-Instruct
pie model download meta-llama/Llama-3.2-1B-Instruct
```

`pie model list` reports compatibility: architectures supported by the installed drivers get a ✓. The full list of supported architectures is on the CUDA and Portable driver pages.
To point the config at a different model:

```shell
pie config set model.0.hf_repo Qwen/Qwen2.5-7B-Instruct  # update the first [[model]] entry
```
## Run your first prompt

```shell
pie run text-completion -- --prompt "The capital of France is"
```

```text
╭─ Pie Run ───────────────────────────────────────╮
│ Inferlet  text-completion@0.1.0                 │
│ Model     default (Qwen/Qwen3-0.6B)             │
│ Driver    cuda_native                           │
│ Device    cuda:0                                │
╰─────────────────────────────────────────────────╯
The capital of France is Paris, which is located in…
```
`pie run` boots a one-shot engine, runs the inferlet, prints its output, and exits. The first run is slower because of JIT compilation and kernel warmup.
### What the command does

| Part | Meaning |
|---|---|
| `pie run` | Boot a one-shot engine, run an inferlet, exit. |
| `text-completion` | The inferlet to run. Resolved to `text-completion@<latest>` via the registry. |
| `-- --prompt "…"` | Becomes `{"prompt": "…"}` in the inferlet's input JSON. |
Flags after `--` fold into the input dict. Type inference handles ints, floats, and booleans automatically:

```shell
pie run text-completion \
  -- \
  --prompt "Write a haiku about pie" \
  --max-tokens 64 \
  --temperature 0.7
```
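The folding convention can be sketched in a few lines of Python. This is an illustration of the behavior described above, not Pie's actual parser; in particular, mapping `-` in flag names to `_` in keys is an assumption here:

```python
import json

def fold_flags(args: list[str]) -> dict:
    """Fold `--key value` pairs into a dict, inferring ints, floats, and bools.

    Illustrative sketch of the flag-folding convention, not Pie's parser.
    Assumes dashes in flag names become underscores in keys.
    """
    out = {}
    i = 0
    while i < len(args):
        key = args[i].removeprefix("--").replace("-", "_")
        raw = args[i + 1]
        # Try bool, then int, then float; fall back to the raw string.
        if raw in ("true", "false"):
            out[key] = raw == "true"
        else:
            try:
                out[key] = int(raw)
            except ValueError:
                try:
                    out[key] = float(raw)
                except ValueError:
                    out[key] = raw
        i += 2
    return out

print(json.dumps(fold_flags(
    ["--prompt", "Write a haiku about pie", "--max-tokens", "64", "--temperature", "0.7"]
)))
# {"prompt": "Write a haiku about pie", "max_tokens": 64, "temperature": 0.7}
```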
### Pin a version

```shell
pie run text-completion@0.1.0 -- --prompt "hi"
```

Without `@`, Pie resolves to the latest version in the registry.
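"Latest" here means highest version, not last published. A toy sketch of that resolution rule (illustrative only; the registry's real logic handles pre-releases and other cases):

```python
def resolve(name: str, available: dict[str, list[str]]) -> str:
    """Resolve `name` or `name@version` to a pinned `name@version`.

    Toy sketch of latest-version resolution, not the registry's code.
    """
    if "@" in name:
        return name  # already pinned
    # Compare versions numerically, component by component.
    latest = max(available[name], key=lambda v: tuple(map(int, v.split("."))))
    return f"{name}@{latest}"

print(resolve("text-completion", {"text-completion": ["0.1.0", "0.2.0", "0.10.0"]}))
# text-completion@0.10.0
```

Note the numeric comparison: a plain string sort would rank `0.2.0` above `0.10.0`.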
## Inspect or edit the config

```shell
pie config show                                          # pretty-print ~/.pie/config.toml
pie config set server.port 9090
pie config set model.0.hf_repo Qwen/Qwen2.5-3B-Instruct
```
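A path like `model.0.hf_repo` mixes table keys with a numeric index into the `[[model]]` array. A dotted-path setter along those lines might look like this (a minimal sketch, not Pie's implementation):

```python
def set_path(cfg: dict, path: str, value):
    """Set a dotted path such as "model.0.hf_repo"; numeric parts index lists.

    Illustrative sketch of dotted-path config editing, not Pie's code.
    """
    parts = path.split(".")
    node = cfg
    for part in parts[:-1]:
        node = node[int(part)] if part.isdigit() else node[part]
    last = parts[-1]
    if last.isdigit():
        node[int(last)] = value
    else:
        node[last] = value

cfg = {"server": {"port": 8080},
       "model": [{"name": "default", "hf_repo": "Qwen/Qwen3-0.6B"}]}
set_path(cfg, "model.0.hf_repo", "Qwen/Qwen2.5-3B-Instruct")
set_path(cfg, "server.port", 9090)
print(cfg["model"][0]["hf_repo"], cfg["server"]["port"])
# Qwen/Qwen2.5-3B-Instruct 9090
```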
## Diagnose with doctor

If something looks wrong, run `pie doctor`:

```shell
pie doctor
```

It reports your platform, GPU visibility, compiled-in embedded drivers, Python driver venvs, and whether the config and default model are present.
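The flavor of those checks can be sketched as follows; this is a rough illustration only, since the real checks (driver enumeration, venv probing, model presence) are internal to Pie:

```python
import platform
import shutil
from pathlib import Path

def doctor() -> dict[str, str]:
    """Rough sketch of doctor-style environment checks, not Pie's code.

    Uses `nvidia-smi` on PATH as a stand-in for GPU visibility.
    """
    home = Path.home() / ".pie"
    return {
        "platform": f"{platform.system()} {platform.machine()}",
        "gpu": "visible" if shutil.which("nvidia-smi") else "not detected",
        "config": "present" if (home / "config.toml").exists() else "missing",
    }

for check, status in doctor().items():
    print(f"{check:10s} {status}")
```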
## Next
- Development environment: install the toolchain to write your own inferlets.
- Tutorial: build a parallel research agent end-to-end.
- Your first inferlet: a minimal Rust inferlet with build and run.