Initial setup

This page walks through configuring Pie after installation and running a sample prompt to verify the engine boots end-to-end. Read this after Installation.

Initialize the config

pie config init

pie config init writes a default ~/.pie/config.toml and downloads the embedded Python runtime into ~/.pie/ (used to host Python inferlets). The TOML it writes:

[server]
host = "127.0.0.1"
port = 8080

[auth]
enabled = false

[runtime]
# Tokio + wasmtime tuning, filesystem / network policy. Defaults are pinned by pie.

[[model]]
name = "default"
hf_repo = "Qwen/Qwen3-0.6B"

[model.driver]
type = "cuda_native" # or "portable" / "dummy" depending on the installed build
device = ["cuda:0"] # portable defaults to ["cpu"]

The default config is generated from the drivers compiled into your pie binary. CUDA builds prefer cuda_native; portable-only builds use portable; dummy is always available for smoke tests. See Configuration for the full schema.

Download a model

The default config points at Qwen/Qwen3-0.6B, a 600M-parameter model that fits on most GPUs and on Apple Silicon laptops. Download it:

pie model download Qwen/Qwen3-0.6B

Other compatible repos:

pie model download Qwen/Qwen2.5-7B-Instruct
pie model download meta-llama/Llama-3.2-1B-Instruct

pie model list reports compatibility: architectures supported by the installed drivers are marked as compatible. The full list of supported architectures is on the CUDA and Portable driver pages.

To point the config at a different model:

pie config set model.0.hf_repo Qwen/Qwen2.5-7B-Instruct # update the first [[model]] entry

Run your first prompt

pie run text-completion -- --prompt "The capital of France is"
╭─ Pie Run ───────────────────────────────────────╮
│ Inferlet  text-completion@0.1.0                 │
│ Model     default (Qwen/Qwen3-0.6B)             │
│ Driver    cuda_native                           │
│ Device    cuda:0                                │
╰─────────────────────────────────────────────────╯

The capital of France is Paris, which is located in…

pie run boots a one-shot engine, runs the inferlet, prints its output, and exits. The first run is slower because of JIT and kernel warmup.

What the command does

Part               Meaning
pie run            Boot a one-shot engine, run an inferlet, exit.
text-completion    The inferlet to run. Resolved to text-completion@<latest> via the registry.
-- --prompt "…"    Becomes {"prompt": "…"} in the inferlet's input JSON.

Flags after -- fold into the input dict. Type inference handles ints, floats, and booleans automatically:

pie run text-completion \
-- \
--prompt "Write a haiku about pie" \
--max-tokens 64 \
--temperature 0.7
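The folding behavior can be pictured in a few lines. This is a hypothetical sketch, not Pie's actual parser, and it assumes hyphenated flags map to underscored keys:

```python
def fold_flags(args: list[str]) -> dict:
    """Fold `--key value` pairs into a dict, inferring ints, floats, and booleans."""
    def infer(value: str):
        if value.lower() in ("true", "false"):
            return value.lower() == "true"
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                pass
        return value  # fall back to a plain string

    out = {}
    it = iter(args)
    for flag in it:
        key = flag.removeprefix("--").replace("-", "_")
        out[key] = infer(next(it))
    return out

print(fold_flags(["--prompt", "Write a haiku about pie",
                  "--max-tokens", "64", "--temperature", "0.7"]))
# {'prompt': 'Write a haiku about pie', 'max_tokens': 64, 'temperature': 0.7}
```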

Pin a version

pie run text-completion@0.1.0 -- --prompt "hi"

Without @, Pie resolves to the latest version in the registry.
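"Latest" resolution amounts to picking the highest semantic version among published releases. A toy sketch under that assumption (the registry lookup itself is hypothetical):

```python
def resolve_latest(versions: list[str]) -> str:
    """Pick the highest major.minor.patch version from a list of releases."""
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))

print(resolve_latest(["0.1.0", "0.2.1", "0.2.0"]))
# 0.2.1
```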

Inspect or edit the config

pie config show # pretty-print ~/.pie/config.toml
pie config set server.port 9090
pie config set model.0.hf_repo Qwen/Qwen2.5-3B-Instruct
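The dotted paths address nested tables, with numeric segments indexing array-of-table entries (model.0.hf_repo is the first [[model]] entry). A minimal sketch of that resolution rule, assuming this is how the paths behave:

```python
def set_path(cfg: dict, path: str, value) -> dict:
    """Set a value at a dotted path; numeric segments index into arrays."""
    *parents, last = path.split(".")
    node = cfg
    for key in parents:
        node = node[int(key)] if key.isdigit() else node[key]
    node[int(last) if last.isdigit() else last] = value
    return cfg

cfg = {"server": {"port": 8080},
       "model": [{"name": "default", "hf_repo": "Qwen/Qwen3-0.6B"}]}
set_path(cfg, "server.port", 9090)
set_path(cfg, "model.0.hf_repo", "Qwen/Qwen2.5-3B-Instruct")
print(cfg["server"]["port"], cfg["model"][0]["hf_repo"])
# 9090 Qwen/Qwen2.5-3B-Instruct
```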

Diagnose with doctor

If something looks wrong, run pie doctor:

pie doctor

It reports your platform, GPU visibility, compiled-in embedded drivers, Python driver venvs, and whether the config and default model are present.
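The same kinds of facts can be gathered by hand when comparing against doctor's output. This sketch is illustrative only (it is not Pie's code, and it checks GPU visibility merely by looking for nvidia-smi on PATH):

```python
import platform
import shutil
from pathlib import Path

def environment_report() -> dict:
    """Collect facts of the kind `pie doctor` reports (illustrative only)."""
    return {
        "platform": f"{platform.system()} {platform.machine()}",
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "config_present": (Path.home() / ".pie" / "config.toml").exists(),
    }

print(environment_report())
```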

Next