# Supported Models
Pie supports a variety of open-source LLMs. This page lists all currently supported model families and their variants.
## Model Families
| Family | Models | Notes |
|---|---|---|
| Llama 3.x | Llama 3.2 (1B, 3B), Llama 3.1 (8B, 70B), Llama 3 (8B, 70B) | Full support including instruct variants |
| Qwen | Qwen 2, Qwen 2.5, Qwen 3 | All sizes supported |
| Gemma | Gemma 2, Gemma 3 | Google's open models |
| Mistral | Ministral 3B, Mistral 7B | Including instruct variants |
| OLMo | OLMo 3 | AI2's open language model |
| GPT-OSS | Various | OpenAI's open-weight GPT models |
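All of these families are selected the same way: add a `[[model]]` entry pointing `hf_repo` at the model's Hugging Face repository (see "Configuring Models" below). As an illustrative sketch with a Qwen model (the repo id is only an example):

```toml
# Illustrative sketch: any supported family is selected via its
# Hugging Face repo id; see "Configuring Models" below for details.
[[model]]
hf_repo = "Qwen/Qwen2.5-7B-Instruct"
device = ["cuda:0"]
```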
## Downloading Models
Use the Pie CLI to download models from Hugging Face:
```bash
# Download a model
pie model download meta-llama/Llama-3.2-1B-Instruct

# List downloaded models
pie model list

# Remove a model
pie model remove meta-llama/Llama-3.2-1B-Instruct
```
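Some repositories, including the `meta-llama` ones above, are gated on Hugging Face, so downloads may fail until you have accepted the model license and authenticated. Assuming Pie picks up the standard Hugging Face token on your machine, one way to authenticate (via the Hugging Face CLI, not Pie itself) is:

```bash
# Store a Hugging Face access token locally so gated repos such as
# meta-llama/Llama-3.2-1B-Instruct can be downloaded.
huggingface-cli login
```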
## Configuring Models
Set your default model in `~/.pie/config.toml`:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]
```
For multi-GPU setups:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.1-70B-Instruct"
device = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
```
## Quantization
Pie supports quantized inference to reduce memory usage. Configure quantization in your `~/.pie/config.toml`:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]
activation_dtype = "bfloat16"  # or "float8"
weight_dtype = "float8"        # or "int8", "float4"
```
| Format | Description |
|---|---|
| bfloat16 | Default; 16-bit floating point, no quantization |
| float8 | 8-bit floating point, good balance of speed and quality |
| int8 | 8-bit integer quantization |
| float4 | 4-bit floating point, maximum memory savings |
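Putting the options together, a sketch of a larger model with float8 weights sharded across several GPUs might look like the following (model choice and device count are illustrative; adjust to your hardware):

```toml
# Illustrative sketch combining the multi-GPU and quantization options above.
[[model]]
hf_repo = "meta-llama/Llama-3.1-70B-Instruct"
device = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
activation_dtype = "bfloat16"
weight_dtype = "float8"
```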
## Adding Model Support
Want support for a new model? Open an issue on GitHub with:
- Model name and Hugging Face link
- Architecture details (if non-standard)
- Your use case
We prioritize models based on community demand.