# Supported Models
Pie supports a variety of open-source LLMs. This page lists all currently supported model families and their variants.
## Model Families
| Family | Models | Notes |
|---|---|---|
| Llama 3.x | Llama 3.2 (1B, 3B), Llama 3.1 (8B, 70B), Llama 3 (8B, 70B) | Full support including instruct variants |
| Qwen | Qwen 2, Qwen 2.5, Qwen 3 | All sizes supported |
| Gemma | Gemma 2, Gemma 3 | Google's open models |
| Mistral | Ministral 3B, Mistral 7B | Including instruct variants |
| OLMo | OLMo 3 | AI2's open language model |
| GPT-OSS | Various | OpenAI's open-weight GPT models |
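All of these families are selected the same way: add a `[[model]]` entry pointing `hf_repo` at the model's Hugging Face repository (see "Configuring Models" below). As an illustrative sketch with a Qwen model (the repo id is only an example):

```toml
# Illustrative sketch: any supported family is selected via its
# Hugging Face repo id; see "Configuring Models" below for details.
[[model]]
hf_repo = "Qwen/Qwen2.5-7B-Instruct"
device = ["cuda:0"]
```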
## Downloading Models
Use the Pie CLI to download models from Hugging Face:
```bash
# Download a model
pie model download meta-llama/Llama-3.2-1B-Instruct

# List downloaded models
pie model list

# Remove a model
pie model remove meta-llama/Llama-3.2-1B-Instruct
```
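Some repositories, including the `meta-llama` ones above, are gated on Hugging Face, so downloads may fail until you have accepted the model license and authenticated. Assuming Pie picks up the standard Hugging Face token on your machine, one way to authenticate (via the Hugging Face CLI, not Pie itself) is:

```bash
# Store a Hugging Face access token locally so gated repos such as
# meta-llama/Llama-3.2-1B-Instruct can be downloaded.
huggingface-cli login
```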
## Configuring Models
Set your default model in `~/.pie/config.toml`:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]
```
For multi-GPU setups:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.1-70B-Instruct"
device = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
```
## Quantization
Pie supports quantized inference to reduce memory usage. Configure quantization in your `~/.pie/config.toml`:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]
activation_dtype = "bfloat16"  # or "float8"
weight_dtype = "float8"        # or "int8", "float4"
```
| Format | Description |
|---|---|
| bfloat16 | Default; 16-bit floating point, no quantization |
| float8 | 8-bit floating point, good balance of speed and quality |
| int8 | 8-bit integer quantization |
| float4 | 4-bit floating point, maximum memory savings |
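Putting the options together, a sketch of a larger model with float8 weights sharded across several GPUs might look like the following (model choice and device count are illustrative; adjust to your hardware):

```toml
# Illustrative sketch combining the multi-GPU and quantization options above.
[[model]]
hf_repo = "meta-llama/Llama-3.1-70B-Instruct"
device = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
activation_dtype = "bfloat16"
weight_dtype = "float8"
```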
## Adding Model Support
Want support for a new model? Open an issue on GitHub with:
- Model name and Hugging Face link
- Architecture details (if non-standard)
- Your use case
We prioritize models based on community demand.