
Supported Models

Pie supports a variety of open-source LLMs. This page lists all currently supported model families and their variants.

Model Families

| Family | Models | Notes |
|---|---|---|
| Llama 3.x | Llama 3.2 (1B, 3B), Llama 3.1 (8B, 70B), Llama 3 (8B, 70B) | Full support, including instruct variants |
| Qwen | Qwen 2, Qwen 2.5, Qwen 3 | All sizes supported |
| Gemma | Gemma 2, Gemma 3 | Google's open models |
| Mistral | Ministral 3B, Mistral 7B | Including instruct variants |
| OLMo | OLMo 3 | AI2's open language model |
| GPT-OSS | Various | OpenAI's open-weight GPT models |

Downloading Models

Use the Pie CLI to download models from HuggingFace:

# Download a model
pie model download meta-llama/Llama-3.2-1B-Instruct

# List downloaded models
pie model list

# Remove a model
pie model remove meta-llama/Llama-3.2-1B-Instruct
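
Note that some repositories (including meta-llama) are gated on HuggingFace, so a download can fail with an authorization error until you accept the model license on the repo page and authenticate. Assuming Pie picks up standard HuggingFace credentials (an assumption; check the Pie docs for your version), logging in once should suffice:

# Log in with the HuggingFace CLI (assumes Pie reads standard HF credentials)
huggingface-cli login

# ...or provide a token via the environment
export HF_TOKEN=<your-token>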

Configuring Models

Set your default model in ~/.pie/config.toml:

[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]

Models too large for one GPU must be sharded across several; at bfloat16 (2 bytes per parameter), Llama 3.1 70B needs roughly 140 GB for its weights alone. For multi-GPU setups, list each device:

[[model]]
hf_repo = "meta-llama/Llama-3.1-70B-Instruct"
device = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]

Quantization

Pie supports quantized inference to reduce memory usage. Configure quantization in your ~/.pie/config.toml:

[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]
activation_dtype = "bfloat16" # or "float8"
weight_dtype = "float8" # or "int8", "float4"

| Format | Description |
|---|---|
| bfloat16 | Default; unquantized 16-bit precision |
| float8 | 8-bit floating point; good balance of speed and quality |
| int8 | 8-bit integer quantization |
| float4 | 4-bit floating point; maximum memory savings |
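
As a rough guide, weight memory scales with bytes per parameter: the 70B model needs about 140 GB of weights at bfloat16 (2 bytes per parameter) but only about 35 GB at float4 (0.5 bytes per parameter), before activations and KV cache. A sketch of an aggressively quantized config using only the options shown above (illustrative; actual savings depend on Pie's kernels and runtime overhead):

[[model]]
hf_repo = "meta-llama/Llama-3.1-70B-Instruct"
device = ["cuda:0", "cuda:1"]
activation_dtype = "bfloat16"
weight_dtype = "float4" # roughly 4x smaller weights than bfloat16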

Adding Model Support

Want support for a new model? Open an issue on GitHub with:

  • Model name and HuggingFace link
  • Architecture details (if non-standard)
  • Your use case

We prioritize models based on community demand.