Models
What you can run on Pie depends on three choices: the model architecture, the driver Pie routes inference through, and the precision and parallelism that driver supports for that architecture. This page is the at-a-glance matrix. For configuration and supported architectures of each driver, see CUDA, Portable, vLLM, SGLang, and TensorRT-LLM.
Drivers
- `cuda`: Standalone C++/CUDA binary for high-throughput inference.
- `portable`: Standalone C++ binary for local inference based on ggml.
- `vllm@0.16.0`: Delegates forward pass execution to vLLM. Check the Inferlet API capability.
- `sglang@0.5.9`: Delegates forward pass execution to SGLang. Check the Inferlet API capability.
- `tensorrt-llm@1.2.1`: Delegates forward pass execution to TensorRT-LLM. Check the Inferlet API capability.
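The delegating drivers above are listed with an `@` suffix, while `cuda` and `portable` are not. As a minimal sketch, the labels can be split into a name and an optional version. Reading the suffix as a pinned upstream version is an assumption about these labels, not a documented Pie format, and `DriverSpec` and `parse_driver_spec` are hypothetical helpers, not part of Pie.

```python
from typing import NamedTuple, Optional


class DriverSpec(NamedTuple):
    """Hypothetical container for a driver label (not a Pie type)."""
    name: str
    version: Optional[str]  # None when the label carries no '@version' suffix


def parse_driver_spec(spec: str) -> DriverSpec:
    """Split 'name@version' into its parts; a bare name has no version."""
    name, sep, version = spec.partition("@")
    return DriverSpec(name, version if sep else None)


# Driver labels exactly as listed on this page.
DRIVERS = ["cuda", "portable", "vllm@0.16.0", "sglang@0.5.9", "tensorrt-llm@1.2.1"]

for spec in DRIVERS:
    print(parse_driver_spec(spec))
```

Keeping the version separate from the name makes it easier to match a label against the shorter column names used in the architecture table.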
Architecture support
| Architecture | cuda | portable | vllm | sglang | trtllm | Checkpoints |
|---|---|---|---|---|---|---|
| **Qwen** | | | | | | |
| Qwen 2 `qwen2` | stable | stable | upstream | upstream | — | |
| Qwen 2.5 `qwen2` | stable | stable | upstream | upstream | — | |
| Qwen 3 `qwen3` | stable | stable | upstream | upstream | upstream | |
| Qwen 3 MoE `qwen3_moe` | preview | preview | upstream | upstream | — | |
| Qwen 3.5 `qwen3_5` | stable | preview | — | upstream | upstream | |
| Qwen 3.5 MoE `qwen3_5_moe` | — | preview | — | upstream | — | |
| Qwen 3.6 `qwen3_5` | stable | preview | — | upstream | upstream | |
| Qwen 3.6 MoE `qwen3_5_moe` | — | preview | — | upstream | — | |
| Qwen3-VL `qwen3_vl` | stable | — | upstream | upstream | — | |
| **Llama** | | | | | | |
| Llama 3 `llama` | stable | stable | upstream | upstream | — | |
| Llama 3.1 `llama` | stable | stable | upstream | upstream | — | |
| Llama 3.2 `llama` | stable | stable | upstream | upstream | — | |
| **Gemma** | | | | | | |
| Gemma 2 `gemma2` | stable | stable | upstream | upstream | — | |
| Gemma 3 `gemma3`, `gemma3_text` | stable | stable | upstream | upstream | — | |
| Gemma 3n `gemma3n`, `gemma3n_text` | — | stable | upstream | upstream | — | |
| Gemma 4 `gemma4`, `gemma4_text` | stable | stable | — | — | upstream | |
| Gemma 4 MoE `gemma4` (with experts) | — | preview | — | — | — | |
| **GPT-OSS** | | | | | | |
| GPT-OSS `gpt_oss`, `gptoss` | stable | preview | upstream | upstream | — | |
| **Mistral** | | | | | | |
| Mistral `mistral` | — | stable | upstream | upstream | — | |
| Ministral 3 `mistral3` | stable | stable | — | upstream | — | |
| Mistral Small `mistral3` | stable | stable | upstream | upstream | — | |
| Mixtral `mixtral` | stable | preview | upstream | upstream | — | |
| **Phi** | | | | | | |
| Phi-3 `phi3` | stable | stable | upstream | upstream | — | |
| Phi-3-small `phi3small` | — | preview | upstream | upstream | — | |
| Phi-3.5 / Phi-4 `phi3` | stable | stable | upstream | upstream | — | |
| Phi-3.5-MoE `phimoe` | — | stable | upstream | upstream | — | |
| **OLMo** | | | | | | |
| OLMo 2 `olmo2` | — | stable | upstream | upstream | — | |
| OLMo 3 `olmo3` | stable | stable | upstream | upstream | — | |
| **GLM** | | | | | | |
| GLM-5.1 `glm_moe_dsa` | preview | — | upstream | upstream | — | |
| **Nemotron** | | | | | | |
| Nemotron-H `nemotron_h` | stable | — | — | — | — | |
| **DeepSeek / Kimi** | | | | | | |
| DeepSeek V3 `deepseek_v3` | preview | — | upstream | upstream | — | |
| Kimi K2 `kimi_k2` | preview | — | upstream | upstream | — | |
| **Sesame** | | | | | | |
| CSM `csm` | preview | — | — | — | — | |
Multimodal models. Qwen3-VL (vision), Gemma 4 (vision + audio), Nemotron-3 Omni (vision + audio), and CSM (audio output) accept or produce non-text modalities. See Multimodal generation.
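The architecture table can be read as a lookup from architecture identifier to per-driver status. The sketch below transcribes a few rows into a plain dictionary and filters drivers by status. It is only an illustration of how to read the matrix: `SUPPORT` and `drivers_for` are hypothetical names, not part of Pie, and the data must be kept in sync with the table by hand.

```python
# Column order matches the table header.
DRIVERS = ("cuda", "portable", "vllm", "sglang", "trtllm")

# A few rows transcribed from the table, keyed by architecture identifier.
# None stands in for a dash cell (no support listed for that driver).
SUPPORT = {
    "qwen3":      ("stable", "stable", "upstream", "upstream", "upstream"),
    "qwen3_moe":  ("preview", "preview", "upstream", "upstream", None),
    "gemma3n":    (None, "stable", "upstream", "upstream", None),
    "nemotron_h": ("stable", None, None, None, None),
}


def drivers_for(arch, allowed=("stable", "preview", "upstream")):
    """Return the drivers whose status for `arch` is one of `allowed`."""
    row = SUPPORT[arch]
    return [driver for driver, status in zip(DRIVERS, row) if status in allowed]


print(drivers_for("qwen3"))                          # every driver lists support
print(drivers_for("gemma3n", allowed=("stable",)))   # only drivers marked stable
print(drivers_for("nemotron_h"))                     # a single driver
```

Filtering on `allowed` is a convenient way to separate rows marked `stable` from those marked `preview` or `upstream` when choosing a driver for a given checkpoint.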