Speculative decoding
Inferlets that implement custom draft strategies on top of Pie's Speculator trait. For the runtime-managed n-gram path, see text-completion-spec under Chat and generation.
Running on the dummy driver?
The dummy driver samples a fresh random token for every slot, so every draft is rejected and these inferlets fall back to one token per step. They complete correctly, but no speedup is observable.
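To see why a rejected draft still yields one token per step, here is a minimal sketch of greedy draft verification. It is a generic illustration, not Pie's verify path: `accept_draft` and its arguments are hypothetical names, and it assumes the target model's choice for each drafted position is already available.

```python
def accept_draft(draft, target):
    """Return the tokens committed in one verify step (hypothetical helper).

    draft:  k tokens proposed by the drafter.
    target: k + 1 tokens chosen by the target model, where target[i] is its
            choice for the position draft[i] was proposed for, and target[k]
            is the extra token that follows a fully accepted draft.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly one more token than draft")

    accepted = []
    for proposed, chosen in zip(draft, target):
        if proposed != chosen:
            break  # first mismatch: discard the rest of the draft
        accepted.append(proposed)

    # The target's own token at the first mismatch (or the extra token after
    # a full match) is always kept, so every step commits at least one token.
    accepted.append(target[len(accepted)])
    return accepted


# A draft that matches nothing still advances by exactly one token,
# which is the fallback behaviour described for the dummy driver.
print(accept_draft([5, 6, 7], [1, 2, 3, 4]))
```

When the target's tokens are random, the first comparison almost always fails, so each step commits only the target's own token.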
| Inferlet | What it shows |
|---|---|
| jacobi-decoding | Parallel Jacobi decoding (custom drafter). |
| cacheback-decoding | Cache-based drafter via n-gram matching. |
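As a rough illustration of drafting by n-gram matching, the sketch below proposes tokens by finding the most recent earlier occurrence of the last `n` tokens and replaying what followed. It is an assumption-laden sketch, not the cacheback-decoding inferlet's implementation: `ngram_draft`, `n`, and `max_draft` are hypothetical names, and it does not use Pie's Speculator trait.

```python
def ngram_draft(history, n=2, max_draft=4):
    """Propose up to max_draft tokens from an earlier n-gram match (hypothetical helper)."""
    if len(history) <= n:
        return []  # not enough context to match against

    key = tuple(history[-n:])
    # Scan backwards for the most recent earlier occurrence of the key,
    # starting one position before the suffix itself.
    for start in range(len(history) - n - 1, -1, -1):
        if tuple(history[start:start + n]) == key:
            # Replay the tokens that followed the earlier occurrence.
            return list(history[start + n:start + n + max_draft])
    return []  # no match: the caller decodes one token as usual


# The suffix (1, 2) appeared at the start, followed by 3 and 4.
print(ngram_draft([1, 2, 3, 4, 1, 2], n=2, max_draft=2))
```

An empty draft is a valid result here; the step then proceeds as ordinary single-token decoding.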
Related guides
- Speculative decoding: the Speculator trait, system speculation, and the verify path.
- The forward pass: position IDs that drafts share with their target.