Speculative decoding

Inferlets that implement custom draft strategies on top of Pie's Speculator trait. For the runtime-managed n-gram path, see text-completion-spec under Chat and generation.

Running on the dummy driver?

The dummy driver samples a fresh random token for every slot, so every draft is rejected and these inferlets fall back to one token per step. They still complete correctly, but no speedup is observable.
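To see why rejected drafts cost correctness nothing but remove the speedup, here is a minimal sketch of greedy draft verification. It is not Pie's API: the function names `accepted_prefix_len` and `tokens_emitted` are hypothetical, and the sketch assumes the verifier accepts the longest draft prefix that matches the target's tokens and then emits one extra token from the target.

```rust
/// Hypothetical helper (not part of Pie): length of the longest prefix of
/// `draft` that matches the tokens the target produced at the same slots.
fn accepted_prefix_len(draft: &[u32], target: &[u32]) -> usize {
    draft
        .iter()
        .zip(target.iter())
        .take_while(|(d, t)| d == t)
        .count()
}

/// Hypothetical helper: tokens emitted by one verification step, assuming the
/// accepted draft prefix is followed by exactly one token from the target.
fn tokens_emitted(draft: &[u32], target: &[u32]) -> usize {
    accepted_prefix_len(draft, target) + 1
}
```

Under these assumptions, a draft that fully matches yields `draft.len() + 1` tokens per step, while a draft that mismatches at the first slot yields exactly one. The second case is what a driver that returns random tokens produces almost every step.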

| Inferlet | What it shows |
| --- | --- |
| jacobi-decoding | Parallel Jacobi decoding (custom drafter). |
| cacheback-decoding | Cache-based drafter via n-gram matching. |
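As a rough illustration of drafting by n-gram matching, the sketch below keeps a table from each n-token window to the token that most recently followed it, then drafts by repeatedly looking up the trailing window. This is an assumption-laden toy, not the cacheback-decoding inferlet's code and not the Speculator trait: the `NgramCache` type and its `observe` and `draft` methods are hypothetical names introduced here.

```rust
use std::collections::HashMap;

/// Hypothetical n-gram cache (not Pie's API): maps each window of `n` tokens
/// to the token that followed it the last time the window was seen.
struct NgramCache {
    n: usize,
    table: HashMap<Vec<u32>, u32>,
}

impl NgramCache {
    fn new(n: usize) -> Self {
        assert!(n > 0, "n-gram length must be positive");
        NgramCache { n, table: HashMap::new() }
    }

    /// Record every (n-gram, next token) pair in `tokens`.
    /// Later occurrences overwrite earlier ones.
    fn observe(&mut self, tokens: &[u32]) {
        for w in tokens.windows(self.n + 1) {
            self.table.insert(w[..self.n].to_vec(), w[self.n]);
        }
    }

    /// Propose up to `max_draft` tokens by chaining lookups on the trailing
    /// n-gram of `context`. Stops at the first window with no cached successor.
    fn draft(&self, context: &[u32], max_draft: usize) -> Vec<u32> {
        let mut window: Vec<u32> = match context.len().checked_sub(self.n) {
            Some(start) => context[start..].to_vec(),
            None => return Vec::new(), // context shorter than one n-gram
        };
        let mut out = Vec::new();
        while out.len() < max_draft {
            match self.table.get(&window) {
                Some(&next) => {
                    out.push(next);
                    // Slide the window forward by one token.
                    window.remove(0);
                    window.push(next);
                }
                None => break,
            }
        }
        out
    }
}
```

A drafter like this costs only hash lookups, so it pays off when the generated text repeats spans already seen. The drafted tokens are only proposals and still have to be verified against the model.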