Context and KV cache

Inferlets that exercise the page-based KV cache — forking from a shared prefix, building prefix trees, sliding-window attention, and the runtime's automatic page-trim.

Running on the dummy driver?

parallel-generation and prefix-tree fork from a shared prefix and rely on each branch producing a related continuation. On the dummy driver, branches share KV pages correctly, but each branch's tokens are drawn independently: the page state machine is exercised, while the branch content is unrelated random text. windowed-attention, attention-sink, and page-trim-bench are unaffected; they exercise mask and page-trim plumbing, not token semantics.

| Inferlet | What it shows |
| --- | --- |
| parallel-generation | Forked contexts that share committed pages and decode in parallel. |
| prefix-tree | Prefix-tree caching with concurrent generation from one shared context. |
| windowed-attention | Sliding window: bounded-memory generation by masking and releasing pages. |
| attention-sink | Sink + sliding window (StreamingLLM). |
| page-trim-bench | Benchmarks the runtime's page-trim optimization on a sink+window mask. |
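To make the fork semantics that parallel-generation and prefix-tree exercise more concrete, here is a minimal, self-contained Python sketch of copy-on-write page sharing. It is not the inferlet API: every name (Page, Context, fork, PAGE_SIZE) is an illustrative assumption. It models the committed-vs-working distinction: committed pages are read-only and shared by reference count across forks, while each branch writes into its own working page.

```python
# Illustrative sketch only; these classes are NOT the inferlet API.
from dataclasses import dataclass, field

PAGE_SIZE = 4  # tokens per KV page (toy value)

@dataclass
class Page:
    tokens: list = field(default_factory=list)
    refcount: int = 1  # how many contexts reference this page

class Context:
    def __init__(self, committed=None):
        self.committed = committed or []  # immutable, shareable pages
        self.working = Page()             # private, mutable tail page

    def append(self, token):
        # Working pages are private to one branch, so appending
        # never mutates a shared page.
        if len(self.working.tokens) == PAGE_SIZE:
            self.commit()
        self.working.tokens.append(token)

    def commit(self):
        # A full working page becomes committed: read-only, shareable.
        self.committed.append(self.working)
        self.working = Page()

    def fork(self):
        # Copy-on-write: the child shares every committed page
        # (bumping refcounts) but gets a private working page.
        for page in self.committed:
            page.refcount += 1
        child = Context(committed=list(self.committed))
        # The partially filled working page is copied by value,
        # since both branches may extend it differently.
        child.working.tokens = list(self.working.tokens)
        return child

    def release(self):
        # Dropping a branch decrements refcounts; a page can be
        # freed once no context references it.
        for page in self.committed:
            page.refcount -= 1

root = Context()
for t in [1, 2, 3, 4, 5]:
    root.append(t)
a, b = root.fork(), root.fork()
a.append(6); b.append(7)  # branches diverge in private pages
assert root.committed[0] is a.committed[0] is b.committed[0]
print([p.refcount for p in root.committed])  # -> [3]
```

The real runtime adds pieces this toy omits (named snapshots, page eviction, the dummy driver's token sampling), but the shared-prefix refcounting above is the shape of behavior the two forking inferlets depend on.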
  • Pages: the page model and committed-vs-working distinction.
  • Forking and saving: copy-on-write branching and named snapshots.
  • Inputs: BRLE attention masks for sliding windows and sinks (see the mask sketch after this list).
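The sink+window masks used by attention-sink and page-trim-bench can be pictured with a short sketch. The alternating-runs convention below (the first run counts zeros) is an assumption made for illustration; the actual BRLE layout is defined on the Inputs page.

```python
# Hedged sketch: the first-run-counts-zeros convention is assumed,
# not taken from the Inputs page.

def sink_window_brle(pos, sink, window):
    """BRLE mask over positions 0..pos for one query at `pos`:
    attend to the first `sink` tokens and the last `window` tokens,
    mask everything in between. Runs alternate 0s, 1s, 0s, 1s."""
    length = pos + 1
    keep_tail = min(window, length)
    keep_head = min(sink, length - keep_tail)
    gap = length - keep_head - keep_tail
    # runs: [zeros, ones, zeros, ones]
    return [0, keep_head, gap, keep_tail]

def expand(brle):
    """Decode alternating-run BRLE back to an explicit 0/1 list."""
    bits, bit = [], 0
    for run in brle:
        bits.extend([bit] * run)
        bit ^= 1
    return bits

# Query at position 9 with a 2-token sink and a 4-token window:
# tokens 0-1 and 6-9 are visible, 2-5 are masked. Pages covered
# only by the masked middle are what page-trim can skip.
print(sink_window_brle(9, sink=2, window=4))  # [0, 2, 4, 4]
print(expand(sink_window_brle(9, 2, 4)))      # [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
```

The payoff of run-length encoding here is that a sink+window mask stays four runs long no matter how far generation proceeds, which is also the pattern page-trim-bench feeds the runtime.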