Adapters
LoRA adapters are runtime-managed overlays scoped to a model. The full lifecycle (create, open, fork, save, load, destroy) is available in all three SDKs (Rust, Python, and JavaScript), except that Python has no explicit destroy. This page covers attaching an adapter at inference time and managing its lifecycle. Read this after The forward pass.
Adapter ops are no-ops on the dummy driver: Adapter::create, load, fork, save, swap, and destroy all succeed, but inference is identical with or without an adapter attached.
Using an adapter
Attach an adapter to a Generator or to a manual forward() pass. The adapter applies to that one pass; subsequent passes default to no adapter unless you attach again.
Rust:

    use inferlet::adapter::Adapter;

    let adapter = Adapter::open(&model, "my-adapter").ok_or("adapter not found")?;
    let text = ctx
        .generate(sampler)
        .adapter(&adapter)
        .max_tokens(256)
        .collect_text()
        .await?;

    // Or on a manual forward pass:
    let mut fwd = ctx.forward();
    fwd.adapter(&adapter);
    fwd.input_at(&tokens, &positions);
    let out = fwd.execute().await?;

Python:

    from inferlet import Adapter

    adapter = Adapter.open(model, "my-adapter")
    text = await ctx.generate(
        sampler, max_tokens=256, adapter=adapter,
    ).collect_text()

JavaScript:

    import { Adapter } from 'inferlet';

    const adapter = Adapter.open(model, 'my-adapter');
    const text = await ctx
        .generate(sampler, { maxTokens: 256, adapter })
        .collectText();
Adapters compose with all other forward-pass primitives: on the same pass you can attach an adapter, apply a constraint, and sample with TopP.
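To picture why an adapter applies only to the pass it is attached to, here is a toy sketch. It is not the inferlet API: ToyLoRA and forward are hypothetical names, and the sketch assumes the standard LoRA formulation, in which an adapter adds a scaled low-rank product B·A on top of a frozen base weight matrix. The base weights are never modified, so a pass without the adapter sees the plain model.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix m by vector v using plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class ToyLoRA:
    """Hypothetical stand-in for an adapter: A is (r x d_in), B is (d_out x r)."""
    a: Matrix
    b: Matrix
    scale: float = 1.0


def forward(weight: Matrix, x: List[float],
            adapter: Optional[ToyLoRA] = None) -> List[float]:
    """One pass: base output, plus the low-rank delta only if an adapter is attached."""
    y = matvec(weight, x)
    if adapter is not None:
        delta = matvec(adapter.b, matvec(adapter.a, x))
        y = [yi + adapter.scale * di for yi, di in zip(y, delta)]
    return y


weight = [[1.0, 0.0], [0.0, 1.0]]                   # frozen base weights (identity)
lora = ToyLoRA(a=[[1.0, 1.0]], b=[[0.5], [-0.5]])   # rank-1 overlay
x = [2.0, 4.0]

with_adapter = forward(weight, x, adapter=lora)   # overlay applied on this pass
without_adapter = forward(weight, x)              # next pass: no adapter attached
```

Because the overlay is passed in per call rather than merged into the base weights, forgetting to attach it on a later pass silently yields base-model output. That matches the behavior described above, where each pass defaults to no adapter.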
Lifecycle
Adapters are named at the engine level. The lifecycle methods follow the same shape across SDKs.
Rust:

    use inferlet::adapter::Adapter;

    // Create an empty adapter
    let a = Adapter::create(&model, "draft-adapter")?;

    // Open an existing adapter
    let a = Adapter::open(&model, "draft-adapter").ok_or("not found")?;

    // Fork: O(1) copy-on-write clone under a new name
    let a2 = a.fork("draft-adapter-v2");

    // Save weights to a file (host filesystem)
    a.save("/data/adapters/draft-v1.bin")?;

    // Load weights back
    let a = Adapter::create(&model, "draft-adapter")?;
    a.load("/data/adapters/draft-v1.bin")?;

    // Drop the registry slot (instance method, consumes the handle)
    a.destroy();

Python:

    from inferlet import Adapter

    a = Adapter.create(model, "draft-adapter")
    a = Adapter.open(model, "draft-adapter")  # returns Adapter | None
    a2 = a.fork("draft-adapter-v2")           # fork takes the new name
    a.save("/data/adapters/draft-v1.bin")
    a.load("/data/adapters/draft-v1.bin")
    # Python adapters drop on garbage-collect; no explicit destroy.

JavaScript:

    import { Adapter } from 'inferlet';

    const a = Adapter.create(model, 'draft-adapter');
    const a2 = a.fork('draft-adapter-v2'); // fork takes the new name
    a.save('/data/adapters/draft-v1.bin');
    a.load('/data/adapters/draft-v1.bin');
    a.destroy(); // instance method
A few details that bite the first time:
- fork(new_name) requires the new adapter's name. Adapters are registered at the engine level, so the fork needs an identity.
- destroy() is an instance method that consumes the handle (Rust) or releases the slot (JS). Python has no explicit destroy; the slot drops on garbage-collect.
- Adapter::open returns Option<Adapter> (Rust) / Adapter | None (Python) / Adapter | undefined (JS). Check before use.
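In Python, one way to handle the "check before use" rule for open is a small guard that turns a missing adapter into an explicit error, much like .ok_or(...)? does in the Rust examples. The helper below is a hypothetical convenience, not part of the inferlet SDK; it only assumes that a failed lookup yields None.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def require_adapter(handle: Optional[T], name: str) -> T:
    """Return the handle, or raise if the lookup came back empty."""
    if handle is None:
        raise LookupError(f"adapter {name!r} not found")
    return handle


# A successful lookup passes the handle through unchanged.
found = require_adapter("some-handle", "my-adapter")

# A failed lookup (None) raises instead of failing later at inference time.
try:
    require_adapter(None, "missing")
    message = ""
except LookupError as err:
    message = str(err)
```

With the SDK, this would wrap the lookup directly, for example require_adapter(Adapter.open(model, "my-adapter"), "my-adapter"), so a typo in the adapter name fails at the lookup rather than surfacing as an error on the first pass that uses it.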
Like saved contexts, adapters persist across inferlet runs as long as the engine is up. They do not survive a server restart unless you save() to disk and load() back on startup.
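The save-then-restore pattern can be sketched as follows. This is an illustration under stated assumptions, not the SDK's behavior: ToyAdapter and restore_all are hypothetical names, JSON stands in for the real on-disk format (which this page does not describe), and the convention of naming each adapter after its file stem is invented for the example.

```python
import json
import os
import tempfile
from typing import Dict, List


class ToyAdapter:
    """Hypothetical in-memory adapter; not the inferlet Adapter class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.weights: Dict[str, List[float]] = {}

    def save(self, path: str) -> None:
        # JSON is a stand-in; the real file format is an unknown here.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.weights, f)

    def load(self, path: str) -> None:
        with open(path, encoding="utf-8") as f:
            self.weights = json.load(f)


def restore_all(adapter_dir: str) -> Dict[str, ToyAdapter]:
    """Recreate one adapter per saved .bin file, named after the file stem."""
    restored: Dict[str, ToyAdapter] = {}
    for filename in sorted(os.listdir(adapter_dir)):
        name, ext = os.path.splitext(filename)
        if ext != ".bin":
            continue
        adapter = ToyAdapter(name)
        adapter.load(os.path.join(adapter_dir, filename))
        restored[name] = adapter
    return restored


with tempfile.TemporaryDirectory() as adapter_dir:
    # Before shutdown: persist the adapter's weights to disk.
    draft = ToyAdapter("draft-v1")
    draft.weights = {"layer0": [0.1, 0.2]}
    draft.save(os.path.join(adapter_dir, "draft-v1.bin"))

    # After a simulated restart the in-memory state is gone, so rebuild it.
    registry = restore_all(adapter_dir)

restored_names = sorted(registry)
restored_weights = registry["draft-v1"].weights
```

In a real deployment the same shape would apply with Adapter.create followed by load at startup, as in the lifecycle examples above: create the named slot first, then load the saved weights into it.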
Next
- Speculative decoding: the next layer down on the forward pass.
- The forward pass: the primitive an adapter attaches to.
- Pages: page rollback, useful for branching workloads on top of adapters.