JavaScript SDK reference

The full inferlet SDK API for JavaScript and TypeScript. The Guide walks through how to use these APIs with runnable code; this page enumerates the full API surface.

Inferlet entry point

export async function main(input: Input): Promise<Output> {
  return { message: `hello ${input.name ?? 'world'}` } as Output;
}

The JS inferlet runtime invokes a top-level export async function main(input). input is a parsed JSON object whose shape matches the manifest's [parameters] block. The return value is JSON-serialized into the Return event the client receives.

Throw to fail the run; the message becomes the Error event.
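Putting the two rules together, a complete entry point might look like the sketch below. The Input and Output shapes are hypothetical stand-ins for whatever your manifest's [parameters] block declares:

```typescript
// Hypothetical shapes; the real Input matches the manifest's [parameters] block.
interface Input { name?: string }
interface Output { message: string }

export async function main(input: Input): Promise<Output> {
  // Throwing fails the run; the message becomes the Error event the client sees.
  if (input === null || typeof input !== 'object') {
    throw new Error('expected a JSON object input');
  }
  return { message: `hello ${input.name ?? 'world'}` };
}
```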

Imports

The whole SDK lives under inferlet:

import {
  Model, Tokenizer, Context, Adapter,
  Forward, Output, Generator, GenStep,
  Sampler, Logits, Distribution, Logprob, Logprobs, Entropy,
  Grammar, GrammarConstraint,
  jsonSchema, anyJson, regex, ebnf, grammar,
  chat, reasoning, tools,
  runtime, scheduling, session, messaging, mcp,
} from 'inferlet';

Runtime

import { runtime } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| runtime.models() | string[] | Names of every model the engine has loaded. |
| runtime.version() | string | Pie runtime version string. |
| runtime.instanceId() | string | Unique identifier for this engine instance. |
| runtime.username() | string | Username of the user who launched the inferlet. |

Model

import { Model } from 'inferlet';

| Method | Description |
| --- | --- |
| Model.load(name: string): Model | Bind to a loaded model. |
| model.tokenizer(): Tokenizer | The model's tokenizer. |

Tokenizer

| Method | Returns | Description |
| --- | --- | --- |
| tk.encode(text) | Uint32Array | Text to token IDs. |
| tk.decode(tokens) | string | Token IDs to text. |
| tk.vocabs() | [Uint32Array, Uint8Array[]] | Full vocabulary as parallel arrays. |
| tk.specialTokens() | [Uint32Array, Uint8Array[]] | Special token IDs. |
| tk.splitRegex() | string | The split regex used during BPE pre-tokenization. |

Context

import { Context } from 'inferlet';

using ctx = new Context(model);

Context implements Disposable, so using (TC39 Explicit Resource Management) releases pages on scope exit.
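If your toolchain does not support `using` declarations yet, the same guarantee is a try/finally. The sketch below uses a stand-in resource (FakeResource is not part of the SDK) to show what `using` desugars to; real code would construct a Context and its cleanup would release pages:

```typescript
// Stand-in for any resource with explicit cleanup, e.g. a Context's release().
class FakeResource {
  released = false;
  release(): void { this.released = true; }
}

// `using r = new FakeResource()` is roughly this: cleanup runs on scope exit,
// including when the callback throws.
function withResource<T>(fn: (r: FakeResource) => T): T {
  const r = new FakeResource();
  try {
    return fn(r);
  } finally {
    r.release();
  }
}
```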

Construction and lifecycle

| Method | Description |
| --- | --- |
| new Context(model) | Fresh anonymous context. |
| Context.open(model, name): Context \| undefined | Clone a saved snapshot. |
| Context.take(model, name): Context \| undefined | Take ownership of a snapshot (snapshot removed). |
| Context.delete(model, name): void | Drop a saved snapshot. |
| ctx.fork(): Context | Copy-on-write clone. O(1). |
| ctx.save(name): void | Snapshot under a user-chosen name. |
| ctx.snapshot(): string | Snapshot under a runtime-generated name. Returns the name. |
| ctx.release(): void | Force-destroy this context immediately. |

Filling

| Method | Description |
| --- | --- |
| ctx.system(text): this | Add a system message. |
| ctx.user(text): this | Add a user message. |
| ctx.assistant(text): this | Add a pre-filled assistant turn. |
| ctx.cue(): this | Mark the current position as the model's start. |
| ctx.seal(): this | Close the current assistant turn. |
| ctx.append(tokens): this | Append raw token IDs. |
| ctx.appendText(text): this | Tokenize and append text. |
| await ctx.flush() | Run prefill on buffered tokens; commit pages. |

Inspection

| Property / method | Type | Description |
| --- | --- | --- |
| ctx.model | Model | The bound model. |
| ctx.pageSize | number | Tokens per KV page. |
| ctx.seqLen | number | Total committed + working tokens. |
| ctx.buffer() | Uint32Array | SDK-side buffered tokens. |

Truncate

| Method | Description |
| --- | --- |
| ctx.truncate(n) | Drop the trailing n tokens from working pages. |

The JS SDK does not expose raw page operations (reserve_working_pages, commit_working_pages, etc.); those are Rust-only.

Generator

import { Sampler } from 'inferlet';

const g = ctx.generate(Sampler.topP(0.6, 0.95), { maxTokens: 256 });

ctx.generate(sampler, options) returns a Generator. Drive it with one of the collectors, with for await, or manually.

GenerateOptions

| Field | Type | Description |
| --- | --- | --- |
| maxTokens | number | Stop after n tokens. |
| stop | Iterable<number> | Stop-token IDs. With autoFlush: true (default), defaults to the chat template's stop tokens. |
| constrain | Schema \| Constraint \| Array<Schema \| Constraint> | Attach one or more constraints. Multiple compose by AND-ing per-step BRLE masks. |
| logitMask | Brle | Static BRLE mask applied every step. Composes with constrain. |
| speculator | Speculator | Custom speculator. |
| systemSpeculation | boolean | Use the runtime's built-in n-gram drafter. Default false. |
| adapter | Adapter | LoRA adapter to apply. |
| zoSeed | number | Evolution Strategies seed for every forward pass. |
| horizon | number | Hint expected output length. |
| autoFlush | boolean | When true (default), append cue() to the buffer before returning the Generator and use chat-template stop tokens by default. |

speculator and systemSpeculation are mutually exclusive.
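That exclusivity rule can be enforced early with a small guard. This helper is purely illustrative (it is not part of the SDK), narrowed to the two fields being checked:

```typescript
// Hypothetical option shape, reduced to the two mutually exclusive fields.
interface SpecOptions {
  speculator?: unknown;
  systemSpeculation?: boolean;
}

// Throw early instead of letting the runtime reject the combination.
function checkSpeculationOptions(opts: SpecOptions): void {
  if (opts.speculator !== undefined && opts.systemSpeculation) {
    throw new Error('speculator and systemSpeculation are mutually exclusive');
  }
}
```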

Builder methods

| Method | Description |
| --- | --- |
| g.maxTokens(n): this | Hard cap on tokens. |
| g.stop(tokens): this | Replace the stop set. |
| g.addStop(tokens): this | Append to the stop set. |
| g.constrain(c): this | Add a constraint. |
| g.horizon(n): this | Hint expected output length. |
| g.adapter(a): this | Apply an adapter. |
| g.zoSeed(seed): this | Set an Evolution Strategies seed for every step. |
| g.probeEachStep(idx, probe): ProbeHandle | Attach a probe at idx on every step. |

Inspection

| Property | Description |
| --- | --- |
| g.tokensGenerated | Tokens accepted so far. |
| g.isDone | true after termination. |

Collectors

| Method | Returns | Description |
| --- | --- | --- |
| await g.collectTokens() | Uint32Array | Drain; return all tokens. |
| await g.collectText() | string | Drain, run a chat decoder internally, return the assembled string. |
| await g.collectJson<T>(opts?) | Promise<T> | Add a JSON-schema constraint, drain, parse. Options: schema?: string, parse?: (val) => T. |

Per-step iteration

for await (const step of g) {
  const out = await step.execute();
  // inspect; optionally call g.accept(...)
}

| GenStep method | Description |
| --- | --- |
| step.clearSampler(): this | Drop the auto-attached sampler. |
| step.probe(idx, probe): ProbeHandle | Attach an extra probe for this iteration. |
| await step.execute(): Promise<Output> | Run the forward pass. |

| Generator method | Description |
| --- | --- |
| g.accept(tokens): Uint32Array | Register manually-sampled tokens. |

Forward

const fwd = ctx.forward();
fwd.input(tokenIds);
const h = fwd.sample([0], Sampler.argmax());
const out = await fwd.execute();
const token = out.token(h);

Builder methods

| Method | Description |
| --- | --- |
| fwd.input(tokens): this | Token IDs with auto-derived positions. |
| fwd.inputAt(tokens, positions): this | Token IDs with explicit positions. |
| fwd.attentionMask(masks): this | Per-input-token attention masks. |
| fwd.mask(brle): this | Logit mask. |
| fwd.sample(indices, sampler): SampleHandle | Attach a sampler. |
| fwd.probe(idx, probe): ProbeHandle | Attach a probe. |
| fwd.adapter(a): this | Apply an adapter. |
| fwd.zoSeed(seed): this | Set an Evolution Strategies seed for this pass. |
| await fwd.execute(): Promise<Output> | Run the pass. |

Inspection

| Method | Description |
| --- | --- |
| fwd.startPosition() | Position the first auto-input token will occupy. |

Output access

| Method | Returns | Use after |
| --- | --- | --- |
| out.token(h) | number \| undefined | Single-index sampler. |
| out.tokensAt(h) | Uint32Array | Multi-index sampler. |
| out.distribution(h) | [Uint32Array, Float32Array] \| undefined | Distribution(...) probe. |
| out.logits(h) | Uint8Array \| undefined | Logits() probe. Length vocab_size * 4, native-endian f32. |
| out.logprobs(h) | Float32Array \| undefined | Logprob or Logprobs probe. |
| out.entropy(h) | number \| undefined | Entropy() probe. |
| out.tokens | Uint32Array | Generator-accepted tokens this step (post stop / max-tokens truncation). Empty for raw Forward.execute(). |
| out.autoSampler | SampleHandle \| undefined | Handle for the Generator's auto-attached sampler. undefined for raw Forward and after clearSampler(). |
| out.raw | underlying WIT object | Property. The raw slot list + speculative side channel. |
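Since out.logits(h) returns raw bytes rather than floats, a common first step is to reinterpret the buffer. A minimal sketch (the helper name is ours, not the SDK's); the byteOffset handling matters when the Uint8Array is a view into a larger buffer:

```typescript
// Reinterpret a native-endian f32 byte buffer (length = vocab_size * 4)
// as a Float32Array, avoiding a copy when the view is 4-byte aligned.
function logitsToFloats(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error('byte length must be a multiple of 4');
  }
  if (bytes.byteOffset % 4 === 0) {
    return new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4);
  }
  // Misaligned view: fall back to copying into a fresh buffer.
  return new Float32Array(bytes.slice().buffer);
}
```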

Samplers

import { Sampler } from 'inferlet';

| Constructor | Description |
| --- | --- |
| Sampler.argmax() | Greedy. |
| Sampler.topP(temperature, p) | Nucleus sampling. |
| Sampler.topK(temperature, k) | Top-k sampling. |
| Sampler.minP(temperature, p) | Min-p sampling. |
| Sampler.topKTopP(temperature, k, p) | Top-k filter, then nucleus. |
| Sampler.multinomial(temperature, draws) | Multinomial draws. |
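For intuition about what the nucleus samplers keep, here is a reference top-p selection over an explicit probability array. This is a local sketch of the concept, not the engine's implementation; Sampler.topP also applies temperature before this step:

```typescript
// Return the token ids in the smallest set whose cumulative probability
// reaches p, scanning in descending-probability order (nucleus sampling's
// candidate set; the engine then samples within this set).
function nucleus(probs: number[], p: number): number[] {
  const order = probs
    .map((prob, id) => ({ id, prob }))
    .sort((a, b) => b.prob - a.prob);
  const kept: number[] = [];
  let cum = 0;
  for (const { id, prob } of order) {
    kept.push(id);
    cum += prob;
    if (cum >= p) break;
  }
  return kept;
}
```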

Probes

import { Logits, Distribution, Logprob, Logprobs, Entropy } from 'inferlet';
| Probe | Output accessor | Returns |
| --- | --- | --- |
| Logits() | out.logits(h) | Uint8Array (length vocab_size * 4, native-endian f32). |
| Distribution(temperature, k) | out.distribution(h) | [ids, probs]. k=0 for full vocab. |
| Logprob(token) | out.logprobs(h) | Length-1 Float32Array. |
| Logprobs(tokens) | out.logprobs(h) | Length-K Float32Array (input order). |
| Entropy() | out.entropy(h) | number. |

Constraints

Schema

import { jsonSchema, anyJson, regex, ebnf, grammar } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| anyJson() | Schema | Any valid JSON. |
| jsonSchema(s) | Schema | JSON conforming to a JSON Schema string. |
| regex(p) | Schema | Strings matching a regex. |
| ebnf(g) | Schema | Custom EBNF grammar. |
| grammar(g) | Schema | Pre-compiled Grammar. |

Pass to ctx.generate(sampler, { constrain }) or g.constrain(...).

Custom Schema

class MySchema {
  buildConstraint(model: Model): GrammarConstraint {
    return GrammarConstraint.json(model);
  }
}

Any object with a buildConstraint(model) method satisfies the Schema interface.

Custom Constraint

import type { Constraint } from 'inferlet';

class MyConstraint implements Constraint {
  step(accepted: Uint32Array): Uint32Array {
    // Return a BRLE-encoded logit mask for the next step.
    return new Uint32Array(); // empty = no restriction this step
  }
}

Return an empty Uint32Array for "no restriction this step."
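As a worked (and heavily hedged) example, here is a constraint that allows a fixed token set on every step. It assumes BRLE encodes alternating run lengths [blocked, allowed, blocked, ...] starting with a possibly zero-length blocked run; verify that against your runtime's actual BRLE convention before relying on it. Both the helper and the class are illustrative, not SDK APIs:

```typescript
// Build a mask allowing exactly the ids in `allowed`, under the assumed
// alternating-run BRLE encoding described above. Adjacent allowed ids
// produce zero-length blocked runs, which still decode under strict
// alternation.
function brleAllowList(allowed: number[], vocabSize: number): Uint32Array {
  const ids = [...new Set(allowed)].sort((a, b) => a - b);
  const runs: number[] = [];
  let pos = 0;
  for (const id of ids) {
    runs.push(id - pos); // blocked run up to this id (may be 0)
    runs.push(1);        // allowed run of one token
    pos = id + 1;
  }
  if (pos < vocabSize) runs.push(vocabSize - pos); // trailing blocked run
  return new Uint32Array(runs);
}

// Satisfies the Constraint shape: step() ignores accepted tokens and
// returns the same static mask every time.
class AllowListConstraint {
  constructor(private mask: Uint32Array) {}
  step(_accepted: Uint32Array): Uint32Array {
    return this.mask;
  }
}
```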

Grammar / GrammarConstraint / Matcher

import { Grammar, GrammarConstraint, Matcher } from 'inferlet';

| Method | Description |
| --- | --- |
| Grammar.fromJsonSchema(s) | Build from JSON Schema. |
| Grammar.json() | Free-form JSON. |
| Grammar.fromRegex(p) | Regex pattern. |
| Grammar.fromEbnf(g) | EBNF (Lark) grammar. |
| GrammarConstraint.fromGrammar(g, model) | Pre-compiled grammar. |
| GrammarConstraint.fromJsonSchema(s, model) | JSON Schema. |
| GrammarConstraint.json(model) | Free-form JSON. |
| GrammarConstraint.fromRegex(p, model) | Regex. |
| GrammarConstraint.fromEbnf(g, model) | EBNF. |
| new Matcher(grammar, tokenizer) | Stateful walker. |
| m.acceptTokens(ids) | Advance the matcher state. |
| m.nextTokenLogitMask() | BRLE mask. |
| m.isTerminated() | Whether the matcher reached a terminal state. |
| m.reset() | Reset to initial state. |

Speculative decoding

import type { Speculator } from 'inferlet';

class MySpec implements Speculator {
  draft(): [Uint32Array, Uint32Array] {
    return [new Uint32Array(), new Uint32Array()];
  }

  accept(tokens: Uint32Array): void {}

  rollback(n: number): void {}

  reset(): void {}
}

Pass via ctx.generate(sampler, { speculator: new MySpec() }). systemSpeculation: true opts into the runtime's built-in n-gram drafter (mutually exclusive with speculator).
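To make the accept/rollback/reset bookkeeping concrete, here is a toy speculator that drafts the most recently accepted token a few times. It is not a useful drafter; the second array in draft()'s return is left empty here because its contents are not specified on this page, so consult the Speculator contract for what the runtime expects in that slot:

```typescript
// Toy speculator: drafts the last accepted token, repeated `width` times.
// Illustrates the state transitions (accept appends, rollback truncates,
// reset clears), not a real drafting strategy.
class RepeatLastSpeculator {
  private history: number[] = [];
  constructor(private width = 2) {}

  draft(): [Uint32Array, Uint32Array] {
    const last = this.history[this.history.length - 1];
    if (last === undefined) return [new Uint32Array(), new Uint32Array()];
    return [new Uint32Array(this.width).fill(last), new Uint32Array()];
  }

  accept(tokens: Uint32Array): void {
    this.history.push(...tokens);
  }

  rollback(n: number): void {
    this.history.length = Math.max(0, this.history.length - n);
  }

  reset(): void {
    this.history = [];
  }
}
```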

Adapters

import { Adapter } from 'inferlet';

| Method | Description |
| --- | --- |
| Adapter.create(model, name): Adapter | Create a new LoRA overlay. |
| Adapter.open(model, name): Adapter \| undefined | Open an existing one. |
| a.fork(newName): Adapter | Copy under a new name. |
| a.save(path): void | Serialize to disk. |
| a.load(path): void | Load weights from disk. |
| a.destroy(): void | Drop the registry slot. |

Adapter implements Disposable. Use with using:

using a = Adapter.create(model, 'draft');

Decoders (parsers)

All three decoders (chat, reasoning, tools) follow the same shape: a constructor taking the model (e.g. new chat.Decoder(model)), feed(tokens) returning an event, and reset().

chat.Decoder

import { chat } from 'inferlet';

const dec = new chat.Decoder(model);
const ev = dec.feed(tokens);
if (ev.type === 'delta') { /* ... */ }

| Event type | Fields | Meaning |
| --- | --- | --- |
| 'idle' | (none) | No semantic boundary. |
| 'delta' | text: string | Streaming visible text. |
| 'done' | text: string | End of turn. |
| 'interrupt' | token: number | Template control token. |

Helpers: chat.system(model, msg), chat.user(...), chat.assistant(...), chat.cue(model), chat.seal(model), chat.stopTokens(model) — return token-ID arrays for use with ctx.append(...).
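A typical consumer folds the event stream into visible text. The sketch below models the event union from the table above and the accumulation pattern; real code would feed it the results of dec.feed(tokens). The handling of 'done' here only marks completion rather than appending ev.text, since whether 'done' repeats the streamed deltas is an assumption we avoid making:

```typescript
// Event union matching the chat.Decoder table above.
type ChatEvent =
  | { type: 'idle' }
  | { type: 'delta'; text: string }
  | { type: 'done'; text: string }
  | { type: 'interrupt'; token: number };

// Fold a stream of decoder events into the assistant's visible text.
function foldChat(events: ChatEvent[]): { text: string; finished: boolean } {
  let text = '';
  let finished = false;
  for (const ev of events) {
    if (ev.type === 'delta') text += ev.text;
    if (ev.type === 'done') finished = true; // ev.text semantics not assumed here
  }
  return { text, finished };
}
```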

reasoning.Decoder

import { reasoning } from 'inferlet';

| Event type | Fields | Meaning |
| --- | --- | --- |
| 'idle' | (none) | No reasoning content. |
| 'start' | (none) | Entering a reasoning block. |
| 'delta' | text: string | Reasoning text. |
| 'end' | text: string | Reasoning block closed. |

tools.Decoder

import { tools } from 'inferlet';

| Event type | Fields | Meaning |
| --- | --- | --- |
| 'start' | (none) | Tool call assembling. |
| 'call' | name: string, args: string | Call complete. args is JSON-encoded. |

Helpers:

| Function | Description |
| --- | --- |
| tools.equipPrefix(model, schemas) | Tokens that register the tool schemas in the chat template. Append before your user message via Context.append. |
| tools.answerPrefix(model, name, value) | Tokens that frame a tool result for the next turn. value may be a string, object, or array (non-strings are JSON-encoded). |
| tools.nativeMatcher(model, schemas) | Build a Matcher over the model's native tool-call format. Returns undefined if the model has no enforceable format; fall back to free-form generation plus your own parser. Wrap with new GrammarConstraint(matcher) to pass to Generator.constrain. |

I/O

Session

import { session } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| session.send(message) | | Send to the client. Strings go through verbatim; other types JSON-encoded. |
| session.sendFile(data: Uint8Array) | | Send a binary blob. |
| await session.receive() | string | Wait for the next inbound message. |
| await session.receiveFile() | Uint8Array | Wait for the next inbound file. |

Signals from proc.signal(...) arrive through session.receive.

Messaging

import { messaging, Subscription } from 'inferlet';

| Function | Description |
| --- | --- |
| messaging.broadcast(topic, message) | Publish to every subscriber. |
| messaging.subscribe(topic): Subscription | Open a subscription. |
| messaging.push(topic, message) | Push onto a queue. |
| await messaging.pull(topic): string | Wait for the next queued message. |

Subscription (implements AsyncIterable<string> and Disposable):

| Method | Description |
| --- | --- |
| await sub.next(): Promise<string \| undefined> | Wait for the next broadcast. |
| sub.unsubscribe(): void | Drop the subscription. |
| for await (const msg of sub) | Async-iterable shorthand. |
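The async-iterable contract is easy to see with a minimal in-memory stand-in that mirrors the Subscription surface. LocalSub below is illustrative only; the real class talks to the runtime and would await when its queue is empty:

```typescript
// In-memory stand-in for Subscription, showing how next() drives the
// for-await loop until it yields undefined.
class LocalSub implements AsyncIterable<string> {
  private queue: string[] = [];

  deliver(msg: string): void { this.queue.push(msg); }
  unsubscribe(): void { this.queue.length = 0; }

  // Real implementation would await the runtime when the queue is empty.
  async next(): Promise<string | undefined> {
    return this.queue.shift();
  }

  async *[Symbol.asyncIterator](): AsyncIterator<string> {
    let msg: string | undefined;
    while ((msg = await this.next()) !== undefined) yield msg;
  }
}
```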

MCP

import { mcp, McpSession } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| mcp.availableServers() | string[] | Names of registered servers. |
| mcp.connect(name) | McpSession | Open a session. |

McpSession:

| Method | Returns | Description |
| --- | --- | --- |
| s.listTools() | string (JSON) | Raw tools/list result. |
| s.callTool(name, args) | string (JSON) | Raw tools/call result. |
| s.listResources() | string (JSON) | Raw resources/list result. |
| s.readResource(uri) | string (JSON) | Raw resources/read result. |
| s.listPrompts() | string (JSON) | Raw prompts/list result. |
| s.getPrompt(name, args) | string (JSON) | Raw prompts/get result. |

Scheduling

import { scheduling } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| scheduling.balance(model) | number | Current credit balance. |
| scheduling.rent(ctx) | number | Clearing price from the most recent auction. |
| scheduling.dividend(model) | number | Endowment-proportional dividend last step. |
| scheduling.latency(ctx) | number | Per-tick wall time in seconds. |
| scheduling.price() | number | Cost in credits per new KV page. |

To override the default bid: ctx.setBid(value). To skip bidding for a scope: using _ = ctx.idle();.

Filesystem and HTTP

The JS SDK does not currently expose Pie-specific HTTP or filesystem APIs. Use Node's built-in fetch and node:fs/promises against the host-preopened /scratch directory. Native HTTP support inside the inferlet sandbox is in progress; see I/O / HTTP for the intended API shape.
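A minimal sketch of that stopgap, using plain node:fs/promises. The helper names are ours, and the directory is taken as a parameter so the code is not hard-wired to /scratch (pass '/scratch' inside the inferlet sandbox):

```typescript
import { readFile, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

// Persist a JSON value under a scratch directory and return its path.
async function saveJson(dir: string, name: string, value: unknown): Promise<string> {
  const path = join(dir, name);
  await writeFile(path, JSON.stringify(value), 'utf8');
  return path;
}

// Read a previously saved JSON value back.
async function loadJson<T>(dir: string, name: string): Promise<T> {
  return JSON.parse(await readFile(join(dir, name), 'utf8')) as T;
}
```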