JavaScript SDK reference

The full inferlet SDK API for JavaScript and TypeScript. The Guide walks through how to use these APIs with runnable code; this page enumerates the full API surface.

Inferlet entry point

export async function main(input: Input): Promise<Output> {
  return { message: `hello ${input.name ?? 'world'}` } as Output;
}

The JS inferlet runtime invokes a top-level export async function main(input). input is a parsed JSON object whose shape matches the manifest's [parameters] block. The return value is JSON-serialized into the Return event the client receives.

Throw to fail the run; the message becomes the Error event.
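Putting the two rules together, a complete entry point might look like the sketch below. The Input and Output shapes are hypothetical stand-ins for whatever your manifest's [parameters] block declares:

```typescript
// Hypothetical shapes; the real Input matches the manifest's [parameters] block.
interface Input { name?: string }
interface Output { message: string }

export async function main(input: Input): Promise<Output> {
  // Throwing fails the run; the message becomes the Error event the client sees.
  if (input === null || typeof input !== 'object') {
    throw new Error('expected a JSON object input');
  }
  return { message: `hello ${input.name ?? 'world'}` };
}
```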

Imports

The whole SDK lives under inferlet:

import {
  Model, Tokenizer, Context, Adapter,
  Forward, Output, Generator, GenStep,
  Sampler, Logits, Distribution, Logprob, Logprobs, Entropy,
  Grammar, GrammarConstraint,
  jsonSchema, anyJson, regex, ebnf, grammar,
  chat, reasoning, tools,
  runtime, scheduling, session, messaging, mcp,
} from 'inferlet';

Runtime

import { runtime } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| runtime.models() | string[] | Names of every model the engine has loaded. |
| runtime.version() | string | Pie runtime version string. |
| runtime.instanceId() | string | Unique identifier for this engine instance. |
| runtime.username() | string | Username of the user who launched the inferlet. |

Model

import { Model } from 'inferlet';

| Method | Description |
| --- | --- |
| Model.load(name: string): Model | Bind to a loaded model. |
| model.tokenizer(): Tokenizer | The model's tokenizer. |

Tokenizer

| Method | Returns | Description |
| --- | --- | --- |
| tk.encode(text) | Uint32Array | Text to token IDs. |
| tk.decode(tokens) | string | Token IDs to text. |
| tk.vocabs() | [Uint32Array, Uint8Array[]] | Full vocabulary as parallel arrays. |
| tk.specialTokens() | [Uint32Array, Uint8Array[]] | Special token IDs. |
| tk.splitRegex() | string | The split regex used during BPE pre-tokenization. |

Context

import { Context } from 'inferlet';

using ctx = new Context(model);

Context implements Disposable, so using (TC39 Explicit Resource Management) releases pages on scope exit.
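If your toolchain does not support `using` declarations yet, the same guarantee is a try/finally. The sketch below uses a stand-in resource (FakeResource is not part of the SDK) to show what `using` desugars to; real code would construct a Context and its cleanup would release pages:

```typescript
// Stand-in for any resource with explicit cleanup, e.g. a Context's release().
class FakeResource {
  released = false;
  release(): void { this.released = true; }
}

// `using r = new FakeResource()` is roughly this: cleanup runs on scope exit,
// including when the callback throws.
function withResource<T>(fn: (r: FakeResource) => T): T {
  const r = new FakeResource();
  try {
    return fn(r);
  } finally {
    r.release();
  }
}
```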

Construction and lifecycle

| Method | Description |
| --- | --- |
| new Context(model) | Fresh anonymous context. |
| Context.open(model, name): Context \| undefined | Clone a saved snapshot. |
| Context.take(model, name): Context \| undefined | Take ownership of a snapshot (snapshot removed). |
| Context.delete(model, name): void | Drop a saved snapshot. |
| ctx.fork(): Context | Copy-on-write clone. O(1). |
| ctx.save(name): void | Snapshot under a user-chosen name. |
| ctx.snapshot(): string | Snapshot under a runtime-generated name. Returns the name. |
| ctx.release(): void | Force-destroy this context immediately. |

Filling

| Method | Description |
| --- | --- |
| ctx.system(text): this | Add a system message. |
| ctx.user(text): this | Add a user message. |
| ctx.assistant(text): this | Add a pre-filled assistant turn. |
| ctx.cue(): this | Mark the current position as the model's start. |
| ctx.seal(): this | Close the current assistant turn. |
| ctx.append(tokens): this | Append raw token IDs. |
| ctx.appendText(text): this | Tokenize and append text. |
| await ctx.flush() | Run prefill on buffered tokens; commit pages. |

Inspection

| Property / method | Type | Description |
| --- | --- | --- |
| ctx.model | Model | The bound model. |
| ctx.pageSize | number | Tokens per KV page. |
| ctx.seqLen | number | Total committed + working tokens. |
| ctx.buffer() | Uint32Array | SDK-side buffered tokens. |

Truncate

| Method | Description |
| --- | --- |
| ctx.truncate(n) | Drop the trailing n tokens from working pages. |

The JS SDK does not expose raw page operations (reserve_working_pages, commit_working_pages, etc.); those are Rust-only.

Generator

import { Sampler } from 'inferlet';

const g = ctx.generate(Sampler.topP(0.6, 0.95), { maxTokens: 256 });

ctx.generate(sampler, options) returns a Generator. Drive it with one of the collectors, with for await, or manually.

GenerateOptions

| Field | Type | Description |
| --- | --- | --- |
| maxTokens | number | Stop after n tokens. |
| stop | Iterable<number> | Stop-token IDs. With autoFlush: true (default), defaults to the chat template's stop tokens. |
| constrain | Schema \| Constraint \| Array<Schema \| Constraint> | Attach one or more constraints. Multiple compose by AND-ing per-step BRLE masks. |
| logitMask | Brle | Static BRLE mask applied every step. Composes with constrain. |
| speculator | Speculator | Custom speculator. |
| systemSpeculation | boolean | Use the runtime's built-in n-gram drafter. Default false. |
| adapter | Adapter | LoRA adapter to apply. |
| zoSeed | number | Evolution Strategies seed for every forward pass. |
| horizon | number | Hint expected output length. |
| autoFlush | boolean | When true (default), append cue() to the buffer before returning the Generator and use chat-template stop tokens by default. |

speculator and systemSpeculation are mutually exclusive.
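That exclusivity rule can be enforced early with a small guard. This helper is purely illustrative (it is not part of the SDK), narrowed to the two fields being checked:

```typescript
// Hypothetical option shape, reduced to the two mutually exclusive fields.
interface SpecOptions {
  speculator?: unknown;
  systemSpeculation?: boolean;
}

// Throw early instead of letting the runtime reject the combination.
function checkSpeculationOptions(opts: SpecOptions): void {
  if (opts.speculator !== undefined && opts.systemSpeculation) {
    throw new Error('speculator and systemSpeculation are mutually exclusive');
  }
}
```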

Builder methods

| Method | Description |
| --- | --- |
| g.maxTokens(n): this | Hard cap on tokens. |
| g.stop(tokens): this | Replace the stop set. |
| g.addStop(tokens): this | Append to the stop set. |
| g.constrain(c): this | Add a constraint. |
| g.horizon(n): this | Hint expected output length. |
| g.adapter(a): this | Apply an adapter. |
| g.zoSeed(seed): this | Set an Evolution Strategies seed for every step. |
| g.probeEachStep(idx, probe): ProbeHandle | Attach a probe at idx on every step. |

Inspection

| Property | Description |
| --- | --- |
| g.tokensGenerated | Tokens accepted so far. |
| g.isDone | true after termination. |

Collectors

| Method | Returns | Description |
| --- | --- | --- |
| await g.collectTokens() | Uint32Array | Drain; return all tokens. |
| await g.collectText() | string | Drain, run a chat decoder internally, return the assembled string. |
| await g.collectJson<T>(opts?) | Promise<T> | Add a JSON-schema constraint, drain, parse. Options: schema?: string, parse?: (val) => T. |

Per-step iteration

for await (const step of g) {
  const out = await step.execute();
  // inspect; optionally call g.accept(...)
}

| GenStep method | Description |
| --- | --- |
| step.clearSampler(): this | Drop the auto-attached sampler. |
| step.probe(idx, probe): ProbeHandle | Attach an extra probe for this iteration. |
| await step.execute(): Promise<Output> | Run the forward pass. |

| Generator method | Description |
| --- | --- |
| g.accept(tokens): Uint32Array | Register manually-sampled tokens. |

Forward

const fwd = ctx.forward();
fwd.input(tokenIds);
const h = fwd.sample([0], Sampler.argmax());
const out = await fwd.execute();
const token = out.token(h);

Builder methods

| Method | Description |
| --- | --- |
| fwd.input(tokens): this | Token IDs with auto-derived positions. |
| fwd.inputAt(tokens, positions): this | Token IDs with explicit positions. |
| fwd.attentionMask(masks): this | Per-input-token attention masks. |
| fwd.mask(brle): this | Logit mask. |
| fwd.sample(indices, sampler): SampleHandle | Attach a sampler. |
| fwd.probe(idx, probe): ProbeHandle | Attach a probe. |
| fwd.adapter(a): this | Apply an adapter. |
| fwd.zoSeed(seed): this | Set an Evolution Strategies seed for this pass. |
| await fwd.execute(): Promise<Output> | Run the pass. |

Inspection

| Method | Description |
| --- | --- |
| fwd.startPosition() | Position the first auto-input token will occupy. |

Output access

| Method | Returns | Use after |
| --- | --- | --- |
| out.token(h) | number \| undefined | Single-index sampler. |
| out.tokensAt(h) | Uint32Array | Multi-index sampler. |
| out.distribution(h) | [Uint32Array, Float32Array] \| undefined | Distribution(...) probe. |
| out.logits(h) | Uint8Array \| undefined | Logits() probe. Length vocab_size * 4, native-endian f32. |
| out.logprobs(h) | Float32Array \| undefined | Logprob or Logprobs probe. |
| out.entropy(h) | number \| undefined | Entropy() probe. |
| out.tokens | Uint32Array | Generator-accepted tokens this step (post stop / max-tokens truncation). Empty for raw Forward.execute(). |
| out.autoSampler | SampleHandle \| undefined | Handle for the Generator's auto-attached sampler. undefined for raw Forward and after clearSampler(). |
| out.raw | underlying WIT object | Property. The raw slot list + speculative side channel. |
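Since out.logits(h) returns raw bytes rather than floats, a common first step is to reinterpret the buffer. A minimal sketch (the helper name is ours, not the SDK's); the byteOffset handling matters when the Uint8Array is a view into a larger buffer:

```typescript
// Reinterpret a native-endian f32 byte buffer (length = vocab_size * 4)
// as a Float32Array, avoiding a copy when the view is 4-byte aligned.
function logitsToFloats(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error('byte length must be a multiple of 4');
  }
  if (bytes.byteOffset % 4 === 0) {
    return new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4);
  }
  // Misaligned view: fall back to copying into a fresh buffer.
  return new Float32Array(bytes.slice().buffer);
}
```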

Samplers

import { Sampler } from 'inferlet';

| Constructor | Description |
| --- | --- |
| Sampler.argmax() | Greedy. |
| Sampler.topP(temperature, p) | Nucleus sampling. |
| Sampler.topK(temperature, k) | Top-k sampling. |
| Sampler.minP(temperature, p) | Min-p sampling. |
| Sampler.topKTopP(temperature, k, p) | Top-k filter, then nucleus. |
| Sampler.multinomial(temperature, draws) | Multinomial draws. |
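For intuition about what the nucleus samplers keep, here is a reference top-p selection over an explicit probability array. This is a local sketch of the concept, not the engine's implementation; Sampler.topP also applies temperature before this step:

```typescript
// Return the token ids in the smallest set whose cumulative probability
// reaches p, scanning in descending-probability order (nucleus sampling's
// candidate set; the engine then samples within this set).
function nucleus(probs: number[], p: number): number[] {
  const order = probs
    .map((prob, id) => ({ id, prob }))
    .sort((a, b) => b.prob - a.prob);
  const kept: number[] = [];
  let cum = 0;
  for (const { id, prob } of order) {
    kept.push(id);
    cum += prob;
    if (cum >= p) break;
  }
  return kept;
}
```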

Probes

import { Logits, Distribution, Logprob, Logprobs, Entropy } from 'inferlet';
| Probe | Output accessor | Returns |
| --- | --- | --- |
| Logits() | out.logits(h) | Uint8Array (length vocab_size * 4, native-endian f32). |
| Distribution(temperature, k) | out.distribution(h) | [ids, probs]. k=0 for full vocab. |
| Logprob(token) | out.logprobs(h) | Length-1 Float32Array. |
| Logprobs(tokens) | out.logprobs(h) | Length-K Float32Array (input order). |
| Entropy() | out.entropy(h) | number. |

Constraints

Schema

import { jsonSchema, anyJson, regex, ebnf, grammar } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| anyJson() | Schema | Any valid JSON. |
| jsonSchema(s) | Schema | JSON conforming to a JSON Schema string. |
| regex(p) | Schema | Strings matching a regex. |
| ebnf(g) | Schema | Custom EBNF grammar. |
| grammar(g) | Schema | Pre-compiled Grammar. |

Pass to ctx.generate(sampler, { constrain }) or g.constrain(...).

Custom Schema

class MySchema {
  buildConstraint(model: Model): GrammarConstraint {
    return GrammarConstraint.json(model);
  }
}

Any object with a buildConstraint(model) method satisfies the Schema interface.

Custom Constraint

import type { Constraint } from 'inferlet';

class MyConstraint implements Constraint {
  step(accepted: Uint32Array): Uint32Array {
    // Return a BRLE-encoded logit mask for the next step.
    return new Uint32Array(); // empty = no restriction this step
  }
}

Return an empty Uint32Array for "no restriction this step."
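As a worked (and heavily hedged) example, here is a constraint that allows a fixed token set on every step. It assumes BRLE encodes alternating run lengths [blocked, allowed, blocked, ...] starting with a possibly zero-length blocked run; verify that against your runtime's actual BRLE convention before relying on it. Both the helper and the class are illustrative, not SDK APIs:

```typescript
// Build a mask allowing exactly the ids in `allowed`, under the assumed
// alternating-run BRLE encoding described above. Adjacent allowed ids
// produce zero-length blocked runs, which still decode under strict
// alternation.
function brleAllowList(allowed: number[], vocabSize: number): Uint32Array {
  const ids = [...new Set(allowed)].sort((a, b) => a - b);
  const runs: number[] = [];
  let pos = 0;
  for (const id of ids) {
    runs.push(id - pos); // blocked run up to this id (may be 0)
    runs.push(1);        // allowed run of one token
    pos = id + 1;
  }
  if (pos < vocabSize) runs.push(vocabSize - pos); // trailing blocked run
  return new Uint32Array(runs);
}

// Satisfies the Constraint shape: step() ignores accepted tokens and
// returns the same static mask every time.
class AllowListConstraint {
  constructor(private mask: Uint32Array) {}
  step(_accepted: Uint32Array): Uint32Array {
    return this.mask;
  }
}
```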

Grammar / GrammarConstraint / Matcher

import { Grammar, GrammarConstraint, Matcher } from 'inferlet';

| Method | Description |
| --- | --- |
| Grammar.fromJsonSchema(s) | Build from JSON Schema. |
| Grammar.json() | Free-form JSON. |
| Grammar.fromRegex(p) | Regex pattern. |
| Grammar.fromEbnf(g) | EBNF (Lark) grammar. |
| GrammarConstraint.fromGrammar(g, model) | Pre-compiled grammar. |
| GrammarConstraint.fromJsonSchema(s, model) | JSON Schema. |
| GrammarConstraint.json(model) | Free-form JSON. |
| GrammarConstraint.fromRegex(p, model) | Regex. |
| GrammarConstraint.fromEbnf(g, model) | EBNF. |
| new Matcher(grammar, tokenizer) | Stateful walker. |
| m.acceptTokens(ids) | Advance the matcher state. |
| m.nextTokenLogitMask() | BRLE mask. |
| m.isTerminated() | Whether the matcher reached a terminal state. |
| m.reset() | Reset to initial state. |

Speculative decoding

import type { Speculator } from 'inferlet';

class MySpec implements Speculator {
  draft(): [Uint32Array, Uint32Array] {
    return [new Uint32Array(), new Uint32Array()];
  }

  accept(tokens: Uint32Array): void {}

  rollback(n: number): void {}

  reset(): void {}
}

Pass via ctx.generate(sampler, { speculator: new MySpec() }). systemSpeculation: true opts into the runtime's built-in n-gram drafter (mutually exclusive with speculator).
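To make the accept/rollback/reset bookkeeping concrete, here is a toy speculator that drafts the most recently accepted token a few times. It is not a useful drafter; the second array in draft()'s return is left empty here because its contents are not specified on this page, so consult the Speculator contract for what the runtime expects in that slot:

```typescript
// Toy speculator: drafts the last accepted token, repeated `width` times.
// Illustrates the state transitions (accept appends, rollback truncates,
// reset clears), not a real drafting strategy.
class RepeatLastSpeculator {
  private history: number[] = [];
  constructor(private width = 2) {}

  draft(): [Uint32Array, Uint32Array] {
    const last = this.history[this.history.length - 1];
    if (last === undefined) return [new Uint32Array(), new Uint32Array()];
    return [new Uint32Array(this.width).fill(last), new Uint32Array()];
  }

  accept(tokens: Uint32Array): void {
    this.history.push(...tokens);
  }

  rollback(n: number): void {
    this.history.length = Math.max(0, this.history.length - n);
  }

  reset(): void {
    this.history = [];
  }
}
```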

Adapters

import { Adapter } from 'inferlet';

| Method | Description |
| --- | --- |
| Adapter.create(model, name): Adapter | Create a new LoRA overlay. |
| Adapter.open(model, name): Adapter \| undefined | Open an existing one. |
| a.fork(newName): Adapter | Copy under a new name. |
| a.save(path): void | Serialize to disk. |
| a.load(path): void | Load weights from disk. |
| a.destroy(): void | Drop the registry slot. |

Adapter implements Disposable. Use with using:

using a = Adapter.create(model, 'draft');

Decoders (parsers)

All three decoders (chat, reasoning, tools) follow the same shape: a constructor taking the model (e.g. new chat.Decoder(model)), feed(tokens) returning an event, and reset().

chat.Decoder

import { chat } from 'inferlet';

const dec = new chat.Decoder(model);
const ev = dec.feed(tokens);
if (ev.type === 'delta') { /* ... */ }

| Event type | Fields | Meaning |
| --- | --- | --- |
| 'idle' | (none) | No semantic boundary. |
| 'delta' | text: string | Streaming visible text. |
| 'done' | text: string | End of turn. |
| 'interrupt' | token: number | Template control token. |

Helpers: chat.system(model, msg), chat.user(...), chat.assistant(...), chat.cue(model), chat.seal(model), chat.stopTokens(model) — return token-ID arrays for use with ctx.append(...).
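A typical consumer folds the event stream into visible text. The sketch below models the event union from the table above and the accumulation pattern; real code would feed it the results of dec.feed(tokens). The handling of 'done' here only marks completion rather than appending ev.text, since whether 'done' repeats the streamed deltas is an assumption we avoid making:

```typescript
// Event union matching the chat.Decoder table above.
type ChatEvent =
  | { type: 'idle' }
  | { type: 'delta'; text: string }
  | { type: 'done'; text: string }
  | { type: 'interrupt'; token: number };

// Fold a stream of decoder events into the assistant's visible text.
function foldChat(events: ChatEvent[]): { text: string; finished: boolean } {
  let text = '';
  let finished = false;
  for (const ev of events) {
    if (ev.type === 'delta') text += ev.text;
    if (ev.type === 'done') finished = true; // ev.text semantics not assumed here
  }
  return { text, finished };
}
```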

reasoning.Decoder

import { reasoning } from 'inferlet';

| Event type | Fields | Meaning |
| --- | --- | --- |
| 'idle' | (none) | No reasoning content. |
| 'start' | (none) | Entering a reasoning block. |
| 'delta' | text: string | Reasoning text. |
| 'end' | text: string | Reasoning block closed. |

tools.Decoder

import { tools } from 'inferlet';

| Event type | Fields | Meaning |
| --- | --- | --- |
| 'start' | (none) | Tool call assembling. |
| 'call' | name: string, args: string | Call complete. args is JSON-encoded. |

Helpers:

| Function | Description |
| --- | --- |
| tools.equipPrefix(model, schemas) | Tokens that register the tool schemas in the chat template. Append before your user message via Context.append. |
| tools.answerPrefix(model, name, value) | Tokens that frame a tool result for the next turn. value may be a string, object, or array (non-strings are JSON-encoded). |
| tools.nativeMatcher(model, schemas) | Build a Matcher over the model's native tool-call format. Returns undefined if the model has no enforceable format; fall back to free-form generation plus your own parser. Wrap with new GrammarConstraint(matcher) to pass to Generator.constrain. |

I/O

Session

import { session } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| session.send(message) | | Send to the client. Strings go through verbatim; other types JSON-encoded. |
| session.sendFile(data: Uint8Array) | | Send a binary blob. |
| await session.receive() | string | Wait for the next inbound message. |
| await session.receiveFile() | Uint8Array | Wait for the next inbound file. |

Signals from proc.signal(...) arrive through session.receive.

Messaging

import { messaging, Subscription } from 'inferlet';

| Function | Description |
| --- | --- |
| messaging.broadcast(topic, message) | Publish to every subscriber. |
| messaging.subscribe(topic): Subscription | Open a subscription. |
| messaging.push(topic, message) | Push onto a queue. |
| await messaging.pull(topic): string | Wait for the next queued message. |

Subscription (implements AsyncIterable<string> and Disposable):

| Method | Description |
| --- | --- |
| await sub.next(): Promise<string \| undefined> | Wait for the next broadcast. |
| sub.unsubscribe(): void | Drop the subscription. |
| for await (const msg of sub) | Async-iterable shorthand. |
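The async-iterable contract is easy to see with a minimal in-memory stand-in that mirrors the Subscription surface. LocalSub below is illustrative only; the real class talks to the runtime and would await when its queue is empty:

```typescript
// In-memory stand-in for Subscription, showing how next() drives the
// for-await loop until it yields undefined.
class LocalSub implements AsyncIterable<string> {
  private queue: string[] = [];

  deliver(msg: string): void { this.queue.push(msg); }
  unsubscribe(): void { this.queue.length = 0; }

  // Real implementation would await the runtime when the queue is empty.
  async next(): Promise<string | undefined> {
    return this.queue.shift();
  }

  async *[Symbol.asyncIterator](): AsyncIterator<string> {
    let msg: string | undefined;
    while ((msg = await this.next()) !== undefined) yield msg;
  }
}
```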

MCP

import { mcp, McpSession } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| mcp.availableServers() | string[] | Names of registered servers. |
| mcp.connect(name) | McpSession | Open a session. |

McpSession:

| Method | Returns | Description |
| --- | --- | --- |
| s.listTools() | string (JSON) | Raw tools/list result. |
| s.callTool(name, args) | string (JSON) | Raw tools/call result. |
| s.listResources() | string (JSON) | Raw resources/list result. |
| s.readResource(uri) | string (JSON) | Raw resources/read result. |
| s.listPrompts() | string (JSON) | Raw prompts/list result. |
| s.getPrompt(name, args) | string (JSON) | Raw prompts/get result. |

Scheduling

import { scheduling } from 'inferlet';

| Function | Returns | Description |
| --- | --- | --- |
| scheduling.balance(model) | number | Current credit balance. |
| scheduling.rent(ctx) | number | Clearing price from the most recent auction. |
| scheduling.dividend(model) | number | Endowment-proportional dividend last step. |
| scheduling.latency(ctx) | number | Per-tick wall time in seconds. |
| scheduling.price() | number | Cost in credits per new KV page. |

To override the default bid: ctx.setBid(value). To skip bidding for a scope: using _ = ctx.idle();.

Filesystem and HTTP

The JS SDK does not currently expose Pie-specific HTTP or filesystem APIs. Use Node's built-in fetch and node:fs/promises against the host-preopened /scratch directory. Native HTTP support inside the inferlet sandbox is in progress; see I/O / HTTP for the intended API shape.
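A minimal sketch of that stopgap, using plain node:fs/promises. The helper names are ours, and the directory is taken as a parameter so the code is not hard-wired to /scratch (pass '/scratch' inside the inferlet sandbox):

```typescript
import { readFile, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

// Persist a JSON value under a scratch directory and return its path.
async function saveJson(dir: string, name: string, value: unknown): Promise<string> {
  const path = join(dir, name);
  await writeFile(path, JSON.stringify(value), 'utf8');
  return path;
}

// Read a previously saved JSON value back.
async function loadJson<T>(dir: string, name: string): Promise<T> {
  return JSON.parse(await readFile(join(dir, name), 'utf8')) as T;
}
```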