Text Completion

Now that you have learned the basics of writing and running an inferlet, let's explore how to use Pie for actual LLM tasks, starting with text completion.

First, let's create a new Rust project for our text completion inferlet:

cargo new text_completion
cd text_completion

Don't forget to add the inferlet dependency to your Cargo.toml.
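For example, a minimal dependency entry might look like the following. This is only a sketch: the version is a placeholder, and the crate may instead come from a git repository or local path, depending on how your Pie installation distributes it.

[dependencies]
# Placeholder version; use the inferlet crate source documented by your Pie installation.
inferlet = "0.1"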

Input and output

Inferlets can accept command-line arguments as input and return output values. The mut args: Args parameter of the main function provides access to these arguments; Args comes from the pico-args crate and is used to parse them.

In this example, we will accept two arguments: --prompt (or -p) for the text prompt and --max-tokens (or -n) for the maximum number of tokens to generate. The inferlet will return the generated text as a string.

use inferlet::{Args, Result};

#[inferlet::main]
async fn main(mut args: Args) -> Result<String> {
    // --prompt / -p is required; --max-tokens / -n defaults to 256.
    let prompt: String = args.value_from_str(["-p", "--prompt"])?;
    let max_num_outputs: usize = args.value_from_str(["-n", "--max-tokens"]).unwrap_or(256);

    // For now, return a placeholder instead of calling the model.
    let final_text = String::from("This is a placeholder for the generated text.");

    Ok(final_text)
}

You can pass these arguments when running the inferlet using the Pie CLI or the Python/JavaScript client APIs.

Passing arguments via the Pie CLI

Passing arguments via the Pie CLI is straightforward. Use -- to separate the inferlet path from the arguments.

pie run target/wasm32-wasip2/release/text_completion.wasm -- --prompt "Once upon a time" --max-tokens 50

Passing arguments via the Python API

You can also pass arguments using the Python API. The arguments parameter in the launch_instance method accepts a list of strings representing the command-line arguments.

await client.launch_instance(program_hash, arguments=['--prompt', 'Once upon a time', '--max-tokens', '50'])

The same applies to the JavaScript API.

Text completion with the standard library

To perform text completion, we will use the Pie standard library to interact with the LLM. The standard library provides abstractions for model loading, context management, token sampling, and stop conditions.

For instance, we can use the get_auto_model function to load the first available model and the create_context method to create a new inference context. The context simplifies prompt management and text generation through its fill and generate methods.

use inferlet::stop_condition::{StopCondition, ends_with_any, max_len};
use inferlet::{Args, Result, Sampler, get_auto_model};

#[inferlet::main]
async fn main(mut args: Args) -> Result<String> {
    let prompt: String = args.value_from_str(["-p", "--prompt"])?;
    let max_num_outputs: usize = args.value_from_str(["-n", "--max-tokens"]).unwrap_or(256);

    // Load the first available model and open a fresh inference context.
    let model = get_auto_model();
    let mut ctx = model.create_context();

    // Fill the context with a system prompt and the user's prompt.
    ctx.fill_system("You are a helpful, respectful and honest assistant.");
    ctx.fill_user(&prompt);

    // Top-p (nucleus) sampling, stopping at the token limit or an EOS token.
    let sampler = Sampler::top_p(0.6, 0.95);
    let stop_cond = max_len(max_num_outputs).or(ends_with_any(model.eos_tokens()));

    let final_text = ctx.generate(sampler, stop_cond).await;

    Ok(final_text)
}

Breaking down the generate call

The ctx.generate method is a high-level helper that combines several steps into one. To see how it works under the hood, you can replace the call in the example above with the following code:

let mut generated_token_ids = Vec::new();
let tokenizer = model.get_tokenizer();

// The autoregressive generation loop
loop {
    // Run one decode step and sample the next token ID.
    let next_token_id = ctx.decode_step(&sampler).await;

    // Feed the token back into the context for the next step.
    ctx.fill_token(next_token_id);

    generated_token_ids.push(next_token_id);

    // Stop once the stop condition (max length or EOS) is met.
    if stop_cond.check(&generated_token_ids) {
        break;
    }
}

// Convert the generated token IDs back into text.
tokenizer.detokenize(&generated_token_ids)

Visit the API reference for more details on the standard library.