Text Completion
Now that you have learned the basics of writing and running an inferlet, let's explore how to use Pie for actual LLM tasks, starting with text completion.
First, let's create an empty Rust project for our text completion inferlet:
cargo new text_completion
cd text_completion
Don't forget to add the inferlet dependency to your Cargo.toml.
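For reference, a minimal Cargo.toml entry might look like the snippet below. The version shown is only a placeholder; use whichever inferlet release (or git/path dependency) matches your Pie installation.

[dependencies]
# Placeholder version: pin this to the inferlet release that matches your Pie setup.
inferlet = "0.1"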
Input and output
Inferlets can accept command-line arguments as input and return output values.
The mut args: Args parameter in the main function provides access to command-line arguments. The Args struct from the pico-args crate is used to parse them.
In this example, we will accept two arguments: --prompt (or -p) for the text prompt and --max-tokens (or -n) for the maximum number of tokens to generate. The inferlet will return the generated text as a string.
use inferlet::{Args, Result};

#[inferlet::main]
async fn main(mut args: Args) -> Result<String> {
    // --prompt is required; --max-tokens falls back to 256 when omitted.
    let prompt: String = args.value_from_str(["-p", "--prompt"])?;
    let max_num_outputs: usize = args.value_from_str(["-n", "--max-tokens"]).unwrap_or(256);

    // Placeholder output for now; the full generation logic is added below.
    let final_text = String::from("This is a placeholder for the generated text.");
    Ok(final_text)
}
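Beyond required values, the same args handle supports the rest of the pico-args parsing patterns, assuming Args mirrors pico_args::Arguments as described above. As a hedged sketch, an optional flag (the --temperature name here is purely hypothetical) could be read inside main like this:

// Sketch only: --temperature is a hypothetical flag, and opt_value_from_str is
// the pico-args method for optional values (it yields None when the flag is absent).
let temperature: f32 = args.opt_value_from_str("--temperature")?.unwrap_or(1.0);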
You can pass these arguments when running the inferlet using the Pie CLI or the Python/JavaScript client APIs.
Passing arguments via the Pie CLI
Passing arguments via the Pie CLI is straightforward. Use -- to separate the inferlet path from the arguments.
pie run target/wasm32-wasip2/release/text_completion.wasm -- --prompt "Once upon a time" --max-tokens 50
Passing arguments via the Python API
You can also pass arguments using the Python API. The arguments parameter in the launch_instance method accepts a list of strings representing the command-line arguments.
await client.launch_instance(program_hash, arguments=['--prompt', 'Once upon a time', '--max-tokens', '50'])
The same applies to the JavaScript API.
Text completion with the standard library
To perform text completion, we will use the Pie standard library to interact with the LLM. The standard library provides abstractions for model loading, context management, token sampling, and stop conditions.
For instance, we can use the get_auto_model function to load the first available model, and the create_context method to create a new inference context. This context provides simplified context management and text generation through its fill and generate methods.
use inferlet::stop_condition::{StopCondition, ends_with_any, max_len};
use inferlet::{Args, Result, Sampler, get_auto_model};

#[inferlet::main]
async fn main(mut args: Args) -> Result<String> {
    let prompt: String = args.value_from_str(["-p", "--prompt"])?;
    let max_num_outputs: usize = args.value_from_str(["-n", "--max-tokens"]).unwrap_or(256);

    // Load the first available model and create a fresh inference context.
    let model = get_auto_model();
    let mut ctx = model.create_context();

    // Fill the context with a system message and the user's prompt.
    ctx.fill_system("You are a helpful, respectful and honest assistant.");
    ctx.fill_user(&prompt);

    // Sample with top-p, stopping at the token limit or on an end-of-sequence token.
    let sampler = Sampler::top_p(0.6, 0.95);
    let stop_cond = max_len(max_num_outputs).or(ends_with_any(model.eos_tokens()));

    let final_text = ctx.generate(sampler, stop_cond).await;
    Ok(final_text)
}
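With the full example in place, rebuild the inferlet and run it as before. (This assumes the wasm32-wasip2 target setup from the earlier basics chapter; adjust the path if your target directory differs.)

cargo build --release --target wasm32-wasip2
pie run target/wasm32-wasip2/release/text_completion.wasm -- --prompt "Once upon a time" --max-tokens 50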
Breaking down the generate call
The ctx.generate method is a high-level helper that combines several steps into one. You can replace the ctx.generate call with the following code to see how it works under the hood:
let mut generated_token_ids = Vec::new();
let tokenizer = model.get_tokenizer();

// The autoregressive generation loop: sample one token at a time,
// feed it back into the context, and stop once the condition is met.
loop {
    let next_token_id = ctx.decode_step(&sampler).await;
    ctx.fill_token(next_token_id);
    generated_token_ids.push(next_token_id);

    if stop_cond.check(&generated_token_ids) {
        break;
    }
}

// Convert the accumulated token ids back into text; this final expression
// yields the same value that ctx.generate would have returned.
tokenizer.detokenize(&generated_token_ids)
Visit the API reference for more details on the standard library.