📄️ Hello World
In this tutorial, we will write and run a simple "Hello, World!" inferlet.
📄️ Text Completion
Now that you have learned the basics of writing and running an inferlet, let's explore how to use Pie for actual LLM tasks, starting with text completion.
📄️ Interactive Chat
Oftentimes, generation cannot run open-loop: the model must interact frequently with the user or with other agents. In this tutorial, we will learn how to build an inferlet that passes messages across multiple turns of interaction between the user and other running inferlets.
📄️ Parallel Generation
Many deliberate prompting strategies, such as best-of-N, tree-of-thoughts, and graph-of-thoughts, require multiple parallel calls to the language model. Pie makes it highly efficient to implement these strategies.
📄️ Prompt Caching
Oftentimes, you may want to reuse the KV cache across multiple inferlet invocations to save computation and reduce latency. Pie provides built-in support for inter-inferlet KV cache reuse. In this tutorial, we will cover the preliminary concepts of how Pie virtualizes KV cache management, and how you can export and import KV cache pages between inferlets.
📄️ Function Calling
Pie leverages the WebAssembly System Interface (WASI) to let inferlets initiate HTTP requests to external APIs directly. This avoids the overhead of routing API calls through application servers, thereby reducing latency.