📄️ Hello World
In this tutorial, we will write and run a simple "Hello, World!" inferlet.
📄️ Text Completion
Now that you have learned the basics of writing and running an inferlet, let's explore how to use Pie for actual LLM tasks, starting with text completion.
📄️ Interactive Chat
Oftentimes, generation cannot run open-loop: the model must interact frequently with the user or with other agents. In this tutorial, we will learn how to build an inferlet that passes messages across multiple turns of interaction between the user and other running inferlets.
📄️ Parallel Generation
Many deliberate prompting strategies, such as best-of-N, tree-of-thoughts, and graph-of-thoughts, require multiple parallel calls to the language model. Pie makes it highly efficient to implement these strategies.
📄️ Prompt Caching
Oftentimes, you may want to reuse the KV cache across multiple inferlet invocations to save computation and reduce latency. Pie provides built-in support for inter-inferlet KV cache reuse. In this tutorial, we will cover the preliminary concepts of how Pie virtualizes KV cache management, and how you can export and import KV cache pages between inferlets.
📄️ Function Calling
Pie leverages the WebAssembly System Interface (WASI) to let inferlets initiate HTTP requests to external APIs directly. This avoids the overhead of routing API calls through application servers, thereby reducing latency.