Server Mode
While pie run is great for quick tests, production use cases need a persistent server.
Start the Server
Launch Pie in server mode:
pie serve
Output:
╭─ Pie Engine (server) ────────────────────────╮
│ Host 127.0.0.1:8080 │
│ Model meta-llama/Llama-3.2-1B-Instruct │
│ Device cuda:0 │
╰──────────────────────────────────────────────╯
✓ Backend started on cuda:0
✓ Engine listening on ws://127.0.0.1:8080
The server is now ready to accept client connections.
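When the server is started from a script, it can help to wait until the port is actually open before connecting a client. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, not part of Pie, and the host and port are the ones shown in the example banner above. A successful TCP connect only shows that something is listening on the port; it does not perform the WebSocket handshake or authenticate.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration; not part of the Pie CLI or client.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a plain TCP connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the attempt timed out); retry shortly.
            time.sleep(interval)
    return False


# Example call, using the address from the banner above:
#   wait_for_port("127.0.0.1", 8080)
```

Polling with a deadline keeps the startup script from hanging indefinitely if the server fails to come up.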
Interactive Mode
For development and testing, use interactive mode:
pie serve -i
This gives you a shell to run inferlets directly:
Type 'help' for commands, ↑/↓ for history
pie> run text-completion --prompt "Hello world"
Hello world! How are you today?
pie> help
Available commands:
run <inferlet> [args] - Run an inferlet
list - List running instances
exit - Shutdown and exit
Monitor Mode
For real-time performance monitoring:
pie serve -m
This launches a TUI dashboard showing:
- Active requests
- Throughput (tokens/sec)
- Memory usage
- Batch statistics
Command-Line Options
| Option | Description |
|---|---|
| --config, -c | Path to config file |
| --host | Override host address |
| --port | Override port |
| --no-auth | Disable authentication |
| --verbose, -v | Enable verbose logging |
| --interactive, -i | Interactive shell mode |
| --monitor, -m | TUI monitor mode |
Examples:
# Custom port, no auth
pie serve --port 9000 --no-auth
# Verbose logging
pie serve -v
# Custom config file
pie serve -c /path/to/config.toml
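If the server is launched from Python rather than a shell, the same options can be assembled programmatically. This is a rough sketch, not an official launcher: `build_serve_command` and `start_server` are hypothetical names, only flags listed in the table above are used, and it assumes that `--host` takes its value as a separate argument in the same way `--port` and `-c` do in the examples.

```python
import subprocess
from typing import List, Optional


def build_serve_command(
    config: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    no_auth: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argv list for `pie serve` from the documented options."""
    cmd = ["pie", "serve"]
    if config is not None:
        cmd += ["--config", config]
    if host is not None:
        # Assumption: --host accepts its value as the next argument.
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    if no_auth:
        cmd.append("--no-auth")
    if verbose:
        cmd.append("--verbose")
    return cmd


def start_server(**options) -> subprocess.Popen:
    """Start the server as a child process (assumes `pie` is on PATH)."""
    return subprocess.Popen(build_serve_command(**options))
```

For example, `build_serve_command(port=9000, no_auth=True)` yields the argument list for the first shell example above. Passing a list to `subprocess.Popen` avoids shell quoting issues with paths.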
Connecting Clients
Once the server is running, connect with a client:
import asyncio
from pie import PieClient

async def main():
    async with PieClient("ws://127.0.0.1:8080") as client:
        await client.authenticate("username")
        # ... use the client
asyncio.run(main())
See Client Basics for more.
Graceful Shutdown
Press Ctrl+C to shut down:
^C
Shutting down...
✓ Shutdown complete
Pie will:
- Stop accepting new connections
- Wait for running inferlets to complete
- Terminate backends
- Clean up resources
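When the server runs as a child process of a Python script, the equivalent of pressing Ctrl+C on POSIX systems is sending SIGINT. The sketch below is one possible way to do that with the standard library; `stop_gracefully` is a hypothetical helper, and the force-kill fallback after the timeout is a choice made for this example, not documented Pie behavior. Because Pie waits for running inferlets to complete, the timeout likely needs to be generous.

```python
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, timeout: float = 30.0) -> int:
    """Send SIGINT to a child process, wait for it to exit, and return its exit code.

    POSIX only: SIGINT is the signal a terminal delivers on Ctrl+C.
    """
    if proc.poll() is None:
        proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Fallback for this sketch only: force-kill a process that has not
        # exited in time. This skips whatever cleanup was still pending.
        proc.kill()
        return proc.wait()
```

A caller would pass the `Popen` object for the running `pie serve` process and treat a non-zero return value as worth logging.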
Next Steps
- Learn to connect from code
- Explore the CLI Reference