Mantler

Introduction

Mantler is a workbench for composing, deploying, and operating AI inference stacks on hardware you control.

Mantler lets you build stacks (called mantles) — validated combinations of machine, runtime, model, and optional layers — deploy them to your hardware, and expose them as OpenAI-compatible inference endpoints.

What Mantler is

A mantle is a composition. You pick a machine, an inference runtime (Ollama, vLLM, llama.cpp, MLX, etc.), a model, and optional wiring layers. Mantler resolves compatibility in real time and deploys the whole thing to the target machine.
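To make the composition idea concrete, here is a minimal sketch of a mantle as a plain data record. The field names and values are illustrative assumptions, not Mantler's actual schema:

```python
# Hypothetical sketch of a mantle composition. Field names and values
# are illustrative assumptions, not Mantler's actual configuration schema.
mantle = {
    "machine": "workstation-01",       # target hardware
    "runtime": "llama.cpp",            # inference runtime
    "model": "llama-3.1-8b-instruct",  # model to serve
    "layers": ["api-key-auth"],        # optional wiring layers
}

def is_complete(m: dict) -> bool:
    """A mantle needs at least a machine, a runtime, and a model;
    layers are optional."""
    return all(m.get(k) for k in ("machine", "runtime", "model"))

print(is_complete(mantle))  # → True
```

The point is the shape, not the syntax: machine, runtime, and model are required parts of the composition, while layers are optional.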

Once deployed, a mantle is accessible as a standard /v1/chat/completions endpoint. Any client that speaks OpenAI — Cursor, Continue, VS Code extensions, custom scripts — works without modification.
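For example, the request body a client sends is the standard OpenAI chat-completions shape. The sketch below builds that payload; the host, model name, and API key are placeholder assumptions:

```python
import json

# Standard OpenAI chat-completions request body. Any OpenAI-compatible
# client produces this shape; the model name here is a placeholder.
payload = {
    "model": "llama-3.1-8b-instruct",
    "messages": [
        {"role": "user", "content": "Hello from Mantler!"},
    ],
}

# POST this JSON to http://<your-mantler-host>/v1/chat/completions
# with an "Authorization: Bearer <api-key>" header.
body = json.dumps(payload)
print(body)
```

Because the shape is standard, pointing an existing OpenAI client at the Mantler endpoint (its base URL plus your API key) is all the configuration a tool like Cursor or Continue needs.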

How the system is structured

Three components work together:

Component           What it does
Mantler (web app)   Visual workbench — build and manage stacks, view machine state, manage API keys
mantlerd            Daemon that runs on each worker machine; receives commands, manages runtimes and models, relays inference traffic
mantler-api         OpenAI-compatible gateway — authenticates requests, routes them to the right machine

You interact with the web app. mantlerd and mantler-api run in the background.

Quick orientation
