Introduction
Mantler is a workbench for composing, deploying, and operating AI inference stacks on hardware you control.
You build stacks (called mantles) — validated combinations of machine, runtime, model, and optional layers — deploy them to your hardware, and expose them as OpenAI-compatible inference endpoints.
What Mantler is
A mantle is a composition. You pick a machine, an inference runtime (Ollama, vLLM, llama.cpp, MLX, etc.), a model, and optional wiring layers. Mantler resolves compatibility in real time and deploys the whole thing to the target machine.
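As a sketch of the composition idea, the check below models a machine, runtime, and model as plain records and reports any compatibility problems before deployment. The field names and rules (architecture support, minimum RAM) are illustrative assumptions, not Mantler's actual validation logic:

```python
# Illustrative sketch of mantle compatibility checking.
# All fields and rules here are assumptions for the example.
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    arch: str        # e.g. "arm64", "x86_64"
    ram_gb: int

@dataclass
class Runtime:
    name: str        # e.g. "MLX", "vLLM"
    supported_archs: set

@dataclass
class Model:
    name: str
    min_ram_gb: int

def compatibility_problems(machine: Machine, runtime: Runtime, model: Model) -> list:
    """Return a list of problems; an empty list means the mantle is deployable."""
    problems = []
    if machine.arch not in runtime.supported_archs:
        problems.append(f"{runtime.name} does not support {machine.arch}")
    if machine.ram_gb < model.min_ram_gb:
        problems.append(f"{model.name} needs at least {model.min_ram_gb} GB RAM")
    return problems
```

A valid combination (say, an arm64 machine with enough RAM for the chosen model) yields an empty problem list; a mismatched one yields human-readable reasons, which is roughly the feedback the workbench surfaces in real time.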
Once deployed, a mantle is accessible as a standard /v1/chat/completions endpoint. Any client that speaks the OpenAI API — Cursor, Continue, VS Code extensions, custom scripts — works without modification.
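For example, a minimal client needs nothing beyond an HTTP library. The sketch below builds a standard OpenAI-style chat completions request using only Python's standard library; the gateway host, model name, and API key are placeholders you would replace with your own values:

```python
import json
import urllib.request

# Placeholder values — substitute your own gateway address and API key.
BASE_URL = "http://mantler.local:8080/v1"
API_KEY = "mk-example-key"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completions request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("llama3.1:8b", "Hello")
# Against a live gateway: urllib.request.urlopen(req)
```

Because the wire format is the standard OpenAI one, swapping in the official OpenAI SDK (pointed at the same base URL) works the same way.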
How the system is structured
Three components work together:
| Component | What it does |
|---|---|
| Mantler (web app) | Visual workbench — build and manage stacks, view machine state, manage API keys |
| mantlerd | Daemon that runs on each worker machine; receives commands, manages runtimes and models, relays inference traffic |
| mantler-api | OpenAI-compatible gateway — authenticates requests, routes them to the right machine |
You interact with the web app. mantlerd and mantler-api run in the background.
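To make the division of labor concrete, here is an illustrative sketch of what mantler-api does for each request: authenticate the API key, then resolve which machine's mantlerd should receive the inference traffic. The registry, key set, and addresses below are invented for the example, not Mantler's actual internals:

```python
# Illustrative sketch of gateway routing — not Mantler's actual code.
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    mantlerd_address: str  # where the daemon listens on that machine

# Hypothetical registry: deployed model name -> hosting machine.
REGISTRY = {
    "llama3.1:8b": Machine("workstation-1", "10.0.0.5:7070"),
}

# Hypothetical set of valid API keys.
VALID_KEYS = {"mk-example-key"}

def route(api_key: str, model: str) -> str:
    """Authenticate the key, then return the target mantlerd address."""
    if api_key not in VALID_KEYS:
        raise PermissionError("invalid API key")
    machine = REGISTRY.get(model)
    if machine is None:
        raise LookupError(f"no deployed mantle serves {model!r}")
    return machine.mantlerd_address
```

In this picture, the daemon at the returned address manages the runtime and model locally and relays the request to it, so the client only ever talks to the gateway.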
Quick orientation
- New to Mantler? Start with Getting started.
- Setting up a machine? See Installing mantlerd.
- Using the API? See Authentication and Endpoints.
- Security questions? See the Security overview.