Usage & limits

Usage logging and rate limits for the Mantler API.

Usage logging

Every request through api.mantler.ai is logged with:

  • Timestamp
  • API key identifier (not the key itself)
  • Model and mantle ID
  • Token counts (prompt, completion, total)
  • Latency

Usage is visible on the Usage page in the Mantler web app, broken down by key and model.
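If you export or collect these records yourself, the same key-and-model breakdown is easy to reproduce locally. A minimal sketch; the record shape below is an assumption modeled on the logged fields listed above, not Mantler's actual log format:

```python
from collections import defaultdict

def summarize_usage(records):
    """Aggregate total token counts by (key identifier, model).

    Each record is assumed to mirror the logged fields above:
    a key identifier, model, and token counts. Field names here
    (key_id, model, total_tokens) are hypothetical.
    """
    totals = defaultdict(int)
    for r in records:
        totals[(r["key_id"], r["model"])] += r["total_tokens"]
    return dict(totals)

# Hypothetical records for illustration.
records = [
    {"key_id": "key_abc", "model": "llama3", "total_tokens": 120},
    {"key_id": "key_abc", "model": "llama3", "total_tokens": 80},
    {"key_id": "key_xyz", "model": "mistral", "total_tokens": 50},
]
print(summarize_usage(records))
# → {('key_abc', 'llama3'): 200, ('key_xyz', 'mistral'): 50}
```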


Rate limits

Rate limits are applied per API key. Default limits are set at the org level.

When creating or editing a key, you can set lower per-key limits:

  • Requests per minute (RPM)
  • Tokens per minute (TPM)

When a limit is exceeded, the API returns 429 Too Many Requests. The response includes a Retry-After header indicating how many seconds to wait.
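A client should back off for the number of seconds given in Retry-After before retrying. A minimal sketch using only the Python standard library; everything about the request itself is up to the caller, and the 1-second fallback for a missing header is an assumption, not documented API behavior:

```python
import time
import urllib.request
import urllib.error

def retry_after_seconds(headers, default=1.0):
    """Parse the Retry-After header (seconds) from a 429 response.

    Falls back to `default` (an assumption, not API-specified)
    if the header is missing or malformed.
    """
    try:
        return max(0.0, float(headers.get("Retry-After", default)))
    except (TypeError, ValueError):
        return default

def request_with_retry(req, max_retries=3):
    """Send a request, sleeping and retrying on 429 Too Many Requests."""
    for attempt in range(max_retries + 1):
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as e:
            # Re-raise anything that is not a rate limit, or a final failure.
            if e.code != 429 or attempt == max_retries:
                raise
            time.sleep(retry_after_seconds(e.headers))
```

Sleeping for the server-advised interval avoids hammering the API while the per-key window resets; adding jitter on top is a reasonable refinement for concurrent clients.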


Token counting

Token counts in responses are approximate and depend on the runtime. For Ollama- and llama.cpp-backed models, the values reported in usage.prompt_tokens and usage.completion_tokens are the counts returned by the runtime.
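Reading those counts out of a response is straightforward. A sketch assuming a JSON body with a top-level usage object containing the three fields named above; the example body is hypothetical:

```python
import json

def extract_usage(response_body: str):
    """Pull runtime-reported token counts from a response body.

    Assumes a JSON body with a "usage" object holding
    prompt_tokens, completion_tokens, and total_tokens.
    Counts are approximate and runtime-dependent.
    """
    usage = json.loads(response_body).get("usage", {})
    return (
        usage.get("prompt_tokens", 0),
        usage.get("completion_tokens", 0),
        usage.get("total_tokens", 0),
    )

# Hypothetical response body for illustration.
body = '{"usage": {"prompt_tokens": 42, "completion_tokens": 10, "total_tokens": 52}}'
print(extract_usage(body))  # → (42, 10, 52)
```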


Coming soon

Per-key spend limits and budget alerts are planned. Check the changelog for updates.