Billing & Limits

Aerostack bots support three billing modes that control how LLM costs are charged. You can set the billing mode per bot.


Billing Modes

Wallet (Default)

The wallet mode deducts LLM costs from your prepaid Aerostack balance.

How it works:

  1. Before processing a message, the bot checks that your wallet has a sufficient balance
  2. The message is processed through the LLM (with or without tool calls)
  3. Cost is calculated based on tokens used
  4. The cost is automatically deducted from your wallet

# Create a bot with wallet billing (default)
curl -X POST https://api.aerostack.dev/api/bots \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Bot",
    "platform": "custom",
    "workspace_id": "...",
    "system_prompt": "...",
    "billing_mode": "wallet"
  }'
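
The four steps above can be sketched in Python. This is an illustration with hypothetical names (`PRICES`, `handle_message`), not Aerostack's internals; we also assume sub-cent costs round up to a whole cent, which is consistent with the `cost_cents: 1` value shown in the test-endpoint example later on this page.

```python
import math

# Hypothetical sketch of the wallet billing flow.
# Prices are USD per 1M tokens, taken from the gpt-4o row in the tables below.
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}

def cost_cents(model, tokens_in, tokens_out):
    # Step 3: cost from tokens used. We assume sub-cent costs round up,
    # so even a tiny message bills at least 1 cent.
    p = PRICES[model]
    usd = tokens_in / 1e6 * p["input"] + tokens_out / 1e6 * p["output"]
    return math.ceil(usd * 100)

def handle_message(wallet_cents, model, tokens_in, tokens_out):
    # Step 1: balance check before processing.
    if wallet_cents <= 0:
        raise RuntimeError("insufficient wallet balance")
    # Step 2 would call the LLM here; steps 3-4: compute cost and deduct.
    cost = cost_cents(model, tokens_in, tokens_out)
    return wallet_cents - cost
```

For example, a 485-input / 42-output-token gpt-4o message costs about $0.0016, which bills as 1 cent under this rounding assumption.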

BYOK (Bring Your Own Key)

The BYOK mode uses your own LLM API key. Aerostack does not charge for LLM usage — you pay your provider directly at their standard rates.

To use BYOK:

  1. Set billing_mode to byok
  2. Provide your LLM API key via the llm_api_key field (encrypted at rest)

curl -X POST https://api.aerostack.dev/api/bots \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "BYOK Bot",
    "platform": "custom",
    "workspace_id": "...",
    "system_prompt": "...",
    "billing_mode": "byok",
    "llm_provider": "openai",
    "llm_model": "gpt-4o",
    "llm_api_key": "sk-..."
  }'

BYOK mode still tracks token usage and cost estimates in analytics, but no charges are deducted from your wallet.

Plan Quota (Coming Soon)

The plan_quota mode will use your account plan’s included LLM usage quota. This mode is not yet fully implemented.


LLM Pricing

All prices are per 1 million tokens when using wallet mode (Aerostack’s pooled keys).

Anthropic

Model               Input (per 1M tokens)   Output (per 1M tokens)
claude-opus-4-6     $15.00                  $75.00
claude-sonnet-4-6   $3.00                   $15.00
claude-haiku-4-5    $0.80                   $4.00

OpenAI

Model         Input (per 1M tokens)   Output (per 1M tokens)
gpt-4o        $2.50                   $10.00
gpt-4o-mini   $0.15                   $0.60
o1            $15.00                  $60.00

Google

Model              Input (per 1M tokens)   Output (per 1M tokens)
gemini-2.5-pro     $1.25                   $5.00
gemini-2.5-flash   $0.15                   $0.60

Groq

Model                     Input (per 1M tokens)   Output (per 1M tokens)
llama-3.3-70b-versatile   $0.59                   $0.79
mixtral-8x7b-32768        $0.24                   $0.24

Workers AI models are free (no token charges) but do not support tool calling.
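
To see what these rates mean in practice, here is a rough comparison (our own arithmetic, using the table values above) of the cost of 10,000 messages averaging 500 input and 100 output tokens:

```python
# USD per 1M tokens (input, output), from the tables above.
PRICES = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-2.5-flash": (0.15, 0.60),
}

def usd(model, tokens_in, tokens_out):
    price_in, price_out = PRICES[model]
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

# 10,000 messages x (500 input + 100 output) tokens each:
for model in PRICES:
    print(f"{model}: ${usd(model, 500 * 10_000, 100 * 10_000):.2f}")
```

At this volume, gpt-4o-mini ($1.35) comes out roughly 17x cheaper than gpt-4o ($22.50) and claude-sonnet-4-6 ($30.00).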


Spending Caps

Set a spending cap on any bot to limit total lifetime spend. When the bot’s cumulative cost reaches the cap, it stops responding to messages.

# Set a $5 spending cap (500 cents)
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": 500 }'

When the cap is reached, the bot returns a message like: “Bot has reached its spending limit.”
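
The enforcement amounts to a simple comparison before each message; a sketch (function and parameter names are ours, not the API's):

```python
def under_cap(cumulative_cost_cents, spending_cap_cents):
    # A null (None) cap means unlimited spend; otherwise the bot stops
    # responding once cumulative cost reaches the cap.
    if spending_cap_cents is None:
        return True
    return cumulative_cost_cents < spending_cap_cents
```

With a 500-cent cap, a bot that has accumulated 500 cents of cost is blocked; clearing the cap re-enables it.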

To remove a spending cap, set it to null:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": null }'

Cost Visibility

Per-Message Costs

The test endpoint returns detailed cost information:

{
  "response": "...",
  "tokens": { "input": 485, "output": 42 },
  "cost_cents": 1,
  "latency_ms": 2340,
  "token_breakdown": {
    "systemPrompt": 120,
    "toolDefinitions": 200,
    "conversationHistory": 80,
    "currentMessage": 25,
    "toolResults": 40,
    "llmOutput": 42,
    "total": 507
  }
}
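
The `token_breakdown` components sum to `total` (120 + 200 + 80 + 25 + 40 + 42 = 507), so a client can sanity-check a response like this:

```python
token_breakdown = {
    "systemPrompt": 120,
    "toolDefinitions": 200,
    "conversationHistory": 80,
    "currentMessage": 25,
    "toolResults": 40,
    "llmOutput": 42,
    "total": 507,
}

# Sum every component except the total itself and compare.
parts = sum(v for k, v in token_breakdown.items() if k != "total")
assert parts == token_breakdown["total"]  # 507
```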

Analytics Dashboard

The analytics endpoint provides daily cost rollups:

curl "https://api.aerostack.dev/api/bots/YOUR_BOT_ID/analytics?from=2026-03-01&to=2026-03-15" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Returns daily breakdowns of tokens used, costs, and conversation counts:

{
  "daily": [
    {
      "date": "2026-03-14",
      "messages_received": 45,
      "messages_sent": 45,
      "tokens_input": 22500,
      "tokens_output": 4200,
      "total_cost_cents": 12,
      "unique_users": 8
    }
  ],
  "summary": {
    "total_conversations": 120,
    "total_messages": 890,
    "total_tokens": 445000,
    "total_cost_cents": 156
  }
}
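
If you need totals over a custom window, you can roll the `daily` rows up yourself. A sketch (we assume `total_messages` counts both received and sent messages; the API's exact definition may differ):

```python
def summarize(daily):
    # Client-side rollup of the daily analytics rows.
    return {
        "total_messages": sum(d["messages_received"] + d["messages_sent"] for d in daily),
        "total_tokens": sum(d["tokens_input"] + d["tokens_output"] for d in daily),
        "total_cost_cents": sum(d["total_cost_cents"] for d in daily),
    }
```

For the sample day above this yields 90 messages, 26,700 tokens, and 12 cents.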

Rate Limits

Platform        Requests per Minute
Telegram        60
Discord         120
WhatsApp        60
Slack           60
Custom          60
Test endpoint   10 (per user)

Rate limits are enforced per bot (webhooks) or per user (test endpoint). Exceeding the limit returns a 429 Too Many Requests response.
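
On the client side you can stay under these limits by throttling before you send. A minimal sliding-window sketch (the server's actual algorithm is not documented here):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` calls in any trailing `window` seconds."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of recent calls

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False  # back off; the server would answer 429 here
        self.calls.append(now)
        return True
```

For a 60-requests-per-minute platform, `SlidingWindowLimiter(60)` with the default window keeps you under the limit.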


Conversation Limits

Setting                     Default   Description
conversation_max_messages   20        Maximum messages in the conversation context window. Older messages are summarized.
conversation_ttl_hours      24        Hours before a conversation expires. After expiry, a new conversation starts.
max_loop_iterations         10        Maximum agent loop iterations (tool call rounds) per message.
max_tokens_per_turn         8192      Maximum output tokens per LLM call.
timeout_ms                  30000     Hard timeout for message processing (max 60000 ms).

All of these can be configured per bot:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_max_messages": 30,
    "conversation_ttl_hours": 48,
    "max_loop_iterations": 5,
    "max_tokens_per_turn": 4096,
    "timeout_ms": 45000
  }'
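
A sketch of how the first two limits behave. The logic here is our illustration, not Aerostack's implementation; in particular, per the table above Aerostack summarizes older messages, whereas this sketch simply drops them:

```python
import time

def effective_context(messages, conversation_max_messages=20):
    # Keep only the newest N messages in the context window
    # (Aerostack summarizes the older ones; here we just drop them).
    return messages[-conversation_max_messages:]

def conversation_expired(last_activity_ts, conversation_ttl_hours=24, now=None):
    # After the TTL elapses, the next message starts a new conversation.
    now = time.time() if now is None else now
    return now - last_activity_ts > conversation_ttl_hours * 3600
```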

Cost Optimization Tips

  1. Use cheaper models for simple bots. gpt-4o-mini and gemini-2.5-flash are 10-20x cheaper than flagship models and remain capable enough for most tasks.

  2. Set spending caps. Always set a spending cap during development and testing.

  3. Reduce max_loop_iterations. If your bot rarely needs more than 2-3 tool calls, lower this from the default 10 to reduce runaway costs.

  4. Lower conversation_max_messages. Shorter context windows use fewer tokens per message.

  5. Use BYOK for high-volume bots. If you have negotiated rates or free credits with an LLM provider, BYOK lets you pay your provider directly.

  6. Monitor with analytics. Check the analytics endpoint regularly to identify cost spikes.