Billing & Limits

Aerostack bots support three billing modes that control how LLM costs are charged. You can set the billing mode per bot.


Billing Modes

Wallet (Default)

The wallet mode deducts LLM costs from your prepaid Aerostack balance.

How it works:

  1. Before processing a message, the bot checks that your wallet has a sufficient balance
  2. The message is processed through the LLM (with or without tool calls)
  3. Cost is calculated based on tokens used
  4. The cost is automatically deducted from your wallet

# Create a bot with wallet billing (default)
curl -X POST https://api.aerostack.dev/api/bots \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Bot",
    "platform": "custom",
    "workspace_id": "...",
    "system_prompt": "...",
    "billing_mode": "wallet"
  }'
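
The four steps above can be sketched in Python. This is an illustration with hypothetical names (`PRICES`, `handle_message`), not Aerostack's internals; we also assume sub-cent costs round up to a whole cent, which is consistent with the `cost_cents: 1` value shown in the test-endpoint example later on this page.

```python
import math

# Hypothetical sketch of the wallet billing flow.
# Prices are USD per 1M tokens, taken from the gpt-4o row in the tables below.
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}

def cost_cents(model, tokens_in, tokens_out):
    # Step 3: cost from tokens used. We assume sub-cent costs round up,
    # so even a tiny message bills at least 1 cent.
    p = PRICES[model]
    usd = tokens_in / 1e6 * p["input"] + tokens_out / 1e6 * p["output"]
    return math.ceil(usd * 100)

def handle_message(wallet_cents, model, tokens_in, tokens_out):
    # Step 1: balance check before processing.
    if wallet_cents <= 0:
        raise RuntimeError("insufficient wallet balance")
    # Step 2 would call the LLM here; steps 3-4: compute cost and deduct.
    cost = cost_cents(model, tokens_in, tokens_out)
    return wallet_cents - cost
```

For example, a 485-input / 42-output-token gpt-4o message costs about $0.0016, which bills as 1 cent under this rounding assumption.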

BYOK (Bring Your Own Key)

The BYOK mode uses your own LLM API key. Aerostack does not charge for LLM usage — you pay your provider directly at their standard rates.

To use BYOK:

  1. Set billing_mode to byok
  2. Provide your LLM API key via the llm_api_key field (encrypted at rest)

curl -X POST https://api.aerostack.dev/api/bots \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "BYOK Bot",
    "platform": "custom",
    "workspace_id": "...",
    "system_prompt": "...",
    "billing_mode": "byok",
    "llm_provider": "openai",
    "llm_model": "gpt-4o",
    "llm_api_key": "sk-..."
  }'

BYOK mode still tracks token usage and cost estimates in analytics, but no charges are deducted from your wallet.

Plan Quota (Coming Soon)

The plan_quota mode will use your account plan’s included LLM usage quota. This mode is not yet fully implemented.


LLM Pricing

All prices are per 1 million tokens when using wallet mode (Aerostack’s pooled keys).

Anthropic

Model               Input (per 1M tokens)   Output (per 1M tokens)
claude-opus-4-6     $15.00                  $75.00
claude-sonnet-4-6   $3.00                   $15.00
claude-haiku-4-5    $0.80                   $4.00

OpenAI

Model         Input (per 1M tokens)   Output (per 1M tokens)
gpt-4o        $2.50                   $10.00
gpt-4o-mini   $0.15                   $0.60
o1            $15.00                  $60.00

Google

Model              Input (per 1M tokens)   Output (per 1M tokens)
gemini-2.5-pro     $1.25                   $5.00
gemini-2.5-flash   $0.15                   $0.60

Groq

Model                     Input (per 1M tokens)   Output (per 1M tokens)
llama-3.3-70b-versatile   $0.59                   $0.79
mixtral-8x7b-32768        $0.24                   $0.24

Workers AI models are free (no token charges) but do not support tool calling.
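
To see what these rates mean in practice, here is a rough comparison (our own arithmetic, using the table values above) of the cost of 10,000 messages averaging 500 input and 100 output tokens:

```python
# USD per 1M tokens (input, output), from the tables above.
PRICES = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-2.5-flash": (0.15, 0.60),
}

def usd(model, tokens_in, tokens_out):
    price_in, price_out = PRICES[model]
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

# 10,000 messages x (500 input + 100 output) tokens each:
for model in PRICES:
    print(f"{model}: ${usd(model, 500 * 10_000, 100 * 10_000):.2f}")
```

At this volume, gpt-4o-mini ($1.35) comes out roughly 17x cheaper than gpt-4o ($22.50) and claude-sonnet-4-6 ($30.00).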


Spending Caps

Set a spending cap on any bot to limit total lifetime spend. When the bot’s cumulative cost reaches the cap, it stops responding to messages.

# Set a $5 spending cap (500 cents)
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": 500 }'

When the cap is reached, the bot returns a message like: “Bot has reached its spending limit.”
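
The enforcement amounts to a simple comparison before each message; a sketch (function and parameter names are ours, not the API's):

```python
def under_cap(cumulative_cost_cents, spending_cap_cents):
    # A null (None) cap means unlimited spend; otherwise the bot stops
    # responding once cumulative cost reaches the cap.
    if spending_cap_cents is None:
        return True
    return cumulative_cost_cents < spending_cap_cents
```

With a 500-cent cap, a bot that has accumulated 500 cents of cost is blocked; clearing the cap re-enables it.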

To remove a spending cap, set it to null:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": null }'

Cost Visibility

Per-Message Costs

The test endpoint returns detailed cost information:

{
  "response": "...",
  "tokens": { "input": 485, "output": 42 },
  "cost_cents": 1,
  "latency_ms": 2340,
  "token_breakdown": {
    "systemPrompt": 120,
    "toolDefinitions": 200,
    "conversationHistory": 80,
    "currentMessage": 25,
    "toolResults": 40,
    "llmOutput": 42,
    "total": 507
  }
}
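
The `token_breakdown` components sum to `total` (120 + 200 + 80 + 25 + 40 + 42 = 507), so a client can sanity-check a response like this:

```python
token_breakdown = {
    "systemPrompt": 120,
    "toolDefinitions": 200,
    "conversationHistory": 80,
    "currentMessage": 25,
    "toolResults": 40,
    "llmOutput": 42,
    "total": 507,
}

# Sum every component except the total itself and compare.
parts = sum(v for k, v in token_breakdown.items() if k != "total")
assert parts == token_breakdown["total"]  # 507
```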

Analytics Dashboard

The analytics endpoint provides daily cost rollups:

curl "https://api.aerostack.dev/api/bots/YOUR_BOT_ID/analytics?from=2026-03-01&to=2026-03-15" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Returns daily breakdowns of tokens used, costs, and conversation counts:

{
  "daily": [
    {
      "date": "2026-03-14",
      "messages_received": 45,
      "messages_sent": 45,
      "tokens_input": 22500,
      "tokens_output": 4200,
      "total_cost_cents": 12,
      "unique_users": 8
    }
  ],
  "summary": {
    "total_conversations": 120,
    "total_messages": 890,
    "total_tokens": 445000,
    "total_cost_cents": 156
  }
}
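
If you need totals over a custom window, you can roll the `daily` rows up yourself. A sketch (we assume `total_messages` counts both received and sent messages; the API's exact definition may differ):

```python
def summarize(daily):
    # Client-side rollup of the daily analytics rows.
    return {
        "total_messages": sum(d["messages_received"] + d["messages_sent"] for d in daily),
        "total_tokens": sum(d["tokens_input"] + d["tokens_output"] for d in daily),
        "total_cost_cents": sum(d["total_cost_cents"] for d in daily),
    }
```

For the sample day above this yields 90 messages, 26,700 tokens, and 12 cents.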

Rate Limits

Platform        Requests per Minute
Telegram        60
Discord         120
WhatsApp        60
Slack           60
Custom          60
Test endpoint   10 (per user)

Rate limits are enforced per bot (webhooks) or per user (test endpoint). Exceeding the limit returns a 429 Too Many Requests response.
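
On the client side you can stay under these limits by throttling before you send. A minimal sliding-window sketch (the server's actual algorithm is not documented here):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` calls in any trailing `window` seconds."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of recent calls

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False  # back off; the server would answer 429 here
        self.calls.append(now)
        return True
```

For a 60-requests-per-minute platform, `SlidingWindowLimiter(60)` with the default window keeps you under the limit.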


Conversation Limits

Setting                     Default   Description
conversation_max_messages   20        Maximum messages in the conversation context window. Older messages are summarized.
conversation_ttl_hours      24        Hours before a conversation expires. After expiry, a new conversation starts.
max_loop_iterations         10        Maximum agent loop iterations (tool call rounds) per message.
max_tokens_per_turn         8192      Maximum output tokens per LLM call.
timeout_ms                  30000     Hard timeout for message processing (max 60000 ms).

All of these can be configured per bot:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_max_messages": 30,
    "conversation_ttl_hours": 48,
    "max_loop_iterations": 5,
    "max_tokens_per_turn": 4096,
    "timeout_ms": 45000
  }'
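
A sketch of how the first two limits behave. The logic here is our illustration, not Aerostack's implementation; in particular, per the table above Aerostack summarizes older messages, whereas this sketch simply drops them:

```python
import time

def effective_context(messages, conversation_max_messages=20):
    # Keep only the newest N messages in the context window
    # (Aerostack summarizes the older ones; here we just drop them).
    return messages[-conversation_max_messages:]

def conversation_expired(last_activity_ts, conversation_ttl_hours=24, now=None):
    # After the TTL elapses, the next message starts a new conversation.
    now = time.time() if now is None else now
    return now - last_activity_ts > conversation_ttl_hours * 3600
```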

Cost Optimization Tips

  1. Use cheaper models for simple bots. gpt-4o-mini and gemini-2.5-flash are 10-20x cheaper than flagship models and remain capable enough for most tasks.

  2. Set spending caps. Always set a spending cap during development and testing.

  3. Reduce max_loop_iterations. If your bot rarely needs more than 2-3 tool calls, lower this from the default 10 to reduce runaway costs.

  4. Lower conversation_max_messages. Shorter context windows use fewer tokens per message.

  5. Use BYOK for high-volume bots. If you have negotiated rates or free credits with an LLM provider, BYOK lets you pay your provider directly.

  6. Monitor with analytics. Check the analytics endpoint regularly to identify cost spikes.