Billing & Limits
Aerostack bots support three billing modes that control how LLM costs are charged. You can set the billing mode per bot.
Billing Modes
Wallet (Default)
The wallet mode deducts LLM costs from your prepaid Aerostack balance.
How it works:
- Before processing a message, the bot checks your wallet has sufficient balance
- The message is processed through the LLM (with or without tool calls)
- Cost is calculated based on tokens used
- The cost is automatically deducted from your wallet
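The cost step above can be sketched as follows. This is a minimal illustration, not Aerostack's actual billing code: the per-1M prices are gpt-4o's wallet rates from the pricing tables below, and rounding up to a whole cent is an assumption (the real rounding rules may differ).

```python
import math

# Hypothetical per-1M-token prices in cents (gpt-4o wallet rates below).
PRICE_PER_M_INPUT_CENTS = 250    # $2.50 per 1M input tokens
PRICE_PER_M_OUTPUT_CENTS = 1000  # $10.00 per 1M output tokens

def charge_message(wallet_cents: int, tokens_in: int, tokens_out: int) -> int:
    """Check balance, compute token cost, deduct; return the new balance."""
    if wallet_cents <= 0:
        raise RuntimeError("insufficient wallet balance")
    cost_cents = math.ceil(
        tokens_in / 1_000_000 * PRICE_PER_M_INPUT_CENTS
        + tokens_out / 1_000_000 * PRICE_PER_M_OUTPUT_CENTS
    )
    return wallet_cents - cost_cents
```

A 507-input / 42-output message works out to a fraction of a cent, which rounds up to 1 cent under this assumed scheme.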
```bash
# Create a bot with wallet billing (default)
curl -X POST https://api.aerostack.dev/api/bots \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Bot",
    "platform": "custom",
    "workspace_id": "...",
    "system_prompt": "...",
    "billing_mode": "wallet"
  }'
```
BYOK (Bring Your Own Key)
The BYOK mode uses your own LLM API key. Aerostack does not charge for LLM usage — you pay your provider directly at their standard rates.
To use BYOK:
- Set `billing_mode` to `byok`
- Provide your LLM API key via the `llm_api_key` field (encrypted at rest)
```bash
curl -X POST https://api.aerostack.dev/api/bots \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "BYOK Bot",
    "platform": "custom",
    "workspace_id": "...",
    "system_prompt": "...",
    "billing_mode": "byok",
    "llm_provider": "openai",
    "llm_model": "gpt-4o",
    "llm_api_key": "sk-..."
  }'
```
BYOK mode still tracks token usage and cost estimates in analytics, but no charges are deducted from your wallet.
Plan Quota (Coming Soon)
The plan_quota mode will use your account plan’s included LLM usage quota. This mode is not yet fully implemented.
LLM Pricing
All prices are per 1 million tokens when using wallet mode (Aerostack’s pooled keys).
Anthropic
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5 | $0.80 | $4.00 |
OpenAI
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o1 | $15.00 | $60.00 |
Google
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gemini-2.5-pro | $1.25 | $5.00 |
| gemini-2.5-flash | $0.15 | $0.60 |
Groq
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| llama-3.3-70b-versatile | $0.59 | $0.79 |
| mixtral-8x7b-32768 | $0.24 | $0.24 |
Workers AI models are free (no token charges) but do not support tool calling.
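To make the tables concrete, here is a sketch comparing the wallet-mode cost of a single message (10,000 input / 1,000 output tokens) across a few of the models above, using the listed per-1M-token prices:

```python
# Prices in dollars per 1M tokens, copied from the tables on this page.
PRICES = {
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "claude-sonnet-4-6": (3.00, 15.00),
    "gemini-2.5-flash":  (0.15, 0.60),
}

def message_cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one message at the listed wallet-mode rates."""
    price_in, price_out = PRICES[model]
    return tokens_in / 1_000_000 * price_in + tokens_out / 1_000_000 * price_out

for model in PRICES:
    print(f"{model}: ${message_cost_usd(model, 10_000, 1_000):.4f}")
```

At these rates gpt-4o-mini comes out roughly 17x cheaper than gpt-4o for the same message.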
Spending Caps
Set a spending cap on any bot to limit total lifetime spend. When the bot’s cumulative cost reaches the cap, it stops responding to messages.
```bash
# Set a $5 spending cap (500 cents)
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": 500 }'
```
When the cap is reached, the bot returns a message like: “Bot has reached its spending limit.”
To remove a spending cap, set it to null:
```bash
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": null }'
```
Cost Visibility
Per-Message Costs
The test endpoint returns detailed cost information:
```json
{
  "response": "...",
  "tokens": { "input": 485, "output": 42 },
  "cost_cents": 1,
  "latency_ms": 2340,
  "token_breakdown": {
    "systemPrompt": 120,
    "toolDefinitions": 200,
    "conversationHistory": 80,
    "currentMessage": 25,
    "toolResults": 40,
    "llmOutput": 42,
    "total": 507
  }
}
```
Analytics Dashboard
The analytics endpoint provides daily cost rollups:
```bash
curl "https://api.aerostack.dev/api/bots/YOUR_BOT_ID/analytics?from=2026-03-01&to=2026-03-15" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```
Returns daily breakdowns of tokens used, costs, and conversation counts:
```json
{
  "daily": [
    {
      "date": "2026-03-14",
      "messages_received": 45,
      "messages_sent": 45,
      "tokens_input": 22500,
      "tokens_output": 4200,
      "total_cost_cents": 12,
      "unique_users": 8
    }
  ],
  "summary": {
    "total_conversations": 120,
    "total_messages": 890,
    "total_tokens": 445000,
    "total_cost_cents": 156
  }
}
```
Rate Limits
| Platform | Requests per Minute |
|---|---|
| Telegram | 60 |
| Discord | 120 |
| WhatsApp | 60 |
| Slack | 60 |
| Custom | 60 |
| Test endpoint | 10 (per user) |
Rate limits are enforced per bot (webhooks) or per user (test endpoint). Exceeding the limit returns a 429 Too Many Requests response.
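As an illustration, a fixed-window counter like the one below would enforce a per-minute limit of this kind. This is a client-side sketch; Aerostack's actual server-side enforcement algorithm is not documented here.

```python
class RateLimiter:
    """Fixed-window per-minute rate limiter (illustrative)."""

    def __init__(self, limit_per_minute: int):
        self.limit = limit_per_minute
        self.window_start = 0.0
        self.count = 0

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` (seconds) is allowed."""
        if now - self.window_start >= 60:
            # Start a fresh one-minute window.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # caller would receive 429 Too Many Requests
```

With a limit of 60, the 61st request inside the same minute is rejected, and requests are allowed again once a new window begins.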
Conversation Limits
| Setting | Default | Description |
|---|---|---|
| conversation_max_messages | 20 | Maximum messages in the conversation context window. Older messages are summarized. |
| conversation_ttl_hours | 24 | Hours before a conversation expires. After expiry, a new conversation starts. |
| max_loop_iterations | 10 | Maximum agent loop iterations (tool call rounds) per message. |
| max_tokens_per_turn | 8192 | Maximum output tokens per LLM call. |
| timeout_ms | 30000 | Hard timeout for message processing (max 60000ms). |
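As a hedged sketch of how `conversation_max_messages` bounds the context window: once the history exceeds the limit, older messages collapse into a single summary slot. The placeholder string below stands in for the real LLM-generated summary, which Aerostack produces internally.

```python
def trim_history(history: list[str], max_messages: int) -> list[str]:
    """Keep the most recent messages; fold older ones into one summary slot.
    Illustrative only -- the actual summarization is done by Aerostack."""
    if len(history) <= max_messages:
        return history
    keep = max_messages - 1  # reserve one slot for the summary
    older, recent = history[:-keep], history[-keep:]
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + recent
```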
All of these can be configured per bot:
```bash
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_max_messages": 30,
    "conversation_ttl_hours": 48,
    "max_loop_iterations": 5,
    "max_tokens_per_turn": 4096,
    "timeout_ms": 45000
  }'
```
Cost Optimization Tips
- Use cheaper models for simple bots. `gpt-4o-mini` and `gemini-2.5-flash` are 10-20x cheaper than flagship models while still capable for most tasks.
- Set spending caps. Always set a spending cap during development and testing.
- Reduce `max_loop_iterations`. If your bot rarely needs more than 2-3 tool calls, lower this from the default 10 to reduce runaway costs.
- Lower `conversation_max_messages`. Shorter context windows use fewer tokens per message.
- Use BYOK for high-volume bots. If you have negotiated rates or free credits with an LLM provider, BYOK lets you pay your provider directly.
- Monitor with analytics. Check the analytics endpoint regularly to identify cost spikes.
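A back-of-the-envelope check of the first tip, using the prices listed on this page and an assumed workload (1,000 messages/day, 500 input and 100 output tokens each, 30-day month):

```python
def monthly_cost_usd(price_in: float, price_out: float,
                     msgs_per_day: int = 1_000,
                     tokens_in: int = 500, tokens_out: int = 100) -> float:
    """Projected monthly cost at per-1M-token prices (assumed workload)."""
    per_message = tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out
    return per_message * msgs_per_day * 30

print(monthly_cost_usd(2.50, 10.00))  # gpt-4o at this page's rates
print(monthly_cost_usd(0.15, 0.60))   # gpt-4o-mini at this page's rates
```

Under these assumptions gpt-4o comes to about $67.50/month versus about $4.05 for gpt-4o-mini, consistent with the 10-20x figure above.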