Cost Optimization
Bot costs come from LLM token usage. Every message, tool call, and workflow step that involves an LLM call consumes tokens. The strategies below can reduce costs by 5-20x without meaningfully degrading quality.
Choose the Right Model
Model choice is the single largest cost lever. The cheapest models are 50-100x less expensive than the most expensive ones.
| Model | Input / 1M tokens | Output / 1M tokens | Best For |
|---|---|---|---|
| claude-haiku-4-5 | $0.80 | $4.00 | Classification, simple Q&A, routing |
| gpt-4o-mini | $0.15 | $0.60 | FAQ bots, structured extraction |
| gemini-2.5-flash | $0.15 | $0.60 | High-volume, cost-sensitive |
| llama-3.3-70b (Groq) | $0.59 | $0.79 | Fastest inference, good quality |
| claude-sonnet-4-6 | $3.00 | $15.00 | General purpose, strong reasoning |
| gpt-4o | $2.50 | $10.00 | General purpose, tool calling |
| gemini-2.5-pro | $1.25 | $5.00 | Long context, moderate cost |
| claude-opus-4-6 | $15.00 | $75.00 | Complex reasoning, nuanced tasks |
| o1 | $15.00 | $60.00 | Advanced reasoning, chain-of-thought |
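As a sanity check, per-message cost can be estimated directly from the table above. A minimal sketch; the token counts are illustrative assumptions, not measured values:

```python
# Per-message cost using the pricing table above.
# Values are (input_price, output_price) in USD per 1M tokens.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (15.00, 75.00),
}

def cost_per_message(model: str, input_tokens: int, output_tokens: int) -> float:
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Assumed typical support turn: ~1,500 input tokens (system prompt +
# history + tool results) and ~300 output tokens.
for model in PRICES:
    print(f"{model}: ${cost_per_message(model, 1500, 300):.5f}")
```

At these assumed token counts, gpt-4o-mini comes out around $0.0004 per message and claude-opus-4-6 around $0.045 — roughly the 100x spread the table implies.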
Matching Model to Task
| Bot Type | Recommended Model | Reason |
|---|---|---|
| FAQ / knowledge base | gpt-4o-mini or gemini-2.5-flash | Simple retrieval + formatting |
| Customer support (general) | claude-sonnet-4-6 or gpt-4o | Needs good reasoning for tool selection |
| Workflow classification nodes | claude-haiku-4-5 | Only needs one-word classification |
| Complex multi-step reasoning | claude-opus-4-6 or o1 | Worth the cost for high-stakes decisions |
| High-volume Telegram bot | gpt-4o-mini | Cost per message stays below $0.001 |
In a multi-bot architecture, use cheap models for the reception bot (classification only) and reserve expensive models for specialist bots that need deep reasoning. See Bot Teams.
Set Spending Caps
Every bot should have a spending cap, especially during development and testing. When the cap is reached, the bot stops responding and returns a message like “Bot has reached its spending limit.”
```shell
# Set a $5 cap (500 cents)
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": 500 }'
```

| Environment | Recommended Cap |
|---|---|
| Development / testing | $1-5 |
| Staging | $10-20 |
| Production (low volume) | $50-100 |
| Production (high volume) | Based on projected usage |
To remove a cap: { "spending_cap_cents": null }.
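To size a cap, it helps to translate it into a message budget. A rough sketch, assuming an average cost per message (the $0.0004 figure is an illustrative gpt-4o-mini estimate, not a guarantee):

```python
def messages_until_cap(cap_cents: int, avg_cost_per_message_usd: float) -> int:
    """How many messages a spending cap covers at a given average cost."""
    return round((cap_cents / 100) / avg_cost_per_message_usd)

# A $5 development cap at ~$0.0004 per message:
print(messages_until_cap(500, 0.0004))  # 12500
```

If the projected message count looks far too low for your expected traffic, raise the cap or switch to a cheaper model before going live.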
Use BYOK Mode
If you have your own API keys with negotiated rates, free credits, or volume discounts, BYOK (Bring Your Own Key) mode bypasses Aerostack’s pooled keys entirely. You pay your LLM provider directly — Aerostack charges nothing for LLM usage.
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "billing_mode": "byok",
    "llm_api_key": "sk-your-own-key"
  }'
```

BYOK is best when:
- You have enterprise agreements with LLM providers
- You have free credits (e.g., from startup programs)
- Your volume is high enough to negotiate discounts
- You want to use your own rate limits and quotas
BYOK mode still tracks token usage and cost estimates in analytics, but no charges are deducted from your Aerostack wallet.
Reduce Token Usage
Lower max_loop_iterations
The default is 10 agent loop iterations per message. If your bot rarely needs more than 2-3 tool calls, lower this to prevent runaway costs from complex queries:
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "max_loop_iterations": 3 }'
```

Shorten conversation_max_messages
Longer context windows mean more tokens per message. The default is 20 messages. For simple bots that do not need deep conversation history:
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "conversation_max_messages": 10 }'
```

Lower max_tokens_per_turn
The default is 8192 output tokens. Most bot responses are under 500 tokens. Lowering this prevents unexpectedly long responses:
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "max_tokens_per_turn": 2048 }'
```

Reduce conversation_ttl_hours
Conversations expire after 24 hours by default. Shorter TTLs mean new conversations start sooner (with smaller context windows). For stateless bots:
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "conversation_ttl_hours": 1 }'
```

Optimize Workflows
Workflows let you control exactly when LLM calls happen, which is more cost-efficient than the agent loop for structured tasks.
Use code_block Instead of llm_call for Simple Logic
If you need to parse JSON, calculate a total, or format a string, use a code_block node (free) instead of an llm_call node (costs tokens):
```json
{
  "type": "code_block",
  "data": {
    "code": "const order = ctx.variables.order_data; ctx.variables.is_eligible = order.status === 'delivered' && order.days_since_delivery < 30;"
  }
}
```

Minimize LLM Calls in Workflows
Each llm_call node costs tokens. A workflow with 5 LLM calls costs roughly 5x a single-call conversation. Strategies:
- Combine related prompts into a single llm_call
- Use code_block for data transformation
- Use logic nodes for branching instead of asking the LLM to decide
Choose Cheaper Models for Classification
Workflow classification nodes (intent detection, language detection, entity extraction) work well with cheap models. If your bot uses claude-sonnet-4-6, consider creating a separate bot for classification and delegating from it.
Monitor with Analytics
Regular monitoring catches cost problems early.
```shell
curl "https://api.aerostack.dev/api/bots/YOUR_BOT_ID/analytics?from=2026-03-10&to=2026-03-17" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```

Watch for:
- Cost per message increasing — usually means tool results are getting larger or the LLM is making more tool calls
- Token usage spikes — often caused by a tool returning unexpectedly large results
- High loop iterations — the LLM may be struggling to find the right tool
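Checks like these are easy to automate. A small monitoring sketch; the messages and cost_cents field names are assumptions about the analytics payload, not the documented schema:

```python
# Flag days whose cost per message is well above the period average.
# Field names (date, messages, cost_cents) are illustrative assumptions.
def cost_per_message(day: dict) -> float:
    return (day["cost_cents"] / 100) / max(day["messages"], 1)

def flag_regressions(days: list[dict], threshold: float = 1.5) -> list[str]:
    """Return dates whose cost/message exceeds `threshold` x the period average."""
    avg = sum(cost_per_message(d) for d in days) / len(days)
    return [d["date"] for d in days if cost_per_message(d) > threshold * avg]

sample = [
    {"date": "2026-03-10", "messages": 400, "cost_cents": 20},
    {"date": "2026-03-11", "messages": 410, "cost_cents": 21},
    {"date": "2026-03-12", "messages": 390, "cost_cents": 95},  # spike
]
print(flag_regressions(sample))  # ['2026-03-12']
```

Running something like this daily turns a silent cost regression into an immediate alert, before it drains a wallet or hits a spending cap.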
Cost Comparison: Agent Loop vs Workflow
For a typical customer support interaction (classify, look up order, respond):
| Mode | LLM Calls | Estimated Tokens | Cost (Sonnet) | Cost (GPT-4o Mini) |
|---|---|---|---|---|
| Agent Loop | 2-3 (discovery + tool calls) | ~2,000 | ~$0.03 | ~$0.001 |
| Workflow | 2 (classify + respond) | ~1,200 | ~$0.02 | ~$0.0007 |
Workflows are typically 30-50% cheaper because they skip tool discovery overhead and make only the LLM calls you explicitly define.
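At volume, that 30-50% difference compounds. A quick sketch using the Sonnet per-message estimates from the table; the 100,000 messages/month volume is illustrative:

```python
# Monthly cost difference between the two modes, per the table above.
volume = 100_000                 # illustrative messages per month
agent_loop = 0.03 * volume       # ~$0.03/message (Sonnet, agent loop)
workflow = 0.02 * volume         # ~$0.02/message (Sonnet, workflow)
print(f"${agent_loop - workflow:,.0f} saved per month")  # $1,000 saved per month
```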
Billing Mode Reference
| Mode | Who Pays for LLM | Platform Fee | Best For |
|---|---|---|---|
| Wallet | Aerostack (prepaid balance) | Included in token price | Getting started, moderate volume |
| BYOK | You (your own API key) | None | High volume, negotiated rates |
| Plan Quota | Included in plan (coming soon) | Plan subscription | Predictable budgets |
Next Steps
- Billing & Limits — Full pricing tables and rate limits
- Bot Teams — Cost-optimize multi-bot architectures
- Testing & Debugging — Verify optimizations with the test console