Cost Optimization
Bot costs come from LLM token usage. Every message, tool call, and workflow step that involves an LLM call consumes tokens. The strategies below can reduce costs by 5-20x without meaningfully degrading quality.
Choose the Right Model
Model choice is the single largest cost lever. The cheapest models are 50-100x less expensive than the most expensive ones.
| Model | Input / 1M tokens | Output / 1M tokens | Best For |
|---|---|---|---|
| claude-haiku-4-5 | $0.80 | $4.00 | Classification, simple Q&A, routing |
| gpt-4o-mini | $0.15 | $0.60 | FAQ bots, structured extraction |
| gemini-2.5-flash | $0.15 | $0.60 | High-volume, cost-sensitive |
| llama-3.3-70b (Groq) | $0.59 | $0.79 | Fastest inference, good quality |
| claude-sonnet-4-6 | $3.00 | $15.00 | General purpose, strong reasoning |
| gpt-4o | $2.50 | $10.00 | General purpose, tool calling |
| gemini-2.5-pro | $1.25 | $5.00 | Long context, moderate cost |
| claude-opus-4-6 | $15.00 | $75.00 | Complex reasoning, nuanced tasks |
| o1 | $15.00 | $60.00 | Advanced reasoning, chain-of-thought |
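As a sanity check, per-message cost can be estimated directly from the table above. A minimal sketch; the token counts are illustrative assumptions, not measured values:

```python
# Per-message cost using the pricing table above.
# Values are (input_price, output_price) in USD per 1M tokens.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (15.00, 75.00),
}

def cost_per_message(model: str, input_tokens: int, output_tokens: int) -> float:
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Assumed typical support turn: ~1,500 input tokens (system prompt +
# history + tool results) and ~300 output tokens.
for model in PRICES:
    print(f"{model}: ${cost_per_message(model, 1500, 300):.5f}")
```

At these assumed token counts, gpt-4o-mini comes out around $0.0004 per message and claude-opus-4-6 around $0.045 — roughly the 100x spread the table implies.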
Matching Model to Task
| Bot Type | Recommended Model | Reason |
|---|---|---|
| FAQ / knowledge base | gpt-4o-mini or gemini-2.5-flash | Simple retrieval + formatting |
| Customer support (general) | claude-sonnet-4-6 or gpt-4o | Needs good reasoning for tool selection |
| Workflow classification nodes | claude-haiku-4-5 | Only needs one-word classification |
| Complex multi-step reasoning | claude-opus-4-6 or o1 | Worth the cost for high-stakes decisions |
| High-volume Telegram bot | gpt-4o-mini | Cost per message stays below $0.001 |
In a multi-bot architecture, use cheap models for the reception bot (classification only) and reserve expensive models for specialist bots that need deep reasoning. See Bot Teams.
Set Spending Caps
Every bot should have a spending cap, especially during development and testing. When the cap is reached, the bot stops responding and returns a message like “Bot has reached its spending limit.”
```shell
# Set a $5 cap (500 cents)
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": 500 }'
```

| Environment | Recommended Cap |
|---|---|
| Development / testing | $1-5 |
| Staging | $10-20 |
| Production (low volume) | $50-100 |
| Production (high volume) | Based on projected usage |
To remove a cap: { "spending_cap_cents": null }.
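To size a cap, it helps to translate it into a message budget. A rough sketch, assuming an average cost per message (the $0.0004 figure is an illustrative gpt-4o-mini estimate, not a guarantee):

```python
def messages_until_cap(cap_cents: int, avg_cost_per_message_usd: float) -> int:
    """How many messages a spending cap covers at a given average cost."""
    return round((cap_cents / 100) / avg_cost_per_message_usd)

# A $5 development cap at ~$0.0004 per message:
print(messages_until_cap(500, 0.0004))  # 12500
```

If the projected message count looks far too low for your expected traffic, raise the cap or switch to a cheaper model before going live.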
Use BYOK Mode
If you have your own API keys with negotiated rates, free credits, or volume discounts, BYOK (Bring Your Own Key) mode bypasses Aerostack’s pooled keys entirely. You pay your LLM provider directly — Aerostack charges nothing for LLM usage.
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "billing_mode": "byok",
    "llm_api_key": "sk-your-own-key"
  }'
```

BYOK is best when:
- You have enterprise agreements with LLM providers
- You have free credits (e.g., from startup programs)
- Your volume is high enough to negotiate discounts
- You want to use your own rate limits and quotas
BYOK mode still tracks token usage and cost estimates in analytics, but no charges are deducted from your Aerostack wallet.
Reduce Token Usage
Lower max_loop_iterations
The default is 10 agent loop iterations per message. If your bot rarely needs more than 2-3 tool calls, lower this to prevent runaway costs from complex queries:
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "max_loop_iterations": 3 }'
```

Shorten conversation_max_messages
Longer context windows mean more tokens per message. The default is 20 messages. For simple bots that do not need deep conversation history:
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "conversation_max_messages": 10 }'
```

Lower max_tokens_per_turn
The default is 8192 output tokens. Most bot responses are under 500 tokens. Lowering this prevents unexpectedly long responses:
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "max_tokens_per_turn": 2048 }'
```

Reduce conversation_ttl_hours
Conversations expire after 24 hours by default. Shorter TTLs mean new conversations start sooner (with smaller context windows). For stateless bots:
```shell
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "conversation_ttl_hours": 1 }'
```

Optimize Workflows
Workflows let you control exactly when LLM calls happen, which is more cost-efficient than the agent loop for structured tasks.
Use code_block Instead of llm_call for Simple Logic
If you need to parse JSON, calculate a total, or format a string, use a code_block node (free) instead of an llm_call node (costs tokens):
```json
{
  "type": "code_block",
  "data": {
    "code": "const order = ctx.variables.order_data; ctx.variables.is_eligible = order.status === 'delivered' && order.days_since_delivery < 30;"
  }
}
```

Minimize LLM Calls in Workflows
Each llm_call node costs tokens. A workflow with 5 LLM calls costs roughly 5x a single-call conversation. Strategies:
- Combine related prompts into a single llm_call
- Use code_block for data transformation
- Use logic nodes for branching instead of asking the LLM to decide
Choose Cheaper Models for Classification
Workflow classification nodes (intent detection, language detection, entity extraction) work well with cheap models. If your bot uses claude-sonnet-4-6, consider creating a separate bot for classification and delegating from it.
Monitor with Analytics
Regular monitoring catches cost problems early.
```shell
curl "https://api.aerostack.dev/api/bots/YOUR_BOT_ID/analytics?from=2026-03-10&to=2026-03-17" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```

Watch for:
- Cost per message increasing — usually means tool results are getting larger or the LLM is making more tool calls
- Token usage spikes — often caused by a tool returning unexpectedly large results
- High loop iterations — the LLM may be struggling to find the right tool
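Checks like these are easy to automate. A small monitoring sketch; the messages and cost_cents field names are assumptions about the analytics payload, not the documented schema:

```python
# Flag days whose cost per message is well above the period average.
# Field names (date, messages, cost_cents) are illustrative assumptions.
def cost_per_message(day: dict) -> float:
    return (day["cost_cents"] / 100) / max(day["messages"], 1)

def flag_regressions(days: list[dict], threshold: float = 1.5) -> list[str]:
    """Return dates whose cost/message exceeds `threshold` x the period average."""
    avg = sum(cost_per_message(d) for d in days) / len(days)
    return [d["date"] for d in days if cost_per_message(d) > threshold * avg]

sample = [
    {"date": "2026-03-10", "messages": 400, "cost_cents": 20},
    {"date": "2026-03-11", "messages": 410, "cost_cents": 21},
    {"date": "2026-03-12", "messages": 390, "cost_cents": 95},  # spike
]
print(flag_regressions(sample))  # ['2026-03-12']
```

Running something like this daily turns a silent cost regression into an immediate alert, before it drains a wallet or hits a spending cap.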
Cost Comparison: Agent Loop vs Workflow
For a typical customer support interaction (classify, look up order, respond):
| Mode | LLM Calls | Estimated Tokens | Cost (Sonnet) | Cost (GPT-4o Mini) |
|---|---|---|---|---|
| Agent Loop | 2-3 (discovery + tool calls) | ~2,000 | ~$0.03 | ~$0.001 |
| Workflow | 2 (classify + respond) | ~1,200 | ~$0.02 | ~$0.0007 |
Workflows are typically 30-50% cheaper because they skip tool discovery overhead and make only the LLM calls you explicitly define.
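At volume, that 30-50% difference compounds. A quick sketch using the Sonnet per-message estimates from the table; the 100,000 messages/month volume is illustrative:

```python
# Monthly cost difference between the two modes, per the table above.
volume = 100_000                 # illustrative messages per month
agent_loop = 0.03 * volume       # ~$0.03/message (Sonnet, agent loop)
workflow = 0.02 * volume         # ~$0.02/message (Sonnet, workflow)
print(f"${agent_loop - workflow:,.0f} saved per month")  # $1,000 saved per month
```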
Billing Mode Reference
| Mode | Who Pays for LLM | Platform Fee | Best For |
|---|---|---|---|
| Wallet | Aerostack (prepaid balance) | Included in token price | Getting started, moderate volume |
| BYOK | You (your own API key) | None | High volume, negotiated rates |
| Plan Quota | Included in plan (coming soon) | Plan subscription | Predictable budgets |
Next Steps
- Billing & Limits — Full pricing tables and rate limits
- Bot Teams — Cost-optimize multi-bot architectures
- Testing & Debugging — Verify optimizations with the test console