
Cost Optimization

Bot costs come from LLM token usage. Every message, tool call, and workflow step that involves an LLM call consumes tokens. The strategies below can reduce costs by 5-20x without meaningfully degrading quality.


Choose the Right Model

Model choice is the single largest cost lever. The cheapest models are 50-100x less expensive than the most expensive ones.

| Model | Input / 1M tokens | Output / 1M tokens | Best For |
|---|---|---|---|
| claude-haiku-4-5 | $0.80 | $4.00 | Classification, simple Q&A, routing |
| gpt-4o-mini | $0.15 | $0.60 | FAQ bots, structured extraction |
| gemini-2.5-flash | $0.15 | $0.60 | High-volume, cost-sensitive |
| llama-3.3-70b (Groq) | $0.59 | $0.79 | Fastest inference, good quality |
| claude-sonnet-4-6 | $3.00 | $15.00 | General purpose, strong reasoning |
| gpt-4o | $2.50 | $10.00 | General purpose, tool calling |
| gemini-2.5-pro | $1.25 | $5.00 | Long context, moderate cost |
| claude-opus-4-6 | $15.00 | $75.00 | Complex reasoning, nuanced tasks |
| o1 | $15.00 | $60.00 | Advanced reasoning, chain-of-thought |
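As a sanity check on the spread above, here is a rough per-message cost for a typical turn. The token counts (~500 input, ~300 output) are illustrative assumptions, and provider prices change, so treat the result as an order-of-magnitude estimate:

```shell
# Rough per-message cost at the table's prices, for an assumed
# ~500 input / ~300 output token turn. Illustrative only.
awk 'BEGIN {
  in_tok = 500; out_tok = 300
  mini = in_tok * 0.15  / 1e6 + out_tok * 0.60  / 1e6   # gpt-4o-mini
  opus = in_tok * 15.00 / 1e6 + out_tok * 75.00 / 1e6   # claude-opus-4-6
  printf "gpt-4o-mini:     $%.6f/msg\n", mini
  printf "claude-opus-4-6: $%.6f/msg\n", opus
  printf "ratio: %.0fx\n", opus / mini
}'
```

At these assumed token counts the cheapest model comes out at roughly $0.00026 per message versus $0.03 for the most expensive, a gap of over 100x.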

Matching Model to Task

| Bot Type | Recommended Model | Reason |
|---|---|---|
| FAQ / knowledge base | gpt-4o-mini or gemini-2.5-flash | Simple retrieval + formatting |
| Customer support (general) | claude-sonnet-4-6 or gpt-4o | Needs good reasoning for tool selection |
| Workflow classification nodes | claude-haiku-4-5 | Only needs one-word classification |
| Complex multi-step reasoning | claude-opus-4-6 or o1 | Worth the cost for high-stakes decisions |
| High-volume Telegram bot | gpt-4o-mini | Cost per message stays below $0.001 |

In a multi-bot architecture, use cheap models for the reception bot (classification only) and reserve expensive models for specialist bots that need deep reasoning. See Bot Teams.
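Assuming a bot's model is configured through the same PATCH endpoint via a model field (the field name is an assumption here — verify against the bots API reference), the reception bot could be pinned to a cheap classifier model like this:

```shell
# Hypothetical: pin the reception bot to a cheap classification model.
# The "model" field name is an assumption -- check the bots API reference.
curl -X PATCH https://api.aerostack.dev/api/bots/RECEPTION_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "model": "claude-haiku-4-5" }'
```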


Set Spending Caps

Every bot should have a spending cap, especially during development and testing. When the cap is reached, the bot stops responding and returns a message like “Bot has reached its spending limit.”

# Set a $5 cap (500 cents)
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": 500 }'

| Environment | Recommended Cap |
|---|---|
| Development / testing | $1-5 |
| Staging | $10-20 |
| Production (low volume) | $50-100 |
| Production (high volume) | Based on projected usage |

To remove a cap: { "spending_cap_cents": null }.
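Using the same endpoint as the cap-setting request above:

```shell
# Remove the spending cap entirely
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": null }'
```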


Use BYOK Mode

If you have your own API keys with negotiated rates, free credits, or volume discounts, BYOK (Bring Your Own Key) mode bypasses Aerostack’s pooled keys entirely. You pay your LLM provider directly — Aerostack charges nothing for LLM usage.

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "billing_mode": "byok",
    "llm_api_key": "sk-your-own-key"
  }'

BYOK is best when:

  • You have enterprise agreements with LLM providers
  • You have free credits (e.g., from startup programs)
  • Your volume is high enough to negotiate discounts
  • You want to use your own rate limits and quotas

BYOK mode still tracks token usage and cost estimates in analytics, but no charges are deducted from your Aerostack wallet.


Reduce Token Usage

Lower max_loop_iterations

The default is 10 agent loop iterations per message. If your bot rarely needs more than 2-3 tool calls, lower this to prevent runaway costs from complex queries:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "max_loop_iterations": 3 }'

Shorten conversation_max_messages

Longer context windows mean more tokens per message. The default is 20 messages. For simple bots that do not need deep conversation history:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "conversation_max_messages": 10 }'

Lower max_tokens_per_turn

The default is 8192 output tokens. Most bot responses are under 500 tokens. Lowering this prevents unexpectedly long responses:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "max_tokens_per_turn": 2048 }'

Reduce conversation_ttl_hours

Conversations expire after 24 hours by default. Shorter TTLs mean new conversations start sooner (with smaller context windows). For stateless bots:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "conversation_ttl_hours": 1 }'

Optimize Workflows

Workflows let you control exactly when LLM calls happen, which is more cost-efficient than the agent loop for structured tasks.

Use code_block Instead of llm_call for Simple Logic

If you need to parse JSON, calculate a total, or format a string, use a code_block node (free) instead of an llm_call node (costs tokens):

{
  "type": "code_block",
  "data": {
    "code": "const order = ctx.variables.order_data; ctx.variables.is_eligible = order.status === 'delivered' && order.days_since_delivery < 30;"
  }
}

Minimize LLM Calls in Workflows

Each llm_call node costs tokens. A workflow with 5 LLM calls costs roughly 5x a single-call conversation. Strategies:

  • Combine related prompts into a single llm_call
  • Use code_block for data transformation
  • Use logic nodes for branching instead of asking the LLM to decide
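As a sketch of the first strategy, two prompts ("classify the intent" and "extract the order ID") can be collapsed into one llm_call node that returns both in a single JSON response. The node shape below mirrors the code_block example above, but the exact llm_call data fields (prompt, output_variable) are assumptions — check the workflow node reference:

```json
{
  "type": "llm_call",
  "data": {
    "prompt": "Classify the user's intent (refund, status, other) AND extract the order ID if present. Reply only with JSON: {\"intent\": \"...\", \"order_id\": \"...\"}",
    "output_variable": "classification"
  }
}
```

One combined call pays the prompt and context overhead once instead of twice.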

Choose Cheaper Models for Classification

Workflow classification nodes (intent detection, language detection, entity extraction) work well with cheap models. If your bot uses claude-sonnet-4-6, consider creating a separate bot for classification and delegating from it.


Monitor with Analytics

Regular monitoring catches cost problems early.

curl "https://api.aerostack.dev/api/bots/YOUR_BOT_ID/analytics?from=2026-03-10&to=2026-03-17" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Watch for:

  • Cost per message increasing — usually means tool results are getting larger or the LLM is making more tool calls
  • Token usage spikes — often caused by a tool returning unexpectedly large results
  • High loop iterations — the LLM may be struggling to find the right tool
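One quick way to track the first metric is to pipe the analytics response through a small awk filter that computes cost per message. The response field names used here (message_count, total_cost_cents) and the sample values are assumptions — substitute whatever your analytics endpoint actually returns:

```shell
# Hypothetical analytics response; field names and values are assumptions.
resp='{"message_count": 1200, "total_cost_cents": 340}'

# Split on JSON punctuation and read the value after each known key,
# then print cost per message in dollars.
echo "$resp" | awk -F'[:,}]' '{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /message_count/)    msgs  = $(i + 1)
    if ($i ~ /total_cost_cents/) cents = $(i + 1)
  }
  printf "cost per message: $%.5f\n", cents / 100 / msgs
}'
```

Run this against a few weekly windows; a rising number is your cue to look at tool result sizes and loop iterations.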

Cost Comparison: Agent Loop vs Workflow

For a typical customer support interaction (classify, look up order, respond):

| Mode | LLM Calls | Estimated Tokens | Cost (Sonnet) | Cost (GPT-4o Mini) |
|---|---|---|---|---|
| Agent Loop | 2-3 (discovery + tool calls) | ~2,000 | ~$0.03 | ~$0.001 |
| Workflow | 2 (classify + respond) | ~1,200 | ~$0.02 | ~$0.0007 |

Workflows are typically 30-50% cheaper because they skip tool discovery overhead and make only the LLM calls you explicitly define.


Billing Mode Reference

| Mode | Who Pays for LLM | Platform Fee | Best For |
|---|---|---|---|
| Wallet | Aerostack (prepaid balance) | Included in token price | Getting started, moderate volume |
| BYOK | You (your own API key) | None | High volume, negotiated rates |
| Plan Quota | Included in plan (coming soon) | Plan subscription | Predictable budgets |

Next Steps