
Cost Optimization

Bot costs come from LLM token usage. Every message, tool call, and workflow step that involves an LLM call consumes tokens. The strategies below can reduce costs by 5-20x without meaningfully degrading quality.


Model choice is the single largest cost lever. The cheapest models are 50-100x less expensive than the most expensive ones.

| Model | Input / 1M tokens | Output / 1M tokens | Best For |
| --- | --- | --- | --- |
| claude-haiku-4-5 | $0.80 | $4.00 | Classification, simple Q&A, routing |
| gpt-4o-mini | $0.15 | $0.60 | FAQ bots, structured extraction |
| gemini-2.5-flash | $0.15 | $0.60 | High-volume, cost-sensitive |
| llama-3.3-70b (Groq) | $0.59 | $0.79 | Fastest inference, good quality |
| claude-sonnet-4-6 | $3.00 | $15.00 | General purpose, strong reasoning |
| gpt-4o | $2.50 | $10.00 | General purpose, tool calling |
| gemini-2.5-pro | $1.25 | $5.00 | Long context, moderate cost |
| claude-opus-4-6 | $15.00 | $75.00 | Complex reasoning, nuanced tasks |
| o1 | $15.00 | $60.00 | Advanced reasoning, chain-of-thought |

| Bot Type | Recommended Model | Reason |
| --- | --- | --- |
| FAQ / knowledge base | gpt-4o-mini or gemini-2.5-flash | Simple retrieval + formatting |
| Customer support (general) | claude-sonnet-4-6 or gpt-4o | Needs good reasoning for tool selection |
| Workflow classification nodes | claude-haiku-4-5 | Only needs one-word classification |
| Complex multi-step reasoning | claude-opus-4-6 or o1 | Worth the cost for high-stakes decisions |
| High-volume Telegram bot | gpt-4o-mini | Cost per message stays below $0.001 |
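
To compare models for a particular workload, the per-call cost can be estimated from the listed prices. The sketch below is illustrative only: the `PRICES` map copies a few rows from the pricing table, while the helper name `estimateCallCost` and the token counts in the example are assumptions, not part of the Aerostack API.

```javascript
// Prices in USD per 1M tokens, copied from the pricing table.
const PRICES = {
  "claude-haiku-4-5": { input: 0.8, output: 4.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "claude-sonnet-4-6": { input: 3.0, output: 15.0 },
  "claude-opus-4-6": { input: 15.0, output: 75.0 },
};

// Hypothetical helper: estimated USD cost of a single LLM call.
function estimateCallCost(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// Assumed workload: 1,500 input tokens and 300 output tokens per call.
for (const model of Object.keys(PRICES)) {
  console.log(model, estimateCallCost(model, 1500, 300).toFixed(6));
}
```

Running this with your own average token counts gives a quick sense of how much a model switch would save before you change any bot configuration.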

Every bot should have a spending cap, especially during development and testing. When the cap is reached, the bot stops responding and returns a message like “Bot has reached its spending limit.”

# Set a $5 cap (500 cents)
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "spending_cap_cents": 500 }'

| Environment | Recommended Cap |
| --- | --- |
| Development / testing | $1-5 |
| Staging | $10-20 |
| Production (low volume) | $50-100 |
| Production (high volume) | Based on projected usage |

To remove a cap, send { "spending_cap_cents": null } as the body of the same PATCH request.
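
If you set caps from a script rather than by hand, the dollars-to-cents conversion is an easy place to slip. The following is a minimal sketch that only builds the request. The endpoint and the `spending_cap_cents` field come from the curl example, while the helper name `buildSpendingCapRequest` and its return shape are hypothetical.

```javascript
// Hypothetical helper: builds the PATCH request for a spending cap without sending it.
// Pass a dollar amount to set a cap, or null to remove it.
function buildSpendingCapRequest(botId, capDollars, token) {
  const cents = capDollars === null ? null : Math.round(capDollars * 100);
  if (cents !== null && (!Number.isFinite(cents) || cents < 0)) {
    throw new Error("Cap must be a non-negative dollar amount or null");
  }
  return {
    url: `https://api.aerostack.dev/api/bots/${encodeURIComponent(botId)}`,
    options: {
      method: "PATCH",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ spending_cap_cents: cents }),
    },
  };
}

const capRequest = buildSpendingCapRequest("YOUR_BOT_ID", 5, "YOUR_JWT_TOKEN");
console.log(capRequest.url, capRequest.options.body);
// To send it: await fetch(capRequest.url, capRequest.options)
```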


If you have your own API keys with negotiated rates, free credits, or volume discounts, BYOK (Bring Your Own Key) mode bypasses Aerostack’s pooled keys entirely. You pay your LLM provider directly — Aerostack charges nothing for LLM usage.

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"billing_mode": "byok",
"llm_api_key": "sk-your-own-key"
}'

BYOK is best when:

  • You have enterprise agreements with LLM providers
  • You have free credits (e.g., from startup programs)
  • Your volume is high enough to negotiate discounts
  • You want to use your own rate limits and quotas

The default is 10 agent loop iterations per message. If your bot rarely needs more than 2-3 tool calls, lower this to prevent runaway costs from complex queries:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "max_loop_iterations": 3 }'

Longer conversation histories mean more input tokens on every message. By default the bot keeps the last 20 messages in context. For simple bots that do not need deep conversation history, lower conversation_max_messages:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "conversation_max_messages": 10 }'

The default limit is 8192 output tokens per turn. Most bot responses are under 500 tokens, so lowering max_tokens_per_turn guards against unexpectedly long responses without affecting typical replies:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "max_tokens_per_turn": 2048 }'
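
The limits on loop iterations, history length, and output tokens together bound the worst-case cost of a single message. The sketch below is a rough upper bound under simplifying assumptions that are not confirmed platform behavior: every loop iteration resends the full history and emits the maximum output, and a history message averages 150 tokens. The helper name `worstCaseMessageCost` is hypothetical, and the prices are the claude-sonnet-4-6 rates from the pricing table.

```javascript
// Upper bound on the cost of one user message, under simplifying assumptions:
// each loop iteration resends the full history and produces the maximum output.
function worstCaseMessageCost(settings, price, avgTokensPerHistoryMessage) {
  const inputTokens = settings.conversation_max_messages * avgTokensPerHistoryMessage;
  const perIteration =
    (inputTokens * price.input + settings.max_tokens_per_turn * price.output) / 1e6;
  return settings.max_loop_iterations * perIteration;
}

// claude-sonnet-4-6 prices in USD per 1M tokens, from the pricing table.
const sonnet = { input: 3.0, output: 15.0 };

// Defaults stated in this guide versus the lowered values used in the examples.
const defaults = { max_loop_iterations: 10, conversation_max_messages: 20, max_tokens_per_turn: 8192 };
const tuned = { max_loop_iterations: 3, conversation_max_messages: 10, max_tokens_per_turn: 2048 };

// 150 tokens per history message is an assumed average, not a platform figure.
console.log("defaults:", worstCaseMessageCost(defaults, sonnet, 150).toFixed(4));
console.log("tuned:   ", worstCaseMessageCost(tuned, sonnet, 150).toFixed(4));
```

Real messages rarely hit every limit at once, so treat the result as a ceiling for sizing spending caps rather than a forecast.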

Conversations expire after 24 hours by default. With a shorter TTL, conversations reset sooner, so each new conversation starts with an empty history and sends fewer input tokens. For stateless bots:

curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "conversation_ttl_hours": 1 }'

Workflows let you control exactly when LLM calls happen, which is more cost-efficient than the agent loop for structured tasks.

Use code_block Instead of llm_call for Simple Logic


If you need to parse JSON, calculate a total, or format a string, use a code_block node (free) instead of an llm_call node (costs tokens):

{
  "type": "code_block",
  "data": {
    "code": "const order = ctx.variables.order_data; ctx.variables.is_eligible = order.status === 'delivered' && order.days_since_delivery < 30;"
  }
}
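
For readability, the one-line code string in that node can be expanded and exercised outside the workflow. This sketch mocks `ctx` as a plain object. The real runtime is assumed to supply it, and the wrapper function `checkEligibility` is hypothetical.

```javascript
// Readable version of the code string in the code_block node.
// ctx is mocked here; in a workflow it is assumed to be provided by the runtime.
function checkEligibility(ctx) {
  const order = ctx.variables.order_data;
  // Eligible only if the order was delivered fewer than 30 days ago.
  ctx.variables.is_eligible =
    order.status === "delivered" && order.days_since_delivery < 30;
  return ctx;
}

const mockCtx = {
  variables: { order_data: { status: "delivered", days_since_delivery: 12 } },
};
checkEligibility(mockCtx);
console.log(mockCtx.variables.is_eligible); // true
```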

Each llm_call node costs tokens. A workflow with 5 LLM calls costs roughly 5x a single-call conversation. Strategies:

  • Combine related prompts into a single llm_call
  • Use code_block for data transformation
  • Use logic nodes for branching instead of asking the LLM to decide

Workflow classification nodes (intent detection, language detection, entity extraction) work well with cheap models. If your main bot uses claude-sonnet-4-6, consider creating a separate bot on a cheaper model such as claude-haiku-4-5 for classification and delegating those steps to it.


Regular monitoring catches cost problems early.

curl "https://api.aerostack.dev/api/bots/YOUR_BOT_ID/analytics?from=2026-03-10&to=2026-03-17" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"

Watch for:

  • Cost per message increasing — usually means tool results are getting larger or the LLM is making more tool calls
  • Token usage spikes — often caused by a tool returning unexpectedly large results
  • High loop iterations — the LLM may be struggling to find the right tool
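
The first of these checks is easy to automate once you have daily totals. The sketch below flags days where cost per message jumps relative to the previous day. The record fields (`date`, `cost_cents`, `messages`) and the sample data are hypothetical, since the analytics response shape is not documented here. Adapt the field names to what the endpoint actually returns.

```javascript
// Hypothetical daily records; the real analytics response shape may differ.
const sampleDays = [
  { date: "2026-03-10", cost_cents: 120, messages: 400 },
  { date: "2026-03-11", cost_cents: 130, messages: 420 },
  { date: "2026-03-12", cost_cents: 300, messages: 410 },
];

// Flags days whose cost per message is at least `threshold` times the previous day's.
function flagCostRegressions(days, threshold = 1.5) {
  const alerts = [];
  for (let i = 1; i < days.length; i++) {
    const prev = days[i - 1];
    const cur = days[i];
    if (prev.messages === 0 || cur.messages === 0) continue; // avoid division by zero
    const prevCost = prev.cost_cents / prev.messages;
    const curCost = cur.cost_cents / cur.messages;
    if (prevCost > 0 && curCost / prevCost >= threshold) {
      alerts.push({ date: cur.date, ratio: curCost / prevCost });
    }
  }
  return alerts;
}

console.log(flagCostRegressions(sampleDays));
```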

For a typical customer support interaction (classify, look up order, respond):

| Mode | LLM Calls | Estimated Tokens | Cost (Sonnet) | Cost (GPT-4o Mini) |
| --- | --- | --- | --- | --- |
| Agent Loop | 2-3 (discovery + tool calls) | ~2,000 | ~$0.03 | ~$0.001 |
| Workflow | 2 (classify + respond) | ~1,200 | ~$0.02 | ~$0.0007 |

Workflows are typically 30-50% cheaper because they skip tool discovery overhead and make only the LLM calls you explicitly define.
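
The estimates in the comparison can be roughly reproduced from the listed output prices. The sketch below prices every token at the output rate. This is an assumption, chosen because it matches the quoted figures, and it yields an upper bound since input tokens are cheaper. The helper name `upperBoundCost` is hypothetical.

```javascript
// Output prices in USD per 1M tokens, from the pricing table.
const OUTPUT_PRICE = { "claude-sonnet-4-6": 15.0, "gpt-4o-mini": 0.6 };

// Upper-bound cost: treats all tokens as output tokens (an assumption).
function upperBoundCost(model, totalTokens) {
  return (totalTokens * OUTPUT_PRICE[model]) / 1e6;
}

const agentLoopTokens = 2000; // estimate from the comparison
const workflowTokens = 1200; // estimate from the comparison

for (const model of Object.keys(OUTPUT_PRICE)) {
  const agent = upperBoundCost(model, agentLoopTokens);
  const workflow = upperBoundCost(model, workflowTokens);
  const savings = 1 - workflow / agent; // relative saving of the workflow
  console.log(model, agent.toFixed(4), workflow.toFixed(4), `${Math.round(savings * 100)}%`);
}
```

With these token estimates the saving works out to 40% for either model, which sits inside the 30-50% range quoted above.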


| Mode | Who Pays for LLM | Platform Fee | Best For |
| --- | --- | --- | --- |
| Wallet | Aerostack (prepaid balance) | Included in token price | Getting started, moderate volume |
| BYOK | You (your own API key) | None | High volume, negotiated rates |
| Plan Quota | Included in plan (coming soon) | Plan subscription | Predictable budgets |