# Cost Optimization

> Choose the right LLM, set spending caps, use BYOK mode, and optimize token usage for Aerostack bots.

Bot costs come from LLM token usage. Every message, tool call, and workflow step that involves an LLM call consumes tokens. The strategies below can reduce costs by 5-20x without meaningfully degrading quality.

---

## Choose the Right Model

Model choice is the single largest cost lever. In the table below, the cheapest models cost roughly 100x less per token than the most expensive ones ($0.15 vs. $15.00 per 1M input tokens).

| Model | Input / 1M tokens | Output / 1M tokens | Best For |
|-------|-------------------|---------------------|----------|
| **claude-haiku-4-5** | $0.80 | $4.00 | Classification, simple Q&A, routing |
| **gpt-4o-mini** | $0.15 | $0.60 | FAQ bots, structured extraction |
| **gemini-2.5-flash** | $0.15 | $0.60 | High-volume, cost-sensitive |
| **llama-3.3-70b** (Groq) | $0.59 | $0.79 | Fastest inference, good quality |
| **claude-sonnet-4-6** | $3.00 | $15.00 | General purpose, strong reasoning |
| **gpt-4o** | $2.50 | $10.00 | General purpose, tool calling |
| **gemini-2.5-pro** | $1.25 | $5.00 | Long context, moderate cost |
| **claude-opus-4-6** | $15.00 | $75.00 | Complex reasoning, nuanced tasks |
| **o1** | $15.00 | $60.00 | Advanced reasoning, chain-of-thought |
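
To compare models on your own traffic, you can estimate the per-call cost from the prices above. The following is a minimal sketch: the `PRICES` dictionary copies four rows of the table, while the function name and the sample token counts are illustrative assumptions and not part of the Aerostack API.

```python
# Prices in USD per 1M tokens, copied from the table above: (input, output).
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (15.00, 75.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one LLM call from its token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical call: 1,500 input tokens and 300 output tokens.
    for model in PRICES:
        print(f"{model}: ${estimate_cost_usd(model, 1500, 300):.6f}")
```

Input and output tokens are priced separately, so a bot that sends long prompts but writes short answers is dominated by the input column.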

### Matching Model to Task

| Bot Type | Recommended Model | Reason |
|----------|------------------|--------|
| FAQ / knowledge base | gpt-4o-mini or gemini-2.5-flash | Simple retrieval + formatting |
| Customer support (general) | claude-sonnet-4-6 or gpt-4o | Needs good reasoning for tool selection |
| Workflow classification nodes | claude-haiku-4-5 | Only needs one-word classification |
| Complex multi-step reasoning | claude-opus-4-6 or o1 | Worth the cost for high-stakes decisions |
| High-volume Telegram bot | gpt-4o-mini | Cost per message stays below $0.001 |

In a multi-bot architecture, use cheap models for the reception bot (classification only) and reserve expensive models for specialist bots that need deep reasoning. See [Bot Teams](/bots/bot-teams).
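
As a rough illustration of why this split pays off, the sketch below computes the average cost per message when a cheap reception bot classifies every message and only a share of messages reaches an expensive specialist. The function name, the per-message costs, and the routing share are all hypothetical inputs; measure your own values in analytics.

```python
def blended_cost_usd(
    reception_cost: float,
    expensive_cost: float,
    cheap_cost: float,
    expensive_share: float,
) -> float:
    """Average cost per message for a reception bot plus two specialist tiers.

    Every message pays for classification; `expensive_share` of messages go
    to the expensive specialist and the rest go to the cheap one.
    """
    if not 0.0 <= expensive_share <= 1.0:
        raise ValueError("expensive_share must be between 0 and 1")
    specialist = expensive_share * expensive_cost + (1 - expensive_share) * cheap_cost
    return reception_cost + specialist


if __name__ == "__main__":
    # Made-up costs: $0.0005 to classify, $0.03 expensive, $0.001 cheap.
    routed = blended_cost_usd(0.0005, 0.03, 0.001, expensive_share=0.2)
    print(f"routed: ${routed:.4f} per message vs. $0.0300 all-expensive")
```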

---

## Set Spending Caps

Every bot should have a spending cap, especially during development and testing. Once the cap is reached, the bot stops processing messages and replies with a notice such as "Bot has reached its spending limit."

```bash
# Set a $5 cap (500 cents)
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": 500 }'
```

| Environment | Recommended Cap |
|-------------|----------------|
| Development / testing | $1-5 |
| Staging | $10-20 |
| Production (low volume) | $50-100 |
| Production (high volume) | Based on projected usage |

To remove a cap: `{ "spending_cap_cents": null }`.
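
For the "based on projected usage" case, one way to derive a cap is to multiply expected traffic by the observed cost per message and add headroom. This is a sketch under stated assumptions: the 30-day horizon and the 1.5x headroom are arbitrary choices, the function name is hypothetical, and only the `spending_cap_cents` field comes from the API call shown above.

```python
import json
import math


def cap_payload(
    messages_per_day: int,
    cost_per_message_usd: float,
    days: int = 30,          # assumed budgeting horizon
    headroom: float = 1.5,   # assumed safety margin over the projection
) -> str:
    """Build the JSON body for a spending-cap update from projected usage."""
    projected_usd = messages_per_day * cost_per_message_usd * days
    # Round before ceil so float noise does not push the cap up by one cent.
    cap_cents = math.ceil(round(projected_usd * headroom * 100, 6))
    return json.dumps({"spending_cap_cents": cap_cents})


if __name__ == "__main__":
    # Hypothetical bot: 2,000 messages per day at $0.001 each.
    print(cap_payload(2000, 0.001))
```

The resulting string can be used as the `-d` body of the PATCH request above.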

---

## Use BYOK Mode

If you have your own API keys with negotiated rates, free credits, or volume discounts, BYOK (Bring Your Own Key) mode bypasses Aerostack's pooled keys entirely. You pay your LLM provider directly — Aerostack charges nothing for LLM usage.

```bash
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "billing_mode": "byok",
    "llm_api_key": "sk-your-own-key"
  }'
```

BYOK is best when:
- You have enterprise agreements with LLM providers
- You have free credits (e.g., from startup programs)
- Your volume is high enough to negotiate discounts
- You want to use your own rate limits and quotas

BYOK mode still tracks token usage and cost estimates in analytics, but no charges are deducted from your Aerostack wallet.

---

## Reduce Token Usage

### Lower max_loop_iterations

The default is 10 agent loop iterations per message. If your bot rarely needs more than 2-3 tool calls, lower this to prevent runaway costs from complex queries:

```bash
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "max_loop_iterations": 3 }'
```
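
To see why the iteration limit matters, consider a simplified model of the agent loop in which each iteration resends the base prompt plus every tool result gathered so far. This is an assumption made for illustration; Aerostack's actual context handling may differ, and the function name and token counts are hypothetical.

```python
def worst_case_input_tokens(
    base_prompt_tokens: int,
    tool_result_tokens: int,
    max_loop_iterations: int,
) -> int:
    """Upper bound on input tokens for one message under the simplified model."""
    total = 0
    for iteration in range(max_loop_iterations):
        # Iteration k sees the base prompt plus the k tool results so far.
        total += base_prompt_tokens + iteration * tool_result_tokens
    return total


if __name__ == "__main__":
    for limit in (10, 3):
        tokens = worst_case_input_tokens(1000, 500, limit)
        print(f"max_loop_iterations={limit}: up to {tokens} input tokens")
```

Under this model the worst case grows quadratically with the limit, which is why a runaway query at the default of 10 costs far more than three times a query capped at 3.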

### Shorten conversation_max_messages

A longer conversation history means more input tokens on every message. The default is 20 messages. For simple bots that do not need deep conversation history, lower it:

```bash
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "conversation_max_messages": 10 }'
```
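
The effect of the history limit can be approximated with a sliding window over past messages. The sketch below assumes the platform keeps only the newest `conversation_max_messages` messages and resends them each turn; that behavior, the function name, and the per-message token counts are assumptions for illustration.

```python
def history_tokens(message_token_counts: list[int], max_messages: int) -> int:
    """Tokens resent as history when only the newest `max_messages` are kept."""
    if max_messages <= 0:
        return 0
    return sum(message_token_counts[-max_messages:])


if __name__ == "__main__":
    # Hypothetical conversation: 30 messages of about 100 tokens each.
    counts = [100] * 30
    for limit in (20, 10):
        print(f"limit={limit}: {history_tokens(counts, limit)} history tokens per turn")
```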

### Lower max_tokens_per_turn

The default is 8192 output tokens. Most bot responses are under 500 tokens. Lowering this prevents unexpectedly long responses:

```bash
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "max_tokens_per_turn": 2048 }'
```

### Reduce conversation_ttl_hours

Conversations expire after 24 hours by default. With a shorter TTL, a returning user starts a fresh conversation sooner, so less history is sent with each message. For stateless bots:

```bash
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "conversation_ttl_hours": 1 }'
```
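
If you script these changes, the four settings above could plausibly be sent together. The sketch below builds such a request with Python's standard library. It assumes the endpoint accepts several fields in one PATCH body, as the BYOK example does with two fields; this has not been confirmed for these settings. The helper name and the bot ID are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.aerostack.dev/api/bots"


def build_settings_request(bot_id: str, token: str, settings: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates bot settings."""
    return urllib.request.Request(
        url=f"{API_BASE}/{bot_id}",
        data=json.dumps(settings).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


if __name__ == "__main__":
    request = build_settings_request(
        "YOUR_BOT_ID",
        "YOUR_JWT_TOKEN",
        {
            "max_loop_iterations": 3,
            "conversation_max_messages": 10,
            "max_tokens_per_turn": 2048,
            "conversation_ttl_hours": 1,
        },
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request.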

---

## Optimize Workflows

Workflows let you control exactly when LLM calls happen, which is more cost-efficient than the agent loop for structured tasks.

### Use code_block Instead of llm_call for Simple Logic

If you need to parse JSON, calculate a total, or format a string, use a `code_block` node (free) instead of an `llm_call` node (costs tokens):

```json
{
  "type": "code_block",
  "data": {
    "code": "const order = ctx.variables.order_data; ctx.variables.is_eligible = order.status === 'delivered' && order.days_since_delivery < 30;"
  }
}
```

### Minimize LLM Calls in Workflows

Each `llm_call` node costs tokens, so a workflow with 5 LLM calls costs roughly 5x a single-call conversation if the prompts are of similar size. Strategies:
- Combine related prompts into a single `llm_call`
- Use `code_block` for data transformation
- Use `logic` nodes for branching instead of asking the LLM to decide

### Choose Cheaper Models for Classification

Workflow classification nodes (intent detection, language detection, entity extraction) work well with cheap models. If your bot uses claude-sonnet-4-6, consider creating a separate classification bot on a cheaper model such as claude-haiku-4-5 and having it delegate to the Sonnet bot only when deeper reasoning is needed.

---

## Monitor with Analytics

Regular monitoring catches cost problems early.

```bash
curl "https://api.aerostack.dev/api/bots/YOUR_BOT_ID/analytics?from=2026-03-10&to=2026-03-17" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```

Watch for:
- **Cost per message** increasing — usually means tool results are getting larger or the LLM is making more tool calls
- **Token usage spikes** — often caused by a tool returning unexpectedly large results
- **High loop iterations** — the LLM may be struggling to find the right tool

---

## Cost Comparison: Agent Loop vs Workflow

For a typical customer support interaction (classify, look up order, respond):

| Mode | LLM Calls | Estimated Tokens | Cost (Sonnet) | Cost (GPT-4o Mini) |
|------|-----------|-----------------|---------------|-------------------|
| Agent Loop | 2-3 (discovery + tool calls) | ~2,000 | ~$0.03 | ~$0.001 |
| Workflow | 2 (classify + respond) | ~1,200 | ~$0.02 | ~$0.0007 |

Workflows are typically 30-50% cheaper because they skip tool discovery overhead and make only the LLM calls you explicitly define.
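
The Sonnet column appears consistent with pricing every token at the output rate ($15.00 per 1M tokens), which would make the figures upper bounds rather than exact costs. The sketch below reproduces the arithmetic under that assumption; the function names are illustrative.

```python
def upper_bound_cost_usd(total_tokens: int, output_price_per_million: float) -> float:
    """Cost if every token were billed at the (higher) output rate."""
    return total_tokens * output_price_per_million / 1_000_000


def savings_percent(agent_cost: float, workflow_cost: float) -> float:
    """Relative saving of the workflow over the agent loop, in percent."""
    return (1 - workflow_cost / agent_cost) * 100


if __name__ == "__main__":
    agent = upper_bound_cost_usd(2000, 15.00)     # agent loop, Sonnet
    workflow = upper_bound_cost_usd(1200, 15.00)  # workflow, Sonnet
    print(f"agent ${agent:.3f}, workflow ${workflow:.3f}, "
          f"saving {savings_percent(agent, workflow):.0f}%")
```

Because input tokens are cheaper than output tokens, real costs for both modes will usually be lower than these bounds, but the relative saving depends mainly on the token counts.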

---

## Billing Mode Reference

| Mode | Who Pays for LLM | Platform Fee | Best For |
|------|------------------|-------------|----------|
| **Wallet** | Aerostack, billed to your prepaid balance | Included in token price | Getting started, moderate volume |
| **BYOK** | You (your own API key) | None | High volume, negotiated rates |
| **Plan Quota** | Included in plan (coming soon) | Plan subscription | Predictable budgets |

---

## Next Steps

- [Billing & Limits](/bots/billing) — Full pricing tables and rate limits
- [Bot Teams](/bots/bot-teams) — Cost-optimize multi-bot architectures
- [Testing & Debugging](/bots/testing) — Verify optimizations with the test console
