# Billing & Limits

> Bot billing modes — wallet, BYOK, and plan quota — plus pricing, spending caps, and rate limits.

Each Aerostack bot has a billing mode that controls how its LLM costs are charged. Three modes are defined: wallet, BYOK, and plan quota (not yet available). You set the billing mode per bot.

---

## Billing Modes

### Wallet (Default)

The **wallet** mode deducts LLM costs from your prepaid Aerostack balance.

How it works:
1. Before processing a message, the bot checks that your wallet has a sufficient balance
2. The message is processed through the LLM (with or without tool calls)
3. Cost is calculated based on tokens used
4. The cost is automatically deducted from your wallet

```bash
# Create a bot with wallet billing (default)
curl -X POST https://api.aerostack.dev/api/bots \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Bot",
    "platform": "custom",
    "workspace_id": "...",
    "system_prompt": "...",
    "billing_mode": "wallet"
  }'
```
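The four wallet steps above can be sketched as a small simulation. This is only an illustration of the order of operations, not Aerostack's implementation: the `Wallet` class, the `process_message` function, the `run_llm` callable, and the `min_balance_cents` threshold are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass


class InsufficientBalance(Exception):
    """Raised when the wallet cannot cover a new message (hypothetical)."""


@dataclass
class Wallet:
    balance_cents: int  # prepaid balance, in cents


def process_message(wallet, run_llm, min_balance_cents=1):
    """Mirror the four wallet-mode steps with stand-in pieces.

    `run_llm` is a hypothetical callable returning (reply, cost_cents).
    `min_balance_cents` is an assumed threshold for "sufficient balance".
    """
    # Step 1: check the balance before doing any work.
    if wallet.balance_cents < min_balance_cents:
        raise InsufficientBalance("wallet balance too low")
    # Steps 2-3: run the LLM (tool calls included) and get the token-based cost.
    reply, cost_cents = run_llm()
    # Step 4: deduct the cost from the wallet.
    wallet.balance_cents -= cost_cents
    return reply


wallet = Wallet(balance_cents=100)
reply = process_message(wallet, lambda: ("hello", 3))
print(reply, wallet.balance_cents)  # hello 97
```

Because the balance check happens before the LLM call, an empty wallet fails fast without incurring any provider cost.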

### BYOK (Bring Your Own Key)

The **BYOK** mode uses your own LLM API key. Aerostack does not charge for LLM usage — you pay your provider directly at their standard rates.

To use BYOK:
1. Set `billing_mode` to `byok`
2. Provide your LLM API key via the `llm_api_key` field (encrypted at rest)

```bash
# Create a bot that bills LLM usage to your own OpenAI key
curl -X POST https://api.aerostack.dev/api/bots \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "BYOK Bot",
    "platform": "custom",
    "workspace_id": "...",
    "system_prompt": "...",
    "billing_mode": "byok",
    "llm_provider": "openai",
    "llm_model": "gpt-4o",
    "llm_api_key": "sk-..."
  }'
```

BYOK mode still tracks token usage and cost estimates in analytics, but no charges are deducted from your wallet.
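If you create BYOK bots from a script, you may prefer to read the provider key from the environment instead of hard-coding it. The sketch below builds the same request as the curl example without sending it. The `build_byok_request` helper and the `OPENAI_API_KEY` variable name are assumptions for illustration; only the endpoint and field names come from the example above.

```python
import json
import os
import urllib.request

API_URL = "https://api.aerostack.dev/api/bots"  # endpoint from the curl example


def build_byok_request(jwt_token, workspace_id, system_prompt, env=os.environ):
    """Build (but do not send) a create-bot request in BYOK mode."""
    payload = {
        "name": "BYOK Bot",
        "platform": "custom",
        "workspace_id": workspace_id,
        "system_prompt": system_prompt,
        "billing_mode": "byok",
        "llm_provider": "openai",
        "llm_model": "gpt-4o",
        # Assumption: the provider key is stored in OPENAI_API_KEY.
        "llm_api_key": env["OPENAI_API_KEY"],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; a fake env dict keeps the example self-contained.
request = build_byok_request(
    "YOUR_JWT_TOKEN",
    "YOUR_WORKSPACE_ID",
    "You are a helpful bot.",
    env={"OPENAI_API_KEY": "sk-example"},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the example stops short of that so it can run without credentials.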

### Plan Quota (Coming Soon)

The **plan_quota** mode will draw on the LLM usage quota included in your account plan. This mode is not yet fully implemented.

---

## LLM Pricing

The prices below apply in wallet mode, where requests run on Aerostack's pooled keys. All prices are in USD per 1 million tokens.

### Anthropic

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|----------------------|------------------------|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5 | $0.80 | $4.00 |

### OpenAI

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|----------------------|------------------------|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o1 | $15.00 | $60.00 |

### Google

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|----------------------|------------------------|
| gemini-2.5-pro | $1.25 | $5.00 |
| gemini-2.5-flash | $0.15 | $0.60 |

### Groq

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|----------------------|------------------------|
| llama-3.3-70b-versatile | $0.59 | $0.79 |
| mixtral-8x7b-32768 | $0.24 | $0.24 |

Workers AI models are free (no token charges) but do not support tool calling.
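To estimate what a workload costs in wallet mode, multiply token counts by the per-million prices in the tables above. The sketch below does that with exact decimal arithmetic. The `PRICES` table is copied from this page, while the `cost_usd` helper is a hypothetical name, and the result is a raw estimate: how Aerostack rounds charges to cents is not covered here.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the tables above.
PRICES = {
    "claude-opus-4-6": ("15.00", "75.00"),
    "claude-sonnet-4-6": ("3.00", "15.00"),
    "claude-haiku-4-5": ("0.80", "4.00"),
    "gpt-4o": ("2.50", "10.00"),
    "gpt-4o-mini": ("0.15", "0.60"),
    "o1": ("15.00", "60.00"),
    "gemini-2.5-pro": ("1.25", "5.00"),
    "gemini-2.5-flash": ("0.15", "0.60"),
    "llama-3.3-70b-versatile": ("0.59", "0.79"),
    "mixtral-8x7b-32768": ("0.24", "0.24"),
}


def cost_usd(model, input_tokens, output_tokens):
    """Estimate the wallet-mode cost in USD for the given token counts."""
    in_price, out_price = (Decimal(p) for p in PRICES[model])
    total = input_tokens * in_price + output_tokens * out_price
    return total / Decimal(1_000_000)


# Same token volume on two OpenAI models.
for model in ("gpt-4o", "gpt-4o-mini"):
    print(model, f"{cost_usd(model, 22_500, 4_200):.6f}")
# gpt-4o 0.098250
# gpt-4o-mini 0.005895
```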

---

## Spending Caps

Set a **spending cap** on any bot to limit its total lifetime spend. The cap is specified in cents via the `spending_cap_cents` field. Once the bot's cumulative cost reaches the cap, it stops processing new messages.

```bash
# Set a $5 spending cap (500 cents)
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": 500 }'
```

When the cap is reached, the bot returns a message like: "Bot has reached its spending limit."

To remove a spending cap, set it to `null`:

```bash
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "spending_cap_cents": null }'
```
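Since `spending_cap_cents` takes whole cents and `null` means no cap, a couple of small helpers can reduce unit mistakes in client code. This is a sketch under assumptions: `dollars_to_cents` and `cap_reached` are hypothetical helpers, and the comparison models the "reaches the cap" wording on this page rather than Aerostack's actual enforcement code.

```python
from decimal import Decimal, ROUND_HALF_UP


def dollars_to_cents(amount):
    """Convert a dollar amount given as a string (e.g. "5.00") to whole cents."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


def cap_reached(cumulative_cost_cents, spending_cap_cents):
    """Return True once cumulative spend reaches the cap; None means no cap."""
    if spending_cap_cents is None:
        return False
    return cumulative_cost_cents >= spending_cap_cents


print(dollars_to_cents("5.00"))   # 500
print(cap_reached(499, 500))      # False
print(cap_reached(500, 500))      # True
print(cap_reached(10_000, None))  # False
```

Taking the dollar amount as a string avoids binary floating-point surprises when converting to cents.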

---

## Cost Visibility

### Per-Message Costs

The test endpoint returns detailed cost information:

```json
{
  "response": "...",
  "tokens": { "input": 485, "output": 42 },
  "cost_cents": 1,
  "latency_ms": 2340,
  "token_breakdown": {
    "systemPrompt": 120,
    "toolDefinitions": 200,
    "conversationHistory": 80,
    "currentMessage": 25,
    "toolResults": 40,
    "llmOutput": 42,
    "total": 507
  }
}
```
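The `token_breakdown` object is useful for finding out where a message's tokens go. The sketch below checks that the components add up to `total` and reports the largest one. It assumes `total` is the sum of the other fields, which holds for the sample response above; `summarize_breakdown` is a hypothetical helper.

```python
# token_breakdown copied from the sample response above.
breakdown = {
    "systemPrompt": 120,
    "toolDefinitions": 200,
    "conversationHistory": 80,
    "currentMessage": 25,
    "toolResults": 40,
    "llmOutput": 42,
    "total": 507,
}


def summarize_breakdown(breakdown):
    """Return (components_match_total, largest_component, share_of_total)."""
    parts = {name: count for name, count in breakdown.items() if name != "total"}
    matches = sum(parts.values()) == breakdown["total"]
    largest = max(parts, key=parts.get)
    share = parts[largest] / breakdown["total"]
    return matches, largest, share


matches, largest, share = summarize_breakdown(breakdown)
print(matches, largest, f"{share:.0%}")  # True toolDefinitions 39%
```

In this sample, tool definitions account for the largest share of tokens, so trimming unused tools would likely save more than shortening the system prompt.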

### Analytics Dashboard

The analytics endpoint provides daily cost rollups:

```bash
curl "https://api.aerostack.dev/api/bots/YOUR_BOT_ID/analytics?from=2026-03-01&to=2026-03-15" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```

The response contains daily breakdowns of messages, tokens, cost, and unique users, plus a summary of conversations, messages, tokens, and cost for the whole date range:

```json
{
  "daily": [
    {
      "date": "2026-03-14",
      "messages_received": 45,
      "messages_sent": 45,
      "tokens_input": 22500,
      "tokens_output": 4200,
      "total_cost_cents": 12,
      "unique_users": 8
    }
  ],
  "summary": {
    "total_conversations": 120,
    "total_messages": 890,
    "total_tokens": 445000,
    "total_cost_cents": 156
  }
}
```
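Once you have the `daily` array, simple derived metrics make cost spikes easier to spot. The sketch below computes cost per sent message and lists days that exceed a budget. The helper names and the 10-cent budget are assumptions for illustration; the data is the single day from the sample response above.

```python
# One entry from the "daily" array in the sample response above.
daily = [
    {
        "date": "2026-03-14",
        "messages_received": 45,
        "messages_sent": 45,
        "tokens_input": 22500,
        "tokens_output": 4200,
        "total_cost_cents": 12,
        "unique_users": 8,
    }
]


def cents_per_message(day):
    """Average cost in cents per message sent; 0.0 for days with no messages."""
    sent = day["messages_sent"]
    return day["total_cost_cents"] / sent if sent else 0.0


def days_over_budget(daily, budget_cents):
    """Return the dates whose total cost exceeds the given daily budget."""
    return [day["date"] for day in daily if day["total_cost_cents"] > budget_cents]


print(round(cents_per_message(daily[0]), 2))  # 0.27
print(days_over_budget(daily, budget_cents=10))  # ['2026-03-14']
```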

---

## Rate Limits

| Platform | Requests per Minute |
|----------|-------------------|
| Telegram | 60 |
| Discord | 120 |
| WhatsApp | 60 |
| Slack | 60 |
| Custom | 60 |
| Test endpoint | 10 (per user) |

Rate limits are enforced per bot (webhooks) or per user (test endpoint). Exceeding the limit returns a `429 Too Many Requests` response.
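Clients that may hit these limits can retry on `429` with exponential backoff. The following is a minimal sketch of that pattern, not an official client: `send_with_backoff` is a hypothetical helper, `send` stands in for whatever function performs your HTTP call and returns `(status, body)`, and the delay schedule is an assumption, since this page does not specify a retry interval.

```python
import time


def send_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a zero-argument callable returning (status_code, body).
    Delays double after each 429: base_delay, 2x, 4x, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after the final attempt; let the caller decide.
    return status, body


# Demo with a fake sender: two 429 responses, then success.
# The fake sleep records delays instead of actually waiting.
responses = iter([(429, ""), (429, ""), (200, "ok")])
delays = []
result = send_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)  # (200, 'ok') [1.0, 2.0]
```

Injecting `sleep` keeps the helper testable; in production code you would leave it at the default `time.sleep`.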

---

## Conversation Limits

| Setting | Default | Description |
|---------|---------|-------------|
| `conversation_max_messages` | 20 | Maximum messages in the conversation context window. Older messages are summarized. |
| `conversation_ttl_hours` | 24 | Hours before a conversation expires. After expiry, a new conversation starts. |
| `max_loop_iterations` | 10 | Maximum agent loop iterations (tool call rounds) per message. |
| `max_tokens_per_turn` | 8192 | Maximum output tokens per LLM call. |
| `timeout_ms` | 30000 | Hard timeout for message processing, in milliseconds (maximum 60000). |

All of these can be configured per bot:

```bash
curl -X PATCH https://api.aerostack.dev/api/bots/YOUR_BOT_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_max_messages": 30,
    "conversation_ttl_hours": 48,
    "max_loop_iterations": 5,
    "max_tokens_per_turn": 4096,
    "timeout_ms": 45000
  }'
```
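These limits also bound the worst-case output cost of a single message. The sketch below multiplies the two limits, assuming each agent loop iteration makes exactly one LLM call; that is an assumption about the loop, not something this page states. The function names are hypothetical, and the $10.00 figure is the `gpt-4o` output price from the pricing table.

```python
def max_output_tokens_per_message(max_loop_iterations=10, max_tokens_per_turn=8192):
    """Upper bound on output tokens for one message (one LLM call per iteration)."""
    return max_loop_iterations * max_tokens_per_turn


def worst_case_output_cost_usd(output_price_per_million, **limits):
    """Worst-case output cost in USD for one message under the given limits."""
    tokens = max_output_tokens_per_message(**limits)
    return tokens * output_price_per_million / 1_000_000


# Defaults: 10 iterations x 8192 tokens, priced at gpt-4o output rates.
print(max_output_tokens_per_message())   # 81920
print(worst_case_output_cost_usd(10.0))  # 0.8192

# The tightened settings from the PATCH example above.
print(worst_case_output_cost_usd(10.0, max_loop_iterations=5, max_tokens_per_turn=4096))  # 0.2048
```

Input tokens are not included in this bound; they grow with each tool-call round because earlier tool results are fed back into the context.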

---

## Cost Optimization Tips

1. **Use cheaper models for simple bots.** At the prices listed above, `gpt-4o-mini` is roughly 17x cheaper than `gpt-4o`, and `gemini-2.5-flash` is roughly 8x cheaper than `gemini-2.5-pro`, while still capable for most tasks.

2. **Set spending caps.** Always set a spending cap during development and testing.

3. **Reduce `max_loop_iterations`.** If your bot rarely needs more than two or three tool-call rounds, lower this from the default of 10 to limit runaway costs.

4. **Lower `conversation_max_messages`.** Shorter context windows use fewer tokens per message.

5. **Use BYOK for high-volume bots.** If you have negotiated rates or free credits with an LLM provider, BYOK lets you pay your provider directly.

6. **Monitor with analytics.** Check the analytics endpoint regularly to identify cost spikes.
