# AI Streaming — Realtime

> Stream LLM tokens to thousands of WebSocket clients simultaneously. Each client holds one connection and all watchers share one channel, so there is no SSE connection explosion.

Stream LLM tokens to clients in real time via WebSocket. With SSE, your server holds one open response stream per client. With Aerostack, your server emits each token once to a channel, and that single channel delivers the AI stream to **thousands of concurrent watchers**.

## How it works

1. Client subscribes to a session-specific channel
2. Client sends an HTTP request to your server to trigger generation
3. Server loops through LLM tokens, emitting each one via `sdk.socket.emit()`
4. All channel subscribers receive tokens as they arrive
5. Server emits `ai:done` to signal completion

```
Client A subscribes to 'ai-chat/session-xyz'
Client B subscribes to 'ai-chat/session-xyz'

POST /api/ai/chat → triggers server
Server: sdk.socket.emit('ai:token', { token: 'Hello' }, 'ai-chat/session-xyz')
Server: sdk.socket.emit('ai:token', { token: ' world' }, 'ai-chat/session-xyz')
Server: sdk.socket.emit('ai:done', {}, 'ai-chat/session-xyz')

Both Client A and Client B see the streaming response.
```
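The fan-out in the trace above can be simulated without any network. The sketch below uses a hypothetical in-memory `FakeHub` class (it is not the Aerostack SDK, and `watch` is an illustrative helper) only to show the protocol: every subscriber of a channel sees each `ai:token` event in order, followed by `ai:done`, while subscribers of other channels see nothing.

```typescript
// Event shape used in the trace: an event name plus a small payload.
type AiEvent = { event: 'ai:token' | 'ai:done'; data: { token?: string } }
type Listener = (e: AiEvent) => void

// Hypothetical in-memory stand-in for a realtime hub; NOT the Aerostack SDK.
class FakeHub {
  private listeners = new Map<string, Set<Listener>>()

  subscribe(channel: string, listener: Listener): () => void {
    const set = this.listeners.get(channel) ?? new Set<Listener>()
    set.add(listener)
    this.listeners.set(channel, set)
    // Return an unsubscribe function.
    return () => { set.delete(listener) }
  }

  // Same argument order as the trace: event name, payload, channel.
  emit(event: AiEvent['event'], data: AiEvent['data'], channel: string): void {
    this.listeners.get(channel)?.forEach(listener => listener({ event, data }))
  }
}

// Each watcher accumulates tokens until it sees ai:done.
function watch(hub: FakeHub, channel: string) {
  const state = { text: '', done: false }
  hub.subscribe(channel, ({ event, data }) => {
    if (event === 'ai:token') state.text += data.token ?? ''
    else state.done = true
  })
  return state
}

const hub = new FakeHub()
const channel = 'ai-chat/session-xyz'
const clientA = watch(hub, channel)
const clientB = watch(hub, channel)
const outsider = watch(hub, 'ai-chat/other-session')

// The server emits each token once; the hub delivers it to every subscriber.
hub.emit('ai:token', { token: 'Hello' }, channel)
hub.emit('ai:token', { token: ' world' }, channel)
hub.emit('ai:done', {}, channel)
```

After the three emits, `clientA` and `clientB` hold the same text and are marked done, and `outsider` is untouched.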

## Client setup

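```tsx
import { useEffect, useRef, useState } from 'react'
import { v4 as uuid } from 'uuid'
// Assumption: adjust this import path to wherever your Aerostack React SDK exports useAerostack.
import { useAerostack } from '@aerostack/react'

type ChatMessage = { role: 'user' | 'assistant'; content: string }

function AiChat() {
  const { realtime } = useAerostack()
  const [messages, setMessages] = useState<ChatMessage[]>([])
  const [isStreaming, setIsStreaming] = useState(false)
  // One session ID per mounted component; it names the channel below.
  const sessionId = useRef(uuid())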

  useEffect(() => {
    const channel = realtime.channel(`ai-chat/${sessionId.current}`)

    channel
      .on('ai:token', ({ data }) => {
        // Append each token to the last assistant message
        setMessages(prev => {
          const last = prev[prev.length - 1]
          if (last?.role === 'assistant') {
            return [
              ...prev.slice(0, -1),
              { ...last, content: last.content + data.token }
            ]
          }
          return [...prev, { role: 'assistant', content: data.token }]
        })
      })
      .on('ai:done', () => {
        setIsStreaming(false)
      })
      .subscribe()

    // Braces keep the cleanup from returning whatever unsubscribe() returns.
    return () => { channel.unsubscribe() }
  }, [realtime])

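  const sendMessage = async (text: string) => {
    setMessages(prev => [...prev, { role: 'user', content: text }])
    setIsStreaming(true)

    try {
      // Tokens arrive over the channel; this response only confirms the request.
      const res = await fetch('/api/ai/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: text,
          sessionId: sessionId.current,
        }),
      })
      if (!res.ok) setIsStreaming(false)
    } catch {
      // Network failure: no ai:done may arrive, so clear the cursor here.
      setIsStreaming(false)
    }
  }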

  return (
    <div>
      {messages.map((m, i) => (
        <div key={i} className={m.role === 'user' ? 'text-right' : 'text-left'}>
          {m.content}
          {isStreaming && i === messages.length - 1 && m.role === 'assistant' && (
            <span className="animate-pulse">▊</span>
          )}
        </div>
      ))}
    </div>
  )
}
```
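The token-append logic inside the `ai:token` handler can be pulled out into a pure function, which makes it easy to unit-test without React. This is a minimal sketch; `ChatMessage` and `appendToken` are illustrative names, not part of the Aerostack SDK.

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string }

// Returns a new array: extends the trailing assistant message if there is one,
// otherwise starts a new assistant message. The input array is never mutated.
function appendToken(messages: ChatMessage[], token: string): ChatMessage[] {
  const last = messages[messages.length - 1]
  if (last?.role === 'assistant') {
    return [...messages.slice(0, -1), { ...last, content: last.content + token }]
  }
  return [...messages, { role: 'assistant', content: token }]
}

// Simulate two tokens arriving after a user message.
const start: ChatMessage[] = [{ role: 'user', content: 'Hi' }]
const result = ['Hello', ' world'].reduce(appendToken, start)
```

In the component, the handler body would then reduce to `setMessages(prev => appendToken(prev, data.token))`.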

## Server handler

```ts
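// In your Worker
// Assumption: `app` and `sdk` are initialized earlier in this file, and the
// Worker is mounted under /api, so the client calls POST /api/ai/chat.

app.post('/ai/chat', async (c) => {
  const { message, sessionId } = await c.req.json()
  // Must match the channel name the client subscribed to
  const roomId = `ai-chat/${sessionId}`

  try {
    // Stream from your AI provider, token by token
    const stream = await sdk.ai.streamCompletion({
      prompt: message,
      model: 'gpt-4o-mini',
    })

    for await (const token of stream) {
      sdk.socket.emit('ai:token', { token }, roomId)
    }
  } finally {
    // Always signal completion so every watcher clears its streaming state,
    // even if the provider call throws partway through.
    sdk.socket.emit('ai:done', {}, roomId)
  }

  return c.json({ ok: true })
})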
```

Use session-specific channel names (e.g., `ai-chat/{uuid}`) so that only clients that know the session ID receive a conversation. Every client subscribed to the same session ID receives the same stream, which is useful for pair programming or collaborative Q&A.
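Because the server builds the channel name from a `sessionId` supplied in the request body, it may be worth validating that value before interpolating it. The sketch below assumes session IDs are UUIDs, as in the client example; `channelForSession` and `UUID_PATTERN` are hypothetical helpers, not Aerostack APIs.

```typescript
const CHANNEL_PREFIX = 'ai-chat/'

// Matches the canonical 8-4-4-4-12 hex layout of an RFC 4122 style UUID.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i

// Builds the channel name for a session, rejecting anything that is not a UUID
// so a client cannot steer emits toward an arbitrary channel.
function channelForSession(sessionId: string): string {
  if (!UUID_PATTERN.test(sessionId)) {
    throw new Error('sessionId must be a UUID')
  }
  return `${CHANNEL_PREFIX}${sessionId}`
}

const validChannel = channelForSession('123e4567-e89b-42d3-a456-426614174000')

// A malformed ID is rejected instead of becoming part of a channel name.
let rejected = false
try {
  channelForSession('session-xyz/../admin')
} catch {
  rejected = true
}
```

In the server handler, `channelForSession(sessionId)` would replace the inline template string, with the thrown error mapped to a 400 response.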

## Advantages over SSE

| Feature | SSE | Aerostack Realtime |
|--|-----|----------|
| Streams your server writes for 1000 watchers | 1000 | 1 (shared channel) |
| Works through proxies/CDN | Sometimes | Yes (WebSocket) |
| Bi-directional | No | Yes |
| Persistent history | Manual | Built-in (`persist: true`) |

See the [AI Streaming Chat example](/examples/ai-streaming-chat) for a complete working implementation.
