33.2 Streaming UX patterns that feel instant

The Illusion of Speed

You can't make the model think faster, but you can make the wait feel shorter. That part is a UX problem, not a model problem.

Streaming means processing chunks of the response as they arrive, rather than waiting for the whole response to finish. A response might still take 10 seconds in total, but with streaming the first word can appear after about 0.5 seconds, and that time to first word is what the user notices.
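
To make the difference between total time and time to first word concrete, here is a minimal Python sketch. It assumes the model response is exposed as an iterator of text chunks; `fake_model_stream` is a hypothetical stand-in for a real streaming API, and `consume_stream` is an illustrative name, not part of any library.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Sequence, Tuple


def fake_model_stream(words: Sequence[str], delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model API: yields one word at a time."""
    for word in words:
        time.sleep(delay_s)  # simulate per-chunk generation latency
        yield word + " "


def consume_stream(
    chunks: Iterable[str],
    on_chunk: Callable[[str], None],
) -> Tuple[str, Optional[float], float]:
    """Hand each chunk to the UI as it arrives and record timing.

    Returns (full_text, seconds_to_first_chunk, total_seconds).
    seconds_to_first_chunk is None if the stream was empty.
    """
    start = time.monotonic()
    first_chunk_at: Optional[float] = None
    parts = []
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.monotonic() - start
        parts.append(chunk)
        on_chunk(chunk)  # e.g. append to the chat bubble (typewriter effect)
    total = time.monotonic() - start
    return "".join(parts), first_chunk_at, total
```

Calling `consume_stream(fake_model_stream(["Hello", "world"]), print)` prints each word as soon as it is produced. The total duration is the same as in a non-streaming call; only the time until the first visible output shrinks.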

Streaming Patterns

  • Typewriter Effect: Show text as it arrives. Crucial for chat interfaces.
  • Skeleton Loaders: If you are generating structured JSON (which is hard to stream directly into a UI), show a "thinking..." skeleton state, but stream the raw "thought process" into a collapsed detail view so the user sees activity.
  • Speculative UI: If the user asks for a chart, render an empty chart frame immediately, then fill in the data points as they arrive in the JSON stream.
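
The speculative chart pattern can be sketched as follows. This assumes the model is asked to emit one JSON object per line (newline-delimited JSON), which is one possible way to make structured output streamable and not the only one. `ChartFrame` and `iter_points` are hypothetical names used for illustration, and the `x`/`y` keys are an assumed data shape.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_points(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per complete line.

    Chunks may split a line anywhere, so text is buffered until a
    newline shows that a full object has arrived.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield json.loads(line)
    if buffer.strip():  # last object may arrive without a trailing newline
        yield json.loads(buffer)


class ChartFrame:
    """Hypothetical UI stand-in: created empty, filled point by point."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.points: List[Tuple[float, float]] = []

    def add_point(self, point: dict) -> None:
        # Assumed shape: {"x": ..., "y": ...}
        self.points.append((point["x"], point["y"]))


def render_chart(title: str, chunks: Iterable[str]) -> ChartFrame:
    frame = ChartFrame(title)  # the empty frame exists before any data arrives
    for point in iter_points(chunks):
        frame.add_point(point)  # a real UI would redraw here
    return frame
```

In a real interface the frame would be drawn before the request is even sent, and each `add_point` call would trigger a redraw. A single large JSON document, by contrast, cannot be parsed with `json.loads` until its closing bracket arrives, which is why the line-per-object assumption matters here.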

Don't buffer too much

Some HTTP libraries buffer around 4 KB of data before flushing. For an LLM, 4 KB is huge: it might be the entire response, in which case the user sees nothing until generation has finished. Ensure your backend flushes the stream after every token or line.
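
A minimal sketch of flushing after every token, assuming the backend writes Server-Sent Events to a file-like response object that exposes `write` and `flush`. The names `format_sse` and `stream_tokens` are hypothetical; how you obtain the writable response object depends on your web framework.

```python
from typing import Iterable


def format_sse(token: str) -> bytes:
    """Encode one token as a Server-Sent Events message.

    Each line of the payload gets its own "data:" field, and a blank
    line terminates the event.
    """
    body = "".join(f"data: {line}\n" for line in token.split("\n"))
    return (body + "\n").encode("utf-8")


def stream_tokens(tokens: Iterable[str], out) -> int:
    """Write each token and flush immediately so it is not held in a buffer.

    `out` is assumed to be any object with write(bytes) and flush().
    Returns the number of tokens sent.
    """
    sent = 0
    for token in tokens:
        out.write(format_sse(token))
        out.flush()  # without this, small writes can sit in a buffer
        sent += 1
    return sent
```

Flushing per token costs a little throughput, but for responses of a few kilobytes the trade is usually worth it. Buffering can also happen in layers outside your code, so it is worth checking the whole path from backend to browser if tokens still arrive in bursts.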
