> Stop reading here if you just wanted the how-to. Because I’m going to talk about what I think is better, and that is probably too ‘commercial’ for some folks.
> I work for Ably, and I’m building a dedicated transport for AI applications that...
I built this Clojure lib for robust, high-scale LLM calls, where the consumer is usually an HTTP request waiting on an SSE stream. https://github.com/jhancock/aimee
The article states: "Most applications are built on an architecture like the one above, where there are a number of stateless horizontally scaleable server replicas that can handle client requests."
Using the library I built, I have yet to worry about this: Clojure's core.async, its HTTP libs, and the JVM are so rock solid that I don't end up with a fragile set of stateless servers. Sure, at some point there are rare edge cases, but it's nice to get very far along without worrying about them.
> HTTP is just not a good transport for streaming LLM tokens and for building async agentic applications
I don't know that I agree this is a problem with SSE or HTTP itself. Something like a Redis Streams-backed SSE setup would solve most of the 'challenges' presented in the post.
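To make the "Redis Streams-backed SSE" idea concrete, here is a minimal sketch of the resume logic: each token gets a monotonically increasing entry id, and a reconnecting client sends its `Last-Event-ID` so the server replays only what it missed. Redis itself is stood in for by an in-memory list so the sketch is self-contained; with real Redis you'd use `XADD`/`XREAD`. All names here are illustrative, not from any library mentioned in the thread.

```python
stream = []  # stand-in for a Redis Stream: list of (entry_id, token)

def append_token(token: str) -> int:
    """XADD stand-in: append a token with a monotonically increasing id."""
    entry_id = len(stream) + 1
    stream.append((entry_id, token))
    return entry_id

def replay_after(last_event_id: int) -> list[str]:
    """XREAD stand-in: format every entry after last_event_id as an SSE frame.
    The id: field is what lets the browser's EventSource resume automatically
    by re-sending it as the Last-Event-ID request header on reconnect."""
    return [
        f"id: {eid}\ndata: {tok}\n\n"
        for eid, tok in stream
        if eid > last_event_id
    ]

# Producer appends LLM tokens as they arrive from the model.
for tok in ["Hel", "lo", ", wor", "ld"]:
    append_token(tok)

# A client that disconnected after event 2 reconnects with Last-Event-ID: 2
# and receives only entries 3 and 4.
frames = replay_after(2)
print("".join(frames))
```

Because the stream, not the HTTP connection, is the source of truth, a dropped connection (or a request landing on a different replica that reads the same stream) costs nothing but a replay.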
In JS land, this problem (streaming, resuming, recovering, multi-client, etc.) has been fully solved by https://durablestreams.com - and it can be self-hosted, or managed via Cloudflare DO.