> The BEAM's "let it crash" philosophy takes the opposite approach. Instead of anticipating every failure mode, you write the happy path and let processes crash. The supervisor detects the crash and restarts the process in a clean state. The rest of the system continues unaffected.
Do I want this? If my request fails because the tool doesn't have a DB connection, I want the model to receive information about that error. If the LLM API returns an error because the conversation is too long, I want to run compaction or another context engineering strategy; I don't want to restart the process just to run into the same thing again. Am I misunderstanding Elixir's advantage here?
You can (and should) always handle whatever errors you actually want to handle and are able to handle, so if that means catching a lot of them and forwarding them to a model, that's not a problem.
The benefit comes mainly from what happens when you encounter unknown errors, errors that you can't handle, or errors that would get you into an invalid state. It's normal in BEAM languages to handle the errors you want to and can handle, and to let the runtime deal with the remaining transient or unknown errors by restarting the process into a known good state.
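To make that split concrete, here is a minimal sketch. `ToolRunner`, the error atoms, and the return shapes are all made up for illustration; they are not from any library. Known failures are turned into data for the model, and there is deliberately no catch-all clause.

```elixir
defmodule ToolRunner do
  # Hypothetical wrapper around a tool call. The module name, the error
  # atoms, and the return tuples are assumptions made for this sketch.
  def run(tool_fun) do
    case tool_fun.() do
      {:ok, result} ->
        {:reply_to_model, result}

      # Known, recoverable failure: report it to the model as data.
      {:error, :db_unavailable} ->
        {:reply_to_model, "tool error: database unavailable"}

      # Known failure with a specific remedy: ask the caller to compact.
      {:error, :context_too_long} ->
        :compact_and_retry

      # No catch-all clause on purpose: any other return value raises
      # CaseClauseError and crashes the calling process, so a supervisor
      # (if there is one) can restart it from a clean state.
    end
  end
end
```

Calling `ToolRunner.run(fn -> {:error, :db_unavailable} end)` returns a message you can forward to the model, while an unexpected value such as `:garbage` crashes the caller instead of being silently swallowed.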
The big point really is preventing state corruption. The patterns the BEAM encourages go a long way toward keeping you from accidentally ending up in some kind of unknown zombie state with your model, for example when your model and control plane each think they are connected to the other but actually aren't. That kind of thing.
Happy to clarify more if this sounds strange.
Especially now that those workloads might have something to say about it… e.g. “Why did you make me this way?”
“Let it crash” doesn’t mean keep bashing your head against the wall. Elixir makes it easy to write state machines which reason about different types of failures, but the style is more declarative (this process requires X and Y preconditions, otherwise do Z) than imperative (I have to try/catch failures due to X and Y, now do Z). With Elixir you can actually specify that the process doesn’t start until the DB connection is ready, so if a missing connection was the cause of the failure, the process won’t start again until the connection is back (something else can take care of the DB). When the LLM API returns an error, you can put the agent in a paused “errored” state, have a different process decide what to do with the error, and pass the result back to the main agent when it’s done. This is all really elegant functional code in Elixir compared to try/catches and if statements.
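A rough sketch of the paused “errored” state, assuming a plain `GenServer`. `AgentSession`, its state map, and the `llm_fun` callback are hypothetical names for illustration; a real agent would need more than this.

```elixir
defmodule AgentSession do
  use GenServer

  # Hypothetical agent process. The state shape and message names are
  # assumptions for this sketch, not an established API.
  def start_link(llm_fun), do: GenServer.start_link(__MODULE__, llm_fun)
  def step(pid, prompt), do: GenServer.call(pid, {:step, prompt})
  def resume(pid, fixed_history), do: GenServer.call(pid, {:resume, fixed_history})

  @impl true
  def init(llm_fun), do: {:ok, %{status: :running, llm: llm_fun, history: []}}

  @impl true
  # While paused, refuse new work instead of repeating the failing call.
  def handle_call({:step, _prompt}, _from, %{status: {:errored, reason}} = state) do
    {:reply, {:paused, reason}, state}
  end

  def handle_call({:step, prompt}, _from, %{status: :running} = state) do
    history = state.history ++ [prompt]

    case state.llm.(history) do
      {:ok, reply} ->
        {:reply, {:ok, reply}, %{state | history: history ++ [reply]}}

      {:error, reason} ->
        # Known API failure: pause and let another process decide what to do.
        {:reply, {:paused, reason}, %{state | status: {:errored, reason}, history: history}}
    end
  end

  # A separate process (say, a compactor) hands back a repaired history.
  def handle_call({:resume, fixed_history}, _from, state) do
    {:reply, :ok, %{state | status: :running, history: fixed_history}}
  end
end
```

The failure handling lives in function heads that pattern match on the state rather than in nested try/catch blocks, and the process that decides how to recover only needs the pid and a call to `AgentSession.resume/2`.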