> If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day, you're going to spend millions on infra, and it's still going to absolutely suck.
Where are the “millions” on infra going? It’s a handful of services and a Postgres?
> Their sales team is also absolutely appalling and desperate.
You said “on-prem”. It’s open source; why are you dealing with their sales team?
> If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day…
If “millions” were required to reach such tiny scale, I’d agree there’d be a massive problem. No one would use Temporal; it would be a complete waste of resources. But that only holds if it's true.
Not a couple hundred in one day: a couple hundred being started, concurrently, every second, all day. Each with ~200 events.
We need a 12-node Cassandra cluster for this, with 64-CPU nodes. So no, it's not a couple of services and a Postgres.
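To put numbers on that, here is a rough back-of-envelope sketch. It assumes "a couple hundred" means exactly 200 starts per second (my assumption, the real figure isn't stated), and it counts workflow history events, which aren't necessarily one database write each.

```python
# Back-of-envelope load estimate from the figures in this comment.
# Assumption: "a couple hundred" is taken as exactly 200 starts/second.
STARTS_PER_SECOND = 200
EVENTS_PER_WORKFLOW = 200   # "~200 events" per workflow
SECONDS_PER_DAY = 24 * 60 * 60

# Cluster size as stated above.
CASSANDRA_NODES = 12
CPUS_PER_NODE = 64


def daily_load(starts_per_second, events_per_workflow):
    """Return (workflows/day, events/second, events/day)."""
    workflows_per_day = starts_per_second * SECONDS_PER_DAY
    events_per_second = starts_per_second * events_per_workflow
    events_per_day = workflows_per_day * events_per_workflow
    return workflows_per_day, events_per_second, events_per_day


workflows, eps, events = daily_load(STARTS_PER_SECOND, EVENTS_PER_WORKFLOW)
print(f"{workflows:,} workflows/day")   # 17,280,000
print(f"{eps:,} events/second")         # 40,000
print(f"{events:,} events/day")         # 3,456,000,000
print(f"{CASSANDRA_NODES * CPUS_PER_NODE} Cassandra CPUs")  # 768
```

So under that assumption it's about 17 million workflows and 3.5 billion events a day, against 768 CPUs of Cassandra alone.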
We deal with the sales team because we are an enterprise, and they want to extract money from us.
It's the same with any "open-source" enterprise ($$$) software. It sucks to run yourself. Docs on running it and on its errors are non-existent. Their Helm charts are broken. Instead of degrading in performance, it just fails.
We also hit scaling problems with Temporal.
Postgres doesn't scale at all for our workload, so you're into Cassandra.
For a medium-sized deployment, you're looking at 200+ vCPUs, and then let's say standard dev/uat/prod. So now you're at 600 CPUs. Now you need two geographic regions (dev can stay in one place), so now you're at 800. Want a failover cluster for prod? Have another 200 CPUs.
And 200 CPUs is a medium deployment: assume something like 36 CPUs per Cassandra node, then say 4-8 per instance of matching, worker, history, and frontend. Then all your other components around it: ingress controller, service mesh, etc.
There's a million a year easy, for a small deployment.
Our prod one is 4x this size.
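For anyone following the sizing arithmetic above, a minimal sketch of the running totals. One assumption of mine: the jump from 600 to 800 is read as one extra 200-vCPU deployment in the second region, since that is what the numbers imply. The per-vCPU figure at the end is only what "a million a year" works out to over that total, not a quoted price.

```python
# Running vCPU totals from the comment above.
BASE_VCPUS = 200  # one "medium" deployment

# (step, number of additional 200-vCPU deployments it adds)
STEPS = [
    ("dev/uat/prod in one region", 3),
    # Assumption: the 600 -> 800 step implies one extra deployment.
    ("second geographic region (dev stays put)", 1),
    ("failover cluster for prod", 1),
]


def running_totals(base, steps):
    """Return [(step label, cumulative vCPUs)] in order."""
    totals = []
    total = 0
    for label, deployments in steps:
        total += base * deployments
        totals.append((label, total))
    return totals


for label, total in running_totals(BASE_VCPUS, STEPS):
    print(f"{total:>5} vCPUs after {label}")  # 600, 800, 1000

final_vcpus = running_totals(BASE_VCPUS, STEPS)[-1][1]
# "A million a year" spread over the final total (currency unspecified).
implied_cost_per_vcpu_year = 1_000_000 / final_vcpus
print(f"implied spend per vCPU-year: {implied_cost_per_vcpu_year:,.0f}")  # 1,000
```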