logoalt Hacker News

jrjeksjd8dtoday at 12:43 AM5 repliesview on HN

I see you regret Datadog but there's no alternative - did you end up homebrewing metrics, or are you just living with their insane pricing model? In my experience they suck but not enough to leave.


Replies

jpgvmtoday at 4:52 AM

VictoriaMetrics stack. Better, cheaper, faster queries, more k8s native, etc. Easy to run with budget saved from not being on Datadog + attracts smart and observability minded engineers to your team.

stackskiptontoday at 1:35 AM

Not author but Prometheus is perfectly acceptable alternative if you don't want to go whole Otel route.

show 1 reply
lelandbateytoday at 3:08 AM

Currently going through leaving DD at work. Many potential options, many companies trying to break in. The one that calls to me spiritually is: throw it all in Clickhouse (hosted Clickhouse is shockingly cheap) with a hosted HyperDX (logs and metrics UI) instance in front of it. HyperDX has its issues, but it's shocking how cheap it is to toss a couple hundred TB of logs/metrics into Clickhouse per month (compared to the kings ransom DD charges). And you can just query the raw rows, which really comes in handy for understanding some in-the-weeds metrics questions.

velocity3230today at 7:33 AM

LGTM stack?

jamiemallerstoday at 9:06 AM

"No alternative" isn't quite right anymore, though I understand the feeling. The real problem with Datadog isn't the pricing - it's that their per-host model incentivizes you to care about infrastructure topology rather than user-facing behavior. You end up with 10,000 dashboards and still can't answer "is checkout broken right now?"

The open source stack has gotten genuinely viable: Prometheus/VictoriaMetrics for metrics, Grafana for viz, and OpenTelemetry as the collection layer means you're not locked into anyone's agent. The gap used to be in correlation - connecting a metric spike to a trace to a log line - but that's narrowed significantly.

The actual hard part of leaving DD isn't technical, it's organizational. DD becomes load-bearing for on-call runbooks, alert routing, and team muscle memory. Migration is less "swap the backend" and more "retrain your incident response."

If you're evaluating: the question I'd ask isn't "which vendor has the best dashboards" but "can I get from alert to root cause in under 5 minutes with this tool?" That's the metric that actually correlates with MTTR, and it's where most monitoring setups (including expensive ones) fail.