
sebra · today at 8:27 AM

I've tried Greptile and it's pretty much pure noise. I ran it for 3 PRs and then gave up. Here are three examples of things it wasted my time on in those 3 PRs:

* Suggested silencing an exception instead of letting it crash and burn, for "style" (the potential exception was handled earlier in the code, but it failed to pick up that context). When I commented that silencing the exception could lead to uncaught bugs, it replied "You're absolutely right, remove the try-catch", which I of course had never added. (A hypothetical sketch of the pattern is below.)

* Flagged us using Python 3.14 as a logic error because "python 3.14 does not exist yet".

* "Review the async/await patterns Heavy use of async in model validation might indicate these should be application services instead." Whatever this vague sentence means. Not sure if it is suggesting we change the design pattern used across our entire code base.
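For illustration only (none of this code is from the thread; the function and exception names are invented), the first complaint boils down to the difference between these two patterns:

    import json

    class ConfigError(Exception):
        """Hypothetical error raised when a config file is malformed."""

    def parse_config(path: str) -> dict:
        # Stand-in parser; the real code and names are not from the thread.
        try:
            with open(path) as f:
                return json.load(f)
        except (OSError, json.JSONDecodeError) as exc:
            raise ConfigError(str(exc)) from exc

    def load_config_silenced(path: str) -> dict:
        # The kind of change the bot suggested: swallow the error "for style".
        # An empty dict quietly propagates and the original bug is hidden.
        try:
            return parse_config(path)
        except ConfigError:
            return {}

    def load_config(path: str) -> dict:
        # What the commenter preferred: let the exception propagate, since it
        # is already caught by a handler further up the call stack.
        return parse_config(path)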

Also the "confidence" score added to each PR being 4/5 or something due to these irrelevant comments was a really annoying feature IMO. In general AI tools giving a rating when they're wrong feels like a big productivity loss as then the human reviewer will see that number and think something is wrong with the PR.

--

Before this we were running Coderabbit, which worked really well and caught a lot of bugs / implementation gotchas. It also had "learnings", which it referenced frequently, so it seems like it actually did not keep commenting on intentional things in our code base. With Coderabbit I found myself wanting to read the low-confidence comments as well, since they were often useful (so it erred on the side of too quiet rather than too noisy). Unfortunately our entire Coderabbit integration just stopped working one day, and since then we've been in a long back-and-forth with their support.

--

I'm not sure what the secret sauce is but it feels like Greptile was GPT 3.5-tier and Coderabbit was Sonnet 4.5-tier.


Replies

Dylan-CodeRab · today at 4:10 PM

I am a member of the CodeRabbit tech support team. Would you be able to provide me the ticket number you have open with us? I'd be happy to escalate this internally so we can get it resolved for you ASAP.

bjackman · today at 10:00 AM

My experience is that basic generic agents are useless, but an agent with extensive prompting about your use case is extremely valuable.

In my case using these prompts:

https://github.com/masoncl/review-prompts

Took things from "pure noise" to a world where, if you say there's a bug in your patch, people's first question will be "has the AI looked at it?"

FWIW in my case the AI has never yet found _the_ bug I was hunting for but it has found several _other_ significant bugs. I also ran it against old commits that were already reviewed by excellent engineers and running in prod. It found a major bug that wasn't spotted in human review.

Most of the "noise" I get now just leads me to say "yeah I need to add more context to the commit message". E.g the model will say "you forgot to do X" when X is out of scope for the patch and I'm doing it in a later one. So ideally the commit messages should mention this anyway.