Hacker News

danbruc · today at 9:40 AM

I randomly clicked and scrolled through the source code of Stavrobot ("The largest thing I’ve built lately is an alternative to OpenClaw that focuses on security") [1], and that is not great code. I have not used any AI to write code yet but have considered trying it out - is this the kind of code I should expect? Or, the other way around, does someone have an example of some non-trivial code - in size and complexity - written by an AI without babysitting, where the code is really good?

[1] https://github.com/skorokithakis/stavrobot


Replies

never_inline · today at 9:50 AM

I would suggest not delegating the LLD (class/interface-level design) to the LLM. The clankers are super bad at it. They treat everything as a disposable script.

Also document some best practices in AGENT.md or whatever it's called in your app.

E.g.:

    * All imports must be added on top of the file, NEVER inside the function.
    * Do not swallow exceptions unless the scenario calls for fault tolerance.
    * All functions need to have type annotations for parameters and return types.
And so on.
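The three rules above can be illustrated with a small Python sketch (`load_config` is a hypothetical helper, not from the project under discussion):

```python
import json
from pathlib import Path


# All imports sit at the top of the file, never inside the function.
def load_config(path: str) -> dict:
    """Read a JSON config file; errors propagate instead of being swallowed."""
    text = Path(path).read_text()
    # No blanket try/except here: a missing or malformed file should
    # fail loudly rather than silently return a default.
    return json.loads(text)
```

Putting rules like these in the agent instructions file gives the model a concrete target instead of letting it default to disposable-script habits.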

I almost always define the class-level design myself. In some sense I use the LLM to fill in the blanks. The design is still mine.

xenodium · today at 9:45 AM

In my experience, you kinda get what you ask for. If you don't ask for anything specific, it'll write as it sees fit. The more you involve yourself in the loop, the more you can get it to write according to your expectations. It also helps to give it a style guide that reflects your preferences.

sweaterkokuro · today at 2:31 PM

In my experience it's in all language models' nature to maximize token generation. They have been natively incentivized to generate more where possible, so if you don't set your parameters tightly, they will let loose. I usually set hard requirements for efficient code (less is more) and it gets close to how I would implement it. But like the previous comments say, it all depends on how deeply you integrate yourself into the loop.

mbesto · today at 1:33 PM

> and that is not great code

When you say "is not great code" can you elaborate? Does the code work or not?

FiberBundle · today at 12:31 PM

Pine Town [1], the "whimsical infinite multiplayer canvas of a meadow", also looks like pure slop.

[1] https://pine.town/

vidarh · today at 12:40 PM

It's the kind of code you should expect if you don't run a harness that includes review and refactoring stages.

It's by no means the best LLMs can do.

input_sh · today at 11:47 AM

You can make it better by investing a lot of time playing around with the tooling so that it produces something more akin to what you're looking for.

Good luck convincing your boss that this ungodly amount of time spent messing around with your tooling, for an immeasurable improvement in your delivery, is time well spent, as opposed to spending that same time delivering results by hand.

TacticalCoder · today at 11:40 AM

> is this the kind of code I should expect?

Sadly, yes. But it "works", for some definition of working. We all know it's going to be a maintenance nightmare, given the gigantic amount of code and projects now being generated ad infinitum. As someone commented in this thread: it can one-shot an app showing restaurant locations on a map with a green icon if they're open. But don't expect good code, secure code, performant code, and certainly not maintainable code.

By definition, unless the AIs can maintain that code, nothing is maintainable anymore: the reason being the sheer volume. Humans who could properly review and maintain code (and that's not many) are already outnumbered.

And as more and more become "prompt engineers" and are convinced that there's no need to learn anything anymore besides becoming a prompt engineer, the amount of generated code is only going to grow exponentially.

So to me it is the kind of code you should expect. It's not perfect. But it more or less works. And thankfully it shouldn't get worse with future models.

What we now need is tools, tools and more tools to help keep these things on track. If we are ever to get some peace of mind about the correctness of this unreviewable generated code, we'll need to automate things like theorem provers and code coverage (which are still nowhere to be seen).

And just like all these models run on Linux, QEMU and Docker (dev containers) and heavily use projects like ripgrep (Claude Code insists on having ripgrep installed), I'm pretty sure the tools these models rely on, and shall rely on, to produce acceptable results are going to be written, for the most part, by humans.

I don't know how to put it nicely: an app showing a green icon next to open restaurants on a map ain't exactly software to help lift off a rocket or pilot an MRI machine.

BTW: yup, I do have and use Claude Code. Color me both impressed and horrified by the amount of "working" but unmaintainable mess it can spout. Everybody who understands something about software maintenance should be horrified.

dncornholio · today at 9:56 AM

I also managed to find a 1000-line .cpp file in one of the projects. The article's content doesn't match his apps' quality. They don't bring any value. His clock looks completely AI-generated.
