Hacker News

bryanrasmussen, yesterday at 3:00 AM

Maybe there should be an LLM trained on a corpus of code deletions and cleanups.


Replies

krackers, yesterday at 4:26 AM

I'm guessing there's a very strong prior toward "just keep generating more tokens," as opposed to deleting code, that needs to be overcome. Maybe this is done already, but since every git project comes with its own history, you could take a notable open-source project (like LLVM) and then do RL training against each individual patch committed.
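As a rough illustration of mining a project's history for deletion-heavy patches, here is a minimal sketch. It assumes a local clone and the output of `git log --numstat`; the function names (`parse_numstat`, `net_deletion_commits`, `read_history`) are hypothetical, and it only finds candidate commits. It says nothing about how the RL training itself would be set up.

```python
import subprocess
from typing import Dict, List, Tuple


def parse_numstat(log_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each commit hash to (lines added, lines deleted).

    Expects text produced by: git log --numstat --pretty=tformat:"commit %H"
    """
    stats: Dict[str, Tuple[int, int]] = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            current = line[len("commit "):].strip()
            stats[current] = (0, 0)
            continue
        parts = line.split("\t")
        if current is None or len(parts) < 3:
            continue  # blank line or anything that is not a numstat row
        if parts[0] == "-" or parts[1] == "-":
            continue  # binary files report "-" instead of line counts
        added, deleted = stats[current]
        stats[current] = (added + int(parts[0]), deleted + int(parts[1]))
    return stats


def net_deletion_commits(stats: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return hashes of commits that removed more lines than they added."""
    return [h for h, (added, deleted) in stats.items() if deleted > added]


def read_history(repo_path: str) -> Dict[str, Tuple[int, int]]:
    """Run git on a local clone (hypothetical path) and parse the result."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat",
         "--pretty=tformat:commit %H"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)
```

Calling `net_deletion_commits(read_history("path/to/clone"))` would give a list of commits that shrink the codebase, which could serve as a starting filter for a deletion-focused dataset.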

ashdksnndck, yesterday at 9:10 AM

I think this is in the training data since they use commit data from repos, but I imagine code deletions are rarer than they should be in the real data as well.
