Hacker News

khalic yesterday at 1:13 PM

Another example of the mindf@#$ these systems are: I was fine-tuning a small model to take data fields and turn them into a sentence. I was running into mode collapse (basically when the AI oversimplifies and always outputs the same thing).

I got unstuck by randomizing the field order for each row at training time?!? And now I'm thinking I should do the same at inference time...
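(A minimal sketch of what the commenter describes, assuming the training examples are built from a dict of fields; the function and field names here are hypothetical.)

```python
import random

def fields_to_text(fields, rng=random):
    """Build a training example from a dict of data fields,
    shuffling the field order per row so the model can't latch
    onto one fixed template (the mode-collapse fix described above)."""
    items = list(fields.items())
    rng.shuffle(items)  # randomize field order for each example
    return "; ".join(f"{k}: {v}" for k, v in items)

row = {"name": "Ada", "city": "London", "job": "engineer"}
print(fields_to_text(row))  # field order varies from call to call
```

Applying the same shuffle at inference time would keep the input distribution consistent with training, which is presumably why it seems worth trying.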


Replies

p_stuart82 yesterday at 2:30 PM

the irony of modern software engineering: we spent decades perfecting deterministic algorithms, and now we're basically just shaking a black box and hoping the magic rocks align.

auspiv yesterday at 4:16 PM

apparently you can straight up duplicate/add/rearrange layers without changing any of the weights and get better results as well - https://dnhkng.github.io/posts/rys/
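(A toy illustration of the layer-duplication idea from the linked post, not the post's actual code: repeat existing blocks to deepen a model while leaving every weight untouched. The dict-based "layers" here are hypothetical stand-ins for real transformer blocks.)

```python
# Stand-ins for a 4-block layer stack; "weights" are placeholder values.
layers = [{"name": f"block_{i}", "weights": [float(i)] * 4} for i in range(4)]

# Repeat blocks 1-2 to build a deeper 6-block stack. The duplicates
# reference the same weight objects, so nothing is retrained or modified.
order = [0, 1, 2, 1, 2, 3]
deeper = [layers[i] for i in order]
```

The surprising claim is that stacks rearranged this way can score better on benchmarks despite containing zero new parameters.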

toddmorey yesterday at 2:14 PM

wow that's fascinating