Hacker News

computomatic · yesterday at 2:56 PM · 4 replies

I was doing some experiments with removing the top 100–1000 most common English words from my prompts. My hypothesis was that common words are effectively noise to agents. Based on the first few trials, there was no discernible difference in output. Would love to compare results with caveman.

Caveat: I didn’t do enough testing to find the edge cases (e.g., negation).
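The commenter doesn't share code, but the experiment is easy to reproduce. A minimal sketch, assuming a hand-rolled stopword set (a small illustrative sample here, not the actual top-100/top-1000 list) and simple regex tokenization:

```python
import re

# Illustrative sample of very common English words; the commenter's
# experiment would use a real top-100 or top-1000 frequency list.
COMMON_WORDS = {
    "the", "a", "an", "of", "to", "and", "in", "is", "it",
    "that", "for", "on", "with", "as", "was", "are", "be",
}

def compress_prompt(prompt: str) -> str:
    """Drop common words from a prompt, keeping everything else in order."""
    # Split into word tokens and standalone punctuation.
    tokens = re.findall(r"\w+|[^\w\s]", prompt)
    kept = [t for t in tokens if t.lower() not in COMMON_WORDS]
    return " ".join(kept)

print(compress_prompt("The cat sat on the mat"))
# Note the edge case flagged above: negation words like "not" are
# frequent too, and stripping them can invert the prompt's meaning.
```

This makes the negation caveat concrete: "do not delete the file" and "do delete the file" compress to the same thing once "not" lands in the stopword set.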


Replies

computerphage · yesterday at 3:31 PM

Yeah, when I'm writing code I try to avoid zeros and ones, since those are the most common bits, making them essentially noise

ruairidhwm · yesterday at 3:28 PM

I literally just posted a blog post on this. Some seemingly insignificant words are actually highly structural to the model. https://www.ruairidh.dev/blog/compressing-prompts-with-an-au...

AlecSchueler · yesterday at 3:35 PM

Doesn't it just use more tokens in reasoning?

slashdave · today at 12:21 AM

> My hypothesis was that common words are effectively noise to agents

Umm... a few words can be combined in a rather large number of ways.

Punctuation is used a lot. Why not just remove all the periods and commas and see what happens? Probably not pretty.
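The punctuation-stripping experiment proposed here is a one-liner. A sketch, limited to periods and commas as the comment suggests:

```python
import re

def strip_periods_and_commas(prompt: str) -> str:
    """Remove periods and commas, collapsing any doubled-up spaces."""
    no_punct = re.sub(r"[.,]", "", prompt)
    return re.sub(r"\s+", " ", no_punct).strip()

print(strip_periods_and_commas("First, do X. Then, do Y."))
```

Sentence boundaries disappear, so instructions that relied on them ("Do X. Never do Y.") can blur together, which is presumably the "not pretty" outcome.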