Hacker News

visarga today at 4:50 PM

I did something similar: I put 18 years of my comments from Reddit, HN, and Slashdot, plus 3 years of LLM chats, into the system. I ended up with a similar conclusion: it was less useful than I expected. My intent was to do RAG over my corpus and give an LLM direct access to what I had commented over the years, but unfortunately that much information had a negative effect on the LLM's creativity. Its responses started to fall in line too closely with my own ideas, and it lost its spark. In the end my conclusion was that all that data faces the past, while what I want from LLMs is improvement in the other temporal direction.


Replies

arjie today at 5:12 PM

I did the same but with GPT embeddings. My primary problem was different though: I wanted to find where I had already talked about a related subject. For that, search works really well.
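
For anyone curious what that kind of "find where I talked about this before" search looks like, here is a minimal sketch. It is not the setup from either comment: `embed` is a toy bag-of-words stand-in for a real embedding model (assumed here only so the example runs without an API key), and the sample comments are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a real setup would call an embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, comments, top_k=3):
    """Return the top_k (score, comment) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in comments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical sample corpus, for illustration only.
comments = [
    "RAG over my own comments made the model less creative",
    "Rust borrow checker complaints are mostly about lifetimes",
    "Embedding search is great for finding old comments on a topic",
]

if __name__ == "__main__":
    for score, text in search("finding old comments with embedding search", comments):
        print(f"{score:.2f}  {text}")
```

With a real corpus you would compute the comment vectors once and store them rather than re-embedding on every query; the ranking step stays the same.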