Hacker News

exiguus today at 8:40 AM

YaCy has a proxy mode that automatically indexes your web surfing. In my experience, the index grows very fast and reaches ~100GB or more. How does the index size of Hister compare to that?


Replies

asciimoo today at 10:14 AM

Hister stores only the text content of HTML/PDF pages. 1000 documents require around 80-100MB of storage, and there is still plenty of room to optimize for storage space.

I've been using it for 6-7 months and my index size is below 1GB with almost 10k pages.
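As a rough sanity check on those numbers, here is a back-of-the-envelope estimate in Python. It assumes storage scales linearly with page count, which is an assumption rather than something measured, and `estimate_index_mb` is a hypothetical helper, not part of Hister.

```python
def estimate_index_mb(pages, mb_per_1000_low=80, mb_per_1000_high=100):
    """Return a (low, high) index size estimate in MB.

    Assumes linear scaling from the 80-100MB per 1000 documents figure.
    """
    return (pages * mb_per_1000_low / 1000, pages * mb_per_1000_high / 1000)


low, high = estimate_index_mb(10_000)
print(f"{low:.0f}-{high:.0f} MB")
```

For 10k pages this gives a range of 800 to 1000 MB, which lines up with an index just under 1GB.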

Also, a downside of the proxy approach: it does not properly handle JS-based websites and cannot detect dynamic content changes. Our extension periodically checks whether the content of the browser tabs has changed and automatically updates the index when a change is detected.
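To illustrate the general idea (this is not Hister's actual code, and a real extension would do this in JavaScript): one simple way to detect changes is to hash the extracted text of each tab on a timer and reindex only when the hash differs from the last one seen. A minimal Python sketch, where `ChangeDetector` and `needs_reindex` are hypothetical names:

```python
import hashlib


class ChangeDetector:
    """Tracks a content hash per URL to decide when a page needs reindexing."""

    def __init__(self):
        # Maps url -> SHA-256 hex digest of the text that was last indexed.
        self._seen = {}

    def needs_reindex(self, url, text):
        """Return True if `text` differs from what was last recorded for `url`."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self._seen.get(url) == digest:
            return False  # Content unchanged since the last check.
        self._seen[url] = digest  # Record the new content hash.
        return True


detector = ChangeDetector()
url = "https://example.com/"
print(detector.needs_reindex(url, "first render"))    # new page
print(detector.needs_reindex(url, "first render"))    # unchanged
print(detector.needs_reindex(url, "updated by JS"))   # content changed
```

Comparing hashes keeps the periodic check cheap, since only a short digest per tab has to be stored between checks.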