Hacker News

asciimoo yesterday at 10:37 PM (17 replies)

Ohi, I'm the original creator of Searx, but due to the limitations of the metasearch concept I'm not involved in the development anymore. My new search project is https://github.com/asciimoo/hister (https://hister.org/).

Hister is a full text indexer for websites and local files which automatically saves all the visited pages rendered by your browser. Storing full page content allows serving offline result previews and the full page content via MCP.

Take a look at how the MCP can be utilized: https://hister.org/posts/give-your-ai-assistant-a-private-me...


Replies

satvikpendem today at 8:52 PM

Looks good, but I was also curious about the "limitations of the metasearch concept." Could you explain this more?

jodoherty today at 5:04 PM

Beautiful! Thank you for making this.

I've been trying to find something to use for enriching my own self-hosted LLMs and agentic tools with information I find useful. Metasearch tools like SearXNG make it less likely you'll get blocked by bot detection tools when finding information, but usually it's something I've already found, read, or seen that I want to incorporate into my tooling.

I came to the conclusion that a self-hosted content storage system with a search engine and a browser extension that can extract and save web page content and metadata was the ideal setup for me. Preferably with some sort of federated content sharing ability and the ability to import Creative Commons content like Wikipedia and Gutenberg.

This looks almost exactly like what I wanted.

It'll be a few weeks before I have time to audit the code and deploy it, but I'm really looking forward to trying it out.

ydj today at 3:32 AM

Hister sounds like something I've wanted for a while but never got around to building. Searching stuff I’ve seen before is most of what I do with a search engine, so having it local and fast would be amazing. Eager to give it a try.

zeroq today at 12:03 AM

I'm sorry for not taking the time to read the docs, but I have a question.

Some 20 years ago a friend of mine set up a local proxy (Python, if I'm not mistaken) that gathered all his web traffic and served as his long-term memory. The proxy had a web interface and allowed him to quickly find something he saw ca. 10 days ago, or that specific algorithm he recalls but can't remember the name of.

For years I've been collecting links to different work-related trivia, which I use on a daily basis as a rabbit-from-a-hat solution to answer random questions from friends and coworkers. For example, someone randomly asked me for an idea for a color palette for data charts, and I could immediately give them scientific research on color palettes. Or an obscure algorithm.

But with time the collection has grown substantially, and it's really cumbersome to find the right things.

Would your project be a good fit for my problem?

Leonard_of_Q today at 1:46 PM

Interesting, a local search option. I made the recoll engine for SearX and now SearXNG, and I still use it daily over a rather large archive of journal articles and other non-fiction texts. Recoll's indexer can extract text from just about anything I throw at it, and it also extracts and indexes metadata. Would Hister serve the same purpose and, if so, is there a SearXNG engine to integrate it into the result stream?

exiguus today at 8:40 AM

YaCy has a proxy mode that automatically indexes your web surfing. In my experience, the index grows very fast and reaches ~100GB or more. How does the index size of Hister compare to that?

BrunoBernardino today at 11:09 AM

Hister is a great idea and the creator is a really nice person. Please give it an honest look and consider supporting them (I'm Uruky's co-founder and we sponsored them)!

MrDrMcCoy today at 12:34 AM

Always excited to see new things like Hister in the search space. As far as you can tell, what are the scaling limits: how much can it hold before queries start breaking down or become too slow to be useful? Could it evolve into a general internet search engine if, say, enough trusted members of a geo-distributed YugabyteDB cluster and an army of crawlers built a sufficient index?

scritty-dev today at 2:39 PM

This is really cool, first time hearing about this. Is there any org-level model for this, so you can promote an individual's indexed websites into an organization- or team-owned model?

derrida today at 1:01 AM

Wow! That looks like a bit of software I have been dreaming about for a while - will definitely check it out! For starters, you're at least doing something right in communicating the reasons why and the appeal! All the best!

Abishek_Muthian today at 3:45 AM

This is great. Like many others, I've been thinking of something like Hister, but only for bookmarked web pages. I presume it should be straightforward to do that with Hister?

All the best!

chrisss395 today at 12:00 AM

I love your idea and have wondered why saving and indexing browser-visited pages was not already being done. Does this handle large amounts of local files, for example 10-20TB across file types like PowerPoint, Excel, Word, and PDF?

blackqueeriroh today at 1:04 AM

Oh thank god. There used to be several tools like this, and they slowly went away. I’ve been wanting this to return.

kristianpaul yesterday at 11:12 PM

Is this similar to fastcrw?

nickpsecurity today at 3:38 AM

I was considering paying someone to build something like this at some point. With two jobs, I eventually had no time to even organize what I find. It's just piles of links in text files.

Can I give your software a huge list of URLs to index? Or do I need to use browser automation to open them a few at a time so that it caches and indexes them?

operatingthetan yesterday at 10:46 PM

I installed this a while back and honestly I almost never touch it. It turns out that, for me, searching my history doesn't really replace a search engine at all. The built-in extractor list is pretty limited, and adding new ones seems like too much of an ordeal for me to bother.
