It's easy to hand-curate a list of 5,000 "small web" URLs. The problem is scaling. For example, Kagi has a hand-curated "small web" filter, but I never use it because far more interesting and relevant "small web" websites are outside the filter than in it. The same is true for most other lists curated by individual folks. They're neat, but also sort of useless because they are too small: 95% of the things you're looking for are not there.
The question is: how do you take it to a million? There are probably at least that many good personal and non-commercial websites out there, but the moment you open it up, you invite spam and slop.
I mainly use Kagi Small Web as the starting point of my day, with my morning coffee. Especially now that categories have been added, I always find something worth reading. The size is not a problem here, since I usually only browse 20-30 sites this way.
My approach operates under the assumption that good, non-commercial webpages will be similar to other good webpages. Slop, SEO spam, and affiliate content will resemble other such content.
So a similarity-based graph/network of webpages should cluster good with good, bad with bad. That is what I've seen so far, anyway.
With that in place, you just need to enter the graph at the right points, which is fairly trivial: a small set of hand-picked pages is enough to anchor the good cluster.
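To make the idea concrete, here is a minimal sketch of seeded label propagation over a k-nearest-neighbor similarity graph. It assumes each page is already represented as an embedding vector (how those embeddings are produced is left open); the function names, parameters, and the toy data are all illustrative, not the actual pipeline described above.

```python
import numpy as np

def propagate_labels(embeddings, seed_good, seed_bad, k=3, iters=20):
    """Score pages by spreading trust from a few hand-picked seed pages
    across a k-nearest-neighbor cosine-similarity graph (a sketch of the
    clustering idea, not a production crawler)."""
    # Normalize rows so dot products become cosine similarities.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-edges

    n = len(X)
    scores = np.zeros(n)
    scores[list(seed_good)] = 1.0   # known-good entry points
    scores[list(seed_bad)] = -1.0   # known-bad entry points

    # Each page repeatedly takes the mean score of its k most similar
    # pages; seed scores are clamped so the entry points stay fixed.
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    for _ in range(iters):
        scores = scores[neighbors].mean(axis=1)
        scores[list(seed_good)] = 1.0
        scores[list(seed_bad)] = -1.0
    return scores

# Toy demo: two tight clusters of synthetic "page embeddings".
rng = np.random.default_rng(0)
good_pages = rng.normal(loc=[1.0, 0.0], scale=0.1, size=(10, 2))
bad_pages = rng.normal(loc=[0.0, 1.0], scale=0.1, size=(10, 2))
emb = np.vstack([good_pages, bad_pages])

scores = propagate_labels(emb, seed_good={0}, seed_bad={10})
```

Because neighbors are chosen by similarity, trust from one seed never leaks into the other cluster here; the interesting (and hard) part on real web data is how clean that separation actually is.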
> The question is how do you take it to a million?
Do you need to take it to a million in the same place? Is that still "small"?
Why not have 2,000 hand-curated directories instead?