My approach assumes that good, non-commercial webpages will be similar to other good webpages. Likewise, slop, SEO spam, and affiliate content will resemble other such content.
So a similarity-based graph/network of webpages should cluster good with good, bad with bad. That is what I've seen so far, anyway.
With that, you just need to enter the graph in the right place, that is, start from a page you already know is good, which is fairly trivial.
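
A minimal sketch of the idea, under loud assumptions: similarity here is plain bag-of-words cosine similarity, which is only a stand-in for whatever measure is really used, and the page names, texts, `threshold` value, and function names are all made up for illustration. It builds a similarity graph over a handful of toy pages, then walks it outward from a chosen entry point.

```python
import math
import re
from collections import Counter, deque

# Hypothetical toy corpus: two personal-blog pages and two spam pages.
PAGES = {
    "blog/sourdough": "notes on my sourdough starter and how the bread turned out this week",
    "blog/rye": "notes on my rye bread and how the starter behaved this week",
    "spam/vpn": "best vpn deals buy now top ten vpn discount coupon",
    "spam/mattress": "best mattress deals buy now top ten mattress discount coupon",
}


def vectorize(text):
    """Lowercased word counts (an assumed, deliberately crude feature set)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(pages, threshold=0.3):
    """Connect every pair of pages whose similarity reaches the threshold."""
    vectors = {url: vectorize(text) for url, text in pages.items()}
    graph = {url: set() for url in pages}
    urls = list(pages)
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            if cosine(vectors[u], vectors[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def explore(graph, seed):
    """Breadth-first walk: every page reachable from the entry point."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


graph = build_graph(PAGES)
print(sorted(explore(graph, "blog/sourdough")))  # ['blog/rye', 'blog/sourdough']
```

On this toy data the two blog pages link to each other and the two spam pages link to each other, with no edges between the groups, so entering at a blog page never reaches the spam. Real pages will be far messier, and the threshold would need tuning.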