This is why local AI is so important
That doesn't solve this particular problem. Your local model was trained on Reddit comments written by bots.
Local AI will have the bias that existed at the time of its training, which is different from no bias. For stuff that needs to be current, a local LLM would need to search the net regardless.
How do you make sure that the model you run locally is not tainted? Is there even a way to confirm this without providing the complete training set?
It's less compromised, but it's still basing the answer on compromised queries. This is why I pay for independent reviews (e.g. Which), where their incentives are more aligned with yours.
Not if the models come from Google. The ads will be implicit in the model. "X is better than Y and Z" would be easy to add to the training set.
How does that help if it's using search? You get whatever the search engine outputs.
Local AI models pull in search results just like ChatGPT does ...
And they are trained on web data just like any other model...
It's already being trained on "public" (ethical or otherwise) data. So, it already has ingested that kind of "optimization" during pre-training and training.
I don't think you can fine-tune your way out of it.