Today I was thinking, if I start a company in the LLM tooling space, I would put in the company mission in the incorporation documents that client data will not be used to train.
The temptation and the value is too great, and the opt-in opt-out consent thing ends up being a fuckery where the company tries to trick the user into allowing them to take a look into the data, presumably because they are selling the product at a loss and need an alternative revenue model.
Just make it impossible from the get-go, the fine print would be that the data can be shared off-band explicitly, in an email, or if explicitly copy pasted in a support chatbox, but there would be no mechanism for us to read the data from the databases much less from the client.
I don't mean it would be an air-tight mechanism like Signal or ProtonMail, if a court order would ask us to produce client info, we would still reserve the right to produce the data, but exceptionally, and definitely not for training models.
More companies need to make, for lack of a better term, "oaths" of what they won't do as a company. My pitch on it is to tie it to financial penalties the company agrees to pay, somewhere in the "enough to incentivize a significant portion of our user base to sue us" territory, such that it would be financial suicide to violate them.