Hacker News

TZubiri, today at 4:45 PM

Today I was thinking: if I start a company in the LLM tooling space, I would write into the company mission, in the incorporation documents, that client data will not be used for training.

The temptation and the value are too great, and the opt-in/opt-out consent thing ends up being a fuckery where the company tries to trick the user into letting it look at the data, presumably because it is selling the product at a loss and needs an alternative revenue model.

Just make it impossible from the get-go. The fine print would be that data can be shared out-of-band explicitly, in an email or by copy-pasting it into a support chatbox, but there would be no mechanism for us to read the data from the databases, much less from the client.

I don't mean it would be an air-tight mechanism like Signal or ProtonMail. If a court order asked us to produce client info, we would still reserve the right to produce the data, but only exceptionally, and definitely not for training models.


Replies

OkayPhysicist, today at 4:52 PM

More companies need to make, for lack of a better term, "oaths" of what they won't do as a company. My pitch on it is to tie it to financial penalties the company agrees to pay, somewhere in the "enough to incentivize a significant portion of our user base to sue us" territory, such that it would be financial suicide to violate them.
