Of course just having the hash of the file wouldn't work, they would have to do something more complicated, a kind of perceptual hash. It's not easy, but I think it is doable.
But then I suspect lots of parts in a closed source project are similar to open source code, so you can't just refuse to analyze any code that contains open source parts, and an attacker could put a few open source files into "fake" closed source code, and presumably the llm would not flag them because the ratio open/closed source code is good. But that would raise the costs for attackers.
Of course just having the hash of the file wouldn't work, they would have to do something more complicated, a kind of perceptual hash. It's not easy, but I think it is doable.
But then I suspect lots of parts in a closed source project are similar to open source code, so you can't just refuse to analyze any code that contains open source parts, and an attacker could put a few open source files into "fake" closed source code, and presumably the llm would not flag them because the ratio open/closed source code is good. But that would raise the costs for attackers.