Hacker News

AI assistance when contributing to the Linux kernel

467 points by hmokiguess yesterday at 6:35 PM | 349 comments

Comments

qsort yesterday at 8:00 PM

Basically the rules are that you can use AI, but you take full responsibility for your commits and code must satisfy the license.

That's... refreshingly normal? Surely something most people acting in good faith can get behind.

ninjagoo yesterday at 9:44 PM

  > Signed-Off ...
  > The human submitter is responsible for:
    > Reviewing all AI-generated code
    > Ensuring compliance with licensing requirements
    > Adding their own Signed-off-by tag to certify the DCO
    > Taking full responsibility for the contribution

  > Attribution: ... Contributions should include an Assisted-by tag in the following format:

Responsibility assigned to where it should lie. Expected no less from Torvalds, the progenitor of Linux and Git. No demagoguery, no b*.

I am sure that this was reviewed by attorneys before being published as policy, because of the copyright implications.

Hopefully this will set the trend and provide definitive guidance for the many devs who saw the utility of AI assistance but also the acrimony from some quarters, which left them sitting on the fence.
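As a rough sketch of what checking for the two tags quoted above could look like: the snippet below scans the last paragraph of a commit message for `Signed-off-by` and `Assisted-by` lines. The function names, the `ai_assisted` flag, and the sample commit are all made up for illustration; this is not the kernel's own tooling, and real trailer parsing (for example by git itself) handles more cases.

```python
import re

# Matches one "Key: value" trailer line, e.g. "Signed-off-by: Jane <jane@example.org>".
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.+)$")


def parse_trailers(message):
    """Return (key, value) pairs found in the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trailers.append((match.group(1), match.group(2).strip()))
    return trailers


def missing_tags(message, ai_assisted):
    """List the tags the message lacks (hypothetical check, not an official one)."""
    present = {key.lower() for key, _ in parse_trailers(message)}
    required = ["Signed-off-by"]
    if ai_assisted:
        required.append("Assisted-by")
    return [tag for tag in required if tag.lower() not in present]


# Hypothetical commit message, for illustration only.
example = """subsys: fix off-by-one in buffer length check

Explain the change here.

Assisted-by: ExampleAgent:example-model-1
Signed-off-by: Jane Developer <jane@example.org>
"""

print(missing_tags(example, ai_assisted=True))  # []
```

A check like this only confirms the tags are present. It says nothing about whether the human actually reviewed the code, which is the part the policy puts on the submitter.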

sheepscreek today at 2:01 PM

This is the right way forward for open-source. Correct attribution - by tightening the connection between agents and the humans behind them, and putting the onus on the human to vet the agent output. Thank you Linus.

ipython yesterday at 8:00 PM

Glad to see the common-sense rule that only humans can be held accountable for code generated by AI agents.

oytis today at 9:51 AM

How is one supposed to ensure license compliance while using LLMs, which do not (and cannot) attribute the sources that contributed to a specific response?

agentultra today at 5:47 PM

How do the reviewers feel about this? Hopefully it won't result in them being overwhelmed with PRs. There used to be a kind of "natural limit" to error rates in our code given how much we could produce at once and our risk tolerance for approving changes. Given empirical studies on informal code review which demonstrate how ineffective it is at preventing errors... it seems like we're gearing up to aim a fire-hose of code at people who are ill-prepared to review code at these new volumes.

How long until people get exhausted with the new volume of code review and start "trusting" the LLMs more without sufficient review, I wonder?

I don't envy Linus in his position... hopefully this approach will work out well for the team.

rao-v today at 5:47 PM

A phenomenon I cannot explain is that this simple, clean statement of a fairly obvious approach to AI assistance somehow took this long, and took Linus, to state so cleanly.

Are there other popular repos with effectively this policy stated as neatly that I’ve missed?

sarchertech yesterday at 8:31 PM

This does nothing to shield Linux from responsibility for infringing code.

This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.

It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.

newsoftheday yesterday at 8:16 PM

> All code must be compatible with GPL-2.0-only

How can you guarantee that will happen when AI has been trained on a world full of multiple licenses and even closed-source material without the permission of the copyright owners... I confirmed that with several AIs just now.

MyUltiDev today at 4:20 PM

Reading this right after the Sashiko endorsement is a bit jarring. Greg KH greenlit an AI reviewer running on every patch a couple weeks back, and that direction actually seems to be helping, while here the conversation is still about whether contributors will take responsibility for AI code they submit. That feels like the harder side to police. The bugs that land kernel teams in trouble are race conditions, locking, lifetimes, the things models are most confidently wrong about. I have seen agents produce code that compiles cleanly, reads fine on a Friday review, then deadlocks under contention three weeks later. Is this contributor policy supposed to be the long term answer, or a placeholder until something Sashiko-shaped does the heavy filtering on the maintainer side too?

HarHarVeryFunny today at 1:47 PM

It's a sane policy - human is responsible for what they contribute, regardless of what tools they use in the development process.

However, the gotcha here seems to be that the developer has to say that the code is compatible with the GPL, which seems an impossible ask, since the AI models have presumably been trained on all the code they can find on the internet regardless of licensing, and we know they are capable of "regenerating" (regurgitating) stuff they were trained on with high fidelity.

dataviz1000 yesterday at 8:06 PM

This is discussed in the Linus vs Linus interview, "Building the PERFECT Linux PC with Linus Torvalds". [0]

[0] https://youtu.be/mfv0V1SxbNA?si=CBnnesr4nCJLuB9D&t=2003

WhyNotHugo today at 3:48 PM

Weird that they're co-opting the "Assisted-by:" trailer to tag software and model being used. This trailer was previously used to tag someone else who has assisted in the commit in some way. Now it has two distinct usages.

The typical trailer for this is "AI-assistant:".
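To make the "two distinct usages" concrete, here is a minimal sketch of how a script might try to tell them apart. It assumes a person is written as `Name <email>` and a tool as `AGENT_NAME:MODEL_VERSION` followed by optional tool names; both patterns and the function name are assumptions for illustration, and real-world values may not fit either shape.

```python
import re

# Assumed shape for a human helper: "Some Name <user@host>".
PERSON_RE = re.compile(r"^[^<>]+<[^<>@\s]+@[^<>@\s]+>$")
# Assumed shape for a tool: "AGENT:MODEL" plus optional whitespace-separated tools.
TOOL_RE = re.compile(r"^[^\s:<>]+:[^\s<>]+(\s+\S+)*$")


def classify_assisted_by(value):
    """Guess whether an Assisted-by value names a person or a tool."""
    value = value.strip()
    if PERSON_RE.match(value):
        return "person"
    if TOOL_RE.match(value):
        return "tool"
    return "unknown"


# Hypothetical values, for illustration only.
print(classify_assisted_by("Jane Developer <jane@example.org>"))
print(classify_assisted_by("ExampleAgent:example-model-1 sparse"))
```

Needing a heuristic like this at all is arguably the point: a dedicated trailer would make the distinction explicit instead of inferred.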

aprentic today at 3:38 PM

I like this. It's an inversion of the old adage, "a poor craftsman blames his tools" and the corollary, "use the right tool for the job" (because a good craftsman chooses the appropriate tool).

You don't get to bang on a screw and blame the hammer.

KronisLV today at 1:19 PM

This is actually a pretty nice idea:

  Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]

I feel like a lot of people will have an ideological opposition to AI, but that would lead to people sometimes submitting AI generated code with no attribution and just lying about it.

At the same time, I feel bad for all the people that have to deal with low quality AI slop submissions, in any project out there.

The rules for projects that allow AI submissions might as well state: "You need to spend at least ~10 iterations of model X review agents and 10 USD of tokens on reviewing AI changes before they are allowed to be considered for inclusion."

(I realize that sounds insane, but in my experience iterated review, even by the same Opus model, can help catch bugs in the code. I feel like next-token prediction in and of itself is quite error-prone.)
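For what it's worth, the tag format above is easy to pick apart mechanically. A minimal sketch, assuming the square brackets just mean "optional" and that tools are listed as whitespace-separated names (the parser also strips literal brackets in case someone writes them). The `AssistedBy` type, the function name, and the agent and model names are invented for illustration.

```python
from typing import List, NamedTuple


class AssistedBy(NamedTuple):
    agent: str
    model: str
    tools: List[str]


def parse_assisted_by(value):
    """Split 'AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]' into its parts."""
    fields = value.split()
    if not fields or ":" not in fields[0]:
        raise ValueError(f"expected AGENT_NAME:MODEL_VERSION, got {value!r}")
    agent, model = fields[0].split(":", 1)
    if not agent or not model:
        raise ValueError(f"empty agent or model in {value!r}")
    # Assumption: brackets mark optional fields; drop them if written literally.
    tools = [field.strip("[]") for field in fields[1:]]
    return AssistedBy(agent, model, tools)


# Hypothetical agent and model names, for illustration only.
parsed = parse_assisted_by("ExampleAgent:example-model-1 coccinelle sparse")
print(parsed.agent, parsed.model, parsed.tools)
```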

KaiLetov today at 5:34 AM

The policy makes sense as a liability shield, but it doesn't address the actual problem, which is review bandwidth. A human signs off on AI-generated code they don't fully understand, the patch looks fine, it gets merged. Six months later someone finds a subtle bug in an edge case no reviewer would've caught because the code was "too clean."

dec0dedab0de yesterday at 8:19 PM

> All code must be compatible with GPL-2.0-only

Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?

feverzsj today at 4:58 AM

Linux is funded by all these big companies. Linus couldn't block AI pushes from them forever.

themafia yesterday at 9:58 PM

> All contributions must comply with the kernel's licensing requirements:

I just don't think that's realistically achievable. Unless the models themselves can introspect on the code and detect any potential license violations.

If you get hit with a copyright violation in this scheme I'd be afraid that they're going to hammer you for negligence of this obvious issue.

KhayaliY yesterday at 10:46 PM

We've seen in the past, for instance in the world of compliance, that if companies/governments want something done or make a mistake, they just have a designated person act as scapegoat.

So what's preventing lawyers/companies having a batch of people they use as scapegoats, should something go wrong?

zxexz today at 5:45 AM

I like this. It's just saying you have responsibility for the tools you wield. It's concise.

Side note, I'm not sure why I feel weird about having the string "Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]" in the kernel docs source :D. Mostly joking. But if the Linux kernel has it now, I guess it's the inflection point for...something.

deadbabe today at 1:08 PM

How can we automate the disclosure of what AI agent was used in a PR and the extent of code? Would be nice to also have an audit of prompts used, as that could also be considered “code”.
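One small piece of this could be automated from the commit messages alone. A minimal sketch, assuming you already have the messages as strings (however you extract them from the repository) and that tool-style tags start with `AGENT_NAME:MODEL_VERSION`; the function name and sample messages are made up. It counts commits per agent and does not cover the extent of code or the prompts.

```python
import re
from collections import Counter

# Captures the first field of each Assisted-by line, assumed to be AGENT:MODEL.
# Note: a person-style value ("Name <email>") would yield just the first name.
ASSISTED_BY_RE = re.compile(r"^Assisted-by:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def tally_agents(messages):
    """Count commits per AGENT_NAME:MODEL_VERSION across commit messages."""
    counts = Counter()
    for message in messages:
        # A set, so a commit that repeats the same tag is counted once.
        for agent in set(ASSISTED_BY_RE.findall(message)):
            counts[agent] += 1
    return counts


# Hypothetical commit messages, for illustration only.
messages = [
    "fix a\n\nAssisted-by: ExampleAgent:model-1 sparse\nSigned-off-by: J <j@example.org>\n",
    "fix b\n\nSigned-off-by: J <j@example.org>\n",
    "fix c\n\nAssisted-by: ExampleAgent:model-1\nAssisted-by: OtherAgent:model-2\n",
]
for agent, count in sorted(tally_agents(messages).items()):
    print(agent, count)
```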

bharat1010 today at 4:13 AM

Honestly kind of surprised they went this route -- just 'you own it, you're responsible for it' is such a clean answer to what feels like an endlessly complicated debate.

lowsong yesterday at 9:06 PM

At least it'll make it easy to audit and replace it all in a few years.

martin-t yesterday at 8:25 PM

This feels like the OSS community is giving up.

LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent) which has licenses almost always requiring attribution and very often other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].

The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which:

1) reproduces patterns and interpolates between patterns of training data while not always being verbatim copies

2) serves as a heuristic when searching the solution-space which is further guided by deterministic tools such as compilers, linters, etc. - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.

I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking[2] even among people who understand how the models operate, let alone the general public.

Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

[0]: https://news.ycombinator.com/item?id=47356000

[1]: http://prize.hutter1.net/

[2]: https://en.wikipedia.org/wiki/ELIZA_effect

[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...

shevy-java yesterday at 8:28 PM

Fork the kernel!

Humans for humans!

Don't let skynet win!!!

baggy_trough yesterday at 7:54 PM

Sounds sensible.

spwa4 yesterday at 9:16 PM

Why does this file have an extension of .rst? What does that even mean for the fileformat?

bitwize yesterday at 7:47 PM

Good. The BSDs should follow suit. It is unreasonable to expect any developer not to use AI in 2026.

NetOpWibby yesterday at 10:38 PM

inb4 people rage against Linux

gnarlouse today at 6:00 AM

I wonder if this is happening because of Mythos

rwmj today at 7:05 AM

Interesting that coccinelle, sparse, smatch & clang-tidy are included, at least as examples. Those aren't AI coding tools in the normal sense, just regular, deterministic static analysis / code generation tools. But fine, I guess.

We've been using Co-Developed-By: <email> for our AI annotations.