Hacker News

New arXiv policy: 1-year ban for hallucinated references

507 points by gjuggler · yesterday at 8:39 PM · 173 comments

Comments

btown · yesterday at 9:12 PM

> The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue.

This is incredibly good for science. arXiv is free, but it's a privilege not a right!

I'm not seeing this clearly listed on https://info.arxiv.org/help/policies/index.html so it's possible this is planned but not live yet - or perhaps I'm not digging deeply enough?

As a certain doctor once said: the whole point of the doomsday machine is lost if you keep it a secret!

noobermin · today at 12:07 AM

Seeing the usual LLM hypers angrily replying to this on Twitter is such a tell. Just like the comments on the LLM-poisoning articles, some people just can't accept that others don't like LLMs, and they get upset when you put any hindrance in the way of rapid adoption.

mks_shuffle · today at 2:22 AM

While this is certainly a welcome step, I hope more work is done to fix the underlying problem: it is not easy to create correct BibTeX entries for cited papers. Citations for any given paper can come from a wide range of journals with various publishers, conferences, and preprints, and the same paper can be available from multiple sources with varying details, e.g. arXiv and the conference website.

Tools like Zotero have made it significantly easier to extract citations from publication webpages, but I still find issues with the extracted BibTeX. Author names and titles are usually extracted correctly, but I have to manually verify that details like publication venue, year, volume number, page numbers, and URL are extracted correctly and render correctly in LaTeX. Different publications also use different citation styles. The lack of an easy, unified way to get consistent citation data unfortunately encourages shortcuts like AI-generated citations. I am not sure whether hallucinated citations are being generated in the main manuscript or in a separate BibTeX file, so I may be a bit off in my understanding.
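The field inconsistencies described above can at least be caught mechanically before submission. A minimal illustrative sketch in Python (the per-entry-type field requirements here are simplified assumptions for illustration, not any journal's or tool's actual schema):

```python
# Sketch: flag extracted BibTeX entries that are missing fields commonly
# needed for a complete citation. Field requirements are illustrative
# assumptions, not any style guide's actual rules.

REQUIRED_FIELDS = {
    "article": {"author", "title", "journal", "year"},
    "inproceedings": {"author", "title", "booktitle", "year"},
    "misc": {"author", "title", "year"},  # e.g. arXiv preprints
}

def missing_fields(entry_type: str, fields: dict) -> set:
    """Return the required fields that are absent or empty in an entry."""
    required = REQUIRED_FIELDS.get(entry_type.lower(), {"author", "title"})
    return {f for f in required if not fields.get(f, "").strip()}

entry = {"author": "Doe, J.", "title": "A Result", "year": ""}
print(sorted(missing_fields("article", entry)))  # → ['journal', 'year']
```

Running a pass like this over a `.bib` file (after parsing it with a proper BibTeX parser) would surface the incomplete entries the commenter describes having to hunt for by hand.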

az226 · today at 4:57 AM

Next, for AI papers: a reproducibility requirement. So much code and so many details are fudged, and papers cannot be reproduced. Authors run the training with a different config, or different data, etc., to make their mechanism or intervention seem better.

rgmerk · today at 12:29 AM

Good.

If it’s not worth your time to check the output of your LLM carefully, it’s not worth my time to read it.

scirob · today at 9:53 AM

Great. It's so easy to automate checking refs that it's super bad not to check.

Foivos · today at 9:51 AM

I just wish that anyone who is against this policy would be forced to review a paper that turns out to be unedited AI slop. Reviewers are expert volunteers who do it for free. It is incredibly frustrating to spend four hours reading a paper, trying your best to make sense of what the authors are proving, only to realize that it is hallucinations.

Authors should value reviewers' time more highly than their own. Including AI nonsense in your paper is insulting.

soraminazuki · today at 10:02 AM

It's not unexpected, but still sad, to see so many comments opposing even the smallest step against low-effort fraud in academic publishing. Is this what hacker culture has been reduced to in the slop era? Open hostility toward science and engineering?

thatjoeoverthr · today at 8:52 AM

No mercy to brain slugs.

ElenaDaibunny · today at 3:22 AM

How will they detect hallucinated refs at scale? Manual spot checks? Automated DOI verification? The policy seems right, but enforcement is the hard part.
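One cheap first layer of the automated DOI verification mentioned above is a purely syntactic check, before spending any network calls. A hedged sketch (the pattern is adapted from Crossref's commonly recommended DOI regex; a real pipeline would additionally resolve each DOI, e.g. against the Crossref API, and compare the returned metadata to the cited title):

```python
import re

# Sketch of a first-pass DOI filter: reject strings that are not even
# shaped like a DOI before resolving them over the network.
# Pattern adapted from Crossref's commonly recommended regex.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$", re.IGNORECASE)

def plausible_doi(doi: str) -> bool:
    """True if the string is syntactically plausible as a DOI.
    Says nothing about whether it resolves or matches the cited work."""
    return bool(DOI_RE.match(doi.strip()))

print(plausible_doi("10.48550/arXiv.2301.00001"))  # → True
print(plausible_doi("not-a-doi"))                  # → False
```

Hallucinated references often fail at the next stage instead: the DOI is well-formed but resolves to nothing, or to a different paper, which is why the resolution-and-compare step would still be needed.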

MinimalAction · yesterday at 11:18 PM

There needs to be careful vetting before such adverse actions. If somebody adds a co-author's name and pushes a submission without their express permission, does everyone named get the ban? I agree that, implemented the right way, this is good.

cyclecycle · today at 7:05 AM

This has become such a problem in scholarly publishing that we've spent the past couple of years building a citation-checking business: https://groundedai.company/

druub · today at 4:41 AM

What are reasonable alternatives to arXiv? It has become increasingly slow. TechRxiv?

bigfishrunning · yesterday at 9:04 PM

Good; academic literature is in crisis because of all of the slop. Forcing some consequences on easily-detectable hallucinations can only be a good thing

squirrelon · yesterday at 10:43 PM

Had a colleague submit a paper with literal AI slop left in the text, got hit with a nasty revision request. Check your drafts before you submit, people. The reviewers will find it.

az226 · today at 4:58 AM

Hurray!

jimmygrapes · today at 3:14 AM

No comments here yet seem to address the "reputable" condition. On what criteria is a venue judged reputable?

nullc · today at 1:01 AM

It's been pretty eye opening watching Craig Wright (of bitcoin fakery fame) flooding out LLM generated 'academic' papers and even having some of them accepted.

He'd be toast if SSRN adopted a similar policy.

jszymborski · today at 12:14 AM

It should be harsher, in my opinion.

Ozzie-D · today at 10:07 AM

[flagged]

jeremie_strand · today at 3:04 AM

[flagged]

hyunwoo222 · today at 1:18 AM

[flagged]

random3 · yesterday at 9:36 PM

It seems like a good idea to ban cheating, but how hard is it to validate references, especially in new reasoning/agent contexts?

The deeper question is whether legitimate AI-generated results are allowed at all. As an extreme test case: suppose a proof of the Riemann Hypothesis is autonomously generated end to end and formally verified. Is it allowed or not?
