It should be obvious that LLMs would be able to beat this with ease. Not sure why this paper deliberately skipped comparing against current LLMs.
Example of LLMs doing well on similar tasks: https://arxiv.org/abs/2602.16800