> but I think it's unfair to say that LLMs _cannot_ recognize or generate CFGs.
They recognize and/or generate finite (<800 chars) grammars in that paper. File sizes on a typical Unix workstation usually follow a bimodal log-normal distribution (a mixture of two log-normal distributions), with heavy tails due to the log-normality [1]. The authors of the paper did not attempt to model that distribution.
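To make the shape concrete, here is a rough sketch of sampling from such a two-mode log-normal mixture. The mixture weight and the (mu, sigma) pairs are made-up illustrative values, not measurements from my machines or from the paper, and `sample_file_sizes` is just a name I picked.

```python
import math
import random
import statistics

def sample_file_sizes(n, rng, p_small=0.7,
                      small=(math.log(2_000), 1.5),
                      large=(math.log(1_000_000), 2.0)):
    """Draw n sizes (in bytes) from a mixture of two log-normals.

    All parameters are hypothetical: p_small is the share of files in
    the "small" mode, and each mode is (mu, sigma) of the underlying
    normal distribution of log(size).
    """
    sizes = []
    for _ in range(n):
        # Pick a mode first, then draw a log-normal size from it.
        mu, sigma = small if rng.random() < p_small else large
        sizes.append(rng.lognormvariate(mu, sigma))
    return sizes

rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = sample_file_sizes(10_000, rng)
print("median bytes:", round(statistics.median(sizes)))
print("max bytes:", round(max(sizes)))
print("share over 800 bytes:", sum(s > 800 for s in sizes) / len(sizes))
```

With parameters anywhere near these, most of the mass sits well above 800 bytes and the largest sample is orders of magnitude above the median, which is the heavy tail in question.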
[1] This was true for my home directories for several years.