The Dyck grammar (balanced brackets) is not just a^n b^n: there are several kinds of brackets, and they can be nested and concatenated in any well-formed order.
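To make the difference concrete, here is a minimal Python sketch. It assumes three bracket kinds, (), [] and {}. That choice and the helper names `is_dyck` and `is_anbn` are hypothetical and only for illustration. A stack-based checker accepts any well-nested mix of bracket kinds. The a^n b^n check, with a = "(" and b = ")", only accepts n openers followed by n closers.

```python
# Hypothetical bracket set: maps each closer to its matching opener.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_dyck(s: str) -> bool:
    """Return True if s is a well-nested string over several bracket kinds."""
    stack: list[str] = []
    for ch in s:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False  # character outside the bracket alphabet
    return not stack  # every opener must have been closed


def is_anbn(s: str, a: str = "(", b: str = ")") -> bool:
    """Return True if s is exactly n copies of a followed by n copies of b."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == a * n + b * n
```

For example, "(())" is in both languages. "()()" and "([]{})" are Dyck words but not a^n b^n. "([)]" has matching counts for each kind, but it is not a Dyck word because the brackets cross.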
I cannot find the probability of success in the paper you linked. Is it 100%? I believe it is less than 100%, because LLMs are intrinsically probabilistic machines.
I think Figure 12 shows the probabilities, and it actually does seem to be 100% at temperature 0.1 for certain pretraining runs.
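Both points can hold at once. Low temperature sharpens the next-token distribution so much that a measured success rate can look like 100%, even though the underlying probability is still below 1 at any temperature above 0. The minimal sketch below shows this with standard temperature-scaled softmax. The logits [2.0, 1.0] are made-up numbers, not values from the paper: the correct token is assumed to beat the runner-up by a margin of 1.0.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: index 0 is the "correct" token.
logits = [2.0, 1.0]
for t in (1.0, 0.1):
    p = softmax_with_temperature(logits, t)[0]
    # Chance of sampling the correct token 100 times in a row (made-up length).
    print(f"T={t}: per-token p={p:.6f}, 100-token sequence p={p ** 100:.6f}")
```

At T = 1.0 the correct token gets about 73% of the mass, so a long sequence almost never comes out fully correct. At T = 0.1 the logit gap is effectively multiplied by 10, and the per-token probability becomes 1 / (1 + e^-10). That value is very close to 1 but not equal to it.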