I was thinking of something like LLaDA, which uses a Transformer to predict the tokens that a forward masking process has masked out:
https://arxiv.org/abs/2502.09992
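
To make the idea concrete, here is a rough sketch of just the forward (masking) side as I understand it from the paper: each token is independently replaced by a mask symbol with probability t, and the Transformer is then trained to predict the original tokens at the masked positions. The names `MASK`, `forward_mask`, and `masked_positions` are made up for illustration and are not from the LLaDA code; the model itself is left out entirely.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen only for this sketch


def forward_mask(tokens, t, rng=None):
    """Mask each token independently with probability t (0 <= t <= 1)."""
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so t=0 masks nothing and t=1 masks everything.
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_positions(noisy):
    """Indices the model would be asked to predict."""
    return [i for i, tok in enumerate(noisy) if tok == MASK]


if __name__ == "__main__":
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    noisy = forward_mask(sentence, t=0.5, rng=random.Random(0))
    print(noisy)
    print(masked_positions(noisy))
```

At t close to 0 almost nothing is masked, and at t = 1 the whole sequence is masked, which is where generation would start from before the model fills tokens back in.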