The tokenizer changes seem to indicate that 4.7 isn't just a checkpoint but rather a model trained mostly from scratch, right?
You can change tokenizers without a complete retraining from scratch: the input embedding and output layers can be remapped to the new vocabulary (reusing rows for shared tokens) and the model then continued-trained to adapt.
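As a minimal sketch of that remapping idea (the function name and the mean-plus-noise initialization for unseen tokens are illustrative heuristics, not anything confirmed about how 4.7 was built):

```python
import numpy as np

def remap_embeddings(old_emb, old_vocab, new_vocab, seed=0):
    """Build an embedding matrix for a new vocabulary: reuse trained rows
    for tokens shared with the old vocabulary, and initialize new tokens
    near the mean of the old embeddings (a common warm-start heuristic)."""
    rng = np.random.default_rng(seed)
    dim = old_emb.shape[1]
    mean, std = old_emb.mean(axis=0), old_emb.std(axis=0)
    new_emb = np.empty((len(new_vocab), dim), dtype=old_emb.dtype)
    for tok, i in new_vocab.items():
        if tok in old_vocab:
            new_emb[i] = old_emb[old_vocab[tok]]          # copy trained row
        else:
            new_emb[i] = mean + std * rng.standard_normal(dim)  # warm init
    return new_emb

# Toy example: old vocab of 3 tokens, new vocab of 4, two tokens shared.
old_vocab = {"the": 0, "cat": 1, "sat": 2}
new_vocab = {"the": 0, "cat": 1, "dog": 2, "ran": 3}
old_emb = np.arange(12, dtype=np.float64).reshape(3, 4)
new_emb = remap_embeddings(old_emb, old_vocab, new_vocab)
```

After a remap like this, the shared-token rows carry over unchanged, so only the new rows (and the model's adaptation to them) need training, which is far cheaper than starting from random weights.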