Yes, you can use constrained decoding like logit masking to force all invalid tokens in the vocabulary to -inf, and effectively be removed from selection. I believe llama.cpp exposes this by accepting a formatted grammar.