The issue with having a "no answer" option is that you implicitly add a decision problem into your test that depends on the "cost" of answering wrong.
Specifically, your model now has two "correct" classes p(class=y|x) and p(class=⊥|x). This makes the results ambiguous. The way you resolve this is by adding in a cost of misclassification (l_err) and a cost of not answering (l_⊥):
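$$
L(y, y') =
\begin{cases}
0 & \text{if } y = y' \\
l_{\text{err}} & \text{if } y \neq y' \text{ and } y' \neq \bot \\
l_{\bot} & \text{if } y' = \bot
\end{cases}
$$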
You can then estimate the expected loss over your dataset. Notice that this now gives you additional degrees of freedom: depending on how expensive answering wrong is compared to not answering at all, the same predictor might look really bad or really good.
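To make this concrete, here is a minimal sketch of the average loss on a toy dataset. Everything in it is made up for illustration: the labels, the two hypothetical models `preds_a` and `preds_b`, the helper names, and the use of `None` to stand in for ⊥. Model A always answers and is wrong twice; model B is never wrong but declines to answer four times.

```python
ABSTAIN = None  # stands in for the "no answer" symbol ⊥ (an assumption of this sketch)

def loss(y_true, y_pred, l_err, l_abstain):
    """Piecewise loss: 0 if correct, l_err if wrong, l_abstain if no answer."""
    if y_pred is ABSTAIN:
        return l_abstain
    return 0.0 if y_pred == y_true else l_err

def mean_loss(labels, preds, l_err, l_abstain):
    """Average loss over a dataset (empirical estimate of the expected loss)."""
    total = sum(loss(y, p, l_err, l_abstain) for y, p in zip(labels, preds))
    return total / len(labels)

# Toy data, invented for illustration only.
labels  = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
preds_a = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0]              # always answers, 2 wrong
preds_b = [0, 1, 0, 1, 0, 1, ABSTAIN, ABSTAIN, ABSTAIN, ABSTAIN]  # 4 abstentions, 0 wrong

for l_abstain in (0.25, 0.75):
    a = mean_loss(labels, preds_a, l_err=1.0, l_abstain=l_abstain)
    b = mean_loss(labels, preds_b, l_err=1.0, l_abstain=l_abstain)
    better = "A" if a < b else "B"
    print(f"l_abstain={l_abstain}: A={a:.2f}, B={b:.2f}, better={better}")
```

With l_err fixed at 1, model A scores 0.2 regardless of the abstention cost, while model B scores 0.4 times that cost. So B wins at l_⊥ = 0.25 and A wins at l_⊥ = 0.75, with the ranking flipping at l_⊥ = 0.5. Neither model changed; only the cost did.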
This means when benchmarking with a "no answer" action, you are often not actually benchmarking whether the model works well or not, but rather how well the model _happens_ to agree with the relative cost of wrong answers versus non-answers that you (implicitly) chose.
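One way to see where that implicit choice hides: under the loss above, and assuming the model reports a calibrated probability p for its top class, answering costs l_err · (1 − p) in expectation while not answering costs l_⊥. Answering is therefore the cheaper action only when p ≥ 1 − l_⊥ / l_err. The sketch below uses hypothetical helper names and made-up confidences to show how the same outputs lead to different answer/abstain decisions under different costs.

```python
def abstain_threshold(l_err, l_abstain):
    """Confidence below which not answering has lower expected loss.

    Assumes calibrated probabilities: answering costs l_err * (1 - p) in
    expectation, abstaining costs l_abstain.
    """
    return 1.0 - l_abstain / l_err

def decide(confidences, l_err, l_abstain):
    """Return True (answer) or False (abstain) for each top-class confidence."""
    t = abstain_threshold(l_err, l_abstain)
    return [p >= t for p in confidences]

# Made-up top-class confidences for four inputs.
confidences = [0.9, 0.7, 0.6, 0.55]

print(decide(confidences, l_err=1.0, l_abstain=0.25))  # threshold 0.75
print(decide(confidences, l_err=1.0, l_abstain=0.5))   # threshold 0.5
```

With l_⊥ = 0.25 the model should answer only the first input; with l_⊥ = 0.5 it should answer all four. A model whose abstention behaviour was tuned for one of these settings will look miscalibrated when scored under the other.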