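You can also try speculative decoding with the E2B model as the draft model. Under some conditions it can give a decent speedup, typically when the larger model accepts most of the drafted tokens and the draft model is much cheaper to run.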
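To get a feel for when it pays off, here is a rough back-of-the-envelope sketch. It uses the standard estimate from the speculative decoding literature, which assumes every drafted token is accepted independently with the same probability. The function name `expected_speedup` and the numbers in the loop are illustrative assumptions, not measurements of the E2B model or of any particular runtime.

```python
def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Rough estimate of the speedup from speculative decoding.

    alpha      -- assumed probability that the target model accepts a drafted token
                  (treated as independent and identical for every token)
    gamma      -- number of tokens the draft model proposes per verification step
    cost_ratio -- time of one draft-model step divided by one target-model step
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if cost_ratio < 0.0:
        raise ValueError("cost_ratio must be non-negative")

    # Expected tokens produced per verification step: 1 + alpha + ... + alpha**gamma
    if alpha == 1.0:
        tokens_per_step = float(gamma + 1)
    else:
        tokens_per_step = (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)

    # Cost of one step, in units of a single target-model forward pass:
    # gamma draft passes plus one target pass that verifies them all.
    step_cost = gamma * cost_ratio + 1.0

    return tokens_per_step / step_cost


if __name__ == "__main__":
    # Hypothetical settings: (acceptance rate, draft length, relative draft cost)
    for alpha, gamma, cost_ratio in [(0.8, 4, 0.1), (0.5, 4, 0.1), (0.3, 4, 0.5)]:
        s = expected_speedup(alpha, gamma, cost_ratio)
        print(f"alpha={alpha} gamma={gamma} cost_ratio={cost_ratio} -> {s:.2f}x")
```

A value below 1 means the estimate predicts a slowdown: the draft model is either wrong too often or too expensive relative to the main model. Real numbers also depend on the runtime, batch size, and memory bandwidth, so benchmark on your own setup.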