humblyCrazy • yesterday at 2:04 PM • 0 replies

How is MTP different from Medusa heads? Also, does this mean the model comes "natively" with speculative decoding? That is, if I serve this model in vLLM, should its throughput be higher because it already does MTP and can therefore take advantage of speculative decoding?
