Super-low-latency inference might be helpful in applications like quant trading. However, in an era where a frontier model becomes outdated within 6 months, I wonder how useful it can be.
Also, quant trading probably cares more about embedding the content than about generating output tokens.