logoalt Hacker News

syntaxpryesterday at 4:50 AM1 replyview on HN

Not training. Transposing rows/columns of matrices to group 128 parameters with similar (shared) scale factor. Qwen-3 model.


Replies

MarsIronPIyesterday at 2:10 PM

I'm not sure what you mean. Could you please elaborate?