Hacker News

IsTom, yesterday at 12:59 PM

I'm not following the whole LLM space, but

> the compute needed to perform matrix multiplications goes up as the cube of their size,

are they really not even using Strassen's algorithm?
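For reference, here is a minimal pure-Python sketch of Strassen's recursion. It assumes square matrices whose size is a power of two. The helper names (naive_matmul, split, strassen) and the default cutoff of 32 are illustrative choices, not taken from any library. Real implementations tune the cutoff and handle odd sizes. Each level replaces the eight half-size block products of the schoolbook method with seven, which is where the N^log2(7) exponent comes from.

```python
import random


def naive_matmul(A, B):
    """Schoolbook O(n^3) product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split an n x n matrix (n even) into its four n/2 x n/2 quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=32):
    """Strassen product for n x n matrices, n a power of two (assumed).

    Below `cutoff` it falls back to the schoolbook product; the value 32
    is an arbitrary illustrative choice.
    """
    n = len(A)
    if n <= cutoff:
        return naive_matmul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive half-size products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    # Recombine the quadrants of C using only additions and subtractions.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


if __name__ == "__main__":
    random.seed(0)
    n = 64
    A = [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]
    B = [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]
    # Integer entries make both products exact, so they must agree exactly.
    assert strassen(A, B, cutoff=8) == naive_matmul(A, B)
```

Integer entries keep both products exact, so the check can compare them with ==. With floating-point entries the two results would differ by rounding error.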


Replies

jcranmer, yesterday at 2:35 PM

I'm not aware of any major BLAS library that uses Strassen's algorithm. There are a few reasons for this. One of the big ones is that Strassen has much worse numerical stability than traditional matrix multiplication. Another is that for very large dense matrices, which are multiplied with various flavors of parallel algorithms, Strassen vastly increases the communication overhead. Not to mention that the largest matrices probably use sparse matrix arithmetic anyway, which is a whole different set of algorithms.
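Here is a contrived illustration of the numerical point. It applies one level of Strassen's formulas to scalar 2x2 matrices. The helper names and the input values are hypothetical and were chosen to make the effect visible. The value 1e-20 is absorbed when it is added to 1.0 in the first product, and the later recombination cancels large terms. As a result, Strassen returns 0.0 for an entry that the schoolbook product computes exactly. This reflects the fact that Strassen's error bounds are normwise rather than componentwise. On well-scaled matrices the difference is usually far less dramatic.

```python
def naive_2x2(A, B):
    """Schoolbook 2x2 product: each entry is a plain dot product."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)]
            for i in range(2)]


def strassen_2x2(A, B):
    """One level of Strassen's formulas applied to scalar 2x2 matrices."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)  # 1.0 + 1e-20 rounds to 1.0 here
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    c11 = m1 + m4 - m5 + m7
    c12 = m3 + m5
    c21 = m2 + m4
    c22 = m1 - m2 + m3 + m6  # large terms cancel, the small one is gone
    return [[c11, c12], [c21, c22]]


# Badly scaled input chosen for illustration: A is diagonal with entries
# 1 and 1e-20, B is the identity, so the exact product is A itself.
A = [[1.0, 0.0], [0.0, 1e-20]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(naive_2x2(A, I2))     # [[1.0, 0.0], [0.0, 1e-20]]
print(strassen_2x2(A, I2))  # [[1.0, 0.0], [0.0, 0.0]]
```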

jiggawatts, yesterday at 2:00 PM

AFAIK the best practical matrix multiplication algorithms scale as roughly N^2.8 (Strassen's exponent is log2(7) ≈ 2.807), which is close enough to N^3 not to matter for the point I'm trying to make.
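Here is a rough back-of-the-envelope check of how slowly the exponent gap pays off. It counts scalar multiplications and additions only. It assumes n is a power of two and that Strassen recurses all the way down to 1x1 blocks, which no real implementation does. It also ignores memory traffic and parallel communication. The function names are hypothetical.

```python
import math


def naive_ops(n):
    """Scalar multiplications plus additions for the schoolbook n x n product."""
    return n**3 + n**2 * (n - 1)


def strassen_ops(n):
    """Same count for Strassen recursing down to 1 x 1 blocks (n a power of two).

    Each level does 7 half-size products plus 18 additions/subtractions
    of (n/2) x (n/2) blocks.
    """
    if n == 1:
        return 1
    h = n // 2
    return 7 * strassen_ops(h) + 18 * h * h


def first_power_of_two_where_strassen_wins():
    """Smallest power of two n where the full-recursion count beats schoolbook."""
    n = 1
    while strassen_ops(n) >= naive_ops(n):
        n *= 2
    return n


print(math.log2(7))  # Strassen's exponent, about 2.807
print(first_power_of_two_where_strassen_wins())
```

Under these assumptions, the fully recursive version only does fewer total operations from n = 1024 upward, and at n = 1024 it saves only about 8%. Real libraries stop recursing at a cutoff, which shifts these numbers, so treat this as an illustration of the trend rather than a benchmark.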