logoalt Hacker News

bhadasstoday at 3:03 AM0 repliesview on HN

but isn't this like a lot of other CS-related "gradient descent"?

when someone invents a new scheduling algorithm or a new concurrent data structure, it's usually based on hunches and empirical results (benchmarks) too. nobody sits down and mathematically proves their new linux scheduler is optimal before shipping it. they test it against representative workloads and see if there is uplift.

we understand transformer architectures at the same theoretical level we understand most complex systems. we know the principles, we have solid intuitions about why certain things work, but the emergent behavior of any sufficiently complex system isn't fully predictable from first principles.

that's true of operating systems, distributed databases, and most software above a certain complexity threshold.