Hacker News

LoganDark, yesterday at 3:08 PM

> The core of what it does seems ridiculously simple. So much so that when I initially considered this I didn’t think it would even be close to being performant. I mean, an interpreted language and all of these movements on large shapes at once! It doesn’t feel like it would be a good idea!!! I guess I was wrong? Doing this on a chunk sized `16 128 16` is pretty fast and I’m able to fly around the map at high speeds (TBA: me demoing this live in a presentation). This kind of boggled my mind and broke my intuitions of what I considered good patterns, at least in the domain of APL.

I wonder what makes it so fast? Is it similar to how GHC can fuse/inline ops and such?


Replies

i_don_t_know, yesterday at 7:54 PM

IIRC, the interpreter recognizes common idioms (sequences of two to four primitives) and has fused implementations for them that avoid building intermediate results.
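A minimal sketch of the idiom idea, assuming a toy interpreter rather than any real APL implementation's internals: an idiom table keyed on a (reduction, primitive) pair sends `+/ a = b` ("count the matches") to a single-pass kernel, while unrecognized pairs fall back to building the intermediate boolean vector first. The names `DYADIC`, `REDUCE`, `IDIOMS`, `count_equal`, and `evaluate` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Vec = List[int]

# Hypothetical dyadic primitives: each materializes a full intermediate vector.
DYADIC: Dict[str, Callable[[Vec, Vec], Vec]] = {
    "=": lambda a, b: [int(x == y) for x, y in zip(a, b)],
    "<": lambda a, b: [int(x < y) for x, y in zip(a, b)],
}

# Hypothetical reductions applied to that intermediate vector.
REDUCE: Dict[str, Callable[[Vec], int]] = {
    "+/": sum,
    "⌈/": max,
}


def count_equal(a: Vec, b: Vec) -> int:
    """Fused kernel for the idiom '+/ a = b': one pass, no intermediate vector."""
    n = 0
    for x, y in zip(a, b):
        if x == y:
            n += 1
    return n


# Hypothetical idiom table: (reduction, primitive) -> fused kernel.
IDIOMS: Dict[Tuple[str, str], Callable[[Vec, Vec], int]] = {
    ("+/", "="): count_equal,
}


def evaluate(reduction: str, prim: str, a: Vec, b: Vec) -> Tuple[int, bool]:
    """Evaluate 'reduction (a prim b)'; the flag says whether an idiom was used."""
    kernel = IDIOMS.get((reduction, prim))
    if kernel is not None:
        return kernel(a, b), True
    intermediate = DYADIC[prim](a, b)  # the temporary a fused idiom avoids
    return REDUCE[reduction](intermediate), False
```

For example, `evaluate("+/", "=", [1, 2, 3, 4], [1, 0, 3, 0])` takes the fused path and returns `(2, True)`, while `evaluate("+/", "<", ...)` has no entry in the table and builds the boolean vector before summing. Both paths must agree on the result; the fused one just skips the allocation.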

It can also avoid creating intermediate results for quite a few other operations. For example, reverse, transpose, etc. only change how the array is traversed (the order in which elements are accessed), so you can reuse the original data and just change the indexing information. That’s known as beating; deferring work so that intermediates are never built at all is known as drag-along.
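A minimal sketch of the index-rewriting idea, assuming a toy 2-D array descriptor of (shared buffer, shape, strides, offset). The `View` class and `from_rows` helper are hypothetical, not how any particular APL interpreter stores arrays. Reverse and transpose return a new descriptor over the same buffer, so no elements are copied.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class View:
    """A 2-D array described by a shared buffer plus indexing information."""
    data: List[int]
    shape: Tuple[int, int]
    strides: Tuple[int, int]
    offset: int = 0

    def __getitem__(self, idx: Tuple[int, int]) -> int:
        i, j = idx
        return self.data[self.offset + i * self.strides[0] + j * self.strides[1]]

    def transpose(self) -> "View":
        # Swap the axes: only shape and strides change, the buffer is shared.
        return View(self.data, (self.shape[1], self.shape[0]),
                    (self.strides[1], self.strides[0]), self.offset)

    def reverse(self) -> "View":
        # Reverse along the last axis (like APL's ⌽): start at the last
        # column and step backwards by negating the column stride.
        _, cols = self.shape
        return View(self.data, self.shape,
                    (self.strides[0], -self.strides[1]),
                    self.offset + (cols - 1) * self.strides[1])

    def to_lists(self) -> List[List[int]]:
        # Materialize the view only when the elements are actually needed.
        rows, cols = self.shape
        return [[self[i, j] for j in range(cols)] for i in range(rows)]


def from_rows(rows: List[List[int]]) -> View:
    """Pack a row-major nested list into a single flat buffer."""
    r, c = len(rows), len(rows[0])
    return View([x for row in rows for x in row], (r, c), (c, 1))
```

With `m = from_rows([[1, 2, 3], [4, 5, 6]])`, `m.reverse().to_lists()` gives `[[3, 2, 1], [6, 5, 4]]` and `m.reverse().transpose().to_lists()` gives `[[3, 6], [2, 5], [1, 4]]`, yet every view still points at the same six-element buffer. NumPy does something similar: `a.T` and `a[:, ::-1]` return views rather than copies.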