gaeld • today at 11:09 AM

Follow-up reading for the most technical and research-minded people here:

Monokernel deep dive (GPU Engineering): http://blog.kog.ai/building-a-single-kernel-latency-optimize...

Delayed Tensor Parallelism (research): http://blog.kog.ai/delayed-tensor-parallelism-for-faster-tra...

To try out the speed on the playground: http://playground.kog.ai

Replies

It looks like DTP is a distinct architectural choice that would require training new models accordingly? So it couldn't just be used to run inference on existing models?
