TorchTPU: Running PyTorch Natively on TPUs at Google Scale

151 points • by mji • yesterday at 8:53 PM • 12 comments

Comments

So Google is basically admitting PyTorch/XLA on TPUs didn't work; TorchTPU looks like them rebuilding what should have worked on day one. It's hard to run production ML on a toolchain engineers can't trust, no matter how fast the silicon is.

yu3zhou4 • today at 8:32 AM

Adding support for new hardware to PyTorch is actually quite convenient. I did that with WebGPU using the same PrivateUse1 mechanism TorchTPU used. Every hardware backend has its own slot and identifier, and when you want to add support for a new one without merging it into PyTorch, PrivateUse1 works essentially like a plug-in slot.

https://github.com/jmaczan/torch-webgpu
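
A rough way to picture the plug-in slot idea is a dispatch table with one reserved key that an out-of-tree backend claims under its own name. The sketch below is a toy model in plain Python, not PyTorch's implementation: `DispatchTable`, `rename_reserved_backend`, `register`, and `call` are hypothetical names made up for illustration. In PyTorch itself, the renaming step corresponds to `torch.utils.rename_privateuse1_backend`, and kernels are registered from C++ or Python against the PrivateUse1 dispatch key.

```python
# Toy model of a reserved "plug-in slot" for out-of-tree backends.
# NOT PyTorch's implementation; every name here is hypothetical.


class DispatchTable:
    """Maps (op name, backend key) to a kernel, with one reserved external key."""

    RESERVED_KEY = "PrivateUse1"

    def __init__(self):
        self._kernels = {}  # (op, backend key) -> callable
        self._aliases = {}  # user-facing device name -> backend key

    def rename_reserved_backend(self, name):
        # The reserved slot can be claimed only once in this toy model,
        # so two external backends cannot share one process.
        if self._aliases:
            raise RuntimeError("reserved slot already claimed")
        self._aliases[name] = self.RESERVED_KEY

    def _key(self, device):
        # Built-in devices map to themselves; the renamed device maps to the slot.
        return self._aliases.get(device, device)

    def register(self, op, device, kernel):
        self._kernels[(op, self._key(device))] = kernel

    def call(self, op, device, *args):
        try:
            kernel = self._kernels[(op, self._key(device))]
        except KeyError:
            raise NotImplementedError(f"{op} has no kernel for {device}") from None
        return kernel(*args)


table = DispatchTable()

# An in-tree backend with its own dedicated key.
table.register("add", "cpu", lambda a, b: a + b)

# An external backend claims the reserved slot under a friendly name,
# then registers its kernels without touching the table's source.
table.rename_reserved_backend("webgpu")
table.register("add", "webgpu", lambda a, b: ("webgpu_add", a, b))

result_cpu = table.call("add", "cpu", 1, 2)
result_ext = table.call("add", "webgpu", 1, 2)
```

The point of the design is that the external backend never edits the table itself: it only fills in kernels for a key that was set aside in advance, and any op it has not registered fails loudly instead of silently falling back.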

isusmelj • today at 10:16 AM

Is it just me, or does it feel like everyone now uses AI to write any kind of blog?

These parts here somehow trigger me:

- Enter TorchTPU. As an engineering team, our mandate was to build a stack that leads with usability, portability, and excellent performance.

- Engineering the TorchTPU Stack: The Technical Reality

- Eager First: Flexibility Without Compromise

- The breakthrough, however, is our fused eager mode.

- The Road Ahead: 2026 and Beyond

I have mixed feelings about this. On one hand, we all seem to be using the same tools and converging to the same style. On the other hand, if we all use the same models with the same system prompts, we might lose a lot of creativity and diversity in online content.

in-silico • yesterday at 11:30 PM

This is great to see.

I trained some research models using the existing PyTorch/XLA on TPUs, and it was a mess of undocumented behavior and bugs (silently hanging after 8 hours of training!).

If anyone is trying to use PyTorch on TPU before TorchTPU is released, you can check out the training pipeline that I ended up building to support my research: https://github.com/aklein4/easy-torch-tpu

Reubend • yesterday at 11:50 PM

Sounds good, but my main question is: is this a fork, or a new backend they're building in (like MPS)?

immanuwell • today at 8:32 AM

The pitch basically boils down to 'just change one line and it works', which sounds too good to be true, but if they actually pull it off at 100k-chip scale, that's genuinely a big deal.

MASNeo • today at 6:34 AM

Now all that’s missing is an actual chip that can be purchased. Any ideas?

noracists • today at 3:17 AM

Very excited for this.
