There's no training code because the author is using an external service for that

yorwba • today at 7:38 AM

There's no training code because the author is using an external service for that: https://docs.primeintellect.ai/hosted-training/getting-start... The reward function is at https://github.com/HarleyCoops/Math-To-Manim/blob/d1c412d22a... and the environment is iterative LLM prompting.

The idea is apparently that a model that is bad at fixing its own mistakes might get better at it if you train it on exactly that task using reinforcement learning.
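
To make "iterative LLM prompting as the environment" concrete, here is a rough sketch of what such an episode could look like. This is not the repo's code: `run_episode`, `generate`, `check`, and the reward shaping (earlier fixes score higher) are all hypothetical names and assumptions of mine, and the model is stubbed out so the snippet runs on its own.

```python
from typing import Callable, List, Tuple


def run_episode(
    generate: Callable[[str], str],            # hypothetical: prompt -> model output
    check: Callable[[str], Tuple[bool, str]],  # hypothetical: output -> (ok, error text)
    task: str,
    max_turns: int = 3,
) -> Tuple[float, List[str]]:
    """Prompt repeatedly, feeding each failure back in, then score the episode."""
    prompt = task
    attempts: List[str] = []
    for turn in range(max_turns):
        output = generate(prompt)
        attempts.append(output)
        ok, error = check(output)
        if ok:
            # Assumed reward shaping: fixing the mistake in fewer turns scores higher.
            return 1.0 - turn / max_turns, attempts
        # Show the model its own failed attempt and the error it produced.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{output}\n\n"
            f"Error:\n{error}\n\nFix it."
        )
    return 0.0, attempts


# Stub model and checker, standing in for a real LLM and a real validator.
def fake_model(prompt: str) -> str:
    return "good" if "Error:" in prompt else "bad"


def fake_check(output: str) -> Tuple[bool, str]:
    return (output == "good", "" if output == "good" else "boom")


reward, attempts = run_episode(fake_model, fake_check, "do it", max_turns=4)
print(reward, attempts)
```

An RL trainer would then use the returned reward to update the model, so that attempts which repair the earlier error become more likely.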
