Alright, having tried this with Japanese, I can say it's frustrating. As a near-complete beginner, I found the tutor kept speaking in Japanese even when I repeatedly said "sorry I don't understand". When I asked it to start in English and then gradually transition to Japanese, it lasted all of one sentence in English before switching back. I can totally see how this would be useful conversation practice if you've progressed that far, but I'd love to have something for even earlier beginners. Also, since many of the models you use are natively multimodal, this could readily integrate visual media for discussion and grounding.
Also, for the transcription it would be great to get pure romaji to start with!
Yes, I can understand and empathize with your experience. Quite honestly, our current focus is more on B1+ students. That 0 -> 1 bootstrapping of the language is much better served by traditional material that is less talking- and listening-heavy.