logoalt Hacker News

Lessons from Building a Translator App That Beats Google Translate and DeepL

70 pointsby msephton04/29/202536 commentsview on HN

Comments

omneity05/03/2025

Related: I built a translation app[0]* for language pairs that are not traditionally supported by Google Translate or DeepL (Moroccan Arabic with a dozen of other major languages), and also trained a custom translation model for it - a BART encoder/decoder derivative, using data I collected, curated and corrected from scratch, and then I built a continuous training pipeline for it taking people's corrections into account.

Happy to answer questions if anyone is interested in building translation models for low-resource languages, without being a GPT wrapper. A great resource for this is Marian-NMT[1] and the Opus & Tatoeba projects (beware of data quality).

0: https://tarjamli.ma

* Unfortunately not functioning right now due to inference costs for the model, but I plan to launch it sometime soon.

1: https://marian-nmt.github.io

show 6 replies
DiscourseFan05/03/2025

This is a GPT wrapper? GPT is great for general translation, as it is an LLM just like DeepL or Google Translate. However, it is fine-tuned for a different use case than the above. Although, I am a little surprised at how well it functions.

show 2 replies
kyrra05/04/2025

Googler, opinions are my own.

My one issue is that the author does not try to think about ways Google translate is better. It's all about model size. Google Translate models are around 20mb when run local on a phone. That makes them super cheap to run and can be done offline on a phone.

I'm sure Gemini could translate better than Google Translate, but Google is optimizing for speed and compute. It's why they will allow free translation of any webpage in Chrome.

show 1 reply
izabera05/03/2025

i don't understand what market there is for such a product. deepl costs $8.74 for 1 million characters, this costs $1.99 for 5000 (in the basic tiers, and the other tiers scale from there). who's willing to pay ~45x more for slightly better formatting?

show 2 replies
whycome05/04/2025

The most bizarre part of google translate is when it translates a word but gives just one definition when it’s possible to have many. When you know a bit about the translating languages all the flaws really show up.

Falimonda05/03/2025

I'm working on a natural language router system that chooses the optimal model for a given language pair. It uses a combination of RLHF and conventional translation scoring. I envision it to soon become the cheapest translation service providing the highest average quality across languages by striking a balance between Google Translate's expensive API and any given, cheaper, random model's performance across different languages.

I'll beginning to integrate it into my user-facing application for language learners soon: www.abal.ai

dostick05/03/2025

So basically, if you don’t know your market, don’t develop it. There’s still no good posts about building apps that have LLM backend. How do you protect against prompt attacks?

show 1 reply
butz05/04/2025

In this day and age of GetAI everything, is it still possible to find a simple, open dictionary of word pairs for different languages?

joshdavham05/03/2025

Thanks for posting! This was a fun little read. Also, it's always great to see more people using Svelte.

gitroom05/03/2025

Gotta respect the grind you put into collecting and fixing your training data by hand - that's no joke. you think focusing on smaller languages gives an edge over just chasing big ones everyone uses?