Hacker News

Lessons from Building a Translator App That Beats Google Translate and DeepL

60 points by msephton, last Tuesday at 11:08 PM | 33 comments

Comments

omneity, yesterday at 8:16 PM

Related: I built a translation app[0]* for language pairs that are not traditionally supported by Google Translate or DeepL (Moroccan Arabic paired with a dozen other major languages). I also trained a custom translation model for it, a BART encoder/decoder derivative, using data I collected, curated, and corrected from scratch, and then built a continuous training pipeline that takes people's corrections into account.

Happy to answer questions if anyone is interested in building translation models for low-resource languages, without being a GPT wrapper. A great resource for this is Marian-NMT[1] and the Opus & Tatoeba projects (beware of data quality).

0: https://tarjamli.ma

* Unfortunately not functioning right now due to inference costs for the model, but I plan to launch it sometime soon.

1: https://marian-nmt.github.io

kyrra, today at 4:30 AM

Googler, opinions are my own.

My one issue is that the author does not try to think about the ways Google Translate is better. It's all about model size. Google Translate models are around 20 MB when run locally on a phone. That makes them super cheap to run, and they work offline on a phone.

I'm sure Gemini could translate better than Google Translate, but Google is optimizing for speed and compute. It's why they will allow free translation of any webpage in Chrome.

DiscourseFan, yesterday at 7:47 PM

This is a GPT wrapper? GPT is great for general translation, as it is an LLM just like DeepL or Google Translate. However, it is fine-tuned for a different use case than the above. Although, I am a little surprised at how well it functions.

whycome, today at 1:15 AM

The most bizarre part of Google Translate is when it translates a word but gives just one definition when many are possible. When you know a bit about the languages being translated, all the flaws really show up.

izabera, yesterday at 10:27 PM

i don't understand what market there is for such a product. deepl costs $8.74 for 1 million characters, this costs $1.99 for 5000 (in the basic tiers, and the other tiers scale from there). who's willing to pay ~45x more for slightly better formatting?
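The ratio in the comment above can be checked with a quick calculation. This is only a sketch that takes the quoted prices at face value (they are not verified here); the function and constant names are made up for illustration.

```python
# Prices as quoted in the comment; not independently verified.
DEEPL_USD_PER_MILLION_CHARS = 8.74
APP_USD_PER_TIER = 1.99
APP_CHARS_PER_TIER = 5_000


def usd_per_million_chars(price_usd: float, chars: int) -> float:
    """Normalize a price for `chars` characters to a price per one million characters."""
    return price_usd / chars * 1_000_000


# Put both products on the same unit before comparing them.
app_rate = usd_per_million_chars(APP_USD_PER_TIER, APP_CHARS_PER_TIER)
ratio = app_rate / DEEPL_USD_PER_MILLION_CHARS

print(f"app: ${app_rate:.2f} per 1M chars, {ratio:.1f}x DeepL")
```

At the quoted prices the app works out to about $398 per million characters, which is consistent with the "~45x" figure.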

Falimonda, yesterday at 8:20 PM

I'm working on a natural language router system that chooses the optimal model for a given language pair. It uses a combination of RLHF and conventional translation scoring. I envision it soon becoming the cheapest translation service with the highest average quality across languages, by striking a balance between Google Translate's expensive API and the uneven performance of any given cheaper model across different languages.

I'll begin integrating it into my user-facing application for language learners soon: www.abal.ai
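To make the routing idea concrete, here is one possible shape for a per-language-pair router. It is a minimal sketch and not the commenter's system: the model names, quality scores, prices, and the "cheapest model above a quality threshold" rule are all assumptions for illustration, and it does not attempt the RLHF or translation-scoring parts that would produce the scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model: str                    # hypothetical model name
    quality: float                # hypothetical 0..1 quality score for one language pair
    usd_per_million_chars: float  # hypothetical price


# Hypothetical data; a real system would fill this from evaluation results.
SCORES = {
    ("en", "de"): [Candidate("model-a", 0.90, 20.0), Candidate("model-b", 0.88, 4.0)],
    ("en", "ar"): [Candidate("model-a", 0.85, 20.0), Candidate("model-b", 0.60, 4.0)],
}


def route(source: str, target: str, min_quality: float = 0.8) -> Candidate:
    """Pick the cheapest candidate that meets `min_quality` for the language pair.

    Falls back to the highest-quality candidate when none meets the threshold.
    """
    candidates = SCORES.get((source, target))
    if not candidates:
        raise KeyError(f"no candidates for language pair {source}->{target}")
    good_enough = [c for c in candidates if c.quality >= min_quality]
    if good_enough:
        return min(good_enough, key=lambda c: c.usd_per_million_chars)
    return max(candidates, key=lambda c: c.quality)


print(route("en", "de").model)
print(route("en", "ar").model)
```

With this made-up table, the cheaper model is chosen where it scores well enough, and the more expensive one only where the cheaper model falls below the threshold.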

dostick, yesterday at 7:54 PM

So basically, if you don't know your market, don't develop for it. There are still no good posts about building apps that have an LLM backend. How do you protect against prompt attacks?

joshdavham, yesterday at 7:53 PM

Thanks for posting! This was a fun little read. Also, it's always great to see more people using Svelte.

gitroom, yesterday at 10:11 PM

Gotta respect the grind you put into collecting and fixing your training data by hand; that's no joke. Do you think focusing on smaller languages gives an edge over just chasing the big ones everyone uses?