philippemnoel • last Friday at 4:49 PM

That's true. For this reason, most modern search engines support language-aware stemming and tokenization. Popular tokenizers for CJK languages include Lindera and Jieba.

We (ParadeDB) use a search library called Tantivy under the hood, which supports stemming in Finnish, Danish and many other languages: https://docs.paradedb.com/documentation/token-filters/stemmi...
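To make "language-aware" concrete, here is a toy Python sketch. It is not how Tantivy, Lindera, or Jieba work internally: the suffix lists and the `stem` and `tokenize` helpers are hypothetical and heavily simplified. Real engines use Snowball-style stemmers and dictionary-based segmenters. The point is only that the analysis rules are chosen per language.

```python
# Toy illustration only. The suffix lists below are hand-picked assumptions,
# not the rules used by any real stemmer.
SUFFIXES = {
    "english": ["ing", "es", "s"],
    "finnish": ["ssa", "ssä", "lla", "llä", "t"],
}

# Languages written without spaces between words need a different tokenizer.
CJK_LANGUAGES = {"chinese", "japanese", "korean"}


def stem(word: str, language: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES.get(language, []):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word  # no rule for this language or no suffix matched


def tokenize(text: str, language: str) -> list[str]:
    """Pick a tokenization strategy based on the declared language."""
    if language in CJK_LANGUAGES:
        # Crude fallback: overlapping character bigrams. Dictionary-based
        # tokenizers segment into real words instead.
        chars = [ch for ch in text if not ch.isspace()]
        if len(chars) < 2:
            return chars
        return [a + b for a, b in zip(chars, chars[1:])]
    # Whitespace-delimited languages: lowercase, split, then stem each word.
    return [stem(word, language) for word in text.lower().split()]


print(tokenize("talossa talot", "finnish"))  # two inflections of "talo" (house)
print(tokenize("東京都", "japanese"))
```

With these toy rules, both Finnish inflections collapse to the same term, so a query for one form matches the other. The bigram output for the Japanese input also shows why dictionary-based tokenizers matter: naive bigrams yield both 東京 (Tokyo) and 京都 (Kyoto) from a string that only refers to Tokyo.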
