Hacker News

jamesblonde, yesterday at 1:27 PM (15 replies)

I gave a talk at PyData Berlin on how to build your own TikTok recommendation algorithm. TikTok's personalized recommendation engine is the world's most valuable AI. It is TikTok's differentiation. It updates recommendations within 1 second of you clicking, that is, at human-perceivable latency. If your AI recommender has poor feature freshness, it will be perceived as slow rather than intelligent, no matter how good its recommendations are.

TikTok's recommender is partly built on European technology (Apache Flink for real-time feature computation), along with Kafka and distributed model-training infrastructure. The Monolith paper is misleading in suggesting that 'online training' is the key. It is not. The key is that your clicks become available as features for predictions in under 1 second. You need a per-event stream processing architecture for this, such as Flink (Feldera, an incremental streaming engine, would be my modern choice).

* https://www.youtube.com/watch?v=skZ1HcF7AsM

* Monolith paper - https://arxiv.org/pdf/2209.07663
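A minimal, single-process Python sketch of what "clicks available as features in under a second" means in practice: every click event updates a per-user sliding-window count immediately, so the next prediction request already sees it. This is not Flink or Feldera code. The class and field names, the 10-minute window, and the assumption that each user's events arrive in timestamp order are all illustrative.

```python
from collections import Counter, defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    user_id: str
    category: str
    ts: float  # event time in seconds


class RecentClickFeatures:
    """Per-user click counts over a sliding time window, updated on every event.

    Illustrative only: assumes each user's events arrive in timestamp order.
    """

    def __init__(self, window_s: float = 600.0) -> None:
        self.window_s = window_s
        self._events = defaultdict(deque)    # user_id -> deque of (ts, category)
        self._counts = defaultdict(Counter)  # user_id -> Counter of categories

    def _evict(self, user_id: str, now: float) -> None:
        # Drop clicks that have fallen out of the (now - window_s, now] window.
        events, counts = self._events[user_id], self._counts[user_id]
        while events and events[0][0] <= now - self.window_s:
            _, category = events.popleft()
            counts[category] -= 1
            if counts[category] == 0:
                del counts[category]

    def on_event(self, ev: ClickEvent) -> None:
        # Update state per event: the very next prediction sees this click.
        self._events[ev.user_id].append((ev.ts, ev.category))
        self._counts[ev.user_id][ev.category] += 1
        self._evict(ev.user_id, ev.ts)

    def features(self, user_id: str, now: float) -> dict[str, int]:
        # Read path used at prediction time.
        self._evict(user_id, now)
        return dict(self._counts[user_id])


if __name__ == "__main__":
    store = RecentClickFeatures(window_s=600.0)
    store.on_event(ClickEvent("u1", "dogs", 0.0))
    store.on_event(ClickEvent("u1", "dogs", 30.0))
    store.on_event(ClickEvent("u1", "cooking", 60.0))
    print(store.features("u1", now=61.0))
```

In a real deployment this state would typically be partitioned by user across a stream processor; the sketch only illustrates that features are updated per event rather than per batch.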


Replies

eddd-ddde, yesterday at 7:53 PM

I have to say, it is _extremely_ impressive: when a TikTok I watched reminds me of some other TikTok, I search for a very loose description of it, and 95% of the time the first result is the one I wanted.

I don't think any other platform has a search feature as good as TikTok's.

dmix, yesterday at 2:17 PM

I noticed YouTube Shorts also seems to update the feed based on how long you watched the last video. If you're scrolling quickly and then stop to watch a dog video long enough, the next one is likely to be another dog video.
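A toy sketch of the behavior described above, under a hypothetical scheme that is not YouTube's actual algorithm: each view updates a per-user category affinity using the fraction of the video watched, quick skips count as a small negative signal, older signals decay, and the next candidates are reranked by affinity. The class name, decay factor, skip threshold, and skip penalty are arbitrary assumptions.

```python
from collections import defaultdict


class DwellAffinity:
    """Hypothetical per-user category affinity driven by watch time."""

    def __init__(self, decay: float = 0.5, skip_threshold: float = 0.2,
                 skip_penalty: float = -0.1) -> None:
        self.decay = decay                    # how fast older signals fade
        self.skip_threshold = skip_threshold  # watch fraction below this = skip
        self.skip_penalty = skip_penalty      # signal recorded for a skip
        self.affinity = defaultdict(float)    # category -> score

    def on_view(self, category: str, watched_s: float, length_s: float) -> None:
        if length_s <= 0:
            raise ValueError("length_s must be positive")
        frac = min(watched_s / length_s, 1.0)
        signal = frac if frac >= self.skip_threshold else self.skip_penalty
        # Decay all existing affinities, then credit the category just viewed.
        for cat in self.affinity:
            self.affinity[cat] *= self.decay
        self.affinity[category] += signal

    def rank(self, candidates: list[tuple[str, str]]) -> list[tuple[str, str]]:
        # Candidates are (video_id, category); unseen categories score 0.
        return sorted(candidates, key=lambda c: self.affinity.get(c[1], 0.0),
                      reverse=True)


if __name__ == "__main__":
    user = DwellAffinity()
    user.on_view("cars", watched_s=1.0, length_s=30.0)   # scrolled past quickly
    user.on_view("dogs", watched_s=28.0, length_s=30.0)  # stopped and watched
    print(user.rank([("v1", "cars"), ("v2", "dogs"), ("v3", "news")]))
```

Here the quick skip pushes "cars" slightly below zero and the long watch pushes "dogs" up, so the dog video ranks first. A production ranker would more likely learn such weights than hard-code them.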

lsuresh, yesterday at 5:35 PM

Thanks for the Feldera shoutout Jim.

For anyone else: if you want to try out Feldera and IVM (incremental view maintenance) for feature engineering (it gives you perfect offline-online parity), you can start here: https://docs.feldera.com/use_cases/fraud_detection/

vjerancrnjak, yesterday at 3:09 PM

Flink is too slow for this.

If by features you mean tracking per-user state, that can be done insanely fast without Flink, for example with Redis.

If you're saying they don't have to load data to update the state, I don't see these states being so massive that they require in-memory updates, and even if they do, you could just do in-memory updates without Flink.

Similarly, any consumer will have to deal with batches of users and pipelining.

Flink is just a bottleneck.

If they actually use Flink for this, it's not the moat.

not_ai, yesterday at 9:32 PM

I’m happy to see that Flink is in this stack. I wish Pulsar were in it too, instead of Kafka.

bobek, yesterday at 6:09 PM

It is not only about the recommender, though. These guys [1] seem to be able to react pretty quickly without creating addicts along the way ;(

[1] https://recombee.com

3abiton, yesterday at 6:23 PM

It's interesting how they found that the "lifetime" of a feature is a feature in itself. Meta-features are real.

SpaceManNabs, yesterday at 10:35 PM

Apache Flink is so good. I think Netflix used it heavily in 2018; not sure about now.

miohtama, yesterday at 3:13 PM

TikTok's differentiation is its user base: all the teenagers in the world.

ryanjshaw, yesterday at 2:20 PM

Great insight. Any thoughts on RisingWave?

cactusplant7374, yesterday at 7:25 PM

I thought this was secret information. How long has it been publicly known?
