Hacker News

sgt, yesterday at 8:48 PM

I think they will not train on the full 2TB, but will use it as the data lake to start from and then apply a more targeted approach.


Replies

winddude, yesterday at 9:15 PM

If you read the article, 2 PB is available as flash storage in the data pipeline, used to dedupe, clean, normalize, etc., for training from 60 PB of raw data.