aynyc, 05/06/2025

Yes, on Athena we process much larger CSV files, but the cost is crazy. We also have ORC and Parquet files for other datasets, which we process with EMR Spark. I really want to get off those distributed analytics engines whenever possible.

I have to think about partitioning. Spark and Athena both had issues with partitioning by received date: they end up scanning way too much data.
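
For illustration only, here is a rough sketch of what date partitioning is supposed to buy: with a Hive-style layout, a reader only needs to touch the directories for the days it asks for. The bucket path, the `received_date` column name, and both helper functions are hypothetical, not taken from the actual setup.

```python
from datetime import date, timedelta


def partition_prefix(root: str, day: date) -> str:
    """Build the Hive-style directory prefix for one day of data.

    Assumes a layout like <root>/received_date=YYYY-MM-DD/ (hypothetical).
    """
    return f"{root}/received_date={day.isoformat()}/"


def prefixes_for_range(root: str, start: date, end: date) -> list[str]:
    """List only the partition prefixes covering start..end inclusive.

    A reader that restricts itself to these prefixes skips every other
    day's files instead of scanning the whole dataset.
    """
    if end < start:
        raise ValueError("end must not be before start")
    num_days = (end - start).days + 1
    return [partition_prefix(root, start + timedelta(days=i)) for i in range(num_days)]


if __name__ == "__main__":
    # Hypothetical bucket; three days of data means three prefixes to read.
    for prefix in prefixes_for_range(
        "s3://example-bucket/events", date(2025, 5, 1), date(2025, 5, 3)
    ):
        print(prefix)
```

Whether an engine actually prunes down to those prefixes depends on the query filtering directly on the partition column, so this shows the intended behaviour rather than a diagnosis of the over-scanning described above.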