Yes, on Athena we process much larger CSV files, but the cost is crazy. We also have ORC and Parquet files for other datasets, which we process with Spark on EMR. I really want to get off those distributed analytics engines whenever possible.
I still have to think about partitioning. Both Spark and Athena had issues with partitioning by received date; they end up scanning way too much data.
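To make the partitioning idea concrete, here is a rough sketch of what I have in mind, not something we run today. It assumes a Hive-style layout with one directory per day (`received_date=YYYY-MM-DD/`), so a reader only has to list and open the directories for the dates it actually needs. The bucket name, the `received_date` key, and both helper functions are hypothetical.

```python
from datetime import date, timedelta


def partition_prefix(base: str, received: date) -> str:
    """Build a Hive-style partition prefix for one received date.

    The `received_date=` key is an assumed naming convention.
    """
    return f"{base.rstrip('/')}/received_date={received.isoformat()}/"


def prefixes_for_range(base: str, start: date, end: date) -> list[str]:
    """Return the prefixes covering start..end inclusive.

    Reading only these prefixes avoids touching the rest of the dataset.
    """
    if end < start:
        raise ValueError("end date must not be before start date")
    days = (end - start).days
    return [partition_prefix(base, start + timedelta(days=i)) for i in range(days + 1)]


# Hypothetical bucket and path, for illustration only.
prefixes = prefixes_for_range("s3://example-bucket/events", date(2024, 1, 30), date(2024, 2, 1))
for p in prefixes:
    print(p)
```

Whether this helps depends on queries actually filtering on the partition column; a filter on some other date field would still scan everything.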