Amazon S3 Select enables SQL queries directly on CSV, JSON, or Apache Parquet objects, allowing retrieval of filtered data subsets to reduce latency and costs
S3 Select is, very sadly, deprecated. It also supported HTTP RANGE headers! But they've killed it and I'll never forgive them :)
Still, it's nbd. You can cache a billion Parquet header/footers on disk/ memory and get 90% of the performance (or better tbh).
S3 Select is, very sadly, deprecated. It also supported HTTP RANGE headers! But they've killed it and I'll never forgive them :)
Still, it's nbd. You can cache a billion Parquet header/footers on disk/ memory and get 90% of the performance (or better tbh).