Obligatory DuckDB solution:
> duckdb -s "COPY (SELECT url[20:] as url, date, count(*) as c FROM read_csv('data.csv', columns = { 'url': 'VARCHAR', 'date': 'DATE' }) GROUP BY url, date) TO 'output.json' (ARRAY)"
Takes about 8 seconds on my M1 MacBook. The JSON isn't in the exact format the challenge expects, but reshaping it wouldn't dominate the execution time.
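For illustration, a minimal reshaping sketch: DuckDB's `(ARRAY)` output is a flat JSON array of `{url, date, c}` rows, and the nested `{url: {date: count}}` target shape below is an assumption about the challenge's expected format, not taken from its spec.

```python
import json
from collections import defaultdict

def reshape(rows):
    # Fold flat rows into a nested {url: {date: count}} mapping.
    # NOTE: this target shape is a guess at the challenge's format.
    out = defaultdict(dict)
    for row in rows:
        out[row["url"]][row["date"]] = row["c"]
    return dict(out)

# Sample input mimicking DuckDB's array output:
rows = json.loads(
    '[{"url": "/blog/a", "date": "2024-01-01", "c": 3},'
    ' {"url": "/blog/a", "date": "2024-01-02", "c": 1}]'
)
print(json.dumps(reshape(rows)))
```

A single O(n) pass like this over ~100M already-aggregated groups is cheap next to the scan itself.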
A log in one of the PRs claims a 5.4s running time on some Mac:
https://github.com/tempestphp/100-million-row-challenge/pull...