Hey, website creator here! Can you elaborate? Do you mean going through only each user's specific unique words?
That would be interesting. It might be more helpful to people who are interested in finding the unique grammatical words a person has used.
Do note that this comment is written by me (a human, hi! :D), but the following SQL query isn't.
  SELECT
    by AS username,
    sum(length(splitByWhitespace(text))) AS total_words,
    -- Extract words, strip non-letter characters, and count distinct values
    uniqExact(
      arrayJoin(
        arrayFilter(
          x -> x != '',
          arrayMap(x -> lower(replaceRegexpAll(x, '[^a-zA-Z]', '')), splitByWhitespace(text))
        )
      )
    ) AS unique_words,
    -- Calculate diversity: unique words as a percentage of total words
    round((unique_words / total_words) * 100, 2) AS diversity_score
  FROM hackernews_history
  WHERE type = 'comment'
    AND deleted = 0
    AND lower(by) = lower('NooneAtAll3')
  GROUP BY by
  №  username     total_words  unique_words  diversity_score
  1  NooneAtAll3  942372       4752          0.5
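One caveat about these numbers, as far as I understand ClickHouse: arrayJoin in the SELECT list expands each comment into one row per word before aggregation, so the sum() behind total_words is probably counted once per word and comes out inflated, which would drag diversity_score down. Here is a minimal sketch of a variant that avoids the row expansion by using the -Array combinator (uniqExactArray) instead of arrayJoin. I have not run it against this dataset, so treat it as an assumption rather than a verified fix:

```sql
SELECT
  by AS username,
  -- Total whitespace-separated tokens across all of the user's comments
  sum(length(splitByWhitespace(text))) AS total_words,
  -- Distinct cleaned words. uniqExactArray aggregates over the array
  -- elements directly, so rows are not multiplied as with arrayJoin.
  uniqExactArray(
    arrayFilter(
      x -> x != '',
      arrayMap(x -> lower(replaceRegexpAll(x, '[^a-zA-Z]', '')), splitByWhitespace(text))
    )
  ) AS unique_words,
  -- Unique words as a percentage of total words
  round((unique_words / total_words) * 100, 2) AS diversity_score
FROM hackernews_history
WHERE type = 'comment'
  AND deleted = 0
  AND lower(by) = lower('NooneAtAll3')
GROUP BY by
```

Note that total_words here still counts tokens that become empty once non-letters are stripped (a lone "-" or a number, for example), so the two counts are not perfectly comparable.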
Hope it helps :D Have a nice day! (Still written by a human. Alright, I am going to sleep now. I had a lot of fun today with this post and running random SQL queries :D)
Good night! This might be my last comment today before sleep. I will be busy tomorrow, so I might not get to see or run any interesting ideas that people post here.