Hacker News

marginalia_nu today at 5:55 PM (10 replies)

Fwiw I did some more comparisons, looking for words disproportionately favored by noob comments:

    word          noob     new      p-value
    -----------------------------------------
    ai            14.93%   7.87%    0.00016
    actually      12.53%   5.34%    1.1e-05
    code          11.47%   6.04%    0.00081
    real          10.93%   2.95%    2.6e-08
    built         10.93%   2.11%    2.1e-10
    data          8.93%    3.51%    6.1e-05
    tools         7.6%     2.67%    5.5e-05
    agent         7.47%    2.95%    0.00024
    app           7.2%     3.09%    0.00078
    tool          6.8%     1.83%    8.5e-06
    model         6.8%     2.39%    0.00013
    agents        6.67%    2.11%    5.2e-05
    api           6.53%    1.12%    2.7e-07
    building      6.13%    1.54%    1.3e-05
    full          6.0%     1.97%    0.00017
    across        5.87%    1.4%     1.3e-05
    interesting   5.33%    1.54%    0.00014
    answer        5.2%     1.4%     9.6e-05
    simple        4.93%    1.54%    0.00043
    project       4.8%     1.26%    0.00015

Replies

xlii today at 6:14 PM

Actually building full, real AI app project code across simple API data tools helps built model agents answer an interesting tool — an agent.

nazgul17 today at 8:57 PM

Worth pointing out that calculating p-values on a wide set of metrics and selecting for those under $threshold (called p-hacking) is not statistically sound. Who cares, we are not an academic journal, but here's a pill of knowledge.

The idea is that, even when there is no real difference, each test has a ~1/20 chance of producing p < 0.05, so with enough tests you are bound to get false positives. In academia it's definitely not something you'd do, but I think here it's fine.
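
To make the multiple-comparisons point concrete, here is a minimal Python sketch of a Bonferroni correction applied to the p-values from the table in the parent comment. The number of words that were tested is not stated anywhere in the thread, so `num_tests=5000` is purely a hypothetical assumption, and `bonferroni_survivors` is a helper name made up for this sketch.

```python
# p-values copied from the table in the parent comment.
P_VALUES = {
    "ai": 0.00016, "actually": 1.1e-05, "code": 0.00081, "real": 2.6e-08,
    "built": 2.1e-10, "data": 6.1e-05, "tools": 5.5e-05, "agent": 0.00024,
    "app": 0.00078, "tool": 8.5e-06, "model": 0.00013, "agents": 5.2e-05,
    "api": 2.7e-07, "building": 1.3e-05, "full": 0.00017, "across": 1.3e-05,
    "interesting": 0.00014, "answer": 9.6e-05, "simple": 0.00043,
    "project": 0.00015,
}


def bonferroni_survivors(p_values, num_tests, alpha=0.05):
    """Return the words whose p-value stays below alpha / num_tests."""
    threshold = alpha / num_tests
    return sorted(word for word, p in p_values.items() if p < threshold)


# ASSUMPTION: 5000 distinct words were tested (hypothetical, not from the thread).
survivors = bonferroni_survivors(P_VALUES, num_tests=5000)
print(survivors)
```

Under that assumed count the cutoff drops to about 1e-05, and only "api", "built", "real" and "tool" stay below it. If only the 20 listed words had been tested, all of them would pass, so the conclusion depends heavily on how many comparisons were really made.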

@OP have you considered calculating Cohen's effect size? p only tells us that, given the magnitude of the differences and the number of samples, we are "pretty sure" the difference is real. Cohen's `d` tells us how big the difference is on a "standard" scale.
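
Since the table compares proportions rather than means, the usual analogue of Cohen's `d` is Cohen's `h`, which takes the difference of arcsine-transformed proportions. The sketch below swaps in `h` for `d` on that basis and uses only three rows of the table in the parent comment; `cohens_h` and `ROWS` are names made up for illustration.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# (noob, new) proportions copied from the table in the parent comment.
ROWS = {
    "ai": (0.1493, 0.0787),
    "built": (0.1093, 0.0211),
    "api": (0.0653, 0.0112),
}

for word, (noob, new) in ROWS.items():
    print(f"{word:6s} h = {cohens_h(noob, new):.2f}")
```

By Cohen's rough conventions, an `h` near 0.2 is small, 0.5 is medium and 0.8 is large. All three rows here land between roughly 0.2 and 0.4, so the differences look real but modest in size.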

wavemode today at 6:12 PM

It's funny - some months ago I noticed that I use the word "actually" a lot, and started trying to curb it in my writing. Not for any AI-related reason, but because it is almost always a meaningless filler word, and I find that being concise helps get my points across more clearly.

e.g. "The body of the template is parsed, but not actually type-checked until the template is used." -> "but not typechecked until the template is used." The word "actually" here has a pleasant academic tone, but adds no meaning.

RadiozRadioz today at 7:16 PM

The result for "ai" is possibly skewed because it's a far more popular talking point in recent times versus HN's history as a whole.

fix4fun today at 8:51 PM

Thank you marginalia_nu for the article and this comment (word stats).

I got a similar feeling. I'm new here, but I have a feeling that some comments are bot-generated.

Such low p-values are proof that something is going on.

Hypothesis (after your recent word statistics): some bots are "bumping up" AI-related subjects. Maybe some companies using LLM tools want to promote some of their products ;)

marginalia_nu, respect for your work :)

pvtmert today at 7:58 PM

I have mixed feelings about the word "actually", as it is/was one of my favorites. Others like "for instance" and "interestingly" seem to be getting there too...

izucken today at 6:26 PM

You've built an interesting statistic from gathering data across the project. The real answer: ai models and agentic apps make building spam tools more simple than ever. All you actually need is just some trivial api automation code.

daringrain32781 today at 7:19 PM

I wonder what “moat” would be. I see this word way too much from LLMs.

hsbauauvhabzb today at 7:41 PM

Can you elaborate on the column meanings? "Noob" and "new" mean nothing to me.

Imustaskforhelp today at 7:22 PM

Data analyses of HN-related things like this are always so fun to read. Thanks for making this!

I have a quick question: can you please tell me what the age of "new" accounts is in your analysis?

Because I have been called AI sometimes, and that's sometimes because of the "age" of my comments (and I reasonably crash afterwards), but for context, I joined in 2024.

It's 2026 now, so almost gonna be 2 years. So would my account be considered new within your data or not?

Another minor point, but "actually"/"real" seem to me to have risen in usage over 5 times. All of these look like words which would be used to defend AI; I am almost certain that I saw the sentence "Actually, AI hype is real and so on.." at least once, maybe even more than once.

Now for the word "real", I can't say this for certain, so please take it with a grain of salt, but we Gen Z love saying this, and I am certain that I have seen comments on Reddit which just say "real". OpenAI/other models definitely treat Reddit data as some sort of gold, for what it's worth, so much so that they have special arrangements with Reddit.

So to me, it seems that the data has been poisoned with "real". I haven't really observed this phenomenon, but I will try to take a close look at whether ChatGPT is more likely to say "real" or not.

Fwiw, I asked ChatGPT to "defend the position, AI hype sucks" and it responded with the word "real"/"reality" 3 times in total.

(Another side fact: "real" is used so much among Gen Z that I personally sometimes watch the shorts channel https://www.youtube.com/@litteralyme0/shorts which has thousands of videos atp whose title is only "real". This channel is sort of a meme of "ryan gosling literally me" and has its own niche lore with metroman lol)
