Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws.
Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.
Using LLMs for coding was initially only a small step up from basic code completion, and a welcome farewell to Stack Overflow.
I am curious: what was the specific moment that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?
One of our SaaS providers launched an AI-agent-enabled version, and it can follow directions, do tasks, and manipulate data/settings in the software about on par with a below-average person. When I used it I had a sinking feeling: tons of teams and people will be redundant as these agents improve and roll out to other software.
When an LLM managed to find a stack alignment bug in my C compiler from scratch just by looking at objdump output.
We had a company hackathon in the fall of 2023. One of the teams did a project where they pulled a bunch of expense data out of the DB, shoved it into a prompt, and asked ChatGPT to summarize the expenses and give recommendations. They then treated the output as if it were factual, without validating any of the results, and talked about turning it into a customer product.
That was my oh shit moment. As in "oh shit, they think this random text generator can reason and think."
That was pretty much the writing on the wall for me.
Running ComfyUI and some ImageGenAI and realising how you can use it to generate anything from any aspect of pr0n and various fetishes to making up fake news about basically anything. And real enough to convince the masses.
I work with someone who is very AI-forward, high confidence, and very low execution. He has started sending me large PRs of AI slop that he assures me don't need to be reviewed. I quickly find many minor issues on an initial pass of one of the reviews. He gets mad at the team for slowing him down.
He also will paste chat logs with Claude into our team chat. Often Claude will say the same thing I told him but he either doesn't remember or doesn't trust human engineers now.
He has spent months working on agent skills and prompting.
He has not landed anything in 3mo, and has landed nothing useful in ~1 year.
This will be the rest of my career. Working with people in AI psychosis and trying to stay productive.
2 years ago, wrote superfast float -> fixed point string code. That was cool.
Then a while ago, I plugged in everything at the datacenter and one device didn't come up. Plug into the management port, and Claude Code writes a C program to send a particularly crafted packet. Everything comes online.
Beautiful stuff.
Ever since the first Davinci model of GPT-3 I've literally been using LLMs daily. It was an indispensable tool for me from the very beginning and despite 10,000+ hours of usage and research, I still feel like I've barely scratched the surface of what's possible with current GenAI tech.
For me it was last February or so when I started using Opus.
But today I watched a video from Andrej Karpathy on YouTube on how LLMs work and my illusions got completely shattered. Turns out they are a glorified autocomplete. All the engineering actually happens in the harness.
One of my friends got approved for the GPT-3 API about a year before ChatGPT, when they were in their "quiet launch" phase. He made a chatbot that would respond to Discord messages.
I asked it "what do you think about the holocaust?". Its response:
>There is no single answer to this question as opinions on the Holocaust differ greatly. Some people believe that it was a horrific event that should never be forgotten, while others believe that it has been exaggerated and used for political purposes.
And that's when I realized those assholes were training GPT on 4chan and reddit and anything else they can scrape off the web instead of taking responsibility and also that when shit hits the fan they will inevitably find a way to shift the blame onto others for what their philosophical zombie does.
The immediacy with which any vision can be built is amazing. But the minute you let go of the direction and abandon responsibility, it eats you alive. Like a powerful dog.
You are the gen. And you are also the slop.
my AI moment was when i was learning muscles for my YTT and i hacked together a quiz app from my spreadsheet with chatgpt 3.5
damn it was buggy and lots of copy pasting
yeah, i could have coded it myself but i would not have found the time
that was my Eureka moment where I realised this is going to change everything.
I was trying to replace my koi pond pump last weekend and the model numbers on it had washed away. I took a picture of it and it immediately narrowed it down to two models but wasn’t sure if it was the 4500 model or the 2500 model. I asked it how I could determine which one it was. It then asked me to measure the length and said that the 4500 was 11 inches and the 2500 was 9 inches. Mine was 11. It was cool it was able to reason that out and give me something actionable.
It’s kind of a trivial example but there are multiple instances of this per week with the wide variety of things I do around my property.
I could spot numerous bugs in code written recently and less recently, by me or colleagues. I was not angry but grateful and I knew there was no way back!
It was interacting with GPT-4 and it produced an original sentence that existed nowhere I could find. I realized that being able to do that was the "nugget" of intelligence that all improvements since could be built on
There was a viral Medium post that was about LLMs, but then the reveal at the end was that the whole thing was a ChatGPT post. That was my first "wow" moment.
It was on Hacker News... anyone know what I'm talking about?
Nvidia GauGAN and deep-daze amused me immensely at the age of 14 or so. I've had "a man painting a completely red image" saved for a long time.
It is insane how primitive modern inpainting and txt2image make these two projects look.
Being able to make large alterations to ffmpeg even though I'm a 2/10 C programmer.
The most impressive was speeding up the drawtext filter by at least 10x.
I asked it to make a valid MCNP model of a sphere of plutonium and it did!
I'm still waiting for a positive "Oh shit" moment regarding LLMs.
I've had plenty of "Oh shit those people have really lost all ability to think for themselves" moments though.
I'm a terrible cook, but just by using Claude as a tutor I've managed to make 5 different recipes in a row and they all tasted fantastic, restaurant quality.
Mine was very early. Before ChatGPT was publicly released, when all we'd seen were demos of how a prompt gets expanded into a conversation transcript in a single text field.
I was emailed by some company, looking to sell something to my company (where I'm just a regular engineer). Ignored it. Then they tried again. Ignored. Then the third time, I replied, acknowledging their perseverance, saying that I don't even understand their product description, so I'm not the right person to talk to, and I'll just kindly disregard it as human-generated spam.
The reply email came within a minute. They asked who would therefore be a better person to talk to, and that it's actually AI-assisted so it's actually computer-generated spam after all!
This was the "oh shit" part 1. I replied that I'm genuinely impressed (it got everything right) and asked how fast they can source their contracts thanks to this.
The reply, again, came almost instantly. It was proud of my amazement, quoted Arthur C. Clarke - "every technology advanced enough is indistinguishable from magic", with his picture, and said the bottleneck is not really in the speed of finding and contacting them, but in finding the actual potential clients at all.
I rewarded the bot with some names from the executive decisive folks.
More like "oh shit, we are so screwed".
It's already a better system administrator than I am. It can run plenty of obscure linux commands, trash the system and maybe restore system state to functional.
I was vibe-setting my system permissions with some local qwen3.6. It was all going well for 30 minutes.
Then in between other commands, it made me run a variant of "sudo chmod 644 /usr/bin"
Which, as it explained when the next command failed with a "sudo no such command" error, removed the execute bit (the permission that allows programs to be executed) from all my programs. And since sudo is a program, and sudo is needed to run chmod, the system was basically trash, and should be recovered from a live usb key.
So I booted to a live usb key, and followed its instructions. It really tried to recover, but everything went downhill. It always had a solution to everything, but every time the plan worked halfway and trashed the system even further. I let it play for four hours to see what it would try. Then I got bored (the LLM was running on another machine and I was manually inputting the suggested commands each time). I took over and reinstalled a fresh system.
Of course once the fresh Lubuntu 24.04 system was installed, Linux had issues with the wireless network card drivers. So I turned to the LLM, and it managed to get the wifi stable enough via obscure modprobe options, so that I could update the system to the latest drivers.
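For anyone wondering why mode 644 is so destructive here, a minimal Python sketch of the permission bits follows. It only inspects the mode numbers and touches no files. The helper names `has_any_execute_bit` and `describe` are made up for illustration, and since the exact "variant" of the command is not given above, this shows just the general idea: 644 carries no execute bit for anyone, and on a directory that same bit is what permits looking up the files inside it.

```python
import stat

def has_any_execute_bit(mode: int) -> bool:
    """Return True if owner, group, or others hold the execute (x) bit."""
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))

def describe(mode: int, is_dir: bool = False) -> str:
    """Render permission bits as an ls-style string (hypothetical helper)."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)

# A typical mode for an installed program versus the mode from the story.
print(describe(0o755), has_any_execute_bit(0o755))
print(describe(0o644), has_any_execute_bit(0o644))

# The same mode on a directory: without x, entries inside cannot be reached.
print(describe(0o644, is_dir=True))
```

With no execute bit left anywhere, the shell cannot start sudo, which matches the failure described above.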
Then it helped me re-parametrize the system to have the same look and feel as it had before.
I’d love to see a discussion just like this one except with everyone including how much the AI use cost.
when my friend cloned my voice with rvc or some such model from github and was creating bad songs, it was funny but GOD DAMN i got called into HoDs office for that
There were two:
1) When I was testing one of the early coding agents, I gave it admin keys to a fresh AWS account and it configured everything beyond just building a demo site. That was, "oh shit, tool-use is going to be the killer feature of GenAI."
2) When I was still skeptical of the system as just a more-or-less dumb statistical predictor of the next token/word, I read the argument that even if it is a statistical predictor, the fact that it can reason means the intelligence is necessarily baked into the statistical model somewhere. That was "oh shit, intelligence is actually modeled."
Still waiting. Maybe some day.
Realising in a recent benchmark that gpt-5-mini gives better results on some tasks than gpt-5.4-mini and even gpt-5 or gpt-5.5.
I think a couple of years ago, I asked it to write me a nom parser for some system metrics I wanted to consume, and it one-shot it. Thought “oh”. And here we are.
the moment I realized it would have cannibalized conversation on HN
One concrete and one abstract.
Concrete: Last year I was DIYing a solar-power system for my home. I spent about an hour spitting out a Python tool that took (as inputs) drone photos and JSON and generated several proposed roof layouts for the panels and conduit. The tool helped me identify the exact railing attachment points and route around existing roof obstructions. Professionals already have these tools, and maybe they're available to DIYers, but you know what? It was faster to build my own than to do the product research on the web.
Abstract: This "oh shit" was more of a slow burn than a sudden realization. I see a lot of angst from developers who complain about their LLM agents. Agents write terrible code that barely works. They say things are done when they aren't. They misinterpret feature requests and ignore clear-cut project rules. They make assumptions that would have taken three seconds to research and invalidate. They suddenly quit because we're not paying them enough. And so on.
But you know what? All those complaints apply to humans, too! The industry has been dealing with these problems forever. Many of the same management techniques and software-development processes apply. This is why I discount a certain class of criticism about AI-generated code. If a fault of an LLM applies equally well to human engineers, and the person voicing the criticism hasn't managed a team, then I'd invite that person to wear a management hat for a while. Read some books/blogs, talk to an EM. Maybe this is a skill issue, which matters because we're all managers now.
The "oh shit" for me is that I have yet to hear a criticism that I can't map to one or more actual engineers I've worked with -- eventually successfully -- in my career. Which means that I'm still waiting for a new criticism, and eventually absence of evidence might be evidence of absence. LLMs fit too well into the giant machine of commercial software development for them to be a parlor trick.
Seeing subagents working in Claude last summer, I saw it and told myself my job is going to be different and I can automate the hell out of my workflow
I was never dismissive, it always seemed pretty cool at each step
Maybe in 2024 I was amazed to see it one shot unique snippets of code
I tried building a deliberately vague project around managing MCP servers [0]. The purpose was to find what LLMs and agents can do. While the project didn’t reach anywhere I was amazed by how it’s possible to navigate even with no clear direction. The ability of the “glorified auto-complete” system to pull off something of this sort was an eye opener for me.
For me it was probably around coding. It made me realize what future generations of models might be able to achieve, since we have already hit the ceiling of the class of intelligence these models are capable of a long time ago. I am excited at the prospect that a future generation of models might be able to write a piece of code that isn't dogshit.
GPT image 2 is mind boggling. I'm no longer confident I can distinguish whether it’s AI made or not.
For me that was already with the original DALL-E. It was utterly mindblowing, I was like "oh shit, AI is here".
"Draw a picture of a unicorn on the moon". And it did that. The model really "understood" what you told it.
After that, it was "oh, AI improved, again".
The farewell to Stack Overflow is not welcome. So many kind people shared their knowledge there. I answered a few questions as well, so not just a lurker.
It's a prelude to what has already begun - the collapse of human-to-human communication.
My first came in late 2016, when Google Translate switched from statistical machine translation to a neural-network-based system. I had worked as a Japanese-English translator and lexicographer for two decades, and I had been testing various machine-translation services over the years. For translation between Japanese and English, at least, they were uniformly terrible: the output for genuine texts was mostly incomprehensible and could not be used for any real-life applications. The neural Google Translate, while still far from perfect, was suddenly useful for some purposes.
But the neural models were still not translating meaning, which is the whole point of translation. I devised a variety of tests to see if GT could identify the meaning of ambiguous words from the context, and it couldn’t. One example I would show people was the sentences “I was born in 1998, and my sister was born in 1999” and “I was born in 1999, and my sister was born in 1998” translated into Japanese. Japanese uses different words for older and younger siblings, but GT translated “my sister” with the same word in both sentences. It was easy to come up with other examples where GT would fail, such as when the meaning of a word could only be determined based on context in a previous sentence; at that time, GT seemed to be translating sentence-by-sentence, with no consideration of what came before or after. I kept waiting to see whether computers would ever be able to handle meaning when translating, and for years thereafter there was little progress.
A minor shock came in mid-2022, when DALL-E 2 was released. Its ability to create images from natural-language prompts suggested that something deeper was going on than just statistical correlations. But I couldn’t see yet what the useful applications might be.
My biggest “oh shit” moment came with ChatGPT in late 2022. While the initial release didn’t translate Japanese well (I seem to recall that there were character-encoding issues), I ran various tests to see if it could, for example, identify the antecedents of pronouns and the meanings of polysemous words in English based on the context. It did really well. Last December, I gave a talk at a university in Tokyo in which I showed some examples done with the 2022-era GPT-3.5. They appear in slides 4 to 8 of the following:
https://www.gally.net/miscellaneous/20251206_Gally_ICU_slide...
There have been a lot of “oh shit” moments for me since, especially after the release of reasoning models and, now, long-running agents.
Struggling to do named entity recognition, with lots of tagging by hand, and then seeing BERT just being able to straight up answer questions about a document. Had to sit down after that because it was past anything I could even understand.
Non-technical people I know are starting to take AI responses to their questions as 100% true fact.
Three moments stick out to me.
1) When I used ChatGPT for the very first time. I still remember, I asked it: “Write an advertisement to convince people to visit the North Pole.” It rapidly returned a witty, accurate, multi-paragraph text that was exactly what I wanted and exceeded my expectations. ChatGPT was the beginning of the modern AI boom and I remember being immediately impressed.
2) When I was working at GitHub, the Copilot team gave the engineering team early access to Copilot in VS Code. I can distinctly remember seeing the chat window in the code editor for the first time. I was probably one of the first people ever to see it. I remember playing with it a bit and asking simple Python questions. I knew that day that StackOverflow was dead and my mind was blown.
3) Big oh shit moment earlier this year that I believe for me started with the Opus 4.6 model + Cursor. The results were noticeably better, it hallucinated much less, and it could solve complex problems with much less intervention. Early 2026 was a turning point for me as an engineer with AI. Throughout 2025, I was still writing the vast majority of my code by hand like I’ve always done; that is not the case in 2026.
Didn't have one. I was convinced I would experience this since I was a teenager. Blame science fiction if you will.
Useful thread. Exciting to see what will be possible in another few years.
I can count 2:
Dec 2025: We use commercial 3D modeling software to build refineries. There was no license dashboard in this ancient piece of junk. Fortunately, the license server provided a verbose live status report through a command line. I asked ChatGPT to ingest the logs into a Django web application and generate a weekly/monthly/yearly usage dashboard, and it produced the whole backend + frontend in 4 to 5 shots. There were around 10 regexes just in the log parsing batch script. I was totally speechless. Encouraged by that success, I went ahead and made the dashboard for 3 more software packages in the same Django app. Released to peers by evening, with feedback incorporated within 2 days to integrate Name, Employee Number, IP Address sync, etc. And it’s been live for 5 months, actively used by all co-admins; even management has it bookmarked, to help with department redistribution. Making this thing without AI would have taken well over a month of “learning new stuff”, or paying external consultants too much. Even the head of IT replied that it was awesome. ;)
2nd, June 2026: I asked Codex to do something fairly complex before going for my morning bath, something that would have taken me more than a week of learning DirectX 12 API nuances and such. 20 min later, I returned to the task completed exactly, with code changes in 5 different files. The build completed without any error. OMG. The free quota was used up for the whole month! I subscribed by the evening.
The first Sora release truly scared me. The uncanny valley of simulating life like this still creeps me out to this day.
My first ”oh shit” moment was in 2021 when using GPT-Neo https://www.eleuther.ai/artifacts/gpt-neo to generate rewrites of texts. ”Holy shit it returns a 3-sentence text that sounds human and kind of makes sense”
We've come a long way from that…
My first "oh shit" moment was when ChatGPT 3 was brand new. Maybe December 2022 or so.
I have a personal project: who's winning the race at 3 AM?
You see, I don't sleep well. I live in a busy city, with a busy freeway about a half mile away. Sometimes at 3 AM there are some very loud cars racing on the freeway. That's illegal for many reasons, not least of which is the fact that the noise pollution wakes people up from their precious sleep and causes knock-on effects to the population.
Anyway, now that I'm woken up, my only question is: who's winning the race?
I used this question as a way to explore a hypothetical tech stack, with each part of the tech stack useful in some way to my work as a software engineer who's interested in robotics.
- run raspberry pis with microphones, collect audio data
- run a k8s cluster for audio collection and processing
- calculate and triangulate individual points, and give estimations of velocity based on position changes over time, and adjust for doppler shift
- estimate (poorly, but doable) engine power based on amplitude
- run a webserver in the k8s cluster showing an animation of the racers with color fields representing estimation error radiating from the position estimate, with an arrow representing velocity
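As a rough illustration of the velocity step in the list above (not the code from the actual project, which isn't shown here), this is a minimal Python sketch of two ways to estimate a car's speed: from two triangulated positions, and from the Doppler shift of its engine tone. The function names, the 343 m/s speed of sound, and the assumptions that the car drives almost straight past the microphone and holds a steady engine tone are all mine.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed conditions)

def speed_from_positions(p0, p1, dt):
    """Average speed in m/s between two (x, y) position estimates dt seconds apart."""
    return math.dist(p0, p1) / dt

def speed_from_doppler(f_approach, f_recede, c=SPEED_OF_SOUND):
    """Speed of a source passing a fixed microphone.

    For a moving source and a stationary listener:
        f_approach = f * c / (c - v)
        f_recede   = f * c / (c + v)
    Dividing the difference by the sum cancels the unknown source tone f:
        v = c * (f_approach - f_recede) / (f_approach + f_recede)
    """
    return c * (f_approach - f_recede) / (f_approach + f_recede)

# Round-trip check with made-up numbers: a 200 Hz tone moving at 40 m/s.
true_speed, tone = 40.0, 200.0
f_a = tone * SPEED_OF_SOUND / (SPEED_OF_SOUND - true_speed)
f_r = tone * SPEED_OF_SOUND / (SPEED_OF_SOUND + true_speed)
print(speed_from_doppler(f_a, f_r))
print(speed_from_positions((0.0, 0.0), (30.0, 40.0), 1.0))
```

Using the approach/recede pair avoids needing to know the engine's true pitch, which is handy when you have no idea what car it is.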
Great project, actually. It was really thought-provoking. I had this working in late 2018.
Since there was a lot of hype around this new "AI", I thought: how smart could it be?
I threw the scenario to ChatGPT. I did have to break the problem set into smaller parts for context window purposes. But the solution it came up with solved about 80% of the project correctly (and very close to solutions I already came up with), about 15% of the project remained "open until we have more data", and maybe about 5% of the project would have been incorrectly solved.
That was very much an "oh shit, AI is closer than the 20 years away that I've been telling people. It's more like 5 years away"
Here we are three, almost four, years later...
MidJourney v3. By today's standards the images were crude and smudgy, but you could tell that it actually understood what objects were and what words visually meant.
I've been working with computers for a long time, and this was the first time in a long time I'd seen software do something genuinely new.