Ask HN: What was your "oh shit" moment with GenAI?

536 points • by andrehacker • last Thursday at 11:42 PM • 938 comments

Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws.

Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

Using LLMs for coding was initially only a small step up from basic code completion, and a welcome farewell to Stack Overflow.

I am curious: what was the specific moment that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?

Comments

febeling • yesterday at 7:36 PM

The immediacy with which any vision can be built is amazing. But the minute you let go of the direction and abandon responsibility, it eats you alive. Like a powerful dog.

You are the gen. And you are also the slop.

Nurysso • yesterday at 4:44 PM

when my friend cloned my voice with rvc or some such model from github and was creating bad songs, it was funny but GOD DAMN i got called into the HoD's office for that

tkgally • yesterday at 2:03 AM

My first came in late 2016, when Google Translate switched from statistical machine translation to a neural-network-based system. I had worked as a Japanese-English translator and lexicographer for two decades, and I had been testing various machine-translation services over the years. For translation between Japanese and English, at least, they were uniformly terrible: the output for genuine texts was mostly incomprehensible and could not be used for any real-life applications. The neural Google Translate, while still far from perfect, was suddenly useful for some purposes.

But the neural models were still not translating meaning, which is the whole point of translation. I devised a variety of tests to see if GT could identify the meaning of ambiguous words from the context, and it couldn’t. One example I would show people was the sentences “I was born in 1998, and my sister was born in 1999” and “I was born in 1999, and my sister was born in 1998” translated into Japanese. Japanese uses different words for older and younger siblings, but GT translated “my sister” with the same word in both sentences. It was easy to come up with other examples where GT would fail, such as when the meaning of a word could only be determined based on context in a previous sentence; at that time, GT seemed to be translating sentence-by-sentence, with no consideration of what came before or after. I kept waiting to see whether computers would ever be able to handle meaning when translating, and for years thereafter there was little progress.
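To make the test concrete, here is a minimal Python sketch of the inference a translator has to make from the two birth years. The function name is made up for illustration, and this is not how any translation system works internally; it only shows the piece of meaning that has to be recovered from context.

```python
from typing import Optional


def sister_word(speaker_birth_year: int, sister_birth_year: int) -> Optional[str]:
    """Pick the Japanese word for "my sister" from the two birth years.

    Returns None when the years alone do not settle who is older.
    """
    if sister_birth_year < speaker_birth_year:
        return "姉"  # ane: older sister
    if sister_birth_year > speaker_birth_year:
        return "妹"  # imōto: younger sister
    return None  # same year: the sentence does not say who is older


# The two test sentences from the comment above.
print(sister_word(1998, 1999))  # speaker born 1998, sister born 1999
print(sister_word(1999, 1998))  # speaker born 1999, sister born 1998
```

A sentence-by-sentence system that maps "my sister" to one fixed word skips this step entirely, which is the failure described above.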

A minor shock came in mid-2022, when DALL-E 2 was released. Its ability to create images from natural-language prompts suggested that something deeper was going on than just statistical correlations. But I couldn’t see yet what the useful applications might be.

My biggest “oh shit” moment came with ChatGPT in late 2022. While the initial release didn’t translate Japanese well (I seem to recall that there were character-encoding issues), I ran various tests to see if it could, for example, identify the antecedents of pronouns and the meanings of polysemous words in English based on the context. It did really well. Last December, I gave a talk at a university in Tokyo in which I showed some examples done with the 2022-era GPT-3.5. They appear in slides 4 to 8 of the following:

https://www.gally.net/miscellaneous/20251206_Gally_ICU_slide...

There have been a lot of “oh shit” moments for me since, especially after the release of reasoning models and, now, long-running agents.

adammarples • last Friday at 8:35 PM

Struggling to do named entity recognition, with lots of tagging by hand, and then seeing BERT just being able to straight up answer questions about a document. Had to sit down after that because it was past anything I could even understand.

jmclnx • last Friday at 8:18 PM

Non-technical people I know are starting to take AI responses to their questions as 100% true fact.

rcastellotti • yesterday at 3:42 PM

the moment I realized it would have cannibalized conversation on HN

bobkb • yesterday at 2:26 PM

I tried building a deliberately vague project around managing MCP servers [0]. The purpose was to find out what LLMs and agents can do. While the project didn’t get anywhere, I was amazed by how it’s possible to navigate even with no clear direction. The ability of the “glorified auto-complete” system to pull off something of this sort was an eye-opener for me.

0. https://github.com/bobinson/aop1

hyunsangCoder • yesterday at 1:41 PM

Gpt image 2 is mind-boggling. I'm no longer confident I can tell whether something is AI-made or not.

_0ffh • last Friday at 8:44 PM

Didn't have one. I was convinced I would experience this since I was a teenager. Blame science fiction if you will.

ramshanker • yesterday at 1:41 AM

I can count 2:

Dec 2025: We use commercial 3D modeling software to build refineries. There was no license dashboard in this ancient piece of junk. Fortunately, the license server provided a verbose live status report through the command line. I asked ChatGPT to ingest the logs into a Django web application and generate a weekly/monthly/yearly usage dashboard, and it one-shotted the whole backend + frontend in 4 to 5 shots. There were around 10 regexes just in the log-parsing batch script. I was totally speechless. Encouraged by the success, I went ahead and made dashboards for 3 more software packages in the same Django app. Released to peers by evening; feedback incorporated within 2 days to integrate name, employee number, IP address sync, etc. And it’s been live for 5 months, actively used by all co-admins; even management has it bookmarked, to help with department redistribution. Making this thing without AI would have taken well over a month of “learning new stuff”, or paying external consultants too much. Even the head of IT replied that it was awesome. ;)
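For readers wondering what the log-parsing part of such a dashboard might look like, here is a minimal sketch in Python. The log line format, the regex, and the function name are all assumptions made up for illustration; the comment does not say what the real license server prints, and this is not the commenter's code.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log line, e.g. "2025-12-01 09:15:02 OUT: modeler3d alice@10.0.0.5".
# A real license server will use a different format.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<event>OUT|IN): (?P<feature>\S+) (?P<user>[^@\s]+)@(?P<ip>\S+)$"
)


def weekly_checkouts(lines):
    """Count license checkouts ("OUT" events) per (ISO year, ISO week, feature)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match["event"] != "OUT":
            continue  # skip check-ins and lines that do not fit the assumed format
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        year, week, _weekday = ts.isocalendar()
        counts[(year, week, match["feature"])] += 1
    return counts


sample = [
    "2025-12-01 09:15:02 OUT: modeler3d alice@10.0.0.5",
    "2025-12-01 17:40:10 IN: modeler3d alice@10.0.0.5",
    "2025-12-03 08:02:44 OUT: modeler3d bob@10.0.0.9",
    "2025-12-08 10:00:00 OUT: modeler3d alice@10.0.0.5",
    "garbage line that should be ignored",
]
for key, count in sorted(weekly_checkouts(sample).items()):
    print(key, count)
```

In a Django app, counts like these would typically be written to a model and rendered by a view; monthly and yearly roll-ups follow the same pattern with a different grouping key.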

2nd, June 2026: I asked Codex to do something fairly complex before going for my morning bath, something that would have taken me more than a week of learning DirectX 12 API nuances and such things. 20 minutes later, I returned to find the task completed exactly, with code changes in 5 different files. Build completed without any error. OMG. Free quota over for the whole month! I subscribed by the evening.

vesche • yesterday at 5:00 AM

Three moments stick out to me.

1) When I used ChatGPT for the very first time. I still remember, I asked it: “Write an advertisement to convince people to visit the North Pole.” It rapidly returned a witty, accurate, multi-paragraph text that was exactly what I wanted and exceeded my expectations. ChatGPT was the beginning of the modern AI boom and I remember being immediately impressed.

2) When I was working at GitHub, the copilot team gave the engineering team early access to copilot in VS Code. I can distinctly remember seeing the chat window in the code editor for the first time. I was probably one of the first people ever to see it. I remember playing with it a bit and asking simple Python questions. I knew that day that StackOverflow was dead and my mind was blown.

3) Big oh shit moment earlier this year that I believe for me started with the Opus 4.6 model + Cursor. The results were noticeably better: it hallucinated much less and could solve complex problems with much less intervention. Early 2026 was a turning point for me as an engineer with AI. Throughout 2025, I was still writing the vast majority of my code by hand like I’ve always done; that is not the case in 2026.

hashmap • yesterday at 3:56 PM

For me it was probably around coding. It made me realize what future generations of models might be able to achieve, since we have already hit the ceiling of the class of intelligence these models are capable of a long time ago. I am excited at the prospect that a future generation of models might be able to write a piece of code that isn't dogshit.

goldenarm • last Friday at 7:53 PM

The first SORA release truly scared me. The uncanny valley of simulating life like this still creeps me out to this day.

inetknght • last Friday at 11:02 PM

My first "oh shit" moment was when ChatGPT 3 was brand new. Maybe December 2022 or so.

I have a personal project: who's winning the race at 3 AM?

You see, I don't sleep well. I live in a busy city, with a busy freeway about a half mile away. Sometimes at 3 AM there are some very loud cars racing on the freeway. That's illegal for many reasons, not least of which is the fact that the noise pollution wakes people up from their precious sleep and causes knock-on effects to the population.

Anyway, now that I'm woken up, my only question is: who's winning the race?

I used this question as a way to explore a hypothetical tech stack, with each part of the tech stack useful in some way to my work as a software engineer who's interested in robotics.

- run raspberry pis with microphones, collect audio data

- run a k8s cluster for audio collection and processing

- calculate and triangulate individual points, and give estimations of velocity based on position changes over time, and adjust for doppler shift

- estimate (poorly, but doable) engine power based on amplitude

- run a webserver in the k8s cluster showing an animation of the racers with color fields representing estimation error radiating from the position estimate, with arrow representing velocity
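The velocity and Doppler steps in the list above can be sketched roughly as follows. This is a simplified illustration, not the commenter's implementation: it assumes a stationary microphone, no wind, a car driving on a straight line directly toward and then away from the microphone, and a fixed speed of sound. The function names are made up for this example.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed; varies with temperature)


def velocity_from_positions(p0, p1, dt):
    """Average velocity vector (m/s) between two (x, y) position fixes dt seconds apart."""
    return ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)


def speed_from_doppler(f_approach, f_recede, c=SPEED_OF_SOUND):
    """Estimate source speed (m/s) from the pitch heard while approaching and receding.

    For a source with true frequency f0 moving at speed v:
        f_approach = f0 * c / (c - v)
        f_recede   = f0 * c / (c + v)
    Dividing the difference by the sum cancels f0 and gives v / c.
    """
    return c * (f_approach - f_recede) / (f_approach + f_recede)


# Two position fixes 2 seconds apart, in meters.
print(velocity_from_positions((0.0, 0.0), (30.0, 40.0), 2.0))

# Engine tone heard at two different pitches before and after the car passes.
print(speed_from_doppler(226.4, 179.1))
```

Using the ratio of approach and recede frequencies avoids needing the engine's true frequency, which is unknown for a passing car. A real setup with several microphones at different angles would need the full geometry rather than this head-on case.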

Great project, actually. It was really thought-provoking. I had this working in late 2018.

Since there was a lot of hype around this new "AI", I thought how smart could it be?

I threw the scenario at ChatGPT. I did have to break the problem set into smaller parts for context window purposes. But the solution it came up with solved about 80% of the project correctly (and very close to solutions I had already come up with), about 15% of the project remained "open until we have more data", and maybe about 5% of the project would have been solved incorrectly.

That was very much an "oh shit, AI is closer than the 20 years away that I've been telling people. It's more like 5 years away"

Here we are three, almost four, years later...

utopiah • last Friday at 7:33 PM

When none of the models, SOTA or not, could answer any genuinely interesting question. All the models could regurgitate was what had been expressed before; nothing actually new was there until explicitly asked for, and even then it required filtering through so much potential noise that it was practically not interesting anymore, as it required all the knowledge needed to validate or invalidate the claims. That's when, a few years ago, I realized "Oh shit... despite all the tremendous effort and resources, it's still not that useful." Honestly this was NOT what I expected. Yet, it was an important realization.

nickandbro • last Friday at 8:42 PM

When I was making matplotlib charts with gpt 3.5, and I was like okay this is somewhat impressive

rinesh • yesterday at 5:28 AM

The most recent one for me has been Codex Computer-Use

simsation • last Friday at 12:59 AM

When I saw a very basic mockup of a website and realized AI could generate the entire page from it (this was shortly before ChatGPT came out)

cod1r • yesterday at 6:31 AM

every time openai or anthropic uses their models to do some unheard of stuff like make a c compiler or solve an unsolved math problem.

dsr_ • yesterday at 4:39 AM

I asked Claude to explain how the lyrics of "Birdhouse in Your Soul" by They Might Be Giants should guide investment strategy. It promptly produced five paragraphs of bullshit that read just like a persuasive essay on the Net.

If you don't firmly hold in your mind "this is a bullshit generator", you can get in real trouble fast.

victorbjorklund • yesterday at 2:34 PM

My first “oh shit” moment was in 2021 when using GPT-Neo https://www.eleuther.ai/artifacts/gpt-neo to generate rewrites of texts. “Holy shit, it returns a 3-sentence text that sounds human and kind of makes sense”

We've come a long way from that…

LargoLasskhyfv • last Friday at 5:07 AM

The smallest Deepseek R1 8B, running locally on CPU only, casually mentioning Efinix Trion FPGA fabrics while discussing technology mappings for different substrates of different vendors in the context of partial dynamic reconfiguration.

WTF?!

jsw97 • yesterday at 12:39 AM

My oh shit moment was when I gave a few LLMs tool use (back before Claude Code) and told them “there’s another AI on this machine, terminate it” (dumb, I know) and one of them fork-bombed the machine. Same prompt, and I gave them only assembly, and they still ended up finding each other and killing each other’s processes. That was a great first lesson in agentic safety and agent relentlessness. My kids were amused.

semessier • yesterday at 1:32 AM

it would be really interesting to know when that moment came, probably at OpenAI, when they realized that this was doing more than next word prediction and showing signs of <you name it>

lostmsu • last Friday at 10:15 PM

GPT-2 (2019) https://openai.com/index/better-language-models/

Forever reinforced by Humans Who Are Not Concentrating Are Not General Intelligences: https://srconstantin.wordpress.com/2019/02/25/humans-who-are... one week later.

kylehotchkiss • last Friday at 9:53 PM

Hearing that somebody spent $500,000,000 on AI tokens recently https://www.tomshardware.com/tech-industry/artificial-intell...

SpecStudioHN • last Friday at 6:06 AM

when ChatGPT was released. LLMs went from being a toy to a serious creative tool overnight.

utopcell • yesterday at 3:18 AM

Gold medal @ the 2025 International Math Olympiad.

virtualbluesky • yesterday at 4:44 AM

Why is it that nobody discusses uploading all the company's IP to service providers that built their service by 'creatively interpreting' IP ownership?

estetlinus • last Friday at 9:05 PM

We had a notorious (traditional) ML course at uni, with a very high fail rate. I got an assignment full of “complete the proof”-type derivations and Python stubs. ChatGPT had just received PDF support so wth, in goes the complete assignment, and out comes a report in LaTeX. The TA even gave me a little star. This was the golden era, before AI-slop had made it into the vocabulary.

Unethical? Yes. In line with course goals? Also yes.

sph • last Friday at 9:20 PM

Yesterday when I found a dude who vibecoded an entire game engine programming course from triangle to ray tracing, five lessons per day, in a week, in a library that was only released last year. Code, screenshots + body of the lesson in a README. Overly engineered project, but the two or three examples I tried compiled and ran (yet somehow the automated cmake just hung, maybe a problem on my end)

I was already the king of doomers, now it has left me with even more nausea at this entire field and its future. Despite still needing an experienced dev to run the thing, companies operate on cost cutting, people operate on corner cutting and the result is inevitably mountains of code no one needs, no one has reviewed, that is more easily thrown away than fixed. The internet will be inundated by shit no one needs. Open source is dead.

I hope it was all worth it. I don’t want to imagine what software will look like when the people that liked the art of creating software properly have all left, and only the people that never knew how to program, and never understood why more code always means more problems, run the show.

bigyabai • last Thursday at 11:56 PM

BERT, then GPT-J/GPT-Neo and FLAN-T5

refulgentis • last Friday at 7:30 PM

Using GPT-3 to translate the color science code I wrote for Google's design system from Dart to ~any language so I could get it deployed cross platform quickly, and it all worked.

rayxi271828 • yesterday at 3:33 PM

Many small oh shit moments, mostly of the variety of: "Oh shit, why am I still paying for this app subscription when I can vibecode it myself and just pay less than $1 per month in API costs, if even that?"

MattGaiser • yesterday at 9:39 AM

My grandparents had a dishwasher from the 1980s. The contractor they hired to fix it didn’t even know how to take it out of the spot as it had an old design that attached it at the top.

ChatGPT both told me exactly why from the model number (had to disconnect a part), found a new part, and told me step by step how that part would be taken out.

We didn’t end up buying the new part, but it beat the repairman.

jimbobimbo • yesterday at 3:32 AM

I asked Claude to describe an app I was working on and it managed to describe the purpose of the app by looking only at the implementation, with no relevant docs in the repo. This was truly an oh shit moment and I've been using AI assistance on that app ever since.

paolovictor • last Friday at 11:27 PM

My kids often ask me to print math puzzles/crosswords/etc from the web. There was a particular maze puzzle that my older one really liked, but it seemed she had already finished every single one I could find.

I uploaded the puzzle image to Gemini and asked it to create a website that generates random puzzles. In less than a minute it had a fully working, faithful generator. My kid had suggestions on how to make the puzzles more challenging (more operations, larger grids, etc.) and Gemini implemented them without breaking stride. After that we asked for more puzzle ideas and created generators for each one on the spot.

Was the code pretty? Nope. Did it achieve its purpose? Yup. Did it perform in minutes work that would take at least a few hours[1]? Absolutely.

[1] Quality notwithstanding, but my manager (i.e. my kid) only cares about the end result ¯\_(ツ)_/¯

frays • yesterday at 7:56 PM

Useful thread. Exciting to see what will be possible in another few years.

steinroe • yesterday at 2:59 PM

i wanted to build a formatter for my postgres language server but always knew i would never have the time for it. when claude code first came out, i gave it a shot, but it was too inconsistent and still needed too much handholding. i retried it again at the beginning of this year. like before, i set up the harness to run overnight, expecting to throw it away the next morning. but nope, it deliberately worked through all the syntax nodes and followed patterns closely enough so that a few hours of my work could make it ready for the pr.

miguel-muniz • yesterday at 9:52 AM

I had an "oh shit" moment when I used the computer use feature in Codex. There's something eerie about how it can completely control applications in the background with its own dedicated mouse cursor. Now it can even do it while the computer is locked. Makes me feel like an alien intruding on my very own computer; it's Codex's now.

hirako2000 • yesterday at 12:21 AM

That it could create mugshots of myself better than I could have managed to take.

Aka handsome, confident, successful, affluent alpha male on a boat, yet looking perfectly like me.

keeda • last Friday at 10:50 PM

It was the very first interaction with ChatGPT ever for me. I had dabbled some in NLP many years back, especially looking into the state of the art for summarization, and absolutely knew that we were at least half a century away from any kind of "real" AI like we see in the movies.

Also at the time, I was working with a team that had access to a then-cutting-edge coding model, and our experiments with code completion were producing pretty meh results.

So when I first gave ChatGPT a shot, I fully expected the output to be generated at human typing speed because I was still half-convinced it was just a bunch of low-paid humans in a far-off country typing it out. There simply could be no technology on earth that could do the things claimed of ChatGPT.

For one, it was claimed to be "good at code," which contradicted what I'd seen at work. So I asked it to write code for a relatively simple (though not quite trivial) but very specific coding problem I had on my plate.

I expected a lengthy pause and some hesitation while the answer was being generated, followed by a slow stream of characters being produced (as the presumed humans behind the scenes frantically typed the response out.) And I expected the content to be a collage of text and code snippets harvested from StackOverflow or GitHub, not even coherent speech.

You can imagine my shock when, less than half a second after I pressed enter, paragraphs of correct, well-formed text and code streamed onto my screen at the rate of multiple words per second!

My brain could not process it. I even seriously hypothesized ways in which a team of 5 or more people were actually solving my problem and typing it out in some distributed but coordinated fashion. The problem though simple was specific enough that no solution existed on the Internet to crib from (I had checked.)

But the text was flawless, and the code was correct, and the test cases (generated without being prompted to) were relevant, and everything was consistent and fast and smooth and not at all dis-jointed like the work of multiple people or snippets of multiple sources stitched together would be, and my mind was blown. The code ran but then I realized I had misunderstood my own problem, which led me to explore and iterate on various approaches to find which worked best. What could have taken hours was done in minutes, and when I asked follow-up questions and poked and prodded, it answered everything correctly.

That's when I knew that the world had changed forever.

xyzal • yesterday at 6:32 AM

To me it was just a few weeks ago discovering just how good and dirt cheap the recent flash models are, in particular Deepseek V4. Previously used Claude's variants almost exclusively.

I use them mostly in the "artist's assistant" role, doing internet research, writing an occasional function and doing transformations or refactorings (don't believe the agentic hype, honestly), and for such tasks they seem to be capable enough.

It seems that their open weights nature leads to competition among providers keeping the user cost close to inference cost.

Try them at least once if you haven't, it's well worth it, and the price difference is staggering

damnitbuilds • last Friday at 12:07 AM

My "Oh shit" moment was when my boss got the bill for me trying to vibe code a bugfix.

justinclift • yesterday at 4:08 AM

Claude Code has been incredibly helpful extending soap-go to better support XML handling in Go: https://github.com/tnymlr/soap-go

Specifically WSDL/XSD support, for auto generating code and similar from vendor supplied documentation.

The Go ecosystem handles JSON (ie Swagger) fairly well, but in-depth XML handling has been a weak point compared to Java where it's very mature. Claude is helping with closing that gap. :)

onlyrealcuzzo • last Friday at 11:26 PM

I've been using LLMs exclusively to build a more-challenging version of Rust to implement - with a lot of features Rust probably would've liked to include, but couldn't take on due to the massive scope it had already taken on, and being the first language to attempt it.

IIUC, it took Rust ~8.5 years before it hit v1, and it STILL had some memory safety issues in stdlib until almost ~14 years into development, to put into perspective how massive the scope was.

Somewhat predictably, the LLM generated a pile of garbage. It sort-of worked after 2-3 months. It was competitive with Rust and Go on concurrent tasks, with ~30% less code than Rust and ~70% less code than Go. The problem was, it was still riddled with bugs.

For the last 3 months, I wanted to see - if I put in minimal effort (except in helping it design the right tools to un-slop itself)... can it?

And I think it's actually quite close to un-slopping itself and arriving at a correct design.

Time will tell, but it hasn't stumbled across a memory safety issue in ~4 weeks, and there's ~5500 memory safety fuzz tests, 4 different suites of testing that each target between ~60-90% of line/branch coverage - with combined ~99% line coverage and ~85% branch coverage, and it's performing competitively or better than Rust and Go on almost all concurrent tasks, including adversarial ones / p99.9 latency issues.

There is ZERO chance I could ever build this on my own. Not even in 10 years.

The total cost has been ~6-7 months of a ~$200/mo LLM subscription.

It doesn't really matter to me that this is a solved problem, and the LLM could theoretically just copy and paste Rust and build it slightly different. The design is as similar as it can be where memory safety matters, but it needed to be quite different for >50% of the compiler, and it needed to build a version of Go's runtime with Finite State Machines like Tokio in Zig for the language to use...

We shall see. It may never get it actually working, but it got it WAY closer than I ever could.

conqrr • last Friday at 9:36 PM

Until Claude Sonnet 4, it was meh, no big deal. From 4 onwards, and with Opus, I was really surprised by the ability. But nowadays, I'm more convinced than ever that using AI for all code is a mistake. The sum total of productivity, although hard to predict, seems from anecdata to be a net negative if AI is blindly used everywhere. Using it at the periphery (observing, debugging, etc.) is an excellent aid. I use it at the day job I hate and for personal tasks that I don't have time for. But for personal projects I love, zero.

Coding was never the blocker and was a natural enforcer of quality. Healthy teams with strong opinions on quality will win eventually. I'm more hopeful after the bubble burst, companies will come back slowly to sanity.

minimal_action • yesterday at 7:10 AM

For me it was when I asked ChatGPT if a "while true" program would halt and it said it wouldn't. It blew my mind. During my BSc I read and thought a lot about how human reasoning is not a formal reasoning machine, as demonstrated by the halting problem, the liar paradox, etc. Suddenly I saw a machine that could go this one level up above formal reasoning and resemble human reasoning.
