Ask HN: What was your "oh shit" moment with GenAI?

483 points • by andrehacker • last Thursday at 11:42 PM • 865 comments

Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws.

Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

Using LLMs for coding was initially only a small step up from basic code completion, and a welcome farewell to Stack Overflow.

I am curious: what was the specific moment that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?

Comments

abecedarius • yesterday at 11:57 PM

AlphaGo. Reinforcement learning on math with proof assistants was clearly going to be workable after that, even if not right away.

matheusmoreira • yesterday at 9:11 PM

Pretty much immediately after I asked the LLM to perform a complete code review of my projects. I've been programming alone for years, so that alone was life-changing for me. It only got more impressive from there.

cjbprime • today at 4:12 AM

ChatGPT reconstructing idiomatic Python source code from Python bytecode was definitely up there. That is not something humans have written a great deal about online. It requires simulating the Python VM.

I remember also having a massive wtf reaction to realizing that original ChatGPT was pretty good at decoding long random/unique base64 strings.
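
For readers who haven't looked at either format, here is a minimal sketch of the two raw inputs described above: the instruction listing a model would have to read back into source, and a base64 string it would have to decode. The `clamp` function and the payload bytes are made up for illustration, and exact opcode names vary between CPython versions.

```python
import base64
import dis


def clamp(x, lo, hi):
    """Hypothetical example function; any small function works."""
    return max(lo, min(x, hi))


# Disassemble the compiled function into its bytecode instructions.
# Opcode names differ between CPython versions, so none are hard-coded here.
instructions = list(dis.get_instructions(clamp))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# Base64 maps every 3 bytes to 4 ASCII characters, so decoding it "by eye"
# means undoing that regrouping across the whole string.
payload = bytes(range(0, 48, 3))  # arbitrary illustrative bytes
encoded = base64.b64encode(payload).decode("ascii")
decoded = base64.b64decode(encoded)
print(encoded)
```

Going from the printed instruction listing back to the one-line body of `clamp` is roughly the reconstruction task in question, just at toy scale.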

madrox • yesterday at 9:41 PM

I think my favorite early story was when OpenAI launched deep research. I was going to an event that I was headlining, and I gave it a CSV of the attendees and asked it to give me a small background on each company they represented.

When people introduced themselves to me, I knew a little about their startup. Felt magical.

t_sea • today at 5:22 PM

It was the early ChatGPT. Someone on the team showed off a poem about Postgres in the style of the King James Bible. Totally blew my mind.

lodovic • today at 5:30 AM

The first time I pasted a screenshot of a PR review thread, adding just "I had some review comments, fix them" - and it perfectly solved everything, made small commits, and pushed it upstream - this was such a shock.

I now try to keep pushing the boundaries and see where it stops understanding my intention. Give it impossible tasks, gigantic projects, complex architectures. Last result: I wrote a complete OS including MPI, TCP/IP, and a GUI from scratch in only a week, while investing just a few hours a day in it. It even runs Doom! Coding as a profession is over, but approaching this with a professional mindset makes such a difference to the result that I think the software engineering discipline can still provide massive value.

brailsafe • yesterday at 9:58 PM

Not sure that I've had it yet, although hypothetically I'm sure it would probably be something similar to the examples of writing new software for old hardware mentioned ITT. The idea of resurrecting useful but unsupported gadgets that would otherwise become e-waste is something I've always found compelling.

Problem is, I just don't have enough old crap, and if I did, I would have a hard time justifying the expense, because that money could maybe just go toward a more intimate tinkering process.

For everything else, I either haven't had any sufficiently interesting ideas, or they ended up not being worth pursuing with those tools or at all.

When I do have success that I'm happy with and care about, it's a slow process that I ultimately need to know the details of anyway, but otherwise it's a bunch of luckily narrow work-related scenarios with well-documented constraints. Nothing's really been that shocking though.

The shocking thing to me is how unrewarding most of the successful tasks have been, partly because they often create unnecessary work and partly because the type of thinking required to massage or evaluate the result is much less stimulating, and there's much more of it in aggregate. It's fine if it's something like generating a UI from scratch because that hasn't produced dopamine in a long long time anyway

fabianholzer • today at 10:22 AM

I have not yet had a positive "oh shit" moment. But when the corporate manager types, who could not deliver a "Hello world" if their lives depended on it and who would have pulled a sour face when asked to pay license fees for a proper IDE 10 to 15 years ago, started pushing it hard, well before any engineers except the resume-driven ones: that flipped a bit in me.

neom • today at 2:33 PM

When I tried, just for fun, to put together an MVP of a fully autonomous business. I wanted to see how far it would go. When I got it generally working to around a 30% level, I stopped, because that was enough to see that people would make a concerted effort to build this for real. HN was not impressed, heh: https://news.ycombinator.com/item?id=44143928

ramon156 • today at 8:13 AM

I've let it do some commands against a local NUC before, just to see if it knew why something didn't work (it would've taken me ~15-20 mins probably. Not too bad). It took ~18 seconds to think, then ran two commands, and noted what the issue was. Even a 10 yr old could understand what the problem was.

I realized that LLMs were pretty good at calling the right tool, and running the right verbose command to figure out what and how.

Kind of like finding a specific SO post that had your exact problem, and the solved comment is heavily upvoted

fergonco • today at 1:58 PM

When I tried pi.dev (I had only used ChatGPT before) and told it "add all these scripts I developed over the last couple of years to automate my job as skills".

I love to automate things in bash scripts, and these LLMs can use them very effectively. It was also surprising how they derive knowledge from those scripts. If you get A from a B uuid, they kind of get the relationship. I am super vague in my requests and this thing knows what I am referring to. After some months it's still mind-blowing.

dirkc • yesterday at 9:35 PM

I started to look at LLMs not as writing code, but rather as predicting what code it would expect someone to write given the context.

For some people that matches their expectation or they don't really have an expectation. While for other people it doesn't match their expectation.

threecheese • today at 1:25 PM

I had an issue with installing OpenClaw, and it helped me debug the failure and get itself working. I had to sit quietly for a moment. No reading docs or inspecting the system, just “what’s wrong here?”.

While I didn’t find a use for OpenClaw, it opened my eyes to the potential for distributing software which, once bootstrapped a bit, can interrogate … itself, understand its own requirements, communicate with the device, and become operable.

Add capable small models to the mix, and it’s almost frightening what good (or malicious) software might be able to do.

thallavajhula • today at 12:15 AM

I wasn't impressed by the LLMs up until January or so when Claude Code swooped in. Until then, I felt like the LLMs were slowing me down. I have been using them for a couple of years now for coding at work, but I never really thought they brought in real value. Then in February I worked on a 1-month-ish project timeline and shrunk it to 3 days and that was it. I didn't write a single line of code in that project and I went all in with Claude Code. That was it, _the moment_ of realization. I was thoroughly impressed. I went from nothing to a tool that served several teams. Now I'm starting to see the cracks in LLMs and I'm slowly getting back to picking which task to offload to AI and which ones to do by myself.

Claude is great at coding. That's it. Outside of it, it's just god awful at pretty much everything else. ChatGPT OTOH, is good at coding, but at everything else, I find it brilliant. Gemini never made me want to stick with it. It's good, but never great for my use cases.

lordnacho • yesterday at 10:56 PM

For me it was gradual, then sudden.

I liked using the early models to do autocompletion. It could do a leetcode style thing, pretty nice, but only useful for small things.

Then I sought out Cursor because that seemed to be able to do multi-document edits. Not bad, but models at the time (2024) still got stuck pretty often. So, cross-document autocomplete. Useful, but definitely within the realm of "nice shortcuts to have".

Then a friend (who works in AI) told me to try Claude last year. I was on holiday at the time, but I spun up my work repo and looked at the backlog.

It chewed through the entire 6-9 months of estimated work in a two-week period while I was watching that Lord of the Rings series with a friend (we watched an episode or two in the evenings). I just chatted with him about the series while checking the progress every few minutes. It was a huge amount of refactoring, and it didn't get everything right the first time, but it made enough progress that it could be directed the right way.

Since then I have hardly coded any manual lines. I just tell Claude what to do, with very little harness (skills, MCPs, instruction files), and I get what I want.

bsiverly • yesterday at 11:10 PM

I had it fill out all the forms to appeal my property tax value. We created an assessment of what my San Francisco property should be worth using deep research. The city agreed and a $12k check arrived shortly after.

autonomousErwin • today at 9:56 AM

I had 2 MacBook Pros. One 2024 and one 2019. The 2024 one would connect fine to the internet, the 2019 one would not.

After pasting in the airportd logs of both (into ChatGPT and Gemini) it found it was down to band switching (2.4GHz and 5GHz) through some really old error code.

This fixed a problem that had plagued me for >12 months. Really magical feeling that it got it on the first try.

zthrowaway • today at 2:26 PM

“Farewell to Stack Overflow” juxtaposed with the realization that AI only knows what to troubleshoot and how because of Stack Overflow…

meken • today at 2:44 PM

Early on in my ChatGPT usage, one of my messages got interrupted/cut off (as happens occasionally).

My first thought was "oh they're going to need to add a UI feature to allow me to click and tell them to continue the conversation".

Then I realized I can just ask the model to continue, obviating the need for a button.

That was a pretty mind blowing moment.

bag_boy • yesterday at 7:43 PM

I had ChatGPT write up a Zillow description for my house in the style of Carrie Bradshaw from “Sex and the City” to impress my wife.

It was unlike anything I had ever experienced.

My wife was unimpressed lol.

This was 2022.

ioman • today at 2:55 PM

Mine was using VS Code with Copilot. Previously I had used tab completion and thought it was pretty neat. This time I began with the comment for a function I wanted to write, and the entire function just appeared below the comment, written probably better than I would have. I remember saying, “uh-oh” out loud.

lukan • today at 11:15 AM

2 years ago I played a bit with the abandoned source of

https://www.wickeditor.com

a flash like editor for the web, that I found promising.

But doing it manually was too much work: an outdated and broken build pipeline, stuck on an older Node version, with deprecated and abandoned dependencies... so I stopped the experiment.

Then I gave it a try with Claude at the beginning of this year. I remember not expecting anything, but I did a bit of steering, as I knew the source a bit, and let it mostly work on its own. Then it said it was done and that it worked.

I didn't believe it, but it did. "Can you add this feature?" Yes it could.

Since that experience, I have a hard time taking people seriously who say AI is useless.

jFriedensreich • today at 11:33 AM

I had a pretty involved cross-module state bug with complex dependencies and reactivity issues interleaved. I tried fixing it multiple times, both manually with a 4h time box and with Claude models up to Opus 4.6 high and Codex 5.3, all of which failed. When the GPT-Pro model came out I heard it was not supposed to be an everyday coding model, but I tried it anyway as it looked impressive. It took a single 8h run burning $200, with me doing nothing but occasionally waiting for test runs or writing “continue”. After 8 hours, just as I feared I had wasted the money, the bug was consistently fixed, not just the one edge case that triggered the behavior.

ako • today at 7:14 AM

Probably over a year ago, when I first saw reasoning in action in a debugging session: it generated some code, ran it, could not explain the results, then said “let me add some print statements to debug”, reran the application, read the logs, and then stated “now I understand why it’s not working”. Plan, do, check, act in action, AI engineering its own context, and generating the missing information.

jerieljan • today at 11:50 AM

I remember the early days, when I was trying out ChatGPT on a phone for the first time (this was around GPT-3.5? GPT-4o?). I snapped a picture of our fridge, which is full of magnet souvenirs, and asked it to identify all the places we've been. It gave a nice list of what it saw and the places that were featured.

Did it get it fully right? No. But it was one of those "oh wow, you could do that?" moments for me. There were obviously a lot more "oh shit" moments as time went on, but it was a neat little moment.

tobyhinloopen • today at 5:32 AM

A non-technical employee of a client vibe-coded an app and I was asked to review and deploy it.

It was okay, not bad at all. No serious issues.

Around the same time, I fed a whole PDF of client feedback (screenshots and such) into Claude, and it fixed everything after 7 hours of reproducing and fixing things mostly unattended, creating a bunch of MRs with fixes. Most fixes were good; some were technically correct but obviously not what the client wanted (I told Claude, and it fixed them).

rref • yesterday at 9:39 PM

My ducted gas heater wasn't working where I live and I took a photo of the wiring diagram and had Claude step me through troubleshooting it with a multi-meter, and got it fixed.

hatthew • yesterday at 11:50 PM

I'm kind of surprised that so many here on HN were dismissive/unaware of the capabilities and potential in the DALL-E days and earlier. I feel like this is the sort of forum where most people would be both aware of advancements and aware of their potential.

My moment was GANs and GPT-2 back in 2019. I feel like that's where computer-generated media went from "obviously fake" to "sometimes can be mistaken as real." RLHF for LLMs and diffusion for image generation are both important improvements, but I feel like they aren't fundamental prerequisites for the type of stuff we have today. I think the main advancements since then are just marginal improvements, larger models/datasets, and better surrounding tooling.

csr86 • yesterday at 9:29 PM

I was working on a project for 2 years with about 5 engineers. It was many years before AI. It was a new subject for our team, and we were pretty sure it was possible. Turned out it was not.

Much later I asked AI if that kind of project is possible, and it immediately explained why it is not. Would have saved 2 years of our time...

base698 • today at 1:42 PM

I asked the OpenAI playground to compare and contrast the themes of Point Break and Fight Club. It did a bang-up job and blew my mind. I then realized it basically worked for any of the scripts I had for my dev environment too, fixing and expanding capabilities I'd wanted to have but never had the time to implement.

dtgriscom • yesterday at 9:12 PM

A friend had the power supply die on his high-end turntable. He took a picture of each side of the supply's PCB, handed it to Claude, and it gave him back a schematic.

variodot • today at 9:31 AM

For me, it was during an ongoing incident in a failing IoT OTA service that was growing in priority: taking two things I was unfamiliar with and bolting together a new OTA mechanism via an alternative SMS provider. I'd never developed in the .NET ecosystem before, and I happened to have gained access to another team's Twilio account the prior week, so I took a shot: planned the interfaces to extract, then implemented an alternative Twilio implementation plus a feature flag.

Normal software instincts plus access to a different service flushed the buildup of OTAs, and the result lives on as a fallback mechanism. It amazed me to go from idea to execution faster than I could ever have dreamed of even onboarding myself to the area or environment.

jmpman • today at 3:34 AM

Had an AI plot movies' Rotten Tomatoes reviews versus the cost of 2 adult tickets, plus candy and large popcorn prices from the specific theater and the round-trip gas from my cross street, including only movies that would get out in time for me to be home by 10pm, accounting for preview times.

None of that is mind blowing, but that Google or some other site has never offered me this type of analytics, is where I'm floored. It's a trivial query, but perfectly useful for planning a night out with my wife.

imetatroll • today at 11:58 AM

Maybe my daily work is rather mundane compared to most people who frequent HN but I am able to create, think about, refine and then go through review cycles at least 2 or 3 times more quickly than I used to.

And software that I can imagine I might want to "make" or have at my fingertips is readily available even though I have a busy schedule with very little free time!

Also, I love feeling like a manager whose direct report actually does what I tell it to. Crazy good feeling.

Sobrino • yesterday at 10:10 PM

I worked in an AI (or well, ML) consultancy before the ChatGPT moment. I remember we had a project where we had to extract information from a large number of documents (country-wide, terabytes of PDFs of scans). We had to set up a pipeline that looked a bit like this.

Download PDF of scan -> Tesseract to get a text layer -> Clean it up with a language-specific BERT model -> Detect paragraphs of a certain type -> Look them up against a database we built with scored similar paragraphs -> Do recommendations.

The documents were not standard and a lot of them were historical documents and handwritten or with scratched out text with corrections.

We had student workers spending days labeling the data.

It took us months to get it all working with a high accuracy. We were so proud.

Now you can do it all with a prompt and a ChatGPT call.

sothatsit • yesterday at 11:59 PM

I gave GPT-4 some source code and my existing tests, and asked it to write a new test, and it did it! It didn’t even run straight away, I had to fix it, but it still blew my mind.

Later, I wrote a ~5k line proxy for work in C, and gave the whole thing to ChatGPT o1 and asked it to review it. It found several real memory bugs, and now that service has been running since with no problems.

Just this week, I was trying to write a greedy solver to pick the best subset of block sizes to keep from a larger sweep for shorter testing. Opus 4.8 suggested that this could actually be solved as a MILP problem, and found the perfect solution in 5 mins. I’d never even heard of MILP before.

block_dagger • yesterday at 8:38 PM

I wanted to add gapless playback to an audio archive website I maintain. I tried myself before any of the popular LLMs were available. I failed. I then tried with the first LLMs that came out. They failed. Then, when the first Claude Opus was released, it succeeded. I now have gapless playback.

qnleigh • today at 10:26 AM

They've been coming faster and faster for me. First I was blown away by GPT2, specifically the fake news article about talking unicorns. Just stringing together a few sentences while maintaining logical coherence was very impressive at the time.

Then it was models like Minerva that could actually solve math problems, and the discovery that LLMs were one-shot learners and could write code.

After that, the improvement felt pretty steady, with IMO gold feeling like a watershed moment.

And recently OpenAI's solution to the planar unit distance problem is starting to actually freak me out a bit.

bluejay2387 • yesterday at 7:53 PM

I had a locally hosted model write its own semantic search system that indexed 250,000 documentation and code files, and then write a fully functioning mod for one of the games I play based on that documentation. I hadn't been able to get that mod to work after 2 weeks of my own effort; the model did all of this in under 4 hours (including a 25-minute indexing process). This freaked me out enough that I then had it write a CLI-based activity and TODO tracker and integrate that tool into its coding process to track all of its activities, in about another 2 hours. I am still emotionally recovering from this day. I have since replaced the semantic search system with an open source option (though I used it for a few months), but I still use the activity tracker for both coding projects and myself.

oidar • yesterday at 7:58 PM

Opus 4.6. My standard battery of questions included solving an ASCII maze (20x20 grid) without using a script, using only "thinking" as a tool. It was the first model to be able to solve it. It was the first model that really appeared to be able to reason spatially.

grumblepeet • today at 6:58 AM

My bath hot tap suddenly broke apart and was spilling hot water into the bath. I photographed everything and ChatGPT told me step by step what bits to get to fix it, and how to reassemble it.

A few weeks later some kids in the area were bending the wiper arms in cars in my terraced street, including my car. I thought, I wonder if ChatGPT can help? It explained to me where to get the parts online, an indication of a decent price, and how to fit the replacement parts.

At work we had struggled with filling out the myriad forms needed to get enrolled on a government framework to apply for contracts. Not only did it do that and explain what we needed to say, but it also told us in detail the steps we needed to follow to get the certification that was a prerequisite. It has genuinely transformed our business as a result.

dash2 • today at 7:22 AM

I asked it to prove the theoretical result in a (published, prize-winning - though not really for the theory) academic paper of mine. The proofs hadn’t been that hard objectively, but they’d taken at least a week. I fed it the model. It got the correct basic results in about 5 minutes.

KaiserPro • yesterday at 8:24 PM

I've had a few.

The biggest technical one was when we were making an all-day wearable AI assistant thing. It basically had really precise office location (think cm-level accurate), a shitty VLM to describe what the wide-angle lens was looking at, speech-to-text, OCR, and a gaze recorder that described what you were looking at.

This was all streamed to sqlite. The thing that was really "oh shit" was the thing that made the whole system usable: a 4-paragraph prompt that turned natural language into SQL and reported back to the (non-technical) user what they wanted to know.
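
A rough sketch of that shape, for anyone who hasn't built one. Everything here is an assumption for illustration: the `observations` schema, the prompt wording, and especially `stub_model`, which stands in for a real LLM call and just returns a canned query. It is not the system described above.

```python
import sqlite3

# Hypothetical schema; the real device's tables are not described in the thread.
SCHEMA = """
CREATE TABLE observations (
    ts      TEXT,  -- ISO-8601 timestamp
    room    TEXT,  -- coarse location label
    kind    TEXT,  -- 'speech', 'ocr', 'gaze' or 'vlm'
    content TEXT   -- text produced by that sensor pipeline
);
"""


def build_prompt(question):
    """Give the model the schema plus the user's question."""
    return (
        "Translate the question into one SQLite SELECT statement.\n"
        f"Schema:\n{SCHEMA}\n"
        f"Question: {question}\nSQL:"
    )


def stub_model(prompt):
    """Placeholder for an LLM call; returns a fixed query for the demo."""
    return (
        "SELECT content FROM observations "
        "WHERE kind = 'ocr' AND room = 'kitchen' ORDER BY ts"
    )


def answer(conn, question, model=stub_model):
    sql = model(build_prompt(question)).strip()
    # Crude guard only: a real system would need proper sandboxing.
    if not sql.lower().startswith("select"):
        raise ValueError("refusing to run a non-SELECT statement")
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01T09:00:00", "kitchen", "ocr", "Oat milk"),
        ("2024-01-01T09:05:00", "desk", "ocr", "Quarterly report"),
        ("2024-01-01T09:10:00", "kitchen", "speech", "Good morning"),
        ("2024-01-01T09:20:00", "kitchen", "ocr", "Coffee beans"),
    ],
)
results = answer(conn, "What text did I read in the kitchen?")
print(results)
```

The interesting part in practice is the prompt and the model behind it; the plumbing around it really is this small.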

The most recent one is being caught out by a GenAI video of a gymnast. I worked in VFX so I am normally able to spot dodgy shit, but this one was close to being real, scarily real.

steren • yesterday at 7:47 PM

The moment when I ran llama on my old gaming PC (using something called ChatGPT4All) was my "oh shit" moment: I was now talking... to my PC.

synthc • yesterday at 10:17 PM

I gave it a weird and convoluted code snippet, and asked an LLM to step through the execution and trace the value of the variables at each step.

It was completely correct and I realized LLMs are capable of generalizing beyond their training sets.

HlessClaudesman • today at 6:35 AM

I was sitting in a cafe listening to a podcast where I heard about a sci-fi author banging out 40+ books per year. How are they doing that? I thought. Either a team of ghost writers, a boat load of cocaine, or they are using AI.

So I decided to test the frontier of AI; this was back in the early ChatGPT era. I downloaded the app and proceeded to go through all the steps of writing a novel: outline, summary of characters, plot summary, draft chapters, finalised chapters. I had an unedited manuscript by the time I was thinking about my 2nd coffee. It was a terrible novel, but it did have flashes of brilliance that could be harvested and iteratively shaped into something better.

I proved my thesis that AI could mass produce fiction at scale, and if I had a boat load of cocaine the AI and I could probably output 40 books per week.

ben_w • today at 6:28 AM

I had a lot of such moments, including:

• Most recently, I had the option of either buying an app from the app store to train myself on the piano, or vibe coding a web app to connect with an attached MIDI keyboard, accept an uploaded MIDI file, and give me an experience like Guitar Hero. Claude did this in two prompts on their free (not paid subscription) tier, where the second prompt was just the word "continue".

• First demo of InstructGPT (predecessor to ChatGPT), because I remember how much worse the state of the art in NLP had been, and because I hadn't expected instruction following from the quality of continuation seen in GPT-3.x

• 2019, "This Person Does Not Exist"

• 2016, seeing style transfer and similar working (https://github.com/awentzonline/image-analogies) and what would now be called Deep Fakes (back when Two Minute Papers videos were <2 minutes long: https://www.youtube.com/watch?v=_S1lyQbbJM4)

• 2015, when I (in retrospect, foolishly) believed Tesla about their over-the-air software update that introduced self-driving: https://www.popsci.com/tesla-cars-become-autonomous-overnigh...

• 2013, word2vec, "man" - "woman" ~= "king" - "queen", again because of knowing how bad the state of the art in NLP had been

(If you're wondering why "uh oh" from that, consider value in automating propaganda, and surveillance opportunities for automating comprehension of slang/cants like Polari).

• 2010, seeing the demo video of Word Lens: https://www.youtube.com/watch?v=h2OfQdYrHRs
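
For anyone who never saw the word2vec arithmetic from the 2013 item, here is a toy sketch of the idea. The 3-d vectors are hand-made for illustration (axes loosely meaning person, female, royal); they are not real word2vec output, which has hundreds of learned dimensions.

```python
import math

# Hand-made toy embeddings, NOT trained vectors.
vectors = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "apple": [0.0, 0.3, 0.1],
}


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(a, b, c):
    """Word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = add(sub(vectors[a], vectors[b]), vectors[c])
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# The two difference vectors line up, which is the whole trick.
print(sub(vectors["man"], vectors["woman"]))
print(sub(vectors["king"], vectors["queen"]))
print(analogy("king", "man", "woman"))
```

In the toy data the offsets match exactly because they were built that way; the surprise in 2013 was that offsets learned from raw text came out approximately parallel.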

moconnor • yesterday at 7:46 PM

Literally the very first time I used ChatGPT. I had already been experimenting with GPT-3 for various jokes and games via the API, but the naturalness of it as a chat interface that understood you changed everything.

The first time I used a terminal agent was another one.

jasondigitized • yesterday at 9:20 PM

First time using Claude Code I was rather impressed by how quickly I was able to build out a website with Vue and Supabase. Cool. So... I always wanted to create an iOS app but knew nothing about Objective-C or Swift or Xcode. "I wonder if Claude Code can build an iOS app for me?"

I went from 0-to-1 and shipped a podcast player into the App Store in 2 weeks. Not a simulated app in Xcode... literally a fully approved app on the App Store. Claude Code walked me through everything from installing Xcode to running a final audit on the app so I wouldn't get flagged during review. Mind blown.
