Hacker News

ogig today at 12:17 PM

Setting up fuzzing used to be hard. I haven't tried it yet, but my bet is that having Claude Code, today, analyze a codebase and suggest where and how to fuzztest it, then review the crashes and iterate, will produce CVEs.
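
For anyone who hasn't seen the loop being described, here is a minimal sketch of "fuzz it, collect the crashes, hand them to a reviewer". Everything in it is an assumption for illustration: `parse_record` is a made-up target with a planted bug, and `fuzz` is a naive random-input loop, not a coverage-guided fuzzer like the ones you would use on a real codebase.

```python
import random
import traceback


def parse_record(data: bytes) -> tuple[int, bytes]:
    """Hypothetical target: a length-prefixed record parser with a planted bug."""
    if not data:
        raise ValueError("empty input")  # documented, expected rejection
    length = data[0]
    payload = data[1:1 + length]
    # Planted bug: reads the byte after the payload without a bounds check.
    checksum = data[1 + length]
    return checksum, payload


def fuzz(target, iterations: int = 2000, seed: int = 0) -> list[dict]:
    """Feed random byte strings to `target` and record unexpected exceptions."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    crashes = []
    for _ in range(iterations):
        size = rng.randrange(0, 16)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            continue  # treated as a clean rejection, not a crash
        except Exception as exc:
            # Keep enough detail for a human (or a model) to triage later.
            crashes.append({
                "input": data.hex(),
                "error": type(exc).__name__,
                "trace": traceback.format_exc(),
            })
    return crashes


crashes = fuzz(parse_record)
print(f"{len(crashes)} crashing inputs found")
```

The crash records (input, exception type, traceback) are the part you would pass along for review before iterating on the harness.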


Replies

rsync today at 7:07 PM

"... having Claude Code, today, analyze a codebase and suggest where and how to fuzztest it ..."

I recently directed ChatGPT, through the web interface, to create a Firefox extension to obfuscate certain HTTP queries and was denied/rebuffed because:

"... (the) system is designed to draw a line between privacy protection and active evasion of safeguards."

Why would this same system empower fuzzing of a binary (or other resource), and why would it allow me to work toward generating an exploit?

Do users just keep rephrasing the directive until the model acquiesces? Or does the API not have the same training wheels as the web interface?

xfalcox today at 8:27 PM

Our CEO did that at our company and found 33 CVEs. Rails also did that and found 7 or 8.

zer00eyz today at 4:56 PM

It has access to more testing data than I will ever look at. Letting it pull from that knowledge graph is going to give you good results! I just built a chunk of this (type of thinking) into my (now evolving) test harness.

1. Unit testing is (somewhat) dead, long live simulation. Testing the parts only gets you so far. Simulation tests are far more durable, independent artifacts (read: if you went from JS to Rust, how much of your testing would carry over?)

2. Testing has to be "stand alone". I want to be able to run it from the command line, and I want the output to be wrapped so I can shove it on a web page or dump it into an API (for AI).

3. Messages (for failures) matter. These are not just a simple "what's broken"; they must contain enough info for context.

4. Your "failed" tests should include logs. Do you have enough breadcrumbs for production? If not, this is a problem that will bite you later.

5. Any case should be an accumulation of state and behavior - this really matters in simulation.
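
To make points 2 through 5 concrete, here is a rough sketch of what one such result record could look like. This is not my actual harness: `StepResult`, `ListHandler`, `run_simulation`, and the shopping-cart steps are all made-up names for illustration, using only the Python standard library.

```python
import copy
import json
import logging
from dataclasses import dataclass, field, asdict


@dataclass
class StepResult:
    name: str
    passed: bool
    message: str = ""                           # point 3: context, not just "failed"
    state: dict = field(default_factory=dict)   # point 5: accumulated state
    logs: list = field(default_factory=list)    # point 4: breadcrumbs


class ListHandler(logging.Handler):
    """Collects formatted log lines into a plain list."""

    def __init__(self, sink: list):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        self.sink.append(self.format(record))


def run_simulation(steps):
    """Run steps in order against one shared state dict; snapshot after each."""
    state, results = {}, []
    logger = logging.getLogger("sim")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep step logs out of the root logger
    for name, step in steps:
        logs = []
        handler = ListHandler(logs)
        logger.addHandler(handler)
        try:
            step(state, logger)
            results.append(StepResult(name, True, state=copy.deepcopy(state), logs=logs))
        except AssertionError as exc:
            results.append(StepResult(name, False, str(exc), copy.deepcopy(state), logs))
        finally:
            logger.removeHandler(handler)
    return results


def create_cart(state, log):
    state["cart"] = []
    log.info("cart created")


def add_item(state, log):
    state["cart"].append({"sku": "A1", "qty": 2})
    log.info("added sku=A1 qty=2")


def check_total(state, log):
    total = sum(item["qty"] for item in state["cart"])
    log.info("computed total=%d", total)
    assert total == 3, f"expected 3 items after add_item, got {total}; cart={state['cart']}"


steps = [("create_cart", create_cart), ("add_item", add_item), ("check_total", check_total)]
report = [asdict(r) for r in run_simulation(steps)]
print(json.dumps(report, indent=2))  # point 2: output any web page or API can consume
```

The failing step carries its own message, the state it saw, and the log lines leading up to it, so the JSON can be read cold by someone (or something) with no knowledge of the code base.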

If you have done all the above right and your tool can return all the data, dumping the output into the cheapest model you can find and having it "Write a prompt with recommendations on a fix (not actual code, just what should be done beyond 'fix this')" has been illuminating.
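
A sketch of that hand-off, under assumptions: `build_triage_prompt` and the `sample_failure` record are hypothetical, and the actual call to a model is left out on purpose since it depends on whichever provider you use. It only shows the failure record being wrapped in the instruction.

```python
import json


def build_triage_prompt(failure: dict) -> str:
    """Wrap one failure record in instructions asking for a plan, not a patch."""
    return (
        "A test failed. Using only the record below, write a prompt with "
        "recommendations on a fix (not actual code, just what should be done "
        "beyond 'fix this').\n\n"
        "Failure record (JSON):\n"
        + json.dumps(failure, indent=2, sort_keys=True)
    )


# Hypothetical failure record; the field names are illustrative only.
sample_failure = {
    "name": "check_total",
    "passed": False,
    "message": "expected 3 items after add_item, got 2",
    "state": {"cart": [{"sku": "A1", "qty": 2}]},
    "logs": ["computed total=2"],
}

prompt = build_triage_prompt(sample_failure)
print(prompt)  # send this string to whatever model you are using
```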

Ultimately I realized that how I thought about testing was wrong. Its output should be either dead simple, or have enough information that someone with zero knowledge could ramp up into a fix on their first day in the code base. My testing was never this good because the "cost of doing it that way" was always too high... this is no longer the case.