Hacker News

Monty: A minimal, secure Python interpreter written in Rust for use by AI

296 points by dmpetrov yesterday at 9:16 PM | 155 comments

Comments

simonw yesterday at 11:13 PM

I got a WebAssembly build of this working and fired up a web playground for trying it out: https://simonw.github.io/research/monty-wasm-pyodide/demo.ht...

It doesn't have class support yet!

But it doesn't matter, because LLMs that try to use a class will get an error message and rewrite their code to not use classes instead.

Notes on how I got the WASM build working here: https://simonwillison.net/2026/Feb/6/pydantic-monty/
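
A sketch of the kind of rewrite that feedback loop produces (illustrative only, not taken from the linked posts): the first attempt uses a class and fails, and the corrected version uses only plain functions and dicts, assuming those are within Monty's supported subset.

    # First attempt (hypothetical): fails if the interpreter rejects classes.
    #
    # class Point:
    #     def __init__(self, x, y):
    #         self.x, self.y = x, y
    #     def norm(self):
    #         return (self.x ** 2 + self.y ** 2) ** 0.5

    # Rewrite after seeing the error: same logic with dicts and plain functions.
    def make_point(x, y):
        return {"x": x, "y": y}

    def norm(p):
        return (p["x"] ** 2 + p["y"] ** 2) ** 0.5

    print(norm(make_point(3, 4)))  # 5.0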

show 8 replies
avaer yesterday at 11:36 PM

This feels like the time I was a Mercurial user before I moved to Git.

Everyone was using Git for reasons that seemed bandwagon-y to me, when Mercurial just had such a better UX and mental model.

Now, everyone is writing agent `exec`s in Python, when I think TypeScript/JS is far better suited for the job (it was always fast + secure, not to mention more reliable and information dense b/c of typing).

But I think I'm gonna lose this one too.

show 12 replies
imfing today at 1:00 AM

This is a really interesting take on the sandboxing problem. This reminds me of an experiment I worked on a while back (https://github.com/imfing/jsrun), which embedded V8 into Python to allow running JavaScript with tightly controlled access to the host environment. Similar in goal to run untrusted code in Python.

I’m especially curious about where the Pydantic team wants to take Monty. The minimal-interpreter approach feels like a good starting point for AI workloads, but the long tail of Python semantics is brutal. There is a trade-off between keeping the surface area small (for security and predictability) and providing sufficient language capabilities to handle the non-trivial snippets that LLMs generate to do complex tasks.

show 2 replies
hypertexthero today at 8:07 PM

Potentially unrelated tangent thought:

The Man Who Listens to Horses (1997) is an excellent book by Monty Roberts about learning the language of horses and observing and listening to animals: https://www.biblio.com/search.php?stage=1&title=The+Man+Who+...

Video demonstration of the above: https://www.youtube.com/watch?v=vYtTz9GtAT4

zahlman yesterday at 9:27 AM

> Instead, it let's you run safely run Python code written by an LLM embedded in your agent, with startup times measured in single digit microseconds not hundreds of milliseconds.

Perhaps if the interpreter is in turn embedded in the executable and runs in-process, but even a do-nothing `uv` invocation takes ~10ms on my system.
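
For a rough sense of the gap being described, a quick and unscientific measurement (assuming python3 and uv are on PATH):

    import subprocess, time

    # Time a do-nothing interpreter launch as a subprocess, for comparison with
    # an embedded, in-process interpreter whose startup is measured in microseconds.
    for cmd in (["python3", "-c", "pass"], ["uv", "run", "python", "-c", "pass"]):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        print(f"{' '.join(cmd)}: {(time.perf_counter() - start) * 1000:.1f} ms")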

I like the idea of a minimal implementation like this, though. I hadn't even considered it from an AI sandboxing perspective; I just liked the idea of a stdlib-less alternative upon which better-thought-out "core" libraries could be stacked, with less disk footprint.

Have to say I didn't expect it to come out of Pydantic.

show 2 replies
kodablah yesterday at 2:43 PM

I'm of the mind that it will be better to construct more strict/structured languages for AI use than to reuse existing ones.

My reasoning is 1) AIs can comprehend specs easily, especially if simple, 2) it is only valuable to "meet developers where they are" if really needing the developers' history/experience which I'd argue LLMs don't need as much (or only need because lang is so flexible/loose), and 3) human languages were developed to provide extreme human subjectivity which is way too much wiggle-room/flexibility (and is why people have to keep writing projects like these to reduce it).

We should be writing languages that are super-strict by default (e.g. down to the literal ordering/alphabetizing of constructs, exact spacing expectations) and only having opt-in loose modes for humans and tooling to format. I admit I am toying w/ such a lang myself, but in general we can ask more of AI code generations than we can of ourselves.

show 1 reply
the_harpia_io today at 1:18 PM

the papercut argument jstanley made is valid but there's a flip side - when you're running AI-generated code at scale, every capability you give it is also a capability that malicious prompts can exploit. the real question isn't whether restrictions slow down the model (they do), it's whether the alternative - full CPython with file I/O, network access, subprocess - is something you can safely give to code written by a language model that someone else is prompting.

that said, the class restriction feels weird. classes aren't the security boundary. file access, network, imports - that's where the risk is. restricting classes just forces the model to write uglier code for no security gain. would be curious if the restrictions map to an actual threat model or if it's more of a "start minimal and add features" approach.

show 1 reply
matheus-rr today at 2:42 PM

Interesting trade-off: build a minimal interpreter that's "good enough" for AI-generated code rather than trying to match CPython feature-for-feature.

The security angle is probably the most compelling part. Running arbitrary AI-generated Python in a full CPython runtime is asking for trouble — the attack surface is enormous. Stripping it down to a minimal subset at least constrains what the generated code can do.

The bet here seems to be that AI-generated code can be nudged to use a restricted subset through error feedback loops, which honestly seems reasonable for most tool-use scenarios. You don't need metaclasses and dynamic imports to parse JSON or make API calls.
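
The feedback loop itself is simple in shape. A minimal sketch follows; this is not Monty's API, and exec with trimmed builtins is not a security boundary, it only illustrates how an error message gets handed back to the model.

    # Illustration of the error-feedback idea only: run a snippet, return either
    # "ok" or the error text that an agent loop would feed back to the model.
    def run_snippet(code: str) -> str:
        allowed = {"len": len, "range": range, "sum": sum, "print": print}
        try:
            exec(compile(code, "<agent>", "exec"), {"__builtins__": allowed})
            return "ok"
        except Exception as exc:
            return f"error: {type(exc).__name__}: {exc}"

    # An import attempt fails, and the error text becomes the next prompt's feedback.
    print(run_snippet("import os\nprint(os.getcwd())"))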

c2xlZXB5 today at 12:43 AM

Maybe a dumb question, but couldn't you use seccomp to limit/deny the amount of syscalls the Python interpreter has access to? For example, if you don't want it messing with your host filesystem, you could just deny it from using any filesystem related system calls? What is the benefit of using a completely separate interpreter?
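
For reference, the seccomp route looks roughly like this with libseccomp's Python bindings (an untested sketch; check the bindings' docs for exact names, and note the filter applies to the whole process once loaded):

    import errno
    import seccomp  # libseccomp's Python bindings, installed separately

    # Allow everything by default, but make file-opening syscalls fail with EPERM.
    f = seccomp.SyscallFilter(defaction=seccomp.ALLOW)
    for name in ("open", "openat", "creat"):
        f.add_rule(seccomp.ERRNO(errno.EPERM), name)
    f.load()

    open("/etc/hostname")  # now raises PermissionError

Whether that's preferable to a separate interpreter mostly comes down to portability (seccomp is Linux-only) and how fine-grained the policy needs to be.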

show 2 replies
iandanforth today at 2:33 PM

Totally reasonable project for many reasons but fast tools for AI always makes me chuckle. Imagine your job is delivering packages and along the delivery route one of your coworkers is a literal glacier. It doesn't really matter how fast you walk, run, bike, or drive. If part of your delivery chain tops out at 30 meters per day you're going to have a slow delivery service. The ratio between the speed of code execution and AI "thinking" is worse than this analogy.

wiradikusuma today at 5:40 PM

"To run code written by agents" vs "What Monty cannot do: Use the standard library, ..., Use third party libraries."

But most real-world code needs to use (standard or third-party) libraries, no? Or is this for AI's own feedback loop?

JoshPurtell today at 1:01 AM

Monty is the missing link that's made me ship my rust-based RLM implementation - and I'm certain it'll come in handy in plenty of other contexts.

Just beware of panics!

show 1 reply
krick today at 12:43 AM

I don't quite understand the purpose. Yes, it's clearly stated, but what do you mean "a reasonable subset of Python code" while "cannot use the standard library"? 99.9% of the Python I write for anything ever uses the standard library and then some (requests?). What do you expect your LLM agent to write without that? A pseudo-code sorting algorithm sketch? Why would you even want to run that?

show 2 replies
theanonymousone today at 8:42 AM

I wish someone would command their agent to write a Python "compiler" targeting WASM. I'm quite surprised there is still no such thing in this day and age...

show 1 reply
vghaisas today at 9:33 AM

This is very cool, but I'm having some trouble understanding the use cases.

Is this mostly just for codemode where the MCP calls instead go through a Monty function call? Is it to do some quick maths or pre/post-processing to answer queries? Or maybe to implement CaMeL?

It feels like the power of terminal agents is partly because they can access the network/filesystem, and so sandboxed containers are a natural extension?

SafeDusk today at 1:46 AM

Sandboxing is going to be of growing interest as more agents go “code mode”.

Will explore this for https://toolkami.com/, which allows plug and play advanced “code mode” for AI agents.

_joel yesterday at 11:25 PM

Well I love the name, so definitely trying this out later, but first...

And now for something completely different.

geysersam today at 1:41 AM

Is AI running regular Python really a problem? I see that in principle there is an issue. But in practice I don't know anyone who's had security issues from this. Have you?

show 1 reply
andai today at 1:20 PM

Doesn't the agent already have bash though?

My current security model is to give it a separate Linux user.

So it can blow itself up and... I think that's about it?

ontouchstart today at 12:34 PM

I wonder when the title will be upgraded to “A minimal, secure Rust interpreter written in Python for use by AI”.

Any human or AI want to take the challenge?

show 1 reply
bigcat12345678 today at 4:24 AM

It seems that AI is finally giving true pure-blood systems software the space to unleash its potential.

Pretty much all modern software tooling, once you remove the parts that aim to appeal to humans, becomes a much more reliable tool. But it's not clear whether the performance will be better or not.

throwa356262 today at 9:29 AM

I really like this!

Claude Code always resorts to running small Python scripts to test ideas when it gets stuck.

Something like this would mean I don't need to approve every single experiment it performs.

show 1 reply
dmpetrov yesterday at 9:29 PM

I like the idea a lot but it's still unclear from the docs what the hard security boundary is once you start calling LLMs - can it avoid "breaking out" into the host env in practice?

Retr0id today at 12:52 AM

I'm enjoying watching the battle for where to draw the sandbox boundaries (and I don't have any answers, either!)

show 1 reply
tucnak today at 6:30 PM

I really like this for CodeAct, but like with other similar tools it's unclear how to implement data pipelining to leverage, like, lockstep batching to remote providers, or paged-attention-like optimisations. Basically, let's say I want to run an agent for every row in a table; I would probably want to batch most calls...

It's something, I think, missing from smolagents ecosystem anyway!

wewewedxfgdf today at 3:21 AM

If I say my code is secure, does that make it secure?

Or is all Rust code unquestionably secure?

show 1 reply
nudpiedo today at 9:55 AM

Serious question: why not JUST use SELinux on generated scripts?

The scripts would have access to the original runtime and ecosystem, the policy can't be tampered with, it's well tested, and no amount of forks or tricky indirection gets around the syscall restrictions.

Custom runtimes like this come with a bill of technical debt: no support, their own documentation to learn, and a missing ecosystem and feature set. And let's hope it isn't abandoned in two years.

The same could be said for Docker, NixOS, isolated containers, etc… the level of security only needs to be good enough for LLMs, not necessarily secure against human-directed (specialist hacker) threats.

falcor84 yesterday at 11:57 PM

Wow, a startup latency of 0.06ms

saberience today at 11:20 AM

I actually have no idea why this is needed. I want my models to have access to full libraries/SDKs/APIs; that's when they become actually useful.

I also want my models to be able to write typescript, python, c# etc, or any language and run it.

Having the model have access to a completely minimal version of python just seems like a waste of time.

globular-toast today at 7:29 AM

I don't get what "the complexity of a sandbox" is. You don't have to use Docker. I've been running agents in bubblewrap sandboxes since they first came out.[0]

If the agent can only use the Python interpreter you choose then you could just sandbox regular Python, assuming you trust the agent. But I don't trust any of them because they've probably been vibe coded, so I'll continue to just sandbox the agent using bubblewrap.

[0] https://blog.gpkb.org/posts/ai-agent-sandbox/
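
For anyone who hasn't tried bubblewrap, the setup being described is small. A minimal sketch of launching CPython under bwrap from Python (flag names are from bwrap(1); bind mounts may need adjusting for your distro's layout):

    import subprocess

    # Read-only system directories, private /tmp, no network, nothing from $HOME.
    cmd = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--unshare-all",
        "--die-with-parent",
        "python3", "-c", "print('hello from the sandbox')",
    ]
    subprocess.run(cmd, check=True)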

OutOfHere yesterday at 11:38 PM

It is absurd for any user to use a half-baked Python interpreter, especially one that will always lag far behind CPython in what it supports. I advise sandboxing CPython with OS features instead.

show 3 replies
buntha today at 10:15 AM

[dead]

spacedatum today at 2:58 AM

There is no reason to continue writing Python in 2026. Tell Claude to write Rust a priori. Your future self will thank you.

show 1 reply
rienbdj yesterday at 11:36 PM

If we’re going to have LLMs write the code, why not something more performant? Like pages and pages of Java maybe?

show 1 reply