Hi HN! We're excited to share marimo pair [1] [2], a toolkit that drops AI agents into a running marimo notebook [3] session. This lets agents use marimo as working memory and a reactive Python runtime, while also making it easy for humans and agents to collaborate on computational research and data work.
GitHub repo: https://github.com/marimo-team/marimo-pair
Demo: https://www.youtube.com/watch?v=6uaqtchDnoc
marimo pair is implemented as an agent skill. Connect your agent of choice to a running notebook with:
/marimo-pair pair with me on my_notebook.py
The agent can do anything a human can do with marimo and more. For example, it can obtain feedback by running code in an ephemeral scratchpad (inspect variables, run code against the program state, read outputs). If it wants to persist state, the agent can add cells, delete them, and install packages (marimo records these actions in the associated notebook, which is just a Python file). The agent can even manipulate marimo's user interface — for fun, try asking your agent to greet you from within a pair session.
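To make "just a Python file" concrete, a notebook the agent has been editing is stored roughly like this (the cell contents and version string below are illustrative, not output from a real session):

```python
import marimo

__generated_with = "0.0.0"  # version string here is a placeholder
app = marimo.App()

@app.cell
def _():
    import polars as pl  # an agent-installed package shows up as a normal import
    return (pl,)

@app.cell
def _(pl):
    # Each cell is a function of the variables it reads,
    # returning the variables it defines.
    df = pl.read_csv("data.csv")
    return (df,)

if __name__ == "__main__":
    app.run()
```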
The agent effects all actions by running Python code in the marimo kernel. Under the hood, the marimo pair skill explains how to discover and create marimo sessions, and how to control them using a semi-private interface we call code mode.
Code mode lets models treat marimo as a REPL that extends their context windows, similar to recursive language models (RLMs). But unlike traditional REPLs, the marimo "REPL" incrementally builds a reproducible Python program, because marimo notebooks are dataflow graphs with well-defined execution semantics. As it uses code mode, the agent is kept on track by marimo's guardrails, which include the elimination of hidden state: run a cell and dependent cells are run automatically, delete a cell and its variables are scrubbed from memory.
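The reactive semantics can be sketched in plain Python. This is a toy illustration of the guardrails described above (cells form a dataflow graph, dependents rerun automatically, deleted cells leave no hidden state), not how marimo actually implements its graph:

```python
class ReactiveGraph:
    """Toy dataflow notebook: cells declare what they define and use."""

    def __init__(self):
        self.cells = {}   # cell_id -> (defines, uses, fn)
        self.state = {}   # variable name -> current value

    def add_cell(self, cell_id, defines, uses, fn):
        self.cells[cell_id] = (defines, uses, fn)
        self.run(cell_id)

    def run(self, cell_id):
        defines, uses, fn = self.cells[cell_id]
        inputs = {name: self.state[name] for name in uses}
        self.state.update(zip(defines, fn(**inputs)))
        # Reactivity: rerun every cell that reads a variable this cell defines.
        for other_id, (_, other_uses, _) in list(self.cells.items()):
            if other_id != cell_id and set(defines) & set(other_uses):
                self.run(other_id)

    def delete_cell(self, cell_id):
        # No hidden state: a deleted cell's variables are scrubbed from memory.
        defines, _, _ = self.cells.pop(cell_id)
        for name in defines:
            self.state.pop(name, None)

g = ReactiveGraph()
g.add_cell("a", defines=["x"], uses=[], fn=lambda: (1,))
g.add_cell("b", defines=["y"], uses=["x"], fn=lambda x: (x + 1,))
g.add_cell("a", defines=["x"], uses=[], fn=lambda: (10,))  # editing "a" reruns "b"
g.delete_cell("b")  # "y" is scrubbed; only "x" remains in state
```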
By giving models full control over a stateful reactive programming environment, rather than a collection of ephemeral scripts, marimo pair makes agents active participants in research and data work. In our early experimentation [4], we've found that marimo pair accelerates data exploration, makes it easy to steer agents while testing research hypotheses, and can serve as a backend for RLMs, yielding a notebook as an executable trace of how the model answered a query. We even use marimo pair to find and fix bugs in itself and marimo [5]. In these examples the notebook is not only a computational substrate but also a canvas for collaboration between humans and agents, and an executable, literate artifact composed of prose, code, and visuals.
marimo pair is early and experimental. We would love your thoughts.
[1] https://github.com/marimo-team/marimo-pair
[2] https://marimo.io/blog/marimo-pair
[3] https://github.com/marimo-team/marimo
[4] https://www.youtube.com/watch?v=VKvjPJeNRPk
[5] https://github.com/manzt/dotfiles/blob/main/.claude/skills/m...
I do programming as a side project, and Marimo has been a huge unlock for me. Part of it has been watching the videos, which are both updates about the software and little examples of how to think about data science. Marimo also helps curate useful Python stuff to try.
Starting to use AI in Marimo, I was able to both 'learn polars' for speed and create a custom AnyWidget, so I could build a UI I'd imagined that wouldn't work with the standard UI features.
Giving an LLM more context will be fab for me. Now if I could just teach Claude that this really is a 'graph' and it can't ever reassign a variable. It's a gotcha of Marimo vs. plain Python. Worth the hassle for the interactivity, but it makes me feel a bit like I'm writing C and the compiler is telling me I need a semicolon at the end of the line. I've made that error so many times.....
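For context, the gotcha above comes from marimo requiring each top-level variable to be defined in exactly one cell, so the dataflow graph stays unambiguous. A rough sketch of that kind of check using the `ast` module (illustrative only, not marimo's actual code):

```python
import ast
from collections import defaultdict

def duplicate_definitions(cells):
    """Map each variable name to the cells that assign it, keeping only duplicates."""
    defined_in = defaultdict(list)
    for cell_id, source in cells.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Assign):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        defined_in[target.id].append(cell_id)
    return {name: ids for name, ids in defined_in.items() if len(ids) > 1}

# Reassigning `df` in a second cell is exactly the error described above.
cells = {"cell1": "df = load()", "cell2": "df = df.filter(...)"}
print(duplicate_definitions(cells))  # -> {'df': ['cell1', 'cell2']}
```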
I love the idea of maintaining consistency! Artifacts will make it possible to keep the thread and reproduce results when working in a team. Love it. If a cell takes a long time to compute (a large dataset, say), how does the agent behave? Does it wait or keep going?
Looks nice! I built a persistent IPython kernel that your agent can operate through CLI commands, which goes in a similar direction, though without all the Marimo niceties: https://github.com/oegedijk/agentnb
One of the authors here, happy to answer questions.
Building pair has been a different kind of engineering for me. Code mode is not a versioned API. Its consumer is a model, not a program. The contract is between a runtime and something that reads docs and reasons about what it finds.
We've changed the surface several times without migrating the skill. The model picks up new instructions and discovers its capabilities within a session, and figures out the rest.
Very cool!
We’ve been exploring a similar direction too, but with a plain REPL and a much thinner tool surface. In our case, it’s basically one tool for sending input, with interrupts and restarts handled through that same path. Marimo seems to expose much richer notebook structure and notebook-manipulation semantics, which is a pretty different point in the design space.
It seems like the tradeoff is between keeping the interaction model simple and the context small, versus introducing notebook structure earlier so the model works toward an artifact at the same time it iterates and explores. Curious how you think about that balance.
Thank you for this!
I am a big fan of Marimo and was trying to use it as my agent's "REPL" a while back, because it's naturally so good at describing its own current state and structure. It made me think it would be a better state-preserving environment for the agent to work in. I'm very excited to play with this.
Looks cool. I love notebooks.
I built something similar with just plain cli agent harnesses for Jupyter a while back.
It supports Codex subscriptions and pi (it used to support Claude subs, and might still work since I didn't modify the system prompt).
It has some bugs and needs some work, but getting help and code changes inline in Jupyter is way better than copy-pasting hard-to-select text from cells and cell output all day.
The idea of an agent having actual working memory inside a live notebook session rather than just firing off ephemeral scripts is genuinely clever — this feels like a much more natural way for humans and models to collaborate.
I built https://github.com/danieltanfh95/replsh to pair with local Python sessions without additional dependencies, letting LLMs ground their investigation and coding directly against local repos and environments. Docker is now supported as well, and SSH support is coming in the near future.
Genuinely cool. As a cool side-effect you could use notebooks to store your prompts and never lose a prompt again.
This rules. I just closed out a bunch of data science work I was doing on the Medicaid dataset thanks to this. Very timely, zero bugs.
Well done Trevor and team!