Hacker News

Pushing and Pulling: Three reactivity algorithms

102 points | by frogulis | yesterday at 12:57 AM | 16 comments

Comments

crabmusket · today at 12:36 AM

Something I rarely see addressed in articles about reactivity systems is how the desired syntax/developer experience affects the algorithms.

For example, I think all the algorithms discussed in this article require knowing the graph topology up front. Even dynamic dependencies need to be known ahead of time, unless I'm misreading.

However, take Vue's reactivity for example. Here's a simple input and output:

    const input = ref(1);
    const output = computed(() => input.value + 1);
Without actually attempting to evaluate the closure that defines `output`, there's no way of knowing what it depends on.*

My working understanding of such systems, which I think are more or less similar to all the new "signals" libraries popping up in JS, is that they are... pull-push systems? First, the UI code (i.e. the HTML or JSX template functions) requests the value of an output. The reactivity system evaluates the graph (pull), recording dependencies as it goes.

Then, later, when an input changes, it can use the graph it built up to update (push) dirty states and work out which of the currently-live output nodes need to be re-evaluated.

*The only other approach would be to analyse the syntax of the source code itself to create that static dependency graph. Which I understand is what e.g. Svelte does.
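The pull-then-push mechanics above can be sketched in a few dozen lines. This is a toy illustration, not Vue's actual implementation (the `ref`/`computed` names merely mirror its API, and it is single-level: computeds don't notify their own dependents). Reading a value while a computation is evaluating records the dependency (pull); writing a value invalidates the recorded dependents (push):

```javascript
// The computation currently being evaluated, if any.
let activeEffect = null;

function ref(value) {
  const subscribers = new Set();
  return {
    get value() {
      // Pull phase: whoever is evaluating right now becomes a dependent.
      if (activeEffect) subscribers.add(activeEffect);
      return value;
    },
    set value(next) {
      value = next;
      // Push phase: mark recorded dependents dirty.
      subscribers.forEach((sub) => sub.invalidate());
    },
  };
}

function computed(fn) {
  let cached;
  let dirty = true;
  const node = {
    invalidate() { dirty = true; },
    get value() {
      if (dirty) {
        const prev = activeEffect;
        activeEffect = node; // dependencies are discovered during evaluation
        cached = fn();
        activeEffect = prev;
        dirty = false;
      }
      return cached;
    },
  };
  return node;
}
```

Note that the dependency graph is never declared up front: it is rebuilt as a side effect of each evaluation, which is exactly why the closure's dependencies don't need to be known ahead of time.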

MrJohz · yesterday at 8:31 PM

Oh, that's me! Feel free to ask me any questions.

There's some great discussion over on lobste.rs (https://lobste.rs/s/2zk3oe/pushing_pulling_three_reactivity), but I particularly recommend this link that someone posted there to a post that covers much of the same topic but in a lot more detail with reference to various existing libraries and tools and how they do things: https://lord.io/spreadsheets/

samsartor · yesterday at 11:31 PM

I've been working on a reactivity system for Rust over the past couple of years, which uses a lot of these ideas! It also tries to make random concurrent modification less of a pain, with transactional memory and CRDT stuff. And it gives you free undo/redo.

Still kind of WIP, but it isn't secret. People are welcome to check it out at https://gitlab.com/samsartor/hornpipe

ByteMe95 · today at 2:07 AM

I recommend checking out https://github.com/Point72/csp

RossBencina · today at 12:02 AM

When I first started working with dataflow computation I was fortunate to have a computer scientist point me in the direction of an introductory compiler textbook.

It's worth considering that the dataflow graph (as an abstract mathematical graph), the computation graph (the partial order of function execution required to compute the data), the traversal strategy, the runtime representation of the graph, the runtime data structure for the graph, and the runtime data structures for efficient reactive update are all separate but related aspects.

For instance, push and pull are both directed graphs. They have the same connectivity, but the direction of the arrows is reversed, and you can only efficiently traverse edges in the direction that you represent. A dataflow graph has edges pointing from sources to sinks; a data dependency graph has edges pointing from sinks to sources. [Side note: if a computation can produce multiple results, the data dependency graph and the computation dependency graph are not exactly the same thing and you need to be clear on the distinction, but I am assuming here single-output nodes.]

In a dataflow graph you want to evaluate the changed nodes prior to evaluating the downstream nodes that depend on them. As TFA states, this necessitates a postorder (children-first) traversal of the data dependency graph, starting at all terminal sinks and terminating at sources or already-visited nodes. You can use a sense-reversing "visited" flag on each node to avoid a reset pass. As noted in the article, this traversal need only be performed when the graph topology changes, and for a stable traversal order the topological sort can be cached in an array.

Needless to say, arrays are much faster to iterate over than any kind of pointer chasing. [Witness the rise of Entity-Component systems over OO models.] I suspect that there is a cut-over point where it is more efficient to iterate the entire array (perhaps with memoized results, or JIT compilation) than to perform a more surgical "update only what is downstream of the changes" approach.

Another approach is to assign all nodes a contiguous integer id and maintain a dirty-node bitmask where bit indices correspond to node ids. In addition, each source has a bitmask with a 1 for every downstream dependent node. When a source changes, bitwise-OR the source's downstream_dependents bitmask into the global dirty_nodes bitmask. To evaluate (not necessarily immediately), iterate in topological order, processing only the dirty nodes.
In any case, the point I'm trying to make is that the data structure that is best for building or manipulating the graph could very well be different from the data structure that is best for computing the desired results. There will be trade-offs to be made. For this reason alone it's best to keep the graph-theoretic properties and the implementation data structures separate in your head.

In my view the interesting requirements raised by the article are (1) lazy evaluation (e.g. of expensive or conditionally required data), which might be where control-flow graphs of basic blocks enter the story; and (2) dynamic reconfiguration during node evaluation. Some questions I'd be asking about dynamic reconfiguration are: what happens if you delete a node that has yet to be evaluated? Will new subgraphs be "patched in" to the existing graph (how exactly?), or are they always disconnected components that can be evaluated after the current graph traversal completes?