Hacker News

conartist6 yesterday at 3:14 PM

As it happens I have an even better API than this article proposes!

They propose just using an async iterator of UInt8Array. I almost like this idea, but it's not quite all the way there.

They propose this:

  type Stream<T> = {
    next(): Promise<{ done, value: UInt8Array<T> }>
  }
I propose this, which I call a stream iterator!

  type Stream<T> = {
    next(): { done, value: T } | Promise<{ done, value: T }>
  }
Obviously I'm gonna be biased, but I'm pretty sure my version is also objectively superior:

- I can easily make mine from theirs

- In theirs the conceptual "stream" is defined by an iterator of iterators, meaning you need a for loop of for loops to step through it. In mine it's just one iterator and it can be consumed with one for loop.

- I'm not limited to having only streams of integers; they are

- My way, if I define a sync transform over a sync input, the whole iteration can be sync making it possible to get and use the result in sync functions. This is huge as otherwise you have to write all the code twice: once with sync iterator and for loops and once with async iterators and for await loops.

- The problem with thrashing Promises when splitting input up into words goes away. With async iterators, creating two words means creating two promises. With stream iterators if you have the data available there's no need for promises at all, you just yield it.

- Stream iterators can help you manage concurrency, which is a huge thing that async iterators cannot do. Async iterators can't do this because if they see a promise they will always wait for it. That's the same as saying "if there is any concurrency, it will always be eliminated."
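For concreteness, here's a sketch in TypeScript of the shape I mean (the names `Step`, `fromArray`, `map`, and `collectSync` are mine, just for illustration):

```typescript
// The proposed "stream iterator": next() may return a result directly
// or a Promise of one, so async is paid for only when it happens.
type Step<T> = { done: boolean; value?: T };

type StreamIterator<T> = {
  next(): Step<T> | Promise<Step<T>>;
};

// A sync source allocates no Promises at all.
function fromArray<T>(items: T[]): StreamIterator<T> {
  let i = 0;
  return {
    next: () =>
      i < items.length ? { done: false, value: items[i++] } : { done: true },
  };
}

// A sync transform over a sync input stays fully sync...
function map<T, U>(src: StreamIterator<T>, f: (v: T) => U): StreamIterator<U> {
  return {
    next() {
      const step = src.next();
      // Go async only if the source actually produced a Promise.
      if (step instanceof Promise) {
        return step.then((s) =>
          s.done ? { done: true } : { done: false, value: f(s.value!) }
        );
      }
      return step.done ? { done: true } : { done: false, value: f(step.value!) };
    },
  };
}

// ...so it can be drained from a plain sync function, with one loop.
function collectSync<T>(src: StreamIterator<T>): T[] {
  const out: T[] = [];
  for (let step = src.next(); ; step = src.next()) {
    if (step instanceof Promise) throw new Error("unexpected async step");
    if (step.done) return out;
    out.push(step.value!);
  }
}

const doubled = collectSync(map(fromArray([1, 2, 3]), (x) => x * 2));
console.log(doubled); // [ 2, 4, 6 ]
```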


Replies

Joker_vD yesterday at 3:48 PM

> Obviously I'm gonna be biased, but I'm pretty sure my version is also objectively superior:

> - I can easily make mine from theirs

That... doesn't make it superior? On the contrary, theirs can't be easily made out of yours, except by either returning trivial 1-byte chunks, or by arbitrary buffering. So their proposal is a superior primitive.

On the whole, I/O-oriented iterators probably should return chunks of T, otherwise you get buffer bloat for free. The readv/writev syscalls were introduced for a reason, you know.
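To make the asymmetry concrete, a sketch (function names are mine): flattening chunks into bytes is a three-line generator, while going the other way forces you to invent a chunk size the protocol can't choose for you.

```typescript
// Easy direction: a chunked stream trivially becomes a per-byte stream.
async function* bytes(chunks: AsyncIterable<Uint8Array>): AsyncGenerator<number> {
  for await (const chunk of chunks) {
    yield* chunk; // Uint8Array is itself iterable over its bytes
  }
}

// Hard direction: re-chunking requires an arbitrary buffering policy.
async function* rechunk(
  src: AsyncIterable<number>,
  size: number
): AsyncGenerator<Uint8Array> {
  let buf: number[] = [];
  for await (const b of src) {
    buf.push(b);
    if (buf.length === size) {
      yield Uint8Array.from(buf);
      buf = [];
    }
  }
  if (buf.length > 0) yield Uint8Array.from(buf); // flush the remainder
}
```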

hinkley yesterday at 5:45 PM

I did a microbenchmark recently and found that on Node 24, awaiting a sync function is about 90 times slower than just calling it, if the function is trivial, which is often the case.

If you go back a few versions, that number goes up to around 105x. I don't recall now if I tested back to 14. There was an optimization to async handling in 16 that I recall breaking a few tests that depended on nextTick() behavior that stopped happening: the setup and execution steps started firing in the wrong order because a mock returned a number instead of a Promise.

I wonder if I still have that code somewhere…
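From memory, the shape of it was roughly this (a sketch, not the original code; the numbers depend heavily on runtime version, warmup, and repetition):

```typescript
// Trivial body, so call/await overhead dominates the measurement.
function work(x: number): number {
  return x + 1;
}

function syncLoop(n: number): number {
  let acc = 0;
  for (let i = 0; i < n; i++) acc = work(acc);
  return acc;
}

async function awaitedLoop(n: number): Promise<number> {
  let acc = 0;
  // Awaiting a plain (non-Promise) value still schedules a microtask
  // per iteration, which is where the slowdown comes from.
  for (let i = 0; i < n; i++) acc = await work(acc);
  return acc;
}

async function bench() {
  const N = 1_000_000;
  const t0 = performance.now();
  syncLoop(N);
  const t1 = performance.now();
  await awaitedLoop(N);
  const t2 = performance.now();
  console.log(`sync: ${(t1 - t0).toFixed(1)}ms, awaited: ${(t2 - t1).toFixed(1)}ms`);
}
bench();
```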

flowerbreeze yesterday at 4:08 PM

I think the more generic stream concept is interesting, but their proposal is based on different underlying assumptions.

From what it looks like, they want their streams to be compatible with AsyncIterator so they'd fit into the existing ecosystem of iterators.

And I believe the Uint8Array is there to match OS streams, as they tend to move batches of bytes without any knowledge of the data inside. It's probably not intended as an entirely new concept of a stream, but as something that C/C++, or any other language providing functionality for JS, can do underneath.

For example, my personal pet project of a graph database written in C has observers/observables that are similar to the AsyncIterator streams (except that one observable can be listened to by more than one observer), moving batches of Uint8Array (or rather a uint8_t* buffer with capacity/count), because it's one of the fastest and easiest things to do in C.

It'd be a lot more work to use anything other than uint8_t* batches for streaming data. What I mean by that is that any protocol that is aware of the type information would be built on top of the streams, rather than being part of the stream protocol itself, for this reason.
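As a sketch of what I mean by "built on top" (TypeScript, function name mine): the transport stays opaque Uint8Array batches, and a type-aware layer, here UTF-8 text decoding, wraps it without the stream protocol knowing anything about text:

```typescript
// A typed layer over a byte-batch transport. TextDecoder's stream mode
// handles multi-byte sequences that get split across chunk boundaries.
async function* decodeText(
  chunks: AsyncIterable<Uint8Array>
): AsyncGenerator<string> {
  const decoder = new TextDecoder();
  for await (const chunk of chunks) {
    yield decoder.decode(chunk, { stream: true });
  }
  const tail = decoder.decode(); // flush any buffered partial sequence
  if (tail) yield tail;
}
```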

paxys yesterday at 3:50 PM

There is no such thing as Uint8Array<T>. Uint8Array is a primitive for a bunch of bytes, because that is what data is in a stream.

Adding types on top of that isn't a protocol concern but an application-level one.

pgt yesterday at 4:19 PM

This is similar to how Clojure transducers are implemented: "give me the next thing plz." – https://clojure.org/reference/transducers
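For anyone unfamiliar, a minimal transducer sketch in TypeScript (names mine; Clojure's version generalises further, e.g. to early termination): a transducer transforms reducing functions, so the transform is independent of both source and sink and composes into one fused pass.

```typescript
type Reducer<Acc, T> = (acc: Acc, x: T) => Acc;
type Transducer<A, B> = <Acc>(rf: Reducer<Acc, B>) => Reducer<Acc, A>;

const map =
  <A, B>(f: (a: A) => B): Transducer<A, B> =>
  (rf) =>
  (acc, x) =>
    rf(acc, f(x));

const filter =
  <A>(pred: (a: A) => boolean): Transducer<A, A> =>
  (rf) =>
  (acc, x) =>
    pred(x) ? rf(acc, x) : acc;

const compose =
  <A, B, C>(t1: Transducer<A, B>, t2: Transducer<B, C>): Transducer<A, C> =>
  (rf) =>
    t1(t2(rf));

// One fused pass: no intermediate array between map and filter.
const xform = compose(map((x: number) => x * 2), filter((x: number) => x > 4));
const result = [1, 2, 3, 4].reduce(
  xform((acc: number[], x) => (acc.push(x), acc)),
  [] as number[]
);
console.log(result); // [ 6, 8 ]
```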

hinkley yesterday at 5:51 PM

I think the context that some other responders are missing is that in some functional languages, like Elixir, streams and iterators are used idiomatically to do staged transforms of data without necessitating accumulation at each step.

They are those languages' versions of goroutines, and JavaScript doesn't have one. Generators, sort of, but people don't use them much, and they don't compose them with each other.

So if we are going to fix Streams, an implementation that is tuned only for IO-bound workflows at the expense of transform workflows would be a lost opportunity.

lucideer yesterday at 5:36 PM

Other angles of critique & consideration are already covered well by sibling commenters. One extra consideration (unrelated to streams, more general) is the API design & dev UX/DX:

  type Stream<T> = {
    next(): { done, value: T } | Promise<{ done, value: T }>
  }
The above can effectively be discussed as a combination of the following:

  type Stream<T> = {
    next(): { done, value: T }
  }
  type Stream<T> = {
    next(): Promise<{ done, value: T }>
  }

You've covered the justifications for the 2nd signature, but it's a messy API. Specifically:

> My way, if I define a sync transform over a sync input, the whole iteration can be sync making it possible to get and use the result in sync functions. This is huge as otherwise you have to write all the code twice: once with sync iterator and for loops and once with async iterators and for await loops.

Writing all the code twice is cleaner in every implementation scenario I can envisage. It's very rare I want generalised flexibility on an API call - that leads to a lot of confusion & ambiguity when reading/reviewing code, & also when adding to/editing code. Any repetitiveness in handling both use-cases (separately) can easily be handled with well thought-out composition.
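For example (a sketch; names are mine), the sync and async variants stay separate APIs while sharing all of their substance through one composed per-item function:

```typescript
// The actual logic lives in one plain function...
const parseLine = (line: string): number => Number.parseInt(line, 10);

// ...and each variant is a thin, unambiguous wrapper around it.
function* parseAll(lines: Iterable<string>): Generator<number> {
  for (const line of lines) yield parseLine(line);
}

async function* parseAllAsync(
  lines: AsyncIterable<string>
): AsyncGenerator<number> {
  for await (const line of lines) yield parseLine(line);
}

console.log([...parseAll(["1", "2", "3"])]); // [ 1, 2, 3 ]
```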

paulddraper yesterday at 3:53 PM

Your idea is to flatten the UInt8Array into the stream.

While I understand the logic, that's a terrible idea.

* The overhead is massive. Now every 1KiB turns into 1024 objects. And terrible locality.

* Raw byte APIs (network, fs, etc.) fundamentally operate on byte arrays anyway.

In the most respectful way possible...this idea would only be appealing to someone who's not used to optimizing systems for efficiency.
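To put a number on the first point (a sketch; the sync case undercounts, since async iterators would also allocate a Promise per step):

```typescript
// Count how many iterator steps it takes to drain a source.
function countSteps<T>(it: Iterator<T>): number {
  let n = 0;
  // Every next() call allocates a { done, value } result object.
  while (!it.next().done) n++;
  return n;
}

const kib = new Uint8Array(1024);

// Chunked stream: one step carries the whole buffer.
const chunked = [kib][Symbol.iterator]();
// Flattened stream: 1024 steps, one result object per byte.
const flattened = kib[Symbol.iterator]();

console.log(countSteps(chunked), countSteps(flattened)); // 1 1024
```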

amelius yesterday at 8:08 PM

How do you send multiple sub-streams in parallel?

conartist6 yesterday at 3:29 PM

There's one more interesting consequence: you rid yourself of the feedback problem.

To see the problem, let's create a stream with feedback. Let's say we have an assembly line that produces muffins from ingredients, and the recipe says that every third muffin we produce must be mushed up and used as an ingredient for further muffins. This works OK until someone adds a final stage to the assembly line, which puts muffins in boxes of 12. Now the line gets completely stuck! It can't get a muffin to use at the start of the line because it hasn't made a full box of muffins yet, and it can't make a full box of muffins because it's starved for ingredients after 3.

If we're mandated to clump the items together we're implicitly assuming that there's no feedback, yet there's also no reason that feedback shouldn't be a first-class ability of streams.
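Here's a toy version of the muffin line in TypeScript (all names invented for illustration). At item granularity the feedback tap works; insert a mandatory box-of-12 batching stage before the tap and the same line stalls, because the tap sees nothing until a full box exists:

```typescript
// One-item-at-a-time line with a feedback edge: every third muffin is
// mushed and fed back to the start instead of moving forward.
function* muffinLine(ingredients: Iterable<string>): Generator<string> {
  const queue = [...ingredients];
  let made = 0;
  while (queue.length > 0) {
    const ingredient = queue.shift()!;
    const muffin = `muffin(${ingredient})`;
    made++;
    if (made % 3 === 0) {
      queue.push(`mushed ${muffin}`); // feedback: back to the line's start
    } else {
      yield muffin; // forward: toward boxing
    }
  }
}

const shipped = [...muffinLine(["flour", "eggs", "milk", "sugar"])];
console.log(shipped.length); // 4 muffins shipped; the third was fed back
```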

soulofmischief yesterday at 4:55 PM

In the language I've been working on for a couple months, Eidos, streams are achieved through iterators as well. It's dead simple. And lazy for loops are iterators, and there is piping syntax. This means you can do this (REPL code):

  >> fn double(iter: $iterator<i32>) {
    return *for x in iter { $yield( x * 2 )}
  }

  >> fn add_ten(iter: $iterator<i32>) {
    return *for x in iter { $yield( x + 10 )}
  }

  >> fn print_all(iter: $iterator<i32>) {
    for x in iter { $print( x )}
  }

  >> const source = *for x in [1, 2, 3] { $yield( x )}

  >> source |> double |> add_ten |> print_all
  12
  14
  16
You get backpressure for free, and the compiler can make intelligent decisions, such as automatic inlining, unrolling, kernel fusing, etc. depending on the type of iterators you're working with.