Hacker News new | past | comments | ask | show | jobs | submit login
Introducing a new data structure, streams, in Javascript (streamjs.org)
273 points by gtklocker on Sept 11, 2011 | hide | past | web | favorite | 82 comments



As someone who conjured this exact thing up in another language after reading SICP, I have to suggest that the first two commenters here missed the point: this is a very basic but powerful structure that implements a lazy stream and is nearly a monad (many standard options are written for you but could be generalized).

Want it to apply to data I/O? Stick your reader into the HEAD function, store the pointer to the next byte in TAIL, and implement functions for reading X number of bytes and converting them into their new types.

Yes, you can do math with it. You can also implement process/procedure into it, if you consider yourself popping lambdas off the stream, and which lambda is Next can be guided by choices in the current lambda.

Mostly, you can use this device to hold any collection of things and operate upon them in a streamy-syntax kind of way. For those who are familiar with jQuery, it is a bit of a reimplementation of a lot of jQuery's basic array-data handling with extra helpers like "range" tossed in and a stronger focus on the laziness. Where jQuery's prime target is work with the DOM, this stream package's target is the more theoretical version of some of jQuery's features, specifically those that iterate over the contained arrays. You could wrap jQuery in this and get an amazing hybrid.

Don't sell it short; it really is another useful and powerful way to approach data management. Check out how you construct your own stream with this package, and think beyond the basic numeric approaches.


It's hard to escape reimplementing streams after reading SICP, because Lazy evaluation is the red pill:

http://weblog.raganwald.com/2007/02/haskell-ruby-and-infinit...


As for "appending infinite lists" I'd like to suggest Control.Monad.Omega: http://hackage.haskell.org/package/control-monad-omega

"A monad for enumerating sets: like the list monad, but impervious to infinite descent.

A depth-first search of a data structure can fail to give a full traversal if it has an infinitely deep path. Likewise, a breadth-first search of a data structure can fall short if it has an infinitely branching node. Omega addresses this problem by using a "diagonal" traversal that gracefully dissolves such data."


> As someone who conjured this exact thing up in another language after reading SICP

Me too! http://jsfiddle.net/skilldrick/vT2mC/

I didn't get very far with it though - the combined issues of JavaScript's lack of tail-call optimisation and its verbose lambda syntax made building streams to painful.

This looks like an interesting approach, though you'll never get around JS's heavy lambdas (unless you use CoffeeScript or something).


I just want to quote something you said for the sheer awesome implications of it. "You can also implement process/procedure into it, if you consider yourself popping lambdas off the stream, and which lambda is Next can be guided by choices in the current lambda." This. Is. Powerful.


This comment thread reminds me a bit of "The Emperor's new clothes". I think the code for sieve (even though I understood what it did) is way too unnecessarily confusing and I would not want to see it in a project I'm on.

However, I do appreciate the usefulness of lazy evaluation. Still, I think this documentation could benefit by not making outlandish claims ("it will change the way you think") and just saying the author has implemented lazy evaluation in javascript (and maybe link to the wikipedia article on lazy evaluation and this stack overflow to explain why it's useful: http://stackoverflow.com/questions/2151226/what-are-the-adva...)


If your project team has people comfortable with functional programming - there's absolutely nothing wrong with that code.

As far as changing the way you think, it certainly came as a shock for me when I encountered it years ago in SICP and I'm still enjoying the benefits of lazy sequences in Clojure and ClojureScript.


I was first introduced to the concept of 'streams' akin to this in SuperCollider, which uses them as building blocks for musical patterns.

http://danielnouri.org/docs/SuperColliderHelp/Streams-Patter...

http://danielnouri.org/docs/SuperColliderHelp/Streams-Patter...

SuperCollider is a fun language to study. You end up with compositional structures like the following (notice 'inf', which indicates an infinite loop)

    Pbind(
      \octave, 4,
      \degree, PstepNadd(
            Pseq([1, 2, 3]), 
            Pseq([0, -2, [1, 3], -5]), 
            Pshuf([1, 0, 3, 0], 2)
          ),
      \dur, PstepNadd(
            Pseq([1, 0, 0, 1], 2),
            Pshuf([1, 1, 2, 1], 2)
        ).loop * (1/8),
      \legato, Pn(Pshuf([0.2, 0.2, 0.2, 0.5, 0.5, 1.6, 1.4], 4), inf),
      \scale, #[0, 1, 3, 4, 5, 7, 8]
    ).play;


Um, 'new data structure'? You have to be kidding. This is implemented in practically every single functional programming language. Also, this was already done in JavaScript: https://github.com/Gozala/streamer and looks surprisingly similar to your library. As you can see, streams in JavaScript are nothing new.


Thanks for your feedback. You're quite right. I added some clarifications in the 'Tribute' section :)


First of all this is very nice. (I'm going to get a little negative shortly; I want to be clear that I'm not putting down the writer's efforts at all.)

Several commenters have compared these streams to Python generators. Yes, the two are similar. But there are are a couple of important things Python generators have, that these don't.

(1) Python generators, along with lists, tuples, files, net connections, etc., all support the standard iterator protocol. This means that, for many purposes, you don't need to know or care which kind of data source you are reading from. The itertools functions can be applied to all of them, etc. In contrast, these JS streams are dealt with via specialized member functions. You can't just take array-based code, pop in a stream, and expect it to work.

(2) The iterator returned by a Python generator is a wrapper around a single running computation, which can spit out multiple values. Retrieving a new value is not, at heart, a function call, but rather the continued execution of a coroutine that has already started. This has many advantages; for example, we can write a Python generator that produces an infinite dataset, using a simple loop. I don't think that works here.

Now, as for (1) above, I think this is something we could do in JS. It would just be a matter of everyone agreeing to support a particular interface.

OTOH, it seems to me that JS does not have the functionality to support (2). Python does. So do Ruby, Haskell, and Go -- each in its own way. But JS does not, AFAIK. Am I wrong? And should we be building support for (2) into JS and other languages?

-----

BTW, I find these streams to be pedagogically interesting. Unlike, say, Haskell lists, the thunks in the data structure can be stored explicitly. It's a nice exercise in the actual implementation of a lazy sequence.


Hi ggchappell. First of all, thanks for your feedback. You're quite right about (2); but there's a new Javascript version coming out soon that proposes generators and 'yield' similar to how Python uses them. When it comes out and is supported by most browsers, we can probably implement that (or just use native syntax and get rid of stream.js altogether).

As for (1), I'd be happy to hear a proposal. If you'd like to work on this, feel free to submit an issue or even a pull request on github — suggestions like this are very welcome. Thanks! :)


Well, I'm certainly no JS expert. I doubt I'm the guy to figure out what to do here, at this point.

In any case, are you sure that JS generators should be considered a "future" thing? I just did a bit of research, and it sounds like you are talking about this proposal:

http://wiki.ecmascript.org/doku.php?id=proposals:iterators_a...

According to the following page, much/all of that proposal was included in JS 1.7, which was implemented in Firefox 2.0.

https://developer.mozilla.org/en/New_in_javascript_1.7

And FF 2.0 was released in October 2006. A quick check shows that my FF installation (3.6.22) supports generator functions, and the "yield" keyword, just fine, as long as I say

  <script type="text/javascript;version=1.7">
Regardless, I'm a little out of my depth here.


Here's a project that simulates the "yield" keyword with current version of JavaScript: https://bitbucket.org/balpha/lyfe/wiki/Home


Thanks for this link, I've been looking for something like that.

Generators like in Python allow for nice AI scripting with explicite time-sharing - each AI actor is generator function, that in main loop has yield statement.

That way it all works in one thread, and ai code can be written straightforward, like each actor had it's own thread.


am i the only one here who finds the whole "omfg, this is so amazing!!1" tone of this article mildly offensive? i mean, it's not like he has a self-levitating hamster or anything.


In his defense, the first time I really understood Sequences after reading The Joy of Clojure, I thought it was awesome too.

Now here's where I really blow your brain: a Sequence is some stuff, in a particular order. What about the keyUp event? If I were to listen to that event, it might give me ['T','e','s','t'].

That too, is some stuff, in order. This means, that Events are Sequences. Which means, that all the cool stuff that can be done to Sequences, can be done to an Event too, which is exactly what the Reactive Extensions for JS from Microsoft does (disclaimer: I'm writing a book on this).

Haskell of course has this too, it's called the Continuation Monad.


Also see the paper "A Concurrent Window System" by Pike (89) which talks about representing keyboard and mouse events as items on a non-finite potentially-blocking channel. And Newsqueak almost certainly isn't the first.


I felt roughly the same way about it that I would feel about the frolicking of a self-levitating hamster: pretty silly, but not actually offensive.

Lazy lists are pretty neat, and if you make them a convenient part of a programming language and its software ecosystem, they can be useful. This JavaScript version, though, looks more like a clever toy at the moment. Maybe that will change eventually.


Yep. It's not magical, or new, or.. shrug.


Ah, but he does have this... "If you devote 10 minutes of your time to read this page, it may completely change the way you think about programming."


Unless, of course, you already knew about lazy sequences.

I can explain classes in 10 minutes and have the same effect. Assuming, of course, you've only coded FORTRAN.

In fact, you picked out the part that I mostly found mildly offensive. He's treating his readers like "less smart than him".


skrebbel, thank you for your feedback. I'm sorry it felt that way to you. In no way do I think my readers are less smart than me. I tried in the article to keep a tone that addresses my past self, before I learned about infinite lists in Haskell or streams in Scheme. In that sense, it was a revelation to me, which made me a better programmer. So I'm hoping that it may have introduced at least some of the people reading it to this different paradigm, which, to me, was enlightening.


"self-levitating hamster"

That's hilarious! I'm seriously fighting my social-urge to upvote...


Sounds like generators in python, but implemented in javascript. The head element and tail function was a interesting way to view them.

Generators a great way to do processing on a lot of data, in a very list-y way, without having to read all the data into memory first. It allows you to start seeing results straight away.


The advantage of "streams" is that it can be used to elegantly define infinite sequences recursively.

For example, you can define powers of two as:

   var powersOf2 = new Stream(1, function() {
     return powersOf2.scale(2);
   });
(untested) which means that the sequence of powers of 2 begins with 1 follows with the the same sequence scaled by 2!

The implementation in the linked article however seems to be flawed, since it doesn't store the node's tail after computing it. This means that to get each element of the sequence you have to compute all the elements before it (even though you already did in order to get to that element).

This works not only for numbers but for other infinite sequences such as expressions in a language. I recommend the book Higher Order Perl if you're interested in this.


MJD's Stream module from HOP can be found here: https://metacpan.org/module/HOP::Stream


Check out node-lazy, it does something similar and more nicely: https://github.com/pkrumins/node-lazy

    var Lazy = require('lazy');

    Lazy.range('1..') // infinite list of integers (see readme)
        .filter(function (n) {
            return n % 5 == 0
        })
        .take(5)
        .join(function (result) {
            console.log(result)
        })
    ;
Output:

    [ 5, 10, 15, 20, 25 ]


Hi pkrumins. Thanks for the feedback. I added a link to node-lazy in the 'Tribute' section of the stream.js tutorial. I like its syntax, and I'm sure many people would prefer it :)


Awesome!


It'd be great if the websites for JavaScript libraries like this one would actually include the library in the page, so that while I'm reading about it, I can open up the JavaScript debugging console and actually try out the library as I'm reading about it.


Thanks, that's a great idea! I made it load up the library and added a note in the tutorial that one can use their browser JS console to fiddle with it :)


Thank ye!

Here's another example you might want to include, if you're going for the whole Haskell angle. Not as elegant, but illustrates the concept:

  var fibs = new Stream(1, function() {
      return new Stream(1, function() {
          return fibs.zip(function(a, b) { return a + b; }, fibs.tail());
      });
  }); 

  fibs.take(10).print();


Great example, I've added it. Thank you! :)


It might be convenient to implement s.print(n) as shorthand for s.take(n).print(). :)


exogen — great idea. I've just added that feature. Thank you! :)


You can create lazy enumerables with linq.js as well.

http://neue.cc/reference.htm

    Enumerable
      .Range(0, 1000000000000)
      .Take(10)
      .ToArray()
Output:

    0,1,2,3,4,5,6,7,8,9


There is also "LINQ for Javascript", which is fully integrated in with JQuery and VS as well. http://linqjs.codeplex.com/

Streams seems to be a proper subset of the things you can do with linqjs. Sometimes MSFT can do cool things too, it just doesn't always get picked up :)


Actually, we're talking about the same project. I linked to its reference page for the examples.


Thank you for mentioning this—I had no idea it existed! Do you use lazy enumerables in your projects?

I can think of a few fun uses (such using them as creative replacements for looping constructs), but I'm struggling to find a problem that is most easily solvable with these.


Yes, I use them liberally in C# projects. In fact, I prefer to use them over the standard looping constructs because of the pipe-like characteristics. It's very useful.


buddydvd, linq looks quite nice. Thanks for mentioning it, I had no idea it existed. I added a link to it at the 'Tribute' section of stream.js in case someone wants to check it out or prefers its syntax instead :)


Does that require the obnoxious 100000000000, or could that be left blank?


It's not required, but you'd have to write your own generator function:

    Enumerable
      .Generate((function() {
        var i = 0;
        return function() {
          return i++;
        };
      })())
      .Take(10)
      .ToArray()
Output:

    0,1,2,3,4,5,6,7,8,9


This is nothing like a real byte stream, and is very misleading. The name implies that it's for data i/o, when in reality it's for mathematics.


Agree, i thought the name is awful as I expected to see something related to I/O and buffering of content. It looks like a handy list object and seeing "stream" everywhere was just blinding.

However, right at the bottom the etymology of the name is made clear:

> The name 'stream' is used in Scheme, a LISP dialect that supports these features.

So the name is from the FP community. Given the cultural issues, a name change seems unlikely...


Here's a helpful two-part Berkeley lecture on the topic: http://academicearth.org/lectures/streams-i


I didn't know that series of lectures, so thanks a hundred times. (Just spend the evening watching the first two hours and bought Harvey's book as well.)


It's also used in Ocaml and SML, as far as I remember.


Don't you think that a data stream is a special case of this kind f stream that just happens to have a bunch f idiosyncratic names in it's API? It's not like these streams are a completely different idea.


Maybe I don't get it, but what is the practical use-case for this? Infinite arrays of numbers - maybe useful for math or statistical applications, but are people using javascript for this over things like MATLAB, R, etc?

I can't think of anywhere to use this in the context of building applications and I certainly don't feel like it "completely changed the way I think about programming" at all.

Edit: So it looks like you can put more than just numbers into the streams -- still what does this do that Underscore.js doesn't?


Consider an indeterminate array holding all the input data or events your program will receive in the future.


Here's the Python equivalent, for what it's worth: http://www.trinhhaianh.com/stream.py/

I've been playing around with this in my current project, yet I'm not convinced this is always such a great model. It requires a lot of mental work to remember what types you are operating on at each state in the pipeline. That being said, I'm a big fan of LINQ, and I'd like to see this model applied more frequently to DB queries.


Doesn't "stream" already mean something like "unix pipe" or "queue"? What about calling them Sequences like Clojure does?


For a high-powered version of this written in CoffeeScript which works over arrays and lazy sequences (streams) look here https://github.com/swannodette/fun.coffee/blob/master/src/fu...

ClojureScript of course takes it even further.


"a new data structure" ... really?


For anyone who wasn't paying attention, according to Brendan's slides for TXJS[1][2], TC39 approved iterators and generators for ES, so soon all modern browsers will support them. (Spidermonkey has quietly supported them since Firefox 3—that is, summer 2008.)

1. http://brendaneich.com/2011/08/my-txjs-talk-twitter-remix/

2. http://news.ycombinator.com/item?id=2982256


I'm used to calling these "Generators", as in a "Generator Function". We could use Forth's defining words to build a pretty succinct version of this type of functionality. Generators will be called by executing their address ("xt" in Forth parlance), and will leave a flag on the stack indicating whether the generator had another element available. If this flag is true, the element produced will be underneath.

To start with, let's create a word called "count" which produces a generator which will count from 1 to infinity:

  : count ( -- 'gen )
  	noname create 0 , latestxt
  	does> ( -- value? flag )
  	dup @ 1+ dup >r swap ! r> true
  ;
Then we build another generator which slices the first N elements of another generator, so that we can do things without going into an infinite loop:

  : slice ( 'gen count -- 'gen )
  	noname create , , latestxt
  	does> ( -- value? flag )
  	dup @ 0= if drop false exit then
  	dup dup @ 1- swap !
  	cell+ @ execute
  ;
Finally, let's implement "map", so that we can apply a word to every element produced by the generator:

  : map ( 'gen 'func -- )
  	2>r
  	begin
  		i' execute
  		if   i execute
  		else 2rdrop exit
  		then
  	again
  ;
Now the Forth stack gives us a nice, fluent way of chaining these building blocks together. Let's start with that infinite counter, slice off the first 10 elements and print them out by mapping them to the number printing word ".":

  count 10 slice ' . map
And we get:

  1 2 3 4 5 6 7 8 9 10
Spiffy, hunh?


tl;dr; = It's an array with a few utility methods, but can also hold 'infinite' things like "all the positive even numbers".

Not terribly useful imho and certainly doesn't live up to the hyped intro.


Streams and other lazy infinite data structures can be very convenient, but this isn't really obvious at first glance. I recommend you check out some of the FP literature. A good place to start reading about the virtues of streams and lazy evaluation is "Why functional programming matters" (available at http://www.cs.utexas.edu/~shmat/courses/cs345/whyfp.pdf).


It's a good concept and all, the issue i've had with it for my own use is that the native Array functions are so fast in JavaScript (and function invocation so comparatively slow) that you really need a very specific use case for them to be worthwhile over simply performing your operations on the array directly like Underscore.js.


Hope this doesn't seem like threadjacking, but your README looked like a fun exercise, so I decided to avoid looking at the code and re-implement in Coffeescript. I wound up memoizing while I was at it.

https://github.com/MichaelBlume/coffeestream


Michael — that's very cool. I added a link to your library in the 'Tribute' section. Thanks for taking the time to port it :)


Awesome, thanks for the link, and for inspiring a happy afternoon's coding =)


Just curious -- and be honest -- the snippet at the end they ask you to decipher (and they say "Most programmers find it hard to understand unless they have a functional programming background, so don't feel bad if you don't get it immediately") did you have trouble with it?

For me, I thought I had the answer after maybe 30 seconds of reading it, but decided to "check my work" by walking through it again, and only then, a moment later, did I realize what it was actually doing with that recursive call.

I'm just curious about their "most programmers without a functional background" remark. So: did ya get it?


Yes, but it would have been significantly more baffling if the word "sieve" didn't hint at its behavior.

Still, I'm not sure that a functional background would have helped, or maybe my background is more functional than I think. I know some Python, and I read a few articles about functional programming a couple of years ago, but I don't think I could give you a clear definition of what functional programming is.


The name "sieve" gave it away for me before I'd even looked at the code, but I decided to make sure I was right. Then i looked at it and my mind was kind of blown but how neat and concise it was. This definitely makes me want to take a look at functional programming (just ordered "The Little Schemer")


I don't have a big experience with functional languages, but I got it quickly too. As written in the page, the name is too much of a giveaway.


Here are some key aspects of Streams:

- Streams are an alternative to generators (yield)

- Streams can be used to traverse recursive structures

- Streams implementation only require lambda functions with proper closure

- Node.js continuation passing style is a form of Stream

I've implemented streams using continuations. You can also see a bunch of related functions such as: print, map, filter, reduce, range, enumerate, comprehension, traverse, empty, cons, zip, xrange, generator2stream.

http://blog.vjeux.com/2011/javascript/stream-lazy-iteration-...

Tell me if you have any remark :)


I enjoy using the wonderful lazy list implementation here: https://github.com/pkrumins/node-lazy IMHO, it has abetter interface than streamjs


Embracing functional programming in Javascript is trending. All these dollars, underscores and streams will make for an interesting benchmark against ClojureScript as it matures.

In ClojureScript, streams or lazy lists are an integral language feature. Standalone javascript libraries can not provide a functional framework of the same calibre. You would naturally end up wanting to, e.g., run an underscore.js chain on a stream and that would require more work.


The guy just finished reading that chapter of SICP? ^_^


Hi schiptsov. Not far from the truth, actually! I just finished watching the online video lectures on the MIT OpenCourseWare page. I already knew about infinite lists and streams from Haskell, but their lectures inspired me to make a Javascript port. Their examples are quite nice too. Your comment made me think I should add a link to their book, so that's now included in the 'Tribute' section. Thanks! :)


I've written an article that talks about the same beast: http://blog.vjeux.com/2011/javascript/stream-lazy-iteration-...

However, I use Javascript functions as base instead of a Stream constructor.


I am sorry but I found the tone of the article a bit offensive. You are not coming out as playful or funny, but as arrogant. Drop the attitude mate, try the guitar if you want to be a rockstar. Apart from that, this is a nice example of how versatile the language can be.


Hi BlehBlehBleh. Thank you very much for the feedback. I'm sorry it came out as arrogant — not my intention at all. I tried to make the article enthusiastic in a way that captures the reader's attention to read a bit further and see the interesting things that are possible with this library. The language reflects my own enthusiasm I felt when I first learned about streams and lazy data structures.

Based on your suggestion, I've added a few clarifying comments, especially in the 'Tributes' section to show that this isn't my own brand new idea but is based on other people's work; maybe this will help. Thanks :)


I couldn't get past the 2nd paragraph. the guy's infomercial salesman writing style was hurting my head.


On the surface this looks like a port of Scala's Streams object to Javascript. Any difference?


Seems like they (whoever they is) should add this to Javascript by default.


Dilon, glad you liked stream.js. There's actually a plan to support Iterators and Generators in one of the upcoming versions of Javascript: https://developer.mozilla.org/en/JavaScript/Guide/Iterators_...


So how is this different from an array?


It is infinite, so you can for example have an stream of all the prime numbers (only those you actually use will be computed, of course; the point is some algorithms are easier to express using this abstraction).




Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: