

Iterator patterns in Go - LamaOfRuin
http://ewencp.org/blog/golang-iterators/

======
szopa
The iterator pattern I have in
[https://github.com/ryszard/goskiplist](https://github.com/ryszard/goskiplist)
uses an interface similar to

    
    
      type Iterator interface {
          // Next returns true if the iterator contains subsequent elements
          // and advances its state to the next element if that is possible.
          Next() (ok bool)
          // Value returns the current value.
          Value() interface{}
      }
    

which you then use like this:

    
    
      iter := s.Iterator()
      for iter.Next() {
          fmt.Printf("%s\n", iter.Value())
      }
    

I think this is a pretty good pattern: it's concise, looks less ugly than the
callback and closure based iterators, and it's not using anything heavyweight
like channels, so there should be no significant performance degradation.

Out of curiosity, I added this pattern to OP's test code as StatefulIterator
([https://github.com/ewencp/golang-iterators-
benchmark/pull/1](https://github.com/ewencp/golang-iterators-
benchmark/pull/1)), in 2 versions: one in which the iterator is a struct, and
one in which the iterator is an interface. The results look like this (the 4
last rows are my additions):

    
    
      BenchmarkIntsCallbackIterator	         500	   3478962 ns/op
      BenchmarkDataCallbackIterator	         500	   4437885 ns/op
      BenchmarkIntsChannelIterator	          10	 186095017 ns/op
      BenchmarkDataChannelIterator	          10	 188627451 ns/op
      BenchmarkIntsBufferedChannelIterator	  20	  90941766 ns/op
      BenchmarkDataBufferedChannelIterator	  20	  90175550 ns/op
      BenchmarkIntsClosureIterator	         500	   5580257 ns/op
      BenchmarkDataClosureIterator	         500	   6183472 ns/op
      BenchmarkIntStatefulIterator	        1000	   2653265 ns/op
      BenchmarkDataStatefulIterator	         500	   3582051 ns/op
      BenchmarkIntStatefulIteratorInterface	 200	   7516778 ns/op
      BenchmarkDataStatefulIteratorInterface 200	   8250702 ns/op
    
      ok  	_/Users/ryszard/Projects/golang-iterators-benchmark	33.717s
    

As you can see, the struct version is the fastest of the lot: 2653265 ns for
ints and 3582051 ns structs (versus the callback version with 3478962 ns and
4437885 ns, respectively). The interface version is slower, but still in the
same ballpark as the callback and closure based ones, while being much more
readable (but that's only my personal opinion). The not-so-stellar performance
of interfaces is a bit disappointing, but that is a known issue.

------
zemo
Not a single solution has error handling. There actually exists a very good
iterator pattern in the Mongo driver mgo. The idea is that you have some
function Next, and Next takes a pointer. You pass a pointer into Next, and the
iterator fills that address with some data, returning a boolean indicating
success. Then you just put that in a for loop to loop over the whole thing.
The whole thing, with error handling, looks like this:

    
    
        var v someType
        for iter.Next(&v) {
            // do something with v here
        }
    
        if err := iter.Err(); err != nil {
            // the iterator had an error, handle that here.
        }
    

[http://godoc.org/labix.org/v2/mgo#Iter.Next](http://godoc.org/labix.org/v2/mgo#Iter.Next)

~~~
ewencp
For the vast majority of cases, I would expect panic/recover to work for
iteration over in-memory data structures. For something like a database
iterator where the possible errors are more varied and likely, I agree an
explicit iterator object is better. I probably should have defined the scope
of problems I was thinking about better.

------
laureny
Iterators should really disappear and only be used when they are really
needed. Just take a look at all the examples in the article and you will see
that the iterators are noise and unnecessary. Most of the time, all you want
is getting all the values in a collection, why would you even need to read the
word "iterator" in that code?

Modern languages should all support some form of foreach, e.g. Java:

    
    
        for (Customer c : customers) { ... } // no iterator
    

C#, Scala and many others get this right as well. Go doesn't.

And, this:

    
    
       close(ch) // Remember to close or the loop never ends!
    

Seriously?

~~~
ewencp
You need them because the examples are intentionally simplified. Even a data
structure as simple as a dictionary has a useful variety of iterators: over
keys, over values, and in some languages, over (key,value) tuples. Explicit
iterators are really only noise for the "natural" iterator, i.e. in order
iteration over an array.

I agree that having it hidden by a language feature is nice, but somewhere
there is an iterator -- the class that supports that iteration must at least
specify how it's accomplished. I would prefer if there were an easy way to
integrate with the range keyword in Go.

EDIT: Adding to reply to some added comments. The close() one in particular
really threw me. It makes sense in terms of how channels work, but is really
unfortunate. When I first saw the channel-based approach, I thought it was a
neat solution. But then I found all the problems with it (including the best
example I could find of it online missing that close(), which is why I pointed
it out!).

I think Go is close to getting it right if they would do what I mention at the
end of the article: provide a way to hook into the range keyword.

~~~
laureny
> Explicit iterators are really only noise for the "natural" iterator, i.e. in
> order iteration over an array.

Not really, they are really noise in a lot of situations. Right now, I can
only think of one case where they are necessary: lazy traversal. Either the
collection is huge or each element is expensive to materialize, so you want to
do that one at a time. Iterators are perfect for this.

They are unnecessary in all other cases. For example, for your (key, value)
example:

    
    
        for (Entry<String, String> entry: map.entrySet()) { ... }
    

Go exposes too many implementation details to my taste.

~~~
ewencp
Fair enough. My day-to-day coding with Go right now involves large data sets
where materializing something like that would be expensive. I'll admit my view
of the issue is strongly biased by my very real need for iterators or
something equivalent that can avoid that cost!

~~~
aaronblohowiak
in the callback approach, you can pass in the current item and a Breaker or
maybe a RewindBreaker that you can easily _ when you don't care about it.

------
Laremere
It's very important when looking at the channel method to realise that
goroutines are NOT garbage collected. If there is any chance of the loop
calling break or panicking, there is a chance of leaking goroutines. Since you
might not know how your code is going to be used, this is an issue. A possible
solution is to use two channels, the items channel and a fail channel.
Whenever your iterator sends, it needs to use a select which either sends the
item on the channel, or receives from a fail channel (in which case it exits
the function.) I think while the channel method looks nice because of Go's
special treatment of it's basic types (for better or worse,) it's not
practically the best answer you should be using in your code.

~~~
ewencp
I mentioned this pitfall, but didn't show how to solve it. I haven't seen a
complete example yet and didn't take the time to work out the details yet. I'm
less concerned about complexity in the iterator implementation, but handling
this correctly _also_ seemed to require changing the caller code since it
needs to send something on the fail channel. Then the onus is on the caller to
remember to do so, which is just as bad as having to remember not to call
break.

Of course, whether this is a real issue or not also depends on whether you're
writing internal or external APIs. There are lots of other ways to generate
uncollectible garbage too...

~~~
cmccabe
Use "defer close(ch)". If there is a break or a panic, the channel will be
closed.

~~~
Laremere
If there is a break or panic in the iterator, not the loop using the iterator.
If the loop using the iterator closes the channel, the program will crash.

~~~
cmccabe
The sender is supposed to close the channel to signify to the receiver that
there is no more data to be sent. The receiver is not supposed to close the
channel. (You could hack around this by recovering from the panic, but this is
bad style.)

[https://groups.google.com/forum/#!topic/golang-
nuts/RmfHSHE9...](https://groups.google.com/forum/#!topic/golang-
nuts/RmfHSHE9e30)

Note that there is no requirement that you close channels-- they aren't like
file descriptors. They will be garbage collected after they're no longer
needed.

goroutines, on the other hand, will stick around unless you clean them up.
goroutines aren't iterators, and you shouldn't use them as iterators. They're
more analogous to threads.

------
tux21b
The most obvious pattern, a simple for loop and a separate iterator type, is
imho missing. Something like "for iter.Next() { fmt.Println(iter.Value()) }"
is commonly used in Go and doesn't look that bad either...

~~~
enneff
A notable use of this pattern is bufio.Scanner:
[http://golang.org/pkg/bufio/#Scanner](http://golang.org/pkg/bufio/#Scanner)

An example, scanning lines from standard input:

    
    
        s := bufio.NewScanner(os.Stdin)
        for s.Scan() {
            fmt.Println(s.Text()) 
        }
        if err := s.Err(); err != nil {
            fmt.Fprintln(os.Stderr, "reading standard input:", err)
        }

------
i_have_to_speak
Seriously? Using a green thread is a pattern for iteration over an array? Or
that creating a new function object per iteration is efficient? The "patterns"
2 & 3 listed in the article are grossly inefficient to be considered suitable
to implement fixed or dynamic iterators.

~~~
thrownaway2424
Sometimes raw performance is not really called for. A lot of programmers are
looking for ways to, for example, iterate over lots of different kinds of
things that all provide the interface to do so, regardless of their actual
types or the details of their allocation.

Sure, if you're about to iterate over a terabyte of double-precision height
map data or something like that, please use an array.

~~~
i_have_to_speak
Without such advice, that article is dangerous. A "lot of programmers" are
going to read that and assume these patterns differ between them only in terms
of "performance" which equates to "ns/op".

Also, I wouldn't use an array to iterate over a terabyte of double-precision
height map -- have you tried that? I have, kind of. In a genome sequence
visualization application. Try it sometime. It's fun.

------
thrownaway2424
Only a minor point of the article, but clearly the equivalent of Pythonic for
Go would be Gothic, not "Go-ish".

~~~
ceautery
That's good. I was going to say "gauche", but that may be sending the wrong
message.

------
fredsanford
Does Go have generator functions? Yield?

I'm curious because that's what makes python the best language for me in a lot
of cases.

Iterators that don't load 100,000,000 rows at a time for a result set are...
useful. And that's what you get with the simple (iterator/generator func)/db
cursor pattern in python.

Something similar in Go would go a long way in helping me get over the fixed
braces position go enforces. :)

------
iamwil
"I wrote two versions of each. One directly iterates over a []int, the other
introduces a struct and uses a []*struct which it needs to extract the data
from. The goal was to hopefully avoid benefits from cache coherence"

Can someone explain to me that last sentence? What's cache coherence?

~~~
btilly
Cache coherence is when data that you need close in time is physically close
in memory, and therefore loading the first piece of data loads the second into
cache so that it will be quick to load. The performance difference between
fetching from CPU cache and main memory is about a factor of 10, so this is a
significant effect.

When you're writing performance critical code this is a good thing to have
happen, and it can be worth a lot of effort to make sure that you benefit as
often as possible. However general purpose code usually does not wind up with
data that is optimally organized in memory. But in a benchmark the usage
patterns are simple enough that it is trivial to get well-organized data in
memory without even intending to. Whether this is likely to be predictive of
real world performance depends on how carefully coded the real world program
is.

~~~
bad_user
Cache coherence can also be a problem for performance, leading to what is
called _false sharing_.

When reading some value in RAM, values in the same "cache line" (that are
right next to it) get cached. However, if another thread has to modify those
values that just got cached, then the cache needs to be invalidated. This
often happens in producer/consumer implementations.

If this is discovered to be a problem, it's often solved with what is called
_cache padding_. I.e. if you've got variables A and B that are right next to
each other and they are modified from different threads, then you add fake
variables (padding) between them that's the size of a cache-line (which is
between 32 bytes and 256 bytes, depending on architecture).

------
JulianMorrison
What's wrong with returning a populated slice? It's basically the same
operation as returning a pre-sized, pre-filled buffered channel. Then you can
"range" on the returned value.

The only downside is that you can't defer computation of values until they are
asked for.

~~~
yoklov
For one thing, that will use much more memory and take more time to compute.

~~~
JulianMorrison
Don't underestimate the time cost of coroutining (literally goroutining, and
using a channel, or to-and-fro with HasNext and Next). And the unfinished
generator is using memory, too. Code runs fastest when it can sit in cache, so
creating a slice in a tight loop and returning it is probably optimal if the
actual data you just materialized is small.

~~~
yoklov
Sure, it's better than the goroutine approach, but that doesn't say much. It
becomes quite suboptimal if you have a large amount of data. I also think that
you're likely underestimating the cost of the additional memory allocations.

------
qznc
If you like/use iterators, you should read about ranges [0].

[0]
[http://www.informit.com/articles/printerfriendly.aspx?p=1407...](http://www.informit.com/articles/printerfriendly.aspx?p=1407357&rll=1)

------
Ziomislaw
overengineering at its finest.

------
cmccabe
Iterator callbacks, generators, or functors feel very un-Go-like. It's trying
to force a Python or OCaml idiom on to Go. Basically, what you want is to
implement a HasNext and a Next function and write a "for" loop. All you need
is this:

    
    
      for ; iter.HasNext(); val = iter.Next() {
         // do stuff
      }
    

There's no reason to look down on the humble for loop. Complexity is a bad
thing, usually.

Channels and goroutines are also a very Go-like way to process data. But APIs
that return channels assume that the caller wants concurrency, so be sure that
that's true.

~~~
fauigerzigerk
Right, but you have to consider the complexity of the iterator implementation
as well. Every time I implement iterators for non-trivial data structures in
languages without generators I want to hit myself in the face for using that
language.

------
methehack
You gosters are planning on writing a lot of for loops, huh?

Y'all can't be just plain ignorant about what you can do in a language that
has the features to support a proper collection library... So what's up with
that? You like them for loops? Or does everything that's so awesome about go
make it worth the apparently complete crap state of collections?

I suspect all this has to do with a choice of greater type safety over
expressiveness, but I'm not sure. I know there's this big generics discussion
in the go community. The thing about generics is you sprinkle butt-ugly and
cross-eyed syntax on top of expressive and you get...expressive ugly cross
eyed butt syntax collections.

Help me out here -- I read these rave go reviews and then I see this kind of
thing and I think "well, it'd be better than Java -- maybe way better -- but
it feels like Java redux all over again reheated."

~~~
ewencp
I'm just learning Go, I'm certainly no evangelist. One huge difference for me,
however, doesn't have to do with the language. It's deployment. It's _really_
nice for deployment to be scp rather than a bunch of package installs, making
sure your classpaths are right, etc.

That said, one of the language features that drew me in is structural typing.
I'd yet to see a well supported statically typed language that used structural
typing, and it seemed like it would make software maintenance a lot easier. Of
course, tools are often a better solution than language features, so maybe
that's a bad reason to use a language.

