
The real way to generate a list of primes in Haskell - garrisonj
http://www.garrisonjensen.com/2015/05/13/haskell-programs-are-lies.html
======
jkarni
Funnily, there's a haskell-cafe thread[0], a GitHub issue, and _even a paper_
about this (and I think maybe reddit got involved too).

Anyhow, the title is kind of too much. At least, given the aforementioned
discussions, we're conflicted liars.

[0] [https://mail.haskell.org/pipermail/haskell-cafe/2015-April/119428.html](https://mail.haskell.org/pipermail/haskell-cafe/2015-April/119428.html)
[2] [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf)
[3] [https://github.com/haskell-infra/hl/pull/8](https://github.com/haskell-infra/hl/pull/8)

~~~
pervycreeper
I don't get it. I simply see the example as an elegant way of introducing the
power of the language. This kind of pedantry only drives curious people away.

~~~
chrisdone
Most people know it's not a real sieve of Eratosthenes and just a
demonstration of Haskell concepts but there will always be people excited to
educate others regardless of the context.

------
rifung
As much as I hate the title of this post, I think the author has a point.

I've also seen this same thing come up when comparing implementations of
quicksort in Haskell to that of other languages. They always show a short,
elegant implementation in Haskell, but the issue is that it's not really
quicksort as it doesn't do the sort in place.
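
For reference, the short version usually shown looks roughly like the sketch below. It is an illustration of the commonly quoted list-based definition, not code taken from any particular page. It partitions by building two new lists on every call, so nothing is sorted in place.

```haskell
-- List-based "quicksort": the first element is the pivot, and each
-- recursive call allocates two fresh lists instead of swapping in place.
quicksort :: Ord a => [a] -> [a]
quicksort []       = []
quicksort (p : xs) =
  quicksort [x | x <- xs, x < p] ++ [p] ++ quicksort [x | x <- xs, x >= p]
```

An in-place variant would need a mutable array (for example inside `ST`), which is exactly the part the short version leaves out.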

~~~
gohrt
> it doesn't do the sort in place.

That's one of the easiest ways to raise hackles among Haskellers -- they
believe that parallelizability, not in-place sort, is the defining
characteristic of quicksort.

------
ColinWright
This has always bothered me about most implementations of the Sieve of
Eratosthenes. Namely, what they produce isn't it.

If you have division operations, or "mod" operations, it's not the Sieve of
Eratosthenes, it's just a filter.

Not the same thing at all.
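
A minimal sketch of the kind of definition being discussed, assuming it matches the widely quoted two-line version (the name `primesNaive` is made up here). The `mod` test is what makes it a filter: every surviving candidate is trial-divided by each smaller prime in turn.

```haskell
-- Trial-division filter, often presented as a "sieve".
-- Each prime p adds one more `mod` test that all later candidates must pass.
primesNaive :: [Int]
primesNaive = sieve [2 ..]
  where
    sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve []       = []
```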

~~~
jamesrom
It is a sieve. No one claimed it was the Sieve of Eratosthenes.

~~~
ColinWright
Interestingly, if you look closely you'll notice that I never claimed that
this referenced code was the Sieve of Eratosthenes. It seems you just assumed
that I thought it was claiming to be, when in fact I know it isn't.

And that's the problem. I've found that when this code is presented people
often assume it's intended to be the Sieve of Eratosthenes, and nothing is
done to preempt or prevent that misconception. As observed elsewhere, there
are now several major threads, discussions, and even proper papers about this,
so people _are_ becoming aware of it.

I still meet programmers who think the version shown is intended to be the
Sieve of Eratosthenes. Fortunately I now have several on-line references to
point them at.

~~~
jamesrom
If you look closely I never claimed you thought it was...

~~~
ColinWright
OK, so let's try to be clear about this.

In my experience, that code, pretty much exactly as presented on that page,
has, in the past, been called the Sieve of Eratosthenes. You and I both know
that it isn't, but I, for certain, know that there have been occasions when it
has been claimed to be.

Currently, the version as presented on the Haskell page does not claim that it
is the Sieve of Eratosthenes. However, given its history, some people, and in
my opinion quite reasonably, will believe that it is, or is intended to be,
the Sieve of Eratosthenes.

In my opinion, the people who look after the Haskell page should know that,
and should have _something_ there to preempt that mistaken belief. To do so
would help to increase people's knowledge. If done well, I doubt that it would
hurt, and it might actually help to increase people's curiosity about Haskell.

_Edit: Just wanted to add that I've up-voted your comments, because they're
true, and they've made me re-evaluate a few things. Thanks._

------
btilly
Why is Haskell so slow at this?

As far as I can tell, my Perl implementation at
[http://www.perlmonks.org/?node_id=276112](http://www.perlmonks.org/?node_id=276112)
is doing something similar with similar amounts of laziness and no
optimizations built in. Yet I can produce the first 50,000 primes in the time
that this takes to produce the first 10,000. And nobody uses Perl for its
speed!

~~~
delluminatus
I'm not sure, maybe it's just the overhead of having so many function calls
and Set accesses? My guess is that the Haskell one could be made quite a bit
more efficient if you used a low-level mutable array.

To add another benchmarking data point, I have a simple sieve of Eratosthenes
written in Nim using an array that can generate 10,000 primes in less than a
millisecond.
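
As a rough illustration of the mutable-array idea in Haskell, here is a sketch using `runSTUArray` from the `array` package that ships with GHC. The function name `primesUpTo` is an assumption for this example, and no timing claims are made for it. Unlike the lazy versions, it needs an upper bound up front.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, assocs)

-- Bounded sieve of Eratosthenes over an unboxed mutable array.
-- Multiples are crossed off by stepping p at a time; no division is used.
primesUpTo :: Int -> [Int]
primesUpTo n
  | n < 2     = []
  | otherwise = [i | (i, True) <- assocs flags]
  where
    flags :: UArray Int Bool
    flags = runSTUArray $ do
      arr <- newArray (2, n) True
      forM_ (takeWhile (\p -> p * p <= n) [2 ..]) $ \p -> do
        stillPrime <- readArray arr p
        when stillPrime $
          forM_ [p * p, p * p + p .. n] $ \m ->
            writeArray arr m False
      return arr
```

The mutation stays inside `runSTUArray`, so `primesUpTo` is still a pure function from the caller's point of view.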

~~~
btilly
That is why I compared to an implementation in Perl that was likewise making
lots of excess function calls and storing things very inefficiently. This was
as close to apples to apples as I could get without putting much energy
forward.

Perl gets a lot faster if you sieve blocks at a time, using vec() to
manipulate bit arrays. And I'm not surprised that an actually efficient
language would be massively faster.

------
te
You can get another ~3x speedup by implementing the three lines of code
comprising the "simple wheel" described at bottom of page 8 of the referenced
O'Neill paper.
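
To show what a wheel does, here is a small sketch. It is not the code from the paper; the names `spin` and `wheelCandidates` and the choice of a 2-3 wheel are assumptions for illustration. The wheel only generates candidates: it skips every multiple of 2 and 3 by alternating gaps of 2 and 4, so a sieve fed from it has about one third as many numbers to examine.

```haskell
-- Walk a list of gaps, starting from n.
spin :: [Int] -> Int -> [Int]
spin (g : gs) n = n : spin gs (n + g)
spin []       n = [n]

-- 2 and 3, followed by every number greater than 3 that is coprime to 6.
-- These are candidates only; composites such as 25 still have to be sieved out.
wheelCandidates :: [Int]
wheelCandidates = 2 : 3 : spin (cycle [2, 4]) 5
```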

------
gohrt
This result was previously, and more famously, published by Melissa E. O’Neill
(Harvey Mudd College, Claremont, CA, U.S.A.; e-mail: oneill@acm.org) as "The
Genuine Sieve of Eratosthenes":

[https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf](https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf)

And is available on Hackage as:

[https://hackage.haskell.org/package/primes-0.2.1.0/docs/Data...](https://hackage.haskell.org/package/primes-0.2.1.0/docs/Data-Numbers-Primes.html)
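
A sketch in the spirit of the incremental, table-based sieve, assuming a `Data.Map` from each upcoming composite to the primes that divide it. This is an illustration, not the paper's exact code and not what the `primes` package does internally; the names `sieve` and `primes` are chosen for the example.

```haskell
import qualified Data.Map as Map

-- Incremental sieve: the table maps each upcoming composite to the primes
-- whose multiples land on it. No candidate is ever divided by anything.
sieve :: [Int] -> [Int]
sieve candidates = go candidates Map.empty
  where
    go [] _ = []
    go (x : xs) table =
      case Map.lookup x table of
        -- x is not a scheduled composite, so it is prime;
        -- its first multiple worth scheduling is x * x.
        Nothing -> x : go xs (Map.insert (x * x) [x] table)
        -- x is composite: move each of its primes on to its next multiple.
        Just ps -> go xs (foldr reinsert (Map.delete x table) ps)
          where
            reinsert p = Map.insertWith (++) (x + p) [p]

primes :: [Int]
primes = sieve [2 ..]
```

Each composite is touched once per distinct prime factor, which is the property that separates this from the trial-division filter.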

------
Chinjut
I agree that this is a more efficient way of generating primes in Haskell than
the typical Haskell 101 approach. However, I disagree with the idea that the
Haskell 101 approach does not deserve to be called an implementation of the
"Sieve of Eratosthenes".

The distinction is only this: when we have found a prime p and are eliminating
numbers accordingly, do we consider ourselves only to spend time directly
enumerating the multiples of p and crossing them off? Or do we consider
ourselves as running through the entire list and going "Ok, ok, cross, ok, ok,
cross, ok, ok, cross" (for, for example, p = 3), thus spending time traversing
through multiples and non-multiples alike? So to speak, do we jump from
"cross" to "cross", or do we walk along through the "ok"s in between?

In the former case, each new candidate is worked on only in proportion to its
number of prime factors; in the latter case, each new candidate is worked on
in proportion to all smaller primes. The former is the more efficient way of
generating primes; the latter is (essentially) the ubiquitous, naive approach.

But I don't think one can say the traditional understanding of the Sieve of
Eratosthenes draws a strong distinction between these two! Traditional
accounts would not explicate any difference between "Jump directly from
'cross' to 'cross' " and "Walk from 'cross' to 'cross', saying 'ok' to
everything in between". It's not a distinction anyone was traditionally worried
about. Eratosthenes certainly wasn't.

So I think both of these are deserving of the name "Sieve of Eratosthenes".
They're just different approaches to that sieve.

In either case, we say there are primes, to each prime we associate the set of
its multiples, we merge these sets into the set of composites, and close the
loop of our recursion by noting that the primes are to be the complement of
these composites. The difference is, in some sense, arising just from how we
represent and manipulate subsets of the naturals (as pertaining to the set of
multiples of each prime, as well as their merger into the totality of
composites): either as streams of increasing naturals [efficient], or as
streams of "In"s and "Out"s [less efficient].
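
The "streams of increasing naturals" reading can be sketched directly: composites are the merge of each prime's multiples, and primes are the complement. The helper names `minus`, `union` and `joinAll` are chosen for this example. `joinAll` assumes the head of each inner list is smaller than everything in the later lists, which holds here because the lists start at p * p for increasing p.

```haskell
-- Elements of the first increasing list that are not in the second.
minus :: Ord a => [a] -> [a] -> [a]
minus (x : xs) (y : ys) = case compare x y of
  LT -> x : minus xs (y : ys)
  EQ -> minus xs ys
  GT -> minus (x : xs) ys
minus xs _ = xs

-- Merge two increasing lists, dropping duplicates.
union :: Ord a => [a] -> [a] -> [a]
union (x : xs) (y : ys) = case compare x y of
  LT -> x : union xs (y : ys)
  EQ -> x : union xs ys
  GT -> y : union (x : xs) ys
union xs [] = xs
union [] ys = ys

-- Merge a (possibly infinite) list of increasing lists whose heads increase.
-- The head is emitted before the rest is inspected, which keeps it lazy.
joinAll :: Ord a => [[a]] -> [a]
joinAll ((x : xs) : rest) = x : union xs (joinAll rest)
joinAll _                 = []

-- Primes are the odd numbers from 5 that are not odd multiples of an odd prime.
primes :: [Int]
primes = 2 : 3 : minus [5, 7 ..] composites
  where
    composites = joinAll [[p * p, p * p + 2 * p ..] | p <- drop 1 primes]
```

Replacing each stream of multiples with a stream of "In"/"Out" flags over every candidate would give the less efficient reading described above.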

------
jamesrom
Where is it implied on the Haskell home page that it's the Sieve of
Eratosthenes?

The variable name? It's a sieve.

[http://en.m.wikipedia.org/wiki/Sieve_theory](http://en.m.wikipedia.org/wiki/Sieve_theory)

------
codygman
Upvoted despite the article being titled "Haskell programmers are liars". Great
decision to use the subtitle, admins. +1 to bitemyapp for submitting a PR to
change "sieve" to "primeFilter" and avoid this in the future.

------
pathikrit
My Scala one using a Java BitSet:
[https://github.com/pathikrit/scalgos/blob/9bd0dd81df52a5a410...](https://github.com/pathikrit/scalgos/blob/9bd0dd81df52a5a410c2e6844c5ac0ba62cf544e/src/main/scala/com/github/pathikrit/scalgos/NumberTheory.scala#L19-L26)

------
wyc
Even if you know the optimal solution in terms of computational
complexity, it might not be the best thing to put into your code base. There
are a lot of other things to balance including (but not limited to)
readability, maintainability, probability of correctness, and of course
developer time. Considering these multiple dimensions is essential to good
engineering.

I'm not saying you shouldn't know the best algorithms for a problem, as this
article clearly demonstrates the effectiveness of a more efficient solution.
In fact, having better understanding of algorithms and computational
complexity makes it safer for you to accurately assess the trade-offs you'll
be making by picking slower but simpler code or faster code with more
complexity. There is more to consider than just big-O when writing software.

Note: What I'm saying most strongly applies to software with functionality
that doesn't exist yet. If there's a reliable library with what you're seeking
(such as a way to generate primes), it's usually best to use it.

------
thegeomaster
I might be missing something obvious so excuse me, but why not use a heap as a
priority queue? It has O(1) find-minimum and O(log n) insert, which is better
than a set which is probably some kind of self-balancing BST (I don't speak
Haskell).

~~~
coolsunglasses
You can use mutable data structures in Haskell but we strive to avoid it
except where strictly necessary. To find the "Haskell" version it suffices to
add either "Haskell" or "persistent" to the search query for a data structure.

Here's a priority queue library for Haskell, if you'd like an example:
[https://hackage.haskell.org/package/pqueue](https://hackage.haskell.org/package/pqueue)
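
To make "persistent" concrete, here is a minimal sketch of a leftist heap, one of the standard purely functional priority queues. It is a hand-written illustration, not the `pqueue` API. `findMin` is O(1), and `insert` and `deleteMin` are O(log n), and older versions of the heap remain valid after every operation.

```haskell
-- Leftist heap: each node stores its rank (length of its right spine).
data Heap a = Empty | Node Int a (Heap a) (Heap a)

rank :: Heap a -> Int
rank Empty          = 0
rank (Node r _ _ _) = r

-- Build a node, keeping the higher-rank child on the left.
makeNode :: a -> Heap a -> Heap a -> Heap a
makeNode x a b
  | rank a >= rank b = Node (rank b + 1) x a b
  | otherwise        = Node (rank a + 1) x b a

-- Merging only walks right spines, which are O(log n) long.
merge :: Ord a => Heap a -> Heap a -> Heap a
merge Empty h = h
merge h Empty = h
merge h1@(Node _ x l1 r1) h2@(Node _ y l2 r2)
  | x <= y    = makeNode x l1 (merge r1 h2)
  | otherwise = makeNode y l2 (merge h1 r2)

insert :: Ord a => a -> Heap a -> Heap a
insert x = merge (Node 1 x Empty Empty)

findMin :: Heap a -> Maybe a
findMin Empty          = Nothing
findMin (Node _ x _ _) = Just x

deleteMin :: Ord a => Heap a -> Heap a
deleteMin Empty          = Empty
deleteMin (Node _ _ l r) = merge l r

-- Drain the heap in ascending order.
toSortedList :: Ord a => Heap a -> [a]
toSortedList h = case findMin h of
  Nothing -> []
  Just x  -> x : toSortedList (deleteMin h)
```

For example, `toSortedList (foldr insert Empty [5, 3, 8, 1])` drains the heap smallest-first.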

------
petermora
Looks the same as in Clojure's lazy-seq documentation:
[https://clojuredocs.org/clojure.core/lazy-seq](https://clojuredocs.org/clojure.core/lazy-seq)

------
fibo
I wrote this with osfameron's help
[https://gist.github.com/fibo/1203756](https://gist.github.com/fibo/1203756)

------
sjbr
Shame on them!

------
LukeHoersten
That's an overly broad and sensational title. The simplified example on the
website certainly isn't representative of all Haskell programmers being liars.
It's unfortunate because the article about mis-implementations of the Sieve of
Eratosthenes is decent.

Edit: mods, thanks for changing the misleading title.

~~~
dang
Fortunately there's a decent subtitle and we can just use that.

~~~
LukeHoersten
Thanks a lot.

