
Sum of 1 to 1000000000 in different programming languages - dcro
http://stackoverflow.com/q/18046347/1027148
======
tsahyt
Haskell

    
    
        foldl' (+) 0 [1..1000000000]
    

You _could_ use sum, but that will eat up _a lot_ of RAM because of the
laziness.

EDIT: For the fun of it, I decided to do the same in a slightly more esoteric
language, so here's a Prolog version (given that your stack is big enough)

    
    
        rangesum(0,0).
        rangesum(N,X) :- M is N - 1, rangesum(M,Y), X is Y + N.
    
        ?- rangesum(1000000000, X), write(X).

~~~
dons
sum is fine in GHC. It is specialised for Integer. GHCi uses naive sum though.

~~~
cgag
I tried sum in GHC and it ate up my 16 gigs of ram and then crashed. I'm a
noob so maybe I did something wrong, but my code was:

    
    
        main = do
            putStrLn $ show  $ sum [1..1000000000]

~~~
TheCoelacanth
Pass GHC the -O2 option to turn on optimizations. You need the strictness
analyzer to run so that it can determine that sum is strict, otherwise you get
a space leak.

~~~
cgag
Thank you :)

------
Aardwolf
Someone replied the following in there:

"The key in this case is using C99's long long data type. It provides the
biggest primitive storage C can manage (128-bits on 32-bit machines and
256-bits on a 64-bit machine) and it runs really, really fast."

Isn't long long "typically" 64-bit? (I know the C standard doesn't actually
specify any actual size).

What platform does this long long type really give you the full 128 or 256
bits on?

And do 64-bit CPUs indeed support 256-bit integer types? If so, what can I do
to play with them? C does not provide it for me on Linux!

Thanks :)

~~~
longlonguserna
If you want a guaranteed 64-bit type, put in your code:

#include <stdint.h>

then, use uint64_t for unsigned and int64_t for signed. If you want 128 bits,
in gcc you can use __uint128_t (it has two extra underscores at the beginning
because that size is nonstandard), but I don't think there is support for 256
bit integers.

Try a big integer library:
[http://stackoverflow.com/questions/124332/c-handling-very-
la...](http://stackoverflow.com/questions/124332/c-handling-very-large-
integers)
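
For comparison, Python sidesteps the fixed-width question entirely: its int
type is arbitrary precision, so 128-bit (or wider) values need no special
type. A quick sketch:

```python
# Python ints are arbitrary precision, so nothing special is
# needed for values wider than 64 bits.
big = (1 << 128) - 1              # largest unsigned 128-bit value
print(big.bit_length())           # 128
print(big)                        # 340282366920938463463374607431768211455
```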

~~~
Moral_
int64_t is just a typedef'd long long on Linux and OS X:

typedef long long int int64_t;

------
fooyc
This issue happens on 32-bit builds of PHP and Node.js: the language switches
to a floating-point representation when the result of some operation exceeds
INT_MAX.

In 64-bit PHP builds, the computation is done correctly.

~~~
mistercow
32-bit or 64-bit won't matter for Node.js. The Number type in JS is
specifically defined as using the 64-bit floating point format as defined by
IEEE 754, except that all NaNs are coerced to a single value. In terms of the
abstraction, there is never a cast when the value overflows; it should just
always be considered a double. Under the hood, there may be differences in how
the number is actually being treated.
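
A quick way to see the 53-bit limit in action, sketched here in Python since
its float is the same IEEE 754 double:

```python
# Doubles have a 53-bit significand: every integer up to 2**53 is
# exact, but beyond that, gaps appear between representable values.
print(2.0**53 == 2.0**53 + 1)   # True: the +1 rounds away
print(2.0**53 + 2 == 2.0**53)   # False: +2 lands on a representable value
```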

~~~
tootie
I've heard this before and never understood why. Why?

~~~
mistercow
Presumably because it allows you to do anything involving double precision or
32-bit integer arithmetic, and performance was not originally a major
consideration. It's pretty rare to need more than 53 bits of precision (and
was even rarer for JS's original intent), so it makes sense that the numeric
type is kept simple. Edit: and to clarify, the advantage is that this makes
basic implementation extremely simple. Only if you want to optimize your
engine's performance do you have to worry about shuffling types around.

These days the solution for having more precision is to use an external
library. I think that's generally fine, although I think performance is a
concern. Financial applications aside, working with arbitrary precision is a
good hint that you might be doing something processor intensive. It's
certainly a case where I'd like the library to be compiled to target asm.js,
and maybe optionally NaCL, once those have widespread adoption. Ideally,
ECMAScript would also have a native implementation, but that won't eliminate
the need for a library for shimming for years to come.

~~~
gsnedders
Performance was always enough of a consideration that even Brendan Eich's
original implementation had both int32 and double types internally, though
this was black-box unobservable (from outside the black box, everything
appears as a double).

~~~
mistercow
That is interesting.

As a side note, I'll bet you could have actually observed the difference via
timing at the time, assuming you knew what hardware you were working on. On an
early Pentium, a floating point add would have taken up to 3 times as long as
an integer add (depending on implementation), so by comparing in a loop, you
might be able to tell if a given value was being treated as an integer or a
double.

~~~
gsnedders
You still can — now the dispatch overhead is ever closer to zero, the cost of
the operation is even more apparent.

------
joe24pack
I think I might be doing it wrong, because I didn't do any looping. I'm a bit
too lazy and impatient for that, who wants to spend their afternoon adding all
those numbers up even with a computer.

    
    
      [joe24pack@staropramen ~]$ python
      Python 2.6.6 (r266:84292, May  1 2012, 13:52:17) 
      [GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> def gauss(x):
      ...   return (x+1)*(x/2)
      ... 
      >>> gauss(10)
      55
      >>> gauss(1000000000)
      500000000500000000
      >>>

~~~
Tibbes
Hmmmm, make that:

    
    
        def gauss(x):
            return (x+1)*x/2
    

(consider gauss(11), for example)

You gotta admit, on an article about the difference between integer and
floating-point arithmetic, that's pretty ironic!
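
In Python 3 syntax, where // is integer division, the difference between the
two versions is easy to see:

```python
def gauss_buggy(x):
    return (x + 1) * (x // 2)   # truncates x/2 first: wrong for odd x

def gauss(x):
    return (x + 1) * x // 2     # (x+1)*x is always even, so this is exact

print(gauss_buggy(11))  # 60
print(gauss(11))        # 66, which matches sum(range(1, 12))
```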

~~~
joe24pack
It's odd that the odd numbers slipped my mind. Thank you for your gracious
correction.

------
jypepin
According to the time it took my MacBook Air to calculate it in Ruby, it
would be pretty interesting to have someone generate benchmarks for different
languages :)

~~~
surfearth
sum(range(1, 100000001)) took ~3.25 seconds on my air under python3.

~~~
gshubert17
Your number appears to be a factor of 10 smaller than the "one billion plus
one" of the original.

I get similar times on my Mac; about 3.2 seconds for 10^8 and 31.4 seconds
for 10^9. Both are about 5 times faster than Python 2.7 for me.

------
robomartin
APL:

    
    
      +/⍳1E9
    

Try it yourself, download NARS2000 (free) here:

[http://www.nars2000.org/](http://www.nars2000.org/)

The "⍳" character is entered by typing ALT+i

Explanation:

    
    
      1E9 is 1,000,000,000
      ⍳1E9 generates a vector containing integers from 1 to 1,000,000,000 (inclusive)
      +/ is the sum of the vector
    

Any good APL interpreter will not actually generate a vector with a billion
numbers but rather recognize the above expression and optimize the resulting
operation for speed and minimal resource utilization.

EDIT:

Technically the "/" is the "reduction" operator acting along the last axis. In
the case of a single dimensional array it acts along the only available axis.
If, instead, it was acting on a matrix it would reduce along the columns.
Here's a longer annotated example with output from the interpreter:

    
    
      Generate a vector from 1 to 10:
          ⍳10
      1 2 3 4 5 6 7 8 9 10
    
      Sum:
          +/⍳10
      55
    
      Generate a vector of ten one's and zero's, repeating until the end:
          10⍴1 0
      1 0 1 0 1 0 1 0 1 0
    
      Now use that vector to reduce the original 1 to 10 vector, grabbing every other
      element starting with the first.  The effective result is that you end up with 
      all the odd numbers between 1 and 10:
          (10⍴1 0)/⍳10
      1 3 5 7 9
    
      Same things, now grabbing the even numbers by flipping the 1 0 sequence to 0 1:
          (10⍴0 1)/⍳10
      2 4 6 8 10
    
      Sum of all odd integers between 1 and 10:
          +/(10⍴1 0)/⍳10
      25
    
      Sum of all even integers between 1 and 10:
          +/(10⍴0 1)/⍳10
      30
    
      One could create a single vector with the odd integers between 1 and 10 followed by
      the even integers between 1 and ten by simply concatenating the generating
      expressions (APL executes from right to left):
          ((10⍴1 0)/⍳10),(10⍴0 1)/⍳10
      1 3 5 7 9 2 4 6 8 10
    
      And then you can reshape ("⍴") the result into a matrix:
          2 5⍴((10⍴1 0)/⍳10),(10⍴0 1)/⍳10
      1 3 5 7  9
      2 4 6 8 10
    
      Finally, use the reduction operator again, now applied to a matrix, to sum along the 
      columns and produce a result for each row.  The effect is to output a two element
      vector with the sum of the odd integers between 1 and 10 as the first element
      and the sum of all the even integers between 1 and 10 as the second:
          +/2 5⍴((10⍴1 0)/⍳10),(10⍴0 1)/⍳10
      25 30
    

If you want to try this type the lines immediately following my comments above
right into the interpreter. The rho "⍴" or reshape operator is entered by
typing ALT+r.

Hope this helps make sense of it. Of course, there are other ways to
accomplish the same thing.

[https://en.wikipedia.org/wiki/APL_syntax_and_symbols](https://en.wikipedia.org/wiki/APL_syntax_and_symbols)

~~~
swirepe
Or J, if you're into that sort of thing

    
    
        +/i.1E9+1
    

You can get that here:
[http://www.jsoftware.com/stable.htm](http://www.jsoftware.com/stable.htm)

------
neurostimulant
Python 2.7 (Mid 2012 Macbook Pro, 2.5 GHz i5, 8GB, not using SSD)

sum + xrange (consumes ~20MB virtual memory):

    
    
        $ time python2.7 -c "print sum(xrange(1,1000000001))"
        500000000500000000
        python2.7 -c "print sum(xrange(1,1000000001))"  11.06s user 0.02s system 99% cpu 11.089 total
    

reduce + xrange (consumes ~20MB virtual memory):

    
    
        $ time python2.7 -c "print reduce(lambda a, b: a + b, xrange(1,1000000001))"
        500000000500000000
        python2.7 -c "print reduce(lambda a, b: a + b, xrange(1,1000000001))"  128.74s user 0.13s system 94% cpu 2:16.51 total
    

My machine swapped like crazy for more than an hour when I tried using
range(). I suspect it hadn't even finished allocating the list when I killed
the process after it consumed >30GB of virtual memory.

    
    
        $ time python2.7 -c "print sum(range(1,1000000001))"

------
GregWright
3 milliseconds, man, the C optimizers blow my mind, they basically just cheat
and stick the answer in there. :-)

    
    
      [C]cat sum.c
      #include <stdio.h>
      int main(void)
      {
          unsigned long long sum = 0, i;
          for (i = 0; i <= 1000000000; i++) //one billion
              sum += i;
          printf("%lld\n", sum); //500000000500000000
          return 0;
      }
    
      [C]time ./a.out
      500000000500000000
    
      real 0m0.003s
      user 0m0.001s
      sys  0m0.001s
    
      [C]gcc -O3 -S sum.c
      [C]cat sum.s
          .section __TEXT,__text,regular,pure_instructions
          .globl _main
          .align 4, 0x90
      _main:
      Leh_func_begin1:
          pushq %rbp
      Ltmp0:
          movq %rsp, %rbp
      Ltmp1:
          leaq L_.str(%rip), %rdi
          movabsq $500000000500000000, %rsi
          xorb %al, %al
          callq _printf
          xorl %eax, %eax
          popq %rbp
          ret

------
recursive
C#: Enumerable.Range(1, 1000000000).Select(Convert.ToInt64).Sum()

------
christopheraden
I asked a similar question for R about a year ago, and saw an interesting way
to do it, taking advantage of the math.
[http://stackoverflow.com/questions/11623865/faster-modulo-
or...](http://stackoverflow.com/questions/11623865/faster-modulo-or-equality-
checking-in-r-or-good-ways-to-vectorize)

~~~
minimaxir
Fun fact: in R, _sum(1:1E07)_ will throw a warning; you have to use
_sum(as.numeric(1:1E07))_ instead, which will indeed give the correct answer.

------
ChuckMcM
So nobody does it?

    
    
      $sum = int($max/2) * ($max+1);
      $sum += int(($max+1)/2) if ($max & 1);
    

This is perl of course but it is exploiting the fact that the sum of integers
is a sum of constants ($max + 1) with an additional term (the 'middle'
integer) if the top number is odd.
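
The same pairing trick, sketched in Python and checked against a brute-force
sum for small inputs:

```python
def pair_sum(n):
    # n//2 pairs, each summing to n + 1, plus the unpaired middle
    # term (n + 1)//2 when n is odd.
    s = (n // 2) * (n + 1)
    if n % 2:
        s += (n + 1) // 2
    return s

# Verify against brute force for small n, then run the big one.
assert all(pair_sum(n) == sum(range(1, n + 1)) for n in range(1, 100))
print(pair_sum(1000000000))  # 500000000500000000
```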

------
gshubert17
SBCL: Commenter postfuturist said that this code:

    
    
      (time (let ((sum 0))
              (loop :for x :from 1 :to 1000000000 :do (incf sum x))
              sum))

took about 3 seconds from his REPL with SBCL, with about 8.5 billion CPU
cycles and 0 bytes consed.

Does anyone know why the same code on my version of SBCL (1.0.55.0-abb03f9) on
a Mac took 156 billion cycles and consed 24 billion bytes?

~~~
waterhouse
Consing sounds like it's allocating bignums. My guess is that you're using a
32-bit build of SBCL. In that case, fixnums only go up to something like 2^30,
and arithmetic with larger numbers will allocate memory. Can you check?

    
    
      * (log most-positive-fixnum 2) ;on 64-bit
      
      62.0

~~~
gshubert17
You're right. I have a 32-bit build, since I get:

    
    
      * (log most-positive-fixnum 2)
      29.0

Thanks.

------
waynecochran

      /*author: Gauss */
      var n = 1000000000;
      var sum = n*(n+1)/2;

~~~
cgh
Not sure why you're mentioning this as it's in the SO question:

"The correct answer can be calculated using

1 + 2 + ... + n = n(n+1)/2"

------
ck2
Knowing how to use a language is critical to get expected results.

This gives the proper result in PHP by forcing the integer cast.

    
    
        $sum = (int) $sum + $i;

~~~
pc86
It's been a long time since I've worked with PHP, I assume

    
    
        $sum += (int) $i;
    

will still convert to float once the size of $sum gets to the requisite size?

~~~
radiospiel
One of the reasons to stay away from PHP. Requesting an int but getting a
float regardless? That doesn't sit well with me.
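
For contrast, Python's ints never silently degrade to float; past the 64-bit
boundary they just keep going as exact integers:

```python
s = 2**63 - 1            # would overflow a signed 64-bit integer
s += 1
print(type(s).__name__)  # int, not float
print(s)                 # 9223372036854775808, exact
```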

------
deerpig
It would have been interesting to see this problem solved in many different
languages. But I guess that would kill the question on Stackoverflow.

~~~
VMG
I don't think it would be that interesting - and I don't think we need to
rediscover the fact that some languages use IEEE754 as the default number type
over and over again

~~~
echohack
I think demonstrating a set of features like this across a large number of
languages, say 20 to 50, would be a grand demonstration.

~~~
VMG
That's what [http://www.rosettacode.org](http://www.rosettacode.org) is for.

I agree that stackoverflow.com sometimes is a little too trigger-happy when it
comes to closing questions, but we really don't need a community-wiki top
answer with 1845 upvotes that is basically a rosetta code page, spawning dozens
of copy-cat questions with slight variations on the theme.

------
j-b
Visual LANSA:

    
    
      begin_loop using(#int8) to(1000000000)
        #r_dvp += #int8
      end_loop
    

From which VL will then generate 2,050 lines of C++.

------
bitwize
(display (/ (* 1000000000 1000000001) 2))

------
diego
Scala:

(1L to 1000000000L).sum

------
epochwolf
Ruby

    
    
         (1..1000000000).inject(:+)

~~~
RussianCow
Python:

    
    
        sum(range(1000000000))
    

:)

~~~
dragonwriter
Did you intend to demonstrate a very common off-by-one error in Python?

~~~
knome
Probably should've used xrange as well. In the 2.x series, Python's range
function returns an actual list.

I believe in 3.x range has been replaced with xrange.
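
Easy to check: a Python 3 range is a lazy object, not a list, so even a
billion-element range costs almost nothing to create:

```python
import sys

r = range(1, 1000000001)       # no billion-element list is built
print(len(r), r[-1])           # 1000000000 1000000000
print(sys.getsizeof(r) < 100)  # True: the object is constant-size
```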

~~~
masklinn
> I believe in 3.x range has been replaced with xrange.

Correct.

------
fjcaetano
Python is gold

~~~
NelsonMinar
And Python's integer type is gold-plated.

~~~
gshubert17
Does that make pypy platinum?

time pypy -c "print sum(xrange(1000000001))"

is 2.0 seconds on my Mac; a C program with 8-byte ints (long or long long)
takes 2.6 seconds.

------
terranstyler
Posted on stackoverflow, what an irony...

~~~
recursive
There's no stack overflow or even numeric overflow here. This is a floating
point precision problem.

~~~
terranstyler
Ted Hopp thinks it is transformed to float in order to avoid an SO. Unless you
know better, the irony is valid!

------
michaelochurch
Clojure:

    
    
        (reduce + (range 1000000001))
    

Written that way, though, it takes a long time (108 sec, compared to 2.6 sec
in C). So that's not fast, as elegant as it may be.

This is what I have for C. It's probably not great C code.

    
    
        #include <stdio.h>

        int main() {
          long res = 0;
    
          int i;
          for (i = 0; i <= 1000000000; i++) {
            res += i;
          }
    
          printf("%ld\n", res);
          return 0;
        }
    

Faster than (naive?) C is this Clojure loop (no range object).

    
    
        user=> (time (loop [i 0 res 0] 
                  (if (> i 1000000000) 
                  res
                  (recur (inc i) (+ res i)))))
        "Elapsed time: 1612.094 msecs"
        500000000500000000
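
The same elegance-versus-speed gap shows up in Python: functools.reduce calls
a Python-level lambda per element, while the builtin sum loops in C. Both
agree with the closed form, shown here at a smaller n:

```python
from functools import reduce

n = 1000000
expected = n * (n - 1) // 2   # closed form for sum(range(n))

# Builtin sum runs its loop in C; reduce makes a Python-level
# function call per element, which is typically much slower.
assert sum(range(n)) == expected
assert reduce(lambda a, b: a + b, range(n)) == expected
print(expected)  # 499999500000
```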

~~~
akurilin
Do you know of any good guides for how to fine-tune Clojure in situations
where you need to trade elegance for performance?

What you did that is great, so I'm wondering where I'd be able to find out
more about such techniques.

~~~
michaelochurch
I don't. I'm far from an expert on high-performance Clojure. (I'm really glad
that there is such a thing, and that people focus on it, however.) Joy of
Clojure and Programming Clojure get into optimizations a little bit, but I
think that field is still fairly new.

Sometimes with seqs one can end up with a "holding head" problem; if you're
doing stream processing but holding on to a seq, you can end up having the
whole thing in memory, which would kill you. That's not what's happening
there, though; a default-configured JVM can't hold anything close to a billion
longs in memory.

One of the neat things is that, because the REPL actually compiles code
(there's no interpreter) you get the same performance with the time macro as
you would get in compiled code. What that means is that testing for
performance can be done at the REPL and quickly.

To explain what I did and why, I figured that the tight loop would be
optimized to Java-like performance. With the more elegant formulation, I
didn't know what was going on in terms of types (how is +, a variadic
function with many type signatures, being handled?). If the loop performed
poorly, I'd probably have put type hints on the arguments and replaced + with
unchecked-add; but it performed well so I left it as it was.

------
hannibal5
Common Lisp:

    
    
      * (loop for i from 1 to 1000000000 sum i)
    
      500000000500000000
      *

~~~
hannibal5
    
    
      ;; let's time it
      * (time (loop for i from 1 to 1000000000 sum i))

    
    
      Evaluation took:
        2.374 seconds of real time
        2.372148 seconds of total run time (2.372148 user, 0.000000 system)
        99.92% CPU
        8,071,475,337 processor cycles
        0 bytes consed
      
      500000000500000000
    
    
      * (disassemble (lambda () (loop for i from 1 to 1000000000 sum i)))
    
      ; disassembly for (LAMBDA ())
      ; 02C21574:       BB02000000       MOV EBX,   2                 ; no-arg-parsing entry point
      ;       79:       31C9             XOR ECX, ECX
      ;       7B:       EB27             JMP L1
      ;       7D:       90               NOP
      ;       7E:       90               NOP
      ;       7F:       90               NOP
      ;       80: L0:   48895DF8         MOV [RBP-8], RBX
      ;       84:       488BD1           MOV RDX, RCX
      ;       87:       488BFB           MOV RDI, RBX
      ;       8A:       4C8D1C25E0010020 LEA R11,  [#x200001E0]      ; GENERIC-+
      ;       92:       41FFD3           CALL R11
      ;       95:       480F42E3         CMOVB RSP, RBX
      ;       99:       488BCA           MOV RCX, RDX
      ;       9C:       488B5DF8         MOV RBX, [RBP-8]
      ;       A0:       4883C302         ADD RBX, 2
      ;       A4: L1:   483B1D25000000   CMP RBX, [RIP+37]
      ;       AB:       7ED3             JLE L0
      ;       AD:       488BD1           MOV RDX, RCX
      ;       B0:       488BE5           MOV RSP, RBP
      ;       B3:       F8               CLC
      ;       B4:       5D               POP RBP
      ;       B5:       C3               RET
      ;       B6:       CC0A             BREAK 10  ; error trap
      ;       B8:       02               BYTE #X02
      ;       B9:       18               BYTE #X18                  ; INVALID-ARG-COUNT-ERROR
      ;       BA:       54               BYTE  #X54                  ; RCX
      ;       BB:       90               NOP
      ;       BC:       90               NOP
      ;       BD:       90               NOP
      ;       BE:       90               NOP
      ;       BF:       90               NOP
      ;       C0:       90               NOP
      ;       C1:       90               NOP
      ;       C2:       90               NOP
      ;       C3:       90               NOP
      ;       C4:       90               NOP
      ;       C5:       90               NOP
      ;       C6:       90               NOP
      ;       C7:       90               NOP
      ;       C8:       90               NOP
      ;       C9:       90               NOP
      ;       CA:       90               NOP
      ;       CB:       0000             ADD [RAX], AL
      ;       CD:       0000             ADD [RAX], AL
      ;       CF:       0000             ADD [RAX], AL
      ;       D1:       94               XCHG EAX, ESP
      ;       D2:       3577000000       XOR EAX, 119
      ;       D7:       0000             ADD [RAX], AL
      NIL

