Julia, I love you (2012) (johnmyleswhite.com)
136 points by Adrock on Feb 12, 2013 | 65 comments




Why do people think that computing Fibonacci numbers recursively is a good benchmark of anything?

Doesn't it require O(n^2) function calls for something that can be computed in O(1)?


Actually being fast is not the point; naive fib is used because it's a very well-known algorithm that's easy to implement consistently across multiple languages and that heavily exercises both function calls and numeric addition.

In other words, as much as a 10-line program can be, it's a reasonable proxy for a language's performance on algorithmic code.
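
For reference, the doubly recursive version in question looks like this in Julia (a minimal sketch of the usual benchmark form):

  # Naive doubly recursive Fibonacci: exponentially many calls,
  # which is exactly what makes it a call-overhead benchmark.
  fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)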


Also, it's a reasonable low bar to start with for a college-educated programmer (not that all programmers need to be college-educated): the Fibonacci sequence is commonly defined, in mathematics and in everyday language, in its recursive form, so people are likely to be familiar with it.


Note to self: If I ever implement a compiler, I'll include a special Fibonacci detection pass in the optimizer.


That kind of thing does happen when the stakes get high enough. There are a lot of rumors (and some evidence) of special-cased "cheating" on commonly used benchmarks, including the 3dMark GPU benchmarks, SPEC CPU benchmarks, even SunSpider JS benchmarks.


Both Nvidia and ATI have been caught cheating at graphics benchmarks. Their drivers would detect when the benchmark programs were being run, and go through lower quality fast paths. You can google "quack.exe" for details on one particularly famous case.


Actually, it takes real effort to make GCC not completely optimize away this kind of stuff for C:

https://github.com/JuliaLang/julia/issues/1325


There's a simple solution to that: use Ackermann instead.

It measures the same thing and is a function of its own time complexity to boot.
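
For reference, a minimal Julia sketch of Ackermann (it recurses so explosively that only tiny arguments are practical):

  # Ackermann: nested recursion that no loop-optimization trick removes;
  # even ack(3, n) blows up exponentially in n.
  ack(m, n) = m == 0 ? n + 1 :
              n == 0 ? ack(m - 1, 1) :
                       ack(m - 1, ack(m, n - 1))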


It tests how well the compiler deals with function calls, and that's it. Useless for measuring a language's overall speed (as are all micro-benchmarks).


If the compiler deals with arithmetic in a very inefficient way, that will also show up.


Fibonacci is O(log n) or so in the best case, not quite O(1). Of course you can do a simple table lookup if you're limiting yourself to 32 or 64-bit numbers, but you can go far beyond that.
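
For the curious, the O(log n) route is usually the fast-doubling recurrence; here is a minimal Julia sketch (exact for Int64 up to n = 92, after which it silently overflows):

  # Fast doubling: F(2k) = F(k)*(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2.
  # fibpair(n) returns (F(n), F(n+1)) in O(log n) arithmetic operations.
  function fibpair(n)
    n == 0 && return (0, 1)
    a, b = fibpair(n >> 1)
    c = a * (2b - a)    # F(2k)
    d = a^2 + b^2       # F(2k+1)
    return isodd(n) ? (d, c + d) : (c, d)
  end
  fastfib(n) = fibpair(n)[1]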


There's a formula from analysis that gives you the Nth Fibonacci number in closed form. It involves a couple of exponentials, which you can compute to a given accuracy in bounded time.


You obviously haven't tried doing this or you wouldn't suggest it as a way to actually compute Fibonacci numbers. Here's a definition in Julia (hey, why not) of a closed-form Fibonacci function:

  cfib(n) = ((1+sqrt(5))^n - (1-sqrt(5))^n)/(2^n*sqrt(5))
Let's see how it works:

  julia> cfib(1)
  1.0

  julia> cfib(2)
  1.0

  julia> cfib(3)
  2.0
Great! But let's look at a few more:

  julia> cfib(4)
  3.0000000000000004

  julia> cfib(5)
  5.000000000000001
Oops. Those aren't integers. I guess we could round, but at some point we might be rounding to the wrong integer. In fact, once you get up to n = 63 this happens, and worse still, at n = 64 you get an infinite answer.

Here's a simple, reasonably efficient iterative Fibonacci function in Julia:

  function ifib(n)
    j, k = 1, 1        # fib(1), fib(2)
    for l = 3:n
      j, k = k, j+k    # slide the pair one step up the sequence
    end
    return k
  end
We know this one works up to the limits of Int64 range:

  julia> for n = 1:10
           println(ifib(n))
         end
  1
  1
  2
  3
  5
  8
  13
  21
  34
  55
Using a doubly recursive algorithm for Fibonacci is indeed silly, but it's a good toy test of recursion performance.


You're right, I hadn't, but it sounds to me like it worked pretty well in your tests. It works better in Python than in Julia, apparently:

    >>> [int(round(((1+sqrt(5))**n - (1-sqrt(5))**n)/(2**n*sqrt(5)))) for n in range(128)]
    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073L, 4807526976L, 7778742049L, 12586269025L, 20365011074L, 32951280099L, 53316291173L, 86267571272L, 139583862445L, 225851433717L, 365435296162L, 591286729879L, 956722026041L, 1548008755920L, 2504730781961L, 4052739537881L, 6557470319842L, 10610209857723L, 17167680177565L, 27777890035288L, 44945570212853L, 72723460248141L, 117669030460994L, 190392490709135L, 308061521170130L, 498454011879265L, 806515533049395L, 1304969544928660L, 2111485077978055L, 3416454622906716L, 5527939700884771L, 8944394323791488L, 14472334024676260L, 23416728348467744L, 37889062373144008L, 61305790721611752L, 99194853094755776L, 160500643816367552L, 259695496911123328L, 420196140727490880L, 679891637638614272L, 1100087778366105088L, 1779979416004719360L, 2880067194370824704L, 4660046610375544832L, 7540113804746369024L, 12200160415121913856L, 19740274219868282880L, 31940434634990198784L, 51680708854858489856L, 83621143489848688640L, 135301852344707186688L, 218922995834555891712L, 354224848179263111168L, 573147844013818970112L, 927372692193082081280L, 1500520536206901248000L, 2427893228399983329280L, 3928413764606884839424L, 6356306993006868692992L, 10284720757613753532416L, 16641027750620622225408L, 26925748508234379952128L, 43566776258855008468992L, 70492524767089384226816L, 114059301025944392695808L, 184551825793033785311232L, 298611126818978194784256L, 483162952612011980095488L, 781774079430990309097472L, 1264937032043002356301824L, 2046711111473992665399296L, 3311648143516995021701120L, 5358359254990987687100416L, 8670007398507983245672448L, 14028366653498970932772864L, 22698374052006954178445312L, 36726740705505935848636416L, 59425114757512894322049024L, 96151855463018834465652736L, 155576970220531703017897984L]
But, although it gives close answers up to cfib(604) (after which it throws an exception) it still gives slightly inexact answers starting at 308061521170129 (cfib(71)), which is between 2⁴⁸ and 2⁴⁹. I'd expect it to make it a few items further before going astray, since it's presumably doing floating-point calculations with 53 bits of mantissa that are supposed to be rounded to less than half an ulp.

I'm comparing with

    >>> fibs = range(2)
    >>> while len(fibs) < 128: fibs.append(fibs[-2] + fibs[-1])
    ...
    >>> fibs
    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073L, 4807526976L, 7778742049L, 12586269025L, 20365011074L, 32951280099L, 53316291173L, 86267571272L, 139583862445L, 225851433717L, 365435296162L, 591286729879L, 956722026041L, 1548008755920L, 2504730781961L, 4052739537881L, 6557470319842L, 10610209857723L, 17167680177565L, 27777890035288L, 44945570212853L, 72723460248141L, 117669030460994L, 190392490709135L, 308061521170129L, 498454011879264L, 806515533049393L, 1304969544928657L, 2111485077978050L, 3416454622906707L, 5527939700884757L, 8944394323791464L, 14472334024676221L, 23416728348467685L, 37889062373143906L, 61305790721611591L, 99194853094755497L, 160500643816367088L, 259695496911122585L, 420196140727489673L, 679891637638612258L, 1100087778366101931L, 1779979416004714189L, 2880067194370816120L, 4660046610375530309L, 7540113804746346429L, 12200160415121876738L, 19740274219868223167L, 31940434634990099905L, 51680708854858323072L, 83621143489848422977L, 135301852344706746049L, 218922995834555169026L, 354224848179261915075L, 573147844013817084101L, 927372692193078999176L, 1500520536206896083277L, 2427893228399975082453L, 3928413764606871165730L, 6356306993006846248183L, 10284720757613717413913L, 16641027750620563662096L, 26925748508234281076009L, 43566776258854844738105L, 70492524767089125814114L, 114059301025943970552219L, 184551825793033096366333L, 298611126818977066918552L, 483162952612010163284885L, 781774079430987230203437L, 1264937032042997393488322L, 2046711111473984623691759L, 3311648143516982017180081L, 5358359254990966640871840L, 8670007398507948658051921L, 14028366653498915298923761L, 22698374052006863956975682L, 36726740705505779255899443L, 59425114757512643212875125L, 96151855463018422468774568L, 155576970220531065681649693L]

For exact answers, you clearly can't do better than O(N) (where N is the magnitude of the input parameter, not its length in bits) because the output is of size O(N). But for approximate answers, you can.

I agree about the uses for doubly recursive Fibonacci!


Yes. You can get the same approximate closed-form Fibonacci behavior in Julia with this definition:

  cfib(n) = ((1+sqrt(5))^n - (1-sqrt(5))^n)/(2.0^n*sqrt(5))

  julia> cfib(128)
  2.5172882568355055e26
Unfortunately, this approximate value is off by 1066512995643 (~1 trillion), so it's hard to see how it would be useful. Actually, it's hard to think of good uses for computing Fibonacci numbers in general :-)


Yes, but if you want to compute exact results, then you're stuck with log n time.


Can't you do it in O(1) if you limit yourself to floats? Arbitrary n but not arbitrary precision?


I imagine the table would still be of reasonable size then. Of course, this is still technically restricting the output to 32 or 64-bit numbers.


No need for a table lookup. There is a closed form formula. Of course this is only as accurate as your float precision. You can get out pretty far, though, with some clever rounding.

http://en.wikipedia.org/wiki/Fibonacci_number#Closed-form_ex...
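
A sketch of that in Julia (assuming Float64; the rounding trick is exact only while fib(n) fits comfortably in the 53-bit mantissa, i.e. up to roughly n = 70):

  # Binet's formula: fib(n) = (φ^n - ψ^n)/√5 with ψ = 1 - φ; since
  # |ψ^n/√5| < 1/2, rounding the dominant term alone recovers the
  # exact integer, until Float64 precision runs out around n ≈ 70.
  cfib(n) = round(Int, ((1 + sqrt(5)) / 2)^n / sqrt(5))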


It actually requires O(fib(n)) function calls, or equivalently about O(1.62^n): the call count obeys the recurrence calls(n) = calls(n-1) + calls(n-2) + 1, which grows in proportion to fib(n) itself.


Fibonacci numbers cannot be computed in O(1), only estimated. All closed form algorithms start giving imprecise answers after some value of n.


Finally, there are downloadable binaries for Windows and OS X! http://code.google.com/p/julialang/downloads/list

Also, previous discussion: http://news.ycombinator.com/item?id=3784349


There is also an IDE for Mac, Linux, and Windows available here: http://forio.com/julia/downloads


I can understand the need for new programming languages to be continually developed, especially ones occupying different points on the power continuum. So I have nothing against this post, or Julia itself.

But when someone advertises that they want feature x and feature y in one language, such that the languages x and y come from are essentially obsolete, I get a little skeptical. I'm certainly not the type of person who vaguely advocates that all languages are more or less the same and it's just a matter of taste; not at all.

But I think there are reasons why it's intrinsically difficult to design a language to have both the speed of C and the abstraction power of Lisp, for example. Things come at a price; this doesn't mean there isn't generally a clear victor for what you're trying to do, just that getting a paragon of linguistic design is very difficult.

But his test of Julia is promising, though honestly I'd really like to see a speed comparison on at least 100 lines of code, preferably a full application.


It is hard ... OpenDylan (http://opendylan.org/) manages to mostly pull it off when the right type information is available. It could be smarter at times, but it does pretty well.

In a recent test for someone, they had code running in C++ in 0.5 seconds, Dylan in 0.6.

Our major performance loss in OpenDylan comes when there's still too much generic dispatch going on, but that can usually be reduced through various mechanisms. (And as I said above, there are still avenues where the compiler could do better.)

Another interesting implementation is Gambit Scheme which manages to pull off some really impressive feats of compilation for performance once you declare '(not safe)' although it is still fast with safety enabled.


Bruce, what is your association with OpenDylan? (It feels odd referring to a language that shares my name, ha).

What are Dylan's strengths and weaknesses?

And what you say about Scheme is believable, though I wouldn't use Scheme personally as I prefer Lisp if I'm going to be in that family.


Quick run down:

* Dylan was designed at Apple and then later a partnership of Apple, CMU and Harlequin in the 1990s

* Harlequin's implementation of Dylan was spun off to another company when Harlequin itself failed. It was then called Functional Developer.

* Functional Developer was open sourced in 2004 as Open Dylan.

Since then, development had been moving rather slowly, the documentation was still in the 1996-era FrameMaker documents, etc.

I decided to help really revive it and we've since done the 2011.1 and 2012.1 releases and are working on the 2013.1 release now. The documentation has been brought forward into the modern era, the website updated, new libraries are being written, etc. I've spent a large amount of time over the past 15 months or so to make a lot of that happen.

As for strengths:

* A pretty great compiler that can do a lot of optimizations.

* Multiple dispatch. As StefanKarpinski says in another comment, it is a great thing.

* A condition system like Common Lisp.

* Everything is an object, so multiple dispatch integrates very cleanly.

* I like the macro system. It is more like syntax-case than full macros, but I find it pretty useful.

* Everything is pretty clean and consistent. There aren't a lot of strange edge cases and the code is really pretty readable.

* Multiple return values, keyword arguments, 'rest' arguments.

* Compilation to an efficient executable.

* The compilation model lets a lot of existing tools work with the compiled code. I'm able to use Instruments on OS X, perf on Linux, etc to analyze performance and so on.

* We have a lot of documentation as a result of having formerly been an expensive commercial product.

Weaknesses:

* Small community.

* Some things are still written with 1990s expectations and need some love, like atomic operations and some aspects of the numerical model.

* Some other nits that would get ironed out pretty quickly with a larger community.

* Marketing hangover from the 1990s in terms of people thinking Dylan is old, dead, etc. We've contemplated just changing the name on a future release to make it more clear that we're moving forward.

As for Lisps ... SBCL and Clozure CL are great implementations. :)


It was a sad day when Apple dropped Dylan.

Who knows, today we could have been using it for iOS, if they had kept the language around.


With under a week of work, we could probably be up and running on iOS with Open Dylan. I even have the start of some aspects of a Dylan / Objective-C bridge, but that needs a fair bit more work. Talk to us if you're interested.


Thanks, but I am not doing iOS development.

Good luck with the project.


Hi Bruce, How can I get in touch with you? It looks interesting.


Is there anything in particular you could use help with? I'm always up for Open development.


I've dropped you an email.


I should also point out that I completely agree that it is intrinsically difficult to design a language with both the speed of C and the abstraction power of Lisp. That is part of why the job Jeff Bezanson (who is largely responsible for Julia's amazing performance) has done is so incredible.

It's hard to explain until you start using it, but ubiquitous, fast, expressive multiple dispatch is a truly amazing programming paradigm. I'm not entirely sure how it hasn't been adopted as the core paradigm of any mainstream language in the past.
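
For readers who haven't seen it, a toy sketch in (modern) Julia syntax; the method is chosen from the runtime types of all arguments, not just the first one:

  abstract type Pet end
  struct Dog <: Pet end
  struct Cat <: Pet end

  # Each method is selected by the combination of both argument types.
  meets(::Dog, ::Cat) = "chases"
  meets(::Cat, ::Dog) = "hisses at"
  meets(::Pet, ::Pet) = "sniffs"

So meets(Dog(), Cat()) returns "chases", whereas a single-dispatch language would have to pick the method from the first argument alone.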


Stefan, do you mind telling me what your programming background is? What languages did you come from before working on Julia?

I'm going to look more into the documentation for it now.


Not at all. I've done a lot of C, Matlab, Perl, Ruby and R, although I've also dabbled in Java, JavaScript, C++, Scala and Python. Definitely a polyglot of "pragmatic" languages. Julia was influenced most strongly by Lisp (Scheme in particular), Matlab, C, Python, Ruby and Perl. If you (or anyone else reading this) have questions looking through the manual, drop an email to the julia-users@googlegroups.com list, or me personally.


I have just started going through the docs a bit. I am interested in two things. First, programs that interface with other commands: essentially rewriting large shell programs in a "better" language. I saw the section on "external commands" and that looks very promising. Are there any other links or sample programs I can check out? Would Julia be just as good as, say, Ruby in this respect?

Is there a readline interface?

Second, I haven't come across an ncurses interface yet.

Thanks.

edit: Just came across your article "Shelling out sucks", which says: "In my followup post, I will describe how Julia makes constructing and executing pipelines of external commands as safe as Python's subprocess and as convenient as shelling out." I could not find a link to the followup. Do you have one?
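
(For anyone else reading: Julia's backtick syntax builds command objects without invoking a shell. A rough sketch in modern Julia syntax, where data.txt is a made-up file name:)

  run(`ls -l`)                            # executes directly, no shell parsing
  out = read(`sort data.txt`, String)     # capture stdout as a String
  run(pipeline(`cat data.txt`, `sort`, `uniq`))  # a pipeline, no /bin/sh involved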


Miles Lubin and Iain Dunning gave an excellent presentation at MIT's IAP tutorial on Julia in January where they benchmarked rather non-trivial code for doing linear programming in a variety of languages:

  https://github.com/JuliaLang/julia-tutorial/blob/master/NumericalOptimization/presentation.pdf
  http://bit.ly/VQ9rDd (shortened version since that's cut off)
See page 7, "Simplex Benchmarks". Essentially, they implemented the basics you need to do a fast simplex linear programming algorithm in each of C++, Julia, Matlab and Python. Their results were in line with the Julia home page benchmarks:

  Julia is between 1-2x slower than good C/C++ code.
  PyPy and Matlab are 10-20x slower.
  Python is 80-400x slower.
The code can be found here:

  https://github.com/mlubin/SimplexBenchmarks
These results are fairly typical.


I think there should be a Python+NumPy variant for comparison as well.


I've added a bit of an explanation as to why there's no NumPy version.


SciPy does have some sparse support.

http://www.scipy.org/SciPyPackages/Sparse


Remove the four spaces before the long URL and it will be properly displayed and linkified. Meanwhile:

https://github.com/JuliaLang/julia-tutorial/blob/master/Nume...

https://github.com/mlubin/SimplexBenchmarks


1-2x slower is still remarkable given what Julia (apparently) has the power to do. Thanks for the citations!


Yes; on the other hand, that is how programming languages advance.


Am I the only one with hopes of Julia actually evolving into a general purpose programming language?

With almost C++ level performance and homoiconicity it seems really sweet and I'd love to see it used instead of Go. Though 1-indexed arrays are definitely annoying as fuck...


Well, if you're feeling really sadistic,

    baremodule MyWay
    # shift user-visible array indexing down by one (0-based);
    # `ref` was the indexing function in 2013-era Julia
    ref(x::AbstractArray, i::Int) = Base.ref(x,i+1)
    ref(args...) = Base.ref(args...)
    end
is not that far away (I'm oversimplifying a little, but not much). However, please don't do this as it breaks coherence within the community. I found 1-based indexes annoying at first, but have grown quite fond of them since.


yeah, having mixed 1-indexed and 0-indexed code would be 10x worse than just getting used to 1-based, so I hope people don't do that :)


1-based arrays are one of the more lambasted traits of Lua (an otherwise supremely elegant language), and as a long-time C/C++ programmer, I personally find 0-based arrays very natural.

However, what I've found from doing lots of Lua programming is that I just don't notice it very much in practice.

To some degree, at least, I guess this reflects that one writes different sorts of code in different programming languages. The sort of offset-based address calculations one might do a lot in C just don't seem to happen very much in Lua.

So while my natural inclination is to dislike 1-based arrays, I no longer consider them an automatic demerit....


Erlang has 1-based lists as well.


It's worse in Erlang:

"Mainly, Regex and Binary indexes are zero-based, List and Tuple are one-based."

I don't remember now, but one of the collections in stdlib is zero-based too. It's much worse to be inconsistent than to choose 0- or 1-based indexing for everything.


> one of the collections in stdlib is zero-based too.

Yep, array: http://www.erlang.org/doc/man/array.html

It's definitely a mess.


I'm not sure who downvoted this, but:

    > lists:nth(1, [a, b, c]).
    a
    > lists:nth(0, [a, b, c]).
    ** exception error: no function clause matching
       lists:nth(0,[a,b,c]) (lists.erl, line 117)


Julia's math library is starting to shape up:

http://docs.julialang.org/en/latest/stdlib/base/#mathematica...

It even has digamma, Riemann zeta, and Bessel functions with fractional orders, which you won't find in Excel.

The statistics built-ins are a bit weak, however.


Have you looked at the various statistics-related packages? We moved most of the statistics stuff (and many other areas as well) into packages.


What's missing from the statistics built-ins?


Can anyone venture a guess as to how successful this will be, given that R is pretty widespread right now and SciPy/NumPy/Blaze is looking to win in the future (with the DARPA funding for Continuum [1])?

I wonder if Julia was considered for the DARPA grant.

[1] http://www.itworld.com/big-data/340570/python-gets-big-data-...


Wow, the list of modules has gotten very respectable since the first time I heard about Julia.

I have one concern though: how well-tested are they? There's one really prominent contributor (John Myles White), who has written a rather large number of packages. But there are only a limited number of bug-free LOCs that a human can write in a given time-span.

So Julia might well be a better language than R, but is its standard library as reliable? And how much time will it take until it becomes as trusted as R's?


Julia's benchmarks just turned me off. They ignore the "reasonable idioms" of each language, focusing specifically on RECURSIVE and ITERATIVE algorithms, and they also ignore competing peers like the Lua-based GSL Shell. Please read https://groups.google.com/forum/#!topic/julia-dev/2MJQ_uADH4...


I'm the OP of the thread you linked to. This patch will hopefully address some of your concerns:

https://github.com/JuliaLang/julia/pull/2278

Do you know if the following is single-threaded? I'm not familiar with Python.

    numpy.sum(1./(numpy.arange(1,10000)**2))


> Julia's benchmarks just turned me off. They are ignoring the "reasonable idioms" of each language, specifically focusing only on RECURSIVE and ITERATIVE algos.

I read the benchmarks as an invitation to experiment (which I am doing), not as a call to a language war; if it doesn't float your boat, that's cool (but you might want to check it out again in a year or two). Your concerns are specifically addressed on the front page in the benchmark discussion (http://julialang.org):

"These benchmarks, while not comprehensive, do test compiler performance on a range of common code patterns, such as function calls, string parsing, sorting, numerical loops, random number generation, and array operations. It is important to note that these benchmark implementations are not written for absolute maximal performance (the fastest code to compute fib(20) is the constant literal 6765). Rather, all of the benchmarks are written to test the performance of specific algorithms, expressed in a reasonable idiom in each language. ... The point of these benchmarks is to compare the performance of specific algorithms across language implementations, not to compare the fastest means of computing a result, which in most high-level languages relies on calling C code."

We all know C compilers make fast machine code, so it seems kind of silly to tediously write apples-to-oranges benchmarks in ways which will usually just end up benchmarking language X's FFI anyway. Excepting BLAS stuff (as noted), pure Julia is within 2x of C, and that is really the take-home message of the benchmarks.

Here's why these small, hot-spot benchmarks matter: the reasonable idiom in Julia is to rewrite slow algorithms in the same language. See this recent contribution for a great example of what is possible:

https://github.com/lindahua/Devectorize.jl
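
To make that concrete, the kind of rewrite such a macro automates looks roughly like this in plain Julia (a sketch; fused! is a made-up name, not Devectorize.jl's actual API):

  # Vectorized: r = a .+ b .* c allocates a temporary array for b .* c.
  # Devectorized: one fused loop, no temporaries, C-like speed.
  function fused!(r, a, b, c)
    for i in eachindex(r)
      r[i] = a[i] + b[i] * c[i]
    end
    return r
  end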

This combination of meta-programming expressiveness and speed in one language is really quite profound. Among other things, I think it will go a long way towards reducing the impedance mismatch between "C core developer" and "end user" communities in other technical computing environments (see NumPy, R).

Julia has a beautiful, small, and stunningly accessible language implementation in C (with isolated C++ for the LLVM stuff), and it is certainly pragmatic, using BLAS, FFTW, etc. where appropriate; but the numerical core is defined in Julia itself, and, IMHO, that is a very fundamental shift for building an open-source computing environment and community.

(LuaJIT is a masterpiece and it would be cool to see it in the benchmarks, but there are many reasons why I personally have no interest in numerical work using Lua)


I've just started learning Julia and I'm also loving it so far.

It's really well designed with loads of great features (and decent libraries) for scientific computing.


I don't think many projects with such a horrible(*) distribution system have succeeded in the past.

(*) Horrible for package maintainers. Try to create a package from this... There are no releases. I don't see any use of DESTDIR. And bundled libraries are also a major setback. It seems like the developers want to make it as hard as possible to distribute their software.

And I see more and more software going this direction. :(


Up to this point, the language and standard library were in flux, hence the perpetual v0.0.0.

v0.1 is around the corner, though, so hopefully things will start to stabilize.


Lord knows there aren't enough Bobby Sherman references.

http://www.youtube.com/watch?v=f7PLcHnMNKE



