Personally I'm a little wary of being ghettoised into something overly domain-specific for scientific/numerical computing. Really good interop may mitigate that -- something which can navigate the unholy mix of C, C++, fortran, matlab, octave, R and python routines one comes across trying to reproduce others research work, would indeed be awesome.
I do wonder if some of the noble demands of this project might be better delegated to library developers though, after adding a bare minimum of syntax and feature support to a powerful general-purpose language. For now Python+numpy+scipy seems a great 90% solution here.
A huge part of the goal here is to reduce the need for the "unholy mix of C, C++, fortran, matlab, octave, R and python routines" in both academic research work and machine learning / data science code in industrial settings. The whole project kicked off with me ranting about how I was sick of cobbling things together in six or seven different languages.
So interop is a very, very high priority. We have pretty good C ABI interop now. You can just call a C function like this:
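(Something along these lines, sketching the ccall interface; the exact way the library and function are specified may vary:)

t = ccall(:clock, Int32, ())    # call libc's clock(): no arguments, Int32 return

# call pow(x, y) from libm; argument types are given as a tuple
y = ccall((:pow, "libm"), Float64, (Float64, Float64), 2.0, 10.0)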
We still want better C/C++ interop though. I've already looked into using libclang so that if you have the headers you don't even have to declare what the interface to a function is. That was very preliminary work, but I got some stuff working. Making Julia versions of C struct types transparently would be another goal here.
Another major interop issue is support for arrays of inline structs (as compared to arrays of pointers to heap-allocated structs). C can of course do this, but in any language where objects are boxed, it becomes very tricky. We're working on it, however; anyone who wants to discuss it, hop on julia-dev@googlegroups.com :-)
> in any language where objects are boxed, it becomes very tricky. We're working on it, however
Naive question (and I'll hold my tongue on my naive guesses): I understand why it is necessary to box user-defined types on the JVM, but why build this restriction into a new language that doesn't run on a restricted platform? Especially when performance and C interop are high priorities?
Or perhaps I misunderstood, and your statement should be read as sparsevector suggested.
Boxed values are pretty much necessary for dynamic languages — that's where the information about what kind of value something is gets stored. It is a pain for this kind of thing, though. In a fully statically compiled language like C, you can eliminate the need for a box entirely. If you want dynamic typing, that's the price we've gotta pay.
Not necessarily. You just store a pointer to the type info inline with the data (like C++'s vtables). You can have unboxed "value-types" (const structs, essentially) in dynamic languages. In fact, you could even differentiate between a "boxed" ref type and a value type at runtime, because refs don't need all 64 bits of the pointer. So a ref is a 64 bit pointer with the first bit set to, say, 0, and a value (struct) type always begins with a 64 bit pointer to its type information, only it's tagged with an MSB of 1. Since you can't extend concrete types, you can easily store value types inline in an array, and just have the type-info pointer (which must be the same for all elements, b/c there is no inheritance) at the beginning of the array. And if your structs are aligned OK, you could easily pass them to C by skipping the type-info pointer both in the single value case and in the array case.
Ok, having read this comment again, here's a more measured response. You're assuming in this comment the value-type vs. object-type dichotomy that's used in, e.g., C#. That's one way to go, but I'm not sold that it's the best way. Deciding whether you want something to be storable inline in arrays or not when you define a type is kind of a strange thing. Maybe sometimes you do and sometimes you don't. So the bigger question is really if that's the best way to go about the matter.
It seems that in dynamically typed languages, you either need to have two kinds of objects (value types vs. object types), or two kinds of storage slots (e.g. arrays that hold values inline vs. arrays that hold references to heap-allocated values). The boxing aspect is really only part of that since you can't get the shared-reference behavior unless the storage is heap-allocated, regardless of whether there's a box header or not.
This is an interesting scheme and seems like it might work, but I'm not sure. Would you be willing to pop onto julia-dev@googlegroups.com and post this suggestion there so we can have a full-blown discussion of it? Hard to do here — and some of the other developers would need to chime in on this too.
As a compromise, I think it would be helpful to be able to define structs that have a known layout in memory but no dynamic identity. They would be treated like primitive types in Java, but with the crucial difference that users could define their own. That way users could write Julia code that stores and accesses data in the same format they need for interoperating with whatever native libraries they use, instead of serializing and deserializing between Julia objects and C struct arrays (or using int or byte arrays in their Julia code and giving up most of the advantages of a modern programming language.)
We've discussed our way down that path but the design ends up being unsatisfying because "objects" and "structs" end up being so similar yet different. It may be what we have to do, but I haven't given up yet on having structs and Julia composite types be compatible somehow. pron's scheme is interesting.
Ah, there are finalisers. The function finalizer lets you define a function to be called when there are no more references to an object. I guess maybe the idea is to use this from within the constructor.
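Something like this, maybe (just a sketch; I'm assuming a finalizer(obj, func) calling form, and the TempFile type is made up purely for illustration):

type TempFile
    path::String
    function TempFile(path)
        t = new(path)
        # register cleanup to run once t becomes unreachable
        finalizer(t, tf -> run(`rm -f $(tf.path)`))
        t
    end
end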
No, currently it's an inline array of immutable 128-bit numeric values and we use bit-twiddling to pull the real and imaginary parts out. However, that's a temporary hack. (It's also why the mandel benchmark is relatively slow — all the bit-twiddling is not very efficient.)
The longer-term approach is still up in the air and that's what I was talking about above. My favorite approach at this point is to allow fields to be declared as const — which in Julia means write-once. Then if all fields are const the object is immutable and can automatically be stored in arrays inline.
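In other words, something like this hypothetical declaration (not real syntax, just the idea being described):

type Complex128
    const re::Float64   # hypothetical: write-once field
    const im::Float64
end
# if every field is const, the value can never change after construction,
# so an array of Complex128 could safely store the re/im pairs inline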
Can you please explain, or give a link to, what the problem really is (that is, why do you have "bit twiddling" at all) and how you imagine that const arrays of complex can be write-once and still efficient?
What's the problem with having arrays of doubles and complexes as "basic" types even in a dynamic language? I believe this could give you a C footprint and C performance with array operations?
It sounds like he's saying it's hard for Julia to interop with languages that don't support arrays of inline structs (e.g. Java). I could be misreading it though.
I've been waiting for Fortress for a LONG time, but either progress has been real slow or the Fortress team has trouble communicating their progress to the community. Probably both.
From a quick look I can tell that while Fortress is more similar to Scala, with classes, mixins and static typing, Julia is closer to Clojure, with no encapsulation, dynamic typing, homoiconicity, and separation of behavior (methods) from concrete types (akin to Clojure's protocols).
I really like the choices Julia's designers have made, particularly multiple-dispatch methods, final concrete types and lispy macros. The language seems very elegant. Not too crazy about "begin" and "end" syntax, though :)
Yeah, we've been waiting for a while for Fortress too. It seems like a lot of time and effort has gone into a WYSIWYG IDE and not as much into the actual language implementation :-(
Chapel has made a lot of progress in the past 2.5 years (while we've been working on Julia) and is certainly a contender.
Julia certainly aims to be more dynamic like Clojure, but specially designed to be good for numerical and technical stuff, and yes, the begin/end is due to Matlab — scientists who have a lot of code in Matlab already are a major target demographic. As it is, a lot of Matlab code ports over with relatively trivial changes (see: http://julialang.org/manual/getting-started/#Major+Differenc...), which was the point of syntactic similarity to Matlab.
Octave has solved this by allowing both matlab-style "end", as well as block-specific endings, like "endif", "endfunction", "endfor", etc.
Having several end end end endings without any hint of what they are closing is one of the ugliest parts of matlab. Even though you are reimplementing this for compatibility reasons, it is rather sad that you chose to make this the default behaviour for Julia.
I agree. If all they say is "end end end" you might as well use braces. I'm depressed to hear that Matlab is an inspiration to the language design at all; the Matlab language is by far the worst aspect of Matlab, and it has nothing to recommend it. Think of the damage that Sun did to Java to make it look familiar to C programmers; no need to repeat that mistake (though if you're cynical, you might think it was the smartest thing they ever did.)
Braces would be better than ending keywords, in my opinion, but I'm disappointed not to see whitespace-defined blocks a la Python. Perhaps that doesn't work in a statically-typed, type-inferred language, but if it does, I think it would be much better. Scientists have no problem with it (certainly less of a problem than programmers do) and Python is pretty well accepted in the scientific computing community.
Using curly braces for blocking is a non-starter because they're used for a lot of other things, and honestly, bracket pairs like (), [], {} are way too precious, imo, to squander on something like blocks. Parens () are exclusively for function application; square brackets [] are exclusively for indexing operations; curly braces {} are for type parameterization. The other option that C++ popularized the use of is <> — but that's syntactically sketchy since both < and > are valid by themselves. It makes parsing a complete nightmare — both for machines and, to a lesser extent, people. I wouldn't be completely averse to indentation-based blocking, but I'm not really a huge fan of it either. I'm cool with the way both Matlab and Ruby do it, which is using `end`. Could conceivably change, but relatively trivial syntactic alterations like this are a really low priority. What we have now works well and is familiar to both Ruby and Matlab programmers.
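For a concrete sense of those three roles (a trivial illustrative snippet):

square(x) = x^2          # ()  function definition and application: square(3)
v = [1.0, 2.0, 3.0]      # []  array construction; indexing is v[1]
T = Array{Float64,1}     # {}  supplies type parameters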
Neither do many foreign language tokens like 我是加拿大人 but that doesn't stop me typing them in quickly using the IME. Admittedly those brackets aren't in any IME I know, but maybe they should be.
Matlab already allows you to omit the 'end' from function definitions, which many people find easier to read, and my experience seeing people transition from Matlab/Octave to Python+Numpy+Scipy is that people get on board with indentation-delimited blocks, because that's the way they write code anyway. But I agree - at the moment it's a pretty trivial thing. As a heavy user of Matlab, R and Py+N+S I'm looking forward to trying out Julia.
I personally found it worse to read, because it makes function blocks different from other blocks. Also, if you define nested functions in Matlab, all functions in the file have to be closed with end anyway.
Nothing is as bad, though, as the default Matlab behavior of not indenting first-level function code. This makes it really hard to scan a file with multiple functions and see where they separate.
I was referring specifically to supporting the Octave conventions, a superset of Matlab's, which make it easier to match up begins and ends.
When I said "easy fix" I meant "a trivial extension to the parser" without realizing you would translate that to "really low priority". Guess I'll save my non-trivial suggestions. :)
While you are at it, could you support alternate syntaxes for the same Expr? Perhaps encoding the syntax version at the top of the file, or setting the reader at the top of the file? For those porting matlab code over, either provide a tool to translate the source or put the reader in a matlab-compatible mode.
After that proposal is implemented (perhaps in version 3 of Scala, almost ten years after its first release, though granted it wasn't their top priority) they'll still have to write at least a little bit of boilerplate for each type they want to store in unboxed arrays, and they'll be limited to types that can be stored as Java primitive types under the covers. That seems like a pretty nasty limitation for a general-purpose scientific computing language, not only because it decreases native performance but because it makes C interop much more difficult.
For a general-purpose scientific language, I think it is imperative to adopt the principle that any user-designed type should be able to achieve the same power, performance, and elegance as if it were a built-in type. To the best of my knowledge, that rules out JVM languages.
(Scala was never meant to be a "JVM language," but in terms of its strengths, its weaknesses, its user base, and its future, it is very much a JVM language, and the obstacles to implementing efficient user-defined types are one of the trade-offs they accepted when they went with the JVM.)
I agree, and I think Julia's designers have chosen correctly not to use the JVM (although, in the comments below you'll find that Julia uses boxed types for "structs" as well, though it would be far easier to build true array-embeddable complex types on LLVM than on the JVM). However, value types for the JVM are a work in progress (as well as tail calls), and I think they might have already been implemented in the Da Vinci Machine project (future JVM improvements), and so will find their way to the JVM in due time (see https://blogs.oracle.com/jrose/entry/tuples_in_the_vm). Until then, scientific computing languages will most likely choose a different platform.
And what should the library writers use? The great advantage of this is that much of it can be done in one language.
From the article: "The library, mostly written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, FFTs, and string processing."
I wouldn't say that it sucks - that's way too strong. I am one of the authors of Circuitscape (http://www.circuitscape.org), which uses python+numpy+scipy. The community loves it, the fact that it's Python, open source, embeddable in other tools, etc.
However, much of the code is written in a vectorized style for performance reasons, as is the case with many high level scientific computing languages. This leads to unnatural code in some cases, and also uses too much memory. First I thought IronPython was the way, but have been looking forward to pypy+numpy+scipy eagerly.
If I were to use julia, the code would be a lot more natural, because type inference and all the compiler goodies make it possible to simply write loops over arrays when I need them. This was one of the reasons we started working on julia, because everything else just seemed to fall just a little bit short.
OK, perhaps I should have qualified that. It's a good 90% solution for a lot of use cases, although obviously more useful for implementing algorithms in terms of numerical building blocks, than ultra-performance-critical code where you want to write a lot of your own tight inner loops. Is that what you meant?
The parallelisation story isn't great at the moment either, although seems like it has the potential to improve. Still 'sucks' is pretty harsh. For me, looking at how far it's come since I first played with it, I'm impressed. For machine learning, the majority of the building blocks one needs are there, and you get to sit back and put them together using a nice, clean, widely-adopted general purpose programming language. And unlike MATLAB still maintain a decent amount of control over things like memory usage and which BLAS routines it's calling.
Adding bindings for new libraries is more of a pain than it should be though on the occasions where you do really need some fortran or C++ library that doesn't have bindings yet. A language which bridges the gap between high and low levels (not C++!) and has great interop would be very interesting. I guess I'm just hopeful that this kind of thing can be achieved in a general-purpose language (new or existing) which like Python is adopted across the wider software engineering community. Perhaps that's my unreasonable demand to add to their list :)
Seriously? I mean, I know you are very excited about your work on PyPy, but why do you have to go around saying baseless stuff like this? There are many, many thousands of people who write lots of Python code every day that rely on Numpy, Scipy, and Matplotlib to get their work done, and who seem to be quite pleased with it. There is a lot of work left to do on all parts of that stack, but that's a far cry from "it sucks".
I guess I should clarify that - performance sucks. There are obviously various ways around it, but you just can't write a lot of performance critical python that way and I guess this is one of the reasons why julia exists in the first place.
But if you look at the actual cutting edge research work in the HPC and scientific space, they are working on languages that allow domain experts to express computation using higher level primitives, not on fancy compiler techniques to make general purpose imperative languages like C or Python "automagically" run faster on single cores.
The general consensus, if you look at languages like Chapel and Fortress and X10, seems to be that most scientific codes shouldn't be written using for-loops. That is the low-level control flow construct dating from the age of assembler. Instead, what scientists generally want to say is, "Apply this kernel across this domain, with these windowing conditions", or "Reduce values from this computation along these keys in my dataset". As software developers, our job is to provide the language runtime to allow them to do that; such a runtime will be the most robust, correct, maintainable, and performant.
Right. I think we actually violently agree with each other on that :) Note that in general it's ok to not write much python and live very happily just using it for non-performance critical parts. No doubt we both know a lot of people who are quite happy with that.
The problem with numpy's performance is twofold:
* Numpy expressions might not be fast enough. I believe you guys at continuum are trying to address that one way or another. In general the kernel expressed using high-level constructs in python should not be slower than an equivalent loop in C.
* Sometimes you actually want to write a for loop, because you don't care, because it's faster, because it's a single run, because the data is manageable etc. You should not be punished for doing that with 100x performance drop. You can still be punished for that with 2x performance drop.
Much praise!! These guys have incredibly good taste. Almost every single thing I can think of that I want in a programming language, they have it. All in the one language!
The fact that it has parametric types, parametric polymorphism, macros, performance almost as good as C, good C/Fortran interop, 64 bit integers and an interactive REPL all in the one language just blows my mind.
I wasn't able to tell if it is possible to overload operators, which is another thing essential to mathematical code.
I was also unsure why the keyword end was needed at the end of code blocks. It seems that indentation could take care of that.
I also didn't see bignums as a default type (though you can use an external library to get them).
However, all in all, I think this is the first 21st Century language and find it very exciting!
Thanks for the incredibly high praise. It definitely is possible to overload operators — operators are just functions with special syntax (see http://julialang.org/manual/functions/#Operators+Are+Functio...). We don't have bignum support, but adding it via GMP would be fairly easy. Just too many things to do!
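For instance, roughly like this (a sketch with a made-up Vec2 type; see the linked manual section for the real details):

type Vec2
    x::Float64
    y::Float64
end

# + is just a generic function, so we add a method for the new type
+(a::Vec2, b::Vec2) = Vec2(a.x + b.x, a.y + b.y)

Vec2(1.0, 2.0) + Vec2(3.0, 4.0)   # => Vec2(4.0, 6.0)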
If you happened to feel like trying to port it to Julia, it would be fairly doable. All the low-level functionality is there to allow it — bit twiddling operations, growable dense arrays of integers, etc.
Unfortunately the performance would be poor. This is not a reflection on Julia. Even compiled C is between 4 and 12 times slower than assembly for some bignum operations.
Also, LLVM handles carries and certain loop optimisations poorly, so even using LLVM bytecode you can't do much better than compiled C. It would be a massive project to improve this in LLVM (I thought about giving it a go some time ago but decided it was overwhelming). And that use case is probably too specialised for the improvements to help with much else. Obviously the LLVM backend is fantastic for 99% of use cases and improving all the time.
N.B. I am not implying that a good assembly programmer is generically faster than a C compiler. Bignums are a very special case.
That makes a whole lot of sense. For myself, I wouldn't even attempt this because it's so much harder and more time-consuming to try to reimplement something like bignums efficiently than it is to just use a stable, mature and fast external library like GMP or CLN — or something like your project if BSD/MIT licensing is a must.
It's been done. I think it was Gambit Scheme that had its own bignum library. And for a while, very large integer arithmetic was reportedly faster than GMP, which if you know anything about GMP is quite an achievement. However, the GMP guys subsequently fixed this problem.
The language syntax seemed uninteresting, which is not really a bad thing but what's the case for the need for a whole 'nother language in that category?
I don't think lua has integers, i.e. like JavaScript everything is a double; that can be changed by redefining a macro and recompiling AFAIK, but it's still one global numeric type (and you can't change it for luajit).
No; there are definitely some major differences between the two languages, but they do have a lot of similarities. Julia's obviously made for scientific work, and Lua's a general-purpose scripting language designed to be embedded in a host application, but they share a lot of design constructs.
It would certainly be nice if there was an option to use 0 based indices in blocks of code. It's understandable in that they are pitching at the technical community, and many mathematical papers and books are written with 1 based indices. But I am a mathematician who prefers 0 based indices.
Actually, it is quite easy to implement 0 based indices or any other indexing scheme in julia, since all the array indexing code is implemented in julia itself.
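As a rough sketch of what that could look like (the wrapper type is made up, and the names of the generic indexing functions it extends are just for illustration):

type ZeroBased{T}
    data::Array{T,1}
end

ref(a::ZeroBased, i::Int)       = a.data[i + 1]         # a[i] reads
assign(a::ZeroBased, v, i::Int) = (a.data[i + 1] = v)   # a[i] = v writes

z = ZeroBased([10, 20, 30])
z[0]        # => 10
z[2] = 99   # stores into the last slot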
I personally would find multiple indexing schemes confusing both for usage as well as to develop and maintain. Given that 1 based indexing seems to be a popular choice among many similar languages, we just went ahead with that.
It generalizes the choice of 0 or 1 to an arbitrary starting index. So when you create an array you specify not just where it ends but also where it begins. This lets you do neat things (consider a filter kernel with range [-s,+s]^n instead of [1,2s]^n) and the extra complexity it adds can be hidden when not needed using for-statements or higher order functions.
Nobody uses it because the implementation is not very efficient and Haskellers have a chip on their shoulder about performance. It subtracts the origin and computes strides on every index, but you could easily avoid this by storing the subtracted base pointer and strides with the array. Of course when you go to implement it you'll see the light on 0-based indexing :)
A number of programming languages allow an arbitrary range of indices for an array, including Ada, Fortran 77, and Pascal. See the "Specifiable Base Index" column in this table at Wikipedia: http://en.wikipedia.org/wiki/Comparison_of_programming_langu...
Actually, the main reason I want to do away with Matlab and Octave is that I can't stand the 1-indexing! When voicing that opinion among collegues, I have heard no-one disagree with me. If you are actually stuck with this in Julia as well, I don't think I will have anything more to do with it.
Actually, broadly speaking, I think math (think summation etc.) in general is usually 1-index based while programming is 0-index based (since an array index is a memory offset, so index 0 points to the first element).
At first we were going to use 0-based indexing, but it made porting any Matlab code over very hard, which defeats a large part of the purpose of having Matlab-like syntax in the first place — to leverage the large amount of Matlab code and expertise that exists out there.
However, as I've used it more and more, 1-based indexing has really grown on me. I feel like I make far fewer off-by-one errors and actually hardly ever have to think about them. This has led me to conclude that 1-based indexing is probably easier for humans while 0-based indexing is clearly easier for computers.
Many divide-and-conquer algorithms seem to be easier to express with 0 based indexing, whereas quite a few array operations seem to be better with 1 based indexing. I can certainly understand and appreciate the different points of view, I just personally always think about algorithms with 0-based arrays.
As does anyone trained in the C tradition, but it's annoying, too, to have to translate 1-based math formulas to the C convention. Having recently used Octave for the Stanford online ML class after a couple decades of C, C++ and Java, I doubt programmers will have trouble with the mental transition.
Well, for what it's worth, I certainly also "grew up" with 0-based indexing (actually, literally grew up since I was a kid when I learned Pascal). I'm just saying that 1-based has really grown on me and that I find myself thinking about avoiding off-by-one errors far less often when using 1-based indexing. There are other times when I really wish I was using 0-based indexing. However, I find that that latter are more often times when I'm doing libraryish internals code, whereas the former are more common when I'm doing high level userish code.
Just spent the last hour reading through the docs and playing around with it. This is some damn sexy stuff. It's been a long time since I've been this excited by a language.
The language looks interesting, but I am a bit concerned about the license situation, as in my understanding they have misinterpreted the GPL with regard to shared libraries. Quoting:
Various libraries used by the Julia environment include their own licenses such as the GPL, LGPL, and BSD (therefore the environment, which consists of the language, user interfaces, and libraries, is under the GPL). Core functionality is included in a shared library, so users can easily and legally combine Julia with their own C/Fortran code or proprietary third-party libraries.
If the program dynamically links plug-ins, and they make function calls to each other and share data structures, we believe they form a single program, which must be treated as an extension of both the main program and the plug-ins. In order to use the GPL-covered plug-ins, the main program must be released under the GPL or a GPL-compatible free software license, and that the terms of the GPL must be followed when the main program is distributed for use with these plug-ins.
I believe what they're saying is that there are two things:
1. The Julia core, which consists of the language runtime and core functionality, is MIT licensed, and builds into an MIT-licensed shared library.
2. The Julia "environment", which includes a user interface, third-party libraries, etc., some of which are GPL, and which is therefore GPL as a whole.
I believe they're saying that you can link #1 with proprietary code, not meaning to imply that you can link #2 with proprietary code (because as you point out that wouldn't work). How useful that is probably depends on how many of the libraries the average application needs are in bucket #1.
That is exactly right. At the moment I think the only GPL library Julia uses is readline — but obviously that's pretty important for the repl, so we chose to do it this way.
The last REPL I wrote used linenoise ( https://github.com/antirez/linenoise ) instead of readline, and it worked great for us. You may need more of readline's features than we did, but it's worth a look.
The FSF interpretation is undermined by the existence of non-GPL Linux programs (that link, call and use data structures from the (GPL-licensed) Linux API).
Looks very much like a python without the legacy and a touch of ruby, ipython as the default shell, better multiprocessing, better os/system access, decorators and mathematics. Nice clear documentation too.
Given how similar to python they are, their main competitor is pypy, which isn't too far behind them in performance I'd suspect (absent from the benchmarks though). I was fully expecting someone to make a non-compatible fork of pypy to create a faster python-like lang/something like julia. They should quickly get some elegant small libraries like sinatra/bottle and requests (access to dbs with http interfaces as well). A robust requests+lxml library built-in would be simply amazing.
PyPy is missing complex support in numpy (on trunk anyway) and numpy.random to run those benchmarks. Those that run:
fib - roughly the same time as CPython
parse_int - 10x faster than CPython; extrapolating from their python vs julia numbers, that's still about 20% slower than julia
quicksort - 4x faster than cpython, but much slower than julia.
pi_sum - 20x faster than cpython, a bit faster than julia?
Those are very unscientific measurements. Both fib and qsort are recursive benchmarks (why would you write qsort recursively??), so I guess the JIT does not have time to kick in (and PyPy's support for recursion kinda sucks, at least in terms of warmup times).
On a slightly unrelated note - a modified lxml library runs on pypy albeit a lot slower right now.
Well, you are being a bit nitpicky, as pypy is both the python implementation and the JIT/RPython/framework bit. What I meant was someone staying very close to the python implementation in almost everything (syntax etc.) but streamlining it/breaking compatibility/giving it a new name ("python3-as-it-should-have-been" perhaps?).
Further, if we are going to be nitpicky, when you use something like that for its intended purpose it's still a fork - I forked Twitter bootstrap for my new project/I used Twitter bootstrap for my new project - so "you wouldn't fork PyPy" is a false statement.
> if we are going to be nitpicky, when you use something
> like that for its intended purpose it's still a fork
No. Fork is definitely not semantically congruent with use. You're only forking Bootstrap if you create a path that diverges from its mainline; don't get confused by GitHub parlance.
Pypy most commonly refers to the python implementation anyway, as can be easily seen by going to the pypy website[1]. Daslch abused the fact that "pypy" points to two things to make a meaningless nitpick. Now you are claiming that I have claimed that fork and use are semantically congruent, where do I claim that? I think you are nitpicking and parsing "it" wrong. In either case a question asking for clarification would be more polite than going on attack.
There seems to be a disease, let's call it Eric Raymond Disease (ERD), on HN in particular[2] of people busting in rudely to declare with supreme confidence on the usage of quite generic words and/or slang. Let's start by tabling the fact that a particular "software definition" of "fork" has not yet reached the OED, or online dictionaries like Merriam-Webster for that matter. Going from that I can't see how you can declare it to only have one definition when you already admit there is a competing one ("GitHub parlance"). The base concept of the word fork is a divergence, a branch, which seems to me to cover any copy+modify move for software so someone copy+modifying bootstrap is forking it, someone copy+modifying the implementation of python called pypy is forking it.
Are you and Daslch really contributing to HN or are you taking a rather banal comment and turned it into a 100% useless thread by bickering over semantics?
It's great that metaprogramming support seems to be a requirement for new languages these days.
Unfortunately, this one combines the inconvenience of Template Haskell (explicit invocation of macros with @) and the bugs of Common Lisp (the section on hygiene says, basically, "we have gensym"). Fortunately, the language is young, and I hope they can improve this story.
I find that the @ syntax is amazing, since the reader (programmer) knows exactly which form is a macro (and can look it up), as opposed to introducing arbitrary, often very confusing syntax.
Thanks. We've considered doing other things — like function call syntax for macros (which is essentially what Lisp/Scheme have), but decided against it. It just causes confusion. Can you pass a macro around like you can a function? Can functions shadow macros and vice versa? What happens if a macro introduces a local variable that shadows the macro itself? Basically it comes down to fact that macros are syntactic and as such behave very differently from functions, which are not syntactic. With the @ syntax, there's no confusion.
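To make that concrete, here's a small sketch of defining and calling a macro (assuming the quote/$ interpolation machinery, with esc used to splice the caller's expressions in):

# run the body only when the condition is false
macro unless(cond, body)
    quote
        if !$(esc(cond))
            $(esc(body))
        end
    end
end

x = 1
@unless x > 1 println("x is not greater than 1")   # the @ marks the call as syntactic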
Someone already mentioned PLOT. This would be a good thing to look up when designing your hygienic macro system: http://users.rcn.com/david-moon/PLOT/index.html. The author was also involved in Dylan, which is another infix language with hygienic macros.
Seems like there have been a variety of attempts to introduce lisp-style macros into non-lisp languages that use traditional syntax rather than s-expressions. I can see how this can succeed for simple macros without sacrificing much. However, complex macros done this way tend to be much more complicated than the equivalent lisp code. Do you believe that Julia solves this problem, or is it also subject to some of the same tradeoffs?
Constructing and manipulating code in Julia is a bit more complicated than Lisp — because Lisp lists are so damned simple. However, not by much. I think most of the complexity of code that generates code is inherent. Sometimes being that meta just makes your brain hurt. Our printf implementation (https://github.com/JuliaLang/julia/blob/master/j/printf.j) is about as bad as it gets and it's still pretty understandable — aside from the inherent complexity of implementing printf and the separation of what's computed at compile time and what's computed at run time.
Just a comment/suggestion: could someone with GitHub write access go over your wiki and fix code samples where multiple lines have been concatenated into a single line? E.g. https://github.com/JuliaLang/julia/wiki/Types-and-their-repr..., heading `Built-in types`.
Oh, also I believe the link was probably to stale docs. They're gone now. What's on the website is mostly up-to-date, but a few things have changed and will need fixing.
The problem with this is that it limits the options of library writers. For example, I can't replace some function with a macro that does something smart, such as check format strings for printf at compile time, because that breaks all callers. It also fundamentally limits what you can build with macros, because you can't build language features that are basic -- they always appear tacked-on.
That is very true — and it's precisely why we have considered making macros callable with function syntax. But I feel like having something that looks like a function and is actually a macro is a bit of a dangerous lie, no matter how handy it sometimes is. One of our design goals is not to be too tricky — if something looks like a function call, it should be a function call. The @foo syntax for macro calls means that you know exactly what's going on. It also means you can't do stuff like try to pass a macro as an argument to a higher-order function — what does that even do? I.e. what does map(m,vec) mean where m is a macro?
If macros are marked as different to functions at the callsite, then do they need to be marked as different at the definition site? Most functions return a non-Expr value, but could return an Expr if the program's job is manipulating them. Most macros return an Expr, but could return a literal for insertion into the code.
So I'm wondering does a programming language which marks macro expansions different to function calls (as Julia does with the @ prefix) really need to distinguish functions from macros in the definition (as Julia does with the 'function' and 'macro' keywords) ?
It's not yet another language that we need. It's more high level functions.
C or fortran can be as fast as they want, but in mathematica I can do MorphologicalComponents[] and get the components. Having these functions available to me speeds up my time to discovery by 1000x or more.
Pardon the possible naïveté, but I'm so old I remember Ada. Seems that language was designed to address a very similar problem space. What does Julia offer that Ada doesn't? Or perhaps: why is ada so deficient that spending 2+ years inventing a new language was a better proposition than taking the time to improve Ada?
If the problem IS that it's huge and inelegant, then the solution isn't to try to improve it, but to start from a clean slate. Julia looks like a reasonable attempt.
I have made the same rants about numeric computing language issues as the Julia creators. I would avidly adopt Julia based on the documentation on the website. It seems to me that this is a beautifully conceived language given the concerns it attempts to address. Amazing even if the performance never improves.
The R thread on the dev list gives an accurate representation of the obstacles to Julia being widely adopted, however. One might become very unhappy while using R, but sometimes you have to use it, because something you want is only in R, due to the ubiquity of that tool for stats.
If Python, which does not have that many disadvantages besides not having been explicitly designed to appeal to users like me, cannot unseat Matlab and R, Julia will have a difficult time.
But I will give it a try, and will implement some simple but core algorithms that I use a lot.
Another idea to test Julia's usefulness would be to port a tool like Waffles to Julia. In my opinion implementing such a sensible tool like Waffles in C++ is a heartbreak.
I'm a little worried about some of those benchmarks. They appear to have benchmarked R from a few years back against 2011 Matlab, for instance. In addition, they provide no code used for the benchmarks. That being said, this looks really interesting, and I'm gonna bookmark it for when I have time to examine it properly (after the damn phd is finally submitted).
They do link to the code: https://github.com/JuliaLang/julia/tree/master/test/perf . It doesn't look like a very reliable microbenchmark - run test x 5 times - but it should provide a useful starting point if you want to run your own.
Thanks guys, I'm not sure how I missed that. The benchmarks look interesting, but it is strange how old an R version they used. That being said, it's a really interesting language, and I look forward to playing with it.
This looks like it would be a great language to tackle the Project Euler problems with - which is usually a good sign, in my book. A big-int type would be nice though.
So, the usual caveats about writing code to match the language's strengths apply here. For example the fib function plays to a python weakness - function call overhead. Rewriting the code as a simple loop removes the inefficiency:
import time

def fib(n):
    if n < 1:
        return None
    if n == 1:
        return 0
    if n == 2:
        return 1
    x0 = 0
    x1 = 1
    for m in range(2,n):
        x = x0 + x1
        x0 = x1
        x1 = x
    return x

if __name__=="__main__":
    assert fib(21) == 6765
    tmin = float('inf')
    for i in xrange(5):
        t = time.time()
        f = fib(20)
        t = time.time()-t
        if t < tmin: tmin = t
    print str(tmin*1000)
That's quite true, but this micro-benchmark wasn't chosen to make Python look bad — it was chosen to test how good each language was at function calls. So using a loop defeats the point of the benchmark. Using double recursion to compute Fibonacci numbers is also stupid and could obviously be avoided in all languages.
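(For reference, the doubly recursive definition the benchmark times is essentially a one-liner in Julia:)

fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)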
The story the micro-benchmark tells is that Julia's pretty good at function calls, but JavaScript is even better (and, of course, C/C++ is the gold standard). In general the V8 engine is really amazing. We had the advantage of being able to design the language to make the execution fast (within the constraints of all that we wanted to be able to do with it), but V8 takes a language that was in no way designed for performance and makes it blazingly fast.
After I wrote the benchmark code for JavaScript and saw just how fast it was I had a moment of "should we be doing scientific computing in JavaScript?" Now wouldn't that be nuts?
Yes, I pretty much had the same thought, but then I started thinking about multiple dispatch, calling C/Fortran libraries, polyhedral optimizations, and I realized that V8 developers may not have the same design targets in mind.
I feel like V8 faces a harder problem and has solved it ingeniously. Not to knock HotSpot, which is excellent and keeps getting better — it just seems like making the JVM go fast is not as hard as making something like JavaScript go fast. It's a tough comparison; kind of apples to oranges.
this is surprisingly complete for a relatively new(?) project.
one notable restriction is that inheritance is only for interface, not implementation.
also, can anyone find a sequence abstraction (like lists)? arrays seem to be fixed size and i don't see anything else apart from coroutines. am i missing something?!
[perhaps not, if it's intended for numerical work. on reflection i am moving more and more towards generators (in python) and iterables (in java, using the guava iterables library to construct maps, filters etc) rather than variable length collections, so maybe this is not such a big deal. it's effectively how clojure operates, too...]
pron is right: you can inherit behavior from abstract types, and abstract types, unlike interfaces, can have code written to them. (In single-dispatch OO languages, you are in the strange situation that you can write code to interfaces if they are arguments but not if they are the receiver of a method; thus, you can either dispatch on an interface or write code for it, but never both at the same time.)
Arrays are not fixed size: there are push, pop, shift and unshift operations on them just like Perl, Ruby, etc. This uses the usual allocation-doubling approach so that the entire array doesn't need to be copied every time, but it's still usually much better to pre-allocate the correct size. Of course for small arrays that are typical in Perl, Ruby, etc., it hardly matters. If you're building a vector of 1 billion floats, however, you don't want to grow it incrementally.
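E.g., using the operation names mentioned above (exact spellings may differ between releases):

a = [1, 2, 3]
push(a, 4)       # a is now [1, 2, 3, 4]
unshift(a, 0)    # a is now [0, 1, 2, 3, 4]
pop(a)           # removes and returns 4
shift(a)         # removes and returns 0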
The sequence/iterable abstraction is duck-typed: an object has to implement methods for three generic functions:
i = start(x)
done(x,i)
next(x,i)
The state i can be anything.
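So implementing the protocol for your own type might look something like this (a sketch against the three functions just described):

type Squares
    n::Int
end

start(s::Squares)   = 1              # initial state
done(s::Squares, i) = i > s.n        # finished?
next(s::Squares, i) = (i*i, i + 1)   # (current value, next state)

# a for loop is just sugar over those three calls
for v in Squares(4)
    println(v)   # 1, 4, 9, 16
end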
Lack of implementation inheritance is one of those things that intro to OO books make a big deal of, but when you don't have it, you don't miss it at all — or at least I don't. I've never found tacking a few fields onto the end of an object to be very useful. I don't want to inherit memory layout — I want to inherit behavior. Julia's type system lets you write behavior to abstract types and inherit that for various potentially completely different underlying implementations.
I don't know if I'd call it duck-typing when methods are not grouped into interfaces. I think you could say that in Julia, each method is an interface. I find it much cleaner than actual "duck-typing" (the way it's done in, say, Scala with structural types) because then you have both an interface, which specifies a contract, as well as the method name, which also specifies a contract when duck-typing is used, even when it's found in different interfaces.
Must compound types in Julia be concrete? I don't think I saw it in the documentation.
It's been discussed, and it would be satisfying to have an `Iterable` abstract type and ensure that every object implementing `Iterable` satisfies the contract of having appropriate `start`, `done` and `next` methods, but that requires two features we don't have yet: multiple inheritance of abstract types, and some way of specifying an interface. There's been some discussion of this, and I believe we have the way multiple inheritance could work mostly worked out; enforcement of interface implementation, not so much. However, it's a pretty massive undertaking to add it to the language. Quick teaser: generic functions and final concrete types actually make multiple inheritance work a lot better than it does with single dispatch and the "bag of named methods inside an object" model of traditional OO.
So far we haven't actually felt a "pressing need" for multiple inheritance or interfaces, and we tend to take a pressing-need approach to language features. If you can live without a feature for a while, then maybe you really didn't need it in the first place. But we'll have to see what happens when other people start trying to use it for things.
Aren't compound types inherently concrete? The compoundness describes the implementation of the type, implying that it must have an implementation, hence must be concrete.
Nice! I do a fair amount of scientific computing in MATLAB. This looks to have a lot of the powerful array syntax / functions of MATLAB, but with the neat feel of Python, and all kinds of extras.
When you try to install julia on MacOS X 10.7.3, you may see the make fail because wget is not installed. Easy to fix with:
brew install wget
[Edit] git page says gfortran (and wget) are downloaded and compiled, but if they're not already installed make fails. So...
brew install gfortran
The need to do this separately may have to do with licensing?
[Edit] And if you're not root, install to /usr/share/julia will fail. So you'll need to do:
sudo make install
I'm sure all this is perfectly obvious to Unix-heads who are inured to this sort of abuse, but I'm a Mac-head, used to things that Just Work, and I hate this shit.
Sorry! The only thing I can say in our defense is that this is pretty trivial compared to installing a lot of scientific computing packages. But seriously, we'd like a drag-and-drop Mac installer. Anyone want to do that? (Only half kidding.)
Stepping back a bit, this is one of the reasons why having an entirely web-based experience is appealing — then you can let people use a known-good setup without needing to mess around with installing a fairly extensive amount of software just to get basic things to work. Then there's also the general appeal of doing big data work a la Google docs or Gmail. The trick is getting the user experience to be good enough on the web.
Someone just submitted a patch that uses curl on OS X. Also, no need for root, because julia will run out of the directory it is built in. Hopefully, someone who knows more about mac packaging will build a drag-and-drop installer.
Pretty tied into Python/Numpy ecosystem & 3rd party libs; however, after reading that (somewhat over the top, albeit very persuasive) posting, I'm definitely going to dig a bit deeper.
Really nice first impression, especially like the clean syntax and calling of c -- looks promising, thanks!
I am currently building it - OS X Lion.
First, I had to install wget:)
Then, I got a certificate error on the https, so I pasted the command into Firefox and got the tarball.
After copying it to the julia dir, I edited the tar command and ran it.
Finally, doing make right now.
Here is the certificate error:
Connecting to github.com (github.com)|207.97.227.239|:443... connected.
ERROR: The certificate of `github.com' is not trusted.
ERROR: The certificate of `github.com' hasn't got a known issuer.
We used to use curl until recently so that it would build just fine on OS X. We changed it to wget, because it just seems so much easier to use than curl, and we had to do some nasty stuff at one point to download a few libraries that I couldn't figure out how to do with curl.
BTW, do try out the mac binaries if the build is an issue. We are still trying to make it all build seamlessly!
For the record "ease of use" was not why we switched to wget — there was (for a while) a dependency that could only be downloaded using wget's recursive downloading. That's not the case anymore, however, so we should really switch back to curl.
Very exciting. The real power of MATLAB lies in its toolboxes, though; I wonder if there will be an easy (and possibly automatic) way to convert toolboxes.
One more thing: it would be awesome if the `manipulate` from Mathematica could be incorporated somehow in that web interface. See:
MATLAB toolboxes are simply awesome for users, even though they are expensive. We hope that if enough people find julia useful, many such libraries will be built. Having written a ton of mex and matlab stuff, and a lot of julia libraries, I find that julia is nicer to use personally - but of course I am biased. :-)
I do believe that open source + good compiler + simple C/Fortran calling interface will lead to others being able to write toolboxes in julia itself and plugin libraries when needed.
Would be interesting to see how Julia perf compares to C++ compiled or JITed with LLVM via G++ 4.6+Dragonegg or Clang 3.0/svn. Would be more apples-to-apples, since both would use the same middle- and back-end. G++ 4.2.1 is a bit obsolete at this point, but as it probably came by default on the MBP they tested on, it's understandable.
That would certainly be doable. If the performance is better, we can certainly switch to using that for our benchmarks. The idea for the benchmarks is to compare to a "gold standard" — hence the fact that the best results are taken across all optimization levels. We could even take the best results across multiple C compilers to give ourselves the absolute hardest comparison :-)
For analogy, there are few original liquors (vodka, rum, whiskey, brandy, wine, etc.) but god damn so many cocktails, so many cocktails.
Same in programming language design. Few original concepts (Lisp, C, Smalltalk), but so many cocktails even my grandma is creating one. All we want is more libraries.
I hope this really takes off. Though, for it to really take off it needs a large ecosystem. The fact that the source is hosted on Github is a good start.
If you look at the Github repo, you will notice the second most used programming language in the repo (C being the first) is Objective-J! I wonder why?
make[3]: gfortran: Command not found
make[3]: *** [lsame.o] Error 127
/bin/sh: ./testlsame: not found
/bin/sh: ./testslamch: not found
/bin/sh: ./testdlamch: not found
/bin/sh: ./testsecond: not found
/bin/sh: ./testdsecnd: not found
/bin/sh: ./testieee: not found
/bin/sh: ./testversion: not found
make[2]: *** [lapack_install] Error 127
make[1]: *** [lapack-3.4.0/INSTALL/dlamch.o] Error 2
make: *** [julia-release] Error 2
a bit late to the party here, but i see no-one has mentioned gpus. since julia supports calling c libraries i guess this will work, but i wondered if anyone knew of any work on closer integration?
(also, julia is not a great name for googling....)
C as desert island programming language but making all indexing 1-based: "the first element of any integer-indexed object is found at index 1, not index 0, and the last element is found at index n rather than n-1, when the string has a length of n." Also begin end blocks - maybe there was a pascal ninja hidden in the syntax committee room?
That's pretty common for new language implementations - Node.js was around for a couple of years before the Windows port was ready.
To be honest, if you're targeting early adopters of a programming language Linux and Mac support is probably a lot more important than Windows. Smart Windows users can always run Linux in a VM.
It is true that a number of people do their scientific computing on Windows+Matlab or Windows+R, but most parallel stuff typically runs on linux clusters. It would be nice if julia worked on both.
It is not by design that we do not have Windows support. It's just that none of us uses windows. We do believe that code is largely portable, and with a little effort, it can be built on cygwin or mingw. Nothing like a native port though. Maybe someone who is familiar with windows will come along and contribute. This is a common question our friends and colleagues ask us.
I doubt it was intended as an insult. Julia (and other scientific languages) are often used in clustered environments, and Linux is much, much more common than Windows in HPC.
I realize this is just a comic, but it's one of the reasons that we try to be as compatible with C and Fortran libraries as possible — that's where the vast bulk of high-quality scientific computing work has been done. The goal here is not to duplicate all the work that's already been done, but to allow existing high-quality libraries to be smoothly and easily used together in one environment. That's very much the same philosophy that NumPy and SciPy have, and indeed, it has much the same goal. The biggest issue with NumPy, IMO, is that Python arrays aren't designed with linear algebra and interoperability with libraries like BLAS and LAPACK in mind. This leads to the somewhat maddening distinction between Python arrays and NumPy vectors and matrices — and lots of conversion back and forth between them.
I also posted that comment without reading the article -- that was just my first response to "yet another programming language".
I'm sure that it's filling a real need. I just wish that groups could cooperate to make a handful of languages/libraries better rather than having 100 competing ones.
Actually, there is much more co-operation than is visible. Almost all the groups share the same libraries, same bugfixes, same patches, etc., which typically will account for much of the user experience.
The different language approaches will typically compete on things like syntax, speed, etc., and this leads to innovation and cross-pollination of ideas. I personally prefer to have some choice, and use a number of languages for scientific computing myself - but I guess too much choice makes things confusing for the newcomer.