Giving up on Julia

KenoFischer · on May 13, 2016

Happy to address these points:

  - Startup performance/memory usage

Yes, we are definitely very acutely aware of these. Julia is not currently optimized for frequently run short scripts. That's the price on pays for having to bring up the entire runtime system (initializing the compiler, RNG, external libraries etc). The good news is that there will be a solution to this soon, which is to statically compile your julia program. The area where this really comes up for most people using is package load times. We're very actively working on making that faster.

  - Syntax

A little subjective, so not sure how much I can say here. I can say that I'm not a huge fan of our multi-line comment syntax. It's not entirely clear what a better syntax would be though (the original issue on this had some suggestions, but some of them were worse).

  - One-based indexing

I think there has been plenty said on this topic, though interestingly this is one of the only times I've seen the argument made in a way that I actually agree with. That said, I do think there is an easy way to deal with this though. For packages that needs arrays of indices, it would be quite easy to define an `IndexArray` type that does the translation automatically.

  - String Formatting

Yep, you're right, it's a mess. It'll have to be cleaned up.

  - Unsafe C Interface

There's two projects (Clang.jl and Cxx.jl) which can help with this. The former automatically generates ccall definitions for you, the latter just parses the header and generates the call directly.

  - Slowing down in development

I'm really not sure where that impression comes from. Perhaps it is that we're adding fewer features, but rather working on cleaning up existing features. Also, I personally at least have been doing a lot of work outside of base (particularly on the debugger). Not sure. Would love to know.

vitaut · on May 13, 2016

Thanks for the detailed response. I hope that my post wasn't too harsh, the intent was mostly to attract attention to the current issues not to undermine the great work that you and others have been doing. I'm glad that many of the issues that I mentioned are being addressed. Maybe I'll give Julia another go in some time =).

The question of syntax is subjective of course. From the set {C-like, Python, MATLAB} I'd definitely like to see more influence of the first two and less from the last one.

bmh100 · on May 13, 2016

One-based indexing is also used in Fortran, which seems to be used in a great deal of numerical computing even today. Additionally, BLAS/LAPACK is an important linear algebra library written in Fortan.

I am somewhat confused by your discussion of startup times. Since Julia is a "programming language for technical computing", what scenario are you imagining where startup times would be a significant concern?

whyenot · on May 13, 2016

Not just FORTRAN, but R and MATLAB also use one-based indexing. It's also, at least historically, the convention for matrix notation. IMO do whatever attracts more users, as that is what Julia needs most. Without a large community moving from MATLAB, R, and other languages, Julia will never take off.

runchberries · on May 13, 2016

Erlang also.

In most functional languages using list indices are an anti-pattern. Pattern matching and generalized iteration is a much more elegant way to handle most things you would use an index for.

lostcolony · on May 13, 2016

Well, sort of. Erlang has the array module, which does actually index at 0, intentionally to feel like an array from another language.

The primitive collection types index at 1, but, as you said, are almost never indexed that way. I'm not sure the motivation as to why, but the fact it feels clunky to use them that way is a benefit, as it raises resistance when you're using them wrong (as indexing them almost always is).

jandrese · on May 13, 2016

No less than Dykstra has weighed in on the numbering of arrays.

http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

Having worked with both, I'm inclined to prefer 0 based addressing in most cases. It's slightly less intuitive, but it generally leads to cleaner code.

e12e · on May 14, 2016

That argument never felt very convincing to me. Perhaps because it really is about aesthetics, not some deep philosophical ideas connected to numbers.

I see it as rather simple. If you're listing off-sets, starting at 0 makes sense, because there'll be something there. But if you're counting elements starting at 1 makes more sense. I don't have 9 fingers offset from my first one, which I call finger zero; I have ten fingers, starting with the first one.

A set with no members is the zero set, once you add one element, you have a set of one. An empty list contains no elements. It does not have a first member, because it's empty. Add an element, and it will have one member, and it will be in the first position. Not in the zeroeth offset from the beginning.

You can the numeric names of letters of a string in an array, but I don't see that as a compelling reason for naming the first letter of a word, the letter which is at the zero-eth offset from where the word is held in memory. At best, it's a low level optimization, at worst it's just a leaky abstraction. Maybe you want to store the length of the word there (like in Pascal), for a different trade-off in terms of what is optimized.

rixed · on May 14, 2016

Indexing is not for counting elements, it's to reach a particular one.

e12e · on May 14, 2016

I still find it more useful to name it my tenth finger, rather than the ninth from the first. And I'm fine with my middle finger, finger number three - being the one that sits between the pairs of fingers made up of finger number one and two, and fingers four and five. I don't really see how it's any better to count up from the zeroeth offset of the first finger to the second offset to get to the middle finger.

Avshalom · on May 13, 2016

No less than Dykstra spouted a half hearted list of some properties and then made a proclamation based entirely on aesthetics.

banku_brougham · on May 13, 2016

no less than dykstra was convicted of grand theft auto. https://en.wikipedia.org/wiki/Lenny_Dykstra

I think you mean Dijkstra, who did say that, because of an argument with mathematicians about indexing from 1.

pygy_ · on May 14, 2016

In Dutch, ij and ÿ are interchangeable.

saghm · on May 14, 2016

My Dutch professor in college actually wrote out the "ij", but the tail of the "i" connected with the "j", making it appear to be "ÿ". It kind of blew my mind the first time I saw him write it out haha

pygy_ · on May 14, 2016

In cursive, it is often indistinguishable too.

hbogert · on May 14, 2016

In print, NO. You really can't interchange the two. As already said, in handwriting they may look similar, however we would never mean the actual ÿ, since it simply is something else which is used in Greek and French (and others).

There's even a section about this on wikipedia: https://en.m.wikipedia.org/wiki/IJ_(digraph) subsection "technical details".

pygy_ · on May 14, 2016

At the very least it happens in Flanders. I've seen ijs ("ice") capitalized in cursive as Ys[0] on an ice truck, for example.

Edit: from Wikipedia[1]: "It used to be common, in particular when writing in capitals, to write Y instead of IJ."

So it's an obsolete practice...

----

0. something along those lines: http://alphabetprintables.org/alphabet_printables_cursive/up...

1. https://en.wikipedia.org/wiki/IJ_%28digraph%29

garyclarke27 · on May 14, 2016

I don't buy the Dykstra argument,1 base is more intuitive for me and for most non programmers, I would guess. Funny though how Americans use 1 based indexing for building floor numbers, but Europeans use zero based i.e. ground floor is zero in a European lift, 1 in a US lift.

vitaut · on May 14, 2016

> what scenario are you imagining where startup times would be a significant concern?

Starting a REPL or running a computation that doesn't take long time, adjusting parameters, re-running. Startup time may not be too big in absolute numbers but it's noticeable and adds up quickly especially if you start using more packages rather than toy programs that I used as an example.

sgt101 · on May 14, 2016

Ok that puts some bounds on it; we are talking 1/10ths of seconds as significant? Or 1/2 seconds? There is work on this (https://github.com/JuliaLang/IJulia.jl/issues/346) and later versions of Julia 0.4 feel pretty snappy to me compared to R (which is the other tool I generally use nowadays)

bmh100 · on May 14, 2016

Sorry if I was being vague, but what situation would you re-run a short computation by executing the whole program again instead of using subroutines? Are we talking about a data scientist performing initial, exploratory analysis on a very small subset of data?

vitaut · on May 14, 2016

> Are we talking about a data scientist performing initial, exploratory analysis on a very small subset of data?

Yes, that's one example. Also when debugging one usually uses small data sets. There are plenty of cases where runtime is short.

I think the problem is that Julia is somewhat vague on how it should be used. If it stated explicitly that it is intended to be used in MATLAB-like fashion with one long-running instance that would save people from trying to use it as Python or other dynamic language.

nerdponx · on May 14, 2016

In that case, it should seem like a fix is as easy as keeping an IJulia kernel running in the background all day.

rrock · on May 14, 2016

I agree. I'd estimate I restart the REPL about three or four times a day while developing. My analysis runs are minutes or longer, so a few seconds spinning up Julia+packages just doesn't ammount to much.

eggy · on May 14, 2016

Fortran was meant to be a high-level language at the time, so chose 1-based indexing. When C and lower-level languages came on the scene they reverted it back to zero-based, since this is more appropriate at the lower hardware level. CBLAS is zero-based.

Intel's Math Kernel Library, which is a performant math library hand-tuned for Intel processors is zero-based in MKL-CBLAS.

It all depends on what you are familiar with, and staying consistent in use. I just use zero-based indexing in J and C and my numerical low-level work.

pjmlp · on May 14, 2016

It is all a matter of what the compiler does, array accesses should generate the same Assembly instructions.

I have used quite a few languages whose base index could be 0, 1 or whatever I choosed. Even enumerations.

kazinator · on May 13, 2016

BLAS is an API, more or less.

https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogra...

tavert · on May 13, 2016

And the Fortran-convention BLAS API is quite a bit more widely used than the CBLAS API. idamax in the Fortran API, and pivot vectors in LAPACK factorizations, return 1-based indices.

gh02t · on May 14, 2016

Fortran has the rather unusual feature that you can change the indexing to an arbitrary offset. You can declare arrays to be indexed from 0 if you want, but it defaults to 1 and almost nobody uses this feature for obvious reasons.

Extigy · on May 14, 2016

I index from -N:N so that 0 is in the centre of my computational boxes.

gh02t · on May 14, 2016

Yeah, something like this is the intended use, but it isn't used much at least in the Fortran I've seen. Mostly because it's really confusing for someone reading your code, as once the array is declared there's no indication what the index range is.

jimktrains2 · on May 14, 2016

BASIC did as well IIRC.

gh02t · on May 14, 2016

Pascal as well I think. I dunno which was the first to support it, but it's not exactly a popular thing in modern programming.

FreeFull · on May 14, 2016

Pascal even let you index with non-numeric types, so you could have an array indexed by the characters a to z, for example.

int_19h · on May 14, 2016

It was already there in Algol-60.

kazinator · on May 13, 2016

If distances used 1 based indexing:

    metric distance conversion chart:

    cm     m       km
     1     1.00     1.00000
     2     1.01     1.00001
     3     1.02     1.00002
    ...
    101    2.00     1.00100
    ...
 100000 1000.99     1.99999

The ratios between the values aren't fixed now; we can't go from cm to m just by scaling by 100. We must subtract, scale then add.

One based indexing falls apart if you have to index a region of storage as bits, bytes and words at the same time.

lorenzhs · on May 13, 2016

Except that's counting, not indexing. It doesn't make sense to compare the two. It's more akin to the "is the first floor the ground floor or the one above it"-debate.

kazinator · on May 13, 2016

Perhaps you mean it's displacing not indexing?

Counting is indexing!

What is it that you do when you count items in a set? You put them into correspondence with the natural numbers, indicating each one as 1, 2, 3, ... The last integer is the count.

Yes, it is related to the "is the first floor ground, or the one above it"; and it's clear that some programming languages take their cue from stairs and elevators.

Nobody ever has to calculate "what is the floor twice as high as this one?" Moreover, people are unfazed by 13 missing.

(I haven't seen a language that omits 13 from indexing, fortunately.)

lorenzhs · on May 14, 2016

Since we can probably agree that you're not indexing distances or other units of measurement (you can make a piece of string of length sprt(2) centimeters or other irrational lengths!), let's skip the terminology argument. That wasn't my point.

Is skipping the 13th floor a US thing? I don't recall ever seeing it in the UK or Germany.

happimess · on May 15, 2016

Yes, in the US, you will almost never see a 13th floor.

kazinator · on May 13, 2016

It's ironic, though, that most languages report 1-based source code line numbers in their error diagnostics, regardless of how they handle indexing. So that does say something.

And their documentation has section numbers like 1.1, 1.2, ... 5.3.3.

vitaut · on May 14, 2016

The current Julia version is 0.4.5 =).

dragonwriter · on May 13, 2016

Distance doesn't use indexing at all, because measures are not indexes, so it doesn't make sense to say "if distances used 1-based indexing".

OTOH, for something physical that actually is indexing, or at least closely analogous to it, we could ask "what if principal quantum numbers used 1-based indexing". But, then, the answer would be "things would look exactly like they do now, because it already does."

lokedhs · on May 14, 2016

But that's the point, once you think of indexes as distance things make much more sense.

The index is the distance from the first element. Thus, the third element is two elements away from the first, thus it has index 2.

I think the only reason zero-based indexing is not the standard everywhere is because some people (usually non-programmers) have a problem with the idea that the fifth element has index four.

I am suggesting that this is a linguistic problem that has messed up programming.

kazinator · on May 14, 2016

Consequently, when those people do music, they have to accept that something called an octave has seven notes, and inverting an interval is based on the "rule of nine", thanks to the way they defined intervals as one based.

kazinator · on May 13, 2016

Distances can be quantized. If you're designing something in your CAD tool, you can set a grid to, say 1cm, and never place anything on a fraction of a cm. The coordinates of points in the design are then de facto addresses.

Indices can be regarded as measures. We speak about an array having a "size" or "length": that is measurement language. Something is "3 words wide": ditto.

A given record in a file can be 25 words from the beginning, or 100 bytes, or 800 bits. All of these tell us how much storage immediately precedes that record and we can easily convert among them.

If indices support calculation, they should be displacements, and displacements should originate at zero.

Indices not intended for calculation (beyond simple successor/predecessor, perhaps) can place items into correspondence with any ordered set: natural numbers, letters of the alphabet, and so on. This is where we can get away with 1 based.

Indices not intented for any calculation whatsoever can use a set: like associating character strings with objects via an "associative" array or whatever.

rch · on May 13, 2016

I liked how the Template Numerical Toolkit implemented one-based indexing:

    // Construct matrix. Elements are not initialized.
    // To initialize all elements to zero use
    // Matrix<double> a(2, 2, 0.0).
    Matrix<double> a(2, 2);
    
    // Assign elements to first row using
    // Fortran-style one-based indexing.
    a(1,1) = 0.5; a(1,2) = 1.0;
    
    // Assign elements to second row using
    // C-style zero-based indexing.
    a[1][0] = 1.5; a[1][1] = 2.0;

-- http://www.b-a-h.com/software/cpp/scppnt.html

KenoFischer · on May 13, 2016

This syntax is available in julia as well, but I'm not sure it's a great idea to encourage mixing the two indexing behaviors, even if they have different syntax. As I hinted in the original reply, I have seen very few cases where the choice of index offset actually makes a difference. For example, loops over indices generally use `eachindex` which doesn't care about your choice of index base.

ScottPJones · on May 14, 2016

Actually, for the sort of low-level bit twiddling code that I frequently do, 1-based indexing complicates the code greatly. I also need to pass structures that contain indices back and forth to C, so the 1-based indices cause extra overhead to constantly add or subtract the size of the element. I find that it's the largest cause of bugs in my code (which is why I really want Gallium working!)

rch · on May 13, 2016

That's interesting - thanks. I don't think anyone is eager to mix indexing behaviors either.

KenoFischer · on May 13, 2016

I just realized that when I said, "this syntax is available", people might have understood it to mean, "you can use this syntax to index arrays" (which is not true for the array type defined in base). What I meant was, "you can define an array type with this indexing behavior without changing the language (since you can override the behavior of both () and [] on a particular type)".

ScottPJones · on May 14, 2016

Keno - take a look at what I did in https://github.com/ScottPJones/StringUtils.jl (building on the backs of giants, such as Tom Breloff, Dahua Lin, and others, incorporating some code from Formatting.jl). It solves problems I've had both with string literals, formatting, and outputting funny Unicode characters (without having to use UTF-8 in my source code).

I really like your Cxx.jl, that does seem to answer the issue raised here (are there any good features in pybind11 though that aren't already in Cxx.jl? that might be a good source for ideas on how to improve Cxx.jl if not)

ihnorton · on May 13, 2016

> For packages that needs arrays of indices, it would be quite easy to define an `IndexArray` type that does the translation automatically.

(just to add: there is currently an open pull request from a core array-focused developer adding such support to base)

waldir · on May 14, 2016

I believe this is the PR in question: https://github.com/JuliaLang/julia/pull/16260

vitaut · on May 14, 2016

Regarding the slowdown in development, perhaps it's a seasonal variation connected to GSoC. For example, notice a drop in postings since October in https://groups.google.com/forum/#!aboutgroup/julia-dev . This seems to be more or less consistent with the observation in http://www.davideaversa.it/2015/12/the-most-promising-langua..., but that's just a guess.

ihnorton · on May 14, 2016

For better or worse, most development discussion happens on GitHub rather than julia-dev. I've been catching up on GitHub threads recently, and the progress towards 0.5 is really impressive -- both in the language itself, and in the tooling (especially the debugger). I think the nature of priorities has changed somewhat as the language matures, but I disagree that development has slowed. (disclosure: I'm a Julia committer, but have been a bit out of the loop for some months now)

There is certainly a need for better interim communication on progress between releases, to let people know what is happening without reading every GitHub thread.

tavert · on May 14, 2016

No summer of code students have had projects that involved work primarily on the core language since 2013. Not that many posts happen on julia-dev because most of the development communication happens on github.

ScottPJones · on May 14, 2016

I do think it's seasonal, however not connected to GSoC. Many of the major contributors are (or were) students (Julia seems to be detrimental to people finishing their PhDs, except in the case of Jeff Bezanson, where in some sense he crowdsourced a lot of the research, as Julia was the topic of his thesis ;-) ) Many who aren't students are professors. Definite drop off when people are taking classes, exams, teaching, etc.

nickpsecurity · on May 13, 2016

Might be as simple as creating a static and dynamic profile for Julia. I suggested this to developers of one language project before. Maybe Julia. A single keyword or declaration near top of the file tells it to compile for fast loading with no need for functions that change things at runtime. Or compiler option.

KenoFischer · on May 13, 2016

We do have various compiler flags, but the problem on startup is the initialization of external libraries and deserialization of the system image, primarily.

bluecalm · on May 13, 2016

The thing about 1 based indexing is that it's a kind of in your face "this is different" decision from the point of view of a programmers of most popular languages. To be honest I wouldn't want to start investing my time into a language where people who proposed 1 based indexing are making design decisions. It's not that I think they are incompetent but it's clear they care way more about some different world than about my programming world and are ready to make my life miserable stating the point.

Now, I don't know if people from that different world (Fortran, Matlab, some other languages used in academia maybe) would feel the same way about 0 based indexing but it certainly sends the message to programmers outside those domains.

chappi42 · on May 13, 2016

R and Mathematica are also 1 based. And R really is popular (http://www.tiobe.com/tiobe_index?page=index - ok, popularity is droping right now, likely b/c of Julia ;-))

c3534l · on May 13, 2016

Julia seems like it's meant to be friendly to people who know MATLAB, Python, and R (even Fortran). R and Python are really comfortable with working with data science and that's what Julia is geared toward. I have no experience with MATLAB, though. Julia's programming "world" may indeed be a different programming world than C programmers.

I'm not defending Julia per se. I've been patiently observing from the sidelines to see how the language shapes up.

banku_brougham · on May 13, 2016

R is ranked lower year-on-year, but if you click into the actual rating time series you can see that R is on a tear into new highs of popularity. For that matter so is MATLAB. Julia is still buried in the next 50 list.

BTW i'm a julia proponent, I'm just starting into it but I like a lot about it.

nonbel · on May 14, 2016

>"popularity is droping right now, likely b/c of Julia"

I don't see how you concluded that? http://www.tiobe.com/tiobe_index?page=R

eternalban · on May 13, 2016

So is Lua.

bdowling · on May 14, 2016

Lua tables are more interesting than that. You can have an table that is indexed by any supported type. That means tables indexed by positive or negative integers, floating point values, references to other objects or other tables, true, false, strings, coroutines, and external memory references can all be indexes. You can also have a single table indexed by any combination of these.

eternalban · on May 14, 2016

Isn't that technically a hashtable and not an array?

bdowling · on May 14, 2016

In Lua the same data structure is used for both. The implementation optimizes the data structure like an array when you use a range of non-negative integers as indexes.

mindcrime · on May 14, 2016

To be honest I wouldn't want to start investing my time into a language where people who proposed 1 based indexing are making design decisions.

Possibly because you're used to working with languages like C, C++, Java, Pascal, Javascript, Python, etc. But in the world of languages tied closely to scientific / mathematical programming (Matlab/Octave, R, etc.) 1-based indexing is the norm. If you'd "grown up" so to speak, in that world, you'd probably find 0-based indexing distasteful.

xxs · on May 14, 2016

Around 3 decades ago when started with BASIC, Pascal/Delphi 1-based index was pretty much the norm and off-by-1 mistakes were significantly more dominant.

Switching to zero based index Java/Javascript/C felt a lot more natural (no need for upper bound, always include->exclusive when it comes to bounds). Many algorithms benefit greatly from as well (hashtables + bitwise AND, for instance)

Virtually I do not recall making off-by-one errors while working with Java. Whilst it could be attributed to personal experience, the loops construct "for (int i=0; i<length; i++)" or "for(int i=length; i-->0;)" appear quite intuitive. Strict bounds (i.e. < or >) feel more elegant as well.

Seeing, using both type of indexing, 0-based one would a winner by large margin in my book.

Avshalom · on May 14, 2016

Pascal (and Ada) actually expects you to specify the both the first and last index foo[1..10] or foo[0..9] are both valid arrays (though I believe 1 is the traditional first index) which is handy when you do run into situations where one or the other is a poor fit for an algorithm.

bluecalm · on May 14, 2016

Yes, my argument is purely about habits. 1-based indexing is a potent trap for programmers who are used to implementing stuff in 0-based indexing world.

dnautics · on May 13, 2016

You shouldn't be so biased. Having one based indexing makes translating numerical recipes from pure math textbooks (where vectors and matrices are generally one-based) simpler and less error prone. I say this as someone who chafes at Julia's one based indexing as a matter of professional practice.

bluecalm · on May 14, 2016

It's very error prone if you implemented algorithms using 0 based indexing your whole programming life. You are just bound to make mistakes if suddenly it's 1 based indexing.

I am not saying 0 based indexing is technically superior (I really have no opinion on that). I could compare it to switching positions of brake and accelerator in a car - you can talk about brake on the right design being superior all you want but the fatal crashes will happen if you implement it.

rrock · on May 14, 2016

This is exactly why all of my off by one errors happen in zero-based indexing languages. I need 1-based languages for data analysis, and 0-based languages for working on instrumentation.

dragonwriter · on May 14, 2016

Of course, zero-based indexing is prone to error if you've grown up using natural language ordinals your whole life, and get business requirements based in them both of which are stunningly common of real developers and real development tasks.

dnautics · on May 14, 2016

0-based indexing is technically superior, because it makes offsets for memory addressing simpler.

Avshalom · on May 14, 2016

Well unless your array is more than just x-many addresses in a row, like if the array carries metadata related to length (like every language but C) or if it's a sparse array (hint sparse arrays are super common in numerical work).

tcpekin · on May 14, 2016

I don't really get the difference. I'm in academia i guess (grad school) and use MATLAB a bunch, as well as Python (not a CS major). I don't think it's that much work to switch between the two, and I don't understand why there's such a division. Why is this such a huge deal?

nl · on May 14, 2016

It's not. It's nothing more than bike-shedding language design.

It's an easy thing to discuss, and everyone is an expert. Meanwhile, the real problems get ignored.

jules · on May 13, 2016

Does 1 based indexing actually matter? It does if you are doing index arithmetic for rolling your own multidimensional arrays for example, but one shouldn't do those kind of things anyway.

Fomite · on May 13, 2016

One thing to keep in mind is that scientific computing languages often have people using them who are not programmers, and who will not be "rolling their own multidimensional arrays."

If they're used to R, MATLAB, or basically any statistics programming/scripting language ever, not only will they be expecting 1-based indexing, they might not even know that that's a thing they have to think about.

developer2 · on May 14, 2016

>> they be expecting 1-based indexing, they might not even know that that's a thing they have to think about

The truth of the matter is that this is not a point that anyone should ever mention as either a pro or con of any language. It's simply a fact that you're going to see one or the other, and it's useless to complain about which one is present in any given language. It's a one-time discovery as to which you're working with. People try to make it out to be a debilitating factor, when in reality it's a non-issue.

bluecalm · on May 14, 2016

There are tons of things which you need an index for when iterating over an array. Do something different for every 4th element but then something else if it's 16th. Keep doing something for triplets of elements unless you run into specific index which is divisible by 7, then start taking them one by one. Iterate but go 5 elements back if you encounter something in some state. There are a lot of algorithms which take advantage of array being an indexed data structure with an order and not just plain container without structure.

The problem doesn't just go away with for each style loops. Python added enumerate for this reason but it really is quite clumsy for anything more complicated (and you have to choose if it's indexing from 0 or 1 anyway).

SolarNet · on May 13, 2016

> - Slowing down in development

I think part of this problem is the community, multiple people have reported having bad interactions with core language devs. Also the policies for inclusion of features, how to propose features, how decisions are made with respect to the code base, etc. all seem poorly documented.

KenoFischer · on May 13, 2016

> I think part of this problem is the community, multiple people have reported having bad interactions with core language devs.

I can only think of a couple of instance where I've heard anything like this.

The first and most prominent is Dan Luu. I'm very sad about losing him from the community. I've met him a couple of times in person, I still read his blog and I very much respect him. I don't think any of us have the full story of what happened there, but I sincerely hope that time may be able to smooth things over.

The second was in an HN comment that I just went back to find, and discovered that is was you. I'm sorry if you had a bad experience with the community. I know my perspective is biased, but I've rarely interacted with a community that's as passionate, helpful and friendly as the julia community (LLVM is up there as well).

As for documenting the process, I agree there could be some improvement there. We do have a contributors guide (CONTRIBUTING.md in the main repository). As for policies for feature inclusion, we've considered having more formal code owners for parts of the code base (right now there's mostly de facto code owners for various parts), but haven't found it necessary so far. I definitely expect some of this to be discussed at JuliaCon in June.

SolarNet · on May 14, 2016

I remember plenty of cases I got simply from observing the Julia repo for a few months, I could list them if you like... But I've been told by Julia contributors before that discussing this on HN is not an appropriate place. So we can discuss it, where I start linking to public examples on HN, or you can ignore my opinion on a problematic community and we can not discuss it.

As to the code standards, my problem isn't with the CONTRIBUTING.md; that seems fine. It's how people make long term suggestions about fixing lasting problems. As an example people who suggest refactoring large sections of Julia's expansive base library (1600 symbols? a problem that causes parts of the slow startup time and memory usage) into other libraries (which would improve batteries included, etc. features) are summarily ignored / shouted down / (predicated on my previous paragraph) banned. Important improvements to the language go in circles (often on JULEP tagged issues) especially any large improvement that would require multiple people and lots of dev work and seem to gain little traction unless a core dev just takes the time to just do it; rather than a timeline to reach consensus and then design a plan for implementing it.

As smaller examples. There was a push to increase test coverage and then months later commits plummeted the code coverage numbers and no one seemed to care. Sections of important code are uncommented, undocumented, and (mostly) untested on purpose. There are modularity problems and software architecture limitations inherent in the language design that go unaddressed. Unpredictable un-typed exceptions, namespaces/modules being hard to use for non-trivial designs, "interfaces" that aren't really. And these are just the ones without serious work behind them I can remember off the top of my head (the debugger and threading I acknowledge being worked on seriously). Though things being worked on seriously suffer problems as I mentioned earlier.

Issues like these cause many of the original poster's problems. Julia is a great language, but the community seems unable to take the steps it needs to to actually make a "Tier 1" general purpose language (e.g. like python, go, clojure etc.). Setup a serious set of policies regarding language proposals; setup a transparent community policy and actually follow it; acknowledge that certain development idiosyncrasies aren't going to fly anymore; take software architecture concerns seriously; remove the toxic core language dev (that to my count has cost you at least 4 serious contributors if not more).

Julia is likely "slowing down" relative to the expected development (which many people have observed) likely (by my thinking) because it is failing to attract/retain the developers it should be as a promising language (in my opinion) because of it's poor development and community standards.

But hey, these are just my suggestions, I don't have any skin in this game. I just wish Julia was as good as it promised to be, and I'm a bit bitter it isn't for human, rather than technological or financial, reasons.

Edit: Sorry I expanded, it felt unfair to not be more complete about it.

KenoFischer · on May 14, 2016

Also, I do thank you for taking the time to give feedback.

KenoFischer · on May 14, 2016

> I remember plenty of cases I got simply from observing the Julia repo for a few months, I could list them if you like... But I've been told by Julia contributors before that discussing this on HN is not an appropriate place. So we can discuss it, where I start linking to public examples on HN, or you can ignore my opinion and we can not discuss it.

I do think HN is not the right forum to link to individual comments and call out people for their behavior. Perfectly happy to discuss the technical issues though. If you would would still like to discuss, or at least bring such instances to my attention, please do feel free to send me an email.

> People who suggest refactoring large sections of Julia's expansive base library (1600 symbols? a problem that causes parts of the slow startup time and memory usage) into other libraries (which would improve batteries included, etc. features) are summarily ignored / shouted down / (predicated on my previous paragraph) banned.

There are several recent examples where people have suggested such things, e.g. They were neither ignored nor shouted down nor banned. https://github.com/JuliaLang/julia/issues/16357 https://github.com/JuliaLang/julia/pull/16070

> Important improvements to the language go in circles on JULEP tagged issues namely any large improvement that would require multiple people and lots of dev work seems to gain little traction unless a core dev just takes the time to do it; rather than a timeline to reach consensus and then design a plan for implementing it.

I don't understand what exactly the criticism is. At first I thought it was about, too much dicussion, but then it seemed to be about too little dev time, please do clarify.

> There was a push to increase test coverage and then a few months later commits plummeted the code coverage numbers and no one seemed to care.

I'm not really sure which instance you're referring to, but last time there was a major drop in coverage, it was investigated, found to be a bug in the instrumentation (which was fixed). I don't really think it's fair to say that people don't care. There's somewhat of a tooling problem here, since we can't run the coverage tests on travis, so we don't get them integrated in the GitHub UI, but people do look at them and add tests as appropriate.

> Sections of important code are uncommented, undocumented, and untested on purpose.

Yes, there are undocumented and uncommented sections in the code base, but I wouldn't say that they are so on purpose. I do admit to having added such hacks in the past (and not documented them because they were gonna go away soon after), because I needed them in outside packages, but people don't let me do that anymore ;).

> Unpredictable un-typed exceptions, namespaces/modules being hard to use for non-trivial designs, interfaces without any sort of type enhancement.

These are fair points, which I'm sure you've seen the issues about. The problem is the availability of developer time, not some sort of unwillingness to fix problems. Putting together the road map of what goes into each release and prioritizing are very hard, because there's just so much that could be worked on.

> setup a transparent community policy and actually follow it;

This is being discussed (as part of larger discussions around community governance) and as I mentioned will likely be a topic at JuliaCon.

ScottPJones · on May 14, 2016

I haven't disappeared (even though somebody just might have preferred that I had done so), I still love the Julia language (even though I think there are some warts, some flab that can be lost, etc.), am still programming in Julia pretty uch 24/7, have been very active in trying to help people new to Julia (on Gitter, StackOverflow, Quora, julia-users), even teaching it to my kids, promoting it on Twitter, trying to help improve the language as much as I can without being able to submit issues or PRs on GitHub, and contributing what I can of my own work under MIT license on GitHub. I'm also attending JuliaCon 2016 this summer, will just have to see what my reception there will be.

SolarNet · on May 14, 2016

> I do think HN is not the right forum to link to individual comments and call out people for their behavior.

Fair enough, my problem is that the project continually looses what appear to be serious contributors to a toxic developer (admittedly all second hand to me; but there are public examples and some private reports I have; including the person who first showed me Julia years ago). For all that it is an interesting language with plenty of problems I would enjoy fixing and taking the initiative on I simply don't have a desire to want to contribute. Again mostly due to politics. Which just sort of makes me bitter you fail to acknowledge the problems.

Like why should I take lots of time (I'm fine doing hard things like reading undocumented code bases if it's worth it (large sections of the core compiler code due to it's lack of comments or documentation are hard to read)) to help you fix technical problems, when you have social problems that are noticeable to anyone paying close enough attention to the community.

Also, thanks for taking the time to listen to my rants. I'm just disappointed.

KenoFischer · on May 14, 2016

> Which just sort of makes me bitter you fail to acknowledge the problems.

I'm perfectly happy to acknowledge that there have been conflicts in the community. However, I maintain that the overwhelming majority of interactions in the community are immensely positive. Please do also consider that there may be parts of the story that you are not seeing in the public issues. I do hope you will consider giving the community a second chance. You have quite clearly identified some of the major technical challenges we face and we'd love to hear any ideas you have to address them.

SolarNet · on May 14, 2016

> There are several recent examples where people have suggested such things

That's great but amount to a couple of functions, the larger issues addressing more serious refactorings never get off the ground (Base contains multiple thousands of symbols (not even counting how in a multiple dispatch language one symbol can contain hundreds of definitions) common lisp, by all accounts a sprawling language with multiple dispatch, has 978).

> I don't understand what exactly the criticism is. At first I thought it was about, too much dicussion, but then it seemed to be about too little dev time, please do clarify.

Both. Neither. It's about how there is unending discussion until a core dev just does it. This is not an efficient use of anyone's time. Set the scope of an issue, discuss, decide on a course of action, layout issues. This allows a core dev to get their input in place, but then not have to actually wait for them to have the time to do the whole thing themselves.

> There's somewhat of a tooling problem here, since we can't run the coverage tests on travis, so we don't get them integrated in the GitHub UI, but people do look at them and add tests as appropriate.

The number seems to fluctuate randomly from 83% in January, to 11%, 45%, 65%, 75%, 11%, 81% now. Which is a net loss and wholly sporadic. How does anyone know what to write tests for when coverage bounces all over the place? Also what good are the tests if no one takes the time to use them anyway? You have tests but they seem pretty useless with how you are using them. I understand it's a tooling issue, but a metric isn't very good if no one is bothering to use it, why waste the time to not use it.

> but I wouldn't say that they are so on purpose

"Look at the code and you'll understand." is the jist of the documentation of a 10000+ SLOC base used at the heart of Julia. Sure I could read all that, but I'd rather have a bit of documentation. Also the tests for this amount to it's examples.

> not some sort of unwillingness to fix problems

I disagree on at least 1 of them. Typed exception handling was flat out rejected because it's too much like Java by key devs; forget the fact everyone does it anyway using reflection. Interfaces and modules don't seem to make any progress because of some people complaining about similar issues.

> Putting together the road map of what goes into each release and prioritizing are very hard, because there's just so much that could be worked on.

Then stop wasting dev time and work on the community policies that make it easier to retain devs.

sgt101 · on May 14, 2016

I can only speak for myself, I attended an evening "getting started with Julia" event at MIT a few years ago, the devs there were very tolerant of me not being able to do some of the exercises and sat with me and helped me understand. Then they gave me Pizza. I've moaned sometimes about some of the decisions that they've made and they have always responded to me and showed me that they have a good reason for doing what they are doing - even if I still think that might be wrong.

chrispeel · on May 14, 2016

I have never had a bad interaction with the core language devs. OTOH, they are passionate about the language; I've found all the core developers I've met very helpful.

SolarNet · on May 14, 2016

I'm referring to one specific core dev. However they all know each other from MIT so it makes it difficult for them to deal with that fact. I can point to 3 public examples, in addition to a couple of private reports about problems which allow me to be certain. But admittedly it's all second hand to me (from people who work close to the language).

davidacoder · on May 14, 2016

I've been using julia for a little more than two years now. I've been subscribed to all the mailing lists and regularly read the github issues. Not all of them, but a fair share. I've never, ever seen any behavior by any of the core devs that one could even remotely describe as toxic, rude or anything like that. The community is actually really helpful and supportive.

I have seen this point about a toxic dev been made before in some blog post. Back then I followed the story up because it seemed so at odds with my experience with the core devs. It took a bit of googling and following links, but in the end, in my mind, there was simply nothing to the whole story. The supposedly rude behavior was not at all rude, imho.

I don't really know where these allegations come from, but I find this kind of "I've heard second hand that there is a toxic dev" inappropriate. If someone has a problem with someone, make it explicit, post the email that you dislike, so that others can judge themselves. But these vague accusations are not helpful, and at least from my point of view entirely at odds with how I have perceived the behavior of the core devs over the last two years.

I should say that I'm not part of the MIT crowd. I've never met or talked with any of the core devs and don't know them beyond reading their emails on mailing lists and sporadic interactions on github.

ScottPJones · on May 14, 2016

The rude behavior definitely exists, I can point out a number of examples if you send me your e-mail address (you can look me up, @ScottPJones on GitHub, easily enough). I've seen that that sort of disrespectful behavior tends to spread in the community, unfortunately, when nobody dares call out one person for their comments and actions, because of their position.

chappi42 · on May 14, 2016

But then - as a bystander - let me mention that I once wondered about your insistent replies in some gh issue. God, can't he stop, I thought, and, wow, they have patience. Fortunately such a situation is rare and I'm very often impressed by the intelligent friendly conversations I see. A lot to learn.

ScottPJones · on May 14, 2016

Yes, a year ago I'd never worked on any open source project, didn't use GitHub, Google Groups, Twitter, StackOverflow, or any of the other programmer social media sites, and was a total newbie about Julia, using git, and dealing with an open source community. Since I had convinced the other people at the startup where I am working that using Julia would be better for us than a mashup of Python, C, C++14, R or Octave, when I started encountering serious bugs and performance problems in areas that were critical to our development, I did get very insistent, esp. where I had lots of previous experience in the area. Instead of simply complaining about the problems, I submitted issues, wrote documentation, made PRs to fix the issues (in the Scheme parser code, in the core C code, in the C Unicode handling module, esp. to the string handling code in Base, and helped improve the coverage by adding a lot of unit tests). It was a major learning process for me, learning best practices for how to break apart a PR into smaller chunks (even though accepting huge PRs from other developers happens all the time), trying to focus on single issues in a PR, and trying to improve my own communications skills on social media. Between the time I started until early September when v0.4 was released, I'd even reached the point of being the #6 contributor to JuliaLang/julia (#26 all time out of some >400 contributors), so I don't think anybody could possibly accuse me of not having putting my time and effort where my mouth was, as far as fixing problems I encountered. Having gotten the critical issues fixed and into v0.4, so we had a reasonably stable base for our product, I was able to concentrate wholly on our product, and so stopped being so pushy about getting things fixed. ;) I still think there are some real systemic problems that need to be addressed, in how things are done, and hopefully some time will be spent to deal with that at this summer's JuliaCon 2016.

chappi42 · on May 15, 2016

Thank you for this reply, I also wish you luck with your project, joy at the JuliaCon and I'm glad that 0.4 worked :-)

Maybe a source of discontent is that startups are using Julia 'for real' and are under pressure to deliver. On the other side Julia is still developing. Core devs work feverishly but some issues just need 'time to brew'. Quiet thinking/coding alone or in small circles.

IIRC on a video of Alan Edelman he ~ said that he was involved in HPC for 30 years and to this day they weren't that successful and don't know how to do it. Julia is trying to do it in a completely different way. - Considering 30 years, does it matter if Julia takes 'her' time to be crowned?

ced · on May 14, 2016

You know, I read some of the harsh comments you got from people, and they may have been right to say these things, I don't know. But kudos for taking it in stride and maintaining a positive attitude. Good luck in your projects.

ScottPJones · on May 15, 2016

Thanks. I've never had any problems with constructive criticism, no matter how harsh it might seem (often that is where you learn the most). The problem is when people make disrespectful, disparaging, insulting remarks (which is against both the NumFocus & Julia community standards), and nothing is done about it, which has ended up driving away several people already. Hopefully that can be addressed.

davidacoder · on May 15, 2016

I followed your interactions with the core devs, at least the public ones. I think the core devs handled the situation with an enormous amount of patience and tact.

sgt101 · on May 14, 2016

Perhaps you might consider adding a note to your first post making it clear that you are identifying a problem with one other person rather than the whole community and that this is all second hand rather than your own experience?

SolarNet · on May 14, 2016

You realize edit timers are a thing right?

xixi77 · on May 13, 2016

Libraries? Sure. Ease of development, I cannot comment on.

But measuring performance with timing a "hello world" program? Seriously? What scenario does the author have in mind that makes this particular benchmark even remotely relevant?

The rest of the rant pretty much comes down to "it doesn't look like Python" (which is IMO a good thing, and I would certainly not call Python a "de facto standard of numerical computing" -- sure, it's there, but I still see a lot more of R and Matlab -- and note how both have 1-based indexes.)

To be fair, last time I checked, Julia definitely had some catching up to do in a few areas to become a real competitor to those two, but "hello world" benchmarks would not be among these.

It's been a little while though, and I am tempted to check again -- leaving libraries alone for the moment, does vectorization still result in a lot of performance loss compared to loops?

whyever · on May 13, 2016

> But measuring performance with timing a "hello world" program? Seriously? What scenario does the author have in mind that makes this particular benchmark even remotely relevant?

If Julia is to replace Python in scientific computing, people will want to use it for short plotting scripts. Startup time matters there. That hello world is so slow is already telling. A plotting script needs tens of seconds just to load the Julia libraries.

patrickthebold · on May 14, 2016

They use the REPL like R.

argonaut · on May 14, 2016

Not all of them. Many of them? Sure. Many people also use scripts and so on.

ced · on May 13, 2016

does vectorization still result in a lot of performance loss compared to loops?

Yes, but it should be fixed in the nearish future. https://github.com/JuliaLang/julia/issues/16285

scriptproof · on May 14, 2016

I had also the feeling the "benchmark" is a joke. Even if it takes some seconds to print hello, speed is only important when you run big programs with million of operations.

pizlonator · on May 14, 2016

If it takes a system 300ms to compile a function that prints "hello" then imagine how long it would take to compile a function that does something more sophisticated.

The "hello" benchmark is a fantastic benchmark for production-strength JIT-based runtimes because it tells you how long it takes for your system to warm up enough to be able to print something to the screen. You don't want this to be long. I prefer for "hello" to take about 10ms or less. In WebKit we have a few benchmarks that are the moral equivalent of "hello" and these are some of our most important benchmarks.

The reason why warm-up time is so insanely important is that any programming language, no matter what the domain is, will be used for the following things eventually:

- Tests.

- Data definitions in your language (moral equivalent of JSONP - every language/runtime has to deal with something like this).

- Initialization.

All of these things comprise run-once code that tends to be large. All of these things are important, even in math languages. The glorious "hello" benchmark is an excellent proxy for the performance of these kinds of code. So, I think that if Julia requires 300ms to run "hello" then they should make the "hello" benchmark into their primary optimization target.

Fortunately, it's easy to make "hello" run fast: just tier your JIT. This can be done with LLVM just fine, see https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/

Although we have since moved to using B3, LLVM was absolutely amazing for us, for the same reasons why it's amazing for Julia: you get tons of optimizations for a lot of targets with little effort. But you have to harness this power the right way. You can't fire up the full LLVM pipeline for code that runs once! It's a waste to unleash so many optimizations only to immediately throw away the result.

papaf · on May 13, 2016

I know R and have used Octave. I started learning Julia this morning after a physicist recommended it to me after he switched from python. I used Jupyter/Julia to simulate a neuron as a practice exercise. This is my experience as a beginner:

1. The static typing makes a big and positive difference. Its nice having a statically typed repl.

2. The documentation is good.

3. Using unicode symbols and \mu style tab completion is nice, especially in Jupyter where you can use the same symbols in latex style equations.

4. The base install is a bit bare. It would be nice if batteries were included - distributions and dataframes in particular.

5. R uses 1 based indexing and it was no shock to see this in Julia.

6. I had no problems with the mix of lisp and C++ in the source code. The lisp implementation is beautiful and worth a read.

Generally, I was shocked to see a blog post like this given that my first day with Julia was so positive.

SolarNet · on May 13, 2016

> The base install is a bit bare.

I think the problem here is the library approach. They should break stuff out of Julia's core library and move them into default included libraries (like python does).

pkm · on May 26, 2016

Yes they should, and that is exactly what is happening in Base, and has been happening for a while. A project like Julia doesn't have too much dev time input, so you cannot expect this to happen over night.

argonaut · on May 14, 2016

What is the difference?

ta2507823 · on May 14, 2016

* initial memory footprint is lower

* faster start up time

* cleaner global scope

acidflask · on May 13, 2016

I'm genuinely surprised that one would say that Julia is "slowing down in development". Perhaps it's because less press is being generated about Julia? Or that the commit rate has gone down slightly, now that the easier issues have been picked off and the remaining work will take longer for the next round of incremental developments? I'm not sure what the OP meant, but from the inside, we are busier than ever.

- Both Julia Computing and the Julia Lab have grown sizably over the past two years. The Lab now houses ten full-time researchers (up from four last year), with five new students coming online over the summer and fall. We also maintain more active research collaborations with more research groups at MIT and off-campus.

- Julia is a grateful recipient of 12 Google Summer of Code slots this year, compared to 8 for 2015's Julia Summer of Code program (sponsored by the Moore Foundation) and 4 for GSoC 2014.

- JuliaCon grew from 72 attendees in 2014 to 225 in 2015 and we are on track to meet or exceed last year's ticket sales for 2016.

- New packages continue to be registered on the central METADATA repository at roughly the same rate since June 2014. http://pkg.julialang.org/pulse.html

By some measures we are still a relatively small project, but I don't see any serious evidence for the imminent heat death of the Julia universe.

zintinio5 · on May 13, 2016

For many users of Julia, long-running performance matters more than microbenchmarks. Having converted a naively written Python program to Julia (there was a huge amount of computation being done over a large search space), I experienced a massive speedup even against PyPy. My Python scripts ran for about 10 hours before I called it quits (.6% of the work had been completed). Converting to Julia allowed me to finish within 3-4 hours, AND it was easy to parallelize.

arcticfox · on May 13, 2016

Just how naively was the Python written? ~1600 hours vs. 4 hours of execution time sounds like some extremely naive starting code. Is it fair to even compare them?

zintinio5 · on May 14, 2016

The Julia version was a literal translation of the Python version, the only difference being that I changed the entry point in the Julia version to allow restarting searches in different ranges. That made parallelization very easy, which is where I got most of time saved. I distributed the work across 3 machines. However, even without parallelization, the Julia version progressed much further than either CPython or PyPy. I also attempted a C version before Julia, but I ran into some issues, which Julia handled without me having to write extra code (int128 support out of the box, and various functions for prime numbers). The int128 support was the main reason I didn't write it in C, as I came across a lot of contradictory information online for __int128 in Clang or GCC.

GFK_of_xmaspast · on May 13, 2016

I've gotten 400x speedups going from python to c++.

cowsandmilk · on May 14, 2016

I've gotten 400(+)x speedups by going from python to python.

Original program was 50min in python, 1 min pypy

After figuring out some inefficiencies that were O(Nsquared), it was 3 seconds in python

This was timing for a case where N was much larger than my normal cases, where run time had been a couple seconds with the original version and optimization didn't really matter.

dagw · on May 13, 2016

I've gotten 200x speedup going (badly written) Javascript to (better written) pure python, despite python being a nominally slower language according micro-benchmarks. Comparing run time without knowing anything about the code doesn't say much.

MikeHolman · on May 13, 2016

Were you using a js runtime without a JIT? Were your python algorithms better? Otherwise a 200x speedup sounds completely unbelievable. That would basically indicate a bug in the js runtime causing degenerate performance under your scenario.

dagw · on May 14, 2016

I replaced some inefficient code with some better O(n log n) code. Make n large and you'll get to 200x. Which was kind of my point, without making sure you've held every else the same, saying I rewrote X in Y and got Z improvement doesn't always say that much.

lqdc13 · on May 14, 2016

That's very believable.

Example: popping from the beginning of a list in Python is O(N). Popping from the end is O(1). Initial poorly written code can have a lot of such obvious optimizations.

sgt101 · on May 14, 2016

It's a common programming thing - you write your code, it compiles, it runs, it runs properly on your test case, you try it on your full data, it's too slow. You run your profiler and you spot some problems, you solve them, it's still too slow, repeat until fast enough, declare victory.

IndianAstronaut · on May 14, 2016

>AND it was easy to parallelize

This is what is killing me when I use Python. So hard to run things in parallel.

I have had luck with Go trying to do this and set up a concurrent application.

xyience · on May 13, 2016

PyPy doesn't get you that much -- were you not using Numpy to begin with, and if not, why not?

zintinio5 · on May 14, 2016

Correct, I was not using Numpy. I did not think of a good way to fit my program into a Numpy shaped solution. The main point is that I didn't NEED to try, as an almost literal translation to Julia worked much better. I also did not try using Numba, which I suspect would have given me similar results to Julia. I am a fan of Python overall, but sometimes you just need to choose a tool that gets the job done with the least mental effort expended on the tool. This was a one-time script that needed to be written, no maintenance was needed: I just needed an answer, and within a certain amount of time.

bmh100 · on May 13, 2016

What kind of computation were you running?

tmalsburg2 · on May 13, 2016

Not sure why this is on the front page because the criticism in this article is fairly superficial. Julia was designed for scientific computing and interpreter start-up time doesn't matter at all in this context, especifically the start-up time of hello-world. Baseline memory consumption isn't an issue either and one-based indexing is simply something to get used to. If this stops you from being productive, it's not the language's fault. The comments about the syntax remind me of all those people saying Lisp is a bad language because it has too many parentheses.

armamut · on May 13, 2016

I love python and I don't like Julia at all.

But, I think judging a language (which claims math and scientific computing is it's strongest point) by print screen performance is not fair.

And, the authors last example is a little bit misleading I think. The C code sets up registers and jumps to the main sprintf routine. I don't know why didn't he tell that routine's instructions count...

Has any one counted?

armamut · on May 13, 2016

Oh. I think metrognome had shown the point. Didn't read it. sorry.

vitaut · on May 14, 2016

That routine is part of the C runtime which is used in Julia too, so it doesn't cause any difference.

sgt101 · on May 13, 2016

I'm not bothered by "hello world" performance myself, and I my recent issues with Julia have been caused by rapid development meaning that when I had to put it down for a few months lots of things I had done (because I'm mortal) stopped working. I wrote this off to "it's 0.x, getoverit". I've never tried complex text formatting in Julia either! My concerns are more focused on the type system (this I love) and performance for massive computation (I've still not managed to persuade my Hadoop admin to put julia images across our cluster, but I suppose I might win the argument one day!)

dnautics · on May 13, 2016

Yeah the poor performance in this blog post is a total misunderstanding of why and how julia is performant. Is printing "hello world" fast really that important? OK. Then don't use Julia.

You pay for it by having the compiler JIT the code in a highly optimized fashion. If you have actual numerical calculations that are compile-once, run-many-many-many-times, then you will see a huge performance benefit, amortizing the cost of expensive compilation and optimization that happens once at the beginning of the program cycle.

The very title of what he links to "How To Make Python run as fast as Julia" betrays the problem. The goal of Julia is to not have to do that sort of boilerplate/arcane tweaking to get really good performance - the system will do it out of the box.

I'll have to disagree with the notion that Julia is hard to read. I'm currently deploying Julia to run automated hardware verification on a computer chip. Effectively, I've written a DSL using Julia macros that generates assembly code files, compiles, and executes it, and my coworkers (who do not use julia) have found it easy to read my code and understand what's going on. Far easier, in any case, than the equivalent C code using asm blocks.

I do agree about the one-based indexing. I get it, it's what matlab does. But it would be nice to say, be able to throw an option at the top of a program that forces the appropriate indexing.

Xcelerate · on May 13, 2016

> If you have actual numerical calculations that are compile-once, run-many-many-many-times, then you will see a huge performance benefit

I agree, and I haven't found anything like this in other (non-exotic) languages. I recently wrote a function using the @generated macro to produce Wigner-D matrices via the recursion relations. The function dispatches on the size of the matrix (using Type{Val{N}}) and after it compiles once for a particular value of N, all future calls are blazing fast (since the machine code is essentially just a long list of multiply and add instructions).

acomjean · on May 13, 2016

>The goal of Julia is to not have to do that sort of boilerplate/arcane tweaking to get really good performance

As someone who is new to python (for bioinformatics), and find python is a fine language... but..

The do it "this way not that way" method of implementation of the same algorithms to get it to run fast makes writing performant python a tedious exercise in research and profiling. The article cited suggests Cpython, numby and numpy [1] as ways to make it faster.

Why not C using GPU acceleration as the time spent coding would probably be the same? Thats what I love about plain python, its fast to write and has some good data structures.

I haven't tried Julia, but someday its on my list of languages to learn more about.

[1]https://www.ibm.com/developerworks/community/blogs/jfp/entry...

dagw · on May 13, 2016

Why not C using GPU acceleration as the time spent coding would probably be the same?

As someone who does that sort of thing I can assure you it isn't. And when I do use C and GPU acceleration, doing so via cython and pyCUDA (and the myriad of libraries that build on cython and pyCUDA) saves massive amounts of time and effort.

That being said I do agree that writing fast python is quite different from writing python, probably more so than in most other languages.

pizlonator · on May 14, 2016

But this is a false choice. Mature JIT-based systems give you the best of both worlds by starting in an interpreter and then switching to a JIT on a per-function basis as code gets hot.

vitaut · on May 13, 2016

I wouldn't be bothered about it either if the language, like Java, targeted more the development of long-running services rather than interactive applications (with some exceptions like mobile). But as far as I can see it is advertised for use in interactive applications, possibly as a Python alternative, where responsiveness is important.

sgt101 · on May 13, 2016

I haven't got the same perception - I think it's aimed at things like simulator and solvers. Those things tend to have front ends (that is true) but not front ends that support 100's of users.

metrognome · on May 13, 2016

When the author compares the number of CPU instructions that sprintf compiles to in both C and Julia, he fails to take into account dynamic linking in C:

  jmp	__sprintf_chk

I would guess that another few hundred instructions run as a result of this jmp. Thus, the difference in the number of instructions that C's sprintf and Julia's @sprintf compile to are not as drastic as the author makes it seem.

KenoFischer · on May 13, 2016

I believe that author's point was the size of the resulting generated code, which is fair. The original intent of making it a macro was to get a faster version of printf. However, as it turns out the major time sink in printf is not the formatting, but converting binary into decimal, so specializing on the format string doesn't actually help.

mbauman · on May 13, 2016

Also note that the author asks for the native code for arguments of type `(AbstractString, Float64)`. The first is an abstract type — which means that this code will never get called in the first place. Julia will resolve the type of the string, and then dispatch to the concrete implementation. Which, for an ASCIIString, is 3x shorter.

vitaut · on May 13, 2016

This is actually a good point, but what about Unicode? Will `(UTF8String, Float64)` emit another function?

mbauman · on May 13, 2016

It's the same on 0.4, and Strings are getting a big overhaul on 0.5. Here are the results for 0.4:

    $ julia -e 'f(a, b) = @sprintf("this is a %s %15.1f", a, b); code_native(f, (AbstractString, Float64))' | wc -l
    WARNING: Returned code may not match what actually runs.
         628

    $ julia -e 'f(a, b) = @sprintf("this is a %s %15.1f", a, b); code_native(f, (ASCIIString, Float64))' | wc -l
         194

    $ julia -e 'f(a, b) = @sprintf("this is a %s %15.1f", a, b); code_native(f, (UTF8String, Float64))' | wc -l
         194

vitaut · on May 14, 2016

Thanks, this looks better. I'll need to update the post.

vitaut · on May 13, 2016

__sprintf_chk is shared between all calls to sprintf which is not true for Julia.

gravypod · on May 13, 2016

To me the biggest thing that Julia brings to the table is the amazing concurrency support.

You can not only easily create parallel tasks on your machine, but on any machine that you have ssh access to that also has Julia installed.

That is simply amazing. Until something else can do that, Julia is going no where but up in my mind.

Tarrosion · on May 14, 2016

At this point there are many well phrased comments saying most of what I wanted to say - unreliable airport wifi ate a long response a few hours ago - so let me just note that

A) the author is entitled to all their opinions and experiences and

B) raises some good points (see Keno's comment) but

C) raises some subjective (syntax, etc) and odd (measuring performance by timing a one line hello world) points.

Julia is not the right hammer for all nails, but I use it every day and enjoy doing so. If you're on the fence, I encourage you to give it a try. And for what it's worth, end as a keyword aside, I really like the syntax, particularly :: type decorators. Your mileage may vary.

DannyBee · on May 13, 2016

Complaining about microbenchmark performance, then perf benchmarking in a completely unscientific way, is pretty bad (These are non-quiesced machines, etc)

The error bars show it's probably slower, but i'm pretty sure he's not going to get valid measurements to 0.002s by running it once with time without doing things like disabling CPU throttling, etc :)

Of course, looking at julia's microbenchmark for C, they had to do a bunch of things to get the compilers to stop optimizing away their benchmarks, so that should tell you something right there :)

The example they give of sprintf calls is completely misleading, since sprintf_chk is going to be several hundred instructions itself.

Follow https://github.com/lattera/glibc/blob/master/debug/sprintf_c... all the way down the rabbit hole :)

Fede_V · on May 14, 2016

Julia aside, Keno, kudos for being an absolute class act and replying to everything so politely.

For what it's worth, my impression of Julia has been overwhelmingly positive, and all the developers I've interacted with have been polite and friendly. I haven't made the switch from Python because:

- I prefer the Python syntax

- I like Python libraries (I know about PyCall, and it rocks)

- The increased speed of Julia doesn't really add much given numba/theano and so forth

However, I really like:

- Optional static typing for sanity checking

- Can write fast functions directly in julia - which is handy when passing callbacks or doing numerical routines like integration (although this requires timholy's fast lambda package)

tavert · on May 14, 2016

> this requires timholy's fast lambda package

Not on nightly. That's fixed now. The technical concerns here about startup time etc are fixable, we'll get to them.

Frompo · on May 13, 2016

If you want to replace Matlab you should use Octave, julia is for making your next climate model, not for "hello world" ricing

sndean · on May 13, 2016

I've found the Armadillo C++ library (http://arma.sourceforge.net/) to be a better replacement, when concerned about speed.

sgt101 · on May 14, 2016

But speed has multiple dimensions. Yes, you can get things to go faster, quicker with three tactics : get closer to the iron (C, C++ - sometimes), use bigger iron (clusters) or (best) better programming. But the point of Julia is that at the end of the day the second and third tactic are easier for you, and not just you, everybody in the team and beyond.

On the other hand everyone knows that Julia is 0.x which means that while this is the IDEA of Julia it's still not claimed as the reality - although my experience (and some other people) is that it's pretty true most of the time.

cozzyd · on May 14, 2016

IME eigen3 is even better (although the template-induced hellish compilation times are a huge pain).

sndean · on May 14, 2016

I was going to say "but there's RcppArmadillo!" But now I see there's RcppEigen, too. I'll have to check it out.

gaius · on May 13, 2016

Octave is much slower than MATLAB.

jordigh · on May 13, 2016

The interpreter is. But if you write vectorised code, we frequently are able to outperform Matlab.

Matlab used to have the same problem, and they even used to have docs on how to vectorise. When I first learned Matlab in 2004, they told me, "don't use loops". Twelve years later, you're supposed to write all the deeply-nested loops you want.

So yes, most Matlab code out there is loopy, so it's slow in Octave. But if you write it for Octave and don't make it loopy, it's quite good.

aurelian15 · on May 13, 2016

True, but if performance is your key concern, you shouldn't use either of them. Octave and MATLAB are great for prototyping matrix based numerical algorithms. And for this purpose I find Octave much more pleasant, since it is less nitpicky regarding its syntax. And once I have something that works and want to apply it to huge datasets, I usually rewrite my code in C++.

IshKebab · on May 13, 2016

That's exactly why Julia was created - it's supposed to be like Matlab, but fast.

bluenose69 · on May 13, 2016

Well, actually, Fortran is for making your next climate model (see e.g. the MITgcm at http://mitgcm.org/public/source_code.html).

yoodenvranx · on May 13, 2016

As someone who does quite a lot of image and signal processing the fact that their arrays are 1-based is a complete deal breaker. The first few years of my career I worked with Matlab and I hated 1-based arrays with a passion. I think I never encountered a situation where the Matlab way makes stuff easier, almost always the 0-based index is the more natural choice.

After I switched to Python/C I can say that I never want to work with 1-index languages again.

pkm · on May 26, 2016

I've always had a very hard time understanding how someone can do something requiring a high level of abstraction (programming), but cannot subtract one. It's simply amazing.

I've used several languages in both camps, and you know what? When I'm in Matlab I start my indeces at 1, and in C i start at 0. Who would have thought it would be that simple!

mathgenius · on May 14, 2016

I just cannot comprehend how anyone can use 1-based indexing. I have never used it myself though so there's always the possibility that I may just be missing out on something. Glad that you cleared this up.

Sean1708 · on May 14, 2016

I just can't comprehend how something as trivial as indexing can be a deal-breaker for someone. I write 0-based most of the time, but I can think of as many advantages for 1-based indexing as I can for 0-based. The only time I can understand prefering 0-based is in C or something of a similar level, where indexing is just syntactic sugar for pointer arithmetic.

idunning · on May 14, 2016

Because index sets in mathematics are typically indexed beginning at 1? The most notable, and relevant, being indexing in matrices

daxfohl · on May 13, 2016

Seems like for the last five years every language has been gaining popularity. Now they're all losing popularity. Except rust, perhaps, and elm. What gives?

hondaz54 · on May 13, 2016

Obviously static/strong typing is winning. Last man standing are JavaScript and Python, the former transforms into a compilation target (like Elm), the latter doing some kind of gradual typing (like mypy).

I think dynamic typing has its place, but more in experimental design and prototyping than in bigger application development (big IMHO).

xyience · on May 13, 2016

Uh, I think you're forgetting PHP. Not to mention Clojure is at least as popular as Go. Dynamic typing isn't going anywhere, desire for 'gradual typing' is a very niche but vocal desire in the dynamic language communities. It may lose some mindshare but there are a lot of options to lose that to these days since even the holdouts of 'annoying' static typing with terrible type systems like C++ or Java are making it more and more feasible to ignore/not specify the types in your code.

hondaz54 · on May 14, 2016

Ha, you are right, I forgot PHP. However I think PHP is really a language where people agree that it has lots of weird corner cases.

So its popularity is really historical incident, in particular I think, that it was the first hack (see worse is better) to allow to easily deploy dynamic web applications (via web-server side interpretation of code, which got popular in Apache). Furthermore PHP hosting is still very popular in low-cost bundles to integrate e.g. Wordpress.

Clojure is another nice language, but could benefit from typing. :p Maybe the main "problem" for dynamic languages ist the fact that it requires more discipline on the Programmer's side, so I imagine it is popular for smaller teams, but this is hard to achieve for bigger development efforts, where you can benefit from a stronger type system.

zmmmmm · on May 14, 2016

> Obviously static/strong typing is winning. ... > I think dynamic typing has its place

This is why I wish Groovy with it's combined static / dynamic typing abilities was a) better and b) more popular. Its ability to interweave static and dynamic typed code is really spectacular when it works. Unfortunately there are a lot of holes and it can still be quite painful when in static mode, so I mainly only use it for performance rather than as I would like to - as the default mode.

vorg · on May 14, 2016

Groovy originally never had static typing, but was only a dynamic typed complement to Java's static typing. It worked best for this purpose (e.g. testing and manipulating Java classes, scripting in Grails, a DSL for build scripts in Gradle) but fell flat when version 2 retrofitted it with static typing and promoted it as a replacement for Java, on the JVM and Android. Best to use Groovy with a language built from the ground up to be statically typed, like Java, Scala, or Kotlin.

Having said that, I've actually since found Clojure to be better than Groovy at testing Java classes. A well-placed macro can often cut out syntactic clutter when testing some repetitive scenario.

unlinker · on May 13, 2016

Why is strong/dynamic typing considered such a big deal?

IshKebab · on May 13, 2016

Two main reasons:

1. You can detect very common errors (e.g. typos) at compile-time instead of maybe detecting them at run-time. This makes the code much much more reliable (or equivalently you don't need to do nearly as much testing).

2. Dynamic typing prevents IDEs from doing extremely useful things like real code completion and symbol renaming.

If you're thinking "but I edit Javascript with code completion" or "code completion isn't such a big deal" then it's probably because you've never used accurate code completion, e.g. Microsoft's Intellisense for C++, or pretty much an Java IDE.

lispm · on May 14, 2016

There are ways to deal with that in dynamically typed languages.

1) Common Lisp implementation use a compiler to detect typos, etc.

    CL-USER 21 > (defun bar () (fo0))
    BAR

    CL-USER 22 > (compile *)

    The following function is undefined:
    FO0 which is referenced by BAR

2) In Common Lisp one can ask the running Lisp system for information about classes, symbols, functions, etc.

The use cases for renaming are also completely different. If you take for example a Java class and you want to rename an attribute and update the getter/setters you might want to use a 'tool'. In a dynamically typed language like Common Lisp, this is often not necessary because code generation is widely used and changes can be propagate that way.

eutectic · on May 13, 2016

0. Performance. 3. Interface documentation.

sgt101 · on May 14, 2016

Also static types tell the machine what is supposed to happen next, and always. Dynamic types represent possibilities that have to be maintained as open until they become commitments that have to be remembered. Static types are certainties that come as orders. Dynamic typing is like being in love, Static typing is like being in the Army.

(disclaimer, I have never been in the Army or any military force and the above post is based on my imagination and watching films)

azdle · on May 13, 2016

The way that I see it is that it's a factor that is very much in the front and center of how a developer usually uses the language and their preference tends to fall out of the mentality in how they're writing their code. A static-strongly typed language will pretty much always take more time to write, you have to be a bit more methodical in what you're writing because sometimes changing one thing means changing types in function signatures and variables in a few dozen different places in your code. On the other hand the dynamic-weakly typed languages you can definitely iterate faster, but that can mean that large projects can end up being less maintainable in the future because you don't usually have a type checker to tell you you're passing an array to a function that is expecting a map/object and you end up needing more unit tests and the like to know that your code isn't going to crash in production.

Neither approach is wrong, but most developers have (sometimes very strong) opinions on which is the right way to do it in various different cases.

yoplait_ · on May 14, 2016

> that can mean that large projects can end up being less maintainable in the future because you don't usually have a type checker

That's not true if you seriously unit-test your code base, which you should in both cases.

kstrauser · on May 14, 2016

And why do people conflate them so? Python is strongly typed by any reasonable definition, but it's also dynamically typed. To contrast, I've heard C described as weakly, statically typed.

Strong typing is awesome. I personally think that dynamic typing is wonderful, too, but opinions vary about that.

daxfohl · on May 13, 2016

It's totally not. You totally never see anyone talking about it, anywhere.

jerf · on May 13, 2016

You know, you take a potshot at Go in there, but I observe that none of the negative bullet points apply to Go. My real point here not being "Go rocks", but that languages are more than just the collection of bullet-point features they claim on their home page. I think this is another one of those things that says obvious when I say it directly, but a lot of people are not letting it inform their actions.

Also I love that first alphabet image on that page.

Etzos · on May 13, 2016

Was it really a potshot at Go? Maybe I read it incorrectly but what I thought was being said was that all the "modern" features that Julia/Python/other newer languages have are not available in Go (think stuff like list comprehensions). I don't think that's a potshot as it's not really a negative, just a difference in design. Go doesn't have those features and many times does not want those features, and there is nothing bad about that and nothing bad about stating such.

vitaut · on May 13, 2016

Go has a set of completely different issues but that's a topic for another post =).

doug1001 · on May 13, 2016

- One-based indexing

setting aside whether zero- or one-based indexing is 'better'; R and MATLAB have one-based indexing, so the convention is likely familiar to many in the Julia target audience.

still, as the OP says, (relatively) painless interoperability with C and C++ is an advertised Julia feature, both of which use zero-based index (although Fortran is one-based), and that mis-match is a definite obstacle to interoperability.

tavert · on May 13, 2016

I find I don't pass individual array indices back and forth between C and Julia as often as I would in Python or Matlab, since I don't need as much compiled interface glue. You can do iteration and the like entirely on the Julia side or entirely on the C side and mostly pass the array buffers back and forth. ccall handles that for you if you declare a Ptr{Float64} argument type that the C library expects and send it a Julia Array{Float64} input. It's when you have arrays of indices, like the linear programming library his example snippet is referring to, that you notice. Or serializing text-based representations.

ScottPJones · on May 14, 2016

It does look like support for arbitrarily based arrays (like Fortran 90) will be available in Julia v0.5 (thanks to Tim Holy's great work). Julia is very flexible, and the next thing is to allow row-major instead of just column-major arrays (something that I've heard is also being worked on, although I don't know if it will be ready for v0.5)

rmah · on May 13, 2016

Regarding performance, I can only say that julia's primary use case doesn't really seem to be for small scripts that take a few msec to run.

The last set of programs I wrote in julia had run times of hours to days. A few extra seconds of startup time is well worth the 10x or better improvement over using python for the same tasks. The code was also tighter and easier to understand. A win all around.

Gratsby · on May 13, 2016

I think it's a little naive to "give up" on a language early on, but on the same note, I've walked away from several for periods of time because they lacked the maturity I needed.

I did enjoy the article and the linked article that spelled out ways to increase performance with Python. With any environment there are dramatic performance improvements to be had with a little bit of engineering and knowledge.

I've seen a bit of an odd shift towards Julia - People seem to be adopting it in droves from my perspective. That means that the development team is doing something very right. Given some of the people I've heard talking about Julia, I don't think it's going away any time soon.

This kind of feedback is good for the team. If you are going in another direction for the time being, stating why is always helpful. Glad to see a developer here in this thread.

marmaduke · on May 13, 2016

Numba package for Python gives you the LLVM JIT for numerical work. I really don't see how Julia is relevant anymore.

ska · on May 13, 2016

Python really doesn't (and cannot) address many of the design goals of Julia. At least as I understand them (and I'm not a Julia user). Whether or not Julia has/will achieve them either is a separate issue.

Python can be a very useful mess for this sort of work (numerical analysis etc.), and is succeeding at that quite well. In fact, that's it's main challenge to something like Julia. Not design, that ship sailed a long time ago. But practicality and availability of packages and bindings. Once you get too far ahead in that, it's hard to justify using any other platform for "real work", rather than because it's fun to hack on.

marmaduke · on May 16, 2016

Python is a mess if you make it that way.

What are these design principles in Julia that Python can't possibly uphold?

cschmidt · on May 14, 2016

In my (somewhat limited) experience, Numba works great until it doesn't. Loops on numpy arrays work great. Other stuff becomes super slow, without any real way to understand what is happening. Julia sees more understandable.