
Because Python has NumPy, which implements vectorized math on arrays and matrices. Machine learning algorithms are expressed naturally and efficiently with those primitives. PyTorch, TensorFlow, and, I think, every other machine learning framework in Python use NumPy.
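For a concrete (if tiny) example of what that abstraction buys you, assuming NumPy is installed:

```python
import numpy as np

# One vectorized expression replaces an explicit Python loop; the
# per-element work happens in compiled C inside NumPy.
x = np.array([1.0, 2.0, 3.0])
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

y = W @ x + 1.0    # matrix-vector product, then elementwise add
print(y.tolist())  # [2.0, 5.0, 10.0]
```

The same idiom scales from a length-3 vector to a million-row matrix without changing a line, which is exactly what ML code needs.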

JavaScript, Ruby, and Perl either don't have this abstraction at all, or they have much weaker versions of it, and many fewer scientific libraries.

NumPy started in the early 2000s and continues to this day. It takes decades to build up this infrastructure! This recent interview with NumPy creator Travis Oliphant is great:

https://www.youtube.com/watch?v=gFEE3w7F0ww

He talks about how there were competing array libraries, "Numeric" and "Numarray", and how his goal with NumPy was to unify them. And how there are still some open design issues / regrets.

There were multiple people in the nascent Python community who were tired of MATLAB, not just because it's proprietary, but because it's a weak and inefficient language for anything other than its scientific use cases. You won't have a good time trying to write a web app wrapper in MATLAB, for example.

The much more recent Julia language is also inspired positively and negatively by MATLAB, and is very suitable for machine learning, though it doesn't have the decades of libraries that Python has.

-----

The NumPy extension was in turn enabled by operator overloading in Python (which is actually a very C++ influenced mechanism). JavaScript doesn't have operator overloading; I'm pretty sure Perl doesn't, but not sure about Ruby. Lua and Tcl do not have it. (Lua does have a machine learning framework though -- http://torch.ch/ -- but I think PyTorch is more popular now.)

So if Guido hadn't designed Python with operator overloading, NumPy would not have grown out of it.

Also relevant is Guy Steele's famous talk "Growing a Language" (OOPSLA 1998). He advocates for operator overloading in Java so that end users can evolve the language with their domain expertise! Well, Java never got it, and Python ended up having the capabilities to grow linear algebra.

Guido has even said he doesn't really use or even "get" NumPy! So it turns out that an extensible design does have the benefits that Steele suggested (although it's a very difficult language design problem). There have been several enhancements to Python driven by the NumPy community, like slicing syntax and semantics and the @ matrix multiplication operator. And, I think, parts of the C API like the buffer protocol.
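A toy sketch of the mechanism (the class here is made up for illustration, but the hook is real: `a @ b` dispatches to `__matmul__`, which PEP 465 added in Python 3.5):

```python
# Toy 2x2 matrix type showing how a library class hooks into syntax.
# Real NumPy arrays implement the same special method for @.
class Mat:
    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, other):  # called for: self @ other
        a, b = self.rows, other.rows
        return Mat([[sum(a[i][k] * b[k][j] for k in range(2))
                     for j in range(2)] for i in range(2)])

m = Mat([[1, 2], [3, 4]]) @ Mat([[5, 6], [7, 8]])
print(m.rows)  # [[19, 22], [43, 50]]
```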

-----

Another interesting thing from Oliphant's interview is that he really liked that Python has complex numbers. (I don't think any of JavaScript, Ruby, Perl, or Lua have them in the core, which is important.) That piqued his interest and kicked off a few decades of hacking on Python.

He was an electrical engineering Ph.D. student and professor, and complex numbers are ubiquitous in that domain. Example:

    $ python3 -c 'print(3j * 2 + 1)'
    (1+6j)

This is another simple type built on Python's extensible core, and its implementation is short:

    Python-3.9.4$ wc -l Objects/complexobject.c
    1125 Objects/complexobject.c

I recommend writing a Python extension in C if you want to see how it works. See Modules/xx*.c in the Python source code for some templates / examples. IMO the Python source code is a lot more approachable than Perl, Ruby, or any JS engine I've looked at.


I should also add a cultural / social reason why Python is used in scientific computing and machine learning much more than JS/Ruby/Perl:

Python was the only one of those languages (partially) funded by government research agencies. Guido was a research programmer in the Netherlands at CWI, and then he moved to the US when he was hired by CNRI, a research agency headed by Bob Kahn (loosely connected with DARPA as far as I remember).

If you look at the backgrounds of Brendan Eich, Matz, and Larry Wall (creators of JS, Ruby, and Perl), they are quite different. None of them really worked in a research setting, and they certainly didn't develop their language in a research setting.

https://en.wikipedia.org/wiki/History_of_Python

3 hour oral history with Guido: https://www.youtube.com/watch?v=Pzkdci2HDpU&t=12s

Lex Fridman interview with Guido: https://www.youtube.com/watch?v=ghwaIiE3Nd8

Lua was developed in a research setting, funded partially by Brazilian oil companies as far as I remember, but I don't think it ever had a "scientific computing" focus. It was picked up more in games and apps due to its C embeddability and features like coroutines. The ML framework Torch was built on LuaJIT because its math is nearly as fast as C. But I think Lua the language is less suited to linear algebra, again due to the lack of operator overloading.

Not to mention that Lua (at least before 5.3, which added an integer subtype) doesn't even have separate ints and floats! This is also an issue with using JavaScript for scientific computing.
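To make that concrete: Python's ints are a separate, arbitrary-precision type, so integer arithmetic never silently rounds the way it does in a language whose only number type is a 64-bit float:

```python
# 2**53 + 1 is the first integer a 64-bit float can't represent exactly.
n = 2**53 + 1
print(n)                   # 9007199254740993 -- exact, as an int
print(int(float(n)))       # 9007199254740992 -- rounded once coerced to float
print(n == int(float(n)))  # False: the round trip through float lost a bit
```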


Perl was originally written at JPL, which is the epitome of government-funded research, and for the first many years of its life most of its numerous contributors were at one or another government-funded research institution, because people who weren't didn't have internet access.

Lua does support operator overloading:

    $ luajit
    LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
    JIT: ON SSE2 SSE3 SSE4.1 AMD fold cse dce fwd dse narrow loop abc sink fuse
    > x = setmetatable({}, {__add=function() return 37 end})
    > print(x+5)
    37
Not sure if that was true 20 years ago.


Hm interesting, it's hard to find references to Wall working at JPL, but here's a very non-authoritative one:

https://old.reddit.com/r/perl/comments/5lj9ms/did_larry_wall...

Wikipedia doesn't mention it:

https://en.wikipedia.org/wiki/Larry_Wall

https://en.wikipedia.org/wiki/Perl#Early_versions

That does sound right, since I vaguely recall an interview with Wall talking about JPL.

-----

I think there's still a difference because Python was literally funded as a research project by CNRI, a government research institution. It wasn't created there, and it was funded by different entities afterward, but I think that's the period when contributors with a scientific background like Travis Oliphant, Jim Hugunin, and David Beazley started working on Python's libraries and infrastructure.

At best it seems like Wall worked at JPL for a short time and started Perl there. It also matters what kind of research it was. Perl is aimed much more at text processing and not linear algebra, while Python is more general purpose in this respect.

Also, if my memory is right, by early 2000's JPL had jobs in Python, and python.org said JPL was a user. I could be wrong but I don't think Perl ever caught on as much as Python did at JPL.

-----

Yes good point about Lua's metatable mechanism.


I think Larry was at JPL from before the first version of Perl in 01986; his job before that was evidently at Unisys, maybe starting around 01979: http://web.archive.org/web/19970626151153/https://www1.acm.o... https://spu.edu/depts/uc/response/spr2k/wall.html He was still at JPL in 01991; I suspect but am not sure that he kept working at JPL even after changing his email address and official affiliation to NetLabs.

Rich $alz reposted Perl in 01988, bearing a "1987" copyright date, but I think the first version really was released in 01986; at any rate by this point Larry was definitely at JPL: https://www.tuhs.org/Usenet/comp.sources.unix/1988-February/...

Unfortunately Google has decided to remove the ability to view source from Google Groups, so we can't see the return-path for https://groups.google.com/g/comp.lang.perl/c/t4RumjajsXA/m/7..., one of the earliest messages he posted from netlabs.com, so we can't see what NNTP server he was using at the time. (I guess that's what we get for letting Google take the responsibility for "making information universally accessible": we have no recourse when they decide that means making previously public information inaccessible.)

The Wikipedia article says he released the first version of Perl when he was still at Unisys, citing the first edition of Programming Perl, which I don't have. The second edition (01996) is silent on the question. It also cites https://www.oreilly.com/pub/au/148, which does say that and presumably is at least subject to Larry's veto.

So, at any rate, JPL was funding Perl development from at least 01987 to 01991, four years, almost the same amount of time that Guido van Rossum was working at CNRI, 01995 to 02000. But you're probably right that Guido had to write grants and write progress reports on Python, and Larry didn't on Perl; he had the discretion to just do it. Also, I suspect this was true when Guido was at CWI too, as you say. AFAICT the only research paper Guido published at CNRI that was about Python was the CP4E paper.

I don't think it's accurate to say, "Perl is aimed much more at text processing and not linear algebra, while Python is more general purpose in this respect." PDL/perldl is from 01996, just the year after Numeric in 01995, and Perl 5 offers pretty much exactly the same set of facilities as Python for this sort of thing (dynamically loadable language extensions, operator overloading, dynamic typing --- though I guess at least Python's indexing syntax is more comfortable, because x[i:j, ..., 3] is a valid Python expression that preserves all that indexing structure, and has been since at least 1.5.2).
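That indexing structure is visible from pure Python, by the way; a throwaway class that just echoes whatever `__getitem__` receives shows what a library like Numeric gets handed:

```python
class Probe:
    # Throwaway class: return the raw indexing expression unchanged.
    def __getitem__(self, key):
        return key

p = Probe()
print(p[2:5, ..., 3])  # (slice(2, 5, None), Ellipsis, 3)
```

So the slices, Ellipsis, and integers all arrive as ordinary objects that an array library can interpret however it likes.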

If memory serves, PDL was a lot better at 3-D plotting than Numeric was when I first tried it in about 01998; it could pop up a rotatable 3-D plot in an X window, and Numeric couldn't.

I think what happens is that a lot of people started working on Numeric (including those you mention --- although keep in mind dabeaz was also working on Perl's libraries and infrastructure!) and so it started getting better faster than PDL. Part of this was that Python is just a more pleasant, less clumsy language, so people chose it when Perl didn't have a killer advantage, like native mutable strings or a relevant CPAN module. But that's not about Python being well-suited for linear algebra; its only linear-algebra-specific feature is Python 3's @.

I think it would be very hard to find anyplace in 01995 or later that had Unix systems and didn't have huge piles of Perl. You're probably the person in the world most aware of the shortcomings of the primary alternative in the early 01990s (ksh/pdksh/bash). But it's also true that lots of Perl-heavy sysadmin shops never got into writing "real programs" in Perl like Slic3r, Perlbal, Movable Type, and Frozen Bubble, and so Perl didn't show up in their job descriptions. And nowadays most of those huge piles of Perl are sort of regrettable, and regretted.


OK I went down a rabbit hole and also realized how terrible Usenet search is. I tried to Google for Usenet search engines and ended up with mostly spam :-( Seems like Hacker News is sadly a better search for these kinds of things.

Anyway I agree with most of what you say, EXCEPT I think Perl's focus on text vs. Python's more general purpose focus can be seen from the creators' very early release announcements!

One thing I've realized while working on shell is that the "bones" of a language are set in stone VERY early. Then comes 10-50 years of compatible changes that necessarily must obey many early design decisions.

Also I'm not saying the focus on text is bad -- in fact a big motivation for Oil is that Python is not convenient enough for text processing :) (and that it's awkward for process-based concurrency)

Perhaps my experience with Oil shows me all the stuff I'm NOT doing to support scientific computing. Even basic stuff like exponentiation (x^0.2) takes a huge amount of code, as does scanning and printing floating point numbers, all of which shells lack. Oil should have proper floats, but not in the initial versions. (Early in the project I also thought it would have efficient homogeneous vectors, before understanding why Python settles on heterogeneous lists and punts the rest to extensions.)

From your link:

   Perl is a interpreted language optimized for scanning arbitrary text
   files, extracting information from those text files, and printing
   reports based on that information.  It's also a good language for many
   system management tasks.  The language is intended to be practical
   (easy to use, efficient, complete) rather than beautiful (tiny,
   elegant, minimal).

https://github.com/smontanaro/python-0.9.1 (I think this is from 1990 or so)

   This is Python, an extensible interpreted programming language that
   combines remarkable power with very clear syntax.

   This is version 0.9 (the first beta release), patchlevel 1.

   Python can be used instead of shell, Awk or Perl scripts, to write
   prototypes of real applications, or as an extension language of large
   systems, you name it.

This is very revealing! And prescient! The intent of the creators does seem largely borne out. Perl was extended to more domains but I'd argue that the "bones" prevented it from evolving as Python did.

optimized for scanning arbitrary text files, extracting information from those text files

to write prototypes of real applications, or as an extension language of large systems, you name it.


> Anyway I agree with most of what you say, EXCEPT I think Perl's focus on text vs. Python's more general purpose focus can be seen from the creators' very early release announcements!

Oh, I agree with that part, too; Perl's growth into a general-purpose language was very uncomfortable and surprising. I just think they were about equally terrible at linear algebra to begin with.

What would make a language good at linear algebra? I think you'd want, as you say, efficient homogeneous vectors, and also multidimensional arrays (or at least two-dimensional), non-copying array slicing, different precisions of floating-point numbers, comma-free numerical vector syntax (maybe even delimiter-free, like APL), zero division that produces NaNs instead of halting execution, control over rounding modes, arguably 1-based indexing, plotting, and infix operators that either natively have linear-algebra semantics or are abundant and overrideable enough to have them. Python didn't have any of those built in, and a lot of them can't be added with pure-Python code.

You'd also want flexible indexing syntax (that either does the right linear-algebra thing by default or can be overridden to do so), complex numbers, infix syntax for exponentiation, and a full math library (with things like erf, gamma, log1p, arcsinh, Chebyshev coefficients, and Bessel functions, not just log, exp, sin, cos, tan, atan2, and the like). Python 0.9.1 evidently didn't have any of those (you can do x[2:] or x[:5] but even x[2, 5] is a syntax error), but they were mostly all added pretty early, though its standard math library is still a bit anemic. Like Perl, though, the first version of Python did have arrays and floating-point support (arithmetic, reading, printing, formatting, serializing) from very early on; unlike Perl before Perl 5, its arrays were nestable. (Perl 5, in 01994, also added a slightly simplified version of Python's module and class systems to Perl. I forget if "use overload" was already in there, but it seems to be documented in the 01996 edition of the Camel Book, so I guess it was in Perl 5 from pretty early versions.)
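(For what it's worth, a handful of those special functions are in CPython's math module nowadays, though Bessel functions and Chebyshev evaluation are still SciPy territory:)

```python
import math

# Special functions now in the stdlib math module:
print(math.erf(1.0))      # ~0.8427
print(math.gamma(5.0))    # 24.0, i.e. 4!
print(math.log1p(1e-10))  # accurate near zero, where log(1 + x) would round
print(math.asinh(1.0))    # ~0.8814
```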

Numeric and NumPy added most of these things to Python, and IPython, Matplotlib, and SciPy added most of the others. Adding them to Perl 5 would have been about the same amount of work and would have worked about as well, but the people who were doing the work chose to do it in Python instead. It isn't the choice I would have made at the time, but I'm glad they had better technical judgment than I did.

Nowadays, for a language to be good at linear algebra, you'd probably also want automatic differentiation, JIT compilation, efficient manycore parallelization, GPGPU support, and some kind of support for Observablehq-style reactivity. Julia fulfills most of these but they're hard to retrofit to CPython.

A shell is sort of an "orchestration language", in the sense that a shell script tells how to coordinate fairly large-grained chunks of computation to achieve some desired effect. We've seen an explosion of such things in the last ten or fifteen years: Dockerfiles, Vagrant, Puppet, Chef, Apache Spark, Terraform, Nix, Ansible, etc. Most of these are pretty limited, so there's a lot of duplication of functionality between them. And most of them don't really incorporate failure handling explicitly, but failures are unavoidable for the kinds of large computations that most need orchestration of large-grained chunks of computation. I wonder if this situation is optimal.


> JavaScript, Ruby, and Perl either don't have this abstraction at all, or they have much weaker versions of it, and many fewer scientific libraries.

I don't know that Numo for Ruby is “much weaker” than NumPy. It looks like installation is rougher since it doesn't bundle dependencies, and it's newer, so there is less downstream ecosystem.

> JavaScript doesn't have operator overloading; I'm pretty sure Perl doesn't, but not sure about Ruby

Ruby and Perl both have operator overloading. (Perl has “use overload”, and in Ruby operators are defined via overridable methods.)


Ruby does have operator overloading.

And it is kind of sad to me that Python is so much more popular than Ruby, even though Ruby has a much cleaner object-oriented foundation. Not to speak of underscores...


Ruby is arguably more OOP than Python, but I'd claim that doesn't help much in the scientific programming / machine learning use case. It might even hurt a little.

This kind of code is naturally expressed in "functions and data" rather than "objects" (data being vectors, matrices, etc.).

And I say this as someone who uses objects in most of my code! (which is not scientific code)



