Rust has been eating C++'s lunch. Same rapid-rise-of-the-ecosystem story.
Instead of forcing Python to be a language it isn't, it might be more efficient, and ultimately the "right choice," to invest the time in Julia.
Julia is great for numerical computing, it needs faster time to plot and more hands in the ecosystem. The former will be solved and the latter seems inevitable to me. Pitch in!
Languages are fun to think about, but you don't always need to be concerned with every vocal minority of programmers who like to talk about how their language is better than yours. Sometimes that replacement is better and sometimes those people are wrong. But even when they're right, being marginally better isn't that big of a deal, nor nearly enough to make for a viable rewrite or change of language.
That said, I'd like it if it develops a robust and large ecosystem, because I personally like coding in it. It has built-in matrix ops, parallel ops, dynamic dispatch, etc. that are really nice to work with in the numerical space. Like Matlab, but well rounded and fast.
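For instance, a minimal sketch of the kind of built-in numerical support I mean (all Base plus the LinearAlgebra standard library, no external packages):

    using LinearAlgebra   # ships with Julia as a standard library

    A = rand(3, 3)
    b = ones(3)
    x = A \ b        # linear solve, Matlab-style, built into the language
    C = A' * A       # adjoint and matmul read like the math
    eigvals(C)       # eigenvalues straight from the stdlib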
So I admit my comment is less argument and more cheerleading. "Hey folks let's make this the case so us numerical people can have a slightly improved experience".
In the grand scheme of things this is as noble or ignoble as any.
I love that Torch was converted to Python. I think a large ecosystem matters more than the most efficient language.
I love many things about Julia, but there are definitely growing pains for the language, and I can see why it may be a hard sell to people used to modern Python or C++, both of which have improved since Julia came out. However, Julia has its own goals that might make certain communities veer towards it. It'll be interesting to see if it gets more adoption in industry.
Personally, the first version of python I used was 1.5.2 (in 1999). I learned it initially instead of the current 'scripting language' hegemon, which was perl. Since CGI was in extensive use at that time, this was the start of the python web backend ecosystem.
In the late 2000s, I was able to sneak in Python rewrites of MATLAB or Fortran code written by scientists, even in situations where Python was not officially sanctioned for that type of work (I was mainly a C++ developer, but would also write Python bindings). I feel like it really was Python's strength as a universal duct tape that caused the ecosystem growth of both the scientific _and_ web stacks.
Nowadays, I've really come to appreciate Julia from a novelty standpoint, and I'm intrigued at what it can do, especially in terms of probabilistic programming and other emerging fields, but in order to justify using Julia in more general 'professional programmer' settings, it needs to focus on what's made modern Python so successful. Otherwise, I could see its fate as "merely" replacing a more niche language like MATLAB (which Python has /already/ managed to do for the most part).
Python is more like #100 in terms of speed, #100 in terms of correctness, #100 in terms of sound abstractions, and #10 in terms of readability for large programs. Its real strengths are quick hacks and a decent C API.
The community is smug, conceited, does not value correctness and in general is intoxicated by Python's undeserved success. Many posers and incompetent people.
I personally don't like Python that much because every library does things differently, and sometimes it feels like learning a completely new (sub)language. E.g. pandas DataFrames allow you to do the same thing in multiple ways (e.g. adding an index column, or removing a column). Often when I need to look up how to do a particular thing, I end up finding many solutions that simply don't work with the version I am working with. Sometimes even looking into old code of mine doesn't work any more and requires either using an older library or relearning how to do things.
That being said, a friend of mine has been quite fond of Julia lately. Which put Julia on the top of my list of programming languages to do a deep dive.
Also, you can very much not redefine functions on the fly, due to world age (assuming the caller doesn't pay for `invokelatest`).
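A minimal sketch of how world age shows up (this is standard Julia behavior; `Base.invokelatest` is the documented escape hatch):

    f() = 1

    function demo()
        @eval f() = 2                 # redefine f at runtime
        old = f()                     # demo's world age: still sees f() == 1
        new = Base.invokelatest(f)    # pays dispatch cost, sees f() == 2
        (old, new)
    end

    demo()  # (1, 2)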
There are legends from Common Lisp users where this is very much done.
> When one of the customer support people came to me with a report of a bug in the editor, I would load the code into the Lisp interpreter and log into the user's account. If I was able to reproduce the bug I'd get an actual break loop, telling me exactly what was going wrong. Often I could fix the code and release a fix right away. And when I say right away, I mean while the user was still on the phone.
> Such fast turnaround on bug fixes put us into an impossibly tempting position. If we could catch and fix a bug while the user was still on the phone, it was very tempting for us to give the user the impression that they were imagining it. And so we sometimes (to their delight) had the customer support people tell the user to just try logging in again and see if they still had the problem. And of course when the user logged back in they'd get the newly released version of the software with the bug fixed, and everything would work fine. I realize this was a bit sneaky of us, but it was also a lot of fun.
Taken from https://sep.yimg.com/ty/cdn/paulgraham/bbnexcerpts.txt?t=163...
I know that this is not possible in Julia like that. But just because it's not possible in Julia doesn't mean what they did was somehow wrong when it worked well for them.
This kind of thing sounds fine for a dev server though, in which case Revise+RuntimeGeneratedFunctions will get it done in the cases I've seen.
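Something like this, say (Revise.jl and its `includet` are real; the file and function names are just for illustration):

    using Revise
    includet("handlers.jl")   # load the file and track it for edits

    handle_request()   # edit handlers.jl in your editor, call again at the
                       # REPL, and the new definitions are picked up live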
However, I'm also very much tempted by a system that allows for fixing things so quickly, because it provides for a delightful customer experience. Most customers don't report most issues, only when something is persistently wrong or is something they can't get around. So being able to quickly fix it this way seems genuinely amazing. If I were to take this idea further… maybe some sort of customer-isolated canary deploy / feature flag might be the way to express this customer experience in today's world.
The problem is always that you might break other things with your fix. Maybe a battery of testing before canary deploys can raise confidence about the validity of the fix.
Common Lisp, Smalltalk, and Erlang all treat the process as running indefinitely. This gives those languages a certain something special.
I was hoping to find the same thing in Julia. It's not there. It seems like a significant missed opportunity to me. These dismissive comments make it sound like you guys might not know what you are missing.
Coming from a Lisp background Julia fits like a glove in the beginning but some aspects, like the inability to redefine structs, I just crash into like a brick wall. It makes it hard to imagine writing (say) a Blender/Paraview type of application in Julia.
I suppose that Julia is heavily influenced by Scheme and Dylan but not especially by Common Lisp (or Smalltalk) where the idea of being unable to redefine types at runtime is jarring.
I wonder if Julia advocates would do better to just acknowledge that this point of view exists, even if experience with Julia might change it over time, rather than dismissing it (yes, very much so) as empty "fuss." But there aren't that many Lisp/Smalltalk hackers in the world so you can probably afford to alienate them if you want to...
You might want to try and acknowledge the other point of view, where I note that, hey, we did make this work, but it was a smaller deal than we thought because we legally cannot employ it in many production contexts. We're still going to work out a few details for fun and ease of debugging, but given that we have extensively looked into and thought deeply about the whole issue, we have noticed that the last little bits are less useful than we had thought (at least in the contexts and applications I have been discussing, like clinical trial analysis). That doesn't mean we won't finish the last pieces, but given how much of it you can already do, how many teaching materials show how to work around the issues in any real-world context, and how little real-world application the last few bits have, it shouldn't be surprising that they haven't been a huge priority. So instead of looking narrowly at one factor, I encourage you to take a more global view.
Talk related - https://youtu.be/5mUGu4RlwKE
Time to plot is much improved in 1.6 and should continue to improve in 1.7. It's definitely being addressed.
I'm not sure what advantage Julia has over Python. Yeah, it has some typing and can be faster, but it's too similar. Still single threaded.
    # Threaded inner loop, each thread has no dependence upon others
    Threads.@threads for t = 1:Nthr
        results[t] = work(t)  # Nthr, results, work assumed defined elsewhere
    end
Having one kernel thread for each CPU thread means that your program can use all available CPU threads at the same time (so you get all the parallelism available within the machine), and having a language-based scheduler on each thread means you can create a new concurrent execution with minimal overhead (no system call needed). That gives you lightweight/green threading similar to what Python allows, except automatically distributed by the language across all kernel/CPU threads. In Elixir this means you can create millions of processes even though the OS will only see one thread per logical CPU thread, and I never felt the limitation of this abstraction over multiprocessing. (Of course, Julia is nowhere near as mature, and maybe never will be, due to stuff like preemptive scheduling and parallel garbage collection being easier to implement in a language with only immutable types. Though it seems to be moving along, and in Julia 1.7 tasks can move between kernel threads, solving the issue mentioned in that discussion you linked.)
If you want julia to use multiple system threads, why are you suggesting one not use system threads for this test? All you have to do is start julia with multiple threads and it'll use those threads for your infinite loops.
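Concretely, something like this (the `--threads`/`-t` startup flag and everything below are standard Julia):

    # Launch with: julia --threads=4   (or -t auto)
    using Base.Threads

    @show nthreads()   # number of threads available to @threads loops

    results = zeros(8)
    @threads for i in eachindex(results)
        results[i] = sum(sin, 1:10^6)   # independent work per index
    end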
Since I didn't use dirty processes in Elixir, I had forgotten about this obvious issue you pointed out, which for a mutable language like Julia can happen in every thread. But that's not something that limits the expressiveness of the model; it's something that requires consideration to avoid while programming, plus language-level mechanisms to protect the thread (at the very least, the ability to define timeouts that can throw an exception on any spawned process), or maybe a future framework on top that handles this in a safer way (something like Akka). I can only hope Julia can achieve the full potential of its multithreading model.
The GIL can be released, and it is released, e.g., for I/O and during heavy computations in C extensions, so a Python program can utilize multiple CPU cores.
And no, Julia is not too similar to Python. Julia has multiple dispatch, Python does not.
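A toy sketch to make the difference concrete: the method is chosen from the runtime types of all arguments, where Python's `functools.singledispatch` (or ordinary dispatch on `self`) only looks at one.

    abstract type Pet end
    struct Dog <: Pet end
    struct Cat <: Pet end

    # One method per *combination* of argument types:
    meets(a::Dog, b::Dog) = "sniffs"
    meets(a::Dog, b::Cat) = "chases"
    meets(a::Cat, b::Dog) = "hisses"
    meets(a::Cat, b::Cat) = "ignores"

    encounter(a::Pet, b::Pet) = "$(typeof(a)) meets $(typeof(b)) and $(meets(a, b))"

    encounter(Dog(), Cat())   # "Dog meets Cat and chases"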
Yeah, this is superficial, but so are $200 sneakers, and they do just fine.
Honestly there is a real pleasure writing code that looks like it could be on the blackboard. The numpy / numba world in Python just feels... not great.
Even after 25 years in the business, it's still a pain.
Edit for details: `Σλ∀φ` is a chore to read for me. These letters only make sense if I've absorbed the author's notation beforehand, like in a math proof. I parse the snake case version instantly, because my brain is trained for that. And I understand it much more quickly, because it says exactly what the function does in the way I would represent it in memory.
> I'd hate to have to learn which Greek letters correspond to which operations every time I dive into a new codebase.
You have to do that anyway, because you'll have to connect the code you're working on with the scientific paper describing it. When the code uses variable names too far removed from the mathematical symbols, you have to make two steps: figure out how the words connect to symbols, and then figure out the symbols. This will be especially difficult for the mathematician/scientist without a strong coding background: they'll have much less friction reading code that matches the symbolic notation and Greek letters that they're used to.
> The advantage of English words is that everyone knows them.
Not when it comes to “math” words.
> `Σλ∀φ` is a chore to read for me.
Right, but that's because you have a different background. For me, `Σλ∀φ` is much easier to read and understand. More importantly, a symbolic notation is much denser, which allows you to parse long expressions that would be very hard to understand if they were written out in words.
Again, this is for the very specific context of highly mathematical/scientific software that Julia excels at and is primarily used for. In a more general context (when the software isn't a direct representation of a scientific paper), I'm 100% on board with good, descriptive variable names.
Of course it will limit who will be able to interact with your project, which can be a good thing or a bad thing. For the math-unicode in numerical software, you may not even want someone without at least a minimal math background (enough to understand Greek letters and math symbols) working on the code. Likewise, a project that’s inherently Chinese, where it doesn’t make sense for the users/developers not to know Chinese, should feel absolutely free to use Chinese symbols in their source code.
On the other hand, if you do it gratuitously, you just unnecessarily limit collaboration. I’m ok with that, too, personally: it’s your own foot you’re shooting.
There's really nothing wrong with beginners starting out in their own language. Why shouldn't a 14-year-old Chinese kid write their first programs using Chinese characters as identifiers? I'd much rather have a language support full Unicode the way Julia does than force everyone into ASCII.
I remember that one of the first things I did when learning C++ was to implement a matrix library, operator overloading and all (oh, how naive when it comes to getting it right!).
So the ability to express computation in standard mathematical notation, rather than having to invent pseudo-symbols to do it, makes it much easier to read for people who already have that training. And for people who don't… it does require pre-reading to understand how these symbols are used, but you have the benefit of hundreds of years of math literature and community to look it up!
It's a hell of a lot better than reading `Sigma`, `lambda`, etc. and having to do the verbose English -> character translation in your head while trying to understand the mathematical form. At the end of the day, you do need to understand the math in order to understand mathematical code, and making it have as few translation steps as possible really helps.
And so it is with greek letters. Many of them carry intrinsic meaning, or a loose collection of related meanings - they effectively are labels. It's not that people want to write dense and impenetrable code, it's that they would rather write μ than coefficient_of_friction because that is its proper name and how it appears in formulas and textbooks.
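In Julia this is first-class: identifiers like these are typed with LaTeX-style tab completion (`\mu<TAB>`, `\sigma<TAB>`), and the sketch below is ordinary, runnable code:

    μ = 0.3            # coefficient of friction, by its textbook name
    N = 10.0 * 9.81    # normal force for a 10 kg mass
    F = μ * N          # straight from the formula F = μN

    x = [1.0, 2.0, 3.0]
    x̄ = sum(x) / length(x)                        # the mean, written as x-bar
    σ² = sum((xᵢ - x̄)^2 for xᵢ in x) / length(x)  # population variance
    @assert σ² ≈ 2/3                              # ≈ is Base.isapprox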
But seriously, it's 2021. We are no longer slaves to ASCII, or even English.
Most of the newer math oriented langs have heavy use of unicode (julia, agda, lean, etc).
To me, it's a matter of familiarity. Clearly, human brains can process large numbers of symbols; just look at some East Asian languages. Historically, there has been an ASCII bias in computer languages, but that is history.
Mathematics is a /language/, and one of the most universal in humanity because it models certain human reasoning.
Julia is a much better solution to the two language problem, here's hoping it can overcome the ecosystem inertia.
While I'm really not a fan of 1-based indexing, Julia's multiple dispatch is not something easy to match in Python.
[EDIT]: one thing that's still not solved in Julia is code startup time.
Many people will sell you some sort of workflow that works around the problem, but it's the same old tired arguments people would use to defend traditional compiled languages, and I'm not buying.
I really wish they would find a way to truly solve this.
as in 1.7 > 1.6 > 1.5 > 1.4 > 1.3 > etc...
it's especially gotten way better since Julia 1.5, so really mostly in the last few years.
In julia 1.8, what's interesting to me is that the julia runtime will be separated from the llvm codegen; https://github.com/JuliaLang/julia/pull/41936
the immediate effect is to allow small static binaries without a huge runtime (namely the LLVM ORC), but the side effect is probably that the interpreter will also get better in cases where you don't want JIT.
> Many people will sell you some sort of workflow that works around the problem, but it's the same old tired arguments people would use to defend traditional compiled languages, and I'm not buying.
I mean, Julia has a REPL so you can basically edit code as the process runs, which definitely makes startup time less of an issue. The fact Julia can produce such fast code is also pretty nice.
Starting a new process every time you want to run a snippet of code isn't getting the best out of a dynamic language...
Really? I thought that was a ~2017 thing.
It won't stop either, because the road between JS client dev and JS server dev is so smooth. Path of least resistance type thing.
I guess in some startup scenes. It has been Java and .NET over here, with no signs of changing, despite the occasional junior project that eventually gets rewritten back into Java/.NET stacks when they move on.
It's PyTorch: if they said "the next version of PyTorch will be in Julia," the ecosystem would shift accordingly.
They’re practically saying “this language has every feature we need and want, most of them already existing, but we’re going to continue re-inventing them in this objectively less suitable language because we clearly wish to make life harder for ourselves”
I have used MATLAB, R, Python and Julia extensively for doing all sorts of data related things during the last 20 years. Julia is incredibly easy to work with, very elegant and really efficient.
R and Python have always felt clumsy in some ways, and it's hard to write really performant code in them, even if you are proficient in Python! As a seasoned Lisper and MLer, even with a lot of Python experience under my belt, Julia felt much easier to work with from the very beginning.
Furthermore, most Julia libraries are written in pure Julia which simplifies things a lot and enhances composability. While there are great libraries around, the DL landscape is a bit lacking. Flux is great, but I would not use it to build e.g. transformers as it changes too often and has too few maintainers behind it. Hence a potential migration of Torch to Julia would be fantastic.
You can take a Python web server process, have a request call a task that uses NumPy and OpenCV and scikit-learn, get that back, and you're done, all in the same language.
Julia's community does not seem to have aspirations beyond high-performance math code, which is great for its use, but I'm not going to learn Julia just for that when I can implement the entirety of a development pipeline in Python and have all the other niceties that come with it.
And with upcoming improvements to binary size and structured concurrency (it already does go-like lightweight threads) it will get even better.
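The lightweight-task API in Base already reads like this (a small sketch; `Threads.@spawn` and `fetch` are the actual names):

    using Base.Threads

    # Each @spawn creates a cheap task multiplexed onto the OS threads,
    # in the spirit of goroutines:
    tasks = [@spawn sum(rand(10^6)) for _ in 1:100]
    total = sum(fetch, tasks)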
PyTorch is not only easy, but is a joy to work with.
Among researchers, TensorFlow is rapidly losing ground to PyTorch, and, I think, will keep losing ground until it becomes a niche and only used by Googlers and some others.
Soumith Chintala gave a keynote talk at JuliaCon where he focused on these points.
Another issue is pytorch/tf in python are very dominant in research/projects. Often we clone relevant recent projects and try experimenting with them to see if they help. Swapping to Julia would hurt a ton in that area.
edit: Also, while I'm fond of Python, I'd be very open to seeing another language win. There are language design choices I dislike in Python, but I like enough of the language, and the ecosystem has been too strong to leave for most other languages worth pondering. If Julia grows enough that my coworkers start asking for Julia support, I'd be happy to explore it. My pet preferred language is Crystal (Ruby-like readability + types + good performance), but ecosystem-wise it's tiny.
Numpy -> Array + broadcasting (both in Julia Base; see the sketch after this list)
pytorch/tf -> Flux.jl (package)
batch/stream processing -> you don't need it as much, but things like OnlineStats exist. Also Base has multithreaded and distributed computing. Spark in particular is one where it lets you use a cluster of 100 computers to be as fast as 1 computer running good code.
pyarrow -> Arrow.jl (there are also really good packages for JSON, CSV, HDF5, and a bunch of others)
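To illustrate the first line of that list, a quick sketch of NumPy-style vectorization as plain Base broadcasting:

    x = rand(1000)
    y = @. 3x^2 + 2x + 1     # @. broadcasts every operation elementwise
    z = sin.(x) .+ cos.(y)   # explicit dot syntax; the dots fuse into one loop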
Let me know if you have any other questions. Always glad to answer!
What's your workflow orchestration choice? That's the main one you didn't touch. My work area is an ML training platform, and a lot of my work can be described as wrapper work on Kubeflow to allow dozens of other ML engineers to manage experiments/workflows. For Python the main choices are Kubeflow/Airflow. Ray, kind of, but Ray workflows are still quite new and missing a lot of useful features. I need some system to run hundreds of ML workflows (one workflow being like 5-10 tasks, some short, some long) per day and manage their tasks well.
The broader area also includes libraries like Weights & Biases, BentoML, etc. (experiment management libraries).
In theory you can have the workflow manager in one language and the workflow code in a different language. The main downside is that it makes debugging workflows locally harder (breakpoints are a little sad across most language boundaries), but it is doable, and we debated migrating to Temporal (a Java workflow system) before.
We've moved away from language-integrated orchestration entirely at my work: we use Argo Workflows on Kubernetes, so we're just orchestrating containers and aren't beholden to language-specific requirements anymore. You can use whatever language/tool you want, provided it packs into a container and accepts/returns what the rest of the workflow expects.
One of the really big potential benefits of Julia is that it lets you remove language barriers which is especially nice if you are doing ML research (or playing around with new model types, etc). Since the ML stack is Julia all the way down to CUDA/BLAS/julia loops, you can really easily inspect or modify everything in your stack.
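For example, the standard introspection macros (in InteractiveUtils, loaded by default at the REPL) let you drill from Julia source all the way down:

    g(x) = 2x + 1

    @code_typed g(3)    # inferred, optimized Julia IR
    @code_llvm g(3)     # the LLVM IR the JIT compiles
    @code_native g(3)   # the actual machine code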
I'll leave it to HN to figure out what that means :P
PyTorch is a small part of the python ecosystem. The python ecosystem is not going to change at all if PyTorch moves to Julia.
Glad to see functorch, as PyTorch is the library I have the most experience with.
But all the bullet points are about things that are easily done right now with libtorch (PyTorch's underlying C++ core), and the hassle is... Python.
Well, the rational conclusion would be: just do everything in C++ and bind to Python. Make C++ the first-class citizen here, since in all cases it'll be needed for performance, forever.
In other words, python binary wheels are harder to maintain than source-only python packages. And pytorch uses more than a few. I can't imagine Julia makes the problem much simpler. The main pain point is probably the lack of standard, multi-environment packaging solutions for natively compiled code.
I don't know what it would take for this sort of pain point to improve significantly. Some standards around how C, C++, and Fortran projects are packaged would help. This would allow projects to build on top of existing natively compiled tech a lot better. Maybe the biggest reason those languages don't have the same "ecosystem" as python is utter lack of packaging standardization.
Are you talking about something like BinaryBuilder.jl, which provides native binaries as julia-callable wrappers?
IMHO, one of the biggest advantages of Julia _is_ arrays.
In fact, it appears to me that they intentionally made the core language with fewer intrinsics than other languages.
Many of the packages in JuliaArrays/ might as well be in the core language, especially things like StaticArrays:
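A taste of StaticArrays (a real package; `SVector` and `@SMatrix` are its actual API):

    using StaticArrays

    v = SVector(1.0, 2.0, 3.0)   # fixed-size, stack-allocated vector
    M = @SMatrix rand(3, 3)      # 3x3 static matrix
    w = M * v                    # small matmul, unrolled at compile time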
It’d be interesting to see how much of the Python ecosystem is actually necessary to move PyTorch to a better language.
I’m afraid we’re stuck with Python for the next 20 years. That makes me very, very sad.
It's important to remember that most of the Python ecosystem isn't written in Python. The functions are often thin wrappers/objects around the real computation, which is often written in a faster language: C/C++/Fortran.
Julia excels in composability, performance, ease of development. You don't need to recode your algorithms in another language due to the performance of Julia, as is needed in Python's case.
Generally speaking, I see Julia's biggest fault, time to first plot, being addressed far sooner than Python being redesigned to have Julia's capabilities. For the record, I use Python daily in the day job. And I use Julia there for analytics, often with Jupyter notebooks. Easy for users to consume and interact with.
I barfed at 1-based indexing for about a week, but now it is as natural as anything.
I would compare 0-based and 1-based indexing with whether you put semicolons at the end of each line or not. Either way doesn't really change the feel (semantics) of the language.
Also, Fortran is 1-based, IIRC, and a lot of numerical code is in Fortran.
Oh, and many, many beginning programmers and scientists have a hard time with 0-based indexing. Not sure why, but that's what you hear, so the choice is really not that odd.
It makes sense in certain contexts (and in languages like C that have a low-level mental model). For scientific computing at a higher level of abstraction, where the mental model of a multidimensional array is a tensor and not a memory offset, zero-based indices really get in the way.
The sensibility of the index choice is equal to the starting value of the index.
People should stop wasting time bikeshedding this insignificant detail, but for some reason it is to programmers like a red rag to a bull.
When you have to deal with a wide range of languages, stuff like this is small potatoes, compared to, say, indentation based structure. The latter can result in the completely non-obvious change of program flow due to a single errant space or tab.
Also have a look at https://github.com/giordano/StarWarsArrays.jl
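More practically, because Julia's indexing is generic, the index base is a per-array library choice; a minimal sketch with OffsetArrays.jl (a real package):

    using OffsetArrays

    v = OffsetArray([10, 20, 30], 0:2)   # a 0-based view of a 1-based vector
    v[0]   # 10
    v[2]   # 30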
I worry about someone's ability to solve real problems in any language if they can't get their head around a +1/-1 when indexing into an array.
This. Think about what this signals to employers and interviewers if someone throws a hissyfit over this.
So this "very big elephant" is, in reality, a nothingburger.
For me, the very big elephant in the room is the semantically significant formatting. It has tripped me up, and continues to, time and again. A touch of the space bar can change the flow of a Python program. That is a huge asteroid of a problem.
For me, I think the packaging ecosystem is bad; we need one package management tool, like Poetry, built in. We need a built-in typing system like TypeScript's. Lastly, we need to remove the GIL.
I’m pretty sure all of these are currently being addressed by the community.
I switch languages a lot, and things like functools, itertools, dunder methods, list comprehensions, and dict comprehensions are things I sorely miss, especially in TypeScript. In particular, list and dict comprehensions, when used with care, are a great deal easier to work with and reason about when transforming data.
I like to think that containers only exist because deploying a Python application is so %^#(&*# complicated that the easiest way to do is to deploy an entire runtime image. It's an absolute nightmare and travesty. So bad. So very very bad. https://xkcd.com/1987/
I'm not optimistic on TypeScript for Python. That'd be great if such a thing existed! I'm not optimistic on packaging or deployment. There is recent progress on GIL removal which is exciting! There is hope, but I'm proceeding with very cautious optimism.
Comprehensions are kinda great, but also hideous and backwards. Rust iterators are a significant improvement imho. The fact that no recent languages have chosen to copy Python's syntax for comprehensions is telling!
Oh, and I think the standard library API design is pretty poor. Filesystem has caused me immense pain and suffering. Runtime errors are the worst.
MyPy exists, Python officially supports type annotations.
I do think comprehensions are a weird feature for Python, particularly from the “one way to do it” perspective. And also because the language so strongly discourages FP patterns outside of comprehensions.
Overall I would say the language is a lot better than the ecosystem, and that it suffers a lot from having a bad packaging design. I’m not a fan, suffice it to say. It’s best if you can stick to the standard library.
Do you mean os.path, pathlib.Path, or something else? I use pathlib.Path all the time and never get filesystem errors anymore, it's really practical.
Not even close to the "only thing". A dynamic language is a net loss of productivity once you reach a certain level of scale. Refactoring a codebase with millions of lines of code in a dynamic language is an absolute nightmare.
Opinions vary on the level of scale at which this happens. My personal experience is that once you hit just a few thousand lines of code, dynamic typing is a drag on productivity.
There's a reason things like TypeScript are increasingly popular.
I don't care what language first year CS students use. I care what languages I'm forced to deal with on a day-to-day basis!
JET.jl is an interesting project that does error analysis through abstract interpretation.
More so than the goals of JET itself, the abstract interpretation framework it exposes should allow for TypeScript-like gradual typing.
This is important, imho, if Julia is ever to break out of a niche area, because most of the rest of the world has already moved on to see the benefit of the improvement in application reliability that static analysis allows.
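For a taste, JET's entry point looks like this (`@report_call` is JET's documented API; the buggy function is my own toy example):

    using JET

    addone(x) = x + 1

    # No method matches ::String + ::Int; JET reports it without running:
    @report_call addone("oops")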
Python is popular because of the ML revolution. If ML didn't take off neither would Python's popularity. Is ML successful because of Python or despite Python? Well, the world is probably further along with Python than if it merely didn't exist. But if a different language that sucked less existed we would, imho, be further along than we are.
I'm not annoyed Python exists. I'm annoyed that its inertia is so monumental it's inhibiting further progress. We're at a local maximum and the cost to take the next step is really really really high. That's not Python's fault mind you, just the way things work.
Python was popular before because it's a very nice language. People wanted to use it for science too, so they wrote very good scientific libraries for it.
R was very popular for non-neural-network ML some years ago, yet it wasn't picked up for NNs, because R kind of sucks for general programming. As the joke goes, the best part of R is that it's a language written by statisticians. The worst part of R is that it's a language written by statisticians.
Python was growing at accelerated speed year on year well before neural networks.
Python was popular, including for scientific use, before the ML revolution; in fact, the only reason it is associated with the ML revolution in the first place is its preexisting broad adoption in scientific computing.
Python and PHP are so big because of the languages themselves and their implementations; JS is a bit different in that regard, I'll admit.
Is Python better than assembly and C/C++ for ML? Absolutely. Is Python good? I don't think so. Other people might use that term; I do not. I think Python is a bad language that would be designed very differently if it were built today. And we're stuck with its decisions because the inertia to change them is monumental.
It's not really the language's fault. As an industry we've learned a lot in the past 30 years! It would be a travesty if we hadn't made language design progress! Unfortunately, we haven't figured out how to effectively apply lessons to encumbered ecosystems. Case in point: the transition from Python 2 to Python 3 was an absolute disaster. And it only made tiny, incremental improvements!
I was recently writing code using Reactor/RxJava in Java 11 with Lombok. I don't think I've ever been so productive, or led a team as productive, as when we were going ham on functional/reactive Java. Now that I'm back in Python land, I am constantly frustrated with both the language and the runtime at every turn. Even with the asyncio we are working on, it feels like the absolute minimum viable product when compared to the Java, Node, or Rust I have done.
There are some fantastic Python enhancements that bridge some of the gaps, like PEP 484 (type hints) and PEP 517 (holy crap, an actual build system that isn't dogcrap), but it feels like the Python community does not care.
I wrote a somewhat tongue-in-cheek rant blog post. https://www.forrestthewoods.com/blog/things-i-like-about-pyt...
I deny that hiring Python is hard beyond “hiring is hard”.
> You think you're getting a good programmer, because they know all the leetcode tricks,
Unless I want someone for a role that is very much like reproducing leetcode tricks, I don't think I would consider someone good for it just because they are good at those. In fact, leetcode is mercilessly mocked as being almost completely irrelevant as a positive signal for hiring, though it may be useful as a filtering tool to reduce an overwhelming volume of applicants to a manageable one, where high rates of both false negatives and false positives are tolerable as long as there is some slight advantage over just randomly discarding applicants.
That's...a complete topic switch and irrelevant. The discussion I was responding to is about the challenges facing whoever does have control.
Outside of that, I interviewed several of my friends (I know them from a non-programming context, so I didn't know their competency) who were predominantly Python devs, and completely noped out of them for the same reasons (and these were my friends).
All I'm saying is that signal to noise for the common tests you give for 'problem solving aptitude and conceptual fundamentals', is much lower when you are hiring for a python position. You think you're hiring for those things, but you're actually hiring for leetcode-optimizers.
I mean, I'm not trying to hire like that, and I think I have an interview that tries to test for those things effectively, but I have had to deal with the downstream effects of people who are hiring like this, and that has been a real problem for me.
If you instead hire for "engineering" positions, without caring about what languages the candidate knows, you can interview for their ability to solve practical programming exercises  in whatever language they are most familiar/comfortable with. Maybe this only works at FAANG-level hiring, but in these contexts, top tier candidates can get things done in any language, and that's really what matters, no? But more to your point, I've generally found candidates that pick Python (or Ruby/Perl/etc) can actually accomplish more–and therefore prove their capabilities–in the space of an interview simply because they're picking a more expressive language. Bad candidates will prove they are bad candidates no matter what language they choose.
1: Eg, reading/manipulating files, merging/filtering/sorting data, generating and processing multi-dimensional data structures, etc.
Python is one of the few languages that has a balance of ease of use, ecosystem, ubiquity, and useable type system. It's a fantastic glue language and it's extremely flexible.
> A language must compile to efficient code, and we will add restrictions to the language (type stability) to make sure this is possible.
> A language must allow post facto extensibility (multiple dispatch), and we will organize the ecosystem around JIT compilation to make this possible.
> The combination of these two features gives you a system that has dynamic language level flexibility (because you have extensibility) but static language level performance (because you have efficient code)
Given those constraints, the first language that comes to mind is Java. Why is Java basically not a player in the scientific-computing game?
There’s also some unfortunate choices Java made like standardizing one specific semantics for reproducible floating point code. That’s unfortunate because adjusting for native SIMD widths sacrifices reproducibility but improves both accuracy and speed. The only choice if you want perfect reproducibility on all hardware that Java supports is the worst performance model with the worst accuracy.
There’s also the fact that Java integers are 32-bit and Java arrays are limited to 2GB, which was reasonable when Java was designed, but is pretty limiting for modern numerical computing.
I also think that the JVM object model is quite limiting for numerical computing. They still don’t support value types, but value types are precisely what you want to represent efficient numerical values like complex numbers, or quaternions or rationals, and so on. Java forces all user-defined types to be heap-allocated reference types. Julia solves this by defaulting to immutable structures, which is exactly what you want for numerical values: the semantics are still referential (if you can’t mutate you can’t distinguish value semantics from reference semantics), you just can’t change values, which is exactly how you want numbers to behave (you don’t want to be able to change the value of 2).
Lack of value types in Java also makes memory management unnecessarily challenging. You can't make a user-defined type with an efficient C-compatible array layout in Java: because the objects are references, the array is an array of pointers to individual heap-allocated objects. The ability to subtype classes forces that, but even with final classes, the ability to mutate objects also forces it, since pulling an object reference out of an array and modifying it is required to modify the object in the array (reference semantics), and that's incompatible with the inline array layout.
And finally, this same lack of value types puts a LOT of pressure on the garbage collector.
This is mostly true, but the primitives are value types and you can get some things done with them. (Not enough to make Java good for these use cases, no.) I.e. write float instead of Float and you have a contiguously allocated region of memory that can be efficiently accessed.
What do you do in a dynamic language where no such separation can be enforced? Or in a static language where you just don’t want that kind of ugly anti-generic bifurcation of your type system? The classic static PL answer is to just make all objects have reference behavior. (Which is the approach Java takes with a pragmatic ugly exception of primitive types.) This is the approach that ML and its derivatives take, but which is why they’re unsuitable for numerical computing—in ML, Haskell, etc. objects have “uniform representation” which means that all objects are represented as pointers (integers are usually represented as special invalid pointers for efficiency), including floating point values. In other words an array of floats is represented in ML et al. as an array of pointers to individual heap-allocated, boxed floats. That makes them incompatible with BLAS, LAPACK, FFTW, and just generally makes float arrays inefficient.
So what does Julia do? Instead of having mutable value and reference types, which have observably different semantics, it has only reference types but allows—and even defaults to—immutable reference types. Why does this help? Because immutable reference types have all the performance and memory benefits of value types! Yet they still have reference-compatible language semantics, because there's no way to distinguish reference from value semantics without mutation. In other words immutable structs can be references as far as the language semantics are concerned but values as far as the compiler and interop are concerned. And you can even recover efficient mutation whenever the compiler can see that you're just replacing a value with a slightly modified copy—and compilers are great at that kind of optimization. So all you give up is the ability to mutate your value types—which you don't even want to allow for most numeric types anyway—and which you can simulate by replacement with modification. This seems like a really good trade off and it's a little surprising that more languages don't make it.
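A small sketch of that trade-off in code (the struct is my own toy example):

    struct Point            # immutable by default
        x::Float64
        y::Float64
    end

    isbitstype(Point)                   # true: C-compatible inline layout
    pts = [Point(1, 2), Point(3, 4)]    # one contiguous block, no pointers

    # "Mutation" by replacement: build a modified copy and store it; the
    # compiler can often turn this back into an in-place update.
    pts[1] = Point(pts[1].x, 0.0)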
The Julia approach is indeed elegant. An in-between position taken by F# is statically resolved type parameters, where type safety is maintained and semantics largely preserved by constraints on members. It's a pain to write but trivial to consume, though errors can be abstruse. More type-level flexibility on the CLR (which seems to be planned) will further improve things when it comes to generic programming.
Python was written with people like scientists in mind. Professionals write fast C libraries and then people who know just enough to get by use python to glue it all together.
Having just taught a bunch of scientists Python, I'm skeptical of this.
Stuff that is obvious to me, but really hard to explain to coding beginners:
* why `len(list)` but `str.join(...)`? I.e. why are there two ways to call functions?
* why does `[]` do so many different things? List comprehensions, list construction, subsetting, sometimes it secretly calls a method....
* why in the name of god is the first row called 0?
* and then why does [0:3] mean rows 1, 2 and 3, not rows 0, 1, 2, 3 or 1, 2, 3, 4?
* ... except in pandas where sometimes [foo:bar] includes bar...
* in general, why is pandas so damn complex?
I thought it'd be easier to teach than R, but I'm not quite so certain now.
- plenty of scientists write good code; I think the "scientist can't code" meme is harmful.
- most of the people that write PyTorch code aren't necessarily scientists - they're software developers (data scientists, ML engineers, research engineers, whatever title - but their main job is to write code)
> Python was written with people like scientists in mind.
Python had nothing to do with scientists when it started. It was written to be readable and easy to use and it started as a hobby project. It existed for over 15 years until the data science / ML ecosystem started growing around it.
But you can just write a simple 20 line Python script to do some data mangling, no project with 30 IDE files required.
Visual J++, Visual Cafe and JBuilder were the main ones but not everyone was eager to buy them, while the JDK was free beer.
Python doesn’t compile to machine instructions either and there’s nothing that prevents GPU access from Java. In fact I’d bet in many cases pure Java beats Python + C library though it obviously depends on how much has to be written in pure python.
And the VSCode extension isn't quite IDE level, but that's just a matter of resources.
In future it would be great to mashup something like repl.it with Jupyter notebooks, maybe with Medium-style wysiwyg. That might really win.
See this talk for examples: https://www.youtube.com/watch?v=kc9HwsxE1OY
There is the use of @ (but to signal macros), but otherwise, the syntax is much closer to a cross between Python and matlab except nicer for doing math.
I tried writing a few programs in Julia and got sucked in by how effective it is. The real surprise is that just a few weeks in instead of pulling up R to do a quick calculation my fingers decided they wanted Julia.
for example, check out parts of the stdlib:
but in the end, julia really is a lisp:
Meanwhile, I have actually seen real-world lost productivity due to whitespace in Python. Curly brackets, not so much. Never seen lost productivity over begin/end, but I'm sure it's happened. It seems silly to me either way; it doesn't really affect anything.
But yes, `end` closes `begin`, `for`, `function`, and `while`.
But, then what?
I could not use it anywhere I worked. The ecosystem was lacking.
Julia is good, but for what exactly?
People involved with Julia are always big on words, but when will I see it in use somewhere?
What has not been accounted for is that the huge community/network effect of the Python ecosystem is very far from exhausting itself. If anything, it is just starting, as the exponential growth has mostly been in the last few years (tautology, he he).
A major investment to eliminate python technical debt would make more sense if things were stagnant and the re-engineering would open up entirely new domains.
Some portions of the ecosystem are rock solid, especially the parts where JuliaComputing makes money from consulting (not all, but some). Other parts are beds of sand/permanent research projects. The median experience is usually: someone points you to a package, it doesn't really do what you hoped it would, so you end up adapting it and rolling your own solution to the problem. Maybe you try to make a PR and it gets rejected because of "not invented here"/academia mindsets; either way, you made a fix and your code works for you.
What makes this barrier hard to overcome for adoption is trust, and blind spots. People who aren't experts in a casual work area (maybe computer vision) realize they can't use a tool to do something "basic" and run away to easier ecosystems (R/Python). People who are experts in other areas check the credentials of packages, see that an Ivy League lead researcher made it, and assume it's great and usable for a general audience. So you'll get a lot of "there's a package for that," but when you go to use it you might find the package is barren for common and anticipatable use cases in industry (or even hobbies).
This makes Julia best positioned as a research tool, or as a teaching tool. Unfortunately, where Julia actually shines is as a practical tool for accomplishing tasks very quickly and cleanly. So there's this uncomfortable mismatch between what Julia could be and what it's being used for today. (Yes, Julia can do both; I'm not arguing against that.) The focus on getting headlines far surpasses stable, useful stuff. In fact, very often after a paper gets published using Julia, a package's syntax will completely change, so no one really benefits except the person who made the package.
Interestingly, one person (with some help, of course) fleshed out the majority of the ecosystem's need for interchange format support (JSON), database connections, etc. It's not like that person is jobless, spending all their days doing it; it was a manageable task for a single smart person to kick off and work hard to accomplish. Why? Because Julia is amazing for quickly developing world-class software. That is also kind of its detriment right now.
Because it's so easy to create these amazing packages, you'll find that a lot of packages have become deprecated or are undocumented. Some researcher just needed a one-off really quickly to graduate; maybe the base language (or other parts of the ecosystem) changed many times since its release. Furthermore, if you try to revitalize one of these packages you'll sometimes find a rat's nest of brilliance. The code is written very intelligently, but unpacking the design decisions to maintain world-class performance can be prickly at best.
One of Julia's strengths is that it's easy/clean to write fast-enough code. One of its downsides is that this attracts people who focus on shaving nanoseconds from a runtime (sometimes needlessly) at the expense of (sometimes) intense code complexity. Performance is important, but stable and correct features/capabilities mean more to the average person. After all, this is why people use, pay for, and hire for Matlab, Python, and R in the first place, right?
Most people don't want to have to figure out which ANOVA package they should use. Or find out in a bad way some weird bug in one of them and be forced to switch. Meanwhile in R: aov(...).
Do I blame Torch for not using Julia? No. Should they consider using it? Yes, absolutely. Does Julia's cultural issue need attention before risking Python(or anything else) reinventing a flavor of Julia that's more widely used for stability reasons alone - in my opinion, yes (see numba, pyjion, etc). Still love the language, because technologically it's sound, but there are blemishes. I'd chalk it up to growing pains.
(To be fair, Postgres has an extremely similar issue with JSON data types and it's doing fine.)
The state of tabular data formats is similar but instead of 2 libraries there are 20, and some of them are effectively deprecated, but they're not marked as deprecated so the only way to find out that you shouldn't be using them is, again, to ask a question about them in Discourse or Slack. You can check the commit history, but sometimes they'll have had minor commits recently, plus (to Julia's immense credit) there are some libraries that are actively maintained and work fine but haven't had any commits for 3 years because they don't need them. I assume this will get worse before gets better as the community tries to decide between wrapping Polars and sticking to DataFrames.jl, hopefully without chopping the baby in half.
I feel like the "not invented here" mindset contributes a lot to that fragmentation. It's easy to write your own methods for types from other Julia libraries because of multiple dispatch, which seems to have resulted in a community expectation that if you want some functionality that a core package doesn't have, you should implement it yourself and release your own package if you want to. So we have packages like DataFramesMeta.jl and SplitApplyCombine.jl, not to mention at least 3 different, independent packages that try (unsuccessfully IMO) to make piping data frames through functions as ergonomic as it is in R's dplyr.
Despite all of this, I still like the language a lot and enjoy using it, and I'm bullish on its future. Maybe the biggest takeaway is how impactful Guido was in steering Python away from many of these issues. (The people at the helm of Julia development are probably every bit as capable, but by design they're far less, um, dictatorial.)
Again, completely agree with the sometimes confusing state of the ecosystem. Sometimes I wish a bit of democracy existed, but people are people. I proposed some solutions to that problem a while ago but that's a story for another year.
Academia does create a very different kind of reward system that is often counter to community progress, i.e.: get there first, publish, obfuscate to thwart competition, abandon for new funding. It tends to reward people the most for not giving credit or sharing progress.
Meanwhile, people relying on alternatives to julia are more like: load in trusty xyz, use it in trusty way, I'll upgrade when it makes sense, and check the docs not the code when I am unsure of something.
Not to say industry is much better(I keep saying `academia`), but industry projects do tend to appreciate/honor free labor a little more kindly. That or they close the OSS gate and you get what you get.
Novelty is a driving force, but too much entropy and not playing well with each other can destroy a meaningful future quickly. It'll work itself out, one way or another but only because the technology is good :D.
With how quickly these frameworks change, it's overwhelming to keep pace! Anyone have advice for solid frameworks that can reasonably leverage GPUs without too much heavy lifting?
Y'all really just write procedural code for everything?
Another non-functional application of tail call elimination is finite state machines. Writing them as functions calling the next state in tail call position is very elegant, legible and efficient.
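For instance, a parity checker written in that style (a sketch; note that plain Julia does not guarantee tail-call elimination, so without a rewrite each transition consumes stack):

    # States as functions; the tail call *is* the state transition.
    # Accepts strings containing an even number of 'a's.
    even(s, i) = i > lastindex(s) ? true  :
                 (s[i] == 'a' ? odd(s, i + 1)  : even(s, i + 1))
    odd(s, i)  = i > lastindex(s) ? false :
                 (s[i] == 'a' ? even(s, i + 1) : odd(s, i + 1))

    @assert even("abba", 1)    # two 'a's: accept
    @assert !even("ab", 1)     # one 'a': reject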
Were it not for the Firefox and Edge teams, who torpedoed that feature, it would be a part of the major language of today.
Maybe it still will be.
I use Python plenty, just not in large enough doses that I have to actually make peace with it.
Julia code might also use a lot of in-place operations, which would be hard for a compiler to infer as safe.
Well, for example at the very least in Common Lisp you'll have much more joy with higher-order functions than with loops. The simple reason for that is the existence of compiler macros (http://clhs.lisp.se/Body/03_bba.htm) which can replace function compositions with arbitrary code. And it's much easier to figure out what the function composition does than to write a loop vectorizer.
Julia also has similar macro capabilities to common lisp.
> And it's much easier to figure out what the function composition does than to write a loop vectorizer
I probably agree, but complicated loop vectorisation would probably be expressed in the Einstein tensor contraction notation that Tullio.jl provides. I wonder how a Fourier transform would look using function composition.
    @tullio F[k] := S[x] * exp(-im*π/8 * (k-1) * x)  (k ∈ axes(S,1))
Maybe a new iteration construct specifically supporting parallelization would help here. Some of these things might already work quite well, for example it should not be difficult to discern from (let me use ITERATE's example)
    (iterate (for el in list)
             (finding el minimizing (length el)))
No, I AM going to write procedural code, and it WILL be faster than your "high IQ" 1 line recursive solution. Also funny to see how little recursion gets used in CUDA/Pytorch/GPU programming - which is what we are seeing to be more and more important over time.
If your language of choice features formalized modules in the Modula-2 sense, its loops most likely can't span multiple functions even within a single Modula-2-sense module, even if you wanted that (for example, in a state machine with named states as functions which tail-call each other to switch state by transferring control, or something like that).
That's true in many situations, but in this case the equivalence between the two is so straightforward that I can't see recursion gaining you any clarity. Perhaps you have a concrete example?
Sometimes it makes more sense and is more flexible to package functionality as a recursive function; then mutation and variables feel like clutter. Other times, using recursion instead of a loop gains nothing expressively. Or sometimes structural recursion + pattern matching is the much more elegant approach in terms of communicating and capturing the essence of the algorithm.
It works by inspecting the code and rewriting a function to turn tail calls into loops.
The interesting bit is that it was very easy to write because of the strong macros in Julia.
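To show what the rewrite amounts to, a hand-expanded sketch of the transformation for a single self tail call:

    # Before: self tail call
    countdown(n) = n == 0 ? :done : countdown(n - 1)

    # After the rewrite: the tail call becomes "rebind the arguments
    # and jump back to the top of the loop".
    function countdown_loop(n)
        while true
            n == 0 && return :done
            n = n - 1
        end
    end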