Mojo might be the biggest thing to happen in programming for decades (fast.ai)
367 points by ZephyrBlu on May 4, 2023 | 195 comments



After my experience with Swift and the boneheaded decisions made in the lower-level design of the language to make it more 'ergonomic' at the expense of compile speed and debug-ability, I worry about Chris Lattner making the same mistakes again.

Swift literally took 10x more time to compile than equivalent Objective-C code for not much benefit when it first came out, along with a debugger that to this day is significantly slower, less reliable, and buggier than the Objective-C one. Swift compile times also do not scale linearly when you throw more cores at the problem, unlike C, C++, and Objective-C, and linking is pretty much solved now with the mold linker.

There are (or were) huge chokepoints in the compilation pipeline that caused many of these regressions; when I learned how minor the benefits they bought were, I facepalmed. So many expensive person-years of life wasted for so little benefit.

80% of the benefits of Swift that were cheap to implement could've been layered onto something equivalent to Objective-C for little to no compile-time penalty, with the new ergonomic language syntax such as strong nullability and enum ADT types. To this day, you still have to codegen mocks. It's frustrating.

I would wait and see what benefits Mojo actually brings, and I hope that Chris and the team there have learned from their mistakes with Swift and choose compile speed over little features that many could live without.

I also hope they use this as an opportunity to solve Python's horrible package management situation and copy ideas from Rust's Cargo liberally.


I have had literally the same experience with slow, overly complex type systems and too much sugar that you're pointing out. I've learned a lot from it, and the conclusion is "don't do it again". You can see a specific comment about this at the end of this section: https://docs.modular.com/mojo/notebooks/HelloMojo.html#overl...

> Mojo doesn’t support overloading solely on result type, and doesn’t use result type or contextual type information for type inference, keeping things simple, fast, and predictable. Mojo will never produce an “expression too complex” error, because its type-checker is simple and fast by definition.

It's also interesting that Rust et al. made similar (but also different) mistakes and have trouble scaling compile times. Mojo has a ton of core compiler improvements as well, addressing the "LLVM is slow" sorts of issues that "zero-cost abstraction" languages have when expecting LLVM to do all the work for them.

-Chris Lattner @ Modular


Threading the needle between "making the same damn mistake over and over" and "second system syndrome" is hard. I really really hope Mojo can do it! And for my favorite language Python too, that's a nice little bonus.


> Mojo has a ton of core compiler improvements as well, addressing the "LLVM is slow" sorts of issues that "zero-cost abstraction" languages have when expecting LLVM to do all the work for them.

This sounds super interesting. Is there a good write-up about this, or about which type system features tend to make languages like Rust/Swift slow to compile? Is it constraint-solving stuff? Exhaustive pattern matching?


We are a bit overwhelmed at the moment, but sure, this is something we'd eventually give an LLVM devmtg talk about or something. Broadly speaking, there are three kinds of things that matter:

1) Type system. If you go with a constraint-based Hindley-Milner (https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_sy...) type system and then add things like overloading etc., you quickly get into exponential behavior. Swift has been fighting this for years now to tamp down the worst cases. This isn't good; don't do that.

2) Mid-level compilation model. "Zero-cost abstraction" languages including C++ (but also more recent ones like Swift and Rust and now Mojo) rely on massive inlining and abstraction elimination to get "zero cost" behavior. LLVM can do inlining and simplification and all the other stuff, but it works at a very low level of abstraction - generating a TON of IR and then asking LLVM to delete it for you is very slow, and shows up in the profile as "llvm is slow" because you're creating all the LLVM IR nodes ("LLVM") and then asking LLVM optimization passes to delete them for you. It is far better to not generate it in the first place.

3) Code generation: Reasonable codegen is O(N^2) and worse in core algorithms like register allocation and scheduling. This is unavoidable if you want high quality of results (though of course, llvm is also far from perfect). LLVM is also very old and single threaded. If you don't do anything to address these issues, you'll be bottlenecked in this phase.

These all have good solutions, but you have to architect your compiler in anticipation of them. Mojo isn't just a syntax or PL step forward, it is a massive step forward in compiler architecture.

-Chris


Chris, you are so generous for explaining this stuff, thank you.

Quick question about #2 that I may not be picking up on, but how is it possible to avoid generating the "TON of IR" in the first place? (I'd live with a link to further reading if you're unable to go into details). Thanks!


[Not him of course] — My understanding is that instead of generating a ton of (LLVM) IR, you generate a little (MLIR) IR, because MLIR lets you define operations at higher levels of abstraction, suited to particular tasks. For instance, if you're doing loop optimizations, instead of crawling through a sea of compare and branch and arithmetic operations, you'd just use a higher-level ‘for’ operation¹. Only after you've done everything you can at the high level do you move down to a more concrete representation, so you hope to end up with both less LLVM IR and less work to do on it.

¹ e.g. https://mlir.llvm.org/docs/Dialects/Affine/#affinefor-mliraf...
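To make the "generate less IR" point concrete, here is a toy Python model of the idea. The op names are loosely borrowed from MLIR, but the structures and counts are invented purely for illustration:

    # Toy model: one high-level loop op vs. the same loop lowered to
    # branch-level ops. (Illustrative only -- not real MLIR data structures.)
    high_level = [
        ("affine.for", {"lower": 0, "upper": 1024, "body": ["load", "mul", "store"]}),
    ]

    def lower(ops):
        """Expand each high-level loop op into branch-level pseudo-ops."""
        out = []
        for name, attrs in ops:
            if name == "affine.for":
                out += ["alloca_iv", "store_init"]                    # induction variable setup
                out += ["label_header", "load_iv", "cmp", "cond_br"]  # loop test
                out += attrs["body"]                                  # the actual work
                out += ["add_iv", "store_iv", "br", "label_exit"]     # increment + back-edge
            else:
                out.append(name)
        return out

    print(len(high_level), "high-level op ->", len(lower(high_level)), "low-level ops")

A loop optimizer can pattern-match the single high-level op directly, instead of first re-discovering the loop structure from the expanded compare/branch soup and then deleting most of it.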


Ah, makes sense - they can do more with an intermediate IR (MLIR) and not make LLVM churn through a ton of unnecessary nodes. Thanks!


And generating a ton of IR leads to large binary sizes, but that is something you see in C++ too, with template specialization.


Do you see overloading as a must-have?


At this point (and after seeing the same thing working with Deno), I would love it if the Swift team created some opt-in swift-2.0 mode that would get rid of the bits of syntax sugar that make Swift slow to compile.

I'd rewrite some parts of my apps in a few days and never look back.


Hope it works out!


> compile speed and debug-ability

I wish people would prioritize this more.

With every new language that comes out, sure, it brings a few interesting features, but it always takes 5+ years for IDE support, a good debugger, and compile speed (if ever) to arrive.

Good debugging tooling and compile speed probably outweigh all the other benefits in the long run. Rust is a good example; JavaScript transpiling too. And once those arrive, people will probably be onto the next new language that barely has IDE syntax highlighting ready.

> Python's horrible package management

It really is terrible. The import system is so convoluted, but I guess it's quite old.

Python really needs a good monorepo-friendly package manager like Node's pnpm.

Even though the ESM transition is a clusterfuck, the new module system is actually quite nice and straightforward if you don't need to touch older code.


Go does very well here (maybe less so on the debugger, but the simplicity of the language makes me lean on the debugger a lot less than in more complex languages).


Perhaps other systems programming languages can learn from D, whose reference compiler, DMD, compiles itself and the entire standard library in less than 5 seconds, with parallelism turned off [1].

[1] Ask HN: Why do you use Rust, when D is available?

https://news.ycombinator.com/item?id=23494490


>"at the expense of compile speed and debug-ability"

To me, reduced "debug-ability" is one of the primary reasons to ignore a language, however "beautiful" it is.


> Swift literally took 10x more time to compile than equivalent Objective-C

Same with C++ vs. C


That depends on the codebase and compiler. Anyone who used Metrowerks C++ back in the '90s can attest that C++ compilation doesn’t have to be slower than C. Swift has baked compilation slowness into the type system, and it will never be as fast to compile as C/C++ on the same hardware. The compiled code might be faster, though.


Don’t C/C++ builds scale linearly with more cores because you just compile different compilation units on different cores? Wouldn’t that be the same for Swift? Note, I’ve never used Swift.


Nope, it doesn't work that way for Swift: work has to be repeated across multiple cores per file within a single module, so it becomes less compute-efficient the more cores you add - unless you force everything onto a single thread with WMO (whole-module optimization) mode, but then you have to manage your build graph very carefully to stay compute-efficient. It could've changed since I last looked, but things like that are still lurking around.

https://github.com/apple/swift/blob/main/docs/CompilerPerfor...


But the same also applies to C++. A boatload of my compile time is spent expanding and parsing the same header files again and again in different compilation units. This can be mitigated with precompiled headers, yeah. But then another boatload of the compile time is spent instantiating the same templates in different compilation units again and again and again.

The fact that the language is cursed from the beginning just gives an illusion of "linear scaling".


Not if extern templates are used, at least for the common types.


Swift has this weird thing where two files next to each other can use the symbols from their neighbors without any import. I find this... disturbing.


I think that is fine if the language also does directory = namespace


It does not in Swift's case. You have to set the namespace/module/library manually with the set of files, and they never really embraced public submodules, which might've reduced this.


Jeremy here. Thank you for sharing my article on HN! I've used more programming languages than I can count since I started coding 40 years ago, and it's not often I've gotten really excited about a new language. So the fact I've written this should tell you how special I think Mojo is.

It's early days of course - Mojo is still a prerelease, which means it's only available in an online playground and it's not open source yet (although that's planned). But for such a small team to create something like this so quickly gives me a lot of confidence for what's coming over the next few months.

Let me know if you have any questions or comments.


It seems like Julia already had a pretty good solution to the two language problem as well as being designed from the ground up for numerical computation (Python has numpy, but it seems bolted on and clunky by comparison). Yes, there are issues with large executables (large runtime), non-optimal gpu kernels, etc, but it seems like many of Julia's nagging issues could have been fairly easily solved if it had received as much investment as some of the alternatives have received - Modular appears to have a lot of funding to develop Mojo, and frameworks like PyTorch and TensorFlow have the backing of mega tech corporations. At one point Swift for TensorFlow was going to be the next big ML language and was being funded and developed by Google.

I'm not sure there's a question here... more of a frustration that Julia seems to be a very good solution already (and is more mature than Mojo) and yet it seems to get passed over because companies decide to fund some new(er) shiny thing instead.

BTW: I think your matrix multiplication demo could achieve very similar performance (including the SIMD, parallelization and LoopVectorization) in Julia.


"get passed over because companies decide to fund some new(er) shiny thing instead" is not a fair or reasonable assessment. Why would a startup that's racing against time and money decide to create a new language when something already exists that can meet their needs with just a little tweaking?

My post embeds my recent keynote at JuliaCon in which I lay out some of the shortcomings of the language, as I see it.

MLIR didn't even exist when Julia was created. A language that can really take advantage of MLIR, and is designed from the ground up for modern accelerators, is not something that can be readily created by tweaking Julia AFAICT.


> Why would a startup that's racing against time and money decide to create a new language when something already exists that can meet their needs with just a little tweaking?

Have you MET programmers? The idea that we/they would do this is not anywhere close to surprising.


Exactly. Lattner himself has been involved in some of this - Swift for TensorFlow comes to mind. I'm old enough to remember when Swift for TensorFlow was going to be the new ML language hotness: https://www.youtube.com/watch?v=drSpCwDFwnM and https://www.youtube.com/watch?v=s65BigoMV_I


> MLIR didn't even exist when Julia was created. A language that can really take advantage of MLIR, and is designed from the ground up for modern accelerators, is not something that can be readily created by tweaking Julia AFAICT.

MLIR is another level of intermediate representation that sits at a higher level than LLVM IR. Should the Julia folks decide that MLIR would offer advantages for optimization, it wouldn't be difficult for them to add an MLIR intermediate stage - that's kind of the whole point of an IR. But again, that would take some development effort - if the kind of investment that Mojo is getting were available for Julia, it would get done. Nothing about the language itself would need to be impacted; it's the middle-end and backend where MLIR can offer advantages.


Disagree that Julia solves the two-language problem, although it was a major step in that direction. It may currently solve it for specific roles (e.g. data scientists), but it does not solve it for general-purpose programming.

One specific example? Robotics and autonomous systems frequently use ahead-of-time (AOT) compiled binaries and libraries. You don't want your quadrotor / RC car / self-driving Tesla trying to perform JIT compilation while operating in time-sensitive scenarios. AOT compilation is a major Achilles heel for Julia--it's possible (with major limitations), but feels more like an afterthought than a core feature.

I can't see Julia scaling to general-purpose programming until its AOT compiling capabilities improve. JAX has some of the same drawbacks. I'm very interested to see if Mojo can fill the gap of AOT compilation with a reasonably simple syntax. (And yes, I am aware of Nim--I do wish its scientific ecosystem was more developed though).


This is a very good point. You must support AOT compilation if you are going to have a general-purpose system. This was clear when Julia came out; I interacted with its core developers, communicating our experience creating SciPy and its critical reliance on the AOT features of Fortran/C/C++ as well as the bindings to Python. I believe simpler spellings can be achieved (i.e., more unification between the scripting, dynamic JIT, and foundational AOT use cases), but the ecosystem is not close to a one-language-to-rule-them-all scenario.


I've always felt that the choice to 1-index and favor mathematics over engineering has kept the engineers away.

You can build a fantastic language for mathematicians, but people outside that domain won't feel welcome or compelled to use the language to scratch their itches.

It'll just sit there. For math folks.


I don’t think “mathematicians” is the right category. More like scientists/engineers who would otherwise use Fortran and Matlab (both have 1-based indexing) to do number crunching. That’s actually a lot of people writing quite a bit of code, though. It’s just that they don’t work in the software industry, and usually for them the code is not the work product.

I think the aspiration was to provide something that a Matlab user can immediately pick up, that is almost as fast natively as Fortran, that also provides features to do good modern software engineering if you want to. And it succeeds amazingly at this.

In my own view, adoption has not been wider because 1) most people don’t need speed so the trade offs with the compiler + using a weird language aren’t worth it, and more importantly 2) outside of some incredible software gems, the package ecosystem tends to be flaky and maintained by overworked academics in their spare time.


> outside of some incredible software gems, the package ecosystem tends to be flaky and maintained by overworked academics in their spare time.

That was the point I was trying to make.

This language is too esoteric and leafy for most hackers / engineers / enthusiasts to spend their spare time building support packages. They'll pick up something like Nim or Rust as a new language before they'll look at Julia.

By catering to mathematicians, scientists, and engineers, the broader population of software folks were excluded. Or, if "excluded" is too harsh, they at least weren't incentivized.

Because of this, the best parsers, serializers, protocol implementations, scrapers, web servers, and other assortment of important technologies get written in other languages instead.


I think you are right about this -- I just wanted to draw the distinction between mathematicians and the much larger population of scientists/engineers, who really do write a lot of code which is in its way "serious computing" even if it doesn't deal with any of the concerns and practices of "good software engineering".

(I also think Julia would be okay with the current level of support for the "computer sciencey" technologies you describe like parser and web servers and so on -- it's pretty good at calling out to C or C++ if needed. But even many of the scientific/numerical packages don't have the maintenance resources they really need, so are often "good but flaky". This situation does seem to be slowly improving though.)


I've never understood why some people seem to care so much about 1- or 0-based indexing. Literally who cares? I cannot comprehend actually refusing to use an otherwise-perfectly-good language just because the indexes happen to start at 1.


People care about different things. There are probably people who don't give a rat's behind about stuff you care passionately about or that irks you in some significant way. Does that invalidate your caring?


Because it’s wrong. Combinatorics taught me that, and I’ll be damned before I ignore what the universe keeps screaming at me


"How do you xor two numbers in Julia?"

"a ⊻ b"

"... am I supposed to type (or copy paste) that symbol every time I want to xor two numbers? There must be a better way right?"

"Sorry, no."

I immediately gave up the language after this. Too bad, I really loved it and was looking forward to progress in DL stack in Julia at that time. Too math-y.


You can always just use `xor(a,b)`. Julia makes sure that there aren't any unicode operators that don't have an ascii equivalent (in Base at least). Also, most editors will allow you to type ⊻ as `\xor` and tab complete to ⊻.


So you gave up the language after someone gave you a flat out wrong answer to a single question? That's too bad.


> "... am I supposed to type (or copy paste) that symbol every time I want to xor two numbers? There must be a better way right?" > "Sorry, no."

This is just false. From https://stackoverflow.com/a/60173810/2990344

    help?> ⊻
    "⊻" can be typed by \xor<tab>


Most humans are used to having more letters than keys anyway.


Julia has built in support for special characters like that by typing \charname<tab>. You have to remember the name but it's not all that hard to type.


If this is why you stopped using the language then you can't have spent more than an hour on it. \symbol<tab>


I assume you mean specifically software engineers?

Most traditional engineers are quite familiar with Matlab and Fortran.


Julia didn't really gain momentum because its syntax is off-putting. When we started our first ML project in 2013, we tried R, Julia, and Python, and even though Python's ML libraries were rough at the time, we chose it because none of us liked the other languages' syntax, coming from C++/Java backgrounds. Then we fully fell in love with Python.


2013 was early days for Julia.


It still has the same syntax.


Ergonomics-wise, what holds Julia back, other than the deployment story, is that they refuse to adopt Nim's uniform function call syntax approach, where f(a, b) can be written as a.f(b).


You can use my fork of the VS Code Julia extension for this: https://github.com/xgdgsc/julia-vscode/releases/tag/v1.41.1_...


Do you mean D's uniform function call syntax approach?


Why is that better?


Familiarity and, more importantly, discoverability:

If you type a. in the REPL or IDE, you should get a list of all functions applicable to the object.

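As a toy illustration in Python (the wrapper and names here are invented, just to show what mapping a.f(b) to f(a, b) buys for tab completion):

    import math

    class UFCS:
        """Toy wrapper: expose the free functions of a namespace as methods,
        so x.f(y) forwards to f(x, y) -- roughly Nim/D-style call syntax."""
        def __init__(self, value, namespace):
            self._value = value
            self._ns = namespace
        def __getattr__(self, name):
            func = getattr(self._ns, name)          # look up the free function
            return lambda *args, **kw: func(self._value, *args, **kw)
        def __dir__(self):
            # discoverability: tab completion now lists applicable functions
            return [n for n in dir(self._ns) if callable(getattr(self._ns, n, None))]

    x = UFCS(2.0, math)
    print(x.pow(10))          # same as math.pow(2.0, 10) -> 1024.0
    print("sqrt" in dir(x))   # True: the REPL can now suggest it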

Familiarity changes rapidly in the world of software development. But I do agree about discoverability.

However, I think the issue is rather that we need to find new discoverability systems that work for multiple dispatch rather than limit our languages to single dispatch because we can't come up with better ways to achieve discoverability.

Maybe (a,b)f would be better syntax.. or we could start writing backwards ;) but now we're getting into seriously unfamiliar territory.

Multiple dispatch might actually be superior for discoverability when we get this right because you can filter the method list based on all arguments rather than just the first one.


Julia is not an OO language, and its functions dispatch on all function arguments, not just the first one.


I'm pretty sure Nim copied that from the D language.

As they always say, imitation is the sincerest form of flattery.


Julia's syntax is weird.


As if that poses any problem to R, C++, Perl,...


R -> dying, Perl -> dead, C++ -> also dying. Although C++'s existing size will make its death a long-drawn-out process.


i've been reading about the upcoming death of perl and C++ for 25+ years now.


All the programmers are slowly dying too, so maybe it's a good fit.


A matter of opinion.


python's syntax is whitespace-sensitive and doesn't have curly braces around blocks

they did finally add assignments in expressions and conditional expressions, but every time i show a list comprehension to a java programmer they glaze over

so i guess some weird syntax is okay
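for example (a minimal snippet of my own, with the loop a java programmer would write for comparison):

    # squares of the even numbers 0..9, as a comprehension...
    evens_squared = [n * n for n in range(10) if n % 2 == 0]

    # ...versus the explicit loop
    evens_squared_loop = []
    for n in range(10):
        if n % 2 == 0:
            evens_squared_loop.append(n * n)

    assert evens_squared == evens_squared_loop == [0, 4, 16, 36, 64]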


> python's syntax is whitespace-sensitive and doesn't have curly braces around blocks

That's a good thing.


Not if you want to try out a code fragment by copying it from the editor into the REPL.


jupyter solves this, but also you can prefix it with

    if 1:
and remove all its internal blank lines and add a blank line after it
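for example, to paste this fragment (a made-up one) into the plain repl as a single block:

    if 1:
        total = 0
        for n in range(5):
            total += n
        print(total)  # 10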


More tedious editing when I just want to quickly try out a short code fragment. Perfect solution! So pythonic and full of "zen".


i didn't say it was a bad thing

my comment makes no sense if you read it assuming i think it's a bad thing

what it is is a weird thing


i thought you were being ironic with your own unique english syntax

but then i checked all your comments and sure enough it is pythonic english

weird syntax is okay and some punctuation may be redundant but it sure makes life easier for the eye


> they did finally add assignments in expressions and conditional expressions

Conditional expressions were added in Python 2.5 in 2006.


yes, after fifteen years

assignment in expressions took even longer


How are Python list comprehensions in any way weirder than Java streams?


java streams don't exist in java syntax

we were talking about weird syntax, not weird semantics


List comprehensions exist in so many languages. Are we talking about those "Java" programmers who have only ever coded in Java?


who cares about what java programmers think lol whatever they like should be treated with extreme suspicion

> python's syntax is whitespace-sensitive and doesn't have curly braces around blocks

that's a plus. the proof is in the pudding. one language has exploded. others haven't. the end.




Hi Jeremy, thanks for sharing this, hope it works out well. Also, thanks for the past work on DAWNBench etc. that y'all did. It really helped push an era of speedy deep learning to the forefront and helped stoke the flames of my passion for this particular subfield of research. fast.ai really put out a ton of good work that inspired and helped me a lot. I'm generally stuck on the linear version of OneCycle, but I've referred a lot to y'all and the community over the years when building out my toolbox.

In any case, you have more years under your belt than me in the software field, so I'm sure you have some of the standard skepticisms too; hopefully it works out really well. I will be keeping an eye on this software, it seems promising, especially as I tend towards the more esoteric methods that tend to break the existing deep learning tooling out there.


I really appreciate you saying that! I'm glad it helped you.


Thanks, Jeremy. It does sound exciting. It reminds me a bit of what Blaze hoped to offer (a unified interface, but to diverse data storage rather than processing systems) but which never came to fruition. And folks have been talking about needing an ML/AI-centric language that could feel more natural and expressive than the abstractions that TensorFlow, PyTorch, and JAX provide. Maybe Mojo is it.

To get full performance though, you can't write just Python. As you show in your demo, you have to add verbose typedefs, structs, additional fn/defs, SIMD calls, vectorization hooks, loop-unrolling, autotune insertions, etc.

While great, this adds mental overhead and clutters an otherwise concise and elegant syntax. Do you think this syntactic molasses will become second nature to developers? Will IDE tools make writing it easier?


> To get full performance though, you can't write just Python. As you show in your demo, you have to add verbose typedefs, structs, additional fn/defs, SIMD calls, vectorization hooks, loop-unrolling, autotune insertions, etc.

As an engineer, it feels like you can never escape this. Speaking for myself, I don't want to!

Zooming in & out of different syntaxes that are tuned to that local context seems to be the better developer experience.

That’s what I felt when I saw mixed JSX (HTML & JavaScript) or mixed Swift & ObjC & C++ code. Though, tbh, sometimes mixing stuff in does come with baggage that's hard to jettison.

Regardless, as a software engineer who works with ml engineers, it is horrendously painful not having a unified systems language that helps build the model and deploy it into prod. Putting the ML scientists in charge of deploying to production by learning about SIMD calls, vectorization hooks, structs & typedefs is something I welcome.


Most of what you’re calling syntax isn’t syntax, it’s just extra code that allows you to be specific/concrete about how certain types and code should operate.

When it comes to types, once you’ve written the optimized type definition, you don’t have to think about it as much when actually using it. It doesn’t add extra clutter either.

As far as the other things, there isn’t really much extra to type other than being specific about what kind of optimization you want the code to use… vectorization, unrolling, etc.

Also I think it’s worth pointing out that AIs will be writing more and more code for us in the future. So assuming we’re writing much code at all in the future, it will probably be in as simple of a form as possible (like python) and then we can ask the AI to write whatever performance annotations it thinks will be effective.


I am curious about a few syntactic details:

Why is the & (reference) argument convention a suffix? That seems inconsistent with Python’s *args and **kwargs, and the keyword conventions.

What is the difference between the “owned” and “consuming” conventions?

And why is the ^ (consume) operator postfix? Python doesn’t have any postfix operators, and it seems awkward wrt the ^ infix xor operator.


I'd love it if you could ask technical questions on the discord channel, as that's more scalable for us. Some quick answers:

1) We're likely to change the reference syntax to inout: https://github.com/modularml/mojo/issues/7

2) They're the same thing. "Consume" is the word we're currently using for the operator; "owned" is the argument convention. We may need to iterate on terminology a bit more.

3) Because it composes properly with chained expressions: `x.foo().bar().baz^.do_thing()` vs something like `(move x.foo().bar().baz).do_thing()`


I worry a bit that postfix use of ^ is going to create some headaches with people expecting exponentiation.


Can you link the Discord channel? I've seen it mentioned several times but haven't been able to find it.

Edit: found it. https://discord.gg/modular


This is very exciting. I saw you mentioned Numba and Cython, but any comments on PyPy? How about interfacing Mojo's LLVM+MLIR with PyPy? Would that make it more compatible with the Python ecosystem, since 90% of the Python ecosystem works on PyPy? I'm afraid I'm not making much sense, but this is what I've been waiting for in the 13 years since I first touched Python 2.5.


[flagged]


The broader context of the target audience of this language is very important here — it’s oriented towards the familiarity and needs of a community most comfortable with Python.


The two biggest things that have happened in programming for decades are:

1: StackOverflow

2: ChatGPT

Nothing else comes close. Certainly not any new language.

WHY are these two things the biggest developments in programming? Because they help developers solve problems and learn new things from a global accumulation of programmer knowledge.

If there was a Nobel Prize for computing then both StackOverflow and ChatGPT would be deserving of it, or the people behind them anyway.


My bad - I meant to write "Mojo may be the biggest programming language advance in decades" but forgot to write "language". I've updated the title now. (Although it doesn't update the HN title unfortunately.)


Only dang can update titles here.


This.

I am actually super interested in and positive about Mojo, but I can't take the title seriously. Not even close. As is, it is basically yet another new language (completely distinct from Python) that looks and feels kind of like Python and hopes to convert a bunch of users, but is not really Python, and embeds CPython in the same way that Swift (for TensorFlow) already could (and a lot of random software also does, to use Python as its scripting engine).


What is meant by "decades"?

GitHub?

The cloud?

GPUs? Multi core CPUs?

Subscription models?

SPAs? Ajax?

VC funding?

Mobile phones?

I don't think you appreciate what has happened over the last several decades. I'm sure I'm missing a ton of things as well. StackOverflow and ChatGPT probably wouldn't even make my top-five list for the last 3 decades. If you look 3 decades out, I'm sure there will be some AI stuff in there, but it is not even close to making the list right now IMO.


ChatGPT was released 5 months ago; although it seems very important, I don't think we can fairly assess its legacy in this short a time.


Statically typed functional programming is number 2 for me

And stackoverflow is number 1 :)


Seems a little flavor-of-the-month. Why not Git/GitHub over ChatGPT, considering its influence on ChatGPT itself?


Version control dates back to the 60s and early 70s. Not really the last decades, but literally the infancy of modern computing


Sure, and helpful websites are quite old as well, but nothing is quite as prolific and consolidated as StackOverflow and GitHub. The metric seems to be subjective relevance, not originality.


if anything owes its success to being flavor of the month, it's git. mercurial was probably a better dvcs in the end, and svn wasn't all that bad before both of them. and as impactful as github has been, we had svn-based online open-source communities before git too. git/github was just there at the right time/place imo.


StackOverflow is just expertsexchange without the dark patterns.

An incremental improvement with an enormous impact.


And without the unfortunate way to misread the word boundaries in expertsexchange!


And Google is just AltaVista, a bit different.


3. Typescript

4. React

5. Rust


Which are y'all disagreeing with?


I would put HackerNews in that list as well.


> 1. Mojo hasn’t actually achieved these things, and the snazzy demo hides disappointing real-life performance, or

> Neither of these things are true.

> The demo, in fact, was created in just a few days

Not to detract from this - I'd trust anything Chris does - but anyone who follows similar projects (TruffleRuby comes to mind) knows that it's very easy to implement a tiny subset of a highly dynamic language and make it fast; it's insanely hard to scale it all the way up to the most dynamic of behaviors.

The lines above read to me as a bad sign that the author of the article isn't being very honest. It definitely is a snazzy demo hiding real-life performance, because one way to hide performance is to implement a very small subset and make it fast.


Indeed, you're totally right about that. That's the trick with Mojo: our goal is not to make dynamic Python magically fast. Yes, we are quite a bit faster at dynamic code (because we have a compiler instead of an interpreter), but that isn't by relying on a 'sufficiently smart' compiler to remove the dynamism; it is just because "compilers" instead of "interpreters".

The reason Mojo is way, way faster than Python is that it gives programmers control over static behavior and makes it super easy to adopt incrementally where it makes sense. The key payoff of this is that the compilation process is quite simple, there are no JITs required, you get predictable and controllable performance, and you still get dynamism where you ask for it.

Mojo doesn't aim to magically make Python code faster with no work (though it does accidentally do that a little bit), Mojo gives you control so you can care about performance where you want to. 80/20 rule and all that.

Mojo also has a more advanced ownership system than Rust or Swift, which is also pretty cool if you're into such things. Check out the programmer's manual here for more info: https://docs.modular.com/mojo/

-Chris Lattner @ Modular


Since it's important to get important things right in the beginning:

Please make Optional[T] as ergonomic as Swift and Kotlin with a simple '?'.

Please use a general Vector pipeline instead of manual SIMD yuckness. Autotune was neat for the demo but should be hidden inside the compiler as quickly as possible.

Please use 'import' to import Python (and Mojo dependencies, with uppercase if necessary)
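For reference, here is today's Python spelling that the '?' sugar would shorten; a plain Python sketch with invented names, nothing Mojo-specific:

    from typing import Optional

    def find_user(user_id: int) -> Optional[str]:
        users = {1: "alice", 2: "bob"}
        return users.get(user_id)  # None when absent

    # Without sugar, every use site needs an explicit None check:
    name = find_user(3)
    greeting = name.upper() if name is not None else "GUEST"
    # Swift/Kotlin-style sugar compresses this to something like name?.upper() ?? "GUEST".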


I'm a fan of optional and other algebraic data types; you don't need to convince me. It's in our roadmap doc! :-) OTOH, Swift has way too much special-purpose sugar, which I'd prefer to reduce this time around; there are ways to have the best of both worlds - ergonomic and extensible.


Does this mean Mojo will have something like Rust’s Try trait? :D


Remaining faithful to Python's syntax should certainly help avoid too much special purpose sugar. Most Python enthusiasts seem to be on low sugar diets.


Although that's true, TruffleRuby and GraalPy show that you can actually make highly dynamic languages extremely fast with a powerful enough JIT compiler, which Graal/Truffle actually are. The hard part those projects are wrestling with is being 100% compatible with the entire ecosystem, warty native modules that abuse interpreter internals and all the rest. But they have the tech and manpower to do it.

So one of the key questions for Mojo is going to be this: how does it or will it compare to GraalPy as it matures, which should be able to run untyped Python at high speed, potentially even multi-threaded, with existing native extensions.

It can be that maybe the advantage is predictability, lack of warmup, that people in the AI space just prefer stuff made by a startup, that Mojo solves other pain points like packaging, all sorts of things. But it can also be that Python took off in ML exactly because it's untyped and people don't want to deal with the hassle of a type system if they don't need to.

The alternatives section of the blog post doesn't mention Graal/Truffle at all, suggesting that this is a bit of a blind spot for them. But these are well funded projects that have already yielded many-orders-of-magnitude speedups for existing code. It's not necessarily a good idea to write them off.


I love this part:

"The next step was to create a minimal Pythonic way to call MLIR directly. That wasn’t a big job at all, but it was all that was needed to then create all of Mojo on top of that – and work directly in Mojo for everything else. That meant that the Mojo devs were able to “dog-food” Mojo when writing Mojo, nearly from the very start. Any time they found something didn’t quite work great as they developed Mojo, they could add a needed feature to Mojo itself to make it easier for them to develop the next bit of Mojo!

This is very similar to Julia, which was developed on a minimal LISP-like core that provides the Julia language elements, which are then bound to basic LLVM operations. Nearly everything in Julia is built on top of that, using Julia itself."

I've always wanted to create a new language (probably visual) where the implementation of that language is open to being extended and changed right in the same context where you use that language. A readily accessible trapdoor down to the next lower level of abstraction. I suspect this leads to more "expressivity" while being less code overall.

Sounds like Julia (as mentioned) has proven this is true, and perhaps Mojo will too.


> I've always wanted to create a new language (probably visual) where the implementation of that language is open to being extended and changed right in the same context where you use that language. A readily accessible trapdoor down to the next lower level of abstraction. I suspect this leads to more "expressivity" while being less code overall.

It sounds like you would find Self [1] interesting (along with the visual programming environment).

[1] https://en.wikipedia.org/wiki/Self_(programming_language)


Yes, Self is very cool--I've read about it before.


I wish the article would also mention Nim, a Python-like language that compiles to native, and compare it to Mojo.

Also, this isn't quite correct:

> There is also the approach taken by Go, which isn’t able to generate small applications like C, but instead incorporates a “runtime” into each packaged application. This approach is a compromise between Python and C, still requiring tens of megabytes for a binary, but providing for easier deployment than Python.

Compiling Go with gccgo instead of the Go compiler, and with the right flags, results in a much smaller executable.

It is also possible to further compress the executable with upx.

This command should result in executables around 0.5 to 1.5 MiB in size, on Linux (depending on the size of the Go project):

    go build -mod=vendor -gccgoflags '-Os -s' -o prog && upx --best --lzma prog


> I wish the article would also mention Nim, a Python-like language that compiles to native, and compare it to Mojo.

I would not say that Nim is “a Python-like language”. The only significant similarity is in the usage of indentation instead of curly brackets. To me, it is much more similar to Pascal/Ada/Oberon, but less verbose.


I have not programmed much in Nim, and I think you are mostly right, but Nim also has keywords and concepts not found in Pascal/Ada/Oberon that clearly come from Python, like "yield" and iterators:

    iterator iota(n: int): int =
      for i in 0..<n: yield i


I think the 1977 Icon language is where the generator concept originated. It had many other cool features I have not seen in other languages since: https://en.wikipedia.org/wiki/Icon_(programming_language)#


It's a superset of Python with a faster runtime. There exist dialects of other languages that seem analogous.

The article introduces it with a video instead of just getting to the point and showing code.

Yeah idk. Looks cool I guess.


The video is a code walkthru. The code is here:

https://docs.modular.com/mojo/notebooks/


Nice article. However, you mentioned in the other post that you are an advisor to Modular, so maybe it should be regarded as an advertisement.


I didn't get paid to write this article, no one at Modular asked me to write it, and I don't have any equity in Modular, so there's no financial upside for me in writing it.

I agreed to become an advisor to Modular after I saw what they're working on and thought it looked great.


Still a massive conflict of interest that should be declared.


He does. It says in the sidebar "I’m an advisor to Modular"


It’s declared in a footnote.


The title is obviously too hyperbolic, as the article itself doesn't even try to back the claim. It's just a good introduction piece. (The official doc is a little bit too scattered to read through.)

Although I like the basic concept, it feels awkward that the engine is bound to the language, which by itself is also bound to a specific domain. Unless Mojo becomes fully compatible with CPython, the language will be just another DSL, which won't get adopted widely. Worse, MLOps is already a huge mess, so I can't imagine migrating to a language with limited uses.


It's interesting to read, but I'm not very convinced by the added value compared to using Cython, building modules in Fortran or C, or using interop with Rust or Nim.

In particular, I think the need to manage memory manually, without the option of a GC (or some other automatic memory management), is a mistake. While I understand that you may need to manage memory manually to squeeze out maximum performance in some cases, the performance shown in benchmarks by D, Nim, Julia, and C# demonstrates that you can have a very good compromise between speed and ease of use with automatic memory management.


Let's see if this doesn't turn out to be yet another "Swift for TensorFlow" - remember when fast.ai used to say the same about that?


I agree that Jeremy was excited about Swift being a possible option for solving the "two languages" problem, but to be fair there weren't many options for ML programmers at the time.

Also, I think "Swift for Tensorflow" was doomed from the start. Swift doesn't really exist outside of Apple's ecosystem, and it was competing with Python which is everywhere, and was a top 5 language even then.

Making a superset of Python is a winning strategy because people can try it out and walk away if they don't like it without much effort. It worked for C with Classes (which became C++) and TypeScript, even though it took a little while for them to gain popularity and mass adoption.


I understand that there might be some benefit in developing Mojo in a small group. But I really wish there were some source code to compile, or at least binaries to execute. Currently it is locked behind a wait-list that might give access to executing code on a remote server... But if you make these big claims about hardware acceleration and performance, I really wish I could verify those claims on my own hardware.

Really cool project; I'm looking forward to tinkering with some actual releases soon. I know the names behind this project are highly credible, but I'm reluctant to invest much brainpower into something that might be locked up to cloud usage.


We're not in a rush here, we'd rather do things right than do things fast. Mojo is currently one day old :-)

We are deliberately rolling things out in a slow way to learn as we go. We're also eager for it to go far and wide, but think it makes sense to be a bit deliberate about that. When we launched Swift 1.0 it was more like a 0.5 and that started things off on the wrong foot. Mojo is still early in development, not ready for widespread production use.

-Chris Lattner @ Modular


What’s the dependency story with Mojo, especially considering Python’s less than stellar dependency situation?


How does the MLIR that Mojo generates differ from https://openxla.github.io/iree/ ?

Or does Mojo sit on top of IREE?

Will the inference engine support multi-GPU setups?


Anyone know whether the plans to open-source the language are the "in a few weeks" type of plan or the "maybe never" type of plan?


The FAQ has similar language to the original open source plan for Swift. That was eventually open sourced, so the track record is certainly reasonable.



This title is working overtime


It appears to be just another language; I don't see what's different about it. Anyway, it's not languages themselves that make them big, it's the ecosystems around them.


Isn’t that kind of the whole thing? It already has the whole python ecosystem available to it.


Well, we don't really know. The keynote claims it is a superset, but elsewhere they say they will strive to be a superset. So how much of a superset are they? Are there huge swaths missing? Really hard to tell at this stage from a tech demo. According to this post they've only been working on this for a short time, so a lot of their statements are de facto aspirational. Programming language lifetimes are measured in decades. They've got a long way to go.


Ok but this is from the guy that created LLVM, Clang, and Swift… This is the real deal, he has no reason to exaggerate or lie.


> but this is from the guy that created LLVM, Clang, and Swift… This is the real deal,

Remember Ceylon, the language that was supposed to be a better Java, plus interoperable with Java? It was created by the guy who had created Hibernate and then the Seam framework. Where is Ceylon now?

People have different reasons and opinions for starting yet another language. Just because someone has created something popular previously doesn't mean this is the next big thing.


nope I don’t remember Ceylon or any of the other frameworks you mentioned… and I highly doubt they are of the calibre of LLVM haha


My only point is that this is a show, don’t tell kind of thing. We’ve been told, but we have yet to be shown.


So is this more like Matlab or more like Python, in the legal and organizational ways? I don't know the right words for it so I'm not going to say "free" or "open source" but maybe you know what I mean.


"Maybe it’s better to say Mojo is Python++"

I think this is a great analogy that will help more developers make the connection. I've read all the "why" material, but this phrasing really makes it sink in for me.


Jeremy has a lot of cachet with me from his amazing work in education, and his excitement about this is a great sign in Mojo's favor. Solving the two-language problem would be pretty amazing.


I really hate Python's lack of curly braces.

It makes code so much more unreadable, copy+paste error-prone, etc.

It would be nice if this new language made them optional so new code could be written with them.


I prefer "one way to format everything"; it just cuts so much needless whining if everyone uses the same formatter with little to no settings.

... and curly braces instead of the whitespace nonsense. Using invisible characters as flow control was never a good idea.


Is Chris the modern Dennis Ritchie? Thank you for your work Chris, insane track record!


Mojo looks extremely promising. Lattner has some hits and misses, but in general I find myself agreeing with him on principles. A brief skim of MLIR is interesting as well, and I can understand its goals and how it fits into the LLVM ecosystem. Python is a pleasant language to use, and it has a massive and growing ecosystem. This combination - a stellar tech leader, a clearly defined technical goal built within and on top of an existing powerful compiler platform/framework, a popular and beautiful language foundation including a huge library of existing performant code, and impressive performance claims - adds up to something worth paying close attention to.

Some random thoughts:

After watching tutorials on PyTorch I've come to the conclusion that new programming languages must have syntax built in for dealing with tensors/matrices. The flexible array index syntax in Python allows for some nice ergonomics for this but I have a feeling that we are just scratching the surface.
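For example, the kind of NumPy indexing ergonomics meant here (a small illustrative snippet, not from the article):

    import numpy as np

    x = np.arange(12).reshape(3, 4)        # a 3x4 matrix
    col = x[:, 1]                          # one column: [1, 5, 9]
    block = x[1:, :2]                      # sub-block: [[4, 5], [8, 9]]
    outer = x[:, None, :] * x[None, :, :]  # broadcasting via inserted axes
    picked = x[[0, 2], [1, 3]]             # fancy indexing: [1, 11]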

I understand that they don't have a public release nor a license - but I would like some more clarity on this before becoming more invested. Just saying "open source sometime soon" is a bit vague.

I worry about the eventual divergence these kinds of projects make from their source. Inevitably there will be a day when Python moves in a new direction that won't be compatible with Mojo. Will they remain tethered to Python on that day or will they feel sufficiently independent to go it alone?

I was very excited by Swift as a language. I was excited when Apple open sourced it. I was excited when Google was using it for TensorFlow. But it feels stalled out somehow. I don't understand the dynamics of language growth and adoption well enough to make predictions. If I were to start a new ML company today, I don't know how comfortable I would be choosing Mojo over Python. Just having the current backing of a business does not guarantee success. Will this language offer enough to attract Python, C++, Rust, Julia, etc. devs? They talk about the deployment story... but there is competition there as well, in things like WASM. There are more ways to fail than there are to succeed - I wish something like this would succeed, but reality is fickle.


When a new language project like this drops, I always feel like a kid on Christmas morning (or whatever your favorite gift-giving holiday is). I've wanted to see a general-purpose language built on top of MLIR since it came out, and I think this is a great way to do it, since Python is the main client for advances in ML accelerators. Treating the CPU as a target for acceleration is brilliant too. I deeply hope that this language will give general-purpose programming languages a huge step forward in taking advantage of our current heterogeneous-compute world. Bravo!


"Python but with better performance" is a faster horse, not a car.


I think static compilation, as in C++ or Rust, is important to overcome the two-language problem. That would be the biggest advantage over Julia. It's unclear if Mojo can do this or if it is planned.


It can do this.


Thanks for clarifying


They are promoting tightly packed structs as fast. For a high-performance language I would expect native struct-of-arrays support, because arrays of structs are bad for CPU cache utilization and SIMD.
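For anyone unfamiliar with the distinction, a small NumPy sketch of array-of-structs vs. struct-of-arrays (illustrative only; Mojo's actual layout facilities may differ):

    import numpy as np

    n = 1_000_000
    # Array-of-structs: x, y, z interleaved in memory for each point
    aos = np.zeros(n, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4")])

    # Struct-of-arrays: each field is its own contiguous array
    soa_x = np.zeros(n, dtype="f4")

    # Summing one field in SoA reads contiguous memory; in AoS the y and z
    # bytes are dragged through the cache alongside the x values.
    total_aos = aos["x"].sum()   # strided access
    total_soa = soa_x.sum()      # contiguous access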


This statement isn't borne out by the article. The biggest advance in language design has been gradual typing; the biggest advance in tooling is LLMs.

This is a reworking of Python.


So if the big trick is using optional strong typing, couldn't CPython just do the same? The syntax for type hints is stable enough now that only small changes would be necessary on the frontend. It would likely not give the same level of speedup as Mojo's new runtime delivers, but could there at least be some significant boost that would make an impact?


I guess JIT type specialization / "monomorphization" might be one way to do it, but I expect that it would be bottlenecked by all the surrounding CPython interpreter stuff.
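A toy pure-Python sketch of the dispatch side of this idea (the decorator is invented for illustration; a real JIT would emit specialized machine code rather than cache a Python function):

    def specialize(**impls):
        """Toy 'monomorphization': pick a per-type implementation once,
        keyed on the argument's concrete type."""
        def decorator(generic):
            cache = {}
            def wrapper(x):
                impl = cache.get(type(x))
                if impl is None:
                    impl = impls.get(type(x).__name__, generic)
                    cache[type(x)] = impl
                return impl(x)
            return wrapper
        return decorator

    @specialize(int=lambda x: x * x, float=lambda x: x * x * 1.0)
    def square(x):        # generic fallback for any other type
        return x * x

    print(square(3), square(2.5))  # 9 6.25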

/amateur_two_cents


I was thinking about this idea the other day - Python++, but with a comprehensive standard library for everything. I should never have to look up a function that turns a textual IP address into a form I can work with. I should not really even have to write a normal for loop to transform data. If it's something everyone would have to write a function for, why not pre-make it?
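(For the IP example specifically, the standard library's ipaddress module already covers it; the broader point about pre-made utilities stands for rarer cases:)

    import ipaddress

    addr = ipaddress.ip_address("192.168.0.1")              # IPv4Address object
    print(int(addr))                                        # 3232235521
    print(addr in ipaddress.ip_network("192.168.0.0/24"))   # True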


Seems like a brilliant approach. Every "better Python" language has the seemingly insurmountable bootstrapping problem of attracting a critical mass of libraries before it's equivalently productive. Definitely going to keep an eye on this, if for no other reason than to check out the deployment story, which is still painful after all these years.


To get around Python's limitations and the complexity of C++, here is a Rust-based DL framework we are developing, called Burn: https://github.com/burn-rs/burn. The super cool thing about Burn is that you can swap backends easily.


This is OT, but what does your username mean? I found this: https://en.wikipedia.org/wiki/Antimora

Are you named after a species of cod?


I'm pretty passionate about extremely fast deep learning tools. I like the idea, but as it goes with developers who've been burned before, I'm pretty skeptical and will probably wait and see if this ends up surviving the initial phase of testing and trying things out. I know the xkcd standards comic isn't blindly applicable to everything but still, it's not a bad prior.

Peeps who know me know that I really have skin in the game too when it comes to neural network training speed and simplicity, it's something I'm quite passionate about. So something like this, if it actually works like it says it does on the tin and is usable, would be something that would be extremely useful. There's one or two kernels I'd really like to write to bump down speed records even further, and doing so directly in the language feels pretty promising to me. I really like the idea of that. And the promise of mmap-like operations too. We have pointers already in torch, they're just ad-hoc as index values, and still somehow unprotected because you have to reset your entire runtime when CUDA borks out on an out of bounds error (that only triggers the next time you try to access data.... :'( ).

It would be nice if this was the real deal. I think time will tell. I hope it is.


How does Mojo deal with memory management? Does it have the same memory safety issues as C and C++?


Mojo has a full ownership system that learned a lot from Rust and Swift and took the next step. Some details here if you're interested: https://docs.modular.com/mojo/programming-manual.html#argume...


At the moment, you can handle memory manually with alloc/free (and there's stuff there to help with RAII), or you can use Python ref counting. Borrow-checking is currently under development, as discussed here: https://docs.modular.com/mojo/roadmap.html#ownership-and-lif...
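For the ref-counting half, the behavior in question is observable from plain Python today (a quick illustration; counts can differ slightly in a REPL):

    import sys

    a = [1, 2, 3]
    print(sys.getrefcount(a))  # 2: the name `a` plus the call's argument
    b = a                      # aliasing bumps the count
    print(sys.getrefcount(a))  # 3
    del b                      # dropping an alias decrements it
    print(sys.getrefcount(a))  # 2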


More good news for a Python wannabe-newbie like me: another reason to choose Python now, since later I can switch to Mojo for performance-related work. The upgrade should be easy since it's a Python superset.


Looking forward to running Flask on Mojo some day. It would be neat to see it with structs instead of classes, and fns instead of defs.


Honestly, without a UI framework a programming language is kinda useless. Ecosystem is key to day-to-day usefulness; compile time is secondary to that.


Why do you want to write UIs in every programming language? I'm curious, since not every language is suited for it, or it's not the use case at all.


Is MLIR a “huge success”?

Is it involved at all in the Python -> PyTorch -> NVIDIA GPU path that dominates ML today?


This looks potentially really cool. Thanks Jeremy for the demo and Chris (et al?) for creating it.


Those who don't understand Objective-C are doomed to reinvent it. Badly.

Not that Objective-C is perfect, far from it. But it had the right approach, and already does the things that are claimed as the huge innovation here "out of the box".

Already did them in 1984.

And without the millions of lines of compiler code and ridiculous compile times, bordering on the absurd.


> Those who don't understand Objective-C are doomed to reinvent it. Badly.

I feel Chris Lattner can make a reasonable claim as to understanding Objective-C.


You would think so, yes, and given the circumstantial evidence, your feeling is a perfectly excusable mistake to make.

Disclaimer: I worked with Chris at Apple and have the utmost respect for him both as a person and for his abilities as both an engineer and manager.

And he wasn't the Apple engineering manager in charge of Objective-C who started a discussion we were going to have with the words: "Objective-C is a statically typesafe language and this is unarguable". Now one half of that sentence is true. The other is laughably wrong. I have a hard time thinking of a language that is less statically typesafe than Objective-C. We didn't have the discussion, because continuing the conversation after that tiny bombshell is pretty pointless.

A lot of people at Apple did not understand Objective-C, many thought of Objective-C as at best a maladjusted Java. And it is. That is, if you make the wrong assumption that it is trying to be some sort of Java in the first place, or you misuse it that way. But it's not. It's a 1984 solution to the two-language problem. The component language being C and the composition language being Smalltalk. Jam them into one language and you've solved most of the boundary problems.

Now you've obviously also created a bunch of problems, but you solved that one.

When Swift came out, everything about it reeked of "we don't understand Objective-C", including the ridiculously wrong tag-line "Objective-C without the C". It's a better C++ that basically ignores the Smalltalk part in the hope that the larger-scale composition problems that the composition language tries to solve will just go away.

And of course Smalltalk also wasn't really a good enough composition language either, because it's still essentially procedural. We need more flexibility in our composition languages [1]. And the composition language being so close to the component language probably led people to make the mistake of treating them as a maladjusted unitary language, using the composition part for component programming, etc.

Anyway, coming back to the understanding part: I've followed the Swift creators wherever they've talked or written about the language, the design rationales etc. And over time, what was only a sneaking suspicion became a grim certainty: they did not understand Objective-C, and did not even believe that understanding it was necessary.

This did come as a bit of a shock to me, because despite that sneaking suspicion, it just didn't seem possible.

[1] https://news.ycombinator.com/item?id=35318258 (linking to HN discussion, my domain is currently in transition)


Agreed. Swift has been such a disappointment in every department.

If you took Objective-C, modernized the syntax (removing the old bracket syntax that scares new people away from it), and added some more functionality where it is missing (eg. strings), you'd end up with a much much better language than the kludge that Swift is.


> mistakes of treating them as a maladjusted unitary language, using the composition part for component programming, etc.

i have a similar feeling but it's hard for me to put my finger on a concrete example... any good ones?


It seems to me that right now is an especially awkward time to launch a new programming language competing with Python, because LLMs are great at writing Python, and presumably can't write your new language at all (since there's no data for it yet). The linked post does not seem to address this.


Mojo consumes Python, so it can build on that.

However, that assumes that LLMs are a significant source of newly written code (unlikely) or a significant learning resource (maybe), and that they can't be adapted to Mojo (they probably will be).


You'd think this, but ChatGPT is able to write code in my programming language after I feed it just a few examples. It's not perfect, but it knows enough about adjacent languages that it can actually write pretty good code in mine. The problem is you have to keep reminding it about the language fundamentals, or it will start making things up as your prompts get further and further from the first examples.


I can't help but respond that this seems like the least compelling reason not to innovate on computer language design & implementation.


Python type annotations are already part of the training corpus and can probably be generalized to Mojo; many specifics are probably almost one-shot learnable. LLMs can easily adopt new styles once the general model is 'understood'.


Who's working on a Mojo equivalent for JavaScript?


Unrelated to the contents of the article, but does anyone else struggle to read this light-grey text on a white background?


Oh yes you're right - I've just pushed a change to make it darker.


Meanwhile, in the Erlang universe ... ;-)


What are the benefits of Mojo over Apache TVM and OctoML?


This feels like a promotion/ad to me; I will never use it.


The heuristic that ads/promotions are bad serves us well the majority of the time. In this case - yes, it is a promotion (i.e., it's a blog post), but I recommend you watch the video; it seems pretty cool!

Sometimes people release good products, and somehow they have to communicate with us and promote them.

The issue isn't the ads; it's the signal-to-noise ratio and the filter mechanisms at our disposal.


Yet another (overhyped) LLVM programming language. Remember when JavaScript framework fatigue was a thing?



