How fast does Java compile? (mill-build.org)
94 points by jatcwang 12 days ago | 71 comments
We slow our build down by having a dozen 'tools' projects, with a handful of classes each, in a straight dependency line above our main project, which contains 99% of the code. This easily increases our compile time from a few seconds to several minutes, with a bonus chance of causing an infinite IDE workspace build loop.

> Compiling 32,000 lines of code per second is not bad

https://prog21.dadgum.com/47.html: “By the mid-1990s, Borland was citing build times of hundreds of thousands of lines of source per minute”

I know it’s a different language, but I don’t think Java is significantly harder to parse than Pascal; bytecode doesn’t have to be heavily optimised (it gets heavily morphed at runtime), and computers are a lot faster than in the 1990s.

Also, recently (2020) https://news.ycombinator.com/item?id=24735366 said: “Delphi 2 can compile large .pas files at 1.2M lines per second.”

Or am I mistaken in the idea that Java isn’t hard to parse? If so, why is it hard to parse? Annotations? Larger programs with lots of dependencies?


Hundreds of thousands of lines per minute isn't the same as 32,000 LOC per second.

Delphi did some things that made it unusually fast to parse, like being single-pass (which meant you could not arrange your code as you saw fit, since forward references didn't work). Also, javac suffers from being JIT-compiled, so a lot of CPU is wasted each time it's invoked, unless you use daemons the way Gradle does.
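
You can see the warm-up effect in-process with the standard javax.tools API. A minimal sketch (it assumes some Hello.java exists on disk, and exact timings will vary):

    import javax.tools.JavaCompiler;
    import javax.tools.ToolProvider;

    // Compiles the same file repeatedly inside one JVM. The first run pays
    // javac's JIT warm-up cost; later runs get progressively faster, which
    // is exactly what build daemons exploit.
    public class WarmJavac {
        public static void main(String[] args) {
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            for (int i = 1; i <= 5; i++) {
                long start = System.nanoTime();
                javac.run(null, null, null, "Hello.java");
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println("run " + i + ": " + ms + " ms");
            }
        }
    }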

But also, the Delphi compiler was IIRC at least partly written in assembly, not Delphi itself.

You could make javac much faster just by compiling it with GraalVM but then you'd lose the ability to load plugins like annotation processors. Delphi's compiler wasn't pluggable in any way (at that time?).


> which meant you could not arrange your code as you saw fit, since forward references didn't work

The actual consequence is that you had to declare things at the beginning of the block. It handled forward declarations just fine. This had minimal impact on actually "arranging your code."


No, it also didn't handle circular dependencies between interface blocks. Big pain.

I never had that be a problem in practice because the interface was separate from implementation. It was exceptionally easy to sort that rare corner case out.

> You could make javac much faster just by compiling it with GraalVM

Would it be faster? To start up, sure, but I'd imagine the compiler rates quite well on the scale of how much a workload benefits from the dynamic runtime optimizations that are only possible with JIT compilation.


GraalVM AOT can utilize profiling data from training runs to give the same speedups. It also does ML-driven inference of profiles.

> Hundreds of thousands of lines per minute isn't the same as 32,000 LOC per second.

Right, this is an example of Java's speed being terrible because Borland was almost as good thirty years ago. 32,000 lines per second is 100,000 lines per three seconds, or 2 million lines per minute. Now compare the statistic that isn't thirty years old:

> “Delphi 2 can compile large .pas files at 1.2M lines per second.”


But hundreds of thousands of lines per minute on 1990s hardware is, relative to the machines of the time, orders of magnitude faster than 32k lines per second 30 years later.

The very next sentence is

> But it is nowhere near how fast the Java compiler can run.

And then it explains why that initial build was compiling only 32k lines per second.


That's a nice writeup. Given the hot/cold difference you observe, I'd be very curious to see how builds compare when using the Maven daemon or the Gradle daemon.

I am not very familiar with those myself, simply aware of them.

[1] https://github.com/apache/maven-mvnd

[2] https://docs.gradle.org/current/userguide/gradle_daemon.html


The benchmarks in the article do use the Gradle daemon, but not the Maven one, mostly because mvnd isn't the default.

For Gradle, if you turn off the Gradle daemon, it gets even slower than the numbers presented, going from 4+ seconds to 10+ seconds per compile:

    lihaoyi mockito$ git diff
    diff --git a/gradle.properties b/gradle.properties
    index 377b887db..3336085e7 100644
    --- a/gradle.properties
    +++ b/gradle.properties
    @@ -1,4 +1,4 @@
    -org.gradle.daemon=true
    +org.gradle.daemon=false
     org.gradle.parallel=true
     org.gradle.caching=true
     org.gradle.jvmargs=-Xmx2048m -Dfile.encoding=UTF-8 \

    lihaoyi mockito$ ./gradlew clean; time ./gradlew :classes --no-build-cache
    10.446
    10.230
    10.268

Our relatively complex multi-module Gradle build takes 30s just to configure. That rarely happens, though, as things usually come from a cache.

Oh wow. That's... something. :)

Author here, hope people find this article interesting!

"From this study we can see the paradox: the Java compiler is blazing fast, while Java build tools are dreadfully slow."

Hi, nice article! I wholeheartedly agree with the conclusion after 10 years of fighting with Maven for performance gains (which I always measured in minutes, not seconds).

A slow feedback cycle is the root of all evil.


I'd love to see Bazel added to the comparison.

Interesting read. Just one remark on the "Keeping the JVM Hot" benchmark: running the same compilation task on exactly the same input over and over again gives the just-in-time compiler of your Java runtime the chance to optimize for this specific task. This somewhat spoils the idea of the benchmark, because in a real-world situation the source code changes between compilations.

Yeah, but only some lines will change, so it will probably still be useful. That's why there is the Gradle daemon, etc.

Use Maven profiles:

* compile only

* compile/test only

* compile/install jar only, skip source/javadoc packages

* checkstyle only

* static analysis only

* code coverage only

* Skip PGP (you DO check your artifact signatures, right?)

The beauty of this is you can create a corporate super pom that defines all of these for you and they can be inherited by every project in your org.

Finally, if you have a large multi-module project, run with -T2C to parallelize module builds. If your tests are written in a sane/safe manner: -DuseUnlimitedThreads=true -Dparallel=classes -DforkCount=2C will also give you a major boost.

We have a giant code base (500,000 SLOC) with 28 Maven modules, and a full compile and install takes less than 40 seconds with compile/install only. You often don't need to do a full build, but even with tests that's about 3 mins on an M3 Max.


I don't think anyone checks their artifact signatures. I've changed signatures several times over the past few years (every time I get a new laptop I forget to copy over my keys) and haven't heard a peep from anyone using my open source libraries who actually noticed the change.

We notice, and it costs several hours per month to handle. A malicious party can still slip through: single-contributor software is classified as high risk, but XZ Utils was not classified as such, so we would have missed the Jia Tan incident.

Alas, I'm stuck with Gradle. And not a simple example of the species, either.

I think that's the major issue I have with Gradle in large orgs: it doesn't lend itself to reuse. But I don't blame it either; it's not really meant to do that.

I'm dating myself, but 20 years ago, we had Ant. It was very much a blunt hammer like CMake. Maven came along and was like "how about we choose a definitive way to organize projects, make that repeatable". You lose flexibility but gain speed/reuse.

I see Gradle more of a replacement for Ant than Maven in this regard. Infinite flexibility, at the cost of speed/reuse.


> it doesn't lend itself to reuse, but I don't blame it either: it's not really meant to do that.

Only if you keep writing more logic into build.gradle.kts.

Writing custom Gradle plugins has been the standard and recommended way to configure your projects for years now. Aside from a few footguns (lol extensions not being resolved at configuration time and needing providers everywhere), it allows you to mostly skip configuration steps.
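
For the record, such a convention plugin can be plain Java. A minimal sketch (the class name and the specific compiler settings are made up, and it assumes the plugin is wired up via buildSrc or an included build):

    import org.gradle.api.Plugin;
    import org.gradle.api.Project;
    import org.gradle.api.tasks.compile.JavaCompile;

    // Hypothetical convention plugin: applies the java plugin and centralizes
    // compiler settings so individual build.gradle.kts files stay declarative.
    public class JavaConventionsPlugin implements Plugin<Project> {
        @Override
        public void apply(Project project) {
            project.getPlugins().apply("java");
            project.getTasks().withType(JavaCompile.class).configureEach(task -> {
                task.getOptions().setEncoding("UTF-8");
                task.getOptions().getCompilerArgs().add("-parameters");
            });
        }
    }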


> You lose flexibility but gain speed/reuse.

It's actually: you lose major flexibility AND all your speed, but gain reuse.


> Again, Maven doesn’t make it easy to show the classpath used to call javac ourselves.

Wait, I was under the impression that the Maven dependency plugin has a goal for exactly that (dependency:build-classpath or something like that)?


Seems easy enough to do a quick fix for. Just add an option to Maven that goes straight to compile and does nothing else. Would be good for local dev. However, as we all know, Maven does much more than this, and the server-side CI/CD process is what really slows your other team members down (if it's slow).

Long classpaths with unnecessary direct dependencies (i.e. not runtime) cause surprising slowdowns due to lookup performance.

I forgot to mention, I wrote about it https://fzakaria.com/2024/11/08/jvm-boot-optimization-via-ja...

I'm kind of intrigued by Mill, but I've fallen into the same trap I've observed in others: I've over-invested mental capacity in learning Bazel and its equivalent systems.

The lift to another system has to be enough to surpass that loss.


With Clojure it's just a simple deps.edn file and `clj` on the command line. JVM in a jiffy.

Very good write-up. This was really informative!

How fast does Golang compile? https://www.octobench.com/

Huh, so for anyone interested who did not have time to compare the two links (the OP and the parent post):

* Java (on a cold JVM): 18,000-32,000 lines per second on a single core

* Java (on a hot JVM): 102,000-115,000 lines per second on a single core

* Go: 28,000 lines per second on 12 cores


Pretty impressive numbers, although it’s fair to say here that while Go comes with its own build toolset, pretty much no one uses plain javac to compile stuff. It’s almost always either Gradle or Maven, and as the article hints, they slow down builds a lot.

> From this study we can see the paradox: the Java compiler is blazing fast, while Java build tools are dreadfully slow. Something that should compile in a fraction of a second using a warm javac takes several seconds (15-16x longer) to compile using Maven or Gradle.

edit: typos


I'd be curious to see what the slowdown is, because my gut feeling is that it simply comes down to QoL stuff that tools like Maven do. Even if we keep it to compiling only, there's stuff like resolving dependencies from the pom.xml and such.

Outside a synthetic benchmark:

  * Gradle compiling 100,000 lines of Java at ~5,600 lines/s
  * Maven compiling 500,000 lines of Java at ~5,100 lines/s
  * Mill compiling at ~25,000 lines/s on both of the above whole-project benchmarks

Again, not to put Java down, but to have a proper discussion about compiler speeds. I'm not interested in "your" tool being faster than "my" tool; I want to understand the compilation speeds of different programming languages and what impacts them. Java and Go have similar execution speeds and similarly simple type systems (no implicits, etc.), so they should be similar.

That's of course besides the obvious problem of comparing compilation speeds on two totally different CPUs and machines. Do we compare compilers or machines?


Notably, the benchmarks I provide are all single-core. You can get more if you modularize the project and spread the load over multiple cores; some of the other pages linked from the OP go into more detail.

Notably, all the benchmarks you provide are synthetic and not achievable by developers. I care more about real benchmarks for real developers, ones they can see in their daily working lives.

Yes, but reality is even worse: Maven projects take a f*"* eternity, like the post says. You have tens of atomized packages that make a simple task that should take seconds take minutes. And the speed in Go is not only compilation time; that 28,000 is a real-world case, or at least that's how it feels in my experience.

What I love most is how people project.

I was just saying "How fast does Golang compile" because I'm interested in compilation speeds and CPU usage across compilers (Rust, despite its "slowness", seems to have the best CPU utilization of the compilers I've checked over the years).

I've been using Java from 1996 on for two decades.

A sidenote: the article is hard to read; it's not clear how much IO there is. It seems to use LOC as "all lines", including empty lines and comments, not "lines of code" (most tools today mean NCLOC when they say LOC). Also not sure why they chose Netty for the test with 500k lines and then only used a subproject with 50k lines.

"Compiling 116,000 lines of Java per second is very fast. That means we should expect a million-line Java codebase to compile in about 9 seconds, on a single thread."

They get the score from compiling 50k lines, seemingly without IO, and then extrapolate to 1M lines. Does that also fit into memory? No GC? And no IO? At the very least one would need to check whether a file has changed on disk, or would the OS do that with a cache check?

"compile your code, build tools add a frankly absurd amount of overhead ranging from ~4x for Mill to 15-16x for Maven and Gradle!"

IO? Check for changed files?
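
For what it's worth, the LOC-vs-NCLOC gap is easy to measure yourself. A crude sketch (it ignores /* */ block comments, so it only approximates NCLOC):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Counts raw lines vs non-blank, non-comment lines for one source file.
    public class LocCounter {
        public static void main(String[] args) throws IOException {
            long raw = 0, ncloc = 0;
            for (String line : Files.readAllLines(Path.of(args[0]))) {
                raw++;
                String s = line.strip();
                if (!s.isEmpty() && !s.startsWith("//")) {
                    ncloc++;
                }
            }
            System.out.println("raw lines: " + raw + ", ~NCLOC: " + ncloc);
        }
    }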


All this is actually covered in the article

> These benchmarks were run ad-hoc on my laptop, an M1 10-core Macbook Pro, with OpenJDK Corretto 17.0.6. The numbers would differ on different Java versions, hardware, operating systems, and filesystems. Nevertheless, the overall trend is strong enough that you should be able to reproduce the results despite variations in the benchmarking environment.

> Again, to pick a somewhat easily-reproducible benchmark, we want a decently-sized module that’s relatively standalone within the project

> "typical" Java code should compile at ~100,000 lines/second on a single thread

> We explore the comparison between Gradle vs Mill or Maven vs Mill in more detail on their own dedicated pages


"All this is actually covered in the article"

And yet it answers none of my questions.

None of the "answers" you quote answer anything for me.

> Again, to pick a somewhat easily-reproducible benchmark, we want a decently-sized module that’s relatively standalone within the project

Why not take the whole Netty source? Why is a module more easily reproducible? "We chose a decently sized module because we want a decently sized module" does not explain anything.

What about IO? Why count lines with comments and empty lines? Or do you even? There is no mention in the text as far as I can see: "lines/second" implies this, but then you say "source lines per second"; does that include empty lines? I think any compiler can compile >1,000,000,000 empty lines per second in one file that is already paged into memory.


How fast does Rust compi..... I better not actually.

Don't worry, I wrote this article about Java but my day-to-day work is Scala. We're lucky if we can crack 4,000 lines-per-second-per-core on my pretty vanilla Scala projects x_x

I feel the pain. I started a startup on Scala (and some years later successfully sold it), but compilation speed and 20-line type constraint errors were my biggest problems (though I never forget that it enabled us to scale the company and sell it :-). But if a programming language spawns consulting companies that only help with speeding up compile times, that is a bad sign.

Though when working on a 2M+ LOC Java project 15 years ago, we also spent a lot of money on very fast RAID hard drives in developer machines to speed up compilation. JRebel was a huge relief back then.


Why? I'm quite curious. There aren't any good benchmarks on the internet, especially ones comparing it to other languages. I'd love to actually see some real numbers showing how far off it is from Java or Go (I don't doubt it is slower than Java or Go, as it is a much more sophisticated language).

My anecdotal evidence suggests it is not as slow as many think. It compiles a 500k LoC project (that's the count of all code, including dependencies) in ~10s on my M2 Pro.


Subjectively, and not backed by any science, the linking step is the worst, even with mold.

About my only gripe with the language.

But it's getting better.


One thing I discovered while trying to improve my incremental Rust compilation time (https://blog.waleedkhan.name/rust-incremental-test-times/) was that there was a lot of time that seems to have been attributable to `cargo` itself (https://lobste.rs/s/ktyp2q/improving_incremental_test_times_...), rather than the compiler. It seems like a similar situation as reported in the article.

I thought Java was tokenized, not compiled. That's why you need their runtime environment, isn't it?

It is compiled to Java bytecode. The JVM runs the bytecode.

Tokenization [1] is the process of splitting text into components called tokens. It's usually the first stage of parsing.

[1]: https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization
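
If you want to convince yourself that javac's output is real object code, every compiled class file starts with a fixed 4-byte magic number. A minimal sketch that prints it (pass any .class file as the argument):

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Prints the 4-byte magic number of a compiled class file. Every valid
    // JVM class file begins with 0xCAFEBABE: real object code, not tokens.
    public class ClassFileMagic {
        public static void main(String[] args) throws IOException {
            try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
                System.out.printf("magic = 0x%08X%n", in.readInt());
            }
        }
    }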


Compiled means it's a direct executable in machine code. Java ain't that. Heck, if it needs a runtime with a garbage collector built into the runtime it's especially junk.

No, that is not what compiled means. A compiler is simply a program that translates from one language to another. Hence a compiled language is one that is translated into a different language prior to execution. Java is compiled twice: once to bytecode and again to machine code by the JIT compiler. That is, unless you compile it directly to machine code using, say, Graal Native Image.

You are welcome to your opinion on the utility of garbage collection, but the widespread usage of GC suggests the industry as a whole does not share your view, nor is such an unnuanced view supported by research (e.g. https://dl.acm.org/doi/abs/10.1145/1094811.1094836)


As someone with a master's in computer science, I respectfully disagree with you. Would you consider a program that translates C code to FORTRAN a compiler? Nope, it's a translator. A compiler compiles a language into a direct executable. Just like the .exe executables Microsoft Windows has: there's no runtime required to run those files.

Nobody calls them “translators”, that’s so silly. It’s a compiler, or transpiler (still a compiler).

Avoid using appeals to authority when discussing; it doesn't make you look good. Even more so when your authority is relatively mundane and you're totally incorrect.


Transpilers are a subset of compilers, even if "compiler" is somewhat often used to refer to the ones producing executables.

And .exe-s still need the OS, dynamic libraries, and, perhaps most importantly, the CPU (which is pretty much a VM interpreter, modern CPUs having what's essentially a custom IR, peephole optimizations, aliasing analysis, JITting, loop unrolling, PGO, caching, ...), to run.


You should use your CS master's skills to read the first line of the "Compiler" article on Wikipedia [1]:

> In computing, a compiler is a computer program that translates computer code written in one programming language (the source language) into another language (the target language).

And maybe also the second line. JVM bytecode is an object code as referenced there.

[1] https://en.wikipedia.org/wiki/Compiler


I would call this a transpiler, and it's definitely a kind of compiler. I typically use the word compiler to imply that I'm moving to a lower-level representation, and transpiler for one that does a horizontal translation.

There are exceptions to this general rule of thumb; cross-compiling is a term used for a horizontal translation.

I also watched a video of a PhD project that compiles JavaScript from an input of visual data and JavaScript, which is probably one of the more interesting uses of a compiler I've seen.

https://youtu.be/MQnVmEw6ISQ?si=NYCSuzi3IxG0md_k


Why would holding an MS make you an authoritative source on… anything?

Anyway, there's no real agreement on the terms and everyone has an opinion. This is about as useful as arguing "is a taco a sandwich".


I very much consider RATFOR a compiler.

javac does produce machine code. You just don't like the machine.

We used to have Java CPUs

https://en.m.wikipedia.org/wiki/Java_processor

The JVM is just to run the code on other systems, and is the most common way of running compiled Java code.


I think it means "compiled into bytecode", which can then be run by the runtime.

So like most JIT languages nowadays?

That's pretty much the definition of tokenized. I know Oracle would like us to think differently, but facts are facts.

Since you seem passionate about precise definitions for technical terms, maybe you could point us to third-party definitions which match your understanding? Start with sources for what "compilation" and "tokenization" mean as you understand them?

You are thinking of the typical design of a BASIC system in the '80s. Keywords were stored as "tokens" (numbers), but expressions were usually stored as space-stripped strings. There would be an edit-time syntax check (combined with the tokenization) to verify that the expressions were syntactical and only in places where they were allowed. Some designs did use tokens for the expressions too.

There was almost no syntax involved and essentially zero compilation work.

Java compilers do a lot more than that. They actually compile to a real (abstract) machine and they do real compiler things (proper parsing, maintenance of namespaces, type checks). They usually don't do any optimization because that will happen at run-time anyway -- but that doesn't make them "not compilers".
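
To make the type-check point concrete, you can hand javac an in-memory source with a deliberate type error and watch the diagnostics come back. A small sketch against the standard javax.tools API (the class and file names are made up):

    import javax.tools.*;
    import java.net.URI;
    import java.util.List;

    // Hands javac an in-memory source with a deliberate type error and prints
    // the resulting diagnostics: evidence of real parsing and type checking,
    // not mere tokenization.
    public class TypeCheckDemo {
        public static void main(String[] args) {
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            JavaFileObject src = new SimpleJavaFileObject(
                    URI.create("string:///Bad.java"), JavaFileObject.Kind.SOURCE) {
                @Override
                public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                    return "class Bad { int x = \"not an int\"; }";
                }
            };
            DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
            javac.getTask(null, null, diagnostics, null, null, List.of(src)).call();
            diagnostics.getDiagnostics().forEach(d -> System.out.println(d.getMessage(null)));
        }
    }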


> That's pretty much the definition of tokenized

Oh boy. It’s rare to see such confidence mixed with such nonsense.


Java is compiled down to bytecode and there is a bunch that happens in the compiler to make things fast ahead of the JIT.


