Go vs. Crystal Performance (ptimofeev.com)
197 points by open-source-ux 10 months ago | 160 comments

I am not an expert at this but I want to inquire:

- Every time there is any kind of benchmark between 2 or more languages, there is always a caveat: "Not a fair comparison" or "the algorithm isn't right".

What, then, is a good way to compare languages? After all, they are not an apples-to-oranges comparison. They are tools to get things done. It is like comparing 2 different brands of hammers. Sure, the grip is different and the shape of the head is different, but if we are testing a particular aspect of nailing, say nails/sec, then it is worth investigating which one is better. One hammer takes a lot of expertise to use; another is easy to use. So write those pros and cons down. That doesn't make the comparison a worthless activity, which is the cliché response to benchmarks: developer time matters, not the language.

What is a good way to compare 2 languages. Say Python vs. C? Are there standard implementations of algorithms (e.g. Mandelbrot) or something like that we can use to definitively compare speed? Shouldn't we have standards around these benchmarks? Some kind of ISO implementation of an algorithm, reviewed by experts, that can be used for benchmarking?

For the majority of programmers, the speed doesn't quite matter. But that discussion is off the table and orthogonal. Sure, for most things I personally pick up Python, and it is the fastest way for me to develop. But there are many reasons why we should compare languages, and these concerns shouldn't stop us from evaluating objectively.

> What, then, is a good way to compare languages?

Digging into details and yielding very specific conclusions.

Let’s take the example benchmarks.

The first one, Fibonacci, can only be properly assessed by inspecting the assembly. It is an analysis of compiler output, not of the language, not even of compiler optimizations.

It therefore only measures the CPU speed of function calls, and each compiler outputs a different set of assembly instructions:

• Go (go tool objdump fib) is all like “CMPQ JBE(label(morestack)) … LEAL(-1,cx) MOVL(cx) CALL(fib) …×3 ADDL(-2,cx) MOVL(cx) CALL(fib) MOVL ADDL …×3 RET label(morestack) CALL(morestack) JMP”.

• Crystal (crystal build --emit=asm fib.cr) is all like “subl(1) …×6 callq(fib) …×1 subl(2) …×7 callq(fib) movl addl …×6 addq retq” for the recursive condition.

So the conclusion is that Go outputs less instructions, but has to maintain its segmented stacks (useful for goroutines, its lightweight threading system) on every call, while Crystal uses the platform's standard calling convention with little overhead beyond light dynamic typechecks.

So there is a language-motivated difference: calls in Go are a tiny bit more expensive, but built-in lightweight threads consume little memory because their stack can increase dynamically according to use[0].
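To make the call-overhead point concrete, here is a minimal Go sketch of the doubly recursive Fibonacci that such a benchmark times. The article's exact source and input are not reproduced here; the function name `fib` and the input 30 are assumptions. Each call does almost no arithmetic, so the measured time is mostly the cost of the calls themselves, including the stack-check prologue described above. Building it and running `go tool objdump -s main.fib` against the binary is one way to look at that prologue yourself.

```go
package main

import (
	"fmt"
	"time"
)

// fib is the naive, doubly recursive Fibonacci: F(0)=0, F(1)=1.
// Each call does almost no work, so the timing is dominated by the
// cost of the calls themselves (prologue, stack check, CALL/RET).
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

func main() {
	const n = 30 // hypothetical input; the article's value may differ
	start := time.Now()
	result := fib(n)
	fmt.Printf("fib(%d) = %d in %v\n", n, result, time.Since(start))
}
```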

The second benchmark, HTTP servers, has everything to do with the implementation of the standard library, and nothing to do with either the language or the compiler. And I presume Crystal is backed by an optimized C library for TCP[1] while Go is probably a full reimplementation of TCP and HTTP.

[0]: https://blog.cloudflare.com/how-stacks-are-handled-in-go/

[1]: https://github.com/crystal-lang/crystal/blob/master/src/sock...

> Go outputs less instructions, but has to maintain its segmented stacks

I think they dropped segmented stacks in 2014: https://en.wikipedia.org/wiki/Go_(programming_language)#Vers...

The stack can still grow, but contiguously. I'm not sure whether any instructions are required to monitor the stack size and grow it if needed.
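A small sketch of what a growable goroutine stack allows, assuming a modern Go toolchain (contiguous stacks since Go 1.3): a goroutine starts with a stack of only a few kilobytes, and when the prologue check finds it too small, the runtime allocates a bigger one and copies the frames over. The helper names `depth` and `runInGoroutine` and the depth of 100,000 are arbitrary choices for illustration.

```go
package main

import "fmt"

// depth recurses n levels deep and returns n. Each call adds a stack
// frame, so the goroutine's stack must grow well beyond its small
// initial size; the runtime does this by allocating a larger stack
// and copying the old one (the morestack path in the prologue).
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return depth(n-1) + 1
}

// runInGoroutine runs depth(n) on a fresh goroutine and waits for it.
func runInGoroutine(n int) int {
	done := make(chan int)
	go func() { done <- depth(n) }()
	return <-done
}

func main() {
	fmt.Println(runInGoroutine(100000)) // arbitrary depth for illustration
}
```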

> CMPQ JBE(label(morestack)) …

^^ It's still the preamble for most functions :)

> And I presume Crystal is backed by an optimized C library for TCP[1]

On what basis are you presuming that?

What we did (and continue to do) in the TechEmpower Framework Benchmarks [1] is publish a set of requirements for our test types, provide a small set of example implementations on mainstream frameworks, and then solicit input and contributions from the open source community. Fast forward to today: we'll soon reach 5,000 processed pull requests, many of them from framework maintainers or advocates. Other test implementations are maintained by fans of the languages more broadly.

Obviously there is an unbounded set of ways people can test the boundaries or contribute implementations that aren't in the spirit of the project. We've had to deal with several contributions that required push back or removal. Since we run our benchmarks project in our spare time, we rely heavily on the community to help us keep an eye on implementations, and also help determine where to set boundaries when they require clarification. For example, is pipelining in the Postgres driver acceptable given we disallowed batch querying? We ultimately decided it was acceptable because it leaves the ergonomics of application development unchanged.

I feel that overall, the project has been effective in large part because the community has done a good job of self-policing.

[1] https://www.techempower.com/benchmarks/

I think this kind of language comparison is usually done to prove a point rather than in a dispassionate search for real truth.

I've never seen a situation where a team genuinely looks at the available languages for their project and chooses one based purely on performance. Even projects that are very performance-sensitive usually have other criteria (like not being able to use a GC language).

In most situations, there are a number of competing criteria for choosing a language. Team familiarity, deployment environment, availability of necessary libraries, compatibility with other code in the organisation, compilation time, making maximum use of available resources. All of these, and many more, are usually as important as raw speed.

For a "real" comparison, there would probably need to be a lot more code being executed, and specific tests for specific environments - standalone web server? microservice? CLI? desktop app? mobile app? All of them will have different needs and preferences for performance.

I don't believe in microbenchmarks like this anymore; I find more value in the typical rewrite experience reports (with the usual caveat of not attributing all gains to the language: by the time of a rewrite the problem domain is also better understood, so even a rewrite in the same language would have provided some gains).

There are different projects that allow people to compare languages in real scenarios:

- Good old TodoMVC (http://todomvc.com/)

- HNPWA (https://hnpwa.com/), which compares PWAs for accessing the HN API

- RealWorld (https://github.com/gothinkster/realworld), which does the same but for both frontend and backend. The API between both is standardized, so you must be able to mix and match any frontend (including mobile) with any backend. The idea is to build a Medium clone, so it's much closer to a real use case than a todo list

More than the language, the framework and the architecture can play a role in speed. If I cared enough about relative performance, I'd build a client that adheres to the API spec of RealWorld, simulate a few thousand clients on a few thousand posts, all reading and writing posts and comments, and see how well they compare under load. That would be much more telling than comparing who can produce the nth Fibonacci number the fastest.

The problem is that solutions optimised for benchmarks are usually non-idiomatic. If the solution doesn't represent "real code", then is there any real value? An interesting benchmarking approach would be to take solutions to the same problem from many different programmers at different skill levels and gather metrics from them. I think that would give a much more interesting view into a language's performance characteristics.

> If the solution doesn't represent "real code", then is there any real value?

When "idiomatic" performs so poorly that the software is unfit for purpose, does that represent "real code"?

> … solutions to the same problem from many different programmers at different skill levels…

How could we know programs weren't written to make some language look bad?

> When "idiomatic" performs so poorly that the software is unfit for purpose, does that represent "real code"?

I'm confused about what argument you're making here. I assume you aren't suggesting that suboptimal performance on a benchmark implies that the software is unfit for purpose? Can you elaborate?

I'm suggesting that "idiomatic" code in real software can perform so poorly that the real software is unfit for purpose.

Hence "idiomatic" may represent not "real code" but "naive code".

Fair enough, but I don't see how that relates to the conversation. Is that intended to rebut the idea that it's useful to know about the performance of idiomatic software?

You don't seem to have said "it's useful to know about the performance of idiomatic software".

I’m not the one you were originally responding to but it seemed pretty clear that that’s what they were getting at.

It seems clear that they conflated "idiomatic" code with "real code" into a subjective dismissal.

Do you intend to provide some support for your idea that "it's useful to know about the performance of idiomatic software"?

> It seems clear that they conflated "idiomatic" code with "real code" into a subjective dismissal.

No, they're dismissing it as useful for answering the questions we tend to care about when evaluating the performance of various programming languages. One such question is "If I choose a given language, how far will I be able to go with just idiomatic code before I need to invest in expensive optimizations (most of the expense here is that we're spending time optimizing instead of iterating on features)?". "Real code" is a function of this concern, so measuring it doesn't help us answer this question (i.e., if a language performs poorly on this question, "real code" will be more optimized and less idiomatic). Further, microbenchmarks don't measure "real code" either; they measure maximally optimized code, which is what the OP was complaining about.

> One such question…

Your question is sensibly answered by — It depends.

> If the solution doesn't represent "real code", then is there any real value?

When you're actually aiming for performance, in my experience code stops looking like "real code" pretty quickly in the hotspot, so benchmarks are representative in that way.

I mean, for example, media codecs "written in C" are typically highly vectorised, architecture-specific ASM kernels with C around them for handling the flow.

Fair benchmarking is really hard. It is much better to compare languages on features, safety, programmer efficiency etc. But you can't boil that down to a single number.

The most fair benchmarks I've seen are to implement strictly numerical algorithms, but that's not very useful because you're basically testing a small piece of compiler/VM optimization at that point. It also leads to unidiomatic code because it's faster, e.g. "C in Java".

The most useful benchmarks I've seen are exercises where programmers of different languages try their best to create as fast a program in their language as possible. It allows you to see idiomatic vs. fast code in that language. The downside is that, to a large extent, you're testing which contributors you've managed to attract.

Fair benchmarking is really hard! But I appreciate this style, because it shows two basic "real world" examples that you can extrapolate/estimate the results of. If Crystal is compiling to smaller binaries that run faster in a basic test without a lot of optimization and configuration tweaking, you can imagine that that relationship will hold true in larger codebases as well. Also very important is that the style, verbosity, and clarity of both examples showcase the differences in the languages on the programmer's side.

> *If Crystal is compiling to smaller binaries that run faster in a basic test without a lot of optimization and configuration tweaking, you can imagine that that relationship will hold true in larger codebases as well.*

Why would that be the case? I see no correlation. The two programs in the article were little more than glorified "hello, world" samples.

For all we know, Crystal might accumulate binary code cruft much quicker than Go as the project grows.

> After all, they are not an apples-to-oranges comparison

What's more: they are the MOST useful comparisons. Comparing 2 oranges is not as useful as comparing an apple and an orange. You get MORE useful differences, and more insight into what makes them similar.

In the case of speed:

Compare a bike and a car. Is it unfair? Yes. But if the GOAL is how fast you can get to the next city, it is totally worth knowing that the car is better.

However, if the goal is the most ENVIRONMENTALLY FRIENDLY way to travel, the bike wins.

Comparing 2 very different vehicles is far more useful.


What is important in benchmarks? Making clear WHAT goal is being measured, and not shying away from hammering the point home if some choice becomes clearly superior.

> Compare a bike and a car. Is it unfair? Yes. But if the GOAL is how fast you can get to the next city, it is totally worth knowing that the car is better.

The problem with these benchmarks is they have a propensity to not make a fair comparison. For example, sometimes they try to compare underperforming naive implementations in one language with expertly tuned implementations in another. Sometimes a blatantly underperforming implementation is used. Sometimes results are turned upside down by flipping compiler flags.

And sometimes photo-finish results that fall within the standard deviation are sold as absolute proof that language A outperforms language B.

Let's look at these benchmarks to see how meaningless data gets spun into what sounds like objective assertions. Does it really matter if someone could use language A to shave about 1MB off build artifacts? Are we really expected to take a 1.5s/5% difference in a cherry-picked synthetic benchmark as meaningful? Is anyone in their right mind expected to base their tech stack decisions on two cherry-picked tricks posted on a blog?

This is why no one takes these stunts seriously.

Never trust a benchmark you didn't falsify yourself.

The corollary is to write your own benchmark (i.e. something that remotely matches your use case).

Also, why something is slower or faster than something else is almost always more interesting than the raw numbers.
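In the spirit of writing your own benchmark, a minimal Go timing harness might look like the sketch below. `measure`, `workload`, and `sink` are hypothetical names, and `workload` is a placeholder you would replace with something resembling your actual use case. For anything serious, Go's `testing` package (functions of the form `func BenchmarkX(b *testing.B)`, run with `go test -bench`) is the more standard route.

```go
package main

import (
	"fmt"
	"time"
)

// sink keeps results observable so the compiler can't discard the work.
var sink int

// measure runs f `reps` times and returns the fastest run. Taking the
// minimum filters out some noise from scheduling and cache warm-up;
// a fuller harness would also report the spread across runs.
func measure(reps int, f func() int) time.Duration {
	best := time.Duration(1<<63 - 1)
	for i := 0; i < reps; i++ {
		start := time.Now()
		sink = f()
		if d := time.Since(start); d < best {
			best = d
		}
	}
	return best
}

// workload is a hypothetical stand-in for code resembling your use case.
func workload() int {
	total := 0
	for i := 0; i < 1000000; i++ {
		total += i % 7
	}
	return total
}

func main() {
	fmt.Println("best of 5:", measure(5, workload))
}
```

Reporting the minimum is a deliberate choice for short CPU-bound work; for anything involving I/O or GC, the distribution across runs usually matters more than the best case.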

> For the majority of programmers, the speed doesn't quite matter.

Sure. But if that code is going to be run a lot, please try to keep the number of required nuclear power plants down.

And don't forget about phone and laptop battery life.

I think the problem is languages are very versatile and there's almost always several ways of doing things, so nailing down what constitutes a fair comparison is always contentious.

To use your analogy, I'd say comparing two languages is more like comparing a woodworking workshop to a metal working workshop. There's a lot of similarities (they both have hammers) and there's a lot of overlap in functionality (you could make a little box in both workshops). But how do you make a sensible comparison between the two shops? Just saying 'you can build this little box slightly faster in the woodworking shop' doesn't really tell you much at all. They're tools that can be applied to a wide range of tasks, and unless you have a specific task you're interested in (e.g. I want to be able to process my log files as quickly as possible) it's hard to make any general statement about one set of tools being better or worse than another.

> What is a good way to compare 2 languages. Say Python vs. C?

The old 0install “language replacement” shootout is definitely the one I remember best.

It’s really outdated now (and oddly enough not always for the better) but:

* it evaluated multiple dimensions / criteria

* the criteria were very explicitly about relevance to the tester

* but their testing was quite objective

* and it really was quite extensive

I guess it depends on your use-case, but personally I'm interested in ballpark estimates of idiomatic code because that's sort of the performance space I occupy. I bucket pretty heftily (interpreted Ruby, JITted Lua, JavaScript, C) and that's enough. The specific details aren't important.

A more complete web benchmark might be one which connects to the database, does some queries, and renders the result. Then we'd look at what percentage of the time is spent in the runtime of each language, and apply Amdahl's law to see if switching is cost effective.
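The Amdahl's law step can be sketched as a one-line Go function. The 10% runtime share and the 2x speedup below are made-up numbers for illustration, not measurements: if only 10% of request time is spent in the language runtime and the rest goes to the database and network, making that 10% twice as fast yields only about a 5% overall gain, and even an infinitely fast runtime caps out at 1/(1 - p).

```go
package main

import "fmt"

// amdahl returns the overall speedup when a fraction p (0..1) of the
// total time is made s times faster and the rest is unchanged:
//
//	speedup = 1 / ((1 - p) + p/s)
func amdahl(p, s float64) float64 {
	return 1 / ((1 - p) + p/s)
}

func main() {
	// Hypothetical: 10% of request time is in the language runtime,
	// and switching languages makes that part 2x faster.
	fmt.Printf("%.3f\n", amdahl(0.10, 2)) // ≈1.053, about 5% overall
	// Even a near-infinitely faster runtime is capped at 1/(1-p).
	fmt.Printf("%.3f\n", amdahl(0.10, 1e9))
}
```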

There isn't any easy answer.

The problem is that yes, they are an apples to oranges comparison. You may use them for the same problems, but different languages lead to different styles, different trade-offs, and even different ways of thinking.

As an extreme example, a slow language may make it easier to do some high-level optimization that you would never be able to do in a fast language, getting a much faster program. And we have no reliable way to measure or communicate that kind of thing.

Not exactly a "benchmark", but I think projects like TodoMVC [1], where the same functionality is implemented across different frameworks, serve as an interesting and useful way to see the real differences.

This of course still misses how long it took to implement, how easily changes can be made, etc.

[1] http://todomvc.com/

> What, then, is a good way to compare languages?

With genuine interest and curiosity.

With the understanding that any conclusions will be provisional.

With incredible focus on the context that matters to you.


The point of most such caveats is that for any non-trivial benchmark, implementation details will inevitably be different enough that it's unlikely a single benchmark result will give you much information in general about how to compare the two languages. But the other thing I'd say is that speed of execution is only one of many, many factors to consider when choosing a programming language for a project, and it's very often not a very important one.

A golang expert isn't going to flip to Crystal based on one benchmark. Humans don't work that way. The first reactions to a post like this always involve golang fans trying to reconstruct the evidence into a reality that is convenient and proportional to their time invested in golang.

The results do say something about golang and crystal. Without more data we cannot confirm anything, therefore the following possibility remains an open question:

Crystal may be generally faster and more efficient than golang for most use cases.

To not acknowledge this possibility or to fully embrace it as an absolute is a form of bias.

People do whatever they can to "win" on these benchmarks, so the resulting code is often not idiomatic.

The recommendation is to write only idiomatic code that implements a certain algorithm without tricks: https://salsa.debian.org/benchmarksgame-team/benchmarksgame/...

Did you actually see programs that do not follow the guidelines or you're just stating this as what you would expect?


Templating Lua here is probably borderline really.

Seems like those Lua fasta programs should be removed.

What someone else considers to be "idiomatic", you might not consider to be "idiomatic".

Completely anecdotal:

- In my small benchmarks, Crystal is very fast
- In benchmarks not my own, Crystal is very fast
- Crystal has a very nice batteries-included stdlib.

Benchmarks that aren't mine: http://lh3.github.io/2020/05/17/fast-high-level-programming-...

The only other language that is high level in its ballpark is D.

Benchmarking strings is often benchmarking memory allocation speed or the GC.

If you manipulate strings day in day out in your workflow, the first thing you do is remove all those allocations and for what is left you use a memory pool.

The speedup could easily reach 20x in any language. It's actually one of the areas where Python or JS can beat statically typed languages; they have put a lot of effort into optimizing strings, while statically typed languages often leave that as an exercise to the reader.
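A minimal Go sketch of the two techniques mentioned above: removing per-iteration allocations, then pooling what is left. The function names are hypothetical, and the example makes no claim about the 20x figure; it only shows the shape of each approach: naive `+=` concatenation, a pre-sized `strings.Builder`, and a `sync.Pool` of reusable `bytes.Buffer`s.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"sync"
)

// joinNaive builds the result with +=, which allocates a new string
// (and copies everything so far) on every iteration.
func joinNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p + ","
	}
	return s
}

// joinBuilder computes the final size up front and writes into a single
// pre-sized buffer, so the loop itself does no further allocation.
func joinBuilder(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p) + 1
	}
	var b strings.Builder
	b.Grow(n)
	for _, p := range parts {
		b.WriteString(p)
		b.WriteByte(',')
	}
	return b.String()
}

// bufPool recycles buffers between calls, a simple form of the
// "memory pool" idea: repeated calls reuse earlier allocations.
var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

func joinPooled(parts []string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	for _, p := range parts {
		buf.WriteString(p)
		buf.WriteByte(',')
	}
	return buf.String() // String copies, so the buffer is safe to reuse
}

func main() {
	parts := []string{"a", "bb", "ccc"}
	fmt.Println(joinNaive(parts), joinBuilder(parts), joinPooled(parts))
}
```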

The article linked is doing much more than benchmarking string operations. I'd even argue that string allocations are about as minimal as possible with the algorithms used. The C version takes this a little farther, but each of the other implementations is pretty similar.

I don't disagree that benchmarks should be taken with a mound of salt though.

> It's actually one of the areas where Python or JS can beat statically typed languages; they have put a lot of effort into optimizing strings, while statically typed languages often leave that as an exercise to the reader.

I think this has more to do with writing the string handling functions in heavily-optimized C rather than in Python/JS than it does with memory tricks, but I'll happily accept correction.

Yes: low-level C, plus the GC retaining memory. The second part avoids many malloc/free calls that may be hidden in destructors in low-level languages.

Almost certainly. Iteration is around 40x slower in Python than in a compiled language, last I checked.

Sounds like another win for languages with an LLVM backend. Definitely excited to see this language grow, especially as it has generics already.

Does anybody use Crystal professionally here? Any thoughts so far if so?

We run it in production. Our apps are mostly Ruby but we have been rewriting services in Crystal. Originally we were attracted by the speed and type checks, but one surprising benefit has also been the reduced memory consumption. It's difficult to compare directly, but in some cases it consumes 10x less memory and performs around 20-35x better. Even I/O bound services are sped up since we can take advantage of Crystal's concurrent fibers (rumored to come in Ruby 3.0).

The main downside has been the somewhat frequent deprecations of methods and changes to the standard lib. But it's mostly due to the preparation to launch 1.0.

Is there any web framework for it, like Rails? The main reason I use Ruby is that, as a single developer, it's amazingly fast to set up new apps (admin panels, reporting panels, sales funnels, etc.). The out-of-the-box functionality of Rails makes this super painless. I'd LOVE to use something faster, but I don't really want to have to custom-write a lot of things (CSRF, sessions, cookies, blah blah).

Also, how do you handle jobs with Crystal? Sidekiq being the common one for ruby.

Hello! I'm the creator of the Lucky web framework https://luckyframework.org. I've been building it for about 3 years and we've got a number of people using it in production.

It still lacks some features found in bigger frameworks, but is nearing 1.0 and gaining many new contributors that are helping us fill in the gaps.

Feel free to hop on our chatroom to ask questions about it. We try to be super friendly and love answering questions and getting feedback https://gitter.im/luckyframework/Lobby

Awesome, thanks!! What would you say the top 2-3 missing things are currently?

Right now I'd say we need to make it easier to work with nested params so you can easily save Parent + (n) children. Easier handling of uploaded files is another big one. It's being actively worked on right now. There are a few more escape hatches that are needed when you need to break out of the framework. But overall it is fairly full featured.

You can check out our roadmap to 1.0 here: https://docs.google.com/document/d/1EYzx37Kq5h7iLH9SQTFyXNwb...

There are a couple but I think the most Rails-like are Amber or Lucky: https://amberframework.org/ https://luckyframework.org/

There's also sidekiq.cr from the same person who made Sidekiq for Ruby: https://github.com/mperham/sidekiq.cr

I just don't think Amber's or Lucky's ORMs are anywhere near as good / clear / concise as ActiveRecord.

But both are still at a very early stage of development. Is Amber still being developed?

Edit: Looks like Amber had a new release a few days ago, and GitHub got a new design?

Author of Lucky here. Would love to hear what you don't like about Lucky's ORM (Avram). It is not quite as feature complete as ActiveRecord, but in some ways it has more features (like being able to do more advanced queries in Crystal, e.g. `>`, `ILIKE`, etc.).

clear is also an ORM you could check out.


Kemal is Sinatra-like: https://kemalcr.com/

I really liked Amber; it's very similar to Rails. Lucky is more different, so there's a bigger learning curve coming from Rails.

Another downside is the long compile times.

I started my little company on it about 18 months ago. It's a single server running the main app and processing 7 figures of revenue.

My biggest gripes would be major API changes (like the JSON mapping), but that's expected with a young language and better in the long run.

Another problem is compilation time and compilation memory. Waiting a couple of seconds every time I make a change during development sucks, but it's not a big deal. I've had my deployment server fail deployment a couple of times because it ran out of memory when compiling.

A solution to compilation times is to split up your app when it gets huge, so that you have smaller apps to compile.

I don't use many external dependencies, not even for the http server. The stdlib is very good. I would probably be using OCaml instead if the stdlib were as modern and clean as Crystal.

Can you share the project?


Most of the code is on the admin side. I've basically been building something akin to shopify but for managing local orders and delivery logistics.

For what it's worth, I love the visual design of your site. It's stylish, distinct, straight-forward, and communicates purpose/process well.

The site looks cool. Pardon me if I sound doubtful, but I just wanted to clarify that you're making 7 figures from this? If so then my hat is off to you.

Yeah, that's what Stripe says :). We deliver a lot of cookies! I used to have our revenue public on IndieHackers but took it down after we hit $200k/mo.

Yes, we run several production services on it. Much like Go, it's built for the web, and its standard library will handle most things (files, HTTP, memory I/O, FFI to C, compression and image formats, etc.; it's quite extensive).

There's the likelihood that many future users will encounter it via a "from soup to nuts complete" web application framework like amber or lucky (ROR-like) that include scaffolding for models and migrations and such, or possibly with the sinatra/express'esque kemal framework that mostly concentrates on routing and exposing the request/response cycle, allowing you to inline middleware and pass the request context through your own stuff.

.ecr templating (syntactically similar to .erb templates) is built into the standard library and compiles to actual crystal code.

macros resolve to actual crystal code at compile time, so some of the ruby meta-programming requires a rethink.

While some folks eagerly await true parallelism, which I THINK was here already and then revoked temporarily, but I might be mistaken, I'm quite content to build distributed applications with Crystal, communicating over standard protocols or directly via some kind of socket (network and local sockets are also part of the standard library).

Its executables are fast, small, and don't use much RAM (12 megs is one anecdotal, totally context-less number I'll throw at you, lol...)

I use crenv for controlling the version churn. As someone else pointed out, they are ramping up to v1.0: they now claim to have largely finished breaking changes with 0.35.1 and are already prepping for 1.0.0 on the basis of where things stand now. I also tend to use local shards (specific versions) to ensure build reproducibility.

Any specific questions?

I'll repaste my questions from another comment, thanks so much:

Is there any web framework for it, like Rails? The main reason I use Ruby is that, as a single developer, it's amazingly fast to set up new apps (admin panels, reporting panels, sales funnels, etc.). The out-of-the-box functionality of Rails makes this super painless. I'd LOVE to use something faster, but I don't really want to have to custom-write a lot of things (CSRF, sessions, cookies, blah blah).

Also, how do you handle jobs with Crystal? Sidekiq being the common one for ruby.

Parent post calls out: amber & lucky.

> While some folks eagerly await true parallelism, which I THINK was here already and then revoked temporarily,

It is there, but not enabled by default. You need to add -Dpreview_mt to the command line to enable it.

I came across this article about one of the Crystal project's sponsors, Nikola Motor Company (NKLA), which this year is using Crystal for the dashboards on its electric trucks: https://manas.tech/blog/2020/02/11/nikola-motor-company/

LLVM is known for trading developer time (compile time) for runtime speed (code runs fast, the compiler doesn't). So these are generally the expected results, as any decent LLVM-based language should beat Go, considering Go makes the opposite trade-off.

I'd be interested to see these same benchmarks with the Go LLVM frontend.


I'm hoping to put a side project into production soon. Crystal the language is basically perfect; the biggest issue is that I often have to write my own shards, but every shard I write is a shard someone else won't have to write, and sometimes npm packages/gems can easily be converted. The performance and the type system working in ECR templates gave me confidence that I could write a higher-quality lowjs webapp with a better dev experience than my default stack, which was going to be an uphill battle: either getting TypeScript Frankenstein'd into Vue.js with a Laravel backend, or Frankensteining TypeScript into a Node.js library (I didn't find any TypeScript-first Node.js HTTP web servers that I actually liked _and_ that didn't require infinite yak-shaving just to get tsconfig, tests, Vue.js SSR, etc. working).

Nim uses GCC and normal, idiomatic Nim competes very well with Crystal in performance and memory usage.

These are microbenchmarks; they don't tell you much of anything about real world performance.

Except the compilation time, of course.

This is a pretty basic benchmark, but there are more robust ones for Crystal and others here: https://www.techempower.com/benchmarks/

Under Filters you can select the languages / frameworks that are of interest to you. For example Go seems to have several implementations that beat Crystal: https://www.techempower.com/benchmarks/#section=data-r19&hw=...

Having programmed in both though, I far prefer the language / idea of Crystal for innovation. But it is hard to beat the performance and explicit nature of Go for production applications.

The non-recursive Go fibonacci implementation computes in ~160ms and uses only 1.5mb on my MacBookPro (and I can get the binary size down to 850K by stripping symbols and using the println builtin instead of fmt.Println). This microbenchmark seems to tell us more about function call overhead than it does about typical application performance.

Further, while I don't doubt that Crystal binary sizes are smaller in general, I don't think we can get much information about binary sizes from these micro-toy-sized programs. For example, the Go version shaved off a full MB by using the `println` builtin rather than `fmt.Println`, probably because the latter pulls in more of the standard library (package fmt and its dependencies) than the former. These toy benchmarks are very sensitive to these kinds of things, and the size of the runtime is going to be a trivial portion of the size of any real application anyway.
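A hypothetical reconstruction of the non-recursive approach described above; the commenter didn't post their code, so `fibIter` and the input 90 are assumptions. It runs in O(n) time with no nested calls, and it uses the `println` builtin instead of `fmt.Println`.

```go
package main

// fibIter computes F(n) iteratively in O(n) time, with F(0)=0, F(1)=1.
// Unlike the doubly recursive version it makes no nested calls, so
// function-call overhead no longer dominates. int64 holds values up to F(92).
func fibIter(n int) int64 {
	var a, b int64 = 0, 1
	for i := 0; i < n; i++ {
		a, b = b, a+b
	}
	return a
}

func main() {
	// The println builtin writes to stderr and avoids linking package fmt,
	// which is part of why the binary comes out smaller. Building with
	//   go build -ldflags="-s -w"
	// additionally strips the symbol table and DWARF debug info.
	println(fibIter(90))
}
```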

Benchmarks don't matter. It's 2020. We use Stock Price Driven Development now.

The main reason to use Swift is Apple created it.

The main reason to use Go is it's the first two letters of Google.

The main reason to use Python is Google hired GvR in the 2000s and created momentum.

The main reason to use Kotlin is Google announced at I/O 2017 its support in Android Studio.

Get with the times, gramps.

The only time I've ever heard anyone assert this is in posts like this where you lambast everyone but yourself for being an idiot.

Wow. No.

Engineering school was/is supposed to help us (you) move beyond cynical judgements to thinking.

I couldn't care less whether a statement is positive or negative; what matters is whether the statement has a realistic possibility of being true.

If anything Engineering school should have taught you how to logically observe situations rather than viewing things through an optimistic or cynical lens.

The HTTP benchmark is not fair, since the Crystal implementation sets the content type explicitly while the Go implementation auto-detects it.

Cards on the table: I'd expect Crystal to outpace Go in general. Go is a "fast language" on the general landscape, but as compiled languages go, it's on the slower side. So this isn't a partisan-driven response. (I'm broadly in favor of the entire landscape of newer languages and look forward to the success of many of them and more.)

That said, these two benchmarks straddle both sides of "not very useful". The first is more likely testing function calls than any sort of what we usually consider "performance", and the benchmark times suggest to me that what you're seeing there is a one- or two-instruction/cycle difference in the function boilerplate between the two languages. In the vast majority of programs, function call time is far from your biggest problem; usually you're writing functions that are much, much larger than the function call overhead.
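One way to see why the recursive benchmark mostly measures call overhead is to count the calls. A minimal Rust sketch, assuming the base cases fib(0) = fib(1) = 1 used by the Rust version later in the thread: the naive recursion makes 2 * fib(n) - 1 calls, so fib(47) performs roughly 9.6 billion calls while doing at most one addition per call.

```rust
/// Naive recursive Fibonacci (fib(0) = fib(1) = 1), instrumented to count
/// how many times the function is entered.
fn fib_counted(n: u64, calls: &mut u64) -> u64 {
    *calls += 1;
    if n < 2 {
        1
    } else {
        fib_counted(n - 1, calls) + fib_counted(n - 2, calls)
    }
}

/// Closed form for the number of calls: 2 * fib(n) - 1, with fib(n)
/// computed iteratively so no recursion is needed for large n.
fn call_count(n: u64) -> u64 {
    let (mut a, mut b) = (1u64, 1u64); // a = fib(i), b = fib(i + 1)
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    2 * a - 1
}

fn main() {
    let mut calls = 0;
    let value = fib_counted(20, &mut calls);
    println!("fib(20) = {} took {} calls", value, calls);
    println!("fib(47) would take {} calls", call_count(47));
}
```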

On the other side, as a language benchmark, testing the entire HTTP server stack is way too big. There are just too many software engineering decisions involved in writing an HTTP server, trading performance against correctness, for that to be a fair language test. The autodetection of content type may be the dominant factor today, but it's only one example of a whole class of decisions that can legitimately be made in various ways, plus all the other software, from the OS on up, that you're also benchmarking implicitly.

(I'm emphasizing "language" because benchmarking "the simplest, fastest request this web environment can give me" is a useful bit of information, though I think it's often greatly overemphasized. Once the framework's time per request (usually a better way to think about it than requests per unit of time) is substantially less than the time your custom handlers take, it stops mattering whether it's .1% or .06% of your web request time. It's a useful data point, but only a small one about the web stack, among a large pile of other considerations. It's also not a great benchmark of the language's performance, because of the aforementioned substantial, legitimate differences in structure and safety/performance tradeoffs, which can dominate the language differences.)
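The time-per-request framing is simple arithmetic. A sketch with hypothetical numbers (a bare framework path at 100,000 vs. roughly 166,667 requests per second, in front of a 10 ms handler), treating throughput as one worker's serial rate. The framework's share comes out to about 0.1% and 0.06% respectively:

```rust
/// Per-request cost in microseconds for one worker serving requests serially.
fn micros_per_request(requests_per_sec: f64) -> f64 {
    1_000_000.0 / requests_per_sec
}

/// Share of a request's total time spent in the framework's own code path.
fn framework_share(framework_us: f64, handler_us: f64) -> f64 {
    framework_us / (framework_us + handler_us)
}

fn main() {
    let handler_us = 10_000.0; // hypothetical 10 ms of application work
    for &rps in &[100_000.0, 166_667.0] {
        let fw = micros_per_request(rps);
        println!(
            "{} req/s -> {:.1} us/request, {:.3}% of a 10 ms request",
            rps,
            fw,
            100.0 * framework_share(fw, handler_us)
        );
    }
}
```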

Indeed, the benchmark was dismissed on Reddit a couple of days ago: https://www.reddit.com/r/programming/comments/h0knmi/go_vs_c...

The repost[0] in r/golang shows similar critiques. This perf comparison is naive on many levels.

- binary size (static vs. dynamic linking)

- recursion is not idiomatic in Go; it is assumed that you write imperative for loops.

- mathematical functions like the Fibonacci sequence are a rather atypical computation use-case for a Go program (I don't know about Crystal). Tree/graph traversal/mutations would be a more fitting test. Or generally something that is composed of dynamically growing and shrinking slices and maps.

- HTTP test apparently cannot be reproduced; some get better results for Go, some for Crystal.

- HTTP tests without some parsing/marshaling/serialization or something along those lines aren't that useful. You usually want to either read or send some JSON string or similar.

[0] https://www.reddit.com/r/golang/comments/h0kogq/go_vs_crysta...
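As a rough illustration of the kind of workload suggested in the list above, here is a hypothetical Rust sketch (not taken from the Reddit thread) that grows a graph in a HashMap of Vecs and then traverses it breadth-first. The xorshift generator only keeps the graph deterministic without external crates:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Builds a deterministic pseudo-random directed graph stored as an
/// adjacency map; every node gets `edges_per_node` outgoing edges.
fn build_graph(nodes: u32, edges_per_node: u32) -> HashMap<u32, Vec<u32>> {
    let mut graph: HashMap<u32, Vec<u32>> = HashMap::new();
    let mut state: u64 = 0x2545_F491_4F6C_DD1D; // arbitrary non-zero seed
    for node in 0..nodes {
        for _ in 0..edges_per_node {
            // One xorshift64 step.
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            let target = (state % u64::from(nodes)) as u32;
            graph.entry(node).or_default().push(target);
        }
    }
    graph
}

/// Breadth-first search: number of nodes reachable from `start`,
/// including `start` itself.
fn reachable(graph: &HashMap<u32, Vec<u32>>, start: u32) -> usize {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(start);
    queue.push_back(start);
    while let Some(node) = queue.pop_front() {
        if let Some(neighbors) = graph.get(&node) {
            for &next in neighbors {
                // insert returns true only the first time a node is seen.
                if seen.insert(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    seen.len()
}

fn main() {
    let graph = build_graph(100_000, 4);
    println!("reachable from 0: {}", reachable(&graph, 0));
}
```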

Recursion is really not the idiomatic way to solve most things in Crystal either, not that it makes the test relevant.

That said, "was dismissed on Reddit" is one of the less convincing arguments.

Well, it expands into "was dismissed elsewhere, so go there to see if the arguments there are convincing."

Why doesn't Crystal use more than 100% CPU like Go does, if the server has 8 cores? Why didn't both languages max out the cores? Is it because the load generated with those wrk parameters was within what one core could handle in Crystal?

It seems crystal is single threaded by default. Although you can configure it to use more threads.


Yeah, but in a benchmark it seems strange that he doesn't either restrict both to one CPU or let both use all cores. It kind of compares apples to oranges.

I've taken Crystal out for a spin with a number of popular and less popular frameworks. Even for my small test application, edit/compile/run times were slow. Only the thinnest frameworks like Kemal seem tolerable to me.

I really do hope that the compiler gets faster and the larger frameworks figure out how to compile faster. These are the benchmarks that matter to me before I'd make a recommendation.

My understanding is that the compiler performance issues in Crystal are sort of inherent to the language design and making it much faster will be very hard. I believe it's related to Crystal's type inference -- it has to traverse every code branch to ensure type safety. I believe explicitly typing everything is supposed to help, but I'm not sure by how much (especially if you have dependencies that aren't doing that).

Maybe LLVM as well.

Here is a gist where I compare Crystal, Go, and OCaml, including compile time: https://gist.github.com/s0kil/155b78580d1b68768a6c601a66f8e2...

Why did you combine compile time and execution time?

The idea is to have a balance of compilation + runtime performance.

Development I suppose

So Crystal is trying to beat C++?

I did this in Java on JDK 8 and it took 6 seconds.

I wonder how the GC pause times and throughput compare. My understanding is that Go sacrifices some compute performance for better GC pause times.

> better GC pause times

What does that mean exactly / in this case?

My assumption is that "better" means more predictable. Or does it mean straight up fewer/shorter?

Rick Hudson gave this excellent keynote on some of the "recent" (as of 2018) improvements and guiding principles for Go's GC, if you're interested in the nitty-gritty. https://blog.golang.org/ismmkeynote

He outlines the SLOs for the garbage collector, with the motivation that these SLOs impact the SLOs/time budget of every application built with Go. The key metrics are "share of CPU used for GC", heap sizes, latency and frequency of pauses, and scaling favorably with goroutine allocations.

Go sacrifices basically every other GC metric to minimize average pause times. Sort of an "if you can't succeed, redefine success" mentality, if you will.

That's not necessarily a bad thing for writing web services. I used to spend a fair bit of time tuning JVM to achieve acceptable tail latencies for Java services, not to mention once in a while, they would need to be tuned again... Not so with Go.

Java now has two low latency collectors: ZGC and Shenandoah.

The Go GC and the JVM GCs are built within very, very different constraints.

Since most short-lived allocations in Go are stack-allocated, adding a read barrier and a branch on every heap access would cost more than the potential gain from bump-pointer allocation.

In Go's case, it means that GC pauses are shorter because it uses a concurrent mark and sweep algorithm.

It seems strange that Crystal would not utilize more than 100% of the CPU in the HTTP benchmark given that it ran on an 8-core machine and with fibers should be able to split the load across multiple cores. What explanation is there for that? Also, it's impressive that while using one third of the CPU Crystal outperformed the Go HTTP server on throughput.

Because Crystal is currently single-threaded. You only get multithreading if you compile with -Dpreview_mt. I suppose it will be on by default at some later point.

Might binary size comparisons be misleading, unless the Crystal binary is statically linked, as presumably the Go binary is?

Anybody knows the status of light-weight threads, channels and automatic multiplexing ("mapping") of light-weight threads on operating system threads in Crystal?

(This has been promised for Crystal (as opposed to pretty much any language other than Go), but has always seemed to be "yet to come".)

It's available https://crystal-lang.org/reference/guides/concurrency.html. Fibers (like goroutines) are scheduled by Crystal and map to system threads.

The work on parallelism is available as a compile-time flag, but not yet GA: https://crystal-lang.org/2019/09/06/parallelism-in-crystal.h...

One wonders, once it is GA, if there is value in a Go transpiler. There is a lot of useful software in Go land that would be immediately useful for Crystal devs, and Go itself is not too complicated to map to another language with the same features (a few of the runtime features will have to be emulated).

How would this be achieved? Could you expand a little on this idea?

Literally convert Go code to Crystal code. I have not looked into Crystal enough to confirm its features are a superset of Go's. For example, I saw that Kotlin, with the advent of its coroutines, had most of Go's features, so I wrote a transpiler[0] that got pretty far (it can run all of this[1]). I abandoned the project because I have abandoned the JVM.

[0] https://github.com/cretz/go2k
[1] https://github.com/cretz/go2k/tree/master/compiler/src/test/...

an IR interface, lol


I always find these benchmarks superficial. Unless you're doing some serious number crunching, they don't make sense. People need to start benchmarking languages on speed of development, developer ergonomics, error reporting, ease of deployment, etc.

Tangent: would technical writers please stop using the phrasing “is X times smaller” when “is one Xth the size” is more accurate? I had to read the reference to “Crystal’s binary size is 5 times smaller than Go’s” twice to realize it was 0.2x and not 5x.

I don’t get it. “5x smaller” and “0.2x the size” are the same.

Can you explain then what do you expect to be the meaning of “two times smaller”?

Clarity of language for one thing, but also that "times" is indicative of multiplication so "five times" anything has to be bigger[1]. One fifth is far clearer as there's no way to interpret that other than being a part of the size of the reference, ergo, smaller. Is it impossible to understand? No, but it's poor style for technical writing.

[1] I'm ignoring the case of decimal math since the reference in this case is always the whole integer 1.

I don’t know under which rock you’ve been hiding but “x times faster/smaller” has been in common use for decades, including technical writing. Everyone understands it as 1/X, there is absolutely no confusion.

I think this takes the cake for the most pedantic HN comment I have ever read.

If a thing has size x it has smallness 1/x (so the bigger the thing, the less smallness it has). When you say one thing is 5x smaller than another, it means the smallness of the first is 5 times the smallness of the second. :P

Without having done any of my own research, I’m initially skeptical of Crystal’s binary size. I initially saw something similar with Swift, but that’s just because the runtime is external to the binary.

Now compare Crystal with Nim.

I want to say this: https://framework.embarklabs.io/news/2019/11/18/nim-vs-cryst... was published on HN at one point. Regardless, it's a decent rundown.

Oh wow I expected Nim to be at least as fast as Crystal.

Nim's JSON parser is not optimized; it was mostly written for maintainability (for example, it allocates a Table per node in the JSON file).

I.e. the difference in speed here is a difference in elbow grease.

I have yet to run into an optimization problem where Nim cannot reach the speed you can achieve in C.

Interesting. But more important is how productive can I be and how expansive is the ecosystem? What’s the developer experience like? Is the community welcoming?

I suggest the word femtobenchmark for this kind of work.

Consider using wrk2 in order to avoid coordinated omission.
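Coordinated omission, briefly: a closed-loop load generator waits for each response before sending the next request, so when the server stalls it simply stops sending, and the requests that would have queued up during the stall never get timed. wrk2 instead targets a constant request rate and measures latency from when each request was supposed to be sent. A simplified Rust simulation of that difference, assuming one connection, a server that handles requests one at a time, and made-up service times:

```rust
/// What a closed-loop tester records: each request is timed from when it
/// was actually sent, so waiting behind a stall is never counted.
fn closed_loop_latencies(service_us: &[u64]) -> Vec<u64> {
    service_us.to_vec()
}

/// Latency measured from each request's intended send time at a constant
/// rate (one request every `interval_us`), as a constant-rate tester does.
fn intended_start_latencies(service_us: &[u64], interval_us: u64) -> Vec<u64> {
    let mut done_at = 0u64; // when the server finishes the previous request
    service_us
        .iter()
        .enumerate()
        .map(|(i, &s)| {
            let intended = i as u64 * interval_us;
            let start = intended.max(done_at); // queue behind a slow request
            done_at = start + s;
            done_at - intended
        })
        .collect()
}

fn main() {
    // Hypothetical service times in microseconds, with one 10 ms stall.
    let service_us = [100, 100, 10_000, 100, 100, 100];
    println!("closed loop:    {:?}", closed_loop_latencies(&service_us));
    println!("intended start: {:?}", intended_start_latencies(&service_us, 1_000));
}
```

With these numbers the closed-loop view records a single slow sample, while measuring from the intended send times shows the three requests queued behind the stall taking 9.1, 8.2 and 7.3 ms.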

Has anyone compared Crystal and Rust on similar lines?

Did a Rust build with the following...

    // fibonacci(47) = 4807526976, which overflows u32, so use u64
    fn fibonacci(n: u64) -> u64 {
      match n {
        0 => 1,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
      }
    }

    fn main() {
      println!("{}", fibonacci(47));
    }

size: 2.67kb time: 7.391s

I think Rust wins :-D

--- edit: not sure if I used valgrind/massif right, but going by the largest values in the output file, I think this means it stayed under 2k of RAM?

For the web server: a 3.6mb file using hyper, not the leanest option but one of the more popular ones.

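    // Assumes the synchronous hyper 0.10 API, which these imports match.
    extern crate hyper;

    use hyper::Server;
    use hyper::server::Request;
    use hyper::server::Response;

    // Called once per request; replies with the client's IP address.
    fn hello(req: Request, res: Response) {
        let response = format!("Hello from {}!", req.remote_addr.ip());
        res.send(response.as_bytes()).unwrap();
    }

    fn main() {
        // Port taken from the wrk command below.
        Server::http("0.0.0.0:3000").unwrap().handle(hello).unwrap();
    }

Output from wrk -t8 -c400 -d60s http://localhost:3000/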

    Running 1m test @ http://localhost:3000/
      8 threads and 400 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   173.31us   88.10us   0.98ms   65.18%
        Req/Sec    35.06k    13.46k   45.48k    74.99%
      8383194 requests in 1.00m, 775.50MB read
      Socket errors: connect 0, read 0, write 229, timeout 0
    Requests/sec: 139487.00
    Transfer/sec:     12.90MB

According to top, rust_http is using about 2.8mb ram, 630% cpu, and wrk is using 700k and 370% cpu.

Granted, I'm on completely different hardware, but the thing definitely screams...

For reference: AMD Ryzen R9 3950X, 64GB DDR4@3200(CL18) under WSL2-Ubuntu (Windows 10 insiders)

Wouldn't processing time depend on your CPU, and unless you run the benchmarks in the article, this comparison would be meaningless?

I don't have a Go environment set up, but I could do that. The executable size and memory use were also lower.

Well to be honest, Go binary sizes are always going to be big, to the benefit of ease of development and deployment.

Memory usage, I don't know. The Fibonacci benchmark code was a bit... shit.

Do you want to send over your Rust code in a Gist or something, and I can compile and compare?

It's in the comment above...

It would be cool to see. Crystal-lang isn't finished yet tho. Maybe it's a little too early for that?

I'm still waiting for someone to write an LLVM backend for Go.

gollvm? vs. gccgo. That's why I said "an IR interface" when someone asked about transpiling Crystal to Go and vice versa.

It’s always interesting that when people compare Go to other languages, they always use binary file size, as if that’s something anyone in modern professional engineering considers.

Another good example would be to statically cross-compile a non-contrived program into ARM64, 32-bit linux, and Darwin without needing google!

And if you think you have your binary statically linked, go test it in the “scratch” docker image. You may be surprised at how difficult it really is.

Sure we do. I have spent the last week trying to upload 10 MB files over a quite slow uplink, and yes, in Europe, in a suburban area of a German town.

First reason why people uninstall apps on mobile devices is app size.

Are mobile apps the target market for Go? CI/CD takes care of upload for the dev side at least.

It would be handy to have a compile time flag indicating the desired file size optimisation level still, with tradeoffs being ease of debugging, and performance.

Given that gomobile exists, maybe.

CI/CD is something almost unknown outside HN bubble.

Also, those were just two examples; here are two more: the cost of production for USB armory keys running bare-metal Go, or download costs for WebAssembly modules written in TinyGo.

Are you sure re: CI/CD? I was looking around the job market recently, and every role was interested in my experience in it, since they were using it too. To be honest, that could be self-selecting, as my resume likely attracts companies similar to ones I have worked at. I still do hear about microservices, CI/CD a whole lot in job descriptions etc, but whether they are already practicing it (well), is another question...

I was going to suggest TinyGo! I mainly work on a backend system, so I guess we have different priorities and needs. The embedded work I have done has all been C/ASM, though it could be fun to revisit it with Go.

In terms of sheer numbers, I would guess that at least a plurality of websites are running old versions of PHP and are released via FTP. But I don't think that's really meaningful or interesting to worry about, since those services aren't even considering the tooling decisions we're talking about.

I still get to work with customers whose process is to compile in the IDE and manually copy a zip file into whatever environment they need to deploy to.

Companies whose main business is not selling software don't care much about whatever the best practices in software engineering are.

AFAIK, if you deploy an app to an AWS auto-scaling group, then bundle size can very much impact your cold boot times, and therefore scaling agility.

Technically true, but you're talking about sub-second figures that have no meaningful impact compared to the long and highly variable process of autoscaling: booting a server and waiting until it passes a health check before it's added to the load balancer.

If you were to attempt to optimise around that, or around any other argument such as ingress to S3, you would quickly find that you'd never, in the lifetime of that codebase, make the dev cost back in tangible returns to your business.
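Whether bundle size matters depends heavily on the link, and the arithmetic is easy to check. A sketch with hypothetical bandwidths, ignoring protocol overhead and latency: the same 10 MB artifact takes 0.08 s over a 1 Gbit/s data-center link but 80 s over a 1 Mbit/s uplink.

```rust
/// Seconds to move `size_mb` megabytes over a link of `mbit_per_s` megabits
/// per second, ignoring protocol overhead, latency and congestion.
fn transfer_seconds(size_mb: f64, mbit_per_s: f64) -> f64 {
    size_mb * 8.0 / mbit_per_s
}

fn main() {
    // Hypothetical links; only the ratio between them matters here.
    for &(label, mbit) in &[("1 Gbit/s data-center link", 1_000.0), ("1 Mbit/s slow uplink", 1.0)] {
        println!("10 MB over {}: {:.2} s", label, transfer_seconds(10.0, mbit));
    }
}
```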

Yes I mostly agree.

Also I suspect a lot of the Go executable size comes from "initial" runtime overhead that does not scale linearly with the project size.

Rapid scaling in a "Superbowl ad" or similar situation is affected heavily by cold boot time, which, as you point out, is a result of many factors, including your platform. If you use something like Lambda, your bundle size plays a much bigger role than if you fire up a new EC2 instance.

Given Go was developed by Google to solve Google problems I would err on the side of no.

If your entire binary fits in L1 cache, you're going to realize significant performance improvements over a 30 MB binary. It's just that simple.

Does that include a dynamic linked library?

Yup. To absolutely maximize performance, you need none of your code paths to swap your binary out of L1. Of course, it's not the end of the world if one of them does (an occasional call to a large dynamic lib isn't too bad) but a small binary has important performance implications that most developers don't internalize.
