Comparing Pythagorean triples in C++, D, and Rust (atilanevesoncode.wordpress.com)
135 points by atilaneves 9 months ago | 116 comments

By the way, is it a secret to programmers that Pythagorean triples can be parametrised?

I understand that the point of the code is to enumerate them without knowing that they can be parametrised, but the parametrisation in this case is not so scary, so it's a fun bit of trivia for programmers (for mathematicians, it's more fundamental, as the Pythagorean parametrisation is an important prototype for counting rational points on elliptic curves, a fundamental problem in number theory).


Slightly less easy or well-known amongst mathematicians (or at least, this was new to me): you can enumerate all primitive Pythagorean triples in a tree!


This tree is really interesting! I wonder where those matrices come from. The proof that it works is a boring mechanical check, but I'm now interested in how they were derived in the first place. This is a 20th century discovery, unlike the parametrisation which comes from antiquity. I think I'll be reading some papers about this.
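A minimal sketch of the tree enumeration, using the three matrices usually attributed to Berggren (1934) and Barning (1963). Each matrix maps a primitive triple to a distinct primitive child triple, and starting from (3, 4, 5) the walk covers every primitive triple exactly once:

```rust
// The three Berggren/Barning matrices. Applying each one to a primitive
// Pythagorean triple yields another primitive triple, and the resulting
// ternary tree rooted at (3, 4, 5) contains every primitive triple once.
const MATS: [[[i64; 3]; 3]; 3] = [
    [[1, -2, 2], [2, -1, 2], [2, -2, 3]],
    [[1, 2, 2], [2, 1, 2], [2, 2, 3]],
    [[-1, 2, 2], [-2, 1, 2], [-2, 2, 3]],
];

// Multiply each matrix by the triple (as a column vector) to get the
// three children of a tree node.
fn children(t: [i64; 3]) -> [[i64; 3]; 3] {
    MATS.map(|m| m.map(|row| row[0] * t[0] + row[1] * t[1] + row[2] * t[2]))
}

fn main() {
    let root = [3, 4, 5];
    for child in children(root) {
        let [a, b, c] = child;
        // Sanity check: each child is itself a Pythagorean triple.
        assert_eq!(a * a + b * b, c * c);
        println!("{:?}", child);
    }
}
```

The children of (3, 4, 5) come out as (5, 12, 13), (21, 20, 29), and (15, 8, 17); recursing on each gives the rest of the tree.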

This was useful to me when doing Project Euler exercises, but I wouldn't know how to prove this generates all triples.

The basic idea I like is, take x^2 + y^2 = z^2, but divide by z^2 to get (x/z)^2 + (y/z)^2 = 1. Now you're looking for the rational points on the unit circle. An important observation is that any two rational points define a line with rational slope. You know of one such rational point on the unit circle, say, (-1, 0). Now you can invert this process and pick a line with rational slope that goes through (-1,0). All lines with rational slope going through (-1,0) will hit another rational point on the circle, so you've found all rational points on the circle by this method, which include all primitive Pythagorean triples.

When you work out this math, you recover the classical parametrisation, where the slope m/n of the line corresponds to the parameters, and this line-intersection method is a good inspiration for how to do the same with some higher-order curves such as elliptic curves, where it leads to the group law on them.
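Working that out gives the classical recipe: every primitive triple is (m^2 - n^2, 2mn, m^2 + n^2) for coprime m > n of opposite parity. A small sketch (the helper names are mine):

```rust
// Euclid's algorithm, used to keep only primitive triples.
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

// Enumerate primitive triples (m^2 - n^2, 2mn, m^2 + n^2) for
// m > n >= 1 with gcd(m, n) = 1 and m, n of opposite parity.
fn primitive_triples(max_m: u64) -> Vec<(u64, u64, u64)> {
    let mut triples = Vec::new();
    for m in 2..=max_m {
        for n in 1..m {
            if (m - n) % 2 == 1 && gcd(m, n) == 1 {
                triples.push((m * m - n * n, 2 * m * n, m * m + n * n));
            }
        }
    }
    triples
}

fn main() {
    for (a, b, c) in primitive_triples(5) {
        // Each generated triple really satisfies a^2 + b^2 = c^2.
        assert_eq!(a * a + b * b, c * c);
        println!("({}, {}, {})", a, b, c);
    }
}
```

For example, m = 2, n = 1 gives (3, 4, 5), and m = 3, n = 2 gives (5, 12, 13).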

There is a paper by Josef Rukavicka called "Dickson's Method for Generating Pythagorean Triples Revisited"[1]. It is a very nice proof without words. Based on it, the Project Euler solution was almost a one-liner in Prolog.

[1] https://www.ejpam.com/index.php/ejpam/article/view/1844

Also, Fibonacci numbers and Binet's formula


I used the Dickson method just naively and it seems to be about 10x faster than what they did here at generating 1000 triples.

Last time I bothered with calculus was around 1996, so yeah it isn't something that sticks around.

This doesn't have anything to do with analysis-like techniques, and is closer to combinatorial/number-theoretic stuff that programmers are more likely to be familiar with.

Which in my case happened to be that same year.

The only math stuff I kept up to date on was denotational semantics and linear algebra, due to compiler and 3D papers that I regularly read, which is anyway not something any of my peers bothers to do.

Those of us who have compute-bound workloads (scientific simulations in my case, but this often comes up in gamedev and elsewhere) often complain about how unusably slow debug builds impede our workflow in C++. One fairly common solution to this in extreme cases is sticking to a mostly C-like subset of the language or just using C. I don't have much experience with Rust, but judging by the results here, debug builds in Rust are effectively useless, even compared to C++. How do Rust programmers deal with this? Do you just give up and resort to printf debugging, or are there ways to improve the situation?

In my work (text search), if I'm debugging, then the vast majority of my time is spent debugging performance problems. In those cases, it only makes sense to debug with all optimizations turned on. If I'm debugging logic problems, which is a bit more rare, then I've found debug builds to be fast enough in most cases. When they aren't, then I turn on release mode. Incremental compilation mostly makes compile times bearable, but not always.

I suspect the experience here will differ quite a bit depending on what you're working on and what your preferred workflow looks like.

> Do you just give up and resort to printf debugging

FWIW, I've been printf debugging since I started programming, regardless of the language I use. To be fair, if I spent more time in a predominantly memory unsafe language, then I'd probably use a debugger more.

I mostly use printf. A debugger is only good for telling you where it seg faulted and the stack trace.

The reason is that I write a custom printf to print exactly what I need to know. Debuggers just bury you in irrelevant output.

Debuggers are great at interrogating arbitrary program state (it's a printf you don't need to hardcode!) and at acting as a REPL for trying out new code.

I'm with Walter on this one. He's right. When I printf-debug, I very much do not want to necessarily see the raw in-memory details. I want to see a pretty-printed view. For example, right now I'm working with DFAs, and if I just printed out a transition table as it is in memory, it would be unreadable. Instead, I have a custom fmt::Debug impl that pretty prints something I can read and comprehend.

I don't think I'd say this is the only reason I use printf instead of a debugger, but it's definitely a compelling one.
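As a hypothetical illustration of that kind of custom fmt::Debug impl (the tiny DFA type and its layout are invented for the example, not taken from any real codebase):

```rust
use std::fmt;

// A toy DFA over a binary alphabet: transitions[s][b] is the state
// reached from state `s` on input bit `b`.
struct Dfa {
    transitions: Vec<[usize; 2]>,
}

// A custom Debug impl that renders one readable line per state,
// instead of dumping the raw nested vectors.
impl fmt::Debug for Dfa {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (state, row) in self.transitions.iter().enumerate() {
            writeln!(f, "{:>4}: 0 => {}, 1 => {}", state, row[0], row[1])?;
        }
        Ok(())
    }
}

fn main() {
    let dfa = Dfa { transitions: vec![[0, 1], [1, 0]] };
    println!("{:?}", dfa);
}
```

Anything that prints the type, whether `println!("{:?}", ...)` or a debugger calling into the same impl, then gets the readable view.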

> I very much do not want to necessarily see the raw in-memory details. I want to see a pretty printed view [...] I can read and comprehend.

This isn't an argument in favor of printf over debuggers. It's an argument in favor of making debuggers not suck!

Visual Studio has decent watch-window formatters for STL containers. Vectors and Lists are easy. But it also has nice views into unordered_map and unordered_set. Visual Studio has a .natvis file format for adding custom debugger displays for custom datatypes [1].

If you're broken into the debugger and all threads are paused then a good debugger should be able to display your data however you like. Hell, it could have different display options to choose from if you really wanted!

I also find myself regularly relying on printf to debug. But I view this as a failing of the debugger, rather than the superiority of printf.

[1] I think the natvis format kinda sucks and is not sufficiently powerful. But that's a separate issue.

I don't use C++ and I don't use Visual Studio. And even if I did, it would be pretty annoying to have to define debugger specific files just to print my data types. In Rust, I just add a fmt::Debug impl and now everyone who uses my code benefits from it, whether in a debugger (by calling that impl) or by print-debugging.

I'm not here to convince anyone to use printf debugging. I don't care about some grand "argument" in favor of it over something else. What I care about are the tools available to me, and the most effective way to debug has historically, for me, been to use printf.

This does not mean I only use printf. This does not mean I hate debuggers. This does not mean that I think debuggers are useless. This does not mean that I think debuggers couldn't or shouldn't be improved. This does not mean that I don't use profilers when debugging performance issues. All it means is that my tool of choice for everyday debugging is printf. It is convenient for me on a number of different dimensions.

That there could exist a theoretically better tool sounds like a great reason for someone to go out and build something. But that someone isn't me, at least not right now.

I'm familiar with your work. I use ripgrep daily, thanks!

> I mostly use printf. A debugger is only good for telling you where it seg faulted and the stack trace.

That was Walter's statement, with which you agreed, and with which I strongly disagree.

Maybe I overgeneralized. The Rust debugger story indeed sucks and I mostly use printf! I interpreted "a debugger is only good for..." to be all debuggers. And I don't think that's true. Which is why I gave a Visual Studio/C++ example.

2019 won't be the year Rust's IDE story gets good. But maybe 2020? People are laying the groundwork. I'm hopeful.

> 2019 won't be the year Rust's IDE story gets good. But maybe 2020? People are laying the groundwork. I'm hopeful.

I'm very hopeful 2019 will see some major improvements in that domain. Some IDEs such as Qt Creator just added support for the language server protocol [1] which already supports Rust. Thus the critical groundwork is already being deployed, albeit some work still needs to be done.

[1] https://langserver.org/

Just to share another POV (I hear that you're not using MSVC so much of this may not be useful to you):

For me, it's annoying to have to modify my code and recompile / relink / redeploy / re-repro (assuming I even have a good repro case) just to inspect my data - linking alone can take over a minute for some projects I work on, nevermind the other steps! Meanwhile, changes to MSVC project natvis files hot reload, even if I'm looking at a crashdump for a coworker's computer for a bug that only happens every other full moon while rhyming off-key in Swedish. For some third party libs I may not even have the source code available to modify, but I can still sometimes write natvis files for their types. It's a little duplicate effort, sure, but I'll probably finish adding a new type to a natvis file before I'll finish relinking my project in a lot of cases. https://github.com/rust-lang/rust/blob/master/src/etc/natvis... , while perhaps a bit arcane if you don't know natvis (there are docs), and verbose on account of being XML, really isn't all that much XML for a couple of debug visualizers.

I consider debugger info important enough that even though I'm not using Rust in production, I did write one rustc patch to auto-embed stdlib natvis files into pdbs (although those won't hot reload): https://github.com/rust-lang/rust/pull/43221 . There are gdb scripts I'd be improving if I were debugging rust with gdb instead. Many script debuggers can take advantage of "debug" formatters defined in code, which is a nice option to have too, so it doesn't have to be all one or the other. I'm not aware of any debuggers that leverage Rust's fmt::Debug traits sadly.

I'm not necessarily knocking printf debugging. I use it and things like it sometimes. Especially if I have a harder problem that needs more code to diagnose and is making me run into the limits of the debugger. Memory leak tracking, cyclic refcounted pointer detection, annotating memory regions to be included in crash dumps, explicitly annotated profiling information, etc. - things that tend to involve more permanent systems. Sometimes you can write a debug script for these things, but doing it directly in code can be faster to write and to execute.

I will say: If your debugger isn't at least capable (with a little investment) of being good at inspecting arbitrary program state, it's not a very good debugger.

For example, when I debug the compiler, I'll often need the AST printed. Printing out standard containers doesn't do that. And sometimes I need the AST printed in different ways.

On the Visual C++ team, we use natvis to show FE AST nodes. It works pretty well with a couple hours of investment in writing the natvis.

I know that one can write custom pretty-printers for debuggers. But I like having the pretty-printers in the program itself. After all, I develop simultaneously on many diverse platforms.

With current gdb (i.e. released this decade), you can define pretty-printers in Python that are loaded automatically and print whatever you think is most important for a given type; the "print /r" command is available to print the raw details when necessary.

libstdc++ ships with pretty-printers for its types.

But I agree that printf debugging still has its uses.

I mean, why can't you just call that in the debugger when you need it?

Why spin up a debugger when I can just printf it? :-)

As others have mentioned, how do debuggers fare on optimized builds? Most of my time "debugging" is specifically spent on optimized builds looking at performance issues.

> Why spin up a debugger when I can just printf it?

Xcode launches into a debugger by default, so it's not really an extra step for what I usually do.

> As others have mentioned, how do debuggers fare on optimized builds?

Not well, if you are planning to have variable names and stepping work correctly.

> Most of my time "debugging" is specifically spent on optimized builds looking at performance issues.

Sounds like a job for a profiler?

> Sounds like a job for a profiler?

Consider the similarities between profilers and printf debugging; both of them run your code, and spit out some kind of log, whereas debuggers stop your code in the middle of execution. Workflow wise, they're pretty much the same, even if their objectives are a bit different.

> Sounds like a job for a profiler?

Sometimes yes. Sometimes no.

Borland even used to have a few marketing ads about JIT debugging, by making use of Dr. Watson infrastructure on Win16.

A debugger is most useful when you don't know what the program does, or how it does what it does. It is amazing for learning a new codebase.

But for actually debugging, a log file is better.

A good debugger is good for quite a bit more than that. Watches, break points, performance/profiling. I actually can't believe you're hating on debuggers.

> Watches

> breakpoints

    if (condition) assert(0);
then I use the debugger to tell me how it got there.

> performance/profiling

I use a separate tool for profiling:


It's built into the DMC++ and DMD compilers.

Sometimes I mess up the code a bit filling it up with debug code, but when I finally fix it it's git to the rescue.

I can be old fashioned when it comes to IDEs, but git really is a marvelous, paradigm-changing advance.

I like debuggers for things like examining variables from higher stack frames after a highly conditional break (where printf from every higher stack frame would be lost in noise), for stepping through abstractions (especially in C++, where there may be surprising amounts of hidden code executed in overloaded bits and pieces), and for hardware breakpoints (an approach for solving reproducible memory corruption: combine a deterministic memory allocator with memory breakpoints + counters).

This is a great comment. Most of the value I get out of using a debugger - I work in game dev - is from these, often in concert. Even before I start setting data breakpoints, though, I often find myself examining heap memory with the process paused to give myself sufficient context to do more informed exploration later. In the last year or so I've also started using Visual Studio's "action" breakpoints, a sort of runtime configurable printf, once I've identified areas of interest.

I share his opinion for scientific software. I think this is because it is building an algorithm versus software. I think the coupling and scope are just too different.

If your loop takes billions of iterations on gigabytes of data before returning the wrong answer, how do you debug? Breakpoints are useless, because which iteration introduced the fault? The critical paths are long. Watchpoints only start after you've paused. Reverse debugging is too slow for millions of instructions. If you change the code with the REPL (other post), you invalidate your previous calculations too. Stack traces are useless because you inline everything, and the call graph is shallow anyway. Performance profiling needs special tools: you know the hotspot; the debugger tells you where, not why.

My conclusion: a debugger is good for finding bugs in data that moves, not in data that changes.

This opinion is based on the fact that my debugging approach changes depending on the error. Prints are always the easiest solution in the scientific parts.

I'm leaning towards printf/println too in Rust. I usually somewhat prefer debuggers, but I'll wait to get one set up until I find I need it. I haven't found myself wanting a debugger for Rust yet. Maybe the type checking is just that good that I don't have many bugs, maybe I'm writing enough tests for things that aren't type-checked well, maybe I haven't written anything complex enough yet. Whatever the cause, it works well enough for now.

I find myself wanting debuggers more on dynamic languages where you have no idea what object types you're handling or what their properties are. Printing the whole thing gets you a pile of mostly useless mush. A debugger lets you poke at parts of it until you find something that gives you some insight into the problem.

I'd also say that performance and threading problems are a different beast. Even when you have a beautiful debugger, it's not very helpful to stop one thread while you poke around at human speed. You gotta log info about what's happening somewhere and then examine it for clues after the run is done. It may take a few dozen runs to log the detail you need without gigabytes of useless mush, but that's just what it takes to get to the bottom of these types of issues.

Maybe you need to try out more powerful dynamic languages? I mean, even Python has 'pdb', which gives you a fairly authentic gdb experience, and you can always dir() and help() something in the REPL. For me, I find myself not missing a debugger very much at all in Python code, whereas Java code of a certain size and legacy history forces me out of my preferred vim world and into Eclipse for the interactive debugging alone; the language makes debugging more painful in ways that you can get away with in, say, Python. Plus I find that language culture matters. Java programmers will assume you have a big IDE and debugger and will write their code accordingly. Other language cultures do something different. Occasionally you'll get principles like "grep-friendly code" in an effort to cut across cultures but they're still not universal.

Clojure is another example of a pretty decent experience, e.g. "add-watch" is built-in, it has a REPL (so I've used it to debug Java code before), and the coding culture is functional programming, which has its own benefits for debugging. Common Lisp is even better; it's a system as much as a language, and so the runtime itself has all the debugging capabilities that you need a heavy IDE for in simpler non-system languages. (break, compile, trace, update-instance-for-redefined-class, object field inspection, and extendable print-object methods are all part of the standard, along with lots of introspection and redefinition capability, and CL compilers like SBCL can give quite detailed type, argument count, typo, cross-reference usage, and optimization information at compile time, still no IDE needed (though editors like emacs/vim have nice wrappers and can automate some stuff). Check out this short series: https://malisper.me/debugging-lisp-part-1-recompilation/ )

Yeah I meant Ruby and Python for dynamic languages, and by debugger I mean the command-line Ruby byebug and Python pdb. Those plus logging/print statements have been all I've needed so far, never felt a need for a GUI debugger, but then I already live in Vim and Tmux. Never tried a command-line debugger for Node/JS, but the Chrome GUI debugger works well enough.

Pry is the way to go for debugging in Ruby. Drop into an interactive REPL with local variable context in situ with binding.pry or binding.pry_remote if in a forked server environment.

For subtle edits of a big chunk of code, my preferred approach is to set everything up in a test case, then pop into the implementation that needs to be changed where all the state is available with a binding.pry, and iterate expressions in the REPL until it's good, then copy my history out into an editor.

In Emacs it's nice and easy with rspec-mode and inf-ruby - run the test from within Emacs and get in-editor REPL once you hit the binding.pry.

It's funny to me as an embedded programmer seeing people write about how they prefer printf to actual debugging. My printf command can take MANY TIMES longer to run than most of the code that I'm trying to fix.

Maybe it's a software vs hardware thing, but I would end it all if I had to work hardware without breakpoints, watches, and step-through.

I used to build/program embedded systems (around a 6800 uP). I'd debug using an oscilloscope, sometimes an LED attached to a pin, sometimes connecting the pin to a speaker (!). There wasn't enough EPROM space for a printf. And besides, the turnaround time for erasing/blowing an EPROM was just too long.

Essentially you just get good at staring at the code and running gedanken experiments till you figure it out.

Why is it funny? And why do you not consider printf to be "actual" debugging? I mean, if printf wasn't available to me or was for some reason otherwise inconvenient, then I would look for other avenues to debug, perhaps by using a debugger! This isn't that mystifying.

As I said above, this is very heavily dependent on preferred workflows and what you're working on. Long ago, I remember doing some robotics work in C, and a debugger was invaluable.

printf debugging doesn't work when your code already crashed, and all you have is the process dump (and if you're very lucky, it's a heap dump, not just stacks).

That's like saying that time spent developing a robust resupply and support network is wasted because your army is already starving in Russia. It's true, but sort of misses the point.

Obviously it's possible to debug an issue based only on static state like a core dump. And in extraordinarily rare cases that might be the only available option and a debugger (or more manual tooling) might be your only choice.

But in the overwhelming majority of cases, even working at the lowest and most obtuse levels, the very first step in debugging a problem, long before anyone starts flaming about tool choices, is to come up with a reproducible test case. And once you have that, frankly, fancy tooling doesn't bring much to the table.

At the end of the day you have to spend time staring right at the code at fault and thinking about it, and if you have it up in your editor to do that, you might as well be stuffing printf's in there while you're at it.

Those "extraordinary rare cases" aren't anywhere near as rare or extraordinary when you have millions of users.

In fact, I would argue that reproducible reports are relatively rare in the industry, especially once you get out of developer tooling (where the users are people who know the value of such reports, and how to obtain them).

And then stuffing a printf in the middle of it can easily mean several minutes of build time, for a large native codebase. A tracepoint, on the other hand, is instant.

You can binary-search on where the printf statement goes, and eventually you will find the line with the error.

I use printf as well. However, I use a debugger (gdb in my case) to look at the assembly to see if it is optimized, without having to resort to compilation flags, objdump, nm, etc.

This is going to sound unbelievable, but I don't really ever do runtime debugging of my Rust code. Nearly all my debugging happens at compile-time (which is faster to iterate with thanks to cargo-check skipping codegen and linking). That said, I do know some people with large Rust codebases who do value the ability to run debug builds, and the slowness of -O0 binaries is a sore spot for them. As with C, you can try -O1 for a compromise between runtime and compiletime.

As my decades of programming experience accumulate, the source of most of my programming bugs shifted. Earlier it was dumb mistakes, memory management errors, seg faults, etc. Now it's logic errors due to design mistakes and failing to understand properly what needed to be done.

A fair bit of the D programming language design is there to head off the detail mistakes people make. A more subtle thing in the language design is to make the designs less prone to error.

The issue is that the optimised code is not the same as the unoptimised code, and debugging optimised code sucks. It appears to the debugger that your execution is jumping around erratically, and often variables don't exist because the optimiser decided they didn't need to exist.

It's a problem with any kind of compiled language that runs through an optimiser, whether C, C++, Rust, or D. Maybe that's why so many people just don't use debuggers, although my philosophy is that everyone should use a debugger, but maybe they just haven't found the right one yet.

It doesn't have to suck nearly as badly as it does today with LLVM-based toolchains, including Rust's. With C and C++, GCC does a much better job of maintaining accurate DWARF information with performance at least as good as Clang. (And optimized code certainly shouldn't be "jumping around erratically," simply because that results in poor performance on modern CPUs with multi-cycle instruction pipelines and imperfect branch predictors.)

A variable doesn't have to be in a single register throughout an entire function, or even present in registers / memory at all! DWARF is capable of representing storage locations that change over the course of a subroutine, as well as synthesizing values from other registers/memory, simple operations, and constants. Compilers can do a lot better (although certainly not 100%) than they do today without sacrificing any of the optimizations they make, and especially Clang/LLVM have a lot of room for improvement.

I highly recommend this blog post on the subject: https://backtrace.io/blog/engineering/compile-once-debug-twi...

How do you debug logic errors in that case? I imagine that most logic errors can't be caught by the compiler, although I could be wrong.

Most logic errors can definitely be caught by the compiler if you're taking the approach of encoding your logic into your type system. I've done this with real codebases (in Python as well, actually, using Mypy) and it's been very effective.

Can it catch something like accidentally writing "x == y", when you meant to write "x != y"? I feel like this kind of bug can be insidious enough to waste lots of debugging time, yet difficult to catch until you run the code.

Not easily. Rust's type system has limits, and what you're talking about is something you'd probably want dependent types for.

But it can catch lots of other logic errors.

The approach I have used in both mypy and rust is to encode state transitions into types. It's more elegant in Rust, and safer as well, but even with mypy it means I could be very sure about certain properties of my service (it was a sensitive service and I needed to be reasonably sure it wouldn't end up in an invalid state).


This is an example of the pattern.

When you take a "Type driven" approach to your code - stating your constraints upfront, and encoding them into your types - you can push a lot of the debug cycle to your compiler.
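A hypothetical sketch of that state-transition pattern in Rust (the connection type and its states are invented for the example; the idea is usually called "typestate"):

```rust
use std::marker::PhantomData;

// Zero-sized marker types encoding the states.
struct Closed;
struct Open;

// The state lives in the type parameter, not in a runtime field.
struct Conn<State> {
    _state: PhantomData<State>,
}

impl Conn<Closed> {
    fn new() -> Self {
        Conn { _state: PhantomData }
    }
    // The only way to get a Conn<Open> is through this transition;
    // `self` is consumed, so the Closed handle can't be reused.
    fn open(self) -> Conn<Open> {
        Conn { _state: PhantomData }
    }
}

impl Conn<Open> {
    // Querying is only defined for opened connections.
    fn query(&self) -> &'static str {
        "ok"
    }
}

fn main() {
    let conn = Conn::new().open();
    assert_eq!(conn.query(), "ok");
    // `Conn::new().query()` would not compile: `query` exists only
    // on Conn<Open>, so "query before open" is a type error.
}
```

An invalid transition is rejected at compile time, which is exactly the class of logic error this approach moves out of the debugger.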

I come from Ruby, where debuggers are useful but also not often used.

I find that I tend to write more and smaller unit tests than people who reach for debuggers. I debug logic errors by using println or writing more tests.

And yes, Rust is still very susceptible to logic errors.

Developing with powerful debuggers is an experience that kind of resembles using a REPL, especially with languages that enjoy fast compiles.

I advise devs to go through the Xerox PARC and ETHZ papers about interactive development, or Apple's Object Pascal/MCL/Dylan/HyperCard work, which was the birth of IDEs.

Fun fact: I loved coding in HyperCard as a kid.

Can't speak to others, but it's funny to think of "printf debugging" as a last resort when it's easily my go-to. I'll often add some println's and re-run tests to trace through what happened.

I only step into a debugger when I'm really wtf'd, and I don't think that's happened to me even once with Rust (it used to happen a lot with C++, since printf debugging mixes so poorly with segfaults etc). No question that long compile times are a pain though. I generally use 'cargo check' 99.99% of the time.

Thankfully there's work going into making debug builds faster by, if I recall, replacing the llvm backend with some other thing I don't recall the name of.

The code generator that may complement/replace llvm for this use case is Cranelift.


Not talking about Rust here, so not really an answer, but in the film VFX industry (rendering in my case) pretty much all the high-performance code is C++ or CUDA. Before we moved to C++11, I'd worked out that for STL iterators, at least in GCC 4.2/4.4, pre-caching the end() iterator of an STL collection before the loop, instead of calling it each time in the loop condition, made debug builds run faster, so I got into the habit of doing that.

In the optimised (O2/O3) builds the compiler effectively did that optimisation anyway, but not at O0.

It meant loop code in the source was longer, but made measurably (~10% if I remember correctly) faster debug builds on code which did a lot of vector iterating.

I have also noticed that often some of debug builds' slowness is actually due to asserts being on in debug builds as opposed to optimised builds, at least in some of the code I work on (which in some cases makes very heavy use of them). Removing asserts can in some cases bring debug builds back to being ~5x slower than optimised builds, compared to ~30x, which is more acceptable.

Thanks for the tip. I too work in rendering and often wondered if pre-caching end() would make a significant difference, but figured it's best left to the compiler.

Have you tried -Og? I would try this before modifying the sources.

Another fix to build times is to use reliable fully incremental builds, such as implemented by e.g. Google Bazel. Bazel can also cache intermediate build artifacts to speed up the builds even further.

> How do Rust programmers deal with this?

You compile with -C opt-level=1.
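Equivalently, as a project-wide default, the same setting can go in Cargo.toml (a sketch; the value is the one suggested above):

```toml
# Light optimizations for debug builds, keeping debug info and assertions.
[profile.dev]
opt-level = 1
```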


  #opt-level = "s"
  opt-level = 2

Update: The Rust versions can all be made to run faster with a simple edit. I've updated the timings and edited some of the text.

Not mentioned in the post, but I'm assuming you're using O0 for "debug"? It may be worthwhile to reconsider, as O0 is now considered a tool for compiler authors; regular people should use -Og (or opt-level 1 for Rust) as a way of generating builds that have performance optimizations applied that don't impact debugging.

I've usually found -Og to inline functions or optimise out variables I needed to investigate. It's happened often enough that I just use printf in release mode.

This was a really good and worthwhile analysis, thanks!

Despite the changes people are submitting, it's good to know about these different gotchas. And there may be one or two actual compiler enhancements that could be made here also.

The C++ version is taking a compile-time and runtime shortcut by using printf instead of iostream.

This pull request has timings with this changed: https://github.com/atilaneves/pythagoras/pull/1#issuecomment...

Using the <iostream> stuff isn’t really idiomatic - on both modern and legacy code bases I’ve used operator<< overloads on custom types, but never called std::cout in a hot loop or used std::endl to do anything but actually flush the stream.

Sure it is.

Not everyone is micro-optimizing their code just to save a couple of needless extra ms.

I never, ever since 1993, had to work on a C++ project where stdio vs iostream performance made any worthwhile difference, besides stdio being less secure to use.

Is std::endl guaranteed to flush the stream? I'd think you'd be better off with std::cout.flush(), or std::flush if you prefer manipulators. It would certainly be clearer, assuming it's not temporary debug code.

It is. That is the only reason to use it rather than '\n'.

It is guaranteed to flush.

What would you suggest as the recommended C++ way of doing IO?

I would use std::cout but output "\n" instead of endl.

Yeah, you also need to be careful mixing C++ iostreams and C stdio, as there is synchronization between them if you mix. And if you don't mix them, you should use `std::ios_base::sync_with_stdio(false);` to prevent performance issues.

It's valid C++ and requires no external dependencies.

To save some human parsing effort in interpreting the column names:

  CT = compile time
  RT = run time

I would be really interested to know where the overhead in the simple Rust version comes from. At first I thought it might be `println!()` being less efficient for whatever reason, but even without that line this[1] takes 300ms, compared to 184ms for this[2] (with the printf!).

1. https://raw.githubusercontent.com/atilaneves/pythagoras/mast...

2. https://raw.githubusercontent.com/atilaneves/pythagoras/mast...

Someone found a single character change providing a 2x speedup: https://www.reddit.com/r/rust/comments/ab7hsi/comparing_pyth...

To quote that reddit post, which I think has some good interesting details about why:

“I believe the problem with ..= is that x..(y + 1) is not identical with x..=y. They are nearly equivalent on most values for x and y, except when y == i32::max(). At that point, x..=y should still be able to function, while x..(y + 1) is allowed to do nothing.”

I don't know anything about Rust, but is it part of its "safety" that it has operators that allow wrapping past a type's maximum value and others that do not?

As a C programmer, that sort of feels like cheating ;)

Safety means "memory safety". Integer overflow cannot cause memory unsafety directly, and so it's safe.

(safe) Rust does not claim to make your code correct, it claims that it will not have undefined behavior.

We do care about helping you make your code correct; it will check for overflow and panic at runtime in debug builds. But it's a secondary concern. Memory safety is hard enough!

It's a codegen quality problem with the inclusive-end range iterator.

Ahh yeah, using `..(z + 1)` makes it run faster than the C++ version.

I looked at the disassembly of both of those in https://godbolt.org/ and it looks like for some reason Rust puts the loop variables x, y and z on the stack and then loads them off the stack. This causes a bunch of L1 hits instead of register hits in the tight loop, causing the 2x slowdown.

I have no idea why LLVM misses optimizing this in Rust's case. Likely it is fed much trickier IR. It also misses the optimization that it does in the C case of lifting the xx and zz multiplications to the appropriate loop level.

I think you've seen this comment about integer overflow optimizations? https://www.reddit.com/r/rust/comments/ab7hsi/comparing_pyth...

But I also note that the original version of your benchmark was calling `println!`, which means that you were essentially benchmarking high-level, thread-safe I/O. EDIT: Not in this case, see the thread below.

Some Rust benchmarking tips:

1. Always run `cargo build --release`. Debug builds are slow.

2. Be sure to use `stdin.lock()` or `stdout.lock()` when reading or writing standard I/O. If you don't, Rust has to lock standard I/O for you on each call. This means using `writeln!` instead of `println!`.

> But I also note that the original version of your benchmark was calling `println!`, which means that you were essentially benchmarking high-level, thread-safe I/O.

I don't think so. The program is only doing a thousand prints, which is far less than the total number of loop iterations. Try changing to locked stdout and writeln!---I doubt you'll notice a difference.

Yeah, I tried the locking version and didn't see a noticeable improvement.

Thank you for checking this, and determining it didn't actually apply in this case.

We do a lot of I/O intensive Rust at work, and I have a list of things to double-check when benchmarking. The most common slowdowns: debug builds, unlocked stdio, line iterators, and anything else which allocates in an inner loop.

If I can hit 250 MB/s throughput per core on my laptop, with some light data processing, I'm usually quite happy.

I think the problem is flushing, not locking. Rust println!() flushes after each call, whereas more C-like io libraries flush once on program exit. Big difference in the number of system calls.



> Rust println!() flushes after each call,

To be clear, println! does not flush after each call.

Many terminals are line-buffered, and so when println! emits a \n, the terminal will flush. So, println! will probably cause a flush, but does not flush itself. println! is the exact same as `print!`, except for the additional printing of \n https://doc.rust-lang.org/stable/src/std/macros.rs.html#157-...

Are you sure about this? AFAIK, println! uses io::stdout(), which is defined like so:

    pub struct Stdout {
        inner: Arc<ReentrantMutex<RefCell<LineWriter<Maybe<StdoutRaw>>>>>,
    }

In particular, it's specifically using LineWriter internally, which does flush whenever it sees a `\n`.

println! does not itself explicitly flush, but its current implementation does guarantee that it will. However, I don't think this is an API guarantee. In the definition of Stdout, there is this FIXME:

    // FIXME: this should be LineWriter or BufWriter depending on the state of
    //        stdout (tty or not). Note that if this is not line buffered it
    //        should also flush-on-panic or some form of flush-on-abort.

Which basically means that someone thought its buffering strategy should be determined by whether or not stdout is connected to a tty. This is what ripgrep does, for example, but I had to build out my own infrastructure to do it: https://docs.rs/grep-cli/0.1.1/grep_cli/#coloring-and-buffer... Arguably, io::stdout() should do the same.

I don't think buffer configuration at the level of the terminal plays a role here. This is somewhat confused by programs like stdbuf[1], which purport to let users change the buffering strategy used by a program in their shell. But in reality, it appears to be tightly coupled with C's FILE data type, and perhaps even with glibc itself? I'm not sure.

[1] - https://www.gnu.org/software/coreutils/manual/html_node/stdb...

Ugh, you're right. I got it backwards https://github.com/rust-lang/rust/issues/23818

thank you

> I don't think buffer configuration at the level of the terminal plays a role here.

I just thought I'd confirm that it does not. The 'line buffering' of `tty`s applies to the input to the `tty` (i.e. the keyboard) before it goes to the program: basically, the kernel will buffer characters you typed and not send them to the running program until you hit the `eol` character. On the output side the kernel just handles characters as they come in, without any buffering/waiting.

Thanks for those deeper details, enlightening!

Perhaps the terminal is line-buffered. (edit: I forgot that `!` was a character in Rust, so I misread your comment, oh well)

If there are big differences in where you don't expect them (e.g. printf in a for loop being very slow), then it usually is a difference in implementation and possibly contract. E.g. I wouldn't be surprised if println! in Rust is synchronous, which tends to be slow on both terminals and redirections (i.e. file I/O).

For these simple examples an strace may be enlightening in this regard.

Edit: Indeed this seems to be the case from the library source code (println! -> _print -> stdout -> LineWriter).

Yes, this benchmark would be much more interesting if it didn't do any println on the hot path and just accumulated the results and wrote them out once at the end.

Edit: I've tried it (println! vs a single io::stdout().lock() and writeln!), and it doesn't seem to make any difference, at least for the range.rs example. Hmm.

Sometimes the catch there is that compilers can be really good at compile-time evaluation, and the whole computation may just be done at compile time. In this case? Probably not. But it's not always the right way to do a benchmark, or at least not without some tweaks.

It's because println! isn't in the hot path. So it's fine to use it here.

Might be interesting if all the tests used the same library for output. After all both D and Rust can use C libraries.

println will lock stdout for the duration of its work, yes. You can lock once before and use write! instead.

The reddit thread already has one good relevant comment https://www.reddit.com/r/rust/comments/ab7hsi/comparing_pyth...

Well, that seems more like an issue with the compiler then, because avoiding overflows would not result in a 2x slowdown in C or C++.

Possibly! Remember that the Rust language and C/C++ have different semantics with regards to overflow. It may be the programmer's "fault" for not writing the code with the equivalent semantics (you can make them have the same semantics, but it's not the default).

Looking at the Rust source [1], it seems the `next()` call has some overhead, so it's probably not the compiler.

I remember this was simple to implement in Julia [2] where you can iterate over the full range `for i = typemin(Int):typemax(Int) ... end` without overflow and without overhead. Same should be possible in Rust I think.

[1] https://doc.rust-lang.org/src/core/iter/range.rs.html#340-35...

[2] https://github.com/JuliaLang/julia/blob/master/base/range.jl...

For the sake of history, `range` was found/coined by Andrei [1] [2] as a way to overcome the issues with the iterator pattern (in the STL). I wish Eric Niebler had credited Andrei somewhere, but I couldn't find any mention.

Honest question: why should I use ranges after all?

As I understand, ranges are the core of idiomatic D and they are not worth it (as per OP)?

AFAIK, ranges (at least in D) are not thread-safe [3]. So what real benefits does it bring?

[1] https://accu.org/content/conf2009/AndreiAlexandrescu_iterato...

[2] http://www.informit.com/articles/printerfriendly/1407357

[3] https://github.com/carun/parallel-read-tester/commit/3e69da4...

> For the sake of history, `range` was found/coined by Andrei

No it wasn't. The Boost.Range library predates Andrei's Range talk in 2009. Boost.Range was introduced in Boost 1.32, which was released in 2004:


And from Boost.Range's "History and Acknowledgement" it explains where the term came from:

> The term Range was adopted because of paragraph 24.1/7 from the C++ standard


Furthermore, what is being standardized in C++ is an expansion of what is in Boost.Range, which uses iterators underneath.

Andrei's notion of ranges (and what is in D) is actually quite different, as it removes the basis on iterators completely.

I wrote the blog post and I definitely think ranges are worth using, and said as much in the link. I just don't think they're worth it in this particular case. The code is usually clearer and more reusable with ranges, and if by chance they're the bottleneck (unlikely), then rewrite into raw for loops.


The website is basically useless on mobile; it redirects to random garbage.

Not for me.
