Nice and clear article. I like the idea of looking at the quality of the process and not only the product. It was pleasant to read.
I didn't like that C itself was missing, because C is still my language of choice for some tasks. I still use it because I'm happy with it; I don't feel I need any of its competitors. Maybe if this benchmark had included C, it would be an argument for giving them a try. So I decided to do it for you.
It took me 30 minutes; the only change is that I don't check whether a file is binary, I guess. I liked that I was doing it in C, because I know it and I'm comfortable with it. A few checks in the man pages and everything's clear. The only disadvantage of C is that it has no standard API for directories, so my code is POSIX-only. Here's the code: https://github.com/BartekLew/dumbgrep
The other thing is the sources you wrote. They are almost the same (it'd be nice to get more of a feel for each language, idiomatic code, etc.) and I didn't enjoy reading them at all: not clean code, and all of them are longer than my C code.
Another thing is that C has a very narrow application domain nowadays. A benchmark should focus on those applications, not on generic algorithms.
This seems to be missing a return in the NULL check for handle in the match_file function. That leads to fgets being called with a NULL stream, which is undefined behavior. A sketch of the pattern is below.
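For readers following along, a minimal sketch of the bug pattern being described (the names match_file and handle follow the thread; the actual code is in the linked repository):

    #include <stdio.h>

    void match_file(const char *path, const char *pattern) {
        FILE *handle = fopen(path, "r");
        if (handle == NULL) {
            perror(path);
            /* BUG: missing `return;` here, so execution falls through */
        }
        char line[4096];
        /* fgets() with a NULL stream is undefined behavior */
        while (fgets(line, sizeof line, handle) != NULL) {
            /* ... match `pattern` against `line` ... */
            (void)pattern;  /* placeholder for the real matching logic */
        }
        fclose(handle);
    }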
I would argue that this is why a "better C" can be useful: it's the kind of error that would be much harder to make in Rust.
There is a missing NULL check, but it doesn't crash, because the function would only get a nonexistent filename if the filesystem driver did something strange (or the file were deleted right between readdir and fopen). xD
I also noticed that I don't close files and directories. Not a big deal, because a fully functional and 100% correct program wasn't the goal here. :)
EDIT: Actually fopen would also return NULL when the file is not readable. My bad. Anyway, such problems are trivial to debug.
I like that I don't have to spend the time debugging trivial problems like these in Rust though. The compiler catches them for me, and I get more time and brain power to spend debugging nontrivial problems. It adds up.
Everything is a matter of price. Such an error is actually a no-brainer, because I get an error message right before it. If I don't know the cause yet, I run it under gdb and everything's clear. A couple of seconds. How many of those do I have to do to balance out learning a new language (and I'm not sure I'd even like it in the end)? If I were a systems software developer, it would probably be a good choice. I program mostly in high-level languages, so maybe not. :)
Maybe it doesn't matter for small programs, but in large programs tracing the control flow would probably be nontrivial, so, as the parent said, over the lifetime of the program it would add up.
Is there any language that can save you from the control-flow-tracing problem? I think a well-made architecture is the only solution. In fact I would say that C is easier in that respect (if you don't use goto), because all control flow is explicit. In other languages you have exceptions, polymorphism, implicit constructors, destructors (and their ordering in the case of inheritance), templates, etc.
C is dead simple in that respect. If you can't write correct, concise code, no language can help you. Of course, as C is more verbose, it is more challenging to structure code well, and to some extent another language can help you move the point where control flow becomes a problem.
The biggest problem in C programming for me is memory management, but maybe that's just a lack of experience. Certainly C++ is convenient with smart pointers and automatically called destructors.
Anyway, I didn't consider big projects in scope here. I wouldn't write really big programs in one piece; I would try to create parts that are as small and independent as possible.
It's trivial to fix if you catch it early, and perfectly fine for a tool you run yourself. However, if the problem is discovered two years later in production, there will be a back-and-forth until you get to the correct error message, possibly downtime, and if there isn't an error message you'll be instructing the customer how to install debug symbols for libX.
This is a very good point. Good testing, good programming practices, and good logs are necessary. And it's not only about C. It's not so hard to do better than Java here. xD
Such an error is a no-brainer? Yet you deployed your code with several of them?
Understandably you didn't put your whole effort into a small code example for an HN comment, but a language that lets one write such code without any errors would be a big productivity and safety gain.
Of course, in a program like this. It's quite obvious that in a bigger program it would be much harder. I wouldn't use C for a big monolithic system, or if I had to, I would put a lot of effort into diagnostic features.
I don't write C code professionally. I use it for my own purposes, so debugging is easy. I agree that such a problem in a bigger system, without the ability to reproduce the environment and debug, would be very nasty. But that is a matter of context. In a bigger system it's more reasonable to buy safety at the price of simplicity, because I doubt any of the competitors in the benchmark is as simple as C.
> Not a big deal, because a fully functional and 100% correct program wasn't the goal here
Wasn't it? It reminds me of the old article by Stroustrup on C/C++ for beginners (I seem to link to it quite often; apologies if it gets repetitive):
One point of using C++ over C is that it should be easier to get a program that is in fact correct regarding things like closing files etc. (That said, consciously leaking some resources ("do and die") could be considered idiomatic C, I suppose.)
Yes, I consider memory management the biggest problem with C. The question is always what is more valuable to you: simplicity or automatic memory management.
What I find funny is his choice of languages: "hype driven", I'd say.
If you can use a GC'd language like Go, why would you go with Rust (which adds a lot of complexity) or Zig (which is unsafe (1)) instead of one of the numerous existing GC'd languages (D, Nim, Crystal, Java, ...) which provide memory safety without the complexity?
If you can't, why evaluate Go instead of Ada or DasBetterC?
1: At work a frequent issue in C++ is a UAF through a pointer to a stack variable that outlives the function; Zig doesn't help you here.
I originally wrote a slide titled "D as Better C", and since I was in München at the time, it suddenly morphed into "Das Better C" and so the pun was born.
In my head it is always said with a German accent.
Try it out; I'm thoroughly enjoying learning it at the moment and it really does feel like a "better C". The only thing I dislike is that the official docs are so thorough that they're not very approachable to a novice (but much appreciated once you grok D's features). I've found this book to be a better programmer's introduction, with lots of useful examples: https://www.packtpub.com/product/d-cookbook/9781783287215
I'm not the OP, but I could venture an answer as someone who's used some of those languages.
D is nice. I could see myself picking it instead of Go. It's much more expressive than Go and it's mature enough. The cross-compilation story isn't as great, though. I haven't tried DasBetterC; I was under the impression this corner of D isn't as mature as D proper.
Java, IMHO, has limited uses here, in the sense that making distributable binaries for client-side usage (e.g. CLIs) is quite a hassle, and the need for a JRE makes it quite a bit bulkier than a C or Go binary.
I'm partial to Nim and Crystal (and V), but they aren't as appealing to me personally because they feel complex compared to C and Go (and Zig). If the appeal of these languages is "a better Go", I think D has beaten them to the punch. Also, again, cross-compilation tends to be a bit of a deal breaker for me.
Another language that seems interesting in this space is Odin, but I could not for the life of me find stdlib docs for it.
Another seemingly interesting language is Beef, which has a sort of C# feel (a good thing in my books), but unfortunately I haven't had a chance to play around with it yet.
Cosmopolitan is cool, but is just C, and younger than even zig.
> If you can use a GCd language like Go why would you go with Rust
Memory leaks are possible in GC'd languages, and when they happen, sometimes you cannot fix them easily or properly because of architectural problems. Thinking about the lifetime of an object can help with this.
Many languages are also still missing a way to specify whether an object can be modified or accessed after being passed as an argument to a call, which can cause problems not related to memory safety.
(I've played with Rust but I don't practice it daily. I've also played with D but don't practice anymore. I don't know Go.)
Memory leaks in Java are usually not so bad to fix. Maybe some off-heap scenarios could be trickier, but IME people are pretty careful if they're going to the extremes of off-heap allocation.
Memory leaks in enterprise Java, in my experience, are usually things like forgetting to close a resource of some sort, or maybe doing something silly like continually growing a set or map with duplicate objects because of a faulty equals() or hashCode(). Both of these cases are well covered by static analysis.
Maybe the trickiest but still relatively common scenario is a design issue where a weak reference is required; I guess that fits the parent's description pretty well.
> 1: At work a frequent issue in C++ is a UAF through a pointer to a stack variable that outlives the function; Zig doesn't help you here.
Sorry, but if this sort of UAF is actually frequent, I would hazard a guess that your coworkers would struggle in pretty much any language. RAII-based lifetime management really isn't that difficult, and the type of bug you are referring to isn't even subtle.
I disagree, as there are lots of ways it can manifest in C++. To list a few examples (the first two are sketched in code after this list):
- Implicit lambda capture by reference where you use [&] (admittedly often out of laziness).
- A string_view constructed from an accidental string copy (e.g. if your function parameter is a const std::string instead of a const std::string&, or if you write for (auto foo : v) { ... }).
- A callback that references some member variables of some object on the stack, which usually completes before the function returns (but maybe you forget to synchronize in an edge case).
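Hypothetical sketches of the first two patterns (invented code, not from any real project):

    #include <functional>
    #include <string>
    #include <string_view>

    // Parameter taken by value: `s` is a temporary copy that dies when the
    // function returns, so the returned view dangles immediately.
    std::string_view first_word(const std::string s) {
        return std::string_view(s).substr(0, s.find(' '));  // dangles
    }

    // [&] captures `base` by reference; if the returned callback outlives
    // this stack frame, calling it is a use-after-free.
    std::function<int(int)> make_adder(int base) {
        return [&](int x) { return base + x; };  // should capture by value
    }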
Tooling can help identify a number of these issues, but it's not perfect. And a number of these issues are very much C++-specific.
If a mistake can be made, it's only a matter of time until someone makes it, and this scales with codebase size. You only need one person to make the mistake and another to miss it in review; it doesn't matter if everyone else is a coder with 40 years of experience.
"Nobody would write that type of code, and if they did, you should just get better programmers" is such a common theme in these discussions.
... yet every time I've been told "you don't need Rust, just use C++", I've been able to point to at least one UAF in the speaker's own code, and have even pointed to CVEs on MITRE for their own projects!
People make mistakes. Use tools that stop these mistakes, rather than tools that help you "write code faster" like in the article (a.k.a. writing bugs faster).
Most surveys place the use of such tooling at around 11%, which is why all major OS vendors are pushing for hardware memory tagging; by then there will be no way to avoid it.
I agree and don't understand why this would be controversial.
There are plenty of aspects of programming that need systematic solutions instead of just telling people to get better, but returning pointers to the stack should not be high on that list.
If people are doing that, it means they don't understand what they are doing or aren't thinking about what they are writing, probably both. It is much easier to return regular data structures.
In modern C++ it is rare that a raw pointer should be returned in the first place. I wouldn't be sure what to think if I had a team full of people consistently and frequently returning pointers to the stack frame of the function they came from.
It doesn't happen very often, but each time there was a new developer on the team they made such a mistake; that's a way to teach them about valgrind (what a wonderful piece of software!), I guess.
We work in a 'low latency' domain where heap allocation is avoided; this may explain why it happens more often to us.
Why would avoiding heap allocation explain it? Why is anyone taking the address of a stack-allocated variable in the first place, since it is already in scope? Why are people returning pointers from functions at all?
>I would hazard a guess that your coworkers would struggle in pretty much any language?
Uh? This issue wouldn't happen in Java, because everything is heap-allocated (and garbage-collected).
Which is probably why developers new to C++ have this issue.
Walter Bright said below that the D compiler finds most of these issues at compile time. That's nice for D, but unfortunately it isn't the case for C++, at least not with GCC 9.
Doing what D does to detect references to expired stack frames would require some restructuring of C++'s semantics, which seems unlikely. For example, D can detect this kind of error:
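(The code example didn't survive in this copy of the thread. As a stand-in, here is the trivial form of the error class in question, written in C++; per the follow-up below, gcc does catch this simple case:)

    int *dangle(void) {
        int local = 42;
        return &local;  // gcc/clang warn: address of local variable returned
    }                   // dereferencing the result later is undefined behavior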
Yep, '-O2 or above': we used to compile our UTs without optimizations because otherwise it takes too long, but now we have an additional (less frequent) CI job that builds the UTs with optimizations, to benefit from this improved error detection.
Yes, sorry if my comment wasn't clear. D is very capable in this regard, while gcc with C/C++ code will only spot a few trivial cases, including the one in the example above.
The whole point of his experiment was to learn about the ergonomics of these languages, to investigate the hype. If you did the experiment and found that, say, Rust was super easy, then you could avoid GC and get some efficiency for free. If you found that Rust was hard, then you know to use it only when you NEED the efficiency.
When I was learning Go years ago, I decided to go through and speed up some of the Benchmarks Game's Go code. Some programs saw on the order of a 20x speedup, if I remember right.
Of course, the maintainer of the site rejected the changes because of what I perceive as a clear bias against the language, but the point is: it's easy to fix hotspots in Go.
This was in 2013. My first submission used memory pools for binary-trees. It was rejected for using memory pools, even though the C version quite literally used a mempool. I even redid it to use a 'third party' mempool library; it was rejected for the same reason.
Thanks for the links, though I'm unsure what the second link is for.
I hope my assessment is 'fair'. I used to feel more strongly about it than I do now, I guess.
It's a weird situation. The rules say 'don't write your own memory pools', but a new language won't have a popular pool library available. It seems it should either be 'don't use them at all' or allow them for everyone. This puts any new language at an immediate disadvantage.
Anyhow, thanks for keeping it going so long. Fun little site.
"Go was born out of frustration with existing languages … To meet these goals required addressing a number of linguistic issues: an expressive but lightweight type system; concurrency and garbage collection …"
Why can't GC coexist with memory pools? They both have good uses.
It's pretty clear that Go's designers wanted a GC but also wanted control over memory. In Go it is trivial to create a slice of structs that's contiguous in memory, and built-ins like 'copy' mean the language designers wanted users to use this feature.
When I write Go, the GC is used almost always, except when it becomes a bottleneck, at which point I trivially switch to memory pools. I just add a comment like "preallocate to avoid allocating in a loop" and move on. The next person who comes by my code would have no trouble understanding what happened there, unlike in Java, where code using memory pools stops looking like Java.
I think when the Go FAQ mentions existing pain points, it means both GC'ed and Non-GC'ed languages.
Zig is cool because it went the manual layout route, but with a pluggable allocator. I think it's a brilliant choice, especially for embedded.
Yep: the Go program allocates/deallocates in the loop, while the equivalent Rust program does not. Moreover, the Rust program was compiled with O3, LTO, and PGO. Go version: 75 us; Rust: 15 us.
Because you're ignoring a lot of other reasons to use Go or Rust.
Go is simple, has very fast compile times, produces a single static binary, is trivial to cross-compile, has very short GC pauses, and has a fantastic standard library. The same cannot be said for those other languages.
Rust has a very strong type system that turns lots of runtime errors into compile-time errors; it's basically as fast a language as you can get; except for GUIs it has a really good library ecosystem (way better than all the languages you mentioned, except maybe Java); it produces a static binary; and cross-compilation is relatively easy (as long as you aren't trying to cross-compile to Mac).
I believe "no outlasting pointers to stack variables" is a very hard problem, but it is in the Zig issue list and will probably be worked on: https://github.com/ziglang/zig/issues/5725
It's a bit early to say what Zig does and doesn't do. The memory safety story is still under development, and there seems to be a lot of focus on making it safer. But it might end up as a model where you focus on detecting errors in debug/safe builds during development, while releasing fast builds that use less safe code generation and allocators.
Zig will probably never be as safe as a GCed language or Rust in this regard. But that's because it would go against other core design goals.
I feel like the future of non-GC languages is Zig as better C (dead simple and explicit language), and Rust as better C++ (complex language with more memory management facilities, etc).
The choice of languages was a bit odd. But I think Go is still in C's territory when it comes to being simple and explicit. Memory management (GC/not-GC) isn't necessarily the most important distinction in programming languages.
What would "simple and explicit" even mean if somehow it includes "Oh, sometimes, maybe, I'll incur the expense of garbage collection in the middle of this other algorithm running and it'll take much longer than usual" ?
I don't care about that in a Python program, but I can't see why I would accept it from something that's supposed to replace C.
I also think you're imagining a lot of hard work during execution in Rust that just isn't there, most of the tricky stuff happens at compile time. If you can prove you really don't need the safety that does have runtime implications, you can choose not to have it, but of course you probably don't have proof, just the same gut instinct that usually gets us into trouble in other languages.
An attempt in Nim [1]. Something like 7 minutes, but yeah, this is even more crazily dependent on the dev's familiarity with the target language/stdlib and their IDE/editor (i.e. unscientific) than the usual attempted measurements.
GC'd language style just seems so messy to me. I like manual memory management. It forces you to think about lifetimes and encourages better design. Does an object persist or is it temporary? Maybe it's a good idea to have well-defined state. GC languages encourage lousy thinking and design. And GC pauses are still a thing.
I agree with you. I do a lot of coding in Go for my multiplatform home projects, and in C/C++ at work eight hours per day.
Go brings faster results thanks to the really great and easy-to-integrate third-party libraries. There is literally no way I would take on such a large number of third-party dependencies in any C++ software; it is just too much work to prepare the libraries for use.
The amount of thinking needed to make Go programs run fast and with a low memory footprint is about the same as in C++, with the huge minus that you don't have a delete/free keyword that designates "now the memory is deallocated". With Go, you need to think hard. I would prefer a GC language that also has a deallocation keyword, so I could forget about everything I don't care about but still deallocate what I want at a specific moment.
For me personally, the biggest difference between Go and C/C++ is the ecosystem and how simple it is to use third-party libraries/modules/whatever.
Are you actually using new and delete outside your own data structures? I have barely used new or delete for years, I don't even use them when building data structures - the vast majority of the time I make compound data structures out of what is in the standard library. It is only when memory layout needs to be very specific that I use new and even then it is placement new.
You don't need to use delete explicitly to control the lifetime of objects in C++. A properly placed "}" or a call to something like .release() can be just as effective. The problem is, there is simply no way to do it in a GC language.
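To illustrate the "}" point with a generic sketch (standard library only, nothing project-specific):

    #include <fstream>

    void write_log_entry() {
        {
            std::ofstream log("run.log");  // file opened here
            log << "working\n";
        }   // the '}' ends the object's lifetime: the destructor runs and
            // the file is closed deterministically, with no delete needed
        // by this point the file handle has already been released
    }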
There are a lot of sweeping, unsubstantiated assertions here, and unclearly defined ideas. What is lousy thinking and design? What are its practical downsides in the program? Given how many exploits are caused by memory mistakes in C and C++, maybe using those is actually the lousy thinking?
This is irrelevant. If you give an F1 racing car to an enthusiastic casual driver, you will get a wrecked car and a few dead people in a matter of hours. And should I remind you how many exploits we have had due to simple SQL injection? The last buffer overrun or stack overflow I wrote was about 5 years back and wasn't remotely exploitable. But I agree that not making stupid mistakes takes years to master (in my case 20+).
The problem is not exploits; the problem is the price of people having enough knowledge to not make stupid mistakes, while every company wants a developer who costs nothing (or as little as possible) and delivers everything, or close to it (open source(tm); I want to build a house, and I am still searching for an idiot who would build it for "recognition").
And so they search for the most foolproof technology with which they could replace expensive, knowledgeable developers with, ideally, street bums who would work for food. And most of the software we use today is like that: made by street bums.
The difference between 1990s software and today's is too obvious not to notice; even the "great breakthroughs" and new hypes are just xeroxing what we already had.
Formula 1 is on the bleeding edge of technology, which C resolutely is not. If you're really into car analogies, a better one would be racing a Ford Model T, which is indeed just as unsafe.
You are making an unfair comparison between C/C++ and the availability of third-party libraries. There is just nothing wrong with C/C++; you simply can't press alt+enter to get a third-party library, and that is the only disadvantage for someone who is able to write C or C++.
C CAN be bleeding edge. It depends on who is using it, based on the simple fact that there is nothing you can't do with it.
Except for the amount of money companies need to pay for a skilled C/C++ developer versus a <name your poison> developer.
I think that, contrary to common belief, the difference between languages has nothing to do with benchmarks. It has everything to do with price/performance, and performance is not important today ("I'll just start another 'cloud' instance, whatever the cost").
So we get to the price: 10 monkeys for the price of one "real" developer. Why? I was reluctant to use Java/C# software because it was mostly crap (and still is). Now I am forced to use it, as there are no alternatives. Software is just another field where quality is irrelevant to the earnings. So we eat shit.
Wait, we can also do that kind of discussion in the opposite direction:
<dont_take_this_too_serious>
I think that most C programmers are just programming monkeys: people who enjoy solving sudokus^H^H^H^H^H^H^H pointer arithmetic and format-string riddles and think that incrementing a variable in the middle of a statement is the best thing since sliced bread. Programmers who do not have the necessary skills to become engineers, the latter being needed to build modern systems. I think it's good that languages like C exist, so we can keep those people busy and let the more capable people work on what makes the real difference when it comes to robustness, performance, and integrity: algorithms, data structures, and software architecture. Things that require abstract thinking beyond "Where do I store my pointer so I don't forget to free that struct later?".
/* I have noticed people are downvoting. Face reality: it is not my fault the world is as it is (but it is our fault if it stays like that). Don't shoot the messenger; this is something that all developers, regardless of language, should be aware of. */
Sure, and I agree. Companies have tried to use the monkeys for C development, but it didn't work; there is just too much to be aware of. The next goal was to find a technology that would still keep the low-paid monkeys but prevent them from making stupid mistakes, and to train them as fast as possible to produce some results, whatever those results were.
And we got the winners that we today call Java, JS, C#, Ruby, PHP, Python, ..., and devices with extremely capable CPUs and vast amounts of memory and "disk space" producing inferior results.
The point is not that any of those languages is faster (I love those benchmarks comparing the speed of a language written in C/C++ against C/C++), or that any of them is better.
The point is that they allow companies to pay developers less for the same effect, with a loss in speed, memory, and all the other handy-dandy things that can be bought on (and dumped onto) the consumer side.
It was never about speed, memory footprint, or syntax. It was always about paying software developers the least amount possible.
And we actually helped: open source(tm), anyone? Or in other words, working for free for no benefit but a dubious recognition that no one but ourselves cares about?
I wonder what other branch of industry relies so heavily on recognition instead of payment.
As I mentioned, I want to build a house; please recommend to me anyone prepared to build it for free, for "recognition" instead of payment...
50 years is not a good age for a bottle of wine. There are wines that are still enjoyable after such a long time, but they will hardly express the same potential they had at the peak of their maturity. Many other bottles will be totally gone after 50 years.
Difficult to keep it from just being super oaky at that age, but a great distiller willing to throw away a lot of bad casks can make something amazing.
'?' evidently represents "any character" per the !glob("h?i", "hi") test.
But, if I add a test for glob("???", "hi"), that succeeds unexpectedly even though it should run out of input.
I believe it's because the '?' case checks nt < text.length() (which AIUI is for consuming variable input for '*') instead of t. Changing that case to t < text.length() seems to fix that problem.
(This looks like it's the case for all four languages, as the glob algorithm looks identical in all four, modulo syntax differences; a sketch with the fix applied is below.)
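For reference, a hedged C++ rendition of the matcher's overall shape (the variable names t/nt follow the thread; the real implementations live in the article's repositories), with the fix applied:

    #include <cstddef>
    #include <string>

    bool glob(const std::string &pattern, const std::string &text) {
        size_t p = 0, t = 0;    // current positions in pattern and text
        size_t np = 0, nt = 0;  // restart positions for '*' backtracking
        while (p < pattern.length() || t < text.length()) {
            if (p < pattern.length()) {
                char c = pattern[p];
                if (c == '?') {
                    // The buggy version tests nt < text.length() here,
                    // letting '?' match past the end of the input.
                    if (t < text.length()) { p++; t++; continue; }
                } else if (c == '*') {
                    np = p; nt = t + 1; p++; continue;
                } else if (t < text.length() && text[t] == c) {
                    p++; t++; continue;
                }
            }
            if (nt > 0 && nt <= text.length()) { p = np; t = nt; continue; }
            return false;
        }
        return true;
    }

With this change, glob("???", "hi") correctly fails: the third '?' finds no input left to consume.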
I think there is definitely merit in these kinds of posts.
This developer's measure of productivity is clearly "how low-level can I go"; they declared Zig clear despite saying this:
> The lack of string handling routines in the stdlib was unexpected, to concatenate strings one has to do everything manually - allocate the buffer, put strings there. Or use formatter and an allocator to print both strings side by side and free the buffer afterwards.
This, combined with "strings" being byte buffers as in C, is not a good thing. I had a quick look, and the author appears to be German; how do you handle an umlaut in Zig?
His dismissal of C++ for "not having a build system" seems like considering the language without its ecosystem. Using CMake and LLVM solves all of his gripes about C++.
Finally, this post measures the productivity of a single developer on a 1-3 file project. How these projects work with 2-5 developers, or on projects that support more than the anglosphere, is very important.
> His dismissal of C++ for "not having a build system" seems like considering the language without its ecosystem. Using CMake and LLVM solves all of his gripes about C++.
Last week I debugged an issue in a big C++ codebase where we were getting memory corruption. The source of the issue? The compiler, linker, and build system being separate things, with only the compiler understanding types (also the lack of modules, which is another symptom of the build system being separate from the language). Namely, conditional compilation was involved: someone made a struct that had a field in it depending on whether a preprocessor constant was defined or not (passed with -D to the compiler on the command line). It turned out some files were compiled with the define and some without, so some files were compiled on the assumption that this struct is N bytes long, and some on the assumption that it is N+M bytes long. But the linker has no information available to detect this error. A sketch of the pattern follows.
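A hedged reconstruction of the pattern (names invented for illustration):

    // shared.h, included by both a.cpp and b.cpp
    struct Packet {
        int id;
    #ifdef ENABLE_TRACING
        char trace[64];  // present only when -DENABLE_TRACING is passed
    #endif
    };
    // If a.cpp is compiled with -DENABLE_TRACING and b.cpp without it, one
    // translation unit thinks sizeof(Packet) includes the trace buffer and
    // the other does not. The linker links them without complaint, and the
    // mismatch surfaces at runtime as memory corruption.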
And then there is the issue that to write a C++ program these days, it doesn't suffice to know just C++. Languages you need to know:
- C++
- language of your build system, which is always handicapped, yet Turing-complete
- Python
- if something goes wrong, and you need to debug something: Make with its custom shell-language
- shell
It's a tower of Babel. For all the whining about users of other languages pulling in libraries, C++ users have a lot of complexity hidden in their build systems.
In Zig and the not-yet-released Jai, you only need to know the language you write your program in. The "if" in your language of choice is perfectly suitable for expressing conditional compilation. Structs in your language of choice are perfectly suitable for expressing declarative configuration.
Yep, been there, done that two months ago. It took me at least two days to find the issue: a corruption that happens at -O3 but not in a debug build, and that neither valgrind nor ASan can find, is pretty mysterious.
Unless you left out some details, this sounds more like plain UB and buggy C++ code manifesting itself with different symptoms depending on the optimization level.
That has not so much to do with the loosely coupled build-system complexity you have to scaffold in a C++ project; it's more about issues in the core language itself.
I won't go into detail, but it was related to macros and separate compilation: if you're not careful with macros, the 'same' object can have different sizes in different .cpp files.
The Linux kernel has a lot of these compile-time-optional struct members.
I think they avoid this issue by using a config header file that contains the defines, so a compilation unit either compiles with the config header or, if the header isn't included and a config macro is used, fails to compile altogether.
It's still possible to include a different config header than you're expecting, though. But that's easier to defend against (you can force-include the config header via the command line instead of writing #include <...> in each compilation unit).
It's also better to use #if MACRO == 1 instead of #ifdef MACRO, because the first one can be made to fail if MACRO is not defined (via -Wundef/-Werror=undef), while #ifdef just silently evaluates to false.
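In code (a small sketch; ENABLE_FOO and config.h are invented names):

    #include "config.h"   // expected to define ENABLE_FOO as 0 or 1

    #if ENABLE_FOO == 1   // with -Wundef the compiler warns if ENABLE_FOO
                          // was never defined; #ifdef would stay silent
    void foo_specific_code();
    #endif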
>Namely, conditional compilation was involved, someone made a struct which had a field in it based on whether a preprocessor constant was defined or not
No offense, but this just sounds like bad design. Haskell can use the C preprocessor too, so technically that particular criticism would apply to it as well, but nobody would say it's Haskell's problem just because in theory people could use this feature to write terrible code. (I understand maybe this approach was necessary for interfacing with some other vendor/legacy code somewhere, but in that case the fundamental issue is still bad code, not C++: the bad vendor/legacy code.)
I'm not sure what you mean by that. Are you saying you can use #ifdef, #include etc. in Haskell code?
> No offense, but this just sounds like bad design.
I agree. However, that's just one of the ways you can end up with multiple definitions of the same struct, and I think a language should detect such an error. It does detect it for functions. If linkers did not detect multiple function definitions, would you shrug it off by saying "having multiple function definitions is bad design"? (I'm asking in a non-accusatory way; I'm not sure how to word the question more gently.)
Many of us don't have the luxury of choosing the quality of code we're employed to debug. It would be nice if languages had some rudimentary sanity-checks to, well, preserve our sanity.
>Many of us don't have the luxury of choosing the quality of code we're employed to debug. It would be nice if languages had some rudimentary sanity-checks to, well, preserve our sanity.
I agree, and I don't think C++ is especially bad in this regard. If people are going to write bad code, they can write it just as well in Java, C#, Python, etc.; they'll just use different language features to bring about the horrors. No language (apart from maybe Go, in its extreme simplicity) can save us from having to debug bad code.
I've seen this sort of thing all the time. Not the part about different preprocessor defines in different files, but having a struct's contents depend on some compile-time condition. It comes up when you want to track extra debug information, but only in debug builds. It comes up if you need to store platform-specific information in a common struct. It can come up if you have a major feature that you want to toggle on/off at compile time. Etc.
Regarding the string type in Zig, in case some people are interested in this, you can read these two threads from the Zig GitHub repository to understand the previous discussions a bit better (I'd recommend reading them; they are both interesting):
> This combined with "strings" being byte buffers like in C are not a good thing.
The problem with C strings is that they are zero-terminated, not that they are byte buffers. Strings in Zig are foremost slice types (pointer/size pairs), but for C compatibility Zig also has the concept of 'sentinel-terminated arrays'.
I use CMake for my C and C++ stuff too, because it's the de facto standard, but Rust's cargo or Zig's integrated build system are on a whole different level.
The downside of a single string type built into the standard library, or even into the language, is that it needs to be both flexible and performant, and in my experience those two things exclude each other (e.g. how do you append two high-level strings like "str1 + str2" without a hidden allocation?).
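The same tradeoff expressed in C++ terms (a sketch; Zig instead makes the allocation explicit by requiring an allocator argument):

    #include <cstdio>
    #include <string>

    int main() {
        std::string a = "str1", b = "str2";

        std::string c = a + b;  // convenient, but hides a heap allocation

        char buf[64];           // caller-provided buffer: no allocation,
                                // but the caller must size it correctly
        std::snprintf(buf, sizeof buf, "%s%s", a.c_str(), b.c_str());
        std::puts(buf);
    }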
Apart from that, Zig's string-processing functions are not that bad; what's unexpected is that they live in the memory-operations part of the standard library rather than in a dedicated "string module". (After all, the Zig compiler is (being) written in Zig, and the parser needs to do a lot of string-munching, so I guess we'll get more powerful string-processing helpers in the standard library over time.)
But if I need to do lots of high-level string processing, I probably wouldn't choose a systems programming language to begin with; languages like Python are a better match for that.
"Handle" how? If you're just shuffling bytes around, you don't need to worry about it. If you need to iterate through by codepoints, there's stdlib Utf8View module, though I understand this aspect of the standard library is set to change before 1.0.
The fact that multi-byte codepoints don't have bytes that could be ASCII gets you decently far for common things like splitting a string at a newline, or otherwise matching against a char.
edit: actually, were you talking about the code in the post? Looking at it now, I see it's buggy in this regard if you want a ? to match ü. I think basically what you'd do is, in the branch corresponding to the ?, use the stdlib's utf8ByteSequenceLength function to skip the text ahead that many bytes while advancing the pattern by just the one byte.
This is exactly my pain point with C++ too. With Rust and the like I'm one simple command-line call away from installing a dependency. With C++, every dependency is a battle.
There are Conan and vcpkg. I do wish crates.io and Julia would make it easier to develop on non-internet machines; the ease of dependency management has bred staggeringly deep dependency trees for Rust and Julia. If I could run a cargo command to get a human-readable dependency list, then go to an internet-facing machine and run a cargo command to pack everything into a tgz to drag back over, that would be helpful. Is there any progress on making a crates mirror for offline networks?
I think maybe I don't understand vendoring. I thought you had to have your project on the internet-facing machine to vendor its dependencies, which I cannot do.
Likewise, I don’t understand the disconnect. You have to have your list of dependencies on an internet connected machine, yes, but you already allowed that. You said:
> I could then go to an internet facing machine and run a cargo command
Copy over your Cargo.toml to that machine, run vendor, copy the vendor files back.
Yes, Conan and vcpkg have gotten better lately, but that is a relatively recent development. Just two years ago I still had a hard time finding some common libraries with Conan. It's good that the situation is better now.
Then again, for every project you then have to set up vcpkg and CMake. I thoroughly enjoy that Rust comes with a build system, package manager, and test framework out of the box. But it is good that the situation with C++ is getting better.
Yep, no arguments there. It's not perfect, but it's nowhere near as bad as the article makes it out to be. Also, you only set up a project once, but it's compiled and iterated on many, many more times. Having some initial friction isn't ideal, but it's better to solve the real problem, which is maintaining a project.
CMake is pretty good, but you're right, it's no dependency manager, and dependency management is definitely tough in C++. There are vcpkg and Conan, though, which are huge improvements over the status quo.
>This combined with "strings" being byte buffers like in C are not a good thing.
It depends on the problem you're faced with. If the problem is "people writing C++ code with terrible latency characteristics because they don't understand how it works underneath, and the stdlib doesn't even provide a way to efficiently concatenate strings (i.e. by using a buffer, not allocating new memory)", Zig's approach is a great thing.
I've had excellent experiences with CMake over the last 10 years. It's no cargo, that's for sure, but it gives you a cross-platform, cross-toolchain, extensible, reliable build system.
LLVM was specifically in reference to the "tooling" comment in the blog post: clang-format and clang-tidy (which conveniently has a neat integration with CMake!).
I'm curious about this. It seemed to be a lot of 'work' just to get arguments. While I could read the code with no prior experience, there were enough unexpectedly verbose things in there to suggest it could take considerable time to learn.
No task is too easy. For easy, we need to jump down to Python. Even C is easy to learn but difficult to write well.
Zig is kind of in between Go and Rust in terms of the learning curve. In terms of programming, it is in between C and Rust. But it is simple enough. The documentation for the stdlib, as I said, isn't that great right now.
I found the C++ example very easy to read, which surprised me; it was not that verbose. The D example another poster shared in this thread was the easiest and cleanest, IMHO.
While I agree that Zig is in between Go and Rust, you're gonna find a lot of crazy stuff that will tie your brain in knots with Go.
A few examples:
- JSON parsing being special-cased into the language
- multiplying a user-input integer with a time value to be a timeout
- channels and goroutines (sorry, I've been spoiled by erlang)
Aside from things that are actually bugs, I haven't found anything like this in Zig. On the contrary, I typically have found something that doesn't work how I expect, and then I think about it, and it's "of course it's like this, why isn't C like this??".
IIRC there's a std.os.args or some such that is just a slice of byte arrays in unix platforms. The iterator API builds cross-platform support on top of it, as I understand. In my books, that's acceptable for an actually-low-level language.
> The lack of string handling routines in the stdlib was unexpected, to concatenate strings one has to do everything manually - allocate the buffer, put strings there
Not sure how long the author spent looking at the Zig stdlib, but there is std.mem.join for this. Most string utils are under std.mem, it seems. To be fair, I also rolled my own std.mem.eql for string comparison before stumbling upon it as a beginner, so this is definitely an area for improvement.
Overall I agree with the assessment of the stdlib docs. Many things are not even listed in the docs, and many that are have little to no description. Also, some of the type signatures are not really clear, e.g. when they use `var` as a type.
What works for me is having stdlib docs and source code from github side by side and jumping back and forth between them.
But overall, as someone who's gone through a similar language evaluation process, his assessment feels pretty on point: zig's lack of maturity still shows in a lot of places, Rust feels complicated and puzzling at the beginning, etc. My only difference is I'd lean towards C if I had to pick between it and C++, but that's largely a matter of personal preferences IMHO.
The Rust code is very non-idiomatic, which is probably part of the reason why it seemed difficult to put together and why reading it is a bit of a chore. Compared to Zig or Go, naively translating C code into Rust isn't going to work nearly as well. This isn't necessarily because Rust is more complicated (although it definitely is), but has a lot to do with the fact that its semantics are inspired almost as much by OCaml and Haskell as they are by C and C++. It's not surprising, then, that someone who is predominantly a C programmer would need more than a couple of hours to really judge the difference between the languages.
Honestly I don't understand the complaints about the readability of this Rust code. I'm new to all three languages here, and I find it pretty readable. Maybe it was harder to write in Rust, I don't know, but in terms of readability the Zig and Rust code look pretty similar, and Go is the one that stands out as somewhat hard to parse and understand.
Nothing looks necessarily wrong to me, but there are some elements that aren't exactly how this code would normally be written. For instance, they don't take advantage of Iterator::enumerate(), which makes the for loop more awkward than necessary. And strings are treated as arrays of bytes (rather than sequences of Unicode characters, as the language pressures you to).
Yeah. This code isn't bad, but it's not what I'd write. The largest difference is that, given this problem, I would start with walkdir (and maybe glob) rather than writing all of this myself. Not to mention Cargo instead of a Makefile. The loops also feel very C-like, but that might still be the best you can do; I'd have to actually give it a try, but I feel like iterators could clean this up, as you say.
To reiterate: that doesn't mean this is bad or wrong. It might be a bit harder or more complex than it has to be. But also, you have to know that these things exist in the first place, which is its own challenge.
Honestly Rust is pretty expressive. There's always some things that you are going to run into. But mostly Rust is a really nice language and environment to work in.
>All had about the same performance when scanning through my whole /usr/include file tree. That’s why I wanted to highlight that technical characteristics are often not as important as the developer experience.
Strange conclusion considering their benchmark was I/O-bound by design
I'm worried about something actually managing to replace C. Right now every platform provides its lowest-level SDK in C, which is super nice: if you want to do cross-platform native dev, like games and UI, just using C can get you anywhere. I don't want it to end up like graphics, where better tech came out but each vendor uses their own spec (OpenGL everywhere -> Vulkan / Metal / D3D / WebGPU), making it impossible to do actual cross-platform dev unless you can put up with some HAL bloat.
I am really surprised that D's --betterC subset (or just D proper) wasn't even tested, considering that's exactly what it is called and it would likely outrank a good bit of everything else on the list.
>They often complain the syntax is unclear and requires paying attention to details.
Attention to detail comes with the territory for programming roles. People who can't or don't want to pay attention to details are not fit to be programmers.
We are working with precise binary digital machines, FFS. I sometimes find this attitude among students in my training courses: the misconception of "computer, do what I mean, not what I say". Obviously such students don't do well in my course or in their careers.
> With no previous experience I opened vim and started coding.
That is where I stopped reading. If you're not using an IDE in 2021, you're not working on serious software/big codebases/commercial projects, and what you have to say is not something I'm interested in, ESPECIALLY when talking about the UX of programming! Vim, oh please... time to have a nap, grandpa.
We've banned this account for repeatedly violating the site guidelines. Please don't create accounts to do that with. It's not what this site is for.
If you want to read https://news.ycombinator.com/newsguidelines.html and commit to using HN as intended, you're welcome to email hn@ycombinator.com and we'll be happy to unban you. Otherwise not.
You need to learn how to code without crutches. IDEs are great, and people should definitely use them, but I've met too many programmers who are just lost without them. The worst example was a guy who couldn't even navigate his own small program except with the debugger. Even though he wrote this small program himself, he didn't manage to have a mental image of the call stack.
So, some of you who rely on the IDE all the time maybe need to do without it a little to train some muscles that have atrophied.
I work on a serious, big, old commercial code base using Eclipse, and I use NeoVim every time I touch XML, SCSS, or JavaScript files. Correctly configured, Vim/NeoVim is a better IDE for CSS/SCSS and JavaScript than Eclipse.
Have you never written a bug in your life? Should we not believe your conclusions too?
> His pretty buggy code infinite-loops if pattern == "". Does anyone truly believe his conclusions, if he evaluates programming languages according to how well they execute buggy code? -systemBuilder
I dunno. One could say that C string handling is "simple" (it's just byte buffers!) or that manual loops are "simpler" than proper iterator abstractions, but those are both constructs that are hideously error-prone in practice.
Perhaps it's the other way around and manual buffer management is actually the complex thing, since your application logic will have to involve a lot of code doing toil that isn't directly related to what you want to accomplish.
EDIT:
To expand a bit, I'm a huge proponent of simplicity; I think overabstraction is a massive problem with software nowadays. However, my thinking is that the problem isn't so much with the abstractions themselves, but that they aren't transparent. ORMs are a frequent offender here: a good abstraction wouldn't allow you to perform a thousand database queries by accident just by looping over a list. You could still model a list of database-backed persistent objects as a list, but any operation that hits the database should be an explicit fetch, not an implicit one.
Are manual loops, the kind that can be replaced by iterators, really "hideously error-prone"? Don't get me wrong, I'll take the iterator approach every day, because the index is simply complexity I don't need, but I've never seen issues with the basic for(int i...). Off-by-one errors tend to happen when you start doing index arithmetic, and iterators are not going to help you there.
Basic loops certainly don't go wrong all that often, but once you get to more difficult requirements, the iterator abstraction really helps.
Maybe you need to iterate over two differently sized data structures in lockstep, or over a complex nested structure, or accumulate and filter values, or load the data into a temporary buffer that gets flushed and reused once it fills up; or maybe your data source is infinite; or all of the above.
These are cases where using iterators is significantly safer and, notably, more composable.
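The lockstep case, for instance, in a generic C++ sketch: with raw indices the combined bounds check is easy to get wrong, while with iterators the termination condition sits in one place.

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> a{1, 2, 3, 4};
        std::vector<int> b{10, 20};
        // stops at the shorter sequence; no index arithmetic to get wrong
        for (auto ia = a.begin(), ib = b.begin();
             ia != a.end() && ib != b.end(); ++ia, ++ib) {
            std::cout << *ia + *ib << '\n';
        }
    }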
Unsophisticated engineering solutions are not always the best. Modern CPUs are monstrously complex, for example. Modern airliners are incredibly complex. Simple solutions would be woefully uncompetitive.
In the domain of programming languages, minimalistic low-level languages like assembly, C, and Forth, tend to be unsafe. They enable categories of serious bugs that never occur in safe languages. Modern garbage collectors, for example, are very complex and very valuable.
One of the reasons Safe Rust is so compelling is that it seems to succeed in offering safety, C-like performance, and good developer ergonomics. Prior to Safe Rust you had to pick two: C and C++ lack safety, Java/C# lack performance, and formal methods are hard to use (at least in the current state of the art).
Safe Rust is not a simple language. It's not possible to achieve what Safe Rust does without sophistication far beyond that of C.
> Modern CPUs are monstrously complex, for example.
But not all of that complexity is good. For example, things like branch-prediction vulnerabilities exist as a result of that complexity, which makes it extremely difficult to reason about the entire system and predict which interacting systems might lead to security issues.
And it looks as if some of the complexity of the x86_64 instruction set is a liability with respect to performance compared to "simpler" instruction sets like ARM.
Also with regard to Rust, the more that I write, the more that I wonder if some of that complexity is due to unsolved design problems in the core of the language. For instance, once you start introducing data structures with references inside, this starts to introduce all kinds of complexity, which balloons when these kinds of types start interacting with other systems like async.
I am not expert enough to know what the solution would be, or to know if these types of problems really could be avoided, but Rust sometimes feels like a language which is missing that one key piece which would keep all of this complexity under control.
> And it looks as if some of the complexity of the x86_64 instruction set is a liability with respect to performance compared to "simpler" instruction sets like ARM.
I don't think any of the issues with Intel's x86-64 chips are due to the instruction-set, they're due to the chips' internal architectures. ARM CPUs are by no means 'safe by nature'.
> the more that I write, the more that I wonder if some of that complexity is due to unsolved design problems in the core of the language
> Rust sometimes feels like a language which is missing that one key piece which would keep all of this complexity under control.
I'm reminded of a quote [0] from Bjarne Stroustrup, creator of C++:
> Within C++, there is a much smaller and cleaner language struggling to get out.
That sounds incredibly complex and hard to maintain. What you need is a Docker container for every method, for isolation, then just use K8s to orchestrate them and YAML for control flow.
People have completely different and incompatible definitions of "a simple tool" and "a simple solution". It's often more about familiarity than anything else.