> A type-safe C successor that compiles directly to x86_64 assembly.
...
> Warning
> There is no safety measure regarding string literal manipulation. Doing this will most probably result in a segmentation fault.
# String literal (int8*)
"abc"
# Character literal (int8)
'c'
# Undefined behaviour
"abc" at 0 = 'd'
This doesn't seem very type safe, at least not yet.
This is a cool project, though! I have been wanting to learn to write compilers. They're the coolest things.
Also, this is a side note, but what I actually hope is the successor to C is higher level languages having the option to restrict oneself to a low-level variant that can manipulate pointers, and/or let you embed assembly under macros and special functions, like this old blog shows how to do for Lisp[1]. It would be cool if more languages let you do stuff like that.
A killer feature of C is its simplicity. It is so simple that there are dozens, probably hundreds of independent C compilers and C standard libraries.
Rust has no formal specification and no independent implementations even close to complete. gccrs is still struggling to compile the standard library after years of development by some of the smartest compiler writers in the world.
Rust is a successor to C++, not C. Rust may be a lot of things but I don't believe it will ever be a successor to C.
> Rust is a successor to C++, not C. Rust may be a lot of things but I don't believe it will ever be a successor to C.
Well, the Linux kernel, which is C and ASM, rejected C++ (for reasons that weren't so much about the language itself) and it is adopting Rust. Rust is a language & a runtime. You can write C-like code in Rust and C++-like code in Rust or mix'n'match however you like (you can even get close to ML style languages).
> Rust has no formal specification and no independent implementations even close to complete
Rust has plenty of formal specifications. Their whole process is to formally specify how a thing works & then implement it against that spec.
It's not standardized by an external body / doesn't have a snapshot of the standards that are relevant to implement for a given edition. I view standardizing a language as an anti-pattern for similar reasons that attempts to do that with human languages fail - it ossifies things whereas languages really need to continue to grow and evolve (+ standards bodies become their own weird little fiefdoms that often have nothing to do with achieving the goals they're nominally supposed to be focusing on).
Same goes for independent implementations. It's fine for C because C as a language is a very simple thing and hasn't really changed in 30+ years and doesn't really have a runtime (most of the things people call C are actually POSIX).
It has not worked out well for C++, which has stagnated quite badly. Part of the reason is having to implement the same feature 3+ times, with representatives from each implementation arguing against features that are harder to fit into their architecture and for features that are easier. There are positives in that errata are more likely to be found when you implement the same thing 3 times, but it's not clear to me that it's net better than just implementing a thing in the nightly release chain & letting it bake until it's ready.
Having a single frontend simplifies a lot of things. gccrs struggling is a positive thing as far as I'm concerned - it means that the Rust language continues to evolve and isn't concerning itself with needing to support a fork of the frontend.
Rust is too complex and clunky to ever gain widespread acceptance outside its niches.
It's been a decade already; we recovering and current zealots all need to wake up and smell the roses.
For systems programming, treating memory as a critical resource with strictly traced ownership is a good design. For the other 97% of programming, memory needs to be there when you need it, to be plentiful and to get out of your way and not be a source of runtime bugs or compiler friction. Slight inefficiency is acceptable.
> Rust is too complex and clunky to ever gain widespread acceptance outside its niches.
Rust is a systems programming language (it didn't start out that way but is where it pivoted to) so I think we agree on that niche. Most people don't work on problems that Rust would help with, but the places it's suited for are seeing more Rust adoption, not less, and those niches are quite large. For example, Linux kernel, Android, Windows kernel, Chrome, etc etc. Basically any C/C++ codebase, if it really needs to be C/C++, will get migrated and there's a metric fuckton of such code. There's probably more JS, Java, Python, Ruby, etc but those programs don't need to be migrated to Rust unless they need more performance/multithreading and can benefit from Rust in that way. Rust not making sense for JS, Java, Python, Ruby, etc coders does not make it a failure or a valid claim that it doesn't have "widespread acceptance" - after all those other languages also don't see widespread acceptance outside their niches.
Well, as usual, use whatever language fits your needs best. But Rust's main benefit is not performance, it is safety. There are very few languages in the industry that I feel have my back when it comes to reliability and one of them is Rust.
It's both. It's fast enough to provide an alternative to C and C++ purely due to its performance, even ignoring that it's more enjoyable to write Rust than C and C++, and ignoring that Rust code is reliable. That's why Discord switched from Go to Rust for performance reasons, and why there are Python developers that replaced C with Rust for high-performance Python extension modules.
Sure - and in rust that probably means boxed types everywhere. Maybe the worst feature of Java was the easy access to bare types. For rust it might be a culture thing - but that might change.
I've written some Rust. This is far from the first language that aims to replace C. I haven't touched Rust for a while, and now decided that I want to get into Spark (that's Ada's thing). And going through the process of learning a new language that's somewhere close to C, for... I don't know exactly which time now, I realized that:
The unsafe aspects of C are what you pay for comfort. Sometimes, when you write a program you simply don't know what type it's going to be, and the details of that type aren't important right now. You just need the convenience of casting everything to everything or passing "void *" around just to get things done.
Similar to Rust's lifetimes Ada has what it calls "access checks". Roughly, it's a static check that your pointers are always valid. Similar to Rust's lifetimes it's a huge pain to deal with. Sometimes you'd sit in front of your monitor and be like "this code is correct! why don't you just do what I'm asking you to do and let it fail at runtime so I can tell where the problem is?" And then you start to weasel around. You don't spend time actually working on the problem you need to solve, you just typedef this, expand the definition of that, try making something a constant, a variable, try not passing some piece of data -- all kinds of manipulations just to make the checker happy.
All while in C you can just tell it to shut up and move onto the next goal. It's more natural to deal with problems when your code actually fails.
So, in the aftermath: do I like C programs? -- Absolutely no. They are unsafe and will potentially do all sorts of bad things. But do I like writing in a safer language? -- No... not really.
I wish there was a way for humans to write in C and then some magical compiler would read that code, understand what the human actually wanted and produce an Ada or Rust program from it. While there isn't such a thing, maybe the next best thing is to write first in C, and then rewrite it in a safer language?
I'm pretty sure I had seen a C# project years back that jailbroke the iPhone as well, but I don't even remember how it worked, it was probably a decade back.
Not sure how. Zig is targeted as a systems level language but isn't memory safe. It's complex enough in its own way. I don't see it faring well against Rust - it'll have its proponents and enthusiasts but I doubt it'll ever see the same adoption as Rust (i.e. it'll be closer to what D accomplished trying to supplant C++).
I was looking for concrete issues which violate either memory safety or type safety but not both. However, these examples are clearly both.
Let's look at the null pointer case first: Is this a memory safety issue? Yes, this is definitely a memory safety issue. The actual hardware has no concept of a "null" pointer, it just uses raw addresses - so if we allow this operation typically we're accessing arbitrary memory which happens to have an address corresponding to our "null" pointer. How about type safety? Well we violate type safety if we perform any operation on an object which isn't sanctioned by its type, and dereferencing isn't a sanctioned operation for the null pointer type.
Now how about array bounds? Memory safety is pretty obvious here, this is the classic stack smash beloved of "hackers". And type safety? The bounds miss is not a sanctioned operation for this type.
That's not really how most type systems work. Take array access. The typical typing is: if you have an `Array[A]` and reference an `Int` index you get an element of type `A`.
ref: Array[A] Int => A
This says nothing about what happens if you access an out of bounds element. In a memory safe language this is enforced by a runtime check. Without memory safety you get YOLO semantics like in C.
I don't know what kind of type system you have in mind then. Do you have an example of language where all array types have a way to check, statically, that all indexing operations are in bound? How would that even work? A language without dynamically-sized arrays at all, maybe?
I think you can do this with proof assistants, and with the SPARK version of Ada.
You can also do it with a dependent type system. I haven't used it, but from what I've read, in the ATS programming language, which is dependently typed, you can guarantee that array index accesses and pointer arithmetic never go out of bounds, at the cost of having to pass proof variables explicitly to various functions.
This poster goes over how to ensure you don't cause buffer overflows in it:
Firstly, it is simply not the case that all constraints must be able to be statically enforced for them to constitute part of the type's definition. I'd guess that's a huge part of the confusion here.
But, yes, actually all WUFFS indexing is statically checked. It's using inference to conclude the range of the index, so if it won't fit that's a static type mismatch.
Firstly, I guess you state these aren't built in yet, so I guess it might change, but I wonder if the lack of unsigned primitive types is intentional? It seems like something I would want to be built in.
Secondly, you have byte as int8 and char as int8 - in reality, either could be unsigned surely? For me, byte is unsigned. As I mostly do C# these days, char is also not generally used as a numeric value. It feels like, if you have byte, char can represent an actual character value. But not making char multibyte (UTF-8 would work) means your type system has to deal with all the WChar nonsense/legacy. Better to just make it UTF-8 now? If you don't have byte, I guess you can have char, but that feels like more legacy.
This language calls itself functional - does it support higher order functions? What about closures? Or function composition and (de)currying? Does it have pattern matching like you find in Haskell or ML?
Also, what about C is type unsafe and how does minilang fix it? Would love to have seen some comparisons.
They are apparently allocated in a non-writable section. Non-const qualified string literals cannot* be modified in C either. The following snippet will cause a segfault:
My guess is that the actual behaviour depends on the runtime environment. On some platforms you might get a segfault, on others it might overwrite the first character.
Yeah I think this is the answer. Maybe it should be obvious, but I had to think it through. Maybe not the best example for the first-exposure documentation in the README?
Well, it looks a bit early to form a strong opinion on the language, but it's always nice to see a new take on low-level-meets-safe! While I'm personally a big fan of Rust, it's clear that there are still many alternative approaches to explore.
The modifiers (pointer, "function taking _ returning _", "array N of", etc.) are prefixed, but they could just as well be postfixed. I think people don't do it because it's more similar to English to make the modifiers prefixed.
C:
int *p;
Prefix:
p: pointer to int;
Postfix:
p: int pointer;
Another example.
C:
int (*fp_arr[5])(int, int);
Prefix:
fp_arr: array 5 of pointer to function(int, int) to int;
Postfix:
fp_arr: int (int, int) pointer array 5;
This is something I was thinking about because some non-English languages use postpositions instead of prepositions, and say the verb last instead of in the middle. For example, in English, you could say "He looks at me," and in some other languages, you'd say the equivalent of "He me-at looks". Instead of "I went to the bathroom" it would be the equivalent of "I bathroom-to went." And instead of "the fifth day of the month" it might be something like "month of fifth day", except "of" has the opposite meaning. I'm speaking vaguely because I've only read about such languages from a linguistics perspective, and I don't know them. Specifically, head-initial vs head-final languages: https://en.wikipedia.org/wiki/Branching_(linguistics)
You're correct. Of course the price for this is not only does it cause compiler authors to go bald earlier, it also means that you have natural language-style garden path parsing problems where the meaning of an expression isn't obvious at all and may take several attempts to discern.
And so it also leads to situations where a human thinks "I made a Doodad" but the compiler sees instead "For some reason the human just decided to draw my attention to the fact Doodads exist, which I already knew". In a language like C or C++ where you do need to constantly repeat yourself because it's designed for 1970s compiler technology, warning about this will lead to too many false positives, so it's not done. C++ lock guards are a famous example where this happens - leading to code where authors and even sometimes reviewers believe a lock has been taken properly, but from the compiler's perspective there was never any lock at all so mutual exclusion isn't achieved.
This is mostly an issue with C++ specifically. If you implement the lexer hack[1], C can be parsed using an otherwise context-free grammar.
C++ suffers from requiring arbitrary tokens of lookahead before being able to determine whether something is a declaration or an expression statement, which is a consequence of the combination of C-style declarations, function-like casts, and function-like constructors.
For example, here, the parser cannot know whether the variables x, y, and z are being redeclared in the same scope, or if they're being evaluated and discarded, until it reaches the "new int" part. Last time I tested it, clang++ parsed it correctly, but not g++.
int x, *y, *z;
{
    int(x), *y, z, new int;
}
In C++, "int(x)" is equivalent to "(int) x" (and this doesn't exist in C), but as a holdover from C, it also allows redundant parentheses in declarations, so one could also just redeclare x in a new scope like this (in both C and C++):
double x;
{
    int(x); /* x is an int now */
    x = 5;
}
I think this is a variant of the most vexing parse[2], which your lock guard example also sounds like an example of.
You probably already know about this, but I was surprised by that example even knowing that C++ requires arbitrary tokens of lookahead to parse correctly, because I had only seen the Timekeeper example from Wikipedia.
There's also the problem that it can't parse something like this unless it knows whether it's supposed to be a template instantiation or an expression statement, but I think that's just an extension of the typename/variable problem that it inherited from C.
[1]: https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-co...