
How to make your C codebase Rusty: rewriting keyboard firmware keymap - houqp
https://about.houqp.me/posts/rusty-c/
======
xvilka
Much of this can be automated with c2rust[1][2]. They recently published an
overview[3] of lessons learned and future directions for Rust and C
integration.

[1] [https://c2rust.com](https://c2rust.com)

[2] [https://github.com/immunant/c2rust](https://github.com/immunant/c2rust)

[3] [https://immunant.com/blog/2019/11/rust2020/](https://immunant.com/blog/2019/11/rust2020/)

~~~
Animats
That doesn't really translate C to Rust, it compiles it to unsafe Rust with
all the weaknesses of the C code. It doesn't even use Rust arrays. You would
not want to maintain what comes out. It turns

    
    
        p[j] = p[j-1];
    

into

    
    
        *p.offset(j as isize) = *p.offset((j - 1 as libc::c_int) as isize);
    

which is unmaintainable.

If you put in

    
    
        void insertion_sort(int const n, int const p[]) 
    

you get

    
    
        pub unsafe extern "C" fn 
            insertion_sort(n: libc::c_int,
                           mut p: *const libc::c_int) 
    

C has no way to say how big p is. (Biggest flaw in the design of C). That info
is essential to translating this properly. The translator needs either a hint
from the user, or inference from examining the calls to the function.

Now, if you could stick something like

    
    
        ASSERT(LENGTHOF(p) == n);
    

which would be ignored by C if ASSERT was defined to ignore it, and the Rust
translator understood that, it could use a Rust array, which has a size. Now
the output Rust code could use subscripts properly.
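For illustration only, here is roughly what a hand-written target could look like once the length is known. This is a sketch, not c2rust output: it assumes the translator may replace the pointer-plus-length pair with a slice, and it drops the `const` on `p` since a sort has to write through it.

```rust
/// Insertion sort over a slice. The slice carries its own length, so no
/// separate `n` parameter is needed and every subscript is bounds-checked.
pub fn insertion_sort(p: &mut [i32]) {
    for i in 1..p.len() {
        let key = p[i];
        let mut j = i;
        // Shift larger elements one slot to the right.
        while j > 0 && p[j - 1] > key {
            p[j] = p[j - 1];
            j -= 1;
        }
        p[j] = key;
    }
}
```

The inner assignment is the same `p[j] = p[j - 1]` as in the C source, with no `offset` calls and no `unsafe`.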

Generating constraints like that is a hard problem, but providing an
interactive tool which prompts the user on where to insert them is not.

~~~
jeffdavis
"it compiles it to unsafe Rust with all the weaknesses of the C code"

Isn't that the idea?

Sure, it makes it into essentially a blob of uneditable code. But it allows
you to integrate it with rust code and build the whole thing together.

~~~
setr
Not when it's being pitched by gp as an alternative to rewriting such that the
rewrite is the primary code. This particular scenario is one where an opaque
blob is not at all the goal, and thus c2rust is not a valid strategy

~~~
Animats
It seems that part of the goal of c2rust was to retain the C memory layout.
If that's not required, C arrays can be translated to Rust arrays.
Except for some corner cases involving casts and pointer arithmetic, that
should work, and generate much cleaner output code. You can detect those, at
least. For pointer arithmetic, do the pointer arithmetic and convert it to a
subscript before indexing. Then you have bounds checks.
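A minimal sketch of that idea, assuming the translator can represent a C pointer as a slice plus an index. The function name `read_at` and its parameters are made up for the example.

```rust
/// Hypothetical stand-in for C's `*(base + offset)`: the arithmetic is done
/// on the index, and only the final subscript touches memory.
pub fn read_at(data: &[i32], base: usize, offset: isize) -> Option<i32> {
    // Pointer arithmetic becomes index arithmetic; overflow or a negative
    // result yields None instead of wrapping.
    let idx = base.checked_add_signed(offset)?;
    // `get` performs the bounds check rather than reading out of range.
    data.get(idx).copied()
}
```

Plain `data[idx]` would panic on an out-of-range index instead of returning `None`; either way the access is checked.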

This is a classic problem with computer language conversion - you lose the
idioms of both the input language and the output language in translation.

------
alxmdev
Reading the comments here it seems like I'm not the only one who thinks some
Rust code is pretty syntax-heavy even compared to older systems languages like
C and C++.

I wonder if this is in part to further separate application development from
system development (slowly paving the way for more verbose but formally
verifiable code in core components), and to push more people to write higher
level code instead. I can imagine many reasons for why things would be steered
this way.

Edit: Bad code example, sorry; thanks for the informative reply! I didn't mean
to be unnecessarily negative, just a thought that ran through my head.

~~~
steveklabnik
I can assure you that Rust's syntax is not designed to further separate
application development from system development, if anything, we encourage a
way broader audience to learn Rust than just pure systems programmer folks. In
general, Rust's syntax is intended to be vaguely similar to C/C++/Java, but
with some influence from functional languages where appropriate, and with
syntactic forms that work better with type inference. There are very few
syntactic forms that are truly novel in Rust (even lifetimes are taken from
OCaml's generic syntax, because lifetimes are generics!)

That said,

    
    
      fn my_panic(_info: &core::panic::PanicInfo) -> ! {
    

would be something like

    
    
      [[ noreturn ]] void my_panic(&core::panic::PanicInfo _info) {
    

in C++, which is _very_ similar, syntactically speaking. The main differences
are:

* fn for function declarations

* Rust has the return type after parameter list, and adds ->

* ! is [[ noreturn ]], less punctuation actually!

* no need for the superfluous void return type

* "name: type" not "type name", which adds a single :

I am admittedly very biased, but it seems much more similar than different.
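To show what `-> !` buys you in practice, here is a small sketch. It uses `std` and `panic!` for brevity, which firmware code would not; the names `fail` and `keycode_or_die` are invented for the example.

```rust
/// A diverging function: `!` says it never returns to its caller.
fn fail(msg: &str) -> ! {
    panic!("fatal: {}", msg)
}

/// Because `!` coerces to any type, the `None` arm type-checks as `u16`.
fn keycode_or_die(code: Option<u16>) -> u16 {
    match code {
        Some(c) => c,
        None => fail("missing keycode"),
    }
}
```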

~~~
petschge
! might be "less" punctuation than [[ noreturn ]] but is a lot harder to
google. And why do you need the "->"? That is just more line noise.

~~~
steveklabnik
You don’t need to google, you just need to click:
[https://doc.rust-lang.org/stable/book/appendix-02-operators....](https://doc.rust-lang.org/stable/book/appendix-02-operators.html)

In theory, punctuation is noise. In practice, it helps you understand the
parts of a sentence. Same with code.

~~~
petschge
8 pages of punctuation. So clear and easy to read.

~~~
adrianN
Not that much more than in C++:
[https://en.cppreference.com/w/cpp/language/expressions#Opera...](https://en.cppreference.com/w/cpp/language/expressions#Operators)

------
clktmr
I don't know, to me the C version is more readable. I know for sure that in
the end the C macros will only do text replacement, whereas the Rust macros
could do anything.

And emojis in code, really?

~~~
psv1
Yes, Rust is by far the most unreadable language that I've tried. I really
wanted to like it but everything is just so ugly, unclear, and (seemingly)
unnecessarily complicated.

~~~
cdirkx
Could you expand on this? I have heard many people say this about Rust, and I
would like to know why.

I myself am now so used to Rust's syntax that I don't know what is confusing
or unreadable about it, and would like the outside perspective.

~~~
petschge
First example scrolling through that post: How is

    
    
      static keymaps: [[[u16; 3]; 2]; 1]
    

an improvement over

    
    
      const uint16_t keymaps[][2][3]
    

u16 is marginally nicer than uint16_t, but why is there a colon? and why do we
have to nest square brackets instead of [1][2][3] (or maybe even better
[1,2,3] which you can get with suitable C++ libraries)?

Sure I can parse Rust. But it is definitely more complicated, and more
"noisy".

~~~
pcwalton
Types going after identifiers avoids the need for the lexer hack, which causes
all sorts of problems in C (such as "typename"). A colon nicely separates the
two; I prefer something there as opposed to "x int" like in Go.

You have to nest square brackets to avoid ambiguity. Is &int[] an array of
references or a reference to an array?
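With the nested form the two readings are spelled differently, as in this small sketch (function names are just for illustration):

```rust
/// An array of two references.
fn sum_refs(xs: [&i32; 2]) -> i32 {
    *xs[0] + *xs[1]
}

/// A reference to an array of two values.
fn sum_array(xs: &[i32; 2]) -> i32 {
    xs[0] + xs[1]
}
```

Calling `sum_refs([&a, &b])` and `sum_array(&[a, b])` makes the difference visible at the call site too.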

~~~
petschge
I am sure there are good theoretical arguments. But they are hard on the
humans. Ideally I would want something like

    
    
      constdata keymaps[1,2,3, u16]
    

That is easy to read, gets rid of all the extra line noise and directly tells
me everything I need to know about memory layout and performance.

    
    
      1.) it is constant, known at compile time and can be put into a read-only segment (or possibly flash rom on an embedded system).
    
      2.) it is named keymaps. The name is important and should come early
    
      3.) it is an array. arrays and primitive datatype have many important differences and programming languages should not try to hide that.
    
      4.) it has dimensions 1 by 2 by 3 (in that order). Listing the "3" first in Rust when the first dimension only has extent 1 might have good reasons but is damn hard to read if you have more than 2 dimensions. Especially if you end up with things like 3 by 3 by 4 by 3. Which of the inner two is larger?
    
      5.) Having the type of the element last makes sense, because in terms of memory layout that just means that we have 2 consecutive bytes. It also makes it easier to switch from "a 1 by 2 by 3 array of u16" to "a 1 by 2 array of (three vectors of u16)".
    
    

Now you will probably give me reasons why I can't have that. But when I am
coding I don't care how hard it is on the compiler writers (as long as I can
express things unambiguously), but want to have it as easy as possible so I
have brain cycles to spare to think about data layout and algorithms.
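For reference, this is how the existing Rust type reads when you actually index it. The contents are placeholder numbers, not real keycodes, and `key_at` is a made-up helper.

```rust
// One layer, two rows, three columns. The innermost length (3) is written
// first in the type, but it is the last subscript when indexing.
static KEYMAPS: [[[u16; 3]; 2]; 1] = [[[1, 2, 3], [4, 5, 6]]];

/// Look up a key by (layer, row, column), the same order as C's `[1][2][3]`.
fn key_at(layer: usize, row: usize, col: usize) -> u16 {
    KEYMAPS[layer][row][col]
}
```

So the subscripts go in the same order as in C; only the type is written inside-out.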

~~~
Ar-Curunir
You can already define a custom type which will allow you to have a nice
syntax for multidimensional arrays: `Matrix<1,2,3>`. It solves your issue of
nesting brackets, and you can impl arbitrary indexing for it.

~~~
kazagistar
Unfortunately, Rust does not have numeric type parameters outside the special case baked
in arrays, so it cannot do that yet afaik. There is a ticket for it, but it
needs work.

~~~
steveklabnik
You can sort of do it today:
[https://crates.io/crates/typenum](https://crates.io/crates/typenum)

But it will be much nicer and better once const generics lands, it's true.
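For later readers: const generics have since been stabilized (Rust 1.51), so something along these lines now compiles on stable. It is only a sketch. The `Matrix` type here is hypothetical, takes the element type as an extra first parameter, and wraps a plain nested array.

```rust
use std::ops::Index;

/// Hypothetical 3-D array type with its dimensions written in C order.
pub struct Matrix<T, const A: usize, const B: usize, const C: usize> {
    data: [[[T; C]; B]; A],
}

impl<T, const A: usize, const B: usize, const C: usize> Matrix<T, A, B, C> {
    pub fn new(data: [[[T; C]; B]; A]) -> Self {
        Matrix { data }
    }
}

impl<T, const A: usize, const B: usize, const C: usize> Index<(usize, usize, usize)>
    for Matrix<T, A, B, C>
{
    type Output = T;

    // Index with a tuple so the subscripts read in the same order as the
    // dimensions in the type.
    fn index(&self, (a, b, c): (usize, usize, usize)) -> &T {
        &self.data[a][b][c]
    }
}
```

A `Matrix<u16, 1, 2, 3>` is then indexed as `m[(0, 1, 2)]`.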

------
shepmaster
See also “Rust out your C” by Carol Nichols:

[https://github.com/carols10cents/rust-out-your-c-talk](https://github.com/carols10cents/rust-out-your-c-talk)

~~~
discardable_dan
At this point, I'm not so much against Rust because it's a difficult or hard
thing to use, but because there's such a learning curve I can't, in good
conscience, move team projects to it. "To continue to work on this with me, you
must first learn Rust" is a hard sell.

~~~
umanwizard
Despite being a Rust fan, I do believe that it is in fact harder to use than
nearly all languages, despite the valiant effort of the community to make it
as easy to use as possible.

Things shouldn’t be ported to Rust willy-nilly, but in its niche (basically: a
better C++, because the compiler enforces many of the things you need to keep
track of in your head in C++, and also because as many productivity features
as are possible without violating “pay-for-what-you-use” _too_ flagrantly are
baked in), it can be worth the cost.

I am sure some others will disagree with this, but I think using Rust in
domains that Python or Java or Awk or Bash is well-suited to is quite silly.
For me, Rust competes with C and C++, and little else.

------
unlinked_dll
I'd argue it'd make more sense for the keymap macro to load a json/toml/yaml
config file and then generate the code but that's just me. Would be a little
more straightforward too using serde, just type out your schema in normal
rust, automatically derive the deserializer via serde, load a file with a
function like macro, and synthesize the C compatible code from there.

~~~
jeffdavis
True, but there's some difficulty using codegen. If macros can do it, it's
often nicer.

But not always, and it's subjective.

~~~
unlinked_dll
Procedural macros in Rust are essentially code generators.

------
wyldfire
Hah, interesting. A new spin on RIIR: RIIRFTSOETC ("rewrite it in rust for the
sake of exporting it to C").

~~~
matheusmoreira
What's commonly known as the C ABI is actually the System V ABI:

[http://wiki.osdev.org/System_V_ABI](http://wiki.osdev.org/System_V_ABI)

Projects that are meant to be widely reused should export their functionality
through that simple ABI. Otherwise people will rewrite it in C.
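In Rust terms, exporting through the platform's C ABI looks roughly like the sketch below. The function itself is a made-up example; `extern "C"` selects whatever C calling convention the target uses.

```rust
/// Uses the target's C calling convention, so a C caller could declare it as
/// `uint16_t saturating_add_u16(uint16_t a, uint16_t b);`.
pub extern "C" fn saturating_add_u16(a: u16, b: u16) -> u16 {
    a.saturating_add(b)
}
```

To make the symbol visible to a C linker under that exact name you would also add the `no_mangle` attribute (written `#[unsafe(no_mangle)]` in the 2024 edition) and build the crate as a static or dynamic library.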

~~~
gpm
It's known as the System V ABI on Linux. Not on Windows (Microsoft x64 calling
convention for x64 Windows).

[https://stackoverflow.com/a/44893431](https://stackoverflow.com/a/44893431)

~~~
umanwizard
Those are not actually the same. There are several differences, but the most
obvious one is that the standard ABI on x86-64 Linux passes integer parameters
in the order rdi, rsi, rdx, rcx, r8, r9, while the Microsoft x64 convention
uses rcx, rdx, r8, r9.

I noticed this immediately the first time I tried reading some Windows
disassembly, as “rdi rsi rdx” is very thoroughly burned into my brain...

~~~
gpm
Didn't mean to imply that they are the same ABI, sorry. Just that they're both
"What's commonly known as the C ABI".

~~~
umanwizard
My bad for misunderstanding you!

------
andrepd
The `keymaps!` macro does indeed look somewhat cleaner, but I wonder how many
hours it took to develop for such marginal benefit :p

------
zabzonk
As far as I can see the major difference is that the C code uses longer macro
names.

------
jay_kyburz
Off Topic, but I really like this trend towards very simple blog themes.

------
z3t4
The Rust example made me feel anxious. I never got macro systems. What's so
hard to read about a basic multi dimensional array? It's like the most basic
of data structures. Am I the only one that can easily picture this in my
head?

~~~
squiggleblaz
Isn't the advantage of the macro in the Rust example not the fact that it's a
more convenient multi-dimensional array (is it even that? like you say...) but
that we're using literals not for their literal value, but for some local
symbolic value. For instance, `1` doesn't mean 1, it means whatever `KC_1`
means (perhaps that's 63).
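A toy version of that mapping, to make it concrete. This is not the macro from the article: the `key!` and `row!` names and the keycode values are assumptions for the sketch.

```rust
// Assumed keycode values for illustration; real firmware headers define
// their own.
const KC_1: u16 = 30;
const KC_2: u16 = 31;
const KC_A: u16 = 4;

// Maps a bare token to a keycode constant. The literal `1` is matched as a
// token, not used as a number.
macro_rules! key {
    (1) => { KC_1 };
    (2) => { KC_2 };
    (A) => { KC_A };
}

// Builds one row of keycodes from space-separated tokens.
macro_rules! row {
    ($($k:tt)*) => { [$(key!($k)),*] };
}
```

Here `row!(1 2 A)` expands to `[KC_1, KC_2, KC_A]`, and an unknown token is a compile error rather than a silent wrong value.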

I surely find the Rust macro example easier to read in isolation.

Macro systems that are effectively simple find-and-replace are terrible.
Modern macro systems that give you access to the actual syntax tree to
manipulate directly - well, I feel more comfortable writing a little more code
than using them, but at some point the differences mount up. I suspect the
tradeoff here is better in the demonstration than the actual use, but other
cases are valuable.

