I wonder how readable it is to someone who isn't experienced in Haskell. To me it reads like a breeze, but I have a project using the exact same parsing library, so maybe that puts me at an advantage.
The language-c library he uses is an excellent one: a fully spec-compliant C parser that's well maintained. I've based my C compiler on it and I haven't encountered any C code it couldn't parse yet. One time I upgraded to a new version of OS X and Apple had added some stupid thing to their headers that broke the parser, and a fix was merged within days. It takes away the entire headache of parsing C, leaving just the actual compiling.
The documentation is amazing. One problem I find with rendered literate Haskell is that it quickly becomes unclear how the indentation across blocks of code fits together. It would be nice if there were some kind of renderer that kept the indentation in the docs, or had some kind of "indentation guide".
By "some stupid thing to their headers" are you referring to nullability annotations? I'm not sure what else Apple would have added to pure C headers (as opposed to obj-c) any time in the past few years.
I wanted my compiler to be able to compile a hello world. Something which would be trivial in any language, though of course not in C. In C the first line of hello world is "#include <stdio.h>". No doubt known to you, that file is a veritable quagmire of incomprehensible C constructions, specific to each operating system. So I spent a few too many nights getting my compiler to accept every possible C declaration known to mankind into its symbol tables, just so I could get to the code generator phase and emit all of three or so instructions.
I forgot what the change was, and I can't find the commit that fixed it, as it seems active development on the library has resumed.
That was many years ago. I was assuming tinco was referring to something more recent. Besides, AFAIK, every usage of blocks in Apple's headers is guarded by an #if check to make sure blocks are supported, so compilers that don't support blocks won't even see them.
I was curious about how this worked so I looked into the source a little (even though Haskell isn't exactly my cup of tea), and WOW... This is just amazing. The most important part of the source is highly educative literate Haskell:
https://github.com/jameysharp/corrode/blob/master/src/Langua...
Literate programming is more than that. It is encouraged to order things such that they make sense to the reader, not to the compiler. You can give names to chunks of code and later glue them all together in a different order.
My tool of choice for this is Emacs org-mode with noweb support. The source file is "tangled" (i.e. individual chunks of code are glued back together in a specified order) to create the source for the compiler to process.
Haskell itself has only limited support for literate programming. Most importantly it lacks the ability to reorder chunks of code. (noweb or org-mode babel can be used with Haskell, of course.)
Reordering is still useful in Haskell. Just two examples of order-dependent things in Haskell: top-level Template Haskell splices, and imports and GHC language pragmas.
It can be nice to have a single document explaining the whole project, producing individual source files upon tangling. For this approach you need control over order and target files. "Literate Haskell" is hardly more than a reversal of the meaning of comments and code.
Sorry, I am not understanding what you're implying here. I am well aware of the definition of literate programming. (And what you've said isn't actually enough to be considered literate programming under Knuth's definition: that's just tangle, still missing weave. Though notably, a lot of people argue that weave isn't needed with today's programming languages...)
As someone who knows zero Haskell and little Markdown, can someone explain how this works?
haskell.org [1] says there are "bird style" and "LaTeX style" ways of marking off code vs documentation, and I see neither in the linked file. Is it the "```haskell" blocks?
Just some additional info, because literate markdown support is not completely frictionless yet:
To do this yourself you need to add the --markdown-unlit flag to ghc and add the package "markdown-unlit" to your dependencies.
Additionally you have to symlink your .md file to .lhs, as ghc does not look at .md files (even with the --markdown-unlit flag, which I find kinda sad).
This is the coolest thing I've seen on HN in a long time, and useful to boot. Hopefully this will be a very big help to people moving over to Rust from C for its safety and type-checking. In general I don't support rewrites because, as many experienced programmers have pointed out, rewrites often make many of the same mistakes as the program they're rewriting. But transpilation allows us to keep the code with all the fixes to those mistakes.
In theory I'm a big supporter of Rust. I strongly feel that we should be using stronger-typed languages than C for developing security-critical applications, and from the outside, it looks like Rust solves a lot of my criticisms of C without giving up any benefits of C. A transition over to Rust could be a big win for security and reliability of many systems.
However, I'm reluctant to devote time to learning Rust primarily because it's not supported by GCC (or any other GPL compiler that I know of). I hope the next cool thing that the Rust community does is to continue the work done by Philip Herron[1] on a Rust front-end for GCC. I know the classic response to this is, "Do it yourself!" but there are too many other areas of Open Source that are higher priorities for me, so sadly this will have to be done by someone else if it happens at all.
I'm hesitant to build on a platform that could be appropriated by corporate interests. An example of this is the way Apple has appropriated BSD for MacOS. The BSD code is still open, but the tooling and funding come from Apple, which lets Apple determine the project's direction to some extent.
I have no problem with MIT for smaller tools, but the compiler is too foundational a dependency to take risks on, and it needs stronger protections: the kind afforded by the GPL and the FSF's funding of GNU projects.
Maybe it's confirmation bias on my part, but the transpiler "market" seems to favor Haskell heavily. "...in Haskell" almost seems superfluous these days when talking about implementations.
I do get that Haskell is useful as a tool for these kinds of code transformations (at least I have seen quite a few of those), but I am always a bit surprised that people would start such a project in a language that has, per se, nothing to do with either the source or the target language. I know, I know, it doesn't always have to be this way, but I am very much of the opinion that every time the good tools in an ecosystem are written in the language of said ecosystem, you get a lot more (and more meaningful) contributions.
Best examples: rake (and basically everything in the Ruby ecosystem; the number of people touching Ruby's C code is very small compared to all the 'standard tools'), or cargo.
> I do get that Haskell is useful as a tool for these kinds of code transformations (at least I have seen quite a few of those), but I am always a bit surprised that people would start such a project in a language that has, per se, nothing to do with either the source or the target language.
The author explained this in a blog post[0]: Haskell has a very complete C parser library with a nice API[1] which the author already knows, and Rust doesn't. Furthermore, since one of the project's goals is to be as syntax-directed as possible, the translator is straightforward and should be understandable with very little knowledge of Haskell (which can be bootstrapped from understanding Rust).
My -general- point still stands: apparently Haskell is very good at this, and the people undertaking these projects choose to value that over ease of contribution :) (says someone who just can't come to terms with Haskell)
Haskell is very good at writing correct parsers easily; that's one thing. This is part of the reason it was chosen as an early Perl 6 test bed via Pugs. Rust is getting there (I'm a big fan of the lalrpop library) but the ecosystem is nowhere near as mature as Haskell's for feature-complete libraries. The pace of development of the Rust ecosystem is mind-boggling, though; I never guessed that Rust would have taken off in popularity as much as it has.
It transliterates C to Rust all right, but the Rust isn't any safer than the C that goes in. Note the representation of a null-terminated string: it's an unsafe pointer to a byte. That's what it was in C, transliterated unsafely to Rust. Some safe Rust representation for C arrays is needed.
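As an illustration of what a safer representation can look like, here's a minimal sketch (mine, not Corrode's output) using `std::ffi::CStr` to confine the unsafety to a single conversion:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// The caller must still promise that `s` is a valid NUL-terminated
// string, but the unsafety is confined to this one conversion; after
// that, the bytes are a checked slice rather than a raw pointer.
fn c_strlen(s: *const c_char) -> usize {
    let c_str = unsafe { CStr::from_ptr(s) };
    c_str.to_bytes().len()
}

fn main() {
    let msg = b"hello\0";
    println!("{}", c_strlen(msg.as_ptr() as *const c_char)); // prints 5
}
```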
From the description of how it translates a FOR loop, it compiles the loop down to the primitive operations and tests; a Rust FOR loop does not emerge. That needs idiom recognition for the common cases, including at least "for (i=0; i<n; i++) {...}".
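To make that concrete, here's a hypothetical side-by-side (the names and bodies are mine, not Corrode's actual output):

```rust
fn main() {
    let n = 5;

    // Literal, semantics-preserving shape of `for (i = 0; i < n; i++)`:
    // init, test, body, and increment kept as separate primitive steps.
    let mut i = 0;
    loop {
        if !(i < n) { break; }
        println!("literal: {}", i);
        i += 1;
    }

    // What idiom recognition could emit instead:
    for i in 0..n {
        println!("idiomatic: {}", i);
    }
}
```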
This is a big job, but it's good someone started on it.
A Rust module that exactly captures the semantics of a C source file is a Rust module that doesn't look very much like Rust. ;-) I would like to build a companion tool which rewrites parts of a valid Rust program in ways that have the same result but make use of Rust idioms. I think it should be separate from this tool because I expect it to be useful for other folks, not just users of Corrode. I propose to call that program "idiomatic", and I think it should be written in Rust using the Rust AST from syntex_syntax.
Not to look a gift horse in the mouth, but it seems like Corrode misses some other chances to use idiomatic Rust (see the sketch after the list):
1. Rust's fn main doesn't need to return a value.
2. The arguments to main aren't mutated, so Rust doesn't need to declare them as mutable.
3. Ditto for the argument to printf.
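A rough sketch of points 1 and 2 (my own illustration, not Corrode's actual output); without the `allow`, rustc itself warns about the unneeded `mut`:

```rust
#![allow(unused_mut, dead_code)]

// C-shaped translation: a status code is returned, and the parameters
// are declared `mut` even though nothing mutates them.
fn translated_main(mut argc: i32, mut argv: *const *const u8) -> i32 {
    let _ = (argc, argv); // pretend the body uses them
    0
}

// Idiomatic Rust: no return value, no `mut` on bindings that are
// never mutated.
fn main() {
    println!("hello, world");
}
```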
Anyone know how easy it is to recognize and code for such cases in the transpiler?
Edit: It looks like they might have opposite design goals [1]: "Corrode aims to produce Rust source code which behaves exactly the same way that the original C source behaved, if the input is free of undefined and implementation-defined behavior. ... If a programmer went to the trouble to put something in, I want it in the translated output; if it's not necessary, we can let the Rust compiler warn about it." (Edit2: cleaned up and numbered)
I think that keeping an exact one-to-one mapping makes this tool a lot more useful. There's no telling what code depends on C idioms that would be broken by using a Rust idiom instead. Generating 100% equivalent code means that programmers can make intelligent decisions about when to switch over to Rust idioms as they continue developing the program.
Yeah, once you've got equivalent Rust, the rest is just optimization that should probably be implemented in the Rust compiler. No reason to put that stuff in the niche transpiler.
> Anyone know how easy it is to recognize and code for such cases in the transpiler? Edit: It looks like they might have opposite design goals
Yes, the author has explicitly noted that they want a compiler as syntax-directed as possible; semantic changes would go against that grain. In that spirit, idiomatic alterations would be the domain of Rust-land fixers and linters (e.g. `cargo wololo` or `cargo clippy | rustfix`)
So you could chain Corrode with one of those to get a C-to-idiomatic-Rust converter?
FWIW, I googled those; Clippy and rustfix just seemed to be linters that can't detect things like "you're not mutating this so drop `mut`", and I couldn't find wololo.
No, most real-world C code will expect a C `int` to be 32 bits, while `isize` is often 64 bits.
On the other hand, at least for Unix systems `long` is often equivalent to Rust's `isize`: 32 bits for 32-bit architectures, and 64 bits for 64-bit architectures, so it would make sense to convert `long` to `isize`.
They're different types. isize is ssize_t (well, intptr_t), in that it is tied to the size of the address space, while C's int is not constrained. In fact, it is usually 32 bits, even on 64-bit architectures, where isize is 64 bits.
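A quick way to see the mismatch (assuming a typical 64-bit Unix target, where C's `int` is 32 bits):

```rust
use std::mem::size_of;

fn main() {
    // On x86-64 Linux: isize is 8 bytes (pointer-sized), while i32,
    // which matches C's usual int, is 4 bytes on every Rust target.
    println!("isize: {} bytes", size_of::<isize>());
    println!("i32:   {} bytes", size_of::<i32>());
}
```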
Wow. So I did some sleuthing, and apparently in Rust the maximum size of an object must fit in isize, not usize. That means on 32-bit architectures you can't have arrays larger than 2GB, whereas on Linux and similar systems 32-bit processes have access to 3GB or even the full 4GB of address space. It actually matters for things like mmap'ing files.
Technically, C's int is constrained. C defines a minimum range of values for all the datatypes. The minimum range for int is -32767 to +32767; for long it's -2147483647 to +2147483647. Though the discerning pedant will claim, ex post, to target something like POSIX (which increases the bound on int, defines char as 8 bits, etc.) if you point out improper use of int.
One irony of criticisms against C is that people argue it's too low level, but that's often because people treat it as too low-level. For example, novice C programmers think of C integer types in terms of bit representations and infer value ranges. Good C programmers think of C integer types in terms of representable values, understand that bit representation (specifically, hardware representation) is almost always irrelevant, and understand how to leverage the unspecified upper bounds on value ranges to improve the longevity and portability of their software.
Languages which emphasize fixed-width integers are, in some sense, a retrogression. The real problem with C integer types is that you won't see the folly of poor assumptions until it's too late. Languages like Ada addressed this with explicit ranges. But I guess that was too burdensome. Fixed-width integers are an appeasement of lazy programming. I admit to being lazy and using fixed-width integers in C more than I should, but at least I feel dirty about it.
Many of the compromises Rust makes are clearly informed by the _particular_ experiences of the core team. For example, the fact that most Rust developers are of the belief that malloc failure is not recoverable (a big hold-up in adding catch_unwind) is a reflection of their experience with large desktop software. Desktop software has very complex, interdependent, and less fine-grained transaction-oriented state. Recovering from malloc failure is very hard and of little benefit. Most server software, by contrast, has more natural and consistent transactional characteristics. Logical tasks have less interdependent state, so it's both easier and more beneficial to be able to recover from malloc failure.
I think some of the choices wrt integer types are similarly informed.
> the fact that most Rust developers are of the belief that malloc failure is not recoverable
This is untrue. The true statement is similar, but has different implications: malloc failure is usually not recoverable, and nonrecoverable malloc failure should be the default, for the problem space Rust targets (which encompasses more than low-level things). You can recover from malloc failure in Rust; it just requires some extra work.
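A minimal sketch of that extra work, using the raw allocator API (`std::alloc` in current Rust); this is my own illustration of the general approach, not the only way to do it:

```rust
use std::alloc::{alloc, dealloc, Layout};

// Unlike Box/Vec, which abort the process when allocation fails, the
// raw allocator reports failure as a null pointer, so the caller can
// recover instead.
fn try_alloc(size: usize) -> Option<*mut u8> {
    let layout = Layout::from_size_align(size, 8).ok()?;
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() { None } else { Some(ptr) }
}

fn main() {
    match try_alloc(1024) {
        Some(ptr) => unsafe {
            // ...use the memory, then free it...
            dealloc(ptr, Layout::from_size_align(1024, 8).unwrap());
        },
        None => eprintln!("allocation failed; recovering instead of aborting"),
    }
}
```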
> Because the project is still in its early phases, it is not yet possible to translate most real C programs or libraries.
It is currently trying to port over semantics exactly, so the Rust code is far from idiomatic Rust. Doesn't mean it's not useful, just saying that it's trying to be 1:1.
I guess the next stage would involve translating common non-idiomatic patterns into idiomatic Rust. Looks like this could be a job for a community-managed database!
On the rust subreddit someone tongue-in-cheek suggested `cargo clippy | rustfix` to be used in conjunction with this tool for better rust code.
But that actually could work! Clippy has a ton of lints that make your code more idiomatic, and rustfix basically takes diagnostic output and applies suggestions (still WIP).
Clippy is geared towards making human-written unidiomatic code better, so it might not catch some silly things in this tool's output, but it certainly could be extended to do that.
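As a hypothetical example of the kind of unidiomatic code clippy already flags, its `bool_comparison` lint fires on this:

```rust
fn main() {
    let ready = true;
    // clippy flags the redundant comparison against `true`; the
    // suggested fix is simply `if ready { ... }`.
    if ready == true {
        println!("go");
    }
}
```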
It tells you about places where you can improve your code. Possible pitfalls, style issues, documentation issues, unidiomatic code, everything.
It's a developer tool, so you can use rustup to switch to nightly to run clippy (and use stable otherwise) and not impose nightly on the rest of the people who use the project. We have plans for making clippy a tool that you can fetch via rustup without requiring nightly.
This is best handled on a per-project or per-organization basis. I would have such a project concentrate on the tooling for maintaining and developing such databases.
Has anyone tried it on some real-world codebases? How about kernel code? It would be very exciting to improve real-world crash safety and security by e.g. converting popular drivers quickly and automatically, followed by a manual pass applying safer Rust semantics.
> Partial automation for migrating legacy code that was implemented in C. (This tool does not fully automate the job because its output is only as safe as the input was; you should clean up the output afterward to use Rust features and idioms where appropriate.)
This was my immediate concern. Is there any chance this tool can produce anything close to clean, safe, idiomatic Rust code?
> Is there any chance this tool can produce anything close to clean, safe, idiomatic Rust code?
That is generally not possible, unless the C code only uses specific patterns known by the converter tool. That's very unlikely, considering that people write C code to be 'quick' and usually use all kinds of tricks.
I wonder if there would be any point in using this to fuzz the Rust compiler.
On the one hand, you could use CSmith with a C compiler as a convenient oracle, but on the other you would only be covering a very limited subset of e.g. the type system.
Only if you are talking about iron corrosion. If you own an aluminum Macbook, for instance, it was oxidized (or corroded) during manufacturing, on purpose. This protects the metal from further corrosion.
Ah, very true. I interpret it as (from Google's definition) "destroy or weaken (something) gradually," i.e. destroy/weaken the C code into Rust... Maybe it's just me, though.
Well, it's just building off the same (negative) connotations of "Rust", which I think was itself a questionable name, trivial though it might seem.
I've been looking at newer languages recently, and I see a lot of promise in Nim -- which renamed itself from Nimrod after users warned about what it connotes. Rust could take a cue.
Well, Rust isn't actually named after iron oxide. It's actually named after the fungus (https://en.wikipedia.org/wiki/Rust_(fungus)). Most people just don't know about the fungus.
How so? For me, "rust" conjures images of a rusty nail, rust getting in your tap water, a machine in disrepair from rust, and becoming "rusty" at some skill.
And Python conjures images of dangerous snakes (and people getting killed by them), Go conjures images of that argument with my SO, when she told me "go!", C# conjures images of sharp objects like knives, and PHP conjures images of programming in PHP.
It's just a name; nobody really cares about rusty nails and rust in tap water when discussing Rust the programming language. And by nobody I mean nobody in the statistically significant sense.
Just so we're on the same page, are you dismissing all relevance of PL names, or just the downsides of this particular one? That is, are you saying you wouldn't balk at naming an (enterprise-promoted) language something like Filth, Feces, Pubes, Jizz, or (Brain)fuck? Would those be "just names" too?
You're right that no language's name is perfect and immune from bikeshedding; however, the psychophysical mechanism of disgust works at a far deeper, extra-rational level than other negative traits. "Rust" evokes it, pythons don't.
I was going to downvote you for a snarky response (I felt the reply explaining that Rust also has iron/oxidation references was a better response)...
... but that's a really good link that you provided. I've never read that before & it's a great resource for thinking about how to behave socially as a programmer. While I don't agree with all of it, I love their "don't pile on" and "don't bring negativity from outside" approach. (Plus I'm guilty of breaking the 'feigning surprise' rule.)
> I was going to downvote you for a snarky response (I felt the reply explaining that Rust also has iron/oxidation references was a better response)...
100% of English speakers who hear the word "rust" will first think of metal oxidation. "Correcting" people who make that "mistake" adds nothing of value to the conversation and just expresses "I know more than you" because you know some obscure minutia that was mentioned in an IRC channel once. Nerds love doing this "I'm smarter than you" kind of shit, so much so that it is developing its own noun, "well actually". It's obnoxious and distracting and contributes nothing, so I called it out.
I think the effect is the same. But that's also why I liked the link - I'm sometimes honestly surprised, or seeking clarification (eg "So you really haven't heard of the Playstation 4.5 & Playstation VR? But I thought you were a PS4 gamer?"). But by expressing it as feigning surprise, I understand that the other person feels I'm belittling or mocking them, even if that wasn't my intent.
If you're honestly surprised (and can't/don't hide it), the effect on the other person may be similar, but that's the nature of communication. Feigning surprise is manufacturing a situation with negative aspects.
This is of itself a misunderstanding of the origin: that's one of many reasons it's called Rust, not the only one. There's no single reason for the name.
It's too bad this is written in Haskell. I don't have anything against Haskell; it's just not as popular a language as others.[1] Any ANTLR target language would have been a solid choice.[2] This way more of the community could contribute. This is an invaluable tool if we're truly going to see a shift from C (or C++) to Rust.
> The only reason I wrote Corrode in Haskell is because I couldn't find a complete enough C parser for Rust, and I was already familiar with the language-c parser for Haskell.
I've used ANTLR in anger a few times and for some reason it's always left a bad taste in my mouth. I always seem to spend more time debugging how ANTLR works rather than doing the work I set out to do.
Granted, most of my use cases were building simple DSLs, so it might be different when talking about whole-source conversion.
The documentation is really sparse, but coincidentally, the books written by Terence Parr are good reads and, with the right time investment, make dealing with ANTLR feel less like voodoo and more like software engineering.