
C to Java Translation. Automatic, Complete, Correct. Free for Open-Source. - marco2357
https://www.mtsystems.ch/
======
marco2357
Main author here. Let me know if you have any questions. I’d be happy to
answer.

~~~
shultays
I tried so hard to break main arguments (char **argv). On my last try I passed
char **argv to a function, and it wrapped main, inserted the program name at the
beginning, and called the wrapped main with the new arguments. I give up.
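The wrapping described above can be sketched in Java. This is a minimal sketch under assumptions, not the tool's actual output: the names `ArgvWrapper`, `buildArgv`, `cMain` and the program name `"prog"` are hypothetical, and C strings are modeled as plain `String`s. Java's `main` receives only the arguments, while in C `argv[0]` holds the program name and `argv[argc]` is a null pointer, so the wrapper has to rebuild that layout before calling the translated `main`.

```java
class ArgvWrapper {
    // Hypothetical value for argv[0]; a real translator might use the
    // name of the original executable instead.
    static final String PROGRAM_NAME = "prog";

    // Java entry point: Java passes only the arguments, without a program name.
    public static void main(String[] args) {
        String[] argv = buildArgv(args);
        System.exit(cMain(args.length + 1, argv));
    }

    // Builds a C-style argv: argv[0] is the program name, argv[1..argc-1]
    // are the arguments, and argv[argc] is null, as C requires.
    static String[] buildArgv(String[] args) {
        String[] argv = new String[args.length + 2];
        argv[0] = PROGRAM_NAME;
        System.arraycopy(args, 0, argv, 1, args.length);
        return argv; // last slot is left null
    }

    // Hypothetical stand-in for the translated `int main(int argc, char **argv)`.
    static int cMain(int argc, String[] argv) {
        for (int i = 0; i < argc; i++) {
            System.out.println("argv[" + i + "] = " + argv[i]);
        }
        return 0;
    }
}
```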

~~~
marco2357
Hahaha, cool!

------
raverbashing
Interesting how gotos are translated to switches
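Java has labeled `break`/`continue` but no `goto`, so a common way to emulate arbitrary jumps is a loop around a `switch` on a "current label" variable, with each C label becoming a `case`. The sketch below illustrates that general technique; it is not the tool's actual output, and both the C fragment in the comment and the class name `GotoAsSwitch` are made up for the example.

```java
class GotoAsSwitch {
    // One constant per goto target in the original C function.
    private static final int LOOP = 0;
    private static final int DONE = 1;

    // Translation of this (hypothetical) C code:
    //   int sum = 0, i = 0;
    //   loop: if (i >= n) goto done;
    //         sum += i; i++;
    //         goto loop;
    //   done: return sum;
    static int sumBelow(int n) {
        int sum = 0, i = 0;
        int label = LOOP;              // the label to "jump" to next
        while (true) {
            switch (label) {
                case LOOP:
                    if (i >= n) {
                        label = DONE;  // goto done
                        continue;      // continue applies to the while loop
                    }
                    sum += i;
                    i++;
                    label = LOOP;      // goto loop
                    continue;
                case DONE:
                    return sum;
                default:
                    throw new IllegalStateException("unknown label " + label);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumBelow(5)); // prints 10
    }
}
```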

90% of C should be pretty easy to translate, but of course, the devil (and a
lot of functionality in existing libraries) is in the details.

There would probably be money in translating COBOL to Java, but maybe there
are solutions already?

~~~
emsy
The main problem being undefined behaviour that is actually undefined and
varies from compiler to compiler!

~~~
raverbashing
Yeah, you might need a config switch to configure what you want to do with
the most common undefined behaviours (or "just change" your C code).

(Because "just change" usually ends up creating other problems. Been there,
done that: "code is wrong but works", and when you try to fix it, stuff breaks.)
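To make the config-switch idea concrete: signed integer overflow is undefined behaviour in C, but Java defines `int` addition to wrap around, so a translator has to commit to one behaviour. A minimal sketch, assuming a hypothetical `OverflowPolicy` setting (not a feature of the tool discussed here): `WRAP` keeps Java's two's-complement result, while `TRAP` uses `Math.addExact`, which throws `ArithmeticException` on overflow.

```java
class OverflowPolicyDemo {
    // Hypothetical translator setting for one kind of C undefined behaviour.
    enum OverflowPolicy { WRAP, TRAP }

    // Translation of the C expression `a + b` on signed ints under a policy.
    static int add(int a, int b, OverflowPolicy policy) {
        switch (policy) {
            case WRAP:
                return a + b;               // Java defines wrap-around
            case TRAP:
                return Math.addExact(a, b); // throws ArithmeticException on overflow
            default:
                throw new IllegalArgumentException("unknown policy " + policy);
        }
    }

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1, OverflowPolicy.WRAP)); // prints -2147483648
        try {
            add(Integer.MAX_VALUE, 1, OverflowPolicy.TRAP);
        } catch (ArithmeticException e) {
            System.out.println("overflow trapped");
        }
    }
}
```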

------
bedatadriven
This looks pretty nice. Pointer/array arithmetic seems to be handled nicely.
(double*)malloc(sizeof(double)*100) looks pretty ugly, but it's hard to tell
what's going on under the DoubleContainer hood.

For OSS (or other projects) that just need running JVM bytecode, check out the
GCC Bridge component of Renjin, which uses a combination of GCC and Soot to
compile C and Fortran code to bytecode:
[https://github.com/bedatadriven/renjin/tree/master/tools/gcc...](https://github.com/bedatadriven/renjin/tree/master/tools/gcc-bridge)

~~~
marco2357
Thanks!

So far we've only optimized malloc() for char* (there are endless possible
optimizations when translating C the way we do).

"(double*)malloc(sizeof(double) * 100)" should be translated as "new
DoubleContainer(100, true)". We'll add that at some point.
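The internals of `DoubleContainer` aren't shown in the thread, so the following is only a guess at the general idea: a C pointer modeled as a shared backing array plus an offset. The classes `DoublePtrDemo`/`DoublePtr` and the methods `alloc`, `get`, `set` and `shift` are hypothetical (loosely mirroring the `shift`/`get`/`set` calls in the translated Duff's device code later in this thread) and are not the tool's actual API.

```java
class DoublePtrDemo {
    // Hypothetical model of a C `double *`: a shared array plus an offset.
    static final class DoublePtr {
        private final double[] data; // the "allocation", shared by derived pointers
        private final int offset;    // element this pointer currently refers to

        private DoublePtr(double[] data, int offset) {
            this.data = data;
            this.offset = offset;
        }

        // Rough equivalent of (double*)malloc(sizeof(double) * n).
        // Java zero-initializes arrays, so this behaves more like calloc.
        static DoublePtr alloc(int n) {
            return new DoublePtr(new double[n], 0);
        }

        // p[i]; an index outside the allocation throws
        // ArrayIndexOutOfBoundsException instead of reading arbitrary memory.
        double get(int i) { return data[offset + i]; }

        // p[i] = v
        void set(int i, double v) { data[offset + i] = v; }

        // p + n: a new pointer into the same allocation.
        DoublePtr shift(int n) { return new DoublePtr(data, offset + n); }
    }

    public static void main(String[] args) {
        DoublePtr p = DoublePtr.alloc(100);
        DoublePtr q = p.shift(10);     // double *q = p + 10;
        q.set(0, 3.5);                 // *q = 3.5;  (writes p[10])
        System.out.println(p.get(10)); // prints 3.5
    }
}
```

Sharing one array between derived pointers is what lets `q[-1]` or `p[10]` see each other's writes, just as in C.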

------
bcg1
NestedVM was able to do this over a decade ago.

[http://git.megacz.com/?p=nestedvm.git;a=summary](http://git.megacz.com/?p=nestedvm.git;a=summary)

And is actually open source, not "free for open source".

Kudos to the developers though, I'm not trying to bash your skills or diminish
the quality of your work... I'm sure many enterprises can/will benefit from
this. Just wanted to let the free/open source community know that you don't
have to chomp on the "free for open source" carrot.

------
haches
I think I saw this the other day in the VIM discussion. Great job! Are the
other examples, e.g. micro http, also available for download?

~~~
marco2357
We translated dozens of open-source projects and decided to list only the
interesting ones on the website and upload only the most interesting ones: the
ones that are very well known and have a nice GUI.

Feel free to send us an email and we'll be happy to send you the other
programs you're interested in.

You can also ask for translations of open-source software we didn't translate
yet if you want to see the translation of a specific project.

------
fuklief
It says the translation is correct. Is there any proof of that?

~~~
marco2357
Only in the form of translating dozens of C applications (programs and
libraries) and running their test suites. E.g. libcurl comes with a great,
extensive test suite (a Perl script running against the binary).

Translated applications still need to be thoroughly tested and usually some
bugs are still found.

So we didn't formalize and verify our translation. Interestingly enough, we
run into bugs in javac and ecj (Eclipse Java Compiler) surprisingly often. So
verifying our translation would still lead to translations with bugs ;-)

Another fun fact: Since our translation knows the limits of allocated memory
(and many other things), we found many illegal memory accesses in C programs
that were previously unknown (libgmp, micro httpd, vim, ...) because they never
(or only very rarely) led to segfaults.
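Why a translation to Java surfaces such bugs can be illustrated with a classic off-by-one: in C, writing one element past the end of a buffer is undefined behaviour and may silently overwrite a neighbouring variable, whereas Java checks every array index at run time. A small sketch; the loop is a made-up example, not code from the projects mentioned above.

```java
class OffByOneDemo {
    // Translation of a made-up C loop with an off-by-one bug:
    //   int buf[4];
    //   for (int i = 0; i <= 4; i++) buf[i] = i;   /* also writes buf[4] */
    // In C this may go unnoticed; in Java the write to buf[4] throws.
    static void fill(int[] buf) {
        for (int i = 0; i <= buf.length; i++) {
            buf[i] = i; // ArrayIndexOutOfBoundsException when i == buf.length
        }
    }

    public static void main(String[] args) {
        try {
            fill(new int[4]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("illegal access caught: " + e.getMessage());
        }
    }
}
```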

~~~
danieltillett
This sounds like a great spin off - bug finding in C code. Have you put much
thought into pursuing this?

~~~
marco2357
Didn't think too much about it since there are many C-specific analyzers and
tools that do the same, and they do it way better. E.g. Valgrind.

~~~
danieltillett
I would guess the value would be if you find bugs that other tools don't. If
you just find the same bugs as Valgrind then I agree that there would not be
too much value, but if you find unique bugs then it would be useful.

------
sputnik27
What do I need to do to get open-source software translated? I would like to
have this
[http://stjarnhimlen.se/comp/sunriset.c](http://stjarnhimlen.se/comp/sunriset.c)
as Java code.

~~~
marco2357
Send us an email. We're currently getting overrun with requests but will
handle it as fast as possible.

Please note that the software needs to be in some public repository (github,
bitbucket, sourceforge, ...).

------
orodley
Pretty neat. I did find a bug while I was playing around with it though: it
doesn't correctly translate Duff's device.

~~~
marco2357
Can you elaborate on that? What's the C code, the translated Java code and
your expected Java code?

Thanks!

~~~
snnw
[http://lmgtfy.com/?q=duff%27s+device&l=1#](http://lmgtfy.com/?q=duff%27s+device&l=1#)

~~~
marco2357
I saw that ;-) The translation is:

    
    
      do {
        to.set((from = from.shift(1)).get(-1));
      } while(--count > 0);
    
    

which looks correct to me. Hence my question what orodley thinks is wrong.

~~~
jmuhlich
I think you might be looking at the wrong thing on the Wikipedia page. The
core feature of Duff's Device is interleaved switch and do statements:

    
    
      n = (count + 7) / 8;
      switch (count % 8) {
      case 0: do { *to = *from++;
      case 7:      *to = *from++;
      case 6:      *to = *from++;
      case 5:      *to = *from++;
      case 4:      *to = *from++;
      case 3:      *to = *from++;
      case 2:      *to = *from++;
      case 1:      *to = *from++;
              } while (--n > 0);
      }
    

(extraneous register statements removed for conciseness)

~~~
marco2357
That would be a bug in the translation. We'll investigate. Thanks!

~~~
marco2357
Yes indeed. We handled goto into do-while statements wrong. Fixed now. Thanks!
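Java has no `goto`, and a `case` label cannot sit inside a `do` body the way it does in Duff's device, so the jump into the middle of the loop has to be emulated. One possible approach, sketched here as an illustration and not as the tool's actual fix: run a fall-through `switch` inside the loop and enter at `count % 8` only on the first pass. For testability the sketch copies into an array (the `memcpy`-style variant that also advances `to`), whereas the original writes every element to the same memory-mapped register `*to`. Like the C original, it assumes `count > 0`. The class name `DuffsDevice` is hypothetical.

```java
import java.util.Arrays;

class DuffsDevice {
    // Copies count elements (count must be > 0), unrolled eight times.
    static void copy(int[] to, int[] from, int count) {
        int t = 0, f = 0;
        int n = (count + 7) / 8;  // number of passes through the loop
        int entry = count % 8;    // where the first pass "jumps" in
        do {
            switch (entry) {      // cases fall through, as in C
                case 0: to[t++] = from[f++];
                case 7: to[t++] = from[f++];
                case 6: to[t++] = from[f++];
                case 5: to[t++] = from[f++];
                case 4: to[t++] = from[f++];
                case 3: to[t++] = from[f++];
                case 2: to[t++] = from[f++];
                case 1: to[t++] = from[f++];
            }
            entry = 0;            // later passes start at the top of the body
        } while (--n > 0);
    }

    public static void main(String[] args) {
        int[] src = new int[11];
        for (int i = 0; i < src.length; i++) src[i] = i * i;
        int[] dst = new int[11];
        copy(dst, src, 11);
        System.out.println(Arrays.equals(src, dst)); // prints true
    }
}
```

With `count = 11`, the first pass enters at `case 3` and copies 3 elements, and the second pass copies the remaining 8.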

------
danieltillett
Is there any tool that does the opposite?

~~~
marco2357
I've seen many research papers on Java -> C translation during my PhD. Some of
them came with a prototype tool. But as with previous work on C -> Java
translation, none of the tools really worked completely.

I actually wrote a Java -> Eiffel translator:

[http://se.inf.ethz.ch/research/j2eif](http://se.inf.ethz.ch/research/j2eif)

Based on that experience I can say it would be quite a big effort to write a
Java -> C translator. But not impossible.

~~~
danieltillett
Thanks for the post. What was the big sticking point with the Java -> C
translators?

~~~
marco2357
It's easy to do a minimal prototype when doing research. But writing real
translators means getting all the details right. In research we usually don't
have the time for that.

Translating Java is, in my experience, very hard because of the extensive
runtime system (reflection, base classes, synchronization, ...). E.g. if your
application does System.out.println("Hello"); you already need the System and
PrintStream classes. These in turn depend (among many other things) on AWT,
which needs the security classes. And so on. A HelloWorld pulls in 1208 classes
of the base library. These in turn depend on java.dll, which you have to
re-implement from scratch. Or you rewrite all base library classes, which is
even harder.

I hope this gives you a basic idea of the problem.

~~~
danieltillett
It certainly does. I can't imagine even starting on a project like this.

------
danbruc
One million Dollars for a couple of days of work?

~~~
shultays
You can always format your whole source code into 1 line, so it will cost only
$1.

------
marvel_boy
Do you translate open source free of charge?

~~~
marco2357
Yes. As long as it's non-commercial software.

------
ExpiredLink
C++ to Java would be more interesting.

~~~
marco2357
True. But it also adds a lot of complexity on top of an already very complex
translation. But it's certainly something we'll look at in the future
(together with supporting Cobol and Fortran).

But I heard there are C++ to C translators. I don't know how good they are or
what the resulting code looks like. But if they're decent, you could do C++ ->
C -> Java :)

~~~
SCHiM
On the C++ to C translators: The first C++ compiler was actually a
trans-compiler to C (the compiler was called Cfront, and what it compiled was
basically the first version of C++[0]).

So in that case, the C++ to C compiler existed before the first native C++
compiler, which appeared some time later.

[0]
[http://www.cplusplus.com/info/history/](http://www.cplusplus.com/info/history/)

------
TheLoneWolfling
I want a Java-to-Java translator.

------
lessthunk
I wonder what the performance impact is.

Why do young students no longer learn C? Don't you want to be closer to the
underlying OS?

~~~
coldtea
> _Why do young students no longer learn C?_

Who said they don't? This project is about porting existing stuff, for
interoperability, etc. Not as a way to "avoid learning C".

> _Don't you want to be closer to the underlying OS?_

No, why would I want to do that unless I have a specific need for that?

~~~
vezzy-fnord
_No, why would I want to do that unless I have a specific need for that?_

At least if you're using a Unix-like system, understanding POSIX is essential
for knowing how things work, starting at an intermediate level.

~~~
tormeh
And why would I want to understand how things work down in the OS?

Anyway, C is a horrible language. With Fortran and Ada around, it's unclear to
me why C should be chosen for anything, inertia and herdthink aside.

~~~
TorKlingberg
Have you used Fortran or Ada?

~~~
tormeh
Have used Fortran. It's pretty nice, actually. I mean, it does have
unstructured programming, but its structured alternatives feel pretty natural.
Fortran code, like C code, feels a bit fragile but unlike C, Fortran doesn't
seem to be actively malevolent. I only know Ada by its (excellent) reputation.

EDIT: Only true if you use implicit none (e.g. via gfortran's -fimplicit-none
flag) when compiling Fortran.

