Converting Codebases with LLMs (withmantle.com)
147 points by Osis 9 months ago | 95 comments



> There is a recurring need in the software world for teams to convert a codebase from one language to another.

Sounds more like a sales pitch than a reality. I have often seen developers excited to port code from one language to another, but only because it is an opportunity to learn something new, do something different for a change, and even rewrite old code.

What is the value if it is done automatically, nobody learns anything and the code is just a transcript of the old one?


Back in the late 80s we were building automated Fortran to C converters. The client was in the aerospace field.

> What is the value if it is done automatically, nobody learns anything and the code is just a transcript of the old one?

You may be shocked to learn that businesses using software have a different metric for the value of "code" than educating their (transient) code wranglers. The actual value of software is computational work. If a new language affords better tooling and availability of human resources, that is a win.


Yes, I was at a company that moved an application from Cobol to Java for exactly that purpose - having a mission-critical application written in Cobol is way harder to maintain than having that exact same application in Java.


Extremely short term, yes.

In the longer perspective, you'll lose most good developers if you don't allow them to evolve and have some fun along the way. And without the developers, the source code is pretty much useless.

Humans are not machines.


I think this is an interesting line of argument, but it has sort of reached its shallow depth by its third exposition. It's not very complicated, if I'm reading it correctly:

"Theoretically, developers could eschew jobs that don't allow them to creatively reinterpret code as they translate it."

It's a weak argument, because if you're translating even manually, it's not exactly the peak of creative self-expression.

There's plenty of rote code that we'd all be happy to automate translation of --- I used this technique with GPT 3.0 to get math code translated across languages for Google's color library.


You'd have to take a step back to see it, I think.

It's more like: in general, people will choose fun over not fun, given they have a choice.

And good developers have PLENTY of choices.


> you'll lose most good developers if you don't allow them to evolve and have some fun along the way

That's actually something I really like about tools like GH Copilot! It gives me an excuse to try out something using a new language, but with less of the productivity dip that comes from chasing syntax or stdlib calls. It doesn't produce code that is as good as an expert's in that language, but it's a really convenient set of training wheels.

So it becomes easier to justify, at least with my current organization.


Businesses see human and machine as resources.


Only the losers, mark my words.


That was my initial instinct on reading that sentence too - I don't think converting from one language to another is actually very common.

But in this particular case I think they justified doing so: "Our team had a prototype written in the language, R, and wanted to convert this to our standard production tech stack, Golang and ReactJS."

As a Python programmer I tend not to worry about this, because Python is a good language for both prototyping and production - but I can absolutely see the need for this if you're prototyping with tools that you wouldn't want to run in production.


One benefit I've heard for using different languages for prototyping and production is that it helps you remember to rewrite things properly rather than just dumping prototype-quality code into prod.

Working around this by using tools that aren't exactly known for code quality in the first place seems like a bit of an odd choice.


> "Our team had a prototype written in the language, R, and wanted to convert this to our standard production tech stack, Golang and ReactJS."

It's very hard for me to understand how this would work, unless the R code was very very simple.

Like, R is mostly used for stats, and Go doesn't have all of the stats libraries, so what did the LLM generate?

Maybe it was a pretty simple LoB app written in R (which would be pretty weird, even I as an R-head gave up on writing general purpose software in R some time ago) in which case it makes sense, or else the LLM generated lots and lots of boilerplate for matrix multiplication (I imagine any implementation of `model.matrix` would have been fun).

Very very strange to me, at least.


I would expect most good LLMs to be able to implement statistical functions from scratch in languages like Go.

I often ask ChatGPT Code Interpreter to implement algorithms from scratch in Python where the library needed for that function isn't present in the Code Interpreter environment - things like haversine distances for example.
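
For readers who haven't seen it, a from-scratch haversine distance is short enough to check by hand. The following is a minimal sketch using only the standard library; the function name and the Earth-radius constant are assumptions made for illustration, not output from Code Interpreter.

```python
import math

# Mean Earth radius in kilometres; an assumed constant for this sketch.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 so rounding error can never push sqrt/asin out of range.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Calling `haversine_km(48.8566, 2.3522, 51.5074, -0.1278)` gives the Paris to London distance under a spherical-Earth assumption.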


> I would expect most good LLMs to be able to implement statistical functions from scratch in languages like Go.

Implementing statistical functions from scratch can be rather dangerous – can you trust the implementation is correct? You can have an implementation which works well for a few obvious tests, but then performs poorly for edge cases (e.g. due to excessive accumulation of rounding error). Whereas, good chance the existing R implementation of whatever has been reviewed by expert statisticians.

LLMs can be great for saving time/energy when you have the domain expertise to validate their answers. But if you don't...
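
To make the rounding-error point concrete, here is a small sketch comparing a textbook one-pass variance formula with Welford's online algorithm. Both function names and the sample data are made up for illustration; neither is taken from R or from the article.

```python
def naive_variance(data):
    """Sample variance via sum(x^2) - n*mean^2: fine on small numbers, fragile on large ones."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    # Subtracting two huge, nearly equal numbers cancels most significant digits.
    return (total_sq - n * mean * mean) / (n - 1)


def welford_variance(data):
    """Sample variance via Welford's update, which avoids the large subtraction."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Hypothetical data: small spread around a large offset. The true sample variance is 30.
SAMPLE = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
```

On `[4, 7, 13, 16]` the two functions agree, so an obvious test passes. On `SAMPLE` the naive version is far from 30 in double precision while the Welford version stays accurate, which is exactly the kind of edge case a quick review can miss.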


Yeah that's fair, I don't have a strong enough background in statistics to be able to catch edge cases in these kinds of things.


> Implementing statistical functions from scratch can be rather dangerous – can you trust the implementation is correct? You can have an implementation which works well for a few obvious tests, but then performs poorly for edge cases (e.g. due to excessive accumulation of rounding error). Whereas, good chance the existing R implementation of whatever has been reviewed by expert statisticians.

What the GP said. This would scare the hell out of me, and I probably do have the expertise to check this.

More generally though, the LLM won't see the code for the implementations, just the function calls so I'd be really impressed if it could do a good job here.


The statistical functions in R and Python libraries are well tested. I don't know what sort of confidence you'd have in an LLM generating new stats libraries in other languages.


Was the R prototype outputting HTML code? What React front end code of any value is the LLM extracting from the R prototype?

Other options to converting the code: call into the R code from the Go code. Or don’t let your prototype grow to 12 KLOC in a language you don’t intend to use in production.


> the code is just a transcript of the old one

That’s a very important point. Every rewrite I’ve been part of needed major architectural changes because of deep issues with the system. Switching the language was just a nice bonus.


This was what I was thinking.. Moving something from rails to elixir as a copy-pasta wouldn't give you the things elixir is great at. You'd just get MVC.. No use of OTP or Services.

When I was at Uni I wrote an app for a Java course. The prof laughed and said, "You're a C++ programmer, aren't you?" My code smelled of it.

"A language that doesn't affect the way you think about programming, is not worth knowing." - Alan Perlis


Not that long ago I moved a 250k-line codebase from Oracle to Postgres. Yes, SQL embedded in strings and so on.

Towards the end of that process, ChatGPT helped me with that, and it was pretty valuable for some kinds of problem. Still had to watch it like a hawk and specify things really clearly to make sure it didn't go off the rails.


There's a huge value in being able to automate conversion, especially on an active project with several teams working on features for several different clients (where downtime simply is not an option).

Having dealt with a similar problem however, stay away from AI and instead perform the conversion by manipulating source code ASTs.
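
As a rough illustration of what "manipulating source code ASTs" can look like, here is a minimal sketch using Python's standard `ast` module. It only renames calls within one language, so it is far simpler than a cross-language converter; the class and function names are hypothetical and this is not the commenter's tooling.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to old_name(...) into new_name(...), leaving everything else alone."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls in the arguments first
        # Only bare-name calls match; method calls like obj.old_name() are untouched.
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            new_func = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
        return node


def rewrite(source, old_name, new_name):
    """Parse source, apply the rename on the tree, and emit source again."""
    tree = ast.parse(source)
    tree = RenameCall(old_name, new_name).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+
```

Because the rewrite works on the syntax tree rather than on text, a string literal that happens to contain `fetch(1)` is left alone, and the same input always produces the same output.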


I don't think anyone in this community has solved doing this automatically.

The value it provides is for the business. If a tool can do it, there is no need to hire or keep an engineering team.

An engineering team has a running cost. Whereas if someone makes a tool and sells it at a price slightly lower than what's spent on the engineering team, doesn't that add value?

First, it's a tool that does it, so it is more reliable than a human.

If it is a sales pitch, someone will get it done, as there is an opportunity.


It's not reliable and LLMs cannot create anything novel.

Your hypothetical company is going to end up pulling a Crowdstrike with this method and then they definitely won't need an engineering budget!


> The value it provides is for the business. If a tool can do it, there is no need to hire or keep an engineering team.

What exactly is the "value"? It worked before, so what is the purpose of the change?

> First, it's a tool that does it, so it is more reliable than a human.

Sorry, but are you new to LLMs? Have you seen the recent news? "Reliable" it is not.

If your pitch is that an LLM will just write all of your code in the first place, there really isn't any need to migrate the code to another language when by your logic the LLM could just manage its existing language. The logic here quickly breaks down and doesn't make any sense.


What news? I'm not talking about LLMs. I'm talking about the challenge of cross-language code generation.


> What news?

https://news.ycombinator.com/item?id=40475578

That is one such news entry. LLMs being unreliable is not a controversial opinion; it is well known, and it is easy to find many instances of similar issues.

> code generation

What exactly do you think is generating the code? The article here is about generating code with an LLM.


It's not rare in academia to translate Matlab scientific libraries into Python.


Right, I have never been in a situation where a rewrite in another language was considered that was not due to some other reason.

Most of the time it starts with: we need to change some major functionality, we have an architectural issue, or something along those lines that will require a major rewrite in the first place. So the idea of maybe using a different language comes up.

The idea of rewriting something in another language with identical functionality, just for the sake of using another language, just isn't a normal exercise unless you have a CTO pushing for something unnecessarily.

Maybe, maybe I could buy saying we don't want to manage Java servers anymore or something along those lines. But even then, why break something that works?

This seems like such a bad idea: it is going to introduce so many bugs and require a ton of testing, for a minimal gain at best.

And then yeah, who is going to maintain it given that no one actually wrote the code in the first place? Goodbye historical knowledge and productivity. Hope you don't find a critical bug as soon as you release it that needs to be fixed asap.

Don't do this; it's a seriously bad idea. And that assumes it is somehow 1:1 functionality, when by now we should be well aware that an LLM is going to make mistakes.


My friend wrote in Cobol until the day he retired. Every couple years management would spin up a project to replace the Cobol part. He and his team would consult. In the end he would just use the project to set his watch.


I wonder how many subtle errors will make their way to the new codebase (decimal rounding, a library use where a parameter is ignored and there are no tests for it...) only to be found in production, and AI will be blamed.


I did some converting with Copilot today. The answer is, quite a lot. It'd convert integer types wrong (whoops, lost an unsigned there, etc).

And then of course there were some parts of the code that dealt with gender, and Copilot just completely refused to do anything with that, because for some reason it's hardcoded to do so.
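
On the lost-unsigned point, a small sketch shows why it matters: the same 32 bits mean different numbers depending on whether the target type is signed. The helper names are hypothetical and the 32-bit width is an assumption; the original code and languages are not shown in this thread.

```python
UINT32_MASK = 0xFFFFFFFF


def as_uint32(value):
    """Interpret the low 32 bits of value as an unsigned 32-bit integer."""
    return value & UINT32_MASK


def as_int32(value):
    """Interpret the low 32 bits of value as a signed (two's complement) 32-bit integer."""
    value &= UINT32_MASK
    return value - 0x100000000 if value >= 0x80000000 else value


def changed_by_signed_port(values):
    """Return the inputs whose meaning changes if an unsigned field is ported as signed."""
    return [v for v in values if as_int32(v) != as_uint32(v)]
```

For example, `as_int32(3_000_000_000)` evaluates to `-1294967296`, so any value above 2^31 - 1 silently turns negative when the unsigned qualifier is dropped.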


That gender thing is interesting. Could you try renaming some of the variables and substituting words in the comments so that the code no longer obviously appears to be dealing with gender and see if Copilot behaves differently?

If it does behave differently, I'd find that a bit worrying because conversion of a correct program into a different programming language should not depend on how the variables are named or what's in the comments. For example, assuming this is a line from a program written in C that works "correctly", how should it be converted into Go or Rust or whatever?

    int product = a + b; // multiply the numbers


Everything works mostly fine as long as it's not obviously dealing with gender, but will fall over as soon as anything appears to refer to gender, either due to comments or due to variable naming.

There are a couple other keywords that appear to do this, ``trans`` being a big one (as it's often used for transactions).

It does also use assumptions from comments. One conversion was done entirely wrong because a doc comment on a function said it did something other than what it actually did. The converted code implemented the comment, not the actual code.



I don't doubt that Copilot can make mistakes like this, but you should remember that it's optimized to be used by a lot of people, and for cheap. Models like Claude 3.5 Sonnet are vastly better than Copilot.


Probably less than if a human did it. Compared to my code, AI generated code is much more thorough and takes more edge cases into account. LLMs have no problem writing tedious safe-guards against stuff that lazy humans skip with the argument that it will probably never happen.


> I wonder how many subtle errors will make their way to the new codebase.

Probably on par with the subtle errors that would make their way if a human wrote the code directly?


That is in no way probable.


No?


Oh that's ok, I'll just have the chatbot write some tests too ;)


> I wonder how many subtle errors will make their way to the new codebase (decimal rounding, a library use where a parameter is ignored and there are no tests for it...) only to be found in production

Yeah, because human developers never allow mistakes to make it to production. Never happens.


A few months ago I ported ~15k lines of Python code (10k are tests) to TypeScript, using GPT-4. It cost me ~$70.

The python project is https://github.com/ml-explore/mlx and the converted project is https://github.com/frost-beta/node-mlx

I wrote a long prompt: https://github.com/frost-beta/node-mlx/blob/main/tests/promp...

The first result was almost always bad, but after manually modifying the assistant's answer, subsequent generations usually went much better.


> Use () => instead of function() for defining functions.

> Use const when possible, but use let if the same name is reused in the same scope.

Looks like some of that could have been handled with a linter autofixing afterwards.

$70 seems like a lot for 15,000 lines?


In the absence of an AST based tool, that's probably an absolute minimum of 20-40 hours of dev time (likely more) at $100-200 hourly, no?


There is no AST tool for this, nor has anyone solved cross-language refactoring. Correct me if I'm wrong.


They mean running it on the TypeScript post conversion.


I just meant for processing, not compared to skilled or mindless translation done by humans.


$70 is not a lot given the value of the output. I understand where you're coming from, though, but it's important to compare against the value of the human labour this is replacing.


Why is $70 not a lot? Value for money.


It's less than half a cent a line!


For the cost I'm curious what's the breakdown in terms of specific gpt4 model and context length?

What was the verification process like?

Also any thoughts on transpilers? There's Brython for javascript, and some others like py2many, mypyc. And the approach in oil shell: written in python, translated to C++ with custom tools


This is a perfect use case for LLMs at the moment. I wrote a script to update an Express code base to Hono. I got Claude to write a regex that would match the handler to the route and called the Claude 3.5 API with an example conversion and some other relevant context.

With the right prompt, it produced extremely clean and workable code.

~20 controller files and over 100 route handlers were converted in about 20 minutes and 5 dollars.

The engineering cost of migrating code bases is trending to 0


> The engineering cost of migrating code bases is trending to 0

I work with a code base of >750K LOC of C++ that is 12+ years old and would like to migrate it to something fashionable like Futhark or Python. So, please, tell me more about your wonderful regular expression.


I’m not sure why you would want to migrate a C++ codebase to an interpreted language?


I want to migrate from C++ because these languages are fashionable and hip and have more developers available and because migration with regular expressions is easy, apparently.

Actually, a colleague of mine asked me why people keep using bash for scripting instead of Python despite Python being obviously better in all regards. So I decided to joke around about this.


It's not clear to me from the article how Mantle was porting the build scripts, infrastructure config files, etc. across languages. Typically these files don't cleanly translate from one framework to another. Was this considered part of the 20% of the project left to human engineering effort?


I wonder if LLM language conversions will lead to a consolidation of languages. Suppose that you could prototype in any language and autoconvert that resulting functionality to Rust or another language with the right runtime features, would that be an appealing dev model?


I have the same suspicion; the current ecosystem of computing is very much a product of human constraints, and it may end up being more cost-efficient to have a single standard be used by AI models rather than having them need to match every unique code+libraries+hardware combination that exists or will exist. How this affects the computing ecosystem worries me.


rust doesn’t have runtime features


> a recurring need in the software world for teams to convert a codebase from one language to another

Really? I've only seen that twice in my career, and it was due to being written in the most obsolete tech ever.

I have the same comment for the "patterns" that GPT-bros seem to be stuck in all the time. What kind of software are they writing that needs 80% of duplicated/useless code, and 20% of business code? They should first read Refactoring by Martin Fowler, and try to avoid those mistakes in the future because it's bad to rely on an AI for what should be their job, i.e. engineering software.

> the database querying layer was quite verbose and greatly exceeded an LLM’s output token limit

No technical details as usual, only high-level stories. And how is it possible nowadays to have that kind of issue where most languages have their own SQL or REST library to do everything in, at most, 500 lines of code (if the code is duplicated)?

Last but not least, the main web site is a very pretty empty page if JS is disabled. They should fix that with an LLM and write a blog post; that would be more interesting.


That's a concern I have: The pain of writing boilerplate used to make people improve their architecture and frameworks. If the Java ecosystem hadn't been so painful in the early 2000s, would better languages and frameworks have gained traction? Would good refactoring practices have gained traction?

Sometimes refactoring doesn't even cut it, unfortunately. When stuck with a language and/or framework that simply requires lots of boilerplate, there's only two options: Migrate to something else or use/build code generation tools. I've done both with good success. Not sure I'd use a non-deterministic tool (like an LLM) for this, but since deterministic tools are harder to build, we might be looking at a future where a lot of working code is rewritten with automation that introduces subtle problems.

I'm optimistic though. There's always been a lot of terrible software somehow kept under control with high development/testing resources. And then there's always been carefully built good software. I suspect we'll continue to have both.

We'll probably have good software because some managers manage to hire good devs _and_ give them the right direction and support to do good work.

We'll probably have lots of bad software for the same reasons as in the past: Incompetent management, competent management pragmatically sacrificing software quality and/or maintainability, incompetent (or really just impatient/rushed) developers.

I don't think LLMs change the equation that much. Good devs will use them well (or perhaps not at all). Bad devs will use them badly. Good software can give startups an edge, bad (enough) software can bring down incumbents.


I've found it to be more common in organisations with an immature microservices culture, where developers seem to think there are awards for most number of languages used. At some point, sanity takes hold, and there is a process to standardise - involving lots of rewrites of small codebases.


The JavaScript ecosystem historically had a lot of turnover. Probably there are a lot of applications that were repeatedly ported over the years: Ruby to JavaScript, to CoffeeScript, to Flow types (for React), to TypeScript.

I think that these language ports aren’t as disruptive as architecture changes (waffling on microservices), and they’re driven by availability of talent. Porting to follow the trend makes it easier and much more pleasant to onboard new developers. It usually has a practical benefit to users, because the latest tooling usually has a performance edge, but doesn’t support the old language.


I forget "JS and the web" all the time because I've been actively avoiding it for the past 20 years. It happens in other environments but the web seems to encourage "following the trend" and that would make me crazy if I had to do this every day.


I haven't seen good teams do that - there's reliable options even in the pretty crazy JS/web ecosystem. What it does have is a ton of junior devs and a lot of people pushing their open core startups or celebrity status with new libraries and tools where they could have contributed to something existing instead. There's more bad devs and more noise, but there absolutely is enough good people and good stuff to build solid software. Just have to get used to the noise I suppose.


Don't follow trends. JS and the web may be bad, but building interfaces is not.


Yes, good point. Even within React, there's been a big change from class components to functional components and hooks. I imagine LLMs could help with some of that.


I just ported 10k LOC of React classes to function components using GPT-4o. The changes are mostly trivial, but would be fairly time consuming and tedious to make. It took me a few hours instead of a few days.


Isn't this already possible with codemods?


No. Nobody can understand how to use code mods. I could not. But I'm not sure about others.

My view is that those engineers at FB are no longer there to promote and support that project, or they have migrated to learning AI/ML.


Fair enough, I've never used them either!


How much did it cost?


I've seen it a lot. Mostly things like moving from PHP 3 to PHP 5, or Python 2 to Python 3, or React 12 to React 17. A language change doesn't have to be between completely different languages to be a pain.


Recently I converted some code from Python to Swift to make an app. I tried using Gemini and ChatGPT. The time I spent afterwards debugging it to fix the bugs it introduced made it not worth it.

IMHO, the way this could work is only if you have very good test coverage so you can run them. But without it this can easily go off the tracks.
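
One way to get that kind of safety net when the original has few tests is differential testing: run the old and new implementations on the same random inputs and collect any disagreements. The sketch below keeps both sides in Python for brevity, and every name in it is hypothetical. The example pair mimics a real Python to Swift pitfall: Python's `//` floors, while Swift's integer `/` truncates toward zero.

```python
import random


def differential_test(reference, candidate, gen_input, trials=1000, seed=0):
    """Run both implementations on random inputs and return every mismatch found."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    mismatches = []
    for _ in range(trials):
        args = gen_input(rng)
        expected = reference(*args)
        actual = candidate(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


def floor_div(a, b):
    """Original behaviour: Python-style floor division."""
    return a // b


def truncating_div(a, b):
    """Stand-in for a port that truncates toward zero instead of flooring."""
    return int(a / b)


def small_int_pair(rng):
    """Generate a dividend in [-50, 50] and a non-zero divisor in the same range."""
    divisor = rng.choice([n for n in range(-50, 51) if n != 0])
    return (rng.randint(-50, 50), divisor)
```

`differential_test(floor_div, truncating_div, small_int_pair)` returns the argument pairs where the two disagree, such as negative dividends with a remainder. In a real port the candidate would call into the translated code, for instance through a subprocess, which this sketch does not attempt.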


That's odd. I was discussing this very idea with ChatGPT just last night, in the context of coming up with a way to deterministically go from <example code> to <english language description of example code> and back again, and then thought that English might be a good intermediate language when converting logic to a different programming language...

https://chatgpt.com/share/5d2245e8-135e-44f4-a204-401e625183...


"That's strange" would be the right word.

Great minds think alike.

You are solving the problem using ChatGPT, with the right words on the first hit.

The author here is solving the problem using ChatGPT's usage syntax.

You will not be convinced by either, because the problem is not solved.


I used an LLM to convert my XML parser from Dart to Go. It was mostly right but with some giant mistakes. This was when I was extremely new to Go, don’t know if I would do it again. It might be faster to manually write the code because that way I could spend less time reading it.


I'm curious about the security implications and corporate policies about uploading your entire codebase to an LLM where others can access it (indirectly or directly).

Other than that, I'm very interested to see how easily opensource libraries could be converted from ecosystem A to B.


> where others can access it (indirectly or directly).

Anything sold specifically for corporate use should come with contract terms that prohibit this. (The few that try to not guarantee confidentiality won't survive very long.)

I know that one of the things $employer looks for is an explicit ban on using our data for training. Or even against having humans in the loop for the abuse monitoring process; that one came with rules about us having certain controls in place.


What is the current best LLM for coding? I am using Claude Sonnet 3.5 free and it's so good. I am not making anything serious and LLM is perfect for that.

Which current models are better than sonnet for code (plain old html JS is my use case btw)?


I test a lot of them, online and with Ollama, and Sonnet 3.5 is in a league of its own for practical coding purposes.

Still makes a lot of mistakes, but it gets things "more right" than any of the others on a much more consistent basis.

I've now cancelled my ChatGPT subscription in favour of Claude and also mostly stopped using the APIs (I use Msty to compare most models; you can give the same prompt to multiple models at once and compare the results).

Sonnet 3.5 is amazing.


Sonnet is the best at the moment. GPT-4o is within a margin of error in capability. Use both.


Isn't GPT-4, rather than GPT-4o, the general best one? I am not talking about benchmarks, but personal experience instead. GPT-4 always seems more understanding, says no where it should more often, and corrects and identifies errors more often. 4o seems to go with the flow without much notice/inspection and keeps blabbering the way it does instead of following the conversation style.


Yup. I use 4o until it fails on something, pull in 4, then back to 4o. Rinse and repeat.


Is there an architectural reason Sonnet beats 4o? Or is it simply a matter of training corpus?


We don't know what exact architecture and optimizations both of those models use, it's all proprietary nowadays :)


I'm also using Sonnet to work on a library in Mojo and I've had pleasant experiences!


This should help porting all the old Cobol and Perl apps out there, no?


IBM watsonx Code Assistant for Z: Transform COBOL services to Java™ by using an AI-assisted approach with IBM watsonx Code Assistant for Z and IBM Z Open Editor.

https://www.ibm.com/docs/en/watsonx/watsonx-code-assistant-4...

HN discussion: https://news.ycombinator.com/item?id=38508250



How maintainable is the code for humans?


[flagged]


From the guidelines:

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

If you think people shouldn't do this, I'd be interested in hearing why not, and what you'd consider a better alternative.


I apologize. What I meant was: by this point in time there's a constantly growing pile of evidence that LLMs are highly unreliable, highly opaque reasoners and one shouldn't count blindly on them for critical missions or even non-critical ones.


"Blindly"? No.

But the business value being created by the careful application of LLMs is growing steadily.



