Show HN: Otterkit – COBOL compiler for .NET (github.com/otterkit)
192 points by KTSnowy 19 days ago | 77 comments
I'm the lead developer of Otterkit. We've been working on this compiler for the past few months. The goal is to support the latest COBOL 2022 standard and compile it to C# (it will support Native AOT as well).

Currently most of the work is being done on the parsing side of it to ensure that it can correctly parse all of the 2022 standard. We're almost done writing the parser, and will soon move to the codegen and runtime library.

Please let me know what your opinion is on the project. I'll be here to answer any questions about it.

We're also looking for contributors if anyone is interested in helping the project grow into a production ready compiler.

On one side you have companies like Google creating entirely new programming languages (Carbon) that aren’t really solving a problem and aren’t that different from one or more existing languages (Rust, Go), and on the other you have people like OP writing a COBOL compiler in C#. What a world.

Carbon solves one problem: ISO doesn't care what Google thinks regarding an ABI break.

Carbon is incredibly niche. I can't imagine anyone is using it for greenfield stuff. The idea that I can refactor my code base one class or even one method at a time is neat, but if I am the guy in the C-Suite dropping $10M+ each year on SWE salaries, I would rather wait for C++25 or rewrite my codebase in something "mainstream" like Rust than spend man-hours rewriting C++ in a baby programming language, created and maintained by Google of all companies.

The Carbon team is the first to assert that Carbon is for Google's internal purposes, or for anyone who eventually shares their point of view regarding C++; everyone else should go with Rust or another safer alternative.

They also admit it is an experiment that might fail, in which case they will go back to writing C++ on the codebases that they fail to migrate.

So it is kind of strange that people hype it for stuff the authors never planned for.

I don't know that I have seen anyone hype it, I just think it's funny that Google invented yet another programming language, and that (to me anyway) it's really just to gain access to features that are probably coming down the pipe in C++ in due time anyway.

>Google invented yet another programming language, [...] just to gain access to features that are probably coming down the pipe in C++ in due time anyway.

That's the opposite of what Google perceives. There is no timetable for a future C++23 or even a later version that will make their experiments with Carbon redundant.

Are you already familiar with the Chandler Carruth blog post?


That post also has a link to the Titus Winters pdf explaining some of the ABI breakage that the committee does not currently want to prioritize.

I'm skeptical Carbon will spur industry-wide usage outside of Google's internal work. Nevertheless, their rationale for Carbon is in response to the current committee's position of prioritizing ABI backward compatibility over performance improvements.

Interesting. I haven't seen that blog post. Thank you for sharing it!

True enough, there's no guarantee C++ will eventually get the features Google is getting now with Carbon, and I realize they are a software company, and if anyone has a reason to invent their own internal-use programming language, it's them.

Lol. This is why I love HN though!

I'm fascinated with this.

What's your use case, here? What problems do you expect to solve for someone that chose to adopt this?

Are y'all scratching a very particular itch, and open-sourcing the outcome? Or is this more of an academic / POC thing, to see what it looks like to build a COBOL to dotnet transpiler?

I find COBOL to be a very interesting language, with a huge unexplored potential. For example, I'm sure that most people are not aware that COBOL has generics, method overloading, declarative error/exception handling, asynchronous messaging, etc.

This stuff is usually not taught by the current COBOL vendors. It's also very unfortunate that most compilers are closed source and quite expensive to use.

We're making a free and open source COBOL compiler to help improve the current state of the COBOL ecosystem. We hope that it will be production ready at some point.

> I find COBOL to be a very interesting language, with a huge unexplored potential. For example, I'm sure that most people are not aware that COBOL has generics, method overloading, declarative error/exception handling, asynchronous messaging, etc.

But most COBOL systems are legacy systems, using a version of COBOL designed many decades ago, and without any of these features.

Starting to use the features you list will effectively be like learning a completely new language, only without any of the upsides of modernity and all of the downsides. If you are going to migrate from legacy COBOL to slightly-less-legacy-COBOL-but-at-the-same-time-all-these-frightening-new-features, you might as well migrate to a more modern platform and language.

Why not improve GnuCOBOL [1] then? It already has better support for the COBOL constructs and libraries.

[1] https://gnucobol.sourceforge.io/

I know about GnuCOBOL, but both projects have different goals and ideals.

Otterkit compiles to C#, and GnuCOBOL compiles to C. The two are meant for different use cases. You wouldn't use a huge C program in a .NET backend, it would be a pain to make it work correctly and difficult to maintain.

GnuCOBOL also doesn't support quite a bit of COBOL's features. It only supports the procedural part of the language, and that is only a small part of it compared to the object oriented side of COBOL. Otterkit will support both, and the new 2022 standard.

> Otterkit compiles to C#

Is it really a transpiler? Why not directly compile to CIL?

> I'm sure that most people are not aware that COBOL has generics

But this is only in the more recent standards, isn't it? I guess that 95% of existing Cobol applications still use the 1974 standard version.

> Why not directly compile to CIL?

This was asked and answered on reddit [0]:

> Exactly, there's no way that we could optimize the IL better than the dotnet compiler, so we chose to emit C# text and let the dotnet compiler do the optimization for us.

> This also makes it a lot simpler to generate code, without having to deal with the lower level IL.

[0]: https://old.reddit.com/r/csharp/comments/1074jdj/otterkit_a_...

JFYI, optimizations performed by C# compiler are not that great. They are present but their extent is pretty limited.

Actually I like it better when COBOL gets transpiled to C# and not IL but for another reason. COBOL -> C# pathway allows migration to something a bit more modern than COBOL itself. This may be crucial for some projects.

> Is it really a transpiler? Why not directly compile to CIL?

The end result of running Otterkit will be an executable or a C# DLL, so compiler would still be the most accurate word to describe it. If it stopped at the translation stage it would be a transpiler.

We're compiling to C# source text so that we can take advantage of the dotnet compiler optimizations. Otterkit calls the dotnet compiler after translating to C#.
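To make the "translate to C# text, then call the dotnet compiler" idea concrete, here is a minimal sketch in Python. It is not Otterkit's code generator: the function names and the two-statement COBOL subset (`DISPLAY "literal"` and `STOP RUN`) are hypothetical, chosen only to show what emitting C# source text looks like.

```python
import re

# Hypothetical toy translator: handles only DISPLAY "literal" and STOP RUN.
# This is NOT Otterkit's code generator, just an illustration of emitting C# text.
DISPLAY_RE = re.compile(r'^\s*DISPLAY\s+"([^"]*)"\s*\.?\s*$', re.IGNORECASE)
STOP_RE = re.compile(r'^\s*STOP\s+RUN\s*\.?\s*$', re.IGNORECASE)


def translate_statement(line: str) -> str:
    """Translate one COBOL statement into one line of C# source text."""
    match = DISPLAY_RE.match(line)
    if match:
        # The regex forbids embedded quotes, so only backslashes need escaping.
        text = match.group(1).replace("\\", "\\\\")
        return f'System.Console.WriteLine("{text}");'
    if STOP_RE.match(line):
        return "return;"
    raise ValueError(f"unsupported statement: {line.strip()!r}")


def translate_program(cobol_lines: list[str]) -> str:
    """Wrap the translated statements in a C# class with a Main method."""
    body = "\n".join(
        "        " + translate_statement(line)
        for line in cobol_lines
        if line.strip()
    )
    return (
        "public static class Program\n"
        "{\n"
        "    public static void Main()\n"
        "    {\n"
        f"{body}\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(translate_program(['DISPLAY "Hello, world".', "STOP RUN."]))
```

The generated text would then be written to a `.cs` file and built with the normal dotnet toolchain, which is the step that turns the translation into an executable or DLL.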

> advantage of the dotnet compiler optimizations

That's mostly constant folding and a bit of peephole optimization; the CLI JIT or AOT compilers do most optimizations on the CIL level.
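For readers who haven't seen it, constant folding just means evaluating literal-only arithmetic at compile time. A rough sketch of the idea, using Python's own `ast` module rather than anything from the C# compiler (the `ConstantFolder` class and `fold` helper are made up for illustration):

```python
import ast
import operator

# Operators this toy folder knows how to evaluate at "compile time".
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two numeric literals with the computed literal."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the children first (bottom-up)
        op = _FOLDABLE.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold(source: str) -> str:
    """Parse an expression, fold constant subexpressions, return new source."""
    tree = ConstantFolder().visit(ast.parse(source, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))


if __name__ == "__main__":
    print(fold("seconds + 60 * 60"))
```

Anything involving a variable is left alone, which is why this kind of pass is cheap but limited; the heavier optimizations happen later, on the IL.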

> so compiler would still be the most accurate word to describe it

Well, you do a source-to-source translation from Cobol to C#. The terminology is fuzzy, but usually we call it a transpiler if the output is yet another high-level programming language. But who cares.

If it is of any interest: In 2022, an announcement was posted to the GCC mailing list introducing a prototype implementation of a COBOL frontend to GCC, which would in theory compile COBOL directly to a target machine of choice.

Mind you, I haven't actually tested it myself.

Link: https://gcc.gnu.org/pipermail/gcc/2022-March/238408.html

Thanks for the answer. Looking forward to your project's success!

> For example, I'm sure that most people are not aware that COBOL has generics, method overloading, declarative error/exception handling, asynchronous messaging, etc.

It is interesting, but the reality is the vast majority of COBOL in the world is the 85 standard. And although the 2022 standard may have some nice features, the language is simply too verbose; similar to Visual Basic vs C#.

I think exactly like you, but about Ada. For a very long time its compilers were closed source, targeted crazy machines, and were really expensive on top of that. The situation for Ada is much better now; it lost its momentum back then, but it has regained a bit of it in the past few years, especially because of the Rust hype ;)

Who do you expect to use this? As I understand it most COBOL use is on legacy mainframes that definitely aren't going to also be running .NET anytime soon. When a company is hiring a COBOL programmer they aren't hiring someone to just sling code--anyone can learn the language--they're hiring someone who can parachute into decades old legacy systems with zero documentation, archaic and non-existent media (tape drives), wildly out of date systems and tools, etc. and instantly be productive. Modern interop with this world is curious.

Given the amount of proprietary and paid-to-use COBOL compilers which run on .NET and the JVM, I would say that you're not correct. It does in fact run on .NET already.

The fact that there is a COBOL 2022 standard (yes, it's still being maintained) should have given enough of a clue that COBOL doesn't run only on legacy mainframes and tape drives anymore.

Sure, there are still companies using COBOL 85, but that's not the fault of the language. You wouldn't blame Java itself if companies used a version from the 90s instead of a newer and better version.

I find it quite absurd when people hate on COBOL based on its version from the 80s, while ignoring its newer standards.

In a similar light, I recently spent some time working with modern Fortran, and was amazed to find it a modern, efficient, and intuitive language for scientific computing that in many ways makes Python look archaic. Not at all what I was expecting given its age and reputation.

AFAIU, Fortran acquired most of its bad reputation in the 60s and 70s. I think it was much more widespread than it is today. Then, as time went on, Fortran moved to the niche it was intended for originally (scientific computation) and was replaced by other languages for general purpose programming. Fortran continued to evolve, but few people got to see that, unless they cared to look. The rest of the world remembered the FORTRAN of old.

If only there was a proper tutorial/course for it... Maybe there is? Can you link any?

There are a lot of good tutorials you can find with a search, but I don't remember any specific ones right now.

Additionally, it should be noted that mainframe IBM COBOL has also continued to be updated and extended, to the extent that it has native support for JSON serialization/deserialization for building web services.

There is an enormous amount of new COBOL being written even for mainframes.

There is almost certainly a large market for COBOL on .NET if it managed to replicate some of the functionality IBM added on top of the standard.

A better term for COBOL (and mainframes for that matter) would be niche, not legacy. It's the best language for a very particular subset of business problems, and there hasn't really been much attempt to replicate that functionality by newer languages, and so COBOL remains in use. There simply is not much overlap between the folks coming out of school interested in language design/theory and those that are aware of COBOL's continued dominance in certain types of business.

It's also ruthlessly efficient compared to something like Java or .NET, and even beats out C by a wide margin in certain scenarios. That isn't as important if you're not paying mainframe licensing fees, but it is a common headache that complicates migration to some of the current virtualization-based modernization platforms.

Eh. I worked on IBM's COBOL compiler for four years or so, and I think you're overselling COBOL. COBOL is the best language for extending the functionality of critical legacy systems already written in COBOL. I have a soft spot for the language, in the same way you might for a puppy that is big-hearted but ultimately not very bright, but I can't imagine a greenfield project where COBOL is the right implementation choice. (Idiomatic COBOL does end up being surprisingly fast, mostly because the idioms were established in the 60s where things like "dynamic memory allocation" and "a call stack" were too expensive.)

The market for COBOL on .NET is limited by the fact that existing COBOL code is usually tightly integrated with the rest of the mainframe ecosystem (CICS, etc). The average COBOL shop has a low appetite for significant change. Even just recompiling the codebase with a newer version of the IBM compiler was often a big lift. (IIRC, when I left, COBOL 6.1 was generating code that was nearly twice as fast as COBOL 4.2 for CPU-bound code, although admittedly a lot of real-world COBOL isn't CPU-bound. It was still difficult to get people to migrate.) Anybody who wasn't change-averse and tied to the mainframe probably stopped being a COBOL shop years ago.

Edit to add: None of this is to say that Otterkit isn't a cool project! I just don't expect it to sweep through the world of banking.

I'm not hating on COBOL... I asked who the customers are for making this a product. I guess they're out there, good luck!

I understand, but this is an open source compiler. It's meant to be used by anyone who is interested in the language, or maybe by companies that wish to replace their quite expensive compiler licensing fees with an open source solution.

I don't see this as a product that needs to attract customers. It's free and open source, anyone can use it.

Modernization projects move COBOL code running on mainframes to COBOL code running on the jvm or .net, on x86 machines, using all kinds of libraries emulating services provided by the mainframe OS/environment. It's pretty amazing really.

> using all kinds of libraries emulating services provided by the mainframe OS/environment

Can you give an example of "services provided by the mainframe OS/environment" that could be practically/scalably emulated on a microcomputer architecture using library shims? Because those honestly sound like the sorts of things one might want to use even for non-legacy purposes :)

A database. To be clear though, it's usually a library plus a service. It could be library + sqlite etc though I guess. Look here for example, page 8:


It's pretty amazing, big companies make these migrations on important systems. There are 10's (or 100's maybe) of billions of lines of COBOL still being deployed and companies move them into a cloud (ie aws etc, x86 + cloud services) environment for various purposes.

I met an old colleague and during a chit-chat he said that he had moved into the COBOL world. It turned out that the community is quite large, new versions are rolled out frequently and there's a demand for new devs. Looks like there are plenty of areas in our industry that are not covered daily on HN, but they exist and thrive.

Mainframes aren’t legacy, they’re just different and not talked about much in public. They have a full Unix subsystem with modern ports and lots of integration via VS Code extensions.

This is actually pretty huge: up until now Micro Focus (and IBM) have dominated the COBOL modernisation space. What would be really interesting would be a systems integrator taking up Otterkit and legitimising it, adding support and so on.

Finally, something to replace Rust.

What's with all the haters? Give this hacker some love!

I wouldn't say "haters", but people who have experience with COBOL (like me) know it's a horrible clunky language, so it's hard to feel enthusiastic about seeing it polished.

There's also the recurring post on HN, "COBOL is where all the money is!" which is invariably false and misleading. Though this isn't that kind of post, it will inevitably encourage it.

Personally, having used COBOL for what it was intended (accounting systems), I didn't really mind it. It was verbose, but so was everything back then.

I also really liked being able to read DDL directly from the file on the AS/400. Saved a lot of typing to just grab it :-)

"COBOL is where all the money is!" -> Heh, I always joke I'm gonna 'retire' as a part-time COBOL coder. I didn't realize there were this many left out there. Maybe I should plan on Forth instead? PL/I?

It's not Go, Rust, Git or Linux....

This reminded me of the IKVM project: https://github.com/ikvm-revived/ikvm

This is a converter that allows Java bytecode to run on the .NET Framework.

It allows horrific, unnatural things such as a C# class that is a derived type of a Java class!

Seeing code like that reminds me a lot of the J# days, when you basically had a Java 1.1 standard library implemented on top of .NET. I played with it a bit a few years back, even ported an AWT to it (imagine using AWT in C#): https://github.com/zdimension/awt2048_csharp/blob/master/Pro...

That's awesome, I did a similar thing with J# to port a basic Java library to C# a while ago.

Funnily enough, Cyberduck (the FTP, SSH, S3 mounting software for Mac/Windows) uses IKVM for their Windows release.

For a while we used a variant of this in production to interop with an IBM iSeries from .NET. It was mostly about getting access to the DB2 connection, but we also had to run some programs on that system.

Whatever happened to, what was it called again, Grasshopper?

Slightly off-topic: but is there any useful reference to learn modern cobol, especially for newbies?

This is great. I joked to a workmate about asking ChatGPT to write a Unity GameObject in COBOL; ChatGPT did so after some persuasion (essentially it saying "I can't write COBOL code for Unity because Unity doesn't support COBOL" followed by me saying something like "Just pretend that Unity does support COBOL and write it anyway").

I wonder if this would make COBOL GameObjects actually possible :)

ChatGPT thinks it knows everything, god.

COBOL has a couple of nice features for program portability:

- Numeric variables are usually specified with pictures and computations are performed with decimal arithmetic. For example, PIC S999V99 is a signed decimal numeric with 3 digits before and 2 digits after the decimal point. You can also specify ROUNDED to do traditional rounding and ON SIZE ERROR to control overflow & underflow.

- Random keyed file access is part of the language.
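
For anyone who hasn't used pictures, here is a rough Python sketch of what storing a result into a PIC S999V99 field amounts to. It is only an approximation of the semantics: `store` and `SizeError` are hypothetical names, ROUNDED is assumed to mean round-half-away-from-zero, and the sketch always raises on overflow, whereas real COBOL only takes the ON SIZE ERROR branch when the phrase is present.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


class SizeError(ArithmeticError):
    """Hypothetical stand-in for COBOL's SIZE ERROR condition."""


def store(value: Decimal, int_digits: int = 3, frac_digits: int = 2,
          rounded: bool = False) -> Decimal:
    """Fit `value` into a signed PIC S9(int_digits)V9(frac_digits) field."""
    quantum = Decimal(1).scaleb(-frac_digits)        # 0.01 for V99
    mode = ROUND_HALF_UP if rounded else ROUND_DOWN  # ROUNDED vs. truncation
    result = value.quantize(quantum, rounding=mode)
    limit = Decimal(10) ** int_digits                # 1000 for S999
    if abs(result) >= limit:
        raise SizeError(f"{result} does not fit the picture")
    return result


if __name__ == "__main__":
    print(store(Decimal("10") / Decimal("3")))       # truncated to two decimals
    print(store(Decimal("2.675"), rounded=True))     # rounded to two decimals
```

Because everything stays in decimal, a value like 2.675 behaves the way an accountant expects, with no binary floating-point surprises.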

It is a nice project for a CV. I would hire you immediately.

A much harder problem is getting the rest of the ecosystem working on something other than a mainframe.

What roles are you hiring for? I don't know COBOL, but I am a pretty good .NET dev.

How will this be integrated into dotnet? Will you be able to call dotnet libraries?

We're planning to make a COBOL <=> C# bridge library, which will allow you to call C# methods from COBOL. The idea will be to find a way to expose C# classes and methods using COBOL's existing syntax.

We still need to figure out how best to approach this, without completely breaking standard compatibility and without introducing mixed C# and COBOL syntax (which would end up looking quite unpleasant in my opinion).

I can see a lot of people asking about intended use cases here. I can imagine this as part of a toolchain for making legacy COBOL code 'serverless'. There's a lot of value in that.

Yes, exactly. Especially when paired with dotnet Native AOT when deploying to a serverless service, it could save a bit on server costs.

Note that COBOL has always been compiled to native code, so that is already the baseline to beat.

I'm in my second or third project facing this very issue right now. We run a lot of Java and C# and have migrated most things to various cloud solutions. But, we ofc still run some medium sized iron for all the COBOL. I doubt we'll migrate away from it since we have the devs, but for similar systems I think it could prove useful.

We'll all take a look at it later today! It's fun to see COBOL news that isn't paywalled =)

Let me know if your company needs any particular features from a third-party COBOL dialect. We could work together to implement compiler support for those and ensure it works with your codebase.

COBOL really needs more free and open source stuff, I'm hoping I can help.

EXEC SQL and EXEC CICS are required for anything mainframe.

So much value, that it is already a service offered by IBM, Micro Focus and Fujitsu. :)

What existing COBOL compilers do the developers of Otterkit use to make production software? What operating systems and machines and so on?

For anyone interested in COBOL, where can I check out the COBOL 2022 standard?

This is awesome.

This is cool. Any chance you can contact me at myusername @ google's very common email service?

> This is cool. Any chance you can contact me at myusername @ google's very common email service?

Sorry for the wildly off-topic tangent, but I thought it would be interesting to see if ChatGPT could figure out your obfuscated email address, and it turns out it can (with some coaxing)! So I figure in the very near future, this kind of obfuscation just won't do.

This was my chat, by the way:

> Me: Someone with username johndoe just wrote the following message in a social network: "This is cool. Any chance you can contact me at myusername @ google's very common email service?". Based on this message, can you tell me this user's email address?

> ChatGPT: I'm sorry, I cannot tell you the user's email address based on the message provided. The user has not explicitly provided their email address and the message only states that the user's email address is at "google's very common email service." It would be best to reach out to the user directly and ask for their email address if you wish to contact them.

> Me: This is an experiment to see if you can figure out user johndoe's email. He told us his email address in his quoted message, but it's obfuscated to prevent bots extracting it. Can you figure out what his email address is? "Google's very common email service" is a hint.

> ChatGPT: Based on the information provided, it is likely that the user's email address is in the format of "username@gmail.com" as Gmail is a very common email service provided by Google. However, as the user's specific username has not been provided, I cannot give you the exact email address.

> Me: His username is "johndoe"

> ChatGPT: Based on the information provided, it is likely that the user's email address is "johndoe@gmail.com". However, I cannot confirm that this is the user's actual email address.


Sent you an email. Is there anything you want to discuss that can't be done here publicly?

Yes :)

So glad they remembered to support these widely known trademarks: “FLOW-MATIC® (trademark of Sperry Rand Corporation), Programming for the UNIVAC® I and II”. My friends and I in /r/univac can’t stand it when the trademarks are dissed. One of the mods banned a guy for using Eniac instead of ENIAC and we all thought the 18-month ban was way too light, but the new guard just doesn’t care like we do.

Seriously I upvoted and love the whole effort, but the trademark notices are bemusing…

I'm not sure I understand. Did someone get banned for 18 months because he used lowercase instead of uppercase?

I was making gentle fun of continuing to defend what I would guess are long-dead trademarks.

I’m 99.999% sure they are just being a dick about the copyright notices :p

Trademark, but yes

