How small is the smallest .NET Hello World binary? (washi.dev)
413 points by susam on July 9, 2023 | 210 comments



This "minimum viable program" stuff is great!

Michal Strehovský used the framework currently known as .NET (as opposed to the Framework formerly known as .NET per OP) to create a snake game in under 8KB with no .NET [nor .NET Framework] runtime dependency. (Medium sucks and Twitter isn't useful right now, but I believe ~"hello world" supported Windows 3.11.)

https://medium.com/@MStrehovsky/building-a-self-contained-ga... [https://web.archive.org/web/20200103110836/https://medium.co... | https://archive.ph/b6qXE]

https://news.ycombinator.com/item?id=22010159 (2020)

https://news.ycombinator.com/item?id=22104734

You can see the latest work (and sponsorship!) at https://flattened.net: "bflat - C# as you know it but with Go-inspired tooling".


To continue on the “Go-inspired” C#, .Net has System.Threading.Channels!

https://learn.microsoft.com/en-us/dotnet/core/extensions/cha...


I love channels in .NET. I use them in our .NET web services to pass work items from controllers to background services without having to directly couple the controller & background service classes.


How do you do this? Does the controller then not know what the receiver is or wants in any way? Surely the controller needs to put something on the channel in the correct format?



Yes, the controller consumes an instance of ChannelWriter<T> where T is a job type, and the service consumes an instance of ChannelReader<T>. With basic dependency injection, this means you have to set up & register different channels with different types if you need to coordinate with multiple services, but I think that's a small price to pay.

Having a job class with a type parameter for its encapsulated workload saves you from having to actually write different job classes, at least. It's trivial to write a generic extension method that takes the encapsulated type as a parameter, sets up the channel with your preferred options, and registers the reader & writer for DI. Then you have a one-liner for adding this kind of channel support to your projects.
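A sketch of what that extension method could look like (AddChannel and the ResizeImageJob type in the usage line are hypothetical names; assumes Microsoft.Extensions.DependencyInjection and System.Threading.Channels):

    using System.Threading.Channels;
    using Microsoft.Extensions.DependencyInjection;

    public static class ChannelRegistrationExtensions
    {
        // Registers a bounded channel's reader & writer for DI.
        // Call once per job type T you need to coordinate on.
        public static IServiceCollection AddChannel<T>(
            this IServiceCollection services, int capacity = 100)
        {
            var channel = Channel.CreateBounded<T>(
                new BoundedChannelOptions(capacity) { FullMode = BoundedChannelFullMode.Wait });

            services.AddSingleton(channel.Reader); // consumed by the background service
            services.AddSingleton(channel.Writer); // consumed by controllers
            return services;
        }
    }

    // usage: services.AddChannel<ResizeImageJob>();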


What are some of the benefits of this approach compared to calling an interface on your services, returning Task<T> if needed, from your controller?

Some of the benefits of this approach, in my view:

- Clear view of how many dependencies a controller has, though the dependencies are abstractions
- Easy to Control+Click into the service in your IDE
- Easy to "Find all usages" of the service
- Easy to mock in unit tests; just mock the interface
- Can be decorated if needed with Scrutor
- Fast; just a method call
- Leverages Task if async is required; easy to e.g. call a service, get an id returned, and use it to call another service

How would you do these things with ChannelWriter and ChannelReader, do you have an example of this approach somewhere?


I'm glad you asked, because I should clarify that my use case is "fire and forget" tasks. The controller is not interested in the result of the service's operation, so there is no need for any return value.

Regarding your other points:

Dependencies can be traced by the job type associated with the channel. It is somewhat indirect but no more indirect than an interface, really.

Since I am not concerned with results, mocking is even easier. I just have a service implementation that consumes from a ChannelReader and discards the result. This is really a single class since it can take a type parameter if that makes sense--no need for separate mocks.
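A sketch of that kind of discarding consumer, assuming the generic host's BackgroundService base class (JobConsumer is a hypothetical name):

    using System.Threading;
    using System.Threading.Tasks;
    using System.Threading.Channels;
    using Microsoft.Extensions.Hosting;

    public class JobConsumer<T> : BackgroundService
    {
        private readonly ChannelReader<T> _reader;

        public JobConsumer(ChannelReader<T> reader) => _reader = reader;

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            // ReadAllAsync ends when the writer completes or the token fires.
            await foreach (var job in _reader.ReadAllAsync(stoppingToken))
            {
                // fire and forget: process the job, discard any result
            }
        }
    }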

I haven't used Scrutor, so I can't speak to that.

Reading and writing from a channel are also just method calls. There are synchronous and asynchronous options for both.

I don't have any open code to share, but I found the design very intuitive. If you read the documentation and follow your intuition, you will probably implement things exactly as I did.

Also I am not trying to convince anyone that channels are the best answer! I just found them helpful in my projects and think they have not been promoted very well in the .NET sphere.


Why didn't Microsoft implement arrow syntax for channels? It would make them much less confusing than passing them around as parameters.


Channels in C# are just a regular library; it would be overkill to introduce syntax changes to the language just for this. By the way, there are other ways to get the producer-consumer pattern in .Net: Channels are just one of the various options, and you also have BufferBlock from System.Threading.Tasks.Dataflow (with a more complicated API).

https://learn.microsoft.com/en-us/dotnet/standard/parallel-p...


Prior to channels, we'd add a ConcurrentQueue<T> from System.Collections.Concurrent to the service then expose a public method that enqueued an instance of T. Controllers then received the singleton instance of the services through DI and would invoke that method to add a job item to the service's queue.
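A minimal sketch of that older pattern (JobService and WorkItem are hypothetical names):

    using System.Collections.Concurrent;

    public record WorkItem(string Payload);

    // Registered as a singleton; controllers call Enqueue,
    // the background loop drains the queue.
    public class JobService
    {
        private readonly ConcurrentQueue<WorkItem> _queue = new();

        public void Enqueue(WorkItem item) => _queue.Enqueue(item);

        public bool TryDequeue(out WorkItem item) => _queue.TryDequeue(out item);
    }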

Obviously, that created a direct coupling. You could set up an interface and have the controller consume the interface instead, but it was still a coupling in practice. I suppose you could register the public method as a delegate and just inject the delegate, but that's a bit hacky to me, and I think my juniors would find it confusing.


If you really wanted to, you could create a simple Channel wrapper that overloads some visually arrow-like operator, exactly like C++'s famous stream overloads that use the << and >> bit-shift operators as "stream operators".

That sort of "cutesy" operator overloading is generally frowned upon in C# best practices, but still possible if you don't mind some C# developers either frowning or laughing at you.
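For illustration only, a sketch of such a wrapper (the ArrowChannel name is hypothetical; overloading << with a non-int right operand requires C# 11, which relaxed the shift-operator operand rules):

    using System;
    using System.Threading.Channels;

    public sealed class ArrowChannel<T>
    {
        private readonly Channel<T> _inner = Channel.CreateUnbounded<T>();

        public ChannelReader<T> Reader => _inner.Reader;

        // "ch << item" pushes item onto the channel.
        public static ArrowChannel<T> operator <<(ArrowChannel<T> ch, T item)
        {
            if (!ch._inner.Writer.TryWrite(item))
                throw new InvalidOperationException("channel completed");
            return ch; // enables chaining: ch << a << b;
        }
    }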


The widely used MediatR library[0] could be used to do that as well, just FYI.

[0]: https://github.com/jbogard/MediatR


For the love of God, do not use MediatR. Especially in scenarios Channels are designed for. It is mentally confusing and bloated abstraction, in most places simply resulting in more code achieving the same result at 0.25x(best case) speed.


It's not for the speed but separating parts of the same monolithic program into a microservice like structure.


This one, that is the problem. It makes it impossible to logically follow (without reviewing all type references) the flow of execution - just call a service directly instead of inventing 3 contexts and 2 intermediate message handlers that end up just passing the data to one consumer at the end.

The fact that this does not raise eyebrows is one of the (avoidable) reasons, among many, why C# gets bad rep. Go solves this by the virtue of its design that makes it painful to invent big brain solutions that should have stayed simple. This is where C# has much longer learning curve and requires judgement when applying the tools at hand.


I never heard of MediatR before today, but what do you think of message buses like IMessenger from MVVM Toolkit? https://learn.microsoft.com/en-us/dotnet/communitytoolkit/mv...

I've been mostly using C# for GUI applications and recently have been thinking about relying on a message bus as the main communication device between services and view models. As you mentions that complicates the flow of logic and ones loses benefits of explicitly listing your dependencies you get by just injecting all your services via constructor parameters. But that makes it simpler to refactor complex view models and services without impacting the rest of the codebase (at least in theory, it's of course often messier in practice...).


Oh yeah, GUI applications impose completely different requirements and use messaging very often, the comment above was specifically addressed at a popular anti-pattern in back-end services.

Community Toolkit code is well written and often exists as "does not quite meet the bar for including in standard library but otherwise solves common use cases".


Thanks! Yes, I found all the community toolkit libraries to be really high quality, Sergio Pedri does a fantastic job.


>Can C# apps hit the sizes where users would consider the download times instant? Would it enable C# to be used in places where it isn’t used right now?

I wonder if this could be used to make C# webassembly more viable. If I remember, a significant part of the download size is the .NET libraries (themselves written in C#?)

On that note, is there any 2D game framework/engine that targets the browser that uses C# and produces small binaries?


> I wonder if this could be used to make C# webassembly more viable.

You might enjoy reading this GitHub thread [0] where the community contributed a WASM library wrapping the C# regex code so that regex101.com could have a "C# mode". Lots of nerd sniping about reducing the payload size.

(There's also another thread [1] discussing the minification of a rust version of that same regex101 wasm library to provide a "rust mode" using @burntsushi's regex crate.)

[0]: https://github.com/firasdib/Regex101/issues/156

[1]: https://github.com/firasdib/Regex101/issues/1208


>I wonder if this could be used to make C# webassembly more viable. If I remember, a significant part of the download size is the .NET libraries (themselves written in C#?)

Until WASM ships GC, you're stuck with having to download an entire runtime regardless.

>On that note, is there any 2D game framework/engine that targets the browser that uses C# and produces small binaries?

Godot supports C# scripting and WASM compilation, probably your best bet.


Yeah, I was wondering if the techniques described in these articles could reduce the size of the .NET libs in the wasm bundle. (I'm not sure what language the runtime itself is implemented in, I think also mostly C# for (formerly known as) .NET Core?)

Thanks for the Godot tip. I haven't checked it out in a while. I found it very counterintuitive (and a 3D engine is definitely overkill for small 2D games... I basically just want Flash... I guess Phaser or Haxe are my best bets).


Godot was originally 2d (only?). At any rate the 2d portion is much more mature than the 3d.


Unity can hit below 3MB[0]. Afaik Godot builds are a bit larger due to limited code stripping and builds not being compressed out of the box.

[0] https://github.com/JohannesDeml/UnityWebGL-LoadingTest


I think godot exports to web, but I’m not sure if you’d consider it small binaries.


Michal and the whole CoreRT team saved C# for me. Without NativeAOT, I'd never consider using C# today, especially when Go exists.


This article will confuse people, since the executable in question requires a runtime installed on the system (in fact, it uses .NET Framework 4.7.2, in 2023!). The most likely fair comparison for such a solution would be .jar or even .py files.

A good source of truth for an actual executable would be a Native AOT binary, which can run on Windows, Linux or macOS. Naturally, it will be much larger, having to incorporate the GC, possibly the ThreadPool, Console and other auxiliary code.


Not necessarily - the actual output exe doesn't use the .NET Framework for anything at all besides invoking Main. The actual logic of outputting the bytes to the console is done via a P/Invoke (C# FFI) call to the underlying unmanaged (non-.NET) code exposed by ucrt.dll.
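For the curious, that call boils down to something like this sketch (ucrtbase.dll is the on-disk name of the Universal CRT; the exact signature used in the article may differ):

    using System.Runtime.InteropServices;

    class Program
    {
        // P/Invoke straight into the unmanaged C runtime; no managed
        // Console machinery is involved.
        [DllImport("ucrtbase.dll", CallingConvention = CallingConvention.Cdecl)]
        static extern int puts([MarshalAs(UnmanagedType.LPStr)] string s);

        static void Main() => puts("Hello World!");
    }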

But anyway, this other article [0] (shared in another comment here) about creating the smallest version of Snake (the game) was done on top of .NET Core (the new version of .NET that isn't bundled with Windows) and effectively does the same thing, but actually produces a non-.NET binary, if that's more to your taste.

[0]: https://medium.com/@MStrehovsky/building-a-self-contained-ga...


> the actual output exe doesn't actually use the .NET framework for anything at all

except for runtime environment initialization, JIT compilation, and execution.


A comparison between a fully statically linked native binary and one that loads some or all libraries dynamically would equally be vastly unfair.

I think comparing container images would be a good start. Use the most minimal base image you can get away with, or even "FROM scratch" if possible, but compare only images with the same architecture. I'd prefer 32-bit, to take things like running 32-bit on 64-bit or 64-bit with pointer compression out of the picture.

Then compare the size of the uncompressed exported tar. Probably also not completely fair, depending on what question you want to answer but it takes the obvious variances out of the equation.

EDIT: Thinking about it some more, it might even be more fair to compare maximally compressed image size to account for compression within the container. Of course you'd have to compress with the same algorithm and parameters or just add the decompressors size to the final result like they do in compression benchmarks.


It's funny because back in the day these were the kinds of discussion points thrown around in the Lisp forums when folks would balk at "executable sizes".

Specifically about how C leverages the "free" runtime of the OS while Lisp has to bundle its own.

This idea of container runtime sizes would be an amusing thread. A good test of something like Nix or Guix to a very fine grain. "We don't need bash or ls so we removed them." "The VM only simulates a single network card, so we'll scrap all the other drivers."


C doesn't leverage the free runtime of the OS when that OS is Windows. Your hello.exe size has to include the runtime DLLs required for it by the C implementation that was used to build it.

Windows has only recently been moving toward providing a public runtime for C: the "Universal C Run-Time" (UCRT), which was introduced in Windows 10.


> C doesn't leverage the free run time of the OS, when that OS is Windows.

Stdio, no. Printf, no (unless you’re an adherent of LIBCTINY[1] or one of its descendants[2] and need only a minimal set of features). Malloc, yes—MSVC’s malloc() on Win32 has always been a thin wrapper around kernel32!HeapAlloc. Strlen or memcpy or whatnot—also possibly yes, but it’s not like minimal implementations of those are large anyway.

[1] http://bytepointer.com/resources/pietrek_libctiny_1996.htm

[2] https://www.benshoof.org/blog/minicrt


For some reason, even with the UCRT, binaries seem overly bloated. I've been toying with Zig over the past couple weeks and I really like that it doesn't rely on a C runtime on Windows. Plus it makes it easy to operate entirely with UTF-16 strings, which is a nice to have when I'm writing a utility that is specific to Windows.


The ucrt is included as far back as Windows 7 (if you install updates).


The increment in disk space usage caused by the update has to count toward the effective size of the hello program, experienced by the user of that installation.


A .NET program on an 8-bit IoT OS such as Contiki [0] would be a lot smaller than on 32/64-bit OSes (assuming it doesn't require native 16/32/64-bit routines).

[0] https://en.wikipedia.org/wiki/Contiki


My hello world requires the "bash" framework and is zero bytes. You call it via:

    echo "hello world"


> echo "hello world"

I count 18 bytes in that program. Too long!

I left a definitive answer on Quora some years ago:

https://www.quora.com/What-programming-language-has-the-shor...


Those 18 bytes aren't the binary, they're just the command to call the program. Don't count it just as you wouldn't count the length of the filename and its arguments for a regular executable.


If you insist on categorizing `echo "hello world"` as the invocation, rather than the program, then what is the program under your categorization? I'll submit that it is the `echo` binary, which is a whopping 35kB on my machine.

I'll just observe that there is no way to compress "hello world" below a certain size (definitely not to size 0). If you think you have, you've just "moved" it into, say, the framework/os/input/algorithm/etc.

But this fun little debate we're having here is actually connected to some deep theoretical questions, like Kolmogorov complexity, its invariance theorem, and applications to the concept of data compression [0].

[0] https://en.wikipedia.org/wiki/Kolmogorov_complexity
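For reference, the invariance theorem mentioned above says the choice of description language costs only an additive constant: for any two universal languages $U$ and $V$ there is a constant $c_{U,V}$, independent of the string $x$, such that

    $|K_U(x) - K_V(x)| \le c_{U,V}$

which is exactly why you can "move" the bytes of "hello world" into the framework or the OS, but never make them disappear.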


I’m intrigued, please share your 100% text compression rate algorithm!


Just make sure you have a filesystem that can take arbitrary-length filenames! Then you can compress to zero byte file with a very long filename ;)


I guess if you store the filetable in unlimited S3 storage, maybe ;-)


TXT records in Route 53 for 100% SLA


Infinity compression.


I once ran gzip an infinite number of times on a text file. I was surprised at the result.


Should it become idempotent after a certain number of iterations? What were the results?


I'm pretty sure part of the contract with gzip (and compression in general) is that applying it N times is undone by decompressing N times.

The size definitely gets bigger with each iteration:

  $ echo text >0.txt
  $ for i in {0..9}; do
      gzip <$i.txt >$((i + 1)).txt
    done
  $ ls | sort -n | xargs -n1 wc -c
       5 0.txt
      25 1.txt
      46 2.txt
      69 3.txt
      82 4.txt
     105 5.txt
     120 6.txt
     143 7.txt
     161 8.txt
     184 9.txt
     207 10.txt


He doesn't have the results yet. He's still running it, an infinite number of times.

Without actually checking, the result is going to be that the output size increases slightly over time.


I was in quantum space at the time so results are hard to translate. The best approximation to our current reality is a black hole.


How about this, assuming there must be a stand-alone "binary":

  cd /tmp
  echo 'echo $0' > hello\ world
  chmod +x hello\ world
  PATH=. hello\ world


I am afraid it requires the "echo" runtime environment, doesn't it?


yes but it’s very easy to install if you have npm


That almost launched a sip of morning coffee... Well played!


echo is actually a bash builtin, so, not really, unless you're using another shell that doesn't have builtin echo.


whoosh...


You can skip "echo" and let the error message output "hello world" along with extra text!


My hello world in php. Requires php.

Hello World


Are you sure that is zero bytes?


The .NET Framework is built into Windows, and a mandatory component of it since at least Vista. .NET Framework 4.7.2 specifically should be built-in on supported Windows versions.


You never have the right version installed though. Every time I've tried to install a .NET program it's always asked me to install a different version than the one I had.


Sounds like confirmation bias, because when the right version of .NET is ever used, you probably never knew it was a .NET program to begin with.


Not true in my experience. The "look and feel" of the program usually gives it away more or less immediately.


> The "look and feel" of the program usually gives it away more or less immediately.

If you are talking about the base controls, then maybe. But there are .Net cross-platform frameworks such as Avalonia that can get you a modern looking UI with theming.

https://github.com/irihitech/Semi.Avalonia

https://github.com/AvaloniaUI/Citrus.Avalonia

etc.


Probably, if that app uses WPF, which is a "self-drawn" GUI library. However, if a .Net app uses WinForms, that API is just a wrapper over standard Win32 controls and it looks like any other old-school Win app.


You're right, but there are a few subtle differences here and there that often make Windows Forms recognizable.

The best example I got off the top of my head is KeePass v1 [0] and KeePass v2 [1]. v1 is written in C++ with native controls, and v2 uses Windows Forms.

If you look at the menu bar and the toolbar, you'll see a difference. Most notably the drag handle on the left, and the search box on the right, in v2. The difference is often a bit easier to spot on Windows 7.

[0]: https://keepass.info/screenshots/main_big.png [1]: https://keepass.info/screenshots/keepass_2x/main_big.png


The blurrier font in the non-native one is another difference.


I'm surprised Microsoft still isn't pushing .NET 6 (and the MSVC runtime, for that matter) to everyone with Windows Update. They're not very large, most consumers will want them, and picky enterprises could opt out.

It's an odd annoyance that Windows developers have to deal with.


Getting your thing adopted as a Windows component and distributed by Windows Update is a common trap for Microsoft developers. It's always a mistake if you're in some DevDiv or random app team instead of the Windows team. Windows is the slowest-moving product that Microsoft has, and users don't install updates. Hitching your wagon to Windows means no two users will have the same version installed, and getting people to update .NET via Windows Update is a nightmare. God forbid the Windows team decides to stop supporting a version of Windows that is still commonly used in industry; they'll never get a .NET update again.

As a user, think about how annoying it is to get a message saying you need to run Windows Update before you can start an application. Totally unnecessary own-goal for the team that decided to ship their independent component in Windows Update.

It's way easier to either 1.) go self-contained, or 2.) use the on-the-fly .NET download for a framework-dependent build. I absolutely think they made the right call removing current .NET as a Windows component. The annoyance was far greater when .NET Framework was part of Windows.


They push updates to .NET if it’s installed. I can understand them not installing if it’s absent, because there’s no good story for what to do when that version goes out of support. If they leave it, then customers have insecure unsupported software on their systems; if they remove it, they’ll break apps that depend on it.


Because the modern way is to ship it with the application and preferably use trimming.


Congruent with runtimes (both .NET and native) for Microsoft languages always being kinda weird. "Uh yeah don't use that CRT in %WINDIR%, that's not supported! Everyone need to bring their own... but not as loose DLLs, that's not allowed! Use the MSI package and install it whenever!" (I think most of those restrictions have been removed in the last couple years, and MS also settled down on VC++ ABI and VCRT changes and introduced UCRT in Windows 10, so)


Do we care about “fairness” or “reality”? Since it’s impossible to have a modern Windows install without .net, and Linux and MacOS don’t have it by default, it seems an odd way to be “fair”


> Windows install without .net

correction: without some specific version of .net.


Is there backwards compatibility between .NET Framework versions? Which version number should I request in my ultraportable executable? (Java seems to work fine most of the time with some javac flags.)


> Is there backwards compatibility between .NET Framework versions?

In theory: yes. In practice: mostly. There can only be one .NET Framework 4 installed at a time; the recent changes are found at https://learn.microsoft.com/en-us/dotnet/framework/whats-new.

> Which version number should I request in my ultraportable executable?

This gets tricky if you want to support more than Microsoft does. Here are the details on recent .NET Framework versions per OS: https://learn.microsoft.com/en-us/dotnet/framework/install/g... and the ancient of days: https://learn.microsoft.com/en-us/archive/blogs/astebner/mai....

For example: Microsoft currently supports Windows 10+, which first included .NET Framework 4.6. However, Microsoft only currently supports v4.6.2+, and v4.8 has been "Windows Update"'d since May 2019. I personally bumped an old open source project from v3.5 to v4.8 recently because of how hard it was for myself as a returning contributor to build these days.


From my understanding, .NET Fx 1.x and .NET Fx 2.0 will likely "always" be supported on Windows. Both Fx 3.x and Fx 4.x have no trouble "pretending" to be 2.0 in a backwards compatible way and 1.x is small enough that Windows just still bundles it in about the same way that Windows still bundles the VB6 runtime.

If you want the most "ultra-portable" executable for .NET Framework, you could choose 1.1 or 2.0. Picking 1.1 in 2023 is about as silly as picking VB6, and it won't feel anything like modern .NET. 2.0 feels a lot more like modern .NET (especially because that's when Generics and Generic Collections first existed), but it's also not really something I'd recommend in 2023. (In theory, though, targeting .NET Fx 2.0 gets you ultra-portable all the way back to Windows 98.)


.NET Framework is still officially supported and targeting anything higher than 4.7.2 is unnecessary since there are no new APIs in 4.8. 4.8 is just a drop-in replacement for 4.7.2 with things like better high DPI support.


How about any of the modern .NET versions? They're on .NET 7.0.8 nowadays.


.NET Core is at 7. This article is about .NET Framework, which stopped at 4.8.


.NET Core stopped being called .NET Core at version 3, after which it was renamed .NET, and Microsoft announced it was meant to supersede the old legacy .NET Framework. The article opens with asking itself how to get the smallest .NET executable, and then for some reason limits itself to this legacy version.


> Microsoft announced it was meant to supersede the old legacy .NET Framework

Which never actually happened, to nobody's surprise.

So now we have .Net that was renamed into .Net Framework, which is legacy; .Net Core, which is legacy but compatible with the modern version; and .Net, which is current. Anyway, the platform never stopped being called .Net, because it's larger than just the runtime.

We also have 2 different number sequences starting from 1, and one starting from... sometimes 4, other times 6, depending on your point of view.

We also have a bunch of confused people without any reason, because all of this is as clear as water. But anyway, it's not the author's fault that he didn't communicate the version in an adequate way.


> .Net that was renamed into .Net Framework

This is not correct. .NET Framework was named Framework from 1.0. The only time something was renamed is .NET 5 which came after .NET Core 3.1.

> We also have a bunch of confused people without any reason, because all of this is as clear as water.

It's funny you say that. Do you consider yourself one of those confused people? :)


Haha, the burn.

I remember back in the day - around 2001 - Microsoft thought it would center all their products around Web Services and call them .NET. Windows .NET Server was the supposed name for Server 2003. In the end a few things came out of it: Visual Studio .NET, .NET (the framework), VB.NET, ASP.NET.

https://en.wikipedia.org/wiki/Microsoft_.NET_strategy


> Do you consider yourself one of those confused people?

Yes. I have never taken part in a conversation about versioning problems in .Net where each person wasn't talking about completely different things.

Anyway, I clearly remember nobody ever naming anything "Framework" until the second or third stable version. And if there had been such a thing, I would probably have heard, because MS was incredibly loud at the time.

It was retroactively renamed later.


As I recall it, even 1.0 was always referred to as .NET Framework. There were a million .NET "brands" from Microsoft when .NET Framework launched. .NET was the overall "initiative" and .NET Framework was only one in that "portfolio". Then Microsoft got bored with most of the other ".NET brands" and .NET Framework was the last .NET standing. It wasn't until 2.0 or so that I recall people feeling safe calling .NET Framework just ".NET" without confusing it with other .NET-branded things. Microsoft's own branding advice never dropped "Framework" from .NET Framework, even deep into the 4.x timeframe, long divorced from "the initiative" and with no other remaining uses of .NET as a brand.


It didn't start from 4 but 5. They skipped 4 because people would confuse it with .NET Framework 4.


> What actually never happened, to nobody's surprise.

Unless they provide feature parity, it will never happen.

A working WinForms designer for third-party controls (read: any control not provided by the framework itself or NDA-ed vendors) in Visual Studio would be nice, for example.


.NET Framework 4.x is built into Windows, and .NET Framework 4.x binaries are understood by the Windows executable loader. The modern .NET must be manually installed and the executables must take care of launching the runtime on their own.


I think this will miss the point though. You can consider the specific .NET runtime as its own compute platform / operating system, regardless of what has to be installed on the machine for the program to actually run, and explore the limits of it. It will teach you a great deal about the binary format of the programs.

You can do this for any format that stores executables, for any programming framework. .jar might be interesting, .py obviously isn't. You can do this for obscure old formats, or for something very common like the standard Linux .elf.


I disagree. I cross-compile F# sources with .NET on my Mac to get a stand-alone Linux executable without additional dependencies that I can upload to my server host. It's enough that I maintain the .NET stuff on my development host, I've got no desire to duplicate that work on the public host.


What do you disagree with?


Considering the runtime as something separate from a single program. I want both in the same binary.


It's not about considering this or that, but about making a fun little exercise that tells you how the binary format works.


> .py obviously isn't

Although .zipapp is to Python what .jar is to Java. Probably even closer to .war.

But for purely stand-alone stuff, "nuitka3 /tmp/hello.py --standalone" does output an executable that can be used without the user having to manually bring in the Python runtime. In that case, the hello world is about 16 MB on Ubuntu.

It would be interesting to do this with MicroPython though.


Microsoft themselves just released a new product based on .NET Framework 4.7.2, so don't be that hard on the authors.

https://www.infoq.com/news/2023/06/logic-apps-custom-code/


That's a Logic Apps plugin that allows for compatibility with Framework code. M$ is a behemoth that supports old legacy code from businesses that don't want to upgrade, so I'm not surprised that they're making things like Logic Apps compatible, but this isn't really a "new product based on Framework".


As far as I understand, VS 2022 is also written in .NET Framework. I am not sure exactly which version though.


>This article will confuse people since the executable in question requires a runtime installed on the system (in fact, it uses .NET Framework 4.7.2, in 2023!).

That's fair though. The post is titled "the smallest .NET hello world binary", not "the smallest C# hello world binary"


Always see this sort of pedantry with these topics. Are we gonna start counting the bytes of firmware on all the chips on the motherboard as well?

Display drivers and firmware of various mcus used in the computer monitor itself?


For a tiny, single cross-platform binary, check out Cosmopolitan libc.


A closer analogy would be .pyc files: precompiled bytecode.


On macOS at least, all binaries are dynamic (except dyld itself), so I think this is fair. This is because everything is supposed to link, I believe, some kind of runtime dylib instead of doing e.g. raw syscalls. This includes anything written in C, for example. The size of a binary with dynamic links is the most fair comparison for anything not running on Linux.


According to Apple you are supposed to link to libSystem.dylib for syscalls, but there's obviously nothing stopping you from calling into the kernel directly.


> According to Apple you are supposed to link to libSystem.dylib for syscalls, but there's obviously nothing stopping you from calling into the kernel directly.

As a matter of OS design, this is no longer obvious:

https://lwn.net/Articles/806776/

> A new mechanism to help thwart return-oriented programming (ROP) and similar attacks has recently been added to the OpenBSD kernel. It will block system calls that are not made via the C library (libc) system-call wrappers.

MacOS doesn't implement that, sure, but it could.


On a side note, did anyone notice that the author's location is McMurdo Station, Antarctica? I didn't think they would have someone of their skillset at that station: https://usscar.org/directory?combine=&field_usscar_nsf_progr...

Could be a fascinating story.


I've been down to McMurdo for a research trip. One thing that outsiders find surprising is that most of the people there are support staff and only a small fraction (maybe 20% or so) are the scientists. The support staff are basically people from all walks of life... firemen, carpenters, cooks, mechanics, etc. Basically everyone you need to make a modern city run. However, because of the selection processes, those folks are almost all extremely talented and at the top of their specialties. The number of "swiss army knife" individuals I met with very broad skill sets was astounding. The challenges associated with living and working down there tend to draw quirky and motivated individuals like bugs to a light. And many return year after year. It's a wonderful community.


Can a normal software engineer ever hope to go to McMurdo?


In this case I don't understand the sentence: "This was a dumb way to spend my Saturday." What else could you do there?



How did you find it?

I don't see it in the .NET metadata nor in the about page.


Github profile.


> As every section needs to be aligned to the smallest possible section alignment of 0x200 bytes (1KB), we inflate our file by at least that amount of bytes just because of that.

0x200 is 512, or 0.5K. It's been a long time since I've done PE size optimisation at this level, but I remember 512 was the minimum accepted by Windows 9x, while NT could accept alignments as small as 1.

Also, I didn't see any mention of the old trick of overlapping the DOS MZ and PE headers, which was state-of-the-art when I was still doing this stuff: http://www.phreedom.org/research/tinype/

...and then you realise that the demoscene has managed to do this in a 1k binary:

https://www.pouet.net/prodlist.php?type%5B%5D=1k&platform%5B...


This was definitely not a dumb thing to spend time on. Diving into the details of a binary gives deep insight into how things are made, and the concepts learned will enrich your knowledge. If it is interesting and you learn something, it is always worth your time and effort.


This is far more relevant and exciting: https://twitter.com/MStrehovsky/status/1669502394827419648

> "A fully self-contained natively compiled C# Hello World, including GC and everything can be as small as ~440 kB."

The classic .NET Framework is deprecated, Windows-only legacy, and really no new apps should be built on it.


The resulting exe doesn't really "use" the framework in any way, other than relying on an implementation detail of the backing ucrt.dll library exposed as an unmanaged pinvoke (c# ffi) call.


Unless you're building an app relying on office/COM apis


Or a proper development experience for Windows Forms and WPF.


What's different about the development experience? In Visual Studio it doesn't make much of a difference whether you target .NET Framework or .NET 5+.


The designer is still buggy, and the new out-of-process model makes it incompatible with existing component libraries, requiring a rewrite.

In many things, .NET Core/.NET 5+ is the Python 3 of .NET ecosystem.


I was hoping this was about 'real' binaries, natively compiled .NET executables, like .NET AOT:

https://learn.microsoft.com/en-us/dotnet/core/deploying/nati...


You can get a .NET AOT example down to a few megabytes now. It looks like the most stripped-down AOT-compiled builds are about to crack the 1 MB barrier:

https://github.com/AustinWise/SmallestDotnetHelloWorlds
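For reference, a size-focused NativeAOT publish is mostly a matter of project settings; a sketch (these MSBuild properties exist in current SDKs, but treat the exact combination as illustrative):

    <PropertyGroup>
      <PublishAot>true</PublishAot>
      <OptimizationPreference>Size</OptimizationPreference>
      <StripSymbols>true</StripSymbols>
      <InvariantGlobalization>true</InvariantGlobalization>
    </PropertyGroup>

followed by a normal "dotnet publish -r win-x64 -c Release".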


But isn't the dotnet runtime a feature? AOT strips a main feature of the language, while still not even getting close to compiled languages (hello world < 300 KB, or < 100 KB compressed).


Technically, the runtime is the thing you need if you didn't use [native] AOT. A big example being that you don't need to emit/interpret IL in an AOT binary.

I agree though: the minimal .NET example is still not close to a full-featured .NET platform. I kind of like the idea of opting out of features. Bringing the whole runtime to the party each time is a bit overkill (unless it's already on every target machine). Lots of code doesn't need reflection. Some special cases actively dislike GC, etc.


I am quite sure that Go, as a compiled language, doesn't do a 300 KB hello world (unless TinyGo is used), nor do plenty of others.


That's why I love the ReadyToRun option. It's aggressively AOT-compiled but maintains the ability to JIT at runtime.


Congratulations. You've just dropped Unicode support and made a C call to puts. Wouldn't it be easier to compile it using a C compiler?

thats_just_c_with_extra_steps.jpg


Yeah skipping compiling kinda really ruins it


On the other hand, I understand the value of the teardown as educational. I never wrote in C# though, except for some project during my studies. I like how they put everything in the binary, living next to normal PE32 executables but interoperating with them. It's really done well; too bad it's not cross-platform. Java would benefit from such comprehensive packing...


Just tried with my Oberon+ IDE and get a Hello.dll of 2048 bytes for this module:

  module Hello
  begin
    println("hello world")
  end Hello

It uses the https://github.com/rochus-keller/Pelib assembly generator.


Niklaus Wirth was right. I felt that the first time I used Lazarus/FreePascal, and felt it to be true the more of his works I read.

Recently though, I think maybe the way to get off the "cycle of reincarnation" for this type of thinking is to do WASM - but I cannot abandon hardware like that. I am mentally incapable of accepting a spiritual machine that I cannot break with a hammer upon my desk; [I am] too primitive; an indestructible global computer that runs on other people's hardware for thousands of dollars of compute time and is slower than an RPi (like Ethereum) is anathema to me in some fundamental way.

I still want to get that (Wirth was right) on a shirt with his face on it.


> Niklaus Wirth was right

In what respect in this context? I should have explicitly stated that "Hello.dll" is a .NET assembly, i.e. Oberon+ uses the Mono CLR to run and debug the Oberon+ code (but not the .NET framework); it can also generate C99 (as a substitute for AOT compilation), but here it is about the minimal size of a Hello World .NET binary.

WASM and especially the WAMR runtime might be a good alternative to the Mono CLR in future, but today it's too slow and only a few architectures are supported (much less than Mono).


> It uses the https://github.com/rochus-keller/Pelib assembly generator.

Shouldn't Oberon+ be self-hosting? There's so much C++ ecosystem crud. It reminds me of all those projects for open source software hosting forges promoted as alternatives to GitHub and Gitlab, and then when you go to clone their code or file a bug... it's all on GitHub...


C++, in the moderate subset e.g. also used in Qt 1 to 5, I still consider to be the optimal implementation language for a compiler and IDE at the present state of Oberon+ development, not because C++ is such a great language, but because of the ubiquitous availability and the huge, proven code base, which allows me to implement a project like Oberon+ as a single person worlds faster and more robust than with any other technology I know. As an illustrative example of this efficiency, I would like to cite https://github.com/rochus-keller/LisaPascal, where I was able to implement a parser and code browser within a week as a side project. With any other technology (including Smalltalk or Delphi) this would have taken months. A lot of development is still needed for Oberon+ to reach this point, and using the present approach I can support this development with sufficiently powerful tools.


> As an illustrative example of this efficiency, I would like to cite https://github.com/rochus-keller/LisaPascal, where I was able to implement a parser and code browser within a week as a side project. With any other technology (including Smalltalk or Delphi) this would have taken months.

As a former Smalltalk developer, what do you think about it would actually slow you down (despite popular claims re productivity in Smalltalk)? Is it tooling for C++ static types? If so, then what about Delphi?

And is it one goal of yours, then, to mold Oberon+ so that it would eventually be at parity, productivity-wise, with C++?


> As a former Smalltalk developer, what do you think about it would actually slow you down (despite popular claims re productivity in Smalltalk)?

I was a Smalltalk developer in the early nineties as well, but then switched to statically typed languages and the source file concept for many reasons (performance, testing effort, integratability, etc.). For the present case performance is critical (parsing and cross-referencing of large code base) and also the specific required presentation/interaction features which come with Qt out of the box; and I already have a large toolbox based on C++ which I was able to reuse.

> And is it one goal of yours, then, to mold Oberon+ so that it would eventually be at parity, productivity-wise, with C++?

It would take more than a lifetime to achieve that goal given that my https://github.com/rochus-keller/LeanQt and https://github.com/rochus-keller/LeanCreator/, which support my efficient toolbox today, have more than a million SLOC. My goal in the first place is to find out how I have to extend Oberon+ so I can use it in real-world (i.e. non-academic) projects with not too big restrictions compared to my current use of C++; reaching parity in terms of efficiency and the therefore required ecosystem is yet another challenge.


Is the file filled with 0 bytes at the end? I think there are options to disable this.


Yes, it has about 700 zero bytes at the end; might be just some overhead due to the PE format; didn't have a close look, was just curious how big it is out of the box after skimming the article. Anyway the size of the DLL itself is irrelevant compared to the binaries required to run it (Mono + mscorlib.dll ~ 10 MB in case of Oberon+).


It was eye-opening for me how the .exe could be brought down to less than 1 KB... if someone could get a Hello World .exe in Flutter down to just 1 MB, that would really be something.


The first step would be to convince Microsoft to ship Flutter with Windows.


Does it count if I only compile "Hello world" in dart?


As it happens, I was playing around with martypc and msdos 6.22, and installed TurboPascal v1 on it. I built a hello world com that was just under 1k.


I remember when the entirety of your C# binary would include the source code, and back before the concept of Release or Debug was prolific enough, closed source C# binaries would be distributed in Debug mode erroneously, allowing someone to extract pretty much a 1:1 replica of the codebase.

Just a fun anecdote. I don't know if this is the case anymore.


You’re probably thinking of debug symbols, i.e. the PDB file that is generated alongside the binary. You can generate this for both Debug and Release builds (should be on by default in fact). It’s super useful for debugging crash dumps from production, and for exception logging in web apps.

You can think of this like a source map, but only files and line numbers.


I don't think it ever included actual source code, but .NET IL is just so rich that it made it extremely trivial to decompile, no?


It indeed did, you could open up the .exe in notepad for example. It was all there.


This doesn't sound right. C# has always been a compiled language (to MSIL), from the earliest versions of .NET.


This does sound unsurprising if you're talking about a debug binary. It would allow you to debug it with the original source code. You are not supposed to run debug builds in production, but nothing will prevent you from doing it.

Production binaries of course don't embed any source or debugging symbols.

It's like erroneously shipping source map files in your front-end production build. It shouldn't happen and it's not necessary, but it's just one configuration variable away, so it happens a lot.


But that doesn't mean the source code would necessarily be excluded from a debug build.

I don't know the truth of the original statement, but it isn't too unbelievable.


Never said the source was used to run the program. It was included for debug builds.


Sorry - I wasn't able to convey my thought. There was no reason for the C# compiler to include the source code into the binary. Into the debugging symbols (which is a separate file) - maybe..?

Anyway, I went ahead and compiled a debug app on .NET 1.1. The binary does not include the source code. And neither does the PDB. It includes a file path to the source though ("Visual Studio Projects\ConsoleApplication1\Class1.cs").


Back then things were different. I'm talking c. 2008.


.NET 1.1 is 2003, should be "back then" enough.

When compiled with VS 2008 and .NET 3.5, the binary does not include the source code either.

You might confuse readable parts in the binary with the metadata.

[0] https://stackoverflow.com/q/65699183

[1] https://stackoverflow.com/q/4700317


Nope. I explicitly remember the source code being there.


The cool thing about being in 2023 is that you don't have to believe me. Winding up a Windows XP virtual machine with Visual Studio of your choice takes 15-20 minutes.


This was likely ASP.NET code. In the early days many folks just FTP'd their whole ASP.NET project folder to IIS, including the source code (e.g. the .aspx.cs code-behind, etc.) when they didn't need to. There was a lot of misunderstanding about how to deploy ASP.NET applications; many folks were still working with a Classic ASP mindset. I speak from experience as my company's go-to .NET engineer and developer for a shared hosting company back in the day, having to explain to customers how to deploy their apps sans the source code.


No, it wasn't lol. I wish people would stop telling me what I very clearly remember. I did a report on this in high school - how C# programs (desktop programs) could have their source recovered under certain build conditions.


> how C# programs (desktop programs) could have their source recovered under certain build conditions.

Yes, using a tool such as .NET Reflector, but the source was never embedded in the compiled binaries, or anywhere else if you were competent with the toolchain.


> It was included for debug builds.

I've worked with .NET since the pre-1.0 betas back in ~2001 (I know... appeal to authority). The source code was never included in .NET debug or production builds. You could however use tools such as .NET Reflector (now owned by RedGate) to decompile the IL and reconstruct code as C#, VB etc. If you had the PDB files then you also had the symbols and could decompile to a close representation (variable names) of the original source.

This was a very useful feature because the .NET Framework managed code DLLs were and still are shipped obfuscated. This meant you could, in the early days, use ILDASM, and later .NET Reflector, to find out what was going on inside the .NET Framework code.

Of course this created a market for obfuscators so that commercial and paid for shipping code was more difficult to reconstruct (especially since you wouldn't have the PDB files available).

You could do pretty much the same thing with Java.

Now if your binaries were NGEN'd [0], your chances of reverse engineering were reduced considerably because the IL is now gone and you're working with pre-compiled machine code rather than pre-JIT'd IL.

[0]: https://learn.microsoft.com/en-us/dotnet/framework/tools/nge...


At least according to the format and specification since VS 2005, .NET 2.0, the assembly format has been consistent and doesn't have any section for source code.

It has always been trivial to load an assembly in something like dnSpy or another IL Disassembler and generate C# code and patch .NET assemblies. At least in the versions < 5.

https://learn.microsoft.com/en-us/dotnet/standard/assembly/f...


I remember this too. Actually I (maybe incorrectly) think release builds contained it too. You used to have to actively obfuscate to make sure it was protected. I definitely used decompilers and got great results, with very readable code that felt very close to the original.


MSIL is easily decompiled back to C# (or VB.NET if you prefer), it has always been a thing, yes. But binaries never included the source code.


I am perfectly aware of what IL decompilation is. I've written a decompiler before. That's not what was happening.


That could be, actually. I don't know if it was release or just debug, but I remember obfuscation tools definitely being a thing, in large part because of this.


There was never a time when the distinction between Release and Debug was not "prolific".


Yes, there definitely was.


Maybe if you started life as a VB6 developer.

But your original post especially doesn't make sense because the default Debug configuration has always put the debug symbols in a separate PDB file. So even if you were arguing that early C# developers didn't understand how to use their tools, they wouldn't have had the source embedded in the EXE.


I guess I'm lying then. :)


I think you may be misremembering.


I can't recall this from C#, though I only started using it when .Net 1.1 was released.

I do recall good old Visual Basic did this though, as it ran interpreted. The executable it generated was just a small loader with the source code appended.


> Even though this is probably quite a useless project

No. It taught me quite a lot.


Meanwhile it seems to be possible to write a Hello World program for DOS using just 20 bytes:

https://www.gnostice.com/nl_article.asp?id=225&t=The_Smalles...


It's possible in 7 bytes + length of string (including terminator). "xchg ax, bp" will save 1 byte over what is in that article.


Very cool. I love stuff like this. I remember a Snake game being made in 8KB of .NET


Related:

A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...

Tiny ELF Files: Revisited in 2021: https://nathanotterness.com/2021/10/tiny_elf_modernized.html


It would be interesting to compare this with non-Windows platforms. A while back I did a similar (but much less in-depth) comparison of .NET 5 and 6: https://taoofmac.com/space/blog/2021/11/14/1600


How would this compare to a non-.Net binary?


I just generated C99 with my Oberon+ example (see https://news.ycombinator.com/item?id=36652878) and compiled it with -O2. The generated Hello.c is 382 bytes, the Hello.h is 240 bytes, and the compiled Hello.o is 1264 bytes (compared to the 2048 bytes of the Hello.dll assembly). If I build a shared library which includes the runtime and some boiler-plate stuff the stripped version is 17760 bytes; if I instead build an executable, the stripped size is 17952 bytes (compared to the ~10 MB including mono and mscorlib.dll to run the assembly).


I think the limitation here is mostly the PE format, not the .Net framework.


You can make super small hello world binaries if you try.

https://nathanotterness.com/2021/10/tiny_elf_modernized.html

has a 120 byte hello world program for x86_64 ELF.


There is also this article about creating a tiny elf executable that's pretty interesting: https://news.ycombinator.com/item?id=21846785


It seems amazing to me that a post about making something small targets an end-of-life, Windows-only framework. It would have been far more interesting to see what is possible with AOT and trimming.


Hear me out: Create a runtime that prints Hello World without any input given and you could go as low as 0 bytes for your "binary"! (And yes, I am aware this already exists.)


You'd still need a file header (or at least a magic number[0]) that the OS will recognise in order to launch the runtime with your binary. e.g. you'll need an initial "#!" or "\x7fELF" or "MZ" to have your "binary" even start to be run as an interpreted program, ELF, or PE binary respectively.

[0] https://en.wikipedia.org/wiki/File_format#Magic_number


I don't need a binary. Same rules as the article. The runtime prints Hello World without any input at all. 0 bytes. See "Stück" for example. Calling the runtime itself is not part of the problem. Again, see the article.


> Same rules as the article. The runtime prints Hello World without any input at all.

Sorry, I don't understand how the OS decides to load your "helloworld" runtime, instead of e.g. the .NET Framework 4 runtime, to go with your 0-length binary?

The program file created in the article has a valid PE file header, including a ".NET Directory" header section that tells the OS about the runtime to load. If you don't have an equivalent, how is your new runtime being loaded?

> See "Stück" for example.

I don't understand this reference?


Calling/executing the binary/runtime at the OS level isn't part of what was counted in the article. Why do you insist on it being relevant then? We are talking about the input size (the binary) for a runtime. Calling it isn't within the scope.

> I don't understand this reference?

Neither the article apparently.


Yes, it was a dumb way to spend a Saturday. You have to count the size of that ".NET Framework 4.x.x", against which the reductions in the hello program are insignificant.


Probably 4-5 MB in memory while running. The CLR is heavy.


.NET 8 AOT changes that drastically. It's on par with or better than Go.


Love these, keep it up! I am glad I am not the only one fascinated by producing the smallest possible builds.


How about .net core vs framework?


Here's a story about producing the smallest possible with .NET Core [0].

[0]: https://medium.com/@MStrehovsky/building-a-self-contained-ga...



There are some nice examples here showing how to overlap the headers and a few other tricks:

https://github.com/rcx/tinyPE


Personally I don't like .NET anymore because it's almost exclusively C#-only (and C# seems quite dated compared to e.g. Kotlin). The JVM environment is spread across many more languages.


And personally I think C# is moving forward almost too quickly, as if Microsoft has a compulsion to always introduce new features to the language with each new release, accelerating especially since post-.NET Framework. Funny how different views one might have!


> and really no new apps should be built on it.

You know you don't need to use the new features, you can still build your apps the way you like and gradually adopt stuff you think might be better for you. Many of the new features aren't always aimed at line of business apps but have been added to improve "systems programming" capabilities.

And they aren't taking anything way from you (unless you're jumping from .NET Framework to .NET 5 and beyond, e.g. remoting).


That's funny, because so many of the early sales pitches for .net focused on polyglot as a main feature.


Dear downvoters, please care to express your opinion instead of downvoting. I will help you by making a strong statement: why would I use a smaller ecosystem (.NET vs JVM) and a worse general-purpose language (C# vs Kotlin) if I am not a Microsoft shop?


What can we do if we use C or x64 assembly?



Probably 2-4 MB in memory while running.


OK, this is like saying let's make an interpreted Python program run fast, knowing full well it's not intended for that. .NET's intention in using bytecode is safety, sandboxing and performance.

Why not teach the right tool for the job and illustrate it being ported to C (or lower) if the aim is to make it smaller?

Can you imagine a mechanical engineer trying to turn a screw with a pair of pliers with the justification that it's just a simple screw and they want to see what the minimal number of turns is whilst shaving the screw to improve grip? You'd think they were nuts...


Because the true aim of this exercise is to understand and illustrate the various components of a .NET binary. It's like tuning your Geo Metro to go as fast as possible; you do it to learn about the inner workings, not to win a Formula 1 race.


> .Net's intention by using bytecode is safety, sandboxing and performance.

Not true. You can get all of that in C++ if you’re careful and enable compiler flags that no one uses. The entire reason for the IL platform that .NET uses is for cross-platform executables, just like Java.

.NET does give one safety in the form of bounds checking and what-not, but that’s the runtime, not the byte code. There’s no “bounds check” opcode; array dereferences are managed by the runtime and bounds checks are elided if safety can be proven (like Rust does).
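A classic illustration of that elision, where the JIT can prove the index is always in range:

    static int Sum(int[] a)
    {
        int sum = 0;
        // The JIT can prove i < a.Length on every iteration, so the
        // per-element bounds check is removed from the generated code.
        for (int i = 0; i < a.Length; i++)
            sum += a[i];
        return sum;
    }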


I actually just did this experiment yesterday with C. Smallest "hello world" I could get from GCC on Linux is somewhere around 14 KiB. (Lots of startup code and unnecessary ELF sections.) So C# has a leg up here.

(Granted it's apples-to-oranges, given that this blog post is for C# managed mode (as opposed to AoT).)


Code-golfing some more, I could get it down to 448 bytes (leaving out the standard library, reimplementing syscall/2 by hand, and convincing the linker to drop useless sections). So C does win out here, but more so by virtue of the object format than anything else, I think.



