
if it's so big that it's unwieldy as a byte array in a source file, does it really need embedding in the executable?

what about the case when you want n blobs of data instead of one?

good to know, but only useful in edge cases...



> does it really need embedding in the executable?

Yes, because self-contained executables are massively easier to deploy than ones that have data dependencies, because it's just one file that you can put anywhere. Plus you avoid having to write any file I/O code (and deal with potential errors).

Honestly I never understood why this technique is so obscure, rather than being standard practice for C/C++ devs.

> what about the case when you want n blobs of data instead of one?

Each one turns into a .o file, then you link them all together. There's nothing limiting you to just one.


Well, the only drawback that comes to mind is that you can't mutate the state of that blob (well, you can, but you really really shouldn't). Also if it's obscenely large, it might be better to keep it on disk and load only as much as you need/can.


> Also if it's obscenely large, it might be better to keep it on disk and load only as much as you need/can.

That's what this technique does: the kernel doesn't fully load the executable file into memory. It mmaps it and only loads data from disk as it's requested.
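For anyone who wants to see the same mechanism outside the executable loader, here is a minimal POSIX sketch that maps an ordinary file and touches a single byte. The helper name peek_byte is made up for illustration, and how many pages actually get read (readahead, page cache hits) depends on the platform, so treat it as a demonstration of the idea and not as a description of what any particular loader does.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps a file read-only and returns the byte at
 * `offset`, or -1 on any error. The mapping covers the whole file, but
 * only the pages that are actually touched need to be brought in. */
static int peek_byte(const char *path, size_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        offset >= (size_t)st.st_size) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    int value = p[offset]; /* touching the page may fault it in from disk */
    munmap(p, len);
    return value;
}
```

An embedded blob behaves similarly on systems that map executables this way, except that the loader sets up the mapping for you.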


I wasn't aware that the kernel did that. Now that you've said it, it's obvious that it would, but it never actually crossed my mind. Thanks for providing some insight and making me a bit wiser! :-)


that's only true in some cases. in general it is not true.


When is it not true?


Yes, this mostly only makes sense for immutable data.


that's not /need/, it's /want/.

i'm also not sure i believe you that self-contained executables are massively easier to deploy in most cases where the end user is involved. normally they want minimal interaction - if it makes a difference to them how many files there are, then you are doing something even more wrong in how you expect users to deploy - for most users on most platforms it should be an installer or an app store, possibly both.

if you are targeting a very specific type of user then you might well be right, but that's not obvious at all.

i asked about the case for n blobs of data because it is unclear where the symbol names come from in your example, and changing them after the fact is not something i would expect people to work out for themselves...


I primarily had servers in mind.

That said, a self-contained executable is super-easy to distribute to a user: They download the executable and then they run it. No need for an installer. When they don't want it anymore they can delete it. That's actually pretty great.

Obviously, if you are using some sort of package manager anyway then the benefits are greatly decreased -- effectively, the package becomes the self-contained unit of deployment. The benefit of a self-contained executable is more for cases where bringing in a package manager would create a lot of extra work or where requiring the user to use some particular package manager would be an unacceptable limitation.

> it is unclear where the symbol names come from in your example

I'm not the author of the article.

The symbol name is based on the file name of the input data. This is my least-favorite part of this linker feature -- I have looked for a way to specify a different symbol name but haven't been able to find one. But in any case, yes, different input files will get different symbol names.
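To make the naming rule concrete, here is a small sketch of how the names appear to be derived with GNU ld's binary input format: the prefix _binary_, then the input path exactly as given on the command line with every non-alphanumeric character turned into an underscore, then _start, _end or _size. The helper blob_symbol_name is hypothetical and only mimics that rule; it is not part of any toolchain, and other linkers may do something different.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the symbol name GNU ld is expected to
 * generate for a file linked in with the binary input format.
 * `suffix` is "start", "end" or "size".
 * Returns 0 on success, -1 if `out` is too small. */
static int blob_symbol_name(const char *input_path, const char *suffix,
                            char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "_binary_%s_%s", input_path, suffix);
    if (n < 0 || (size_t)n >= out_len)
        return -1;

    /* Mangle only the path portion, which starts right after "_binary_". */
    size_t prefix = strlen("_binary_");
    size_t path_len = strlen(input_path);
    for (size_t i = prefix; i < prefix + path_len; i++) {
        if (!isalnum((unsigned char)out[i]))
            out[i] = '_';
    }
    return 0;
}
```

So, under that assumption, logo.png would yield _binary_logo_png_start and assets/font.ttf would yield _binary_assets_font_ttf_start, which is why n blobs with different file names do not collide. On the C side each one is typically declared as something like extern const unsigned char _binary_logo_png_start[]; and if the generated name is unwanted, objcopy's --redefine-sym option can rename a symbol in the resulting .o afterwards.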


utorrent did this quite well actually... i'm not sure if the installer came to put users at ease, or install malware, but i do miss that quality.


Finding the data belonging to an installed application is not portable across operating systems, so embedding the data will result in much simpler code.

When the application starts, the current working directory can be anywhere, so you need to find the absolute paths to the data files, and this works differently on each operating system.

One solution is to find the path of the current executable, and doing this in a portable way is several hundred lines of code already, e.g.: https://github.com/gpakosz/whereami/blob/master/src/whereami...
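For a sense of what a single platform's branch looks like, here is a Linux-only sketch that reads the /proc/self/exe symlink. The function name exe_path_linux is made up for illustration and this is not taken from the linked library; other operating systems need entirely different calls, which is where the line count comes from.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-only sketch (hypothetical helper): /proc/self/exe is a symlink to
 * the running binary. Writes a NUL-terminated path into `buf`.
 * Returns 0 on success, -1 on failure or if `buf` is too small. */
static int exe_path_linux(char *buf, size_t buf_len)
{
    if (buf_len < 2)
        return -1;

    /* readlink does not NUL-terminate, so leave room for the terminator. */
    ssize_t n = readlink("/proc/self/exe", buf, buf_len - 1);
    if (n < 0 || (size_t)n >= buf_len - 1)
        return -1; /* error, or the path may have been truncated */

    buf[n] = '\0';
    return 0;
}
```

From there you would still have to strip the file name and append the relative location of the data files, and repeat the exercise for every other platform you ship on.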


i think you are thinking of a special case... in general this is not a problem because the working directory is specified.

most users never touch a command line tool or system component where this kind of thinking is most valid because it will be run from any old context (but even then...)

it's also quite common to think that embedding data in your executable is a bad idea if it is large, even if it isn't something that changes or is generic. there are some classic reasons for this:

* some platforms have tight constraints on executable size, so you actually just can't do it. most platforms have at least some constraint. you won't be embedding 64GB of lunar altimetry data into your executable no matter what...

* you lose control over what goes into and out of memory and have to trust some implementation detail. mmap isn't always great on your platform (it might load everything, twice!), and might not even be the approach the executable loader takes.

* you lose control over choosing to load the data after a delay or in the background. it impacts the time it takes for the executable to launch at all. for small devices this is easily measurable without much data at all.

* file systems are a good way to manage data. it is what they are for. if you have lots of data and you need multiple people to work with it, then it's easier to manage in some hierarchical and divided way - like a file system.

tbh i've written a lot of cross platform software targeting every major platform from the last 10 years and plenty of tiny ones too... this is not something i've ever felt was necessary, and that working directory problem... i stopped trying to work around that a very long time ago (the first time i tried to write a big bit of software i convinced myself it would be a problem somehow and wrote something similar to this link [but worse and for fewer platforms]) and i've not looked back or had problems because of it, just less code to maintain.


Sure, it mainly depends on the data size whether embedding makes sense. For instance I've written an 8-bit emulator recently where all available software ever written for the original machine is under one MB. In this case it definitely makes sense to embed the data into the application executable. It doesn't make sense if the data is dozens or hundreds of megabytes big, and especially if only a small chunk of the data is needed at any one time.

I don't quite agree that the 'finding the data' problem is trivial. You can't just do an fopen("mydata.txt", "r") and expect it to work across different platforms and different launch methods, especially when launched through a desktop environment instead of the command line. There's always some platform-specific code involved to get the data's absolute location, and on some platforms it's more complicated than on others.


I do this for test scripts at work. They're written in Lua and I found it easier to embed all the possible Lua modules (not individual scripts---there's a subtle difference) in a custom Lua interpreter. That way, all that is needed is one executable and the specific script. The two dozen modules are already there in the executable.



