Bringing a dynamic environment to C: My linker project (macoy.me)
137 points by todsacerdoti on Oct 28, 2022 | 88 comments



Reminds me of Rational Software's Instant-C from the 1980s, which combined an incremental function-at-a-time compiler with runtime linking/loading. Basically you just edited your code and ran it - kind of like using an interpreter, except it really was compiled. It really was instant, even back on the slower computers of that era. You could even run code that had missing functions, then write them and continue when the runtime system complained!


This is how Julia works today. We sometimes like to call this "Just Ahead of Time" compilation as a riff on "Just In Time" compilation, since most JIT languages operate very differently.


That's how Common Lisp and Clojure still work too!


How does it work in Clojure? Attempting to compile/evaluate this:

  (defn foobar [] (undef-fn))
gives:

  1. Caused by java.lang.RuntimeException
  Unable to resolve symbol: undef-fn in this context
Attempting to call foobar:

  (foobar)
gives:

  1. Unhandled java.lang.IllegalStateException
  Attempting to call unbound fn: #'user/foobar
Environment info:

  ;; CIDER 1.3.0 (Ukraine), nREPL 0.9.0
  ;; Clojure 1.11.1, Java 18.0.2


The only change is that you have to tell the compiler that 'undef-fn' exists, but you don't have to provide an implementation. So, before all of that, just do

    (def undef-fn)
It should work after that.


Pre-declaring the undefined function does prevent the first of those exceptions. I guess it just feels weird to include Clojure here, because in CL the experience is that it brings up an interactive debugger and asks whether you want to define the function now or abort.


Re-reading the original post I see what you mean now. But Clojure definitely fits the first part of the definition by compiling your methods instantly as you're writing them in the repl. For the second part I'm not aware of a prompt which would let you define a method in response to a missing function call. Might be an interesting feature.


For the second part: with restarts and conditions, CL pulls up an interactive debugger that asks whether you would like to abort, continue/retry calling the undefined function, or supply a definition now, among other options.

Evaluating this in CL (SBCL):

  (undef-fn)
gives:

  Restarts:
   0: [CONTINUE] Retry calling UNDEF-FN.
   1: [USE-VALUE] Call specified function.
   2: [RETURN-VALUE] Return specified values.
   3: [RETURN-NOTHING] Return zero values.
   4: [RETRY] Retry SLY interactive evaluation request.
   5: [*ABORT] Return to SLY's top level.
   6: [ABORT] abort thread (#<THREAD "slynk-worker" RUNNING {100288C7E3}>)
  [... as well as a clipped stack trace ...]
Pressing 1 will allow you to name an alternative function in a minibuffer prompt, 2 lets you set the return value of the undefined fn, etc.


Defining the function after the error and retrying does, I think, fit the description too.

I guess being able to seamlessly continue without a retry could be useful if you are performing important side effects, but Clojure is quite functional, unlike CL, so it doesn't come up as much.


HolyC from TempleOS also has some similar facilities, I believe. I just know it from Terry's videos, but for example the TempleOS shell is just a HolyC REPL.


Reminds me a bit of zig's in-place binary patching[1]

> In-place binary patching is based on a granularity of top-level declarations. Each global variable and function can be independently patched because the final binary is structured as a sequence of loosely coupled blocks. Another important characteristic is that all this information is kept in memory, so the compiler will stay open between compilations.

1: https://kristoff.it/blog/zig-new-relationship-llvm/


> all the functionalities presented in this post have been designed and prototyped to the point where it’s just a matter of doing the methodical part of the work.

Pretty sure that feature is still very much WIP


yup, that's my understanding too; the self-hosted compiler just landed.


Intrigued by how this can work in the face of changing data structures. E.g. you start by storing some data in a linked list, but later you realise that's too slow, so you switch to a hash table. There seems to be no sensible way to load a new object file and have it access the old data without some kind of in-place upgrade system (and what if two modules want the data in old and new formats?)

Edit: He does mention some kind of data persistence system, but I think he vastly underestimates how complex and insidious this problem will be.


Erlang does it, but it requires an extensive system, and the application programmer must support it, too: [0], [1].

[0] https://www.erlang.org/doc/design_principles/release_handlin...

[1] https://www.erlang.org/doc/man/gen_server.html#Module:code_c...

It's... cumbersome, let's put it this way, and mind you, Erlang makes it happen at "safe" points, when the code being replaced is idle.


> some kind of in-place upgrade system

For Common Lisp, here's a toy example[1], and spec[2].

[1] https://malisper.me/debugging-lisp-part-3-redefining-classes... [2] http://www.lispworks.com/documentation/lw70/CLHS/Body/04_cf....


> changing data structures

liballocs is an approach for providing high-level facilities, such as changing structure layout, to a low-level Unix environment.

https://github.com/stephenrkell/liballocs


Nobody intended to give an answer to an unsolvable problem, only to solve practical ones. If you redesign data structures, most likely you'll have to restart the program. So what?


http://www.lispworks.com/documentation/HyperSpec/Body/f_upda...

That's how you'd handle it in Lisp. Smalltalk has something similar, but I'm less familiar with the internals for it. No restart needed, but it requires a runtime that is able to track data instances by type. You could build that onto C, though. Probably with a custom allocator that was passed some symbol indicating the structure type so it could report on instances for executing an update. Would be hard, though, to do this without invalidating pointers when the size changes. You'd probably end up with an extra level of indirection as a consequence to make it simpler (for the runtime author, not for the end user).
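
Roughly, a sketch of what that tagging allocator could look like in C (all names here are made up, and it only tracks live instances; the actual layout migration, and the extra indirection needed when sizes change, are left out):

  #include <stdlib.h>
  #include <string.h>

  /* Every allocation is tagged with a type name so a future "redefine
     struct Foo" step can walk all live instances of Foo. */
  typedef struct alloc_node {
      void *ptr;
      size_t size;
      const char *type_name;
      struct alloc_node *next;
  } alloc_node;

  static alloc_node *g_allocations;   /* head of the live-instance list */

  void *typed_alloc(const char *type_name, size_t size) {
      alloc_node *node = malloc(sizeof *node);
      node->ptr = calloc(1, size);
      node->size = size;
      node->type_name = type_name;
      node->next = g_allocations;
      g_allocations = node;
      return node->ptr;
  }

  /* Walk every live instance of a type, e.g. to migrate it to a new layout. */
  void for_each_instance(const char *type_name,
                         void (*fn)(void *ptr, size_t size)) {
      for (alloc_node *n = g_allocations; n; n = n->next)
          if (strcmp(n->type_name, type_name) == 0)
              fn(n->ptr, n->size);
  }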


Many things are possible with another level of indirection, but not necessarily worthwhile.


I mean, the project that started this discussion is considered "not necessarily worthwhile" by most people; that's why they use systems that require restarts on function redefinition instead of hot code reloading. But if you're going to the trouble of supporting hot code reloading, you ought to support (at least as an option) structure redefinition as well. Otherwise you've only gone halfway to making an actual interactive system.


I agree, this is still very useful in something like game development where you want to maintain state and you’re well beyond the point in development where you’re changing data types!


That's simply unacceptable for some systems, like if you're actively routing phone calls.


If a restart is unacceptable, then live-patching a data structure change at runtime requires a #YOLO approach to development and ice in your veins.


Depends on the platform. Erlang was specifically designed around telecom systems that needed maximal uptime, and the language and runtime collaborate to support hot patching with a distinctly non-YOLO approach.


Live patching is cool but it's something of a mythical beast. I hear lots of people talking about the wonders of live patching but very few people reporting on actually using it in their own production systems. If you're building an ultra-reliable system, then you have to somehow make it robust in the case where (some of) the hardware executing your code fails. Once you've covered that case, there's not much need for live patching.


I think it depends on which level of live patching you're talking about.

At Basho, we would load module fixes into Riak instances for customers...I hesitate to say "fairly regularly" because it's been long enough and I had limited visibility into it, but it certainly seemed like a routine operation.

The full "upgrade the entire application" hot loading definitely requires more planning than most companies will ever be willing to do.


Interesting, Erlang comes up here on HN rather frequently but I haven't really looked into it. I think I should though, any resources you can specifically recommend?


Funny you should ask…

I haven’t checked these links in quite some time: https://gist.github.com/macintux/6349828

And if you don’t mind videos, here’s my talk from Midwest.io (RIP) on the philosophy behind Erlang & its VM: https://youtu.be/E18shi1qIHU


Well, I can't say too much, but it's not as bad as you might think. It's extremely annoying, but doable.

It's really not that different from doing a DB table update.


It's acceptable (and required) for others. Which is my point.


This could be extremely useful in doing small logic tweaks and such. Of course it won't be able to understand your data at a high level and migrate that over.


tcc (the Tiny C Compiler, originally made by Fabrice Bellard) is still actively maintained and is surprisingly fast.

tcc does compile, link AND run the program directly on several platforms, including Windows.

If you can live with C99 it is a very good fast-iteration system.

I think the codebase could also be used as a starting point for a fast linker-loader.


Also, the wave of programming languages started by Jonathan Blow's talks (Jai, Zig, etc.) all focus on very fast compile times, even for large code bases. So hopefully hot reload will be less of a need in a couple of years (when these languages mature).


It is kind of ironic that we had to wait 20 years to get back what Delphi, Eiffel, and co. had been offering for decades, on much more resource-constrained hardware.


I do not remember best-in-class products/technologies often winning a race. It is mostly decided by suits for totally different reasons.


Or free-beer stuff, where some developers would rather suffer than pay for productive tooling.


Hot reload would still be an improvement on top of fast compile time, but yeah, the most important part is the compile/link part of the iteration.

Saving/reloading state could be done at the framework level, and it can't work in all cases (typically when memory layout is changed between iterations).

But I think we collectively realized how slow compilers had become only recently (something like five years ago), because it was a boiling-frog situation.


Regardless of how fast the Zig compiler itself is, as long as it outsources code generation to LLVM and linking to ld, it will always be much slower than tinycc. Although recently I've noticed that if you ask LLVM to generate and execute what they call bitcode, code generation seems faster.


I think incremental linking is already a thing. It involves leaving bits of unused space in the binary to deal with minor changes in the size of the object file. I can't remember if Sony shipped that with the PS4. Pretty sure Microsoft had a version for the Xbox at that time.

Unloading code while running is difficult. Dlclose doesn't really know whether anything is still holding onto pointers into the shared library.

What would probably work is compiling each object into a shared library and otherwise disallowing shared libraries. That would ensure the loader has a consistent model of which addresses have been patched to which shared library and give a relatively sane way to unload and replace at that granularity.

Moving symbols between object files during linking would be tricky.
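
As a rough sketch of the one-shared-object-per-translation-unit idea (assuming each .c file is built as its own .so with something like `cc -fPIC -shared`, and every cross-module call goes through a pointer the loader refreshes):

  #include <dlfcn.h>
  #include <stdio.h>

  /* Calls always go through these pointers so the loader can swap them. */
  static void *module_handle;
  static int (*update_entity)(float dt);

  /* Reload one translation unit that was built as its own shared object. */
  int reload_module(const char *path) {
      if (module_handle)
          dlclose(module_handle);   /* only safe if nothing still points
                                       into the old mapping */
      module_handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
      if (!module_handle) {
          fprintf(stderr, "dlopen failed: %s\n", dlerror());
          return -1;
      }
      update_entity = (int (*)(float))dlsym(module_handle, "update_entity");
      return update_entity ? 0 : -1;
  }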


There's actually a pretty simple solution to the loading/unloading problem if you can assume that function signatures do not change. Just generate a wrapper function that holds an R/W lock and calls the actual implementation while holding the R lock. To update the implementation, acquire the W lock and replace the implementation.

There is overhead, but I don't think it would be too bad. You could also probably reduce the overhead by doing something fancier like having the daemon inspect the stack frames of other threads (which is expensive, but presumably updates are extremely rare compared to calling functions).
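
Something like this minimal pthreads sketch (assuming a fixed `int fn(int)` signature; the names are made up):

  #include <pthread.h>

  static pthread_rwlock_t g_impl_lock = PTHREAD_RWLOCK_INITIALIZER;
  static int (*g_impl)(int);               /* current implementation */

  int wrapped_call(int arg) {
      pthread_rwlock_rdlock(&g_impl_lock);
      int result = g_impl(arg);            /* many callers may be in here at once */
      pthread_rwlock_unlock(&g_impl_lock);
      return result;
  }

  void swap_impl(int (*new_impl)(int)) {
      pthread_rwlock_wrlock(&g_impl_lock); /* waits until no caller is inside */
      g_impl = new_impl;
      pthread_rwlock_unlock(&g_impl_lock);
  }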


It's not (just) a low-level synchronization problem, though; it's a larger structural problem. Simple example: If someone has a reference to your function that free()s some pointer, and your function has been replaced with an inert no-op, that pointer is no longer freed when it should be, and you have a memory leak. Extrapolate this to more complex code.

Put another way, assume that you solve low-level threadsafety/data race issues. You still have to design the library from the start to be unloadable; you can't unload code that isn't expecting to be unloaded without undesirable side effects.


You typically wouldn't replace a function with a no-op since changing who allocs/frees memory is almost certainly an API change which I never claimed my solution would support. The thing my solution would be useful for is for fixing small errors that are tedious to go through a whole compile cycle for. It would be a replacement for modifying values directly in the debugger to try and fix issues (to see if the fix works) and not turn C++ into LISP.


LLVM ORC / JITLink is another similar project: JITLink handles single-file linking of MachO, ELF or COFF objects into a target process. ORC coordinates JITLink instances -- running compiles / links on demand and tracking dependencies between objects being linked on different threads.

See e.g. https://www.youtube.com/watch?v=i-inxFudrgI and earlier talks.

ORC is used by CERN's Cling c++ interpreter, Julia, PostgreSQL (for JIT database queries), the Clasp LISP VM, the LLDB debugger (for expression evaluation), and many other projects.


Isn't this what CERN's ROOT does for C++? https://en.wikipedia.org/wiki/ROOT

Or maybe more correctly the (currently) cling part?

https://root.cern/cling/


As I understand it from the webpage you linked, Cling is an interpreter for C++. OP's linker is supposed to take actual native object code, link it, patch it to try to keep the data in memory and continue running.


That project looks vastly more complicated, as the mention of a "C++ interpreter" loudly flags.

The OP's project looks rather minimal in contrast, and very cool. I think I have had this thought too but I (semi-sadly) don't work with big enough compiled codebases to focus on it now.

Epic to see Andreas Fredriksson [1] mentioned as inspiration. I used to work with him in my distant gamedev past, and he certainly knows a thing or two about performance and working with big codebases.

Edit: typo.

[1] https://deplinenoise.wordpress.com/


Reloading a file in cling requires unloading all dependent files first. That includes destroying objects created from structs/classes defined in dependent files.



The author specifically mentioned that their linker-loader doesn't require separating reloadable code into a separate shared object, but jet-live does.


This person seems to have greenspun themselves a JIT.

> It is difficult to get the boundary right between code which can and cannot be dynamically loaded. This is the same issue with using embedded dynamic scripting languages like Lua or Python—how much of your application should be written in them?

All of it.

> JIT compilation requires generating machine code, which is usually a complex process and a large maintenance burden. In practice this means shipping out to a 3rd party library for JIT, and the libraries are typically very large dependencies.

Is a c compiler not a large dependency and a maintenance burden?


> Is a c compiler not a large dependency and a maintenance burden?

Not in my experience, unless you feel that 577KB is a large dependency.

For size, ship TinyCC (100KB, also available as `libtcc.so` or `libtcc.dll`) and a small stdlib (377KB for musl) with your app and you're done.

For the maintenance issue, it's no different from depending on any other library, except that TinyCC and/or musl are considerably more mature than any other library you are bound to use (with the exception of SQLite).
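
For reference, compiling straight into the running process with libtcc looks roughly like this (based on the classic API from tcc's own examples; exact signatures such as tcc_relocate have shifted a bit between releases, so treat it as a sketch):

  #include <stdio.h>
  #include <libtcc.h>

  int main(void) {
      TCCState *s = tcc_new();
      tcc_set_output_type(s, TCC_OUTPUT_MEMORY);   /* compile straight to RAM */
      tcc_compile_string(s, "int add(int a, int b) { return a + b; }");
      tcc_relocate(s, TCC_RELOCATE_AUTO);          /* newer tcc takes only the state */
      int (*add)(int, int) = (int (*)(int, int))tcc_get_symbol(s, "add");
      printf("%d\n", add(2, 3));                   /* prints 5 */
      tcc_delete(s);
      return 0;
  }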


TinyCC is no heavier than a JIT generating code of comparable quality would be. And as for maturity: I have used tcc. It is buggy and lacks useful features.

As a point of comparison: luajit is 600k, and unlike tcc, it actually generates good code.


The solution is clear:

Fork TCC to use DynASM, and lay out code in a way that's amenable to tracing.

Extend LuaJIT to trace the TCC output, teach it to do this across the FFI, implement the hyperblock scheduler and quad-color GC.

Make a FreeBSD distribution which uses a one-sector Forth to bootstrap the TCCJIT, which compiles the source code directly into memory, and uses the GC for process allocation and cleanup. The JIT makes the happy path fast, until or unless it changes.

Binaries on disk being no longer a useful starting point, they can just be frozen process images. If they break, start again from source code, otherwise, incrementally compile changes into the image while it's in memory. Quitting is just writing it to SSD cache.


I have found the libtcc from https://github.com/TinyCC/tinycc to be absolutely fantastic. I'm using it to instantaneously compile the C output from my hobby language to create a repl. Once I had the compiler in good shape it allowed me to create a 100% compatible interpreter for (basically) free.

The libtcc API is minimal. For my needs that has been 100% sufficient and a pleasure to work with.


What are the bugs in tcc you have discovered? I have been using it almost every day for many years and have yet to encounter a single bug. It generates much slower code than gcc, but the point is that it does so at least an order of magnitude faster. The code it generates is similar to the code generated by Delphi, mentioned in a sibling comment. I wish they added a linear-scan register allocator and inlining of small functions to tinycc, which I believe would not slow down compilation but would make the generated code much faster.


No - he's addressing the linker/loader stage, not compilation. Code is getting fully compiled before it's run, but then there's no linking step, and he also allows individual source files to be updated (recompiled, and dynamically loaded) while the program is running.


>> It is difficult to get the boundary right between code which can and cannot be dynamically loaded. This is the same issue with using embedded dynamic scripting languages like Lua or Python—how much of your application should be written in them?

> All of it.

Hard disagree. Games, which the OP works on, cannot be written in traditional scripting languages due to performance. Thus, games use ad-hoc methods of either segmenting non-critical paths off that can be written (and reloaded) in slow languages, or putting reloadable code in DLLs, which comes with a host of exceptions and special cases you have to keep in mind.

What he's proposing is basically writing a clever .o loader that allows you to ignore all of the above, write plain C or (presumably) C++ code, and have it automagically show up in your running process when you hit the compile button.

Sounds pretty compelling to me.


>"This linker can facilitate program introspection. I plan on having symbols the linker itself provides to the program image that allow the program to inspect its own symbols. This opens the door to a whole variety of interesting things:

o Call any function in your program in an interactive read-evaluate-print loop

o Visualize function compiled sizes

o Visualize function references

o Introspect on program data

…and more things I haven't thought of yet!"

Brilliant idea! I'd love to see this linker when it is complete! I hope you succeed wildly in this endeavor!


Mac OS X on PowerPC used to have a feature called ZeroLink which was kind of similar, if I understand correctly.

(Weirdly it was on by default and meant that your builds were non-distributable, as the executable depended on dynamically loaded object files on your system.)


I like seeing stuff in this area -- C is great but feels unnecessarily hobbled by the long edit/compile/run cycles.

My own attempt at solving this right now is a C variant that's amenable to SLIME-style incremental development, and I'm having fun doing it, but it'll be a while before it's useful. This linker idea is really smart. Being able to drop it in to existing C projects is a big win.


> C is great but feels unnecessarily hobbled by the long edit/compile/run cycles.

People under Unixen have been using ccache forever.


So he needs to patch up:

* all pointers in all new objects,

* all pointers into the old-range,

* and all live on-stack values into their new on-stack locations.

Did I forget something? Using volatile-only vars would solve only the third problem.


All pointers that could be deterministically generated from existing values. I could store a pointer in a 64-bit field, or I could go mad with power, break it up into 11-bit chunks, and store it in the unused exponent of half a dozen float64 values, to be recombined later on.

It's not to say that this would be a reasonable thing to do, but these sorts of shenanigans may be done (e.g. to pass data across an excessively narrow API callback).


* all structs whose definitions have changed,

* all arrays of such structs.

Dealing with deleted fields is more or less straightforward, but what should be put into newly added fields is anyone's guess.


Or have global pointers be in a specific range, then replace pointer dereference with a check if the pointer is in that range and if yes, read the real pointer value from a table. Keeping the fake pointer values consistent would be tricky, but not impossible given that you have a full view of the program.

Edit: Or just live with the fact that certain kinds of changes require a program restart.
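
As a sketch of the reserved-range idea (entirely hypothetical constants and names; the hard part is getting every dereference in the program rewritten to go through the check, which is exactly the tricky bookkeeping mentioned above):

  #include <stdint.h>

  /* "Fake" pointers live in a reserved range and are really indices into a
     table that the reloader can rewrite in one place. */
  #define FAKE_BASE 0xF000000000000000ULL
  #define FAKE_MASK 0xF000000000000000ULL

  static void *g_ptr_table[4096];            /* real addresses, patched on reload */

  static inline void *deref(void *p) {
      uintptr_t v = (uintptr_t)p;
      if ((v & FAKE_MASK) == FAKE_BASE)      /* fake pointer? */
          return g_ptr_table[v & 0xFFF];     /* one extra load of indirection */
      return p;                              /* ordinary pointer */
  }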


It would help to replace pointers by integer offsets, and keep the base in only one location. Using offsets is also useful if you move your code to GPU.
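
E.g., a minimal sketch of the offset scheme (hypothetical names; assumes everything lives in one arena):

  #include <stdint.h>

  /* Store 32-bit offsets from a single base instead of raw pointers, so
     relocating the whole arena only means updating g_base. */
  static char *g_base;                        /* the one place the base lives */

  typedef uint32_t ref_t;                     /* offset into the arena */

  static inline void *resolve(ref_t r)   { return g_base + r; }
  static inline ref_t  make_ref(void *p) { return (ref_t)((char *)p - g_base); }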


A pointer-offset or pointer-compression optimization would need a complete rewrite. Nobody does that. It would only work in a VM (bytecode or JIT).


> Visual Studio Edit and Continue is intended to let you live edit any code in your project and magically apply the edit. However, I have never gotten it to work, and none of my coworkers have either. The rumor among us is that it is not well supported, especially not on large projects like games (which are what I work on professionally).

We've been using Edit and Continue for games at my job. It requires some care and attention to detail to keep it working, and it definitely has its faults, but it's pretty effective for constant-tweaking, IMO.


https://github.com/rui314/mold doesn't support live updates, but it's fast enough to boost productivity.


Maybe not exactly what you want to achieve, but take a look at the 'insmod' and 'rmmod' code in the Linux driver ecosystem. It is ELF-based but very enjoyable.


Another thing already built into nim :p

In Nim's case, when hot reloading is enabled, functions between modules become pointers.


I've thought about similar stuff, but I'm on a slightly different path.

I'm picturing a long-running server, and people wanting to run experiments on it. Like, the next time an HTTP GET comes in, please use my new code.

In my head, I had imagined that there would be a versioned method table for each function (or at least each function that I wanted to be versionable.) I would compile my new version of the code, and then dynamically link it in, and then use some experimentation flag to say which version of the system the execution should follow.

So, it's a bit like Git. Each object is versioned, and you can request a snapshot of the entire tree, at a particular version. And because you can walk the tree of function calls, you can have a root version, etc.

It's super cool to see people experimenting with this.
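
Concretely, a sketch of that versioned table in C (all names hypothetical; loading the per-version implementations from shared objects is left out):

  #include <stddef.h>

  /* Each request carries a version index and resolves calls through the
     table, so two versions of a handler can coexist in one process. */
  typedef struct {
      void (*handle_get)(const char *path);    /* one slot per patchable fn */
  } api_vtable;

  #define MAX_VERSIONS 16
  static api_vtable g_versions[MAX_VERSIONS];  /* filled in as code is loaded */
  static int g_head_version;                   /* what normal traffic uses */

  void dispatch_get(int experiment_version, const char *path) {
      int v = (experiment_version >= 0 && experiment_version < MAX_VERSIONS &&
               g_versions[experiment_version].handle_get)
                  ? experiment_version
                  : g_head_version;            /* fall back to head */
      g_versions[v].handle_get(path);
  }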


Erlang has it built-in, or you can roll it manually, like in [0]. But the biggest problem is not changing the code, it's changing the layout of already existing data — throwing it away is equivalent to killing the process and starting it from scratch. It can't be done automatically, you, as a programmer, must supply an upgrader (and downgrader, for rollbacks) written manually by yourself.

[0] https://joearms.github.io/published/2013-11-21-My-favorite-e...


Sure, I get that - which is why I'm thinking about the HTTP GET scenario, because each Request is almost entirely stateless on entry. Sure, it builds up state, but that could all be in an Arena with the scope of the Request.


What state builds up on an HTTP GET that isn't bound to the servicing call stack?


If you're dynamically building a page, you'll have state while you're building and rendering it. SQL queries, etc.


Isn't this what rolling updates in container-based deployment systems (e.g., Kubernetes) do? And also the reason we have CI/CD pipelines and micro-services (to decouple processing from data).

So basically, in your example, change the code that handles GET and then trigger the CI/CD. Your code will be compiled, containerised then a rolling update is started to replace the old version of the running web server with the new one.


What you're describing works great if the server is running at head.

I'd like the server to be able to execute a code path which mimics my Git branch, and on the next request to execute a code path which mimics your Git branch.

If you and I have two different versions of f(), I'd like them both to be available to be executed, based on some flag or condition.

As we merge our branches, f() will collapse back into the CI/CD pipeline you imagine, yes.


You might have a look at Java's JMX and how it integrates with things like Spring, Tomcat, etc. It's pretty close to what you're describing, where you could have runtime configurable dynamic behavior and a framework to manage it.


Redbean (part of the often-discussed Cosmopolitan C library) does something like that.


I feel like this is rebuilding Erlang in some respects? Erlang has hot module replacement.


Linkers and loaders excite this weird part of my brain. I've had this idea in the back of my head for a while now to make some sort of scripting framework that lets you dynamically load natively compiled modules together. Sort of like the LD_PRELOAD trick on steroids: dynamically loading code based on the command line arguments, environment variables, and configuration files. And then you don't have to do any of the command line parsing or initialization stuff in native code, but rather in a flexible scripting language. I guess it would be sort of like trying to implement Spring for C code, as evil as that sounds. I don't think such a system would be actually that useful, but I would find it pretty neat to play with.


As much as I love this idea, most systems I have seen don't even have incremental compilation set up properly. Any change triggers so many recompilations that the link and restart are the small part of it. Plus, you may want to rerun the tests from scratch anyway in software projects. This is not true when you are doing data analysis or learning-type work, but for regular software I am not sure hot reloading will be a game changer.


This is really neat! In general I think there are some interesting niches for non-standard linkers. One example I considered (but did not start) when I was on the Chromium project was going to be a debug-only linker for Windows with a focus exclusively on iteration speed (but also debug symbols), as the linker being used for that platform at the time was tremendously slow.


Have a look at Julia. As I understand it, it uses LLVM to JIT-compile a high-level language in an interactive interpreter where changes to modules can be made and components hot-recompiled (and relinked into) a dynamic runtime.

It aims to replace numerical Python and MATLAB for high-performance interactive numerical computing.


DWARF is very useful. I've used it before to generate prototypes from object code. We should really have a lot more tools that use it.


Always happy to see new C projects! This is awesome.



