Show HN: Hancho – A simple and pleasant build system in ~500 lines of Python (github.com/aappleby)
162 points by aappleby 73 days ago | hide | past | favorite | 56 comments
Hi HN, I've been taking a break from my big side projects to work on a smaller side project - a tiny build system that's based on what I've learned from using Ninja and ad-hoc Python for my homebrew build systems over the last few years.

It's basically a promise-based dependency graph runner plus a simple text templating engine, and it works quite well for the smallish projects I've tried it out on so far.

If you find Make crufty, CMake inconsistent, Ninja verbose, and Bazel just too much build system, give Hancho a try.

Don't like one of Hancho's defaults? It's only 500 lines - hack it up however you like.
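For the curious, the "promise-based dependency graph runner" idea can be sketched in a few lines of asyncio. This is a toy illustration of the concept only, not Hancho's actual API:

```python
import asyncio

# Toy illustration of a promise-based dependency graph: each task awaits
# its dependencies' results before running its own step. Not Hancho's
# real API -- just the underlying idea.
async def task(step, *deps):
    inputs = await asyncio.gather(*deps)   # resolve upstream promises
    return step(inputs)                    # then run this node

async def main():
    a = asyncio.create_task(task(lambda _: "main.o"))
    b = asyncio.create_task(task(lambda _: "util.o"))
    return await task(lambda objs: f"app <- {objs}", a, b)

print(asyncio.run(main()))  # -> app <- ['main.o', 'util.o']
```

Because dependencies are just awaitables, the graph needs no separate scheduler: awaiting the final node resolves everything upstream in dependency order.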




> Don't like one of Hancho's defaults? It's only 500 lines - hack it up however you like.

It's fascinating to think of this as a deliberate approach or paradigm rather than a shortcoming. The antithesis to the inner-platform effect [1]. Instead of digging an ever deeper rabbit hole of options and configuration, encourage customization via forking. You can't beat the flexibility. And lots of projects have well over 500 lines of build system configuration anyway. With that kind of competition, forking a 500-line build system may well yield the more comprehensible end result. Not to mention that when facing a problem outside of a given build system's anticipated set, I'd much rather have Python to solve that with than a special build system language.

[1]: https://en.wikipedia.org/wiki/Inner-platform_effect


The problem with Python-based "systems" of any sort is that they break over time without constant maintenance. Make from 2000 will still work with a makefile from 2000. A Python script from 2015 will not work today unless you have maintained it over that time. Every time I try to use any Python script for anything, I follow the posted instructions and end up with an error. Something somewhere has changed and the script no longer works... if it ever did, there is no way to know or go back to the working version. So for me, anything more than 10 lines of Python can be considered broken and unmaintainable.


It's possible that you experienced the end of the 2->3 transition. My Python scripts from 2015 all still work, except the ones that used the then-experimental async framework, which was widely documented as experimental and subject to change.


This is sort of the suckless approach. Most (all?) of their projects are customized by editing the source and recompiling. From their window manager, dwm:

> dwm is customized through editing its source code, which makes it extremely fast and secure - it does not process any input data which isn't known at compile time, except window titles and status text read from the root window's name. You don't have to learn Lua/sh/ruby or some weird configuration file format (like X resource files), beside C, to customize it for your needs: you only have to learn C (at least in order to edit the header file).

https://dwm.suckless.org/


I’m a big fan of copy-pasting code. When you are just learning to program you learn all of the common abstractions, for loops, functions, modules, etc. to reduce code duplication. This leads too many devs to think duplication is a bad thing to be avoided at all costs. I’ve seen a ton of terrible libraries out there that aim to end code duplication. Please, I can handle needing to copy/paste a couple of lines of code around my code base. But I can’t handle your bad abstractions.


Copy-pasting is fun when you add code, and a pain when you subsequently have to alter it. So it's easy to sell to a beginner or a hobbyist, and it also works well in projects which do not live long, like most games and many websites.


It can go either way. If you duplicate code, but the two copies need to stay in sync, then you've made it a bit harder and more error-prone to maintain. But the worse offense is if you have some code that serves two conflicting sets of requirements, but you keep the code together with increasing complexity to thread the needle of satisfying all the requirements.

The latter is the worse offense because it's much harder to decouple two systems than it is to notice a similarity in two bits of code and to extract the commonality.


Yep, years ago I worked on a web application that was made with "classic" Active Server Pages. Each page was a self-contained .asp file. To make a new page, you copied another page and changed it around.

This meant that when you wanted to change something on all pages, it was a PITA but not always terrible because tools like grep, sed and awk exist for Windows also.

The nice thing was, you could make a change on any page and know that it would only affect that page and had no (or very little) chance to introduce bugs anywhere else in the system.

I wouldn't recommend the approach, but it wasn't all bad.


Exactly. If you have to care about this for any prolonged time, do yourself a favor and structure your code carefully, think about the right abstractions, etc.

If you only need to release it before Christmas no matter what, and sales will wane anyway in 6 months, then even worse crimes against maintainability may find a justification.

The problem, of course, is that even code that was intended as throwaway tends to live much, much longer than expected, if it actually works.


I think of it instead as the code is just a more expressive config file.

Sometimes you don't want that but sometimes you do, and it's a counter-productive mistake to only ever go one way or the other in all situations instead of identifying what best serves different situations.


This looks like a nice distillation of core DAG-running features. I really like the simplicity of `honcho` and will likely try it next time I have a DAG to run that fits into its model.

I cannot help but compare `honcho` to others in this space that I know, particularly `waf` and `snakemake`. I guess these do cover a larger feature space than `honcho`, and that comes at the cost of rather greater complexity. I don't think sacrificing the simplicity of `honcho` to add features is necessarily a good thing, but I do wonder if / how `honcho` might be used in these ways:

- Execution of a rule adds a new task node to the DAG (a `TaskGen` in `waf`).

- Implicit DAG forming and rule execution (akin to how we define a `rule:` for `snakemake` but it is the system that determines what rules to run and then runs them).

- DAG edge types besides files. Eg, run a downstream rule if some Python data object's value changes (instead of a file change). I believe neither `waf` nor `snakemake` supports this, but one can serialize the Python data to file to make it fit the DAG engine.

- Batteries included such as a cross-platform version of the `c_binary()` function in the `honcho` tutorial. (like `waf` "tools").

- In-system dependency generators (eg `waf` "scanner" pattern).

Thanks for sharing `honcho`!


Hi, author here. I'm slightly confused by your bullet points - are you asking if Hancho supports these patterns?

- "Execution" of a rule can add new task nodes to the DAG (via Rules with command=<async Python function>).

- Implicit DAG is there in the form of Rules that depend on promises generated by other Rules.

- DAG edge types are just the promises returned by Rules. A custom rule that returns a promise that resolves to an empty array (or a dynamically-populated array) is totally valid.

- 'Batteries included' rules are out of scope for the base release but I will probably add a default rules.hancho with my preferred C++ build commands.

- In-system dependency generators can be as simple as "glob.glob('*.cpp')", or as fancy as you're willing to write.
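A hedged sketch of that last point: the "generator" is just ordinary Python that lists sources and instantiates rules. The rule parameters here are hypothetical stand-ins injected for illustration, not Hancho's API:

```python
import glob

# A dependency "generator" is plain Python: discover sources, then
# instantiate one compile task per source and a link task over the
# results. `compile_rule` and `link_rule` are hypothetical stand-ins
# for user-defined rules, passed in so the sketch is self-contained.
def generate(compile_rule, link_rule, sources=None, out="app"):
    sources = glob.glob("src/*.cpp") if sources is None else sources
    objs = [compile_rule(src, src.replace(".cpp", ".o")) for src in sources]
    return link_rule(objs, out)
```

Anything fancier (scanning headers, querying a tool for outputs) is just more Python in the same place.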


Hi, thanks for the answers! Sorry for lack of clarity but yes, these answers were just what I was hoping to learn.


Tiny point -- it's spelled "hAncho".


OMG! Ha, so embarrassed. Thanks for the correction! It's too late for me to edit. sigh....


PS: please assert an explicit license!


License added


That assumes it’s supposed to have a license. “Proprietary” is a perfectly reasonable choice.


Sure but

1. In that case posting the source code to GitHub is less useful

2. More to the point, op said: “Don't like one of Hancho's defaults? It's only 500 lines - hack it up however you like.”

I think OP simply forgot to add a license. In which case, reminding them to do so is useful.


License added.


I like it. I wrote Piku (https://github.com/piku/piku) with much the same interest in fixing some of my pains, so I get where you're coming from with this. Will drop it into one of my current projects to build ESP32 binaries :)


Please ping me back with your experience using it for ESP32 stuff.

I've already used Hancho to build FPGA bitstreams with the Icestorm tools and it was delightful, but as the author I'm not a neutral observer. :)


Sure. Right now I'm wrestling with taking an old codebase and getting it to build under PlatformIO with the right dependencies on a new machine. Once it builds once, I'll go and replicate the process under honcho.

Edit: if you use RSS, keep an eye on https://taoofmac.com, it will show up there...


TIL “honcho” is not a Native American term as I’d always assumed, but Japanese?

https://www.merriam-webster.com/dictionary/honcho


Yes! I was amazed several years ago after I had learnt the word 班長 (hanchō)… and a while later realised it sounded a bit like the phrase ‘head honcho’…and then realised that that’s actually where it came from.

There are a lot of words ending in 長 (chō) in Japanese that refer to people in positions of leadership: shachō (company director), kōchō (headmaster), kichō (captain, of an aircraft).

Another somewhat surprising example of unexpected Japanese etymology: ‘rickshaw’ comes from the Japanese 人力車 (jin-rikisha, literally “person-powered-car”). I would’ve probably expected it to have come from an Indian language, or something.


Some say hunky dory comes from "honcho doori" or 本町通り (personally I don't see how ki can come from cho)


Would be fun if true! Seems to somewhat match semantically as well.


It’s the integration with those platform-specific requirements which I find so tough. Apple wants a particular format, signed and whatnot via xcodebuild. Windows, I don’t even know. Something about universal apps on the Windows store. Android is a bit simpler if you can target the right directory with the right toolchain.

It’s these particular systems that I find the most difficult. If there were a collection of nice utilities for working with them uniformly, oh, that would be amazing.


Author here. I do most of my development in WSL but do have a few projects that I want to get working again with Windows/MSVC - I will be using Hancho for those builds and expect it to work fine cross-platform though I haven't tackled it yet.


Not really the same as app packager, but your comment reminded me of this tool: https://fpm.readthedocs.io/en/v1.15.1/


Thank you for being interesting and useful without trying to claim to be better than grandpa's old make or claiming to solve all problems.


Thanks for keeping this project simple and clean in a single file of Python.


You are welcome! Single-file tools please me.


Even though I will likely never need/want this personally - I LOVE all the posts and dialogue on this topic... which otherwise would never have entered my awareness... +1 to all


I'm looking at the example and I just don't see how it's better than a Makefile. What am I missing?


It's not "better" than a Makefile, just different.

I find it easier to understand and maintain. I've used Makefiles for decades and I still find myself looking up basic Make syntax.


Maybe that it never claimed to be better than Make.


We need more trials in this space; building is still not fun after decades of it.

Personally, I think doit (https://www.bitecode.dev/p/doit-the-goodest-python-task-runn...) is still the winner in the python space, but there is a lot of it that could be improved.


Author here. I hadn't heard of Doit, but it does look similar to Hancho. Hancho pushes much harder on the minimalism side, however.

I think I may steal Doit's "actions can be arrays of commands" feature, as that would be trivial to add to Hancho and potentially useful.


I have used pantsbuild in the past. Had a pleasant experience with it.


Reminds me of pantsbuild.


Hm, it seems like the defining feature of Bazel is high-level "macros" like cc_library() and proto_library() on top of the low-level build graph.

I looked at the examples and tutorial and it still seems more low level, in that you mention object files and link them?

    main_o = compile("src/main.cpp", "build/tut1/src/main.o")
    util_o = compile("src/util.cpp", "build/tut1/src/util.o")
    link([main_o, util_o], "build/tut1/app")
One reason I like the layering is for build variants -- dbg opt ASAN UBSAN -- something Make does very poorly. IMO you don't want to mention literal object file paths in the build config for this reason.

I use a tree layout like

    _build/obj/cxx-asan/frontend.syntax.o
    _build/obj/cxx-asan/core/process.o
    _build/obj/cxx-ubsan/frontend.syntax.o
    _build/obj/cxx-ubsan/core/process.o
and then this is abstracted with the Bazel-like target syntax

    //frontend/syntax
    //core/process
This works well - you don't have to clean when rebuilding variants, and all the object sharing between test binaries really speeds up the build.

IMO this is 100% essential for writing C++ these days -- all tests should be run with ASAN and UBSAN.
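The variant layout above can be sketched as a pure path-mapping function (hypothetical code, not the commenter's actual implementation): a Bazel-like target label plus a variant name yields a unique object path, so rebuilding a different variant never clobbers another's outputs.

```python
# Map a Bazel-like target label and a build variant to a unique object
# path, so dbg/opt/asan/ubsan builds coexist without a clean step.
def obj_path(target, variant):
    rel = target.lstrip("/")            # "//core/process" -> "core/process"
    return f"_build/obj/cxx-{variant}/{rel}.o"

print(obj_path("//core/process", "asan"))
# -> _build/obj/cxx-asan/core/process.o
```

Because the variant is part of the path rather than a global mode, test binaries for different sanitizers can share the same dependency graph and cache.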

---

I wrote a mini-bazel on top of Ninja with these features:

https://www.oilshell.org/blog/2022/10/garbage-collector.html...

So it's ~1700 lines, though some of that is our own logic, and extra features like computing preprocessed size, which I used to improve compile times.

You get build macros like asdl_library() generating C++ and Python (the same idea as proto_library(): a schema language that generates code).

And it also correctly finds dependencies of code generators. So if you change a .py file that is imported by another .py file that is used to generate a C++ header, everything will work. That was one of the trickier bits, with Ninja implicit dependencies.

This build file example mixes low level Ninja n.rule() and n.build() with high level r.cc_library() and so forth. I find this layering really does make it scale better for bigger projects

https://github.com/oilshell/oil/blob/master/asdl/NINJA_subgr...

Some more description - https://lobste.rs/s/qnb7xt/ninja_is_enough_build_system#c_tu...

Comment about Chrome/Android using the same pattern - https://lobste.rs/s/0qc6vp/arcan_0_6_3_i_pty_fool#c_ahqwky

which is explicitly in the Android docs - https://android.googlesource.com/platform/build/bazel/+/7d96...

The main thing my system doesn't have, as mentioned, is per-action build sandboxing, which Bazel has OS-specific wrappers for. That's really useful for finding missing deps. Right now it's not a big pain point, since the build will break in a slightly less obvious way if you're missing deps.


The defining feature of Bazel is sandboxing build steps so that you can't accidentally miss edges in the dependency graph. This has enormous benefits:

1. Incremental/cached builds are reliable.

2. You can reliably avoid rebuilding/testing artifacts that cannot possibly be affected by a PR.

Neither of those is possible in build systems that don't do this sandboxing. If you're doing a big project you really need these, which is why Google came up with this technique.

I'm not sure there's much value in a new build system that doesn't even get that right. We already have dozens.


Yeah that's what I mentioned in the last sentence of my comment

The reason most build systems don't have it is that sandboxing is OS-specific.

Bazel has sort of a complicated bootstrap/deployment for that reason.

I mentioned it a few times, but someone should extract the exec wrappers from Bazel as a separate project :) It's a small amount of code, though fiddly to build and distribute

There are a few other strategies, like bubblewrap, but last I heard bubblewrap is not installed by default on most distros. It needs setuid root, so it's a bit fiddly and has security implications.

But then you also need something totally different for OS X and Windows


The problem with Bazel is that a sandboxed build environment is useless unless you also have a hermetic runtime environment, and Bazel does nothing to solve this problem. While you may be able to consistently build, your runfiles tree can easily be un-runnable on any system except the one that built it.

When I was heavily using Bazel ~3 years ago we had made substantial changes to the Bazel source to try to support this, but ultimately couldn't get it across the finish line without making changes to Linux's dynamic loader. After talking to a few folks familiar with Blaze, it seems that's exactly what they've done inside Google to solve the problem.

Perhaps the situation has improved in the last 3 years, but my general experience with Bazel is that it made extremely lofty claims and delivered on very few of them.


> your runfiles tree

I don't remember using runfiles much outside of tests in Bazel, even at Google. My guess would be that runfiles are packaged up via MPM [1]; in the open-source world I would guess a container image is the closest equivalent? Then you get a hermetic runtime environment, especially because k8s/containers seem to be the default deployment platform these days. I think the Bazel OCI rules [2] should package runfiles too (I haven't checked).

[1]: https://www.usenix.org/conference/lisa14/conference-program/...

[2]: https://github.com/bazel-contrib/rules_oci


Hi, could you please elaborate on your original problem?

The binaries your Bazel system built could not be run unless the runtime environment matched the build environment?


I'm curious too. I'm guessing it's something to do with sonames & rpaths which often need fixing up during installation (CMake does this for example). I haven't ever used Bazel for distributing Linux binaries so maybe it just doesn't have that built in.

(I would go with statically linked binaries anyway tbh.)


True, if you need sandboxing you've already committed yourself to a build system 10x larger than Hancho.

If you don't need sandboxing, you can ditch a _ton_ of complexity.


Bazel's "macros" are just Python functions in Hancho. Manipulating the object files directly to link them into a binary can be wrapped up in a 2 line Python function if you want to abstract over how the link rule works.

Both of these details are covered in the tutorial.
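As a sketch of that point (hypothetical names; the tutorial's real compile/link rules may differ), the explicit object-file example from upthread collapses into a small function:

```python
# A Bazel-style "macro" as plain Python: compose compile and link rules
# so callers never spell out object-file paths. `compile_rule` and
# `link_rule` are stand-ins for the tutorial's rules, injected here so
# the sketch is self-contained.
def c_binary(compile_rule, link_rule, sources, out, build_dir="build/tut1"):
    objs = [compile_rule(src, f"{build_dir}/{src}".replace(".cpp", ".o"))
            for src in sources]
    return link_rule(objs, out)

# With stub rules that just return their output paths:
app = c_binary(lambda s, o: o, lambda objs, out: (objs, out),
               ["src/main.cpp", "src/util.cpp"], "build/tut1/app")
print(app)
# -> (['build/tut1/src/main.o', 'build/tut1/src/util.o'], 'build/tut1/app')
```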


Hm I don't see anything like cc_library(). Though I misread the intro -- it doesn't claim to be like Bazel, just less than Bazel

I saw "Bazel and Ninja" in the post, and thought it was more similar to what I had built.

Though IMO, to build C/C++, even for small projects, you need something to parse .d files, the output of gcc -M. Otherwise you need to write more build metadata manually, and you can get it wrong

And there is no enforcement of errors

For truly small projects, I just use a shell script, and no incremental build. Works great

---

And Ninja does parse .d files. The compiler is the authoritative source of header deps, because it has the preprocessor.

Ninja is very fast and small. It's meant to be generated, so I wouldn't call it verbose. I don't think anyone really writes Ninja by hand. There are at least 3 common code generators for it -- CMake, the Chrome one, and the Android one.


Hancho parses .d files.
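A deliberately minimal sketch of what parsing a `.d` file involves (not Hancho's actual implementation; real `.d` files also escape spaces and may list several targets):

```python
# Parse the "target: dep1 dep2 \" format that gcc -M emits: join the
# backslash-continued lines, then split the target from its dependencies.
def parse_depfile(text):
    text = text.replace("\\\n", " ")       # join continuation lines
    target, _, deps = text.partition(":")
    return target.strip(), deps.split()

print(parse_depfile("main.o: main.cpp \\\n  util.h\n"))
# -> ('main.o', ['main.cpp', 'util.h'])
```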


"Hanchō" is the likely origin of "head honcho".


Whenever looking at one these, I think back to the obscure but interesting "tup", and try to decide if this new thing is better. Because tup is awesome:

“How is it so awesome? In a typical build system, the dependency arrows go down. Although this is the way they would naturally go due to gravity, it is unfortunately also where the enemy's gate is. This makes it very inefficient and unfriendly. In tup, the arrows go up.”

https://gittup.org/tup/

On a more serious note, the whitepaper: https://gittup.org/tup/build_system_rules_and_algorithms.pdf


Tup is interesting, but it requires keeping track of dependencies in a SQL (sqlite?) database - already more complex than Hancho it seems.



