Hacker News new | past | comments | ask | show | jobs | submit login
Wave Function Collapse library in pure C (github.com/krychu)
136 points by graderjs on Jan 5, 2022 | hide | past | favorite | 84 comments

Another explanation which I found useful: https://www.gridbugs.org/wave-function-collapse/

I feel like the algorithm is maybe poorly named. At first I thought this was a tool to help with computational quantum physics.

Wave Function Collapse is actually the same (or essentially identical) as Model Synthesis[1]. IMO that is a better name.


Taking a look at the comparison paper [0] provided on that site, there does seem to be a significant difference. The images generated with MS are more prone to long horizontal lines while WFC produces larger blocks of texture. This is hard to spot with more homogeneous textures like pipes, but it's more apparent with something like the grey cat faces, where WFC generates a variety of shapes and MS mostly makes very long ones. Going back up to section 2 I found the explanation: "Model synthesis sweeps through the grid in scanline order. WFC chooses the lowest entropy cell." The results from WFC look more natural and varied to me so I'd call this a significant advance.

But it does seem appropriate to say they're variations of the same algorithm. As for the name, I don't like either: Wave Function Collapse is based on a silly pop-science interpretation of quantum mechanics, and Model Synthesis is extremely generic and forgettable. And it's not even a synthesis, more of a filtering process. I guess I'd stick with Wave Function Collapse.

[0] https://paulmerrell.org/wp-content/uploads/2021/07/compariso...

I thought the same - though it actually gets its name from wave function collapse in quantum physics, as each pixel has multiple potential positions it can collapse to, leading to the procedurally generated map

As someone who has actually studied physics (including quantum mechanics) I find this analogy extremely poor. Keeping multiple potential candidates which you gradually reduce into a single solution is a very common approach for solving a variety of problems (e.g. sudoku solver). This is not at all what makes the wave function interesting or useful in quantum mechanics. And the whole concept that you iteratively collapse each "wave element" isn't a thing which is present in quantum mechanics at all.

Oh well, I guess people like the fancy name.

> Oh well, I guess people like the fancy name.

The way "quantum leap" is used by laypeople springs to mind too; literally the opposite of what they mean (not only tiny but random).

Quantum does not mean "small and discrete", it only means "discrete", as in "quantisation" - fitting values into a discrete defined set. For instance in mathematics the operation of "rounding" quantises fractions into the set of integers.

Yes, but the quantum leaps in physics are small.

I think it is (in this usage) meant as the gain in human understanding of the photoelectric effect (Einstein 1905), specific heat, diamagnetism, and so forth once quantum mechanics began development. The leap from classical mechanical theories being fundamental to quantum mechanical theories being fundamental (with classical physics emerging from that), in other words, was a big shift even though subatomic particles themselves, and their bound states, are generally very small.

Random doesn't really enter into it. One can get that in a firmly non-quantized theory, as in one of Einstein's other 1905 papers, https://en.wikipedia.org/wiki/Brownian_motion#Einstein's_the... https://history.aip.org/exhibits/einstein/essay-brownian.htm https://physicsworld.com/a/einsteins-random-walk/

is it though? I've heard the argument, but to me it still makes sense as it is used colloquially. It is a discrete jump forward, thus if you are not talking about electrons and just the latin definition can be an arbitrary sized quanta if defined to an an appropriate set. a quantum leap between scientific epochs is quite large for example

also just reread your comment. I don't agree with the definition "small and random", it is smallest unit of measurement an arbitrary system magnitude can change

Here's my interpretation: a classical leap is when you are somewhere, you expend a bit of energy you have, and you end up somewhere else as a result. Basically, you do something and something happens.

A quantum leap, on the other hand, happens when you are somewhere just sitting, and boom, now you're somewhere else. It just happened randomly, without any energy input from you. So to bring this back to the idiom, an example of a quantum leap would be this: "I had a dream yesterday that gave me the insight I needed to finish my proof of XYZ theorem!"

The main example of a quantum leap that comes to mind is between energy levels of an electron of an atom, and that does involve emitting or absorbing a photon, carrying the energy that needs to be accounted for.

I don’t think what you’ve said is correct.

It's also not clear what this algorithm has to do with quantum physics. Maybe that's how it was discovered or it has applications to physics, but the algorithm itself doesn't require any advanced math or have anything to do with the wave function. It can be described using elementary combinatorics.

It's reasonable to be surprised when a computer science term uses a metaphor, but I don't think it's fair to call it a poor name unless you also object to almost every other name in programming.

Strings have nothing to do with textiles. And, very confusingly, they aren't even related to fibers or threads. Zippers aren't related to textiles either.

Trees and bloom filters are not related to botany. Heaps and garbage collection are not about municipal waste management. Graph coloring, despite the term, has nothing to do with either pictures or pigments. Red-black trees aren't colored or botanical.

Bubble sort does not involve fluid dynamics. Neither does bucket or cocktail shaker sort for that matter.

Circular buffers are not round. You can't eat a spaghetti stack or poke your finger on a cactus stack.

Very naive question but is it even possible to do quantum experiments with code on a regular computer? Anyone knows if it's been tried?

Yes. It's just an equation. The complexity of quantum system that can be simulate is limited.


From the above link you can look single slit, double slit, triple slit, step, spike, or load an arbitrary image as a potential and do experiments in 2d box.

>This WebGL program simulates the quantum mechanics of a single particle confined in a 2D box, where inside this box the user can create new potential barriers and scatter Gaussian wavepackets off them. The full instructions are found here.


btw. Wave function collapse in quantum physics is completely speculative phenomenon. There is only apparent wave function collapse.

> btw. Wave function collapse in quantum physics is completely speculative phenomenon. There is only apparent wave function collapse.

To be more precise, the Born rule non-linear adjustment of the wave function to a single real value after a measurement is strictly necessary for QM to match experiments. Whether this should be interpreted as a physical phenomenon of wave function collapse, or as entanglement with the environment (MWI), or as an update of probabilities for hidden variables (Pilot wave) or some other phenomenon is speculative, but the wave function must be "collapsed" to a single real value after a measurement to correctly predict experimental results.

Adjustment is just cutting interference terms from equation. Mathematically apparent wave function collapse is caused by quantum decoherence.

Wave-function collapse as a priori process is just speculation. Finding that it actually happens would be new physics.

However you put it, [to a classical observer] the wave function still "collapses" after a measurement. This is most famously seen by adding a detector inside one of the slits for the double-slit experiment: the original wave function is not consistent with the experiment, you have to update the wave function after the interaction (or lack of interaction) with the detector.

Sure, in MWI the wave function of the universe never collapses, but something similar still happens for "parts" of the universal wave function.

Quantum decoherence may give you a diagonal density matrix but you still need to "collapse" that to a single outcome in some way or another.

In which case, where does the Born rule come from?

As far as I can tell, that is just explaining the numeric value of the Born rule, not the wave function collapse/update. I have not found any claims that Gleason's theorem (which I was unaware of, so thank you for pointing it out!) solves the measurement problem.

Huh. I can't believe I missed this. Thanks.

In quantum computing the consequence of the theorem (as I understand it) is that once you have asked every yes/no question there is to ask about the state, the state has collapsed.

I didn't think you needed the theorem for that; quantum entanglement behaves very similarly to information theory (specifically, information provenance).

If you like that simulation you might also like my own webGL Gross-Pitaevskii Equation (GPE) solver. It uses RK4 to simulate a 2D box of ultracold atoms undergoing Bose-Einstein Condensation.

The GPE models a condensate as a single-particle quantum wavefunction with a non-linear form of the Schrödinger Equation, so you get some interesting behaviour from the non-linearity while the simulation remains computationally feasible.

You can interact with the potential term by clicking and dragging inside the 2D box.


You can simulate quantum dynamics but the number of bits required is exponential with respect to the number of qubits simulated. So basically you can simulate really simple quantum systems but it becomes pretty much intractable when the system you are simulating has many parts.

Sure. We’ve been doing quantum theory with pencil and paper for over a century!

This gets brought up every time its mentioned. And I think its interesting. It's only tangentially related to actual physics. But parallelizing the WFC for large inputs might be exactly the kind of problem QC assists.

Reminds me of the 5-star WFC code on github: https://github.com/merrell42/model-synthesis/blob/master/src...

int***** support;

see https://paulmerrell.org/wp-content/uploads/2021/07/compariso...

Pierre Terdiman from nvidia is doing some experiments using it, for Omniverse level generation:


eli5 int*** support;

Whats the matter? Never used 3+ levels of indirection on a variable before? /s

Hah, this is actually a FiveStarProgrammer with 5 indirections, rather than https://wiki.c2.com/?FiveStarProgrammer

I've never heard this referred to as "Wave Function Collapse" before. Isn't this just constraint satisfaction being solved via backtracking? Like this is the standard way of solving the N-queens problem. Make a random choice, propagate the constraints, and repeat, backtracking if you reach a contradiction.


The original implementation[0] has a comprehensive description in the README file.

[0] https://github.com/mxgmn/WaveFunctionCollapse

After reading [1], I'm not really sure backtracking is even an integral part of WFC, as apparently it sometimes fails to find a solution and needs to retry from start. Backtracking seems to be more of an optional add-on.

[1] https://paulmerrell.org/wp-content/uploads/2021/07/compariso...

I think the statistical aspect is an important part of the algorithm; it's trying to match the statistics of the input data. I guess you can think of it as an approximate sampler for Markov Random Fields.

I tried to use this to generate Numberlink puzzles. As the input I gave a solved image from https://github.com/thomasahle/numberlink . However I never even got to the "progress screen" / "cells collapsed" count. I guess the input image has to be very small for this to work well.

I also tried generating some large images (like 1000x1000 pixels output, but small 64x64 input), but it only collapses around 100 cells/second, so this still takes about 3 hours assuming no contradictions are encountered.

Works well for small inputs and outputs though!

I wonder why the implementation is all in a header.

This is a typical idiom frequently seen in modern C and C++ codes. As https://github.com/krychu/wfc#how-to-use-the-library says, you need to define a macro in a C file in order to "expand" the actual code there.

Having the code in the header gives a bit more flexibility considering the file layout. IMHO this is an awkward consequence of the missing de-facto-standard in C/C++ build systems.

How is it more flexible exactly?

I hate this idiom every time I come across it. I like to look at the header file to see the interface and important documentation, and this just obscures it. Depending on your compiler it can make debugging a huge pain as well.

> I like to look at the header file to see the interface and important documentation

This still works nicely: the important stuff (documentation and public interface) is at the top, followed by the 'unimportant' implementation at the botton of the file.

STB-style headers are a bit different from typical C++ 'single header libraries' in that they put the declaration and implementation into separate sections in the header file (with the implementation inside an #ifdef/#endif pair), while C++ headers are usually a wild mix of class declarations intermixed with implementation code in template and inline functions (which is indeed harder to read).

I don't quite understand how debugging is affected? E.g. there are (usually) no inline functions in STB-style headers.

It does seem that 'modern' idioms tend to just be doing the same thing in a more obfuscated manner. Just a .c/.h pair would have been easier to add to your source.

This is a valid point. I was on the fence between doing .c/.h pair and a single .h. But finally took the inspiration from: https://github.com/nothings/stb.

Overall I think the code in the library is good, this was just the thing that stuck out to me most. Congrats on getting shared on HN.

That would be easier if the most popular build systems weren’t terrible

If the library is installed separately, the terribleness is about adding "-lfunkylib" to your linker command.

If you embed the source in your project, the terribleness is about adding to your Makefile:

  funkylib.o: funkylib.c
       $(CC) $<

  myfinalexe: ... funkylib.o ...
       # your existing executable creation (linking) command

Or I don't know, people actually bothered to learn them.

Learning something doesn't stop it being terrible. One time I ran `cmake`, and forgot to run `sudo -k` beforehand. It reconfigured my system.

That has more to do with sudo than cmake.

This might be an unpopular opinion, but compiling a program into a local directory shouldn't try to install packages globally. `make install`, I could understand doing that.

I had no idea that cmake would do this, after reading quite a lot of things about cmake v.s. make and how to write various makefiles for them. I posit that the C build tools are fundamentally hard to comprehend.

Any build tool can do whatever they feel like.

Blindly trusting it will lead to the same outcome, regardless of the programming language ecosytem.

Sure, but should it? Do you really want nasal demons from your build tool?

Due to the success of build tools like Gradle, build.rs or any other one that packages a Turing complete language to steer a build, apparently many want such daemons on their homedir.

Just because you can do something, doesn't mean you should. A `build.rs` is supposed to be for compiling non-Rust code and stuff like that, not mucking about with the package manager – and by and large, people don't use it for mucking about with the package manager.

People use it to muck with the host OS in any possible way, unless one bothers to do the necessary code audit.

Besides that was just one example, there are plenty of them with turing complete languages.

Its a slow to compile idiom. The only reason to use it is if you expect people to download the header and plop it in. Basically its an end-run around build systems.

In my tests with STB-style headers written in C between a few thousand to a few tens of thousands of line of code, the compilation speed difference between including the header with disabled implementation, versus including with the implementation section removed from the file is absolutely negligible. This is because the implementation block is already skipped in the preprocessor, and never parsed by the compiler frontend.

Again, this is different from typical C++ single-header-libraries (e.g. pretty much all C++ stdlib headers) where the implementation code is inlined or in template code.

This is a sad idom followed in C and C++ libraries that cannot be bothered to learn build tools, dealing with C and C++ as if they were scripting languages.

It's a completely valid and useful idiom. Whatever build system you use, it's trivial to add these kind of libraries to the project.

The fact that one has to learn complex build tools (and often multiple ones), is the sad thing here. Luckily there are also unity builds, which are extremely handy in many contexts, because they are so fast and easy to use between different platforms without having to deal with annoying external tools.

Have you ever tried using a unity build?

Another nonsense, no I haven't ever bothered with it.

A hack designed by those that cannot be bothered to modularize a build with binary libraries.

It definitely is a hack, but it's not nonsense. The speedups on large projects are significant, even on modularised ones.

Not entirely. If you think about it, a resulting C binary is just data and text - it’s all compiled into a single unit of globals and functions, much like a unity build layout. Regarding optimization it allows for LTO without the need for a link time optimizer. Regarding convenience, it spares one from function prototypes and using Make or CMake. Should one use Make to speed up build times it’s as simple as tossing the unity code you’re not working on into a precompiled header; this primes the compiler for a quick compilation of the newly written code. As for design, I find unity builds resemble the design mind of a mechanical, civil, or electrical engineer - a single assembly (main.c) incorporates sub assemblies (*.c) like a CAD designer would design SolidWorks parts for a vehicle or spacecraft

I rather prefer the software engineering mind to use binary libraries for what they were designed for.

You don't need an elaborate build system to add a single .c file to your project

and now you need to add a .h to where ever you need to call those function. This .h will need to know if it got included multiple times, and so isn't just a trivial declaration.

It's just easier to have a single .h you include.

>to add a .h to where ever you need to call those function

No, it can be in a single place and you can just #include it.

>This .h will need to know if it got included multiple times, and so isn't just a trivial declaration.

This is the point of using header guards.

>It's just easier to have a single .h you include.

Depending on the build tool it's the same or only marginally easier (you save like 10 characters)

All code in the header leads to a lot of duplicated work for the compiler, which means slow build times. You should avoid this unless it is absolutely necessary for performance to inline methods.

That work will be done by the preprocessor and should be fairly quick given it doesn't compile the duplicated code and only removes it.

It may still generate code for function definitions, which are deduped by the linker, so a lot of the code can still go through the whole compiler.

An STB-style header with the implementation disabled (which is the default) looks exactly the same to the compiler as a regular C header which only contains public API declarations (e.g. just struct declarations and function prototypes, and most importantly, no inline code).

All code that would otherwise live in .c files is between an #ifdef/#endif block which is only activated in a single compilation unit in the whole project.

Not sure how this approach would lead to redundant "function definitions" which would need to be deduped by the linker. The only overhead is in the preprocessor for skipping the implementation block, but that happens pretty much at IO speed - it's not comparable with the parsing overhead in typical C++ headers with template and inline code.

Sure, after brushing up on this "stb style", then I see what you mean.

Still, it seems like an ugly kludge to cope with a breathtakingly antiquated way of doing things.

> with a breathtakingly antiquated way of doing things.

I think it's mainly a fix/workaround for the breathtakingly antiquated build systems in the C/C++ world ;)

In the end, STB-style single-header vs. a single .h/.c pair is not all that different, both are equally straightforward to integrate into a project.

The actual problem are libraries made of dozens/hundreds/thousands of header and source files and coming with their own complex build system setup.

Why should it be any more complicated than that?

What ever happened to keeping it simple?

Things can be locally simple but have complex and chaotic implications. E.g. just by pushing the complexity up a level and dumping it on literally everyone else. I've learned over the years that complexity sometimes has a right place and a wrong place. (Most times complexity ends up everywhere, TBH.) I think resolving references between different parts of code in different compilation units is definitely within the purview of a compiler/build system, and should not be up to every programmer to flail at poorly with #ifdefs.

I've come across a lot of C programs that should be libraries, but they're written as an executable tool. They tend to assume that they're the only program in existence, never free their memory, use non-constant globals, stuff a whole bunch of logic into main(), etc.

By contrast, wfc is a breath of fresh air. The good stuff is in wfc.h, and the tool is in wfctool.c, which serves as an example for people who want to use the library. Consumers of the library have the option of writing a thin wrapper to produce an object file, or directly including it, if they prefer.

If this was a gargantuan library, it would make more sense to dicker about what's going into your build. But it's not a gargantuan library; it's tiny, and has the appropriate guard to prevent multiple compilations.

It is an atrocious fad imported a few years ago by young programmers coming from script languages, because they don't grasp modularity à la C and adding -lthatlib in a Makefile or a build script seems out of their reach :-(

see also Procedural Worlds from Simple Tiles - https://news.ycombinator.com/item?id=29866062

How about something like:


Run that in an infinite loop.

Anyways this post ignited some deep contemplation

Applications are open for YC Summer 2023

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact