
An integer is just a concatenation of bits. Floating point appears more complicated, but from an information theory perspective it is also just a concatenation of bits. If, for the sake of argument, one replaced a 64-bit int with 64 individual bits, that's the same amount of information, and a structure could then either recreate the original 64-bit int, or use the 64 bits more efficiently by choosing from the much larger set of possible ways to use those resources.

Trits are helpful for neural nets, though, since they really love signs and they need a 0.

So, from the perspective that it's all just bits in the end, the only interesting thing is how useful it is to arrange those bits into trits for this particular algorithm, and that the algorithm seems to be able to use them more effectively that way than as raw bits.
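To make that concrete: a trit carries log2(3) ≈ 1.585 bits of information, so five trits fit in a single byte (3^5 = 243 ≤ 256). Here's a minimal Python sketch of that packing; the function names are mine, not from any particular library:

    import math

    # A trit carries log2(3) ~= 1.585 bits of information.
    BITS_PER_TRIT = math.log2(3)

    def pack_trits(trits):
        """Pack 5 trits from {-1, 0, +1} into one byte: 3**5 = 243 <= 256."""
        assert len(trits) == 5 and all(t in (-1, 0, 1) for t in trits)
        value = 0
        for t in trits:
            value = value * 3 + (t + 1)  # map {-1,0,+1} -> {0,1,2}, base-3 digits
        return value  # 0..242, fits in one byte

    def unpack_trits(byte):
        """Recover the 5 trits from a packed byte."""
        trits = []
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
        return trits[::-1]

    weights = [-1, 0, 1, 1, -1]
    assert unpack_trits(pack_trits(weights)) == weights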

This may seem an absolutely bizarre zigzag, but I am reminded of Busy Beavers, because of the way they take the very small primitives of a Turing Machine, break them down to the smallest pieces, then combine them in ways that almost immediately cease to be humanly comprehensible. The selection mechanism for what appears is completely different, but it turns out Turing Machine states can do a lot "more" than you might think from looking only at human-designed TMs. We humans have very stereotypical design methodologies and they have their advantages, but sometimes just letting algorithms rip can result in much better things than we could ever hope to design with the same resources.




> So, from the perspective that it's all just bits in the end, the only interesting thing is how useful it is to arrange those bits into trits for this particular algorithm, and that the algorithm seems to be able to use them more effectively that way than as raw bits.

Thank you. I find many other things interesting here, including the potential implications for hardware, but otherwise, yes, I agree with you, that is interesting.


This sort of breakdown also reminds me of the explanation of why busy beavers grow faster than anything humans can ever define. Anything a human can define is a finite number of steps that can be represented by some Turing machine of size M. A Turing machine of size N > M can then embed that size-M machine as a subcomponent and grow faster than it. Either that larger machine is the busy beaver for size N, or it grows slower than the busy beaver for size N. Either way, the busy beaver for size N grows faster than whatever the human defined that was captured by the Turing machine of size M. This explanation is what helped me understand why the busy beaver function grows faster than any operator that can be computably defined (obviously you can define an operator that references busy beaver itself, but busy beaver is not computable, and so any operator defined using it isn't computable either).
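For what it's worth, here is one compact way to write that argument down (my own formalization, not a quote from anywhere, with c standing for the constant number of extra states needed to embed one machine inside a larger one):

    % Any total computable f is computed by some machine; a machine that
    % writes n on the tape and then runs that computation needs about
    % n + c states, so the step-counting busy beaver function satisfies
    \forall f\ \text{computable},\ \exists c,\ \forall n:\quad BB(n + c) \ge f(n)
    % i.e. BB eventually dominates every computable function.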

The bit about floating point numbers just being a collection of bits interpreted in a certain way helps make sense of why a bigger model doesn't need floating point at all.
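You can watch that reinterpretation happen directly with Python's standard struct module; this is just a sketch of the idea, nothing model-specific:

    import struct

    # Pack -1.5 as a 32-bit float, then read the identical bytes as a uint32.
    x = -1.5
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    print(f'{bits:032b}')  # 10111111110000000000000000000000
    # sign=1 | exponent=01111111 (127, i.e. 2**0) | mantissa=100... (1.5)

    # Reading the same bits back as a float recovers the original value.
    assert struct.unpack('<f', struct.pack('<I', bits))[0] == x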


> We humans have very stereotypical design methodologies and they have their advantages, but sometimes just letting algorithms rip can result in much better things than we could ever hope to design with the same resources.

Yes. Though here the interesting point is not so much that these structures exist, but that 'stupid' back-propagation is smart enough to find them.

You can't find busy beavers like that.



