Hacker Newsnew | past | comments | ask | show | jobs | submit | scapp's commentslogin

It's somewhat a coincidence. In the paper, the dimension has to be larger than this constant K which is shown to be no larger than 1728. This is a fairly crude estimate that comes from multiplying a bunch of small numbers together, so the fact that it ends up being 12³ is not unimaginable. Then taking the dimension to be one larger than this estimate on K, we get 1729 = 12³ + 1³ (= 9³ + 10³).

However, 1728 isn't the minimum possible value for K. With more precise estimates, sketched in the paper, we can bring it down to 8 + ε. So assuming that this finer estimate works, using a 9 dimensional Fourier transform would also make the algorithm be O(n log(n)).


This is the easiest of the paradoxes mentioned in this thread to explain. I want to emphasize that this proof uses the technique of "Assume P, derive contradiction, therefore not P". This kind of proof relies on knowing what the running assumptions are at the time that the contradiction is derived, so I'm going to try to make that explicit.

Here's our first assumption: suppose that there's a set X with the property that for any set Y, Y is a member of X if and only if Y doesn't contain itself as a member. In other words, suppose that the collection of sets that don't contain themselves is a set and call it X.

Here's another assumption: Suppose X contains itself. Then by the premise, X doesn't contain itself, which is contradictory. Since the innermost assumption is that X contains itself, this proves that X doesn't contain itself (under the other assumption).

But if X doesn't contain itself, then by the premise again, X is in X, which is again contradictory. Now the only remaining assumption is that X exists, and so this proves that there cannot be a set with the stated property. In other words, the collection of all sets that don't contain themselves is not a set.


M.A. was the highest degree available in the UK at the time [1]. The closest equivalent to a PhD program might be the Prize Fellowship that Hardy had from 1900 to 1906, though it's certainly not one to one. Don't get the impression that Hardy was done with his education when he got his masters degree.

[1] https://www.economics.soton.ac.uk/staff/aldrich/Doc1.htm

[2] https://royalsocietypublishing.org/doi/10.1098/rsbm.1949.000...


Interesting. The Wikipedia page for his contemporary Littlewood lists a Doctoral Advisor.

https://en.wikipedia.org/wiki/John_Edensor_Littlewood

Edit: The Mathematics Genealogy Project lists Littlewood as an M.A. as well.

https://www.mathgenealogy.org/id.php?id=10463


Nothing wrong with starting at 1 for induction, but yes, having an additive monoid is nice (still get a multiplicative monoid with N*)


*Bill Odenkirk. Bob is his brother


Nice! I wondered if someone would beat that TAS level anyway.


It's easy to find the UUID associated to a nickname. For example, here's [0] the one for `accrual`

[0] https://namemc.com/profile/Accrual.1


It's always fun to make fun of cranks. Thanks for linking that. The author really needs to find the right statement of what they call the Nested Interval Theorem. I cracked up at the complete misuse of it in the "Interval Argument for the Rationals" section


Kind of disappointing that it doesn't have the victory screen. But man, my fingers still remember that combo you have to do for those 4 wide areas.


2 line explanation of where e comes from here: to represent a number x in base b, you need roughly log_b(x) digits. If you weight that by the number of different digits in base b, you get b * log_b(x) = b * log(x)/log(b) = b/log(b) * log(x) (where now the log is any base you care to choose).

So assuming x > 1, this is minimized precisely when b/log(b) is minimized. The derivative of b/log(b) is (log(b) - log(e))/log(b)^2, so this is zero when b = e. (The second derivative is 1/(e log(e)) > 0, so this is a minimum).


Why should we weigh by the number of different digits? I.e. is there an argument why the cost of a single digit is linearly proportional to the number of values it can hold?

To me this seems like the weak point in the argument.


Why does the number of digits in the base should matter?


It's called radix economy. Different radices have different efficiencies as defined by the ratio of number of digits in a radix alphabet over the number of placeholders required.

Simple example, base 1 is obviously inefficient once counting past 1. Similarly base 1 million is inefficient (until you're counting in the bazillions)


If you can store one digit of a number base million in one physical element then it's exactly as efficient when it comes to storage of digits as binary. One element per one digit.

But it's about 20 times more efficient when it comes to storage of whole numbers because you can store number up to a million in a single physical elements while binary needs 20 elements.


I see what you're trying to say, like I understand the intuition, but like I was saying earlier this is an already understood topic https://en.wikipedia.org/wiki/Radix_economy and that point has been addressed. Sorry, not trying to be a jerk. Just wanted to point that out in case you're interested in this subject.

In the example you gave, I'm assuming by 'physical element' you mean 'placeholder' or digit. Storing more numbers in a single placeholder seems like you're just getting efficiency for free, but that's not how information works. You have to come up with a unique symbol for one million numbers (0 - 999,999). Which you have to pay for.

base 64 is a more realistic example. With 64 characters per digit, it may seem like it's more efficient, since you require less digits to express the same number as base 10 or 2, but you still have to encode 64 unique characters, and it ends up being less efficient. That doesn't mean less efficient is worse. It just means it is more specialized and used for different things. For example, base64 gets used when you want to encode information in a small amount of space. Otherwise, for the actual storage and computation of data, lower bases are preferred, and base 64 is still obviously stored as binary.

For what it's worth, the base integer with the best radix economy is 3, followed by 2.


Thanks. I should have googled this when you mentioned it.

I see that radix economy makes some sense if you assume that the "cost" of the "element" is not fixed but linearly proportional to the number of different states it needs to hold. It was apparently linear for the computers built with triodes but log2(n) to hold n states seems more realistic in electronics. Maybe little more for error resiliency.


I think the b factor is due to the analog representation: if the base has b values, you use on average b/2 voltage for representation. I would think the mean square value in might have been more appropriate though? (as voltage energy losses usually are V^2 / R or C V^2)

Repeating the calculation (for k bits and b values) using square values, you get E = b^2 * log_b(k) and dE/db = 2b/ln(b) - b^2 / (ln(b))^2 / b = b/ln(b) * (2 - 1/ln(b)). Setting dE/db to 0, we get 2 - 1/ln(b) = 0, ln(b) = 1/2,

b_opt = sqrt(e) = 1.6487...


In our computers, to hold b states we are using elements which cost is not proportional to b or b^2 but log2(b).

Which base is optimal then? 2? Any?


That's true, if you use binary encoding to represent the b states. I were referring to base b representation, in which by definition we use b states to represent log2(b) bits: for example, 4 states to represent bits 00,01,10,11, which could correspond to voltage levels 0V,1V,2V,3V of a variable (analogous for other values of b).

In this case, the average voltage used would be the average of 0, 1, 2^2, 3^2, which is 0+1+...+(b-1)^2 / b ~ b^2 (proportional to b^2). I showed that in this case in theory the "optimal" base would be sqrt(e), but I can't think of simple representations of a variable with less than 2 states (you could have say 2 variables representing less than 4 states, for example the 3 pairs { 0V/1V, 1V/0V, 0V/0V }, but that seems a tad complicated![1]), so the least non-trivial integer base is just 2.

I think in reality this kind of argument doesn't apply to real computers in a straightforward way, because there are many other factors at play than what I would call a 'dynamic power consumption' associated with the square voltage. There are things like leakage currents (that consume some fixed power per transistor) and other effects that significantly complicates things, so that such simple claims of being "optimal" don't apply (so I agree with the other comment about avoiding absolute claims!). But they can inspire designs and we can see if there are any advantages in more realistic cases.

[1] Further analysis: In our example, the 3 voltage pairs have mean square voltage 1/3. If we instead use the pairs { 0V/0V, 0V/1V, 1V/0V, 1V/1V }, the mean squared voltage is 1/2. In the first case, we encode log2(3)/2 bits of information per voltage value, in the second, 1 bit per value. So the energy per bit is (energy per value)/(bits per value) = (1/3)/(log2(3)/2) = 0.420... for the first case, and 0.5 for the second case. Unfortunately, it's a loss!

As an exercise, I believe that if we delete the state 1V/1V/.../1V of a k-voltage tuple, for large enough k-tuple, we come out ahead (probably not by much), although again that's not necessarily useful in practice.


Correction: I've just realized I've drawn an incorrect conclusion, the energy per bit in the case I've shown is indeed lower than the standard 2 bit encoding, so it's actually a win! Again, for a myriad reasons, I'm not sure you'd want to use that encoding in real life, but it does consume a little under 20% less power in terms of V^2 losses.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: