I think you're misunderstanding.
AI is not a black box, and neither is a compiler. We (as a species) know how they work and what they do.
The 'black boxes' are the theoretical systems non-technical users are building via 'vibe-coding'. When your LLM says we need to spin up an EC2 instance, users will spin one up. Is it configured correctly? Why is it configured that way? Do you really need a VPS instead of a Pi? These are questions the users building these systems won't have answers to.
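To make that concrete, here is a minimal sketch (in Python with boto3; all IDs and names are placeholders I made up) of the kind of call an LLM might hand a user. Every argument is a configuration decision the vibe-coder is implicitly delegating:

```python
import boto3  # AWS SDK for Python

ec2 = boto3.client("ec2", region_name="us-east-1")  # why this region?

# Each keyword argument below is a question the user can't answer.
ec2.run_instances(
    ImageId="ami-00000000000000000",   # placeholder -- which OS image, and why?
    InstanceType="t2.micro",           # why this size? would a Pi have done?
    MinCount=1,
    MaxCount=1,
    KeyName="my-key",                  # placeholder -- who holds this key?
    SecurityGroupIds=["sg-00000000"],  # placeholder -- what ports does this open?
)
```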
If there are cryptographically secure program obfuscation (in the sense of indistinguishability obfuscation) methods, and someone writes some program, applies the obfuscation method to it, publishes the result, deletes the original version of the program, and then dies, would you say that humanity "knows how the (obfuscated) program works, and what it does"? Assume that the obfuscation method is well understood.
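For intuition, here is a toy obfuscator (my own illustration, not a real scheme, though for a tiny fixed input domain it is in fact a perfect indistinguishability obfuscator): compile the program into its full input/output table. The table is identical for any two programs computing the same function, so it reveals nothing about the deleted source:

```python
def obfuscate(program, n_bits):
    """Replace `program` (a function on n_bits-bit ints) with its lookup
    table. Exponential in n_bits, so only viable for toy domains."""
    table = [program(x) for x in range(2 ** n_bits)]
    return lambda x: table[x]

# Two syntactically different programs computing the same function...
def original(x):
    return (x * 3 + 1) % 8

def rewritten(x):
    return (x + x + x + 1) % 8

# ...produce identical obfuscations, so the published table can't tell
# you which source it came from once the original is deleted.
obf = obfuscate(original, 3)
assert all(obf(x) == rewritten(x) for x in range(8))
```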
When people do interpretability work on some NN, they often learn something. What is it that they learn, if not something about how the network works?
Of course, we (meaning humanity) understand the architecture of the NNs we make, and we understand the training methods.
Similarly, if we have the output of an indistinguishability obfuscation method applied to a program, we understand what the individual logic gates do, and we understand that the obfuscated program was a result of applying an indistinguishability obfuscation method to some other program (analogous to understanding the training methods).
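As a sketch of that distinction: given an obfuscated circuit as a flat list of gates, we can say exactly what each individual gate does, yet the overall function stays opaque short of exhaustive analysis. The circuit below is a made-up stand-in, not the output of any real obfuscation method:

```python
def nand(a, b):
    """A single gate we understand completely: NAND of two bits."""
    return 1 - (a & b)

# A "circuit" as a flat gate list: each pair (i, j) NANDs wires i and j
# and appends the result as a new wire. Gate-level meaning is transparent;
# what the whole circuit computes is not obvious from inspection.
gates = [(0, 1), (0, 2), (1, 2), (3, 4), (5, 5)]

def run(inputs):
    wires = list(inputs)
    for i, j in gates:
        wires.append(nand(wires[i], wires[j]))
    return wires[-1]  # the last wire is the output

print(run([1, 1, 1]))  # we can evaluate it, yet "what it does" needs analysis
```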
So, like, yeah, there are definitely senses in which we understand some of "how it works" and some of "what it does", but I wouldn't say of the obfuscated program "We understand how it works and what it does."
(It is apparently unknown whether there are any secure indistinguishability obfuscation methods, so maybe you believe that there are none, in which case maybe you could argue that the hypothetical is impossible and therefore the argument is unconvincing? I don't think that would make sense, though, because the argument still works as a counterfactual even if there are no cryptographically secure indistinguishability obfuscation methods. [EDIT: Apparently it has in the last ~5 years been shown, under relatively standard cryptographic assumptions, that there are indistinguishability obfuscation methods after all (Jain, Lin, and Sahai's 2021 result).])