Hacker News | ActivePattern's comments

What I still don't understand is why they don't even make an attempt to apply overlayers, when (as the author notes) there is ample secondary evidence that they would have been present. It's not as if there isn't already some element of inference and "filling in the blanks" when reconstructing how something was painted from the scant traces of paint that survive.

This is a somewhat unfounded theory of mine, and I'd welcome any insight: I suspect this is a construction of Western restoration/preservationist theory. A lot of effort seems to go into preserving original material, not taking liberties, etc. While touring temples and museums in Japan, I got the sense that restorations were much more aggressive, with less regard for preserving material (or building "fabric") and a greater focus on using traditional techniques during restoration.

I assume you didn't read the article, since that's their exact point...

"Since underlayers are generally the only element of which traces survive, such doctrines lead to all-underlayer reconstructions, with the overlayers that were obviously originally present excluded for lack of evidence."


Maybe it's the author of the article? :P

> I can only say being against this is either it’s self-interest or not able to grasp it.

So we're just waving away the carbon cost, centralization of power, privacy fallout, fraud amplification, and the erosion of trust in information? These are enormous society-level effects (and there are many more to list).

Dismissing AI criticism as simple ignorance says more about your own.


I've never heard the caveat that it can't be attributable to misinformation in the pre-training corpus. For frontier models, we don't even have access to the enormous training corpus, so we have no way of verifying whether the model is regurgitating misinformation it saw there or inventing something out of whole cloth.

> I've never heard the caveat that it can't be attributable to misinformation in the pre-training corpus.

If the LLM is accurately reflecting the training corpus, it wouldn’t be considered a hallucination. The LLM is operating as designed.

Matters of access to the training corpus are a separate issue.


I believe it was a Super Bowl ad for Gemini last year that had a "hallucination" in the ad itself. One of the screenshots of Gemini being used showed the "hallucination", which made the rounds in the news, as expected.

I want to say it was some fact about cheese that was in fact wrong. However, you could also see the source Gemini cited in the ad, and when you went to that source, it was a local farm's 1998-style HTML homepage, and that page contained the incorrect factoid about the cheese.


> If the LLM is accurately reflecting the training corpus, it wouldn’t be considered a hallucination. The LLM is operating as designed.

That would mean that there is never any hallucination.

The point of the original comment was distinguishing between fact and fiction, which an LLM simply cannot do. (It's an unsolved problem among humans, too, and that spills into the training data.)


> That would mean that there is never any hallucination.

No it wouldn’t. If the LLM produces an output that does not match the training data or claims things that are not in the training data due to pseudorandom statistical processes then that’s a hallucination. If it accurately represents the training data or context content, it’s not a hallucination.

Similarly, if you request that an LLM tells you something false and the information it provided is false, that’s not a hallucination.

> The point of original comment was distinguishing between fact and fiction,

In the context of LLMs, fact means something represented in the training set. Not factual in an absolute, philosophical sense.

If you put a lot of categorically false information into the training corpus and train an LLM on it, those pieces of information are “factual” in the context of the LLM output.

The key part of the parent comment:

> caused by the use of statistical process (the pseudo random number generator


OK, if everyone else agrees with your semantics, then I agree.

The LLM is always operating as designed. All LLM outputs are "hallucinations".

The LLM is always operating as designed, but humans call its outputs "hallucinations" when they don't align with factual reality, regardless of the reason why that happens and whether it should be considered a bug or a feature. (I don't like the term much, by the way, but at this point it's a de facto standard).

Not that the internet contained any misinformation or FUD when the training data was collected...

Also, statements made with certainty about fictitious "honey pot prompts" are a problem; plausible extrapolation from the data should be governed more by internal confidence. Luckily, I believe there are benchmarks for that now.


Yes, as is implied by the word "improvements"


Which, as practice shows, tend to be understood differently by customers and PMs.


On the contrary, farmed fish is among the most sustainable protein sources for those not willing to go full vegetarian [1]

[1] https://ourworldindata.org/grapher/ghg-per-protein-poore


Greenhouse gas emissions shouldn't be the only factor people consider for sustainability of their food. In the case of fish, this very article talks about the issues with farmed fish. Even a plant-based diet can be filled with unsustainable sources, such as plantations that destroy endangered habitats for palm oil, or industrial farming operations that spray lots of pesticides to harm the insect population and allow lots of fertilizer runoff into natural waterways. We're still polluting and depleting resources for many many vegetarian foods in the world.

I'd argue that if we're looking for a full top-to-bottom sustainable food system, animals will play a role. But we need to be cognizant of the whole system, not playing whack-a-mole with issues.


"...among the most..."

According to your source, there are 15 sources of protein that emit less greenhouse gases (GHGs) per 100g of protein than farmed fish, including poultry and eggs, and 16 sources that emit more (including items that are not known for their protein content like coffee, apples, and dark chocolate). Being highly charitable, farmed fish is squarely in the middle.

Additionally, farmed fish emits twice the GHGs of tofu, and almost 22 times that of nuts. So just comparing placements on the list paints a misleading picture.

As for "not willing to go full vegetarian": you may as well say "not willing to stop eating fish", because they are equally unserious limitations when discussing these topics. "Not being willing" is only a slightly more mature version of a child saying "I don't want to".


I don't think it's "unserious" to recognize that >85% of the world's population eats meat.

If you're quibbling about wording, all I meant was: farmed fish and chicken are among the most sustainable meat sources.

I'm not making a statement that people should eat meat, but many people do eat meat, so it's worth comparing which meat sources are better than others. I think it would be great if more people knew that beef produces 10x the greenhouse gases that chicken/fish do.


It's not "quibbling" to correct your mischaracterization of the truth.

If you'll forgive me borrowing your logic: "I'm not saying that people should eat beef, but many people do eat beef, so it's worth comparing which beef sources are better than others."

Plant-based diets are a very good answer to the problems caused by animal agriculture. If someone takes issue with that answer, I'd need a better reason than their personal pleasure to take them seriously in the conversation.


I agree it’s worth comparing beef sources! That was my point about within-category differences and harm reduction. Saying "tofu is cleaner" doesn’t make beef comparisons pointless - just like the existence of bicycles doesn’t make car fuel economy comparisons pointless. We should compare across categories and within them, so people who aren’t switching today still choose the lower-impact option.


I hesitate to use the word "quibbling" now, but it seems like a poor use of time to compare beef when even the most environmentally-friendly beef is multiple times worse than alternatives.

I think this harm-reduction approach might make more sense from a governmental policy perspective, but is otherwise silly for us to take as individuals because we have such comparatively little influence over each other's choices. I wouldn't waste that small influence encouraging someone to make a slightly less bad choice.

The comparison of food to transportation is a bad one. Nutrients are nutrients, and everything else is personal pleasure. In other words, you can easily hit your same macros by replacing animal products with plant products without even having to change grocery stores. You cannot easily transport a mattress on a bicycle instead of a car.


You started this by objecting to my wording ("among the most") when I said fish/chicken are the most sustainable meat options. They are, by a wide margin. Beef’s footprint is roughly 10× higher, so swapping a beef meal for chicken or fish cuts ~90% of those emissions. That’s not a "slightly less bad choice".

Calling harm reduction "silly" because tofu exists just shifts the target. We can hold two thoughts at once: (1) plant-heavy diets are best, and (2) for the vast majority who aren’t going vegan tomorrow, steering from beef to chicken/fish dramatically reduces damage right now. Dismissing that because it’s not maximal purity guarantees we leave real cuts on the table.


Most recently you said, "I agree it’s worth comparing beef sources!"

So is there any hypothetical harm reduction that you believe is too small to be worth your time to encourage?


Farmed seafood is among the worst garbage you can eat: tons of antibiotics, growth hormones, and fish fed utter cheap junk, so e.g. farmed salmon meat has a composition more like pork than like wild salmon; shrimp are even worse. If you ever saw a shrimp "factory" grow pond/cage and its surroundings in a typical third-world country, where most come from, you wouldn't eat it for a long time, if ever again. Literally nothing lives around those places.

Good in theory, horrible in practice.


That take’s outdated. In the US/EU, routine antibiotics in fish farming are banned [1]. Growth hormones aren’t used in edible fish. Farmed salmon’s feed changed (more plant oils), but it still delivers high omega-3s and usually less mercury than wild [2].

[1] FDA “Approved Drugs for Use in Aquaculture” — https://www.fda.gov/media/80297/download

[2] Jensen et al., Nutrients 2020 — https://doi.org/10.3390/nu12123665


OK, that's a good development. But the overall difference in meat quality is visible to the eye: farmed salmon looks like a completely different fish from a wild one (if you can even get one), akin to the difference between, say, a boar and a domesticated pig (lean muscle vs. tons of fatty, wobbly tissue). It doesn't scream "healthy", but that may just be emotions playing an old tune.

Also, what I wrote about shrimp from third-world countries stands: I saw such a place this summer in Indonesia, and from what I've heard, the whole of Southeast Asia is exactly like that, or worse. Getting shrimp from a Western democracy with strong consumer-protection rights isn't possible in many parts of Europe; not sure about other places.


Hah, why don't you try implementing your 3 little functions and see how smart your "AGI" turns out to be.

> not a particularly capable AGI

Maybe the word AGI doesn't mean what you think it means...


There is not strong consensus on the meaning of the term. Some may say “human level performance” but that’s meaningless both in the sense that it’s basically impossible to define and not a useful benchmark for anything in particular.

The path to whatever goalpost you want to set is not going to be more and more intelligence. It’s going to be system frameworks for stateful agents to freely operate in environments in continuous time rather than discrete invocations of a matrix with a big ass context window.


I don't think you've understood the paper.

- There are no experts. The outputs are approximating random samples from the distribution.

- There is no latent diffusion going on. It's using convolutions similar to a GAN.

- At inference time, you select ahead-of-time the sample index, so you don't discard any computations.


I agree with @ActivePattern and thank you for your help in answering.

Supplement for @f_devd:

During training, the K outputs share the stem feature from the NN blocks, so generating the K outputs costs only a small amount of extra computation. After L2-distance sampling, discarding the other K-1 outputs therefore incurs a negligible cost and is not comparable to discarding K-1 MoE experts (which would be very expensive).


You are probably right, although it's not similar to a GAN at all; it's significantly more like diffusion (though maybe not latent diffusion; the main reason I assumed so is that the "features" are passed through, but these could just be the image).

The ahead-of-time sampling doesn't make much sense to me mechanically, and isn't really discussed much. But I will withhold judgement until future versions, since the FID performance of this first iteration is still not that great.


It doesn't play nice with a lot of popular Python libraries. In particular, NumPy, Pandas, TensorFlow, etc. rely on CPython's C API, which PyPy only emulates, and that can cause compatibility and performance issues.


FWIW, PyPy has supported NumPy and Pandas since at least v5.9.

That said, of all the reasons stated here, library support is why I don't primarily use PyPy (lots of libraries are still missing).


But PyPy doesn’t necessarily perform as well, and it can’t JIT-compile the already-compiled C code in NumPy, so any benefits are often lost.


A “sufficiently smart compiler” can’t legally skip Python’s semantics.

In Python, p.x * 2 means dynamic lookup, possible descriptors, big-int overflow checks, etc. A compiler can drop that only if it proves they don’t matter or speculates and adds guards—which is still overhead. That’s why Python is slower on scalar hot loops: not because it’s interpreted, but because its dynamic contract must be honored.
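As a concrete (made-up) illustration of why a compiler can't legally reduce p.x * 2 to a load and a shift: the attribute access may run descriptor code, and the multiplication may overflow a machine word, forcing promotion to a big int.

```python
class Meters:
    # A descriptor: reading the attribute runs code, so "p.x" is not a plain load.
    def __get__(self, obj, objtype=None):
        return obj._x  # could do arbitrary work here

class Point:
    x = Meters()
    def __init__(self, x):
        self._x = x

p = Point(2**62)
# The descriptor protocol fires, then the product exceeds a signed 64-bit
# word, so Python transparently promotes to an arbitrary-precision int.
result = p.x * 2
assert result == 2**63
```

A compiler can emit a fast path only after proving, or guarding at runtime, that neither of these dynamic behaviors is in play.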


In Smalltalk, p x * 2 has that flow as well, and even worse: assume the object returned by the p x message send does not understand the * message. It will break into the debugger; the developer then adds the * method to the object via the code browser, hits save, and exits the debugger with redo, and execution ends in success.

Somehow Smalltalk JIT compilers handle it without major issues.


Smalltalk JITs make p x * 2 fast by speculating on types and inserting guards, not by skipping semantics. Python JITs do the same (e.g. PyPy), but Python’s dynamic features (like __getattribute__, unbounded ints, C-API hooks) make that harder and costlier to optimize away.

You get real speed in Python by narrowing the semantics (e.g. via NumPy, Numba, or Cython) not by hoping the compiler outsmarts the language.
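A toy illustration of that narrowing (my example, not a benchmark): the NumPy version fixes one dtype and dispatches once per array, instead of paying the generic protocol once per element.

```python
import numpy as np

# Pure-Python loop: every iteration pays for dynamic dispatch,
# int boxing, and overflow checks.
total_py = 0
for x in range(1_000_000):
    total_py += x * 2

# Narrowed semantics: one fixed dtype, one dispatch, a C loop inside.
arr = np.arange(1_000_000, dtype=np.int64)
total_np = int((arr * 2).sum())

assert total_py == total_np  # same answer, very different cost model
```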


Python's JIT could do the same: it could check whether __getattribute__() is the default implementation and replace the call with a direct attribute load. This would work only for classes that have not been modified at runtime and that do not implement a custom __getattribute__.
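That guard can be sketched in plain Python (the helper names are made up; a real JIT would do this in machine code and would also have to deoptimize if the class is mutated later):

```python
def can_specialize(cls):
    # Guard: the class still uses the default attribute lookup machinery.
    return cls.__getattribute__ is object.__getattribute__

class Plain:
    def __init__(self):
        self.x = 21

class Tricky:
    def __getattribute__(self, name):
        return 99  # arbitrary code runs on every attribute access

def load_x(p):
    if can_specialize(type(p)):
        return p.__dict__["x"]  # "inlined" fast path: direct dict load
    return p.x  # generic path honors the full semantics

assert load_x(Plain()) == 21
assert load_x(Tricky()) == 99
```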


People keep forgetting about image-based development, the debugger, metaclasses, messages like becomes:, ...

That is to say, every dynamic feature that can be used as an excuse for Python, Smalltalk and Self have it, and doubled up.



Edit-and-continue is available in lots of JIT-runtime languages.


First, we need to add the word "only": "not ONLY because it’s interpreted, but because its dynamic contract must be honored." Interpreted languages are slow by design. That isn't bad; it's just a fact.

Second, at most this describes WHY it is slow, not that it isn't, which is my point. Python is slow. Very slow (especially for computation-heavy workloads). And that is okay, because it does what it needs to do.

