Isn't the point closer to: humans simply go "hey, that seems to be taking a little long" when a program doesn't halt, so why couldn't a machine? Basically, a fairly obvious constraint on the solution space is "completes in less than N wall-clock time".
You can definitely detect a portion of halting machines this way, but it's probably a relatively small portion, because the Busy Beaver numbers grow inconceivably quickly: the longest-running machines that do halt can run practically forever, so you'd need more time than the universe has negentropy left to detect them.
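The "give up after N" idea can be sketched in a few lines. This is a toy, with a step budget standing in for wall-clock time, and the two example "machines" are made up; the key point is the three-way outcome collapsing to two: halted within budget, or unknown.

```python
# Sketch: a budgeted halting detector. Returns True if the machine halts
# within the budget, or None if the budget runs out (halts? we can't say).
def run_with_budget(program, max_steps):
    state = program["init"]
    for _ in range(max_steps):
        if state is None:          # machine halted
            return True
        state = program["step"](state)
    return None                    # budget exhausted: slow-but-halting, or looping?

# A machine that counts down and halts after ~10 steps...
fast = {"init": 10, "step": lambda n: n - 1 if n > 1 else None}
# ...and one that never halts.
loop = {"init": 0, "step": lambda n: n + 1}

print(run_with_budget(fast, 1000))  # True
print(run_with_budget(loop, 1000))  # None
```

The Busy Beaver point is exactly that no finite `max_steps` separates `loop` from a sufficiently slow `fast`.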
One sentence from The Economist seems to explain more than TFA: "Microsoft reported a 31% increase in its indirect (Scope 3) emissions last year from building more data centres (including the carbon found in construction materials) as well as from semiconductors, servers and racks."
So no, it's not about lack of renewable electricity.
Projects change license for new code going forward. The old code remains available under the previous license (and sometimes the new one as well). Here, they are able to change the conditions for the existing weights.
Falcon2-11B was trained on 1024 A100 40GB GPUs for the majority of the training, using a 3D parallelism strategy (tensor parallelism TP=8, pipeline parallelism PP=1, data parallelism DP=128) combined with ZeRO and Flash-Attention 2.
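The GPU count falls straight out of those degrees under the standard 3D-parallelism accounting (total GPUs = TP x PP x DP); a trivial sanity check:

```python
# Sanity check: total GPUs = tensor-parallel x pipeline-parallel x data-parallel.
TP, PP, DP = 8, 1, 128       # degrees quoted for Falcon2-11B
total_gpus = TP * PP * DP
print(total_gpus)            # 1024
```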
From the fraud cases I've worked on, greed does factor in and you can often see a point in time within the data where it seems like they realize they are 'getting away with it'.
Yes.
It's speculative decoding, but instead of generating just a few sequential tokens with the draft model, they generate a whole tree of candidates in some sort of optimal shape, covering hundreds of possible sequences.
It ends up being somewhat faster than regular speculative decoding in the normal setting (GPU only). If you are doing CPU offloading, it's massively faster.
The top-level view seems to leave a bit too much margin on mobile.