
I am not sure if we read the same article, but the article I read specifically calls out funding by nations as problematic:

> This is because WADA relies heavily on funding from stakeholders, some of which have had the highest number of doping cases to investigate, such as Russia, China and the US. This in turn creates serious challenges for WADA in maintaining its own independence and impartiality.

The US has had more Olympic medals stripped for doping than China, but fewer than Russia [1]. This could be because China is better at hiding/masking it, or because US athletes get tested more thoroughly. However, the article calls out the relationship between funding and nations interested in doping and winning athletic competitions as problematic. Your... quick, dismissive, and berating diversion to our current administration is not very enlightening.

[1] https://www.statista.com/statistics/1288717/countries-most-s...


In authoritarian countries, sports have long been an important tool for laundering their international image. East Germany had a particularly notorious government-run doping program (with horrible long-term health consequences), and the Russian system is well-documented too. Talented young athletes, often while still children, are coerced into doping, with the full backing of security services, which use all available resources to manipulate the subjects and cover up any traces. I can't think of any Western country ever placing such emphasis on professional sports. That's definitely a factor too.

If you haven't seen it, I recommend watching Icarus, an exceptional documentary on state-sponsored doping: https://www.imdb.com/title/tt6333060/


I was able to understand the Google Translate version well, but then again, I am very familiar with the intricacies of BW and Zerg 12-hatch openers.

ChatGPT and Claude did an incredible job translating the Korean text:

Claude:

  Today I'll teach you about the 12 Hatchery build. I'll explain the types of 12 Hatchery builds, their advantages and disadvantages, and the build orders in a simple but detailed way.
  Against Protoss, this is the build you use when you want to start with the most economic advantage. Against Terran, there are several builds you can do with 12 Hatchery, so I'll explain some of the most commonly used builds.
  The first is the two-hatchery build that starts with 12 Hatchery:
  12 Hatchery
  11 Spawning Pool
  10 Gas
  This build uses early gas, and it's often used when you want to quickly transition into a three-hatchery build with three gas bases.
  The second build is:
  12 Hatchery
  12 Pool
  12 Gas
  This build allows for moderately fast tech tree and moderately fast three-hatchery expansion. This build is commonly known as the "safe three-hatchery" build, and you can think of it as a build that enables both quick Mutalisks and quick third base.



Playing on a diagonal board is very frustrating. I estimate hard to be a bit under 2000 Elo, and medium around 1400.


You can set it to not be diagonal.


I could imagine situations where airlines need to get pilots/crew/planes to some location for the next flight and somehow recoup costs there, and are willing to cut prices on such multileg flights to take business away from their competitors... but I generally agree with your statement.


That doesn't explain why it would be cheaper to keep someone on the flight than to let them get off. It may be a meaningless difference, or even the same price, but "cheaper to fly the extra leg" doesn't make sense if there is competition. That person has weight, if nothing else, and that costs money to haul.


Yep, it is not just travelers going to destinations who compete for seats on an airplane, but the airline's own workers too. The larger airlines have to balance these priorities.


Factories/guns are quite rare, oscillators are fairly common (but maybe not rendered properly on this? ". . ." is a common oscillator and I see many, but they don't render properly), and spaceships tend to collide with stuff!
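For reference, the ". . ." above is the blinker, the simplest oscillator. A minimal, self-contained sketch of its period-2 behavior in plain Python (this has nothing to do with the linked renderer, just the pattern itself):

  from collections import Counter

  def step(live):
      """One Game of Life generation over a set of live (row, col) cells."""
      neighbor_counts = Counter(
          (r + dr, c + dc)
          for (r, c) in live
          for dr in (-1, 0, 1)
          for dc in (-1, 0, 1)
          if (dr, dc) != (0, 0)
      )
      return {cell for cell, n in neighbor_counts.items()
              if n == 3 or (n == 2 and cell in live)}

  blinker = {(1, 0), (1, 1), (1, 2)}                # the ". . ." row
  assert step(blinker) == {(0, 1), (1, 1), (2, 1)}  # flips to a vertical column
  assert step(step(blinker)) == blinker             # and back: period 2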


> Since September 2022, Patagonia has donated more than $71 million in earnings to numerous charitable and political causes, The New York Times reported earlier this year.

https://archive.is/kut8Y

Rough math: at $100k per employee per year, that is an $18 million cost over 2 years.

Edit: Patagonia plans to donate 1% of proceeds under the "1% for the Planet" pledge: https://www.patagonia.com/one-percent-for-the-planet.html


Their other past actions, including knowingly selling products containing known carcinogens and exploiting slave labor, can't be swept under the rug with some greenwashing or philanthropy PR.


The 1% pledge has existed for 40 years. More recently, they restructured ownership so that profits go to a nonprofit, somewhat similar to what Bose did (Bose's profits go to MIT, Patagonia's to environmental causes).


I think these types of posts are extremely nitpicky and not useful. From my experience, this sort of SEC filing is compiled by associates at a biglaw firm under a lot of time pressure, with file names like "version 41.B FINAL EDIT". Things get lost in translation. The parties signing off on the document are not sama: https://www.sec.gov/Archives/edgar/data/1849056/000110465921...

I think someone made a mistake. Also, looking at the signature page, the signing parties are other people...


How does a mistake like that get made 3 or 4 years later, when he was never the chair, the only official source for that had been edited within days to remove it, and apparently there has never been a YC chair at all?


I imagine the associates googled for it and repeated whichever source existed at that time.


I had the opportunity to chat with Sam, and I would say that the majority of publicly written opinion about him is false.

He’s a pretty genuine guy with a lot of advice and wisdom. He definitely doesn’t come off as a capitalist but more of a helper type — someone who helps people become whole.

I felt like OpenAI and the sector in general made a lot of sense — now he can help everyone in the world.

As for the issue at hand, for one it's doubtful he prepared this himself; moreover, I have seen quite a few resumes in my day, and not one wasn't padded, so to speak.


He thinks Worldcoin is a great idea.


It's a bad execution of a wholesome, world-helping idea.


It’s a bad solution to the wrong problem.


> He definitely doesn’t come off as a capitalist but more of a helper type

A capitalist is an owner of the means of production. It's as simple as that. I think Altman qualifies, especially to the African workers who make $1.50 an hour to get PTSD[1][2][3] in his AI sweatshops.

[1] https://www.ft.com/content/ef42e78f-e578-450b-9e43-36fbd1e20...

[2] https://time.com/6275995/chatgpt-facebook-african-workers-un...

[3] https://www.theverge.com/features/23764584/ai-artificial-int...


You are very perspicacious if you managed to figure that out after a single chat.


Chats + Many emails

He always helped when I pinged him. ¯\_(ツ)_/¯

Thank you for everything Sam.


It’s his job to build relationships and trade favors. I don’t see how it contradicts the majority of publicly written opinion about him.


To be fair, Lord Rasengan is a sovereign head of the Joseon Empire cybernation according to their own bio, so Sam was just conducting international relations for OpenAI's next country to exploit. Be careful you don't anger Lord Rasengan; they could throw you in internet jail.


The fact that the paper does not mention the word "hallucinations" in the full body text makes me think that the authors aren't fully familiar with the state of LLMs as of 2024.


This is a surprising result to me, given that (in my mind) the method simply does a few more forward passes, without encoding or transferring meaningful state between each pass.


You get embeddings at every activation layer of the network, at every token. That's extra state accessible to the network when running in recurrent 'generate the next token' mode.


How much extra state and computation is it per token exactly? Can we account for the improvement in just those terms?


That's the point of this paper: investigating whether 'chain of thought' prompting kinda works because it actually induces reasoning, or whether it's just that more verbose answers give the model more tokens to work with, and thus more state in which to hide interesting computations. This work introduced a way to give the model more tokens - and thus compute and state - to work with, independent of the prompt, which makes it easier to separate the computational impact of verbosity from that of prompting.


Basically, in a chain of N tokens, the state of a token at layer L reflects roughly L * N / 2 intermediary states' worth of info. (In practice a lot less, since attention is a somewhat noisy channel between tokens.)
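A tiny sketch of that counting argument (toy numbers, nothing model-specific): with standard causal attention, layer L at position n reads layer L-1 states at positions 1..n, so everything at lower layers and earlier-or-equal positions is upstream of it.

  # Count the "causal cone" of intermediate states feeding each token's layer-L state.
  # N and L here are toy numbers, purely for illustration.
  N, L = 128, 24

  upstream = [(L - 1) * n for n in range(1, N + 1)]   # states at lower layers, positions <= n
  print(sum(upstream) / N)   # average per token: ~1484
  print(L * N / 2)           # the L * N / 2 figure above: 1536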


I've only read the abstract, but also find this strange. I wonder if this is just tapping into the computational chains that are already available when tokens are further away, due to the positional encodings being trained that way. If so, that makes the reasoning/modeling powers of LLMs even more impressive and inscrutable.


Every token generates a KV cache entry based on that token and all previous KV cache entries. This happens in every layer. The KV cache is 100 kB-1 MB per token, so quite a bit.

Edit: also, you can forward-generate 5 or 10 dots in a batch without much overhead compared to a single dot, since the main cost is pulling the KV cache from VRAM, so you have free tensor units.
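For a rough sense of where a per-token figure in that range comes from, here is a back-of-the-envelope sketch. The dimensions are an illustrative assumption (roughly a Llama-2-7B-sized model with an fp16 cache), not something stated in this thread:

  # Back-of-the-envelope KV-cache size per token. Dimensions are an illustrative
  # assumption (roughly Llama-2-7B, fp16 cache), not taken from the thread.
  num_layers   = 32
  num_kv_heads = 32
  head_dim     = 128
  bytes_per_el = 2    # fp16

  # Each layer stores one K vector and one V vector per token.
  kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_el
  print(f"{kv_bytes_per_token / 1024:.0f} KiB per token")   # -> 512 KiB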


I understand what you are trying to say, but correct me if I am wrong: each forward pass is stateless. The KV cache you describe is a clever method to make attention compute scale closer to linear complexity. I am trying to build some conceptual understanding of how performance on the 3SUM problem improves in this fine-tuned example but not in the fine-tuned no-shot case.


In the case of 3SUM, it's because the LLM has been fine-tuned to use the 'blank' tokens' key/value states as a register to represent the sums of specific integer triplets.


I think the general idea is that since partial embeddings can be copied laterally (between token positions) from one layer of a transformer to the next, then additional work done at filler positions can also be copied to following real token positions. There's obviously a limit to how useful this can be since these are added parallel token steps rather than sequential transformer layer ones, and results from different experiments seem to be mixed.

Still, I don't see how this really works... more compute / embedding transformations are potentially being applied to the prediction, but in what circumstances are these filler positions being used in a useful way? The filler token embeddings themselves presumably aren't matching attention keys, but positional encodings for adjacent tokens will be similar, which is maybe what triggers lateral copying into (and perhaps out of) filler positions?
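A toy numpy sketch of that lateral-copying mechanism: a single causal attention head where the final (answer) position mixes in whatever value vectors were computed at the filler positions, in proportion to the attention it pays them. All sizes and weights below are made up for illustration:

  import numpy as np

  # Toy single-head attention: the answer position reads values from filler positions.
  rng = np.random.default_rng(0)
  d = 8
  prompt_len, n_fillers = 5, 3
  T = prompt_len + n_fillers + 1        # prompt tokens, "..." fillers, answer slot

  x = rng.normal(size=(T, d))           # stand-in hidden states entering this layer
  Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

  q_last = x[-1] @ Wq                   # query from the answer position
  k, v = x @ Wk, x @ Wv
  attn = np.exp(q_last @ k.T / np.sqrt(d))
  attn /= attn.sum()                    # softmax over all earlier positions and itself

  out = attn @ v                        # the answer position's new state
  filler_mass = attn[prompt_len:prompt_len + n_fillers].sum()
  print(f"attention mass on filler positions: {filler_mass:.2f}")
  # Whatever earlier layers computed at the filler positions reaches the answer
  # position in proportion to this attention mass.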


Given an algorithm that takes 100 iterations, ask the model to perform it in ten steps and it gives you a nonsense answer. Tell it to do it in 100 steps and it might just be able to. What this tells us is that context size appears to be a limiting factor as models get bigger and bigger.


You can transfer some state just through dots. The dot count could mean "the first n ideas do not work; analyze the (n+1)th one, and if that's bad, emit another dot."


And this works even if we assume the dots don't actually transfer information but just slightly mutate the state. First the model tries a random approach; if that approach doesn't lead to a powerful result, it emits a dot to try another random approach in the next round, until it gets a sufficiently good path forward.

Essentially a brute-force search, which is a bit wasteful, but better than just blindly taking the first idea.
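A runnable toy of that hypothesized loop, where the dot count just indexes which approach gets tried next (the candidate "approaches" and the success test are made up purely for illustration):

  # Dots as a retry counter: the count indexes the next approach to try rather than
  # carrying any content itself. Approaches and the success test are made up.
  def solve_with_dots(approaches, is_good, max_dots=10):
      for n_dots, approach in enumerate(approaches[:max_dots]):
          if is_good(approach):
              return "." * n_dots + " -> " + approach
      return None

  print(solve_with_dots(["guess A", "guess B", "guess C"],
                        is_good=lambda a: a == "guess C"))
  # prints ".. -> guess C": two failed attempts, two dots, then the answer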


Kind of like trying nonces until you find one that gives you a hash with lots of leading zeros? Dotchain?


Chain of Dot.


I believe the model is trained to always output the same number of dots. Additionally, the way LLMs generate output tokens is not really in line with transferring state this way.


Can't anything be compressed into one word by comparison?


Gödel and Heisenberg say no, in the most generalized case. Our universe is not deterministic.


I'm not sure what Gödel is doing here, but quantum mechanics is consistent with the universe being deterministic:

https://en.wikipedia.org/wiki/Superdeterminism


Explain how words do that?

