Don't forget Jane Jacobs winning her fight against New York's planned highway expansion, which gave local activists a potent narrative and organizing blueprint for opposing all sorts of development. Or two centuries of conflict between federal and local authorities over racial integration, leading to white flight and the establishment of suburbs bitterly resistant to outside control. Or the seventies' anti-growth and anti-industrial sentiment, driven by anxiety over ecological disasters and a new awareness of physical limits to growth.
And these are all on top of single-family zoning, with decades of NIMBYism, a near-ban on mid-rise housing, and impossible parking requirements making it impossible to densify except by way of too-expensive "luxury" high rises. That's left pretty much all desirable areas with far too little housing stock.
Well, that settles it: the top comment is right, it's all Biden's fault. /s
Why aren’t people ashamed of posting such obviously wrong hot takes?
I have experienced the flip side of this: a less experienced MD specialist not recognizing a lab error and wanting to act on it, which was recognized (correctly) as an error by the PA in primary care and confirmed in retesting. In this case the MD was significantly younger, with fewer total years of experience, and maybe that had something to do with it?
PA experience isn't the equivalent of training as a resident (and I think we should be training more MDs) but the MD isn't always right either.
These are all topics in the article, which doesn't deny energy in = out at all.
It specifically points out that this is how the medical interventions discussed in the article work: reducing calories by altering eating behavior.
Also per the article: it isn't taboo to say "just eat less"; it's just that it's as ineffective as telling people with compulsive drinking behaviors to "just drink less".
"Although needle-sharing among drug addicts was one of the main reasons the disease spread so quickly, most HIV transmissions in Russia — 57 percent — are now a result of heterosexual sex. Drug use is responsible for 40 percent, while gay sex accounts for around 3 percent, according to Russia’s Federal Research Center for AIDS Prevention and Control in Moscow."
The majority of people living with HIV worldwide are straight women (52%):
I agree with you that the claim some people make, that type of sex and number of sexual partners have nothing to do with STDs, is ridiculous. It annoys me a great deal.
Gay men are disproportionately at risk and affected due to number of partners and type of sex. But most people with AIDS are straight.
Anyway, your LGB link is funny because it claims (I think out of confusion) that lesbians have more partners as well. But lesbians also have the lowest STD rates of any group, making them obviously the best and most morally superior of the orientations.
Floating point math is not associative: (a + b) + c != a + (b + c)
This leads to different results when sums are accumulated in different orders, and accumulating in different orders is common in parallel math operations.
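A minimal Python demonstration, using the classic 0.1/0.2/0.3 example:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.1 + 0.2 rounds up to 0.30000000000000004 first
right = a + (b + c)  # 0.2 + 0.3 happens to round to exactly 0.5 first

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

Same three numbers, two different answers, purely from the grouping of the additions.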
So I guess my question here is why a GPU would perform accumulations in a nondeterministic way where the non-associativity of FP arithmetic matters. You could require that a + b + c always be evaluated left to right, and then you've got determinism, which, all things being equal, is desirable. Presumably relaxing that constraint allows for some significant performance benefit, but how? Something like avoiding keeping a buffer of all the weights*activations before summing?
Basically because it affects performance. You really don't want to write any buffers!
This is sort of a deep topic, so it's hard to give a concise answer, but as an example: cuBLAS guarantees determinism, but only for the same arch and same library version (because the best-performing ordering of operations depends on arch and implementation details), and does not guarantee it when using multiple streams (because the thread scheduling is non-deterministic and can change ordering).
Determinism is something you have to build in from the ground up if you want it. It can cost performance, it won't give you the same results between different architectures, and it's frequently tricky to maintain in the face of common parallel programming patterns.
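To make the ordering effect concrete, here's a toy Python sketch (the data is contrived to exaggerate the effect, not taken from any real workload) comparing a strict left-to-right sum with a chunked, "parallel-style" reduction where each chunk stands in for one thread's partial sum:

```python
import math

# Contrived data: values of wildly different magnitudes, interleaved
vals = [1e16, 1.0, -1e16, 1.0] * 1000

# One "thread": strict left-to-right accumulation.
# Each 1.0 added while the running sum is near 1e16 is lost to rounding.
seq = 0.0
for v in vals:
    seq += v

# Four "threads": each sums a contiguous chunk, then the
# partial sums are combined, as a parallel reduction would.
n = len(vals) // 4
partials = []
for i in range(4):
    p = 0.0
    for v in vals[i * n:(i + 1) * n]:
        p += v
    partials.append(p)
par = sum(partials)

# math.fsum tracks the error terms and returns the exact result
print(seq, par, math.fsum(vals))  # three different answers for the "same" sum
```

Both loops add exactly the same numbers; only the grouping differs, and that alone changes the result. A real GPU reduction just does this with thousands of partial sums whose combination order depends on the scheduler.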
Consider this explanation from the PyTorch docs (particularly the bit on CUDA convolutions):
There has been speculation that GPT-4 is a mixture of experts model, where each expert could be hosted on a different machine. As those machines may report their results to the aggregating machine in different orders, the results could be summed in different orders.
Maybe my assumption of how MoE would/could work is wrong, but I had assumed that it means getting different models to generate different bits of text, and then stitching them together - for example, you ask it to write a short bit of code where every comment is poetry, the instruction would be split (by a top level "manager" model?) such that one model is given the task "write this code" and another given the task "write a poem that explains what the code does". There therefore wouldn't be maths done that's combining numbers from the different experts, just their outputs (text) being merged.
Have I completely misunderstood, does Mixture of Experts somehow involve the different experts actually collaborating on the raw computation together?
Could anyone share a recommendation for what to read to learn more about MoE generally? (Ideally that's understandable by someone like me that isn't an expert in LLMs/ML/etc.)
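For what it's worth (hedging that this is a generic sketch of MoE layers, not a description of GPT-4's internals): in transformer MoE models the "experts" are typically alternative feed-forward sub-layers inside each transformer layer, not separate models writing separate text. A router picks a few experts per token, and the layer's output is a weighted sum of their numeric outputs, so the experts really do collaborate on the raw computation. A toy NumPy illustration, with all names and shapes invented:

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_experts, top_k = 8, 4, 2
x = rng.normal(size=d)                    # one token's hidden state

# Each "expert" here is just a linear map, standing in for an FFN block
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate_w = rng.normal(size=(n_experts, d))  # router ("gate") weights

# Router scores each expert for this token, keeps the top-k
logits = gate_w @ x
top = np.argsort(logits)[-top_k:]
weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts

# Layer output: a weighted *sum* of expert outputs -- raw numbers
# being added together, not stitched-together text
y = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
```

That final sum is exactly the kind of floating-point accumulation discussed above: if the chosen experts' outputs arrive (say, from different machines) in different orders, the summed result can differ in the low bits.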
For performance reasons, yes. I believe it's because the accumulation is over parallel computations, so the ordering is at the mercy of the scheduler, but I'm not familiar with the precise details.
edit: at 13:42 in https://www.youtube.com/watch?v=TB07_mUMt0U&t=13m42s there is an explanation of the phenomenon in the context of training, but I suspect the same kind of operation happens during inference.
CMU School of Computer Science has its own admissions for undergrad which is dramatically more selective than CMU as a whole. It really is well known for undergrad CS, it's not just a halo effect of the grad program.
CMU has 7000 undergrads, of whom 600 are in the School of Computer Science. So if you take a random CMU student who decided to start a company and looked for VC funding, you've got a <10% chance of an SCS alum and a >90% chance of a non-SCS student (assuming the likelihood of starting companies is uniform, which who knows if it is, but I have no reason to assume otherwise). Whereas at Stanford, 100% of students had to get into Stanford.
2008 happens; the Fed takes interest rates to zero in the final days of GWB.
Ultra-low interest rates through 2019 under Obama and then Trump.
Trump demanded negative rates, and rates were cut again even prior to COVID, further artificially juicing things.
Then the COVID era, with interest rates dropping to near zero again.
The road to where we are now with housing is long and involves many politicians, presidents, and parties.