Intra-distribution generalization is also not well posed in practical, real-world settings. Suppose you learn a mapping f : x -> y. Informally, intra-distribution generalization implies that f generalizes for "points from the same data distribution p(x)". Two issues here:
1. In practical scenarios, how do you know whether x' is really drawn from p(x)? Even if you could compute log p(x') under the true data distribution, you can only verify that p(x') is non-zero, i.e., that x' lies in the support. One sample is not enough to tell you whether x' was drawn from p(x) (see the toy sketch after this list).
2. In high-dimensional settings, an x' that is not exactly equal to any example in the training set can have arbitrarily high generalization error. Here's a criminally under-cited paper discussing this: https://arxiv.org/abs/1801.02774
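To make point 1 concrete, here's a toy sketch (my own illustration, not from the thread): even with oracle access to the true density, a single log-density evaluation only shows that the point has non-zero support, not that it was drawn from p(x).

```python
from scipy.stats import norm

# Assume oracle access to the true density p(x) = N(0, 1).
x_in, x_out = 0.3, 6.0          # a typical draw vs. a point p(x) essentially never produces
print(norm.logpdf(x_in))        # ~ -0.96, finite
print(norm.logpdf(x_out))       # ~ -18.9, also finite: the Gaussian has full support
# Both values are finite, so one evaluation of log p(x') only confirms non-zero support;
# it cannot certify that x' ~ p(x).
```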
What we mean by x ~ p(x), y ~ p(y|x) is not a deterministic mapping x -> y s.t. y = f(x).
Reality itself has no probability distributions. Reality follows a causal model, where a causal relation is given in terms of necessity and possibility.
E.g., there is no such thing as Photo ~ P(Photo | PhotoOfCat) to be learned, only (All Causes) -> PhotoOfCat. Thus the setup of ML as y = f(x) is incorrect; there is no `f` which satisfies this formula (in almost all cases).
Consider the LLM case: reality has no P("The War in Ukraine" | TheWarIn2022) -- either the speaker meant TheWarIn2022, or they didn't. There's no sense in which reality has it that the utterance is intrinsically ambiguous (necessarily, for communication to be possible, pragmatics + semantics has to be able to fully resolve meaning).
So what are LLMs learning? Just an implied empirical distribution which is "smoothed over" the data just enough that it "hangs on to it, without repeating it" -- and this is vital, since if it were to try to generalise in the scientific sense, it would cease to be meaningful: no algorithm which computes P(y|x) in this manner could capture the necessary relata that fully resolve meaning. Any system capable of modelling meaning would be probabilistic only in the sense of having a prior over such causal models: P("TheWarInUkraine" | TheWarIn2022, CausalModel) = 1, but P(CausalModel) < 1.
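A minimal sketch of that last distinction (my own illustration; the model names and prior weights are made up): within a fixed causal model the meaning relation is deterministic, and all of the "probability" lives in the prior over models.

```python
# Toy illustration (made-up model names and priors): within a fixed causal model
# the meaning relation is deterministic; uncertainty lives only in P(CausalModel).
causal_models = {"refers_to_2022_invasion": 0.9, "refers_to_2014_conflict": 0.1}

intended_utterance = {
    "refers_to_2022_invasion": "The War in Ukraine",
    "refers_to_2014_conflict": "The War in Donbas",
}

def p_utterance_given(model: str, utterance: str) -> float:
    return 1.0 if utterance == intended_utterance[model] else 0.0   # 0 or 1, never in between

p = sum(prior * p_utterance_given(m, "The War in Ukraine")
        for m, prior in causal_models.items())
print(p)   # 0.9 -- all of it comes from the prior over models, none from "fuzzy meaning"
```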
So it's always undefined what it means to "generalise" with respect to an empirical distribution -- there aren't any.
When we say scientific theories generalise, we mean their posited necessary causal relations are maintained across irrelevant interventions. E.g., Newton's theory of gravity generalises in that each term (F, M, m, r) is a valid measure of some property, and it remains a valid measure across a very large number of environments.
It fails to generalise for extreme values of M, m, etc.
In the ML sense, all intra-distributional generalisation fails for trivial perturbations of any causal property, e.g., m + dm -- because this induces an entirely new distribution. The "generalisation error" depends on what m + dm does within our model, but regardless, generalisation fails.
Scientific theories do not fail to generalise in this way, irrelevant causal interventions make no difference to the explanatory adequacy (or predictive power) of the theory.
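A toy numerical sketch of the contrast (my own numbers, standard constants): perturbing m changes the data "distribution", so intra-distribution generalisation in the ML sense trivially fails, yet the law itself keeps predicting correctly.

```python
# Newton's law of gravitation under an "irrelevant intervention" m -> m + dm.
G = 6.674e-11                         # gravitational constant, N m^2 / kg^2
M, m, r = 5.972e24, 70.0, 6.371e6     # Earth's mass, a 70 kg person, Earth's radius

def F(M, m, r):
    return G * M * m / r**2

print(F(M, m, r))        # ~687 N
print(F(M, m + 5.0, r))  # ~736 N: a new (m, F) "distribution", same valid causal relation
```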
Thanks for the clarification. I understand much better what you mean by "scientific generalization". I can't tell whether you're suggesting that LLMs are a dead end for modeling meaning, or just that thinking of LLMs as estimators of probability distributions is the wrong way to think about them?
LLMs fail to model meaning, but in doing so they model empirical distributions of meaningful tokens, which is more useful given the method being used.
If you were only modelling conditional probability, trying to model meaning this way would make your solution worse.
I.e., if LLMs really generalised in the ML sense -- sampled unbiasedly at random from some hypothetical "Meaning Distribution" -- they'd perform terribly, since there is no such distribution to choose from.
By hijacking an empirical distribution and "replaying it back", it's actually possible to generate useful output.
Think about it this way: probability distributions are just measures of subjective confidence. Each person has their own subjective confidence distribution P("some written words" | WhatTheyMean). If you could actually model this -- which one would you model? If you modelled any of them, you'd not be able to understand a great deal, since each person's confidence is poorly calibrated and missing meanings (e.g., "acetylcholine").
So the LLM models some half-baked average of the subjective distributions of all speakers on the internet (/ in the training data) with respect to next word expectations.
This is not what we're modelling when we mean things (e.g., when I say "pass the pen", the causes of my saying it are: 1) a need for a pen; 2) you having a pen; etc. -- these reasons are unavailable to the LLM, so it cannot model meaning). But as stated, it would be useless if it actually tried to -- because these methods are incapable of saying "pass me a pen" and meaning it.
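A rough sketch of the "half-baked average" point (my own toy speakers, weights, and numbers): the corpus-level next-word distribution an LLM fits is roughly a data-share-weighted mixture of individual speakers' distributions -- an average that is nobody's intended meaning.

```python
# Toy illustration (made-up speakers and probabilities) of averaging
# per-speaker next-word confidences after the context "pass the ...".
from collections import Counter

speakers = {
    "alice": {"pen": 0.7, "salt": 0.3},
    "bob":   {"salt": 0.9, "pen": 0.1},
}
weights = {"alice": 0.5, "bob": 0.5}   # hypothetical share of the training data

mixture = Counter()
for s, dist in speakers.items():
    for word, p in dist.items():
        mixture[word] += weights[s] * p

print(dict(mixture))   # ~{'pen': 0.4, 'salt': 0.6} -- a blend that matches no single speaker
```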
I am an AI researcher. Most actual AI researchers and engineers use very few of these tools - the only ones being model providers like the OpenAI API and public clouds (AWS, Azure, GCP). The rest are infra-centric tools whose importance a16z is highly incentivized to over-inflate.
This does look like the sort of complex ecosystem that emerges right after an inflection point and before consolidation happens. It reminds me of adtech in the early 2010s.
That said, while much of this might not have any real traction long-term, looking at what researchers use seems to miss the mark a bit. It’s like saying network technology researchers aren’t using Vercel.
There are some other useful ones in there. Hugging Face jumps out. W&B. I haven't used Mosaic, but I could see myself using it for bigger projects; I know of at least two PIs @ Stanford using them.
I entered the title question of the article into ChatGPT, as I was curious whether it could factually synthesize information on the Internet similar to how this article did. Notably, it missed the detail about keeping vegetation away.
<chatgpt>
Railway tracks have crushed stones, commonly known as ballast, alongside them for several important reasons. Here are the main reasons for using ballast on railway tracks:
Stability and Load Distribution: Ballast provides stability to the railway tracks and distributes the load of the trains evenly. The crushed stones interlock with each other and form a stable base for the tracks. This helps prevent the tracks from shifting or sinking under the weight of the trains.
Drainage: Ballast allows for effective drainage of water from the tracks. The open spaces between the crushed stones allow water to flow away from the tracks, preventing the accumulation of water. Proper drainage is essential to maintain the stability of the tracks and prevent damage caused by water-related issues like erosion or track bed saturation.
Track Alignment and Adjustment: Ballast allows for easy alignment and adjustment of the tracks. The crushed stones can be added or removed as needed to maintain the proper track alignment and ensure a smooth ride for trains. This flexibility is especially important during track maintenance or when new tracks are being laid.
Noise and Vibration Damping: Ballast helps in reducing noise and vibrations generated by passing trains. The layer of crushed stones absorbs and dampens the vibrations, minimizing the impact on the surrounding environment and nearby structures.
Track Maintenance: Ballast provides a protective layer for the tracks. It acts as a barrier between the track infrastructure and the underlying soil, preventing soil movement or erosion that could damage the tracks. Additionally, ballast makes it easier to detect and repair any issues with the track, such as loose fastenings or track bed settlement.
Overall, the use of ballast alongside railway tracks is crucial for maintaining the stability, alignment, drainage, and longevity of the tracks, ensuring safe and efficient train operations.
</chatgpt>
1X (Formerly Halodi Robotics) | https://1x.tech/ | Multiple Roles | Onsite in Oslo and Bay Area
1X is an engineering and robotics company producing androids capable of human-like movements and behaviors. The company was founded in 2014 and is headquartered in Norway, with over 50 employees globally. 1X's mission is to create robots with practical, real-world applications to augment human labor globally. We recently announced a $23.5M Series A2 funding led by OpenAI (https://1xtech.medium.com/1x-raises-23-5m-in-series-a2-fundi...)
When I joined Brain in 2016, I thought the idea of training billion/trillion-parameter sparsely gated mixtures of experts was a huge waste of resources, and that the idea was incredibly naive. But it turns out Jeff was right, and it would take ~6 more years before that was abundantly obvious to the rest of the research community.
As a leader, he also managed the development of TensorFlow and the TPU. Consider the context / time frame - the year is 2014/2015 and a lot of academics still don't believe deep learning works. Jeff pivots a >100-person org to go all-in on deep learning, invests in an upgraded version of Theano (TF) and then gives it away to the community for free, and develops Google's own training chip to compete with Nvidia. These are highly non-obvious ideas that show much more spine & vision than most tech leaders. Not to mention he designed & coded large parts of TF himself!
And before that, he was doing systems engineering on non-ML stuff. It's rare to pivot as a very senior-level engineer to a completely new field and then do what he did.
Jeff certainly has made mistakes as a leader (failing to translate Google Brain's numerous fundamental breakthroughs into more ambitious AI products, and to consolidate the redundant big-model efforts in Google Research), but I would consider his high-level directional bets to be incredibly prescient.
OK - I can see the early ML push as obviously massively impactful, although by 2014/2015 we're already a couple of years after AlexNet, and other frameworks such as Theano and Torch (already 10+ years old at that point) existed, so the idea of another ML framework wasn't exactly revolutionary. I'm not sure how you'd characterize Jeff Dean's role in TensorFlow given that you're saying he led a 100-person org, yet coded much of it himself... a hands-on technical lead, perhaps?
I wonder if you know any of the history of exactly how TF's predecessor DistBelief came into being, given that this was during Andrew Ng's time at Google - whose idea was it?
The Pathways architecture is very interesting... what is the current status of this project? Is it still going to be a focus after the reorg, or is it too early to tell?
Jeff was the first author on the DistBelief paper - he's always been big on model parallelism + distributing neural networks across many computers: https://research.google/pubs/pub40565/ . I really have to emphasize that model parallelism of a big network sounds obvious today, but it was totally non-obvious in 2011 when they were building it out.
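For readers unfamiliar with the term, here is a minimal modern sketch of the model-parallel idea (my own toy example in PyTorch, nothing to do with DistBelief's actual code; it assumes two CUDA devices are available): one network's layers are placed on different devices and activations are shipped between them.

```python
# A modern PyTorch sketch of model parallelism (not DistBelief); assumes 2 CUDA devices.
import torch
import torch.nn as nn

dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")

class SplitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part_a = nn.Linear(1024, 4096).to(dev0)   # first half of the model on device 0
        self.part_b = nn.Linear(4096, 10).to(dev1)     # second half on device 1

    def forward(self, x):
        h = torch.relu(self.part_a(x.to(dev0)))        # compute on device 0
        return self.part_b(h.to(dev1))                 # ship activations, finish on device 1

model = SplitNet()
print(model(torch.randn(8, 1024)).shape)               # torch.Size([8, 10])
```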
DistBelief was tricky to program because it was written all in C++ and Protobufs IIRC. The development of TFv1 preceded my time at Google, so I can't comment on who contributed what.
1. What was the reasoning behind thinking billion/trillion parameters would be naive and wasteful? Perhaps parts of that reasoning are still right and could inform improvements today.
2. Can you elaborate on the failure to translate research breakthroughs, of which there are many, into ambitious AI products? Do you mean commercialize them, or pursue something like AlphaFold? This question is especially relevant: everyone is watching to see if recent changes can bring Google to its rightful place at the forefront of applied AI.
Can you link the source? My Google search of "punctuality and logical objectivity were "white expectations" not to be applied to persons of color" didn't find any exact matches, so I'm curious which guide you're referring to.
Like, I'm white and Canadian, which is basically the same culture as America, but like a quarter of these seem specifically American to me.
And a bunch seem ridiculous to apply as "white culture". Like the idea that your intent matters when it comes to morality. I can appreciate that different cultures fall differently on the spectrum between outcome vs. intention, but I highly, highly doubt that white people are the only ones who give primacy to intention.
The white people value rationality part seems straight up racist.
Absolutely, these people are American-centric. They aim to convince everyone that the USA is uniquely evil, and having people look into what humanity has really been up to in any other location or time period would completely discredit their goal.
Similar for communists. Don’t bother asking them for a historical track record of that idea.
>The framework recommends eight times that teachers use a troubling document, “A Pathway to Equitable Math Instruction: Dismantling Racism in Mathematics Instruction.” This manual claims that teachers addressing students’ mistakes forthrightly is a form of white supremacy. It sets forth indicators of “white supremacy culture in the mathematics classroom,” including a focus on “getting the right answer,” teaching math in a “linear fashion,” requiring students to “show their work” and grading them on demonstrated knowledge of the subject matter. “The concept of mathematics being purely objective is unequivocally false,” the manual explains. “Upholding the idea that there are always right and wrong answers perpetuates ‘objectivity.’ ” Apparently, that’s also racist.
It's funny how every single conversation about this stuff spends most of its time debating whether it's really happening and almost no time on the actual content of the ideology being pushed.
Fair enough. I think a lot of these concerns are based on misrememberings and misunderstandings. I think it would have been helpful to mention when you posted the link that you had misremembered it, though.
The user “causi” might as well delete it. The whole premise of their top-level comment was characterizing a “racial sensitivity guide” of which they have no adequate recollection, and cannot provide evidence that it even existed as they described.
A more cynical person might say they’ve injected a red herring into the thread, wasting others’ time.
The phrase "punctuality and logical objectivity (were|are) "white expectations" not to be applied to persons of color" does not appear anywhere in that image, nor does any assertion of the kind.
It's just a (somewhat stereotypical - "bland is best," really?) list of the aspects of Western culture which the author believes derive from "white culture." It doesn't even present these points as negative.
If one were to criticize it, one could point out the degree to which many so-called "white cultural" ideals were founded on a religion not created by white people (Christianity) and on non-Judeo-Christian (Greek and Hindu) philosophy and ideals. The European cultural complex didn't arise in a vacuum.
Also that many of those bullet points seem to be so vague as to be universal. For instance, I doubt white people alone recognize the existence of cause and effect, respect for authority or patriarchy.
But the general premise - that the constructs of white culture (to the degree that it exists, which itself is debated) are considered default in American society, as a function of white supremacy reinforcing that default on all races and suppressing cultural variance in the name of conformity to a white-derived normativity - is correct.
That can't be, unless one subscribes to a highly racist view of the world. Concepts like "objective, rational linear thinking" are not anything to do with "whiteness" and to even suggest it is, is really quite profoundly insulting to everyone who isn't white.
>Concepts like "objective, rational linear thinking" are not anything to do with "whiteness" and to even suggest it is, is really quite profoundly insulting to everyone who isn't white.
Yes. As I spent several short paragraphs saying exactly just that, and in agreement with that premise. "The constructs of white culture" to which I referred do not refer to the specific items of that list, which I explicitly disclaimed and criticized in my comment.
I probably should have been more specific, precise and exact in my wording and not assumed that people would have put forth more than the minimum effort for a comment that goes against the HN zeitgeist, but I suspect I could have posted an entire essay and no one here would have bothered to actually read all the words. Mea culpa.
I did read it. You attempted to disclaim the specific claims of the poster whilst also arguing its premise was correct, but the premise of "white culture" (which does not exist, no debate needed) being functions of "white supremacy" that "suppresses cultural variance" is just nonsensical racist gibberish from the US left. Slowmovintarget has it right - we've seen this sort of thing before except with "the rich" or "capitalism" being used instead of "whiteness".
These are not the constructs of "white culture." That's merely the straw-man label adopted by the propaganda poster. This is just basic anti-capitalist, collectivist nonsense, with "whiteness" being the scapegoat instead of "the Jews" or "the rich."
It's the only society that integrated, at large scale, the best of other cultures, leaving behind some of the worst in all cultures. Japanese culture, Indian culture, European cultures -- it all went into the melting pot and was accepted, appreciated and cheered on. That happened nowhere else on that scale, except as a re-import from America, and it's very insulting to insinuate that the rejected cultural pieces were rejected out of racism. The truth is, lots of other cultures have horrible bits and pieces that were washed out of the blending for a very good reason. This very debate would never be had in a majority of countries around the globe.
Which makes it all the more incredible that the "host" operating-system culture of the melting pot is attacked and an attempt is made to "regulate" cultural blending. It's a majority-vote system, without a need for regulation. The good things go in ("Yoga" as a sport, meditation systems), the bad stays out (insane cults, backwards religious practices that are abandoned in all religions). Nobody needed to facilitate or control that; nothing was needed but a sparse, neutral stove and a pot.
The people vote with their feet, their wallets and their actions. Unemployed commissars, race warriors and revolutionaries are free to form a line for unemployment checks.
European cultures (if we're talking about the US) were never, and still aren't, accepted by indigenous people, much less cheered on. Africans were brought over as slaves and had all connections to their heritage and culture removed; Asians were harassed, stereotyped and made an underclass. Many immigrants had to change their names to "Americanized" versions at Ellis Island, especially Jews.
Yes, a lot of progress has been made in regards to cultural integration in the US, but that progress has been due to attempts to "regulate" cultural blending, rather than in spite of them. The US has had to be forced time and again - often with persistent struggle and at times violence - to live up to its stated ideals. Even today, the most prominent political movement across the US and Europe is explicitly Eurocentric and xenophobic, seeing immigration as a threat to cultural integrity and civil society, and multicultural and multiracial integration as at best harmful, at worst impossible.
Are you American? As a foreigner it seems to me that america is incredibly culturally integrated and open compared to virtually anywhere else. People in my home country wouldn’t put up with a fraction of what Americans put up with.
What “hoarding of power?” Sonia Gandhi, who moved to India at 18, was president of India’s biggest political party for 30 years, but was never able to run for Prime Minister because she was born in Italy. Meanwhile the daughter of an Indian is vice president in the US. The US has got to be one of the few countries in the world where the country’s founding people no longer hold a majority of the political power.
Your one example is terrible, because the US would have prevented Sonia Gandhi from being President too, for not being born in the country. Harris was born in the US.
If you believe the Anglo-Saxon majority no longer holds the majority of the political power in the US, oof.
Anglo-Saxons are distinctly a minority in the US, both in population and in power. The last two Presidents had primarily German (Trump) and Irish (Biden) ancestry. The House Speaker has Italian ancestry, while the Senate majority leader is Jewish. On the Supreme Court there is a single Anglo-Saxon.
Attacking mathematics and science is part of the playbook for collectivists. Objective truth is the enemy of propaganda. Propaganda is the means to control belief, which is required to control behavior, which is required to centralize control of economic activity (people's behavior) and redistribute resources.
Therefore the idea of the existence of objective truth (2+2=4) must be destroyed and subverted in order for the "goodfact" to be swallowed.
I just finished The Road to Serfdom by F. A. Hayek, and he lays this out fairly well as a strategy of totalitarian regimes, or groups with totalitarian impulses. (Chapter 11, I believe)
I'll admit to not knowing my history well at this point, but weren't mathematics and the sciences rather important to socialist countries? Now, I don't doubt that ideology will have been injected in certain instances, but at least in mathematics and physics that's a bit harder to do, and hence there were quite a number of accomplished mathematicians in the USSR.
Even beyond the natural sciences, this seems to have been somewhat the case. I remember my uncle, who is as unabashedly anti-communist as it gets, once showing me a grammar of some minority language written in the GDR and commenting, almost with admiration, that "back then, they invested a lot of resources on things like that".
Yes, the Soviet Union was actually quite enlightened on the science front, and they were especially good at teaching mathematics and elevating talent. Just compare a Soviet-era math textbook to its “western” counterpart; the former is much more in-depth. Also, they were famously anti-religion, putting science above it.
> were never, and still aren't, accepted by indigenous people, much less cheered on.
This is simply not true. The natives embraced a lot of aspects of European culture. Nearly all developed métis cultures and deep commercial trade. Most of the initial treaties were ones of friendship and alliances.
Go visit a native museum, they want to preserve their culture, not expunge it from European influence. They hate wokeness and mass immigration too, by the way.
While others covered the guide part, I just wanted to note some context about punctuality. In case people didn't know, this is actually very culture-dependent. For example in Australia, guides organising meetings with people from indigenous mobs sometimes refer to the meetings as running on "black man's time". Essentially it means that the community comes first for them, so when they have something else to do, it takes priority. The local person may turn up an hour late, or not at all, and it's not seen as lack of respect from their side.
Any international business has training for these kinds of cultural differences.
For example, when and if you refer to people by their first name, or more formally with full titles. Or whether working long into the night means you're dedicated or inefficient.
These are not right or wrong as such, they're just culture, but if you blunder into the situation assuming that everyone has the same culture, then someone bluntly telling you what's wrong with an idea can seem intentionally aggressive and rude, or someone trying to politely say no might seem vague, or worse, be taken as tacit agreement.
Very interesting, cause that’s a reasonable explanation for that alternative norm.
Ultimately, if we’re gonna ask what norms are good we should have some cost function we are trying to optimize and ask for norms that encourage it to go up. Totally reasonable for different people to have different ideas of the good life and get different norms following that.
If you want examples, finding them will not be hard. Scroll down to find the infographic from the National Museum of African American History & Culture. It has points on work ethic, future orientation and time that fit fine with what you're responding to.
It would be a stunning twist of irony if this website uploaded images to a proprietary image dataset used for training AI models while pitching "uncorrelated data".
That highlights one of my "concerns" about AI training sets. There seems to be a very real risk of accidentally adding an image that you have no rights to, so what happens if someone finds out and demands their image be removed from all models trained on it?
You can't really back an image out of a model, you can only retrain without that image.
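A minimal sketch of the "retrain without that image" remedy (my own illustration; the directory, hash set, and train_from_scratch entry point are all hypothetical): there is no operation that subtracts one image's influence from trained weights, so in practice you filter the dataset and retrain.

```python
# Hypothetical sketch: drop revoked images by content hash, then retrain from scratch.
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

revoked_hashes = {"<hash of the image the rights-holder revoked>"}   # placeholder

dataset = [p for p in Path("images").glob("*.jpg")    # assumes an images/ folder of JPEGs
           if sha256(p) not in revoked_hashes]
# model = train_from_scratch(dataset)                 # hypothetical training entry point
```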
Creative Commons should add a new license that prohibits the use of ones work to train AI.
Because Creative Commons really needs another non-open license variant, like non-commercial, where no one really understands what they allow and don't allow, so you're better off just not using them if you're being conservative. </s>
We are building an opt-in list, because a lot of people do want to be able to prompt AI with something like, "a cat in the style of me" or "me riding a dinosaur". That will be shared publicly, of course.