As goofy as I personally think this is, it's pretty cool that we're converging on something like C3P0 or Plankton's Computer with nothing more than the entire corpus of the world's information, a bunch of people labeling data, and a big pile of linear algebra.
There probably is, since I believe tensors were basically borrowed from Physics at some point. But it's probably not of much practical use today, unless you want to explore Penrose's ideas about microtubules or something similarly exotic.
Gains in AI and compute can probably be brought back to physics and chemistry to do various computations, though, and not limited to only protein folding, which is the most famous use case now.
For what it's worth, the idea of a "tensor" in ML is pretty far removed from any physical concept. I don't know its mathematical origins (would be interesting I'm sure), but in ML they're only involved because that's our framework for dealing with multi-linear transformations.
Most NNs work by something akin to "(multi-)linear vector transformation, followed by elementwise nonlinear transformation", stacked over and over so that the output of one layer becomes the input of the next. This applies equally well to simple models like "fully-connected" / "feed-forward" networks (aka "multi-layer perceptron") and to more-sophisticated models like transformers (e.g. https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436...).
It's less about combining lots of tiny local linear transformations piecewise, and more about layering linear and non-linear transformations on top of each other.
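For concreteness, here's a minimal NumPy sketch of that pattern (layer sizes and the ReLU nonlinearity are chosen arbitrarily for illustration): a linear transformation, an elementwise nonlinearity, then another linear transformation, with each layer's output feeding the next.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 -> 8 -> 3
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def relu(z):
    # elementwise nonlinear transformation
    return np.maximum(z, 0.0)

def mlp(x):
    # linear transformation, then elementwise nonlinearity
    h = relu(W1 @ x + b1)
    # the output of one layer becomes the input of the next
    return W2 @ h + b2

y = mlp(rng.normal(size=4))
print(y.shape)  # (3,)
```

Real networks just stack more of these layers and learn the weights, but the "linear, then elementwise nonlinear" skeleton is the same.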
I don't really know how physics works beyond whatever Newtonian mechanics I learned in high school. But unless the underlying math is similar, then I'm hesitant to run too far with the analogy.
I realized that my other answer may have come off as rambling for someone not at all familiar with modern physics. Here's a summary:
Most modern physics, including Quantum Mechanics (QM) and General Relativity (GR), is represented primarily through "tensor fields" on a class of topological spaces called "manifolds". Tensor fields are like vector fields, just with tensors instead of vectors.
These tensor fields are then constrained by the laws of physics. At the core, these laws are really not so much "forces" as they are symmetries. The most obvious symmetry is that if you rotate or move all objects within a space, the physics should be unaltered. Now if you also insist that the speed of light should be identical in all frames of reference, you basically get Special Relativity (SR) from that.
The electromagnetic, weak and strong forces follow from invariance under the combined U(1) x SU(2) x SU(3) symmetries. (Gravity is not considered a real force in General Relativity (GR), but rather an interaction between spacetime and matter/energy; what we observe as gravity is similar to the time dilation of SR, but with curved space.)
Ok. This may be abstract if you're not familiar with it, and even more if you're not familiar with Group Theory. But it will be referenced further down.
"Manifolds" are a subset of topological spaces that are Euclidean or "flat" locally. This flatness is important, because it's basically (if I understand it correctly myself) the reason why we can use linear algebra for local effects.
I will not go into GR here, since that's what I know least well, but instead focus on QM which describes the other 3 forces.
In QM, there is the concept of the "Wave Function" which is distributed over space-time. This wave-function is really a tensor with components that give rise to observable fields, such as magnetism, the electric field and to the weak and strong forces. (The tensor is not the observed fields directly, but a combination of a generalization of the fields and also analogues to electric charge, etc.)
Physics calculations tend to be done by assuming some initial state and then imposing the symmetries that correspond to the forces. For instance, two electrons' wave functions may travel towards the same point from different directions.
The symmetries will then dictate what the wave function looks like at each later incremental point in time. Computationally, such increments are calculated for each point in space using tensor multiplication.
While this is "local" in space, points immediately next to the point we're calculating for need to be included, kind of like for convolutional nets.
Basically, though, it's in essence a tensor multiply for each point in space to propagate the wave function from one point in time to the immediate next point.
Eventually, once the particles have (or have not) hit each other, the wave functions of each will scatter in all directions. The probability for it to go in any specific direction is proportional to the wave function amplitude in that direction, squared.
Since doing this tensor multiplication for every point in space requires infinite compute, a lot of tricks are used to reduce the computation. And this is where a lot of our intuitions about "particles" show up. For simple examples, one can even get very good approximations using calculus. But fundamentally, tensor multiplication is the core of Quantum Mechanics.
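A toy sketch of that locality idea (all of this hedged: it's a free particle in 1-D with made-up grid sizes, and a real solver would use a stable scheme like Crank-Nicolson instead of the naive explicit stepping shown here): each time step is a purely local linear update per grid point, where only nearest neighbours enter, much like a convolution stencil.

```python
import numpy as np

# Discretized 1-D wave function on a grid (units hbar = m = 1)
n, dx, dt = 200, 0.1, 1e-4
x = np.arange(n) * dx

# Gaussian wave packet with some momentum, normalized
psi = np.exp(-0.5 * ((x - 10.0) / 1.0) ** 2) * np.exp(1j * 5.0 * x)
psi /= np.linalg.norm(psi)

def step(psi):
    # discrete Laplacian: only immediate neighbours enter (local stencil)
    lap = (np.roll(psi, 1) - 2 * psi + np.roll(psi, -1)) / dx**2
    # one small linear update per grid point
    return psi + 1j * dt * 0.5 * lap

for _ in range(10):
    psi = step(psi)

# the norm (total probability) stays close to 1 for small dt
print(round(np.linalg.norm(psi), 3))  # → 1.0
```

The point is not the physics accuracy but the shape of the computation: propagating the wave function one time increment is a linear operation applied at every point in space, using only that point's immediate neighbourhood.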
This approach isn't unique to QM, though. A lot of other Physics is similar. For instance, solid state physics, lasers or a lot of classical mechanics can be described in similar frameworks, also using tensors and symmetry groups. (My intuition is that this still is related to Physics involving local effects on "locally flat" Manifolds)
And this translates all the way up to how one would do the kind of simulations of aspects of physical worlds that happen in computer games inside GPU's, including the graphics parts.
And here I believe you may see how the circle is starting to close. Simulations and predictions of physical systems at many different levels of scale and abstraction tend to reduce to tensor multiplication of various sorts. While the classical physics one learns in high school tend to have problems solvable with calculus, even those are usually just solutions to problems that are fundamentally linear algebra locally.
While game developers and ML researchers initially didn't use the same kind of Group Theory machinery that Physics has adopted, at least the ML side seems to be going in that direction, based on texts such as:
(There appears to be a lot of similar findings over the last 5-6 years or so, that I wasn't fully aware of).
In the book above, the methodology used is basically identical to how theoretical physics approaches similar problems, at least for networks that describe physical reality (which CNNs tend to be good for).
And here is my own (current) hypothesis for why this also seems to be extendable to things like LLMs, which do not at face value appear like physics problems:
If we assume that the human brain evolved the ability to navigate the physical world BEFORE it developed language (should be quite obvious), it should follow that the type of compute fabric in the brain should start out as optimized for the former. In practice, that means that at the core, the neural network architecture of the brain should be good at doing operations similar to tensor products (or approximations of such).
And if we assume that this is true, it shouldn't be surprising that when we started to develop languages, those languages would take on a form that was suitable to be processed in compute fabric similar to what was already there. To a lesser extent, this could even partially explain why such networks can also produce symbolic math and even computer code.
Now what the brain does NOT seem to have evolved to do is what traditional Turing Machine computers are best at, namely a lot of very precise procedural calculations. That part is very hard for humans to learn to do well.
So in other words, the fact that physical systems seem to involve tensor products (without requiring accuracy) may be the explanation to why Neural Networks seem to have a large overlap with the human brain in terms of strengths and weaknesses.
My understanding (as a data engineer with an MSc in experimental particle physics a long time ago) is that the math representation is structurally relatively similar, with the exception that while ML tensors are discrete, QM tensors are multi-dimensional arrays locally but are defined as a field over continuous space.
Tensors in Physics are also subject to various "gauge" symmetries. That means that physical outcomes should not change if you rotate them in various ways. The most obvious is that you should be able to rotate or translate the space representation without changing the physics. (This leads to things like energy/momentum conservation).
The fundamental forces are consequences of some more abstract (at the surface) symmetries (U(1) x SU(2) x SU(3)). These are just constraints on the tensors, though. Maybe these constraints are in the same family as backprop, though I don't know how far that analogy goes.
In terms of representation, the spacetime part of physics tensors is also treated as continuous. That means that when, after doing all the matrix multiplication, you come to some aggregation step, you aggregate by integrating over spacetime instead of summing (you still sum over the discrete dimensions). When doing the computation in a computer, though, even integration reduces to summing unless you have an exact solution.
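A trivial numeric illustration of that last point (toy function and grid spacing chosen arbitrarily): on a grid, "integrate over space" becomes "sum over grid points times cell size".

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
f = x**2                   # some field amplitude over [0, 1)

# discrete stand-in for the integral of f dx over [0, 1]
integral = np.sum(f) * dx
print(round(integral, 3))  # → 0.333, i.e. ≈ 1/3
```

The analytic answer is exactly 1/3; the discrete sum converges to it as dx shrinks.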
In other words, it seems to me that what I originally replied to, namely the marvel about how much of ML is just linear algebra / matrix multiplication IS relatively analogous to how brute force numerical calculations over quantum fields would be done. (Theoretical Physicists generally want analytic solutions, though, so generally look for integrals that are analytically solvable).
Both domains have steps that are not just matrix multiplication. Specifically, Physics tend to need a sum/integral when there is an interaction or the wave function collapses (which may be the same thing). Though even sums can be expressed as dot products, I suppose.
As mentioned, Physics will try to solve a lot of the steps in calculations analytically. Often this involves decomposing integrals that cannot be solved into a sum of integrals where the lowest-order ones are solvable and also tend to carry most of the probability density. This is called perturbation theory and is what gives rise to Feynman diagrams.
One might say that for instance a convolution layer is a similar mechanic. While fully connected nets of similar depth MIGHT theoretically be able to find patterns that convolutions couldn't, they would require an impossibly large amount of compute to do so, and also make regularization harder.
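A back-of-the-envelope comparison (all sizes assumed purely for illustration) shows why the fully connected version is so much more expensive than the convolutional one:

```python
# Hypothetical sizes: map a 64x64 single-channel image to a 64x64 output.
h = w = 64

# Fully connected: every output pixel is wired to every input pixel,
# so the weight matrix alone has (h*w) x (h*w) entries.
dense_params = (h * w) * (h * w)

# Convolution: one shared 3x3 kernel slides over the whole image.
conv_params = 3 * 3

print(dense_params, conv_params)  # 16777216 vs 9
```

The convolution bakes in the assumption that the same local pattern matters everywhere (translation symmetry), which is exactly the kind of constraint the Group Theory treatment formalizes.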
Anyway, this may be a bit hand-wavy from someone who is a novice at both quantum field theory and neural nets. I'm sure there are others out there that know both fields much better than me.
Btw, while writing this, I found the following link that seems to take the analogy between quantum field theory and CNN nets quite far (I haven't had time to read it)
I browsed the linked book/article above a bit, and it's a really close analogy to how physics is presented.
That includes how it uses Group Theory (especially Lie Algebra) to describe symmetries, and to use that to explain why convolutional networks work as well as they do for problems like vision.
The notation (down to what latin and greek letters are used) makes it obvious that this was taken directly from Quantum Mechanics.
Is this a trick question? OpenAI blatantly used copyrighted works for commercial purposes without paying the IP owners, it would only be fair to have them publish the resulting code/weights/whatever without expecting compensation. (I don't want to publish it myself, of course, just transform it and sell the result as a service!)
I know this won't happen, of course, I am moreso hoping for laws to be updated to avoid similar kerfuffles in the future, as well as massive fines to act as a deterrent, but I don't dare to hope too much.
I was envisioning a future where we've done away with the notion of data ownership. In such a world the idea that we would:
> have all of OpenAI's data for free
Doesn't really fit. Perhaps OpenAI might successfully prevent us from accessing it, but it wouldn't be "theirs" and we couldn't "have" it.
I'm not sure what kind of conversations we will be having instead, but I expect they'll be more productive than worrying about ownership of something you can't touch.
So in that world you envision someone could hack into openai, then publish the weights and code. The hacker could be prosecuted for breaking into their system, but everyone else could now use the weights and code legally.
I think that would depend on whether OpenAI was justified in retaining and restricting access to that data in the first place. If they weren't, then maybe they get fined and the hacker gets a part of that fine (to encourage whistleblowers). I'm not interested in a system where there are no laws about data, I just think that modeling them after property law is a mistake.
I haven't exactly drafted this alternative set of laws, but I expect it would look something like this:
If the data is derived from sources that were made available to the public with the consent of its referents (and subject to whatever other regulation), then walling it off would be illegal. On the other hand, data regarding users' behavior would be illegal to share without the users' consent, and might even be illegal to retain without their consent.
If you want to profit from something derived from public data while keeping it private, perhaps that's ok but you have to register its existence and pay taxes on it as a data asset, much like we pay taxes on land. That way we can wield the tax code to encourage companies that operate in the clear. This category would probably resemble patent law quite a bit, except ownership doesn't come by default, you have to buy your property rights from the public (since by owning that thing, you're depriving the masses of access to it, and since the notion that it is a peg that fits in a property shaped hole is a fiction that requires some work on our part to maintain).
This is alleged, and it is very likely that claimants like the New York Times accidentally prompt-injected their own material to show the violation (not understanding how LLMs really work), clouded by the hope of a big payday rather than actual justice/fairness etc...
Anyways, the laws are mature enough for everyone to work this out in court. Maybe it comes out that they have a legitimate concern, but the way they presented their evidence so far in public has seriously been lacking.
Prompt injecting their own article would indeed be an incredible show of incompetence by the New York Times. I'm confident that they're not so dumb that they put their article in their prompt and were astonished when the reply could reproduce the prompt.
Rather, the actual culprit is almost certainly overfitting. The articles in question were pasted many times on different websites, showing up in the training data repeatedly. Enough of this leads to memorization.
They hired a third party to make the case, and we know nothing about that party except that they were lawyers. It is entirely possible, since this happened very early in the LLM game, that they didn’t realize how the tech worked, and fed it enough of their own article for the model to piece it back together. OpenAI talks about the challenge of overfitting, and how they work to avoid it.
The goal is to end up with a model capable of discovering all the knowledge on its own, not relying on what humans produced before. Human knowledge contains errors; I want the model to point out those errors and fix them.
The current state is a crutch at best to get over the current low capability of the models.
Or rather, I have an unending stream of callers with similar-sounding voices who all want to make chirpy persuasive arguments in favor of Mr Altman's interests.
These models literally need ALL data. The amount of work it would take just to account for all the copyrights, let alone negotiate and compensate the creators, would be infeasible.
I think it’s likely that the justice system will deem model training as fair use, provided that the models are not designed to exactly reproduce the training data as output.
I think you hit on an important point though: these models are a giant transfer of wealth from creators to consumers / users. Now anyone can acquire artist-grade art for any purpose, basically for free — that’s a huge boon for the consumer / user.
People all around the world are going to be enriched by these models. Anyone in the world will be able to have access to a tutor in their language who can teach them anything. Again, that is only possible because the models eat ALL the data.
Another important point: original artwork has been made almost completely obsolete by this technology. The deed is done, because even if you push it out 70 years, eventually all of the artwork that these models have been trained on will be public domain. So, 70 years from now (or whatever it is) the cat will be out of the bag AND free of copyright obligations, so 2-3 generations from now it will be impossible to make a living selling artwork. It’s done.
When something becomes obsolete, it’s a dead man walking. It will not survive, even if it may take a while for people to catch up. Like when the vacuum tube computer was invented, that was it for relay computers. Done. And when the transistor was invented, that was it for vacuum tube computers.
It’s just a matter of time before all of today’s data is public domain and the models just do what they do.
> The amount of work it would take just to account for all the copyrights, let alone negotiate and compensate the creators, would be infeasible.
Your argument is the same as Facebook saying “we can’t provide this service without invading your privacy” or another company saying “we can’t make this product without using cancerous materials”.
Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money and is unwilling to come up with solutions to problems caused by your business model.
This is bigger than the greed of any group of people. This is a technological sea change that is going to displace and obsolesce certain kinds of work no matter where the money goes. Even if open models win where no single entity or group makes a large pile of money, STILL the follow-on effects from wide access to models trained on all public data will unfold.
People who try to prevent models from training on all available data will simply lose to people who don’t, and eventually the maximally-trained models will proliferate. There’s no stopping it.
Assume a world where models proliferate that are trained on all publicly-accessible data. Whatever those models can do for free, humans will have a hard time charging money for.
That’s the sea change. Whoever happens to make money through that sea change is a sub-plot of the sea change, not the cause of it.
If you want to make money in this new environment, you basically have to produce or do things that models cannot. That’s the sink or swim line.
If most people start drowning then governments will be forced to tax whoever isn’t drowning and implement UBI.
>Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money
It used to be that property rights extended all the way to the sky. This understanding was updated with the advent of the airplane. Would a world where airlines need to negotiate with every land-owner their planes fly above be better than ours? Would commercial flight even be possible in such a world? Also, who is greediest in this scenario, the airline hoping to make a profit, or the land-owners hoping to make a profit?
Your comment seems unfair to me. We can say the exact same thing for the artist / IP creator:
Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money and is unwilling to come up with solutions to problems caused by your business model.
Once the IP is on the internet, you can't complain about a human or a machine learning from it. You made your IP available on the internet. Now, you can't stop humanity benefiting from it.
Talk about victim blaming. That’s not how intellectual property or copyright work. You’re conveniently ignoring all the paywalled and pirated content OpenAI trained on.
First, “Plaintiffs ACCUSE the generative AI company.” Let’s not assume OpenAI is guilty just yet. Second, assuming OpenAI didn’t access the books illegally, my point still remains. If you write a book, can you really complain about a human (or in my humble opinion, a machine) learning from it?
There's zero doubt that people will still create art. Almost no one will be paid to do it though (relative to our current situation where there are already far more unpaid artists than paid ones). We'll lose an immeasurable amount of amazing new art that "would have been" as a result, and in its place we'll get increasingly bland/derivative AI generated content.
Much of the art humans will create entirely for free in whatever spare time they can manage after their regular "for pay" work will be training data for future AI, but it will be extremely hard for humans to find as it will be drowned out by the endless stream of AI generated art that will also be the bulk of what AI finds and learns from.
AI will just be another tool that artists will use.
However the issue is that it will be much harder to make a career in the digital world from an artistic gift and personal style: one's style will not be unique for long as AI will quickly copy it and so make the original much less valuable.
AI will certainly be a tool that artists use, but non-artists will use it too so very few will ever have the need to pay an artist for their work. The only work artists are likely to get will be cleaning up AI output, and I doubt they'll find that to be very fulfilling or that it pays them well enough to make a living.
When it's harder to make a career in the digital world (where most of the art is), it's more likely that many artists will never get the opportunity to fully develop their artistic gifts and personal style at all.
If artists are lucky then maybe in a few generations with fewer new creative works being created, AI almost entirely training on AI generated art will mean that the output will only get more generic and simplistic over time. Perhaps some people will eventually pay humans again for art that's better quality and different.
The prevalence of these lines of thought makes me wonder if we'd see a similar backlash against Star-Trek-style food replicators. "Free food machines are being used by greedy corporations to put artisanal chefs out of business. We must outlaw the free food machines."
I'll gladly put money on music that a human has poured blood, sweat, tears and emotion into. Streaming has already killed profits from album sales so live gigs is where the money is at and I don't see how AI could replace that.
Lol, you really want content creators to aid AI in replacing them without any compensation? Would you also willingly train devs to do your job after you've been laid off, for free?
What nonsense. Just because doing the right thing is hard, or inconvenient doesn't mean you get to just ignore it. The only way I'd be ok with this is if literally the entire human population were equal shareholders. I suspect you wouldn't be ok with that little bit of communism.
There is no way on Earth that people playing by the existing rules of copyright law will be able to compete going forward.
You can bluster and scream and shout "Nonsense" all you want, but that's how it's going to be. Copyright is finished. When good models are illegal or unaffordable, only outlaws -- meaning hostile state-level actors with no allegiance to copyright law -- will have good models.
We might as well start thinking about how the new order is going to unfold, and how it can be shaped to improve all of our lives in the long run.
I think there’s no stopping this train. Whoever doesn’t train on all available data will simply not produce the models that people actually use, because there will be people out there who do train models on all available data. And as I said in another comment, after some number of decades all of the content that has been used to train current models will be in the public domain anyway. So it will only be a few generations before this whole discussion is moot and the models are out there that can do everything today’s models can, unencumbered by any copyright issues.

Digital content creation has been made mostly obsolete by generative AI, except for where consumers actively seek out human-made content because that’s their taste, or if there’s something humans can produce that models cannot. It’s just a matter of time before this all unfolds. So yes, anyone publishing digital media on the internet is contributing to the eventual collapse of people earning money to produce content that models can produce. It’s done. Even if copyright delays it by some decades, eventually all of today’s media will be public domain and THEN it will be done. There are 0 odds of any other outcome.
To your last point, I think the best case scenario is open source/weight models win so nobody owns them.
> We've designed society to give rewards to people who produce things of value
Is that really what copyright does though? I would be all for some arrangement to reward valuable contributions, but the way copyright goes about allocating that reward is by removing the right of everyone but the copyright holder to use information or share a cultural artifact. Making it illegal to, say, incorporate a bar you found inspiring into a song you make and share, or to tell and distribute stories about some characters that you connected with, is profoundly anti-human.
I'm shocked at how otherwise normally "progressive" folks or even so called "communists" will start to bend over for IP-laws the moment that they start to realize the implications of AI systems. Glad to know that accusations of the "gnulag" were unfounded I guess!
I now don't believe most "creative" types when they try to spout radical egalitarian ideologies. They don't mean it at all, and even my own family, who religiously watched radical techno-optimist shows like Star Trek, are now falling into the depths of Luddism and running into the arms of copyright trolls.
If you're egalitarian, it makes sense to protest when copyright is abolished only for the rich corporations but not for actual people, don't you think? Part of the injustice here is that you can't get access to windows source code, or you can't use Disney characters, or copy most copyrighted material... But OpenAI and github and whatnot can just siphon all data with impunity. Double standard.
Copyright has been abolished for the little guy. I’m talking about AI safety doomers who think huggingface and Civit.AI are somehow not the ultimate good guys in the AI world.
This is a foul mischaracterization of several different viewpoints. Being opposed to a century-long copyright period for Mickey Mouse does not invalidate support for the concept of IP in general, and for the legal system continuing to respect the licensing terms of very lenient licenses such as CC-BY-SA.