That belief is wrong. Today's robots can't be made useful in everyday life no matter how advanced the software. The hardware is too inflexible, too unreliable, too fragile, too rigid, too heavy, too dangerous, too expensive, too slow.
In the past the software and hardware were equally bad, but today machine learning is advancing like crazy, while the hardware is improving at a snail's pace in comparison. Solving robotics is now a hardware problem, not a software problem. When the hardware is ready, the software will be comparatively easy to develop. Without the right hardware, you can't develop the appropriate software.
OpenAI is right to ignore robotics for now. It's a job for companies with a hardware focus, for at least the next decade.
"When the hardware is ready, the software will be comparatively easy to develop." I take it you've never written any software for a robot? The long tail of the real world takes years and years to handle. Probably the most advanced robotics company at the cutting edge of ML+robotics is Covariant, and their entire business model rests on an understanding that the long tail can and should be handled by humans.
I agree that OpenAI is right to cut out the hardware, but all your reasoning about why is wrong.
The reason, which they state, is that data collection on physical devices is slow, modifying those devices is slow, and maintaining them is expensive. You want to simulate everything, not because simulation reproduces the real world in high fidelity (that doesn't matter), but because it gives you approximations with sufficient variety and complexity that you can continually challenge your AI, and you can do all of that at 1M fps.
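The "1M fps" point is essentially about stepping many cheap, approximate environments in parallel, each with slightly different dynamics so the policy sees variety. A minimal numpy sketch of the idea (the point-mass environment and all names here are illustrative assumptions, not any real simulator's API):

```python
import numpy as np

def step_batch(pos, vel, actions, dt, drag):
    """Step N point-mass environments at once. Each env has its own
    drag coefficient (crude domain randomization), so the learner is
    continually challenged with varied dynamics."""
    vel = vel * (1.0 - drag[:, None] * dt) + actions * dt
    pos = pos + vel * dt
    return pos, vel

n_envs = 4096
rng = np.random.default_rng(0)
pos = np.zeros((n_envs, 2))
vel = np.zeros((n_envs, 2))
drag = rng.uniform(0.05, 0.5, size=n_envs)  # each env slightly different

# 100 steps x 4096 envs = 409,600 transitions from one tight loop,
# with no physical robot to break or maintain.
for _ in range(100):
    actions = rng.standard_normal((n_envs, 2))
    pos, vel = step_batch(pos, vel, actions, dt=0.01, drag=drag)

print(pos.shape)  # (4096, 2)
```

A real pipeline would swap the toy dynamics for a physics engine, but the structure (vectorized envs plus randomized parameters) is the same.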
Not for everyday tasks with anywhere near the efficiency, reliability, speed, and cost that humans have without robots. You can't waldo any robot to do the laundry or the cooking in a normal home anywhere near as well as a human can do it. (I'd love to see you try!)
Sure, you can make a robot that can do work humans can't. You can make a robot stronger, or more precise, or better suited to repetitive motion than a human. Those attributes are useful in specialized tasks. But generally not for the everyday tasks humans do today that we want robots to help us with. For everyday tasks you need a robot that is comparable in speed, efficiency, weight, reliability, durability, flexibility, sensor capability, and cost to a human. Not one of those areas, but all of them simultaneously. That's the hard part.
The "hardware lags behind" claim only makes sense against sci-fi expectations of robot agility, and even there the software isn't remotely close to fully exploiting the hardware we have. Even in real-world robotics applications TODAY, the statement falls flat against one simple demonstration:
Take an existing arm plus teleoperation and perform some set of tasks (it could be a mobile robot too, or a car for that matter). Now find software with the same versatility in task execution as the human operator.
Most software for simple robotic manipulation tasks loses out to a human operating the arm directly (except perhaps in efficiency), even in a static, controlled environment and even using the same control and perception system. Yet humans controlling these arms directly show that the hardware is capable enough to perform those tasks.
The "hardware lags behind" statement is, if anything, a convenient excuse from software/automation developers in robotics (I'm one myself): it shifts the blame to others and gives a false sense of high ground.
The need for lidar on early self-driving cars came from the same motivation: somehow the software couldn't just use cameras but needed an additional sixth sense that humans don't even need, and it still performed quite badly.
That doesn't mean OpenAI robotics leaving wasn't a good idea. It seems like it was but for other reasons.
Self-driving cars have been unable to succeed because they lack a broad understanding of what's happening on the road (i.e., "too many corner cases"). Self-driving cars would be a huge change, and their failure is very significant.
Waldo Farthingwaite-Jones was born a weakling, unable even to lift his head up to drink or to hold a spoon. Far from destroying him, this channeled his intellect, and his family's money, into the development of the device patented as "Waldo F. Jones' Synchronous Reduplicating Pantograph". Wearing a glove and harness, Waldo could control a much more powerful mechanical hand simply by moving his hand and fingers.
The world will be ready for the robotics revolution - creating immense demand for robotics software - when you can get a decent arm for the price of a fridge, not the price of a fancy car; just as we got the computer revolution not when we developed capable computers, but only after we developed affordable capable computers.
But the expense of such arms is a product of the lack of demand for them. Cars (again) show this: engines are now produced with astoundingly good accuracy.
Of course, an accurate arm is always going to be more expensive than a simple arm, but cars, chips, phones, and whatnot show that with large-scale processes and heavy capital investment, accuracy and cheapness are compatible. But today, with software incapable of doing useful things with those arms, the people and institutions with the capital to make accurate robot arms cheap, through economies of scale, are not going to mobilize that capital.
just as we got the computer revolution not when we developed capable computers, but only after we developed affordable capable computers
Some technological advances happen through a feedback loop of commodities getting sold and producers improving those commodities (cars are an example). Other technologies require a leap where a significant clump of capital has to be devoted to creating an advanced device for which there's no sellable (and sometimes not even operable) intermediate version (the biggest example of such a leap was the Manhattan Project). It might be the case that things will happen that way with robots. But I'd also say it's an open question.
Humans are excellent at handling the long tail when they're already handling the rest. Take driving. We're already seeing cars with substantial cognitive assistance, taking a more and more active role in 'easy' tasks. Think Tesla's Autopilot. You're supposed to be there and 'take over' in case the 'machine' fails to handle the 'long tail', or decides to hand you the responsibility for whatever happens next (because you trained it to do so).
Driving is a very complex task: you need training, experience, anticipation, and (very importantly) context. There's no easy way to scramble together all the details necessary for a decision in a human brain in the time available to take that decision 'correctly'. It's a similar problem for industrial automation, where you call in the 'long tail' person once in a while, and that person probably doesn't have the expertise to reconstruct the context after three rounds of turnover at your provider.
I think we're taking this problem the wrong way: aiming for the low-hanging fruit, then reaching higher and higher, while handwaving away the long tail and sending it over the fence to the human. We should be putting the human at the center of this, and extending their capabilities, reducing the repetitiveness, helping, not taking over.
The paper I like a lot on this is 'automation should be like Iron Man, not like Ultron'.
The former group often says that since the edge cases they can't automate constitute only <5% of the scenarios they encounter, they've automated 95% of the job. But by your argument, we can't really expect the calculus to work that way.
The average human spends most of their time barely engaged; our brains and bodies operate far below what we're capable of. The romanticised sci-fi vision of a world filled with intelligent robots performing every menial task for humans builds on the idea that humans have better things to do, but do we? We already have enough knowledge and resources to end world hunger and to bring a high standard of living to every human, but we choose not to: our problem is social, not software or hardware.
As an aside, I'd dispute the claim that hardware is lagging behind software: Tesla has lots of money and lots of smart people and they haven't been able to deliver self driving cars after more than a decade of promises (because of software).
You're absolutely wrong. Anyone with basic electronics knowledge and a few hundred bucks can build a passable robot body out of hobby grade servos and 3D printed parts. If you're willing to spend $10k+ you can make something quite capable.
Programming it to then actually do anything, let alone anything useful in the real world, is still out of reach for all but a tiny fraction of companies.
Hardware still has a long way to go before it's as capable as biological systems but it's usable. Real world AI is far from that in most areas.
And that will not even be taking into account the time-to-maintenance of such a system.
On the other hand, Boston Dynamics' manifold, through which they control dozens of hydraulic components, is an absolute marvel of technology that shows what you can achieve with 45 (?) years of dedicated focus.
You might be able to teleoperate their robot for something useful in a human environment, and I guess that would be a gamechanger. But even there I want to wait-and-see if they can escape the fate of many that came before them.
If it worked that way, my job would be much easier.
Later you will have a Spot robot chasing a person and getting them to stop, surrendering to the machine without being threatened by it, just by recognizing that there's no longer a point in running away.
Also, regular ML researchers sit at tables with laptops. Robotics people need electronics labs and electronics technicians, machine shops and machinists, test tracks and test track staff...
If you have to build stuff, and you're not in a place that builds stuff on a regular basis, it takes way too long to get stuff built.
I wonder why they don't invest in establishing competency in robotics. The potential return seems enormous, though their choices might signify that they disagree.
Or maybe they just aren't willing to leave their comfort zone. 'Software will eat the world' is a convenient idea for people who want to stay in that comfort zone.
Reinforcement learning can work quite well if you produce the hardware, so that your simulation model perfectly matches the real-world deployment system. On the other hand, training purely on virtual data has never really worked for us because the real world is always messier/dirtier than even your most realistic CGI simulations. And nobody wants an AI that cannot deal with everyday stuff like fog, water, shiny floors, rain, and dust.
In my opinion, most recent AI breakthroughs have come from restating the problem in a way that you can brute-force it with ever-increasing compute power and ever-larger data sets. "end to end trainable" is the magic keyword here. That means the keys to the future are in better data set creation. And the cheapest way to collect lots of data about how the world works is to send a robot and let it play, just like how kids learn.
Given that, unless they want to commercialise fruit picking or warehouse robots, it seems sensible.
One of the reasons ML-based AI is pretty dumb still is possibly that this autonomous exploration side of AI is largely ignored.
It all seems to tie back into what Judea Pearl talks about in "The Book of Why" (how you can't model intelligence without modelling the learning of causal inference) or what Jeff Hawkins explores with his "reference frames of reference frames of the world" Thousand Brains theory.
How successful do you think attempts to monetize this will be? Apart from Kiva at Amazon, I'm not even sure most shelf-moving robots are profitable enterprises (GreyOrange, Berkshire Grey, etcetera). I'm very skeptical of more general-purpose warehouse robots such as you see from Covariant, Fetch, etcetera. I don't really know much about fruit picking, other than grokking how hard it would be and how little it would pay.
To be clear, I'm not saying these companies make no money or have no customers. But it's not clear to me that any of them are profitable or likely will be soon, and robots are very expensive. I'm happy to learn why I'm wrong and these companies/technologies are further ahead than I realize.
It seems madisonmay didn't read the article either, or they would have known that the podcast they were referring to was the exact source used by the article.
I think OpenAI has progressively narrowed down its core competency - for a company like 3M it would be something like "applying coatings to substrates", and for OpenAI it's more like "applying transformers to different domains".
It seems like most of their high-impact stuff is basically a big transformer: GPT-x, Copilot, Image GPT, DALL-E, CLIP, Jukebox, MuseNet.
Their RL and GAN/diffusion stuff bucks the trend, but I'm sure we'll see transformers show up in those domains as well.
Not to mention a bunch of relatively inexpensive reinforcement learning research relying on consumer knockoffs of Spot from Boston Dynamics...
Really does seem like they are following the money and while there's nothing wrong with that it's also nothing like their original mission.
The VC community is in denial about how much Go resembled a problem purpose built to be solved by deep neural networks.
The achievements of RL are so dramatically oversold that it can probably be called the new snake oil.
I would say that it is likely, intuitively, that these systems were trained through something that looks much like RL over millions of years of evolution. But that process is obviously not repeated in each individual organism, which is born largely pre-trained.
And if there's any doubt, the poverty-of-the-stimulus argument should put it to rest, especially when looking at organisms simpler than vertebrates, which can go from egg to functional sensing, moving, eating, and predator avoidance in a matter of minutes or hours.
None. The prevalent ideas in ML are a) "training" a model via supervised learning b) optimizing model parameters via function minimization/backpropagation/delta rule.
There is no evidence for trial-and-error iterative optimization in natural cognition. If you tried to map it to cognition research, the closest thing would be the behaviorist theories of B.F. Skinner from the 1930s. These theories of 'reward and punishment' as a primary mechanism of learning have long been discredited in cognitive psychology. It's a black-box, backwards-looking view that disregards the complexity of the problem (the most thorough and influential critique of this approach was by Chomsky back in the 50s).
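For readers unfamiliar with the term, the delta rule mentioned above is just error-driven iterative weight adjustment, which is exactly the kind of trial-and-error optimization being criticized here. A minimal sketch (toy target function and learning rate are my own choices):

```python
import numpy as np

# Delta rule: nudge the weights in proportion to the prediction error.
# A single linear unit learns the target y = 2*x1 - 3*x2 from examples.
rng = np.random.default_rng(0)
w = np.zeros(2)
lr = 0.1
for _ in range(1000):
    x = rng.standard_normal(2)
    y = 2.0 * x[0] - 3.0 * x[1]   # target output
    y_hat = w @ x                 # current prediction
    w += lr * (y - y_hat) * x     # the delta rule update
print(w)  # close to [2, -3]
```

Backpropagation generalizes this same update to multi-layer networks; the point of the criticism is that nothing like this incremental error-nudging has been observed as the primary learning mechanism in natural cognition.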
The ANN model that goes back to the McCulloch & Pitts paper is based on the neurophysiological evidence available in 1943. The ML community largely ignores the fundamental neuroscience findings discovered since (for a good overview see https://www.amazon.com/Brain-Computations-Edmund-T-Rolls/dp/... )
I don't know if it has to do with arrogance or ignorance (or both), but the way "AI" is currently developed is by inventing arbitrary model contraptions with complete disregard for the constraints and inner workings of living intelligent systems, basically throwing things at the wall until something sticks, instead of learning from nature like, say, physics does. Saying "but we don't know much about the brain" is just being lazy.
The best description of biological constraints from computer science perspective is in Leslie Valiant work on "neuroidal model" and his book "circuits of the mind" (He is also the author of PAC learning theory influential in ML theorist circles) https://web.stanford.edu/class/cs379c/archive/2012/suggested... , https://www.amazon.com/Circuits-Mind-Leslie-G-Valiant/dp/019...
If you're really interested in intelligence, I'd suggest starting with the representation of time and space in the hippocampus via place cells, grid cells, and time cells, which form a sort of coordinate system for navigation in both real and abstract/conceptual spaces. This will likely have the same importance for actual AI as the Cartesian coordinate system has in other hard sciences. See https://www.biorxiv.org/content/10.1101/2021.02.25.432776v1
Also see research on temporal synchronization via "phase precession" as a hint at how lower-level computational primitives work in the brain https://www.sciencedirect.com/science/article/abs/pii/S00928...
And generally, look into memory research in cogsci and neuro; learning and memory are highly intertwined in natural cognition, and you can't really talk about learning before understanding lower-level memory organization, formation, and representational "data structures". Here are a few good memory labs to seed your firehose
There is a niche of people trying to combine cognitive mapping with RL, or indeed arguing that old RL methods are actually implemented in the brain. But it looks like they don't have much benefit to show for it in applications. They seem to have no shortage of labor or collaborators at their disposal to build and test models. It certainly must be immensely simpler than rat experiments.
Having said that, yes, I do believe that progress can come from considering how nature accomplishes the solution and what major components we are still missing. But tacking them on in a common-sense-driven way has certainly been tried.
The missing component is memory. Once models have memory at runtime (once we get rid of the training/inference separation), they'll be much more useful.
I genuinely believe that how we as a society act once human labour is replaced is the first aspect of the great filter.
But so many of the little problems have been solved. Batteries are much better. Radio data links are totally solved. Cameras are small and cheap. 3-phase brushless motors are small and somewhat cheap. Power electronics for 3-phase brushless motors is cheap. 3D printing for making parts is cheap.
I used to work on this stuff in the 1990s. All those things were problems back then. Way too much time spent on low-level mechanics.
You can now get a good legged dog-type robot for US$12K, and a good robot arm for US$4K. This is progress.
I'd just note that "decades away" means "an unforeseeable number of true advances away" - which could mean ten years or could mean centuries.
And private companies can't throw money indefinitely at problems others have been trying to solve and failing at. They can do it once in a while, but that's it.
We have been at this since at least the dawn of the industrial revolution and do not have it right yet.
Backing off and taking it slow now to let some cultural adjustments happen is a responsible step.
My cultural norms are repulsed by the thought of my not working as much as possible; it is how I expect my value to society to be gauged (and rewarded).
This line of reasoning will be (is) obsolete and we need another in its place globally.
I hope some have better ideas of what these new cultural norms should look like than I do, with my overly traditional indoctrination.
I only know what I will not have it look like: humanity as vassals of non-corporeal entities or elites.
That hasn't stopped the march of progress so far. Conveniently (or not), humanoid robots do not appear likely for the foreseeable future. But keep worrying: the problems you list are appearing in other fashions anyway.
The ability to train huge models does not belong to a single entity, and many of these models get shared with everyone. So you can right now type "import transformers" and have thousands of trained models at your fingertips. All these toys are ours (thanks to important work done for free by some of us); we just need imagination to use them.
Humans ARE general bipedal robots. The price of these robots is determined by the minimum wage.
Robotics research is going to be extremely binary. It's expensive and frustrating, and there's little use for it until it works as well as human labor, which is a high bar.
But once that Rubicon is crossed, I believe there will be a sort of singularity in that space. It's related to, but somewhat orthogonal to, the singularity that's prognosticated for AGI.
No need for bipeds: car factories employ dumb robot arms, no humans needed. Not very general-purpose robots, though.
The first country/company to create robots that can be instructed like humans to do any job will indeed reap great benefits, but how long until that happens? Not within any amount of time that an investor wants to see. I'm unsure if I will ever see it in my lifetime (counting on ~60 years to go still, maybe?)
If suddenly robot manipulators could grasp any object, operate any knob/switch, tie knots, manipulate cloth, with the same manipulator, on first sight, that would be quite a feat.
But then there's still task planning which is a very different topic. And ... and .... So much still to develop for generally useful robots.
Just getting it to navigate itself using vision would mean building a complex system with a lot of pieces (beyond the most basic demo anyway). You need separate neural nets doing all kinds of different tasks and you need a massive training system for it all. You can see how much work Tesla has had to do to get a robot to safely drive on public roads. 
From where I am sitting now, I think we are making good inroads on something like an "Imagenet moment" for robots. (Well, I should note that I am a robotics engineer but I mostly work on driver level software and hardware, not AI. Though I follow the research from the outside.)
It seems like a combination of transformers plus scale plus cross-domain reasoning like CLIP could begin to build a system that could mimic humans. But as good as transformers are, we still haven't solved how to get them to learn for themselves, and that's probably a hard requirement for really being useful in the real world. Good work in RL is happening there, though.
Gosh, yeah, this is gonna take decades lol. Maybe we will have a spark that unites all this in one efficient system. Improving transformer efficiency and achieving big jumps in scale are a combo that will probably get interesting stuff solved. All the groundwork is a real slog.
RL, which I think this particular story is about, is an odd-duck. I have papers on this and I personally have mixed feelings. I am a very applications/solutions-oriented researcher and I am a bit skeptical about how pragmatic the state of the field is (e.g. reward function specification). The argument made by the OpenAI founder on RL not being amenable to taking advantage of large datasets is a pretty valid point.
Finally, you raise interesting points on running multiple complex DNNs. Have you tried hooking things to ROS and using that as a scaffolding? (I'm not a robotics guy, I just dabble in it as a hobby, so I'm curious what the solutions are.) Google has something called MediaPipe, which is intriguing but maybe not what you need. I've seen some NVIDIA frameworks, but they basically do pub-sub in a suboptimal way. Curious what your thoughts are on what makes existing solutions insufficient (I feel they are, too!)
Yes unless the industry sees value in a step change in the scale on offer to regular devs, progress on massive nets will be slow.
Hooking things together is pretty much my job. I have used ROS extensively in the past but now I just hook things together using python.
But I consider what Tesla is doing to be pretty promising: they are layering neural nets together, where the outputs of three special-purpose networks feed into one big one, etc. They call that a HydraNet. No framework like ROS is required, because each net was trained in situ with the other nets on the outputs of those nets, so I believe all compute logic is handled within the neural network processor (at some point standard logic is integrated too, but a lot happens before that). Definitely watch some Karpathy talks on that.
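The composition pattern described here (special-purpose networks whose outputs feed a bigger downstream network, all in one process with no IPC) can be sketched with plain numpy. To be clear, this is a toy illustration of the wiring, not Tesla's actual architecture; all shapes and names are made up:

```python
import numpy as np

def mlp(x, w1, w2):
    """Tiny two-layer net: linear, ReLU, linear."""
    return np.maximum(x @ w1, 0.0) @ w2

rng = np.random.default_rng(0)
img_feat = rng.standard_normal(64)   # stand-in for shared camera features

# Three special-purpose "heads" consume the same input features...
heads = [(rng.standard_normal((64, 32)), rng.standard_normal((32, 8)))
         for _ in range(3)]
head_outs = [mlp(img_feat, w1, w2) for w1, w2 in heads]

# ...and their concatenated outputs feed one bigger downstream net,
# so the whole pipeline is a single in-process forward pass.
trunk_in = np.concatenate(head_outs)        # shape (24,)
w1 = rng.standard_normal((24, 16))
w2 = rng.standard_normal((16, 4))
out = mlp(trunk_in, w1, w2)                 # e.g. 4 control outputs
print(out.shape)  # (4,)
```

Because every stage is just a tensor op, there is no serialization or message-passing overhead between "networks", which is the advantage over stitching standalone models together with IPC.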
And currently I am simply not skilled enough to compose multiple networks like that. So I could use multiple standalone networks, process them separately, and link them together using IPC of some kind, but it would be very slow compared to what's possible. That's why I say we're "not there yet". Something like Tesla's system available as an open source project would be a boon, but the method is still very labor intensive compared to a self-learning system. It does have the advantage of being modular and testable though.
I probably will hand-compose a few networks (using IPC) eventually. Right now I am working on two networks: an RL-trained trail-following network trained in simulation on segmentation-like data (perhaps using DreamerV2), and a semantic segmentation net trained on my hand-labeled dataset with "trail/not-trail" segmentation. So far my segmentation net works okay. And a first step will actually be to hand-write an algorithm to go from segmentation data to steering. My simulation stuff is almost working. I built up a training environment using the Godot video game engine and hacked the shared-memory neural net training add-on to accept image data, but when I run the sim in training on DreamerV2, something in the shared-memory interface crashes and I have not resolved it.
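A hand-written segmentation-to-steering step like the one described could be as simple as steering toward the horizontal centroid of the "trail" pixels. A toy sketch, with the mask layout, gain, and sign convention all being my own assumptions:

```python
import numpy as np

def steering_from_mask(mask, gain=1.0):
    """mask: (H, W) boolean array, True where a pixel is labeled 'trail'.
    Returns a steering command in [-1, 1]: negative = left, positive =
    right, steering toward the horizontal centroid of the trail."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return 0.0                          # no trail visible: go straight
    h, w = mask.shape
    half_width = (w - 1) / 2
    offset = (xs.mean() - half_width) / half_width
    return float(np.clip(gain * offset, -1.0, 1.0))

# Trail entirely on the right side of a small mask -> steer right
mask = np.zeros((4, 8), dtype=bool)
mask[:, 6] = True
print(steering_from_mask(mask))  # positive (steer right)
```

In practice you would probably weight lower image rows more heavily (nearer trail matters more) and low-pass filter the command, but a centroid rule like this is a reasonable baseline before the RL policy takes over.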
But all of this is a hobby and I have a huge work project  I am managing myself that is important to me, so the self driving off road stuff has been on pause. But I don't stress about it too much because the longer I wait, the better my options get on the neural network side. Currently my off road rover is getting some mechanical repairs, but I do want to bring it back up soon.
Thx for the pointers on Tesla. I had not seen the HydraNet stuff. There was a Karpathy talk about two weeks back at a CVPR workshop where he revealed the scale of Tesla's current-generation deep learning cluster. It is insane! Despite being in industrial research, I don't foresee ever being able to touch a cluster like that.
A lot of our current research involves end-to-end training (some complex stuff with transformers and other networks stitched together). There was a CVPR tutorial on autonomous driving where they pretty much said autonomy 2.0 is all about end-to-end. I've spoken to a few people who actually do commercial autonomy, and they seemed more skeptical about whether end-to-end is the answer in the near term.
One idea we toy with is to use existing frozen architectures (OpenAI releases some and so do other big players) and do a small bit of fine-tuning.
EDIT: Imagine the "credit unions" I mention in the following linked comment, but holding homes and manufacturing space to be used by members.
I don't work for OpenAI, but I would guess they are going to keep working on RL (e.g., hide-and-seek, Gym, DotA-style research) to push the algorithmic SotA. But translating that into a physical robot interacting with the physical world is extremely difficult and a ways away.
Given their mention that they can shift their focus to domains with extensive data from which they can build models of action, why not try the following (if easily possible)?
Take all the objects in the various 3D warehouses (Thingiverse and all the other 3D modeling repos out there) and build a system whereby an OpenAI 'Robotics' platform can virtually learn to manipulate and control a 3D model (SolidWorks/Blender/whatever) and learn how to operate it.
It would be amazing to have an AI robotics platform where you feed it various 3D files of real/planned/designed machines and have it understand the actual composition of the components involved, then learn their range-of-motion limits, servo inputs, etc., and then learn to drive the device.
Then give it various other machines that share component types, built into any multitude of devices, and have it evaluate each model for familiar gears, worm screws, servos, motors, etc., and figure out how to output the controller code to run an actual physically built device.
Let it go through thousands of 3D models and build a library of common code that can be used to run those components when found in any design.
Then couple that code with Copilot and allow people to have a codebase for controlling such devices based on what OpenAI has already learned, since Copilot is already built through a partnership with OpenAI.
Are there any Open* organizations for robotics that could perhaps fill the void here? I think robotics is really important, and I think the software is a big deal too, but it's important that actual physical trials of these AIs are pursued. I would think that seeing something operate in real space offers unparalleled insight for expert observers.
I remember the first time I ever orchestrated a DB failover routine; my boss took me into the server room when it was scheduled on the testing cluster. Hearing all the machines spin up and the hard drives start humming was a powerful and visceral moment for me, and it really crystallized what seemed important about my job.
I think even if you have intuitions about an approach, and have promising results, if you're trying to arrive at something new, it's really hard to know how far away you are.