"you want to stack 6 blocks on one another? great, let me collect 1,000 examples of doing that in VR, and I'll train my policy on this and see how that works"
instead, we change the question:
"you want to stack 6 blocks on one another? great, that's one possible thing out of thousands you might want to do. so lets create a dataset of 1,000 examples of tuples: one 'query' demonstration, and a second demonstration as the target behavior to train the network on, when it sees the query. The training data is now 1,000 tuples of (query_demo, target_demo)), trained again with supervised learning."
Once this is trained, we can (in theory) sub in any arbitrary desired demonstration, and the network will "extract" what is intended, using the demonstration as a guide for what to imitate. It's a bit of a change of mindset, but a very powerful, much more general, and much more exciting one.
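Concretely, a minimal sketch of that setup (the data layout, the build_pairs helper, and the model(query_demo, obs) signature are all illustrative guesses, not the actual OpenAI code):

    import random

    # Assumed data layout: demos_by_task[task_id] is a list of demonstrations,
    # each demonstration a list of (observation, action) pairs.
    def build_pairs(demos_by_task, n_pairs=1000):
        """Sample (query_demo, target_demo) tuples where both demos solve the same task."""
        pairs = []
        tasks = list(demos_by_task)
        for _ in range(n_pairs):
            task = random.choice(tasks)
            query_demo, target_demo = random.sample(demos_by_task[task], 2)
            pairs.append((query_demo, target_demo))
        return pairs

    def train_epoch(model, pairs, optimizer, loss_fn):
        """Plain supervised learning: condition on the query demo, then predict
        the demonstrator's action at each step of the target demo."""
        for query_demo, target_demo in pairs:
            for obs, action in target_demo:
                pred = model(query_demo, obs)   # hypothetical forward signature
                loss = loss_fn(pred, action)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()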
If you want to contact someone, check their public profile on their website and see if they've said there's a preferred way (some people want everything sent to a certain email address, some want you to call them directly, some never want calls, some flat out tell you not to contact them, or, most commonly, they just say to email them and get to the point). Follow whatever they suggest.
Write something simple and clear, and be upfront about what you're asking for. Make it as easy as possible for the person to help you (this applies to both reading and answering the question). I'm far more likely to reply if I can open an email, type a sentence or two, and then move on.
With your message, I don't know if you're just after datasets, help with a particular problem, a mentor, a business partner, or what. I also don't know what area of audio/sound classification you mean, so even if I were actually in that area I wouldn't know right now whether I could help (whereas if you'd said human voices, bird chirps, etc., I'd have a better idea).
Essentially, assume most people are pleasant and helpful but also extremely busy.
To resolve this limitation, we either need to create deep networks that do not need big data sets, which does not seem possible right now, or we try to factorize the problem differently: for example, we separate learning the control of the robot (e.g. how to move something from point A to point B) from learning how to abstractly solve the problem (in this case, learning that you need to stack the blocks in a certain order).
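One way to picture that factorization, purely as an illustration (plan_stacking and move_block are made-up names, and the "abstract" layer here is hand-written rather than learned):

    # Illustrative split: the abstract layer decides the order of operations
    # symbolically; the control layer only knows how to move one object.
    def plan_stacking(blocks):
        """Abstract layer: decide the stacking order, biggest block first."""
        order = sorted(blocks, key=lambda b: b["size"], reverse=True)
        return [("place", order[i + 1]["id"], "on", order[i]["id"])
                for i in range(len(order) - 1)]

    def move_block(block_id, target_pose):
        """Control layer: move one thing from point A to point B.
        In practice this is the part you'd train on robot data."""
        print(f"moving {block_id} to {target_pose}")

    blocks = [{"id": "red", "size": 3}, {"id": "blue", "size": 2}, {"id": "green", "size": 1}]
    for _, block_id, _, support_id in plan_stacking(blocks):
        move_block(block_id, target_pose=f"top of {support_id}")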
"Herbert: grab my car keys."
"Herbert: set the dinner table."
"Herbert: put the mail in the mailbox."
Obviously, cost will be a big deal, but I think we'll get there.
Some videos from that competition:
Consequences will likely be as profound and unpredictable as the internet turned out to be.
Herbert: oh, touch me there <3
I'm using one algorithm? That's a rather controversial claim. And it's advanced? How so?
You can describe any collection of algorithms as a single, more complicated, one. And it seems reasonable to say that humans are quite advanced given we can't yet replicate some of their reasoning capabilities.
In order to get there, you need a robot that knows a lot of stuff.
I can see the case for familiarity, especially for robotic pets and attendants for children, but the human body and its limbs are the result of millions of years of evolution, in conditions that don't exist or apply anymore.
It had to adapt to being born naked and having to grow up, possibly on its own, and feeding itself and fending for itself...
I mean, consider wheels vs. legs. You don't see the former anywhere in nature, but it's the most efficient mode of locomotion in human society.
Why aren't we exploring more efficient robotic bodies that would also be easier to program?
Say, a rotating trunk-like body with multiple octopus-like tentacles with suckers instead of fingers. Just an idea. Or a robot that is actually a swarm of ant-like machines.
For example wheels: they're easier to do than legs (by far) but get stuck on cables, carpets and stairs.
We're still in the research phase, so polish isn't important (unless perhaps if giving a demo to an outside investor who isn't tech-savvy).
Doing that requires far more than simply being able to pick things up and put things down and walk around. To do that requires nearly human level AI that can "understand".
The post says that the imitation network takes the input from the vision network and processes it to infer the intent of the task. Isn't the "intent" always "to stack"? Or can the imitation also be just re-arranging blocks in another configuration?
This part is interesting, if I understood it well:
> "But how does the imitation network know how to generalize?
> The network learns this from the distribution of training examples. It is trained on dozens of different tasks with thousands of demonstrations for each task. Each training example is a pair of demonstrations that perform the same task. The network is given the entirety of the first demonstration and a single observation from the second demonstration. We then use supervised learning to predict what action the demonstrator took at that observation. In order to predict the action effectively, the robot must learn how to infer the relevant portion of the task from the first demonstration."
Does this mean that the imitation network has been trained on stacking, un-stacking, throwing...and other such tasks, and then it identifies that "stacking" is what is being done in order to imitate it?
Is there an ELI5 for what the 2 NNs are actually learning?
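Roughly: the vision network maps camera pixels to the positions of the blocks, and the imitation network maps (a full query demonstration, the current observation) to the next action. A minimal sketch of what those two pieces could look like, assuming a PyTorch-style setup (the layer sizes, obs_dim, and act_dim here are made up; the networks in the paper are more elaborate):

    import torch
    import torch.nn as nn

    class VisionNet(nn.Module):
        """Pixels in, block positions out (trained on randomized simulated images)."""
        def __init__(self, n_blocks=6):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.head = nn.Linear(32, n_blocks * 3)   # (x, y, z) per block

        def forward(self, image):
            return self.head(self.conv(image))

    class ImitationNet(nn.Module):
        """(whole query demo, one current observation) in, next action out."""
        def __init__(self, obs_dim=18, act_dim=7, hidden=128):
            super().__init__()
            self.demo_encoder = nn.GRU(obs_dim, hidden, batch_first=True)
            self.policy = nn.Sequential(
                nn.Linear(hidden + obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, act_dim))

        def forward(self, demo_obs_seq, current_obs):
            _, h = self.demo_encoder(demo_obs_seq)    # summarize the whole demo
            return self.policy(torch.cat([h[-1], current_obs], dim=-1))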
You can then add tons of randomization to the simulation to make sure it doesn't overfit to the particulars of the simulated data. Like random filters on the input, moving the camera around and vibrating it, making cars and pedestrians behave unrealistically erratically, etc., or having sensors fail. If it can learn to handle these extreme situations in the simulator, hopefully it will generalize even to rare scenarios that occur in real driving.
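That kind of randomization is cheap to add in code. A rough sketch of per-frame randomization, assuming frames are HxWx3 float images in [0, 1] (all the knobs and thresholds here are arbitrary):

    import numpy as np

    def randomize_frame(frame, rng):
        """Apply simple domain randomization to one simulated camera frame."""
        # random color / lighting shift
        frame = np.clip(frame * rng.uniform(0.6, 1.4, size=3), 0.0, 1.0)
        # camera jitter: shift the image a few pixels in x and y
        dy, dx = rng.integers(-4, 5, size=2)
        frame = np.roll(frame, shift=(dy, dx), axis=(0, 1))
        # sensor noise, plus an occasional dead row to mimic a failing sensor
        frame = frame + rng.normal(0.0, 0.02, size=frame.shape)
        if rng.random() < 0.05:
            frame[rng.integers(0, frame.shape[0])] = 0.0
        return np.clip(frame, 0.0, 1.0)

    rng = np.random.default_rng(0)
    augmented = randomize_frame(np.full((120, 160, 3), 0.5), rng)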
Humans interact in the VR env (with oculus/vive/etc) to train / configure the robots and assembly line.
Once complete, the factory line is built IRL.
The next step up in cheap arms is to build around Bioloid/Dynamixel style servos, which will increase the budget significantly, but you will end up with useable but coarse precision. Meaningful payloads will still cause servos to overheat but at least those brands will shut down rather than smoke -- usually.
If you want to do serious research you will need a serious budget. Arms are hard. This is not to say you can't have a lot of fun and learn a lot with something less. A friend built a smart task lamp around Bioloids -- the goal was a lamp that would autonomously aim a work light where he was soldering, by following the soldering iron tip using CV and a cheap web cam. This is totally within the payload and precision limits of hobby servos, and the software can run on a RasPi.
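The CV half of a project like that really is only a few dozen lines. A rough sketch with OpenCV, assuming the iron's tip shows up as the brightest spot in the frame and a hypothetical aim_lamp(x, y) that you'd replace with your pan/tilt servo commands:

    import cv2

    def find_bright_tip(frame):
        """Return the (x, y) of the brightest spot in the frame, or None."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (11, 11), 0)   # suppress single hot pixels
        _, max_val, _, max_loc = cv2.minMaxLoc(gray)
        return max_loc if max_val > 200 else None    # threshold is a guess

    def aim_lamp(x, y):
        """Placeholder: convert pixel coordinates into pan/tilt servo moves."""
        print(f"aim at pixel ({x}, {y})")

    cap = cv2.VideoCapture(0)                        # cheap USB webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        tip = find_bright_tip(frame)
        if tip:
            aim_lamp(*tip)
        cv2.imshow("debug", frame)                   # optional debug view
        if cv2.waitKey(30) == 27:                    # Esc to quit
            break
    cap.release()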
Now - granted, that's a lot of money for a small robot arm (actually, it isn't - price out a TeachMover or Rhino arm), but take a look at it - it's not just a robot arm.
Probably its best additional use is as a 3D printer - so if you have given thought to buying one of those, too - well, here you can have both.
You can also purchase one on Amazon, if you prefer that method. It was also reviewed in March in Servo Magazine:
Now granted, this arm isn't a kit - but usually, if you want a kit, you'll sacrifice precision, and if you want precision, you'll get it (usually) pre-built.
Note that the lower the mass at the end of the arm, the more precisely it will move (and the more it will be able to lift). If the servos or motors are mostly near the base of the arm, that's going to be the best placement; the popular "pallet stacking" style parallel arm kits you see sold out there typically have this arrangement, with only a servo on the end for the gripper (one of these actually isn't bad, if you pair it up with quality metal-gear dual-ball-bearing servos).
Another option would be to 3D print a robot arm; there are plenty of examples and files out there for that, if you already have a 3D printer or access to one.
You might look around on Ebay or similar for an older "vintage" desktop robot arm from the 1980s (an old TeachMover or similar arm is ideal), and try to repair or refurbish it (there are a few sellers that do this as well, and their work can be stellar - but you'll pay the price for it, too). For instance, I got an old Rhino XR-1 arm from Ebay for a couple hundred dollars - it's controlled using a simple serial port (RS-232) protocol. Most of those early arms work the same way.
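For a sense of scale, talking to one of those serial arms from Python is about this involved (the port, baud rate and command strings below are placeholders - the real command set is in the arm's manual, and the XR-1's will differ from what's shown):

    import serial  # pyserial

    # Settings typical of an early-80s controller; check your manual.
    arm = serial.Serial("/dev/ttyUSB0", baudrate=9600, timeout=1)

    def send(cmd):
        """Send one ASCII command line and read back the controller's reply."""
        arm.write((cmd + "\r").encode("ascii"))
        return arm.readline().decode("ascii", errors="replace").strip()

    print(send("HOME"))         # placeholder: home all joints
    print(send("MOVE B +100"))  # placeholder: step the base motor 100 counts
    arm.close()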
Another option might be to try to replicate one of those early robot arms, or build your own arm from parts - Pringles cans can make a great starting point!
Finally - and only do this if you are really serious - you can find on Ebay used industrial robot arms for sale; look for one that is "lab grade", as they'll usually be smaller in size and cleaner (most of the arms on there - while a great buy considering what they cost new - are so big you'll need a truck and pallet jack to move them, plus 440v to run 'em - not to mention all the hydraulic mess). The downside will probably be on the interfacing end; you might have to do something custom there (ie - rip out the old controllers and hack your own in place).
It's not easy to get a million training examples; if you can learn something useful from just one example, that's great.
If it can learn from 1, _you_ save a lot of effort, and the required expertise is lower.
Actually, for me the most interesting thing is that training on synthetic images transfers so well to real camera frames. That means that I might actually be able to do some cutting edge stuff without a huge curated dataset.
Can you out stack SHRDLU?
To take a Steve Jobs'ism, once tech advances sufficiently you have to introduce qualities that may have nothing to do with tech, to make things more pleasing to us humans.
Programming robots to stack blocks directly is pretty easy.
Didn't Deep Blue do the same thing as part of its chess training in the 1990s? Social/observational/modeling learning has been studied in psychological research for over 50 years: https://en.wikipedia.org/wiki/Social_learning_theory Isn't basically every recommender system, all of the vision systems, and the machine translation we all use... isn't all of that just observational learning?
I'm not saying it's not interesting. It's a cool toy. But Elon Musk gets an idea for a neat machine learning toy, using an idea that predates AI as a field, and Terry Winograd is supposed to be impressed by this?
The point is, AI has only been narrowly successful. Narrow success is rad. I'm not hating. But none of the promises of broad intelligence have really progressed. Siri is not a particularly meaningful advance beyond SHRDLU. Instead of just stacking blocks, she stacks text on Messages.app and pushes buttons in Weather.app.
More generally, it uses a deep neural network. Which is very different than the older approaches you mention, and can learn much more sophisticated functions. And has enabled a lot of results that would have been unimaginable previously.
As for the early AI researchers? They were insanely optimistic about how easy AI would be. It didn't seem like a hard problem. They thought they could just assign a bunch of graduate students to solve machine vision over the summer. It seems pretty simple. We can see great without even thinking about it, how hard can it be?
But after sinking their teeth into it a bit, I'm sure they would appreciate our modern methods. They might not be as elegant as they hoped. They wanted to find "top down" solutions to AI: simple algorithms based on symbolic manipulation and logic and reasoning. Such an algorithm probably doesn't exist, and an enormous amount of the history of AI was wasted searching for it.
And even if they did discover our modern methods much earlier, they wouldn't have been able to use them. It's only recently that computers have gotten fast enough to do anything interesting. It's like they were trying to go to the moon with nothing but hand tools. Sometimes you just need to wait for the tech tree to unlock the prerequisite branches first.
Obviously, it's not entirely equivalent to what's described in the article, but the point still stands.
We have had hundreds of thousands of software engineers working in advertising companies or making hardware that gets used to sell products and keep the attention of the youth perpetually captured, hindering their growth as humans. I say we've regressed in a big way and they would think the same.
Check out Alan Kay's talks from this week if you still feel good about today's software industry.
(I'm trying to watch all of his recent talks. Despite [or because of?] being critical of the industry, they are some of the most inspiring videos I've seen in years.)
Of course they imagined things in the abstract, just like we do.
Yeah, I saw it. Great videos.
Since both cooperated with DARPA at a high level they were aware of defense-related efforts which your typical AI grunt wouldn't necessarily see.
Also both are extraordinarily imaginative and bright, so it is more likely they could show you things they've done that you would not imagine.
That doesn't change that I think there are a bunch of things they didn't imagine, and I think they would both agree. (Minsky unfortunately can't anymore.)
And sure, they can probably see things I can't; it's not a competition.
In this case, the arm still has to be shown the specific task, but is able to generalize across different block locations, textures, lighting etc.
Make me a peanut butter and jelly sandwich.