
Open Problems in Robotics - haltingproblem
https://scottlocklin.wordpress.com/2020/07/29/open-problems-in-robotics/
======
Animats
\- Motion planning: already discussed.

\- Multiaxis singularities: much less of a problem than it used to be. We
don't need closed-form solutions any more; we have enough CPU power at the
robot to deal with this. You need some additional constraint, like "minimize
jerk" when you have too many degrees of freedom. (A toy numerical sketch
follows at the end of this comment.)

\- Simultaneous Location and Mapping. SLAM for short: Getting much better.
Things which explore and return a map are fairly common now. LIDAR helps. So
does having a heading gyro with a low drift rate. Available commercially on
vacuum cleaners.

\- Lost Robot Problem: hard, but in practical situations, markers of some
kind, visual or RF, help.

\- Object manipulation and haptic feedback: Look up the DARPA manipulation
challenge. Getting a key into a lock is still a hard problem. It's
embarrassing how bad this is. Part of the problem is that force-sensing wrists
are still far too expensive for no good reason. I once built one out of a 6DOF
mouse, which is just a spring-loaded thing with optical sensors. Something I
was fooling around with before TechShop went down. I have a little robot arm
with an end wrench and a force sensing wrist. The idea was to get it to put
the wrench around the bolt head by feel. Nice problem because a 6DOF force
sensor gives you all the info you can get while holding an end wrench.

\- Depth estimation: LIDAR helps. The second Kinect was a boon to robotics.
Low-cost 3D LIDAR units are still rare. Amusingly, depth from movement
progressed because of the desire to make 3D movies from 2D movies. (Are there
still 3D movies being made?)

\- Position estimation of moving objects: the military spends a lot of time on
this, but doesn't publish much. Behind the staring sensor of a modern all-
aspect air to air missile is - something that solves that problem.

\- Affordance discovery: Very little progress.

The real problem: solve any of these problems, make very little money. If
you're qualified to work on any of these problems, you can go to Google,
Apple, etc. and make a lot of money solving easier problems.
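
Re the multiaxis-singularities bullet above: a toy numerical sketch of the
kind of workaround "enough CPU power" buys you. Damped least-squares is one
common approach; the 2-link arm, damping value, and numbers here are invented
purely for illustration, not how any particular robot does it.

    import numpy as np

    def jacobian(q, l1=1.0, l2=1.0):
        # Jacobian of a toy 2-link planar arm (stand-in for a real multiaxis manipulator).
        s1, c1 = np.sin(q[0]), np.cos(q[0])
        s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
        return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                         [ l1 * c1 + l2 * c12,  l2 * c12]])

    def dls_step(J, dx, damping=0.1):
        # Damped least-squares joint step: stays bounded even at a singularity.
        return J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(len(dx)), dx)

    q = np.array([0.3, 1e-6])       # arm nearly fully stretched -> near-singular Jacobian
    dx = np.array([0.0, 0.01])      # small desired Cartesian move
    J = jacobian(q)
    print(np.linalg.solve(J, dx))   # naive inverse kinematics: enormous joint step
    print(dls_step(J, dx))          # damped step: small and well-behaved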

~~~
deepGem
"The real problem: solve any of these problems, make very little money" \-
Just curious, why have you come to this conclusion?

Object manipulation has potential products in dishwashing and vegetable
chopping - sufficiently large markets, potential billion $ outcomes for a
startup which takes the early mover lead. Two robotic hands that can work in
co-ordination just as human hands do. Extremely difficult to solve, but money
is there.

~~~
NalNezumi
I'm not the OP, but the real issue in robotics is two-fold, and he mentions
both.

1\. Cost & technical limitations. Some of the solutions the OP suggests per
problem involve "LIDAR, vision and sensors", and all of those things cost a
lot by themselves. On top of that you need good actuators (harmonic drives
etc.) for precision, and then a good computing unit to handle all of it. Now
you also need a bigger power supply, and if it is mobile it needs a battery.
Now your robot has a price tag only Arab princes and well-funded academic
labs are interested in. And we haven't even touched the cost of the
try-and-fail iterative engineering process needed to make these work. In that
process most companies/labs realize the problem/conditions have to be
severely limited (run time, indoor/outdoor, general applicability vs. made
for one and only one application).

2\. Human resources. Really, there's not much money in robotics as a robotics
engineer. A robotics software engineer can get better working conditions,
security, salary & fulfillment at a software company. Mechanical, EE,
embedded, etc. are in a similar situation. Most robotics people I know are in
it for the passion. Application-specific development (which is the norm right
now) also requires very niche knowledge that is hard to find.

~~~
bluGill
The points you make are all true until they are not. Many useful robots should
be able to work well enough for the purpose at a reasonable cost. Part of that
is mass production brings prices down.

The problem, of course, is that until we can solve the problem of making it
work at all, price isn't a consideration. You can go bankrupt advancing the
state of the art just enough for someone else to take your work (repeat the
bankruptcy part many times until someone works it out), until finally someone
makes a ton of money with their useful robot.

~~~
jbay808
Servos are mass produced but still expensive!

You're right, but also software is weird. The world is absolutely littered
with good computers that other people have already paid for, just waiting to
be put to use for your project to expand their capabilities. If every new web
app required its own little pocket computer to run, that you had to convince
your customers to buy and keep in their home, it would be much less profitable
to develop web apps.

The world is not littered with idle, high-performance robots waiting around
for your motion planning algorithm to kick them into action. That makes it a
lot more expensive and a lot less profitable.

------
cpgxiii
In the area of Motion Planning (my own area of research), the most that can be
said is that _practical_ solutions exist for a tiny subset of cases,
_workable_ methods exist for a larger subset, _expensive_ methods exist for a
still larger subset, and everything else might as well be impossible.

\- If you've got a low-dimensional problem, say 2D or 3D, without uncertainty
(or at least bounded enough to pad obstacles and ignore it), search-based
planners like A* and its derivatives _work_ (a minimal grid example follows
at the end of this comment). Add uncertainty, complex non-holonomic
constraints, limited horizons, etc. and it becomes much harder.

\- If you've got a higher dimensional problem, say a 6/7-DoF arm, even multi-
armed or humanoid robots, and you don't have uncertainty and dynamics (or can
ignore them), sampling-based planners like RRTs and PRMs and their derivatives
will often practically work. Actually useful guarantees of finding a solution
or trying to find an _optimal_ solution in useful time are still very much
unsolved.

\- If your problem is basically open, and something approximating the
"straight-line" path from A to B is in the same local minimum as a solution,
trajectory optimization will work for a lot of problems. Motion planning is
very much non-convex, though, so it's very easy to go from a problem solvable
with optimization to a problem that isn't.

\- Planning with non-rigid objects, significant uncertainty (effectively
continuous MDPs or POMDPs), and/or complex dynamics are all very unsolved
problems.

For motion planning problems in the gray area of "practically solvable", the
art is figuring out how to simplify the problem as much as possible to make it
tractable - highly optimized collision checking (generally speaking the
performance bottleneck), combining search/sampling-based + optimization
methods to get an initial solution from a global method and then refining it
towards a local minimum with optimization, or using special hardware or
sensors to bound dynamics and uncertainty so they can be ignored.
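
To make the first bullet above concrete, here is a minimal, illustrative grid
example of search-based planning. The grid, start, and goal are invented for
the example; real planners add motion costs, kinematic constraints, and far
better heuristics.

    import heapq

    def astar(grid, start, goal):
        # 4-connected A* on a 0/1 occupancy grid; returns a list of cells or None.
        rows, cols = len(grid), len(grid[0])
        h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])   # Manhattan heuristic
        open_set = [(h(start), start)]
        g = {start: 0}
        came_from = {}
        while open_set:
            _, cell = heapq.heappop(open_set)
            if cell == goal:                      # reconstruct the path back to start
                path = [cell]
                while path[-1] in came_from:
                    path.append(came_from[path[-1]])
                return path[::-1]
            r, c = cell
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nb[0] < rows and 0 <= nb[1] < cols and grid[nb[0]][nb[1]] == 0:
                    ng = g[cell] + 1
                    if ng < g.get(nb, float("inf")):
                        g[nb] = ng
                        came_from[nb] = cell
                        heapq.heappush(open_set, (ng + h(nb), nb))
        return None

    grid = [[0, 0, 0, 0],
            [1, 1, 1, 0],
            [0, 0, 0, 0]]
    print(astar(grid, (0, 0), (2, 0)))   # shortest path around the wall of 1s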

~~~
alwahi
Sorry for sidetracking your answer, but what does convex actually mean in the
context of optimization? I remember looking at a book called Convex
Optimization.

Your statement that > Motion planning is very much non-convex,

suggests to me that you are very much talking about the same thing. I
understand convexity as in a shape. Why is convex good and concave bad in
terms of optimization?

I don't want you to dumb down the answer too much as I am a trained Mechanical
Engineer but then my major isn't math. Hope you understand :)

~~~
YeGoblynQueenne
Convex/ non-convex optimisation refers to the shape of the error function
we're trying to optimise. In convex optimisation we can assume it's, well,
convex:

    
    
      .               . ε
       \             /
        \           /
         \         /
          \       /
           '._ _.'
    
              ^
        Global optimum
    

In non-convex optimisation we can't make any assumption about the shape of the
error function:

    
    
          ,--.            .--.                               ε
         /    \          /    \              ,--.
        /      \        /      \            /    \        /
      .'        \      /        \          /      \      /
                 \    /          \        /        \    /
                  `--'            \      /          `--'
                   ^               \    /            ^
                   |                `--'             |
                   |                 ^               |
                   |                 |               |
                   |                 |               |
                   `--------- Local optima ----------'
    
    

"Optimisation" means that we're trying to find an optimum of a function - a
maximum or a minimum. We're usually interested in the minimum of a function,
particularly a function that represents the error of an approximator on a set
of training data. Generally we prefer to find a _global_ minimum of the error
function because then we can expect the resulting approximator to generalise
better to data that was not available during training.

If the error function has a convex shape we're basically guaranteed to find
its global minimum. In non-convex optimisation, we're guaranteed to get stuck
in local minima.

(Ok, the above is a bit tongue in cheek, there's no _guarantee_ of getting
stuck in local minima, but it's very likely).
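
A tiny self-contained sketch of the same point; the functions and step size
are made up for illustration. On the convex function, gradient descent lands
on the single global minimum from any starting point; on the non-convex one,
where it ends up depends on where it starts.

    def descend(grad, x, lr=0.01, steps=2000):
        # Plain gradient descent; returns the point it settles on.
        for _ in range(steps):
            x -= lr * grad(x)
        return x

    # Convex: f(x) = (x - 1)^2 has a single (global) minimum at x = 1.
    convex_grad = lambda x: 2 * (x - 1)
    print(descend(convex_grad, 5.0), descend(convex_grad, -5.0))   # both ~1.0

    # Non-convex: f(x) = x^4 - 3x^2 + x has a global minimum near x ~ -1.30
    # and a local minimum near x ~ 1.13; the result depends on the start point.
    nonconvex_grad = lambda x: 4 * x**3 - 6 * x + 1
    print(descend(nonconvex_grad, -2.0))   # ~ -1.30 (global minimum)
    print(descend(nonconvex_grad, 2.0))    # ~  1.13 (stuck in the local minimum)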

~~~
dorgo
>Generally we prefer to find a _global_ minimum of the error function because
then we can expect the resulting approximator to generalise better to data
that was not available during training.

Sorry to nitpick, but is this true? We are doing optimization here and a
global minimum is just a better solution than a non-global minimum. Is there a
connection to generalisation here?

~~~
YeGoblynQueenne
It's like cpgxiii says. You're right to nitpick though, because there are no
certainties. We optimise on a set of data sampled from a distribution that is
probably not the real distribution, so there's some amount of sampling error.
Even if we find the global optimum of our sampled data, there's no reason why
it's going to be close to the global optimum of our testing data.

But - there are some guarantees. Under PAC-Learning assumptions we can place
an upper bound on the expected error as a function of the number of training
examples and the size of the hypothesis space (the set of possible models).
The maths is in a paper called Occam's Razor:
[https://www.sciencedirect.com/science/article/abs/pii/002001...](https://www.sciencedirect.com/science/article/abs/pii/0020019087901141)
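
For a concrete flavour of the kind of bound meant here, one standard
statement for a finite hypothesis space H (the Occam's Razor paper gives the
precise, more general version):

    % With probability at least 1 - \delta over the draw of m i.i.d. training
    % examples, every h \in H consistent with all m examples satisfies
    \operatorname{err}(h) \;\le\; \frac{\ln|H| + \ln(1/\delta)}{m}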

Unfortunately, PAC-Learning presupposes that the sampling distribution is the
same as the real distribution, i.e. what I said above we can't know for sure.

In any case, I think most people would agree that a model that can reach the
global minimum of training error on a large dataset has better chance to reach
the global minimum of generalisation error (i.e. in the real world) than a
model that gets stuck in local minima on the training data. Modulo
assumptions.

------
contingencies
Robotics founder here.

Popular conceptions of "robots" are unrealistically general.

In industry, we do not build robots, we build automation systems.

Given the choice, would you prefer an automation system with some
environmental assumptions, high speed and perfect repeatability (ie. entire
industrial automation world), or no environmental assumptions, crushingly high
cost, slow speed and poor reliability (eg. walking robot that accesses your
fridge because cute or novel)?

It seems both the lay population and academia assumes the latter, but industry
demands the former.

This is perhaps why there are so many "open problems" \- because they're not
actually real world _problems_: they're just academic _nice-to-haves_.

~~~
thdrdt
Thinking about car painting robots it's clearly about cost.

You can program a car painting robot in some hours. This will be way cheaper
than a robot that is aware of the car and knows how it should move to perform
the best paint job.

Another example is your washing machine. It would be nice to have a washing
robot that would sort your clothes and wash them. But it is way cheaper to
sort clothes yourself. So clothes sorting might be an open problem, but it is
indeed a nice-to-have.

~~~
bluGill
A robot that is aware of the car is better long-term because you don't have to
spend hours programming for each car and paint scheme. However, the problem is
hard enough that right now we spend the hours, because it works and industrial
automation has enough advantages that it is worth doing anyway.

~~~
roland35
That robot may be more valuable in a company that makes custom items like toys
or signage. For car manufacturing, they are making hundreds of thousands of
cars a year of the exact same shape, so a few hours of setup isn't too bad!

~~~
bluGill
A few hours of setup per car adds up, which is why cars are a single color. If
the robot had some artistic sense it could put stripes on the car that
artfully flow with the lines of the car. No reason it can't be done today, but
it isn't, because that means more hours.

I believe for most car parts the approach is to paint everything and not worry
about overspray, which brings costs down. That only works if you have a single
color.

------
helltone
One problem I encountered while working on robotics is that many commonly used
algorithms yield approximate solutions with no error bounds. They work 99.99%
of the time. This is fine from a computer science or math point of view, but
very scary from an engineering perspective, specifically when there are humans
nearby. A big part of me struggles to accept the suitability of algorithms
coming from gaming engines or machine learning etc for real world heavy duty
robots. The lack of rigour in the field is astonishing.

~~~
fluffy87
This is the scariest part of using machine learning as an engineer on any
practical application as well.

Without an error bound, ML can’t be in charge of anything that could put human
lives at risk.

This is also why I don’t understand all the hype about FSD / L5 autonomous
driving. We don’t even know yet if such error bounds even exist, so we don’t
even know if machine learning is even the right tool for FSD yet. All
certification entities for control systems that put human lives at risk in
aviation, automotive, etc. require those error bounds. So it actually doesn't
really matter if Tesla comes up with a "maybe L5" system; without the right
error bounds, their cars won't be certified as L5 and drivers will need to
keep their hands on the steering wheel.

~~~
m0zg
> Without an error bound, ML can’t be in charge of anything that could put
> human lives at risk

Humans don't have "error bounds" either, and you trust them just fine.

~~~
b3kart
Two counterpoints here:

1) Humans _can_ estimate their own uncertainty. Ask a person to show how long
a meter is, and they'll give you an estimate. Then ask them to show you the
"error bounds", i.e. what they're "quite certain" the meter is longer than and
shorter than. You are likely to get sensible bounds. Now, humans aren't
_amazing_ at this, but the brain does have capacity for estimating how
uncertain it is.

2) No, you don't _really_ trust humans. This very fact that humans are often
imperfect in estimating their own uncertainty makes us very stupid sometimes.
How many times have you been _sure_ you knew something for a fact, only for it
to turn out to be completely false? This is why society tries to not put too much
responsibility into the hands of a single person, or at least to provide help
and/or safety mechanisms if that is the case.

~~~
m0zg
Ask a programmer to estimate how long it'll take them to code something to see
if we really have error bars on complex functions. The margin will be so wide
as to be completely useless.

We do have error bars on simple measurements already. They're right there in
the data sheet for the sensor. What you're asking for are error bars on things
several levels distant in the layers of abstraction. Humans suck at that. We
only cope the same way machines do: through constant negative feedback.

~~~
snovv_crash
Programmers can give you good error bounds. Ask what the best and worst case
is for solving a problem, and 80% of the time they will be within those
bounds.

The problem is that management doesn't want to hear about the worst case,
because it looks bad for them politically.

------
HALtheWise
Everything listed in this article is framed as a software problem, but the
beauty of robotics is that many problems can fall to either software,
hardware, or electrical solutions.

Without further ado, a very incomplete list of hardware and electrical
innovations that would push robotics forward.

\- Cheaper and smaller low-backlash actuators. Motors and associated gearboxes
are big, heavy, and expensive, which is a big part of why our robots have
singularities in their designs, making the motion planning problem more
difficult.

\- Actuators with good force-speed curves. Muscles have both great torque at
zero speed and great speed at zero torque. (weight lifting and throwing a
baseball). Only hydraulics come close to matching both numbers, and they're
heavy, expensive, and tend to leak oil on the carpet.

\- Cheaper force/torque sensors. Most robots today don't even have torque
sensing on all their actuators, let alone the sort of dense 6dof-sensing full-
body surfaces that animal skin provides.

\- Across-the-board robustness improvements. Robots more mechanically
complicated than a quadcopter tend to break a lot. This makes approaches like
training ML models directly on hardware difficult.

\- Reliably low-latency wireless. There's been a lot of hype about cloud
robotics, and there's a lot of potential in offboard sensing, but we need a
cheap communication system that can deliver <30ms latency for those systems to
work reliably. WiFi is great until you pass through a metal doorway and the
signal degrades for 200ms.

\- Cheap and light lidars: true depth sensing with long range that works
outside means a lot of hard computer vision problems get easier, and means
your robot can be smaller since it needs fewer cameras.

The thing I love about working in robotics is that we don't need to solve all
the software problems and all the hardware problems to make a system that
works well in the real world. We get to pick and choose which is easier, and
often the solutions in software space depend intimately on the kind of
substrate they need to run on.

------
mchusma
I work in robotics. The problem for pretty much everything is one of scale and
commercialization. Every one of these issues has been addressed, but almost
none have a good and bundled solution.

It feels like computers in the 70s. All the pieces are there, but there aren't
mass-produced PCs yet, partly because there aren't the suppliers to make
things easier.

This is changing fast.

For example, SLAM has many things that work, but they require fine tuning, and
most contain undocumented features that require reading the source to find.
Slamcore is a company working on this, I hope there are more.

Teleportation was "possible but hard", but Freedom Robotics has a good
solution now that mostly "just works"

Robotic bases work well, but they will ship them to you without things plugged
in so you have to find the issue and fix it yourself. AWS Deepracer is clearly
a prototype for a solid wheeled base. The documentation and build quality is
an order of magnitude higher than anything else in the space. My guess is they
launch it as a useful base in the next year or so.

Depth estimation is pretty good with Intel Realsense now, and the new OpenCV
OAK is another attempt here. I think this is more solved and packaged than
anything else.

ROS2 was only properly released in June 2020, and it is a huge step up from
ROS1 for commercial applications. It probably needs another 1-2 years for the
community to finalize supporting it.

I think for founders finding known robotic solutions and making them into
robust commercial products is a great space to work in.

The next few years are very interesting in this space, as even 1 year ago
everything was much harder and it's rapidly getting easier.

~~~
MereInterest
> Teleportation was "possible but hard", but Freedom Robotics has a good
> solution now that mostly "just works"

For somebody not in robotics, what does "teleportation" mean in this context?
I assume that it doesn't mean the Star Trek style beaming that comes up from a
Google search.

~~~
rocketflumes
I think OP meant "teleoperation" \- controlling a robot from afar

------
stillsut
The pile of problems in Robotics reminds me of the challenges faced by
computer vision before modern methods were developed. To generalize the issue:
humans learn and perform vision and navigational tasks "below" the level of
language.

To use a vivid example, you can use language to teach a child how to hold a
pencil and write, how to recognize digits, and how to do two digit addition.
But there's some cognitive ability "below" the level of language that a child
needs to be capable of in order to take the hints from language and develop
the desired skills. In computer vision, we now have something like this in
ConvLayers.

In today's robotics we see the hyper-systematization and mathematization of
natural concepts like getting an agent to recognize its own bearing. This is
what I refer to as "at the level of language", and I think as long as we try
to solve these individual navigation problems from mathematical first
principles we'll never arrive anywhere near the performance of animal
"instinct". We need a new framework that doesn't look like an enumeration of
commands within an imperative program.

~~~
rajansaini
That's a very interesting perspective. There have been many articles
complaining about the "hype" behind ML, but I too wonder if DNNs could assist
with controls, especially when it comes to reacting to sensor data. After all,
it's just matrix math.

------
krisoft
Ugh. This is making a hash of it.

What is true: The understanding of laymen about what robots can do is very
much out of touch with reality. Making real world robots is very hard. I think
it's movies which made people think that any 6 year old can just plop together
a C-3PO.

What is the hash then? (I omit the points where I lack experience.)

1; Motion planning

"Even things like a model of where the robot is, with respect to the
surroundings" That's not motion planing. Motion planing starts from when you
have a model of yourself and your surrounding. There are theoretical
challenges (imagine a flytrap, or maze) but the real world challenges in my
experience come from that under the inaccuracies and glitchyness of perception
you are expected to make okay-ish decisions.

3; SLAM

"there’s always going to be new obstacles (a pair of shoes, a book)"

SLAM is about localisation. It gives you a 6dof pose from some fixed
coordinate system. It does not deal with tripping hazards. That's obstacle
detection. It does indeed deal with "mapping", but only insofar as it gets you
that pose estimate.

"not turn-key to where there would be a SLAM module you can buy for your
robot."

Sure you can. The "Intel Realsense t265" is for example one such a module.

The inside-out tracking of the Oculus Quest is another (albeit one you will
have a hard time buying for your robot).

6; Depth estimation.

That one is odd. True, getting full-fidelity depth maps out of monocular
images is a research problem. But if in reality you want to estimate the
distance to your beer bottle you will use stereo images, or a Kinect-like
depth sensor. It is obviously a hard engineering challenge to make it
four-nines robust, but not an unheard-of challenge. If his definition of an
"open problem" is that there are people writing research papers about it, then
it will remain an open problem for a while. But if he just wants to depth
estimate stuff, then there are already good working ways. Maybe cost
prohibitive, maybe not robust enough for his liking.

9; Scene understanding

Humanity has made insane leaps and bounds on this one. Again there are
engineering tradeoffs. How much accuracy you get for how many watts. What kind
of training data you need, etc. Our systems are nowhere near as good as a 5
year old human child, but I think we can handle that "beer bottle obscured by
ketchup bottle" challenge if we try.

~~~
cpgxiii
_Depth estimation_

I wish it were as solved as vendors would like to believe it is. If you want a
sensor for medium-range applications, say order 0.5m to 10m depth with better
than 1cm accuracy, that works indoors and outdoors, with reflective and
untextured objects, doesn't interfere with other sensors of its kind, it
simply does not exist. Traditional active and passive stereo is OK provided
you have texture, ToF so long as the surfaces aren't reflective and you don't
have sunlight to worry about, structured light if you can control lighting and
can otherwise control/avoid interference from other sensors. There's some
promise in learned stereo matching, but collecting enough data and running
fast enough inference are big challenges to practical use.

Existing depth sensors for manipulation tasks, even indoors in reasonably
controlled lighting, are still mostly insufficient given that the objects they
struggle to see are often the objects you want to manipulate.

~~~
fxtentacle
I strongly disagree!

I have seen fantastic results with stereo cameras, colored lights, and self-
calibration.

For most use cases, it's no problem if your robot will stop for a few seconds,
rotate the camera axis around a bit, and then continue. But that appears to be
good enough to calibrate the features for tracking things like a reflective
and transparent glass jar.

As for the precision, I agree that 1cm at 10m distance doesn't work. But 1cm
precision at 50cm distance is doable. And for a robot arm, you mainly need the
precision when you're close to the object.

And yes, I am talking about what you probably meant with learned stereo
matching. I would call it close to solved because we can by now do
unsupervised training and achieve usable results. While I had trouble
reproducing this specific paper, the general idea is valid:
[https://github.com/google-research/google-
research/tree/mast...](https://github.com/google-research/google-
research/tree/master/depth_from_video_in_the_wild)
[https://arxiv.org/abs/1904.04998](https://arxiv.org/abs/1904.04998)

We are also seeing good results from using random YouTube videos to train AI
vision.

But given that you are so sure that this is unsolved, I wonder if I should
start a company to sell my depth estimate pipeline. Would you have any example
image pairs that are causing problems, so that I can see visually what fails?

~~~
cpgxiii
If you have a vision system that "just works" and produces high-quality
pointclouds in an actual kitchen environment, with reflective appliances,
silverware, shiny countertops, shiny ceramic dishware, and glasses, we would
absolutely use it.

We have in-house research work on both learned monocular depth (so-so for
robotics tasks) and learned stereo disparity (much more promising), so
progress here isn't impossible, but none of the off-the-shelf products really
_solve_ the problem.

~~~
fxtentacle
It "just works" for our use case. We put in 4K @ 60fps and receive 960x540 @
20fps of stereo correspondence pairs. So every matched pixel is averaged over
3 frames in time and 4 pixels in every direction in space, meaning 9x9
convolution kernels. In other words, we make the video super clean by area
sampling in space and time.
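
A rough sketch of my reading of that averaging step; the real pipeline's exact
kernels and weighting aren't public, so the factors below are just what the
quoted resolutions imply.

    import numpy as np

    def spacetime_average(frames):
        # Average 3 consecutive frames in time, then 4x4 pixel blocks in space,
        # turning 2160x3840 (4K) input into 540x960 output.
        t = frames.mean(axis=0)                                     # temporal average
        h, w = t.shape
        return t.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))    # 4x4 area average

    frames = np.random.rand(3, 2160, 3840).astype(np.float32)       # stand-in frames
    print(spacetime_average(frames).shape)                          # -> (540, 960)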

The specific part about our system that makes it usable for me is that for
pixels that cannot be matched with a predetermined quality, it'll return a gap
marker instead of guessing. For SfM, that means you can just skip those pixels
that are affected by reflections moving around. Cooking pots and plates tend
to have enough scratches, design, or shape markers to work OK. Wet white floor
will usually be flagged as "unknown" except for the grey gaps in between
tiles. As for glass, our system can return up to 2 flows per pixel, meaning
for a glass mug we get both the mug and a see-through estimate.

If you look at Sintel Clean+"s0-10", you'll find that there are some learned
matching algorithms that perform quite well under those conditions:
[http://sintel.is.tue.mpg.de/quant?metric_id=6&selected_pass=...](http://sintel.is.tue.mpg.de/quant?metric_id=6&selected_pass=1)

We're H-v3 (2nd place) and that 0.284 EPE for s0-10 (slow movements, small
disparities) is quite workable, because that means you have on average less
than 0.1 pixels of disparity error on the 960x540 depth maps.

As for monocular depth, I see that as mostly a memorization task again. You
train a good stereo matching, then do unsupervised learning on stereo data to
get the monocular AI.

I'm at hajo.me and I currently work on stereo matching and depth mapping with
the goal of improving VR. Can you disclose which company or what in general
you're working on? Also, do you know any discussion groups where optical flow
/ stereo matching people usually hang out?

~~~
cpgxiii
Yes, in general if you can capture multiple views you can make up for many
dropouts and artifacts. The more you can move, the more likely that you'll get
a complete reconstruction. Single-view artifacts are more of a problem when
reaching into confined areas or during visual servoing where you don't
necessarily have the freedom to move around to get a better view.

The results for the Sintel benchmark do look interesting. Do you have a report
or some sort of overview of your approach? It would be nice to see a similar
benchmark on real recorded scenes, especially if that provided a way to
compare learned matching algorithms with available sensor hardware.

I think there's reasonable promise in learned stereo matching, especially if
we include more information than just visible spectra. Human eyes are so much
more than two RGB imagers, so we shouldn't limit our robots to that either.
Monocular depth, I agree, seems to be mostly a memorization problem. In cases
where you can effectively memorize everything, it will work quite well and the
savings in hardware complexity (and physical sensor size) will be well worth
the training complexity. I actually think it has much more promise as a backup
depth perception method on cars, since objects in a driving context are mostly
consistent in size. I have doubts about how useful it will be in manipulation
tasks where you may encounter similar objects at a range of sizes.

I'm part of a research group at TRI working on home manipulation tasks
although I have something of a hobby interest in outdoor robotics as well - in
general I'd say the depth sensors for outdoor tasks are often better, but much
more expensive and out of reach for hobbyist users.

~~~
fxtentacle
One trick that I have seen for confined environments is to rotate the plate
with both stereo cameras around its forward axis. That way, you can convert
the left-right stereo to up-down stereo and especially with highly reflective
stuff, there's a chance that that will be enough of a change to reduce
reflections.

------
NalNezumi
I love the ever-prevalent pessimism/cautious optimism in the field of robotics
(industry and academics alike). It's a breath of fresh air compared to the
ever-over-hyping ML/AI field.

On a related note, an open problem I see in practice is also: how do you
manage a robotics company effectively? iRobot seems to be succeeding at this,
and so do some industrial robot arm companies, but the latter are more about
industrial automation than the more general "robotics" companies out there.

Companies that go for very broad, general solutions seem to be struggling, and
application-based robotics also seems to fail more often than succeed.

A lot of management methods and theories have emerged around software
development (Agile etc.), but what's the effective management method for
robotics?

Having worked at a few robotics companies as a junior, it has always been
either: 1\. someone with extensive research/engineering experience in a
_subfield_ of robotics in management, who can't manage the other subfields, or
2\. someone with _too general_ knowledge who can't balance each robotics
subfield's needs, including production & reliability + cost.

Both seem to do pretty badly, with the second one slightly more favourable.

------
sfvisser
Autonomous robots sound cool, but there are so many problems that could
benefit from robotic automation that don’t require full autonomy. In
controlled environments where we know the entire state of the (local) world
including the position of our robotic swarm we can deterministically plan a
motion and leave the world in an expected end state. Maybe use a bit of CV to
make sure our assumptions are right; if not, hit the brake, else continue.

Construction, mining, infrastructure, agriculture, manufacturing, logistics
all contain problem spaces that could use non- or semi-autonomous robotic
automation. Still a difficult problem, but how can we expect full autonomy
without controlled robotic environments first?

Yes we use robots in manufacturing and a few warehouses, but that’s it..?

~~~
bluGill
As an employee of John Deere I can tell you that we are currently shipping
most of the automation you suggested. Humans are still in the tractor, but
they often are not driving it.

------
lisper
> Guys like Rodney Brooks seemed to accept this and built various robots that
> would learn how to walk using primitive hardware and feedback oriented ideas
> rather than programmed ideas. There was even a name for this; “Nouvelle AI.”
> No idea what happened to those ideas; I suppose they were too hard to make
> progress on, though the early results were impressive looking.

The problem was that subsumption didn't scale, but the idea was incorporated
into the so-called three-layer architecture:

[http://www.flownet.com/ron/papers/tla.pdf](http://www.flownet.com/ron/papers/tla.pdf)

(I am the author of that paper. AMA.)

~~~
pilingual
This paper seems to pre-date behavior trees. Any comment on how TLA relates?
(I skimmed but will give a closer look when I have more time.)

I was curious what Brooks now thinks about subsumption and found this from a
few months ago:

 _The approach to controlling robots, the subsumption architecture that it
proposed led directly to the Roomba, a robot vacuum cleaner, which with over
30 million sold is the most produced robot ever._ ¹

Like many, I got a Roomba not long after the pandemic began. I was
disappointed by its poor sensorimotor system. Within a few days its IR cover
was scuffed up and it was covered in scratches from getting stuck under an
office chair. Brooks doesn't say that the 30 million Roombas incorporate
subsumption, and after a decade or two of programmers working on it, I wonder
about the nature of the codebase. The Roomba's behavior is entirely
unpredictable, as sometimes it will bump into something at full speed and
sometimes it will slow down as it approaches. There are a number of other
issues, too many to mention, including the charging contacts, and recently it
started roaming around with its charger still attached for no apparent reason.

¹ [https://rodneybrooks.com/peer-review/](https://rodneybrooks.com/peer-
review/)

~~~
lisper
> This paper seems to pre-date behavior trees.

Indeed. By a good 20 years :-)

> Any comment on how TLA relates?

I've been out of the field for a long time so BTs are new to me. All I know
about them is from skimming the wikipedia article. But they look to me like a
more formal implementation of the TLA sequencing layer.

------
tgflynn
I think one of the problems is that we don't have the right people working on
the right problems.

Self-driving cars seem to have absorbed most of the world's high-level
robotics efforts for the past decade or more. I was always skeptical of that
application because my experience has shown that weak-AI works best when
there's a human backup and/or the stakes of an error are not too high. That
isn't the case with self-driving cars where a lethal mistake can occur in less
than a second.

I think we would be much better off today if much of the self-driving car
effort had been focused on household utility robots and/or business
applications. No, it wouldn't be "saving lives", but it would be saving
countless life-hours spent on mind-numbing tasks that are a major reason why
life is so unpleasant for so many people.

~~~
mdorazio
What household tasks do you think are well-suited to robotics that aren't
already addressed?

The reason so much focus has gone into autonomous vehicles is because there
are trillions of dollars to be made there by the companies that solve it, and
many obvious use cases will result in large profits even with imperfect
solutions (where "imperfect" means it works only in specific design domains
like the southwestern US during the day with no rain).

~~~
tgflynn
All of them - cleaning, cooking, laundry, etc. Everything that a human needs
to do to maintain reasonable living conditions and would hire someone else to
do if they were rich.

EDIT: I think that if you could come up with a really good solution for all of
that most people would be willing to pay about as much as they do for a car.

------
danbmil99
TL; DR: Robotics is hard. Nature is impressive.

(Source: 2 years working on a humanoid bipedal robot. Now, I am constantly
amazed that people can balance all that weight on two spindly little legs.
Running is a miracle)

------
monkeydust
Honestly, late yesterday evening after work I was looking at a floor full of
toys that my little kids had been playing with on yet another lockdown day,
thinking I wish there was a robot that I could build or buy to tidy this up.

Did some research and found this research project; promising, but it's from
2018 and looks like it didn't go anywhere.

[https://youtu.be/geub-Nuu-Vw](https://youtu.be/geub-Nuu-Vw)

So now I'm thinking what about the build option.

~~~
mvn9
Build it. There is a global market of parents who will buy it.

But on the other hand, why not accept the toys on the floor? You are fighting
entropy for no reason. You sleep at night, you will work tomorrow during the
day, and when you look again, the toys are in an equally dispersed state. Why
not let them stay in that state for days until you need to hover?

~~~
tuatoru
> Why not let them stay in that state for days until you need to ho[o]ver?

Risk of personal injury. (The Lego-on-the-stairs scenario.)

Also, some people just like a calm visual field at home.

~~~
mvn9
All very reasonable arguments. What I don't understand is cleaning up in the
evening. There is no visual field to perceive if you are asleep.

~~~
fxtentacle
It's to make sure that when you walk around half-asleep early in the morning
trying to get a diaper, that you don't step on a pointy lego brick.

~~~
mvn9
Do you walk heel or toe first [1]? I would assume that lego bricks only hurt
when moving heel first since humans had to deal with stony environments for
quite some time. Not that you should stop cleaning up but this could be
another technology to deal with the bricks.

[1]
[https://news.ycombinator.com/item?id=1086446](https://news.ycombinator.com/item?id=1086446)
Barefoot Running

~~~
fxtentacle
Heel first. Interesting idea :)

------
wazoox
What I notice from this thread (and many similar ones) is that AI may actually
replace, sooner or later, lawyers, some programmers, system administrators,
and engineers of various ilks; but the sweeper, the toilet cleaner, the
janitor, the restaurant maid and the dishwasher, the delivery guy and the auto
repair man, all of the relatively "low skilled" jobs, are absolutely safe.

------
carrolldunham
A bit shocked to realise how I'd completely forgotten about robotics whereas
it had seemed the grandest challenge. Is robotics itself slipping toward being
marginal as physical manipulation is becoming less important over time? The
sci-fi robots don't even walk these days; they're holograms etc. I'm not a
software engineer, so it's not just that.

------
tsimionescu
> I’ll point out that the humble housefly has no problem understanding the
> concept of “shit in front of you; avoid,”

Surely in the case of the house fly, it would be more like 'dinner is served'
than 'avoid'!

But in all seriousness, this is a good list to remind us how vastly far away
from human-level AI we are.

~~~
jayd16
Flies move reflexively, don't they? They bump into walls and windows all the
time.

------
modeless
The objections to neural nets as the solution to _all_ the problems on his
list are the same as the old objections to neural nets as the solution to
computer vision, speech recognition, translation, playing Go, etc. The
objections will fall in the face of overwhelming evidence that neural nets
simply work better than other approaches to these types of problems.

For a long time software was the reason robots didn't work, but that time is
ending. Hardware will be the bottleneck soon if it isn't already. No general
purpose robot will ever be successful in the mass market using electric motors
and gearboxes at each joint. We need simpler, cheaper, lighter, more robust,
more reliable, backdrivable, force-sensing actuators.

~~~
iamgopal
Any research or potential examples of such actuators?

~~~
cpgxiii
I would say some of the most promising work right now is in quasi-direct-drive
actuators, basically low-gear-ratio brushless motors. Standalone electric
motors are basically linear in terms of electrical power->torque, and even
with a gearbox, provided the gear ratio is low enough, you can assume near-
linear behavior. Brushless motors tend to be easier to design around specific
size and shape constraints, and often run slower than brushed.

The "secret" to a lot of this actuator work is that outside of highly
repetitive industrial tasks, actuators are rarely running at full power so you
don't need to design them (and their cooling) for continuous max power
operation. Somewhat like human muscles, you can pick smaller motors and/or
lower gear ratios that cover the normal cases, and still be OK with rare
overloads so long as the overload isn't long enough to cause overheating.
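
Some back-of-the-envelope numbers (all invented, not from any datasheet) for
why the low gear ratio matters: output torque grows roughly linearly with the
ratio, but the inertia the environment feels through the gearbox grows with
the ratio squared, which is what ruins backdrivability and force transparency
at high ratios.

    # Hypothetical brushless motor; figures are illustrative only.
    motor_torque = 1.0      # N*m continuous torque
    rotor_inertia = 1e-4    # kg*m^2 rotor inertia

    for ratio in (1, 6, 50, 100):
        output_torque = motor_torque * ratio             # ~linear in gear ratio
        reflected_inertia = rotor_inertia * ratio ** 2   # ~quadratic in gear ratio
        print(f"N={ratio:>3}: ~{output_torque:5.0f} N*m out, "
              f"reflected inertia ~{reflected_inertia:.2e} kg*m^2")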

------
paulpauper
>Multiaxis singularities -this one blew my mind. Imagine you have a robot arm
bolted to the ground. You want to teach the stupid thing to paint a car or
something. There are actual singularities possible in the equations of motion;
and it is more or less an underconstrained problem. I guess there are
workarounds for this at this point, but they all have different tradeoffs.
It’s as open a problem as motion planning on a macro scale.

This is a math problem applied to robotics, not a problem unique to robotics.
If there are workarounds, then isn't the problem solved? Factories have
robotic arms and they seem to do an adequate job making cars and other stuff
in spite of singularities.

~~~
jononor
The workarounds are designed to solve particular cases. A robot arm in
manufacturing is stationary, has a controlled environment, and often a fixed
task (or at least task type). The task is known in advance, and humans are
involved in developing the solution for each particular task. In an open
environment one cannot rely on those things. When the task is not known up-
front. Or the new tasks come too quickly for a human to be involved in solving
it. Or with influence from other actors, some possibly uncooperative or
adversarial.

A lot of 6-axis robots actually work in 3D+3D. That is, they position their
arm/tooling first into a work pose, then perform their actual work in a 3D
space referenced from that point/orientation. The poses are chosen such that
the space it works in is singularity-free. And for moving between poses there
are then explicit solutions (chosen by a human) for dealing with
singularities.

~~~
jnxx
And that's just equivalent to a human lifting a chair without hurting himself.

------
blauditore
> 6\. Depth estimation

At least indoors, this was already handled pretty well 10 years ago by Kinect,
and there are many somewhat robust approaches based on binocular vision. Not
sure if this qualifies as "very much an open problem".

> 9\. Scene understanding

The mentioned example is not the best one - anticipation of collisions has
been a feature in cars for many years now (usually some warning system,
sometimes with automatic emergency braking). But it's a very constrained
problem; more generic scene understanding is of course still very difficult.

------
heyitsguay
I know that there's a lot of successful work in specific controlled
environments (company X's factory floor), and that a lot of environment
understanding/SLAM is broadly unsolved in arbitrary uncontrolled environments,
but what about specific uncontrolled environments? What if I want a beer
serving robot to learn a consistent, high-accuracy model of just my house, the
way it is, and I'm willing to put in some technical work? Can I do any better
than completely hand-crafting a 3D object model or similar?

~~~
cpgxiii
If you have high-quality 3D sensors (or in some cases, good-enough 2D cameras)
and usable IMU/odometry, you can assemble quite reasonable 3D models for many
small environments. The biggest challenges in a home environment are that many
items we want in our houses (stainless steel appliances, chairs with thin
legs, glass anything, reflective floor materials) are almost pathologically
bad from a computer vision standpoint and are very hard to see and avoid in an
uncontrolled household setting. Vision-based navigation against a known
environment is doable with reasonable reliability, but adding fixed markers or
positioning systems goes a long way towards robustness.

------
nojvek
What really surprises me is that Apple is now worth 2 trillion dollars with
another gajillion sitting in cash. Why are they being an optimization company
and not really pushing the edge on robotics and automation? I was excited
about the Apple car but it seems that project is dead.

Why not invest that money in moonshot ideas? It seems Elon Musk is the crazy
one with Tesla, SolarCity and SpaceX.

It blows my mind how their massive rockets reach orbit and land back. Why
haven’t we been making breakthroughs like this in robotics?

~~~
cpgxiii
As difficult as it is, control for launching and landing a rocket is vastly
easier computationally than even apparently simple robotics problems.

Robotics is simply hard, from algorithmic, computational, and engineering
perspectives. Only some parts of it can be solved by throwing money and raw
compute power at it. Even in areas where progress has been made, you generally
need experts to decompose the problem at hand into something that can be
tackled with existing techniques, and those experts are expensive and in high
demand.

------
syntaxing
I was really into FIRST robotics when I was in HS and would love to make
something at home. I thought about a VEX kit but didn't really want something
erector-set-like. Does anyone have a recommendation for something like Misty
Robotics [1]? I can only think of Mindstorms, but I wanted to do some visual
AI with it.

[1] [https://www.mistyrobotics.com/](https://www.mistyrobotics.com/)

------
r34
The only solution to the mentioned problems is, IMO: we'll have to adapt our
environment to the moving machines. At least in the early stages, before they
can do it autonomously. Language will adapt more by itself (to make machines
pass the Turing test we need a change in language, just as we need a change in
machines - people will adapt to useful machines).

------
palad1n
Is there a similar report for open problems in AI or ML?

~~~
tuatoru
I don't know about any report, but an analogous problem (to "get me a beer")
might be something like: -

"Hey Siri, write up minutes of the meeting we just had and email them to
everybody".

There are quite a few open problems in that, including analogous navigation of
an ill-defined environment which nevertheless has regularities.

------
sudosysgen
Well, at least two of these problems have been solved already and just need to
be implemented.

For example, my cheap mirrorless camera solves two of the problems on the
list. Its phase detection sensors can both do depth estimation and position
estimation quite rapidly in order to make the tracking autofocus systems work
(autofocusing is really calculating the distance between object and lens).

You would really just have to do a bit of integration, it's really a solved
problem already.

~~~
HALtheWise
The parent comment was clearly written by someone with no experience in
robotics. In fact, every single problem on this list is "solved" in the sense
that there are examples of systems that perform it in some limited context.
Generalizing those subsystems to handle the difficult edge cases and
integrating those subsystems together is 99% of what makes robotics as a whole
difficult, and it would be similarly deceptive to say that AI is "solved"
because Microsoft Clippy exists.

~~~
sudosysgen
I do have experience in robotics. There exist robust solutions to depth
estimation and position estimation that are deployed in the real world, in
difficult applications where they have been generalized to work with any
object you can draw a box around. There is a difference between a problem that
is solved in theory and a problem that is solved enough so that you can buy
polished products that implement a solution and work essentially without
failure. I don't think you can compare optical phase detection in the context
of position estimation and depth detection to clippy in the context of AI.
Phase detection is quite literally a closed form optical solution to the
problem of "how far is this object away from me" as long as the object is
within a few thousand times the physical aperture of the lens. It's mature
enough that you can use it to drive a motor in response to movements or "that
bird", "that teapot", "the closest object in that clump of pixels", "the
farthest objects in that clump of pixels", as well as calculate the velocity
of the aforementioned object in three dimensions. For all intents and
purposes, if you have a problem of the order "what is the distance of that
object" as well as "how is that distance changing over time", then you can
solve it, and indeed it has been solved to very high reliability, using phase
detection. That's what it means for a problem to be solved.

In other words, I wouldn't call an implementation where you can click anywhere
on an image and receive a distance, all of the time, with almost any lens
imaginable in any environment where there are enough photons hitting the
sensor "performed in a limited context".

The technology simply hasn't been used in mainstream robotics, mainly because
it's patented and difficult to implement from scratch, but these are all
implementation problems not fundamental problems.

~~~
cpgxiii
Structure from focus is an established technique that can produce quite good
results, but it's not quite as easy as the existence of good autofocus in
cameras would suggest. SfF techniques often operate the the inverse of an
autofocus system - rather than focus on a specific point to figure out depth
at that point, you run through a full focus cycle and then for each area of
the image, you find the focus setting that made that area in focus.
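
A toy sketch of that inverse procedure; the patch size and focus measure are
arbitrary choices of mine, and a real system would need sub-frame
interpolation and motion handling on top.

    import numpy as np

    def depth_from_focus(stack, focus_depths, patch=8):
        # stack: (n_frames, H, W) grayscale focus sweep; focus_depths[i] is the
        # distance frame i was focused at. For each patch, report the depth of
        # the frame where a local sharpness measure (Laplacian energy) peaks.
        n, h, w = stack.shape
        lap = np.abs(4 * stack
                     - np.roll(stack, 1, axis=1) - np.roll(stack, -1, axis=1)
                     - np.roll(stack, 1, axis=2) - np.roll(stack, -1, axis=2))
        depth = np.zeros((h // patch, w // patch))
        for i in range(h // patch):
            for j in range(w // patch):
                region = lap[:, i*patch:(i+1)*patch, j*patch:(j+1)*patch]
                sharpness = region.reshape(n, -1).var(axis=1)   # per-frame focus measure
                depth[i, j] = focus_depths[sharpness.argmax()]
        return depth

    stack = np.random.rand(5, 64, 64)                 # stand-in focus sweep
    print(depth_from_focus(stack, focus_depths=[0.3, 0.5, 1.0, 2.0, 4.0]))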

The challenge there is that running through a full focus cycle and capturing
frames as you go is quite time-consuming and introduces serious artifacts if
anything in view is moving. Phase-detection autofocus is so fast in part
because it's only using a sparse set of autofocus points, rather than the full
sensor resolution.

The other major problems with SfF are the same ones photographers encounter all
the time - the depth value is only as accurate as the depth of field will
allow, and shallow depth of field requires wide apertures (and is generally
easier on larger sensors, as well). Large lenses and sensors may work well for
stationary scanning applications, but struggle to be useful on robots.

~~~
sudosysgen
A lot of these problems have been fixed by recent advancements. For example,
splitting a few thousand pixels into two and using that as a phase detection
sensor provides universal coverage.

As far as large lenses and sensors not working well on robots, I think you'd
be surprised just how well large lenses designed for the sensors they belong
on can work. Indeed, by far the reason why photography lenses are huge is
because of lens IS and zoom. Making small, light, fast normal lenses with
larger stabilized sensors works perfectly fine and can be made very light.

After all, humans have two huge sensors (slightly bigger than full-frame) with
f/3 lenses, and it works perfectly fine.

Now, to what I think is missing from the state of the art in robotics:

>Phase-detection autofocus is so fast in part because it's only using a sparse
set of autofocus points, rather than the full sensor resolution.

This was true a decade ago, but now if you look at Canon sensors or some Sony
sensors not subject to the patent, every single pixel is a phase detection
point. Indeed, the pixels are cut in 2 or 4 photodiodes each only receiving
light from one half or one quarter of the aperture.

This means that every single pixel can detect phase information, all 45
million of them.

Herein lies the major difference between Structure from Focus and phase
detection : in a phase detection system, it is not needed at all to run a
focus cycle. Instead, two waveforms are generated, one corresponding to one
section of the aperture and one corresponding to another section of the
aperture.

The two waveforms "match" when the incident rays correspond to the same point,
that is to say, when focus is achieved. However, it really isn't necessary to
actually achieve focus - focus simply offsets the phase of the two waveforms.

Therefore, by simply computing the phase difference of the two waveforms, one
can instantly know, given knowledge of the lens characteristics, the distance
of the subject, without having to achieve a focus cycle (!)

Indeed, phase detection actually works very similarly to parallax, in that in
actuality you can use it to construct two split, offset images.

Of course, you can also use two stereo cameras, but then you have the issue of
having to motorize the cameras to achieve convergence, without which stereo
overlap is minimal, whereas per-pixel phase detection provides complete
overlap and is much more precise.

If you want to see this in practice, the RAW files of a Canon EOS R5 actually
encode distance information for _every single pixel_.
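
A back-of-the-envelope sketch of that geometry, under simplifying assumptions
of my own: lens focused at infinity, subject far compared to the focal length,
and the two half-aperture images treated as a stereo pair whose baseline is
the separation of the half-aperture centroids. The pixel pitch and disparity
are invented numbers.

    def depth_from_phase(disparity_px, pixel_pitch_m, focal_m, aperture_m):
        # Two half-aperture views act like a stereo pair; for a circular aperture
        # the centroid separation is 8R/(3*pi) ~ 0.42 * aperture diameter. With the
        # lens focused at infinity, the phase shift is just parallax: Z = f*B/d.
        baseline = 0.42 * aperture_m
        return focal_m * baseline / (disparity_px * pixel_pitch_m)

    # Hypothetical 50 mm f/1.8 lens (aperture ~28 mm) and 4.4 um pixels:
    print(depth_from_phase(2.0, 4.4e-6, 0.050, 0.028))   # 2-pixel shift -> ~67 m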

Also,

>the depth value is only as accurate as the depth of field

Yes. The depth value is precise, in a modern camera, to about one 5000th of
the diameter of the aperture, which is, for a 50mm f1.8 normal lens, able to
compute depth accurately for any subject within 50-60 meters. Which is better
than LiDAR of the same size, easily. And you can do so for a small
fraction of the cost by simply upgrading hardware that is already necessary.

It is true that modern cameras actually do move the focus and re-calculate
depth, which might give the impression that if you wanted to calculate depth
you would actually need to move focus. But in reality, this is done in order
to correct for the small misalignment in different lenses, as well as for the
fact autofocus motors do not have accurate encoders and will very frequently
miss steps. But a modern camera already knows how much the image plane needs
to actually shift before even engaging the autofocus motor at all.

In essence, SfF in the State of the Art is an inversion of the state of
autofocus 20 years ago, using contrast detection as the autofocus method.
However, modern autofocus has progressed so much in this time-frame that it
has solved almost all of the issues of SfF.

~~~
cpgxiii
 _As far as large lenses and sensors not working well on robots, I think you'd
be surprised just how well large lenses designed for the sensors they
belong on can work. Indeed, by far the reason why photography lenses are huge
is because of lens IS and zoom. Making small, light, fast normal lenses with
larger stabilized sensors works perfectly fine and can be made very light._

The current cameras and lenses on the market work very well, that's not the
problem. The problem is that even "small" cameras and lenses are huge in
comparison to what can be integrated into a robot. I use a couple M4/3
cinema/studio cameras for vision projects because they're the smallest
affordable cameras with controllable interchangeable lenses, and even with
their "small" size they are at the upper limit of anything I could hope to fit
in a practical robot.

 _After all, humans have two huge sensors (slightly bigger than full-frame)
with f/3 lenses, and it works perfectly fine._

The human eye is vastly more complex than a fixed-plane imager and capable of
much more sophisticated focus behavior than any existing lens. Camera-eye
comparisons are appealing, but their focus behavior is really very different.

 _However, modern autofocus has progressed so much in this time-frame that it
has solved almost all of the issues of SfF_

Canon has done a very nice job with their autofocus, but I will believe they
have solved depth perception with phase detection when they apply it to their
industrial cameras. Canon makes very nice RV-series parts recognition cameras
that do a very good job with challenging materials, but they use structured
light. In theory, phase-detection SfF would be perfect for these cameras but
they haven't done it.

 _It is true that modern cameras actually do move the focus and re-calculate
depth, which might give the impression that if you wanted to calculate depth
you would actually need to move focus. But in reality, this is done in order
to correct for the small misalignment in different lenses, as well as for the
fact autofocus motors do not have accurate encoders and will very frequently
miss steps. But a modern camera already knows how much the image plane needs
to actually shift before even engaging the autofocus motor at all._

The minor problem is that you generally want depth _plus_ RGB, which is
obviously solved by adding a second camera but then you face basically the
same size and overlap issues that stereo cameras bring. Not entirely a free
lunch.

The other problem here is political, namely that SfF via phase-detection is
(at least for now) entirely in the hands of Canon and Sony, two companies
(Sony in particular) with track records related to robotics and machine vision
users that can best be described as ranging from "somewhat uninterested" to
"useless" to "self-sabotaging". Unless one of them commits to making an actual
product and ships it, we will remain in a world where phase-detection SfF is
technically possible but entirely out of reach.

~~~
sudosysgen
> _The current cameras and lenses on the market work very well, that's not
> the problem. The problem is that even "small" cameras and lenses are huge in
> comparison to what can be integrated into a robot. I use a couple M4/3
> cinema/studio cameras for vision projects because they're the smallest
> affordable cameras with controllable interchangeable lenses, and even with
> their "small" size they are at the upper limit of anything I could hope to
> fit in a practical robot._

But this is largely because modern lenses are seriously bloated, with IS for
older cameras that have non-stabilized sensors, bloated focusing mechanisms to
make floating elements work, bloated zooming mechanisms that make everything
so much worse, and so on.

If we want to design a lens for robotics, we really don't need much more than
a double-gauss normal lens with a unitary focusing mechanism.

One of the lenses I use when I need to kludge something in a project is
exactly that: a 1970s Minolta 55mm f/1.7 lens for a full-frame image circle.
Despite having a large 32mm aperture, it has a diameter of 56mm and a length
of 44mm. I don't know if that's too large for your robotics applications, but
it isn't very big, and it's much smaller than the LiDAR sensors I've worked
with. You could likely make this even smaller with modern techniques. It would
also cost about $40 to build.

> _The human eye is vastly more complex than a fixed-plane imager and capable
> of much more sophisticated focus behavior than any existing lens. Camera-eye
> comparisons are appealing, but their focus behavior is really very
> different._

My point was more so that in pure optical size, the human eye is bigger and
takes up more volume than the optical system I'm proposing. But as far as
focusing behaviour goes, if you were to mathematically model how the human eye
gets depth information to drive its focus cycle, it's quite similar to a
phase-detection system, with the phase information being extracted through
parallax.

> _Canon has done a very nice job with their autofocus, but I will believe
> they have solved depth perception with phase detection when they apply it to
> their industrial cameras. Canon makes very nice RV-series parts recognition
> cameras that do a very good job with challenging materials, but they use
> structured light. In theory, phase-detection SfF would be perfect for these
> cameras but they haven't done it._

I think this can be explained quite simply. In a fully controlled environment,
using structured light but cheap lenses and small sensors is much more cost-
effective. Why would they use expensive APS-C+ sized sensors, that they don't
seem to be all that good at making, when you can get away with very cheap
parts?

> _The minor problem is that you generally want depth plus RGB, which is
> obviously solved by adding a second camera but then you face basically the
> same size and overlap issues that stereo cameras bring. Not entirely a free
> lunch._

But this is precisely the point of OSPDAF or split-pixel phase detection. You
get both depth and RGB at the same time, because the RGB sensor doubles as a
phase detection sensor. It's not like old cameras where you had to use a
mirror that sent light to the dedicated phase detection sensor.
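
A rough numpy sketch of what that buys you (the (H, W, 2) layout with the two
photodiode halves stored side by side is an assumption for illustration; real
dual-pixel raws are packed differently): summing the halves recovers the
ordinary image, while correlating them per tile yields a coarse disparity map
from the very same exposure.

    import numpy as np

    def dual_pixel_to_image_and_disparity(raw, tile=32):
        """raw: (H, W, 2) array of the left/right photodiode halves of each
        pixel (assumed layout). Returns the summed intensity image and a
        coarse per-tile disparity map from 1-D cross-correlation."""
        left = raw[..., 0].astype(float)
        right = raw[..., 1].astype(float)
        intensity = left + right  # summing halves gives the normal image
        H, W = left.shape
        disp = np.zeros((H // tile, W // tile))
        for i in range(H // tile):
            for j in range(W // tile):
                a = left[i*tile:(i+1)*tile, j*tile:(j+1)*tile].mean(axis=0)
                b = right[i*tile:(i+1)*tile, j*tile:(j+1)*tile].mean(axis=0)
                a, b = a - a.mean(), b - b.mean()
                corr = np.correlate(a, b, mode="full")
                disp[i, j] = corr.argmax() - (tile - 1)  # shift in pixels
        return intensity, disp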

As for political problems, I agree. Canon won't ever sell you their DPAF
sensors. Sony, even though they are really bad at robotics, should in theory
be willing to sell anyone their sensors, and it should be possible to make a
prototype. Their tech in this area would be limited relative to Canon's,
though.

But, to the point, a solution to a problem that's proprietary is still a
solution to the problem.

~~~
cpgxiii
There are a number of quite good modern primes that are tiny with big
apertures, especially for smaller sensor sizes. Even then, though, lens plus
camera is almost always much bigger than the current crop of depth cameras
(K4A, Realsense D400, L500, Lucid Helios). Even those cameras are still too
big for some applications; that's where PMD's tiny little cameras have a
niche.

There's always going to be some room for larger, better sensors, but a lot of
applications need _more_ sensors to get better coverage, and size is
definitely one of the tightest limitations on where they can be applied.

 _I think this can be explained quite simply. In a fully controlled
environment, using structured light but cheap lenses and small sensors is much
more cost-effective. Why would they use expensive APS-C+ sized sensors, that
they don't seem to be all that good at making, when you can get away with
very cheap parts?_

I mean sure, greed is a plenty good explanation of Canon's behavior. The RV-
series are pretty big and plenty expensive, so I don't think using better
optics would be an issue. Indeed, if single-frame PDAF depth worked out, it
would considerably speed up the parts recognition cycle time, which is
actually somewhat slow due to the multiple patterns required by the structured
light system. The projector and structured light processing work isn't free,
either, so it seems like depth they got for free would be better in every way.

Dual Pixel AF II, which is at least marketing speak for OSPDAF, has been
around for longer than the RV-series, so I can't help but have doubts about
the actual practical applicability of PDAF SfF.

 _But this is precisely the point of OSPDAF or split-pixel phase detection.
You get both depth and RGB at the same time, because the RGB sensor doubles as
a phase detection sensor._

Yes, that's true, but the problem with making a good RGBD camera is that you
generally want a deeper depth of field in RGB so that as much of the image is
in focus as possible, while maximizing PDAF SfF depth quality requires as
shallow a depth of field as possible. It's not impossible to solve this, but
the
fundamental optical features you want out of the depth part are the opposite
of the features you need from the RGB part. You could address this by changing
aperture between each depth/RGB capture, although you'd definitely have to use
an electromagnetic aperture to avoid lifetime issues. You'd still end up with
a slower framerate taking successive depth/RGB frames, but that's doable for
many applications.
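
To put rough numbers on that tradeoff, here is a small sketch using the
standard thin-lens depth-of-field approximation (50mm lens, subject at 2 m,
0.03 mm circle of confusion; all parameters are assumptions for illustration).
Wide open at f/1.8 the depth of field is under 20 cm, which is great for depth
sensitivity but poor for keeping an RGB scene in focus; stopped down to f/8 it
grows to the better part of a meter.

    def depth_of_field_mm(subject_mm, focal_mm=50.0, f_number=1.8,
                          coc_mm=0.03):
        """Total depth of field (mm), standard thin-lens approximation."""
        hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
        near = (subject_mm * (hyperfocal - focal_mm)
                / (hyperfocal + subject_mm - 2 * focal_mm))
        far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
        return far - near

    print(depth_of_field_mm(2000, f_number=1.8))  # ~170 mm: shallow, for depth
    print(depth_of_field_mm(2000, f_number=8.0))  # ~780 mm: deep, for RGB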

------
fxtentacle
Motion planning is also an Open Problem for Humans!

Let's say we are standing on a high hill and I point to another hill and say:
"Walk over there". Do you expect any human to find a reasonably good path by
themselves? I would personally try to use a map. How do military robots solve
this? They use satellite images.

And in general, this article seems very pessimistic to me. My home-built
computer vision pipeline can do localization and mapping, loop detection,
object segmentation and depth estimation at levels that are "good enough" for
indoor drone flight. So I would assume that someone with a generous serving of
financial resources would be able to solve most of those problems, except for
2 issues:

1\. You need to memorize how the environment works. That's why newborn kids
make stupid decisions: they lack the stochastic priors. Lucky for us,
memorizing is AI's strong point.

2\. You need to have mechanics that are more forgiving. If I position my hand
the wrong way, I might accidentally squeeze someone a bit, but I won't crush
their bones, because the mechanics of my arm are flexible. We'll need better
actuators and elastic casings.

And just for the sake of discussion, here's my replies for each problem
category.

Simultaneous Location and Mapping: There are libraries that work well enough
for your robot to localize itself in a building-sized environment with just a
single camera. I'd consider this solved.
[https://github.com/raulmur/ORB_SLAM2](https://github.com/raulmur/ORB_SLAM2)

As for the obstacles, even for humans it is mostly guesswork whether to step
on that blanket or whether there'll be something fragile or slippery
underneath.

Lost Robot Problem: Most SLAM solutions are good enough that you could just
regenerate the map from scratch every time there is a gap in your perception.
ORB-SLAM2 also has a loop and merge detection module, so that if you reset its
tracking and the robot then walks into a known environment, it can merge the
old data into its new state.

Depth estimation: It works well enough in practice.
[https://www.stereolabs.com/zed-2/](https://www.stereolabs.com/zed-2/)

Scene understanding: I don't know about you, but when I drive on the highway,
I sometimes have dead flies on my windshield. Apparently, they aren't that
clever after all.

Position estimation: It works exceptionally well for VR markers. In general,
those solutions tend to be called "Visual Odometry"
[https://www.youtube.com/watch?v=fh5dLF3dmr0](https://www.youtube.com/watch?v=fh5dLF3dmr0)

Affordance discovery: This is mostly a memorization problem, so a perfect
candidate for AI.

~~~
rimliu
Alas, your problem #1 is not really about memorization; it is about
understanding. Take a human to a house they have never been to before and tell
them: "make me some tea". Now try that with any robot you want.

It is you being optimistic, not the article being optimistic.

~~~
fxtentacle
Give a human kid a piece of paper and an envelope and say "make it fit" and
roughly 1 out of 5 will fail because they have not yet had the opportunity to
watch their parents fold a letter.

For your example, I see an upfront memorization component, which is that your
request only works with humans that have previously seen how tea is made. That
would be an unsupervised AI which watches a YouTube tutorial and then reduces
the task to "get hot water + get tea bag + get container + combine".

Please note how cultural memorization is again implicitly added here. I might
use a trash can as the container, but due to our shared culture we'll agree
that a mug works best. So this gets reduced, with more unsupervised AI, to
resolving "container" to a list of tolerable objects.

Next comes an exploration phase where human and robot just randomly open
cupboards to see what is inside. YOLO should be good enough to recognize the
water cooker, the tea bags, and the mug.
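
For that exploration step, a minimal detection sketch could look like the
following (Python, using the off-the-shelf ultralytics YOLO package; the image
path is hypothetical, and whether the stock classes actually cover a kettle or
tea bags is a separate question):

    from ultralytics import YOLO  # off-the-shelf detector, assumed installed

    model = YOLO("yolov8n.pt")       # small pretrained model
    results = model("cupboard.jpg")  # hypothetical photo taken while exploring

    for box in results[0].boxes:
        label = results[0].names[int(box.cls)]
        print(label, float(box.conf))  # e.g. "cup 0.87": a candidate container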

Next up comes memorization again. Kids cannot reliably turn on a machine that
they have not seen before, so intelligence is probably of little use. Instead,
they learn by imitating. An AI would probably again crawl random YouTube
videos, check that the cooker looks similar, then try to imitate that.

I hope I have illustrated that a lot of what we think of as understanding is
not much more than repeating a similar situation which we have previously
experienced.

That would also be my theory as to why meditating and thinking about an action
can actually improve our skill at doing it. We're memorizing a fantasy
simulation.

------
altindag
Classic line from this piece

 _This was before the marketing departments of google and other frauds made
objective thought about this impossible._

------
hoseja
There are some really interesting, if maybe ranty and not-politically-correct
comments on that blog.

------
visarga
> We all know emergent systems are super important in all manner of phenomena,
> but we have no mathematics or models to deal with them. So we end up with
> useless horse shit like GPT-3.

Says the author who confesses his lack of domain knowledge in the intro.

So much entitlement. Why didn't he invent something better?

------
mrfusion
I’m excited to see GPT-3 or similar technology applied to these problems.

A robot can now know lots of common sense: beer is kept in the fridge, the
fridge is in the kitchen, cans should not be shaken, etc. So if something like
GPT-3 can help with a high-level plan, and regular computer vision can handle
the low-level stuff like obstacle avoidance and fridge and beer
identification, we could have something really interesting.
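
As a sketch of the glue code this implies (using OpenAI's completion-style API
as it looked around GPT-3's release, since deprecated; the prompt, engine
choice, and the idea of handing each step to vision are my assumptions, not a
working system):

    import openai  # assumes an API key is configured in the environment

    prompt = ("You control a home robot. List numbered steps to fetch a beer "
              "from the kitchen without shaking the can.")

    resp = openai.Completion.create(engine="davinci", prompt=prompt,
                                    max_tokens=150)

    plan = resp.choices[0].text.strip().splitlines()
    for step in plan:
        print(step)  # each step would go to vision + low-level control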

(Get in touch if you want to work on something like that. I think it would be
a blast!)

Edit: What makes me think GPT-3 has some good common sense? I saw someone ask
it “can I do a bench press with a cat?” and it said “no, the cat will bite
you.” It’s kind of like we’ve achieved what that common-sense database project
was going for.

