
Controlling a 2D Robotic Arm with Deep Reinforcement Learning - formalsystem
https://blog.floydhub.com/robotic-arm-control-deep-reinforcement-learning/
======
narenst
Nice! As you go from 2D to 3D what are some unique issues you are expecting to
run into?

~~~
formalsystem
What's nice about the move is that the reinforcement learning algorithm
doesn't fundamentally change. The action and state spaces will be larger,
since a joint has more degrees of freedom in 3D than in 2D, so learning may
take longer and we'll also need to increase the size of the replay buffer and
the episode length.
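
As a rough sketch of what that means in practice (all names and numbers here are illustrative, not from the post), the 2D-to-3D change shows up mostly in the sizes of the spaces and a few training knobs, while the algorithm itself stays the same:

```python
# Illustrative hyperparameters only -- the dimensions and counts below are
# made up to show the shape of the change, not taken from the post.
config_2d = dict(
    action_dim=2,                 # one torque per joint, 2 planar joints
    state_dim=8,                  # joint angles/velocities + goal position
    replay_buffer_size=100_000,
    max_episode_steps=200,
)

config_3d = dict(
    action_dim=6,                 # more degrees of freedom per joint in 3D
    state_dim=24,                 # correspondingly larger observation
    replay_buffer_size=500_000,   # bigger buffer for the bigger space
    max_episode_steps=500,        # longer episodes so the arm can reach
)
```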

I'm planning on writing a followup post on 3D so stay tuned!

------
kejaed
If the author reads this: we've used VPython for simple simulation
visualizations like this one, and it has worked great both inside and outside
of Jupyter notebooks.

[http://www.glowscript.org/docs/VPythonDocs/index.html](http://www.glowscript.org/docs/VPythonDocs/index.html)

~~~
formalsystem
I'll check it out, thank you for the note!

------
giocampa
This is so inspiring! Have you ever thought of implementing such a thing in
ROS?

~~~
formalsystem
Happy to collaborate on it if you like; it shouldn't be too hard.

------
adamnemecek
Mark (the author) is working on yuri.ai ([https://yuri.ai](https://yuri.ai)),
a deep reinforcement learning platform for games. Drop him a line or sign up
if you are interested!

~~~
ipsum2
That's an unfortunate name. (Try Googling "yuri ai" without quotes)

~~~
adamnemecek
Haha, it's a Command & Conquer reference, I believe.

------
rcfox
It would be interesting to see a plot of the error compared to the analytical
solution as it trains.

~~~
formalsystem
The analytical solution wouldn't involve any training: you'd solve a system of
equations where the position of the fingertip equals the goal, subject to the
kinematic equations that describe how the arm moves.

The advantage of the RL approach is that it doesn't need to know how the arm
moves, but it does involve some training.
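
For a 2-link planar arm, that system of equations has a well-known closed form via the law of cosines. A minimal sketch (link lengths are assumed parameters, and only the elbow-down branch is returned):

```python
import math

def ik_2link(x, y, l1=1.0, l2=1.0):
    """Analytical inverse kinematics for a 2-link planar arm.

    Returns joint angles (t1, t2) placing the fingertip at (x, y).
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target out of reach")
    t2 = math.acos(c2)  # elbow-down solution; -t2 gives elbow-up
    t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2),
                                       l1 + l2 * math.cos(t2))
    return t1, t2

def fk_2link(t1, t2, l1=1.0, l2=1.0):
    """Forward kinematics: fingertip position from joint angles."""
    return (l1 * math.cos(t1) + l2 * math.cos(t1 + t2),
            l1 * math.sin(t1) + l2 * math.sin(t1 + t2))

t1, t2 = ik_2link(1.2, 0.5)
fx, fy = fk_2link(t1, t2)
# fingertip lands back on the goal (up to floating point)
```

Running forward kinematics on the recovered angles reproduces the target, which is exactly the "no training needed" baseline rcfox could plot the RL policy's error against.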

~~~
rcfox
Sorry, I meant compare the RL version as it trains to the analytical version.

It's certainly neat that inverse kinematics can be learned from zero
knowledge, but I would have a hard time trusting it to operate a real arm in
an industrial setting.

~~~
formalsystem
You'd be entirely right not to trust it as-is in an industrial setting.
There's been some research around safe exploration that adds additional
terms to the reward function, for example to punish flailing around, but I
haven't experimented with those techniques myself.
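
One common shape for such penalty terms (a sketch only; the function name and weights are hypothetical, not something the author tested) is to subtract effort and jerk penalties from the usual negative-distance reward:

```python
import numpy as np

def shaped_reward(fingertip, goal, action, prev_action,
                  w_effort=0.01, w_jerk=0.05):
    """Distance reward with illustrative safety-style penalty terms.

    The weights are made-up placeholders; in practice they'd be tuned.
    """
    distance = np.linalg.norm(fingertip - goal)
    effort = np.sum(action ** 2)                # punish large torques
    jerk = np.sum((action - prev_action) ** 2)  # punish sudden changes
    return -distance - w_effort * effort - w_jerk * jerk
```

The effort term discourages large torques overall, and the jerk term discourages the sudden reversals that look like flailing, at the cost of two extra weights to tune.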

