
Andrej Karpathy talks about how Tesla's NNs are structured and trained [video] - ojn
https://www.youtube.com/watch?v=oBklltKXtDE
======
protomikron
Fun fact for all of you:

Some time ago (around 10 years) this guy (the presenter) was internet famous
for being a Rubik's Cube speed solver and making tutorials and videos about
that:
[https://www.youtube.com/watch?v=609nhVzg-5Q](https://www.youtube.com/watch?v=609nhVzg-5Q)

~~~
nsilvestri
I'll always know him as badmephisto. In a recentish reddit AMA, he says he
still keeps a cube on his desk so he can practice a bit and not forget his
algorithms.

~~~
lanius
Can you provide the link to that AMA?

------
jacquesm
The competition in this space is great but I can't help but wonder what would
happen if instead all these companies pooled their resources and went after
the goal collectively. There is so much duplication going on and the paths do
not seem to me - as an outsider - to be all that divergent, which is usually a
pre-condition for having a lot of independent efforts one of which will
succeed.

It's as if everybody wants to be the one to exclusively own the tech. Imagine
every car manufacturer having a completely different take on what a car should
be like from a safety perspective. We have standards bodies for a reason and
given the fact that there are plenty of lives at stake here maybe for once the
monetary angle should get a back-seat (pun intended) to safety and a joint
effort is called for. That would also stop people dying because operators of
unsafe software are trying to make up for their late entry by 'moving fast and
breaking things' where in this case the things are pedestrians, cyclists and
other traffic participants who have no share in the monetary gain.

~~~
jfoster
> The competition in this space is great but I can't help but wonder what
> would happen if instead all these companies pooled their resources and went
> after the goal collectively.

It would probably slow down. 9 women can't have a baby in 1 month. Besides
that, the disagreements about approach, politics, or eventual competitive
interests would probably bring things to a halt for a long time.

I don't think the solutions to this problem are resource-constrained. Many
companies would happily find more resources in order to be first to market
with this technology.

~~~
p1esk
Exactly. Look at Human Brain Project: $1B and 10 years later what exactly have
they achieved? Just like you said - disagreements about approach, politics, or
eventual competitive interests _did_ bring things to a halt for a long time

~~~
jacquesm
The human brain project was DOA from day #1. Unrealistic goals, no clear
reason why more money would lead to better results and no concrete
deliverables that anybody needed.

~~~
p1esk
Same can be said about level 5 autonomy!

~~~
h5rea4he
To me, the fatal flaw of HBP (or USA's competing HBI) is that they are akin to
"cargo cult science", the idea that we can replicate the superficial
structures to a significant enough degree that the system they impart will
suddenly somehow become activated.

But just like the Melanesians with their coconut-shell headsets, there won't
be anyone listening on the other end...

~~~
p1esk
If we replicate a car with a sufficient accuracy it will start. I don’t see
why this wouldn’t apply to any other piece of machinery, including a brain.

HBP should have tried a simpler task first, e.g. replicate a fruit fly’s
brain.

~~~
mac01021
Having replicated the fruit fly's brain, what are you going to do?

Replicate the rest of the fruit fly so that you can install the brain in it?
And then replicate the world so that your software-simulated fruit fly has a
natural context in which to operate?

Or just stimulate the brain with random inputs not associated with any
real-world stimulus?

~~~
p1esk
I’m pretty sure simulating fly’s neural inputs would be a far easier task than
simulating its brain. After verifying it works correctly we would proceed to
simulating a more complex brain, say a frog. And so on.

~~~
mac01021
To simulate the fly's neural input, you need to simulate the fly's entire
environment, including fluid dynamics for the air around the fly, the physics
of every object the fly interacts with, hormonal responses to changes in the
fly's blood concentration of various substances (O2, glucose, ...).

This already strains our technical capabilities (at least for the amount of
money we are willing to spend on it).

For any animal whose behavior is the product of a lot of learning, and who
deals with other such animals in daily life, you have to solve all the
problems you had to solve for the fly and also deal with a much larger
connectome and also deal with the fact that, until you can passably simulate a
mouse, you will never be able to simulate the way a mouse learns to behave in
the presence of another mouse.

~~~
p1esk
No need to simulate any of the environment. You only need to record the neural
inputs to a fly’s brain. Sure that’s also challenging, but nowhere as
challenging as simulating the entire fly’s brain. My point is if you manage to
accomplish the latter, the former would be a breeze.

~~~
mac01021
> You only need to record the neural inputs to a fly’s brain.

I don't think that's true. As soon as your simulated fly's behavior diverges
from the actual fly's, all of the recorded input after that point is
invalid/useless because it will not match the simulated fly's
position/orientation/whatever.

Also, how many dollars of investment and years into the future do you think we
are from being able to record all of a fly's neural input while it is moving
freely?

~~~
p1esk
To start with you only need to study input-response pairs (sensory input
causing motor commands). You enter the neural inputs from a real fly into the
simulation, and compare responses of the real fly and the simulated one. Once
you understand what's going on, proceed to do sequence of inputs, and compare
the sequence of responses. The goal is not to produce the identical sequence
of actions by tuning the simulation, it's understanding how the actions are
being computed from the inputs.

Having a detailed simulation like this would greatly accelerate Numenta's
style research, where instead of piecing together information from published
papers, you would get it straight from the experiments you control.

------
timzaman
His team is hiring;

[https://www.tesla.com/careers/job/software-engineerdeeplearn...](https://www.tesla.com/careers/job/software-engineerdeeplearning-49779)

[https://www.tesla.com/careers/job/machine-learninginfrastruc...](https://www.tesla.com/careers/job/machine-learninginfrastructureengineerautopilot-48125)

[https://www.tesla.com/careers/job/machine-learningscientista...](https://www.tesla.com/careers/job/machine-learningscientistautopilot-48414)

~~~
soulslicer0
lol i just did the interview and failed. had to find shortest path between
tesla chargers. all in C++. completed it but failed
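
For anyone curious what that kind of question can look like, here is a minimal sketch in Python (not the C++ the interview asked for). It assumes the chargers form a weighted graph and uses Dijkstra's algorithm. The charger names, the distances, and the `shortest_route` helper are all hypothetical, and the real interview problem may well have been framed differently.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm over a dict-of-dicts graph.

    graph[a][b] is the driving distance from charger a to charger b.
    Returns (total_distance, path), or (float("inf"), []) if unreachable.
    """
    best = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            # Walk the predecessor links back to the start.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf"), []

# Hypothetical charger network; distances in km.
chargers = {
    "A": {"B": 100.0, "C": 250.0},
    "B": {"A": 100.0, "C": 120.0, "D": 300.0},
    "C": {"A": 250.0, "B": 120.0, "D": 90.0},
    "D": {"B": 300.0, "C": 90.0},
}
distance, route = shortest_route(chargers, "A", "D")
```

A real version would presumably also drop edges longer than the car's range and account for charging time, which this sketch ignores.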

~~~
karpathy
Very likely not for the positions above! (they all focus on Python)

~~~
boulos
Slightly off-topic (but related to the side thread where people didn’t realize
you went to Tesla): your HN about still says you’re at OpenAI :).

~~~
karpathy
oops, thank you, fixed :)

~~~
boulos
As long as you’re here, I’d love to get an updated PyTorch vs TF from you in
the sibling thread as payment :).

------
modeless
Awesome presentation. Crazy that they're developing their own training
hardware too. It's going to be a very crowded space very soon. Can they really
stay ahead of everyone else in the industry? Can it really be cheaper to staff
up whole teams to design chips for cutting edge nodes, fabricate them, build
supporting hardware and datacenters and compilers, than to just rent some TPUs
on Google Cloud?

I can see the case for doing their own edge hardware for the cars (barely),
but I really don't think doing training hardware will pay off for them. If
they're serious about it, they should spin it out as a separate business to
spread the development cost over a larger customer base.

Also, I'm really curious whether the custom hardware in the cars is benefiting
them at all yet. Every feature they've released so far works fine on the
previous generation hardware with 1/10 the compute power. At some point won't
they need to start training radically larger networks to take advantage of all
that untapped compute power?

~~~
m0zg
Nothing crazy about it. TPU-like stuff is ~10x the energy efficiency of GPUs
and several times the speed. When you're spending megawatt-hours and days to
train a single model, it adds up in both real and opportunity costs.

Also, Google TPU TOS prohibits the use of TPUs for stuff that competes with
Google (and I'm assuming with other companies under Alphabet umbrella), at
Google's sole determination. Not that it would be a good idea to upload
Tesla's proprietary data into Google Cloud even if it did not. Cloud, after
all, is just somebody else's computer.

~~~
modeless
> Google TPU TOS prohibits the use of TPUs for stuff that competes with Google

I don't think this is true. If you're talking about
[https://news.ycombinator.com/item?id=19855099](https://news.ycombinator.com/item?id=19855099),
it doesn't apply to TPU hardware, as is explained in the comments there.

> TPU-like stuff is ~10x the energy efficiency of GPUs

10x is probably overstating it when talking about newer GPUs because they have
ML hardware in them now. Also, that still doesn't make it a good idea to build
your own chips because there will soon be many third party options to choose
from. Doing your own chips is a bet that you will out execute dozens of
companies ranging from startups to industry giants. Simply taking your pick of
the best commercially available options is likely to be a better choice in the
near future.

~~~
m0zg
Yep that's the clause. The clause itself is not that problematic for Tesla.
What's problematic is that it can be changed over time, and it'd be foolish to
single-source something as important as deep learning compute without the
option to go elsewhere. Not to mention the rather extravagant Cloud pricing.
So Tesla is taking a page out of Steve Jobs' playbook and it will control its
own core tech. That's smart, especially considering that they already have
bits and pieces of the IP that they'll need.

~~~
modeless
That clause doesn't apply at all though. It's not even for the same product.

As for single-source, the models are written in PyTorch, not TPU machine code,
and they're pretty standard models anyway (e.g. Resnet-50). They can easily
transfer to other hardware if necessary. There's not a ton of lock-in there.
It doesn't justify the massive costs of ASIC development just to avoid this
nonexistent lock-in and imaginary TOS clause.

------
sdan
Really liked this talk.

Looks like they are really nicely orchestrating workloads and training on
numerous nets asynchronously.

As a person in the AV industry I think Tesla's ability to control the entire
stack is great for Tesla... maybe not for everyone who can't afford/doesn't
have a Tesla.

~~~
natch
>maybe not for everyone who can't afford/doesn't have a Tesla.

Affordability is not as much of an issue as some make it out to be. Cost-wise
it's like owning a Camry or an Accord, if you go for the lower end models. If
you mean not everyone can afford a new car, then sure I agree with you.

Edit: if you think I'm wrong about this, please explain or ask me to clarify
anything?

~~~
sdan
As a small anecdote, my parents couldn't afford/didn't want to spend over $30k
for a car. Surely we could've gotten a Tesla for $5k+ more, but given the
relatively new infrastructure with electric charging stations (and the fact
that none are available in the apartment I live in) my parents didn't find all
the new cool features appealing and instead got a regular Toyota Sienna that
has nothing fancy, just enough to take the family around.

Similarly, the infrastructure around electric charging stations I believe
hasn't fully matured yet and as a result many people who've already owned a
car, I believe will stick with gas cars since there's no huge incentive to
change, unless it becomes easier to charge (faster, more convenient).

Do note that I don't have a driver's license. I never intend on getting one (I
believe in what I do in the AV industry). I'm just guessing on the habits of
people, not that I have any real experience in buying a gas/electric car.

Also note I didn't think you were wrong, not sure why the downvotes.

~~~
natch
Thanks for the reply. Living in an apartment is not a big issue for me... we
have two, live in an apartment with no charger.

That's interesting that you relate something you do in the AV industry to not
ever getting a car... what's that about?

I do think that it's possible in the future the majority of people will never
need to own a car.

~~~
sdan
I think a good majority of people, especially in America, will go along the
lines of "If it ain't broke, don't fix it" for gas cars.

Regardless, I think Ghost Locomotion and Comma.ai have a lot of potential for
what they're doing now. I think they'll coincide with fully driverless cars
like Cruise, Waymo, or Aurora. Regardless of if they're electric or not (I
think electric cars will be more heavily adopted if we use Cars as a Service)

Also, yes, I don't ever intend on getting a license. This problem is really
fascinating and I'm excited to play a little part in it. By not having a
license, I can spend more time having the perspective of a person in 20-30
years when driver's licenses will become less common and utilize it in my work.

~~~
natch
>I can spend more time having the perspective of a person in...

This is awesome. I've always found that only by really experiencing the future
(or some part of the leading edge that is soon going to become the future for
most people) you gain much more understanding of it, ahead of others, and can
apply it in your life planning. It would be worth living this way even if only
temporarily just to get that as you say perspective... Not ready to give up
driving forever, heh, but I might have to try a short sample of that
non-driving lifestyle sometime.

------
londons_explore
I'm still _amazed_ that Tesla's team isn't using a map... I know maps get
outdated and are sometimes wrong, but having inaccurate knowledge of what's
around the corner is far far more helpful than not having any clue what's
around the corner.

The smart solution would be to consider a map a probabilistic thing, which
neural networks are really good at handling.
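
One way to read "a map as a probabilistic thing" is as a Bayesian prior that live perception then updates. Below is a minimal sketch, assuming a single binary fact (say, "there is a stop sign around the corner") and made-up detector reliability numbers. `fuse_map_and_camera` is a hypothetical helper for illustration, not anything from the talk.

```python
def fuse_map_and_camera(map_prior, camera_says_present,
                        true_positive_rate=0.9, false_positive_rate=0.1):
    """Bayes' rule for one binary feature.

    map_prior: probability the feature exists according to the (possibly
        stale) map.
    camera_says_present: whether the camera detector fired.
    The two detector rates are illustrative assumptions.
    """
    if camera_says_present:
        like_present = true_positive_rate
        like_absent = false_positive_rate
    else:
        like_present = 1 - true_positive_rate
        like_absent = 1 - false_positive_rate
    numerator = like_present * map_prior
    evidence = numerator + like_absent * (1 - map_prior)
    return numerator / evidence

# The camera sees nothing. Compare a map that predicts the sign with 80%
# confidence against having no map at all (a 50/50 prior).
with_map = fuse_map_and_camera(0.8, camera_says_present=False)
without_map = fuse_map_and_camera(0.5, camera_says_present=False)
```

With these numbers, a missed detection still leaves roughly a 31% belief that the sign exists when the map predicted it, versus 10% with no map, so a planner could stay cautious even though the map might be wrong.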

~~~
anonu
I'm still amazed Tesla has decided not to use lidar and instead just stick
with cheap cameras. Better sensors are there, they're available, they're cheap
and they can probably "see" better than plain old cameras... it doesn't make
too much sense not to use them IMHO. But then again, I am not coding NNs for
Tesla...

~~~
londons_explore
While LIDARs are certainly 'better' from a technological standpoint than not
having anything, from a business standpoint it's less clear.

LIDARs are cheap, but not cheap enough yet to not seriously affect the bottom
line if you put them into every car. It also will kill the resale price of
cars without it, which in turn hurts the company's image and stock price.

~~~
nielsole
If you are first to market with a level 4 autonomous taxi, unit economics will
likely be great regardless of whether you put LIDAR or cameras in.

~~~
londons_explore
Yes, but the current reality is they need to pay for the hardware, and still
make a profit, _without_ any guarantee of that autonomous taxi market.

It all depends on what you see as the chances of the autonomous taxi market
developing within the lifetime of these cars.

------
Gravityloss
Interesting that they don't have a full 3D world model. I'm certainly not a
machine learning expert. I'm still amazed the route from image recognition to
a 2D map of "what's drivable" to autonomous driving is so direct. One would
expect to hit a ceiling really soon with that approach.

To me it seems we're still in really early days.

~~~
spyder
They're doing 3D for the road path, and even predicting it beyond corners:

[https://youtu.be/Ucp0TTmvqOE?t=8137](https://youtu.be/Ucp0TTmvqOE?t=8137)

And later in the video they show 3D reconstruction from cameras and say
they use it in the car.

Watching the full talk is recommended if you have the time (talk starts around
1:10:00 in the video)

~~~
Gravityloss
Thanks, seems my original comment is wrong then!

------
eanzenberg
One thing I didn't quite understand is how training sub-graphs in parallel
works. If you are editing a sub-graph of a monolith type model, aren't you
affecting other graphs that have dependencies on the one you're editing? If
these are independent graphs, then what's a "sub-graph" even mean?

~~~
punnerud
In PyTorch you have full control over the graph and weights, and everything
feels like Python. So feeding some of the learning between “sub-graphs” is easy.
Not sure if this is possible in TensorFlow/Keras?

He describes the sub-graph training in the context that they have all the
predictors in one big model, and with control of the network can feed forward
and train sub-graphs (read: sub-parts) of the model.

~~~
eanzenberg
This is possible in Keras: just derive new models that are functions of a
monolith model and train independently. I still don’t understand the point
though. If you train a “subgraph”, the other tasks dependent on the part of
the graph will have to get retrained anyways, since those edits will affect
the other tasks.
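
The thread doesn't say how Tesla handles this, but one common answer is to freeze the shared part while a head is being trained. Here is a toy sketch in plain Python (no framework), assuming a model with one shared "backbone" weight and two head weights. All names and numbers are made up for illustration.

```python
def forward(params, x):
    """Tiny two-task model: one shared backbone weight, two head weights."""
    feature = params["backbone"] * x
    return {"a": params["head_a"] * feature, "b": params["head_b"] * feature}

def train_head_a(params, data, lr=0.01, steps=200, freeze_backbone=True):
    """Per-sample gradient descent on squared error for task A only."""
    params = dict(params)  # leave the caller's weights untouched
    for _ in range(steps):
        for x, target in data:
            feature = params["backbone"] * x
            error = params["head_a"] * feature - target
            # Both gradients are computed before either weight is updated.
            grad_head = 2 * error * feature
            grad_backbone = 2 * error * params["head_a"] * x
            params["head_a"] -= lr * grad_head
            if not freeze_backbone:
                params["backbone"] -= lr * grad_backbone
    return params

start = {"backbone": 1.0, "head_a": 0.0, "head_b": 2.0}
task_a_data = [(1.0, 3.0), (2.0, 6.0)]  # made-up samples of y = 3x

frozen = train_head_a(start, task_a_data, freeze_backbone=True)
joint = train_head_a(start, task_a_data, freeze_backbone=False)

before_b = forward(start, 1.0)["b"]
after_b_frozen = forward(frozen, 1.0)["b"]  # task B output is untouched
after_b_joint = forward(joint, 1.0)["b"]    # task B output drifts
```

With the backbone frozen, task A is fitted and task B's output does not move. When the backbone is allowed to update, task B's output changes even though no task B data was used, which is exactly the coupling described above.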

------
fyp
For those who want to learn more, I would start with Mask-RCNN where you have
a very similar architecture: one shared backbone with multiple heads that can
be retrained for various tasks (bounding boxes, masks, keypoints, etc):
[https://youtu.be/g7z4mkfRjI4?t=628](https://youtu.be/g7z4mkfRjI4?t=628)

------
kegan
Does anyone know why Andrej's team chose PyTorch (as opposed to, say, TensorFlow)?

~~~
jeffshek
Some potential reasons:

- TensorFlow is great at deployment, but not the easiest to code. PyTorch
wasn't frequently used in production until recently.

- If you have the resources for great AI engineers and researchers, your team
will be good enough to build and deploy both frameworks.

- Preference toward the easier framework your tech leads prefer.

- Lots of new academic research is coming in PyTorch

- TensorFlow is undergoing a massive change from 1.1x to 2.0; if you choose
TensorFlow, write on 1.1x just to then refactor to TF 2.0? Or write on TF 2.0
now and deal with all new edge cases? Or write in PyTorch (easier) but handle
the more difficult deployment process.

- ML code quickly rots. Bad PyTorch code is just bad Python code. Bad
TensorFlow code can be a nightmare to debug.

- PyTorch's eager execution makes coding NNs much easier to prototype and
build.

------
laichzeit0
The good news for me is that the upper bound for fully autonomous self-driving
cars is no more than 50 years away. What a time to be alive. If it happens
before then, that will be an absolute bonus.

------
diveanon
Andrej Karpathy is such a treasure.

He is an excellent presenter who really has a passion for teaching.

I'm not really involved with the industry, so I can't really speak to how he
holds up to other experts. However, he is by far the most digestible resource I
have found for learning about NNs and the science behind them.

If you are just discovering him now, google his name and just start reading.
His work is truly binge worthy in the most meaningful way.

------
SloopJon
The description of SmartSummon about halfway through the talk is interesting.
One of the views looks like SLAM using a particle filter, but Andrej seems to
say that it's done entirely within a neural net.

------
alexnewman
Jeeze and I can't get my pytorch to stop leaking memory. I couldn't imagine
trying to drive a car with it

~~~
Joky
Pytorch is used to train models on servers/cloud, not to drive the car later.
The trained model is converted to something native to the embedded environment
of the car.

------
jfoster
I wonder if the environment the car discovers includes elevation. Would be
necessary for handling many carparks.

------
ngcc_hk
Wow

------
adamnemecek
The trick for level 5 is learning the mapping between the lidar point cloud
and the video stream. It’s the best of both worlds.

~~~
pgodzin
Tesla doesn't have a lidar point cloud at all

~~~
adamnemecek
That’s the point. You train with lidar, deploy with cameras.
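
As a toy illustration of that idea (lidar as the training label, camera as the only deployed input), here is a sketch that fits a one-parameter depth model. It assumes a pinhole-camera relation, depth = k / pixel_height, in place of a neural network. The data and both function names are made up.

```python
def fit_scale(pixel_heights, lidar_depths):
    """Least-squares fit of depth = k / pixel_height.

    Lidar depths are used only here, as training labels.
    """
    inverse = [1.0 / h for h in pixel_heights]
    numerator = sum(i * d for i, d in zip(inverse, lidar_depths))
    denominator = sum(i * i for i in inverse)
    return numerator / denominator

def camera_only_depth(k, pixel_height):
    """Deployment path: no lidar input at all."""
    return k / pixel_height

# Made-up training pairs: apparent height of a car in pixels (camera)
# and its range in metres (lidar).
heights = [200.0, 100.0, 50.0, 25.0]
ranges = [5.0, 10.0, 20.0, 40.0]

k = fit_scale(heights, ranges)
estimate = camera_only_depth(k, 80.0)  # depth for an unseen camera reading
```

A production system would presumably learn the mapping from raw images with a network, but the data flow is the same: lidar appears only in the training step.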

------
mkagenius
Oh he's no longer with OpenAI? Sam Altman must be worried about this..

~~~
spectramax
Without meaning offense to Sam, I thought he was an investor / YC head. What
credentials does he have to be at OpenAI?

~~~
visarga
Teaching the CS231n course at Stanford, for one.

~~~
_cs2017_
Andrej Karpathy taught cs231n. The question is about Sam Altman who didn't
teach anything related to ML. He's on the board of OpenAI because he was one
of its founding investors.

------
new_realist
Meanwhile Waymo is way ahead.

~~~
grecy
Do you belittle everyone that gets second place in the Olympics because the
winner is "way ahead"?

Your comment just reeks of anger and hostility.

It seems like you'd rather Tesla didn't try at all, and instead we all just
give up and go back to the status quo.

~~~
adrr
Elon belittles lidar saying it is doomed and will never work yet Waymo and
Cruise will probably be operating self driving taxi fleets in California next
year. Tesla deserves getting dumped on for those comments because they are
nowhere near self driving.

~~~
xiphias2
Andrej Karpathy only started working on Tesla's software 2 years ago. Before
that, what Chris Lattner did was a mess (he wanted to just have 1 task that
magically learns everything), so Andrej had to start everything from scratch.

Waymo had a 20 year advantage, but Google lost many key people there in the
meantime as Larry Page didn't want to launch partial self driving.

I think both approaches are great and I wouldn't want to choose between the 2,
just be a happy user of the end result of the competition.

~~~
jacquesm
> Larry Page didn't want to launch partial self driving

That's the responsible thing to do.

~~~
xiphias2
If you look at self driving automation as a black box, sure.

But at the same time people understand that on a highway Tesla Autopilot is
safe enough to be used on a long boring road, and drivers generally feel less
tired (and can focus more on the harder parts of the road).

~~~
wstrange
There have been notable Tesla fatalities on "long boring roads". The software
has a long way to go..

------
mindfulplay
Just listening to this talk scares me. The number of errors - even on a
seemingly normal, sunny day - is mind boggling to think people trust this
crap.

How can we rely on the output of eight cameras? This is not a kid's science
project.

It's all fancy neural networks until someone dies. Pretty callous and Silicon
valley-mindset for such an important and critical function of the car.

Will never buy a Tesla after having seen this.

~~~
panarky
_> mind boggling to think people trust this crap_

It's also mind boggling to think we currently trust organic tissue to do this
crap, some of which is bathed in psychoactive chemicals.

And yet we do, and as a result, horrendous catastrophes occur every minute of
every day.

 _> It's all fancy neural networks until someone dies_

No, that can't be the standard, not when people are dying _right now_ in the
current regime.

Unless the new regime kills and maims _at a higher rate_ than the current
regime, there is no reason for fear.

~~~
thebruce87m
No, it needs to be a lower rate since it will kill at random. Today’s rate
includes drunk drivers, people on their phone and other “unsafe” drivers. If
you are an attentive driver your chance of death would actually go up if the
overall death rate was the same.
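
The arithmetic behind that claim can be sketched with a two-group model. The numbers below are purely illustrative, not real statistics, and the model ignores attentive drivers being hit by unsafe ones.

```python
def overall_rate(share_unsafe, rate_unsafe, rate_safe):
    """Population-average fatality rate for a mix of two driver groups."""
    return share_unsafe * rate_unsafe + (1 - share_unsafe) * rate_safe

# Illustrative numbers only (deaths per billion km), not real statistics.
share_unsafe = 0.2   # drunk, distracted, or otherwise "unsafe" drivers
rate_unsafe = 20.0
rate_safe = 2.0

human_average = overall_rate(share_unsafe, rate_unsafe, rate_safe)

# If automation merely matches the human average and spreads the risk
# evenly, every rider faces human_average, including people who used to
# be attentive drivers.
risk_multiplier_for_attentive = human_average / rate_safe
```

With these made-up inputs the human average comes out at about 5.6, so an attentive driver switching to a system that only matches the average would see their risk rise by a factor of about 2.8.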

~~~
slim
if you are an attentive driver you can get killed anytime by these other
drivers

~~~
thebruce87m
I’m maybe missing your point, but this is reflected in the overall death rate
now. Similarly in the future the overall death rate will include deaths of the
occupants of self-driving cars and also the cars/people that they hit.

The perception will be different however, since in the public eye the drunk
driver “had it coming” and the only real loss is the other
driver/passengers/pedestrians. If a self-driving car kills its own occupants
and the occupants of another car then there are more “innocents” than in the
drunk-drive scenario.

