Andrej Karpathy talks about how Tesla's NNs are structured and trained [video]

protomikron · on Nov 10, 2019

Fun fact for all of you:

Some time ago (about 10 years) this guy (the presenter) was internet famous for being a Rubik's Cube speed solver and making tutorials and videos about it: https://www.youtube.com/watch?v=609nhVzg-5Q

nsilvestri · on Nov 10, 2019

I'll always know him as badmephisto. In a recentish reddit AMA, he says he still keeps a cube on his desk so he can practice a bit and not forget his algorithms.

lanius · on Nov 11, 2019

Can you provide the link to that AMA?

eyeundersand · on Nov 10, 2019

Wow! Thank you.

What a blast from the past! I learned how to solve the Rubik's Cube blindfolded by watching him, back in the day. His tutorials are perfect and I've probably recommended his channel to ~50 people myself. Crazy he only has 36.6k subs.

Glad to see he's doing well.

jacquesm · on Nov 10, 2019

The competition in this space is great but I can't help but wonder what would happen if instead all these companies pooled their resources and went after the goal collectively. There is so much duplication going on and the paths do not seem to me - as an outsider - to be all that divergent, which is usually a pre-condition for having a lot of independent efforts one of which will succeed.

It's as if everybody wants to be the one to exclusively own the tech. Imagine every car manufacturer having a completely different take on what a car should be like from a safety perspective. We have standards bodies for a reason and given the fact that there are plenty of lives at stake here maybe for once the monetary angle should get a back-seat (pun intended) to safety and a joint effort is called for. That would also stop people dying because operators of unsafe software are trying to make up for their late entry by 'moving fast and breaking things' where in this case the things are pedestrians, cyclists and other traffic participants who have no share in the monetary gain.

jfoster · on Nov 10, 2019

> The competition in this space is great but I can't help but wonder what would happen if instead all these companies pooled their resources and went after the goal collectively.

It would probably slow down. 9 women can't have a baby in 1 month. Besides that, the disagreements about approach, politics, or eventual competitive interests would probably bring things to a halt for a long time.

I don't think the solutions to this problem are resource-constrained. Many companies would happily find more resources in order to be first to market with this technology.

oneshot908 · on Nov 10, 2019

Also agreed, when hedge funds don't silo their quants, instead of seeing 50 different strategies from 50 quants, they get 50 variants of the same strategy, source:

https://www.amazon.com/gp/product/1119482089/ref=ppx_yo_dt_b...

p1esk · on Nov 10, 2019

Exactly. Look at Human Brain Project: $1B and 10 years later what exactly have they achieved? Just like you said - disagreements about approach, politics, or eventual competitive interests did bring things to a halt for a long time

jacquesm · on Nov 10, 2019

The human brain project was DOA from day #1. Unrealistic goals, no clear reason why more money would lead to better results and no concrete deliverables that anybody needed.

p1esk · on Nov 10, 2019

Same can be said about level 5 autonomy!

h5rea4he · on Nov 10, 2019

To me, the fatal flaw of HBP (or USA's competing HBI) is that they are akin to "cargo cult science", the idea that we can replicate the superficial structures to a significant enough degree that the system they impart will suddenly somehow become activated.

But just like the Melanesians with their coconut-shell headsets, there won't be anyone listening on the other end...

p1esk · on Nov 11, 2019

If we replicate a car with a sufficient accuracy it will start. I don’t see why this wouldn’t apply to any other piece of machinery, including a brain.

HBP should have tried a simpler task first, e.g. replicate a fruit fly’s brain.

mac01021 · on Nov 11, 2019

Having replicated the fruit fly's brain, what are you going to do?

Replicate the rest of the fruit fly so that you can install the brain in it? And then replicate the world so that your software-simulated fruit fly has a natural context in which to operate?

Or just stimulate the brain with random inputs not associated with any real-world stimulus?

p1esk · on Nov 11, 2019

I’m pretty sure simulating fly’s neural inputs would be a far easier task than simulating its brain. After verifying it works correctly we would proceed to simulating a more complex brain, say a frog. And so on.

mac01021 · on Nov 14, 2019

To simulate the fly's neural input, you need to simulate the fly's entire environment, including fluid dynamics for the air around the fly, the physics of every object the fly interacts with, hormonal responses to changes in the fly's blood concentration of various substances (O2, glucose, ...).

This already strains our technical capabilities (at least for the amount of money we are willing to spend on it).

For any animal whose behavior is the product of a lot of learning, and who deals with other such animals in daily life, you have to solve all the problems you had to solve for the fly and also deal with a much larger connectome and also deal with the fact that, until you can passably simulate a mouse, you will never be able to simulate the way a mouse learns to behave in the presence of another mouse.

p1esk · on Nov 14, 2019

No need to simulate any of the environment. You only need to record the neural inputs to a fly’s brain. Sure that’s also challenging, but nowhere as challenging as simulating the entire fly’s brain. My point is if you manage to accomplish the latter, the former would be a breeze.

mac01021 · on Nov 14, 2019

> You only need to record the neural inputs to a fly’s brain.

I don't think that's true. As soon as your simulated fly's behavior diverges from the actual fly's, all of the recorded input after that point is invalid/useless because it will not match the simulated fly's position/orientation/whatever.

Also, how many dollars of investment and years into the future do you think we are from being able to record all of a fly's neural input while it is moving freely?

p1esk · on Nov 24, 2019

To start with you only need to study input-response pairs (sensory input causing motor commands). You enter the neural inputs from a real fly into the simulation, and compare responses of the real fly and the simulated one. Once you understand what's going on, proceed to do sequence of inputs, and compare the sequence of responses. The goal is not to produce the identical sequence of actions by tuning the simulation, it's understanding how the actions are being computed from the inputs.

Having a detailed simulation like this would greatly accelerate Numenta's style research, where instead of piecing together information from published papers, you would get it straight from the experiments you control.

account73466 · on Nov 10, 2019

>> if instead all these companies pooled their resources and went after the goal collectively.

That would be a bad idea because like in evolutionary processes you need this diversity of ideas to locate better local optima even if it will take longer.

paraschopra · on Nov 10, 2019

Standards shouldn't emerge too soon. I think for self driving tech, at the current stage, competition is good because there are lots of unsolved questions. Competition will ensure the best tech is ultimately available to consumers.

Of course, it's not a binary choice. Things like data should probably be pooled but the use of data in tech should compete.

jacquesm · on Nov 10, 2019

Ditto validation tech frameworks. If those are not standardized then people will not be able to make an informed choice about which solution is the safest other than to wait a decade and do a bodycount.

konschubert · on Nov 10, 2019

I think that there is still a need for some brilliant insights and breakthroughs, it isn't just a matter of getting the work done.

So actually, I think it's one of these situations where having a lot of independent efforts might be worth it.

londons_explore · on Nov 10, 2019

> There is so much duplication going on

In the self-driving world, the duplication is necessary - different companies are taking different directions, and nobody really knows which will work out.

In the ML hardware world, the duplication is mostly unnecessary. People are developing their own inference hardware ASICs because they're relatively simple (compared to designing a CPU from scratch, designing a TPU is pretty simple because there are so few operations, and no complex out-of-order execution), and you can't buy one off the shelf yet.

As soon as ML hardware becomes available to buy off the shelf without a massive price premium, everyone will switch to that.

p1esk · on Nov 10, 2019

I do research in ML hw field: there are currently a couple hundred designs to run a convolutional NN inference. A couple of dozen have been built. They have pretty different underlying technologies (CMOS, floating gate, ReRAM/memristors, etc), different ideas (systolic arrays, analog crossbars, cache organization, lookup tables, data reuse, TDM, using spikes, etc), wildly different power (from microwatts to hundreds of Watts), size, speed, precision, flexibility, cost, ease of use/integration, etc. This is just convnet inference. Lots more is needed to do training in hw, again with multiple choices on how to do it.

So which one design you suggest we all use for all our ML needs?

londons_explore · on Nov 11, 2019

The duplication I see in the commercial world in ML inference hardware is in designs similar to the TPU... So a big ~128x128 accumulating mat-mul array, with enough memory throughput to get one operand in and out fast, and enough cache to store the other operand (weights) and switch between which weights are used so the mat-mul array can very efficiently do larger matrix sizes.

Also lookup tables for a bunch of activation functions.

That basic design can efficiently implement nearly any neural net architecture as long as the layer sizes are at least 128x128 and fixed point is okay.
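
To make the description above concrete, here is a rough software sketch of that kind of design: a fixed-size multiply-accumulate tile, weights swapped in block by block to cover larger matrices, wide integer accumulation, and an activation lookup table. The tile size, the shift-based rescale, and every function name are assumptions chosen for illustration; no real chip is claimed to work exactly like this.

```python
TILE = 4  # stand-in for the 128-wide array; kept tiny so the example runs instantly


def relu_lut():
    """Lookup table mapping every int8 value to ReLU(value)."""
    return {v: max(v, 0) for v in range(-128, 128)}


def saturate_int8(v):
    """Clamp a wide accumulator value back into the int8 range."""
    return max(-128, min(127, v))


def tiled_matmul(a, b, tile=TILE):
    """Multiply integer matrices a (n x k) and b (k x m) one weight tile at a time.

    Each (tile x tile) block of b plays the role of the weights held in the
    array; partial products are summed into a wide integer accumulator.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    for k0 in range(0, k, tile):          # swap in the next block of weights
        for m0 in range(0, m, tile):
            for i in range(n):            # stream activations past that block
                for j in range(m0, min(m0 + tile, m)):
                    s = 0
                    for p in range(k0, min(k0 + tile, k)):
                        s += a[i][p] * b[p][j]
                    acc[i][j] += s
    return acc


def layer(a, b, shift=4):
    """Matmul, rescale by a right shift, saturate to int8, then apply the LUT."""
    lut = relu_lut()
    return [[lut[saturate_int8(v >> shift)] for v in row]
            for row in tiled_matmul(a, b)]
```

The point of the tiling loop is that the array size only bounds one block of work, not the layer size, which is why a single fixed array can serve many network shapes.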

The other exotic designs you suggested are more academic research things, and not yet deployed at scale in anyone's datacenters.

ArtWomb · on Nov 10, 2019

We could see some of this play out in the China EV market in the coming years. State sponsored subsidies around infrastructure standardization. Combined with foreign investment and competition spurring innovation.

What I've seen personally is what can be loosely termed "emergent consensus". Historical competitors (and often it gets whittled down to two giants, such as Boeing and Airbus) will work in secret on research. But after years of experimentation arrive at very similar outcomes. An optimal answer that could only be arrived at through constant trial and error, product evolution and iteration.

Regarding Karpathy's PyTorch presentation I don't think anything that wasn't already public was revealed. The FSD board with custom NPUs is a Work of Art. I like that there are dual redundant streams. And the scale of the dataset is already well known: 4096 HD-images per step!

If I had to speculate, the "Dojo Cluster" may be envisioned as an effort to share data and compute with industry partners as a cloud SaaS product and ancillary revenue stream. But that is pure speculation ;)

Inside Tesla’s Neural Processor In The FSD Chip

https://fuse.wikichip.org/news/2707/inside-teslas-neural-pro...

mkolodny · on Nov 10, 2019

Fortunately, some companies do share a significant amount of what their cars have learned so far. Uber publishes a ton of papers about their self-driving research [0][1]. Waymo released an open autonomous driving dataset, and publishes papers as well [2][3].

Of course, papers and data aren't code. But I think a lot more is being shared than people realize.

[0] https://eng.uber.com/author/raquel-urtasun/ [1] https://eng.uber.com/research/?_sft_category=research-self-d... [2] https://waymo.com/open [3] https://arxiv.org/pdf/1812.03079.pdf

d_burfoot · on Nov 10, 2019

I don't know about sharing tech, but there should definitely be a shared evaluation benchmark, and some kind of oversight agency should be involved. The idea would be: if you want to be permitted to operate an AV on public roads, you need to demonstrate that your vehicle's vision system can detect pedestrians and obstacles with near-perfect accuracy on a large shared image database, most of which is NOT distributed to researchers.

credit_guy · on Nov 10, 2019

Competition is good. It keeps you honest. The Manhattan project experienced competition, of the do-or-die type. Only it was not internal, it was from the Nazis, Japan, and later the Soviets.

Right now, the competition in the self-driving area is metaphorically as close to do-or-die as you can have in peace time. GM as a regular car manufacturer is toast, Cruise is pretty much their only hope. Uber is bleeding, if they pull self-driving off, they are kings. The German manufacturers are watching in disbelief as Tesla is starting to eat their pie. Conversely, Tesla knows that in what they are doing (electric cars) they don't enjoy any fundamental moat. If the Germans get their act together, they'll be able to make equally performant electric cars, but true luxury. The only one that's not really about survival is Waymo.

maelito · on Nov 10, 2019

I wonder if the French civil nuclear electricity program that led to this level of low carbon emissions, or the TGV (high speed rail system), could be good examples of what you're asking for.

Maybe the companies actually building the nuclear plants and trains and rails were actually in competition ?

atoav · on Nov 10, 2019

There was a time in the Middle Ages when alchemists were kidnapped by kings and held in chambers so they would only generate knowledge for them. This obviously led to a similar duplication to the one you describe, right up to calculus where Newton kept the thing hidden in a drawer and then Leibniz had the same idea.

Once that kind of secrecy was gone our whole technical progress was accelerated, because people could build on the discoveries of other people.

Right now we are going back to the alchemist model in some ways (the highest profile people work for the big companies and don’t share their discoveries). This makes progress slower.

modeless · on Nov 10, 2019

> the highest profile people work for the big companies and don’t share their discoveries

I have to strongly disagree with this for the specific case of AI/ML. The big company labs are publishing open access papers non-stop, often with code and sometimes even datasets. They're more open than some areas of academia, in fact.

atoav · on Nov 10, 2019

Really? I heard the opposite from someone in the field, who told me that they do publish but it is never the really relevant stuff. I can’t really judge that myself to be honest

nl · on Nov 10, 2019

I work in the field. New and relevant stuff is published frequently by Google, Facebook and Microsoft (as well as smaller companies like NVidia etc). Apple very occasionally publishes too.

doomlaser · on Nov 10, 2019

Leon Gatys, the guy who published A Neural Algorithm for Artistic Style that revolutionized the entire industry while he was at a university is now employed at Apple. They have published at least one thing that was semi-interesting on cognitive functioning in older adults and how it can be expressed through smartphone usage: http://delivery.acm.org/10.1145/3310000/3300398/a168-gordon....

Interesting and certainly compelling, but not earth-shaking like style transfer has been.

kiwicopple · on Nov 10, 2019

Ideally everyone would collaborate on inputs and compete on outputs. All the data gathering, tagging, mapping etc could be put into a shared domain, and then after that the companies decide what to do with it and how to commercialise it.

Easier said than done, but I think it would strike the right balance between reducing duplicate work, and incentivising progress.

arketyp · on Nov 10, 2019

You can apply this line of reasoning to many markets, like the pharma or food industry, which also have safety concerns. It strikes me as the kind of initiative the EU attempts nowadays when realizing we are running behind on some tech and want to leverage the one possible advantage we have as a great centralizing power. Not too different from communist states, actually. I agree with the sentiment that redundancy seems wasteful, but it seems to me a necessary evil as a driving force in development, as with the right to private property in general.

jacquesm · on Nov 10, 2019

I'm reminded of the Manhattan project and I don't think they would have succeeded in their goal if they had tried to run 10 of those at once. There just aren't that many really great scientists in a space this narrow.

iguy · on Nov 10, 2019

They built complete factories for three different technologies, and finally delivered two totally different bomb designs.

Which is a sign they were pretty worried about choosing to put all their smart guys on one path, and picking the wrong one. And the number of smart guys they had was certainly a real constraint. Even under their very stressed circumstances, exploration and competition were important.

mattrp · on Nov 10, 2019

This is correct - read the fantastic “making of the atomic bomb” - the planners kept competition alive for as long as possible because no one really knew which path was going to work. In contrast the nazi project, among other reasons, failed because it focused nearly exclusively on heavy water and didn’t have internally competing projects.

tim333 · on Nov 10, 2019

It's kind of a different class of problem. With self driving you can have someone like Hotz or the Cruise guys knock up a system with limited resources and have it work quite well. You just can't do that with developing nuclear weapons.

Also self driving is not a problem where you can put a bunch of geniuses in a room and have them calculate the correct design. There are too many unknowns. It needs experimentation and trial and error.

arketyp · on Nov 10, 2019

Fair point. And the moon mission. Autonomous driving is such a consumerist issue though, seems quite well suited for a free market dynamic. I'd rather see a great joint effort on fusion energy or something.

hgoel · on Nov 10, 2019

Talk about twisting the argument. The Manhattan project was in competition with the Soviets and the Nazis, it was less a result of some unified front of all the smartest scientists in the world, and more of a result of "if we don't succeed first, someone much worse probably will".

Your logic of "if there were 10 of them simultaneously, it wouldn't have worked out" is flawed and self serving in that if you ignore all competition and just divide a single competitor into any number of smaller entities, of course at some point they'll be too small to be viable.

eanzenberg · on Nov 10, 2019

Competition is good

nfoz · on Nov 10, 2019

Choice is good, alternative implementations are good. But I think competition is bad. It is wasteful and antisocial.

eanzenberg · on Nov 11, 2019

Competition is good and directly leads to the progress we’ve seen in the western world.

nfoz · on Nov 12, 2019

"Progress" is a rhetorical device used to paint whatever actually happened in a positive light. But coincident with the positive elements of the rise of western society, is a whole lot of tragedy. Wars with unprecedented brutality and scale, genocide, factory farming, mass extinctions and the systematic destruction of the planet, selfishness as a defining cultural trait.

Replacing competition with cooperation may have accelerated the positive aspects of progress (e.g. science), and maybe slowed or prevented many of the negative ones.

Here's a treatise on the topic, if you're interested in this counterpoint: https://www.alfiekohn.org/contest/

alwayslearning0 · on Nov 10, 2019

I think the diversity you see in cameras and lidar placement and existence is worth it enough to have different paths forward. Tesla seems insistent that it can be done sans lidar. It's definitely worth it to see which approach works best.

logicallee · on Nov 10, 2019

>Imagine every car manufacturer having a completely different take on what a car should be like from a safety perspective. We have standards bodies for a reason

Roads are also governed by public bodies. Road signs are standardized and public.

I think the government should take a much larger role in defining self driving cars. For example, rather than using computer vision to recognize signs, signs could be active standardized beacons; instead of having to recognize lanes, they could be repainted with rfid chips that are trivial for cars to recognize and follow.

Avoiding driving into people is also something that was somewhat regulated by crosswalks with pedestrian lights. Would it be absurd for the crosswalk to know roughly how many people are at it, then broadcast this to the car, rather than having the car have to recognize them?

There are many things the government could do with transportation infrastructure that would benefit everyone, many of which are literally impossible for companies to do separately. Can you imagine if we had to wait until IBM (or Siemens or Google or Apple) got into the business of launching satellites before we got GPS? There is a good chance that to this day cell phones wouldn't know their location or give anyone any mapping applications.

To me, self driving cars are similar. Many parts of transportation are a public good.

kortex · on Nov 10, 2019

> Would it be absurd for the crosswalk to know roughly how many people are at it, then broadcast this to the car

Yes, completely absurd. For one, many places are too sparse, poor, or unstandardized for this to be remotely economical. Two, people often don't cross at crosswalks (jaywalking). Level 5 AV is a thing where 99% coverage isn't good enough. You need a lot of nines.

That's why many think lidar systems are a crutch. If lidar can't work in snow or heavy rain (at least 1-2% of days in the north), then you need fallback which must still be >99% as effective to avoid incidents. But then why not just use the fallback?

Generating data? Sure. But for actual use in the data pipeline? Makes you rely on an ultimately untenable solution.

logicallee · on Nov 10, 2019

In sparsely populated areas smart roads could inform the car that there hasn't been any movement whatsoever in the whole area for hours. (Along the entire road and adjacent to the roads). How the car responds (driving somewhat faster, perhaps) could mean a large measure of safety as compared to when there are a large group of people about to cross a rural road at night that the car may or may not see visually.

Anyway it's just one example. RFID or similar embedded beacons in the road paint would make roads much easier for cars to follow. They are expensive for any one company to do but cheap for the government to do if it is done everywhere at once and lasts 5-10 years. Road signs that broadcast what they are (instead of needing to be "seen" and interpreted visually instead of through radio by the car), are similar.

Finally, a government standard could coordinate cars into a caravan, avoiding pileups for example, and giving the participating cars several advantages you can find by Googling "car caravan". (Though this might be achieved by industry.)

timzaman · on Nov 10, 2019

His team is hiring;

https://www.tesla.com/careers/job/software-engineerdeeplearn...

https://www.tesla.com/careers/job/machine-learninginfrastruc...

https://www.tesla.com/careers/job/machine-learningscientista...

throwaway010718 · on Nov 10, 2019

Any guess what the compensation is like for these positions ?

drchewbacca · on Nov 10, 2019

If you get rejected just edit 10 words on your resume and then resubmit. Do this at 100 Hz and you'll get in eventually :)

choppaface · on Nov 10, 2019

Less than Uber, but more than Waymo, who is only offering ~$20k-ish stock packages like a late start-up who expects to 10x. Depends on how you value Tesla stock, though. See levels.fyi

cloudwalking · on Nov 10, 2019

I know people at both Tesla and Waymo, and Tesla does not pay as well as Waymo.

choppaface · on Nov 13, 2019

levels.fyi suggests Tesla pays higher total compensation given that the stock is liquid. The Tesla salaries posted there are less than what I’ve seen in Waymo offers, but having liquid equity is a major differentiator in this case. A Waymo IPO could be 5 years off.

soulslicer0 · on Nov 10, 2019

lol i just did the interview and failed. had to find shortest path between tesla chargers. all in C++. completed it but failed

lsh · on Nov 10, 2019

you were supposed to bore a tunnel directly between the two points

mkagenius · on Nov 10, 2019

jeez, trick question

karpathy · on Nov 10, 2019

Very likely not for the positions above! (they all focus on Python)

boulos · on Nov 10, 2019

Slightly off-topic (but related to the side thread where people didn’t realize you went to Tesla): your HN about still says you’re at OpenAI :).

karpathy · on Nov 10, 2019

oops, thank you, fixed :)

boulos · on Nov 10, 2019

As long as you’re here, I’d love to get an updated PyTorch vs TF from you in the sibling thread as payment :).

bitL · on Nov 10, 2019

ars · on Nov 10, 2019

How does that work? Were you given the algorithm to use? Or was it really a data science question, rather than a programming question?

meheleventyone · on Nov 10, 2019

Without more details it sounds like an algorithm problem you'd be expected to solve from prior knowledge. Stuff like a breadth first search from the start point, up through various path finding algorithms to applying heuristics (I believe route finding on roads exploits the road topology).

f00_ · on Nov 10, 2019

Could you have just used dijkstra or breadth first search?
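
For reference, a minimal Dijkstra sketch of the kind being asked about. The interview question itself isn't described in the thread beyond "shortest path between chargers", so the graph format, the charger names, and the distances below are all made up for illustration.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over a dict of {node: {neighbor: distance}}.

    Returns (total_distance, path), or (float("inf"), []) if goal is unreachable.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if goal not in best:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]


# Hypothetical charger network; names and distances (km) are invented.
chargers = {
    "A": {"B": 120, "C": 300},
    "B": {"A": 120, "C": 150, "D": 400},
    "C": {"A": 300, "B": 150, "D": 200},
    "D": {"B": 400, "C": 200},
}

print(shortest_path(chargers, "A", "D"))  # (470.0, ['A', 'B', 'C', 'D'])
```

Plain breadth first search finds the route with the fewest hops, which is only the shortest route when every edge has the same length; with real distances between chargers you'd want Dijkstra or A*.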

sdan · on Nov 10, 2019

Thanks Tim! Would love to apply, but still a student. Hoping to join in the future given how nicely orchestrated your team has been training nets.

ojn · on Nov 10, 2019

Apply for internship. Doing well on an internship is a great way to get a foot in the door after graduation.

modeless · on Nov 10, 2019

Awesome presentation. Crazy that they're developing their own training hardware too. It's going to be a very crowded space very soon. Can they really stay ahead of everyone else in the industry? Can it really be cheaper to staff up whole teams to design chips for cutting edge nodes, fabricate them, build supporting hardware and datacenters and compilers, than to just rent some TPUs on Google Cloud?

I can see the case for doing their own edge hardware for the cars (barely), but I really don't think doing training hardware will pay off for them. If they're serious about it, they should spin it out as a separate business to spread the development cost over a larger customer base.

Also, I'm really curious whether the custom hardware in the cars is benefiting them at all yet. Every feature they've released so far works fine on the previous generation hardware with 1/10 the compute power. At some point won't they need to start training radically larger networks to take advantage of all that untapped compute power?

antpls · on Nov 10, 2019

Watch the presentation from 6 months ago, where they explain the decision to build their own hardware for inference: https://youtu.be/Ucp0TTmvqOE?t=4309

It's not surprising that they also build the hardware for training. Correct me if I'm wrong, but Google uses the same TPUs for training and inference, because the underlying operations are the same: multiply then add numbers. Once Tesla built the hardware for inference, the design of the hardware for training is probably similar.

Unlike Google's TPUs, Tesla have a specific use case for the hardware (computer vision for automotive), and maybe that means they can further optimize the computation pipeline with their own specialized hardware.

spyder · on Nov 11, 2019

Very good video, it contains answers to many of the questions that people are speculating here about and other interesting things about Tesla's custom chip.

- It's under 100W so they can retrofit into old cars

- lower part cost, so they can do full redundancy with doubling the parts

- they estimated that 50 TOPS is needed for self-driving

- lower latency with batch size of 1 compared to TPU's 256

+ GPU for post-processing

- security: only code signed by Tesla can run on the chip

- at the time (2016) there were no neural net accelerator chips

- some parts are built from bought IP (so not reinventing them). Probably things like the 12 ARM CPUs, LP DDR4 memory, video encoder, maybe the separate post-processing GPU too...

- physical size of the board is small

- performance example: on CPU 1.5 FPS, on GPU (600 gflop) 17 FPS, on Tesla's NN accelerator 2100 FPS

- Besides the convolution, even the ReLU and Pooling are implemented in hardware

- Paying attention to the energy efficiency down to the arithmetic and data type usage.

- The silicon cost is less than their previous hardware (HW 2.5)

- old hardware 110 FPS new one 2300 FPS

- 144 TOPS compared to NVidia's Drive Xavier 21 TOPS
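
The ratios implied by the figures in this list are easy to lose track of, so here they are worked out. The numbers are copied from the bullets above as quoted; nothing here is measured or independently verified.

```python
# Figures as quoted in the list above.
old_fps, new_fps = 110, 2300          # previous hardware vs new chip
gpu_fps, accel_fps = 17, 2100         # GPU vs Tesla's NN accelerator
tesla_tops, xavier_tops = 144, 21     # Tesla board vs NVidia Drive Xavier
needed_tops = 50                      # Tesla's own estimate for self-driving

speedup_vs_old = new_fps / old_fps            # about 20.9x
speedup_vs_gpu = accel_fps / gpu_fps          # about 123.5x
tops_vs_xavier = tesla_tops / xavier_tops     # about 6.9x
headroom = tesla_tops / needed_tops           # about 2.9x the estimated need

print(f"new vs old hardware: {speedup_vs_old:.1f}x")
print(f"accelerator vs GPU:  {speedup_vs_gpu:.1f}x")
print(f"TOPS vs Xavier:      {tops_vs_xavier:.1f}x")
print(f"TOPS vs estimate:    {headroom:.1f}x")
```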

modeless · on Nov 10, 2019

Google has Edge TPUs for use outside datacenters, and they don't support training. Neither do the chips Tesla made for their cars. It's a pretty different problem.

antpls · on Nov 10, 2019

I wouldn't be so sure. Edge TPUs could be the exact same architecture as Google Cloud TPUs, but as you need less computation power for inference than training, they simply have fewer transistors on the die and could be underclocked.

In other words, Cloud TPUs could be the same architecture as Edge TPUs but scaled to a higher frequency and more densely packed.

I guess we need sources to confirm.

londons_explore · on Nov 10, 2019

Training is currently done in floating point math, whereas inference can be done fixed point without much loss of performance. Fixed point is ~10x cheaper in terms of power and silicon area for equal performance.

Also, training requires a lot more RAM per unit of compute, since it needs to store all past layer activations, whereas for inference, that is unnecessary.

As far as I know, no player who has developed dedicated ML hardware (as opposed to using GPUs) uses the same hardware for both inference and training.

modeless · on Nov 10, 2019

Edge TPUs support 8 bit integer math only. Training is floating point. That's not a small change. https://coral.withgoogle.com/docs/edgetpu/models-intro/#mode...
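
A small sketch of what "8 bit integer math only" means in practice, assuming simple symmetric quantization with one scale per tensor. That scheme and the sample values are my own choices for illustration, not necessarily what the Edge TPU toolchain does.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization: floats become signed integers plus one scale."""
    qmax = 2 ** (bits - 1) - 1                       # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale


def int_dot(qa, sa, qb, sb):
    """Integer multiply-accumulate, with a single float rescale at the end."""
    return sum(x * y for x, y in zip(qa, qb)) * sa * sb


# Made-up weights and inputs for one neuron.
weights = [0.42, -1.30, 0.07, 0.88]
inputs = [1.50, 0.25, -0.60, 2.00]

exact = sum(w * x for w, x in zip(weights, inputs))
qw, sw = quantize(weights)
qx, sx = quantize(inputs)
approx = int_dot(qw, sw, qx, sx)

print(exact, approx)  # the two agree to roughly two decimal places here
```

Inference tolerates that small rounding error. Training is harder because a gradient update smaller than one quantization step rounds away to nothing, which is one reason training usually stays in floating point.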

breatheoften · on Nov 10, 2019

I think the size of the networks they are training might already be good motivation for developing custom hardware for training.

I would expect their training hardware to be something specifically aimed at optimizing memory bandwidth to support distributing training of their “shared” hydra feature. It’s interesting that the shared hydra feature extractor is able to converge as they keep adding more and more output predictions under a training regime of interleaving asynchronous updates to the model from different predictor networks ...

Seems to me the formula they are pursuing with custom hardware might be to support a strategy of 1. keep adding more predictions based on same feature 2. Increase the span of time represented by batches used to train the recurrent networks
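
A toy illustration of the "shared feature extractor, many predictors, interleaved updates" idea, shrunk down to one shared scalar weight and two scalar heads. Everything here (the tasks, the targets, the learning rate) is invented to show the update pattern and says nothing about how Tesla's actual networks are built or trained.

```python
def train(steps=3000, lr=0.02):
    """Alternate SGD updates from two tasks that share one trunk weight."""
    w = 1.0                               # shared trunk: feature = w * x
    heads = {"a": 0.5, "b": 0.5}          # one scalar head per task
    targets = {"a": 2.0, "b": -3.0}       # task t wants heads[t] * w * x == targets[t] * x
    xs = [0.5, 1.0, 1.5]
    for _ in range(steps):
        for x in xs:
            for task in ("a", "b"):       # interleave updates from different heads
                feature = w * x
                err = heads[task] * feature - targets[task] * x
                grad_head = err * feature          # d(0.5 * err^2) / d head
                grad_w = err * heads[task] * x     # d(0.5 * err^2) / d w
                heads[task] -= lr * grad_head
                w -= lr * grad_w          # every task nudges the shared trunk
    return w, heads


w, heads = train()
print(w * heads["a"], w * heads["b"])  # close to 2.0 and -3.0
```

Even though the two tasks pull the shared weight in different directions early on, both heads end up fitting their own target on top of the same feature, which is the behaviour being remarked on above.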

Both pursuits seem very data efficient in terms of the amount of training data they could conceivably collect per unit time of observation ...

Custom hardware with a problem specific memory architecture aimed at efficiently supporting training with very large rnn time slices could be developed that’s more about “make it possible to train this proposed model at all” rather than “make it faster/cheaper to train existing common model architectures”. When custom hardware is required to make it possible to train the model they want, the validity of the hardware development cost bet might end up being more about the effectiveness of the model they think they want than it is about maintaining general purpose performance parity vs any off the shelf hardware options ...

jeffshek · on Nov 10, 2019

At Tesla's scale and priorities, they'd probably be less keen on using external cloud providers. Using TPUs at their scale would certainly require Google's AI consultants to supervise which isn't ideal for Tesla.

Not agreeing or disagreeing with their decisions, but if you have the resources, you can certainly design a custom chip that performs a specific type of task very well and beats competitors. Nvidia's GPUs have to be reasonably good at training across different NNs. You could have a chip that's exceptionally good at training one or two specific types of tasks.

For most companies, this would be a bad idea. However, Tesla knows how to produce hardware.

choppaface · on Nov 10, 2019

> At Tesla's scale and priorities, they'd probably be less keen on using external cloud providers.

Not sure if it’s still the case today, but previously Tesla’s training was done on-prem and with their own in-house Tensorflow clone.

And yes, if you get TPUs from GCloud, you are likely to be working with their engineers to get things working. Those engineers tend not to have much business conflict of interest, though. They want to help you because your problems are likely more interesting than what they’d otherwise be assigned.

chronic739i · on Nov 10, 2019

> At Tesla's scale and priorities, they'd probably be less keen on using external cloud providers.

By hardware-hours, Tesla is hardly one of the top companies training deep networks. Planet Labs (satellite imaging), Netflix, Pornhub, to name a few.

What's the info they'd leak to the Google consultants? How much data or TPUs they're using? This is practically public information.

Udik · on Nov 10, 2019

What does Netflix do with NNs?

SloopJon · on Nov 10, 2019

Their own page says, "our recommendation algorithms ... learning characteristics that make content successful ... optimize the production of original movies and TV shows ... optimize video and audio encoding, adaptive bitrate selection, and our in-house Content Delivery Network ... and advertising".

https://research.netflix.com/research-area/machine-learning

Here's another article on the subject:

https://becominghuman.ai/how-netflix-uses-ai-and-machine-lea...

Udik · on Nov 10, 2019

At first glance, they don't seem to be problems in which ML can have a huge impact. Netflix in the US has a catalogue of about 4000 movies plus a few hundred tv series. It's tiny: compare with 153707 items on a niche recommendation website (criticker.com).

Characteristics that make content successful.. here again, it's mostly decent quality content plus marketing. I doubt the scripts are reviewed by NNs.

I have no idea about the network and delivery stuff, but I guess a well designed network can take care of most of it.

My strong impression is that, similar to other well known cases (Uber, WeWork) Netflix is a mostly traditional company (a media company) that very strongly wants to be seen as a tech company.

dzdt · on Nov 10, 2019

Netflix put a lot of work into recommendation back in the dvd delivery days when their catalog was absolutely massive. Now in the streaming space the catalog they license is much smaller so recommendation is less important; basically they just advertise the popular stuff for your demographic.

It's a bit odd that legally one can rent out physical disks, but there is no corresponding way to legally get permission to rent out streaming content without negotiating with the rightsholder. But that's how it is...

mlyle · on Nov 10, 2019

> It's a bit odd that legally one can rent out physical disks, but there is no corresponding way to legally get permission to rent out streaming content without negotiating with the rightsholder. But that's how it is...

It's hard to think of a good fair regime to do this under.

At least with a physical disk, there's a maximum reasonable rate that you can turn the disk around between users and you need to have enough copies for whatever the lifecycle peak demand is.

We do have an audio compulsory licensing system for things that are purely songs. But with video works, there's not a clear boundary for "how big" the work is-- how do you treat 30 hour anime series vs a 5 minute Pixar short? How do you treat continuing medical education videos vs. fluff amateur made content? Etc.

millettjon · on Nov 11, 2019

Do those really take more than 70,000 gpu hours to train a model?

m0zg · on Nov 10, 2019

Nothing crazy about it. TPU-like stuff is ~10x the energy efficiency of GPUs and several times the speed. When you're spending megawatt-hours and days to train a single model, it adds up in both real and opportunity costs.

Also, Google TPU TOS prohibits the use of TPUs for stuff that competes with Google (and I'm assuming with other companies under Alphabet umbrella), at Google's sole determination. Not that it would be a good idea to upload Tesla's proprietary data into Google Cloud even if it did not. Cloud, after all, is just somebody else's computer.

modeless · on Nov 10, 2019

> Google TPU TOS prohibits the use of TPUs for stuff that competes with Google

I don't think this is true. If you're talking about https://news.ycombinator.com/item?id=19855099, it doesn't apply to TPU hardware, as is explained in the comments there.

> TPU-like stuff is ~10x the energy efficiency of GPUs

10x is probably overstating it when talking about newer GPUs because they have ML hardware in them now. Also, that still doesn't make it a good idea to build your own chips because there will soon be many third party options to choose from. Doing your own chips is a bet that you will out execute dozens of companies ranging from startups to industry giants. Simply taking your pick of the best commercially available options is likely to be a better choice in the near future.

m0zg · on Nov 10, 2019

Yep that's the clause. The clause itself is not that problematic for Tesla. What's problematic is that it can be changed over time, and it'd be foolish to single-source something as important as deep learning compute without the option to go elsewhere. Not to mention the rather extravagant Cloud pricing. So Tesla is taking a page out of Steve Jobs' playbook and it will control its own core tech. That's smart, especially considering that they already have bits and pieces of the IP that they'll need.

modeless · on Nov 10, 2019

That clause doesn't apply at all though. It's not even for the same product.

As for single-source, the models are written in PyTorch, not TPU machine code, and they're pretty standard models anyway (e.g. Resnet-50). They can easily transfer to other hardware if necessary. There's not a ton of lock-in there. It doesn't justify the massive costs of ASIC development just to avoid this nonexistent lock-in and imaginary TOS clause.

oneshot908 · on Nov 10, 2019

Not remotely true. TPUs and GPUs are neck and neck right now with respect to overall efficiency; check out https://mlperf.org/press#mlperf-training-v0.6-results for more details.

GPU advantage: more refined ecosystem and you can buy them for $<1000 or get laptops with them built in, and if NVDA has sweat more software engineering blood and tears than GOOG into your model's functions, it will run better on them

TPU advantage: Colab has a free tier that lets you play with them at no charge and if GOOG has sweat more software engineering blood and tears into your model's functions, it will run better on them.

All IMO of course. And deep down it can get more complicated than that, but I salute GOOG for being the first company to ship competitive AI HW, doubly so at scale.

sdan · on Nov 10, 2019

Stuff like their TPU and Waymo's Honeycomb Laserbear (something along those lines... their lidar naming system is pretty long) shows that Google is making good products for a limited reach of people.

TPU? Seems like it has a lot of potential, but not for people directly competing with them.

Waymo's Laserbear lidar? Seems like it has a lot of potential, but not for AV companies directly competing with them.

Google's playing this game pretty fiercely... which given their size is pretty bad/daunting.

sgt101 · on Nov 10, 2019

>Nothing crazy about it. TPU-like stuff is ~10x the energy efficiency of GPUs and several times the speed. When you're spending megawatt-hours and days to train a single model, it adds up in both real and opportunity costs.

could you share the stats on this? Google told me to use a K-80 for training.

vintermann · on Nov 10, 2019

> Also, Google TPU TOS prohibits the use of TPUs for stuff that competes with Google

Can it be true? Then again, Apple's app store behaviour seems to suggest such demands are tolerated. Antitrust is really asleep in the US, isn't it.

roystonvassey · on Nov 10, 2019

Also, the software part of it (NNs and their algorithms) has been so widely researched and published that competitive advantages here are harder to come by than in hardware R&D.

Also, vendor lock-in is a huge challenge in the cloud space. I don’t think Tesla would be comfortable with the fact that all their training data sits on a potential competitor’s datacenter.

jacquesm · on Nov 10, 2019

A car is a hardware device as well, and an electric car does not have the kind of power budget that allows you to throw oodles of standard pieces at it without paying a severe penalty in range.

codeulike · on Nov 10, 2019

Compared to the energy needed to move the car, everything else is pretty irrelevant. Power-hungry features like heating/AC only make a few % difference to range.

mrep · on Nov 10, 2019

From autonomy day hacker news comment: "Pegasus consumes about 500Watts, compared to under 100 Watts for Tesla's FSD computer. Elon in particular emphasized the performance per watt (as it's always possible to cram more chips to increase performance if you ignore cost and power consumption). The comparison made in the video: 500Watts for an hour consumes about 2-3 miles of range. In a city in slow traffic, going 12mph, that's a significant range reduction. So you might have a 10% improvement in range for the Tesla ASIC in low speed conditions" [0].

Also, Tesla is/was planning on running these chips even while not in autonomous mode in order to spot new scenarios which it can then record what the car sees and what the human does in order to collect a lot more unique road scenarios to train their models with.

[0] Credit Robotbeat: https://news.ycombinator.com/item?id=19729743
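
For anyone who wants to check the arithmetic in that quote, here is a small Python sketch. The 500 W and 100 W figures and the 12 mph speed come from the quote; the driving consumption values (200 and 250 Wh per mile) are my own assumptions, not Tesla numbers, and the function names are made up.

```python
def range_cost_miles(power_watts, hours, drive_wh_per_mile):
    """Miles of range consumed by running a constant electrical load."""
    return power_watts * hours / drive_wh_per_mile


def range_gain(old_watts, new_watts, speed_mph, drive_wh_per_mile):
    """Fractional range increase from swapping the old computer for the new one.

    Range is inversely proportional to total energy used per mile, and a
    constant load costs (watts / mph) watt-hours per mile.
    """
    old_total = drive_wh_per_mile + old_watts / speed_mph
    new_total = drive_wh_per_mile + new_watts / speed_mph
    return old_total / new_total - 1.0


# 500 W for one hour, assuming 200 Wh/mile of driving consumption.
cost = range_cost_miles(500, 1, 200)

# 500 W versus 100 W, assuming 250 Wh/mile of driving consumption.
city_gain = range_gain(500, 100, speed_mph=12, drive_wh_per_mile=250)
highway_gain = range_gain(500, 100, speed_mph=65, drive_wh_per_mile=250)
print(cost, round(city_gain, 3), round(highway_gain, 3))
```

Under these assumptions the hour of compute costs 2.5 miles, and the range gain is about 13% at 12 mph but only about 2% at 65 mph, so the size of the effect depends heavily on speed.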

maxerickson · on Nov 10, 2019

Good ol' velocity-cubed.

Increasing their compute per watt lets them fit more compute into the same power budget, so they will pursue it hard. Low-speed energy savings are a way of making the range extension seem like a big deal when it isn't (maintaining highway speeds requires tens of kilowatts).

mlyle · on Nov 10, 2019

It was a big deal for Tesla, though, because they had to fit into a tiny power budget. HW2.0 wasn't enough for what they wanted to do, and to retrofit existing cars they had to consume a similar amount of power.

Power efficiency does make a range difference and is worth seeking, but Tesla is exaggerating this, IMO, to conceal one of the ways that the churn-heavy cycle for FSD has imposed organizational costs (an entire chip-level hardware program, in this case).

maxerickson · on Nov 11, 2019

I would expect them to seek compute/watt basically until they stopped making progress.

An actual working system might slow them down, but even then, better hardware is one of the ways to make it better.

mlyle · on Nov 11, 2019

Designing ICs is a very expensive treadmill. If you can buy fast enough hardware to do your job, it's really hard to justify spending a lot for something that may be nominally a little better on some metric.

There's commodity computing of comparable throughput that other automakers can incorporate without too much cost, but it doesn't fit into the existing TM3 vehicles because it has a larger footprint in power, space, and cooling than HW2.

Here, Tesla spends a bunch of money to develop HW3 and overcome this problem, but the advantage in a new car design between HW3's efficiency and commodity hardware is limited.

codeulike · on Nov 10, 2019

Wow, I stand corrected

barney54 · on Nov 10, 2019

The compute for autonomous driving could be 10% of the total energy budget of the car. https://cleantechnica.com/2017/10/13/autonomous-cars-shorter...

dna_polymerase · on Nov 10, 2019

So would you trust the company that owns one of your biggest competitors in this field (Waymo) with the stuff that decides over success: data?

sdan · on Nov 10, 2019

Did they say they were building their own training hardware? I thought it was just their inference hardware (the boards on the teslas)?

modeless · on Nov 10, 2019

Yes https://youtu.be/oBklltKXtDE?t=572

sdan · on Nov 10, 2019

Oops, didn't catch that. For some reason I thought it was an orchestration engine like Stripe's Railyard.

joenathanone · on Nov 10, 2019

>Also, I'm really curious whether the custom hardware in the cars is benefiting them at all yet. Every feature they've released so far works fine on the previous generation hardware with 1/10 the compute power.

The latest OTA finally brings a hardware v3 only feature, traffic cone visualization, and traffic cone automatic lane change.

londons_explore · on Nov 10, 2019

I would guess that while the new hardware has the same features, the accuracy might be lower on the old GPUs because they are forced to use smaller networks or to run them at lower frame rates.

navigatesol · on Nov 10, 2019

Traffic cones today, 1 million self driving robo taxis in 6 months, right?

thebruce87m · on Nov 10, 2019

Stay ahead? Are they actually ahead?

sdan · on Nov 10, 2019

Really liked this talk.

Looks like they are really nicely orchestrating workloads and training on numerous nets asynchronously.

As a person in the AV industry I think Tesla's ability to control the entire stack is great for Tesla... maybe not for everyone who can't afford/doesn't have a Tesla.

natch · on Nov 10, 2019

>maybe not for everyone who can't afford/doesn't have a Tesla.

Affordability is not as much of an issue as some make it out to be. Cost-wise it's like owning a Camry or an Accord, if you go for the lower end models. If you mean not everyone can afford a new car, then sure I agree with you.

Edit: if you think I'm wrong about this, please explain or ask me to clarify anything?

sdan · on Nov 10, 2019

As a small anecdote, my parents couldn't afford/didn't want to spend over $30k for a car. Surely we could've gotten a Tesla for $5k+ more, but given the relatively new infrastructure with electric charging stations (and the fact that none are available in the apartment I live in) my parents didn't find all the new cool features appealing and instead got a regular Toyota Sienna that has nothing fancy, just enough to take the family around.

Similarly, I believe the infrastructure around electric charging stations hasn't fully matured yet, and as a result many people who already own a car will stick with gas cars, since there's no huge incentive to change unless charging becomes easier (faster, more convenient).

Do note that I don't have a drivers license. I never intend on getting one (I believe in what I do in the AV industry). I'm just guessing on the habits of people, not that I have any real experience in buying a gas/electric car.

Also note I didn't think you were wrong, not sure why the downvotes.

natch · on Nov 10, 2019

Thanks for the reply. Living in an apartment is not a big issue for me... we have two, live in an apartment with no charger.

That's interesting that you relate something you do in the AV industry to not ever getting a car... what's that about?

I do think that it's possible in the future the majority of people will never need to own a car.

sdan · on Nov 10, 2019

I think a good majority of people, especially in America, will go along the lines of "If it ain't broke, don't fix it" for gas cars.

Regardless, I think Ghost Locomotion and Comma.ai have a lot of potential for what they're doing now. I think they'll coincide with fully driverless cars like Cruise, Waymo, or Aurora, regardless of whether they're electric or not (I think electric cars will be more heavily adopted if we use Cars as a Service).

Also, yes, I don't ever intend on getting a license. This problem is really fascinating and I'm excited to play a little part in it. By not having a license, I can spend more time having the perspective of a person in 20-30 years, when driver's licenses will become less common, and utilize it in my work.

natch · on Nov 11, 2019

>I can spend more time having the perspective of a person in...

This is awesome. I've always found that by really experiencing the future (or some part of the leading edge that is soon going to become the future for most people) you gain much more understanding of it, ahead of others, and can apply it in your life planning. It would be worth living this way even if only temporarily just to get that, as you say, perspective... Not ready to give up driving forever, heh, but I might have to try a short sample of that non-driving lifestyle sometime.

londons_explore · on Nov 10, 2019

I'm still amazed that Tesla's team isn't using a map... I know maps get outdated and are sometimes wrong, but having inaccurate knowledge of what's around the corner is far far more helpful than not having any clue what's around the corner.

The smart solution would be to consider a map a probabilistic thing, which neural networks are really good at handling.
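
As a toy illustration of "map as a probabilistic thing", here is plain Bayes' rule in Python: the map supplies a prior, the camera supplies evidence. This is a sketch of the general idea only, not what Tesla or any neural network does internally, and all the probabilities are invented.

```python
def fuse(map_prior, p_reading_if_present, p_reading_if_absent):
    """Posterior probability that a mapped feature really exists.

    map_prior: how much we trust the (possibly stale) map.
    p_reading_if_present: chance of this sensor reading if the feature exists.
    p_reading_if_absent: chance of this sensor reading if it does not.
    """
    numerator = map_prior * p_reading_if_present
    denominator = numerator + (1.0 - map_prior) * p_reading_if_absent
    return numerator / denominator


# The map says a lane is there with 90% confidence.
# Hypothetical detector: fires 80% of the time when the lane exists,
# 10% of the time when it does not.
seen = fuse(0.9, p_reading_if_present=0.8, p_reading_if_absent=0.1)
not_seen = fuse(0.9, p_reading_if_present=0.2, p_reading_if_absent=0.9)
print(round(seen, 3), round(not_seen, 3))
```

When the camera agrees with the map the belief rises to about 0.99; when the camera sees nothing it drops to about 0.67, so the map still informs the decision without being trusted blindly.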

anonu · on Nov 10, 2019

I'm still amazed Tesla has decided not to use lidar and instead just stick with cheap cameras. Better sensors are there, they're available, they're cheap and they can probably "see" better than plain old cameras... it doesn't make too much sense not to use them IMHO. But then again, I am not coding NNs for Tesla...

JoeSmithson · on Nov 10, 2019

I think they advertised self-driving as a future feature of the Model 3, so I think they are limited to whatever Model 3's currently have.

londons_explore · on Nov 10, 2019

While LIDARs are certainly 'better' from a technological standpoint than not having anything, from a business standpoint it's less clear.

LIDARs are cheap, but not cheap enough yet to not seriously affect the bottom line if you put them into every car. It also will kill the resale price of cars without it, which in turn hurts the company's image and stock price.

nielsole · on Nov 10, 2019

If you are first to market with a level 4 autonomous taxi, unit economics will likely be great regardless of whether you put LIDAR or cameras in.

londons_explore · on Nov 11, 2019

Yes, but the current reality is they need to pay for the hardware, and still make a profit, without any guarantee of that autonomous taxi market.

It all depends what you see are the chances of the autonomous taxi market developing within the lifetime of these cars.

tim333 · on Nov 10, 2019

Musk doesn't like it:

>“Lidar is a fool’s errand,” Elon Musk said. “Anyone relying on lidar is doomed. Doomed! [They are] expensive sensors that are unnecessary. It’s like having a whole bunch of expensive appendices. Like, one appendix is bad, well now you have a whole bunch of them, it’s ridiculous, you’ll see.” https://techcrunch.com/2019/04/22/anyone-relying-on-lidar-is...

dna_polymerase · on Nov 10, 2019

I recommend watching George Hotz's take on this: https://www.youtube.com/watch?v=IxuU5L2MEII

leesec · on Nov 10, 2019

You're underestimating the cost issue. Obviously if it was literally 0% extra cost they would have a value benefit. The problem is making a great and cheap and profitable electric vehicle.

option · on Nov 10, 2019

lidars are used to generate humongous amounts of labeled training data for depth perception networks so that you don’t have to use them during inference

mattrp · on Nov 10, 2019

Karpathy has a good presentation outlining why they aren’t focused on lidar.. it’s pretty compelling logic.

Shoop · on Nov 10, 2019

Do you have a link handy? I couldn't find it with a quick google.

mattrp · on Nov 11, 2019

It’s addressed in the autonomy day presentation if I’m not mistaken.

cycrutchfield · on Nov 10, 2019

It’s not like they decided not to use it just because. Here is an interesting breakdown:

https://cleantechnica.com/2016/07/29/tesla-google-disagree-l...

mattrp · on Nov 10, 2019

I could be wrong but I recall Lyft is using hyper accurate maps.

Gravityloss · on Nov 10, 2019

Interesting that they don't have a full 3D world model. I'm certainly not a machine learning expert. I'm still amazed the route from image recognition to a 2D map of "what's drivable" to autonomous driving is so direct. One would expect to hit a ceiling really soon with that approach.

To me it seems we're still in really early days.

spyder · on Nov 11, 2019

They're doing 3D for the road path, and even predicting it beyond corners:

https://youtu.be/Ucp0TTmvqOE?t=8137

And later in the video they show 3D reconstruction from cameras and say they use it in the car.

Watching the full talk is recommended if you have the time (talk starts around 1:10:00 in the video)

Gravityloss · on Nov 11, 2019

Thanks, seems my original comment is wrong then!

eanzenberg · on Nov 10, 2019

One thing I didn't quite understand is how training sub-graphs in parallel works. If you are editing a sub-graph of a monolith type model, aren't you affecting other graphs that have dependencies on the one you're editing? If these are independent graphs, then what does a "sub-graph" even mean?

punnerud · on Nov 10, 2019

In PyTorch you have full control over the graph and weights, and everything feels like Python. So feeding some of the learning between “sub-graphs” is easy. Not sure if this is possible in Tensorflow/Keras?

He describes the sub-graph training in the context that they have all the predictors in one big model and, with control of the network, can feed forward and train a sub-graph (read: sub-part) of the model.

eanzenberg · on Nov 10, 2019

This is possible in keras, just derive new models that are functions of a monolith model and train independently. I still don’t understand the point though. If you train a “subgraph”, the other tasks dependent on that part of the graph will have to get retrained anyways, since those edits will affect the other tasks.
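
A toy sketch of the distinction being argued about, in plain Python with hand-written gradients. It is a scalar stand-in for "one shared trunk, several heads" and makes no claim about how Tesla actually schedules its updates; the class and method names are made up.

```python
class TwoHeadModel:
    """Scalar toy: one shared trunk weight feeding two task heads."""

    def __init__(self):
        self.trunk = 1.0   # shared feature extractor
        self.head_a = 1.0  # task A predictor
        self.head_b = 1.0  # task B predictor

    def predict(self, x):
        feature = self.trunk * x
        return self.head_a * feature, self.head_b * feature

    def train_step_a(self, x, target, lr=0.1, update_trunk=False):
        """One gradient step on task A's squared error."""
        feature = self.trunk * x
        error = self.head_a * feature - target
        grad_head = 2 * error * feature            # d(loss)/d(head_a)
        grad_trunk = 2 * error * self.head_a * x   # d(loss)/d(trunk)
        self.head_a -= lr * grad_head
        if update_trunk:
            self.trunk -= lr * grad_trunk


model = TwoHeadModel()
_, b_before = model.predict(2.0)

# Train only task A's head: the trunk is frozen, so task B is untouched.
model.train_step_a(2.0, target=3.0)
_, b_frozen = model.predict(2.0)

# Let task A's loss also update the shared trunk: task B's output moves.
model.train_step_a(2.0, target=3.0, update_trunk=True)
_, b_shared = model.predict(2.0)
print(b_before, b_frozen, b_shared)
```

Updating a head alone leaves the other task's prediction exactly where it was; only updates that reach the shared trunk leak into other tasks, which is the case the question above is worried about.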

antpls · on Nov 10, 2019

First time I read about "sub-network" is in this AI Google blog post : https://ai.googleblog.com/2019/09/recursive-sketches-for-mod...

They talk about the concept of "modular network". The article itself links to the Wikipedia page : https://en.wikipedia.org/wiki/Modular_neural_network

Not sure it's exactly the same idea, but it looks similar.

paraschopra · on Nov 10, 2019

I think their architecture might be their secret sauce. But I'm curious about this too.

fyp · on Nov 10, 2019

For those who want to learn more, I would start with Mask-RCNN where you have a very similar architecture: one shared backbone with multiple heads that can be retrained for various tasks (bounding boxes, masks, keypoints, etc): https://youtu.be/g7z4mkfRjI4?t=628

kegan · on Nov 10, 2019

Does anyone know why Andrej's team chose PyTorch (as opposed to, say, TensorFlow)?

jeffshek · on Nov 10, 2019

Some potential reasons:

- TensorFlow is great at deployment, but not the easiest to code. PyTorch wasn't frequently used in production until recently.

- If you have the resources for great AI engineers and researchers, your team will be good enough to build and deploy with either framework.

- Preference toward the easier framework your tech leads prefer.

- Lots of new academic research is coming in PyTorch

- TensorFlow is undergoing a massive change from 1.1x to 2.0; if you choose TensorFlow, write on 1.1x just to then refactor to TF 2.0? Or write on TF 2.0 now and deal with all new edge cases? Or write in PyTorch (easier) but handle the more difficult deployment process.

- ML code quickly rots. Bad PyTorch code is just bad Python code. Bad TensorFlow code can be a nightmare to debug.

- PyTorch's eager execution makes coding NNs much easier to prototype and build.

ankeshanand · on Nov 10, 2019

https://twitter.com/karpathy/status/868178954032513024

tigershark · on Nov 10, 2019

Not an expert, but as far as I understood PyTorch is much better for building new models, while with tensorflow it’s easier to assemble the predefined blocks. Source: somewhere in the motivations on why Fast.Ai courses switched to PyTorch for the second edition.

m0zg · on Nov 10, 2019

Because PyTorch literally triples researcher productivity. Imagine a deep learning framework which you can actually debug when something goes wrong and which you don't have to fight every step of the way to do even simple things. That's PyTorch.

laichzeit0 · on Nov 10, 2019

The good news for me is that the upper bound for fully autonomous self-driving cars is no more than 50 years away. What a time to be alive. If it happens before then, that will be an absolute bonus.

diveanon · on Nov 11, 2019

Andrej Karpathy is such a treasure.

He is an excellent presenter who really has a passion for teaching.

I'm not really involved with the industry, so I can't really speak to how he holds up against other experts. However, he is by far the most digestible resource I have found for learning about NNs and the science behind them.

If you are just discovering him now, google his name and just start reading. His work is truly binge worthy in the most meaningful way.

SloopJon · on Nov 10, 2019

The description of SmartSummon about halfway through the talk is interesting. One of the views looks like SLAM using a particle filter, but Andrej seems to say that it's done entirely within a neural net.

alexnewman · on Nov 10, 2019

Jeeze and I can't get my pytorch to stop leaking memory. I couldn't imagine trying to drive a car with it

Joky · on Nov 10, 2019

Pytorch is used to train models on servers/cloud, not to drive the car later. The trained model is converted to something native to the embedded environment of the car.

jfoster · on Nov 10, 2019

I wonder if the environment the car discovers includes elevation. Would be necessary for handling many carparks.

ngcc_hk · on Nov 10, 2019

adamnemecek · on Nov 10, 2019

The trick for level 5 is learning the mapping between the lidar point cloud and the video stream. It’s the best of both worlds.

ben_w · on Nov 10, 2019

That might be a trick. It’s not the trick human brains use. It might be equivalent to the way that when we say we want to “look at” a thing, we often also want to touch it.

https://kitsunesoftware.wordpress.com/2017/07/27/why-do-peop...

pgodzin · on Nov 10, 2019

Tesla doesn't have a lidar point cloud at all

adamnemecek · on Nov 10, 2019

That’s the point. You train with lidar, deploy with cameras.

ojn · on Nov 10, 2019

That falls apart as soon as the map and the real world deviate and you need to drive based on what’s in front of you.

Lidar helps you spot obstructions, but won’t tell you what they are and won’t help you figure out what to do to avoid them.

Want an example? Cruise’s first real world demo got stuck behind a simple taco truck in downtown SF.

antpls · on Nov 10, 2019

The parent meant : gather training data with Lidar and Camera, then build a model with that data to learn to reconstruct a 3D space only from Camera data, and then embed that model in the cars.

Tesla is already using a model to rebuild a 3D space from Camera data only; the parent suggests improving the quality of the transformation with high quality 3D representations from Lidar.

It's deep learning all the way down.

Geee · on Nov 10, 2019

2D to 3D transform is simple trigonometry (using stereo / motion) and should be possible to learn without lidar. I think this is already a solved problem. One option though is to add lidars in random Teslas (e.g. 1/1000) to help with the labeling / learning.
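
The trigonometry in question, for the simplest case of two rectified, parallel cameras, is the standard pinhole stereo relation depth = focal length x baseline / disparity. Here is a small Python sketch; the camera numbers are hypothetical and not taken from any real car.

```python
def stereo_depth_m(focal_length_px, baseline_m, disparity_px):
    """Depth of a point seen by two rectified, parallel cameras.

    focal_length_px: focal length expressed in pixels.
    baseline_m: distance between the two camera centres in metres.
    disparity_px: horizontal shift of the point between the two images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 1000 px focal length, cameras 0.5 m apart.
depth_10px = stereo_depth_m(1000, 0.5, 10)
depth_9px = stereo_depth_m(1000, 0.5, 9)
print(depth_10px, round(depth_9px, 2))
```

With these numbers a 10 px disparity means 50 m, while 9 px means roughly 55.6 m, so at long range a one-pixel matching error already moves the estimate by several metres.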

adamnemecek · on Nov 10, 2019

The field is called photogrammetry. It's not...precise. Point clouds have a much richer structure.

sheeshkebab · on Nov 10, 2019

One could also train a car driving model driving in grand theft auto... but are all these tricks really what level 5 is about? I doubt it.

mkagenius · on Nov 10, 2019

Oh he's no longer with OpenAI? Sam Altman must be worried about this..

timdorr · on Nov 10, 2019

He's been at Tesla for almost 2 and a half years now: https://techcrunch.com/2017/06/20/tesla-hires-deep-learning-...

spectramax · on Nov 10, 2019

Without meaning offense to Sam, I thought he was an investor / YC head. What credentials does he have to be at OpenAI?

jayparth · on Nov 10, 2019

Perhaps founding the initiative....

visarga · on Nov 10, 2019

Teaching the CS231n course at Stanford, for one.

_cs2017_ · on Nov 10, 2019

Andrej Karpathy taught cs231n. The question is about Sam Altman, who didn't teach anything related to ML. He's on the board of OpenAI because he was one of its founding investors.

mkagenius · on Nov 10, 2019

He did? I only knew of 183B which is "How to start a startup"

spectramax · on Nov 10, 2019

Thanks, that makes sense. So now the question is - how did Sam become a YC/Angel investor from someone who taught a class at Stanford? I think we need an interview with Sam.

new_realist · on Nov 10, 2019

Elon Musk poached him, and for that was kicked off the OpenAI board.

Pazzaz · on Nov 10, 2019

I've never heard that that was the reason for Elon leaving the OpenAI board. The official announcement said "As Tesla continues to become more focused on AI, this will eliminate a potential future conflict for Elon." Do you have a source?

slim · on Nov 10, 2019

that statement is the politically correct version of "poached an employee"

cyrux004 · on Nov 10, 2019

for over 2 years now

new_realist · on Nov 10, 2019

Meanwhile Waymo is way ahead.

_dp9d · on Nov 10, 2019

Do you belittle everyone that gets second place in the Olympics because the winner is "way ahead"?

Your comment just reeks of anger and hostility.

It seems like you'd rather Tesla didn't try at all, and instead we all just give up and go back to the status quo.

Fricken · on Nov 10, 2019

Meanwhile General Motors is way ahead.

adrr · on Nov 10, 2019

Elon belittles lidar, saying it is doomed and will never work, yet Waymo and Cruise will probably be operating self driving taxi fleets in California next year. Tesla deserves getting dumped on for those comments because they are nowhere near self driving.

xiphias2 · on Nov 10, 2019

Andrej Karpathy only started working on Tesla's software 2 years ago. Before that, what Chris Lattner did was a mess (he wanted to have just 1 task that magically learns everything), so Andrej had to start everything from scratch.

Waymo had a 20 year advantage, but Google lost many key people there in the meantime as Larry Page didn't want to launch partial self driving.

I think both approaches are great and I wouldn't want to choose between the 2, just be a happy user of the end result of the competition.

kayoone · on Nov 10, 2019

> before what Chris Lattner did was a mess (he wanted to just have 1 task that learns magically everything), Andrej had to start everything from scratch

any source for that?

xiphias2 · on Nov 10, 2019

Sorry, I can't find the source. I'm reading Electrek all the time and often there are people in the comments section with knowledge about what happens inside Tesla.

jacquesm · on Nov 10, 2019

> Larry Page didn't want to launch partial self driving

That's the responsible thing to do.

xiphias2 · on Nov 10, 2019

If you look at self driving automation as a black box, sure.

But at the same time people understand that on a highway Tesla Autopilot is safe enough to be used on a long boring road, and drivers generally feel less tired (and can focus more on the harder parts of the road).

wstrange · on Nov 11, 2019

There have been notable Tesla fatalities on "long boring roads". The software has a long way to go.

heavenlyblue · on Nov 10, 2019

It’s just that Google is a much larger target for negative PR response than any of the other companies.

navigatesol · on Nov 10, 2019

Ah yes, the story of Tesla: everything yesterday was crap, but everything tomorrow will be awesome.

xiphias2 · on Nov 10, 2019

Did you really follow how drivers reacted to the changes before/after Karpathy came there? People were extremely dissatisfied with the first versions of Autopilot 2 (as it was worse than Autopilot 1 when it appeared), but it improved a lot, and now it's definitely better.

navigatesol · on Nov 10, 2019

I recall everyone saying Tesla "FSD" was awesome and industry leading, I see them saying that today, and I expect they'll be telling their grandchildren the same while being driven around in a Waymo.

In all seriousness, I like Karpathy and his work. Seems like a good guy. Not sure why you'd want to work to enrich a guy like Elon Musk though.

xiphias2 · on Nov 12, 2019

> Not sure why you'd want to work to enrich a guy like Elon Musk though

If you look at the alternative (Dieselgate), which is killing millions of people every year thanks to air pollution (sadly I'm highly affected by it; trying not to go near any polluted area is what my life is about at the age of 37), I'm happy for Elon Musk and I hope Tesla's mission will succeed as soon as possible.

deboflo · on Nov 10, 2019

I live in San Francisco. Every time I see Cruise vehicles, the driver has his hands on the steering wheel.

adrr · on Nov 11, 2019

Cruise vehicles go thousands of miles without human intervention and they have a license to run their cars without a driver like Waymo.

tim333 · on Nov 10, 2019

Waymo has a significantly different approach, using lidar rather than just vision. The approaches seem to have different strengths and weaknesses. Waymo is actually able to do full autonomy but in very restricted environments, basically semi-deserted suburbs. Tesla's Autopilot works in real city rush hour traffic but not reliably enough to be let loose on its own. It remains to be seen which will win or if it will be some other solution.

dsego · on Nov 10, 2019

But can Waymo drive straight towards a concrete barrier on the highway?

https://youtu.be/fKyUqZDYwrU

m0zg · on Nov 10, 2019

Except, well, Waymo doesn't actually build cars, and has no plans to do so.

pciexpgpu · on Nov 10, 2019

Just listening to this talk scares me. The number of errors, even on a seemingly normal, sunny day, is staggering. It's mind boggling to think people trust this crap.

How can we rely on the output of eight cameras? This is not a kid's science project.

It's all fancy neural networks until someone dies. A pretty callous, Silicon Valley mindset for such an important and critical function of the car.

Will never buy a Tesla after having seen this.

panarky · on Nov 10, 2019

> mind boggling to think people trust this crap

It's also mind boggling to think we currently trust organic tissue to do this crap, some of which is bathed in psychoactive chemicals.

And yet we do, and as a result, horrendous catastrophes occur every minute of every day.

> It's all fancy neural networks until someone dies

No, that can't be the standard, not when people are dying right now in the current regime.

Unless the new regime kills and maims at a higher rate than the current regime, there is no reason for fear.

thebruce87m · on Nov 10, 2019

No, it needs to be a lower rate since it will kill at random. Today’s rate includes drunk drivers, people on their phone and other “unsafe” drivers. If you are an attentive driver your chance of death would actually go up if the overall death rate was the same.
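
To put rough numbers on that argument, here is a minimal sketch. Every figure in it is invented purely for illustration (these are not real crash statistics), and it assumes an automated system whose risk is spread evenly over all users:

```python
# All numbers are hypothetical, chosen only to illustrate the argument.
risky_share = 0.2      # assumed fraction of drivers who are drunk, distracted, etc.
risky_rate = 4.0       # assumed deaths per 100M miles for those drivers
attentive_rate = 0.5   # assumed deaths per 100M miles for attentive drivers


def overall_rate(share_risky, rate_risky, rate_attentive):
    """Population-average death rate as a weighted mean of the two groups."""
    return share_risky * rate_risky + (1.0 - share_risky) * rate_attentive


human_average = overall_rate(risky_share, risky_rate, attentive_rate)

# Assume the automated system exactly matches the human average,
# but applies that same rate to every user regardless of driving habits.
automated_rate = human_average

print(f"human average:       {human_average:.2f}")
print(f"attentive driver:    {attentive_rate:.2f}")
print(f"automated (uniform): {automated_rate:.2f}")
print("attentive driver worse off:", automated_rate > attentive_rate)
```

Under these made-up numbers, a system that merely matches the average leaves the attentive driver worse off. The sketch treats the attentive driver's rate as fixed and ignores the risk that other drivers impose on them.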

slim · on Nov 10, 2019

If you are an attentive driver, you can still get killed at any time by these other drivers.