Jim Keller moves to AI chip startup (reuters.com)
281 points by vnorilo on Jan 6, 2021 | 89 comments



"Intel has just published a news release on its website stating that Jim Keller has resigned from the company, effective immediately, due to personal reasons.

"Intel’s press release today states that Jim Keller is leaving the position on June 11th (2020) due to personal reasons. However, he will remain with the company as a consultant for six months in order to assist with the transition." [1]

Exactly six months later he took a new job. Some may want to look back at their comment on the subject. [2] [3]

Still waiting for the story between Jim and Gerard. [4]

[1] https://www.anandtech.com/show/15846/jim-keller-resigns-from...

[2] https://news.ycombinator.com/item?id=23496083

[3] https://news.ycombinator.com/item?id=23493046

[4] https://news.ycombinator.com/item?id=23497336



This requires a post of its own. This is a must-watch for anyone interested in CPU architecture. The clarity with which he talks about some of the complex problems in CPU design is brilliant. The interviewer does a decent job of making it more palatable for a larger audience.


I wish the interviewer were more technical. Keller doesn't do interviews very often.

Lex has a PhD in machine learning but doesn't seem to be familiar with branch prediction, apparently.


If I were a curious noob, how would I tackle this?

My provisional answer is that I'd host a conversation between experts.

For example, I'm now very curious about Apple Silicon M1 with respect to Java's Memory Model and Project Loom (structured concurrency). But I don't know nearly enough to even ask smart questions, much less understand the answers.

So my dream interview would have Ron Pressler, Doug Lea, and one or two people really smart about M1 (the only name I know is Dan Luu) sit around and chat it up.

I'd ask them open ended questions, like "What's new and different?" "What happens next?" "What are you excited about?"

The conversations would likely happen over multiple sessions and different mediums, because the experts would share and ask each other stuff, which would prompt follow-ups.

As podcast host, I'd try to be a catalyst and remove myself from the convo as much as possible. I can't think of any examples or role models. While I'm a huge fan of Ezra Klein and Adam Gordon Bell (CoRecursive), I'm not confident I could lean in like they do.

One tactic both Lex and AGB use really well is prompting their guests to explicitly define jargon. I suspect that some of the perceptions of Lex's ignorance come from him trying to make topics more accessible. E.g. since he works close to the metal with AI, I'm quite confident Lex knows about branch prediction.


I think you are right about Lex and also thanks for the compliment!

Even if you know about branch prediction, asking the guest to explain it, maybe even pretending not to know about it, is a great way to have concepts introduced and make things more approachable.

Lex wouldn't be as popular as he was if he didn't have a good sense for the level of knowledge his ideal listener has about the subject.


I don’t see a problem with that. Lex’s podcast is aimed at a general technical audience of different backgrounds, so he often asks questions on behalf of listeners who are technical but might not be experts in the field. To be fair, machine learning and CPU design are vastly different fields with little overlap.

I mean, I have a PhD and had no idea what branch prediction was until I listened to that podcast.
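Since branch prediction keeps coming up, here is a minimal sketch of the textbook version: a table of 2-bit saturating counters indexed by branch address. The table size, the starting counter value and the address-modulo indexing are arbitrary assumptions for illustration; modern CPUs use far more elaborate history-based predictors.

```python
from __future__ import annotations


class TwoBitPredictor:
    """One 2-bit counter per entry: 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self, table_size: int = 1024) -> None:
        self.table_size = table_size
        self.counters = [1] * table_size  # assumption: start "weakly not-taken"

    def _index(self, pc: int) -> int:
        # Assumed indexing scheme: branch address modulo the table size.
        return pc % self.table_size

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Move the counter one step toward the actual outcome, saturating at 0 and 3.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor: TwoBitPredictor, trace: list[tuple[int, bool]]) -> float:
    """Fraction of (pc, taken) branches predicted correctly, training as we go."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


if __name__ == "__main__":
    # A loop branch at a made-up address: taken 9 times, then falls through; repeated 100 times.
    trace = [(0x400, i % 10 != 9) for i in range(1000)]
    print(f"accuracy: {accuracy(TwoBitPredictor(), trace):.3f}")  # prints "accuracy: 0.899"
```

Once warmed up, the predictor only misses the loop exit, so it gets 9 of every 10 branches right; the 2-bit hysteresis is what keeps a single not-taken outcome from flipping the next prediction.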


The part at 01:24:00 shocked me... A couple of books per week for 50 years... Damn, that is a whole other scale.

I also loved how he explained why books are good: someone takes 20 years of their experience and writes it down in 200 pages...


This was one of the most impressive things I've ever seen. The sheer mind-puzzle exploration of "well... if we had a CPU the size of the sun, here's why it still wouldn't work". Guy seems unbelievably brilliant.


His comment about technology being a long, unbroken chain of abstraction layers changed the way I look at a lot of things in life.

Absolutely fascinating interview.


He also had the best answer to Lex's meaning of life question (last few minutes of the interview). Really made me stop and re-listen. It's very rare for someone on a podcast to think about every word they say.


I'm not really a fan of Lex Fridman's style, but this was a great watch.


Thank you. This is definitely one of the best interviews I have read. The precision of the statements and the usage of analogies to explain the topics is astonishing.


I meant listened to. Sorry.


Is there a transcript of it?


I very much doubt it. You could always download the podcast and listen to it while you commute.


Had no idea this man was so prolific, and Lex had interviewed him. I'm sure it'll be a very interesting interview.

Thanks for the link!


Very very impressive interview - thanks !


and his career a history book


I often find it hard to believe that a single person can make much of a difference in such intricate problem domains as chip design but in his case the evidence is overwhelming. Also goes to show what a shit show Intel has become since even he was not able to right that ship. I think in the CPU space it will be all ARM and RISC ten years from now and since Intel never really managed to become a dominant player in any other (relevant) field, they are pretty much done for.


Part of it may also be the situation the company is in, and its mindset, when Keller is hired.

There's no arguing that Keller is a smart guy, but he doesn't design an entire CPU architecture himself. If you're desperate and say "Okay, we are hiring the smartest guy we can find to build our new CPU, and we'll give him everything he needs to make it happen", then perhaps you get AMD64, Zen, or the A4 and A5. If you just drop a smart guy into a team as another engineer, maybe you get nothing, like Intel.

Perhaps AMD, who already knew him, just gave Keller everything he needed to build a team that can deliver on a new architecture, even when he's no longer there. Same with Apple. Intel, on the other hand, may have been unwilling to grant Keller the same level of autonomy and control. Then it also makes sense that he would leave Intel for personal reasons, those being: "I can't work here, they won't let me do my job".


One of the tendencies of shrinking companies is exacerbated executive infighting.

If the company is growing, there are new X-of-Y positions to move up to.

If the company is stable or shrinking, people start watching out for their own careers with knives out.

AMD possibly avoided this because of size & realization of what needed to be done. Intel's too big & old: I would be very surprised if they weren't much more internally resistant to that sort of change.

And you can only deal with your colleagues throwing up brick wall after brick wall on every bit of minutiae for so long, at least if you're talented enough to have other options.


This is an absolutely on-point observation about company dynamics that a large number of people in the tech industry have never had to experience. It's why growth is so critical.


Past a certain size threshold, the organizational and social dynamics of human relationships seem to be the predominant factor in getting anything done, i.e. there’s a limit to how big a headwind we can cope with.


and describes the difference between AWS and MS circa 2008-2021


It is hard to believe. However, from my own experience: execution, decision making, vision and goals, people, alignment, setbacks. It's a very complex mixture, and the more people there are, the more important a driver is.

PS: Nice nickname. I read the book by Th. Mann, over a period of 7 years I think.


I like those words. Execution, decision making, etc. But can you explain a bit more? What are you saying about those concepts, e.g. setbacks?


If you have someone in charge who knows how to run things and help people do their jobs better while having a vision of the future you have a much better chance of success.


Goethe’s was always my favourite rendition (ignoring part 2). Just enough romance and woe.


I learned an important lesson at my first job out of school: high-quality tech people are more common than the person who can effectively lead them. It was a tough lesson for me because I had spent my entire academic career striving to become a top-notch engineer. Don't confuse a leader with a manager. They might be the same, they might not.


>Also goes to show what a shit show Intel has become since even he was not able to right that ship.

I think we are already starting to see the fruits of his work. Intel doesn't need Jim Keller for CPU uArch design. Intel has had its uArch roadmap ready, and it would have been the best in the industry if it weren't for the 10nm delay. They also have work in the pipeline that is all being held back by their process node.

Jim described in one of his interviews (sorry, I spent 10 minutes but couldn't find the source, so I may be misremembering) how to avoid letting the process node hold back your chip design, something he has experience doing at AMD and Apple: be flexible enough to backport your design as a Plan B should anything happen. Previously, Intel just kept waiting for the process guys to fix it. That in itself is a huge workflow change. It is hard to imagine the amount of work required to push this through, especially with all the internal politics at Intel.

And Intel is at least looking at TSMC and alternative paths for some of its product lineup now (a gaming-focused, large-die GPU). Whether that has been decided is unclear. But at least we have Rocket Lake launching soon, which is sort of a half-baked Willow Cove (used in Tiger Lake) backported to 14nm on desktop. And we have Sapphire Rapids as well as other products on the roadmap hinting at multiple nodes (shown in investor meeting notes). That at least shows Intel has changed its internal design process to be flexible enough in case of another 10nm-like fiasco. And I think Jim Keller deserves some credit for this transition.

Of course, having a flexible design still doesn't fix their problem if TSMC is 2 years ahead of Intel in leading-edge node design, volume and cost. And as I have repeatedly stated, Intel's problem is not design but its business model. It would not surprise me if TSMC shipped more 5nm wafers last year (2020) than Intel has 10nm wafers in its entire production history since 2017.

Just let that sink in for a bit.


Honest question: are we sure he didn't make a difference? The guy usually shows up, does the work with the team, and leaves... and only a year or two later do we see the results.


There's a very big delay between finalizing a design and actually etching said design in silicon, and bringing a design to silicon of course involves a lot of non-trivial work. This is partly why Intel had the whole tick-tock thing: whilst the current design is being put into silicon, the design team can work on the next iteration. It's also why AMD could be very confident about their next Zen iteration being a lot faster when they were releasing Zen 2.


I only know the name of one prominent individual within the CPU space that is not a CEO or major scientist; and that's Jim Keller.

Why? Because I never ever see any articles talking about any other interesting employees. Every single time it's Jim Keller.

I'm sure he's good; his interview with Lex Fridman shows that he's knowledgeable and creative, but there's no way he's as singular a force as the media portrays him.


Just going by the interview posted by another commenter, it seems to me a big reason is that he seems to enjoy the "people challenge" about as much as the technical challenge.


One argument is perhaps that bad management can be stifling and it can be hard to achieve good outcomes under bad management. The semiconductor space is perhaps difficult because you have very long lead times and the cost of each iteration is high: if you have different parts of the organisation pulling in different directions, you're unlikely to have a good outcome, and iterating to unify that direction is very difficult.


Keller was involved with the GPU initiative at Intel, I believe. He certainly wasn't there to fix the process node deadlock.


No, but being at an org that is losing because of the process node deadlock and turning in on itself is not fun.


Novel development in a complex problem space isn't something one can just throw manpower at and expect progress. I'm sure that a certain amount of his fame is due to being a famous architect, as name recognition always compounds. There's no way he's more influential than the rest of the industry combined (shoulders of giants and whatnot), but I would be hard pressed to find press recognition of other hardware engineers. Still, he is undoubtedly exceptional.

For example, one could round up as many scientists as they could find in 1900, but there is no number that would guarantee the progress made in theoretical physics by someone like Einstein alone.


Henri Poincaré? Max Planck? Marie Curie? Pauli and Heisenberg were a bit later, I think.


Worse than that, I wonder if the trouble at Intel (e.g. the inability to develop post-14nm chips, plus one insane instruction set extension after another; I wonder if the point of AMX is to have a big die area that is mostly unused and doesn't need to be cooled) isn't something that people like him are running from, but rather something they are going to bring with them wherever they wind up.


>one insane instruction set extension after another

You're probably going to see a whole lot more of this sort of thing given the limits to process scaling. Keeping things simple and backwardly compatible made sense when you could just throw more transistors at the problem. Now you're seeing more and more specialized circuitry that software people are just going to have to deal with.


I am not against a new instruction. At first blush the new JavaScript instruction in ARM might seem like a boondoggle, but it is a simple arithmetic operation.
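For context, the ARM instruction in question is, as far as I know, FJCVTZS (added in ARMv8.3-A), which converts a double to a 32-bit integer using JavaScript's rounding and wrapping rules. A Python sketch of those semantics (ECMAScript's ToInt32, which is what `x | 0` does in JS) shows why it really is just arithmetic; the function name is made up for illustration.

```python
import math


def js_to_int32(x: float) -> int:
    """ECMAScript ToInt32: how JavaScript turns a double into an int32 (e.g. for `x | 0`)."""
    # NaN and +/-Infinity become 0 rather than trapping or saturating.
    if math.isnan(x) or math.isinf(x):
        return 0
    # Round toward zero; math.trunc returns an exact Python int even for huge doubles.
    n = math.trunc(x)
    # Wrap modulo 2**32, then reinterpret the result as a signed 32-bit value.
    n %= 2**32
    if n >= 2**31:
        n -= 2**32
    return n


if __name__ == "__main__":
    for v in (3.7, -3.7, 2.0**31, 2.0**32 + 5.5, float("nan")):
        print(v, "->", js_to_int32(v))
```

Note the wrap-around: 2**31 maps to -2147483648 and 2**32 + 5.5 maps to 5, which is not what a plain saturating float-to-int conversion gives, hence the appeal of doing it in one instruction.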

Compare that to the non-scalable SIMD instructions, which mean you have to rewrite your code to take advantage of them, and as a result people don't bother to use them at all.
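To make the fixed-width versus scalable distinction concrete, here is a plain-Python simulation (not real SIMD) of a vector-length-agnostic loop in the style of ARM SVE or the RISC-V vector extension: each iteration asks how many lanes to process instead of hard-coding 4 or 8, so the same code runs unchanged on any vector width. `hw_lanes` is a hypothetical stand-in for the hardware's vector length.

```python
def saxpy_vla(a: float, x: list, y: list, hw_lanes: int) -> list:
    """Compute a*x + y in strips of at most `hw_lanes` elements."""
    if hw_lanes < 1:
        raise ValueError("hw_lanes must be positive")
    out = list(y)
    i, n = 0, len(x)
    while i < n:
        # Roughly what RISC-V's vsetvli or an SVE whilelt predicate provides:
        # the number of active lanes for this strip, including a short tail.
        vl = min(hw_lanes, n - i)
        for lane in range(vl):  # stands in for one vector instruction
            out[i + lane] = a * x[i + lane] + out[i + lane]
        i += vl
    return out


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [1.0] * 10
    print(saxpy_vla(2.0, xs, ys, hw_lanes=4))
```

With a fixed-width ISA, moving from 4 to 8 lanes typically means a new code path; in this style only `hw_lanes` changes, and the tail is handled by the same loop.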

AMX allocates a huge die area to GEMM functionality that gets used a lot less in real numerics than you'd gather from reading a linear algebra textbook.
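For readers unfamiliar with the term: GEMM is general matrix multiplication, and AMX's tile registers are built around doing it one rectangular block at a time. A rough pure-Python sketch of that blocked structure follows; the `tile` size is an arbitrary illustrative parameter, not AMX's actual tile geometry, and real kernels also care about data types and memory layout.

```python
def matmul_tiled(A, B, tile=2):
    """C = A @ B computed block by block, like a tile-based GEMM kernel."""
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions must match")
    m = len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # One "tile multiply-accumulate": C block += A block @ B block.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = C[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C


def matmul_naive(A, B):
    """Reference triple loop for checking the tiled version."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]
```

The point of the blocking is data reuse: each tile of A and B is loaded once and used for a whole block of multiply-accumulates, which is what dedicated tile hardware exploits.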

There are other approaches to the problems the industry faces than 'fill up the die with registers that will never be used'; Nvidia and Apple are going that way, and that is why they are succeeding and Intel is failing.


As I understand it, Apple have a direct equivalent to Intel's AMX as an undocumented instruction set on their new Apple Silicon laptop processors, it just took a while for people to figure it out because the whole thing was hidden behind an acceleration library that is implemented very differently on Intel-based Macs.


> find it hard to believe that a single person can make much of a difference ... goes to show what a shit show Intel has become since even he was not able to right that ship.

These statements almost seem contradictory. What if instead of "not being able to right that ship", it is instead an example to the contrary?


> Also goes to show what a shit show Intel has become since even he was not able to right that ship

Was he exclusively working on "that ship" ... wait, I thought you said "chip".


Intel died with Andy Grove. Every CEO after that has been living off the momentum he created.

If his successors applied his management style, mobile phones would have an Intel processor in them.


I think it's more of a good-trainer kind of thing.

The person can be a key motivator to bring the team forward, or reach decisions they wouldn't otherwise have taken.


That's not quite true.

Intel started out with memory. They never left, and they are by far on the cutting edge of memory tech. In particular, Optane NVM DIMMs are so fast that they basically define a new layer in the performance/cache hierarchy. Intel might shift its focus over time away from CPUs toward chalcogenide-based persistent memory, where it seems to have held the lead for some time now.


Isn't 80+% of Intel's revenue from CPUs, with a further few percent from mobile chips? So while they may produce memory, it's almost irrelevant as far as current revenue goes.


You know you've made it as an engineer when your switching jobs gets an article in Reuters.


That's a very academic way of looking at it. Keller doesn't strike me as someone who cares about that.


So is he 10x or how much?


10000x at least. Whole staffed departments at rivals can't bring to market what he did repeatedly.


Nobody does that stuff alone. I think a more accurate description would be that he seems to be a force multiplier for the team he leads, which can have a bigger impact than any singular engineering feat.


That's often what "Nx" means. Leads can depress a team's output, maintain it, or enhance it, and he presumably enhances it hugely.


He played a large role in Apple’s ARM SoCs and in AMD’s Zen, possibly the largest role of any individual.

Keller can move the entire market if he’s given enough space and resources. 10,000x is closer to him than 10x.


If there are only a handful of people capable of doing this, or even just one, then 10x or 10000x doesn't make sense, because that is a productivity metric. 10000 normal engineers wouldn't be able to do it, just like 10 juniors can't do the job of one senior.


IMHO, the correct way to look at it is to compare impact/results.

A 10x engineer might not get 10x done, but their work will be 10x better in a combination of ways: quality, maintainability, speed, portability, extendability, etc. Hopefully the ways in which their work is better fits the priorities of the organization.

That’s the only way you can really call someone an N-xer compared to a productive individual contributor.

Someone like Jim Keller is a big multiplier at a higher level. People today understand the value an executive like Steve Jobs brings, but usually there’s debate on the value before it becomes clear a few years later.


He's a great football coach that's also a great player.


Has anybody on HN worked with him and could say a little bit more about his individual contributions and management style?


You can get a sense of what it would be like to work with him from this interview with Lex Fridman [1]. I'm also curious what it would be like to work with him, since it sounds like he is full of himself [3]. But I believe he has earned the right, given everything he has achieved [3]. The guy has also read a couple of books a week [4] for decades.

[1] https://www.youtube.com/watch?v=Nb2tebYAaOA

[2] https://www.youtube.com/watch?v=Nb2tebYAaOA&t=3973s

[3] https://en.wikipedia.org/wiki/Jim_Keller_(engineer)

[4] https://www.youtube.com/watch?v=Nb2tebYAaOA&t=5040s


I watched that interview half a year ago, and came away with the impression of a curious humble guy who was interested in digging deep below the surface and working on big advances.

I think you meant to refer to [2], and that part of the interview is Lex Fridman interrupting him when he was trying to make a deep point about how to think about things.


It's a bit corny that Reddit and HN go over the top about this guy like he's the centre of the universe, just because he's the only player they know the name of and could point out in a photo.


Folks like to think of themselves as enlightened or even ahead of the curve, but they succumb to personality cult like in Pharaoh's times.


They speak of him the way gamers on forums did back in the day about Carmack and Abrash.


The AnandTech story has some more details about the company's current and past chip designs: https://www.anandtech.com/show/16354/jim-keller-becomes-cto-...


Don't know why Reuters was posted over this.


I think it's a timing thing. There was an earlier post with the Anandtech article late yesterday, but because so much of HN is in NA timezones it was likely smothered by the morning.

Love the TechTechPotato breakdown of this BTW :)



> The chips also operate on the assumption that future software will involve programmers giving high-level directions while artificially intelligent computers write much of the nitty gritty code required to implement those human ideas.

I can't wait to stop writing machine code.


It'll be more like trying to teach someone to do a trick, with all the misunderstandings and frustrations to boot.

Mind you, the same thing is already done to software devs: a product manager tries to explain what they want, and designers and devs interpret it in a certain way.


It's just another layer (or two) of abstraction. Like the jump from machine code to something like Ruby.


Exactly this. Maybe after Neuralink we'll get there. Maybe, because there's still a human on one end who doesn't know what they really want or need.


About 6 months ago my job required that I finally get my hands dirty writing x86 assembly. It's my first real foray into assembly coding.

There are a few aspects of it that I'm really enjoying:

- I can now actually understand the disassembled code that I see during debugging. This includes recognizing some of the assembly patterns that appear because of ABI requirements and/or common programming idioms.

- I'm becoming comfortable with a programming idiom that I've never really used in the past: registers, flags, various kinds of memory addressing.

- It helps my understanding of compilers' lower levels / backends, and the related problems: register allocation, instruction selection, etc.

- It provides a clear path for my first attempt at writing JIT code (using Xbyak[0]).

So as Richard Feynman might have said, it's great fun!

[0] https://github.com/herumi/xbyak
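Xbyak is a C++ library, so the following is not its API; it's a tiny Python illustration of the bottom layer any x86 JIT deals with, turning instructions into bytes. The two encodings used are standard (`mov eax, imm32` is opcode 0xB8 followed by a little-endian 32-bit immediate, and `ret` is 0xC3). Actually running the result would additionally require copying it into executable memory and calling it, which is omitted here; `emit_return_constant` is a made-up name.

```python
import struct


def emit_return_constant(value: int) -> bytes:
    """x86/x86-64 machine code for: mov eax, value ; ret"""
    if not -2**31 <= value < 2**32:
        raise ValueError("immediate must fit in 32 bits")
    code = bytearray()
    code += b"\xb8"                                # opcode: mov eax, imm32
    code += struct.pack("<I", value & 0xFFFFFFFF)  # immediate, little-endian
    code += b"\xc3"                                # opcode: ret
    return bytes(code)


if __name__ == "__main__":
    print(emit_return_constant(42).hex(" "))  # prints "b8 2a 00 00 00 c3"
```

Under the System V x86-64 calling convention an integer return value goes in eax/rax, so these six bytes, once made executable, behave like a function returning 42.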


This page has some details of Tenstorrent's current chips [1]. From a brief look, it seems to be a many-core SIMD design focused on tensor ops. Apparently they also have some sort of compression scheme to boost memory bandwidth.

1: https://www.tenstorrent.com/technology/


They have a Hot Chips presentation on YouTube with a lot more detail: https://www.youtube.com/watch?v=HLjumOyWj0g

I don't notice anything in particular that stands out vs. the many other AI chips people are making, at first glance. But I'm far from an expert. There are several other technical videos on their YouTube channel as well: https://www.youtube.com/channel/UC7041p6DlAh0r4_Fnlk10pQ


Thanks for these! My assumption is that hiring Keller means they plan to gain advantage by world class execution rather than some crazy architectural leap of faith.


Watching the video I think turning each tensor into packets is quite clever as you get some ability to send them around a network and organise layers/data manipulation/transforming the layers/compression all as part of the stack.
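A toy illustration of what packetizing a tensor buys you. This is not Tenstorrent's format (the packet fields and chunk size are invented here); it only shows how splitting a flattened tensor into self-describing chunks lets each piece be routed, reordered, or transformed independently and reassembled at the destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    tensor_id: int   # which tensor this chunk belongs to
    offset: int      # index of the chunk's first element in the flattened tensor
    payload: tuple   # the chunk's elements


def packetize(tensor_id: int, flat: list, chunk: int) -> list:
    """Split a flattened tensor into self-describing, fixed-size chunks."""
    return [
        Packet(tensor_id, off, tuple(flat[off:off + chunk]))
        for off in range(0, len(flat), chunk)
    ]


def reassemble(packets: list, length: int) -> list:
    """Rebuild the flat tensor; packets may arrive in any order."""
    out = [None] * length
    for p in packets:
        out[p.offset:p.offset + len(p.payload)] = p.payload
    return out


if __name__ == "__main__":
    data = list(range(10))
    pkts = packetize(0, data, 4)
    print(reassemble(list(reversed(pkts)), len(data)) == data)
```

Because every packet carries its own offset, arrival order doesn't matter, which is what makes it plausible to push packets over an on-chip network and apply per-packet steps such as compression along the way.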

I’m pretty surprised no-one has actually applied the actor model to parallelising neural networks; it seems it would work quite well and allow you to have a layer per node (or actually many split configurations). Maybe data locality would be an issue with actor-based approaches. They seem to be solving this at a lower level, but with less knowledge of the actual parallelism in software.
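A minimal sketch of the idea in this comment, with each layer as an actor that owns a mailbox and only talks to its neighbor by message. The "layers" are hypothetical stand-in functions rather than a real network, and Python threads only illustrate the message-passing structure; they won't deliver real parallel speedup on compute-bound work because of the GIL.

```python
import queue
import threading
from typing import Callable, List, Optional

_STOP = object()  # sentinel message: finish and tell downstream to finish


class LayerActor:
    """One 'layer' as an actor: it owns a mailbox and only communicates via messages."""

    def __init__(self, fn: Callable, downstream: Optional["LayerActor"],
                 sink: Optional[queue.Queue]) -> None:
        self.fn = fn
        self.downstream = downstream  # next actor, or None for the last layer
        self.sink = sink              # where the last layer delivers its outputs
        self.mailbox: queue.Queue = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def send(self, msg) -> None:
        self.mailbox.put(msg)

    def _run(self) -> None:
        while True:
            msg = self.mailbox.get()
            if msg is _STOP:
                if self.downstream is not None:
                    self.downstream.send(_STOP)
                return
            out = self.fn(msg)
            if self.downstream is not None:
                self.downstream.send(out)
            else:
                self.sink.put(out)


def run_pipeline(layers: List[Callable], inputs: list) -> list:
    """Push inputs through a chain of layer actors and collect outputs in order."""
    if not layers:
        raise ValueError("need at least one layer")
    results: queue.Queue = queue.Queue()
    actors = []
    downstream = None
    for fn in reversed(layers):  # build back to front so each actor knows its successor
        actor = LayerActor(fn, downstream, results if downstream is None else None)
        actors.append(actor)
        downstream = actor
    head = downstream
    for x in inputs:
        head.send(x)
    head.send(_STOP)
    for actor in actors:
        actor.thread.join()
    return [results.get() for _ in inputs]


if __name__ == "__main__":
    print(run_pipeline([lambda v: v * 2, lambda v: v + 1], [1, 2, 3]))
```

Each mailbox is FIFO, so outputs come back in input order, and while one actor works on item n the next can already be working on item n-1: pipeline parallelism across layers, with no shared state between them.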


Also Jim was almost certainly the one doing the interviewing, so there’s likely something interesting in there that he felt was worth his time.


Their marketing material states: "Facilitating machines to go beyond pattern recognition and into cause-and-effect learning".

I wonder what they are referring to. Are they accelerating what SHAP's GradientExplainer [1] does? (namely: crafting inputs at a specific layer, propagating forward to see the influence on class prediction, and sort of backpropagating to pixels) Or is it about something more related to Judea Pearl's work on causality?

[1] https://github.com/slundberg/shap#deep-learning-example-with...


PiedPiper for machine learning?



What was your prediction based on?


Good question. I also predicted in 2015 that Donald Trump could become president, since he was giving public speeches about it before running. So yeah, I was right. Also, the chances were 50-50, since there are only 2 parties. So I predicted it, and he became president.

I think the same goes for this guy: Jim Keller quit Intel and could join another company sooner or later, and it would not be a former employer, likely a startup. We are Genius.


A lot of people in this thread are asking questions about Jim Keller's single-handed contributions.

Could it be possible he's "famous for being famous"?

E.g. some might say Jeff Dean (Google) fits this mold a bit, whereas Sanjay Ghemawat (Google) has contributed arguably just as much, if not more, but is mentioned far less than Jeff.


You know, he does bear more than a passing resemblance to Jim Raynor, of StarCraft...


Silicon Ronin is at it again. This time to revolutionize the startup world and shake up the hold of big players.


Cool. How do I invest in them?



