AI pioneer Fei-Fei Li has a vision for computer vision (ieee.org)
75 points by samizdis 10 days ago | 55 comments

> Li: My lame example is if I have a flat tire on the highway, what do I do? Right now, I open a “how to change a tire” video. But if I could put on glasses and see what’s going on with my car and then be guided through that process, that would be cool. But that’s a lame example. You can think about cooking, you can think about sculpting—fun things.

Kind of wild that the new Google AI demo app basically already does this. It's hard to even imagine what's possible... we need more future-dreaming sci-fi.


I've been extolling the educational (and entertainment) possibilities of lightweight portable A.R. ever since the first prototypes of Google Glass.

- Imagine being able to look down at a breadboard, and have it overlay explanations, diagrams, realtime current flows, etc.

- Anyone in the UNIVERSE who has ever had to learn card manipulation or juggling from a book or even a video knows how annoying it is to have to mentally reverse the picture in your head while trying to learn it. Imagine putting a person in front of you who is juggling (Mills Mess, box pattern, etc.) and slowing it down to see exactly how it's done.

- Imagine being able to practice drawing by putting a model of any human in the center of your room, and walking around it as you sketch on actual canvas.

- Imagine being able to head to the nearest open soccer field, and throw down a bunch of ninjas throwing ninja stars at you, and you have to physically duck and weave while running full tilt down the field.

These took me a grand total of 10 minutes to think of. Anyone who doesn't understand the potential of AR simply lacks vision. (no pun intended)


Expanding on that, what would happen if we had immediate access to the knowledge required to perform any job?

I like this prediction by Alvin Toffler:

> People of post-industrial society change their profession and their workplace often. People have to change professions because professions quickly become outdated. People of post-industrial society thus have many careers in a lifetime. The knowledge of an engineer becomes outdated in ten years. People look more and more for temporary jobs.

https://en.wikipedia.org/wiki/Future_Shock

Instead of "losing" jobs by AI (ok, some tasks of those will be replaced), there will be re-mixed jobs: dev-designers? medic-programmers? more and more specialized branches of every permutation.


> Anyone who doesn't understand the potential of AR simply lacks vision.

I see your point.


It needs to be much more mainstream (going down in cost and becoming more portable will help), but there's already AR piano teaching and whatnot.

Weird to contextualize this as "put on glasses".

Like, I have a cellphone with a high-resolution camera in my pocket. It would be more than enough to have it image-recognize the parts I should be looking for and highlight which bolt to undo next (and ideally, hint and show an arrow if I'm looking at the wrong thing).
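
A toy sketch of that highlight-the-next-bolt idea, assuming OpenCV is installed and you have a reference crop of the target bolt (bolt_ref.png and engine_bay.jpg are hypothetical filenames); a real app would use a trained detector rather than template matching, but the plumbing looks roughly like this:

    import cv2

    # Hypothetical inputs: a camera frame and a reference crop of the bolt.
    frame = cv2.imread("engine_bay.jpg")
    template = cv2.imread("bolt_ref.png")
    h, w = template.shape[:2]

    # Slide the template across the frame, scoring each position by similarity.
    scores = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)

    # If the match is confident enough, draw a highlight box over the bolt.
    if best_score > 0.8:  # arbitrary confidence threshold
        cv2.rectangle(frame, best_loc,
                      (best_loc[0] + w, best_loc[1] + h), (0, 255, 0), 3)

    cv2.imwrite("highlighted.jpg", frame)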

One of my primary uses of YouTube videos over service manuals is just to confirm that when it says "undo the secondary philangy bolts" that I'm clear on exactly what that is.

That said, I don't even think that's the key use-case: the key use-case would be letting those YouTubers include the image tags for the right part of the video, since half the benefit is seeing someone who's done it show how they do it, and how they move and place the parts (which usually avoids some "buy Manufacturer Service Tools 3 and 19 to undo intermediate part 81" step).


Yeah, it's strange to assume that a new display will suddenly give us a lot of really clever software. If someone can build a thing to contextually guide us through a wide range of everyday tasks, that's a billion-dollar app just for smartphones. Sure, it might be more convenient to have two hands free, but is that really the blocker?

I have a motorcyclist's wrist mount for my phone for exactly this use case (well, so I can use my borescope camera with two hands and see the video stream while I do).

But adding some level of gaze tracking or hands-free voice while I use it would also work for, say, stopping or starting video playback.


While I certainly appreciate the practicality of these technologies, I do feel uneasy when thinking about a future where more or less everyone owns AR devices, as they denormalize interacting with others in these activities. I was first taught how to cook by my mother, and I later learned more with a flatmate. Similarly, when I had a flat tire, my friends who were more into cars helped me out. Both helped me develop and strengthen good personal relationships which I still have to this day.

I know smartphones are extremely valuable, as I have learned a lot from the internet, and they are great tools when you have no other options, but they are already becoming the default way of doing anything, and that way is solitary. AR devices would only exacerbate this trend by being even more convenient to use. I love the technology, but I wouldn’t want to live in a future where asking your AR glasses before your friends is the norm.

I’m not sure what the best way to move forward is without sounding like a Luddite.


Perhaps you would no longer ask your friends about changing tires, but AR will open up new opportunities for interacting with them. Maybe virtually visiting a different country together, or studying a new language together with a (virtual) native speaker.

Also, a flat tire on an unfamiliar road, especially in a remote location, can be very stressful. More likely your friends wouldn’t be there.


Are we limited to waiting for glasses, though? Can't we already do these things via a lens like Google Lens? I get that the ergonomics of the glasses are key, but we don't have the software to do these things elegantly on a lens yet, do we? There's nothing stopping us from working on it even without the hardware.

It really sucks to do this stuff with only one hand. Not to mention, the mental load of focusing your phone.

Bye Bye fun.

They should really make it mandatory to know, and demonstrate you know, how to change a wheel, check the oil dipstick and do all basic car maintenance. That was the best part of owning the car, knowing you were taking care of the car.

A part of cooking is experimenting. If we are now needing VR goggles because you can't follow a recipe, something has gone terribly wrong.

I'd wager that these devices will take away skills rather than add them. I know I'm not the VR generation, and I'm skeptical and annoyed that all this is coming from Google, Microsoft and co. Some things should be real, practical skills everyone knows. The more we rely on such technology, the more failure will come from it.


I find these kinds of responses interesting. It's not like you have to use these glasses, so this isn't really about you. It's about what you think other people should do, but shoulds are kinda funny. Why should someone know how to change a wheel? Why should people experiment with cooking? You're telling people what they should enjoy, but that's a bit weird isn't it?

> Why should someone know how to change a wheel?

Because your car tyre could blow out on the motorway, causing an accident, because you didn't check the pressure. You should of known that by checking your wheel, even just kicking it to gauge the pressure.

> Why should people experiment with cooking?

I'm not saying they should. If people like simple food, great. If people want dead food like McD, fine.

I find the best part of cooking is experimenting. Making the same cheese toastie over and over again gets boring. The same with pizza.

> You're telling people what they should enjoy

No I'm not. I'm not telling you that you shouldn't be on HN right now. I'm saying there are definitely shoulds in life you should know without needing some sort of technological device. Like how to look after your car so you don't cause a catastrophe.

Should an airline not look after their aeroplanes? Boeing didn't and look what happened.


What does checking the tyre pressure have to do with changing a wheel? Most people do know how to check their tyre pressure, but have never had to change a wheel. My car doesn't even have a spare tyre. If my tyre catastrophically fails in a way the emergency puncture repair kit can't fix, I'm just calling the recovery service. It might set me back a few hours, but I haven't even had a puncture in 15 years of driving so I'll take the hit.

Moreover, what does it matter if someone knows how to check their tyre pressure, provided they know that it needs to be checked to keep the car safe, and where to take it to get it checked? Most garages here will do those sort of basic checks for free.


> should of known

You should _have_ looked into grammar and spelling before you started posting on the internet.


I disagree with the parent poster (as I've replied elsewhere) but this is a boring and facile response. Do better.

thank you, your feedback is very important to me! I will now do some soul searching.

These kinds of responses seem interesting but they really aren't.

It's just a modern variant of an old man yelling at progress and being a crotchety contrarian for contrarian's sake.

All too often the basis of these kinds of arguments is a just-so fallacy, where the person making them is oblivious to the fact that generations prior to them would look at their skill-set, and the skills they promote as essential, as worse than the ones they had.

The people making these arguments just want to tell everyone that they're better than the kids today and everything is going to pot.


I'm 35. Not that old, still young-ish. When you reach 35, you run out of hope and shine, still navigating a life that was promised to you but never came. I see where the future is heading, and by god is it going to suck.

The younger generations will always think it's better, but that's okay; they lack life experience, and only time will tell.

Mental illness is on the rise. More people are depressed than ever. People are poor, technology is closed off. And you're telling me it's going to get better because some $corp releases some AI goggles? Can you afford to compete? How am I supposed to reach out to the world of capitalism when corporate runs the show?

I've been working twenty years in IT, all before CloudOps, DevOps and SREs were a thing. I'm a Unix engineer. When I started, virtualization wasn't even a thing.

My apartment only has 2Mb ADSL; I can't even play VRChat if I desired to. That was revolutionary for its time. I have a Valve Index gathering dust.

You have to embrace it; you have no choice, otherwise you're thrown out. I'm not sorry for being cynical when all I've seen the past fifteen years is promises given to me and broken every time. The cycle continues and we all lap up the koolaid with our eyes closed. Even I hope it will get better; we want better, but no, no one wants to turn the page of the plastic age.

It takes a lot to open your eyes and see how reality truly is, but that's okay. We don't need to look at it, because we now have hi-tech computer monitors in front of our eyes feeding us soon-to-be sponsored and controlled information to tell us that everything is okay, whatever we want. Everything will be better with these glasses; I was promised it too, and they screwed up on that. I'm more unhappy with the latest iPhone than I have ever been; I can't even listen to music without needing to own Bluetooth headphones.

Hey, look. I commented exactly like your comment said. Kudos. This is less than what's real, and we are doing nothing about it other than letting the corps have more of their cake so they can put up further control in the walled gardens we all live in. We can't even settle on world peace. The divide is bigger than ever, and we have technology to thank for that.

Sorry for speaking my mind; you're not allowed to do that nowadays, because we should all be smiling and nodding, lapping up whatever is given to us. Some younger generations do, and are they in for a shock with the aftertaste. I feel sorry for them.

I'm sure this will rub some the wrong way. Feel free to change my view, because while technology has gotten better, allowing us to achieve futuristic things (we can launch rockets to Mars; a surgeon can perform microscopic operations from their house; we can fly a helicopter on Mars), I acknowledge it. But what does that really get us? At what cost? It's not all that bad, right? You really think you're going to get it easy in your lifetime because of it? Nah.

Here's a GPT-prompt for you: Restore harmony to the world.


How are VR goggles different from referencing a textbook, Googling a tutorial or watching a YouTube video?

Perhaps it’s the more instantaneous nature of it. Maybe you are making the point that it is more advantageous to memorize common life skills instead of needing a technical crutch. But how often do we need to change a tire? And when we do, wouldn’t it be better to have the fastest reference guide?


The arguments that you’re making can be made for literally any useful piece of technology ever.

Eye glasses? They’ll make you too reliant on sight!

Computer? The best part about writing is maintaining your typewriter!

If you need a car because you can’t ride a horse, something has gone terribly wrong.

“I know I’m not the VR generation”. Sounds good. Then you don’t need to complain into the ether about nothing.


Perhaps you might consider that technology like this could help folks learn how to complete basic car maintenance. After all, the first time you changed a tire you probably needed some kind of assistance. Similarly, knowing how to follow a recipe is one thing, but knowing what a soup needs to adjust its color or taste is a skill it can help with.

No accounting for taste.

My favorite part of owning a car is when the automatic transmission shifts for me, and when the ABS pumps the brakes for me.


AI will help us develop technology to explore the universe and solve climate change, and access virtually unlimited energy. Everyone will have superpowers. Sounds awesome and very fun.

If we are now needing VR goggles because we can't make irl friends, something has gone terribly wrong.

I mean sure, and the fun part of LEGO is making new creations. But before you get to that point, building several sets by instruction helps build associations of how to get certain results.

For someone new to cooking, the recipe says "slice thinly" and they have no idea if they are doing a good job or not. They could simply ask an AI if they are doing it right, and it could provide feedback.
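
As a rough sketch of that feedback loop, assuming an OpenAI-style vision API (the model name, prompt, and filename are illustrative, not anything from this thread):

    import base64
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Encode a photo of the cutting board as a base64 data URL.
    with open("sliced_onions.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    # Ask the model to critique the knife work shown in the photo.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "The recipe says 'slice thinly'. Is this thin enough?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)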


>And the whole part of cooking is experimenting.

Ummm no. Not unless you are already very knowledgeable and experienced in cooking.

I cook nearly every meal I eat BTW.


I cook every meal I eat too. Tonight is an egg, feta and red pepper pie with a rosemary base.

> Not unless you are already very knowledgeable and experienced in cooking.

I disagree. The only knowledge needed is knowing which ingredient to add to affect the flavor: sweetness, sourness, punch, hotness.

Which, yes, AI could provide you with; however, any search engine can too, or even a decent cookbook. It's not new.

Maybe AI is the new cookbook, but you're surely not going to get flavor as good as from a well-written recipe in a book.


It is not fun being broke, exhausted and having little time to prepare a meal before you drop from exhaustion, then doing it again the next day. Hoping you might get something edible is not fun. You just want something that isn't boiled shoe leather and is inexpensive. If you were not born into a life where people taught you to cook, I can assure you learning to cook decent meals on your own is not fun in any way.

This "Fun experimental cooking" is a very,very privileged thing.


yeah most people just want to make food that is healthy and tastes good. Perfectly happy to follow a recipe like a LEGO kit.

Same, and I hate "experimenting" with cooking. I memorize simple recipes and make them a lot.

I don't wanna experiment cooking, I want to have eaten.


> That was the best part of owning the car

> And the whole part of cooking is experimenting.

Both seem out of touch


You took pride in your car. If it was your car, one you saved up for and not one your parents bought you, you had a connection. You knew you were looking after it. I've never had fancy cars; they were almost rust buckets. People used to knock their doors into it because "it's old, I mustn't care".

No, I saved money to buy that car. It's a feeling you just can't experience without looking after the car yourself. The feeling of changing the oil and hearing the difference it made.

The principle of saving up for and owning things is gone too.


This is your experience, but I'm not sure you can generalise it to most people's experience. I like cars - admire them, even - but my car was first and foremost a tool, and my relationship with it was as such. I did the work required to keep it running, but I didn't want to spend more time than that on it. Ditto my 3D printer.

There are a lot of things that I'm willing to give the personal touch to, but there are many things where I just don't care and just want it to work, and you can't know ahead of time what that is for everyone.

Given that, I'm more than comfortable with giving everyone the tools to Not Give A Shit™ for any given subject - if they want to experiment, they will, but that burden shouldn't be pushed onto everyone for every subject. After all, time spent on things you don't care about is time you can't use on the things you do care about.


yeah the best part of cooking is eating

Recent and related:

A stubborn computer scientist accidentally launched the deep learning boom - https://news.ycombinator.com/item?id=42106623 - Nov 2024 (39 comments)

The deep learning boom caught almost everyone by surprise - https://news.ycombinator.com/item?id=42057139 - Nov 2024 (188 comments)


I think the key feature of great AR is that we don't have it yet, so there are no limits on what capabilities you can imagine it could have.

If you instead think of each dreamed-about great AR application as a smartphone app, an uncomfortable realism creeps in about how apps work and the fact that someone has to build and integrate them.

"But if I could put on glasses and see what’s going on with my car and then be guided through that process, that would be cool. "

It would. And if we built that smartphone app, it would be really useful today. But doing that for the general case of "what am I looking at, and how do I fix it?" is AGI-hard.


It seems to me her example is a way of saying: write better documentation and manuals. Better recipes, a better guide to changing a car tire.

I found out about Fei-Fei Li when the Computer History Museum hosted her for an interview on her life's work (1hr 16m): https://youtu.be/JgQ1FJ_wow8

I might be rude for this comment, but can anyone explain what Fei-Fei Li accomplished in AI to be considered a pioneer?

I read her autobiography and I still do not understand. The only thing she did was create the ImageNet dataset, by paying Amazon Mechanical Turk workers. Am I missing something? I don’t understand how she is in the same breath as LeCun, Goodfellow, Hinton, even Karpathy.


She has been exceptionally good at creating a narrative IMO. I agree that her actual accomplishments are less clear. At Stanford, a person in her position gets added as last author to all her students’ work so her citation count is very skewed due to that. I’ve not heard from anyone in the field (past students, CV researchers) about her capabilities, and I’ve heard loads about other PIs like Chris Re, Percy Liang, etc.

Also, I hate to say this but when I was at Stanford there was a distinct sense of promoting women in AI and she was asked to speak in or co teach courses/lectures for what seemed to be that reason alone. For example the course on “AI for human thriving” and such.


I think there is value in including women in this way even if the objective scientific output is not the same as her peers.

Data work is traditionally both incredibly valuable and something no one wants to actually do. Further, ImageNet has probably been cited a _lot_ in other important research.

I agree that it feels a _tad_ underwhelming but that's what these people do - try to strike big with some research and then spend their lives convincing others of the value of that research. If you're lucky, you might even do this a few times as she appears to be trying to do.


Disagree. Stanford researchers have been making datasets for years and years. SQuAD is another Stanford dataset. Everyone knows that publishing datasets gets you citations. But that gig is now sorta over because the word is out.

Sure, maybe that's true now. It was less true when ImageNet was created, though. And in any case, what is this argument that this is a "citation hack"? Like, yeah, if you found the dataset useful during your own research you should cite it... ImageNet did in fact provide value for many years. All the original work on guided diffusion trained on ImageNet, just as a for-instance. Of course now we have superior, larger datasets like LAION and whatever OpenAI uses internally. But for its time, it was valuable.

The ImageNet dataset is the main thing AFAIK. But even so, I find Dr. Li's contribution big enough. For context, datasets for computer vision in her time were mostly small, so neural networks were rarely considered a good method for CV. Not until AlexNet won the challenge, and the world changed after that. I remember many people initially scoffed at ImageNet, arguing that the dataset was flawed and that a "bad" method like a NN (AlexNet) could only win because of those flaws. Simply saying "paying" is an understatement, because we also need to account for the academic politics of her time. A little fun fact: even though most research papers nowadays try to propose new datasets, if we take ImageNet and pretrain the backbone on it, we usually end up with a very strong baseline (see the sketch below).
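
To make that "pretrain the backbone" point concrete, a minimal sketch using torchvision's published ImageNet weights (NUM_CLASSES and the freezing policy are illustrative choices, not anything from this thread):

    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 10  # hypothetical: the label count of your target dataset

    # Load a ResNet-50 whose weights were pretrained on ImageNet.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

    # Freeze the pretrained backbone; only the new head will be trained.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the 1000-way ImageNet head with one sized for the new task.
    # (A freshly created layer has requires_grad=True by default.)
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

Fine-tuning just that last layer on the new dataset is the "very strong baseline" described above.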

Btw, not sure why you think Karpathy has had a bigger impact than Fei-Fei Li. I can't think of anything he is doing that is actually changing the playing field.


While the folks you mentioned advanced deep learning techniques, Fei-Fei Li transformed computer vision by creating ImageNet, which fuelled progress in the computer vision field. Beyond that, she’s been a champion of ethical, human-centered AI and has worked to make AI more accessible.

Hard to believe that ImageNet fueled advancements in that field. Dataset creation is a widely known "citation hack", because it forces others to cite your work even though the dataset is often used out of convenience rather than for any genuine value.

I think it's telling that someone who "only" orchestrated one of the most used pre-training and benchmark datasets in computer vision is seen as less accomplished than those who developed algorithms that require those datasets to work.

Yes, dataset creation and curation is less glamorous, but it's important work. I've worked with a few modelers who could stand to learn that first hand.


I think the AI eye will be the trend of the future

The whole interview feels weird, like it's a giant PR/advertisement justification for her $250 million raise.

This strikes me as peculiar: the 3D world generation demo that was on HN a short while ago was quite lackluster. Many commented on how underwhelming the end product was despite the insane valuation and the big names working on the product.

I'm getting Theranos vibes all over again. I certainly hope this isn't another grift that gets our guard down, as it was with Elizabeth Holmes.


> The whole interview feels weird like its a giant PR/advertisement justification for her $250 million raise

That's entirely what this is.

You shouldn't raise at a $1B valuation without a single product in the market.

Google fired a shot across their bow by coming out with an even better world model the very next day after they announced their raise.

"World models" are going to become commoditized anyway, just like image and video models [1]. The product is what matters, and as it stands, companies like World Labs and Decart are just fancy research labs without a revenue stream.

Without a product, this is probably going to be Character.ai all over again.

[1] Stable Diffusion, Flux, Hunyuan, LTX-1, ...


I think she's obviously right.

The reason that self-driving cars make stupid mistakes a human never would is that they are trained to classify 2D images (and act accordingly). Humans, on the other hand, have a 3D model of the world that understands what is and isn't possible, and are trained to map 2D images into that 3D space.

The world is 3D so obviously the latter approach works way better.
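
For a sense of what "mapping 2D images to that 3D space" means at its simplest, here is a pinhole-camera back-projection sketch (the intrinsics are made-up numbers, and real perception stacks have to estimate the depth rather than being handed it):

    import numpy as np

    # Hypothetical camera intrinsics: focal lengths and principal point, in pixels.
    fx, fy = 800.0, 800.0
    cx, cy = 640.0, 360.0

    def backproject(u: float, v: float, depth: float) -> np.ndarray:
        """Map pixel (u, v) at a known depth (meters) to camera-frame XYZ."""
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        return np.array([x, y, depth])

    # A pixel near the image center, 10 m away, lands near the optical axis.
    print(backproject(650.0, 365.0, 10.0))  # -> [0.125, 0.0625, 10.0]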




