Okay the bike example is cute and impressive, but the human interaction seems to be obfuscating the potentially bigger application.
With a few tweaks this is a general purpose solver for robotics planning. There are still a few hard problems between this and a working solution, but this is one of the hard problems solved.
Will we be seeing general purpose robots performing simple labor powered by chatgpt within the next half decade?
That bike example seemed a mix of underwhelming (for being the demo video) and confusing:
1. It's not smart enough to recognize from the initial image that this is a bolt-style seat lock (which a human can).
2. The manual is not shown to the viewer, so I can't infer how the model knows this is a 4 mm bolt (or whether it is just guessing because that's the most likely size).
3. I don't understand how it can know the toolbox contains metric Allen wrenches.
Additionally, is this just the same vision model that exists in Bing Chat?
The bike shown in the first image is a Specialized Sirrus X. You can make out from the image of the manual that it says "spacer/axle/bolt specifications". Searching for this yields the following Specialized bike manual, which is similar: https://www.manualslib.com/manual/1974494/Specialized-Epic-E... -- there are some notable differences, but the Specialized Sirrus X manuals that are online aren't in the same style.
The prior page (8) shows "SEAT COLLAR 4mm HEX" and, based on looking up seat collar in an image search, the part in question matches.
In terms of the toolbox, note that it only identified the location of the Allen wrench set. The advice was just "Within that set, find the 4 mm Allen (Hex) key". Had they replied with "I don't see any sizes in mm", the conversation could have continued with "Your Allen keys might be using SAE sizing. A compatible size would be 5/32; do you see that in your set?"
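For anyone wondering whether 5/32" really is the nearest SAE key to a 4 mm hex, here's a quick back-of-the-envelope check in Python (the candidate list is just the common small SAE sizes, not something pulled from the demo):

    # Sanity check of the metric-to-SAE substitution mentioned above.
    MM_PER_INCH = 25.4
    metric_in = 4.0 / MM_PER_INCH  # 4 mm is about 0.1575 in
    sae_candidates = {"1/8": 1 / 8, "9/64": 9 / 64, "5/32": 5 / 32, "3/16": 3 / 16}
    closest = min(sae_candidates, key=lambda k: abs(sae_candidates[k] - metric_in))
    print(f"4 mm = {metric_in:.4f} in, closest SAE key = {closest} "
          f"({sae_candidates[closest]:.4f} in)")
    # -> 5/32 (0.1563 in), about a thousandth of an inch undersized,
    #    so it works in a pinch but sits slightly loose in the bolt head.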
It bugged me that they made no mention of torque. The manual is really clear on that part with a big warning:
> WARNING! Correct tightening force on fasteners (nuts, bolts, screws) on your bicycle is important for your safety. If too little force is applied, the fastener may not hold securely. If too much force is applied, the fastener can strip threads, stretch, deform or break. Either way, incorrect tightening force can result in component failure, which can cause you to lose control and fall. Where indicated, ensure that each bolt is torqued to specification. The following is a summary of torque specifications in this manual...
The seat collar also probably has the max torque printed on it.
When they asked if they had the right tool, I would have preferred to see an answer along the lines of "Ideally you should be using a torque wrench. You can use the wrench you have currently, but be careful not to overtighten."
Ah, good find. Yeah, I tried Bing and it is able to read a photo of that manual page and understand that the seat collar takes a 4 mm hex wrench (though it hallucinated and told me the torque was 5 Nm rather than the correct 6.2 Nm, suggesting its table reading is imperfect).
Toolbox: I just found it too strong to claim you have the right tool, when it really doesn't know that. :)
In the end it does feel like the image reader is just bolted onto an LLM. Basically, just doing object recognition and dumping features into the LLM prompt.
Like a basic CLIP description: Tools, yellow toolbox, DEWALT, Allen wrenches, instruction manual. And then just using those keywords in the prompt.
Yes, you're right, it does feel like that.
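If that speculation is right, the plumbing would look roughly like the sketch below. This is purely illustrative: the CLIP checkpoint is a real Hugging Face model, but the label list, threshold, and prompt template are invented, and nothing here is confirmed about how OpenAI actually wires it up.

    # Hypothetical "object labels dumped into an LLM prompt" pipeline.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    candidate_labels = ["a yellow DEWALT toolbox", "a set of Allen wrenches",
                        "a bicycle", "an instruction manual", "a torque wrench"]
    image = Image.open("user_photo.jpg")

    inputs = processor(text=candidate_labels, images=image,
                       return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

    # Keep whatever CLIP scores highly and hand the LLM only this text;
    # the image itself is never consulted again.
    detected = [l for l, p in zip(candidate_labels, probs.tolist()) if p > 0.15]
    prompt = (f"The user's photo contains: {', '.join(detected)}. "
              "Answer their question using only this description.")

If the pipeline really is one-shot like this, follow-up questions about details that never made it into the extracted description can only be answered by guessing, which would match the hallucination behaviour people describe with Bing.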
Yep. This example basically convinced me that they were unable to figure out anything actually useful to do with the model's new capabilities. Which makes me wonder how capable the new model in fact is.
Yeah, pretty sure it is the same feature that's been in Bing Chat for two months now. It really feels like there's only one pass of feature extraction from the image, preventing any detailed analysis beyond a coarse "what do you see". (Follow-up questions about things it likely didn't parse are heavily hallucinated.)
This is why it can't extract the seatpost information directly from the bike photo when the user asks. There's no "going back and looking at the image".
Edit: nope, it's a better image analyzer than Bing
The implementation that manifests itself as an extremely creepy, downright concerning level of dubious moral transgressions isn't nearly as publicly glamorous as their tech demos.
I feel they could have used a more convincing example, to be honest. Yeah, it's cool that it recognises so much, but how useful is the demo in reality?
You have someone with a toolbox and a manual (seriously, who has a manual for their bike?), asking the most basic question of how to lower a seatpost. My 5-year-old kid knows how to do that.
Surely there's a better way to demonstrate the ground-breaking impact of AI on humanity than this. I dunno, something like "how do I tie my shoelace".
Even on something the size of a car, ChatGPT won't be running locally; the car and the drone are equally capable of hitting OpenAI's API in a well-connected environment.
What needs to happen with the response is a different matter though.
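For a sense of what the "hit the API from the device" path involves, here is a minimal sketch assuming the official openai Python package (>= 1.0) and a hosted multimodal chat model; the model name, image file, and prompt are placeholders, and turning the reply into safe actuator commands is exactly the part that remains unsolved.

    # Edge device (car, drone, robot) offloading perception/planning to a
    # hosted multimodal model instead of running anything locally.
    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    with open("camera_frame.jpg", "rb") as f:
        frame_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder multimodal model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "You are planning for a wheeled robot. List the next "
                         "three actions to reach the doorway, one per line."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
            ],
        }],
    )

    plan_text = response.choices[0].message.content  # free-form text, not commands
    print(plan_text)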
Humans don't spend 18+ years preparing to lower a seatpost or drive a truck or even do pretty much most jobs. No one is solely training for 18 years to do anything.
Most of those 18 years are spent having a fucking great time (being young is freakin awesome), and living a great life is never a waste or a negative ecological footprint.
Society artificially slows education down so it takes 18 years to finish school because parents need to be off at work, so 18 years of babysitting is preferred. By 18, kids are at the age where they will no longer be told what to do, so it's off to the next waste of time, college, then 30 years of staring at a blinking box... or whatever.
When I was 12, I decided I wanted to drive a car. I'd never driven a car in my life, but I took my parents' car and drove it around wherever I liked with absolutely no issue or prior instruction. I did this for years.
The youth are very capable, we just don't want them to be too capable...