Gemini "duck" demo was not done in realtime or with voice (twitter.com/parmy)
1039 points by apsec112 on Dec 7, 2023 | 659 comments



I did this at university. It was our first comp sci class ever, and we were given Raspberry Pis. We had no coding experience or guidance, and were asked to create "something". All we had to work with was information on how to communicate with the Pi using PuTTY. Oddly, this assignment didn't require us to submit code, but simply to demonstrate it working.

My group (3 of us) bought a moisture sensor to plug into the pi, and had the idea to make a "flood detection system" that would be housed under a bridge, and would send an email to relevant people when the bridge home from work is about to flood.

So for our demonstration, we had a guy in the back of the class with Gmail open, ready to send an email saying some variation of "flood warning". Our script was literally just printing lines with wait statements in between. Running the script, it printed "awaiting moisture" to the screen, and after 3 seconds it would print "moisture detected". In those 3 seconds I dipped the sensor into the glass of water. Then the script would wait a few more seconds before printing "sending email to xxx@yyy.com". We then opened up our email, our mate at the back of the room hit send, an email appeared saying flood warning, and we got full marks.
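A minimal sketch of what a "demo" script like that might look like (a hypothetical reconstruction from the description above, not the group's actual code):

    # Fake "flood detection" demo: no sensor is ever read, it just prints and sleeps on cue.
    import time

    print("awaiting moisture")
    time.sleep(3)   # the presenter dips the sensor into the glass of water during this pause
    print("moisture detected")
    time.sleep(3)   # wait a few more seconds for effect
    print("sending email to xxx@yyy.com")   # in reality, a classmate hits send on a pre-written email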


Related: I work with industrial control systems. We’d call this “smoke and mirrors”. Sometimes the client would insist on seeing a small portion of a large project working well before it was ready. They’d fail to grasp that 90% of the work is not visible to the user, but they’d still want to see a finished state.

We’d set up a dummy HMI and have someone pressing buttons on it for the demo, and someone in the next room manually driving outputs and inputs to make it seem like it was working. Very common.


I, too, work with industrial control systems. If any of us did that sort of thing, we'd be fired instantly -- and rightfully so.


There would be no problem if you told the client that you were faking the back end behavior and if the client's motivation is that they wanted to see the workflow to make sure that there wasn't a misunderstanding on what you were supposed to be implementing, then a mocked backend would be perfectly fine for their purposes.


Absolutely true. The comment, however, very, very strongly implied that the client wasn't made aware that the demo was faked. If the client was made aware, then why would they have someone hiding in the next room to "make it seem like it was working"?


To me it sounds like a useful thing, for communicating your vision and getting early feedback on whether you are building the right thing. Like uh, a live action blueprint. Let's say the client is like "Email???? That's no good. It needs to be a text message". Now you have saved yourself the trouble of implementing email.


> To me it sounds like a useful thing

To me it sounds like lying...

Context matters a ton though. Are you presenting the demo as if the events are being triggered automatically (as in the OP), or are you explicitly presenting it as your plan? If the plan is only implied, it's deceptive; if you deliberately avoid saying which parts are faked, it's lying. Of course in a magic show this is totally okay, because you're going to the show with the explicit intention of being lied to, but I'm not convinced the same is true for business, though I'm sure someone could make a compelling argument.


There have literally been people sued over this because they used it to get funding; the most extreme example, where they kept doing it well beyond demo purposes, is Theranos.

And yet, you’ll still have people here acting like it’s totally fine.

As you said, it’s one thing to demonstrate a prototype, a “this is how we intend for it to work.” It’s a whole other thing to present it as the real deal.


How about the original iPhone keynote, when Steve says “this cable is just here for it to work on our projector”? If you watch closely, no phone was demoed that day with the screen on and no cable attached. I’m sure the main board was still external.


No, the main board wasn't external at that time. The Wallaby was a different design.


Definitely a useful thing to do to validate UI/UX/fitness.

Relying on blackbox components first mocked then later swapped (or kept for testing) with the real implementation is also a thing, especially when dealing with hardware. It's a bit hard to put moisture sensors on CI!
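A minimal sketch of that mock-then-swap pattern in Python (the names here, like FakeMoistureSensor, are illustrative rather than from any particular project):

    # A blackbox sensor dependency with a fake implementation for CI and demos;
    # a real hardware-backed class gets swapped in later (or the fake is kept for tests).
    class FakeMoistureSensor:
        """Stands in for real hardware by replaying canned readings."""
        def __init__(self, readings):
            self._readings = iter(readings)

        def read(self) -> float:
            return next(self._readings)

    def flood_imminent(sensor, threshold: float = 0.8) -> bool:
        """Works with anything exposing read(); it doesn't care whether it's real or fake."""
        return sensor.read() >= threshold

    # In CI:         flood_imminent(FakeMoistureSensor([0.9]))
    # In production: flood_imminent(RealMoistureSensor(pin=4))   # hypothetical hardware-backed class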

That said...

I would have recommended at least being open about it being a mockup, but even when doing so I've had customers telling me "so why can't you release this tomorrow since it's basically done!? why is there still two months worth of work, you're trying to rip us off!!!"


It definitely is a thing, we literally teach this to CS students as part of the design process in the UX design course.

It’s called prototyping, in this case it would be a hi-fi prototype, and it lets you communicate ideas, test the implementation and try out alternatives before committing to your final product.

Lo-fi prototyping usually precedes it and is done on pen and paper, Figma, or similar very basic approaches.

Google (unsurprisingly) uses it and has instructional videos about it: https://youtu.be/lusOgox4xMI


I've done similar things by manually inserting and updating rows in the database etc to "demo" some process.

Like you say it can be useful as a way to uncover overall process or UX issues before all the internals are coded.

I'm quite open about what I'm doing though, as most of our clients are reasonable.


the phrase I've heard is "dog and pony show" lol


Or “Wizard of Oz” — as a standard HCI practice for understanding human experience


The industry term is “art of the possible”.


I did this as well. I worked on a localized navigation system back when I was in school, and unfortunately we broke every GPS receiver we had along the way---that particular model of RS-232 GPS module was really fragile. As a result we couldn't actually demonstrate live navigation (and it was incomplete anyway). We proceeded to finish the GUI nevertheless, and then pretended that this was what you would see during navigation, but never actually ran the navigation code. It was an extracurricular activity and didn't affect GPA or anything, for the curious, but I remain kinda uneasy about it.


Learning those fraud skills needed later in the so-called "tech" industry.


Is this cheating? It sounds like cheating and reflects quite poorly on you.


> It was our first comp sci class ever, we were given raspberry pi's. We had no coding experience or guidance, and were asked to create "something".

Garbage in, garbage out.


Wow this is such an awful excuse.

Here’s a whole list of projects intended for kids.

https://all3dp.com/2/best-raspberry-pi-projects-for-kids/

It includes building out a whole weather station which includes a humidity sensor as one of the many things it can do.


> Wow this is such an awful excuse.

Yes, for whoever organized such a course and didn't give such guidance.

And besides, the course asked for a project that did something. It did: it printed lines. We can call the email a gimmick, a marketing strategy, making a turd look good.

Don't blame the students for the failure of whoever designed the course.


So did they disclose that all the Pi did was print lines?

The problem with the email isn’t that it’s a gimmick etc.; it’s that the students quite clearly created the impression that it was the Pi doing it.

Your excuse that it’s difficult for first-year college students with no coding experience to do something useful with the Raspberry Pi is disproven by the fact that there exist many extremely useful projects that kids with no coding experience can do, so college students should almost certainly be able to do the same without resorting to gimmicks.

So I don’t understand your complaints about the course. It’s clearly not too hard, which is what you’re implying. And if you’re suggesting that the wording of the project wasn’t clear enough, that’s a huge claim to make considering you don’t know what the wording was.

Also, college (at least in the U.S.) was never about playing funny word games with the professor. There’s a level of maturity, reasonableness, and respect that is expected of the students. None of which is indicated in the response here.


> There’s a level of maturity, reasonableness, and respect that is expected of the students.

Given that the general teaching style of colleges isn't unique to the US, and based on my experience throughout my degree at a similar institution, I somehow doubt that statement.

> It’s clearly not too hard which is what you’re implying.

From the way the course is described, it sounds like the students received literally no guidance. These types of assignments usually result in those with previous programming experience showing off their skills, while the actual rookie students are left in the dust, i.e. an assignment that targets the top 20% of the class.

Regardless, to my knowledge I never cheated during my college degree, but I can't hold it against people that do. Criticism such as yours disregards the reality that students face: pressure to graduate with good marks and whatnot. Not cheating will put you at a disadvantage, because your competition is actively doing so and they are already skewing the marks that way. If the intention of the assignment was to identify honest work, it was certainly structured wrong (as a required submission would have eliminated the cheaters).


That's another issue going on: you're using your cheating to belittle Google's scam, which again plays against any ethical ground you might still have had.


Creativity is a good thing, sad to see trust abused this way.


More like cheaters in, cheaters out.


It's plausible to me that they weren't provided with what they needed precisely because pervasive cheating allowed their predecessor classmates to complete the assignments.


This can depend a lot on the context, which we don't have a lot of.

Looking at this a different way, they gave first-year students, likely with no established prerequisites, an open-ended project with fixed hardware but no expectation to submit the final project for review. If they wanted to verify the students actually developed a working program, they could have easily asked for the Pis to be returned along with the source code.

A project like this was likely intended to get the students to think about the "what" and not worry so much about the "how." Faking it entirely may have gone a bit further than intended, but it would still meet the goal of getting the students to think about what they could do with this computer (if they knew how).

While university instructors can vastly underestimate students' creativity, they are, generally speaking, not stupid. At the very least, they know that if you don't tell students to submit their work, you can often count on them doing as little as possible.


> If they wanted to verify the students actually developed a working program, they could have easily asked for the Pi's to be returned along with the source code.

Wait, is your argument honestly "it's not cheating because they just trusted the students"?

There's a huge difference between demoing something as "this is what we did" vs "we didn't quite get there, but this is what we're envisioning."

Edit: You all are responding very weirdly. The cheating is because you're presenting "something" that is not that thing. Put a dog in a dress and call it a pretty woman and I'll call you a conman.


No, the argument is, "It's not cheating because it wasn't a programming assignment."


> Put a dog in a dress and call it a pretty woman and I'll call you a conman.

Well if you're the TA and you're unwilling/too lazy to call out the conman, I call you an accomplice! Also, since when was the ideal of scientific rigour ever built on interpersonal trust?


No, it’s not cheating because the ask was “something” not “some program”


Which is only not cheating if it was presented as such: not a program, but a fellow project mate sending out an email.

In US colleges at least (only because that’s where I have personal experience…not because I believe standards are any higher or lower here), this is cheating if they led their professor to believe that it was indeed the raspberry pi sending out an email rather than someone at the back of the class.


While it’s minimal (and some might consider it below the bar), they did successfully use the pi to read an external moisture sensor and print the results to the screen.

They did use the hardware provided, and did use software to accomplish a goal. If the teacher just wanted to test what problem-solving skills the students walked in with, I’d say that’s a fair result.


Again, it’s easy. Did they present it as that? Or did they fake stuff to make it appear to the professor that they got the hardware and software to do more than it did?


I'm honestly impressed by how many people aren't getting this. The distinction is so clear too. Maybe it needs to be presented in a different way?

For anyone not getting why this is cheating, try this. Pretend you are the teacher. You saw students pull this stunt and say that it was a live demo. Would you feel like you were deceived if later you found out it was not in fact how they presented it? (I'm honestly unsure how you can answer this in the negative)

Or how about you watch an ad of someone talking to a box and the box responding, then you buy the box and it's a pile of rubber bands and paper clips. Would you want your money back?


You're assuming the objective was to develop a functioning program on the Pi.

But what if the Pi was only ever meant as a story-telling device to get the students thinking about the kinds of things computer programs can do?

Sure, some of the students would be able to tell a story by building a functioning program, but dvsfish simply found another way to tell theirs.


Have you ever been to the Olympics? Because I think you could medal in gymnastics.

>> It was our first comp sci class ever

>> asked to create "something"

It doesn't matter what that "something" was if you are claiming that you made "something" else. The lie is not necessarily the final result, it is the telling of the final result. The context of the whole thread is also about deceit because rtfa.


It certainly reflects poorly on the institution for not requiring anything other than a dog and pony show for grading.


BS. The CEO of one of the largest public companies just did it and he is fine. Board and shareholders all happy.


I gleefully await Matt Levine’s article on AI demos as securities fraud.


Well done, you're halfway to securing a job at Google; no ethics/morals needed.


Of course it's cheating


It's very obviously cheating. They didn't do what the assignment asked.


I'd call it cheating too but yeah. I like the pi and sensors though. Sounds like the start of something cool. Wish I could get a product like this to put in my roof to detect leaks. That would be useful.


If the teacher was competent, they would've asked to see the code.


The view up on that high horse must be interesting! Were you the kid who reminded teachers about homework?

Literally all that matters is that they passed.


> Were you the kid who reminded teachers about homework?

Are you trying to bully me or something? Not going to work with me. You've revealed your poor character with that comment.


Kind of? Yes but they still demonstrated as much as was expected from them, which was very little to begin with.


It depends on what the intention of the assignment was. If it was primarily to help the students understand what these devices _could_ be used for, then it's fine. If it was to have them actually do it, well, then the professor should have at least tried to verify that. Given that it's for first-years who have no experience programming, it really could be either.


Well, you literally had a backend


do things that don't scale


more like backhand lol


There’s a story about sales guys at a company (NewTek?) who faked a demo at CES of an Amiga 500 with two monitors showing the “Boing” ball bouncing from one screen to the next. This was insane because the Amiga didn’t have support for multiple monitors in hardware or software so nobody could figure out how they did it. Turns out they had another Amiga hidden behind running the same animation on the second monitor. When they started them at the right offset it looked believable.


My version of this involved a Wii remote: freshman-level CompSci class, and the group had to build a simple game in Python to be displayed at a showcase among the class. We wrote a Space Invaders clone. I found a Bluetooth driver that allowed your Wiimote to connect to your Mac as a game controller, so I set up a basic left/right tilt control using a Wiimote for our Space Invaders clone.

The Wiimote connection was the star of the show by a long shot :P


Are you looking for a job at Google? Don't be evil. They have enough scammers there already, no help needed, including PR at hacker forums


Sounds very familiar... UoM, UK ? :)


This is so crazy. Google invented transformers, which are the basis for all these models. How do they keep fumbling like this over and over? Google Docs was created in 2006! Microsoft is eating their lunch. Google created the ability to change VMs in place and made a fully automated datacenter. Amazon and Microsoft are killing them in the cloud. Google has been working on self-driving longer than anyone. Tesla is catching up and will most likely beat them.

The amount of fumbles is monumental.


I was at MS in September 2008, and internally they already had a very beautiful and well-functioning web Office (named differently; I forget the name, but it wasn't SharePoint if I recall correctly, I think it had something to do with expense reports?) that would put Google Docs to shame today. They just didn't want to cannibalize their own product.


Microsoft demoed Office Web Apps at the 2008 PDC in L.A., it seems: https://www.wired.com/2008/10/pdc-2008-look-out-google-docs-...


Don't forget they also invented XHR (aka fetch) in 2001. https://en.wikipedia.org/wiki/XMLHttpRequest


Kind of, using it became known as "AJAX" and it took many many years (and the addition of promises to JS) before the more sophisticated "Fetch API" became available.

Even then usage of AJAX declined rather slowly as it was so established, and indeed even now it's still used by many websites!


I assume you mean the decline in use of the term AJAX, since it became just the standard and you no longer needed the word to describe your site or tool as being highly interactive and dynamic rather than just static.

Before the invention of XMLHttpRequest there was so little you could do with JS that most dynamic content was some version of shifty tricks with iframes or img tags, or anything else that could trigger the browser to make a server request to a URL you could generate dynamically.

Fetch was the formalization of XMLHttpRequest (hence the use of XHR as the name of the request type). jQuery wrapped it really nicely and essentially popularized it (they may have invented the pattern of async JS leveraging callbacks and the like); the creation of promises was basically the formalization and standardization of this.

So AJAX itself is in fact used across almost the entire web; the term has become irrelevant given the absolute domination of the technology.


Funny, I asked Google Bard to guess what the actual product name was from the comment.

"It was probably Office Web Apps. It was a web-based office suite that was introduced in 2008. It included Word Web App, Excel Web App, Powerpoint Web App, and OneNote Web App. It was not SharePoint, but it was based on SharePoint technology."


Does bard browse the web yet? Is it possible it read the parent comment?

Wild that we have to ask these questions.


Don’t forget that McAfee was delivering virus scanning in a browser in 1998 with ActiveX support, TinyMCE was full WYSIWYG for content in the browser by 2004, and Google Docs was released in 2006 on top of a huge ecosystem of document solutions and even some real-time co-authoring document writing platforms.

2008 is late to the party for a docs competitor! Microsoft got the runaround from Google: after Google launched Docs they could have clobbered Microsoft, which kind of failed to respond properly in kind, but Google didn’t push the platform hard enough to eat the corporate market share, didn’t follow up with a SharePoint alternative that would appeal to the enterprise, and kind of blew the opportunity imo.

I mean, to this day Google Docs is free but it still hasn’t unseated Word in the marketplace, and the real killer app that keeps Office on top is Excel, which some companies have built their entire tooling around.

It’s crazy interesting to look back and realize how many twists there were leading us to where we are today.

Btw it was Office Server or Sharepoint Portal earlier (this is like Frontpage days so like 2001?) and Microsoft called it Tahoe internally. I don’t think it became Sharepoint until Office 365 launched.

The XMLHTTP object launched in 2001 and was part of the dhtml wave. That gave a LOT of the capabilities to browsers that we currently see as browser-based word processing, but there were efforts with proprietary extensions going back from there they just didn’t get broad support or become standards. I saw some crazy stuff at SGI in the late 90s when I was working on their visual workstation series launch.


Google Apps have several other problems as well.

1. Poor Google Drive interface makes managing documents difficult.

2. You cannot just get a first class Google Doc file which you can then share with others over email, etc. Very often you don’t want to just share a link to a document online.

3. Lack of desktop apps.


NetDocs was an effort in 2000/2001 that is sometimes characterized as a web productivity suite. There was an internal battle between the Netdocs and Office groups, and Office won.

https://www.zdnet.com/article/netdocs-microsofts-net-poster-...

https://www.eweek.com/development/netdocs-succumbs-to-xp/


>I was at MS in 2008 September and internally they had a very beautiful and well functioning Office web already

So why did they never release that and go with Office 365 instead?


They did, it was called Office Online with Word, PowerPoint, Excel and SkyDrive (later OneDrive). Everything got moved under the Office 365 umbrella because selling B2B cloud packages (with Sharepoint, Azure AD, Power BI, Teams, Power Automate) was more lucrative than selling B2C subscriptions.


Classic innovator’s dilemma!


Interesting how it seems like MS may have been right this time? They were able to milk Office for years, and despite seeming like it might, Google didn't eat their lunch.


People still email word docs around. It’s nuts. Maybe Exchange is smart enough to intercept them and say “hey use this online one instead”? At least for intra-org..


I think the ability to actually email the docs around is half the value proposition. Having to always refer back to the cloud versions is annoying as hell when you're not actually collaborating, just showing someone a thing.


I email Word docs around. It’s like low-tech version control - I know exactly what was in the doc, and can recover it easily.


You are right - that's the feature!

Plus it's point-in-time: I'm sending you the document as it is now, and I might start cutting it about or changing it afterwards to send to someone else, but this is the version I want to send you.


More like the Acquirer's dilemma.

Google Analytics - acquired 2005, renamed from Urchin

Google Docs - acquired 2006, renamed from Writely

YouTube - acquired 2006

Android - acquired 2005 (Samsung have done more to advance the OS than Google themselves)


Google doesn't know how to do anything else.

A product requires commitment, it requires grind. That last 10% is the most critical part, and Google persistently refuses to push products across the finish line, just giving up on them and adding to the infamous Google Product Graveyard.

Honestly, what is the point? They could just maintain the core search/ads and not pay billions of dollars for tens of thousands of expensive engineers who have to go through a bullshit interview process and achieve nothing.


If they tried to focus on ads, then they wouldn’t have the talent to support the business. They probably don’t need 17 chat apps - but they can’t start saying no without having other problems.


They only hire some talent to prevent other companies from hiring them.

It's a way to strangle the competition. But it's also not good for the industry in general.


While it is crazy, it's not too surprising. Google has become as notorious for product ineptitude as they have been for technical prowess. Dominating the fundamental research for GenAI but face planting on the resulting consumer products is right in line with the company that built Stadia, GMail/Inbox, and 17 different chat apps.


>Google Docs created in 2006

The tech was based on an acquired company; Google just abused their search monopoly to make it more popular (same thing they did with YT). This has been the strategy for every service they've ever made. Google really hasn't launched a decent in-house product since Gmail, and even that was grown using their search monopoly as free advertising.

>Google Docs originated from Writely, a web-based word processor created by the software company Upstartle and launched in August 2005


> Google really hasn't launched a decent in-house product since Gmail

What about Chrome? And Chromebooks?


Sorry if this was a joke and I didn't spot it. Chrome was based on WebKit which was itself based on KHTML if memory serves. Chromebooks are based on a version of that outside engine running on top of Linux which they also didn't create.


It's not a joke. Just because they didn't write everything from scratch (Chromebooks also are made with hard disks that Google didn't create from directly mining raw materials and performing all intermediate manufacturing stages) doesn't mean they haven't released successful products that they didn't just buy in.


They used the KDE-derived HTML renderer, sure, but they wrote the whole Javascript runtime from scratch, which was what gave it the speed they used as a selling point.


Chrome as a project was still a Google thing even if they used Konqueror's rendering library.

The process model was the novel selling point at the time from my memory [1].

[1] https://www.scottmccloud.com/googlechrome/


The faster javascript runtime was what made it a success IMO.


The leveraging their search monopoly to push it and paying other software to sneak it into installs is what made it a success.


Nah. Safari did plenty of monopoly abuse and sneaking into installs, but still never got big on Windows. At some point the user experience does matter.


Webkit is not a browser.


If you have a Mac you can download the Webkit browser here: https://webkit.org/downloads/

Which uses the WebKit engine and is kind of a showcase for Safari, granted, but it still exists as a distinct browser under that name.


Chromebooks are a worse version of the netbooks from 2008, which ran an actual desktop OS. Chromebooks are OLPCs for the rich world, designed with vendor lock-in built in. They eventually end up at discount wholesale lots, if not landfills, because of how quickly they go obsolete.


mmm, WebKit?



That’s extremely outdated. There’s very little WebKit code remaining in Chromium today.


Ahh yep you’re right. Thanks


It was a fork from the beginning, and it has been the Blink engine for 10 years.


I laughed out loud for this one


You bring up fumbles, but they still have more products with more than a billion users than any other company in the world.

This is what Google has always cared about: bringing applications to billions of users.

People are forgetting Google is the most profitable AI company in the world right now. All of their products use ML and AI.

So who is losing?

The goal of Gemini isn't to build a chatbot like ChatGPT despite Google having Bard.

The goal for Gemini is to integrate it into those 10 products they have with a billion users.


This is like critiquing Disney for putting out garbage and then defending them because dummies keep giving them money regardless of quality. Having standards and expectations of greatness is a good thing and the last thing you want is for mediocrity to become acceptable in society.


> People are forgetting Google is the most profitable AI company in the world right now. All of their products use ML and AI.

> So who is losing?

The people who use their products, which are worse than they’ve been in decades? The people who make the content Google now displays without attribution on search results?


Sure, but I think Google's commanding marketshare is more at risk than it has been in a long time due to their fumbles in the AI space.


> All of their products use ML and AI.

Is that supposed to be a vote of confidence for the current state of Google search?


I demo'd a full browser office suite in 1998 called Office Wherever (o-w.com). It used Java applets to do a lot of the more tricky functions.

Shopped it around VCs. Got laughed out of all the meetings. "Companies storing their documents on the Internet?! You're out of your mind!"


Some things are just too ahead of their times.

Globe dot com was basically Facebook, but the critical mass wasn't there. Nor were the smartphones.


I'm curious if the code is available somewhere? I have to admit I'm curious how it worked!


Netscape had a feature called LiveConnect that allowed interaction between Java and JavaScript. See http://medialab.di.unipi.it/web/doc/JavaScriptGuide/livecon.... for some examples of how it worked. Even though AJAX wasn't available yet in 1998, I think you could have used LiveConnect to achieve the same thing. Java applets had the ability to make HTTP requests to the originating host (the host that served the applet to the browser).


Long lost! Unless Internet Archive has a copy of the site? Never checked! The main domain was officewherever.com.


I agreed until the last bit. Waymo is making continuous progress and is years ahead of everyone else. Tesla is not catching up and won't beat anyone. Tesla plateaued years ago and has no clue how to improve further. Their Partial Self Driving app has never been anywhere near reliable.


I say it again and again: sales, sales. Money is earned in enterprise domains.

And this business is so totally different to Google in every way imaginable.

Senior Managers love customer support, SLAs - Google loves automation. Two worlds collide.


Google customer support says "Won't Fix [Skill Issue]"


Google Workspace works through resellers; they train fewer people, and those people provide the customer support instead. IMO Google's bad reputation comes from their public customer support.


If you want the kind of support that, when there is a fault with the product, can get the fault fixed - then unfortunately Google Workspace's support is also trash.

Good if you want someone else to google the error message for you though.


> IMO Google's bad reputation comes from their public customer support.

Garbage in = garbage out.

If Google cannot deign to assign internal resources and staffing towards providing first-party support for paid products, it's not a good choice over the competition. You're not going to beat the incumbent (Office 365) by skimping on customer service.


They are an ads company. Focus is never on "core" products.


Sundar Pichai should have been out of Google long ago.


> Google Docs created in 2006

Word and Excel have been dominant since the early 1980s. Google has never had a real shot in the space.


You mean 1990s? I don't think Word and Excel even existed until the late 80s, and nobody[0] used them until Windows 3.1.

[0] yes, not literally nobody. I know about the Windows 2.0 Excel or whatever, but the user base compared to WordPerfect or 1-2-3 was tiny up until MS was able to start driving them out by leveraging Windows in the early-mid 90s.


It's reassuring that the biggest tech company doesn't automatically make the best tech. If it were guaranteed that Google's resources would automatically trump any startup in the AI field, then it would likely predict a guaranteed dominance of incumbents and consolidation of power in the AI space.


Isn't it always easier to learn from others' mistakes?

Google has the problem that it's typically the first to encounter a problem, and it has the resources to approach it (from search), but the incentive to monetize it (to get away from depending entirely on search revenue). And, management.


I don't know if that really excuses Google in this case because it's a productization problem. Google never tried to release a ChatGPT competitor until after OpenAI had. OpenAI has been wildly successful as the first mover, despite having to blaze some new product trails. Even after months of watching them and with near-infinite resources, Google is still struggling to catch up.


Outside of outliers like gmail, Google didn’t get their success with product. The organization is set up for engineering to carry the day, funded by search.

An AI product that makes search irrelevant is an existential threat, but I don’t think Google has the product DNA to pull off a replacement product for search themselves. I heard Google has been taken over by more business / management types, but it is still missing product as a core pillar.


Considering the number of messaging apps they tried to launch, if there's at least one thing that can be concluded, it's that it isn't easier to learn from their own mistakes.


It's the curse of the golden goose.

They can't do anything that threatens their main income. They are tied to ads and ads technology, and can't do anything about it.

Microsoft had a crisis and that drives focus. Google... they probably mistreat their good employees if they don't work on ads.


I was with you until the Tesla hot take. I'd bet dollars to donuts that Tesla doesn't get to level 4 by the end of the decade. Waymo is already there.


I agree, but I also bet Waymo doesn't exist by the end of the decade. Not just because it's Google but because it's hard to profit from.


I could see that in the coming years the value of Waymo for Google is not actually in collecting revenue from transportation fees but to collect multi modal data to feed into its models.

The amount of data that is collected by these cars is massive.


None of that matters. They'll still make heaps of profit long into the future unless someone beats them in Search or Ads.

AI is a threat there, but it'd require an AI company to transform the culture of Internet use to stop people 'Googling', and that will require two things: something significantly better than Google Search that's worth switching to, and a company that is willing to reject whatever offer Google makes to buy it. Neither is very likely.


I would love to see internal data on search volume at Google. Depending on how you interpret them, ChatGPT can meet both of your requirements. Personally, I still mostly search instead of using ChatGPT, but I have seen other users turn to ChatGPT more and more.

Also "interesting" to see if results being SEO spam generated using AI will keep SEO search viable.


The difference seems to be the top leadership.

Nadella is an all time great CEO. Pichai is an uninspired MBA-type.


Nadella is as much of an MBA type as Pichai. Their education and career paths are incredibly similar.

The difference is Nadella is a good CEO and Pichai isn’t.

Part of it could also be a result of circumstance. Nadella came at a time when MS was foundering and he had to make what appeared to be fairly obvious decisions (pivot to cloud…he was literally picked because of this, and reducing dependence on Windows…which was an obvious necessary step for the pivot to cloud). Pichai OTOH was selected to run Google when it was already doing pretty well. His biggest mandate was likely to not upset the Apple cart.

If roles were reversed, I suspect Nadella would still have been more successful than Pichai, but you never know. If Nadella's introduction to the CEO job had been to keep things going as they were, and Pichai's to change the entire direction of the company, maybe a decade later Pichai would have been the aggressive decision maker whereas Nadella would have been the overly cautious guy making canned demos.


>>Google Docs created in 2006! Microsoft is eating their lunch.

Of all the things, this.

I use both Google and Microsoft office products. One thing that strikes you is just how feature rich Microsoft products are.

Google doesn't look like it is serious about making money.

I squarely blame rockstar product managers and OKRs for this. Not everything can be a 1000% profitable product built in the next quarter. A lot of things require small continuous improvement and care over years.


Microsoft’s killer product is Excel. I didn’t realize how powerful it was until I saw an expert use it. There are entire billion dollar organisations that would collapse without Excel.


Engineer-driven company. Not enough top-down direction on the products. Too much self-perceived moral high ground. But lately they've been changing this.


Uhh, no, not really; quite the opposite in fact.

Under Eric Schmidt they were engineer-driven, during the golden era of the 2000s. Nowadays they're MBA driven, which is why they had 4 different messaging apps from different product managers.


Lack of top-down direction is what allowed that situation. Microsoft is MBA-driven and usually has a coherent product lineup, including messaging.

Also, "had." Google cleaned things up. They still sometimes do stuff just cause, but it's a lot less now. I still feel like Meet using laggy VP9 (vs H.264 like everyone else) is entirely due to engineer stubbornness.


I would say that Microsoft's craziness around buying Kin and Nokia, and Windows 8, RT edition, etc etc, was far more fundamental product misdirection than anything Google has ever done.


Microsoft failed to enter the mobile space, yeah. Google fumbled with the Nexus stuff, even though they succeeded with the Android software. But bigger picture, Microsoft was still able to diversify their revenue sources a lot while Google failed to do so.


That's true, although Pixel seems good as a successor, but the big thing Microsoft did was use what they had to get into new markets.

Procuring Azure is a good option for lots of companies because most companies' IT staff know AD and Microsoft in general, and Microsoft's cloud offers them a way to use the same (well, not the same, but it's too late by then) tools to manage their company IT.

I'm not disagreeing with its success, but I do think they had a much simpler journey, as to my understanding a lot of it involved cloudifying their locked-in enterprise customers, rather than diversifying into new markets.


What frustrates me about Google is they fumbled in a lot of markets that aren't far from their established ones. Zoom ate their lunch with video chat, and now MS Teams seems to be beating GSuite. Maybe YouTube -> social networking would've been doable, but they botched it with G+. The old Google was only good at facing new technical challenges, not making products. Now that's changing, and I think at least they can make Google Cloud work.

I also don't see anything big Google has leveraged Android for, besides Pixel, which is actually more to cement Android cause they know they don't have enough control with software alone. At least I have decent amount of faith in them pulling that off.


20 versions of .net is wonderful. Changing the names of features over and over again is great too. I am also pleased that windows ten is the last version of windows.


The same Microsoft that squandered MSN messenger and Skype and then brought us the abomination that is MS teams?


The same Microsoft that recently brought us "New Teams" and "New Outlook" and gave us a reskinned version of the same programs but now we have it installed twice?


Those are two messaging apps regular people can actually name, unlike all of Google's messaging apps. MSN Messenger survived 13 years, supposedly. Skype was also a big thing for several years while MS owned it.

And I hate Teams personally, but lots of teams use it.


> but lots of teams use it.

I bet most team members who switched from Slack to Microsoft Teams do not feel like they consented or were asked for their opinions beforehand.


That distinction doesn't matter to Microsoft. Also it's funny how Google chat products once again go unmentioned... Their alternative is Chat (or idk, chat.google.com), and it's possibly even worse than Teams.


I think Google chat(s)'s issue may be a lack of features and marketing, while Microsoft Teams is drowning in bugs, performance issues, poor UI/UX design, etc.


I haven't used Teams enough to say, but Chat suffers from the latter things. UI keeps randomly changing in big ways, threading is confusing, there's serious lag just switching chats, takes forever to load, feature set is only on par with Slack circa 2015.


The golden era of the 2000s produced no revenue stream other than ads on Google Search.


Exactly. I never cared for the "golden age" Google. Maybe the old days were fun, but it wasn't going to be tenable forever.


My engineer friend who works at Google would strongly disagree with this assertion. I keep hearing about all sorts of hijinks initiated by senior PMs and managers trying to build their fiefdoms.


Disagree with which part? The hijinks are there, no denying it. Kind of a thing at any company, but remedied by leaders above those PMs taking charge.


> Google has been working on self driving longer than anyone. Tesla is catching up and will most likely beat them.

I agree with your general post but I disagree with this. Tesla's FSD is so far behind Google it's almost negligent on the part of Tesla despite having so much more data.


I can tell you exactly why. It's because they have a separate VP and org for all these different products like Search, Maps, etc. None of them talk to each other, and they all compete for promotions. There is no single shot-caller; same thing with GCP. Google does not know products.


A lot of companies have this structure. You have the Doritos line, the Pepsi line, for example, etc… maybe you find some common synergies, but it's not unusual.

What would the ideal setup be, in your opinion?


Big companies are where innovation goes to die.


There is little to no inertia inside Google when it comes to building and inventing stuff, but there is massive bloat when it comes to shipping stuff.


Tesla will not beat them at self driving simply due to hardware at the very least


Errr, sorry, what's the innovation of Google Docs exactly? Being able to write simultaneously with somebody else? OK, so this is what it takes for a top-notch docs app to exist? Microsoft has been developing this product for ages; Google tried to steal the show, although it had little to no experience in producing and marketing office apps…

Besides, collaborative editing is a non-feature, and there is much more important stuff than this for a word processor to be useful.


Microsoft eating Google's lunch on documents is laughable at best. Not to mention it confuses the entire timeline of office productivity software??


Is paid MS Teams more or less common than paid GSuite? It's hard to find stats on this. GSuite is the better product IMO, but MS has a stronger B2B reputation, and anecdotally I hear more about people using Teams.


Nobody pays for Teams, but everyone pays for Office, and if you get Teams for free with it ...


This is how it became so popular so fast. If they had charged for it, all those Teams users would still be using Zoom.


Not to mention it integrates with Azure365, which damn near certainly the IT department has already standardized on, feels comfortable with, and has been flooded with enough propaganda to believe anything else is massively less secure. Plus Teams has tons of knobs and buttons for managing what your users do with it... and companies love managing their employees lol.

Sure, Teams is a steaming pile of crap to use day-to-day as a chat app, the search is slow and vague - and depending on policy, probably links you to messages that no longer exist in the archive lol. Oh you want to download message history? Nah gotta get an admin to do that bruh.


I'm in one nonprofit org using MS 365 and Teams, and listening to the guy behind the original decision talk about that ecosystem, I think its popularity really does come from propaganda. I was almost convinced until I actually used it... what a piece of junk. It's ugly for me and borderline unusable for our nontechnical users. I'm in charge now and considering ditching it.

The only saving grace is that members who can't deal with it are using local MS Office, which has some integrations with 365, thus making it kinda viable. But I feel like it's still a net negative.


Does anyone use paid GSuite for anything other than docs/drive/Gmail ? In all companies I've worked at, we've used GSuite exclusively for those, and used slack/discord for chat, and zoom/discord for video/meetings.

I know that MS Teams is a more full-featured product suite, but even at companies that used it, we still used Zoom for meetings.


My company uses Meet. It works great! I like it more than Zoom.


Counterpoint: I take probably 3/4 of my meetings on Zoom and 1/4 on Meet, so on any given day I'm probably doing at least one on Meet. If I look back on any day at all the meetings with unacceptable audio lag or very degraded video quality, they are always all Meet. It is just hands-down worse when networks are unreliable.

In addition meet insists I click on the same about 4 or 5 different "Got it" feature popups every single call, and every call also insists on asking me if I want to use Duet AI to make my background look shit which just adds to annoyance.


It's a lot better than it used to be. In 2020, universities that already had GSuite (which includes Meet) still paid to put their classes on Zoom. Personally I like Zoom more today, mostly because even my high-end laptop can struggle with Meet.


I like meet too, but the inability to send messages to breakout rooms is quite annoying.


zoom is horrible. Meet works for me.


GSuite for calendar makes sense too. Chat sucks, and Meet would be decent if it weren't so laggy, but those are two things you can easily not use.


I've worked at many companies in my time, and all of them used Teams except one that used Slack; but all used MS products, none used Google's.


Gsuite is clearly a lot better product than Office365. I feel like I'm taking crazy pills when I see many institutions make the wrong choice here.

I base about 50% of my choice of employer on what they choose in that area.


GSuite is an awful product for an employer.

If you have a problem there’s no one available to help you.

On the MS side they will literally pull an engineer who is writing the code for the product you have a problem with to help resolve the issue, if you're large enough.

The part you see in your browser isn't the only part of the product a company has to buy. In fact, it's not even the most expensive bit. If you look at the most expensive plans for most SaaS products (i.e. the enterprise plans), almost the entire difference in cost is driven by support, illustrating the importance and value of support.

Google unfortunately is awful at this.


Teams will likely still be around in 20 years. I doubt gsuite will exist in 5... or even 1.


GSuite has existed since 2006, so it's not like Google lacks focus on it.


Kinda. In 2006 they launched "Google Apps for your domain." The name quickly changed to "Google Apps" and then in 2016 it became "GSuite." In 2020 they changed the name to Google Workspace. And of course, in 2022 they tried to kick all of the free "Gsuite Legacy" users off the platform and make them pay for Google Workspace lol.


That's ancient by google metrics!!!


> Microsoft is eating their lunch.

Well, that is truly shocking.


Also, it's hard to say it's really true. OpenAI certainly is, but is Microsoft without OpenAI's tech eating Google's lunch?


Given MSFT's level of investment in OpenAI, and all the benefits that accrue from it, they're one and the same.


It is yet to be seen whether MSFT has actually gained a benefit. Maybe from a marketing perspective it has insane potential to print big bucks, but it is a bit too soon to announce that the effort to deliver Copilot (all tools + agents) far and wide was/is successful.

We'll get a definitive answer in a few years. Till then, OpenAI benefits from the $ value of their end of the products, while MSFT eats the compute costs but also gets a stock bump.


The whole Gemini webpage and contents felt weird to me, it's in the uncanny valley of trying to look and feel like an Apple marketing piece. The hyperbolic language, surgically precise ethnic/gender diversity, unnecessary animations and the sales pitch from the CEO felt like a small player in the field trying to pass as a big one.


It's funny because now the OpenAI keynote feels like it's emulating the Google keynotes from 5 years ago.

Google Keynote feels like it's emulating the Apple keynote from 5 years ago.

And the Apple keynote looks like robots just out of an uncanny valley pretending to be humans - just like keynotes might look in 5 years, but actually made by AI. Apple is always ahead of the curve in keynote trends.


You know those memes where AI keeps escalating a theme to more extreme levels with each request?

That's what Apple keynotes feel like now. It seems like each year, they're trying to make their presentations even more essentially 'Apple.' They crossed the uncanny valley a long time ago.


"make it feel more like a hospital"


To me it feels more like a cult. Wear these kinds of shoes and clothing. Make these hand gestures, talk that way. They look and sound fake, over-processed, and living in their own bubble, detached from the rest of the world.


I’m what many would describe as a bit of an Apple fanboy but for the last few years I’ve been skipping most keynotes. They are becoming pretty unbearable.

I used to look forward to watching them live but now I just go back after the event and skip through the videos to see the most relevant bits (or just read the various reports instead).


I hadn’t thought about it until just now, but the most recent Apple events really are the closest real-person thing I’ve ever seen to some of the “good” computer generated photorealistic (kinda…) humans “reading” with text-to-speech that I’ve seen.

It’s the stillness between “beats” that does it, I think, and the very-constrained and repetitive motion.


Is there such a concept as a “reverse uncanny valley”??

Where humans behave so awkwardly that they seem artificial but are just not quite close enough…

If so, Apple have totally nailed the reverse uncanny valley!


Hmm. Like the "NPC fetish" stuff that was going around for a brief minute?


The more I think about this the more it rings true...


I got the same vibes. Ultra and Pro. It feels tacky that it declares the "Gemini era" before it's even available. Google really wants to be seen as being on a level playing field.


I’m imagining the project managers are patting themselves on the back for checking all the performative boxes, blind to the absolute satire of it all.


> surgically precise ethnic/gender diversity

What does that mean and why is it bad?

Diversity in marketing is used because, well, your desired market is diverse.

I don't know what it means for it to be surgically precise, though.


I imagine the commenter was calling out what they perceived to be an inauthentic yet carefully planned facade of diversity. This marketing trend rubs me the wrong way as well, because it reminds me of how I was raised and educated as a 90s kid to believe that racism was a thing of the past. That turned out to be a damaging lie.

I don't mean to imply that companies should avoid displays of diversity, I just mean that it's obvious when it's inauthentic. Virtue signaling in exchange for business is not progress.


I think it could be seen as a good thing; it's a little chicken-and-egg. If you want to increase diversity at a company, one good way would be to represent diversity in your keynotes in order to make it look to a diverse base that they would be happy working there, thus increasing the diversity at the company.


You'd prefer the alternative with just a few white guys in the picture and no consideration given at all to appearing diverse?


The alternative is to just be authentic and not put up a fake show.


(and for the result to be authentically diverse)


Just take a group of people that actually know and work together and you're authentic. Forced diversity is idiotic: either you do it or you don't, but you should show what you're actually doing to be authentic.

Imagine how cringe it would be if only white guys were allowed to work at Google and they displayed in all their marketing a fully diverse group of non-white girls. That would be... inauthentic.

Just the fact that there are fewer women than men in IT is something we should demonstrate, understand, and change if needed. Not hide behind a facade of 50/50 representation everywhere, as if the problem were already solved, or as if it were even a problem in the first place.



It's bad if the makeup of the company doesn't reflect the diversity seen in the marketing, because it doesn't reflect any genuine value and is just for show.

Now, I don't know how diverse the AI workforce is at Google, but the YT thumbnails show precisely 50% of white men. Maybe that's what the parent meant by "surgically precise".


It's the new token black guy. It's not completely bad, it just feels inauthentic.


Agreed with your comment. This is every marketing department on the planet right now, and it's not a bad thing IMO. Can feel a bit forced at times, but it's better than the alternative.


The alternative being showing actual level of diversity in the company?


Of course to normal people, this just seems like another Google keynote. If OP is counting the number of white people, maybe they're the weird one here.


A big red flag for me was that Sundar was prompting the model to report lots of facts that can be either true or false. We all saw the benchmark figures that they published and the results mostly showed marginal improvements. In other words, the issue of hallucination has not been solved. But the demo seemed to imply that it had. My conclusion was that they had mostly cherry picked instances in which the model happened to report correct or consistent information.

They oversold its capabilities, but it does still seem that multi-modal models are going to be a requirement for AI to converge on a consistent idea of what kinds of phenomena are truly likely to be observed across modalities. So it's a good step forward. Now if they can just show us convincingly that a given architecture is actually modeling causality.


I think this was demonstrated in that Mark Rober promo video[1] where he asked why the paper airplane stalled by blatantly leading the witness.

"do you believe that a pocket of hot air would lead to lower air pressure causing my plane to stall?"

He could barely even phrase the question correctly because it was so awkward. Just embarrassing.

[1] https://www.youtube.com/watch?v=mHZSrtl4zX0&t=277s


Yeah, this was so obvious too. Clearly Mark Rober tried to ask it what to try and got stupid answers, then tried to give it clues and had to get really specific before he got a usable answer.


This has got to be satire! That is too funny.


The issue of hallucinations won't be solved with the RAG approach. It requires a fundamentally different architecture. These aren't my words but Yann LeCun's. You can easily understand this if you spend some time playing around. The autoregressive nature won't allow LLMs to create an internally consistent model before answering the question. We have approaches like Chain of Thought and others, but they are merely band-aids and only superficially address the issue.


If you build a complex Chain of Thought style agent and then train/finetune further with reinforcement learning on this architecture, then it is not a band-aid anymore; it is an integral part of the model, and the weights will optimize to make use of this CoT ability.


It's been 3.5 years since GPT-3 was released, and just over a year since ChatGPT was released to the public.

If it was possible to solve LLM hallucinations with simple Chain-of-Thought style agents, someone would have done that and released a product by now.

The fact that nobody has released such a product, is pretty strong evidence that you can't fix hallucinations via Chain-of-Thought or Retrieval-Augmented Generation, or any other band-aid approaches.


I agree: but I just wanted to say that there are specific subdomains where you can mitigate some of these issues.

For example, generating json.

You can explicitly follow a defined grammar to get what will always be a valid json output.

Similarly, structured output such as code can be passed to other tools such as compilers, type checkers and test suites to ensure that at a minimum the output you selected passes some minimum threshold of “isn’t total rubbish”.

For unstructured output this is a much harder problem, and bluntly, it doesn't seem like there's any kind of meaningful solution to it.

…but the current generation of LLMs are driven by probabilistic sampling functions.

Over the probability curve you’ll always get some rubbish, but if you sample many times for structure and verifiable output you can, to a reasonable degree, mitigate the impact that hallucinations have.

Currently that’s computationally expensive, to drive the chance of error down to a useful level, but compute scales.

We may see some quite reasonable outputs from similar architectures wrapped in validation frameworks in the future, I guess.

…for, a very specific subset of types of output.
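
To make that concrete, here's a minimal sketch of the sample-and-validate side of this (call_llm is a hypothetical stand-in for whatever model API you use, and the required keys are just an assumed schema for illustration):

    import json

    REQUIRED_KEYS = {"name", "age"}  # assumed schema, purely for illustration

    def call_llm(prompt):
        # Hypothetical placeholder for your actual model call.
        raise NotImplementedError

    def parse_if_valid(text):
        # Structural check: must be JSON, must be an object, must contain the keys we expect.
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            return None
        if not isinstance(obj, dict) or not REQUIRED_KEYS.issubset(obj):
            return None
        return obj

    def sample_until_valid(prompt, attempts=5):
        # Sample repeatedly and keep the first output that passes the structural check.
        for _ in range(attempts):
            obj = parse_if_valid(call_llm(prompt))
            if obj is not None:
                return obj
        return None  # every sample failed; the caller decides what to do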


I agree that the "forcing valid json output" is super cool.

But it's unrelated to the problem of LLM hallucinations. A hallucination that's been validated as correct json is still a hallucination.

And if your problem space is simple enough that you can validate the output of an LLM well enough to prove it's free of hallucinations, then your problem space doesn't need an LLM to solve it.


> your problem space doesn’t need an LLM to solve it

Hmmm… kinda opinion right?

I’m saying; in specific situations, you can validate the output and aggregate solutions based on deterministic criteria to mitigate hallucinations.

You can use statistical methods (eg. there's a project out there that generates tests and uses "on average tests pass" as a validation criterion) to reduce the chance of an output hallucination to a probability threshold that you're prepared to accept… for certain types of problems.

That the problem space is trivial or not … that’s your opinion, right?

It has no bearing on the correctness of what I said.

There's no specific reason to expect that, just as you can validate output against a grammar to require output that is structurally correct, you can't also validate output against some logical criteria (eg. unit tests) to require output that is logically correct against the specified criteria.

It’s not particularly controversial.

Maybe the output isn’t perfectly correct if you don’t have good verification steps for your task, maybe the effort required to build those validators is high, I’m just saying: it is possible.

I expect we’ll see more of this; for example, this article about decision trees —> https://www.understandingai.org/p/how-to-think-about-the-ope..., requires no specific change in the architecture.

It's just using validators or searching the solution space.
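
As a rough sketch of the unit-test flavour of this (generate_candidate is a hypothetical model call that returns the source of a function named add; the small test table is the deterministic validator):

    def generate_candidate(prompt):
        # Hypothetical placeholder: returns candidate source code defining add(a, b).
        raise NotImplementedError

    TESTS = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]  # (args, expected) pairs

    def passes_tests(source):
        namespace = {}
        try:
            exec(source, namespace)  # run the candidate code in an isolated namespace
            fn = namespace["add"]
            return all(fn(*args) == expected for args, expected in TESTS)
        except Exception:
            return False

    def first_passing_candidate(prompt, n=10):
        # Sample several candidates and keep only one that satisfies the tests.
        for _ in range(n):
            candidate = generate_candidate(prompt)
            if passes_tests(candidate):
                return candidate
        return None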


Ever since the "stochastic parrots" and "super-autocomplete" criticisms of LLMs, the question is whether hallucinations are solvable in principle at all. And if hallucinations are solvable, it would be of such basic and fundamental scientific importance that I think it would be another mini-breakthrough in AI.


An interesting perspective on this I’ve heard discussed is whether hallucinations ought to be solved at all, or whether they are core to the way human intelligence works as well, in the sense that that is what is needed to produce narratives.

I believe it is Hinton that prefers “confabulation” to “hallucination” because it’s more accurate. The example in the discussion about hallucination/confabulation was that of someone who had been present in the room during Nixon’s Watergate conversations. Interviewed about what he heard, he provided a narrative that got many facts wrong (who said what, and what exactly was said). Later, when audio tapes surfaced, the inaccuracies in his testimony became known. However, he had “confabulated truthfully”. That is, he had made up a narrative that fit his recall as best as he was able, and the gist of it was true.

Without the ability to confabulate, he would have been unable to tell his story.

(Incidentally, because I did not check the facts of what I just recounted, I just did the same thing…)


> Without the ability to confabulate, he would have been unable to tell his story.

You can tell a story without making up fiction. Just say you don’t know when you don’t know.

Inaccurate information is worse than no information.


> You can tell a story without making up fiction. Just say you don’t know when you don’t know.

The point is that humans can't in general, because we don't actually know which parts of what we "remember" are real and which parts are our brain filling in the blanks. And maybe it's the same for nonhuman intelligences too.


It’s hard even when you don’t take into consideration that you don’t know what you’ve misremembered. Try writing a memoir. You’ll realize you never actually remember what anyone says, but caveating your dialog with “and then I think they said something like” would make horrible reading.


Don’t people just bulk qualify the dialogue, e.g. “I don’t remember the exact words. Tom said … then Dick replied … something like that”.

Often we don’t quote people and instead provide a high level description of what they said, e.g. “Harry described the problems with his car.”, where detail is omitted.


Sure. Some memoirs state this in the foreword. There are also memoirs that are "fictionalized", like James Frey's book "A Million Little Pieces." Originally published as a memoir in 2003, it was later revealed that many of the events Frey described were exaggerated or fabricated. This caused a lot of controversy at the time, but many subsequent memoirs followed this pattern and I think it's become quite accepted in the genre.


If “confabulation” is necessary, you can use confabulation for the use cases where it’s needed and turn it off for the use cases where you need to return actual “correct” information.


I've read similar thoughts before about AI art. When the process was still developing, you would see AI "artwork" consisting of the most inhumanly uncanny pictures. Things that twisted the physical forms that human artists perceive with the fundamental pixel format/denoising algorithms that the AI works with. It was just uniquely AI and not something a human being would be able to replicate. "There are no errors, just happy accidents." In there, you could say, there was a real art medium/genre with its own intrinsic worth.

After a few months AI developers refined the process to just replicate images so they looked like a human being made them, in effect killing what was the real AI art.


The best one I've run across so far is, "spicy autocomplete".


These LLMs do not have a concept of factual correctness and are not trained/optimized as such. I find it laughable that people expect these things to act like quiz bots - this misunderstands the nature of a generative LLM entirely.

It simply spits out whatever output sequence it feels is most likely to occur after your input sequence. How it defines “most likely” is the subject of much research, but to optimize for factual correctness is a completely different endeavor. In certain cases (like coding problems) it can sound smart enough because for certain prompts, the approximate consensus of all available text on the internet is pretty much true and is unpolluted by garbage content from laypeople. It is also good at generating generic fluffy “content” although the value of this feature escapes me.

In the end the quality of the information it will get back to you is no better than the quality of a thorough google search.. it will just get you a more concise and well-formatted answer faster.


> because for certain prompts, the approximate consensus of all available text on the internet is pretty much true

I think you're slightly mischaracterising things here. It has the potential to be at least slightly and possibly much better than that. This is evidenced by the fact that it is much better than chance at answering "novel" questions that don't have a direct source in the training data. The reason it can do this is that at a certain point, to solve the optimisation problem of "what word comes next", the least complex strategy actually becomes to start modeling principles of logic and the facts connecting them. It does not do so in any systematic or reliable way, so you can't ever guarantee when or how well it is going to apply these, but it is absolutely learning higher-order patterns than simple text/pattern matching, and it is absolutely able to generalise these across topics.


You’re absolutely right and I’m sure that something resembling higher-level pattern matching is present in the architecture and weights of the model, I’m just saying that I’m not aware of “logical thought” being explicitly optimized or designed for - it’s more of a sometimes-emergent feature of a machine that tries to approximate the content of the internet, which for some topics is dominated by mostly logical thought. I’m also unaware of a ground truth against which “correct facts” could even be trained for..


> I’m also unaware of a ground truth against which “correct facts” could even be trained for..

Seems like there are quite a few obvious possibilities here off the top of my head. Ground truth for correct facts could be:

1) Wikidata

2) Mathematical ground truth (can be both generated and results validated automatically) including physics

3) Programming ground truth (can be validated by running the code and defining inputs/outputs)

4) Chess

5) Human labelled images and video

6) Map data

7) Dependent on your viewpoint, peer reviewed journals, as long as cited with sources.


The first question I always ask myself in such cases: how much of the input data has simple "I don't know" lines? This is clearly a concept (not knowing something) that has to be learned in order to be expressed in the output.


What stops you from asking the same question multiple times and seeing if the answers are consistent? I am sure the capital of France is always going to come out as Paris, but the name of a river passing through a small village might be hallucinated differently. Even better: use two different models; if they agree, it's probably true. And probably the best: provide the data to the model in context, if you have a good source. Don't use the model as a fact knowledge base; use RAG.
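
A rough sketch of that consistency check, where ask is a hypothetical wrapper around whichever model(s) you query; anything that doesn't reach a clear majority is treated as "don't know" rather than as an answer:

    from collections import Counter

    def ask(model, question):
        # Hypothetical placeholder: send the question to the model, return its answer as text.
        raise NotImplementedError

    def consistent_answer(question, models=("model-a", "model-b"), samples=3, threshold=0.7):
        # Ask each model several times and only trust a clear majority answer.
        answers = [ask(m, question).strip().lower()
                   for m in models for _ in range(samples)]
        answer, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= threshold:
            return answer
        return None  # no consensus: better to say "don't know" than to guess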


Can’t speak for other people but I find it more time consuming to get ChatGPT to correct its mistakes than to do the work myself.


What type of work? I'm really only interested in coding related help :)


Ha, probably an insignificant amount. The internet is nothing if not confidently-stated positive results, no matter how wrong they might be. No wonder this is how LLMs act.


> In the end the quality of the information it will get back to you is no better than the quality of a thorough google search.. it will just get you a more concise and well-formatted answer faster.

I would say it’s worse than Google search. Google tells you when it can’t find what you are looking for. LLMs “guess” a bullshit answer.


Not always; I think that is an unfair reflection of LLMs in their current state. See two trivial examples below:

https://chat.openai.com/share/ca733a4a-7cdb-4515-abd0-0444a4...

https://chat.openai.com/share/dced0cb7-b6c3-4c85-bc16-cdbf22...

Hallucinations are definitely a problem, but they are certainly less than they used to be - They will often say that they aren't sure but can speculate, or "it might be because..." etc.


I get the feeling that LLMs will tell you they don’t know if “I don’t know” is one of the responses in their training data set. If they actually don’t know, i.e. no trained responses, that’s when they start hallucinating.


> It simply spits out whatever output sequence it feels is most likely to occur after your input sequence... but to optimize for factual correctness is a completely different endeavor

What if the input sequence says "the following is the truth:"? Assuming it skillfully predicts the following text, that would mean telling the most likely truth according to its training data.


Unfortunately, this is the product they want to sell.


I mean it's a demo. Isn't this kinda what they all do


I was fooled. The model release announcement said it could accept video and audio multi-modal input. I understood that there was a lot of editing and cutting, but I really believed I was looking at an example of video and audio input. I was completely impressed, since it's quite a leap to go from text and still images to "eyes and ears." There's even the segment where instruments are drawn and music is generated. I thought I was looking at a model that could generate music based on language prompts, as we have seen specialized models do.

This was all fake. You are taking a collection of cherry picked prompt engineered examples, then dramatizing them for maximum shareholder hype. The music example was just outputting a description of a song, not the generated music we heard in the video.

It’s one thing to release a hype video with what-ifs and quite another to claim that your new multi-modal model is king of the hill then game all the benchmarks and fake all the demos.

Google seems to be in an evil phase. OpenAI and MS must be quite pleased with themselves.


Exactly. Personally I’m fine with both:

1) Forward looking demoes that demonstrate the future of your product, where it’s clear that you’re not there yet but working in that direction

or

2) Demoes that show off current capabilities, but are scripted and edited to do so in the best light possible.

Both of those are standard practice and acceptable. What Google did was just wrong. They deserve to face backlash for this.


This kind of moral fraud - unethical behavior - is tolerated for some reason. It's almost like investors want to be fooled. There is no room for due diligence. They squeal like excited Taylor Swift fans as they are being lied to.


This shouldn't be a surprise. Companies optimize for what benefits shareholders. Or if there's an agency conflict of interest, companies optimize for what benefits managements' career and bonuses (perhaps at the expense of shareholders). Companies pay lip service to external stakeholders, but really that's a ploy to reduce attention and the risk of regulation, there is no fundamental incentive to treat all stakeholders well.

If lying helps, which can happen if there aren't large legal costs or social repercussions on brand equity, or if the lie goes undetected, then they'll lie. This is what we necessarily get from the upstream incentives. Fortunately, lying in a marketing video is fairly low on the list of ethical violations that have happened in the recent past.

We've effectively got a governance alignment problem that we've been trying to solve with regulations, taxes and social norms. How can you structure guardrails in the form of an incentive system to align companies with ethical outcomes? That's the question and it's a difficult one. This question also applies to any form of human organization, including governments.


As long as you’re not the last one out, “being fooled” can be very profitable


"phase"?

My friend, all these large corporations are going to get away with exactly as much as they can, for as long as they can. You're implying there's nothing to do but wait until they grace us with a "not evil phase", when in reality we need to be working on restoring our anti-monopoly regulation that was systematically torn down over the last 30 years.


I too thought it was able to accept video.

Given the massive data volume in videos, I assumed it processed video into pictures by extracting a frame per second or something along those lines, while still taking the entire video as the initial input.

Turns out, it wasn't even doing that!


Seems reminiscent of a video where the lead research department within Google is an animation studio (wish I could remember more about that video)

Doing all these hype videos just for the sake of satisfying shareholders or whatever is just making me lose trust in their research division. I don't think they did anything like this when they released BERT.


I agree completely. When AlphaZero was announced, I remember feeling shocked at how they presented this revolutionary breakthrough as if it were a regular thing. AlphaFold and AlphaCode are also impressive, but this one just sounds like it was forced by Sundar and not the usual DeepMind.


Well put. I’m not touching anything Google does any more. They’re far too dishonest. This failed attempt at a release (which turns out was all sizzle and no steak) only underscored how far behind OpenAI they actually are. I’d love to have been a fly on the wall in the OAI offices when this demo video went live.


I went back to the video and it said Gemini was "searching" for that music, not generating it. Google has done some stuff with generative music (https://aitestkitchen.withgoogle.com/experiments/music-lm) but we don't know if they'll bring that into Gemini.


I bet OpenAI and MS do the same, but people have a positive perception of them due to the massive ChatGPT hype wave.


Do you believe everything verbatim that companies tell you in advertising?


If they show a car driving I believe it's capable of self-propulsion and not just rolling downhill.


A marketing trick that has, in fact, been tried: https://arstechnica.com/cars/2020/09/nikola-admits-prototype...


Used to be "marketing tricks" were prosecuted as fraud.


Still is. Nikola's CEO, Trevor Milton, was convicted of fraud and is awaiting sentencing.


Oh. Good to hear. Thank you.


If I recall correctly, that led to literal criminal fraud charges.

And iirc Tesla is also being investigated for fraudulent claims for faking the safety of their self driving cars.


Hmm, might I interest you in a video of an electric semi-truck?


When a company invents tech that can do this, how would their ad be different?


No, but most people tend to make a mental note of which companies tend to deliver and which ones work hard to mislead them.

You do understand the concept of reputation, right?


this was plausible


I have used Swype texting since the t9 days.

If I demoed swype texting as it functions in my day to day life to someone used to a querty keyboard they would never adopt it

The rate at which it makes wrong assumptions about the word, or I have to fix it is probably 10% to 20% of the time

However because it’s so easy to fix this is not an issue and it doesn’t slow me down at all. So within the context of the different types of text Systems out there, I t’s the best thing going for me personally, but it takes some time to learn how to use it.

This is every product.

If you demonstrated to people how something will actually work after 100 hours of habituation and compensation for edge cases, nobody would ever adopt anything.

I’m not sure how to solve this because both are bad.

(Edit: I’m keeping all my typos as meta-comment on this given that I’m posting via swype on my phone :))


Showing a product in its best light is one thing. Demonstrating a mode of operation that doesn't exist is entirely another. It would be like if a demo of your swipe keyboard included telepathic mind control for correcting errors.


I’m not sure I’d agree that what they showed will never be possible and in fact my whole point is that I think Google can most likely deliver on that in this specific case. Chalk it up to my experience in the space, but from what I can see it looks like something Google can actually execute on (unlike many areas where they fail on product regularly).

I would agree completely that it’s not ready for consumers the way it was displayed, which is my point.

I do want to add that I believe that the right way to do these types of new product rollout is not with these giant public announcements.

In fact, I think generally speaking the "right" way to do something like this is to demonstrate only things that are possible robustly. However, that's not the market that Google lives in. They're capitalists trying to make as much money as possible. I'm simply saying that what they're showing is, I think, absolutely technically possible, and that Google can deliver it even if it's not ready today.

Do I think it’s supremely ethical the way that they did it? No I don’t.


The voice interaction part didn't look a far cry from what we are doing with Dynamic Interaction at SoundHound. Because of this I assumed (like many it seems) that they had caught up.

And it's dangerous to assume they can just "deliver later". It's not that simple. If it is, why not bake it in right now instead of committing fraud?

This is damaging to companies that walk the walk; people have literally said to me "but what about that Gemini?" and dismissed our work.


I feel that more than you realize

That was basically what magic leap did to the whole AR development market. Everyone deep in it knew they couldn’t do it but they messed up so badly that it basically killed the entire industry


So let's not give big tech benefit of the doubt on this one. We have to call them out but even then the lie is already half way around the world...


I don't care what google could, in theory, deliver on some time in the future maybe. That's irrelevant. They are demonstrating something that can't be done with the product as they are selling it.


Does swype make editing easier somehow? iOS spellcheck has negative value. I turned it off years ago and it reduced errors but there are still typos to fix.

Unfortunately iOS text editing is also completely worthless. It forces strange selections and inserts edited text in awkward ways.

I’m a QWERTY texter but text entry on iOS is a complete disaster that has only gotten worse over time.


I'm an iOS user and prefer the swipe input implementation in GBoard over the one in the native keyboard. I'm not sure what the differences are, but GBoard just seems to overall make fewer mistakes and do a better job correcting itself from context.


As I was reading Andrew's comment to myself, I was trying to figure out when and why I stopped using swype typing on my phone. Then it hit me – I stopped after I switched from Android to iOS a few years ago. Something about the iOS implementation just doesn't feel right.


Apple's version is shit. Period. That's why.


But you can install other keyboards like SwiftKey or Gboard which are closer to what you are used to on Android.

My only issue is that no keyboard implementation really supports more than two languages which makes me switch back to plain qwerty with autocomplete all the time.


Have you tried the native keyboard since iOS 17? It’s quite a lot better than older versions.


Hard disagree. I could type your whole comment without any typos completely blindly (except maybe "QWERTY" because uppercaps don't get autocorrected).


Apple autocorrect has a tendency to replace technical terms with similar words, eg. rvm turns into rum or ram or something.

It's even worse on the watch somehow. I take care to hit every key exactly, the correct word is there, I hit space, boom replaced with a completely different word. On the watch it seems to replace almost every word with bullshit, not just technical terms.


> seems to replace almost every word with bullshit

Sort of related, it also doesn't let you cuss. It will insist on replacing fuck with pretty much anything else. I had to add fuck to the custom replacement dictionary so it would let me be. What language I choose to use is mine and mine alone, I don't want Nanny to clean it up.


They've pretty much solved this with iOS 17. You can even use naughty words now, provided you use it for a day or so to have it get used to your vocabulary.


Maybe my fingers are just too big but the watch for anything like texting is basically impossible for me to use.


> However because it’s so easy to fix this is not an issue and it doesn’t slow me down at all.

But that's a different issue than LLM hallucinations.

With Swype, you already know what the correct output looks like. If the output doesn't match what you wanted, you immediately understand and fix it.

When you ask an LLM a question, you don't necessarily know the right answer. If the output looks confident enough, people take it as the truth. Outside of experimenting and testing, people aren't using LLMs to ask questions for which they already know the correct answer.


The insight here is that the speed of correction is a crucial component of the perceived long-term value of an interface technology.

It is the main reason that handwriting recognition did not displace keyboards. Once the handwriting is converted to text, it’s easier to fix errors with a pointer and keyboard. So after a few rounds of this most people start thinking: might as well just start with the pointer and keyboard and save some time.

So the question is, how easy is it to detect and correct errors in generative AI output? And the unfortunate answer is that unless you already know the answer you’re asking for, it can be very difficult to pick out the errors.


I think this is a good rebuttal.

Yeah the feedback loop with consumers has a higher likelihood of being detrimental, so even if the iteration rate is high, it’s potentially high cost at each step.

I think the current trend is to nerf the models or otherwise put bumpers on them so people can’t hurt themselves. That’s one approach that is brittle at best and someone with more risk tolerance (OpenAI) will exploit that risk gap.

It's a contradiction at best, and depending on the level of unearned trust created by the misleading marketing, it will certainly lead to some really odd externalities.

Think “man follows google maps directions into pond” but for vastly more things.

I really hated marketing before but yeah this really proves the warning I make in the AI addendum to my scarcity theory (in my bio).


I know marketing is marketing, but it's bad form IMO to "demo" something in a manner totally detached from its actual manner of use. A swype keyboard takes practice to use, but the demos of that sort of input typically show it being used in a realistic way, even if the demo driver is an "expert".

This is the sort of demo that 1) gives people a misleading idea of what the product can actually do; and 2) ultimately contributes to the inevitable cynical backlash.

If the product is really great, people can see it in a realistic demo of its capabilities.


I think you mean swipe. Swype was a brilliant third party keyboard app for Android which was better at text prediction and manual correction than Gboard is today. If however you really do still use Swype then please tell me how because I miss it.


Ha good point, and yes I agree Swype continues to be the best text input technology that I’ll never be able to use again. I guess I just committed genericide here but I meant the general “swiping” process at this point


I don't buy it. OpenAI did not have to do it with ChatGPT, and they always include a live demo when they release new products.

Maybe you can spice up a demo, but misleading to the point of implying things are generated when they're not (like the audio example) is pretty bad.


What is the latency in Swype? < 10ms? Not at all comparable to the video.


> This is every product.

Except actual good ones, like ChatGPT or Gmail (for their time).


You make a decent point, but you might underestimate how much this Gemini demo is faked[0].

In your Swype analogy, it would be as if Swype works by having to write out on a piece of paper the general goal of what you're trying to convey, then having to write each individual letter on a Post-it, only for you to then organize these Post-its in the correct order yourself.

This process would then be translated into a slick promo video of someone swiping away on their keyboard.

This is not a matter of “eh, it doesn't 100% work as smooth as advertised.”

0: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...


It's honestly pretty mind-boggling that we'd even use QWERTY on a smartphone. The entire point of the layout is to keep your fingers on the home row. Meanwhile, people text with one or two thumbs 100% of the time.


The reason we use qwerty on a smartphone is extremely straightforward: people tend to know where to look for the keys already, so it's easy to adopt to even though it's not "efficient". We know it better than we know the positions of letters in the alphabet. You can easily see the difference if you're ever presented with an onscreen keyboard that's in alphabetical order instead of qwerty (TVs do this a lot, for some reason, and it's a different physical input method but alpha order really does make you have to stop and hunt). It slows you down quite a bit.


That's definitely a good reason why, but perhaps if iOS or Android were to research what the best layout is for typical touch screen typing and release that as a new default, people would find it quite quick to learn a second layout and soon get just the benefits?

After all, with TVs I've had the same experience as you with the annoying alphabetical keyboard, but we type into them maybe a couple of times a year, or maybe once in 5 years, whereas if we changed our phone keyboard layout we'd likely get used to it quite quickly.

Even if not going so far as to push it as a new default for all users (I'm willing to accept the possibility that I'm speaking for myself as the kind of geeky person who wouldn't mind the initial inconvenience of a new kb layout if it meant saving time in the long run, and that maybe a large majority of people would just hate it too much to be willing to give it a chance), they could at least figure out what the best layout is (maybe this has been studied and decided already, by somebody?) and offer that as an option for us geeks.


Even most technically-minded people still use QWERTY on full-size computer keyboards despite it being a terrible layout for a number of reasons. I really doubt a new, nonstandard keyboard would get much if any traction on phones.


T9 was fine for typing and probably hundreds of millions of people used it.


It only worked because it had to, given phones just had keypads back then. As soon as qwerty-with-your-thumbs was available, everyone abandoned T9 and never looked back.


I use 8vim[0] from time to time, it's a good idea but needs a dictionary/autocompletion. You can get ok speeds after an hour of usage.

[0] https://f-droid.org/en/packages/inc.flide.vi8/


Path dependency is the reason for this, and is the reason why a lot of things are the way they are. An early goal with smart phone keyboards was to take a tool that everyone already knew how to use, and port it over with as little friction as possible. If smart phones happened to be invented before external keyboards the layouts probably would have been quite different.


"The entire point of the layout is to keep your fingers on the home row."

No, that is how you're told to type. You have to be told to type that way precisely because QWERTY is not designed to keep your fingers on the home row. If you type in a layout that is designed to do that, you don't need to be told to keep your fingers on the home row, because you naturally will.

Nobody really knows what the designers were thinking, which I do not mean as sarcasm, I mean it straight. History lost that information. But whatever they were thinking that is clearly not it because it is plainly obvious just by looking at it how bad it is at that. Nobody trying to design a layout for "keeping your fingers on the home row" would leave hjkl(semicolon) under the resting position of the dominant hand for ~90% of the people.

This, perhaps in one of technical history's great ironies, makes it a fairly good keyboard for swype-like technologies! A keyboard layout like Dvorak that has "aoeui" all right next to each other and "dhtns" on the other would be constantly having trouble figuring out which one you meant between "hat" and "ten" to name just one example. "uio" on qwerty could probably stand a bit more separation, but "a" and "e" are generally far enough apart that at least for me they don't end up confused, and pushing the most common consonants towards the outer part of the keyboard rather than clustering them next to each other in the center (on the home row) helps them be distinguishable too. "fghjkl" is almost a probability dead zone, and the "asd" on the left are generally reasonably distinct even if you kinda miss one of them badly.

I don't know what an optimal swype keyboard would be, and there's probably still a good 10% gain to be made if someone tried to make one, but it wouldn't be enough to justify learning a new layout.


Hold up young one. The reason for QWERTYs design has absolutely not been lost to history yet.

The design was to spread out the hammers of the most frequently used letters to reduce the frequency of hammer jamming back when people actually used typewriters and not computers.

The problem it attempted to improve upon, and which it was pretty effective at, is just a problem that no longer exists.


Also apocryphal: https://en.wikipedia.org/wiki/QWERTY#Contemporaneous_alterna...

And it does a bad job at it, which is further evidence that it was not the design consideration. People may not have been able to run a quick perl script over a few gigabytes of English text, but they would have gotten much closer if that was the desire. I don't believe the explanation that this was their goal and they were simply too stupid to get it even close to right.


> The design was to spread out the hammers of the most frequently used letters to reduce the frequency of hammer jamming

That's a folk myth that's mostly debunked.

https://www.smithsonianmag.com/arts-culture/fact-of-fiction-...


I’m curious how this works because all the common letters seem to be next to each other on the left side of the keyboard


The original intent, I believe, was not just separating the hammers per se, but also helping the hands alternate, so they would naturally not jam as much.

However, I use a Dvorak layout and my hands feel like they alternate better on that due to the vowels being all on one hand. The letters are also in more sensical locations, at least for English writing.

It can get annoying when G and C are next to each other, and M and W, but most of the time I type faster on Dvorak than I ever did on Qwerty. It helps that I learned during a time where I used qwerty at work and Dvorak at home, so the mental switch only takes a few seconds now.


You have to be taught to use the home row because the natural inclination for most people is to hunt and peck with their two index fingers. Watch how old people or young kids type. That being said, staying on the home row is how you type fast and make the most of the layout. Everything is comfortably reachable for the most part, unless you are a Windows user, IME.


If you learn a keyboard layout where the home row is actually the most common keys you use, you will not have to be encouraged to use the home row. You just will. I know, because I have, and I never "tried" to use the home row.

People don't hunt and peck after years of keyboard use because of the keyboard; they do it because of the keyboard layout.

If you want to prove I'm wrong, go learn Dvorak or Colemak and show me that once you're comfortable you still hunt and peck. You won't be, because it wouldn't even make sense. Or, less effort, find a hunt & peck Dvorak or Colemak user who is definitely at the "comfortable" phase.


People will actually avoid using their non-dominant fingers, like the ring finger or pinky. It's an issue with typing irrespective of layout. It's an issue with guitar playing, even. I am no pianist, but I wouldn't be surprised if new players have an aversion to using those fingers as well.

I understand how Dvorak is designed. I am still not convinced people will use all their fingers, especially their pinkies, in a consistent manner without being taught that this is what to work towards.


I really shouldn't be astonished at the number of people who will prioritize their own theorizing over the field report of someone who actually did the thing in question, but yet still somehow I am.

I did it. It happened. Theorizing about why it didn't happen is not terribly productive.


> Nobody really knows what the designers were thinking, which I do not mean as sarcasm, I mean it straight. History lost that information.

My understanding of QWERTY layout is that it was designed so that characters frequently used in succession should not be able to be typed in rapid succession, so that typewriter hammers had less chance of colliding. Or is this an urban myth?


My understanding (which is my recollections of a dive into typewriter history decades ago) is that avoiding typebar collisions was a real concern, but that the general consensus was that the exact final layout was strongly influenced by allowing salesmen to quickly type out 'typewriter' on the top row of letters.


I love this explanation.


The Twitter-linked Bloomberg page is now down.[1] Alternative page: [2] New page says it was partly faked. Can't find old page in archives.

[1] https://www.bloomberg.com/opinion/articles/2023-12-07/google...

[2] https://www.bloomberg.com/opinion/articles/2023-12-07/google...


The report from TechCrunch has more details - https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...


I am similarly enraged when TV show characters respond to text messages faster than humans can type. It destroys the realism of my favorite rom-coms.


I suppose this is a great example of how trust in videos, audio, images, and company marketing must be questioned and, until verified, they must be assumed to be 'generated'.

I am curious: if voice, email, chat, and shortly video can all be entirely generated in real or near-real time, how can we be sure that a remote employee is not a fully or partially generated entity?

Shared secrets are great when verifying but when the bodies are fully remote - what is the solution?

I am traveling at the moment. How can my family validate that it is ME claiming lost luggage and making a Venmo request?


If you can't verify whether your employee is AI, then you fire them and replace them with AI.


The question is if an attacker tells you they lost access can you please reset some credential, and your security process is getting on a video call because you're a fully remote company let's say.


>I am traveling at the moment. How can my family validate that it is ME claiming lost luggage and requesting a Venmo request?

PGP


Now you have two problems.

(I say this in jest, as a PGP user)


Ask for information that only the actual person would know.


That will only work once if the channels are monitored.


You only know one piece of information about your family? I feel like I could reference many childhood facts or random things that happened years ago in social situations.


Make up a code phrase/word for emergencies, share it with your family, then use it for these types of situations.


Fair, but that also assumes the recipients ("family") are in a mindset of constantly thinking about the threat model in this type of situation and will actually insist on hearing the passphrase.


This will only work once.


I think it's also why we as a community should speak out when we catch them doing this, as they are discrediting tech demos. It won't be enough, because a lie will be halfway around the world before the truth gets out of the starting gate, but we can't just let this go unchecked.


At this point, probably a handwritten letter. Back to the 20th century we go.


The video itself and the video description give a disclaimer to this effect. Agreed that some will walk away with an incorrect view of how Gemini functions, though.

Hopefully realtime interaction will be part of an app soon. Doesn’t seem like there would be too many technical hurdles there.


The entirety of the disclaimer is "sequences shortened throughout", in tiny text at the bottom for two seconds.

They do disclose most of the details elsewhere, but the video itself is produced and edited in such a way that it's extremely misleading. They really want you to think that it's responding in complex ways to simple voice prompts and a video feed, and it's just not.


Yea, of all the edits in the video, the editing for timing is the least of my concerns. My gripe is that the prompting was different, and in order to get that information you have to watch the video on YouTube itself, expand the description, and click a link to a separate blog article. Linking a "making of" video where they show this and interview some of the minds behind Gemini would have been better PR.


The disclaimer in the description is "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity."

That's different from "Gemini was shown selected still images and not video".


What I found impressive about it was the voice, the fast real-time response to video, and the succinct responses. So apparently all of that was fake. You got me, Google.


People don't really pay attention to disclaimers. Google made a choice knowing people would remember the hype, not the disclaimer.


I remember watching it and I was pretty impressed, but as I was walking around thinking to myself I came to the conclusion that there was something fishy about the demo. I didn't know exactly what they fudged, but it was far too polished compared with how their current AI demos actually perform.

I'm not saying there have been no improvements in AI. There is and this includes Google. But the reason why ChatGPT has really taken over the world is that the demo is in your own hands and it does quite well there.


Indeed, and this is how Google used to be as a company. I remember when Google Maps & Earth launched, and how they felt like world-changing technology. I'm sure they're doing lots of innovative science and development still, but it's an advertising/services company now, and one that increasingly talks down to its users. Disappointing considering their early sense of mission.

Thinking back to the firm's early days, it strikes me that some HN users and perhaps even some Googlers have no memory of a time before Google Maps and simply can't imagine how disruptive and innovative things like that were at the time. Being able to browse satellite imagery for the whole world was something previously confined to the upper echelons of the military-industrial complex.

That's one reason I wish the firm (along with several other tech giants) were broken up; it's full of talented innovative people, but the advertising economics at the core of their business model warp everything else.


    :%s/Google/the team
    :%s/people/the promotion board
Conway's law applied to the corporate-public interface :)


No. The disclaimer was not nearly enough.

The video fooled many people, including myself. This was not your typical super optimized and scripted demo.

This was blatant false advertising. Showing capabilities that do not exist. It’s shameful behavior from Google, to be perfectly honest.


Yeah, and ads on Google search have the teeniest, tiniest little "ad" chip on them, a long progression of making ads more in-your-face and less well-distinguished.

In my estimation, given the context around AI-generated content and general fakery, this video was deceptive. The only impressive thing about the video (to me) was how snappy and fluid it seemed to be, presumably processing video in real time. None of that was real. It's borderline fraudulent.


They were just parroting this video on CNBC without any disclaimers, so the viewers who don't happen to also read hacker news will likely form a different opinion than those of us who do.


If there weren't serious technical hurdles they wouldn't have faked it.


The difference between “Hey, figure out a game based on what you see right now” vs “here is a description of a game with the only two possible outcomes as examples” cannot be explained by the disclaimer.


performance and cost are hurdles?


It can be realtime while still having more latency than depicted in the video (and the video clearly stated that Gemini does not respond that quickly).

A local model could send relevant still images from the camera feed to Gemini, along with the text transcript of the user’s speech. Then Gemini’s output could be read aloud with text-to-speech. Seems doable within the present cost and performance constraints.
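
Something like the following loop, for instance, where every helper (record_utterance, transcribe, latest_camera_frame, gemini_vision, speak) is a hypothetical placeholder for the local capture/STT pieces, the multimodal API call, and a TTS engine:

    def record_utterance():
        # Hypothetical: block until the user finishes speaking, return raw audio.
        raise NotImplementedError

    def transcribe(audio):
        # Hypothetical: local speech-to-text.
        raise NotImplementedError

    def latest_camera_frame():
        # Hypothetical: grab one still image from the camera feed.
        raise NotImplementedError

    def gemini_vision(image, text):
        # Hypothetical wrapper around a multimodal (image + text) model call.
        raise NotImplementedError

    def speak(text):
        # Hypothetical text-to-speech.
        raise NotImplementedError

    def assistant_loop():
        while True:
            prompt = transcribe(record_utterance())
            frame = latest_camera_frame()      # a still image, not a live video stream
            reply = gemini_vision(frame, prompt)
            speak(reply)                       # the round-trip latency here is the honest cost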


I, too, was fooled into thinking Gemini had seen and heard through a video/audio feed instead of being shown still images and prompted through text. While it might seem there is not much difference between still images and a video feed, in fact it requires a lot of (changing) context understanding to keep the bot from babbling like an idiot all the time. It also requires the bot to recognize the “I don't know it yet” state to keep appropriately silent in a conversation with a live video feed, which is notoriously difficult with generative AI. Certainly one can do some hacking and build in some heuristics to make it easier, but making a bot seem like a human partner in a conversation is indeed very hard. And that was the most impressive aspect of the shown “conversations”, which are unfortunately all faked :(


Does it matter at all with regards to its AI capabilities though?

The video has a disclaimer that it was edited for latency.

And good speech-to-text and text-to-speech already exists, so building that part is trivial. There's no deception.

So then it seems like somebody is pressing a button to submit stills from a video feed, rather than live video. It's still just as useful.

My main question then is about the cup game, because that absolutely requires video. Does that mean the model takes short video inputs as well? I'm assuming so, and that it generates audio outputs for the music sections as well. If those things are not real, then I think there's a problem here. The Bloomberg article doesn't mention those, though.


Even your skeptical take doesn't fully show how faked this was.

> The video has a disclaimer that it was edited for latency.

There was no disclaimer that the prompts were different from what's shown.

> And good speech-to-text and text-to-speech already exists, so building that part is trivial. There's no deception.

Look at how many people thought it can react to voice in real-time - the net result is that a lot of people (maybe most?) were deceived. And the text prompts were actually longer and more specific than what was said in the video!

> somebody is pressing a button to submit stills from a video feed, rather than live video.

Somebody hand-picked images to convey exactly the right amount of information to Gemini.

> Does that mean the model takes short video inputs as well? I'm assuming so

It was given a hand-picked series of still images with the hands still on the cups so that it was easier to understand what cup moved where.

Source for the above: https://developers.googleblog.com/2023/12/how-its-made-gemin...


I'm ok with "edited for latency" or "only showing the golden path".

But the most impressive part of the demo was the way the LLM just seemed to know when to jump in with a response. It appeared to be able to wait until the user had finished the drawing, or even to jump in slightly before the drawing was finished. At one point the LLM was halfway through a response, saw that the user was now colouring the duck in blue, and started talking about how the duck appeared to be blue.

The LLM also appeared to know when a response wasn't needed because the user was just agreeing with the LLM.

I'm not sure how many people noticed that on a conscious level, but I'm positive everyone noticed it subconsciously, and felt the interaction was much more natural.

As you said, good speech-to-text and text-to-speech have already been done, along with multi-modal image/video/audio LLMs and image/music generation. The only novel thing Google appeared to be demonstrating, and what was most impressive, was this apparently natural interaction. But that part was all fake.


Audio input that's not text in the middle and video input are two things they made a big deal out of. Then they called it a hands on demo and it was faked.

> My main question then is about the cup game, because that absolutely requires video.

They did it with carefully timed images, and provided a few examples first.

> I'm assuming so, and that it generates audio outputs for the music sections as well

No, it was given the ability to search for music and so it was just generating search terms.

Here's more details:

https://developers.googleblog.com/2023/12/how-its-made-gemin...


Yes, that was obvious; as soon as I saw it wasn't live I clicked off. You can train any LLM to perform certain tasks well, and Google engineers are not that dense. This was obvious marketing PR, as OpenAI has made Google basically obsolete: 90% of my queries can be answered without wading through LLM generated text for a simple answer.


>without wading through LLM generated text

...OpenAI solved this by generating LLM text for you to wade through?


No. It solved it by (most of the time) giving the OP and I the answer to our queries, without us needing to wade through spammy SERP links.


If LLMs can replace 90% of your queries, then you have very different search patterns from me. When I search on Kagi, much of the time I’m looking for the website of a business, a public figure’s social media page, a restaurant’s hours of operation, a software library’s official documentation, etc.

LLMs have been very useful, but regular search is still a big part of everyday life for me.


Sure we now have options, but before LLMs, most queries relied solely on search engines, often leading to sifting through multiple paragraphs on websites to find answers — a barrier for most users.

Today, LLMs excel in providing concise responses, addressing simple, curious questions like, "Do all bees live in colonies?"


How do you tell a plausible wrong answer from a real one?


By testing the code it returns (I mostly use it as a coding assistant) to see if it works. 95% of the time it does.

For technical questions, ChatGPT has almost completely replaced Google & Stack Overflow for me.


In my experience, testing code in a way that ensures that it works is often harder and takes more time than writing it.


GPT4 search is a very good experience.

Though because you don’t see the answers it doesn’t show you, it’s hard to really validate the quality, so I’m still wary, but when I look for specific stuff it tends to find it.


Anyone remember the Google I/O demo where they had their “AI” call a barber to book an appointment?

Turns out it was all staged.

Lost a lot of trust after that.

Google is stuck in innovators dilemma.

They make 300B of revenue which ~90% is ads revenue.

Their actual mission that management chain optimizes for is their $ growth.

A superior AI model that gives the user exactly what they want would crash their market cap.

Microsoft has tons of products with Billion+ profit, Google has only a handful and other than cloud they all tie to Ads.

Google is addicted to ads. If Chrome adds a feature that decreases ad revenue, that team gets a stick.

Nothing at Google should jeopardize their ad revenue.

AI is directly a threat to Google’s core business model - ads. It’s obvious they’re gonna half ass it.

For OpenAI, AI is existential for them. If they don’t deliver, they’ll die.


No way, was Google Duplex fake?!



Bummer


That's not the only thing wrong. Gemini makes a false statement in the video, serving as a great demonstration of how these models still outright lie so frequently, so casually, and so convincingly that you won't notice, even if you have a whole team of researchers and video editors reviewing the output.

It's the single biggest problem with LLMs and Gemini isn't solving it. You simply can't rely on them when correctness is important. Even when the model has the knowledge it would need to answer correctly, as in this case, it will still lie.

The false statement: after it says the duck floats, it continues, "It is made of a material that is less dense than water." This is false; "rubber" ducks are made of vinyl polymers, which are denser than water. It floats because the hollow shape contains air, of course.
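
Back-of-the-envelope numbers make the point; the vinyl density is approximate and the 10% shell fraction is just an assumption for illustration:

    WATER_DENSITY = 1000.0   # kg/m^3
    VINYL_DENSITY = 1300.0   # kg/m^3, roughly; solid vinyl is denser than water and sinks
    SHELL_FRACTION = 0.10    # assumed: the plastic shell is ~10% of the duck's total volume

    # Average density of the hollow duck, neglecting the mass of the trapped air:
    duck_density = VINYL_DENSITY * SHELL_FRACTION   # = 130 kg/m^3
    print(duck_density < WATER_DENSITY)             # True: it floats because of the air inside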


This seems to be a common view among some folks. Personally, I'm impartial.

Search, or even asking expert human beings, is prone to producing incorrect results. I'm unsure where this expectation of 100% absolute correctness comes from. I'm sure there are use cases that need it, but I assume they're a vast minority, and most can tolerate larger-than-expected inaccuracies.


> I'm unsure where this expectation of 100% absolute correctness comes from.

It's a computer. That's why. Change the concept slightly: would you use a calculator if you had to wonder whether the answer was correct or just made up? Most people feel the same way about any computer-based anything. I personally feel these inaccuracies/hallucinations/whatevs only allow them to be one rung up from practical jokes. Like I honestly feel the devs are fucking with us.


Speech to text is often wrong too. So is autocorrect. And object detection. Computers don't have to be 100% correct in order to be useful, as long as we don't put too much faith in them.


Call me old fashioned, but I would absolutely like to see autocorrect turned off in many contexts. I much prefer to read messages with 30% more transparent errors rather than any increase in opaque errors. I can tell what someone meant if I see "elephent in the room", but not "element in the room" (not an actual example, autocorrect would likely get that one right).


Your caveat is not the norm though, as everyone is putting a lot of faith in them. So, that's part of the problem. I've talked with people who aren't developers but are otherwise smart individuals, and they have absolutely not considered that the info might not be correct. The readers here are a bit too close to the subject, and sometimes I think it is easy to forget that the vast majority of the population does not truly understand what is happening.


Nah, I don’t think anything has the potential to build critical thinking like LLMs en masse. I only worry that they will get better. It’s when they are 99.9% correct we should worry.


People put too much faith in conspiracy theories they find on YT, TikTok, FB, Twitter, etc. What you're claiming is already not the norm. People already put too much faith into all kinds of things.


Okay, but search is done on a computer, and like the person you’re replying to said, we accept close enough.

I don’t necessarily disagree with your interpretation, but there’s a revealed preference thing going on.

The number of non-tech ppl I’ve heard directly reference ChatGPT now is absolutely shocking.


> The number of non-tech ppl I've heard directly reference ChatGPT now is absolutely shocking.

The problem is that a lot of those people will take ChatGPT output at face value. They are wholly unaware of its inaccuracies or that it hallucinates. I've seen it too many times in the relatively short amount of time that ChatGPT has been around.


So what? People do this with Facebook news too. That's a people problem, not an LLM problem.


People on social media are absolutely 100% posting things deliberately to fuck with people. They are actively seeking to confuse people, cause chaos and divisiveness, and serve other ill-intended purposes. Unless you're saying that the LLM developers are actively doing the same thing, I don't think comparing what people find on the socials with what they get back as a response from a chatbot is a logical comparison at all.


There are far more people who post obviously wrong, confusing and dangerous things online with total conviction. There are people who seriously believe Earth is flat, for example.


How is that any different from what these AI chatbots are doing? They make stuff up that they predict will be rewarded highly by humans who look at it. This is exactly what leads to truisms like "rubber duckies are made of a material that floats over water" - which looks like it should be correct, even though it's wrong. It really is no different from Facebook memes that are devised to get a rise out of people and be widely shared.


Because we shouldn't be striving to make mediocrity. We should be striving to build better. Unless the devs of the bots are wanting to have a bot built on trying to deceive people, I just don't see the purpose of this. If we can "train" a bot and fine tune it, we should be fine tuning truth and telling it what absolutely is bullshit.

To avoid the darker topics and keep the conversation on the rails: if there were a misinformation campaign trying to state that the Earth's sky is red, then the fine tuning should be able to inform the model that this is clearly fake, so that when quoting it the model states that it is incorrect information that is out there. This kind of development should be how we clean up the fake, but nope, we're seemingly quite happy at accepting it. At least that's how your question comes off to me.


Sure, but current AI bots are just following the human feedback they get. If the feedback is naïve enough to score the factoid about rubber duckies as correct, guess what, that's the kind of thing these AIs will target. You can try to address this by prompting them with requests like "do you think this answer is correct and ethical? Think through this step by step" ('reinforcement learning from AI feedback') but that's very ad hoc and uncertain - ultimately, the humans in the loop call the shots.


At the end of the day, if there is no definitive answer to a question, it should respond in such a manner. "While there are compelling reasons to think A or B, neither A nor B have been verified. They are just the leading theories." That would be a much better answer than "Option A is the answer even if some people think B is." when A is just as unproven as B, but because it answers so definitively, people think it is the right answer.

So the labels thing is something that obviously will never work. But the system has all of the information it needs to know if the question is definitively answerable. If it is not, do not phrase the response definitively. At this point, I'd be happy if it responded to "Is 1+1 = 2?" with a wishy-washy answer like, "Most people would agree that 1+1 = 2", and if it wanted to say "in base 10, that is the correct answer; however, in base 2, 1+1 = 10" that would also be acceptable. Fake it till you make it is not the solution here.


If we rewind a little bit to the mid to late 2010s, filter bubbles, recommendation systems and unreliable news being spread on social media was a big problem. It was a simpler time, but we never really solved the problem. Point is, I don’t see the existence of other problems as an excuse for LLM hallucination, and writing it off as a “people problem” really undersells how hard it is to solve people problems.


Literally everything is a "people problem"

You can kill people with a fork, it doesn't mean you should legally be allowed to own a nuclear bomb "because it's just the same". The problem always come from scale and accessibility


So you're saying we need a Ministry of Truth to protect people from themselves? This is the same argument used to suppress "harmful" speech on any medium.


I've gotten to the point where I want "advertisement" stamped on anything that is, and I'm getting to the point I want "fiction" stamped on anything that is. I have no problem with fiction existing. It can be quite fun. People trying to pass fiction off as fact is a problem though. Trying to force a "fact" stamp would be problematic though, so I'd rather label everything else.

How to enforce it is the real sticky wicket though, so it's only something best discussed at places like this or while sitting around chatting while consuming


And who gets to control the "fiction" stamp? Especially for hot button topics like covid (back in 2020)? Should asking an LLM about lab leak theory be auto-stamped with "fiction" since it's not proven? But then what if it's proven later?


why should all computing be deterministic?

let me point you to this "genius"/"wrong-thinking" person and what they have to say about AL (artificial life) and deterministic computing.

https://www.cs.unm.edu/~ackley/

https://www.youtube.com/user/DaveAckley

To sum up a bunch of their content: you can make intractable problems solvable/crunchable if you allow just a little error into the result (which is reduced the longer the calculation runs). And this is acceptable for a number of use cases where initial accuracy is less important than instant feedback.

It is radically different from the von Neumann model of a computer, where a deterministic 'totalitarian finger pointer' pointing to some register (and only one register at a time) is an inherently limiting factor. In this model, each computational resource (a unit of RAM plus a processing unit) fights for and coordinates reality with its neighbors, without any central coordination.

Really interesting stuff. Still in its infancy...
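As a toy analogue of the "accept a little error, and let it shrink the longer you run" idea (this is only an illustrative Monte Carlo sketch, not Ackley's actual Movable Feast Machine):

    # Toy illustration: an answer that is never exact but keeps improving
    # (on average) the longer the computation runs.
    import random

    inside = 0
    for step in range(1, 1_000_001):
        x, y = random.random(), random.random()
        inside += (x * x + y * y) <= 1.0
        if step in (1_000, 100_000, 1_000_000):
            print(step, 4 * inside / step)  # estimate of pi; typically closer as step grows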


"Computer says no" is not a meme for no reason.


I'm a software engineer, and I more or less stopped asking ChatGPT for stuff that isn't mainstream. It just hallucinates answers and invents config file options or language constructs. Google will maybe not find it, or give you an occasional outdated result, but it rarely happens that it just finds stuff that's flat out wrong (in technology at least).

For mainstream stuff on the other hand ChatGPT is great. And I'm sure that Gemini will be even better.


The important thing is that with Web Search as a user you can learn to adapt to varying information quality. I have a higher trust for Wikipedia.org than I do for SEO-R-US.com, and Google gives me these options.

With a chatbot that's largely impossible, or at least impractical. I don't know where it's getting anything from - maybe it trained on a shitty Reddit post that's 100% wrong, but I have no way to tell.

There has been some work (see: Bard, Bing) where the LLM attempts to cite its sources, but even then that's of limited use. If I get a paragraph of text as an answer, is the expectation really that I crawl through each substring to determine their individual provenances and trustworthiness?

The shape of a product matters. Google as a linker introduces the ability to adapt to imperfect information quality, whereas a chatbot does not.

As an exemplar of this point - I don't trust when Google simply pulls answers from other sites and shows it in-line in the search results. I don't know if I should trust the source! At least there I can find out the source from a single click - with a chatbot that's largely impossible.


> it rarely happens that it just finds stuff that's flat out wrong

"Flat out wrong" implies determinism. For answers which are deterministic such as "syntax checking" and "correctness of code" - this already happens.

ChatGPT, for example, will write and execute code. If the code has an error or returns the wrong result it will try a different approach. This is in production today (I use the paid version).


Dollars to doughnuts says they are using GPT3.5.


I'm currently working with some relatively obscure but open source stuff (JupyterLite and Pyodide) and ChatGPT 4 confidently hallucinates APIs and config options when I ask it for help.

With more mainstream libraries it's pretty good though


I use chatgpt4 for very obscure things

If I ever worried about being quoted then I’ll verify the information

otherwise I’m conversational, have taken an abstract idea into a concrete one and can build on top of it

But I’m quickly migrating over to mistral and if that starts going off the rails I get an answer from chatgpt4 instead


I know exactly where the expectation comes from. The whole world has demanded absolute precision from computers for decades.

Of course, I agree that if we want computers to “think on their own“ or otherwise “be more human“ (whatever that means) we should expect a downgrade in correctness, because humans are wrong all the time.


> The whole world has demanded absolute precision from computers for decades.

Computer engineers maybe. I think the general population is quite tolerant of mistakes as long as the general value is high.

People generally assign very high value to things computers do. To test this hypothesis all you have to do is ask folks to go a few days without their computer or phone.


> The whole world has demanded absolute precision from computers

The opposite. Far too tolerant of the excuse "sorry, computer mistake." (But yeah, just at the same time as "the computer says so".)


Is it less reliable than an encyclopedia? Is it less reliable than Wikipedia? Those aren't infallible, but what's the expectation if it's wrong on something relatively simple?

With the rush of investment in dollars and to use these in places like healthcare, government, security, etc. there should be absolute precision.


Humans are imperfect, but this comes with some benefits to make up for it.

First, we know they are imperfect. People seem to put more faith into machines, though I do sometimes see people being too trusting of other people.

Second, we have methods for measuring their imperfection. Many people develop ways to tell when someone is answering with false or unjustified confidence, at least in fields they spend significant time in. Talk to a scientist about cutting edge science and you'll get a lot of 'the data shows', 'this indicates', or 'current theories suggest'.

Third, we have methods to handle false information that causes harm. Not always perfect methods, but there are systems of remedies available when experts get things wrong, and these even include some level of judging reasonable errors from unreasonable errors. When a machine gets it wrong, who do we blame?


Absolutely! And fourth, we have ways to make sure the same error doesn't happen again; we can edit Wikipedia, or tell the person they were wrong (and stop listening to them if they keep being wrong).


I find it ironic that computer scientists and technologists are frequently uberrationalists to the point of self parody but they get hyped about a technology that is often confidently wrong.

Just like the hype with AI and the billions of dollars going into it. There’s something there but it’s a big fat unknown right now whether any part of the investment will actually pay off - everyone needs it to work to justify any amount of the growth of the tech industry right now. When everyone needs a thing to work, it starts to really lose the fundamentals of being an actual product. I’m not saying it’s not useful, but is it as useful as the valuations and investments need it to be? Time will tell.


>I'm unsure where this expectation of 100% absolute correctness comes from. I'm sure there are use cases, but I assume it's the vast minority and most can tolerate larger than expected inaccuracies.

As others hinted at, there's some bias because it's coming from a computer, but I think it's far more nuanced than that.

I've worked with many experts and professionals through my career, ranging across medicine, various types of engineering, science, academia, research and so on, and the pattern that often bothers me is the level of certainty presented; the same is often embedded in LLM responses.

While humans don't typically quantify the certainty of their statements, the best SMEs I've ever worked with make it very clear what level of certainty they have when making professional statements. The SMEs who seem to be more often wrong than not speak in certainty quite often (some of this is due to cultural pressures and expectations surrounding being an "expert").

In this case, I would expect a seasoned scientist to say something in response to the duck question like: "many rubber ducks exist and are designed to float, this one very well might, but we'd really need to test it or have far more information about the composition of the duck, the design, the medium we want it in (Water? Mercury? Helium?)" and so on. It's not an exact answer but you understand there's uncertainty there and we need to better clarify our question and the information surrounding that question. The fact is, it's really complex to know if it'll float or not from visual information alone.

It could have an osmium ball inside that overcomes most of the assumed buoyancy the material contains, including the air demonstrated to make it squeak. It's not transparent. You don't know for sure, and the easiest way to alleviate uncertainty in this case is simply to test it.

There's so much uncertainty in the world, around what seem like the most certain and obvious things. LLMs seem to have grabbed some of this bad behavior from human language and culture where projecting confidence is often better (for humans) than being correct.


Most people I work with either tell me "I don't know" or "I think X, but I'm not sure" when they are not sure about something; the issue with LLMs is they don't have this concept.


The bigger problem is lack of context. When I speak with a person or review search results, I can use what I know about the source to evaluate the information I'm given. People have different areas of expertise and use language and mannerisms to communicate confidence in their knowledge or lack thereof. Websites are created by people (most times) and have a number of contextual clues that we have learned to interpret over the years.

LLMs do none of this. They pose as a confident expert on almost everything, and are just as likely to spit out BS as a true answer. They don't cite their sources, and if you ask for the source sometimes they provide ones that don't contain the information cited or don't even exist. If you hired a researcher and they did that you wouldn't hire them again.


1. Humans may also never be 100% correct, but it seems they are more often correct.

2. When AI is wrong it's often not only slightly off, but completely off the rails.

3. Humans often tell you when they are not sure, even if it's only their tone. AI is always 100% convinced it's correct.


It’s not AI it’s a machine learning model


If it’s no better than asking a random person, then where is the hype? I already know lots of people who can give me free, maybe incorrect guesses to my questions.

At least we won’t have to worry about it obtaining god-like powers over our society…


> At least we won’t have to worry about it obtaining god-like powers over our society…

We all know someone who's better at self promotion than at whatever they're supposed to be doing. Those people often get far more power than they should have, or can handle—and ChatGPT is those people distilled.


Let's see, so we exclude law, we exclude medical.. it's certainly not a "vast minority" and the failure cases are nothing at all like search or human experts.


Are you suggesting that failure cases are lower when interacting with humans? I don't think that's my experience at all.

Maybe I've only ever seen terrible doctors but I always cross reference what doctors say with reputable sources like WebMD (which I understand likely contain errors). Sometimes I'll go straight to WebMD.

This isn't a knock on doctors - they're humans and prone to errors. Lawyers, engineers, product managers, teachers too.


You think you ask your legal assistant to find some precedents related to your current case and they will come back with an A4 page full of made up cases that sound vaguely related and convincing but are not real? I don't think you understand the failure case at all.


That example seems a bit hyperbolic. Do you think lawyers who leverage ChatGPT will take the made up cases and present them to a judge without doing some additional research?

What I'm saying is that the tolerance for mistakes is strongly correlated to the value ChatGPT creates. I think both will need to be improved but there's probably more opportunity in creating higher value.

I don't have a horse in the race.


> Do you think lawyers who leverage ChatGPT will take the made up cases and present them to a judge without doing some additional research?

I generally agree with you, but it's funny that you use this as an example when it already happened. https://arstechnica.com/tech-policy/2023/06/lawyers-have-rea...


facepalm


> Do you think lawyers who leverage ChatGPT will take the made up cases and present them to a judge without doing some additional research

I really don’t recommend using ChatGPT (even GPT-4) for legal research or analysis. It’s simply terrible at it if you’re examining anything remotely novel. I suspect there is a valuable RAG application to be built for searching and summarizing case law, but the “reasoning” ability and stored knowledge of these models is worse than useless.


> Do you think lawyers who leverage ChatGPT will take the made up cases and present them to a judge without doing some additional research?

You don't?

https://fortune.com/2023/06/23/lawyers-fined-filing-chatgpt-...


What would be the point of a lawyer using ChatGPT if they had to root through every single reference ChatGPT relied upon? I don't have to double-check every reference of a junior attorney, because they actually know what they are doing, and when they don't, it's easy to tell and won't come with fraudulently created decisions/pleadings, etc.


> Do you think lawyers who leverage ChatGPT will take the made up cases and present them to a judge without doing some additional research?

Oh dear.


Guessing from the last sentence that you are one of those "most" who "can tolerate larger than expected inaccuracies".

How much inaccuracy would that be?


Where did you get the 100% number from? It's not in the original comment, it's not in a lot of similar criticisms of the models.


Honestly I agree. Humans make errors all the time. Perfection is not necessary and requiring perfection blocks deployment of systems that represent a substantial improvement over the status quo despite their imperfections.

The problem is a matter of degree. These models are substantially less reliable than humans and far below the threshold of acceptability in most tasks.

Also, it seems to me that AI can and will surpass the reliability of humans by a lot. Probably not by simply scaling up further or by clever prompting, although those will help, but by new architectures and training techniques. Gemini represents no progress in that direction as far as I can see.


There's a huge difference between demonstrating something with fuzzy accuracy and playing something off as if it's giving good, correct answers. An honest way to handle that would be to highlight where the bot got it wrong instead of running with the answer as if it was right.

Deception isn't always outright lying. This video was deceitful in form and content and presentation. Their product can't do what they're implying it can, and it was put together specifically to mislead people into thinking it was comparable in capabilities to gpt-4v and other competitor's tech.

Working for Google AI has to be infuriating. They're doing some of the most cutting edge research with some of the best and brightest minds in the field, but their shitty middle management and marketing people are doing things that undermine their credibility and make them look like untrustworthy fools. They're a year or more behind OpenAI and Anthropic, barely competitive with Meta, and they've spent billions of dollars more than any other two companies, with a trashcan fire for a tech demo.

It remains to be seen whether they can even outperform Mistral 7b or some of the smaller open source models, or if their benchmark numbers are all marketing hype.


If a human expert gave wrong answers as often and as confidently as LLMs, most would consider no longer asking them. Yet people keep coming back to the same LLM despite the wrong answers to ask again in a different way (try that with a human).

This insistence on comparing machines to humans to excuse the machine is as tiring as it is fallacious.


Aside: this is not what impartial means.


To be fair, one could describe the duck as being made of air and vinyl polymer, which in combination are less dense than water. That's not how humans would normally describe it, but that's kind of arbitrary; consider how aerogel is often described as being mostly made of air.


Is an aircraft carrier made of a material that is less dense than water?


I think you can safely say that air is a critical component of an aircraft carrier. I suppose the frame of it is not made of air, but the ballasts are designed with air in mind and are certainly made to utilize air. The whole system fails without air, meaning that it requires air to function. It comes down to a definitional argument of the word "made" which is pointless.


I guess it's a purely philosophical question. But no normal person would say "my house is made of air" or "atoms are made of vacuum".


only if you average it out over volume :P


Is an aircraft carrier made of metal and air? Or just metal?


Where’s the distinction between the air that is part of the boat, and the air that is not? If the air is included in the boat, should we all be wearing life vests?


If I take all of the air out of a toy duck, it is still a toy duck. If I take all of the vinyl/rubber out of a toy duck, it is just the atmosphere remaining


The material of the duck is not air. It's not sealed. It would still be a duck in a vacuum and it would still float on a liquid the density of water too.


Well this seems like a huge nitpick. If a person said that, you would afford them some leeway, maybe they meant the whole duck, which includes the hollow part in the middle.

As an example, when most people say a balloon's lighter than air, they mean an inflated balloon with hot air or helium, but you catch their meaning and don't rush to correct them.


The model specifically said that the material is less dense than water. If you said that the material of a balloon is less dense than air, very few people would interpret that as a correct statement, and it could be misleading to people who don't know better.

Also, lighter-than-air balloons are intentionally filled with helium and sealed; rubber ducks are not sealed and contain air only incidentally. A balloon in a vacuum would still contain helium (if strong enough) but would not rise, while a rubber duck in a vacuum would not contain air but would still easily float on a liquid of similar density to water.

The reason why it seems like a nitpick is that this is such an inconsequential thing. Yeah, it's a false statement but it doesn't really matter in this case, nobody is relying on this answer for anything important. But the point is, in cases where it does matter these models cannot be trusted. A human would realize when the context is serious and requires accuracy; these models don't.


I’m not an expert but I suspect that this aspect of lack of correctness in these models might be fundamental to how they work.

I suppose there are two possible solutions: one is a new training or inference architecture that somehow understands “facts”. I’m not an expert so I’m not sure how that would work, but from what I understand about how a model generates text, “truth” can’t really be an element in the training or inference that affects the output.

the second would be a technology built on top of the inference to check correctness, some sort of complex RAG. Again not sure how that would work in a real world way.
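A rough sketch of what such a check-on-top could look like (purely illustrative; generate() and search() below are hypothetical stand-ins for a model call and a retriever, not real APIs) would be to retrieve sources after drafting an answer and ask whether the draft contradicts them:

    # Illustrative sketch of a verification layer on top of generation.
    # generate() and search() are hypothetical stand-ins, not real APIs.
    def verified_answer(question, generate, search):
        draft = generate(question)
        sources = search(question)  # retrieve reference documents
        verdict = generate(
            "Reference documents:\n" + "\n".join(sources) +
            "\n\nDoes the following answer contradict them? Reply yes or no.\n" + draft
        )
        if verdict.strip().lower().startswith("yes"):
            return "Not sure; the retrieved sources disagree with the draft answer."
        return draft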

I say it might be fundamental to how the model works because as someone pointed out below, the meaning of the word “material” could be interpreted as the air inside the duck. The model’s answer was correct in a human sort of way, or to be more specific in a way that is consistent with how a model actually produces an answer- it outputs in the context of the input. If you asked it if PVC is heavier than water it would answer correctly.

Because language itself is inherently ambiguous and the model doesn’t actually understand anything about the world, it might turn out that there’s no universal way for a model to know what’s true or not.

I could also see a version of a model that is “locked down” but can verify the correctness of its statements, but in a way that limits its capabilities.


> this aspect of lack of correctness in these models might be fundamental to how they work.

Is there some sense in which this isn't obvious to the point of triviality? I keep getting confused because other people seem to keep being surprised that LLMs don't have correctness as a property. Even the most cursory understanding of what they're doing understands that it is, fundamentally, predicting words from other words. I am also capable of predicting words from other words, so I can guess how well that works. It doesn't seem to include correctness even as a concept.

Right? I am actually genuinely confused by this. How is that people think it could be correct in a systematic way?


I think very few people on this forum believe LLMs are correct in a systematic way, but a lot of people seem to think there's something more than predicting words from other words.

Modern machine learning models contain a lot of inscrutable inner layers, with far too many billions of parameters for any human to comprehend, so we can only speculate about what's going on. A lot of people think that, in order to be so good at generating text, there must be a bunch of understanding of the world in those inner layers.

If a model can write convincingly about a soccer game, producing output that's consistent with the rules, the normal flow of the game and the passage of time - to a lot of people, that implies the inner layers 'understand' soccer.

And anyone who noodled around with the text prediction models of a few decades ago, like Markov chains, Bayesian text processing, sentiment detection and things like that can see that LLMs are massively, massively better than the output from the traditional ways of predicting the next word.


> Is there some sense in which this isn't obvious to the point of triviality?

This is maybe a pedantic "yes", but is also extremely relevant to the outstanding performance we see in tasks like programming. The issue is primarily the size of the correct output space (that is, the output space we are trying to model) and how that relates to the number of parameters. Basically, there is a fixed upper bound on the amount of complexity that can be encoded by a given number of parameters (obvious in principle, but we're starting to get some theory about how this works). Simple systems or rather systems with simple rules may be below that upper bound, and correctness is achievable. For more complex systems (relative to parameters) it will still learn an approximation, but error is guaranteed.

I am speculating now, but I seriously suspect the size of the space of not only one or more human language but also every fact that we would want to encode into one of these models is far too big a space for correctness to ever be possible without RAG. At least without some massive pooling of compute, which long term may not be out of the question but likely never intended for individual use.

If you're interested, I highly recommend checking out some of the recent work around monosemanticity for what fleshing out the relationship between model-size and complexity looks like in the near term.


Just to play devil’s advocate: we can train neural networks to model some functions exactly, given sufficient parameters. For example simple functions like ax^2 + bx + c.

The issue is that “correctness” isn’t a differentiable concept. So there’s no gradient to descend. In general, there’s no way to say that a sentence is more or less correct. Some things are just wrong. If I say that human blood is orange that’s not more incorrect than saying it’s purple.
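To make the contrast concrete, here is a minimal sketch (toy data, plain gradient descent, every number made up) of fitting a*x^2 + b*x + c: the squared error is differentiable in a, b and c, so there is a gradient to descend, which is exactly what is missing for "this sentence is factually wrong":

    # Minimal sketch: fit y = a*x^2 + b*x + c by gradient descent on squared error.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 200)
    y = 3 * x**2 - 1.5 * x + 0.5           # assumed "true" coefficients

    a = b = c = 0.0
    lr = 0.01
    for _ in range(5000):
        err = (a * x**2 + b * x + c) - y   # differentiable residual
        a -= lr * 2 * np.mean(err * x**2)  # d(MSE)/da
        b -= lr * 2 * np.mean(err * x)     # d(MSE)/db
        c -= lr * 2 * np.mean(err)         # d(MSE)/dc

    print(a, b, c)  # approaches 3, -1.5, 0.5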


Because it is assumed that it can think and/or reason. In this case, knowing the concepts of density and the density of a material, detecting the material from an image, detecting what object the image shows. And, most importantly, knowing that this object is not solid, because if it were solid it could not float.


Maybe you simplify a bit what "guessing words from other words" means. HOW do you guess this, is what's mysterious to many: you can guess words from other words due to habit of language, a model of mind of how other people expect you to predict, a feedback loop helping you do it better over time if you see people are "meh" at your bad predictions, etc.

So if the chatbot is used to talking, knows what you'd expect, and listens to your feedback, why wouldn't it also want to tell the truth like you would instinctively, even if only best-effort?

Sadly, the chatbot doesn't yet really care about the game it's playing; it doesn't want to make it interesting, it's just like a slave producing minimal low-effort outputs. I've talked to people exploited for money in dark places, and when they "seduce" you, they talk like a chatbot: most of it is lies, it just has to convince you a little bit to go their way, they pretend to understand or care about what you say, but at the end of the day the goal is for you to pay. Like the chatbot.


Yeah. I think there's some ambiguity around the meaning of reasoning- because it is a kind of reasoning to say a Duck's material is less dense than water. In a way it's reasoned that out, and it might actually say something about the way a lot of human reasoning works.... (especially if you've ever listened to certain people talk out loud and say to yourself... huh?)


Bing chat uses GPT-4 and cites sources from its retrieval.


I think this problem needs to be solved at a higher level, and in fact Bard is doing exactly that. The model itself generates its output, and then higher-level systems can fact check it. I've heard promising things about feeding back answers to the model itself to check for consistency and stuff, but that should be a higher level function (and seems important to avoid infinite recursion or massive complexity stemming from the self-check functionality).
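A minimal version of that consistency idea (a sketch only; generate() is a hypothetical stand-in for whatever model API is used) is to sample the same question several times and only trust answers the model keeps agreeing with:

    # Sketch of a self-consistency check; generate() is a hypothetical stand-in
    # for an LLM call, not a real API.
    from collections import Counter

    def self_consistent_answer(question, generate, samples=5, threshold=0.8):
        answers = [generate(question).strip().lower() for _ in range(samples)]
        best, count = Counter(answers).most_common(1)[0]
        if count / samples >= threshold:
            return best    # the model keeps agreeing with itself
        return None        # inconsistent across samples: flag for a human or fact check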


I'm not a fan of current approaches here. "Chain of thought" or other approaches where the model does all its thinking using a literal internal monologue in text seem like a dead end. Humans do most of their thinking non-verbally and we need to figure out how to get these models to think non-verbally too. Unfortunately it seems that Gemini represents no progress in this direction.


> "Chain of thought" or other approaches where the model does all its thinking using a literal internal monologue in text seem like a dead end. Humans do most of their thinking non-verbally and we need to figure out how to get these models to think non-verbally too.

Insofar as we can say that models think at all between the input and the stream of tokens output, they do it nonverbally. Forcing the model to reduce some of it to verbal form short of the actual response-of-concern does not change that, just as the fact that humans reduce some of their thought to verbal form to work through problems doesn't change that human thought is mostly nonverbal.

(And if you don't consider what goes on between input and output to be thought, then chain of thought doesn't force all LLM thought to be verbal, because only the part that comes out in words is "thought" to start with in that case -- you are then saying that the basic architecture, not chain of thought prompting, forces all thought to be verbal.)


You're right, the models do think non-verbally. However, crucially, they can only do so for a fixed amount of time for each output token. What's needed is a way for them to think non-verbally continuously, and decide for themselves when they've done enough thinking to output the next token.


Is it clear that humans can think nonverbally (including internal monologue) continuously? As in, for difficult reasoning tasks, do humans benefit a lot from extra time if they are not allowed internal monologue. Genuine question


The point of “verbalizing” the chain of thought isn’t that it’s the most effective method. And frankly I don’t think it matters that humans think non verbally. The goal isn’t to create a human in a box. Verbalizing the chain of thought allows us to audit the thought process, and also create further labels for training.


No, the point of verbalizing the chain of thought is that it's all we know how to do right now.

> And frankly I don’t think it matters that humans think non verbally

You're right, that's not the reason non-verbal is better, but it is evidence that non-verbal is probably better. I think the reason it's better is that language is extremely lossy and ambiguous, which makes a poor medium for reasoning and precise thinking. It would clearly be better to think without having to translate to language and back all the time.

Imagine you had to solve a complicated multi-step physics problem, but after every step of the solution process your short term memory was wiped and you had to read your entire notes so far as if they were someone else's before you could attempt the next step, like the guy from Memento. That's what I imagine being an LLM using CoT is like.


I mean a lot of problems are amenable to subdivision into parts where the process of each part is not needed for the other parts. It's not even clear that humans usually hold in memory all of the process of the previous parts, especially if it won't be used later.


> Humans do most of their thinking non-verbally and we need to figure out how to get these models to think non-verbally too.

That's a very interesting point, both technically and philosophically.

Where Gemini is "multi-modal" from training, how close do you think that gets? Do we know enough about neurology to identify a native language in which we think? (not rhetorical questions, I'm really wondering)


Neural networks are only similar to brains on the surface. Their learning process is entirely different and their internal architecture is different as well.

We don’t use neural networks because they’re similar to brains. We use them because they are arbitrary function approximators and we have an efficient algorithm (backprop) coupled with hardware (GPUs) to optimize them quickly.


I, a non-AGI, just ‘hallucinated’ yesterday. I hallucinated that my plan was to take all of Friday off and started wondering why I had scheduled morning meetings. I started canceling them in a rush. In fact, all week I had been planning to take a half day, but somehow my brain replaced the idea of a half day off with a full day off. You could have asked me and I would have been completely sure that I was taking all of friday off.


EDIT: never mind, I missed the exact wording about being "made of a material..." which is definitely false then. Thanks for the correction below.

Preserving the original comment so the replies make sense:

---

I think it's a stretch to say that's false.

In a conversational human context, saying it's made of rubber implies it's a rubber shell with air inside.

It floats because it's rubber [with air] as opposed to being a ceramic figurine or painted metal.

I can imagine most non-physicist humans saying it floats because it's rubber.

By analogy, we talk about houses being "made of wood" when everybody knows they're made of plenty of other materials too. But the context is instead of brick or stone or concrete. It's not false to say a house is made of wood.


> In a conversational human context, saying it's made of rubber implies it's a rubber shell with air inside.

Disagree. It could easily be solid rubber. Also, it's not made of rubber, and the model didn't claim it was made of rubber either, so it's irrelevant.

> It floats because it's rubber [with air] as opposed to being a ceramic figurine or painted metal.

A ceramic figurine or painted metal in the same shape would float too. The claim that it floats because of the density of the material is false. It floats because the shape is hollow.

> It's not false to say a house is made of wood.

It's false to say a house is made of air simply because its shape contains air.


This is what the reply was:

> Oh, it it's squeaking then it's definitely going to float.

> It is a rubber duck.

> It is made of a material that is less dense than water.

Full points for saying if it's squeaking then it's going to float.

Full points for saying it's a rubber duck, with the implication that rubber ducks float.

Even with all that context though, I don't see how "it is made of a material that is less dense than water" scores any points at all.


Yeah, I think arguing the logic behind these responses misses the point, since an LLM doesn't use any kind of logic--it just responds in a pattern that mimics the way people respond. It says "it is made of a material that is less dense than water" because that is a thing that is similar to what the samples in its training corpus have said. It has no way to judge whether it is correct, or even what the concept of "correct" is.

When we're grading the "correctness" of these answers, we're really just judging the average correctness of Google's training data.

Maybe the next step in making LLM's more "correct" is not to give them more training data, but to find a way to remove the bad training data from the set?


I don't see it as a problem with most non-critical use cases (critical being things like medical diagnoses, controlling heavy machinery or robotics, etc).

LLMs right now are most practical for generating templated text and images, which when paired with an experienced worker, can make them orders of magnitude more productive.

Oh, DALL-E created graphic images with a person with 6 fingers? How long would it have taken a pro graphic artist to come up with all the same detail but with perfect fingers? Nothing there they couldn't fix in a few minutes and then SHIP.


>> Nothing there they couldn't fix in a few minutes and then SHIP.

If by ship, you mean put directly into the public domain then yes.

https://www.goodwinlaw.com/en/insights/publications/2023/08/...

and for more interesting takes: https://www.youtube.com/watch?v=5WXvfeTPujU&


After asserting it's a rubber duck, there are some claims without follow-up:

- Just after that it doesn't translate the "rubber" part

- It states there's no land nearby for it to rest or find food in the middle of the ocean: if it's a rubber duck it doesn't need to rest nor feed. (That's a missed opportunity to mention the infamous "Friendly Floatees spill"[1] in 1992 as some rubber ducks floated to that map position). Although it seems to recognize geographical features of the map, it fails to mention Easter Island is relatively nearby. And if it were recognized as a simple duck — which it described as a bird swimming in the water — it seems oblivious to the fact that the duck might feed itself in the water. It doesn't mention either that the size of the duck seems abnormally big in that map context.

- The concept of friends and foes doesn't apply to a rubber duck either. Btw labeling the duck picture as a friend and the bear picture as a foe seems arbitrary (e.g. a real duck can be very aggressive even with other ducks.)

Among other things, the astronomical riddle seems also flawed to me: it answered "The correct order is Sun, Earth, Saturn".

I'd like for it to state :

- the premises it used, like "Assuming it depicts the Sun, Saturn and the Earth" (there are other stars, other ringed-planets, and the Earth similarity seems debatable)

- the sorting criteria it used (e.g. using another sorting key like the average distance from us "Earth, Sun, Saturn" can be a correct order)

[1] https://en.wikipedia.org/wiki/Friendly_Floatees_spill


I did some reading and it seems that rubber's relative density to water has to do with its manufacturing process. I see a couple of different quotes on the specific gravity of so-called 'natural rubber', and most claim it's lower than water.

Am I missing something?

I asked both Bard (Gemini at this point I think?) and GPT-4 why ducks float, and they both seemed accurate: they talked about the density of the material plus the increased buoyancy from air pockets and went into depth on the principles behind buoyancy. When pressed they went into the fact that "rubber"'s density varies by the process and what it was adulterated with, and if it was foamed.

I think this was a matter of the video being a brief summary rather than a falsehood. But please do point out if I'm wrong on the rubber bit, I'm genuinely interested.

I agree that hallucinations are the biggest problems with LLMs, I'm just seeing them get less commonplace and clumsy. Though, to your point, that can make them harder to detect!


Someone on Twitter was also skeptical that the material is more dense than water. I happened to have a rubber duck handy so I cut a sample of material and put it in water. It sinks to the bottom.

Of course the ultimate skeptic would say one test doesn't prove that all rubber ducks are the same. I'm sure someone at some point in history has made a rubber duck out of material that is less dense than water. But I invite you to try it yourself and I expect you will see the same result unless your rubber duck is quite atypical.

Yes, the models will frequently give accurate answers if you ask them this question. That's kind of the point. Despite knowing that they know the answer, you still can't trust them to be correct.


Ah good show :). I was rather preoccupied with the question but didn't have one handy. Well, I do, but my kid would roast me slowly over coals if I so much as smudged it. Ah the joy of the Internet, I did not predict this morning that I would end the day preoccupied with the question of rubber duck density!

I guess for me the question of whether or not the model is lying or hallucinating is if it's correctly summarizing its source material. I find very conflicting materials on the density of rubber, and most of the sources that Google surfaces claim a lower density than water. So it makes sense to me that the model would make the inference.

I'm splitting hairs though, I largely agree with your comment above and above that.

To illustrate my agreement: I like testing AIs with this kind of thing... a few months ago I asked GPT for advice as to how to restart my gas powered water heater. It told me the first step was to make sure the gas was off, then to light the pilot light. I then asked it how the pilot light was supposed to stay lit with the gas off and it backpedaled. My imagining here is that because so many instructional materials about gas powered devices emphasize to start by turning off the gas, that weighted it as the first instruction.

Interesting, the above shows progress though. I realized I asked GPT 3.5 back then, so I just re-asked 3.5 and then asked 4 for the first time. 3.5 was still wrong. 4 told me to initially turn off the gas to dissipate it, then to ensure gas was flowing to the pilot before sparking it.

But that said I am quite familiar with the AI being confidently wrong, so your point is taken, I only really responded because I was wondering if I was misunderstanding something quite fundamental about the question of density.


That's a tricky one though, since the question is: is the air inside of the rubber duck part of the material that makes it? If you removed the air it definitely wouldn't look the same or be considered a rubber duck. I gave it to the bot since, taking ALL the material that makes it a rubber duck into account, it is less dense than water.


A rubber duck in a vacuum is still a rubber duck and it still floats (though water would evaporate too quickly in a vacuum, it could float on something else of the same density).


A rubber duck with a vacuum inside of it (removing the air material) is just a piece of rubber with eyes. Assuming OP's point about the rubber not being less dense than water, it would sink, no?


No. Air is less dense than water; vacuum is even less dense than air. A rubber duck will collapse if you seal it and try to pull a vacuum inside with air outside, but if the rubber duck is in a vacuum then it will have only vacuum inside and it will still float on a liquid the density of water. If you made a duck out of a metal shell you could pull a vacuum inside, like a thermos bottle, and it would float too.


The metal shell is rigid though, so the volume stays the same with the vacuum. A rubber duck collapses with a vacuum inside of it, thus losing the shape of a duck and reducing the volume of the object =). That's why I said it's just a piece of rubber with eyes.


> A rubber duck collapses with a vacuum inside of it

Not if there is vacuum outside too. In a vacuum it remains a duck and still floats.


If you hold a rubber duck under water and squeeze out the air, it will fill with water and still be a rubber duck. If you send a rubber duck into space, it will become almost completely empty but still be a rubber duck. Therefore, the liquid used to fill the empty space inside it is not part of the duck.

I mean apply this logic to a boat, right? Is the entire atmosphere part of the boat? Are we all on this boat as well? Is it a cruise boat? If so, where is my drink?


Agree, then the question becomes how will this issue play out?

Maybe AI correctness will be similar to automobile safety. It didn’t take long for both to be recognized as fundamental issues with new transformative technologies.

In both cases there seems to be no silver bullet. Mitigations and precautions will continue to evolve, with varying degrees of effectiveness. Public opinion and legislation will play some role.

Tragically accidents will happen and there will be a cost to pay, which so far has been much higher and more grave for transportation.


Devil's advocate. It is made of a material less dense than water. Air.

It certainly isn't how I would phrase it, and I wouldn't count air as what something is made of, but...

Soda pop is chock-full of air, it's part of it! And I'd say carbon dioxide is a part of the recipe of pop.

So it's a confusing world for a young LLM.

(I realise it may have referenced rubber prior, but it may have meant air... again, Devil's advocate)


When you make carbonated soda you put carbon dioxide in deliberately and use a sealed container to hold it in. When you make a rubber duck you don't put air in it deliberately and it is not sealed. Carbonated soda ceases to be carbonated when you remove the air. A rubber duck in a vacuum is still a rubber duck and it even still floats.


If the rubber duck has air inside, it is known, and intentional, for it is part of that design.

If you remove the air from the duck, and stop it so it won't refill, you have a flat rubber duck, which is useless for its design.

Much as flat pop is useless for its design.

And this nuance is even more nuance-ish than this devil's advocate post.


A rubber duck in a vacuum (not a duck in atmosphere with a vacuum only inside) would not go flat or pop. It would remain entirely normal, as useful as it ever was, and it would still float on a liquid the density of water. Removing the air would have no effect on the duck whatsoever. It's not part of the material of the duck in any reasonable interpretation.

But pedantic correctness isn't even what matters here. The model made a statement where the straightforward interpretation is false and misleading. A person who didn't know better would be misled. Whether you can possibly come up with a tortured alternative interpretation that is technically not incorrect is irrelevant.


There's nothing wrong with what you're saying, but what do you suggest? Factuality is an area of active research, and Deepmind goes into some detail in their technical paper.

The models are too useful to say, "don't use them at all." Hopefully people will heed the warnings of how they can hallucinate, but further than that I'm not sure what more you can expect.


The problem is not with the model, but with its portrayal in the marketing materials. It's not even the fact that it lied, which is actually realistic. The problem is the lie was not called out as such. A better demo would have had the user note the issue and give the model the opportunity to correct itself.


But you yourself said that it was so convincing that the people doing the demo didn't recognize it as false, so how would they know to call it out as such?

I suppose they could've deliberately found a hallucination and showcased it in the demo. In which case, pretty much every company's promo material is guilty of not showcasing negative aspects of their product. It's nothing new or unique to this case.


They should have looked more carefully, clearly. Especially since they were criticized for the exact same thing in their last launch.


The duck is indeed made of a material that is less dense. Namely water and air.

If you go down such technical routes, your definition is wrong too. It doesn't float because it contains air. If you poke a hole in the head of the duck it will sink, even though at all times it contains air.


The duck is made of water and air? Which duck are we talking about here.


Is it possible for humans to be wrong about something, without lying?


I don't agree with the argument that "if a human can fail in this way, we should overlook this failing in our tooling as well." Because of course that's what LLMs are, tools, like any other piece of software.

If a tool is broken, you seek to fix it. You don't just say "ah yeah it's a broken tool, but it's better than nothing!"

All these LLM releases are amazing pieces of technology and the progress lately is incredible. But don't rag on people critiquing it, how else will it get better? Certainly not by accepting its failings and overlooking them.


“Broken” is a word used by pedants. A broken tool doesn’t work. This works, most of the time.

Is a drug “broken” because it only cures a disease 80% of the time?

The framing most critics seem to have is “it must be perfect”.

It’s ok though, their negativity just means they’ll miss out on using a transformative technology. No skin off the rest of us.


I think the comparison to humans is just totally useless. It isn’t even just that, as a tool, it should be better than humans at the thing it does, necessarily. My monitor is on an arm, the arm is pretty bad at positioning things compared to all the different positions my human arms could provide. But it is good enough, and it does it tirelessly. A tool is fit for a purpose or not, the relative performance compared to humans is basically irrelevant.

I think the folks making these tools tend to oversell their capabilities because they want us to imagine the applications we can come up with for them. They aren’t selling the tool, they are selling the ability to make tools based on their platform, which means they need to be speculative about the types of things their platform might enable.


If a broken tool is useful, do you not use it because it is broken ?

Overpowered LLMs like GPT-4 are both broken (according to how you are defining it) and useful -- they're just not the idealized version of the tool.


Maybe not, if it's the case that your use of the broken tool would result in the eventual undoing of your work. Like, let's say your staple gun is defective and doesn't shoot the staples deep enough, but it still shoots. You can keep using the gun, but it's not going to actually do its job. It seems useful and functional, but it isn't, and it's liable to create a much bigger mess.


So to continue the analogy: if the staple gun is broken and requires more work from you than a working (but non-existent) staple gun would, BUT less work than doing the affixing without the broken staple gun, would you or would you not use it?


But nobody said they wouldn't use it. You said that. You came up with this idea and then demanded other people defend it.

I don't know why "critiquing the tool" is being equated to "refusing to use the tool."

I don't like calling something a strawman, because I think it's an overused argument, but...I mean...


I didn't come up with it nor ask anyone to defend it. I asked a different question about usefulness, and about what it means to him for something to be "broken".

My point is that the attempt to critique it was a failure. It provided no critique.

It was incomplete at the very least -- it assigned it the label of broken, but didn't explain the implications of that. It didn't define at what level of failure it would stop being valuable.

Additionally, I didn't indicate whether or not he would refuse to use it -- specifically because I didn't know, because he didn't say.

We all use broken tools built on a fragile foundation of imperfect precision.


I think you are missing the point. If I do use it, then my result will be a broken and defective product. How exactly is that not clear? That's the point. It might not be observable to me, but whatever I'm affixing with the staple gun will come loose because it's not working right and not sinking the staples in deep enough...

If I don't use it, then the tool is not used and provided no benefit...


It's not clear because it is false and I believe I can produce a proof if you are willing to validate that you accept my premise.

Your CPU, right now, has known defects. It will produce the wrong outputs for some inputs. It seems to meet your definition of broken.

Do you agree with that premise ?


One has nothing to do with the other. There's no rule about all broken tools because they can be broken in different ways. What's so difficult about my hypothetical? I laid it all out for you.


I assumed you understood we reached the end of the usefulness of your hypothetical to the original analogy since, as you said, the tools can be broken in different ways. I tried to introduce a scenario that was more applicable and less theoretical so that we could discuss those particular points.

If we do somehow try to apply your analogy, it would indicate that the LLM output is flawed in a way we cannot scrutinize -- the hidden failures that we aren't detecting (why? It's not specified, I am assuming because we didn't check to see if the tool was "broken" and not meeting some unspecified quality-level; that is, it's an unknown unknown failure mode).

This doesn't really comport with the LLM scenario, where the output is fully viewed, and the outputs are widely understood (that is it is a known failure mode).

This is more closely related to a computing service -- of which you are an active user. You are using a "broken" computer right now, according to your definition of broken, correct?


>This doesn't really comport with the LLM scenario, where the output is fully viewed, and the outputs are widely understood (that is it is a known failure mode).

Yes, it does. People are regularly using LLMs to brief themselves on topics they are ignorant about. Are you actually serious?

>This is more closely related to a computing service -- of which you are an active user of. You are using a "broken" computer right now, according to your definition of broken correct ?

Doesn't really matter, because I'm not relying upon the computer for the verity of its functions


I think you're reading a lot into GP's comment that isn't there. I don't see any ragging on people critiquing it. I think it's perfectly compatible to think we should continually improve on these things while also recognizing that things can be useful without being perfect


I don't think people are disputing that things can be useful without being perfect. My point was that when things aren't perfect, they can also lead to defects that would not otherwise be perceived based upon the belief that the tool was otherwise working at least adequately. Would you use a staple gun if you weren't sure it was actually working? If it's something you don't know a lot about, how can you be sure it's working adequately?


Lying implies an intent to deceive, or giving a response despite having better knowledge -- which I'd argue LLMs can't do, at least not yet. It requires a more robust theory of mind than I'd realistically consider them capable of.

They might have been trained/prompted with misinformation, but then it's the people doing the training/prompting who are lying, still not the LLM.


To the question of whether it could have intent to deceive, going to the dictionary, we find that intent essentially means a plan (and computer software in general could be described as a plan being executed) and deceive essentially means saying something false. Furthermore, its plan is to talk in ways that humans talk, emulating their intelligence, and some intelligent human speech is false. Therefore, I do believe it can lie, and will whenever statistically speaking a human also typically would.

Perhaps some humans never lie, but should the LLM be trained only on that tiny slice of people? It's part of life, even non-human life! Evolution works based on things lying: natural camouflage, for example. Do octopuses and chameleons "lie" when they change color to fake out predators? They have intent to deceive!


Not to say this example was lying but they can lie just fine - https://arxiv.org/abs/2311.07590


They're lying in the same way that a sign that says "free cookies" is lying when there are actually no cookies.

I think this is a different usage of the word, and we're pretty used to making the distinction, but it gets confusing with LLMs.


You are making an imaginary distinction that doesn't exist. It doesn't even make any sense in the context of the paper I linked.

The model consistently and purposefully withheld knowledge it was directly aware of. This is lying under any useful definition of the word. You're veering off into meaningless philosophy that has no bearing on outcomes and results.


Most humans I professionally interact with don't double down on their mistakes when presented with evidence to the contrary.

The ones that do are people I do my best to avoid interacting with.

LLMs act more like the latter, than the former.


Given the misleading presentation by real humans in these "whole teams" that this tweet corrects, this doesn't illustrate any underlying powers by the model


>It's the single biggest problem with LLMs and Gemini isn't solving it.

I loved it when the lawyers got busted for using a hallucinating LLM to write their briefs.


People seem to want to use LLMs to mine knowledge, when really they appear to be next-gen word processors.


LLMs do not lie, nor do they tell the truth. They have no goal as they are not agents.


With apologies to Dijkstra, the question of whether LLMs can lie is about as relevant as the question of whether submarines can swim.


I totally agree with you on the confident lies. And it’s really tough. Technically the duck is made out of air and plastic right?

If I pushed the model further on the composition of a rubber duck, and it failed to mention its construction, then it’d be lying.

However there is this disgusting part of language where a statement can be misleading, technically true, not the whole truth, missing caveats etc.

Very challenging problem. Obviously Google decided to mislead the audience and basically cover up the shortcomings. Terrible behaviour.


Calling the air inside the duck (which is not sealed inside) part of its "material" would be misleading. That's not how most people would interpret the statement and I'm confident that's not the explanation for why the statement was made.


The air doesn’t matter. Even with a vacuum inside it would float. It’s the overall density of “the duck” that matters, not the density of the plastic.


A canoe floats, and that doesn't even command any thought regarding whether you can replace trapped air with a vacuum. If you had a giant cube half full of water, with a boat on the water, the boat would float regardless of whether the rest of the cube contained air or vacuum, and regardless of whether the boat traps said air (like a pontoon) or is totally vented (like a canoe). The overall density of the canoe is NOT influenced by its shape or any air, though. The canoe is strictly more dense than water (it will sink if it capsizes) yet in the correct orientation it floats.

What does matter, however, is the overall density of the space that was water and became displaced by the canoe. That space can be populated with dense water, or with a less dense canoe+air (or canoe+vacuum) combination. That's what a rubber duck also does: the duck+air (or duck+vacuum) combination is less dense than the displaced water.
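
For what it's worth, this is just the textbook buoyancy condition, with V taken as the volume of the displaced region (shell plus whatever does or doesn't fill it):

    \text{floats} \iff m_{\text{object}}\, g < \rho_{\text{water}}\, g\, V
    \iff \bar{\rho} = \frac{m_{\text{object}}}{V} < \rho_{\text{water}}

Only the mass of the solid part enters the numerator, while the whole displaced region enters the denominator, which is why a vented canoe and a sealed duck come out the same way.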


No, it's the density of the object that is less than that of water, not the density of the material. The duck is made of plastic, and it traps air. Similarly, you can make a boat that floats in water out of concrete or metal. It's an important distinction when trying to understand buoyancy.


It also says the attribute of squeaking means it'll definitely float


That's actually pretty clever because if it squeaks, there is air inside. How many squeaking ducks have you come across that don't float?


You could call it clever or you could call it a spurious correlation.


language models do not lie. (this pedantic distinction being important, because language models.)


Good, that video was mostly annoying and creepy. The AI responses as shown in the linked Google dev blogpost are a lot more reasonable and helpful. BTW I agree that the way the original video was made seems quite misleading in retrospect. But that's also par for the course for AI "demos", it's an enduring tradition in that field and part of its history. You really have to look at production systems and ignore "demos" and pointless proofs of concept.


The GPT-4 demo early this year when it was released was a lot less... fake, and in fact very much indicative of its feature set. The same is true for what OpenAI showed during their dev days, so at the very least those demos don't have too much fakery going on, as far as I could tell.


A certain minimum level of jank always makes demos more believable. Watching Brockman wade through Discord during the napkin-to-website demo immediately made the whole thing convincing.

AI is in the "hold it together with hope and duct tape" phase, and marketing videos claiming otherwise are easy to spot and debunk.


>You really have to look at production systems and ignore "demos" and pointless proofs of concept.

While I agree, I wouldn't call proofs of concept and demos pointless. They often illustrate a goal or target functionality you're working towards. In some cases it's really just a matter of allotting some time and resources to go from a concept to a product -- no real engineering is needed, it all exists, but there's capital needed to get there.

Meanwhile, some proofs of concept skip steps and show higher-level function that needs some serious breakthrough work to get to, maybe multiple steps of it. Even this is useful because it illustrates a vision that may be possible, so people can understand and internalize the things you're trying to do or the real potential impact of something. That wasn't done here; it was buried in a side note. That information needs to come before the demo to some degree, without throwing a wet blanket on everything, and it needs to be in the same medium as the demo itself so it's very clear what you're seeing.

I have no problem with any of that. I have a lot of problems when people don't make it explicitly clear beforehand that it's a demo and explain earnestly what's needed. Is it really something that exists today in working systems, where someone just needs to invest money and wire it up, with no new research needed? Or is it missing some breakthroughs -- how many and what are they, how long have these things been pursued, how many people are working on them, what does recent progress look like, and so on (in a nicely summarized fashion)?

Any demo/PoC should come up front with an earnest general feasibility assessment. When a breakthrough or two are needed, the stated uncertainty should skyrocket. If it's just a lot of expensive engineering, then that's also a challenge, but a tractable one.

I've given a lot of scientific tech demonstrations over the years, and the businesses behind me obviously want me to be as vague as possible to pull money in. I of course have some of those same incentives (I need to eat and pay my mortgage like everyone else). Nonetheless, the draw of science to me has always been pulling the veil from deception and mystery, and I'm a firm believer in being as upfront as possible. If you don't lead with disclaimers, imaginations run wild about what can be done today. Adding disclaimers helps imaginations run wild about what can be done tomorrow, which I think is great.


What the Quack? I found it tasty as pâté.


The Gemini demo looks like ChatGPT with a video feed, except it doesn't exist, unlike ChatGPT. I have ChatGPT on my phone right now, and it works (and it can process images, audio, and live audio input). This means Google has shown nothing of substance. In my world, it's a classic stock price manipulation move.


Gemini Pro is available on Bard now.

Ultra is not yet available.


Yeah and have you tried it? It’s as dogshit as the original Bard.


I've been using Gemini in Bard since the launch; with respect to coding it is outperforming GPT4, in my opinion. There is some convergence in the answers, but Bard is outputting really good code now.


You're right. I just tested it with some code prompts and it did surprisingly well.


That's something! I'll try it too. Thanks <3


The Bloomberg article seems to have been taken down and now returns a 404. https://www.bloomberg.com/opinion/articles/2023-12-07/google...


Just an error in the link, here's the corrected version: https://www.bloomberg.com/opinion/articles/2023-12-07/google...


and here's a readable version: https://archive.ph/ABhZi


I watched this video, impressed, and thought: what if it’s fake. But then dismissed the thought because it would come out and the damage wouldn’t be worth it. I was wrong.


The worst part is that there won't be any damage. They'll release a blog post with PR apologies, but the publicity they got from this stunt will push up their brand in mainstream AI conversations regardless.

"There's no such thing as bad publicity."


“There’s no such thing as bad publicity” only applies to people and companies that know how to spin it.

Reading the comments of all these disillusioned developers, it’s already damaged them because now smart people will be extra dubious when Google starts making claims.

They just made it harder for themselves to convince developers to even try their APIs, let alone bet on them.

This was stupid.


That demo was much further on the "marketing" end of the spectrum when compared to some of their other videos from yesterday which even included debug views: https://youtu.be/v5tRc_5-8G4?t=43


You can tell whoever put together that demo video gave no f*cks whatsoever. This is the quality of work you can expect under an uninspiring leader (Sundar) in a culture of constant layoff fear and bureaucracy.

Literally everyone I know who works at Google hates their job and is completely checked out.


Huh? It was a GREAT demo video!

If it had been real, that is.



Thanks. This is the first I'm hearing of a duck demo, and couldn't figure out what it was.



It’s not live, but it’s in the realm of outputs I would expect from a GPT trained on video embeddings.

Implying they’ve solved single token latency, however, is very distasteful.


OP says that Gemini had still images as input, not video - and the dev blog post shows it was instructed to reply to each input in relevant terms. Needless to say, that's quite different from what's implied in the demo, and at least theoretically is already within GPT's abilities.


How do you think the cup demo works? Lots of still images?


A few hand-picked images (search for "cup shuffling"): https://developers.googleblog.com/2023/12/how-its-made-gemin...


Holy crap that demo is misleading. Thanks for the link.


I'll admit I was fooled. I didn't read the description of the video. The most impressive thing they showed was the real-time responses to watching a video. Everything else was about what I expected.

Very misleading, and sad that Google would so obviously fake a demo like this. Mentioning in the description that it's edited doesn't come close to making the fakery clear.


i too was excited and duped about the real-time implications. though i'm not surprised at all to find out it's false.

mea culpa, i should have looked at the bottom of the description box on youtube where it probably says "this demonstration is based on an actual interaction with an LLM"


I'm surprised it was false. It was made to look realistic and I wouldn't expect Google to fake this kind of thing.

All they've done is completely destroy my trust in anything they present.


Is the link to the article broken, and does anyone have it archived somewhere?

I wish people would stop posting Twitter messages to HN and instead provide a link directly to the original article. What's next, an HN post pointing at an Instagram post?



I don't get it either; my browser loaded for a good 15 seconds and made 141 requests, fetching almost 9 MB of resources, to show me exactly the same content as provided in the OpenGraph tags plus a redirect to a Bloomberg link. It feels like a slap in the face to open such a phishing-style link: just a useless redirect with nine million bytes of overhead.


How is this not false advertising?


Or worse, fraud to make their stock go up

edit: s/stuck/stock


Why did you have to mention your edit?


It was a late one


I suppose it's not false advertising, since they don't even claim to have released a product that can do this yet -- Gemini Ultra won't be available until an unspecified time next year.


It's still false advertising.

This is common in all industries. Take gaming, for example. Game publishers love this kind of publicity, as it creates hype, which leads to sales. There have been numerous examples of this over the years: Watch Dogs, No Man's Sky, Cyberpunk 2077, etc. There's a period of controversy once consumers realize they've been duped, the company releases some fake apology and promises or doubles down, but they still walk out of it richer, and ready to do it again next time.

It's absolutely insidious, and should be heavily fined and regulated.


You're right, it's astroturfing a placeholder in the market in the absence of product. The difference is probably just the target audience - feels like this one is more aimed at share-holders and internal politics.


possibly securities fraud though. Their stock popped a few percent on the back of that faked demo.


It's a software demo. If you ever gave an honest demo, you gave a bad demo. If you ever saw a good and honest demo, you were fooled.


As a programmer, I'd say that all the demos of my code were honest and representative of what my code was doing.

But I recognize we're all different programmers in different circumstances. But at a minimum, I'd like to be honest with my work. My bosses seem to agree with me and I've never been pressured into hosting a fake demo or lie about the features.

In most cases, demos are needed because there's that dogfood problem. Its just not possible for me to know how my (prospective) customers will use my code. So I need to show off what has been coded, my progress, and my intentions for the feature set. In response, the (prospective) customer may walk away, they may have some comments that increases the odds of adoption, or they think its cool and amazing and take it on the spot. We can go back and forth with regards to feature changes or what is possible, but that's how things should work.

------------

I've done a few "I could do it like this" demos, where everyone in the room knew that I didn't finish the code yet and its just me projecting into the future of how code would work and/or how it'd be used. But everyone knew the code wasn't done yet (despite that, I've always delivered on what I've promised).

There is a degree of professional ethics I'd expect from my peers. Hosting honest demos is one of them, especially with technical audience members.


I prefer to let my software be good enough to let it speak for itself without resorting to fraud, thank you ver much.



There was also the cringey "niiice!", "sweeeet!", "that's greaatt", "that's actually pretty good" responses from the narrator in a few of the demo videos that gave them the feel of a cheap 1980's TV ad.


It really reminds me of the Black Mirror episode Smithereens, with the tech CEO talking to the shooter. Tech people really struggle with empathy, not just one on one but with the rest of the outside world, which is predominantly lower income and without a college education. Paraphrased, the Black Mirror episode went like:

[Tech CEO read instructions to "show empathy" from his assistant via Slack]

CEO: I hear you. It must be very hard for you.

Shooter: Of course you fucking hear me, we're on the phone! Talk like a normal person!


I remember that conversation! Man, that was a great episode.


I missed the disclaimer. So, when watching it, I started to think "Wow, so Google is releasing their best stuff".

But then I soon noticed some things that were too smooth, so seemed at best to be cherry-picked interactions occasionally leaning on hand-crafted situation handlers. Or, it turns out, faked.

Regardless of disclaimers, this video seems misleading to be releasing right now, in the context of OpenAI eating Google's lunch.

Everyone is expecting Google to try to show they can do better. This isn't that. This isn't even a mocked-up future-of-HCI interaction concept video, because it's not showing a vision of what people want to do --- it's only showing a demo of technical capabilities.

It's saying "This is what a contrived tech demo (not application vision concept) could look like, but we can't do it yet, so we faked it. Hopefully, the viewer will get the message that we're competitive with OpenAI."

(This fake demo could just be an isolated oops of a small group, not representative of Google's ability to rise to the current disruption challenge, I don't know.)


I knew immediately this was just overhyped PR when I noticed the author of the blogpost is Sundar.


OK I get that everyone’s hype sensitive and I absolutely remain to be convinced on Gemini’s actual ability

BUT

The fact this wasn’t realtime or with voice is not the issue. Voice to text could absolutely capture this conversation easily. And from what I’ve seen Gemini seems quicker than GPT4

Being able to respond quicker and via voice chat is not actually a big deal.

The underlying performance of the model is what we should be focussing on


I disagree.

One of the major issues in LLMs is the economics; a lot of people suspect ChatGPT loses money on every user, or at least every heavy user, because they've got a big model and A100 GPUs are expensive and in short supply.

They're kinda reluctant to have customers, with API rate limits galore, and I've heard people claiming ChatGPT has lost the performance crown having switched to a cheaper-to-run model.

If Google had a model that operated on video in realtime, that would imply they've got a model that performs well and is also very fast, or that their 'TPUs' outperform the A100 quite a bit -- either of which would be a big step forward.


Even if you'd be inclined to shrug off the fact that this wasn't real-time, voice- and video-based -- which you shouldn't, because the underlying implications for performance would be huge -- there's still the matter of the prompts used. The prompts shown are miles apart and significantly misrepresent the performance of the underlying model.

It goes from a model being able to infer a great deal at the level of human intelligence to a model that needs to be fed essential details, and that doesn't do much inferring.

I get the feeling that many here on HN who are just shrugging it off don't realize how much of the “demo” was faked. Here’s a decent article that goes into it some more: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...


Exactly.

Corporate tech demo exaggerates actual capabilities and smoothes over rough edges? Impossible, this is unprecedented!!

The Apple vs Google brand war is so tiresome. Let's focus on the tech.


To be clear - I’m not saying this makes Gemini good. Just that it isn’t bad for these reasons!


While this might just be a bit of bad PR now, it will eventually be a nothing burger. Remember the original debut of Apple's Siri, for which Apple also put out a promotional demo with greatly exaggerated functionality? People even sued Apple over it and lost.

As much as I hate it, this is absolutely fine by our society's standards. https://www.theregister.com/AMP/2014/02/14/apple_prevails_in...


There's a vast difference between advertising a product by slightly shortening the sequences and cutting out failed prompts, and completely misrepresenting the product at hand to a degree that the depiction doesn't resemble the product at all[0].

The former is considered Puffery[1] and is completely legal, and the latter is straight up lying.

0: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...

1: https://en.wikipedia.org/wiki/Puffery


At the 5:35 point, the phone screen rotates before he touches it. https://youtu.be/UIZAiXYceBI?t=335


This is what convinced me that it's all a hoax.


It seems like the fake video did the trick, their stock is up 5.5% today.


I don’t understand why Gemini is even considered “jaw-dropping” to begin with. GPT-4V has set the bar so high that all their demos and presentations paled in comparison. And it’s available for anyone to use. People have already built mind-blowing demos with it (like https://arstechnica.com/information-technology/2023/11/ai-po...).

The entire launch felt like a concentrated effort to “appear” competitive to OpenAI. Google was splitting hairs talking about low single digit percentage improvement in benchmarks. Against a model that has been out for over 6 months.

I have never been so unimpressed with them. Not only has OpenAI managed to snag this one from under Google’s nose, IMO - they seem to be defending their lead quite well. Now that is something unmistakably remarkable. Color me impressed!


Some other commenter, a former Googler, alluded a while back to figuring out the big secret and being thrown into a tizzy by the resulting cognitive dissonance once they realized what they'd been buying into. It's never about making a good product. It's about keeping up with the Joneses in the eyes of tech investors. And just look at the movement in the stock today as a result of this probable lemon of a product: nothing else mattered except keeping up appearances. CEOs make historic careers optimizing companies for appearances over function like this.


I get that. But I don’t get why journalists who cover the industry and are expected to get it would call this “jaw dropping”. Maybe I am reading too much into it. It was likely added to increase the shock factor.


I didn't believe Google's presentation offhand because I don't care anymore, especially since it comes from them. I just use tools and adapt. Copilot helps me automate boring tasks; it can't help much with new stuff, so I actually discovered I often do "interesting" work. I use GPT 3.5/4 for everything but work, and it's been a blessing -- the best suggestion engine for movies, books, and music with just a prompt and without the need for tons of data about my watch history (looking at you, YouTube). In these strange times I'm actually learning a lot more; productivity is more or less the same as before LLMs, but annoying tasks are relieved a bit. All of that without the hype. Sometimes I laugh at Google -- it must be a real shit show inside that mega corporation -- but I kind of understand the need for marketing editing: having a first-class ticket on the AI train is so important for them, as it seems they see it as an existential threat. At least it seems so, since they decided to take the risk of lying.


Any sufficiently-advanced technology is indistinguishable from a rigged demo.


Fake it til you make it, then keep faking it.


I guess a much better next step is to compare how GPT4V performs when asked similar prompts. Even if mostly staged, this is very impressive to me -- not so much for the current tech, but for how much leverage Google has to win this race in the long run because of its hardware presence.

The more these models improve, the less friction and the faster interactions we will want. This means that in the long term, having to open an app and ask a question is not gonna fly compared to just pointing your phone camera at something, asking a question, and getting an answer that's tailored to everything Google knows about you in real time.

Apple will most likely also roll their own in house solution for Siri instead of relying on an external company. This leaves OpenAI and the other small companies not just competing for the best models but also on how to put them in front of people in the first place and how to get access to their personal information.


> Even if mostly staged, this is very impressive to me -- not so much for the current tech, but for how much leverage Google has to win this race in the long run because of its hardware presence.

I think you're missing too much information to form a reasonable opinion on the situation. Google is using editing techniques and specific scripting to try to demonstrate they have a sufficiently powerful general AI. The magnitude of this claim is huge, and the fact that they're faking it should be a likewise enormous scandal.

Summing this up as "well, I guess they're doing better than XYZ" discounts the absurd context of all this.


This is endemic to public product demos. The thing never works as it does in the video. I'm not excusing it, I'm saying: don't trust public product demos. They are commercials, they exist to sell to you, not to document objectively and accurately, and they will always lie and mislead within the limits of the law.


For more details about how the video was created, see this blog post: https://developers.googleblog.com/2023/12/how-its-made-gemin...


Even a year ago, this advert would have been obvious puffery.

But right now, all the bits needed to do this already exist (they just need to be assembled and -- to be fair -- given a LOT of polish), so it would be somewhat reasonable to think that someone had actually Put In The Work already.


Just how many lives does Sundar have? Where is the board?


Counting their bonuses?


Well, sometimes I have this "Google Duplex: A.I. Assistant Calls Local Businesses To Make Appointments" feeling.

https://www.youtube.com/watch?v=D5VN56jQMWM


The hype really is drowning out the simple fact that basically no one really knows what these models are doing. Why does it matter so much that we include auto-correlation of embedding vectors as the "attention" mechanism in these models? And that we do this sufficiently many times across all the layers? And that we blindly smoosh values together with addition and call it a "skip" connection? Yes, you can tell me a bunch of stuff about gradients and residual information, but tell me why any of this stuff is or isn't a good model of causality.
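
To make the jargon concrete, here's a minimal numpy sketch of the two pieces I'm hand-waving about -- a single self-attention layer (the auto-correlation of embedding vectors) followed by a residual "skip" connection. The shapes, names, and the single-head, no-normalization setup are simplifications of my own, not any particular model's:

    import numpy as np

    def softmax(x, axis=-1):
        # numerically stable softmax
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention_block(X, Wq, Wk, Wv):
        # X: (seq_len, d_model) token embeddings
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        d_k = Q.shape[-1]
        # every token scores every other token ("auto-correlation" of embeddings)
        scores = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
        attended = scores @ V
        # the "skip" connection: blindly add the input back in
        return X + attended

    rng = np.random.default_rng(0)
    d = 8
    X = rng.normal(size=(5, d))
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    print(attention_block(X, Wq, Wk, Wv).shape)  # (5, 8)

None of which says anything about why stacking dozens of these layers should be a good model of causality, which is the point.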


I didn't even look at the demo. And not due to lack of interest in LLMs (I'm an academic working in NLP, and I have some work on LLMs).

The thing is that for all practical intents and purposes, if I can't try it, it doesn't exist. If they claim it exists, they should show it, a video or a few cherry-picked examples don't prove anything. It's easy to make a demo making even Eliza look like AGI by asking the right questions.


What is clear is that Google really has no 'God model' that they were holding back all along.

Gemini Ultra is barely beating ChatGPT in their manufactured benchmarks, and this is all that they got.

What this means is that the claims, including from people at Google, that they have better models but aren't releasing them in the name of AI safety now ring hollow: at least in the realm of LLMs, Google DeepMind had nothing all along.


There is a possibility of dataset contamination on the competitive programming benchmark. A nice discussion on the page where AlphaCode2 was solving the problems https://codeforces.com/blog/entry/123035

The problem shown in the video was reused in a recent competition (so it could have been available in the dataset).


Wow - my first thought was I wonder what framerate they're sending video at. The whole demo seems significantly less impressive in that case.



Imagine being google and having 100 BILLION+ in liquid cash, tens of thousands of the best engineers world-wide, and everything you could possibly need for running tech products. Yet being completely unable to launch anything new or worthwhile. Like, how tf does that even happen? Is Google the next Kodak?


I really thought this was a realtime demo.

Shame on them :(


Bloomberg link not working; here's TechCrunch: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...


it was obviously marketing material, but if this tweet is right, then it was just blatant false advertising.


Google always does fake advertising. “Unlimited” google drive accounts for example. They just have such a beastly legal team no one is going to challenge them on anything like that.


What was fake about unlimited google drive? There were some people using petabytes.

The eventual removal of that tier and anything even close speaks to Google's general issues with cancelling services, but that doesn't mean it was less real while it existed.


What about when Gmail was released and the storage was advertised as increasing forever, but they increased it ever more slowly and then stopped increasing it altogether?


Oh, long before google drive existed?

I don't remember the "increasing forever" ever being particularly fast. I found some results from 2007 and 2012 both saying it was 4 bytes per second, <130MB per year.

So it's true that the number hasn't increased in ten years, but that last increase was +5GB all by itself. They've done a reasonable job of keeping up.

Arguably they should have kept adding a gigabyte each year, based on the intermittent boosts they were giving, but by that metric they're only about 5GB behind.
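
As a quick back-of-the-envelope check on that 4 bytes/second figure:

    4\ \tfrac{\text{bytes}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} \times 365\ \tfrac{\text{days}}{\text{year}} \approx 1.26 \times 10^{8}\ \text{bytes} \approx 126\ \text{MB/year}

which is indeed just under the 130MB/year figure quoted above.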


This must've been shot by one of the directors who did Apple's new show Extrapolations. It's a very plausible illustration of AI in daily life, though, despite the aggressive climate change claims made in it.

But neither AI nor climate is there yet…


That show’s most out-there predictions all take place after 2050. Your criticism isn’t relevant.


Well, Google has a history of faking things... so I'm not surprised. I expected that.

All companies are just yelling that they're "in" the AI/LLM game.. If they don't, share prices will drop.


The demo may be fake, but people genuinely love it. The top comment on YouTube for that demo is still:

"Absolutely mindblowing. The amount of understanding the model exhibits here is way way beyond anything else."


I’ll assume Gemini is vaporware unless I can actually use it.


The more Google tries to over-hype stuff, the stronger the impression it gives me that they are well behind OpenAI. Time to STFU and focus on working on stuff.


Google did the same with the Pixel 8 Pro advertising -- they showed stuff like photo and video editing that people couldn't replicate on their phones.


Google is done, they can't compete in this space.


I looked at it as a good aspirational target for five years from now. It was obvious the whole video was edited together, not real time.


This is just a tweet that makes a claim without backing, and links to an article that was pulled.

Can we change the URL to the real article if it still exists?


the original linked article in the tweet [0] now returns 404 for me

[0] - https://www.bloomberg.com/opinion/articles/2023-12-07/google...


If you've seen the video, it's very apparent it's a product video, not a tech demo. They cut out the latencies to make a compelling product video.

I wasn't at all under the impression they were showcasing TTS or low latencies as product features. I don't find the marketing misleading at all, and find these criticisms don't hit the mark.

https://www.youtube.com/watch?v=UIZAiXYceBI


It's not just cutting. The answers were obtained by taking still photos and inputting them into the model together with detailed text instructions explaining the context and the task to the model, giving some examples first and using careful chain-of-thought style prompting. (see e.g. https://developers.googleblog.com/2023/12/how-its-made-gemin...) My guess is that the video was fully produced after the Gemini outputs were generated by a different team, instead of while or before.
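
In other words, the interaction was closer to the sketch below than to a live video feed: a few hand-picked stills interleaved with explicit instructions and a worked example. This is only an illustration of the prompting structure the blog post describes; the function, file names, and wording are invented and it's not any real Gemini API:

    def build_prompt(frames, question):
        parts = [
            "You are looking at a sequence of still images taken from a video.",
            "Think step by step about what is happening, then answer briefly.",
            # a worked example first, so the model copies the expected format
            "Example frame: [image of a hand placing a ball under the middle cup]",
            "Example answer: The ball is under the middle cup.",
        ]
        # interleave the hand-picked frames with labels
        for i, frame in enumerate(frames, start=1):
            parts.append(f"Frame {i}: [image: {frame}]")
        parts.append(f"Question: {question}")
        return "\n".join(parts)

    print(build_prompt(["shuffle_1.png", "shuffle_2.png", "shuffle_3.png"],
                       "Which cup is the ball under now?"))

Strip away the editing and the task becomes ordinary few-shot, chain-of-thought prompting over images, which is part of why the answers look so much better than what you'd get by pointing a camera at the scene and talking.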


I imagine the model also has some video embeddings, e.g. for the example where it needed to find where the ball was hiding.


I was looking at this demo today and was wondering the same. It looked way too fast to be calculating all of that info.


I wanted to see what's really going on, but the Bloomberg article linked in the tweet seems to have been taken down right now.


Google Gemi-lie


Lol, could have done without the cocky narration. "I think we're done here."


The whole launch is cocky. Bleh. Stick to the engineering.


why didn't they do so? what's the challenge? I suppose they could've programmed in some prompt indicator (like "OK Google" but less obvious), then that demo could have been technically feasible.


That was my suspicion when I first saw it. It's really an impressive demo, though.


If the truck doesn't have a working engine, we can just roll it down the hill.

Brilliant idea!


Remember when they faked that Google Assistant booking a restaurant thing too.


How was that fake?


It was actually a human doing the entire call.


Mhm


AI: artificial incompetence


It was in fact done by a duck.

I for one, welcome our new quack overlords.


Link to the Bloomberg article from the Tweet is 404 now.


Bloomberg link in Xeet is 404 for me (Bangalore).


I can't find the original article anymore