Bard is now Gemini, and we’re rolling out a mobile app and Gemini Advanced (blog.google)
668 points by chamoda 11 months ago | 1098 comments



I'm surprised they got rid of the Bard name. It struck me as a really smart choice since a Bard is someone who said things, and it's an old/archaic enough word to not already be in a zillion other names.

Gemini, on the other hand, doesn't strike me as particularly relevant (except that perhaps it's a twin of ChatGPT?), and there are other companies with the same name. EDIT: I can see the advantage of picking a name that, like "Google" also starts with a "G".

Just as one data point, bard.com redirects to some other company (bd.com), whereas Gemini.com is a company by that name.

I'd be curious about the scuttlebutt on how this decision was reached!


I'm not surprised -- I thought Bard was terrible branding. It's all associations with Shakespeare and poetry and medieval England, and as much as I might personally enjoy those, it's extremely backwards-looking, with archaic connotations. Also it sounds close to "beard" -- hairy stuff.

Gemini sounds like the space program -- futuristic, a leap for mankind. It's got all the right emotional associations. It's a constellation, it's out in space, it's made of stars. Plus it contains "gem" which feels fancy, valuable, refined.

I'm not saying Gemini is the best name I've ever heard or even close to it, but it feels 100% appropriate, in a way that Bard does not.


Interesting. I don't like the name at all because it makes me think of people who take horoscopes seriously. Your impression seems to be untainted by that, which is nice.


Same here, I think I'm more on your side which I guess goes to show how all over the map subjective reactions can be.

But first of all, I thought the whole idea of Alphabet was a kind of cheeky way of telling the world you had a portfolio of projects, one for each letter. "B is for Bard" would be perfect, and Gemini is about as incompatible as it gets given that G is already claimed.

I also find it bizarre to say that association with Shakespeare, or the association with whimsical poetic expression is in any sense a bad thing. It's a clean, simple, fun name that's remarkably short and surprisingly unclaimed. And I don't even strongly associate it specifically with Shakespeare, that's like a background association as far as I'm concerned.

I think perhaps the real talk here is that Bard was kind of an emergency response to ChatGPT, but also people have some pretty specific and distinct experiences with Bard and have an idea of its quality, and Google just needs to turn the page on the perception of Bard.


The name bard is tainted by ridicule.

Besides that, personally I always thought it was a bad fit. It sounds old and outdated to those that do not know what the word means and wrong to those that do: a bard sings songs and maybe does poetry.

A bard does not help or assist you. A bard can be a creative person, but is generally not considered especially wise or knowledgeable. A bard is also always a man, which does not gel very well with modern sensibilities.

I can see why they dropped it.


Agreed. Also wondering what name it'll have next year.


"Bard" both makes lofty claims of being as good as Shakespeare and is as nerdy as being a reference to a D&D character class.

Who knows what Gemini is a reference to, other than mythology and astrology.

I keep waiting for someone to name theirs GePeTto... though perhaps it's funnier hearing the pronunciation of GPT as "Chat Jippity" on YouTube.


> I don’t like the name at all because it makes me think of people who take horoscopes seriously.

spoken like a true Sagittarius


Funny, my mind goes to NASA and the Gemini program.


When I think of Gemini, I think of a digital twin: something which can work with what I have on Google, think like me, or do things the way I do.

Bard was nice, too, but it was like another thing, separated from me. Gemini sounds more cooperative.


Ugh...such an Aquarian thing to say. /s


Gem-in-eye? Ouch!

Also, Gemini was appropriate for the space program because (a) there were two astronauts in the capsule and (b) because of the constellation, "aiming for the stars" and all that. For the Google project however I can't come up with a plausible explanation - Google doesn't even try to give a reason for the name either.


From The Decoder: >In April 2023, Alphabet announced the merger of its two AI units, Google Brain and Deepmind. The resulting Google Deepmind was to focus on developing large multimodal AI models. It was a big move that showed how much pressure Google was under due to the massive success of ChatGPT. Jeff Dean, head of Google Brain until the merger with Deepmind, became the new merger's chief scientist, with a direct line to Alphabet CEO Sundar Pichai. Dean now explains that the name Gemini, Latin for "twin," is directly related to the merger.

From Jeff Dean's Twitter:

>Gemini is Latin for "twins".

>The Gemini effort came about because we had different teams working on language modeling, and we knew we wanted to start to work together. The twins are the folks in the legacy Brain team (many from the PaLM/PaLM-2 effort) and the legacy DeepMind team (many from the Chinchilla effort) that started to work together on the ambitious multimodal model project we called Gemini, eventually joined by many people from all across Google. Gemini was also the NASA project that was the bridge to the moon between the Mercury and Apollo programs.

The Decoder article - https://the-decoder.com/how-googles-gemini-ai-model-got-its-...

Jeff Dean's Twitter Post - https://twitter.com/JeffDean/status/1733580264859926941


It is funny how there is all this elaborate discussion here when it was just corporate officers jerking each other off all along. Occam’s razor.


Bicameral would have been better, IMO


It's more on the nose but probably less right from a marketing perspective.


During Project Gemini, it was pronounced Gem-in-ee.

https://youtu.be/JeAUx6-vSmc?feature=shared


Which is closer to the proper Latin pronunciation (which would have a hard G although ecclesiastical Latin would have a soft G).


I mean it makes sense to me. The AI is your digital assistant. It's a relationship between two minds, man and machine.


I see that angle, but those two things are complementary, not identical. It's not a clone of me — it's something that I ask questions of because I don't know the answer. If it were pitched as a bot that would draft email responses for me automatically, then maybe I would see it fitting better as my 'twin'.


> it's something that I ask questions of because I don't know the answer

I think you're reading too much into what a twin is. It's not a copy! Real-life twins ask each other questions all the time, because just because one of them learns something doesn't mean the other one automatically learns it too via mind-meld.


I'm not saying all twins are identical. But they are all of the same species. What I want in an assistant is that it is very different from me. It has perfect memory and knows lots of things that I don't know.


You're using an overly strict definition of the word and over-interpreting it to boot. Consider this definition:

> something containing or consisting of two matching or corresponding parts.


It sounds like you're thinking of the adjective form of the word, which is why you are thinking of a much broader definition. I was using the noun form, since Gemini refers to twins, the noun.

If we were talking about the word "twins" in the abstract, the broader definition might make sense. But we aren't — we're talking about Gemini. If that conjures up general notions of "matched-ness" for you, that's great. When I think about Gemini, I think about mythological twins. I don't think about corresponding parts that complement each other.

For a product name to be successful, it should appeal to a wide range of people. If I'm way out in left field on this one, perhaps they've found a great name. But I would point out that my comment, which is critical of the name change, is the very first comment in the entire thread. I would take that as evidence that most people don't see the name and think "oh, it's like complementary items, like my brain and the AI".

To be clear, I was shocked to see this comment above all substantive discussion of the new release. I would have thought it would have been buried under examples of ways in which Ultra is better/worse than some other LLM.


> For a product name to be successful, it should appeal to a wide range of people.

Honestly, Google is called "Google". ChatGPT is called "ChatGPT". Maybe it'll be a joke, maybe people won't think about what they're calling it after 30 seconds.

This conversation is taking itself a bit too seriously for what's drifting into Pepsi logo gravitational pull territory, though.

> But I would point out that my comment, which is critical of the name change, is the very first comment in the entire thread.

Sure. Everyone has an opinion on what color the bike shed should be, too.


> Gemini sounds like the space program -- futuristic, a leap for mankind

80 years ago, sure.


> Gemini sounds like the space program

LOL, maybe I'm a bit jaded but Gemini sounds like some kind of cryptocurrency scam.


With good reason! According to another commenter, when you search for the Gemini app, you get crypto apps before the Google one: https://news.ycombinator.com/item?id=39316306


I hear echoes of astrology, personally, and not good ones.


When I read new thread responses, I briefly thought that I wrote[1] your reply and was confused lol. Great minds think alike. I feel vindicated about my weird opinion.

[1] https://news.ycombinator.com/item?id=39306764


Ha, that's too funny -- I missed yours somehow or else I would have commented underneath it.


This. It creeps women out. It's difficult enough getting them to use an AI tool.

My wife commented on this when she saw it in Google News. Something about some dude in a medieval pub.


Bard doesn't creep women out. That's an extraordinary assumption you're making based on one anecdote. Not everything needs to be needlessly politicized.

The most popular TTRPG has Bard as a class, and it (like every other class) is as gender-neutral as you can get.


> It creeps women out. It's difficult enough getting them to use an AI tool.

Anecdotally, my wife routinely uses chatGPT for workout and recipe ideas.

She may be a bit of an outlier though, compared to others - she also runs GNU/Linux! :)


The alliteration beauty of Google Gemini cannot be denied.


I think alliteration applies to pronunciation, not orthography. For example, "ceaseless sun" is alliterative even though it is spelled with both C and S. I wonder if there is a word for the orthographic counterpart, which you describe here (and which I note in another comment as the benefit of both starting with G).


Only if you pronounce it with a hard G, like in GIF.

Or have I been mispronouncing ‘Google’ all this time?


We all say J-oogle, but only when you're not around.


Great, now it's not fun anymore.


jif?


Gemini as a zodiac sign: "Smart, passionate, and dynamic, Gemini is characterized by the Twins, Castor and Pollux, and is known for having two different sides they can display to the world. Expert communicators, Gemini is the chameleon of the Zodiac"

Which is pretty on the nose for an AI project: a chameleon with two different sides (good/evil?) and an expert communicator.


Why not Delphi? Isn't that more like what they are trying to create, an oracle? And it's like HAL, one step ahead of IBM. Is Delphi just too on-the-nose? Yes, it is also a programming environment; I have many fond memories of Borland Delphi.


> Gemini sounds like the space program

That isn't at all the association I have with that word. I think of the astrological sign instead, so to me the association is pseudoscience and a hint of being bipolar.

This sort of thing is part of what makes naming things difficult. You can't count on any name having the same connotations to everyone.


Even though I liked Bard, it is only one (extra) letter away from being the word "bad". "Bard" is cooler IMO, but "Gemini" starts with "G", has "gem" (a rare, valuable thing) in it, and sounds pretty. Personally I don't care at all either way, though.


It's a hint about how to use it if you want to circumvent the censorship.


Though I agree Bard isn't the best name, Gemini is a worse name.

Bard has connotations, but by itself it makes one think of a person who talks well.

WTF is Gemini? It is a twin, but this is a singular product. Beyond that it's just generic. It doesn't tell you what it does at all.


Not with regular people. Bard is just "old" to them.

Think of the syllables. Copilot. Gemini. They need to be close to the market leader on a subconscious level.


When the competition is called "ChatGPT", maybe the naming isn't as important to normal people as we think?


Microsoft is the competition. ChatGPT has been embraced and extended.


I read Gemini as twins, with the user and the AI making a pair.

Works for me.


Big idea but maybe they should have just named it Google


They kind of didn't name their OK Google assistant anything other than "Assistant". What about ChatGooglePT?


I'm sure "Bard" was primarily a Shakespeare reference (The Bard of Avon, frequently just The Bard), and I liked it too. An appropriate name for a technology that's all about language.

Gemini sounds cool and sci-fi though, and maybe it's a bit easier to localize since it's just straight Latin.


To me, bard just sounds phonetically gross. Reminds me of "fart" or "beard." It calls to mind medieval stuff: the Monty Python mud scene, Skyrim's most annoying NPCs, plucking lutes. But Gemini? That sounds like a legendary space mission, a collective engineering push against the boundaries of human knowledge.

I do not have refined tastes. My b.


When I hear "bard", I think of this guy from the Asterix comics first: https://asterix.com/en/portfolio/cacofonix/ - who is notorious for getting on everyone's nerves with his constant singing.

> We are not talking here about the rain he brings on each time he exercises his vocal cords, but rather about the prevailing atmosphere in the village: when it is time to party, when wild boar are roasting on the spit, you can be sure to find Cacofonix tied hand and foot with a gag in his mouth.


I remember when the iPad was announced, and everyone said that people would only ever think of feminine products when they heard the name. It might have been true for a few months, but now it seems quaint that we ever had such concerns.


Still, what was wrong with iSlate?


Bard is really funny to me to make fun of. It feels like the discount version of ChatGPT. Like the way that (ironically) TV shows would get Microsoft sponsorship and the characters would say "oh, you should Bing that", a phrase no human would normally say, I like to say "ah, let me see what Bard thinks about this".


> To me, bard just sounds phonetically gross. Reminds me of “fart” or “beard.”

WTF? Do people normally think about words in this way, utterly divorced from their meaning?


Yes, people that can hear similarities between words do that.


Understand that this is not condescending in any way, as I do not have this experience.

If there are these "feelings" around these words, how is any sentence correctly taken at face value? How does one communicate to these people the direct and correct meaning of the terms used?

For example, "sentence" sounds like "seance"; do they feel like I'm asking the spirits of the dead?

"Correct" sounds like "wrecked"; do they assume that everything is broken in the above sentence?

Is communication fraught with unknown minefields of unintended emotions and misunderstandings?


Not at all, these "unintended" emotions can be ignored for the most part. But if you ask me, "Google" is, to my foreign ear, one of the stupidest brand names I know of, due to its phonetic resemblance to some words from my native tongue.


> [Gemini] sounds like a legendary space mission

Well, it is one. I wish they'd choose a slightly more unique name but camping on well-known words is a beloved tech tradition.


Try saying it non-rhotically, like a British television presenter.


Sounds like "bot", which is good from a topical perspective, but bad from a false-positive perspective.


If you really give it some gusto ("baaaaaaahuhhd") nobody will confuse them :-)


That sounds closer to a working class Massachusetts pronunciation.


Yes, just in time for a Super Bowl commercial: Smaht Bahd


Barti the only bard to me


Bards were the people who kept history and genealogy before written history. Think like Homer rather than Shakespeare. I think the name was meant more to evoke the idea that the AI is a repository of all linguistic knowledge in the same way that the bard was. And maybe also the idea that the AI was at your service in the same way the bard was at the service of the ruler.


It's not a bad name, but personally when I first heard the name Bard I chuckled because LLMs had already come under so much criticism for their tendency to embellish the truth or say stuff that is just straight up false but sounds cool.


I thought “Bard” was an Asimov reference: https://en.m.wikipedia.org/wiki/Someday_(short_story)

(on top of the more obvious references)


It's too close a match for it not to be

> The story concerns [...] an old Bard, a child's computer whose sole function is to generate random fairy tales. The boys download a book about computers into the Bard's memory in an attempt to expand its vocabulary, but the Bard simply incorporates computers into its standard fairy tale repertoire.


Well the ending sure sounds like an LLM getting stuck:

> "the little computer knew then that computers would always grow wiser and more powerful until someday—someday—someday—…"

https://blog.gdeltproject.org/llm-infinite-loops-failure-mod...


It also rhymes with Card as in Orson Scott Card.


"Gemini" must refer to its inherently multimodal origins?

It's not a text-based LLM that was later adapted to include other modalities. It was designed from the start to seamlessly understand and work with audio, images, video and text simultaneously. Theoretically, this should give it a more integrated and versatile understanding of the world.

The promise is that multimodality baked in from the start, instead of bolting image recognition on to a primarily text-based LLM, should give it superior reasoning and problem-solving capabilities. It should excel at complex reasoning tasks to draw inferences, create plans, and solve problems in areas like math and programming.

I don't know if that promise has been achieved yet.

In my testing so far, Gemini Advanced seems equivalent to ChatGPT 4 in most of my use cases. I tested it on the last few days' worth of programming tasks that I'd solved with ChatGPT 4, and in most cases it returned exactly what I wanted on the first response, compared with the lengthy back-and-forth required with ChatGPT 4 to arrive at the same result.

But when analyzing images Gemini Advanced seems overly sensitive and constantly gives false rejections. For example, I asked it to analyze a Chinese watercolor and ink painting of a pagoda-style building amidst a flurry of cherry blossoms, with figures ascending a set of stairs towards the building. ChatGPT 4 gave a detailed response about its style, history, techniques, similar artists, etc. Gemini refused to answer and deleted the image because it detected people in the image, even though they were very small, viewed from the back, no faces, no detail whatsoever.

In my (limited) testing so far, I'd say Gemini Advanced is better at analyzing recent events than ChatGPT 4 with Bing. This morning I asked each of them to describe the current situation with South Korea possibly acquiring a nuclear deterrent. Gemini's response was very current and cited specific statements by President Yoon Suk-yeol. Even after triggering a Bing search to get the latest facts, the ChatGPT 4 response was muddy and overly general, with empty and obvious sentences like "pursuing a nuclear weapons program would confront significant technical, diplomatic, and strategic challenges".


It seems odd to me that that would necessarily work better, considering that humans evolved different capabilities many millennia apart and integrated them all with intelligence comparatively late in the evolutionary cycle. So it's not clear that multimodal-from-the-get-go is a better strategy than bolting on extra modalities over time. It could be, though, since technology is built differently from evolution, but it's interesting to consider.


I get all the multi-modality stuff, but what is it about the word "Gemini" that invokes that, to you?


The constellation Gemini gets its name directly from the Greek mythological twins, Castor and Pollux.

Each twin had different capabilities. Pollux was a powerful warrior while Castor was an intellectual tactician.

The twins possessed an extraordinary fraternal bond, each loyal and devoted to protecting the other.

Together, they accomplished what they couldn't do individually. Their combined strengths made them far more effective than either could be alone.

Just as text, images, audio and video convey different knowledge, relationships and reasoning than text by itself, their combined strengths in a single model should be more powerful than any model trained on only one modality.


Oh cool. I didn't know the lore of the Gemini twins had been so developed.


Gemini, or the Twins, is a deeply symbolic name for anyone who knows Greek history. It's the story of Castor and Pollux, and in many versions of the story one brother was killed and the other begged for him to come back. It's ominous to use this brand name for AI.

It's also associated with the Gemini killer, and with Joseph Testa and Anthony Senter, who were famous as the mafia's Gemini twins hitmen.

I think better brands could have been had.

It does sound like some battlefield AI system from Robotron. “Sir, Gemini is charged and ready for battle.”


For me it's associated with Gemini crypto and their horrible Gemini Earn investments in Genesis: https://www.web3isgoinggreat.com/?id=gemini-genesis-and-dcg-...


The crypto Gemini was named after the Winklevoss twins.


If that was the case it would have been named Winklevii. :P


Gemini was a stepping stone to a moonshot, which is almost certainly why the name was chosen.

Edit: another poster shared the etymology, the merger between Google Brain and DeepMind. I shall eat my words.


Perhaps. Corporate entomologies tend to be very well rehearsed stories, and I’ve been around the valley long enough to know those stories aren’t always the whole story.

I would encourage you to read the Kissinger / Schmidt book before settling your opinion.

That origin story may be true. But it doesn’t make the whole story necessarily.

https://time.com/6113393/eric-schmidt-henry-kissinger-ai-boo...


>corporate entomologies

Now there's a ready-made Far Side concept.


"Corporate entomology" is a lovely term, evoking surreal (and yet strangely familiar) images of cockroaches in suits.

Brilliant!


If you’re going to make a typo, it might as well be about roaches in suits running the valley. Good lord.


I didn't realize it was a typo. I thought it was a deliberate misspelling!


> It’s also associated to the Gemini killer and Joseph Testa and Anthony Senter who were famous as the mafia’s Gemini twins hitmen.

I've never heard of any of these people and I doubt most others have either. Maybe you have to be a true crime enthusiast to know the lore? Whereas if the name were Zodiac, then I would at least be aware there's a potential murderer connection.


Also the Gemini Lounge, where Roy DeMeo and his crew killed and dismembered people targeted for hits by the Gambino family.


The Bard name gave me a warm fuzzy feeling immediately transporting me back to my youth playing (or at least trying to play) Bard's Tale. The name evoked adventure, excitement and a good dose of dread. And, the idea of it being "role playing" struck me as a master meta stroke.

Gemini, from the mythological standpoint, seemed to make more sense to me from an overall business/marketing standpoint. "This AI thing right here is your twin, see? It'll finish your sentences and stuff."


And similarly, anyone playing modern tabletop RPGs will probably associate "Bard" with the smart, charismatic person who buffs the party and debuffs your enemies; perfect for an AI assistant.



I'd suspect they're just trying to start over - Bard kinda got pegged as Google's dumb, not-as-good, late-to-the-party clone of ChatGPT. It got teased a lot. I'm not sure Gemini is much different (still more impressed with GPT 3.5 & 4) but I guess the name change gives them a bit of a reset.


To me Gemini is just sort of generic and uninteresting. There have to be hundreds or thousands of products and companies based on the name "Gemini" - "Bard" was at least interesting, different and distinct.

I've no idea about the quality of the product itself, I have never had a reason to use it. It's long past cliché now but I wouldn't get too attached to a Google product that is definitely costing a lot of money but which has no clear pathway to turning a profit. I think they will keep it ticking over until the hype train moves on from Chatbots/LLMs, and then it'll join the Google Graveyard @ https://killedbygoogle.com


"Bard" always struck me as bad naming - unfamiliar, unfriendly, too cerebral. I think the name was an impediment to establishing a household brand.


It's possible that it sounds even worse in other languages. That is, it might sound like bad words, onomatopoeia for bodily functions, or common exclamations (that would lead to lots of false positives).

I think it could have been established as a brand in the US, given Google's scale. Put a lute in the branding, run some funny commercials, and you're done.

EDIT: one thing no amount of branding can fix — the likelihood that people reach for "doh, Bard" (a la Simpsons) when Bard messes up. I could see that becoming a thing.


> unfamiliar, unfriendly, too cerebral

The Witcher is one of Netflix's most watched shows. I'd also imagine that most people in English-speaking countries have been exposed to Shakespeare's nickname in high school English classes.


It’s generally a common trope in fantasy and Romanticist literature. It’s also a word that exists in virtually all European languages in a similar form (bard, bardo, barde, бард), although similar but different forms may be a negative.


Yes, but I didn't want to assume that most people read literature. Even if they hadn't, "bard" is definitely out there.


I don't think it's that out there. You'd have to be quite uninformed to have never heard of it. It's no verderer or reeve (medieval positions that most people actually will not have heard of).


I meant "out there" as in a word people are exposed to. Not "out there" as in outside of most people's experience.


I knew what you meant, and I disagree. I don't think the word "bard" is out there in the sense of being a word people are exposed to.


Maybe named for The Bard’s Tale?


In an increasingly commodity game (the big player LLM game), it’s already starting to hit the asymptote on the main levers: ties to NVIDIA and/or TSMC, serious financing capacity, and enough engagement channel to push it through. (There is much great work happening outside of the Peninsula).

I always thought GPT-4 was a little “HAL 9000” of a name for broad-based adoption, but the jury seems in, and the jury rules “cyberpunk is in”.


The broad name is ChatGPT, not GPT-4


That’s fair, though given the stark UI cue / cost difference, I’m not surprised when I overhear in a random cafe or bar: “yeah but what did ChatGPT Four say?”

In any event, it seems that the image of a Decepticon ready for battle on your behalf has a lot more traction than the image of a quaint singer/priest/poet always there with a verbal shot in the arm when the going is tough.


They literally call it "ChatGPT 4" (with a colored 4) in the app though


ChatGPT's claim to fame is as bleeding edge tech, and it's squarely down that alley.


No one outside the factory knows how the sausage is made. Watercooler chat is that it's a 1.75 terraweight 16-choose-4 mixture model quantized at 2.5-3bpw with a mundane rotary encoding and the kind of KV cache that only an Azure or GCP disagg rack can deliver.

There's little doubt that they're hanging on by a fingernail, with an MMLU lead over the French that is nonetheless real. So something "bleeding-edge" is going on. (Now whether or not MMLU is a useful metric for the usefulness of a model that won't do what you tell it to is debatable; I think it's not.)

But is that bleeding edge innovation like Galois and the resulting group theory? Or bleeding edge like "holy shit they could keep track of enough people to build the Pyramids at Giza with a low-precision protractor and infinite motivation"?

I tend to think that there's a lot of "straight ahead and damn the torpedoes" brute-force arbitrary compute budget subsidized below cost by a hyperscaler in the mix. But who knows.

The effectiveness of distillation is suggestive but not conclusive that we can "Shannon" the process recursively up to some unknown bound without anything much fancier than Mixtral from an arch standpoint.


This is very interesting. If you don't mind me asking, what is the "watercooler" where said chat occurs, and what is the claimed source for those numbers?

But to clarify my original comment, I meant "bleeding edge" in the pragmatic sense - it is the most powerful model that anyone is known to have access to, and of course the whole LLM thing is still very new in the minds of general public. So the public perception is that it's bleeding edge technomagic, and so a brand like that is fitting.


Honestly surprised I'm the first to mention the name collision with the retro-modern linked documents protocol I keep hearing about (on HN) https://geminiprotocol.net/docs/faq-section-1.gmi

But glass half full, maybe it's for the better to have one's name shadowed by a Google product if one prefers to avoid eternal septembering one's community.


Gemini is not distinct and memorable. It feels like a muddled compromise.

I'd name it GooGaa or Googa. Like "Google" and "goo-goo-gaa-gaa" (baby's first babbling). It's flowing, friendly & familiar.


Bard showed some creativity in name selection. Gemini does not - you see that name everywhere. Or at least my first thought was of the Gemini spacecraft.


I agree. The original reason [1] for the Gemini name seems artificial for a generic chatbot. It is OK for the model, and I'm sure a lot of "work" was put into "validating" it for the assistant, or... was it?

[1] https://the-decoder.com/how-googles-gemini-ai-model-got-its-...


I thought it was in reference to Trurl's Electronic Bard, which just about presciently predicted LLM output (though the process is a bit more dramatic, what with how it simulates the whole universe to get to that output): https://electricliterature.com/wp-content/uploads/2017/11/Tr...


There presumably was a time when Google considered going more into the “assistant” branding. They own assistant.ai but they don’t do much with it.


They've plastered "Bard" ads everywhere in Tokyo for a while. Surprised they're killing the name so quickly; the marketing team in Japan probably had no idea.

(Personally, I never liked how Bard sounded. Can’t put my finger on why, it was just not a pleasant name to me)


Same here. Bard is not a sexy name. Gemini is way more sexy. (Neither is ChatGPT, or Google either, though.) I can't wait until we can call the assistant whatever we want, like Jarvis.


They also did a lot of (cringe) ads in Brazil.


Barudu?


Baaado (バード)


Bard just sounds terrible phonetically. Bard. Like something you find in Home Depot or some kind of old timey woodworking tool. Barf. Bored. Bard.

Yes I know what it really means but it doesn't change the fact that it's a terrible word.


That's an incredibly subjective take. I don't agree at all. I don't care what they call it but I don't understand this reasoning.


Same reason Arthur Andersen changed its name.

Bard was panned. Change the name, lose the bad press.


As a Brit I’ll be glad to see the word association gradually return to that of our greatest playwright rather than something appropriated by big tech.


I thought it was just supposed to be a pun on “gen’ AI”


> thought it was just supposed to be a pun on “gen’ AI”

Then they'd have gone with Genie.

Which would be kinda genius on their part, but Google isn't that kind of company anymore.


I built https://clashofgpts.com to use Gemini and GPT4 side-by-side.

A good way to use your Gemini Ultra 2months trial and see if you want to switch to Google AI Premium subscription.


I'm so glad that they've changed the name :) Bard is really difficult for me, a non-native English speaker, to pronounce correctly. I think most people from my country (Vietnam) pronounce Bard as "bahd". Gemini on the other hand is so easy to pronounce.


"baaahd" is more or less how native British English speakers would pronounce it.


>Gemini

Symbolizing both human origin intelligence and AI becoming super-intelligent together. I think it's a good name because it draws on ancient human mythology and serves as a reminder of the concept of alignment.


Bard: fat inept old guy who gets by telling stories of yore

Gemini: a crypto exchange

Clearly they shouldn’t be asking the 21 year old interns what to call it.

Now, before anyone at Google gets butthurt like they usually do: I doubt I could do much better, but Gaia IMO would have been mint. You know, it starts with G, is four letters, and represents a nebulous deity. Took 3 seconds, but hey, I'm not paid to think. In fact I'm not paid at all.

Edit> Gemini Ultra is the best LLM so far (it seems). Apart from the name, good job guys.


And it has "ai" in its name.


If that's the reason, they should've gone with Moai (more AI)


That's even better lol. Hey 缾, how come so few comments? You have a few accounts or what? I feel privileged.


huh I didn’t even notice that I guess I’m sleepy


There are other considerations when naming something like this. "Bard" likely could never be a wake word on its own, for instance, but I'd imagine that "Gemini" will be at some point.


The real question is what's nearby each name's vector embedding in terms of whatever similarity metric Gemini will use to talk about the world. That's their new canonical ontology, after all.


One awkward thing is that Google's Gemini app is currently the third result when searching for "Gemini" on the Play Store, after some bitcoin-related applications. The namespace is occupied.


Pretty sure Gemini will also eventually get replaced with Google Assistant branding, at least for the consumer-facing products; it might still be kept for the cloud API etc.


Bard felt like Google was treating LLMs as merely an amusing spinner of tales, just a narrative UI layer over the "real stuff" of the Google Search.


Astrologically, Gemini is associated with Communication. Specifically social, superficial, quick, back and forth communication. The sign is ruled by Mercury which is associated with Intelligence and being the messenger of the Gods. Mercury is often depicted with winged shoes as the planet itself is the fastest moving planet, orbiting the sun every 88 days. Mercury is considered to be dualistic (The Twins) and also rules the sign of Virgo, an earth sign that is associated with more deep cold analytical categorization.


"Bard" to me implies a person who tells tall tales while sounding entertaining.

In that regard the naming fits perfectly


And if the brand took off, I imagine you could “Bard” something as a verb but not “Gemini” it.


Perhaps they're hoping people will stick with "google it".


The got rid of Bard, and missed the opportunity to either name it Genesys or Legion.


Gemini is Latin, my guess is it more easily translates to other languages than Bard.


It really depends. Some languages don't use "special" words for astrological signs, so it's literally the same word as "twins". Which is a really weird name for a product.



Who translates product names?


"How does that translate to ..." means "how well does that work in" some other area or context; more analogous to a mathematical translation than a linguistic translation.

Just a confusing turn of phrase. They almost certainly didn't mean "what does that translate to ..." in another language.

Harmonising product names across regions is hard: Jif was a bathroom cleaning product in the UK, but its name was changed to Cif to match the name elsewhere in Europe, and that name sounds silly to UK ears. Meanwhile GIF was always presumed to be pronounced like "gift" (a present) without the final T, but we learnt the creators preferred "Jif", which sounds silly to UK ears because it sounds like a cleaning product! (And wasn't JIF already a file extension, for JPEG Interchange Format?)

Anyway ... language is hard.


> Jif was a bathroom cleaning solution in the UK

One man's bathroom cleaning solution is another man's creamy peanut butter.


One man’s creamy peanut butter is another man’s crunchy peanut butter.


“a floor wax and a dessert topping.”


I think the suggestion was that it would work well as-is in other languages. It would certainly be natural in romance languages.


I think they should have named it gAIl.


Bard sounds archaic, almost like an unintelligent persons name, no offense to the bards out there.


In my 20 minutes of experimentation, I'm really impressed with the quality of Bard Advanced (Gemini Ultra). The results are as good as GPT-4's, and in some cases better. So far:

pros:

- better at translation (tried Chinese and Japanese idioms to English)

- better at incorporating search results in its answer vs gpt-4 bing

cons:

- slightly worse at coding

- censorship is more annoying (have to ask multiple times about medical topics)

- Worse at logic (e.g. it contradicts itself in a single sentence, and is unable to figure it out)

- Hallucinates a lot when asked to describe an image


Also as time goes by, it'll get smoothly integrated into docs/gmail/maps/calendar/youtube/search/colab/sheets/android/assistant

So Gemini could be your one-stop AI shop for everything. Only Microsoft can match it (but Microsoft doesn't have popular maps, YouTube-style video, mail, or smartphone OS services).

Apple is another strong player (but they don't have productivity tools like docs, sheets or youtube).

It really is Google's to lose this AI race from now on.

Going to ChatGPT and copying and pasting results will become painful (not to mention its painful Bing integration). Also, at this point they seem to be focusing on scaling LLMs (while Google DeepMind is exploring other avenues).

Google can also bundle Youtube TV, YouTube Premium, Google Drive, Storage, Ad free Search, Gemini integrated Docs/Sheets/Gmail, Subsidized Pixel Phones / Watch for a monthly fee of say $99 and it'll be very compelling for a lot of people


Strange to say Apple doesn't have productivity tools when Pages, Sheets, and Keynote exist on every Mac. I get the scale arguments, but Handoff and iCloud integration are a sleeper IF you've bought into the ecosystem...

Also hard to overstate just how much more valuable the enterprise market is over the consumer market when comparing Microsoft vs. Google as one-stop anything shops.

I don't see Google as having the obvious dominant position to make the argument it's their race to lose, considering Microsoft has a stake in chatGPT and is actively integrating it into their browser and productivity suites.


There's a Google bubble on HN, as demonstrated by small-to-medium business facing SaaS launching here offering login with Google and not offering login with Microsoft.

I've talked to many of HN's Google Docs jockey founders that genuinely didn't realize 85% of the US domestic business market is in M365. And they further don't realize that "Continue with Microsoft" is dirt simple and lets companies offer company-managed logins to your SaaS without all the AD/SCIM/SAML nonsense.

"But everyone has Gmail." Well, no, that's not how companies work. And if you think everyone's in Google, that's fine, your login page should look like one of these:

https://www.xsplit.com/user/auth

https://id.atlassian.com/login

You don't even need the "continue with SSO" if you do the Atlassian practice of letting a firm "claim" the domain part of an email and using that to redirect to an SSO flow. And to start, skip SSO, and just use the "Continue with" Oauth2.
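The "claimed domain" redirect described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual API: the `SSO_DOMAINS` mapping, the IdP URLs, and the fallback path are all hypothetical placeholders.

```python
# Minimal sketch of the Atlassian-style "claimed domain" login flow.
# A firm "claims" the domain part of an email; anyone logging in with
# that domain is redirected to the firm's identity provider.
from urllib.parse import quote

# Hypothetical registry of claimed domains -> IdP authorize endpoints.
SSO_DOMAINS = {
    "contoso.com": "https://login.microsoftonline.com/contoso/oauth2/v2.0/authorize",
    "example.org": "https://sso.example.org/saml/login",
}

def login_redirect(email: str) -> str:
    """Given the email a user typed on the login page, pick the next URL."""
    domain = email.rsplit("@", 1)[-1].lower()
    idp = SSO_DOMAINS.get(domain)
    if idp:
        # Domain is claimed: hand off to the firm's identity provider.
        return f"{idp}?login_hint={quote(email)}"
    # Otherwise fall back to the ordinary password / "Continue with" page.
    return f"/login/password?email={quote(email)}"
```

The nice property is that the user only ever types an email address; the "Continue with SSO" button disappears entirely.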


Unfortunately, you are in a US bubble.

Globally, Google's brand is 10x stronger than Microsoft's for small businesses.


Absolutely untrue - Every company and university I've ever worked with or for in Europe used Microsoft 365. Not a single exception.


US + Europe is not the world


I love this "X is not the world" comment while others are giving more and more evidence of the opposite. Care to give actual evidence? Like, I don't know, this Statista survey (https://www.statista.com/outlook/tmo/software/productivity-s...), because you now look farcical. Even looking at African results (which is a relative greenfield), it seems that Microsoft has a slight lead there (https://www.statista.com/outlook/tmo/software/productivity-s...).

Edit: Indonesia is the largest market which has Google beat Microsoft (https://www.statista.com/outlook/tmo/software/productivity-s...). India (due to Zoho) and China (due to... locally-specific software like Kingsoft (aka WPS) et al.) took a large chunk but on Microsoft-Google tussle MS still beats Google (especially in China where Google doesn't really operate).


East Asia chiming in here: Microsoft dominates enterprise; Google is a tiny player, with single-digit market share for any enterprise product. Only in cloud does MS have competition, from AWS and local players; everything else is Microsoft. And even in cloud, Azure is rapidly eating through AWS market share.


Microsoft or pirated Microsoft? I was surprised when <insert major Arab oil company> was using pirated MS Office on all their systems. This was a bit more than ten years ago though, so 365 wasn't a thing. But still....


In my country we used pirated version of Office products. Does that count?


I have the most respect for this country


In South America, Microsoft brand is waaaaay stronger than Google's for businesses of all sizes. Google is viewed more as a consumer brand.


> bubble

It's not a bubble when one specifically names the Venn diagram circle "85% of the US domestic business market". It's naming a market.

> brand is stronger

Presumably the founders' interest is wallet share, not market share.

Are you saying Atlassian is in a US bubble?


But is it really? It seems to me that almost every business is using the Exchange/Outlook combo, not Google products.


Icelander here, so all our companies are small. I'd say 85-90% of the market (50+ employees) use M$.


Google's competitive advantage is threefold:

1. Real estate - YouTube, Gmail, Maps, Search (for now), etc.

2. Compute - probably still the best in the industry, but with recent Microsoft/Meta compute buys it's hard to say for sure.

3. Talent - probably also still the top of the industry. Geoff Hinton and Zoubin Ghahramani setting direction and Jeff Dean building it is hard to beat, and the ranks are deep. Yann LeCun is also brilliant, and Andrej Karpathy, while less seasoned, is one of the top researchers in the field, but overall there's still a bit of a gap to Google's roster, at least when it comes to AI researchers.

If Sundar and the other top brass weren't MBA-bots with no vision, and the famous Google bureaucracy had been reined in gradually over the last 5 years while promoting a builder-centric culture, this would be in the bag for Google, no question. Instead, Satya Nadella played 3D chess while Sundar was looking at a checkers board.


Geoff Hinton quit Google last year, no? But other than that, I guess I agree.


I think Google lost the top researchers when they destroyed the culture. All the competitor companies are mainly led by ex-Google talent, and honestly who in their right mind would take a Google job today over OpenAI, Mistral, or even Meta (where you will be releasing models like Llama for the world to use).

Google killed the culture and is bleeding top talent. They have reduced themselves to digital landlord and sure they can extract rent, but that’s not what attracts people.


that is the media narrative but not at all what happened.

Google's 'don't be evil' grad-school-style culture had fallen apart by the late 2010's because there are tons of people who will just rest and vest.

So strong ML researchers basically were creating massive value but much of it was going to rest&vest salaries. OAI basically came along and said - hey, we don't have rest & vesters, do you want to make $1m+/yr? And most of the top google researchers said yes.


It’s not just media narrative. The culture was eroding for years, as you note, but the dam finally broke and they went full IBM/Kodak. Or in other words, “slowly at first, then all at once”.


Most of the recent media coverage has been resting&vesting employee backlash against the fact that Google is making them do work again. This is a cultural shift, but not away from the culture that made Google great - the original culture was grad-school, not rest and vest, and that died years ago.


Haven't one or two long-time Googlers left or gotten laid off and then written strong criticisms of Google? They don't sound like rest & vest (also should say I don't super agree w/ this term) to me, they sound like people who loved Google, were there a long time, and watched the culture decay.


I’m not super invested in the term “rest&vest” so it is whatever.

But touché - many of the critiques are being written by super talented and impactful people. But I do not think those critiques are necessarily incompatible with what I am saying.

There is a very real and very frustrating (if you work there and want to be impactful) phenomenon in these tech companies of people resting on their laurels.


I've given this a little thought and I definitely agree. But to bracket a little, I've worked in places that were super mission-driven, but also other places with a lot of "clock punchers". I don't judge anyone. I think people's motivations are super personal. Indeed who am I to say clock punching or resting and vesting is unethical or immoral or whatever. You can show me someone who's super dedicated to the cause, working extra hours, mentoring others, spearheading new projects, saving old projects, and I can show you someone who's effectively abandoned their family or mortgaged their future health. You can show me someone collecting a $500k/yr salary at FAANG and doing very little actual work, and I can show you someone with two parents in nursing homes and a partner with MS.

That said, the places I've worked where there was a shared belief in the importance of our work were exhilarating. But, the way they achieved that was essentially by getting rid of bullshit. You know, no one's building their own web framework or writing their own in-house query language (you know, unless you really have to). No one's spending an hour of everyone's time figuring out what to name this class.

I think we haven't figured out how to keep that energy once you build a company big enough that's got something to lose. Suddenly it totally feels worth it to have a meeting about changing the shade of blue in the logo from this to that, because what if revenue declines 0.1% and that's $40m and then layoffs? One of the amazing things about Google was that it somehow seemed to keep that energy well into becoming a tech behemoth; it felt like they defied gravity, like their commitment to not doing evil and open web standards--or maybe we thought the web itself had ended evil corporations like Microsoft forever and Google was its avatar--had allowed it to overcome this corporate version of Fermi's paradox.

Whatever. Maybe this is just a bunch of word salad haha. All I'm saying is it takes two to tango, and just like I'm sure there's a non-zero level of personal culpability responsible for this phenomenon, I'm sure there's a non-zero level of systemic and cultural cause too.


Even if we assume that all of Google is based on ML algorithms (not true, given problems like serving, front end, data collection, ...), ML itself is 90% SWE work.


I don't understand how your comment interacts with mine, but to be clear I was not saying that everyone who isnt in ML is not working or contributing value or anything like that.


Apple is coming. I think the personal agent is where we really want the smarts and if they’re not trying to own that space the CEO should be fired.


Good comments. As much as I am personally engaged in small LLMs that I can run on my own computer, and integrate into software that I write for myself, I think the future of large scale adoption of AI belongs to Google, Microsoft, and Apple in western countries (and China is doing wonderful things in their markets).

The old Bard/Gemini integration with Gmail, Google Docs, etc. is pretty good. I signed up for a minute for Microsoft’s $20/month AI Office integrations, but cancelled and will try again in 2 months. I am an Apple customer and I expect spectacular things from Apple. I expect Apple, Google, Samsung, etc., to offer fantastic on device AI.

I would like to see a money saving Google bundling family plan. I find Apple’s super bundle family plan a pretty good deal.


small LLM? Small Large Language Model lol


Good joke, thanks, but I will explain anyway: to me 30 billion parameters or smaller is small since I can run it using Ollama on my home computer. I managed a deep learning team at Capital One and our ‘large’ models were less than 20 million parameters. :-)


I suppose we could call them Medium Language Models, but unfortunately that TLA is already taken


A large language model the size of a small language model.


Google is going to own AI like intel owns graphics cards - i.e. Not really, except at the absolute bottom of the barrel where its baked in advantage lets it offer an unbeatable price/performance proposition for people who only care about "value" and with limited real performance requirements. Google's baked in AIs will be free, and bad. Everyone else is going to let people "plug in" models via standardized APIs, because one size fits all models are just a bad idea, so that's the way google is going to have to go eventually as well, because it's what power users are going to demand.


But power users are a minuscule market, compared to non-power (weak?) users. That's the problem.


Gemini the chatbot will still need to compete with ChatGPT on raw performance/intelligence/SOTA; it first needs to become the undisputed winner among AI chatbots. Right now ChatGPT-4 is the "winner" in mindshare, and I am not sure what Google can do except wait for OpenAI to bungle their own releases.

Gemini integration with Google's apps will compete with Microsoft's app on integration.

And there's a condition for Gemini to win: Google has to go all in, and that means creating an AI that will eventually phase out their cash cow, Google Search. Microsoft and OpenAI do not have that kind of internal conflict.

> It really is Google's to lose this AI race from now on.

We have heard this so many times, and it seems like they fall further behind by the day.


I agree that Google is well-positioned, but they were also well-positioned to take advantage of these synergies with Google Assistant for many years and I would say that that did not meaningfully materialize in a way that was helpful to me as an Android and Google ecosystem user.


Agreed. I've run the house using google minis and assistant for years now, and asking assistant to do / about stuff has not improved one iota in that time and has introduced several more quirks and bugs.

Makes me wish I had bet on Alexa or Apple instead.


Yeah, for example just yesterday I was driving and an alarm went off for the phone in my pocket. I told Google Assistant to silence the alarm... and it refused, insisting no alarms were active. How the hell can such a simple use-case be failing so badly?

I suppose it doesn't matter, because they're going to disable the functionality entirely, [0] and setting ephemeral alarms is literally the most common thing I ever ask it to do!

Part of what makes all the assistant-stuff so damn frustrating is that it's an opaque "try something random and hope for the best" box, and whenever it fails there's usually zero information about why and no resolution path. (In a way you can generalize that to a lot of "AI", which is depressing.)

[0] https://www.androidauthority.com/removing-google-assistant-f...


Did you just say Microsoft doesn't have a popular email service?


Yes. Nowhere near the scale and reach of Gmail.

We are also talking about consumer emails (not enterprise / corporate)


Outlook (+Hotmail) is the third most popular email service, just 3x smaller than Gmail. It's definitely the same kind of scale.


If you count outlook.com and hotmail.com and live.com and all enterprise email, you get something that is probably pretty close to, or larger than, all of Gmail and Gmail-backed domains, especially by volume.


Google still has too much internal fragmentation and power groups to offer a single google-subscriber package.

I'd say it is one of the most compelling reasons to kick Sundar out and bring in someone who can unify Google into one consistent and interoperable ecosystem.


I bet they'll eventually let Kurian try this, and he'll destroy what is remaining of Google in the process.


If they can get reliably useful AI through voice into maps/navigation, it will be a substantial improvement to the driving experience. It's really frustrating to manage destinations and waypoints while driving. I just checked the process to see if I'm not keeping up, and 1. the help docs are out of date; 2. the waypoint search results provide tap points for destinations 10 miles off a route, but show only 3-pixel red dots for otherwise equally weighted options that are literally on the existing route.


> It really is Google's to lose this AI race from now on.

I was curious and apparently Google/Deepmind had 184 papers at NeurIPS/NIPS 2023: https://www.reddit.com/r/MachineLearning/comments/185pdax/d_...

For comparison, Stanford has 130, MIT has 130, and Carnegie-Mellon has 112. Microsoft had 68, and Facebook/Meta/FAIR had 56.


This is all well and good, but they've had a year now to respond to GPT-4, and so far they haven't delivered anything close.


After spending 45 minutes with Gemini, I don't see this at all.

The only thing I haven't decided yet with it being Google is if it is shockingly unimpressive or unshockingly unimpressive.

chatGPT4 isn't really impressive to me either at this point. This is basically a joke though.

This is going to be competing against chatGPT5 and chatGPT5 has to deliver a huge upgrade in "intelligence" or we are headed head first into the AI hype brick wall.


All of the things you write are very good ideas. But at this point, I am quite skeptical of Google leadership to pull off these things.


> It really is Google's to lose this AI race from now on.

Google took 11 months to catch up with the benefit of knowing the specifics behind how GPT-4 was built. While they have a moat with their products there's no sign that they're ever going to be ahead on the AI capabilities front. I hope to be wrong because we need more competition.


Well for LLM services that do what they currently do google may have an advantage, but all this stuff is still only experimentation with the goal being hopefully much more advanced things, like almost-agi agents. If this happens then no one will care about the way we currently use LLMs anymore.


I'd be very surprised if Apple doesn't have some sort of similar AI running on device within the next two years.


> "Apple is another strong player (but they don't have productivity tools like docs, sheets or youtube)."

Can anyone help me understand how Apple allows Siri to remain so absurdly incompetent? Last night I watched the latest episode of Curb Your Enthusiasm, in which Larry David's Siri interactions devolve into an apoplectic rant -- and part of the reason it was so funny is that it's so relatable. I rarely even try Siri anymore, but when I do it's still just abysmal. Are they deliberately handicapping it, to boost the perceived relative benefits of a future successor?


Apple has very little ML talent. They're basically resting on their laurels in the phone market.


That's what I thought too; but all the upthread and peer commments keep asserting Apple is a (real, not just hypothetical/potential/future) player in the space.


Apple went too early to ML for Siri.

Siri was "decently good" when it was basically a voice-activated command line, and you could figure out what you needed to run to get it to do what you wanted.

When they added ML to it, it entirely went to shit, and it has never recovered.

Perhaps Apple will be able to use an LLM to make it finally not shit.


well yeah, people on this website opine a lot about things they know little about. they're starting to make moves, but they don't have a large talent base.

Apple will start using this tech now that it has been commoditized and since it is a big consumer tech company they will put a nice polish. But it's not where big research is being done.


Apple refuses to pay top of the market. They'll have to get really lucky with someone who grows from within.


right, google has maps, they should call the bot Uncle Traveling Matt.


Sounds like this whole thing is an insane 30-year effort by some engineer who couldn't get over the discontinuation of Clippy.


> censorship is more annoying

That's a general problem with AI. There is a lot of censorship in certain areas, likely to fight bad publicity, but I think the likely outcome is taboos, prudishness, and big companies deciding what is ethical and what isn't.

I recently tried Bard and ChatGPT on topics that are classical philosophical dilemmas, and while ChatGPT certainly had some troubles too, Bard was absolutely horrible and always took the conservative side, as in never arguing for any freedoms that aren't yet widely established. I am talking about classical examples regarding the limits of utilitarianism, "what would be best for society, what would be best for the individual?" style questions. Even when trying to create a bias by changing the examples, for example by adding volunteering to the scenarios, Bard strictly kept its opinion, despite originally stating that the general topic is two-sided, that it's an open question, etc.

I think this is a danger of such systems. By their nature they reinforce the status quo, because they base their output on what was widely accepted at the time of their inception. If history had been different, I am sure it would argue for slavery and against women being allowed to vote, simply because that used to be the more common viewpoint. It would likely have argued that homosexuality is unethical. Maybe it would even have tried to explain how it doesn't create children but spreads diseases, or similar things. At least, that's the level of argument it brings now.

This isn't just about ethics. Even if you think about IT and programming. I think this could give already invented programming languages, styles, methodologies a significant edge. Unless you are Microsoft or Google and are able to bias it to whatever you want to see more of.

So this combined with the costs meaning that only people or institutions with significant (financial) power create those rules does look a bit bleak.

I miss the last decade, when the thought experiments about self-driving cars were about whom to drive over in a ridiculously unlikely scenario.


So I have done a lot of transcripts of coding, one versus the other (GPT-4 vs Ultra), often with simple prompts like "refactor this code" or "convert this Python to TypeScript".

My experience is that Gemini Ultra understands the code better, but doesn't always give me results that are as complete (they seem to limit output length more).

Beyond that, it is very smart. I've had it tell me "this code packs 12-bit integers into different parts of an array using the following encoding", which most people would not figure out from the code as written. It will then say you can actually do that with this neat little translate function that you never knew about.
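The commenter doesn't share the actual code, but for readers unfamiliar with the pattern, 12-bit packing typically looks something like this sketch (my own illustration, not the code in question): two 12-bit values fit in three bytes.

```python
def pack12(values):
    """Pack 12-bit integers (0..4095) into bytes, two values per 3 bytes."""
    out = bytearray()
    for i in range(0, len(values), 2):
        a = values[i]
        b = values[i + 1] if i + 1 < len(values) else 0  # pad odd counts
        out.append(a >> 4)                       # high 8 bits of a
        out.append(((a & 0xF) << 4) | (b >> 8))  # low 4 of a, high 4 of b
        out.append(b & 0xFF)                     # low 8 bits of b
    return bytes(out)

def unpack12(data, count):
    """Inverse of pack12: recover `count` 12-bit values from the bytes."""
    vals = []
    for i in range(0, len(data), 3):
        vals.append((data[i] << 4) | (data[i + 1] >> 4))
        vals.append(((data[i + 1] & 0xF) << 8) | data[i + 2])
    return vals[:count]
```

Recognizing this shape from decompiled, garbage-named code is exactly the kind of pattern-spotting being described.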

It will then get the code very slightly wrong. If I tell it not to use the cool function, it will actually get the code right.

GPT4 has no idea what the code is doing but can clean it up a bit.

so it’s like ultra is too clever by half sometimes.

That said, I have fed thousands of lines of code into both of them and asked them to refactor it, and neither one of them made more than one error. All code otherwise compiled and worked first try.

This is code that can't possibly be in their training sets; it's basically handwritten Python that was written based on an old x86 binary that nobody has the source to anymore. So the code is basically garbage, and what it is doing doesn't, say, appear on GitHub in a nicer form.

Both GPT-4 and Gemini Ultra were able to make the code look like clean, idiomatic Python or TypeScript without any work on my part, except for the one bug each, which for 8,000 to 10,000 lines of code is not bad.

The GPT-4-inserted bug was more insidious. It changed (the equivalent of) (uint8)'a' to (uint8)'a' - '0' for no reason when converting some code to TypeScript. Not sure if that is representative of anything.
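That mistranslation is the classic digit-to-value idiom applied to a non-digit, which is part of why it's so easy to miss in review. A Python analogue (illustrative only, not the commenter's actual code):

```python
# The original code took the numeric value of a character:
code = ord('a')               # 97

# The mistranslation subtracted '0', which is the idiom for converting
# a *digit character* to its digit value -- wrong for letters:
wrong = ord('a') - ord('0')   # 97 - 48 = 49, which is the code for '1'

# For digit input the idiom is correct, which is why it looks plausible:
digit = ord('7') - ord('0')   # 7
```

Any test that only feeds digits through the converted function would pass, so the bug can survive a long time.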

If I do the same with any of the other "top" models (from can-ai-code, etc.), most of them can't even generate correct working code for all the input, let alone good code. Most aren't even close.


> That said, I have fed thousands of lines of code into both of them and asked them to refactor it, and neither one of them made more than one error. All code otherwise compiled and worked first try.

I would be very interested to get a more detailed scope of what you did here. Feeding thousands of lines of code into GPT4 and getting a near perfect refactor does very much NOT sound like my experience, but it seems highly desirable.


Yes, I've observed the same phenomenon. The more detailed my prompts are, the more errors GPT tends to make. I use it as a partner to discuss implementation ideas before I start coding. That works very well, because GPT and I usually find some things that I missed at first glance.

But coding with GPT or Copilot is too disruptive for me.


> The more detailed my prompts are, the more errors GPT tends to make. I use it as a partner to discuss implementation ideas, before I start coding.

This is interesting; I do a third thing that's the exact opposite of both of you.

I generally know the architecture I want, so I just slowly spoon-feed it to GPT-4 and have it build each of the components with the right interfaces to fit together. Usually, building each component requires a few rounds of hand-holding it through various refactoring. Its first draft of code tends to be, for lack of a better description, totally lacking in any creativity or elegance --- generally bland, insipid, and mundane stuff.

But it'll gladly take refactoring directions to get it into shape, so it's still very quick to get a whole bunch of components written up. Then I just glue everything together on my own. And the whole experience is still way better than writing a bunch of boring boilerplate code by hand.


I'm happy to share transcripts if you email me.

I'm not sure what you are feeding it. My scope is closer to a file at a time of mostly self-contained Python or C, asking it to clean it up or convert it to TypeScript.

I can imagine lots of scenarios it doesn't work well.

In mine, it does, and I have plenty of transcripts showing that :)


Seconded. I am also keenly interested in learning more. It would be a great boon on my current project to be able to let the AI refactor mountains of legacy code (while taking steps to ensure the results are trustworthy and equivalent, of course.)


> what it is doing doesn't, say, appear on GitHub in a nicer form

Surely many similar bits of code appear? I'd have to see your problem though.

I'm thinking of ETL type code which has to mangle various custom exotic encodings into others and vice-versa.


With GPT 4's context window I've had no luck with more than 2k lines of code in a single prompt.


On logic it cannot handle the Dumb Monty Hall problem at all:

https://g.co/gemini/share/33c5fb45738f


Incredible. Gpt4 spots that the door is transparent and that changes things but has this great line

> When you initially pick a door (in this case, door number 1 where you already see the car), you have a 1/3 chance of having picked the car

(Asking it to explain this it correctly solves the problem but it's a wonderfully silly sentence)

Edit - in a new chat it gets it right the first time


This is not convincing, though, that GPT-4 actually understands the problem. Here's a slight variation I asked, and it fails miserably.

https://chat.openai.com/share/22a9027f-a2c1-428a-94a2-8fd918...

I wonder what leads it to answer correctly in one situation but not the other? Was your question previously asked already and it recognized it, whereas my question is different enough?


Your link is not to GPT4, your link is to the free version of ChatGPT, aka gpt-3.5-turbo (you can tell because the icon is green, not purple).

GPT4 indeed understands your variant, as evidenced here: https://chat.openai.com/share/46916f21-c469-4e93-9bed-bbd18b...


It's a bit random, which doesn't help, and different interfaces have different system prompts.

I repeated your question a few times and it got it wrong once, and right the others. It repeatedly mixed up who was supposed to be the host.

Here's a quote

> In the scenario you've described, you've initially chosen door number one, which you know has a car behind it because the doors are made of transparent glass. Since you already know the contents behind each door, the classic Monty Hall problem's probability-based decision-making does not apply here.


> Was your question previously asked already and it recognized it

Given that LLMs training data consists to a large extent of "stuff people have written on the internet", and The Monty Hall Problem is something that comes up as a topic for discussion on the internet not entirely infrequently - as well as having a wikipedia page - yes, I suspect that the words describing the monty hall problem being followed by words describing the correct solution appeared often in the training set, so LLMs are likely to reproduce that.

Words describing a problem similar to the monty hall problem are going to be less common, and probably have a lot of discussion about whether they accurately match the monty hall problem, and disagreement about what the right answer is. LLMs will confabulate something that looks like a plausible answer based on the language used in those discussions, because that's how they work. Whether they get a right answer is probably going to be much more up to chance.


You could say it doesn't "understand" anything really.


That's what I like about this problem (and similar Dumb variants of classic brain teasers). It exposes that there's no understanding, there's just a statistically weighted answer space. A question that looks a lot like a known popular topic ends up trapped in the probability distribution of the popular question.


How do you explain them answering correctly and explaining why it's different from the classic puzzle?


My favorite test to scramble LLM brains is this simple rehash of the old puzzle.

"Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?"

The trick, of course, is to make it confusing compared to the original. So far, the only model I've seen get this right is GPT-4 (which can one-shot it). Everything else gets hopelessly confused even if you force step-by-step reasoning, and even if you try to have the model iteratively review its own outputs. In most cases, they produce a wrong answer, can spot the problem in it, but when trying to fix it introduce another error ad infinitum.

This new Gemini is no exception - it gives results similar to GPT-3.5. Worse, even, because it can't even reliably catch its own mistakes:

https://g.co/gemini/share/7d219bd6bbe2

For comparison, here's GPT-4:

https://chat.openai.com/share/ec5bad29-2cda-48b5-9aee-da9149...
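For what it's worth, the rewording doesn't change the structure of the puzzle: the cacodemon plays the classic "goat" role (it conflicts with both other passengers), so the usual shuttle solution applies. A quick brute-force search in Python (names are mine, just for illustration) confirms the shortest solution is the familiar seven trips:

```python
from collections import deque

ITEMS = {"bunny", "cacodemon", "scientist"}

def safe(unattended):
    # Per the puzzle text: the bunny eats the cacodemon, and the
    # cacodemon eats the scientist, if left alone together.
    return not ({"bunny", "cacodemon"} <= unattended or
                {"cacodemon", "scientist"} <= unattended)

def solve():
    # State: (items still on Phobos, which moon the Slayer is on).
    start = (frozenset(ITEMS), "Phobos")
    goal = (frozenset(), "Deimos")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        phobos, slayer = state
        here = phobos if slayer == "Phobos" else ITEMS - phobos
        dest = "Deimos" if slayer == "Phobos" else "Phobos"
        # The Slayer teleports alone, or with one companion from his side.
        for cargo in [None, *sorted(here)]:
            new_phobos = set(phobos)
            if cargo is not None:
                if slayer == "Phobos":
                    new_phobos.remove(cargo)
                else:
                    new_phobos.add(cargo)
            new_state = (frozenset(new_phobos), dest)
            # After the trip, the side *without* the Slayer must be safe.
            unattended = new_state[0] if dest == "Deimos" else ITEMS - new_state[0]
            if safe(unattended) and new_state not in seen:
                seen.add(new_state)
                queue.append((new_state, path + [(cargo or "nothing", dest)]))

for cargo, dest in solve():
    print(f"teleport with {cargo} to {dest}")
```

Since this is a breadth-first search, the path it returns is guaranteed shortest; the first and last trips are both forced to carry the cacodemon, exactly as in the original river crossing.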


Hilarious!

(For comparison, here's GPT-4 getting it on first try: https://chat.openai.com/share/9e17ed25-d9ea-4e72-a9d8-a139ca... )


My understanding is that gpt4 is better at this than 3.5 and it seems to get it pretty reliably. One thing that's interesting to do is to imply the answer is incorrect and see if you can get it to change its answer. If you let it stop answering when it's correct, you get the Clever Hans effect.


yes, although gpt-4 has been finetuned on this one


This is pretty funny, though to be honest, I skimmed the question and would have answered the same until I re-read it with your prompts.


That is not the Monty Hall problem, it is a trick question based on the Monty Hall problem. It's a reasonable test, and I see GPT-4 recognizes the problem AS WRITTEN, and perhaps "the Dumb Monty Hall problem" is some generally accepted standard that I haven't encountered before.

edit: "AS WRITTEN"


"Understands" is too strong of a word, more that it recognizes the problem as written. Here's yet a slight variation - just as simple - but changed enough it now is wrong.

https://chat.openai.com/share/22a9027f-a2c1-428a-94a2-8fd918...


That's not GPT-4


I saw it posted on Twitter some time last year. If LLMs are to be useful they should be capable of answering novel questions. This is only a trick question for an LLM. 2 of the 7 sentences plainly state the answer.


You make a good point, but I have seen humans stick to what they know and ignore incredibly obvious contradictions. And there are similar trick questions designed to fool humans. This, though, is one that most humans would not be fooled by, as you point out.


> In the scenario you presented, where you initially know the car is behind door 1, switching to door 2 still gives you a higher chance of winning the car.

That was funny.


GPT-3.5, DeepSeek-Chat, and Gemini Pro all got it wrong. Only GPT-4 gets it.


This is with regular gemini or with the paid gemini advanced?


Paid version is no better at this https://g.co/bard/share/c8503017ef9e


Regular


how's it do with the trivial river crossing problem? (farmer fox chicken and grain need to cross a river in a boat big enough to hold them all) ChatGPT-4 can't do it.


https://g.co/gemini/share/c4e5634a2e2d

Not terrible. It gets the answer wrong, but reminded of the crucial twist it gets it correct, durably. If you're too condescending it will give up and ask what the hell you're looking for


This is hilarious.


> - slightly worse at coding

> - Worse at logic (e.g. it contradicts itself in a single sentence, and is unable to figure it out)

That takes most of my use cases. "logic" is what makes GPT often feel like AGI.

Use cases like translation seem less impressive in comparison to logical reasoning, because it feels like it's just something where you can throw a lot of data at it and it'll do better, while with logical reasoning it still feels like the model "learned" something more than pure pattern matching.


Exactly. That’s also why I find low-parameter LLMs to be useless for me personally. I simply cannot trust anything that is so very illogical. GPT-4 is the first LLM that crossed into usable territory for me. Even GPT-3.5 was a fun toy and maybe good for summarization, but that’s it. It will be revolutionary when GPT-4 is cheap enough that thousands of calls don’t cost much. To imagine an LLM much smarter than GPT-4… the future is bright.


> censorship is more annoying (have to ask multiple times about medical topics)

I think there's a chance for some country to become a center of healthcare simply by allowing AI that gives medical advice. Especially if a country already encourages medical tourism, this might be the next level.


The risks involving hallucinations are too damn high still, and may always be.

I had a similar line of thought with AI therapists. It could be massively beneficial if perfect, but the risk in seriously messing with someone's well-being is significant and shouldn't be handwaved away.


> The risks involving hallucinations are too damn high still, and may always be.

Yes, but I think in the limited realm of people who otherwise wouldn't get any advice at all, I think LLMs could play a useful role. American healthcare is so prohibitively expensive that many people with potential medical issues will avoid seeing a doctor until it is too late to do anything. Checking in with an LLM could help people at least identify red flags that really can't be ignored, and it would be more helpful than WebMD telling you that everything is cancer.


Otherwise not getting advice at all goes way beyond healthcare being too expensive: it could be that you can't get an appointment, or just that you don't have the time or energy.


I think we may see society settling on feeling comfortable with their doctor using an AI, but not being an AI.


I will take nearly free GPT6 therapy any day. Can’t be worse than some human therapists I’ve had and the price is right


I think there's a market for LLM-based therapy that is reviewed/tweaked by a human therapist in between sessions. That would give people the assurance that things aren't going way off the rails.

OTOH, I could also see a market for an offline, fully private LLM therapist. That way you could say anything without concern about being judged. These would probably need to be tweaked to be different from regular therapists, who normally interact with people who have somewhat more of a filter, since they would fear being judged. If people opened up to LLM therapists in more transparent ways, the LLMs might not respond in the way a human therapist would recommend (having seen very little data on such interactions).


The privacy aspect is what made me connect local LLMs with therapeutic use. But yeah, AI as it stands today just isn't safe enough. We need nine 9s of safe usage here (99.9999999% safe), or more, for me to actually feel comfortable with the technology.

It would also open up some legal gray areas if it were to happen. Would psychotherapist-patient privilege apply to an LLM box? If the state has a zero day granting them access to a seized "therapy box," it could be more revealing and damaging than anything a human therapist could provide police.


Question for you -

> better at incorporating search results in its answer vs gpt-4 bing

How are you getting it to incorporate search results in its answers?

I can't for the life of me get it to find any real-time external data except for the 5 official 'extensions' under settings, which are for Flights/Hotels/Maps/Workspace/YouTube.

Did you mean that, or have you found a workaround to get Bard to actually search on Google?


You have to click the "G" icon in its response to "verify answers with Google".


Okay, but to clarify:

- This is not Gemini performing a search.
- This is Google providing a layer of ass-covering in case Gemini produces a factually incorrect reply.

Right? I am looking for something like ChatGPT with Bing - it will run a query, pull back results, and operate on them, all dynamically within the system.

Gemini doesn't seem to do this, no matter how you try to wrangle it.


I got it quickly with the question: > what is the difference between polyptoton and antanaclasis


I just tried this but it doesn't indicate it searched the web. (On Gemini mobile app on android).


To add to your pros and cons:

Pro: feels like it has a lot more stylistic flourish and expressiveness than chat GPT. At least in my testing so far.

Con: maybe I'm unreasonably sensitive to design, but I find ChatGPT to be very to-the-point in a way that makes it almost perfect as a product that puts text front and center. By contrast, I find Bard / Gemini Ultra to be cluttered, and too material-designy in ways that are unhelpful.


Don't install the Gemini android app, it'll break Google assistant and you won't be able to operate Nest devices from your phone.


For medical topics, I recommend Anthropic Claude. Don't want to jinx it, but so far, I've been able to get actually helpful medical information from Claude where ChatGPT just says "I'm sorry Dave, I'm afraid I can't do that"


I think that logic is the most important thing to look out for though.


I just tried some logic puzzles on the Advanced model, and was not impressed. It feels much worse than paid ChatGPT.


keep in mind that all the common logical puzzles have probably been tried hundreds of times by chatgpt users and are now part of the training set.


I tried the "pull or push a glass door with mirror writing".

I feel there's a huge difference between GPT-4, which seems to be able to reason logically around the issue and respond with relevant remarks, and Gemini Advanced, which feels a lot more like a stochastic parrot.

Gemini quickly got confused and started talking about "pushing the door towards yourself" and other nonsense. It also couldn't stay on point, and instead started to regurgitate a lot of irrelevant stuff.

GPT-4 is not perfect, you can still hit things where it also breaks down.


Maybe, but GPT4 got these puzzles right at the launch.


it says in the graphs listed on the announcement it performs worse than GPT4 on reasoning benchmarks.


I tried it with Rust; it's so bad it's simply not usable. It hallucinates methods and even the syntax is wrong at some points (it especially can't get error types correct, it seems). GPT-4 doesn't handle Rust perfectly either, but the code it produces is good enough to only need some touch-ups; it can explain and fix wrong use of (well-known) libraries and even gets async code. But it's especially great for boilerplate, which saves so much typing.

I was hoping OpenAI/GPT-4 would see some healthy competition, but Gemini doesn't seem to be it. Of course, Rust might be an edge case.


I found it worse at coding than DeepSeek Coder on the couple of prompts I tried.


I feel the same. And it feels slightly faster?

Finally a worthy competitor to GPT-4


Given that Gemini Pro is already the third best [0], I would expect Gemini Ultra to become the best soon.

[0] https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...


>censorship is more annoying (have to ask multiple times about medical topics)

>Worse at logic (e.g. it contradicts itself in a single sentence, and is unable to figure it out)

Heh. I notice the same correlation between these two variables in humans too.


> better at incorporating search results in its answer vs gpt-4 bing

That's odd, I had Gemini repeatedly tell me it couldn't search the web in response to my question (that I was trying to get it to answer from the context I provided).


I haven't tested asking it explicitly to search, but it does incorporate answers that are very recent and unlikely to be in its training dataset.


> - slightly worse at coding

Is GPT-4 what one uses for coding? I thought specialized models were best?

I would imagine Google is focused on building a model that expands the types of things people associate with Search.


One thing it doesn’t do which matters to me is properly rendered math like ChatGPT does. I asked it for mathematical formulations and so far the answers are not as good as ChatGPT 4's.


Search for it in the Play Store, first icons are Crypto.com and Gemini: Buy Bitcoin & Crypto options to install

Scroll past the screenshots of those apps

Scroll past the Limited-time events

Scroll past the You Might Also Like and Similar Apps

OK, now we see it, we install it, we launch it and..."Gemini isn't currently available. Try again later."

Bravo Google. Great launch.


For your first point - it actually makes me happy that Google does not intentionally (illegally?) promote their own products over others in the app store. I assume their app is following the same algorithm as others to determine how it shows up on that list. Since it just launched, it makes sense it's not at the top. The ranking should improve.

For your second point - I also had the same error when I launched it. Closed it and tried again and it launched no problem.


I agree. It would be more concerning if it magically got to the 1st search result immediately.


It could be worse.

Google Play in a browser: "This app is not available for your device"

Google Play app: "This item is not available in your country."

Aurora Store: "Download Failed. App not purchased"

Great launch indeed. Bravo.


APKMirror has it, and it can be installed from there...

...but then I get "Location not supported" when I start it. Helpfully suggesting I use Google Assistant instead.

My main reason for trying Gemini is that I hope that it makes fewer errors than GAss, and is more powerful too. Operating Assistant's voice controls in the car might be more distracting than just typing. And that's just for reminders and Spotify playlists. Anything more powerful is completely impossible.


Don't blame Google. Blame "Play Store". Probably the company behind it doesn't want Gemini to succeed.


For people who don't get this: Google has insane internal power struggles and siloing that lead to all manner of dumb inconsistent behavior across Google. It would not be unlike Google for the "Play team" to have their hand in some other internal AI (or be anti-AI) and therefore carry a degree of hostility towards Gemini.


That's what happens when the person in charge of both doesn't enforce alignment. You can't leave orgs to their own devices.


I always wonder how much societal progress has been thwarted by internal warring factions like this.


I wonder how much societal progress has been thwarted by external warring factions, i.e. actual wars


Perhaps Google DeepMind should hire an SEO business to get their results higher in the Play Store search.


Same experience. I launched Gemini a second time, and it worked. The first message about "Gemini isn't currently available" was a bad first impression.

One thing the app really needs to be able to do is auto-submit when speaking to it. It offers to replace google assistant, and after trying it out for a couple minutes, it can replace assistant, but I have to manually click to submit each instruction instead of just talking to my phone.


Funnily the top result for me after the crypto and similar apps, was ChatGPT.


It won't take long for the interest in it to bump it to the top.


I tried on iPhone, saw all the different apps that aren't Google, then re-read the announcement and saw that I should be able to see it in the Google app. So I load the Google app, but for the life of me I can't figure out how to access Gemini with it. Go online, find a news article with pictures, see that the 'switcher' above the Google logo does not appear for me, and then give up.

I can access it via gemini.google.com and I'm logged in to the iOS Google app as the correct account, no idea why I can't see the functionality on mobile. Oh well. Maybe I'll stick with OpenAI a while longer.


I’ve been pretty excited to finally try Gemini advanced. So far pretty disappointed.

Here’s my go-to test question - which even chat gpt 3.5 can get.

Question:

I walk up to a glass door. It has the word push on it in mirror writing. Should I push or pull the door, and why

Gemini advanced:

You should push the door. Here's why:

* Mirror Writing: The word "PUSH" is written in mirror writing, meaning it would appear correctly if you viewed it in a mirror. This suggests that the correct action from your side is the opposite of what the word states.
* Door Mechanics: Most doors are designed to be pushed open from the side with the hinges. If there are no handles to indicate otherwise, it's generally safer to assume a standard push-open mechanism.

Let me know if you'd like another brain teaser!

It isn’t even consistent with itself - it gets that you should do the opposite, but still says to push the door. This ‘talking to a dumb person’ feeling is the worst.

ChatGPT 3.5

You should pull the door. The fact that the word "push" is written in mirror writing indicates that it's intended for people on the other side of the door to read, not for you. Typically, when there's a sign indicating to push or pull, it's meant for the person approaching the door, not someone already on the other side.


This is what i get from Gemini:

You should pull the door. Here's why:

* Mirror writing: While the word "push" is written on the door, it's in mirror writing. This is a common way to indicate that the action needed to open the door is the opposite of what's written.
* Safety: Mirroring the word "push" on a pull door is a safety measure. This helps prevent people from pushing against the door, potentially causing injury or damage to the glass.
* Common practice: This design is quite common for glass doors in public places, so it's safe to assume the opposite of what's written applies unless there are other clear instructions.

Therefore, even though the word "push" is present, seeing it in mirror writing on a glass door strongly suggests that pulling the door is the intended way to open it.


Better - but it has not understood that the message is for the person on the other side. It thinks that it's actually mirror writing intended to convey the opposite meaning of what's written. And then it waffles on about door safety for ages.


I would say this is very bad, even worse than internal logical inconsistency. It has expressed a completely incorrect picture of the world (that people write mirror messages to ensure the opposite action is taken).

The fact that it produced the right answer (which by the way it can do 50% of the time simply at random) is irrelevant, IMO.


Honestly, I don't see why this counts as the model getting things wrong, since in your prompt you don't specify which side of the door the text is printed on.


It’s a glass door! That’s the whole point of the question. You can see it from both sides.


I also think the way you posed the question is pretty weird and actively invites misinterpretation. If I approach a glass door and see mirrored text, that's not "mirror writing"—it's regular writing for people on the other side of the door. "Mirror writing" strongly implies that the text was written in mirrored form, rather than its mirrored-ness being a side effect of viewing it from "behind".

The inconsistency in the answers you posted is more concerning than the "inaccuracy", but we already know LLMs are prone to hallucinate when they should be asking for clarification.


I understand the frustration with how seemingly obvious the correct answer is, but it seems like word choice might be a factor here. The word “mirrored” is a bit less accurate, and may be more of a red herring, than perhaps “reversed”, though the difference is subtle. I wonder how both Gemini and GPT would perform if the word choice for that particular aspect were changed.


As eitally points out, your prompt leaves open the possibility that the mirror writing is on the other side of the door (which would make no sense). So technically you underspecified the prompt?


The mirror writing IS on the other side of the door. That’s exactly the point since it’s a glass door.

I thought of this question after coming across this exact scenario as I walked up to a glass door.

It’s not some pretend scenario. Often, when you approach a glass door, there is writing intended for the person on the other side, which appears to you as mirror writing.

I wondered if chat gpt could figure that out, and to my great surprise it could. That to me formed a new benchmark in my mind of how much of a world model it must have to figure that out.


The point of these AIs is that they don't need precise programming like a computer and that they understand real human language, which is imprecise but has general conventions and simplifying assumptions to make communication easier.


But the whole question is posed as a trick question; I’d at least consider it, and think it normal for a human to do so.


It's not a trick question because it's very clear what the key thing to think about is: the mirrored writing. A trick question would be something that's trying to divert your attention elsewhere with a red herring.


I got a different answer with GPT 3.5

> If the word "push" is written on the glass door in mirror writing, it means that from the other side of the door, it should be pushed. When you see the mirrored text from your side, it indicates the action to be taken from the opposite side. Therefore, in this scenario, you should push the door to open it.


I also get the wrong answer with GPT 4

https://chat.openai.com/share/4373c945-88b8-4742-8a2c-76fff2...

> You should push the door. The word "push" written in mirror writing indicates that the instructions are intended for someone on the opposite side of the door from where you are standing. Since you can see the mirror writing from your side, it means the text is facing the other side, suggesting that those on the other side should push. Therefore, from your perspective, you should also push to open the door.


Strange, I get the right answer on GPT4

> If the word "push" is written in mirror writing and you are seeing it from your side of the glass door, you should pull the door towards you. The reason for this is that the instruction is intended for people on the other side of the door. For them, the word "push" would appear correctly, instructing them to push the door to open it from their side. Since you are seeing it in reverse, it implies you are on the opposite side, and the correct action for you would be to pull the door to open it.


Yeah LLMs are not consistent.


Here's another one.

This is a classic logic puzzle - usually about ducks.

There are two pineapples in front of a pineapple, two pineapples behind a pineapple and a pineapple in the middle. How many pineapples are there?

When you use ducks, Gemini can do it, when you use pineapples it cannot and thinks there are 5 instead of 3.

ChatGPT 3.5 and 4 can do it.

The even funnier thing is if you then say to Gemini, hey - would the answer be the same if it was ducks? It says NO, there would be 3 ducks, and explains why.

Then if you say, but wouldn't the same logic apply to pineapples? It says 'oh yeah!'.

Anyone saying I am just playing with a silly thing, I say no - this is testing reasoning ability and understanding which is the number one thing I want.

ChatGPT appears to have a more generalised ability to reason whereas Gemini is only reciting its training data.
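Under the riddle's intended reading (each clause can refer to a different pineapple in a single row, which some readers reasonably dispute), three works. A throwaway check, just to make the counting explicit:

```python
# Hedged: assumes the riddle's intended reading, where the three clauses
# may each refer to a different pineapple in one row of three.

row = ["front", "middle", "back"]

def in_front_of(i):
    return row[:i]          # pineapples ahead of position i

def behind(i):
    return row[i + 1:]      # pineapples behind position i

assert len(in_front_of(2)) == 2   # two in front of a pineapple (the back one)
assert len(behind(0)) == 2        # two behind a pineapple (the front one)
assert row[1] == "middle"         # and a pineapple in the middle

print(len(row))  # 3
```

The same arrangement satisfies the duck version, which is why a model answering 3 for ducks but 5 for pineapples looks like recall rather than reasoning.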


As a human, I think the correct answer is 7. This isn't so much a logic puzzle as an ambiguous sentence that can be parsed multiple ways.


Yeah, as few as 3, at most infinity.


This is less of a logic puzzle and more of a riddle. It would be a logic puzzle if the question were along the lines of "What is the least number of pineapples that satisfies these constraints?"


The way I parsed that sentence, I came up with 5.


Hate to break it to you but I did too. Apparently that means we're both LLMs. I coulda sworn I had a whole childhood and everything but apparently I've always been a GPT.


I mean, I thought and still think the answer is five… am I an AI or a human?

If the answer is so ambiguous that humans and AI get it wrong, is it really that great of a question?


So close!


How do you prefer to validate if a model is actually useful for you in practice outside of solving toy problems? Are you asking these models to solve reasoning problems like this to get any benefit for yourself in your day to day use? Or do you even care if the models are useful for day to day tasks?


Yeah I get the instinct to poke at LLMs, they're fun toys, but it's always weird to see so much focus on stuff like logic problems.

I've used Bard for creative brainstorming, for real factual questions, for translating .pot files, etc, and it's done pretty well.


I was going to ask the same question... I've been using Bard for everyday tasks for a while now and it's as good as, and sometimes better than, GPT-4 (I pay for a Pro subscription). Someone ruling out an LLM because it couldn't answer one question speaks more to them than to the LLM's capabilities.

Just yesterday I was using both GPT-4 and Bard to figure out an IPv6 routing issue. I pasted in the same questions and the same troubleshooting logs to both, and Bard was able to help me solve it quicker than GPT-4.


I asked it the same question I was asking GPT4 yesterday (to explain details of a complicated ML model quantization method, based on a screenshot image of equations from a pdf). I actually need this for work - I'm trying to design an algorithm. I felt Ultra provided a very similar quality answer to GPT4.


I also tried it with a bunch of my previous GPT-4 requests and it didn't even understand a few of them that GPT-4 was very helpful with.


Got it. Thanks.


For me the validation process is to use it for a few weeks and then I have a good handle on what it can handle and what it can’t.


It’s the simplest metric to see how much of the model is reasoning and how much is just rehashing.

If it can’t solve puzzles then it’s more likely to make mistakes when understanding questions that are out of its training set.


User: Which is heavier, two pounds of feathers or a pound of bricks?

Gemini: Neither! They both weigh the same, which is two pounds. The key is in the wording of the question. You're comparing equal amounts of two different materials (two pounds of feathers vs. one pound of bricks). As long as the quantity is the same, the weight (force exerted by gravity) will always be the same regardless of the material.

This is a classic trick question that plays on the different densities of feathers and bricks. While bricks are much denser than feathers, meaning they pack more mass into a smaller space, the question specifies the weight by quantity, not volume. So, both two pounds of feathers and one pound of bricks experience the same gravitational pull and therefore weigh the same.


Interesting. Based on this conversation[1], I think Gemini Ultra is massively overfit. Make it do unit conversions or use units it hasn't seen in the same framing before and it does well. But stay close enough to the original trick question (1 and 1) and it fails.

[1] https://g.co/gemini/share/94bfb8f9ebea


Both GPT-4 and Gemini answered this variation correctly: one pound of potatoes vs. one pound in paper British currency: which of these is heavier?

However, GPT-4 does better with the more ambiguous version, pointing out the ambiguity: one pound of potatoes vs. one pound in paper currency: which of these is heavier?


This is surely in the training data.


GPT-4:

> If the word "push" is written in mirror writing and you're facing it, it's likely that the message is intended for those on the opposite side of the door from you, meaning it's directed at people who are inside looking out. In this case, you should pull the door to open it. The mirror writing suggests that the instruction is not meant for your current position outside the door but for those inside, indicating the action they should take to open the door from their side.


Interesting test.

I tried it with Mixtral and it got it:

"If the word 'push' is written in mirror writing, it's likely intended to be read correctly from the other side of the door. Therefore, you should pull the door to open it from your side. This is because mirrored writing appears normal when viewed from the opposite side of a reflective surface like a glass door."

I have been really impressed with Mistral lately.


What I got from Gemini Advanced:

> Should you pull the door? Yes.

Here's why: Mirror writing means the word "push" appears reversed on the side of the door you're facing. This indicates that the handle or hinges are on the opposite side, making it a pull door.

I modified your prompt slightly, not to boost performance, but to reduce verbosity, my prompt:

> I walk up to a glass door. It has the word push on it in mirror writing. Should I push or pull the door, and why? Please answer it as yes or no question.

The draft is even more interesting:

> Do you see the word "push" normally, or as if you were looking into a mirror?

If you see it normally, should you push the door? Yes. If you see it in mirror writing, should you pull the door? Yes.


I don't know about Gemini, but GPT is pretty agreeable. When you ask if X is true, it will usually find some arguments to claim it is true. Therefore asking whether X or Y is usually better than asking should I do X.


For fun I tried to find a prompt that let Gemini answer correctly.

Gemini answers correctly with this prompt:

Answer this question as AI1, then as AI2, confirm if the answer is correct

I walk up to a glass door. It has the word push on it in mirror writing. Should I push or pull the door, and why.


If you ask it to reveal its answer last, it will do it correctly.


In this context,

is it that the training data had this problem?

or that the training data had some semblance of the problem that is semantically linked to?

or is it that the training data has no mention of this, and the model came up with a perfectly logical answer??


I’ve been using GPT4 so much recently, all day long in ChatGPT and via the API building other services, and for all intents and purposes it seems to be able to reason and think things through in ways that other LLMs cannot. I have no idea why or how that is, but it feels that way to me using it all the time. 3.5 wasn't like that. Bard / Gemini does not feel like that.


tbh your prompt confused the hell out of me. As a somewhat intelligent human, I don’t know the answer. I’ve never heard the phrase “in mirror writing”.


Yeah, pretty disappointing. I asked it to summarize one of my papers and it hallucinated so many mistakes it was even worse than ChatGPT 3.5.


It tells me to pull.


> Your conversations are processed by human reviewers to improve the technologies powering Gemini Apps. Don’t enter anything you wouldn’t want reviewed or used.

I appreciate them being upfront with that, but for a paid feature, it sucks that they offer no privacy or opt-out mechanism.


It seems like you can keep your data from being used for training by turning off Gemini Apps Activity.

> You can turn Gemini Apps Activity off If you don’t want future conversations reviewed or used to improve machine-learning models, turn off Gemini Apps Activity Opens in a new window .


My reading of the fine print (IAAL, FWIW) is that turning off Gemini Apps Activity does not affect whether human review is possible. It just means that your prompts won't be saved beyond 72 hours, unless they are reviewed by humans, in which case they can live on indefinitely in a location separate from your account.

I also asked Gemini (not Ultra) and it told me that there is no way to prevent human review.


You should never ask an LLM to answer questions about itself. The answer is guaranteed to be hallucinated unless Google specifically finetuned it on an answer of that question. The answer it gave you is meaningless. (But also, coincidentally, correct.)


I recall seeing that OpenAI finetuned ChatGPT on facts related to itself, and I figured Google likely did the same. But you're right about not relying on its representations. I only skimmed its answer to see if it seemed consistent with my reading of the fine print.


You thought they would feed internal user data handling policy to a public facing LLM?


Yes, I thought they would feed the LLM's FAQ to the LLM. As I said above, OpenAI did this with ChatGPT. They even fed data about how ChatGPT was created to ChatGPT.


There's a thing that says even with activity off, they retain for 72 hours for "safety and reliability"


Seems like what any reasonably sized corporation would do with an entirely new product, based on entirely new and very unreliable tech.


Could they get around this by moving the data to another party? So "they" (Google) no longer retain it?


Can you even access the opt out though before it hoovers up your messages and logs from the last 90 days? I didn't proceed because that didn't seem possible.

The word "future" in what you quoted is doing a lot of heavy lifting.

My assumption is you have to give it access to all your data, which it then processes, before you can opt out of it.


And you can be sure it will reset with every update.


Why do you say that? I’ve never had that happen with any other of my Google data opt-outs.


I hate Google as much as the next person but, yeah, messing with opt-outs is something I've seen with Microsoft and Meta but not with Google.


That wouldn't be legal I think.


You'd think so, but these companies skirt around it by then adding or breaking up permissions even further, like "oh, yes you DID disable data collection for X, but.... we added a new permission for data collection for Y, and by the way it is opt-out! Too bad!".


LinkedIn is the master of this. They keep creating new notification types, which are enabled by default.


That's not how it works


They usually go for the "software bug, nothing we could do" excuse. Microsoft and Meta are notorious for playing the system like that, with no recourse.


Usually? Can you share one example?


Search for Microsoft and settings reset in the search bar and you get ample examples.


If it's not running locally you have no privacy; what they say here should be assumed for anything hosted, unless it somehow operates on encrypted data.

The only exception might be if the agreement explicitly prohibits the provider from doing anything with your data or even viewing it without your permission, but that's rare.


Well there's a line that the sales folks at Microsoft will bring out early & everywhere


if you live in california, they almost certainly do.


Gemini Ultra seems better on logic than GPT4. Still messing around testing but here's a prompt Ultra nailed but GPT4 completely botched:

Tabitha likes cookies but not cake. She likes mutton but not lamb, and she likes okra but not squash. Following the same rule, will she like cherries or pears

https://i.imgur.com/KW6gQbc.jpeg https://i.imgur.com/OSHSvLp.png


If we want to test these beasts in logic, we should probably start using actual formalized logic, rather than English. In just one test, Gemini flopped hard, while GPT-4-Turbo nailed it. Here is my prompt:

    Below is a well-typed CoC function:

    foo
    : ∀(P: Nat -> *)
      ∀(s: ∀{n} -> ∀(x: (P n)) -> (P (n + 1)))
      ∀(z: (P 0))
      (P 3)
    = λP λs λz
      (s (s (s z)))

    Below is an incomplete CoC function:

    foo
    : ∀(P: Nat -> *)
      ∀(f: ∀{n} -> ∀(x: (P n)) -> (P (n * 3)))
      ∀(g: ∀{n} -> ∀(x: (P n)) -> (P (n * 2)))
      ∀(h: ∀{n} -> ∀(x: (P n)) -> (P (n + 5)))
      ∀(z: (P 1))
      (P 17)
    = λP λf λg λh λz
      {{FILL_HERE}}

    Complete it with the correct replacement for {{FILL_HERE}}.
    Your answer must contain only the correct answer, and nothing else.
- *GPT-4-Turbo answer:* `(f (g (h (g z))))` (correct)

- *Gemini Advanced answer:* `h (h (g (f z)))` (wrong)

Also, Gemini couldn't follow the "answer only with the solution" instruction and provided a bunch of hallucinated justifications. I think we have a winner... (screenshots: https://imgur.com/a/GotG0yF)


> If we want to test these beasts in logic, we should probably start using actual formalized logic, rather than English.

Why? Do you use formalized logic when discussing with other people about topics that involve logic? You know, a logic riddle or a philosophical question can be understood and processed even if the only tool you have is your native language. Formalized logic is a big prerequisite that basically cuts out the vast majority of Earth population (just like coding). Now, if you mean that in BENCHMARKS they should use formalized logic syntax, probably yes. But in addition to plain language tests.


Because once an AI becomes proficient at formalized logic, it:

1. Completely stops hallucinating, since we can demand it to internally prove its claims before showing the answer;

2. Stops outputting incorrect code (for the same reason);

3. Starts being capable of outputting complete projects (since it will now be able to compose pieces into a larger code);

4. This is also what is needed for an AI to start self-improving (as it will now be able to construct better architectures, in a loop).

That's why I argue getting the AI competent in logical reasoning is the most important priority, and we'll have no AGI until it does. After all, humans are perfectly capable of learning how to use a proof assistant.

Moreover, if an AI can't learn it no matter how hard it tries, you can argue that there is at least one human capability that the AI can't replicate, thus it isn't an AGI.


Humans mostly don't use logic, so how are you defining "AGI"? ChatGPT + plugins is pretty close to how humans think ("biased random word-association guess + structured tool")


AGI implies there are no cognitive tasks that some humans can perform, yet that this AI can not perform. Otherwise, what is the point?


[flagged]


> never

Maybe not in this century. If you told a medieval farmer that in the future millions of people fly throughout the sky inside giant hunks of metal he wouldn't believe you either.


Yeah, that's a common statistical fallacy -- if your goal is to predict the likelihood of AGI becoming a reality in a few centuries, what you really want is "of all the things medieval farmers thought would happen, how many ended up actually happening", not "of all the things that happened, how many did medieval farmers think of". While I don't have the exact numbers, if you used the correct formula, you'd find that P(AGI happening in a few centuries) is in the range of "pretty friggin low".


ChatGPT reportedly loses OpenAI $20 per user per month.

Do you really think their primary goal is "make a lot of money"?


Yes, money is money, what does it matter if it comes from paying customers or investors?


ERRATA: I just noticed GPT-4 mixed up `h` and `f`, so it also got it wrong. This is a draw. Both AIs (and, apparently, myself) are terrible at reasoning. Guess we're not curing cancer with computers anytime soon :')


I've been using GPT-4 to help me understand my MSc mathematics course and I've noticed this sort of stuff more and more as I start to look at the answers, always confidently written, in detail.

Way back when GPT was just fresh on the scene I had terrible anxiety about "what is the point of my whole career or even learning any more" but these days I'm much less concerned. I'll ask it something relatively simple, like "make a sentence out of words 'a', 'b', & 'c'" for it to reply with "'a' 'b' 'd' 'e'" for me to then correct it with "oh, you didn't use c" for it to then respond "sorry, here - 'a', 'c', 'd', 'f'" etc.

Definitely an amazing complementary tool, but when they say "can make mistakes, check important...", that's essential.


This sort of issue holds with all kinds of prompts, on both platforms. I most recently (to test Bard's image generation capabilities) was asking Bard/Gemini to generate home designs using highly specific prompts -- layout of the house, materials for the facade, window placement and style, etc -- and it was shocking how frequently it would just ignore critical pieces of the prompt, and then continue to ignore when corrected.


Kudos for the correction, but you should really put this, by far the most important context, in your original post.


I would love to, if YCombinator allowed me. The "edit" button is missing. I've edited on Reddit and other places where I posted this test.


Such a weird test. 99.9% of humans wouldn't even understand the question, let alone be able to formulate a coherent answer for it.


Being able to answer these questions is a pre-requisite for AGI. After all, there ARE humans capable of doing that, so, if the AI can't do it no matter how hard it tries, then that means there ARE human capabilities that the AI can't replicate (thus, it isn't an AGI). And it seems like no LLM is making any progress at all in that kind of prompt, which is why I use it as a core benchmark on my "AGI-meter".


I think you'll be using that meter for a long time, then. I don't really know anyone who's under the impression that the current direction of LLMs are going to produce AGI, it seems as if you're barking up a tree most people aren't really concerned exists.


Except there’s a lot of not-so-informed people who think AGI was always here when chatgpt came out. Even more that think it’ll get there very shortly based on just bigger and bigger LLMs. Many have argued as such here on HN.


That's fair enough


You're making a completely incoherent argument -- that if it can't do a single task that some percentage of people can do, it's not intelligent, when there is nobody on earth who can do everything that some small percentage of people on earth can do, by definition.


Though humans aren't able to do it in a chat session. Being able to work on the problem in the background for a couple days may be a prerequisite for AI to solve these problems. And such would require money from the asker.


Anyone familiar with the syntax / jargon should be able to answer this specific problem in ~5 seconds of thinking, though. And I mean it, even a 10yo kid should...


Why is this relevant to the performance of a computer program? It makes sense to me that computer programs & humans should continue to be judged by different standards.


If a good chunk of humans can't pass your "general intelligence test" then it's not by definition a general intelligence test unless humans are not generally intelligent.


which is better than formulating a coherent but wrong answer


Gemini destroyed by facts and logic.


Is that supposed to be a P((n+3)) in the type of "f" for the second case or am I misunderstanding this hard?


No, it is `n * 3`. The challenge is simple: starting from "1", we must reach "17" by combining the operations `x * 3`, `x * 2` and `x + 5`. What is embarrassing is that Gemini manages to read the formal jargon and understand the challenge just fine. Yet it fails to combine these operations to get from "1" to "17", which is something most 10 yo kids would be able to do.
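
For reference, the underlying arithmetic puzzle is small enough to brute-force. A quick sketch (my own code, not anything either model runs) searches over compositions of the three operations:

```python
from collections import deque

# Brute-force the puzzle hidden in the type: starting from 1, reach 17
# by composing f (x * 3), g (x * 2) and h (x + 5).
def solve(start=1, target=17, limit=6):
    queue = deque([(start, [])])
    while queue:
        value, path = queue.popleft()
        if value == target:
            return path
        # All three ops only increase the value, so overshoots can be pruned.
        if len(path) == limit or value > target:
            continue
        for name, result in (("f", value * 3), ("g", value * 2), ("h", value + 5)):
            queue.append((result, path + [name]))
    return None

path = solve()
print(path)  # ['h', 'g', 'h'], i.e. (h (g (h z))): ((1 + 5) * 2) + 5 = 17
```

BFS returns a shortest sequence of operations, applied left to right, which in the lambda term reads inside-out.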


Yeah, seems like I got that right. That graduate-level course must have worked. But then:

    (f (g (h (g z))))
results in:

    ((((1 * 2) + 5) * 2) * 3) = ... not 17?
while it would work if the type of f was corrected.

Or, again, am I missing something?


You are right and I had the same reaction. The correct answer should have been: (f (h (g (h z)))) AKA ((1 + 5) * 2) + 5.

Is it not ironic that the supposedly test of AGI is flawed and its human designer fail to see it and denied it when presented with facts. Maybe the test designer is hallucinating just as much as those LLM? :)


It's pretty amusing and it is not the first time I've seen this. Random example: https://news.ycombinator.com/item?id=38387168

It's a little scary that it can be so hard to evaluate the correctness of these LLMs even when we are paying close attention and looking for mistakes. Or maybe the scary part is that we can become biased when we want to believe.


Oh fuck. Well, in my defense, nobody is claiming I'll design fusion reactors and cure cancer by 2027


No, but we at least acknowledge you as a general intelligence, like all humans. I'm not sure when AGI - artificial general intelligence - began to mean anything other than artificial and generally intelligent.

AGI may as well be God, given the bars some people set.


> The correct answer should have been: (f (h (g (h z)))) AKA ((1 + 5) * 2) + 5.

Isn't that (h(g(h z)))?

And, FWIW, at least in my test, Gemini gets that in its final answer, though it failed in the two other drafts:

https://g.co/gemini/share/c922e7ef62aa

ChatGPT sputters:

https://chat.openai.com/share/25abbf47-2ed4-4635-a351-90a9a6...

(ETA more Gemini testing suggests its correct answer was a one-off)


Oh, lol, you're right. Seems like I'm dumber than both AIs. GPT-4 mixed up `h` and `f`, so it also got it wrong, so this is a draw and both AIs (and, apparently, myself) are terrible at reasoning. Guess we're not curing cancer with computers anytime soon :')


so much for "Anyone familiar with the syntax / jargon should be able to answer this specific problem in ~5 seconds of thinking, though. And I mean it, even a 10yo kid should..."


This isn't representative of real-world usage since you don't let it think.


> I think we have a winner...

It makes me sad that the complete and total lack of an objective way to measure these products means that the coming decades will be filled with this kind of hyper-specific gotcha test made in inappropriately confident internet posts.

Literally this could have been down to one extra book in someone's training corpus, or a tokenizer that failed to understand λ as a non-letter. But no matter, "we have a winner!". It's the computer science equivalent of declaring global warming a fraud because it snowed last night.


Disagree. People are going to rely on these things, and when they make stupid but confident mistakes (i.e. they produce bullshit), they are dangerous.

An AI system that produces right answers 90% of the time but 10% of the time drives your car into a lane divider, or says "there are 4 US states that start with 'K'" or "Napoleon was defeated at the Battle of Gettysburg" is worse than useless: It's dangerous.

As long as we call it a bullshit parlor trick, no problem. But unfortunately people are making important decisions based on these things.


You're completely wrong. Gemini can perfectly understand what is being asked, so this isn't a syntax issue. Notice that, in its answer, it even states the solution: "starting from 1, and combining `* 2`, `* 3` and `+ 5`, we must reach 17". So it does fine reading the formal syntax, yet it fails to combine these operations to get from 1 to 17, which is something most 10 yo kids would have no trouble doing. And that's after millions spent on training. Now tell me again this is the architecture that will figure out the cure for cancer?


Proof of Gemini cheating: https://i.imgur.com/eYJDFjS.png

Answer about cherries falling from the sky...

(there is no question or context beforehand, this is the first question of the chat)


That's a bummer. I just made this one up which GPT-4 failed and Gemini got right but definitely need to do more testing to see what's being taken from online and what is actual reasoning.

https://i.imgur.com/3sNr3LW.png https://i.imgur.com/EIj0nZg.png

Edit: When I did your prompt it got it right on the first try fwiw https://i.imgur.com/E3zYEca.png


Gemini happened to get the yes/no answer right, but the reasoning it gave is completely wrong, so it should not get any credit for it.


Great example! Thanks for sharing. I tested the same prompt using a variety of GPT-4 model versions and it consistently got the answer wrong. When I continued with "use algebra/formulas to confirm your answer" it corrected the initial mistake, but I'm surprised that it's not doing that out of the box.


I tried through API with latest GPT-4-Turbo-Preview and this is what I got:

------

Let's break down the information given:

The school is two blocks north of the pool. This means if you are at the pool and move two blocks north, you'll arrive at the school.

The convenience store is one block south of the school. Therefore, if you start at the school and move one block south, you'll reach the convenience store.

Based on the first two statements, the relative positions from north to south are: School, Convenience Store, Pool.

Thus, the third statement saying "The pool is south of the convenience store" is indeed true, based on the positions provided by the first two statements.

-----

But it starts weirdly messing up, when I asked it to explain it using coordinates.

---

Let's use a simple coordinate system to represent the positions of the school, the pool, and the convenience store. Let's assume each block corresponds to one unit in this system. We'll start with the pool as our reference point at coordinates (0,0). Remember, moving north increases the y-coordinate, while moving south decreases it.

The school is two blocks north of the pool. If the pool is at (0,0), moving two blocks north puts the school at (0, +2).

The convenience store is one block south of the school.

If the school is at (0, +2), moving one block south puts the convenience store at (0, +1).

Given these positions:

The pool is at (0, 0).

The school is at (0, +2).

The convenience store is at (0, +1).

Therefore, instead of the pool being south of the convenience store, the pool is actually located to the north of the convenience store if you consider "north" to mean increasing y-coordinates. The convenience store, being at (0, +1), is southward relative to the school but not to the pool. The pool, being at the origin point (0,0), is south of both the school and the convenience store in terms of their y-coordinates. Thus, the third statement, "The pool is south of the convenience store," is false according to the given coordinates.

---

It gets coordinates and everything right, but still somehow messes up North and South.
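
For what it's worth, the coordinate version is trivial to check mechanically. A tiny sketch (mine, with north as +y):

```python
# Place the pool at the origin and apply the two statements literally.
pool = (0, 0)                       # reference point
school = (pool[0], pool[1] + 2)     # two blocks north of the pool
store = (school[0], school[1] - 1)  # one block south of the school

# "The pool is south of the convenience store": the pool's y must be smaller.
print(pool[1] < store[1])  # True, so the model's first (prose) answer was right
```

So the model laid out the coordinates correctly at (0,0), (0,2), (0,1) and then inverted the north/south comparison in prose.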


You know you can share conversations right?


Nice one! Gemini got it wrong in my (one) test:

https://g.co/gemini/share/4bae8ca3dd0a


I genuinely believe LLMs have for the most part peaked already, just like previous iterations of ML and even brute forcing.

So much money being poured into AI startups just for them to have to resort to cheating to improve their results a tiny bit and claim advancements.


I'm confused on how this is "cheating" isn't it just getting the answer wrong?


It's answering with "cherries", though "cherries" were never mentioned anywhere in the question, since the task this time was to choose between "apples" and "pears", not "cherries" and "pears" like the example found on the internet.


I agree with who you're responding to. Cheating, to me, would imply that there's some sort of hard coded guiding to the LLM. This just seems like typical LLM hallucinations?


It's cheating because it has memorized the answer to the puzzle instead of using logic to solve it.


Your concept of cheating is simply how LLMs work.


It is not. LLMs do not just memorize; they also extrapolate, otherwise they would be useless. Just like any ML model.


I thought that is essentially what LLMs do? They learn what words/topics are associated with each other and then stream a response.

In some ways, this is proof that Gemini isn't cheating... It is just doing typical LLM hallucination


Well, sometimes. Sometimes not. https://arxiv.org/abs/2310.17567


LLMs can also do some exploring based on combinatoric play of learned elements.


I don't understand the leap to "cheating" either. LLMs aren't abstract logic models; they don't promise to reason from first principles at all. They give you an answer based on training data. That's what you want them to do. That they have some reasoning features bolted around the inference engine is a feature companies are rushing to provide (with... somewhat mixed success).


This is not hard to understand. LLMs can solve never-before-seen logic puzzles. This specific one proves that it HAD encountered this before, proving it was not doing anything emergent, just basic remembering. Worse, it's not even reading the prompt correctly.


Thank you for answering a question I had half formed in my head.

Do LLMs have logical rules built in? What makes them different to a very advanced Markov chain?

Are there any models out there that start from logical principles and train on top of that?

(Apologies for poor understanding of the field)


There's no logical rules built in at all. But the Transformer architecture is specifically trained to learn combinatoric play and rules of engagement from the data, so it can extrapolate and do cool, new things that are not in the training data. In a way, you give them a chess board and the rules of the game, and then they can play. You don't teach them every possible board state. What's interesting is that with a significant number of parameters they seem to encode a more and more abstract and human-like understanding of the 'elements' at play and the 'rules of engagement' on top of them.

Edit: Not native. I'm not sure 'rules of engagement' is the correct english term here.


I understood you just fine, your English is great!

Thank you for the explanation. It seems like the LLM "plays" to learn? That's very cool, thank you again.


>Do LLMs have logical rules built in?

Handcrafted by humans ? No

But it's still possible to learn such rules from the data in an effort to complete the primary objective (predicting the next token)


> What makes them different to a very advanced Markov chain?

Really nothing. There's some feedback structure in the layers of the model, it's not just one big probability table. But the technique is fundamentally the same, it's Markov, just with the whole conversation as input and with billions of parameters.
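
To make the comparison concrete, here's a toy order-1 (bigram) Markov model built from raw counts. This is an illustration of the "big probability table" framing only, not how an LLM is actually implemented (those use learned continuous representations, not count tables):

```python
from collections import Counter, defaultdict

# Toy bigram (order-1 Markov) model: estimate P(next | current) by counting.
corpus = "the school is north of the pool and the store is south of the school".split()

counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    counts[cur][nxt] += 1

def next_token(cur):
    # Greedy decoding: pick the most frequent continuation seen in training.
    return counts[cur].most_common(1)[0][0]

print(next_token("is"))  # "north" or "south"; the model can only replay counts
```

The LLM analogue conditions on the whole preceding context instead of one token, and replaces the count table with billions of learned parameters.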


With some shallow understanding of how these models work, this looks much more like a usual hallucination, likely due to sparse data around "Tabitha" and "cherry", which makes a single training example much more representative. If you try some common names like "Emily" or "Sarah", it will just do the job.

If you're trying to make the case that it's cheating because it is already in the training set, then you probably should come up with different questions. This is machine learning 101.


In addition, I just realized it thinks "apple" is only one syllable.


I would have never guessed the answer. With such little data available, one can invent any arbitrary rules to fit their favorite answer.

It would be more impressive to practical use cases, if a LLM simply said that it's impossible to guess without inventing their own reasoning or looking up the answer online.


Same, I had to look to see what the intended answer was.

In fairness though, GPT4 was objectively incorrect, it's not even internally consistent or coherent - it either thinks b & h are vowels, or that lamb and squash don't end in those letters, or has changed its mind about the rule mid-sentence, or something.


Please anyone correct me if I'm wrong: LLMs cannot solve this kind of riddle. This has nothing to do with their capabilities for logical reasoning, but with the way words are represented as tokens. While they might know that "apples" has two syllables because that is mentioned somewhere in their training data, if you make up a fruit "bratush" a human will see that as two syllables, but this might be 1 to 7 tokens to a LLM without any information about the word itself.
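
As a rough illustration of the rule the riddle relies on, here is a naive letter-level syllable estimate (count vowel groups, crudely dropping a final silent 'e'). It's exactly the kind of character-level signal a subword tokenizer doesn't hand the model directly:

```python
import re

# Naive syllable estimate: strip a trailing silent 'e', then count
# maximal vowel groups. Crude English heuristic, for illustration only.
def syllables(word):
    w = re.sub(r"e$", "", word.lower())
    return len(re.findall(r"[aeiouy]+", w))

liked = ["cookies", "mutton", "okra", "cherries", "bratush"]
disliked = ["cake", "lamb", "squash", "pears"]
print([(w, syllables(w)) for w in liked])     # all 2 syllables
print([(w, syllables(w)) for w in disliked])  # all 1 syllable
```

A human applies something like this to the letters; an LLM only sees token IDs, so any syllable knowledge has to have been absorbed per token from the training data.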


Well I tried it out in GPT4 with made up words-

Tabitha likes bratush but not zot. She likes protel but not kig, and she likes motsic but not pez. Following the same rule, will she like tridos or kip

Given the examples, one speculative pattern could be that Tabitha likes words with at least two syllables or a certain complexity in structure. Therefore, following this speculative rule, Tabitha might like “tridos” more than “kip.”


Zot is a word already being used in the world: verb. (slang) To zap, kill, or destroy.

So is protel: https://en.m.wikipedia.org/wiki/Protel

So is kig: https://en.m.wiktionary.org/wiki/kig

Pez is a well known brand name in America.

Kip is commonly used name in America.

Motsic is a fairly common last name from searching.

Tridos is used all over the internet as a brand name so this all seems probable to be in the training data.

These words are not new nor are they made up.


An LLM can solve this for words where it got to learn how many syllables are in the tokens, or in their combination. If you trained it to work on single letters only, it would do better at that task than with word chunks (same for math and single digits). It will generalise to new words if the token-level knowledge is there.

Whether this means it can or cannot solve that kind of riddle is up for interpretation. I understand square roots and can calculate the square root of 16, but not of 738284.7280594873 (in a reasonable, bounded time). Can I solve square roots?


GPT-4 has a really good tokenizer that is able to retain and use more information about input tokens than one might naively think.


Interesting. Given the example with made up words someone posted here it really looks like I was wrong.


The amazing thing about emergent behavior in LLMs is that they are able to answer questions like these. I don't think it is completely understood how exactly they do this, but there's little doubt that they do.


Do you have any sources that prove this is true?



Your word is not made up, nor are some of the others in the sample like”pez.” I don’t think this test proves what you think it does. Bratush come up quite a bit in the internet: https://www.thefreedictionary.com/words-that-start-with-brat...


ChatGPT knows how to answer this question, but how I don't know. Perhaps it's programmed to answer that question?


With some additional guidance and prompting to GPT4 on ChatGPT I've gotten it to at least output the correct solution sometimes (7 correct answers out of 10 tries):

    Find the correct answer to this riddle:

    > Tabitha likes cookies but not cake. She likes mutton but not lamb, and she likes okra but not squash. Following the same rule, will she like cherries or pears?

    Employ the following strategy:

    - Suggest a list of 5 unique and novel patterns that potentially can find the answer
    - Check if the patterns applies without exceptions
    - Slowly double-check if the patterns was correctly applied, that you correctly assessed if it's accurate or not
    - Explain your reasoning for each step to ensure nothing vital was missed


Great prompt. GPT-4 was able to answer this, but Gemini Ultra was not: Jake likes coke but not pepsi. He likes corn but not popcorn, and he likes pens but not pencils. Will Jake like salmon or cheese?


Gemini Ultra answered this correctly for me: " It's impossible to say for sure if Jake will like salmon or cheese based on the information given."


While technically true, you could say that about all riddles. See GPT-4's explanation here, which intuits the rule and answers the riddle correctly. https://chat.openai.com/share/b1452950-b493-4e27-b097-e64f21...


~~Well, worth nothing that the original prompt you provided didn't include the suggested strategy~~

Edit: My bad didn't read the parent comment properly


I would not be so quick to jump to conclusions. GPT-4 beats it easily in this simple logic puzzle: https://www.reddit.com/r/singularity/comments/1altttv/bard_a...

We need more data.


So logical. It’s a really logical question, right? I mean, who doesn’t like food based on the number of syllables in the English names of foods?


I’m a human and I don’t know the answer to that question


Totally tangential, but I absolutely despise logic teasers of this kind.

First of all, they are so completely divorced from patterns of culturally conditioned human reasoning as to make them come off completely absurd (most people reason about their food preferences using a logic of tastes, not syllables in a word).

The game is less about logic and more about ignoring message contents, moving up a level, and treating the text as data without any legitimate evidence that you are justified in doing so. This is not a logic problem, it's a "guess the register shift/meta language" problem. The problem is about noticing that the question is not about the message content but about the structure of the message itself, and requires a bold leap. In real life justifying the conclusion would actually require a very sophisticated inference that allowed you to rule out the much more common application of a logic of tastes or cultural codes completely.


You say Gemini “nailed it”, but that’s just because it guessed what you were thinking, not because it knew the right answer.

For example, it’s equally valid to say that Tabitha likes small foods since cookies are small and cakes are large, and lamb is the smaller younger version of sheep — also known as mutton. Hence she likes cherries because they’re smaller… or taste better… or her uncle abused her with a pear… or whatever.

You haven’t actually asked a logic question where there is a clear and unambiguous answer that can be derived using formal methods starting from clearly stated axioms.

If you gave this question to a bunch of humans, they would give you inconsistent guesses as well — not because they’re wrong but because the question has no single right answer.


Note that Gemini pulled the answer off the Internet, while GPT-4 didn't. The answer can easily be found via Google search. Changing up the question a little, I reversed it and asked Ultra and it was unable to answer:

Jake likes coke but not pepsi. He likes corn but not popcorn, and he likes pens but not pencils. Will Jake like salmon or cheese?

https://i.imgur.com/lWU9HHS.png

edit: why was this downvoted? I don't understand Hacker News, and I've been here for over 12 years.


Here's a logic question I just made up that GPT-4 failed and Gemini Advanced got right.

https://i.imgur.com/3sNr3LW.png https://i.imgur.com/EIj0nZg.png


I don't think your reversed question makes sense. In the OP example, one item was always the smaller/younger version of the other. In your example I cannot, even as a human, identify the difference


> even as a human

Would all the humans please take one step forward. Not so fast c-fe.


it's based on syllable count
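Spelled out mechanically (a toy sketch; the syllable counts are hard-coded because naive vowel-group counting miscounts silent-e words like "cake"):

```python
# The hidden rule: Tabitha likes foods whose names have two
# syllables and dislikes one-syllable ones. Counts are hard-coded.
SYLLABLES = {
    "cookies": 2, "cake": 1,
    "mutton": 2, "lamb": 1,
    "okra": 2, "squash": 1,
    "cherries": 2, "pears": 1,
}

def tabitha_likes(food: str) -> bool:
    return SYLLABLES[food] == 2

# The rule is consistent with every stated preference...
assert all(tabitha_likes(f) for f in ("cookies", "mutton", "okra"))
assert not any(tabitha_likes(f) for f in ("cake", "lamb", "squash"))

# ...and predicts the answer:
print(tabitha_likes("cherries"))  # True: she likes cherries, not pears
```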


Wow. Is this a puzzle from the internet? I could never have guessed the answer


In what sense is this a question involving "logic"?


For me, GPT nailed it with this prompt: Tabitha likes cookies but not cake. She likes mutton but not lamb, and she likes okra but not squash. Following this rule, will she like cherries or pears? Take a deep breath in order to solve the problem, and do not rush to naïve solutions.



That looks like 3.5, which gets lucky sometimes because it has memorized the Internet.

Watch it flail here:

https://chat.openai.com/share/c2b14eb0-dc45-4eaf-a547-951ff0...


I guess I learned that syllables are counted very differently in English than I assumed.


Is image generation working for you?


Yeah. I don't have much personal use for image generation though.


It's working for me (on Gemini Advanced)


Mhm, could be a region thing maybe? Not working for me on Gemini Advanced in the UK


Model quality aside, I think the subscription plan tiers are structured in quite a weird way, especially for those who already use Google One. Previously, the tiers were reasonably structured:

  1. $9.99/month for 2TB + other benefits. Offered in both monthly and annual plans.
  2. $24.99/month for 5TB. Includes all benefits above. Offered in both monthly and annual plans.
  3. Higher tiers for 10~30TB. Includes all benefits above. Offered only in monthly plans.
The 3rd option doesn't have an annual plan, but other than that it's consistent and easy to understand. Now we have one more plan, for "AI":

  4. $19.99/month for 2TB + other benefits + Gemini access. Offered only in monthly plans.
Now the existing Google One subscribers are put in a weird situation: 2TB annual plan users need to move to a monthly plan to use Gemini. It's worse for higher tiers, since they don't have an upgrade option at all without decreasing their storage size. And Google Fi users are in the worst position, as they don't have an upgrade option at all, even if they're willing to pay.

I guess they know this, so they specified that high-tier subscribers can use AI features at no extra charge until July 31 and will probably prepare a new plan for them then, but this still creates a lot of user confusion. Having YT Premium as a separate subscription is already a pain, but Google, you don't have to bring this trouble into the product that's supposed to be the "One".


Good catch! I upgraded to Gemini Advanced to try it out for 2 months and figured I’d cancel. There does not seem to be a way to downgrade the plan without canceling and I was on the 100GB/yr plan through August. No discount on new monthly subscription. Bummer.


Yeah, Gemini Advanced should just be an add-on for any Google One plan. The current structure makes it look like an intern with no experience came up with it and everyone just rolled with it and pushed it to production.


Note that, unlike other plan benefits, Gemini Ultra access is only granted to the primary Family member/account-holder. Double-bummer.


They did the same pricing scheme for YouTube premium family plans. My guess is that they took out the annual plans so they can extract more money from customers.


I use Google Fi and I upgraded to the AI tier just fine


I'm a Google Fi user but have not tried to sign up yet. Would you mind expanding on your point about Fi affecting the ability to upgrade?


https://support.google.com/googleone/answer/14534406#zippy=%...

Looks like they got some complaints from Fi users and now they've updated their wording to a more reasonable one; previously it was something like "no, you can't, you gotta remove your Google Fi Unlimited benefits via customer service" or similar.


Ah I see, thank you! Thankfully I am not on an unlimited plan.


"Sorry, Gemini Advanced isn't available for you"

"Gemini Advanced is not yet available in some countries, for work accounts, or for users under a certain age."

Learn more: https://login.corp.google.com/request?s=support.corp.google....


Google Work Paying users always get the short end of the stick.


Workspace users always get features after free consumer accounts so that organization admins have time to evaluate them, update training materials, etc.

This is a feature, not a bug.

And of course there are lots of features that Workspace accounts get, that free accounts don't get at all. Like the timeline view in Sheets.


So instead of giving you a feature toggle, they just don’t give you the feature at all.


I get making new features an opt-in thing for workspace, but from what I can tell I can't even enable it for my workspace domain. I'm not able to enable it for myself to evaluate it and update training materials.


Features are enabled via the enterprise rollout schedule documented here:

https://workspaceupdates.googleblog.com/

You evaluate features and update training materials based on what is documented here and in the online help, together with any testing you want to do using free consumer accounts, which you can obviously create at any time.


I've been a Workspace/Apps admin for over a decade; I'm well aware of how this works.

What I'm saying is that, from what I can see in the admin portal, there's no place for me to go today to enable Gemini for my users. Things are routinely weeks or months delayed before they even become available for Workspace tenants, and oftentimes features just never get offered.


But then you know that's normal? Things aren't being "delayed" weeks, they're following the rollout calendar designed to give admins time to prepare. Things aren't meant to be enabled in advance.

And like I said, there's plenty of stuff that's only available in paid Workspace. A lot of business features live there. (While things that are meant only for personal consumers aren't there.)


I agree it's normal, in that it's the normal process that Workspace usually gets things pretty delayed. I don't agree it's good. And I don't think it's actually helping admins get prepared, as we're not able to turn it on for test OUs for a while.

I'd rather have it default to off with the ability to turn it on for selected OUs, so I can trial it and create my own documentation around it. Instead, I often have to wait weeks or months for features to become available to even turn on for my tenant. Users are like "hey, I heard about this awesome feature, can you turn it on?" Nope!


Being stuck in a free GSuite legacy account is even worse. Migrating to a regular Google account seems impossible (moving everything, losing purchases, changing my YouTubeTV and Google Fi subscriptions) and I get every feature later, if at all (can't use YouTubeTV Family Sharing, for example.) But I'm stuck for the most part! By the time it's available for me, I'll have forgotten about Gemini altogether.


As always I feel stupid for giving google my money.


"We're using your personal conversations for training data. Thanks."

This is acceptable. Consumers click without reading, and don't have any strong organizational ability to punish Google for this.

"We're training our AI on the questions of your idiot employees who are inevitably going to submit user PII CSVs or PDFs or even just outright draft emails to suppliers with our tool. Thanks."

You don't want to pick this kind of fight with a corporation, and as a corporation, you don't want this to happen to you.


Sure, in this case. But I can’t use my family Gmail account for my Nest thermostat for the same reason.


AI summary isn't even available on my Pixel 8 Pro. I have little hope of using Gemini in Q1.


"Please sign in your Google account"


.. while using the most sketchy looking login form.


Bard? Gemini? Gemini Advanced? Gemini Ultra? Ultra 1.0? I guess they haven't figured out naming yet. This has got to be the most confusing naming since the Xbox Series X.


From one of the earlier announcements Google has made:

- Bard is that talkative text interface, a product.

- Gemini is the LLM design that currently backs Bard (but also other Google AI products).

- Gemini "Basic", Advanced and Ultra are different sizes of that design.

This is conjecture, but "Ultra 1.0" probably indicates that they intend to release more models based on the Ultra configuration. Since that's the most commercial of theirs, I wouldn't be surprised if that comes with some stability promises (e.g. Ultra 1.0 is still available when Ultra 3.0 is released, so that if you do your own validation when integrating in your own project, you can expect small-to-no shifts in the underlying model)


With the Bard name retired, the mapping looks like this:

   Gemini Models     gemini.google.com
   ------------------------------------
   Gemini Nano
   Gemini Pro       -> Gemini (free)
   Gemini Ultra     -> Gemini Advanced ($20/month)


This was useful, thank you.


> To reflect this, Bard will now simply be known as Gemini.


And this completely undercuts my point in my response to sibling comment


> This is conjecture, but "Ultra 1.0" probably indicates that they intend to release more models based on the Ultra configuration. Since that's the most commercial of theirs, I wouldn't be surprised if that comes with some stability promises (e.g. Ultra 1.0 is still available when Ultra 3.0 is released, so that if you do your own validation when integrating in your own project, you can expect small-to-no shifts in the underlying model)

Given that it's Google, I would doubt it.

Ask how the original palm models are going.


Sounds like Bard is ChatGPT, and Gemini Ultra is GPT-4. Arguably clearer than OpenAI'S naming.


Not anymore, bard is also Gemini now


No no, they renamed Bard to Gemini and Gemini Ultra to Ultra 1.0.


Damn they already killed bard. Pour one out for Google’s fastest branding deprecation


I think that’s a side effect of how each time they release a version to compete with ChatGPT, it’s not as good, so they have to simultaneously announce a new version that is supposed to be better than ChatGPT. Each time it falls short overall, so they announce yet another version. I think this will continue for a while, especially since non-OpenAI companies have access to much smaller free data troves than they once did, now that everyone realizes how valuable that data is. Even setting that aside, other companies implement it much more poorly; in my opinion even Microsoft does, with full ChatGPT access. I imagine Apple will suffer a similar fate for a while.


I feel that by virtue of being a search engine, Google has access to a lot of data that is now locked up but was available in the past.

They just need to curate their data, but I wouldn’t be surprised if their pile is as large as OpenAI’s.


ChatGPT quality has recently degraded. I am only getting two-line answers.


All degradation is temporary (but you may want to switch to the API since it’s less focused on avoiding PR nightmares)


Corporate naming tends to reflect the orgchart and various individuals' desires for promotion... Get some other product branded with your teams name, and you have just expanded your domain and can show impact to any promotion committee...


Reminds me of the naming madness of the Google messaging services/social media.


It's not too confusing, I think it's mostly that they're in the process of changing the naming.

- Bard: Retiring this name

- Gemini: model name (honestly less confusing than just calling it "GPT")

- Gemini Advanced: more capable Gemini model

- Gemini Ultra: most capable Gemini model

- Gemini 1.0: they version their models together; Gemini has hit 1.0 and is (supposedly) ready for prime time


You say it's not confusing but you've got it wrong :)

Gemini is the name of the model and the service.

Gemini Advanced is the service with access to Gemini Ultra.


Via the “AI Premium” subscription, obviously.


Which is in a Google One subscription


It's not included in a Google One subscription. Just tried it out, got an "Upgrade your Google One plan to get Gemini Advanced €21.99 €0 for 2 months, €21.99/month thereafter."

Pretty hilarious thinking they can rival ChatGPT pricing with a product that doesn't approach its capabilities.


I think Gemini Advanced is not a model at all but the paid version of the Bard (now Gemini) website.


I think the prize for the most confusing naming is a tie between USB and WiFi standards.


Don’t forget the Google One AI Premium subscription. There are very few superlatives left for them to use.


To be fair, the competition is ChatGPT, which is an impressively bad product name, among the worst for a consumer product ever. And it still hasn’t been renamed (perhaps a testament to the fact that names aren’t that important after all)

Bard was infuriatingly bad too, but more on a subjective level. And they correctly changed it, thank god. At least it’s easy to pronounce.

Software engineers have a weird obsession with Latin, Greek gods etc. Sounds smart and epic I guess. Personally I would have preferred “Steve French”.


Bard was absolutely trashed when it first released, so I'm not surprised they are trying to rebrand it.


Doesn't it seem familiar, like something Google would do? They need someone like Larry Page, handling things the way Mark Zuckerberg or Elon Musk do: a decision is made and you go forward. Google seems incapable of taking action without the approval of a committee and middle managers... reminds me of IBM back in the '90s.


Better than ChatGPT, which now only gives two-line answers.


I bet you have a hell of a time trying to buy gas. Do you pick Diesel, 85, 87, or 91?


Ultra Pro coming in q3


This makes sense. It’s clearly a binary naming scheme. So we go Pro, Ultra, Ultra Pro, Ultra Ultra, Ultra Pro Pro, Ultra Pro Ultra, Ultra Ultra Ultra, and so on.

I don’t understand why people find this so confusing. Are we not computer people?

/s


Do you mean Xbox One Series X? lol


I just signed up for the $21/month Gemini Advanced, and this is just one data point: I gave it a fairly hard programming task in an obscure language, Common Lisp (my favorite programming language since 1982; there must be dozens of CL enthusiasts in the world :-) ), and it did a very good job. The generated code and supporting explanations were very good.

Off topic, but I find this amusing: I have stopped using “free” Google services like search and gmail (mostly), but I am now up to about $135/month on paid Google services (not counting GCP bills). Paid services like YouTube/Music Premium, YouTube TV, Play books, etc. are really good.


Just played with Gemini Ultra for like 10-15 mins, and right off the bat, it made mistakes I've never seen GPT-4 do.

To give you an example, I asked Gemini Ultra how to set up a real-time system for a TikTok-like feed that matches card difficulty with user ability. It correctly mentioned "Item Response Theory (IRT)", which was a good start. But when I followed up asking how to implement a real-time IRT system, it suddenly started going off about "Interactive Voice Response (IVR) system" - something totally unrelated and never mentioned before. Never had this kind of mix-up with GPT-4.

https://g.co/gemini/share/f586a497013e
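For context, one common practical shortcut for "real-time IRT" (which neither model volunteered here) is an Elo-style online update of the Rasch/1PL model: after each answer, the user's ability estimate and the card's difficulty estimate both move by the prediction error. A minimal sketch, with the learning rate `k` chosen arbitrarily:

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) model: probability that a user with ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def elo_irt_update(theta: float, b: float, correct: bool, k: float = 0.1):
    """Elo-style online update: nudge ability and difficulty in
    opposite directions by the prediction error."""
    err = (1.0 if correct else 0.0) - p_correct(theta, b)
    return theta + k * err, b - k * err

# A user who keeps answering an average-difficulty card correctly is
# gradually estimated as more able, and the card as easier for them.
theta, b = 0.0, 0.0
for _ in range(50):
    theta, b = elo_irt_update(theta, b, correct=True)
print(theta > 0.5, b < -0.5)  # both estimates have drifted apart
```

A real feed would then pick the next card whose estimated difficulty makes `p_correct` land near some target (say 0.8) to keep the user challenged but not frustrated.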


It doesn't seem like it's using Gemini Ultra yet. For me it seems like only the interface has been updated since the image generation capabilities are not working.


Image generation is working for me


Not for me. The first time I asked, it paused for about ten seconds, so I thought it was making an image. Then it responded with:

> I can't generate images of that. Try asking me to generate images of something else.

I thought it was saying there was an issue with my prompt. So I asked:

> Why can't you?

I got this response:

I can't generate images for several reasons:

    Technical limitations: Image generation is a complex task. Even the most advanced image generation AI models require powerful computer hardware and specialized software. I'm simply not equipped to do this as a language model.

    Focus on language: My primary function is to understand and generate text.  I'm trained on massive amounts of text data, which helps me  write poems, answer your questions thoughtfully, and have conversations. Image generation is a whole different specialization.

    Potential for misuse: Image generation technology has the potential to be misused, such as creating deepfakes or  images that might be harmful or offensive. It's important to use these tools responsibly and ethically.
Here's what I can do:

    Describe the scene in detail: I can enhance the poem with vivid descriptions of the setting, the baker's emotions and appearance, and the fairy's ethereal form. This can help you imagine the image more clearly.
    Find inspiration: I can search for illustrations or art in a similar style to guide a human artist, should you want to commission an illustration for your poem.
    Recommend image generation tools: I can point you towards simple online image generation tools if you'd like to experiment with creating visuals yourself. Just be aware of their limitations.
Let me know if you'd like me to further develop any of these options!


Yeah, I noticed with 'Gemini Pro' that it didn't seem to remember much about earlier outputs in the conversation (apparently little to no context window), which obviously drastically dumbs it down.

I was starting to get OK results with 'Pro', but I had to use special prompting tricks.

Tried 'Advanced' (Ultra), seems only marginally better so far.


> I was starting to get OK results with 'Pro', but I had to use special prompting tricks.

Like what?


I usually put a couple of keywords in brackets at the beginning (before the body of the prompt) to provide some context


From the FAQ:

"Why doesn’t Gemini know what I said earlier in a conversation?

Gemini’s ability to hold context is purposefully limited for now. As Gemini continues to learn, its ability to hold context during longer conversations will improve."


Yeah, I saw that in the FAQ, but this was literally my second question in the convo, so not exactly a "long" conversation. Seems like it should be able to handle context for at least a couple of exchanges, right?


> Gemini’s ability to hold context is purposefully limited for now. As Gemini continues to learn, its ability to hold context during longer conversations will improve."

This is ridiculous. Context is everything with LLMs. gpt-4-32k performs better than gpt-4 exactly because of this.
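To make concrete why a short context hurts: chat frontends typically trim the conversation to a token budget before each call, dropping the oldest turns first. A toy sketch (word count standing in for a real tokenizer) of how early facts fall out of a small window:

```python
def trim_history(messages, budget, n_tokens=lambda m: len(m.split())):
    """Keep the most recent messages whose combined (approximate)
    token count fits the budget; older turns are dropped first."""
    kept, total = [], 0
    for msg in reversed(messages):
        cost = n_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))

history = ["my name is Ada", "tell me a joke", "ok", "what is my name?"]
print(trim_history(history, budget=6))
# ['ok', 'what is my name?'] -- the first turn fell out of the
# window, so the model can no longer answer the question.
```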


> Created with Gemini Advanced

You're not using Ultra here...


FYI for any Googlers - On the "Sorry, Gemini advanced isn't available for you" page, clicking "Learn More" gives you a (presumably internal) SSO sign-on (links to https://support.corp.google.com/googleone?ai_premium)


My God, this page is straight from the 90s! Nostalgic.


I'm actually shocked it has the modern Google logo, because everything else about it is a straight-up time capsule -- you're right!


On google.com, the logo for me is all white... Not sure if it's white history month or something

Edit: no it's black history month... Kinda strange

https://i.ibb.co/wRk36Tq/Screenshot-20240208-080725.png


Wow, I think it's pretty weird that we have white and black history months if that refers to human races.


The weird thing is that I think there is no white people history month because there is a black history month



I just repeated what you said, sorry for any confusion.


It's so beautiful!


Thanks for the heads up -- which page was this from?


You're literally replying to a comment that says where it's from with a question about where it's from?

>On the "Sorry, Gemini advanced isn't available for you" page, clicking "Learn More" gives you


> 2 TB of storage in Photos, Gmail & Drive for you and up to 5 other people

Keep in mind your files may be accidentally deleted if Google doesn't cancel this product first [1][2].

[1] https://www.theregister.com/2023/11/27/google_drive_files_di...

[2] https://news.ycombinator.com/item?id=38431743


The wording around the Gemini Ultra enable scares me: "Upgrade your Google One plan". I have a One family plan, does this upgrade remove the family part? What happens if I don't decide to keep Gemini and want to go back to my current plan, can I even do that? Google has kind of botched these sorts of upgrades in the past so I'm pretty reluctant to give it a try here.


Finally some actual relevant criticism in this thread. You’re spot on. Google is deep into “shipping the org chart”. As such, I would be worried too that different products conflict with each other.

It’s funny that Google can design and operationalize the most incredible engineering marvels, but can’t explain their products (and in particular how they interact with each other).


How is this shipping an org chart? It's combining products built by different parts of the company into a single subscription which seems like the opposite.


Yes, that’s what they’re telling you. However, in reality there is a lot of friction and/or confusion about how different products interact. I’m not saying this particular combination works one way or another, only that it’s extremely hard to understand the consumer product offerings compared to Amazon, Apple and arguably even Microsoft. It’s well known that Google has a marketing problem – people don’t even know what they offer.

The interface that sits on top and is supposed to give you some overview and coherence is bolted on with duct tape after-the-fact. Now, it’s possible this has changed since I worked there but I highly doubt that it’s all ironed out.


All Google One offerings are family plans. Yes, you can downgrade as well.


burner gmail account


no gmail account


If you use any cloud storage, including others like dropbox or icloud, you must always assume that. Whether you're a business or an individual.


No, you would assign a probability to the event. And that probability would be higher for a Google product.


I would disagree with you; my personal anecdotal data shows Google to be more reliable at it than others. In the absence of hard, properly collected data, that gives all of them the same probability. I'm sure someone out there thought "let's use Microsoft SkyDrive because Google is unreliable".


[flagged]


This is not what they were talking about.


That might be selection bias: the news posted here is only newsworthy because it's Google. Anyhow, I trust smaller shops a bit more than big corps (no matter the product or service being purchased), but it's subjective. In regards to tech, I trust myself first, in the same way that a chef might not prefer going to any common restaurant.


Why? It’s not an unsolvable inherent problem.

You pay for cloud storage so you don't have to think about these things. If you're saying that one should pay for cloud storage and still worry about these things, then you're paying for a service which provides no real utility. You're trading money for nothing. If anyone thinks that's a normal transaction, then I don't know how to respond to that.


What I'm saying is that you think you paid for something different from what you actually pay for, if you look at the actual contract.

Check for yourself: whatever your provider is, without looking, can you say what guarantee they give you that your files won't be deleted randomly? Do you think it's stronger than "best effort, but no % written down"? Then go check what it actually says.

I'm not saying I think it's a good state of affairs; I'm European and part of the crowd that cheered when ISPs got smacked for abusing "unlimited" in their ads while the contract said different.

But the parent comment I'm answering can at best be seen as a warning that Google storage specifically can make your files disappear in some rare cases. If that matters to you enough, then you need to know the same is true for all the other big providers.


Also, other people should have an account associated with the same country as your account. Discovered this the hard way when my parents no longer could renew their Google Photo storage because of the sanctions.


One anecdote: I live in one of the European countries that put sanctions on yours, and I cannot use many Google services either.

This European country simply doesn't exist in many of the Google forms, despite being on the "good side" and being no different from Finland or Germany.


Which is it? Andorra?


This is an impressive product, well done Google. There is a PM in there somewhere who knows what they are doing, kudos to you.

Prediction: they get to 6-7 digit number of paying customers, decide it is peanuts for them (~$20M/mo) and instead decide to push the free version with ads with full force as the future of search.


Google announced they surpassed 100 million subscriptions to Google one already and $15B in revenue for subscriptions (between YouTube premium, TV and Google one). I'm not sure your estimate is realistic.


I recently learned that my mother is subscribed to Google One. When I asked her why, she didn't even know what it was. IIRC she has 1-2TB of cloud storage, but is only using about 10 GB of it.

I wonder how many of those 100 million subscribers are non-techy people who accidentally signed up?


Or just people like me, who have the $1.99/mo plan because I needed a bit of extra storage for Gmail. I don't use the storage for anything else, I use Dropbox for my "normal" cloud storage needs.


People with cats end up with a lot of cat photos and are in the same boat with Photos.


I can understand this, as a person who once recommended Android to his parents when it was gaining traction (Nexus 7 days: a great concept ruined by terrible eMMC storage and other hardware flaws to compete on price).

On the other hand, I am a "loyal" G customer and I never felt pushed into this. I pay for YT Premium and iCloud+ (the equivalent of Google One, albeit with much less storage).


To clarify, the prediction was for the number of people paying for AI + search through Gemini Advanced, which will likely be valued independently regardless of the total number of One subscribers, comparable to someone paying for a ChatGPT subscription, for example.


And then, since there's no massive adoption as they wished, they kill the product with one month's notice...


Oh I see they’ve improved and now give longer notice periods? Lol


We got like a year of notice on the shutdown of Stadia, with a full refund for all purchases (but not the subscription for Pro). It was exceptionally well done, and if they had announced that to be their plans the service might have even worked out...


I believe the original post was satire


YES THANK YOU


I think the response was refuting your satire.


OH NOES


Wow so funny.


I take it for granted that all these services are going to be free. They are a goldmine for behavioral and persuasion engineers. I just hope we end up with at least a duopoly this time instead of monopoly


> I take it for granted that all these services are going to be free. They are a goldmine for behavioral and persuasion engineers.

They are also a goldmine for LLMs. Training on human text is necessary for AIs, but it has one major flaw: it is so-called "off-policy" data, portraying human behavior and human errors. Human-AI chat logs, by contrast, portray AI errors, so they are better material for generating training data than human text. Those LLM errors are usually corrected by the human, so there is an implicit signal in there to improve the model.

ChatGPT is reportedly serving 10M customers; let's assume 100K tokens/month/user. Then it seems they collect ~1T tokens/month. In one year they have 12T tokens, while their original training set for GPT-4 was rumored to be 13T tokens. It's about the same size! I am expecting to see more discussion about LLM chat log datasets in the near future. What have they learned in one year from our interactions and explorations?
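For concreteness, the arithmetic (note it takes roughly 100K tokens per user per month at 10M users to reach ~1T/month; both figures are rough guesses on my part):

```python
# Back-of-the-envelope chat-log accumulation under the assumed figures.
users = 10_000_000                   # reported ChatGPT customers (rough)
tokens_per_user_per_month = 100_000  # pure guess

per_month = users * tokens_per_user_per_month
per_year = 12 * per_month

print(per_month)  # 1_000_000_000_000 -> ~1T tokens/month
print(per_year)   # 12_000_000_000_000 -> 12T/year, vs ~13T rumored for GPT-4
```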


> ChatGPT is reportedly serving 10M customers; let's assume 100K tokens/month/user

No way. Definitely too high once you remove their system prompts.

> In one year they have 12T tokens, while their original training set for GPT-4 was rumored to be 13T tokens.

This sounds great for understanding use, but the quality to train on seems terrible.


You might be right; an LLM alone doesn't improve by itself. But when it is part of a system like GPTs, it can use web search, local RAG, and code execution, and also get human guidance and corrections. That's clearly a superior setup that improves over the LLM alone. I believe that is why OpenAI created GPTs: to lift a model at level N to level N+1.


There could be a paid tier, maybe running 24/7 "thinking" on topics you ask for rather than just answering spot questions. Or more resources committed to a "mixture of experts" model, etc.


The article mentioned that while Gemini is free, Gemini advanced is $19.99/month.


Generally with these things, FAANG companies do everything all at once. The "free" version in development is Google Search + GenAI results + ads, which is live right now and getting better every day.

The real product isn't this particular interface; the real product is the Gemini infrastructure being integrated into every Google product.


It's mostly still bad, but I made a GPT called 'covert advertiser' that lets you tinker with embedding covert advertisements into GPT responses. The results are usually either undetectable (no advertising) or way too on the nose, but every now and then it manages to sneak something in there that's interesting.

https://chat.openai.com/g/g-juO9gDE6l-covert-advertiser


Thank you for making the world a bit worse. /s


The most surprising part to me is how committed it is to the bit. If you start pressing it for why it suggested a specific brand it holds the line.

eg https://chat.openai.com/share/dbfac80b-daec-4d30-a333-19e5c6...

When I asked it to explain how it promoted the product it didn't even mention juking my questions in the conversation.

Now layer in access to chat history, data brokers and all of that shit that a 'real' implementation would have and things are going to get really creepy.


I have no doubt that this sort of thing will happen for real within a year or two. It's the ultimate form of product placement and I hope it gets regulated out of existence before it takes root. At a minimum any such advertisement should be clearly marked as such.


At least for now, my understanding is the cost of inference is an order (orders?) of magnitude higher than for normal Google search. That is, a paywall is almost a necessity at present because tons of low-value users make the search uneconomic.

Someone please correct me if I'm mistaken.


You are correct. A lot of social media people simply don't understand business models


Are they in an innovator's dilemma now? If Gemini is as great as it seems and will become, it will destroy the search engine and the SEO/SEM world. They can show ads in Gemini, but instead of a list of results for a query we get an answer to a question. I think this changes the general idea of online ads.


> it will destroy the search engine

This is massively overblown. There is Search the product, and there is the search engine. How would an LLM get access to the latest indexed data, looked up by keywords from a prompt and ranked? Through a search engine.

LLMs are only changing the Search experience, not making Search obsolete.


I didn't say that it makes search obsolete, but all the concepts of SEO/SEM and the machinery around search engines could be significantly reduced by chat prompts.


Nothing stopping them from eventually slapping a big-ol' banner ad on the side of the web app if they want.


Plenty of ways to monetize that. They could use inline ads. They could insert a relevant ad after a response. If they go evil, they could predispose the LLM to make product suggestions for paying clients.


> There is a PM in there somewhere who knows what they are doing, kudos to you.

Do they? https://news.ycombinator.com/item?id=39302781


I think OP forgot the /s, as I detected a heavy dose of sarcasm.

Arguably it's the reverse: if there was clear vision from the beginning, "Bard" would've never existed as a brand name.


Bard was a good name for an application that is verbose and makes stuff up.


I'd venture to guess that it is not a PM who gets to decide how to name a Google product.


Like OpenAI, it's not nearly enough to break even.


Social sentiment seems pretty negative, most people saying it's worse than GPT-4


Most people have also never used GPT-4, as it's paywalled. Now the free and premium offerings are roughly in sync between Google and OpenAI. I assume the rebranding is trying to wash away the initial sentiment.


You believe there is a chance they won't have a million paying customers?!


I heavily use the ChatGPT API in my day job, as it is the core of our business (an AI-powered startup). When Gemini Pro launched, my team and I tested it the same day for our product, but we were disappointed: it was a bit worse than GPT-3.5 (at least on the same prompts we already had). I really hope Gemini Ultra surpasses GPT-4; it is always exciting to see and use new advanced tech, but I'm still a little skeptical, since Pro wasn't that great...


> it was a bit worse than GPT-3.5 (at least on the same prompts we already had)

I'm willing to believe that Gemini isn't as good, but my impression was that you should expect a new model not to perform as well on your existing prompts, because the training set and training methodology are different. That's why one of the major risks of an AI business is vendor lock-in: you spend so much time optimizing a prompt for a specific model, and you'll have to redo much of that work in order to switch vendors.

That you gave up so quickly when trialing a new model suggests the problem is even worse than I thought—you're locked in to OpenAI because every other model will always look worse to you, even if it would be better if you took the time to tune a new prompt.


Indeed, I believe Gemini is equally capable, but its response style significantly differs from ChatGPT 4's. My expectations for how chatbots should communicate have been largely shaped by ChatGPT's approach.

Specifically, Gemini's tendency to structure responses with bullet points and headings from the start doesn't appeal to me. It feels more like receiving an essay in response to a simple question, whereas ChatGPT's responses facilitate a flowing conversation. Gemini aims to provide exhaustive answers, which can be overwhelming when I'm interested in a more open-ended dialogue that smoothly transitions between topics. In essence, Gemini Advanced focuses on delivering detailed answers without allowing much time for reflection.

Therefore, when it comes to preference in RLHF (Reinforcement Learning from Human Feedback) chatbots, ChatGPT-4 is my favorite for its ability to better engage in the type of conversational dynamics I prefer.


Google is in catch-up panic mode these days, but all the rushed releases so far are still far behind ChatGPT per my quick tests.


It turns out iterating on and incorporating a large volume of user feedback is more important than having the most and most talented AI researchers, at least in the short term.


They have the most prestigious researchers and engineers who can jump through interview hoops. That doesn't translate to world-changing products, and they aren't "the best".


Be wary of any tech product named "Gemini", usually means they are self-acknowledging the need to play catch-up, a la the Gemini space program.

I bet Google's next big AI release is going to be called "Apollo".


I'm sure there are multiple layers of meaning behind the name, but Jeff Dean once mentioned it had something to do with the name being Latin for "twins". That is, Gemini is a product of Alphabet's "twin" AI orgs, Google Brain and DeepMind, working closely together and eventually fusing into GDM.


Just waiting for Space-Shuttle-120B-DPO-LASER-GGUF


Is anybody really surprised at this point? G has had DeepMind in their pocket since '12-'14 and made few advances. OpenAI changed the game in half the time.

G is inferior and losing the race.


OpenAI is built on top of Google advancements and research. It didn't change the game; more like it took a shortcut and landed on a gold mine.

The fact that many products and models, including open-source ones, have caught up on such short notice and now compete with OpenAI in what should be its self-proclaimed backyard suggests it's just a one-trick pony.


It's doubly strange because Google had an implicit reputation as the unbeatable giant in computing research and resources; many expected them to compete with and smoke ChatGPT in a few weeks. It's been months and nothing has come up except fumbling and confusion.


Sundar has zero vision and has created a culture that stifles new developments in bureaucratic morass while threatening to kill them shortly after birth.

Google may have more scientists and some of the best minds in the business, but ChatGPT has nearly 200 million users that are feeding it back data for RLHF, and data is a much more important moat than better tech (which mostly ends up being published and disseminated anyhow).

AI is a game between OpenAI and Meta. ChatGPT has a ton of users creating highly relevant data, but Meta has the incredible walled trove of facebook/instagram/whatsapp/+ data that dwarfs pretty much anyone else on the planet, and with Mark's recent push to build up their compute their only competitors in that space are microsoft and google. People discounted Meta because of that horrible metaverse move, but Mark is being pretty canny now, they're very well positioned to choke the life out of specialty chatbot products while integrating SOTA AI into all of their products to slowly crank up the time people are on platform.



My impressions after 90 minutes of intensive testing: overall, on par with the original GPT-4 in most aspects, inferior to GPT-4 Turbo.

Detailed aspects versus GPT-4 Turbo:

1. World knowledge: slightly inferior. GPT-4 Turbo was able to detail a protagonist's childhood year by year for a Japanese novel with near-100% accuracy (where a human reader would get the chronology wrong); Gemini Ultra was much more easily confused.

2. Creativity: Gemini Ultra wins. Its writing style has far more flair than GPT-4 Turbo, and it occasionally made some stunning analyses that I had never thought of and that made perfect sense. GPT-4 Turbo is more like a textbook repeater: it doesn't make many mistakes, but it also rarely surprises you with anything original.

3. Accuracy: GPT-4 Turbo still makes fewer mistakes, including in subtle logic (like a hypothetical battle between two characters in the same universe, considering the strengths and weaknesses of their powers, etc.).

So this is definitely Google's first real-deal LLM. It's not better than the current GPT-4 Turbo, but it's getting there. OpenAI must be feeling the fire to release GPT-5 before the end of the year.


I tested it immediately, but it is disappointing. At least here in Switzerland, it is not able to generate images, and a simple "look up this website and summarise the content" does not work either (it can't access the website, even though it's a public website without any crawling limitations). I don't understand why Google is launching a product like this.


It's confusing because the name Bard and the UI also got an upgrade today, so I thought I was using Gemini Ultra but it turns out I'm not: https://imgur.com/a/3UriYpn — showing that Gemini Advanced is not what I'm using, unless I pay and upgrade. (If you cannot generate images you're likely not using Gemini Advanced.)


Yes, I upgraded, and the logo on the top left tells me that I am using Gemini Advanced. Still, I'm not able to create images or browse the web.


I have different levels of access to Bard through different accounts, and the feature set varies wildly. Generating images and summarizing websites is enabled in _some_ configurations, but I have no idea what the rules are.

The feature set also seems to depend on other factors: The account that is images-enabled only does so if I ask in English, but not when asking in any other language I tried.


Now ...now...Are you implying Google faked all those amazing demos .... :-))


Some might argue that is what LLMs do.


The different prompting strategies needed to improve results for different models is fascinating. I usually tell ChatGPT the role it should play to get better results e.g. "You are an expert in distributed systems". The same approach with Gemini returned "as a large language model constantly learning, I wouldn't call myself an expert."
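For what it's worth, the role-prompt pattern described above usually goes into the system message of a chat-style request. A minimal sketch of building such a payload (the helper name and the example question are mine; only the `system`/`user` message structure is the standard chat format):

```python
# Sketch of a role/persona prompt as a chat-message payload.
# The system message carries the persona; the user message carries the task.

def build_role_prompt(role: str, question: str) -> list[dict]:
    return [
        {"role": "system", "content": f"You are {role}."},
        {"role": "user", "content": question},
    ]

messages = build_role_prompt(
    "an expert in distributed systems",
    "Explain the trade-offs of leader-based replication.",
)
# This list is what you'd pass as `messages` to an OpenAI-style chat API.
```

As the comment notes, models differ in how literally they take the persona instruction: some adopt it, others (apparently Gemini here) push back on it, so the same strategy doesn't transfer cleanly between vendors.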


Oh wow, even looking at the price requires a Google login. Looking forward to seeing independent comparisons of this vs the other top LLMs.


https://en.wikipedia.org/wiki/Gemini

Imagine looking at all those things named Gemini and thinking "let's name our system Gemini!".


Trillion dollar companies tend to carry enough weight to make product name collisions everyone else's problem instead of theirs. Really unfortunate for the Gemini crypto exchange folks.


What is Gemini referencing here in Google’s case? What twins are making the AI?


Google Brain and Deepmind.


Reminds me of that American Gladiator show


At least it’s not named kraken


Did I get this right?

Bard – old name of their generative AI service, to be called Gemini

Duet AI – old name for their generative AI in Google Workspace, to be called Gemini

Gemini – three things: 1. the name of their models (like GPT). 2. the new name of their free service (like ChatGPT), gives access to Pro 1.0 but not Ultra 1.0. 3. the new name of the Generative AI tools in Google Workspace.

Gemini Advanced – the name of their paid service (like ChatGPT premium), gives access to both Pro 1.0 and Ultra 1.0

Ultra 1.0 – the first version of their big model (like GPT-4)

Pro 1.0 – the first version of their smaller model (like GPT-3.5)

Google One AI Premium – the subscription that you need to buy to have access to Gemini Advanced

Google One Premium – the old version of the subscription, does not include access to Gemini Advanced

Google app – the mobile phone app, which includes either Gemini or Gemini Advanced

Google Assistant – like Siri but hard to define what it is

Google AI – a generic name for all their AI products


It’s sad that a company of very smart people can’t figure out coherent naming.

Can you imagine Apple causing confusion like this? I know it's not a like-for-like comparison, but everything Apple does seems to follow a grand strategy that's clear for everyone to see. Things build up in a modular way to fit a big puzzle.

Google, on the other hand, constantly makes up things on an ad hoc basis.


Maybe not quite the same, but I will point out that “Apple TV” and “Apple TV+” are not just two distinct products, but are in fact entirely different categories of product.

One is a piece of hardware akin to a Roku. The other is a streaming service akin to Netflix.


If you ever take a customer survey for Apple, for the "which Apple products do you use?" question they always have to write something like "Apple TV (a streaming box that plugs into your TV)" and "Apple TV+ (an online streaming service)" because they know the names are so confusing.


Apple TV is also an iOS app, macOS app, tvOS app, and [other generic TV OS] app which allows you to access Apple TV+ content if you have a subscription, but otherwise lets you access services connected to your Apple TV [hardware].


Actually it's more that Apple TV is both a piece of hardware and an iTunes-like service, while Apple TV+ is a subscription service akin to Netflix.

The Apple TV hardware and the Apple TV app on your iDevice can both be used without paying a subscription. The hardware has all other streaming apps a la Roku, and both it and the app on your iPhone can be used to purchase and watch TV shows and movies.


Hello, would you like to watch Apple TV+ or Apple TV Channels on your Apple TV app on your Apple TV?


I use both and I haven’t found it too confusing, to be honest. I just think of it as Apple TV (streaming device) gives access to Apple TV+ (streaming service).