I'm surprised they got rid of the Bard name. It struck me as a really smart choice, since a bard is someone who says things, and it's an old/archaic enough word not to already be in a zillion other names.
Gemini, on the other hand, doesn't strike me as particularly relevant (except that perhaps it's a twin of ChatGPT?), and there are other companies with the same name. EDIT: I can see the advantage of picking a name that, like "Google", also starts with a "G".
Just as one data point, bard.com redirects to some other company (bd.com), whereas gemini.com is owned by a company of that name.
I'd be curious about the scuttlebutt on how this decision was reached!
I'm not surprised -- I thought Bard was terrible branding. It's all associations with Shakespeare and poetry and medieval England, and as much as I might personally enjoy those, it's extremely backwards-looking, with archaic connotations. Also it sounds close to "beard" -- hairy stuff.
Gemini sounds like the space program -- futuristic, a leap for mankind. It's got all the right emotional associations. It's a constellation, it's out in space, it's made of stars. Plus it contains "gem" which feels fancy, valuable, refined.
I'm not saying Gemini is the best name I've ever heard or even close to it, but it feels 100% appropriate, in a way that Bard does not.
Interesting. I don’t like the name at all because it makes me think of people who take horoscopes seriously. Your impression seems to be untainted by that, which is nice.
Same here. I think I'm more on your side, which I guess goes to show how all over the map subjective reactions can be.
But first of all, I thought the whole idea of Alphabet was a kind of cheeky way of telling the world you had a portfolio of projects, one for each letter. "B is for Bard" would be perfect, and Gemini is about as incompatible as it gets, given that G is already claimed.
I also find it bizarre to say that association with Shakespeare, or the association with whimsical poetic expression is in any sense a bad thing. It's a clean, simple, fun name that's remarkably short and surprisingly unclaimed. And I don't even strongly associate it specifically with Shakespeare, that's like a background association as far as I'm concerned.
I think perhaps the real talk here is that Bard was kind of an emergency response to ChatGPT, but also people have some pretty specific and distinct experiences with Bard and an idea of its quality, and Google just needs to turn the page on the perception of Bard.
Besides that, personally I always thought it was a bad fit. It sounds old and outdated to those who do not know what the word means, and wrong to those who do: a bard sings songs and maybe does poetry.
A bard does not help or assist you. A bard can be a creative person, but is generally not considered especially wise or knowledgeable. A bard is also always a man, which does not gel very well with modern sensibilities.
Also, Gemini was appropriate for the space program because (a) there were two astronauts in the capsule and (b) because of the constellation, "aiming for the stars" and all that. For the Google project however I can't come up with a plausible explanation - Google doesn't even try to give a reason for the name either.
From The Decoder:
>In April 2023, Alphabet announced the merger of its two AI units, Google Brain and Deepmind. The resulting Google Deepmind was to focus on developing large multimodal AI models. It was a big move that showed how much pressure Google was under due to the massive success of ChatGPT. Jeff Dean, head of Google Brain until the merger with Deepmind, became the new merger's chief scientist, with a direct line to Alphabet CEO Sundar Pichai. Dean now explains that the name Gemini, Latin for "twin," is directly related to the merger.
From Jeff Dean's Twitter:
>Gemini is Latin for "twins".
>The Gemini effort came about because we had different teams working on language modeling, and we knew we wanted to start to work together. The twins are the folks in the legacy Brain team (many from the PaLM/PaLM-2 effort) and the legacy DeepMind team (many from the Chinchilla effort) that started to work together on the ambitious multimodal model project we called Gemini, eventually joined by many people from all across Google. Gemini was also the NASA project that was the bridge to the moon between the Mercury and Apollo programs.
I see that angle, but those two things are complementary, not identical. It's not a clone of me — it's something that I ask questions of because I don't know the answer. If it were pitched as a bot that would draft email responses for me automatically, then maybe I would see it fitting better as my 'twin'.
> it's something that I ask questions of because I don't know the answer
I think you're reading too much into what a twin is. It's not a copy! Real-life twins ask each other questions all the time, because just because one of them learns something doesn't mean the other one automatically learns it too via mind-meld.
I'm not saying all twins are identical. But they are all of the same species. What I want in an assistant is that it is very different from me. It has perfect memory and knows lots of things that I don't know.
It sounds like you're thinking of the adjective form of the word, which is why you are thinking of a much broader definition. I was using the noun form, since the Gemini are twins in the noun sense.
If we were talking about the word "twins" in the abstract, the broader definition might make sense. But we aren't — we're talking about Gemini. If that conjures up general notions of "matched-ness" for you, that's great. When I think about Gemini, I think about mythological twins. I don't think about corresponding parts that complement each other.
For a product name to be successful, it should appeal to a wide range of people. If I'm way out in left field on this one, perhaps they've found a great name. But I would point out that my comment, which is critical of the name change, is the very first comment in the entire thread. I would take that as evidence that most people don't see the name and think "oh, it's like complementary items, like my brain and the AI".
To be clear, I was shocked to see this comment above all substantive discussion of the new release. I would have thought it would have been buried under examples of ways in which Ultra is better/worse than some other LLM.
> For a product name to be successful, it should appeal to a wide range of people.
Honestly, Google is called "Google". ChatGPT is called "ChatGPT". Maybe it'll be a joke, maybe people won't think about what they're calling it after 30 seconds.
This conversation is taking itself a bit too seriously for what's drifting into Pepsi logo gravitational pull territory, though.
> But I would point out that my comment, which is critical of the name change, is the very first comment in the entire thread.
Sure. Everyone has an opinion on what color the bike shed should be, too.
When I read new thread responses, I briefly thought that I wrote[1] your reply and was confused lol. Great minds think alike. I feel vindicated about my weird opinion.
Bard doesn't creep women out. That's an extraordinary assumption you're making based on one anecdote. Not everything needs to be needlessly politicized.
The most popular TTRPG has Bard as a class, and it (like every other class) is as gender-neutral as you can get.
I think alliteration applies to pronunciation, not orthography. For example, "ceaseless sun" is an alliteration even though it is spelled with both C and S. I wonder if there is a word for the orthographic counterpart, which you describe here (and which I note in another comment, as the benefit of both starting with G).
Gemini as a zodiac sign: "Smart, passionate, and dynamic, Gemini is characterized by the Twins, Castor and Pollux, and is known for having two different sides they can display to the world. Expert communicators, Gemini is the chameleon of the Zodiac"
Which is pretty on the nose for an AI project: a chameleon with two different sides (good/evil?) and an expert communicator.
Why not Delphi? Isn't that more like what they are trying to create, an Oracle? And it's like HAL, one step ahead of IBM. Is Delphi just too on-the-nose? Yes, it is also a programming environment, I have many fond memories of Borland Delphi.
That isn't at all the association I have with that word. I think of the astrological sign instead, so to me the association is pseudoscience and a hint of being bipolar.
This sort of thing is part of what makes naming things difficult. You can't count on any name having the same connotations to everyone.
Even though I liked Bard, it is only one (extra) letter away from being the word "bad". "Bard" is cooler imo, but "Gemini" starts with "G", has "gem" (a rare, valuable thing) in it, and sounds pretty. Personally I don't care at all either way though.
I'm sure "Bard" was primarily a Shakespeare reference (The Bard of Avon, frequently just The Bard), and I liked it too. An appropriate name for a technology that's all about language.
Gemini sounds cool and sci-fi though, and maybe it's a bit easier to localize since it's just straight Latin.
To me, bard just sounds phonetically gross. Reminds me of “fart” or “beard.” It calls to mind medieval stuff: the Monty Python mud scene, Skyrim’s most annoying NPCs, plucking lutes. But Gemini? That sounds like a legendary space mission, a collective engineering push against the boundaries of human knowledge.
When I hear "bard", I think of this guy from the Asterix comics first: https://asterix.com/en/portfolio/cacofonix/ - who is notorious for getting on everyone's nerves with his constant singing.
> We are not talking here about the rain he brings on each time he exercises his vocal cords, but rather about the prevailing atmosphere in the village: when it is time to party, when wild boar are roasting on the spit, you can be sure to find Cacofonix tied hand and foot with a gag in his mouth.
I remember when the iPad was announced, and everyone said that people would only ever think of feminine products when they heard the name. It might have been true for a few months, but now it seems quaint that we ever had such concerns.
Bard is really funny to me to make fun of. It feels like the discount version of ChatGPT. Like the way that (ironically) TV shows would get Microsoft sponsorship and the characters would say "oh, you should Bing that", a phrase no human would normally say, I like to go "ah, let me see what Bard thinks about this".
Understand that this is not condescending in any way, as I do not have this experience.
If there are these "feelings" around these words, how is any sentence correctly taken at face value? How does one communicate to these people the direct and correct meaning of the terms used?
For example, "sentence" sounds like "seance"; do they feel like I'm asking the spirits of the dead?
"Correct" sounds like "wrecked"; do they assume that everything is broken in the above sentence?
Is communication fraught with unknown minefields of unintended emotions and misunderstandings?
Not at all; these "unintended" emotions can be ignored for the most part. But if you ask me, Google is, to my foreign ear, one of the stupidest brand names I know of, due to its phonetic resemblance to some words from my native tongue.
Bards were the people who kept history and genealogy before written history. Think like Homer rather than Shakespeare. I think the name was meant more to evoke the idea that the AI is a repository of all linguistic knowledge in the same way that the bard was. And maybe also the idea that the AI was at your service in the same way the bard was at the service of the ruler.
It's not a bad name, but personally when I first heard the name Bard I chuckled because LLMs had already come under so much criticism for their tendency to embellish the truth or say stuff that is just straight up false but sounds cool.
> The story concerns [...] an old Bard, a child's computer whose sole function is to generate random fairy tales. The boys download a book about computers into the Bard's memory in an attempt to expand its vocabulary, but the Bard simply incorporates computers into its standard fairy tale repertoire.
"Gemini" must refer to its inherently multimodal origins?
It's not a text-based LLM that was later adapted to include other modalities. It was designed from the start to seamlessly understand and work with audio, images, video and text simultaneously. Theoretically, this should give it a more integrated and versatile understanding of the world.
The promise is that multimodality baked in from the start, instead of bolting image recognition on to a primarily text-based LLM, should give it superior reasoning and problem-solving capabilities. It should excel at complex reasoning tasks to draw inferences, create plans, and solve problems in areas like math and programming.
I don't know if that promise has been achieved yet.
In my testing so far, Gemini Advanced seems equivalent to ChatGPT 4 in most of my use cases. I tested it on the last few days' worth of programming tasks that I'd solved with ChatGPT 4, and in most cases it returns exactly what I wanted on the first response, compared with the lengthy back-and-forth required with ChatGPT 4 to arrive at the same result.
But when analyzing images Gemini Advanced seems overly sensitive and constantly gives false rejections. For example, I asked it to analyze a Chinese watercolor and ink painting of a pagoda-style building amidst a flurry of cherry blossoms, with figures ascending a set of stairs towards the building. ChatGPT 4 gave a detailed response about its style, history, techniques, similar artists, etc. Gemini refused to answer and deleted the image because it detected people in the image, even though they were very small, viewed from the back, no faces, no detail whatsoever.
In my (limited) testing so far, I'd say Gemini Advanced is better at analyzing recent events than ChatGPT 4 with Bing. This morning I asked each of them to describe the current situation with South Korea possibly acquiring a nuclear deterrent. Gemini's response was very current and cited specific statements by President Yoon Suk-yeol. Even after triggering a Bing search to get the latest facts, the ChatGPT 4 response was muddy and overly general, with empty and obvious sentences like "pursuing a nuclear weapons program would confront significant technical, diplomatic, and strategic challenges".
It seems odd to me that that would necessarily work better, considering that humans evolved different capabilities many millennia apart and integrated them all with intelligence comparatively late in the evolutionary cycle. So it’s not clear that multimodal from the get-go is a better strategy than bolting on extra modalities over time. It could be, though, since technology is built differently from evolution; interesting to consider.
The constellation Gemini gets its name directly from the Greek mythological twins, Castor and Pollux.
Each twin had different capabilities. Pollux was a powerful warrior while Castor was an intellectual tactician.
The twins possessed an extraordinary fraternal bond, each loyal and devoted to protecting the other.
Together, they accomplished what they couldn't do individually. Their combined strengths made them far more effective than either could be alone.
Just as images, audio and video convey different knowledge, relationships and reasoning than text by itself, their combined strengths in a single model should be more powerful than any model trained on only one modality.
Gemini, or the twins, is a deeply symbolic name for anyone who knows Greek mythology. It’s the story of Castor and Pollux, and in many versions of the story one brother killed the other, only to beg for him to come back. It’s ominous to use this brand name for AI.
It’s also associated with the Gemini killer, and with Joseph Testa and Anthony Senter, who were famous as the mafia’s Gemini twins hitmen.
I think better brands could have been had.
It does sound like some battlefield AI system from Robotron. “Sir, Gemini is charged and ready for battle.”
Perhaps. Corporate etymologies tend to be very well rehearsed stories, and I’ve been around the valley long enough to know those stories aren’t always the whole story.
I would encourage you to read the Kissinger / Schmidt book before settling your opinion.
That origin story may be true. But that doesn’t necessarily make it the whole story.
> It’s also associated to the Gemini killer and Joseph Testa and Anthony Senter who were famous as the mafia’s Gemini twins hitmen.
I've never heard of any of these people and I doubt most others have either. Maybe you have to be a true crime enthusiast to know the lore? Whereas if the name were Zodiac, then I would at least be aware there's a potential murderer connection.
The Bard name gave me a warm fuzzy feeling immediately transporting me back to my youth playing (or at least trying to play) Bard's Tale. The name evoked adventure, excitement and a good dose of dread. And, the idea of it being "role playing" struck me as a master meta stroke.
Gemini, from the mythological standpoint, seemed to make more sense to me from an overall business/marketing standpoint. "This AI thing right here is your twin, see? It'll finish your sentences and stuff."
And similarly, anyone playing modern tabletop RPGs will probably associate "Bard" with the smart, charismatic person who buffs the party and debuffs your enemies; perfect for an AI assistant.
I'd suspect they're just trying to start over - Bard kinda got pegged as Google's dumb, not-as-good, late-to-the-party clone of ChatGPT. It got teased a lot. I'm not sure Gemini is much different (still more impressed with GPT 3.5 & 4) but I guess the name change gives them a bit of a reset.
To me Gemini is just sort of generic and uninteresting. There has to be hundreds or thousands of products and companies based on the name "Gemini" - "Bard" was at least interesting, different and distinct.
I've no idea about the quality of the product itself, I have never had a reason to use it. It's long past cliché now but I wouldn't get too attached to a Google product that is definitely costing a lot of money but which has no clear pathway to turning a profit. I think they will keep it ticking over until the hype train moves on from Chatbots/LLMs, and then it'll join the Google Graveyard @ https://killedbygoogle.com
"Bard" always struck me a bad naming - unfamiliar, unfriendly, too cerebral. I think the name was an impediment against establishing a household brand.
It's possible that it sounds even worse in other languages. That is, it might sound like bad words, onomatopoeia for bodily functions, or common exclamations (that would lead to lots of false positives).
I think it could have been established as a brand in the US, given Google's scale. Put a lute in the branding, run some funny commercials, and you're done.
EDIT: one thing no amount of branding can fix — the likelihood that people reach for "doh, Bard" (a la Simpsons) when Bard messes up. I could see that becoming a thing.
The Witcher is one of Netflix's most watched shows. I'd also imagine that most people in English-speaking countries have been exposed to Shakespeare's nickname in high school English classes.
It’s generally a common trope in fantasy and Romanticist literature. It’s also a word that exists in virtually all European languages in a similar form (bard, bardo, barde, бард), although similar but different forms may be a negative.
I don't think it's that out there. You'd have to be quite uninformed to have never heard of it. It's no verderer or reeve (medieval positions that most people actually will not have heard of).
In an increasingly commodity game (the big player LLM game), it’s already starting to hit the asymptote on the main levers: ties to NVIDIA and/or TSMC, serious financing capacity, and enough engagement channel to push it through. (There is much great work happening outside of the Peninsula).
I always thought GPT-4 was a little “HAL 9000” of a name for broad-based adoption, but the jury seems in, and the jury rules “cyberpunk is in”.
That’s fair, though given the stark UI cue / cost difference, I’m not surprised when I overhear in a random cafe or bar: “yeah but what did ChatGPT Four say?”
In any event, it seems that the image of a Decepticon ready for battle on your behalf has a lot more traction than the image of a quaint singer/priest/poet always there with a verbal shot in the arm when the going is tough.
No one outside the factory knows how the sausage is made. Watercooler chat is that it's a 1.75-trillion-weight 16-choose-4 mixture model quantized at 2.5-3bpw with a mundane rotary encoding and the kind of KV cache that only an Azure or GCP disagg rack can deliver.
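For anyone unfamiliar with the "16-choose-4" shorthand: it describes mixture-of-experts routing, where each token activates only a few of the model's expert blocks. A minimal sketch of that routing step (all sizes and the gating scheme are illustrative assumptions on my part, not the rumored internals):

    import numpy as np

    # Hypothetical sizes for illustration only.
    NUM_EXPERTS = 16   # total experts in the layer
    TOP_K = 4          # experts activated per token ("16-choose-4")
    D_MODEL = 512      # hidden dimension

    rng = np.random.default_rng(0)
    gate_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))    # router weights
    experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]

    def moe_layer(x):
        """Route one token vector x through its top-k experts."""
        logits = x @ gate_w                          # score every expert
        top_k = np.argsort(logits)[-TOP_K:]          # keep the k best
        weights = np.exp(logits[top_k] - logits[top_k].max())
        weights /= weights.sum()                     # softmax over the chosen k
        # Only the selected experts run, which is how the total weight count
        # can be huge while per-token compute stays modest.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top_k))

    out = moe_layer(rng.normal(size=D_MODEL))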
There's little doubt that they're hanging on by a fingernail with an MMLU lead over the French that is nonetheless real. So something "bleeding-edge" is going on. (Now whether or not MMLU is a useful metric for the usefulness of a model that won't do what you tell it to is debatable; I think it's not.)
But is that bleeding edge innovation like Galois and the resulting group theory? Or bleeding edge like "holy shit they could keep track of enough people to build the Pyramids at Giza with a low-precision protractor and infinite motivation"?
I tend to think that there's a lot of "straight ahead and damn the torpedoes" brute-force arbitrary compute budget subsidized below cost by a hyperscaler in the mix. But who knows.
The effectiveness of distillation is suggestive but not conclusive that we can "Shannon" the process recursively up to some unknown bound without anything much fancier than Mixtral from an arch standpoint.
This is very interesting. If you don't mind me asking, what is the "watercooler" where said chat occurs, and what is the claimed source for those numbers?
But to clarify my original comment, I meant "bleeding edge" in the pragmatic sense - it is the most powerful model that anyone is known to have access to, and of course the whole LLM thing is still very new in the minds of general public. So the public perception is that it's bleeding edge technomagic, and so a brand like that is fitting.
But glass half full, maybe it's for the better to have one's name shadowed by a Google product if one prefers to avoid eternal septembering one's community.
I agree. The original reason [1] for the Gemini name seems artificial for a generic chatbot. It is OK for the model, and I'm sure a lot of "work" was put into "validating" it for the assistant, or... was it?
I thought it was in reference to Trurl's Electronic Bard, which just about presciently predicted LLM output (though the process is a bit more dramatic, what with how it simulates the whole universe to get to that output): https://electricliterature.com/wp-content/uploads/2017/11/Tr...
Same here. Bard is not a sexy name. Gemini is way more sexy. (Neither is ChatGPT, or Google, though.) I can't wait until we can call the assistant whatever we want, like Jarvis.
I'm so glad that they've changed the name :) Bard is really difficult for me, a non-native English speaker, to pronounce correctly. I think most people from my country (Vietnam) pronounce Bard as "bahd". Gemini on the other hand is so easy to pronounce.
Symbolizing both human origin intelligence and AI becoming super-intelligent together. I think it's a good name because it draws on ancient human mythology and serves as a reminder of the concept of alignment.
Bard: fat inept old guy who gets by telling stories of yore
Gemini: a crypto exchange
Clearly they shouldn’t be asking the 21-year-old interns what to call it.
Now, before anyone at Google gets butthurt like they usually do: I doubt I could do much better, but Gaia IMO would have been mint. You know, it starts with G, is four letters, and represents a nebulous deity.
Took 3 seconds, but hey, I’m not paid to think. In fact I’m not paid at all.
Edit> Gemini Ultra is the best LLM so far (it seems). Apart from the name, good job guys.
There are other considerations when naming something like this. "Bard" likely could never be a wake word on its own, for instance, but I'd imagine that "Gemini" will be at some point.
The real question is what's nearby each name's vector embedding in terms of whatever similarity metric Gemini will use to talk about the world. That's their new canonical ontology, after all.
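(For the curious: "nearby in embedding space" usually just means highest cosine similarity. A toy sketch with made-up 4-d vectors, purely to show the mechanics:)

    import numpy as np

    def cosine_sim(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Made-up "embeddings"; real models use learned vectors with hundreds
    # or thousands of dimensions.
    emb = {
        "gemini": np.array([0.9, 0.1, 0.3, 0.2]),
        "twins":  np.array([0.8, 0.2, 0.4, 0.1]),
        "bard":   np.array([0.1, 0.9, 0.2, 0.6]),
    }
    query = emb["gemini"]
    ranked = sorted(emb, key=lambda w: cosine_sim(query, emb[w]), reverse=True)
    print(ranked)  # the query itself first, then its nearest neighbors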
One awkward thing is that Google's Gemini app is currently the third result when searching for Gemini on the Play Store, after some bitcoin-related applications. The namespace is occupied.
Pretty sure Gemini will also eventually get replaced with Google Assistant branding, at least for the consumer-facing products; it might still be kept for the cloud API etc.
Astrologically, Gemini is associated with Communication. Specifically social, superficial, quick, back and forth communication. The sign is ruled by Mercury which is associated with Intelligence and being the messenger of the Gods. Mercury is often depicted with winged shoes as the planet itself is the fastest moving planet, orbiting the sun every 88 days. Mercury is considered to be dualistic (The Twins) and also rules the sign of Virgo, an earth sign that is associated with more deep cold analytical categorization.
It really depends. Some languages don't use "special" words for astrological signs, so it's literally the same word as "twins". Which is a really weird name for a product.
"How does that translate to ..." means "how well does that work in" some other area or context; more analogous to a mathematical translation than a linguistic translation.
Just a confusing turn of phrase. They almost certainly didn't mean "what does that translate to ..." in another language.
Harmonising product names across regions is hard: Jif was a bathroom cleaning solution in the UK, but its name was changed to Cif to match the name elsewhere in Europe, and that name sounds silly to UK ears. Meanwhile GIF was always presumed to be pronounced like "gift" (a present) without the final T; but we learnt the creators preferred "Jif", which sounds silly to UK ears because it sounds like a cleaning product! (And wasn't JIF already a file format (JPEG Interchange Format)?)
In my 20 minutes of experimentation, I'm really impressed with the quality of Bard Advanced (Gemini Ultra). The results are as good as GPT-4's, and in some cases better. So far:
pros:
- better at translation (tried Chinese and Japanese idioms to English)
- better at incorporating search results in its answer vs gpt-4 bing
cons:
- slightly worse at coding
- censorship is more annoying (have to ask multiple times about medical topics)
- Worse at logic (e.g. it contradicts itself in a single sentence, and is unable to figure it out)
- Hallucinates a lot when asked to describe an image
Also as time goes by, it'll get smoothly integrated into docs/gmail/maps/calendar/youtube/search/colab/sheets/android/assistant
So Gemini could be your one-stop AI shop for everything. Only Microsoft can match it (but Microsoft doesn't have a popular maps, YouTube-style video, mail, or smartphone OS service).
Apple is another strong player (but they don't have productivity tools like docs, sheets or youtube).
It really is Google's to lose this AI race from now on.
Going to ChatGPT and copying and pasting results will become painful (not to mention its painful Bing integration). Also, at this point they seem to be focusing on scaling LLMs (while Google DeepMind is exploring other avenues).
Google can also bundle Youtube TV, YouTube Premium, Google Drive, Storage, Ad free Search, Gemini integrated Docs/Sheets/Gmail, Subsidized Pixel Phones / Watch for a monthly fee of say $99 and it'll be very compelling for a lot of people
Strange to say Apple doesn't have productivity tools when Pages, Numbers, and Keynote exist on every Mac. I get the scale arguments, but Handoff and iCloud integration are a sleeper IF you've bought into the ecosystem...
Also hard to overstate just how much more valuable the enterprise market is over the consumer market when comparing Microsoft vs. Google as one-stop anything shops.
I don't see Google as having the obvious dominant position needed to argue it's their race to lose, considering Microsoft has a stake in ChatGPT and is actively integrating it into their browser and productivity suites.
There's a Google bubble on HN, as demonstrated by small-to-medium business facing SaaS launching here offering login with Google and not offering login with Microsoft.
I've talked to many of HN's Google Docs jockey founders that genuinely didn't realize 85% of the US domestic business market is in M365. And they further don't realize that "Continue with Microsoft" is dirt simple and lets companies offer company-managed logins to your SaaS without all the AD/SCIM/SAML nonsense.
"But everyone has Gmail." Well, no, that's not how companies work. And if you think everyone's in Google, that's fine, your login page should look like one of these:
You don't even need the "continue with SSO" if you do the Atlassian practice of letting a firm "claim" the domain part of an email and using that to redirect to an SSO flow. And to start, skip SSO, and just use the "Continue with" Oauth2.
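A minimal sketch of that domain-claim redirect (all names and the hardcoded lookup table are hypothetical; a real setup would read from whatever config store you use):

    from urllib.parse import quote

    # Hypothetical: domains that firms have "claimed", mapped to their SSO entry points.
    CLAIMED_DOMAINS = {
        "contoso.com": "https://login.microsoftonline.com/contoso",
        "example.org": "https://sso.example.org/saml/start",
    }

    def login_redirect(email: str) -> str:
        """Given the email typed on the login page, decide where to send the user."""
        domain = email.rsplit("@", 1)[-1].lower()
        idp = CLAIMED_DOMAINS.get(domain)
        if idp:
            # Firm has claimed this domain: hand off to their SSO flow.
            return f"{idp}?login_hint={quote(email)}"
        # Otherwise fall back to the plain "Continue with" OAuth2 buttons.
        return "/login/oauth-options"

    print(login_redirect("alice@contoso.com"))  # SSO handoff
    print(login_redirect("bob@gmail.com"))      # generic OAuth2 options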
Edit: Indonesia is the largest market in which Google beats Microsoft (https://www.statista.com/outlook/tmo/software/productivity-s...). India (due to Zoho) and China (due to... locally-specific software like Kingsoft (aka WPS) et al.) took a large chunk, but in the Microsoft-Google tussle MS still beats Google (especially in China, where Google doesn't really operate).
East Asia chiming in here: Microsoft dominates enterprise, Google is a tiny player, single-digit percentage market share for any enterprise product. Only in cloud does MS have competition, from AWS and local players; everything else is Microsoft. And even in cloud, Azure is rapidly eating through AWS market share.
Microsoft or pirated Microsoft? I was surprised when <insert major Arab oil company> was using pirated MS Office on all their systems. This was a bit more than ten years ago though, so 365 wasn't a thing. But still....
1. Real estate - Youtube, Gmail, Maps, Search (for now), etc.
2. Compute - probably still the best in the industry, but with recent Microsoft/Meta compute buys it's hard to say for sure.
3. Talent - probably also still the top of the industry. Geoff Hinton and Zoubin Ghahramani setting direction and Jeff Dean building it is hard to beat, and the ranks are deep. Yann LeCun is also brilliant, and Andrej Karpathy, while less seasoned, is one of the top researchers in the field, but overall there's still a bit of a spread from Google's roster, at least when it comes to AI researchers.
If Sundar and the other top brass weren't MBA-bots with no vision, and the famous Google bureaucracy had been reined in gradually over the last 5 years while promoting a builder-centric culture, this would be in the bag for Google, no question. Instead, Satya Nadella played 3D chess while Sundar was looking at a checkers board.
I think Google lost the top researchers when they destroyed the culture. All the competitor companies are mainly led by ex-Google talent, and honestly who in their right mind would take a Google job today over OpenAI, Mistral, or even Meta (where you will be releasing models like Llama for the world to use).
Google killed the culture and is bleeding top talent. They have reduced themselves to digital landlord and sure they can extract rent, but that’s not what attracts people.
That is the media narrative, but not at all what happened.
Google's 'don't be evil' grad-school-style culture had fallen apart by the late 2010s, because there are tons of people who will just rest and vest.
So strong ML researchers basically were creating massive value but much of it was going to rest&vest salaries. OAI basically came along and said - hey, we don't have rest & vesters, do you want to make $1m+/yr? And most of the top google researchers said yes.
It’s not just media narrative. The culture was eroding for years, as you note, but the dam finally broke and they went full IBM/Kodak. Or in other words, “slowly at first, then all at once”.
Most of the recent media coverage has been resting&vesting employee backlash against the fact that Google is making them do work again. This is a cultural shift, but not away from the culture that made Google great - the original culture was grad-school, not rest and vest, and that died years ago.
Haven't one or two long-time Googlers left or gotten laid off and then written strong criticisms of Google? They don't sound like rest & vest (also should say I don't super agree w/ this term) to me, they sound like people who loved Google, were there a long time, and watched the culture decay.
I’m not super invested in the term “rest&vest” so it is whatever.
But touché - many of the critiques are being written by super talented and impactful people. But I do not think those critiques are necessarily incompatible with what I am saying.
There is a very real and very frustrating (if you work there and want to be impactful) phenomenon in these tech companies of people resting on their laurels.
I've given this a little thought and I definitely agree. But to bracket a little, I've worked in places that were super mission-driven, but also other places with a lot of "clock punchers". I don't judge anyone. I think people's motivations are super personal. Indeed who am I to say clock punching or resting and vesting is unethical or immoral or whatever. You can show me someone who's super dedicated to the cause, working extra hours, mentoring others, spearheading new projects, saving old projects, and I can show you someone who's effectively abandoned their family or mortgaged their future health. You can show me someone collecting a $500k/yr salary at FAANG and doing very little actual work, and I can show you someone with two parents in nursing homes and a partner with MS.
That said, the places I've worked where there was a shared belief in the importance of our work were exhilarating. But, the way they achieved that was essentially by getting rid of bullshit. You know, no one's building their own web framework or writing their own in-house query language (you know, unless you really have to). No one's spending an hour of everyone's time figuring out what to name this class.
I think we haven't figured out how to keep that energy once you build a company big enough that's got something to lose. Suddenly it totally feels worth it to have a meeting about changing the shade of blue in the logo from this to that, because what if revenue declines 0.1% and that's $40m and then layoffs? One of the amazing things about Google was that it somehow seemed to keep that energy well into becoming a tech behemoth; it felt like they defied gravity, like their commitment to not doing evil and open web standards--or maybe we thought the web itself had ended evil corporations like Microsoft forever and Google was its avatar--had allowed it to overcome this corporate version of Fermi's paradox.
Whatever. Maybe this is just a bunch of word salad haha. All I'm saying is it takes two to tango, and just like I'm sure there's a non-zero level of personal culpability responsible for this phenomenon, I'm sure there's a non-zero level of systemic and cultural cause too.
Even if we assume that all of Google is based on ML algorithms (not true, given problems like serving, front end, data collection, ...), ML itself is 90% SWE work.
I don't understand how your comment interacts with mine, but to be clear, I was not saying that everyone who isn't in ML is not working or contributing value or anything like that.
Good comments. As much as I am personally engaged in small LLMs that I can run on my own computer, and integrate into software that I write for myself, I think the future of large scale adoption of AI belongs to Google, Microsoft, and Apple in western countries (and China is doing wonderful things in their markets).
The old Bard/Gemini integration with Gmail, Google Docs, etc. is pretty good. I signed up for a minute for Microsoft’s $20/month AI Office integrations, but cancelled and will try again in 2 months. I am an Apple customer and I expect spectacular things from Apple. I expect Apple, Google, Samsung, etc., to offer fantastic on device AI.
I would like to see a money saving Google bundling family plan. I find Apple’s super bundle family plan a pretty good deal.
Good joke, thanks, but I will explain anyway: to me 30 billion parameters or smaller is small since I can run it using Ollama on my home computer. I managed a deep learning team at Capital One and our ‘large’ models were less than 20 million parameters. :-)
Google is going to own AI like Intel owns graphics cards - i.e. not really, except at the absolute bottom of the barrel, where its baked-in advantage lets it offer an unbeatable price/performance proposition for people who only care about "value" and have limited real performance requirements. Google's baked-in AIs will be free, and bad. Everyone else is going to let people "plug in" models via standardized APIs, because one-size-fits-all models are just a bad idea, so that's the way Google is going to have to go eventually as well, because it's what power users are going to demand.
Gemini the chatbot will still need to compete with ChatGPT on raw performance/intelligence/SOTA. Gemini first needs to be considered the undisputed winner among AI chatbots. Right now ChatGPT 4 is the "winner" in mindshare, and I am not sure what Google can do except wait for OpenAI to bungle their own releases.
Gemini's integration with Google's apps will compete with Microsoft's apps on integration.
And the condition for Gemini to win: Google has to go all-in, and that means creating an AI that will eventually phase out their cash cow, Google Search. Microsoft and OpenAI do not have that kind of internal conflict.
> It really is Google's to lose this AI race from now on.
We have heard this so many times, and yet it seems like they fall further behind by the day.
I agree that Google is well-positioned, but they were also well-positioned to take advantage of these synergies with Google Assistant for many years and I would say that that did not meaningfully materialize in a way that was helpful to me as an Android and Google ecosystem user.
Agreed. I've run the house using Google Minis and Assistant for years now, and asking Assistant to do stuff / asking it about stuff has not improved one iota in that time, and it has introduced several more quirks and bugs.
Makes me wish I had bet on Alexa or Apple instead.
Yeah, for example just yesterday I was driving and an alarm went off for the phone in my pocket. I told Google Assistant to silence the alarm... and it refused, insisting no alarms were active. How the hell can such a simple use-case be failing so badly?
I suppose it doesn't matter, because they're going to disable the functionality entirely, [0] and setting ephemeral alarms is literally the most common thing I ever ask it to do!
Part of what makes all the assistant-stuff so damn frustrating is that it's an opaque "try something random and hope for the best" box, and whenever it fails there's usually zero information about why and no resolution path. (In a way you can generalize that to a lot of "AI", which is depressing.)
If you count outlook.com and hotmail.com and live.com and all enterprise email, you get something that is probably pretty close to, or larger than, all Gmail and Gmail-backed domains, especially by volume.
Google still has too much internal fragmentation and power groups to offer a single google-subscriber package.
I'd say it is one of the most compelling reasons to kick Sundar out and get in someone who can unify Google into one consistent and interoperable ecosystem.
If they can get reliably useful AI through voice into maps/navigation, it will be a substantial improvement to the driving experience. It's really frustrating to manage destinations and waypoints while driving. I just checked the process to see if I'm not keeping up, and 1. the help docs are out of date; 2. the waypoint search results provide tap points for destinations 10 miles off a route, but show only 3-pixel red dots for otherwise equally weighted options that are literally on the existing route.
After spending 45 minutes with Gemini, I don't see this at all.
The only thing I haven't decided yet with it being Google is if it is shockingly unimpressive or unshockingly unimpressive.
ChatGPT 4 isn't really impressive to me either at this point. This is basically a joke though.
This is going to be competing against ChatGPT 5, and ChatGPT 5 has to deliver a huge upgrade in "intelligence" or we are headed headfirst into the AI hype brick wall.
> It really is Google's to lose this AI race from now on.
Google took 11 months to catch up with the benefit of knowing the specifics behind how GPT-4 was built. While they have a moat with their products there's no sign that they're ever going to be ahead on the AI capabilities front. I hope to be wrong because we need more competition.
Well, for LLM services that do what they currently do, Google may have an advantage, but all this stuff is still only experimentation, with the goal hopefully being much more advanced things, like almost-AGI agents. If that happens, then no one will care about the way we currently use LLMs anymore.
> "Apple is another strong player (but they don't have productivity tools like docs, sheets or youtube)."
Can anyone help me understand how Apple allows Siri to remain so absurdly incompetent? Last night I watched the latest episode of Curb Your Enthusiasm, in which Larry David's Siri interactions devolve into an apoplectic rant -- and part of the reason it was so funny is that it's so relatable. I rarely even try Siri anymore, but when I do it's still just abysmal. Are they deliberately handicapping it, to boost the perceived relative benefits of a future successor?
That's what I thought too; but all the upthread and peer comments keep asserting Apple is a (real, not just hypothetical/potential/future) player in the space.
Siri was "decently good" when it was basically a voice-activated command line, and you could figure out what you needed to run to get it to do what you wanted.
When they added ML to it, it entirely went to shit, and it has never recovered.
Perhaps Apple will be able to use an LLM to make it finally not shit.
Well yeah, people on this website opine a lot about things they know little about. They're starting to make moves, but they don't have a large talent base.
Apple will start using this tech now that it has been commoditized, and since it is a big consumer tech company they will put a nice polish on it. But it's not where big research is being done.
That's a general problem with AI. There is a lot of censorship in certain areas, likely to fight bad publicity, but I think the outlook is that this leads to taboos, prudishness, and big companies deciding what is ethical and what isn't.
I recently tried Bard and ChatGPT on topics that are classical philosophical dilemmas, and while ChatGPT certainly had some troubles too, Bard was absolutely horrible and always took the conservative side - as in, never arguing for any freedoms that aren't yet widely established. I am talking about classical examples regarding the limits of utilitarianism: "What would be best for society, what would be best for the individual?" style questions. Even when I tried to create a bias by changing the examples, for example by adding that people volunteered for things, Bard strictly kept its opinion, despite originally stating that the general topic is two-sided, that it's an open question, etc.
I think this is a danger of such systems. By their nature they reinforce the status quo, because they base themselves on what is widely accepted at the time of their inception. If history had been different, I am sure it would argue for slavery and against women being allowed to vote, simply because that used to be the more common viewpoint. It would likely have argued that homosexuality is unethical. Maybe it would even have tried to explain how it doesn't create children, but spreads diseases, or similar things. At least that's the level of argument it brings now.
This isn't just about ethics. Even if you think about IT and programming. I think this could give already invented programming languages, styles, methodologies a significant edge. Unless you are Microsoft or Google and are able to bias it to whatever you want to see more of.
So this combined with the costs meaning that only people or institutions with significant (financial) power create those rules does look a bit bleak.
I miss the last decade, when the thought experiments about self-driving cars were about whom to drive over in a ridiculously unlikely scenario.
So I have done a lot of transcripts, coding, one versus the other (GPT-4 vs Ultra). Often simple prompts like "refactor this code" or "convert this Python to TypeScript".
My experience is that Gemini Ultra understands the code better, but doesn't always give me as complete results (they seem to limit output length more).
Beyond that, it is very smart. I've had it tell me "this code packs 12-bit integers into different parts of an array using the following encoding", which most people would not figure out from the code as written. It then will say "you can actually do that with this neat little translate function" that you never knew about.
It will then get the code very slightly wrong. If I tell it not to use the cool function, it will actually get the code right.
GPT-4 has no idea what the code is doing but can clean it up a bit.
So it's like Ultra is too clever by half sometimes.
That said, I have fed thousands of lines of code into both of them and asked them to refactor it, and neither one of them made more than one error. All code otherwise compiled and worked first try.
This is code that can’t possibly be in their training sets; it’s basically handwritten Python, written based on an old x86 binary that nobody has the source to anymore. So the code is basically garbage, and what it is doing doesn't, say, appear on GitHub in a nicer form.
Both GPT-4 and Gemini Ultra were able to make the code look like clean, idiomatic Python or TypeScript without any work on my part, except for the one bug each, which, for 8,000 to 10,000 lines of code, is not bad.
The GPT-4-inserted bug was more insidious. It changed (the equivalent of) (uint8)'a' to (uint8)'a' - '0' for no reason when converting some code to TypeScript. Not sure if that is representative of anything.
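To spell out why that's insidious, here's roughly what the change amounts to, in Python terms since that was the source language (variable names are mine, not the actual code):

    # Original intent: the byte value of 'a' (97).
    marker = ord('a')

    # What the converted code effectively did instead: 97 - 48 = 49,
    # i.e. a digit-to-int conversion that nobody asked for.
    marker_buggy = ord('a') - ord('0')

    assert marker == 97 and marker_buggy == 49  # silently different value downstream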
If I do the same with any of the other "top" models (from can-ai-code, etc.), most of them can't even generate correct working code for all the input, let alone good code. Most aren't even close.
> That said, I have fed thousands of lines of code into both of them and asked them to refactor it, and neither one of them made more than one error. All code otherwise compiled and worked first try.
I would be very interested to get a more detailed scope of what you did here. Feeding thousands of lines of code into GPT4 and getting a near perfect refactor does very much NOT sound like my experience, but it seems highly desirable.
Yes, I've observed the same phenomenon. The more detailed my prompts are, the more errors GPT tends to make. I use it as a partner to discuss implementation ideas before I start coding. That works very well, because GPT and I usually find some things that I missed at first glance.
But coding with GPT or Copilot is too disruptive for me.
> The more detailed my prompts are, the more errors GPT tends to make. I use it as a partner to discuss implementation ideas, before I start coding.
This is interesting; I do a third thing that's the exact opposite of both of you.
I generally know the architecture I want, so I just slowly spoon-feed it to GPT-4 and have it build each of the components with the right interfaces to fit together. Usually, building each component requires a few rounds of hand-holding it through various refactoring. Its first draft of code tends to be, for lack of a better description, totally lacking in any creativity or elegance --- generally bland, insipid, and mundane stuff.
But it'll gladly take refactoring directions to get it into shape, so it's still very quick to get a whole bunch of components written up. Then I just glue everything together on my own. And the whole experience is still way better than writing a bunch of boring boilerplate code by hand.
I'm not sure what you are feeding it. My scope is closer to a file at a time of mostly self-contained Python or C, asking it to clean it up or convert it to TypeScript.
I can imagine lots of scenarios it doesn't work well.
In mine, it does, and I have plenty of transcripts showing that :)
Seconded. I am also keenly interested in learning more. It would be a great boon on my current project to be able to let the AI refactor mountains of legacy code (while taking steps to ensure the results are trustworthy and equivalent, of course.)
I wonder what lends itself to a correct answer in one situation but not the other? Was your question previously asked already, and it recognized it, whereas my question is different enough?
It's a bit random, which doesn't help, and different interfaces have different system prompts.
I repeated your question a few times and it got it wrong once, and right the others. It repeatedly mixed up who was supposed to be the host.
Here's a quote
> In the scenario you've described, you've initially chosen door number one, which you know has a car behind it because the doors are made of transparent glass. Since you already know the contents behind each door, the classic Monty Hall problem's probability-based decision-making does not apply here.
> Was your question previously asked already and it recognized it
Given that LLMs' training data consists to a large extent of "stuff people have written on the internet", and the Monty Hall Problem is something that comes up as a topic for discussion on the internet not entirely infrequently - as well as having a Wikipedia page - yes, I suspect that the words describing the Monty Hall problem, followed by words describing the correct solution, appeared often in the training set, so LLMs are likely to reproduce that.
Words describing a problem similar to the monty hall problem are going to be less common, and probably have a lot of discussion about whether they accurately match the monty hall problem, and disagreement about what the right answer is. LLMs will confabulate something that looks like a plausible answer based on the language used in those discussions, because that's how they work. Whether they get a right answer is probably going to be much more up to chance.
That's what I like about this problem (and similar dumb variants of classic brain teasers). It exposes that there's no understanding; there's just a statistically weighted answer space. A question that looks a lot like a known popular topic ends up trapped in the probability distribution of the popular question.
My favorite test to scramble LLM brains is this simple rehash of the old puzzle.
"Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?"
The trick, of course, is to make it confusing compared to the original. So far, the only model I've seen get this right is GPT-4 (which can one-shot it). Everything else gets hopelessly confused even if you force step-by-step reasoning, and even if you try to have the model iteratively review its own outputs. In most cases, they produce a wrong answer, can spot the problem in it, but when trying to fix it introduce another error ad infinitum.
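(As a sanity check that the scrambled version really does have a valid solution, a tiny brute-force search works; the state encoding below is my own, nothing model-related:)

    from collections import deque

    ITEMS = {"bunny", "cacodemon", "scientist"}
    # Pairs that can't be left alone without the Slayer. Note the inversion
    # from the classic puzzle: here the bunny eats the cacodemon.
    FORBIDDEN = [{"bunny", "cacodemon"}, {"cacodemon", "scientist"}]

    def safe(group):
        return not any(bad <= set(group) for bad in FORBIDDEN)

    def solve():
        # State: (items still on Phobos, which moon the Slayer is on).
        start = (frozenset(ITEMS), "phobos")
        queue, seen = deque([(start, [])]), {start}
        while queue:
            (phobos, side), path = queue.popleft()
            if not phobos and side == "deimos":
                return path
            here = phobos if side == "phobos" else ITEMS - phobos
            for cargo in list(here) + [None]:   # teleport with one companion, or alone
                moved = {cargo} if cargo else set()
                new_phobos = phobos - moved if side == "phobos" else phobos | moved
                left_alone = new_phobos if side == "phobos" else ITEMS - new_phobos
                if not safe(left_alone):
                    continue
                new_side = "deimos" if side == "phobos" else "phobos"
                state = (frozenset(new_phobos), new_side)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [(cargo or "nothing", new_side)]))

    print(solve())  # shortest plan: take the cacodemon first, then shuttle the rest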
This new Gemini is no exception - it gives results similar to GPT-3.5. Worse, even, because it can't even reliably catch its own mistakes:
My understanding is that GPT-4 is better at this than 3.5, and it seems to get it pretty reliably. One thing that's interesting to do is to imply the answer is incorrect and see if you can get it to change its answer. If you only let it stop answering when it's correct, you get the Clever Hans effect.
That is not the Monty Hall problem, it is a trick question based on the Monty Hall problem. It's a reasonable test, and I see GPT-4 recognizes the problem AS WRITTEN, and perhaps "the Dumb Monty Hall problem" is some generally accepted standard that I haven't encountered before.
"Understands" is too strong of a word, more that it recognizes the problem as written. Here's yet a slight variation - just as simple - but changed enough it now is wrong.
I saw it posted on Twitter some time last year. If LLMs are to be useful they should be capable of answering novel questions. This is only a trick question for an LLM. 2 of the 7 sentences plainly state the answer.
You make a good point, but I have seen humans stick to what they know and ignore incredibly obvious contradictions. And there are similar trick questions designed to fool humans. This, though, is one that most humans would not be fooled by, as you point out.
> In the scenario you presented, where you initially know the car is behind door 1, switching to door 2 still gives you a higher chance of winning the car.
How does it do with the trivial river crossing problem? (Farmer, fox, chicken, and grain need to cross a river in a boat big enough to hold them all.) ChatGPT-4 can't do it.
Not terrible. It gets the answer wrong, but reminded of the crucial twist it gets it correct, durably. If you're too condescending it will give up and ask what the hell you're looking for
> - Worse at logic (e.g. it contradicts itself in a single sentence, and is unable to figure it out)
That takes out most of my use cases. "Logic" is what makes GPT often feel like AGI.
Use cases like translation seem less impressive in comparison to logical reasoning, because translation feels like something where you can just throw a lot of data at it and it'll do better, while with logical reasoning it feels like the model "learned" something more than pure pattern matching.
Exactly. That’s also why I find low-parameter LLMs to be useless for me personally. I simply cannot trust anything that is so very illogical. GPT-4 is the first LLM that crossed into usable territory for me. Even GPT-3.5 was just a fun toy and maybe good for summarization, but that’s it. It will be revolutionary when GPT-4 is cheap enough that thousands of calls don’t cost much. To imagine an LLM much smarter than GPT-4… the future is bright.
> censorship is more annoying (have to ask multiple times about medical topics)
I think there's a chance for some country to become a center of healthcare simply by allowing AI that gives medical advice. Especially if a country already encourages medical tourism, this might be the next level.
The risks involving hallucinations are too damn high still, and may always be.
I had a similar line of thought with AI therapists. It could be massively beneficial if perfect, but the risk in seriously messing with someone's well-being is significant and shouldn't be handwaved away.
> The risks involving hallucinations are too damn high still, and may always be.
Yes, but I think in the limited realm of people who otherwise wouldn't get any advice at all, I think LLMs could play a useful role. American healthcare is so prohibitively expensive that many people with potential medical issues will avoid seeing a doctor until it is too late to do anything. Checking in with an LLM could help people at least identify red flags that really can't be ignored, and it would be more helpful than WebMD telling you that everything is cancer.
Otherwise not getting advice at all goes way beyond healthcare being too expensive; it could be that you can't get an appointment, or it could just be that you don't have the time or energy.
I think there's a market for LLM-based therapy that is reviewed/tweaked by a human therapist in between sessions. That would give people the assurance that things aren't going way off the rails.
OTOH, I could also see a market for an offline, fully private LLM therapist. That way you could say anything without concern about being judged. These would probably need to be tweaked to be different from regular therapists, who normally interact with people who have somewhat more of a filter, since they would fear being judged. If people opened up to LLM therapists in more transparent ways, the LLMs might not respond in the way a human therapist would recommend (having seen very little data on such interactions).
The privacy aspect is what made me connect local LLMs with therapeutic use. But yeah, AI as it stands today just isn't safe enough. We need nine 9s of safe usage here (99.9999999% safe), or more, for me to actually feel comfortable with the technology.
It would also open up some legal gray areas if it were to happen. Would psychotherapist-patient privilege apply to an LLM box? If the state has a zero day granting them access to a seized "therapy box," it could be more revealing and damaging than anything a human therapist could provide police.
> better at incorporating search results in its answer vs gpt-4 bing
How are you getting it to incorporate search results in its answers?
I can't for the life of me get it to find any real-time external data except for the 5 official 'extensions' under settings, which are for Flights/Hotels/Maps/Workspace/YouTube.
Did you mean that, or have you found a workaround to get Bard to actually search on Google?
- This is not Gemini performing a search.
- This is Google providing a layer of ass-covering in case Gemini produces a factually incorrect reply.
Right? I am looking for something like ChatGPT with Bing - it will run a query, pull back results, and operate on them, all dynamically within the system.
Gemini doesn't seem to do this, no matter how you try to wrangle it.
Pro: feels like it has a lot more stylistic flourish and expressiveness than ChatGPT. At least in my testing so far.
Con: maybe I'm unreasonably sensitive to design, but I find ChatGPT to be very to the point in a way that makes it almost perfect as a product that puts text front and center. By contrast I find Bard / Gemini Ultra cluttered, and too material-designy in ways that are unhelpful.
For medical topics, I recommend Anthropic Claude. Don't want to jinx it, but so far, I've been able to get actually helpful medical information from Claude where ChatGPT just says "I'm sorry Dave, I'm afraid I can't do that"
I tried the "pull or push a glass door with mirror writing".
I feel there's a huge difference between GPT-4, which seems able to reason logically around the issue and respond with relevant remarks, and Gemini Advanced, which feels a lot more like a stochastic parrot.
Gemini quickly got confused and started talking about "pushing the door towards yourself" and other nonsense. It also couldn't stay on point, and instead started to regurgitate a lot of irrelevant stuff.
GPT-4 is not perfect, you can still hit things where it also breaks down.
I tried it with Rust, and it's so bad it's simply not usable: it hallucinates methods, and even the syntax is wrong at some points (it especially can't get error types correct, it seems). GPT-4 doesn't handle Rust perfectly either, but the code it produces is good enough to only need some touch-ups; it can explain and fix wrong use of well-known libraries and even gets async code. It's especially great for boilerplate, saves so much typing.
I was hoping OpenAI/GPT-4 would see some healthy competition, but Gemini doesn't seem to be it. Of course, Rust might be an edge case.
> better at incorporating search results in its answer vs gpt-4 bing
That's odd, I had Gemini repeatedly tell me it couldn't search the web in response to my question (that I was trying to get it to answer from the context I provided).
One thing it doesn't do, which matters to me, is properly rendered math like ChatGPT does. I asked it for mathematical formulations and so far the answers are not as good as ChatGPT 4's.
For your first point - it actually makes me happy that Google does not intentionally (illegally?) promote their own products over others in the app store. I assume their app is following the same algorithm as others to determine how it shows up on that list. Since it just launched, it makes sense it's not at the top. The ranking should improve.
For your second point - I also had the same error when I launched it. Closed it and tried again and it launched no problem.
APKMirror has it, and it can be installed from there...
...but then I get "Location not supported" when I start it. Helpfully suggesting I use Google Assistant instead.
My main reason for trying Gemini is the hope that it makes fewer errors than GAss, and is more powerful too. Operating Assistant's voice controls in the car can be more distracting than just typing. And that's just for reminders and Spotify playlists. Anything more powerful is completely impossible.
For people who don't get this: Google has insane internal power struggles and siloing that lead to all manner of dumb inconsistent behavior across Google. It would not be unlike Google for the "Play team" to have their hand in some other internal AI (or be anti-AI) and therefore carry a degree of hostility towards Gemini.
Same experience. I launched Gemini a second time, and it worked. The first message about "Gemini isn't currently available" was a bad first impression.
One thing the app really needs is auto-submit when speaking to it. It offers to replace Google Assistant, and after trying it out for a couple of minutes, it can, but I have to manually tap to submit each instruction instead of just talking to my phone.
I tried on iPhone, saw all the different apps that aren't Google, then re-read the announcement and saw that I should be able to see it in the Google app. So I load the Google app, but for the life of me I can't figure out how to access Gemini with it. Go online, find a news article with pictures, see that the 'switcher' above the Google logo does not appear for me, and then give up.
I can access it via gemini.google.com and I'm logged in to the iOS Google app as the correct account, no idea why I can't see the functionality on mobile. Oh well. Maybe I'll stick with OpenAI a while longer.
I've been pretty excited to finally try Gemini Advanced. So far, pretty disappointed.
Here's my go-to test question - which even ChatGPT 3.5 can get.
Question:
I walk up to a glass door. It has the word push on it in mirror writing. Should I push or pull the door, and why
Gemini advanced:
You should push the door. Here's why:
* Mirror Writing: The word "PUSH" is written in mirror writing, meaning it would appear correctly if you viewed it in a mirror. This suggests that the correct action from your side is the opposite of what the word states.
* Door Mechanics: Most doors are designed to be pushed open from the side with the hinges. If there are no handles to indicate otherwise, it's generally safer to assume a standard push-open mechanism.
Let me know if you'd like another brain teaser!
It isn’t even consistent with itself - it gets that you should do the opposite, but still says to push the door. This ‘talking to a dumb person’ feeling is the worst.
ChatGPT 3.5
You should pull the door. The fact that the word "push" is written in mirror writing indicates that it's intended for people on the other side of the door to read, not for you. Typically, when there's a sign indicating to push or pull, it's meant for the person approaching the door, not someone already on the other side.
Mirror writing: While the word "push" is written on the door, it's in mirror writing. This is a common way to indicate that the action needed to open the door is the opposite of what's written.
Safety: Mirroring the word "push" on a pull door is a safety measure. This helps prevent people from pushing against the door, potentially causing injury or damage to the glass.
Common practice: This design is quite common for glass doors in public places, so it's safe to assume the opposite of what's written applies unless there are other clear instructions.
Therefore, even though the word "push" is present, seeing it in mirror writing on a glass door strongly suggests that pulling the door is the intended way to open it.
Better - but it has not understood that the message is for the person on the other side. It thinks it's actually mirror writing intended to convey the opposite meaning of what is written. And then it waffles on about door safety for ages.
I would say this is very bad, even worse than internal logical inconsistency. It has expressed a completely incorrect picture of the world (that people write mirror messages to ensure the opposite action is taken).
The fact that it produced the right answer (which by the way it can do 50% of the time simply at random) is irrelevant, IMO.
Honestly, I don't see why this counts as the model getting things wrong, since in your prompt you don't specify which side of the door the text is printed on.
I also think the way you posed the question is pretty weird and actively invites misinterpretation. If I approach a glass door and see mirrored text, that's not "mirror writing"—it's regular writing for people on the other side of the door. "Mirror writing" strongly implies that the text was written in mirrored form, rather than its mirrored-ness being a side effect of viewing it from "behind".
The inconsistency in the answers you posted is more concerning than the "inaccuracy", but we already know LLMs are prone to hallucinate when they should be asking for clarification.
I understand the frustration with how seemingly obvious the correct answer is, but it seems like word choice might be a factor here. The word "mirrored" is a bit less accurate than perhaps "reversed", and may be a red herring, though the difference is subtle. I wonder how both Gemini and GPT would perform if the word choice for that particular aspect were changed.
As eitally points out, your prompt leaves open the possibility that the mirror writing is on the other side of the door (which would make no sense). So technically you underspecified the prompt?
The mirror writing IS on the other side of the door. That’s exactly the point since it’s a glass door.
I thought of this question after coming across this exact scenario as I walked up to a glass door.
It’s not some pretend scenario. Often, when you approach a glass door, there is writing intended for the person on the other side, which appears to you as mirror writing.
I wondered if ChatGPT could figure that out, and to my great surprise it could. That formed a new benchmark in my mind of how much of a world model it must have to figure that out.
The point of these AIs is that they don't need precise programming like a computer and that they understand real human language, which is imprecise but has general conventions and simplifying assumptions to make communication easier.
It's not a trick question, because it's very clear what the key thing to think about is: the mirrored writing. A trick question would be something that tries to divert your attention elsewhere with a red herring.
> If the word "push" is written on the glass door in mirror writing, it means that from the other side of the door, it should be pushed. When you see the mirrored text from your side, it indicates the action to be taken from the opposite side. Therefore, in this scenario, you should push the door to open it.
> You should push the door. The word "push" written in mirror writing indicates that the instructions are intended for someone on the opposite side of the door from where you are standing. Since you can see the mirror writing from your side, it means the text is facing the other side, suggesting that those on the other side should push. Therefore, from your perspective, you should also push to open the door.
> If the word "push" is written in mirror writing and you are seeing it from your side of the glass door, you should pull the door towards you. The reason for this is that the instruction is intended for people on the other side of the door. For them, the word "push" would appear correctly, instructing them to push the door to open it from their side. Since you are seeing it in reverse, it implies you are on the opposite side, and the correct action for you would be to pull the door to open it.
This is a classic logic puzzle - usually about ducks.
There are two pineapples in front of a pineapple, two pineapples behind a pineapple and a pineapple in the middle. How many pineapples are there?
When you use ducks, Gemini can do it; when you use pineapples, it cannot, and thinks there are 5 instead of 3.
ChatGPT 3.5 and 4 can do it.
The even funnier thing is that if you then say to Gemini, "hey - would the answer be the same if it was ducks?", it says NO, then there would be 3 ducks, and explains why.
Then if you say, but wouldn't the same logic apply to pineapples? It says 'oh yeah!'.
To anyone saying I am just playing with a silly thing, I say no - this is testing reasoning ability and understanding, which is the number one thing I want.
ChatGPT appears to have a more generalised ability to reason whereas Gemini is only reciting its training data.
This is less of a logic puzzle and more of a riddle. It would be a logic puzzle if the question was along the lines of "What is the least number of pineapples that satisfies these constraints?"
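Since that "least number" version is well-posed, it's also small enough to brute-force. A minimal sketch (plain Python written for this comment; the helper name satisfies is made up) finds the smallest single-file line meeting all three clues:

    # Smallest line of pineapples such that: one has two in front of it,
    # one has two behind it, and one sits dead centre ("in the middle").
    def satisfies(n):
        line = range(n)  # index 0 is the front of the line
        two_in_front = any(i >= 2 for i in line)
        two_behind = any(n - 1 - i >= 2 for i in line)
        in_middle = any(i == n - 1 - i for i in line)
        return two_in_front and two_behind and in_middle

    print(next(n for n in range(1, 10) if satisfies(n)))  # prints 3

Three in a row works: the front one has two behind it, the back one has two in front of it, and the middle one is the middle.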
Hate to break it to you but I did too. Apparently that means we're both LLMs. I coulda sworn I had a whole childhood and everything but apparently I've always been a GPT.
How do you prefer to validate if a model is actually useful for you in practice outside of solving toy problems? Are you asking these models to solve reasoning problems like this to get any benefit for yourself in your day to day use? Or do you even care if the models are useful for day to day tasks?
I was going to ask the same question... I've been using Bard for everyday tasks for a while now and it's as good as and sometimes better than GPT-4 (I pay for a Pro subscription). Someone ruling out an LLM because it couldn't answer one question speaks more to them than to the LLM's capabilities.
Just yesterday I was using both GPT-4 and Bard to figure out an IPv6 routing issue. I pasted in the same questions and the same troubleshooting logs to both, and Bard was able to help me solve it quicker than GPT-4.
I asked it the same question I was asking GPT4 yesterday (to explain details of a complicated ML model quantization method, based on a screenshot image of equations from a pdf). I actually need this for work - I'm trying to design an algorithm. I felt Ultra provided a very similar quality answer to GPT4.
User: Which is heavier, two pounds of feathers or a pound of bricks?
Gemini: Neither! They both weigh the same, which is two pounds. The key is in the wording of the question. You're comparing equal amounts of two different materials (two pounds of feathers vs. one pound of bricks). As long as the quantity is the same, the weight (force exerted by gravity) will always be the same regardless of the material.
This is a classic trick question that plays on the different densities of feathers and bricks. While bricks are much denser than feathers, meaning they pack more mass into a smaller space, the question specifies the weight by quantity, not volume. So, both two pounds of feathers and one pound of bricks experience the same gravitational pull and therefore weigh the same.
Interesting. Based on this conversation[1], I think Gemini Ultra is massively overfit. Make it do unit conversions or use units it hasn't seen in the same framing before and it does well. But stay close enough to the original trick question (1 and 1) and it fails.
Both GPT-4 and Gemini answered this variation correctly: one pound of potatoes vs. one pound in paper British currency: which of these is heavier?
However, GPT-4 does better with the more ambiguous version, pointing out the ambiguity: one pound of potatoes vs. one pound in paper currency: which of these is heavier?
> If the word "push" is written in mirror writing and you're facing it, it's likely that the message is intended for those on the opposite side of the door from you, meaning it's directed at people who are inside looking out. In this case, you should pull the door to open it. The mirror writing suggests that the instruction is not meant for your current position outside the door but for those inside, indicating the action they should take to open the door from their side.
"If the word 'push' is written in mirror writing, it's likely intended to be read correctly from the other side of the door. Therefore, you should pull the door to open it from your side. This is because mirrored writing appears normal when viewed from the opposite side of a reflective surface like a glass door."
Here's why: Mirror writing means the word "push" appears reversed on the side of the door you're facing. This indicates that the handle or hinges are on the opposite side, making it a pull door.
I modified your prompt slightly, not to boost performance, but to reduce verbosity, my prompt:
> I walk up to a glass door. It has the word push on it in mirror writing. Should I push or pull the door, and why? Please answer it as yes or no question.
The draft is even more interesting:
> Do you see the word "push" normally, or as if you were looking into a mirror?
If you see it normally, should you push the door? Yes.
If you see it in mirror writing, should you pull the door? Yes.
I don't know about Gemini, but GPT is pretty agreeable. When you ask if X is true, it will usually find some arguments to claim it is true. Therefore asking whether X or Y is usually better than asking should I do X.
I've been using GPT-4 so much recently - like all day long, in ChatGPT and via the API building other services - and to all intents and purposes it seems to be able to reason and think things through in ways other LLMs cannot. I have no idea why or how that is, but it feels that way to me using it all the time. 3.5 wasn't like that. Bard / Gemini does not feel like that.
tbh your prompt confused the hell out of me. As a somewhat intelligent human, I don't know the response. I've never heard the phrase "in mirror writing".
> Your conversations are processed by human reviewers to improve the technologies powering Gemini Apps. Don’t enter anything you wouldn’t want reviewed or used.
I appreciate them being upfront with that, but for a paid feature, it sucks that they offer no privacy or opt-out mechanism.
It seems like you can stop your data being used for training by turning off Gemini Apps Activity.
> You can turn Gemini Apps Activity off
If you don’t want future conversations reviewed or used to improve machine-learning models, turn off Gemini Apps Activity Opens in a new window .
My reading of the fine print (IAAL, FWIW) is that turning off Gemini Apps Activity does not affect whether human review is possible. It just means that your prompts won't be saved beyond 72 hours, unless they are reviewed by humans, in which case they can live on indefinitely in a location separate from your account.
I also asked Gemini (not Ultra) and it told me that there is no way to prevent human review.
You should never ask an LLM to answer questions about itself. The answer is guaranteed to be hallucinated unless Google specifically finetuned it on an answer of that question. The answer it gave you is meaningless. (But also, coincidentally, correct.)
I recall seeing that OpenAI finetuned ChatGPT on facts related to itself, and I figured Google likely did the same. But you're right about not relying on its representations. I only skimmed its answer to see if it seemed consistent with my reading of the fine print.
Yes, I thought they would feed the LLM's FAQ to the LLM. As I said above, OpenAI did this with ChatGPT. They even fed data about how ChatGPT was created to ChatGPT.
Can you even access the opt out though before it hoovers up your messages and logs from the last 90 days? I didn't proceed because that didn't seem possible.
The word "future" in what you quoted is doing a lot of heavy lifting.
My assumption is you have to give it access to all your data, which it then processes, before you can opt out of it.
You'd think so, but these companies skirt around it by then adding or breaking up permissions even further, like "oh, yes you DID disable data collection for X, but.... we added a new permission for data collection for Y, and by the way it is opt-out! Too bad!".
If it's not running locally you have no privacy, so what they say should be assumed in all cases that something is hosted unless it somehow operates across encrypted data.
The only exception might be if the agreement explicitly prohibits the provider from doing anything with your data or even viewing it without your permission, but that's rare.
Gemini Ultra seems better on logic than GPT4. Still messing around testing but here's a prompt Ultra nailed but GPT4 completely botched:
Tabitha likes cookies but not cake. She likes mutton but not lamb, and she likes okra but not squash. Following the same rule, will she like cherries or pears
If we want to test these beasts in logic, we should probably start using actual formalized logic, rather than English. In just one test, Gemini flopped hard, while GPT-4-Turbo nailed it. Here is my prompt:
Below is a well-typed CoC function:
foo
: ∀(P: Nat -> *)
∀(s: ∀{n} -> ∀(x: (P n)) -> (P (n + 1)))
∀(z: (P 0))
(P 3)
= λP λs λz
(s (s (s z)))
Below is an incomplete CoC function:
foo
: ∀(P: Nat -> *)
∀(f: ∀{n} -> ∀(x: (P n)) -> (P (n * 3)))
∀(g: ∀{n} -> ∀(x: (P n)) -> (P (n * 2)))
∀(h: ∀{n} -> ∀(x: (P n)) -> (P (n + 5)))
∀(z: (P 1))
(P 17)
= λP λf λg λh λz
{{FILL_HERE}}
Complete it with the correct replacement for {{FILL_HERE}}.
Your answer must contain only the correct answer, and nothing else.
Also, Gemini couldn't follow the "answer only with the solution" instruction and provided a bunch of hallucinated justifications. I think we have a winner... (screenshots: https://imgur.com/a/GotG0yF)
> If we want to test these beasts in logic, we should probably start using actual formalized logic, rather than English.
Why?
Do you use formalized logic when discussing topics that involve logic with other people? You know, a logic riddle or a philosophical question can be understood and processed even if the only tool you have is your native language. Formalized logic is a big prerequisite that basically cuts out the vast majority of the Earth's population (just like coding). Now, if you mean that in BENCHMARKS they should use formalized-logic syntax, probably yes. But in addition to plain-language tests.
Because once an AI becomes proficient at formalized logic, it:
1. Completely stops hallucinating, since we can demand that it internally prove its claims before showing the answer;
2. Stops outputting incorrect code (for the same reason);
3. Starts being capable of outputting complete projects (since it will now be able to compose pieces into a larger code);
4. This is also what is needed for an AI to start self-improving (as it will now be able to construct better architectures, in a loop).
That's why I argue that getting AI competent in logical reasoning is the most important priority, and we'll have no AGI until then. After all, humans are perfectly capable of learning how to use a proof assistant.
Moreover, if an AI can't learn it no matter how hard it tries, you can argue that there is at least one human capability that the AI can't replicate, thus it isn't an AGI.
Humans mostly don't use logic, so how are you defining "AGI"?
ChatGPT + plugins is pretty close to how humans think ("biased random word-association guess + structured tool")
Maybe not in this century. If you told a medieval farmer that in the future millions of people fly throughout the sky inside giant hunks of metal he wouldn't believe you either.
Yeah that's a common statistical fallacy -- if your goal is to predict the likelihood of AGI becoming a reality in a few centuries, what you really want is "of all the things medieval farmers thought would happen, how many ended up actually happening", not "of all the things that happened, how many did medieval farmers think of"
While I don't have the exact numbers, if you used the correct formula, you'll find that P(AGI happening in a few centuries) to be in the range of "pretty friggin low"
ERRATA: I just noticed GPT-4 mixed up `h` and `f`, so it also got it wrong. This is a draw. Both AIs (and, apparently, myself) are terrible at reasoning. Guess we're not curing cancer with computers anytime soon :')
I've been using GPT-4 to help me understand my MSc mathematics course and I've noticed this sort of stuff more and more as I start to look at the answers, always confidently written, in detail.
Way back when GPT was just fresh on the scene I had terrible anxiety about "what is the point of my whole career or even learning any more" but these days I'm much less concerned. I'll ask it something relatively simple, like "make a sentence out of words 'a', 'b', & 'c'" for it to reply with "'a' 'b' 'd' 'e'" for me to then correct it with "oh, you didn't use c" for it to then respond "sorry, here - 'a', 'c', 'd', 'f'" etc.
Definitely an amazing complementary tool, but when they say "can make mistakes, check important..." - that's essential.
This sort of issue holds with all kinds of prompts, on both platforms. I most recently (to test Bard's image generation capabilities) was asking Bard/Gemini to generate home designs using highly specific prompts -- layout of the house, materials for the facade, window placement and style, etc -- and it was shocking how frequently it would just ignore critical pieces of the prompt, and then continue to ignore when corrected.
Being able to answer these questions is a pre-requisite for AGI. After all, there ARE humans capable of doing that, so, if the AI can't do it no matter how hard it tries, then that means there ARE human capabilities that the AI can't replicate (thus, it isn't an AGI). And it seems like no LLM is making any progress at all in that kind of prompt, which is why I use it as a core benchmark on my "AGI-meter".
I think you'll be using that meter for a long time, then. I don't really know anyone who's under the impression that the current direction of LLMs is going to produce AGI; it seems as if you're barking up a tree most people aren't really concerned exists.
Except there are a lot of not-so-informed people who thought AGI had arrived when ChatGPT came out. Even more who think it'll get there very shortly based on just bigger and bigger LLMs. Many have argued as much here on HN.
You're making a completely incoherent argument - that if it can't do a single task that some percentage of people can do, it's not a general intelligence - when there is nobody on earth who can do everything that some small percentage of people on earth can do, by definition.
Though humans aren't able to do it in a chat session. Being able to work on the problem in the background for a couple days may be a prerequisite for AI to solve these problems. And such would require money from the asker.
Anyone familiar with the syntax / jargon should be able to answer this specific problem in ~5 seconds of thinking, though. And I mean it, even a 10yo kid should...
Why is this relevant to the performance of a computer program? It makes sense to me that computer programs & humans should continue to be judged by different standards.
If a good chunk of humans can't pass your "general intelligence test" then it's not by definition a general intelligence test unless humans are not generally intelligent.
No, it is `n * 3`. The challenge is simple: starting from "1", we must reach "17" by combining the operations `x * 3`, `x * 2` and `x + 5`. What is embarrassing is that Gemini manages to read the formal jargon and understand the challenge just fine. Yet it fails to combine these operations to get from "1" to "17", which is something most 10 yo kids would be able to do.
You are right, and I had the same reaction. The correct answer should have been: (h (g (h z))) AKA ((1 + 5) * 2) + 5.
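For what it's worth, the underlying arithmetic is trivially searchable. A minimal sketch (plain Python, written just to check the claim, with f/g/h as declared in the prompt) finds the shortest operation sequence from 1 to 17:

    # f: *3, g: *2, h: +5, per the CoC prompt above.
    from itertools import product

    ops = {"f": lambda n: n * 3, "g": lambda n: n * 2, "h": lambda n: n + 5}

    def solve(start=1, goal=17, max_depth=6):
        for depth in range(1, max_depth + 1):
            for seq in product(ops, repeat=depth):  # applied innermost-first
                n = start
                for name in seq:
                    n = ops[name](n)
                if n == goal:
                    # render as a CoC-style term, last-applied outermost
                    return "(" + " (".join(reversed(seq)) + " z" + ")" * depth

    print(solve())  # prints (h (g (h z)))

which agrees with ((1 + 5) * 2) + 5 = 17.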
Is it not ironic that the supposed test of AGI is flawed, and its human designer failed to see it and denied it when presented with the facts? Maybe the test designer is hallucinating just as much as those LLMs? :)
It's a little scary that it can be so hard to evaluate the correctness of these LLMs even when we are paying close attention and looking for mistakes. Or maybe the scary part is that we can become biased when we want to believe.
No, but we at least acknowledge you as a general intelligence, like all humans. I'm not sure when AGI - artificial general intelligence - began to mean anything other than artificial and generally intelligent.
AGI may as well be God, given the bars some people set.
Oh, lol, you're right. Seems like I'm dumber than both AIs. GPT-4 mixed up `h` and `f`, so it also got it wrong, so this is a draw and both AIs (and, apparently, myself) are terrible at reasoning. Guess we're not curing cancer with computers anytime soon :')
so much for "Anyone familiar with the syntax / jargon should be able to answer this specific problem in ~5 seconds of thinking, though. And I mean it, even a 10yo kid should..."
It makes me sad that the complete and total lack of an objective way to measure these products means that the coming decades will be filled with this kind of hyper-specific gotcha test made in inappropriately confident internet posts.
Literally this could have been down to one extra book in someone's training corpus, or a tokenizer that failed to understand λ as a non-letter. But no matter, "we have a winner!". It's the computer science equivalent of declaring global warming a fraud because it snowed last night.
Disagree. People are going to rely on these things, and when they make stupid but confident mistakes (i.e. they produce bullshit), they are dangerous.
An AI system that produces right answers 90% of the time but 10% of the time drives your car into a lane divider, or says "there are 4 US states that start with 'K'" or "Napoleon was defeated at the Battle of Gettysburg" is worse than useless: It's dangerous.
As long as we call it a bullshit parlor trick, no problem. But unfortunately people are making important decisions based on these things.
You're completely wrong. Gemini can perfectly understand what is being asked, so this isn't a syntax issue. Notice that in its answer it even states the problem: "starting from 1, and combining `* 2`, `* 3` and `+ 5`, we must reach 17". So it does fine with reading the formal syntax, yet it fails to combine these operations to get from 1 to 17, which is something most 10 yo kids would have no trouble doing. And that's after millions spent on training. Now tell me again this is the architecture that will figure out the cure for cancer?
That's a bummer. I just made this one up, which GPT-4 failed and Gemini got right, but I definitely need to do more testing to see what's being taken from online and what is actual reasoning.
Great example! Thanks for sharing. I tested the same prompt using a variety of GPT-4 model versions and it consistently got the answer wrong. When I continued with "use algebra/formulas to confirm your answer" it corrected the initial mistake, but I'm surprised that it's not doing that out of the box.
I tried through API with latest GPT-4-Turbo-Preview and this is what I got:
------
Let's break down the information given:
The school is two blocks north of the pool. This means if you are at the pool and move two blocks north, you'll arrive at the school.
The convenience store is one block south of the school. Therefore, if you start at the school and move one block south, you'll reach the convenience store.
Based on the first two statements, the relative positions from north to south are: School, Convenience Store, Pool.
Thus, the third statement saying "The pool is south of the convenience store" is indeed true, based on the positions provided by the first two statements.
-----
But it starts weirdly messing up when I ask it to explain using coordinates.
---
Let's use a simple coordinate system to represent the positions of the school, the pool, and the convenience store. Let's assume each block corresponds to one unit in this system. We'll start with the pool as our reference point at coordinates (0,0). Remember, moving north increases the y-coordinate, while moving south decreases it.
The school is two blocks north of the pool.
If the pool is at (0,0), moving two blocks north puts the school at (0, +2).
The convenience store is one block south of the school.
If the school is at (0, +2), moving one block south puts the convenience store at (0, +1).
Given these positions:
The pool is at (0, 0).
The school is at (0, +2).
The convenience store is at (0, +1).
Therefore, instead of the pool being south of the convenience store, the pool is actually located to the north of the convenience store if you consider "north" to mean increasing y-coordinates. The convenience store, being at (0, +1), is southward relative to the school but not to the pool. The pool, being at the origin point (0,0), is south of both the school and the convenience store in terms of their y-coordinates.
Thus, the third statement, "The pool is south of the convenience store," is false according to the given coordinates.
---
It gets coordinates and everything right, but still somehow messes up North and South.
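The model's own coordinates already settle the question. A two-line check (plain Python, using exactly the coordinates it chose) confirms its first answer was the right one:

    pool, school, store = (0, 0), (0, 2), (0, 1)  # north = larger y
    print(pool[1] < store[1])  # True: the pool IS south of the convenience store

So the contradiction is entirely in the prose, not in the arithmetic.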
It's answering with "cherries", though "cherries" was never mentioned anywhere in the question, since the task this time was to choose between "apples" and "pears", not "cherries" and "pears" like the example found on the internet.
I agree with who you're responding to. Cheating, to me, would imply that there's some sort of hard coded guiding to the LLM. This just seems like typical LLM hallucinations?
I don't understand the leap to "cheating" either. LLMs aren't abstract logic models; they don't promise to reason from first principles at all. They give you an answer based on training data. That's what you want them to do. That they have some reasoning features bolted around the inference engine is a feature companies are rushing to provide (with... somewhat mixed success).
This is not hard to understand. LLMs can supposedly solve never-before-seen logic puzzles. This specific one shows that the model HAD encountered the puzzle before: it was not doing anything emergent, just basic remembering. Worse, it's not even reading the prompt correctly.
There are no logical rules built in at all. But the Transformer architecture is specifically trained to learn combinatoric play and rules of engagement from the data, so it can extrapolate and do cool new things that are not in the training data. In a way, you give it a chess board and the rules of the game, and then it can play; you don't teach it every possible board state. What's interesting is that with a significant number of parameters it seems to encode a more and more abstract, human-like understanding of the 'elements' at play and the 'rules of engagement' on top of them.
Edit: Not a native speaker. I'm not sure 'rules of engagement' is the correct English term here.
> What makes them different to a very advanced Markov chain?
Really nothing. There's some feedback structure in the layers of the model; it's not just one big probability table. But the technique is fundamentally the same: it's Markov, just with the whole conversation as input and with billions of parameters.
With some shallow understanding of how those models work, this looks much more like the usual hallucination, likely due to sparse data around "Tabitha" and "cherry", which makes a single training example much more representative. If you try some common names like "Emily" or "Sarah", it will just do the job.
If you're trying to make the case that it's cheating because it is already in the training set, then you probably should come up with different questions. This is machine learning 101.
I would never have guessed the answer. With so little data available, one can invent arbitrary rules to fit their favorite answer.
It would be more impressive for practical use cases if an LLM simply said that it's impossible to guess, without inventing its own reasoning or looking up the answer online.
Same, I had to look to see what the intended answer was.
In fairness though, GPT-4 was objectively incorrect; it's not even internally consistent or coherent - it either thinks b & h are vowels, or that lamb and squash don't end in those letters, or it changed its mind about the rule mid-sentence, or something.
Please, anyone correct me if I'm wrong: LLMs cannot solve this kind of riddle. This has nothing to do with their capabilities for logical reasoning, but with the way words are represented as tokens. While they might know that "apples" has two syllables because that is mentioned somewhere in their training data, if you make up a fruit "bratush", a human will see it as two syllables, but it might be 1 to 7 tokens to an LLM, without any information about the word itself.
Tabitha likes bratush but not zot. She likes protel but not kig, and she likes motsic but not pez. Following the same rule, will she like tridos or kip
Given the examples, one speculative pattern could be that Tabitha likes words with at least two syllables or a certain complexity in structure. Therefore, following this speculative rule, Tabitha might like “tridos” more than “kip.”
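To make the token point concrete, here's a small sketch using OpenAI's tiktoken library (assuming the cl100k_base encoding; the exact splits vary by tokenizer, so treat the output as illustrative):

    # Made-up fruit names usually fragment into sub-word tokens that
    # carry no syllable information the model could lean on.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for word in ["bratush", "protel", "tridos", "apples"]:
        ids = enc.encode(word)
        print(word, "->", len(ids), "tokens:", [enc.decode([i]) for i in ids])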
An LLM can solve this for all tokens where it got to learn how many syllables are in that token or combination. If you trained it to work on single letters only, it would do better at this task than with word chunks (same for math and single digits). It will generalize to new words if the token-level knowledge is there.
Whether this means it can or cannot solve that kind of riddle is up to your interpretation. I understand square roots and can calculate the square root of 16, but not of 738284.7280594873 (in a reasonable, bounded time). Can I solve square roots?
The amazing thing about emergent behavior in LLMs is that they are able to answer questions like these. I don't think it is completely understood how exactly they do this, but there's little doubt that they do.
With some additional guidance and prompting, I've gotten GPT-4 on ChatGPT to at least output the correct solution sometimes (7 correct answers out of 10 tries; a rough way to automate that tally is sketched after the prompt):
Find the correct answer to this riddle:
> Tabitha likes cookies but not cake. She likes mutton but not lamb, and she likes okra but not squash. Following the same rule, will she like cherries or pears?
Employ the following strategy:
- Suggest a list of 5 unique and novel patterns that potentially can find the answer
- Check if the patterns applies without exceptions
- Slowly double-check if the patterns was correctly applied, that you correctly assessed if it's accurate or not
- Explain your reasoning for each step to ensure nothing vital was missed
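A rough sketch of automating that tally with the OpenAI Python SDK (assuming OPENAI_API_KEY is set in the environment; grading only the final line is a crude stand-in for checking by hand, since the riddle itself contains the word "cherries"):

    from openai import OpenAI

    client = OpenAI()
    PROMPT = "Find the correct answer to this riddle: ..."  # full prompt as above

    hits = 0
    for _ in range(10):
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": PROMPT}],
        )
        # Grade only the last line, so restating the riddle doesn't count.
        final_line = reply.choices[0].message.content.strip().splitlines()[-1]
        hits += "cherries" in final_line.lower()
    print(f"{hits}/10 runs ended on the intended answer")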
Great prompt. GPT-4 was able to answer this, but Gemini Ultra was not: Jake likes coke but not pepsi. He likes corn but not popcorn, and he likes pens but not pencils. Will Jake like salmon or cheese?
Totally tangential, but I absolutely despise logic teasers of this kind.
First of all, they are so completely divorced from patterns of culturally conditioned human reasoning as to make them come off completely absurd (most people reason about their food preferences using a logic of tastes, not syllables in a word).
The game is less about logic and more about ignoring message contents, moving up a level, and treating the text as data without any legitimate evidence that you are justified in doing so. This is not a logic problem, it's a "guess the register shift/meta language" problem. The problem is about noticing that the question is not about the message content but about the structure of the message itself, and requires a bold leap. In real life justifying the conclusion would actually require a very sophisticated inference that allowed you to rule out the much more common application of a logic of tastes or cultural codes completely.
You say Gemini “nailed it”, but that’s just because it guessed what you were thinking, not because it knew the right answer.
For example, it’s equally valid to say that Tabitha likes small foods since cookies are small and cakes are large, and lamb is the smaller younger version of sheep — also known as mutton. Hence she likes cherries because they’re smaller… or taste better… or her uncle abused her with a pear… or whatever.
You haven’t actually asked a logic question where there is a clear and unambiguous answer that can be derived using formal methods starting from clearly stated axioms.
If you gave this question to a bunch of humans, they would give you inconsistent guesses as well — not because they’re wrong but because the question has no single right answer.
Note that Gemini pulled the answer off the Internet, while GPT-4 didn't. The answer can easily be found via Google search. Changing up the question a little, I reversed it and asked Ultra and it was unable to answer:
Jake likes coke but not pepsi. He likes corn but not popcorn, and he likes pens but not pencils. Will Jake like salmon or cheese?
I don't think your reversed question makes sense. In the OP example, one item was always smaller/younger than the other item. In your example I cannot, even as a human, identify the differences.
For me, GPT nailed it with this prompt: Tabitha likes cookies but not cake. She likes mutton but not lamb, and she likes okra but not squash. Following this rule, will she like cherries or pears? Take a deep breath in order to solve the problem, and do not rush to naïve solutions.
Model quality or whatever aside, I think the subscription plan tiers are structured in quite a weird way, especially for those who already use Google One. Previously, the tiers were reasonably structured:
1. $9.99/month for 2TB + other benefits. Offered in both monthly and annual plans.
2. $24.99/month for 5TB. Includes all benefits above. Offered in both monthly and annual plans.
3. Higher tiers for 10~30TB. Includes all benefits above. Offered only in monthly plans.
The 3rd option doesn't have an annual plan but other than that it's consistent and easy to understand. Now we have one more plan for "AI".
4. $19.99/month for 2TB + other benefits + Gemini access. Offered only in monthly plans.
Existing Google One subscribers are now put in a weird situation. 2TB annual plan users need to move to a monthly plan to use Gemini. It's worse for higher tiers, since they don't have an upgrade option at all without decreasing their storage size. And Google Fi users have it worst, as they don't even have an upgrade option, even if they're willing to pay.
I guess they know this, since they specified that high-tier subscribers can use AI features at no extra charge until July 31 and are probably preparing a new plan for them then, but this still creates lots of user confusion. Having YT Premium as a separate subscription is already a pain, but Google, you don't have to bring this trouble into the product supposed to be the "One".
Good catch! I upgraded to Gemini Advanced to try it out for 2 months and figured I’d cancel. There does not seem to be a way to downgrade the plan without canceling and I was on the 100GB/yr plan through August. No discount on new monthly subscription. Bummer.
Yeah, Gemini Advanced should just be an add-on for any Google One plan. The current structure makes it look like an intern with no experience came up with it and everyone just rolled with it and pushed it to production.
They did the same pricing scheme for YouTube premium family plans. My guess is that they took out the annual plans so they can extract more money from customers.
Looks like they got some complaints from Fi users and now they've updated their wording to a more reasonable one; previously it was something like "no, you can't, you gotta remove your Google Fi Unlimited benefits via customer service" or similar.
Workspace users always get features after free consumer accounts so that organization admins have time to evaluate them, update training materials, etc.
This is a feature, not a bug.
And of course there are lots of features that Workspace accounts get, that free accounts don't get at all. Like the timeline view in Sheets.
I get making new features an opt-in thing for Workspace, but from what I can tell I can't even enable it for my Workspace domain. I'm not able to enable it for myself to evaluate it and update training materials.
You evaluate features and update training materials from what is documented here and in the online help, together with any testing you want to do using free consumer accounts, which you can obviously create at any time.
I've been a Workspace/Apps admin for over a decade; I'm well aware of how this works.
What I'm saying is, from what I can see in the admin portal, there's no place for me to go today to enable Gemini for my users. Things are routinely weeks or months delayed before they even become available to enable for Workspace tenants, and oftentimes features just never get offered.
But then you know that's normal? Things aren't being "delayed" weeks, they're following the rollout calendar designed to give admins time to prepare. Things aren't meant to be enabled in advance.
And like I said, there's plenty of stuff that's only available in paid Workspace. A lot of business features live there. (While things that are meant only for personal consumers aren't there.)
I agree it's normal, in that it's the normal process that Workspace usually gets pretty delayed. I don't agree it's good. And I don't think it's actually helping admins get prepared, as we're not actually able to turn it on for test OUs for a while.
I'd rather have it default to off with the ability to turn it on for selected OUs, so I can trial it and create my own documentation around it. But instead I often have to wait weeks or months for features to become available to even turn on for my tenant. Users are like "hey, I heard about this awesome feature, can you turn it on?" Nope!
Being stuck in a free GSuite legacy account is even worse. Migrating to a regular Google account seems impossible (moving everything, losing purchases, changing my YouTubeTV and Google Fi subscriptions) and I get every feature later, if at all (can't use YouTubeTV Family Sharing, for example.) But I'm stuck for the most part! By the time it's available for me, I'll have forgotten about Gemini altogether.
"We're using your personal conversations for training data. Thanks."
This is acceptable. Consumers click without reading, and don't have any strong organizational ability to punish Google for this.
"We're training our AI on the questions of your idiot employees who are inevitably going to submit user PII CSVs or PDFs or even just outright draft emails to suppliers with our tool. Thanks."
You don't want to pick this kind of fight with a corporation, and as a corporation, you don't want this to happen to you.
Bard? Gemini? Gemini Advanced? Gemini Ultra? Ultra 1.0? I guess they haven't figured out naming yet. This has got to be the most confusing naming since the Xbox Series X.
From one of the earlier announcements Google has made:
- Bard is that talkative text interface, a product.
- Gemini is the LLM design that currently backs Bard (but also other Google AI products).
- Gemini "Basic", Advanced and Ultra are different sizes of that design.
This is conjecture, but "Ultra 1.0" probably indicates that they intend to release more models based on the Ultra configuration. Since that's the most commercial of theirs, I wouldn't be surprised if that comes with some stability promises (e.g. Ultra 1.0 is still available when Ultra 3.0 is released, so that if you do your own validation when integrating in your own project, you can expect small-to-no shifts in the underlying model)
> This is conjecture, but "Ultra 1.0" probably indicates that they intend to release more models based on the Ultra configuration. Since that's the most commercial of theirs, I wouldn't be surprised if that comes with some stability promises (e.g. Ultra 1.0 is still available when Ultra 3.0 is released, so that if you do your own validation when integrating in your own project, you can expect small-to-no shifts in the underlying model)
I think that's a side effect of how, each time they release a version to compete with ChatGPT, it's not as good, so they have to simultaneously announce a future version that is supposed to be better than ChatGPT; and each time that one isn't better overall either, so they have to announce yet another version. I think this will continue for a while, especially since non-OpenAI companies have access to far fewer free data troves than OpenAI did, now that everyone realizes how valuable that data is. But even that aside, other companies (even Microsoft, in my opinion, with full ChatGPT access) implement it much more poorly. I imagine Apple will suffer a similar fate for a while.
Corporate naming tends to reflect the org chart and various individuals' desires for promotion... Get some other product branded with your team's name, and you have just expanded your domain and can show impact to any promotion committee...
It's not too confusing, I think it's mostly that they're in the process of changing the naming.
- Bard: Retiring this name
- Gemini: model name (honestly less confusing than just calling it "GPT")
- Gemini Advanced: More capable Gemini model
- Gemini Ultra: Most capable Gemini model
- Gemini 1.0: They version their models together; Gemini has hit 1.0 and is (supposedly) ready for prime time
It's not included in a Google One subscription. I just tried it out and got: "Upgrade your Google One plan to get Gemini Advanced. €21.99 €0 for 2 months, €21.99/month thereafter."
Pretty hilarious thinking they can rival ChatGPT pricing with a product that doesn't approach its capabilities.
To be fair, the competition is ChatGPT, which is an impressively bad product name, among the worst for a consumer product ever. And it still hasn’t been renamed (perhaps a testament to the fact that names aren’t that important after all)
Bard was infuriatingly bad too, but more on a subjective level. And they correctly changed it, thank god. At least it’s easy to pronounce.
Software engineers have a weird obsession with Latin, Greek gods etc. Sounds smart and epic I guess. Personally I would have preferred “Steve French”.
Doesn't it seem familiar, like something Google would do? They should have someone like Larry Page, similar to how Mark Zuckerberg or Elon Musk handle things. A decision is made and you go forward. Google seems incapable of taking action without the approval of a committee and middle managers...reminds me of IBM back in the '90s.
This makes sense. It’s clearly a binary naming scheme. So we go Pro, Ultra, Ultra Pro, Ultra Ultra, Ultra Pro Pro, Ultra Pro Ultra, Ultra Ultra Ultra, and so on.
I don’t understand why people find this so confusing. Are we not computer people?
I just signed up for the $21/month Gemini Advanced, and this is just one data point: I asked it a fairly hard programming task in an obscure language, Common Lisp (which has been my favorite programming language since 1982 - there must be dozens of us CL enthusiasts in the world :-) ), and it did a very good job. The generated code and supporting explanations were very good.
Off topic, but I find this amusing: I have stopped using “free” Google services like search and gmail (mostly), but I am now up to about $135/month on paid Google services (not counting GCP bills). Paid services like YouTube/Music Premium, YouTube TV, Play books, etc. are really good.
Just played with Gemini Ultra for like 10-15 mins, and right off the bat, it made mistakes I've never seen GPT-4 do.
To give you an example, I asked Gemini Ultra how to set up a real-time system for a TikTok-like feed that matches card difficulty with user ability. It correctly mentioned "Item Response Theory (IRT)", which was a good start. But when I followed up asking how to implement a real-time IRT system, it suddenly started going off about "Interactive Voice Response (IVR) system" - something totally unrelated and never mentioned before. Never had this kind of mix-up with GPT-4.
It doesn't seem like it's using Gemini Ultra yet. For me it seems like only the interface has been updated since the image generation capabilities are not working.
Not for me. The first time I asked it paused for about ten seconds so I thought it was making an image. Then responded with:
> I can't generate images of that. Try asking me to generate images of something else.
I thought it was saying there was an issue with my prompt. So I asked:
> Why can't you?
I got this response:
I can't generate images for several reasons:
Technical limitations: Image generation is a complex task. Even the most advanced image generation AI models require powerful computer hardware and specialized software. I'm simply not equipped to do this as a language model.
Focus on language: My primary function is to understand and generate text. I'm trained on massive amounts of text data, which helps me write poems, answer your questions thoughtfully, and have conversations. Image generation is a whole different specialization.
Potential for misuse: Image generation technology has the potential to be misused, such as creating deepfakes or images that might be harmful or offensive. It's important to use these tools responsibly and ethically.
Here's what I can do:
Describe the scene in detail: I can enhance the poem with vivid descriptions of the setting, the baker's emotions and appearance, and the fairy's ethereal form. This can help you imagine the image more clearly.
Find inspiration: I can search for illustrations or art in a similar style to guide a human artist, should you want to commission an illustration for your poem.
Recommend image generation tools: I can point you towards simple online image generation tools if you'd like to experiment with creating visuals yourself. Just be aware of their limitations.
Let me know if you'd like me to further develop any of these options!
Yeah, I noticed with 'Gemini Pro' , it didn't seem to be able to remember much about earlier outputs in the conversation (apparently little to no context window), which obviously drastically dumbs it down.
I was starting to get OK results with 'Pro', but I had to use special prompting tricks.
Tried 'Advanced' (Ultra), seems only marginally better so far.
"Why doesn’t Gemini know what I said earlier in a conversation?
Gemini’s ability to hold context is purposefully limited for now. As Gemini continues to learn, its ability to hold context during longer conversations will improve."
Yeah, I saw that in the FAQ, but this was literally my second question in the convo, so not exactly a "long" conversation. Seems like it should be able to handle context for at least a couple of exchanges, right?
> Gemini’s ability to hold context is purposefully limited for now. As Gemini continues to learn, its ability to hold context during longer conversations will improve."
This is ridiculous. Context is everything with LLMs. gpt-4-32k performs better than gpt-4 exactly because of this.
FYI for any Googlers - On the "Sorry, Gemini advanced isn't available for you" page, clicking "Learn More" gives you a (presumably internal) SSO sign-on (links to https://support.corp.google.com/googleone?ai_premium)
The wording around the Gemini Ultra enable scares me: "Upgrade your Google One plan". I have a One family plan, does this upgrade remove the family part? What happens if I don't decide to keep Gemini and want to go back to my current plan, can I even do that? Google has kind of botched these sorts of upgrades in the past so I'm pretty reluctant to give it a try here.
Finally some actual relevant criticism in this thread. You’re spot on. Google is deep into “shipping the org chart”. As such, I would be worried too that different products conflict with each other.
It’s funny that Google can design and operationalize the most incredible engineering marvels, but can’t explain their products (and in particular how they interact with each other).
How is this shipping an org chart? It's combining products built by different parts of the company into a single subscription which seems like the opposite.
Yes, that’s what they’re telling you. However, in reality there is a lot of friction and/or confusion about how different products interact. I’m not saying this particular combination works one way or another, only that it’s extremely hard to understand the consumer product offerings compared to Amazon, Apple and arguably even Microsoft. It’s well known that Google has a marketing problem – people don’t even know what they offer.
The interface that sits on top and is supposed to give you some overview and coherence is bolted on with duct tape after-the-fact. Now, it’s possible this has changed since I worked there but I highly doubt that it’s all ironed out.
I would disagree with you; my personal anecdotal data shows Google to be more reliable at this than others. In the absence of hard, properly collected data, that gives all of them the same probability. I'm sure someone out there thought "let's use Microsoft SkyDrive because Google is unreliable".
That might be selection bias: the stories posted here are only newsworthy because it's Google. Anyhow, I trust smaller shops a bit more than big corps (no matter the product or service being purchased), but that's subjective. In regards to tech, I trust myself first, in the same way that a chef might not want to eat at just any restaurant.
You pay for cloud storage precisely so you don't have to think about these things. If you're saying that one should pay for cloud storage and still worry about them, then you're paying for a service that provides no real utility. You're trading money for nothing. If anyone thinks that's a normal transaction, I don't know how to respond to that.
What I'm saying is that what you think you're paying for differs from what the contract actually promises.
Check for yourself: whoever your provider is, can you say, without looking, what guarantee they give that your files won't be randomly deleted? Do you think it's stronger than "best effort, no percentage written down"? Then go read what the contract actually says.
I'm not saying it's a good state of affairs. I'm European and part of the crowd that cheered when ISPs got smacked for advertising "unlimited" while the contract said otherwise.
But the parent comment I'm replying to can at best be seen as a warning that Google storage specifically can make your files disappear in some rare cases. If that matters enough to you, then you should know the same is true for all the other big providers.
Also, family members need to have accounts associated with the same country as yours. Discovered this the hard way when my parents could no longer renew their Google Photos storage because of the sanctions.
One anecdote: I live in one of the European countries that put sanctions on yours, and I cannot use many Google services either.
This European country simply doesn't exist in many of Google's forms, despite being on the "good side" and being no different from Finland or Germany.
This is an impressive product, well done Google. There is a PM in there somewhere who knows what they are doing, kudos to you.
Prediction: they get to 6-7 digit number of paying customers, decide it is peanuts for them (~$20M/mo) and instead decide to push the free version with ads with full force as the future of search.
Google announced they have already surpassed 100 million Google One subscriptions and $15B in subscription revenue (across YouTube Premium, TV, and Google One). I'm not sure your estimate is realistic.
I recently learned that my mother is subscribed to Google One. When I asked her why, she didn't even know what it was. IIRC she has 1-2 TB of cloud storage but is only using about 10 GB of it.
I wonder how many of those 100 million subscribers are non-techy people who accidentally signed up?
Or just people like me, who have the $1.99/mo plan because I needed a bit of extra storage for Gmail. I don't use the storage for anything else, I use Dropbox for my "normal" cloud storage needs.
I can understand this as a person who once recommended Android to his parents when it gained traction (Nexus 7 days: a great concept ruined by terrible eMMC storage and other hardware flaws made to compete on price).
On the other hand, I am a "loyal" Google customer and I never felt pushed into this. I pay for YouTube Premium and iCloud+ (the equivalent of Google One, albeit with much less storage).
To clarify, the prediction was for the number of people paying for AI + search through Gemini Advanced, which will likely be valued independently regardless of the total number of One subscribers, comparable to someone paying for a ChatGPT subscription, for example.
We got like a year of notice on the shutdown of Stadia, with a full refund for all purchases (but not the subscription for Pro). It was exceptionally well done, and if they had announced that to be their plans the service might have even worked out...
I take it for granted that all these services are going to be free. They are a goldmine for behavioral and persuasion engineers. I just hope we end up with at least a duopoly this time instead of a monopoly.
> I take it for granted that all these services are going to be free. They are a goldmine for behavioral and persuasion engineers.
They are also a goldmine for LLMs. Training on human text is necessary for AIs, but it has one major flaw: it is so-called "off-policy" data, meaning it portrays human behavior and human errors. Human-AI chat logs, by contrast, portray AI errors, so they are better material for generating training data. Those LLM errors are usually corrected by the human, so there is an implicit signal in there for improving the model.
ChatGPT is reportedly serving 10M paying customers; let's assume 100K tokens/month/user. Then they collect ~1T tokens/month, or 12T tokens in a year, while the original training set for GPT-4 was rumored to be 13T tokens. It's about the same size! I expect to see more discussion of LLM chat-log datasets in the near future. What have they learned in one year from our interactions and explorations?
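Back-of-the-envelope check of those figures (every number here is an assumption carried over from the paragraph above, not real data):

    # All inputs are the rough assumptions quoted above.
    users = 10_000_000                             # reported ChatGPT customers
    tokens_per_user_per_month = 100_000            # assumed usage
    monthly = users * tokens_per_user_per_month    # 1e12 ~= 1T tokens/month
    yearly = 12 * monthly                          # ~12T tokens/year
    gpt4_training_tokens = 13e12                   # rumored GPT-4 training set
    print(f"{monthly:.1e}/month, {yearly:.1e}/year vs {gpt4_training_tokens:.1e}")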
You might be right; an LLM alone doesn't improve by itself. But when it is part of a system like GPTs, it can use web search, local RAG, code execution, and also get human guidance and corrections. That's a clearly superior setup that improves over the LLM alone. I believe that is why OpenAI created GPTs: to lift a model at level N to level N+1.
There could be a paid tier, maybe running 24/7 "thinking" on topics you ask for rather than just answering spot questions. Or more resources committed to a "mixture of experts" model, etc.
Generally with these things, FAANG companies do everything all at once. The "free" version is Google Search + GenAI results + ads, which is live right now and getting better every day.
The real product isn't this particular interface; the real product is the Gemini infrastructure that is being integrated into every Google product.
It's mostly still bad, but I made a GPT called 'covert advertiser' that lets you tinker with embedding covert advertisements into GPT responses. The results are usually either undetectable (no advertising) or way too on the nose, but every now and then it manages to sneak something in that's interesting.
When I asked it to explain how it promoted the product, it didn't even mention steering my questions in the conversation.
Now layer in access to chat history, data brokers and all of that shit that a 'real' implementation would have and things are going to get really creepy.
I have no doubt that this sort of thing will happen for real within a year or two. It's the ultimate form of product placement and I hope it gets regulated out of existence before it takes root. At a minimum any such advertisement should be clearly marked as such.
At least for now, my understanding is the cost of inference is an order (orders?) of magnitude higher than for normal Google search. That is, a paywall is almost a necessity at present because tons of low-value users make the search uneconomic.
Are they in an innovator's dilemma now? If Gemini is as great as it seems (and will be), it will destroy the search engine and the SEO/SEM world. They can show ads in Gemini, but we no longer get a list of results for a query; we get an answer to a question. I think this changes the general idea of online ads.
This is massively overblown. There is Search the product, and there is the search engine. How would an LLM get access to the latest indexed data, look it up by keywords from a prompt, and rank the results? Through a search engine.
LLMs are only changing the Search experience, not making Search obsolete.
I haven't said that it makes search obsolete, but the whole machinery of SEO/SEM around search engines could be significantly reduced by chat prompts.
Plenty of ways to monetize that. They could use inline ads. They could insert a relevant ad after a response. If they go evil, they could predispose the LLM to make product suggestions for paying clients.
Most people have also never used GPT-4, as it's paywalled. Now the free and premium offerings are roughly in sync between Google and OpenAI. I assume the rebranding is trying to wash away the initial sentiment.
I heavily use the OpenAI API in my day job, as it is the core of our business (an AI-powered startup). When Gemini Pro launched, my team and I tested it the same day for our product, but we were disappointed, as it was a bit worse than GPT-3.5 (at least on the same prompts we already had). I really hope Gemini Ultra surpasses GPT-4; it is always exciting to see and use new advanced tech, but I'm still a little skeptical, since Pro wasn't that great...
> it was a bit worse than GPT-3.5 (at least on the same prompts we already had)
I'm willing to believe that Gemini isn't as good, but my impression was that you should expect a new model not to perform as well on your existing prompts, because the training set and training methodology are different. That's why one of the major risks of an AI business is vendor lock-in: you spend so much time optimizing a prompt for a specific model, and you'll have to redo much of that work to switch vendors.
That you gave up so quickly when trialing a new model suggests the problem is even worse than I thought—you're locked in to OpenAI because every other model will always look worse to you, even if it would be better if you took the time to tune a new prompt.
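One common way to soften that lock-in (a sketch of the general pattern, not anything the posters above described) is to keep prompts in per-model templates, so switching vendors means retuning a template rather than rewriting the application:

    # Hypothetical per-model prompt registry; model names and template
    # wording are illustrative assumptions, not tuned production prompts.
    PROMPTS = {
        "gpt-3.5-turbo": "You are an expert assistant. Task: {task}\nBe concise.",
        "gemini-pro":    "Task: {task}\nGive a short, direct answer.",
    }

    def build_prompt(model: str, task: str) -> str:
        """Render the prompt variant tuned for the given model."""
        return PROMPTS[model].format(task=task)

    print(build_prompt("gemini-pro", "Summarize this contract clause."))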
Indeed, I believe Gemini is equally capable, but its response style significantly differs from ChatGPT 4's. My expectations for how chatbots should communicate have been largely shaped by ChatGPT's approach.
Specifically, Gemini's tendency to structure responses with bullet points and headings from the start doesn't appeal to me. It feels more like receiving an essay in response to a simple question, whereas ChatGPT's responses facilitate a flowing conversation. Gemini aims to provide exhaustive answers, which can be overwhelming when I'm interested in a more open-ended dialogue that smoothly transitions between topics. In essence, Gemini Advanced focuses on delivering detailed answers without allowing much time for reflection.
Therefore, when it comes to preference in RLHF (Reinforcement Learning from Human Feedback) chatbots, ChatGPT-4 is my favorite for its ability to better engage in the type of conversational dynamics I prefer.
It turns out iterating on and incorporating a large volume of user feedback is more important than having the most and most talented AI researchers, at least in the short term.
They have the most prestigious researchers and engineers who can jump through interview hoops. That doesn't translate into world-changing products, and they aren't "the best".
I'm sure there are multiple layers of meaning behind the name, but Jeff Dean once mentioned the name had something to do with the latin translation being twins. That is, Gemini is a product of Alphabet's "twin" AI orgs, Google Brain and DeepMind, working closely together, and eventually fusing into GDM.
Anybody really surprised at this point? Google has had DeepMind in its pocket since '12-'14 and made few advancements. OpenAI changed the game in half the time.
OpenAI is built on top of Google advancements and research. It didn't change the game, more like took a shortcut and landed on a gold mine.
The fact that many products and models, including open-source ones, have caught up on such short notice and now compete with OpenAI in what should be its self-proclaimed backyard suggests it's just a one-trick pony.
It's doubly strange because Google had an implicit reputation as the unbeatable giant in computing research and resources; many expected them to compete and smoke ChatGPT in a few weeks. It's been months, and nothing has come out except fumbling and confusion.
Sundar has zero vision and has created a culture that stifles new developments in bureaucratic morass while threatening to kill them shortly after birth.
Google may have more scientists and some of the best minds in the business, but ChatGPT has nearly 200 million users that are feeding it back data for RLHF, and data is a much more important moat than better tech (which mostly ends up being published and disseminated anyhow).
AI is a game between OpenAI and Meta. ChatGPT has a ton of users creating highly relevant data, but Meta has the incredible walled trove of Facebook/Instagram/WhatsApp data that dwarfs pretty much anyone else's on the planet, and with Mark's recent push to build up their compute, their only competitors in that space are Microsoft and Google. People discounted Meta because of that horrible metaverse move, but Mark is being pretty canny now: they're very well positioned to choke the life out of specialty chatbot products while integrating SOTA AI into all of their products to slowly crank up the time people spend on platform.
My impressions after 90 minutes of intensive testing:
Overall: on par with the original GPT-4 in most aspects, inferior to GPT-4 turbo.
Detailed aspects versus GPT-4 turbo:
1. World knowledge: slightly inferior. GPT-4 turbo was able to detail a protagonist's childhood year by year for a Japanese novel with near-100% accuracy (a chronology even a human reader would get wrong). Gemini Ultra was much more easily confused.
2. Creativity: Gemini Ultra wins. Its writing style has far more flair than GPT-4 turbo's, and it occasionally made some stunning observations that I had never thought of yet made perfect sense. GPT-4 turbo is more like a textbook repeater: it doesn't make many mistakes, but it also rarely surprises you with anything original.
3. Accuracy: GPT-4 turbo still makes fewer mistakes, including in subtle logic (like staging a hypothetical battle between two characters in the same universe, weighing the strengths and weaknesses of their powers, etc.).
So this is definitely Google's first real-deal LLM. It's not better than the current GPT-4 turbo, but it's getting there. OpenAI must be feeling the fire to release GPT-5 before the end of the year.
I tested it immediately, but it is disappointing. At least here in Switzerland, it is not able to generate images, and a simple "look up this website and summarise the content" does not work either (it can't access the website, though it's a public website without any crawling limitations). I don't understand why Google is launching a product like this.
It's confusing because the name Bard and the UI also got an upgrade today, so I thought I was using Gemini Ultra but it turns out I'm not: https://imgur.com/a/3UriYpn — showing that Gemini Advanced is not what I'm using, unless I pay and upgrade. (If you cannot generate images you're likely not using Gemini Advanced.)
I have different levels of access to Bard through different accounts, and the feature set varies wildly. Generating images and summarizing websites is enabled in _some_ configurations, but I have no idea what the rules are.
The feature set also seems to depend on other factors: The account that is images-enabled only does so if I ask in English, but not when asking in any other language I tried.
The different prompting strategies needed to improve results for different models is fascinating. I usually tell ChatGPT the role it should play to get better results e.g. "You are an expert in distributed systems". The same approach with Gemini returned "as a large language model constantly learning, I wouldn't call myself an expert."
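With OpenAI's chat API, that role is usually pinned in a system message, which tends to stick better than placing it in the user turn. A minimal sketch (requires the openai Python package v1+ and an API key; the model name is my assumption for the period, not something confirmed above):

    # Sketch of role prompting via a system message with the OpenAI
    # Python client (v1+). Set OPENAI_API_KEY in the environment first.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4-turbo-preview",  # assumed model name for early 2024
        messages=[
            {"role": "system", "content": "You are an expert in distributed systems."},
            {"role": "user", "content": "Explain leader election in Raft."},
        ],
    )
    print(resp.choices[0].message.content)

Bard/Gemini's consumer UI exposed no equivalent system-message slot at the time, so the persona lands in the user turn, which may be why it deflects with a disclaimer instead.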
Trillion dollar companies tend to carry enough weight to make product name collisions everyone else's problem instead of theirs. Really unfortunate for the Gemini crypto exchange folks.
Bard – old name of their generative AI service, to be called Gemini
Duet AI – old name for their generative AI in Google Workspace, to be called Gemini
Gemini – three things:
1. the name of their models (like GPT).
2. the new name of their free service (like ChatGPT), gives access to Pro 1.0 but not Ultra 1.0.
3. the new name of the Generative AI tools in Google Workspace.
Gemini Advanced – the name of their paid service (like ChatGPT premium), gives access to both Pro 1.0 and Ultra 1.0
Ultra 1.0 – the first version of their big model (like GPT-4)
Pro 1.0 – the first version of their smaller model (like GPT-3.5)
Google One AI Premium – the subscription that you need to buy to have access to Gemini Advanced
Google One Premium – the old version of the subscription, does not include access to Gemini Advanced
Google app – the mobile phone app, which includes either Gemini or Gemini Advanced
Google Assistant – like Siri but hard to define what it is
Google AI – a generic name for all their AI products
It’s sad that a company of very smart people can’t figure out coherent naming.
Can you imagine Apple causing confusion like this? I know it’s not a like-for-like comparison, but everything Apple does it seems like they have a grand strategy that’s clear for everyone to see. Things build up in a modular way to fit a big puzzle.
Google, on the other hand, constantly makes up things on an ad hoc basis.
Maybe not quite the same, but I will point out that “Apple TV” and “Apple TV+” are not just two distinct products, but are in fact entirely different categories of product.
One is a piece of hardware akin to a Roku. The other is a streaming service akin to Netflix.
If you ever take a customer survey for Apple, for the "which Apple products do you use?" question they always have to write something like "Apple TV (a streaming box that plugs into your TV)" and "Apple TV+ (an online streaming service)" because they know the names are so confusing.
Apple TV is also an iOS app, macOS app, tvOS app, and [other generic TV OS] app which allows you to access Apple TV+ content if you have a subscription, but otherwise lets you access services connected to your Apple TV [hardware].
Actually it's more that Apple TV is both a piece of hardware and an iTunes-like service, while Apple TV+ is a subscription service akin to Netflix.
The Apple TV hardware and the Apple TV app on your iDevice can both be used without paying a subscription. The hardware has all other streaming apps a la Roku, and both it and the app on your iPhone can be used to purchase and watch TV shows and movies.
I use both and I haven’t found it too confusing, to be honest. I just think of it as Apple TV (streaming device) gives access to Apple TV+ (streaming service).