> I've tested bard/gemini extensively on tasks that I routinely get very helpful results from GPT-4 with, and bard consistently, even dramatically underperforms.
Yes. And I don't buy the lmsys leaderboard results where Google somehow shoved a mysterious gemini-pro model to be better than GPT-4. In my experience, its answers looked very much like GPT-4 (even the choice of words) so it could be that Bard was finetuned on GPT-4 data.
Shady business when Google's Bard service is miles behind GPT-4.
True, what is most puzzling about it is the effort Google is putting into generating hype for something that is at best months away (by which time OpenAI will likely have released a better model)...
My best guess is that Google realizes that something like GPT-4 is a far superior interface to interact with the world's information than search, and since most of Google's revenue comes from search, the handwriting is on the wall that Google's profitability will be completely destroyed in a few years once the world catches on.
MS seems to have had that same paranoia with the bingified GPT-4. What I found most remarkable about it was how much worse it performed, seemingly because it was incorporating the top n Bing results into the interaction.
Obviously there are a lot of refinements to how a RAG or similar workflow might actually generate helpful queries and inform the AI behind the scenes with relevant, high-quality context.
I think GPT-4 probably does this to some extent today. So what is remarkable is how far behind Google (and even MS via its bingified version) are from what OpenAI has already made available for $20 per month.
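For concreteness, here's a minimal sketch of what that behind-the-scenes flow might look like, assuming the OpenAI Python SDK and a hypothetical search_web() helper standing in for the search backend (everything here is illustrative, not how Bing Chat actually works):

```python
# Minimal sketch of a RAG-style flow: the model proposes search queries,
# then answers using retrieved snippets as context. search_web() is a
# hypothetical stand-in for whatever search backend actually sits behind
# the assistant (Bing, Google, a vector store, ...).
from openai import OpenAI

client = OpenAI()

def search_web(query: str, n: int = 3) -> list[str]:
    # Hypothetical helper: return the top-n result snippets for a query.
    # Replace with a real search API; placeholder output keeps this runnable.
    return [f"(placeholder snippet {i + 1} for '{query}')" for i in range(n)]

def answer_with_rag(question: str) -> str:
    # Step 1: have the model generate focused queries instead of
    # pasting the raw question into a search box.
    q = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Write 3 short web search queries, one per line, "
                       f"that would help answer: {question}",
        }],
    )
    queries = q.choices[0].message.content.strip().splitlines()

    # Step 2: retrieve snippets and trim them into a compact context block;
    # this filtering step is where quality is won or lost.
    snippets = [s for query in queries for s in search_web(query)]
    context = "\n\n".join(snippets[:10])

    # Step 3: answer grounded in the retrieved context.
    a = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer from the provided context; say so if it is insufficient."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return a.choices[0].message.content
```

The interesting refinements all live in steps 1 and 2: query generation and ruthless filtering of what actually reaches the model's context.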
Google started out free of spammy ads and has increasingly become the kind of in-your-face, ads-everywhere, spammy experience it replaced.
GPT-4 is such a refreshingly simple, to-the-point way to interact with information. This is antithetical to what funds Google's current massive business... namely ads that distract from what the user wanted in hopes of inspiring a transaction that can be linked to the ad via a massive surveillance network and behavioral profiling model.
I would not be surprised if within Google the product vision for the ultimate AI assistant is one that gently mentions various products and services as part of every interaction.
The search business has always been caught between delivering simple, to-the-point results to users and skewing results to generate return on investment for advertisers.
In its early years Google was also refreshingly simple and to the point. The billion, then trillion, dollar market capitalization put pressure on them to deliver financial results, and the ad spam grew like a cancer. OpenAI is destined for the same trajectory, if only faster. It will be poetic to watch all the 'ethical' censorship machinery repurposed to subtly weight conversations in favor of this or that brand. Pragmatically, the trillion dollar question is what OpenAI's take on AdWords will be.
Ads are supposed to reduce transaction cost by spreading information to allow consumers to efficiently make decisions about purchases, many of which entail complex trade-offs.
In other words, people already want to buy things.
I would love to be able to ask an intelligence with access to the world's information questions to help me efficiently make purchasing decisions. I've tried this a few times with GPT-4 and it seems to bias heavily toward whatever came up in the first few pages of web results, and rarely "knows" anything useful about the products.
A sufficiently good product or service will market itself; for those rare exceptional products and services, marketing spend or brand marketing is rarely necessary.
For the rest of the space of products and services, ad spend is a signal that the product is not good enough for the customer to have already heard about it.
With an AI assistant, getting a sense of the space of available products and services should be simple and concise, without the noise and imprecision of ads, and without "near miss" products and services (the "reach" that companies paid for) cluttering things up.
The bigger question is which AI assistant people will trust with important questions and count on for unbiased, helpful answers. "Which brand of Moka pot under $20 is the highest quality?" or "Help me decide which car to buy" are the kinds of questions that require a solid analytical framework and access to quality data to answer correctly.
AI assistants will act like the invisible hand and should not have a thumb on the scale. I would pay more than $20 per month to use such an AI. I find it hard to believe that OpenAI would have to resort to any model other than a paid subscription if the information and analysis is truly high quality (which it appears to be so far).
I did exactly that with a custom GPT and it works pretty well. I did my best to push it to answer from its training knowledge about brand reputation and avoid searches. When it does have to resort to searches, I pushed it to use trusted product information sources and avoid spammy or ad-ridden sites.
It allowed me to spot the best brands and sometimes even products in verticals I knew nothing about beforehand. It’s not perfect but already very efficient.
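Roughly, the custom instructions amount to something like this (paraphrased and illustrative, not the exact wording):

```
You are a product research assistant. Prefer your built-in knowledge of
brand reputation and avoid web searches where possible. When you must
search, use trusted product information sources and avoid spammy or
ad-ridden sites.
```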
The ad model has already evolved to take attribution / conversion from different sources into account (although there are a lot of spammy implementations), but it took Google many years to make YouTube / mobile ads profitable, and now adoption is much faster.
> And I don't buy the lmsys leaderboard results where Google somehow shoved a mysterious gemini-pro model to be better than GPT-4.
What do you mean by "don't buy"? Do you think lmsys is lying and the leaderboard does not reflect the results? Or that Google is lying to lmsys and has a better model that it serves exclusively to lmsys but not to others? Or something else?
Most likely the latter. Either Google has a better model which they disguise as Bard to make up for the bad press Bard has received, or Google doesn't really have a better model, just a Gemini Pro fine-tuned on GPT-4 data to sound like GPT-4 and rank high on the leaderboard.
> Either Google has a better model which they disguise as Bard
Why wouldn't they use this model in Bard, then?
Anyway, this is an easily verifiable claim: are there any prompts that consistently work on lmsys but not in the Bard interface?
> fine tuned on GPT-4 data to sound like GPT-4 and rank high
This I don't get. Why would many different random people rank a bad model that sounds like GPT-4 higher than a good model that doesn't? What is even the meaning of "better model" in such a setting, if not user preference?