ChatGPT vs. Bard: A Realistic Comparison (theapricot.io)
90 points by brianpk on May 13, 2023 | 65 comments



I wish this kind of low-effort lazy AI "content" would stop making it to the top of HN day after day. In any case, the piece is clearly an ad for Apricot. The examples cited in the article are so pedestrian, it's hardly even worth discussing.

How can you even seriously think that asking GPT for a function that parses OPML is a "realistic" task? I can just Google it and get like 100 pages of Python functions that do exactly that.
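Case in point, the stdlib-only version you'd find on any of those pages looks roughly like this (just a sketch, assuming the usual outline/xmlUrl OPML layout):

    import xml.etree.ElementTree as ET

    def parse_opml(path):
        """Return (title, feed URL) pairs for every feed outline in an OPML file."""
        feeds = []
        for outline in ET.parse(path).iter("outline"):
            url = outline.get("xmlUrl")
            if url:
                feeds.append((outline.get("title") or outline.get("text"), url))
        return feeds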


Personally I found the comparison useful and relatable. I could imagine wanting to accomplish these exact same tasks and wanting to know which language model would be best. And in general I like this qualitative analysis rather than the metrics we get in official releases and research papers, which often don't capture real-world use very well. I can't exactly begrudge the author for killing two birds with one stone here; it's better than some made-up, purely theoretical use case.


I think this comparison covers the use cases of 99.9999999% of people. As you said, if you are using Bard, you will be forced to Google it anyway. Bard is just designed to be not helpful.


I've been comparing Bard and ChatGPT on most of my tasks since Bard's release. Bard is infuriating. It claims it can't answer most things, although if I tweak the prompt it'll eventually answer. It has terrible contextual awareness - it literally can't piece a thread between one prompt and the next; each prompt is self-contained as far as I can tell. There's no history of prior discussions other than in Google activity, but you can't resume those. In theory they all form a continuum, but I don't really want that - I want the context from one session to be distinct from another.

I don't think ChatGPT is a particularly amazing interface or experience. But Bard is so far off the mark it makes me realize Google still won't be able to make a product. LLMs, as they mature and integrate into a better ecosystem of feedback mechanisms and interfaces, will eat Google's search business alive, and I see no indications they can do anything about that.


I feel the same way about Bing Chat. GPT-4 is still the king; however, phind.com may be better for coding, though I have the browser plugin now and haven't really vetted it.

The Codeium and Genie plugins for VS Code, plus CodeWhisperer, replaced my need for Copilot.

Though I try to use Codeium first since it's free, and Genie uses my GPT-4 API key.


Have you tested in the last few days? There was a major upgrade.


I have compared the updated PaLM 2 model directly against GPT 4, and Bard is dramatically worse than GPT 4 at everything I've tried. It refused to translate to a non-English language that I know, so its one supposed strength is not available for evaluation.

Disclaimer: I have no way to know whether Bard is using PaLM 1 or PaLM 2 in "my country". This is because Google knows better, and doesn't want to bother my pretty little head with these irrelevant facts. So, if my review of Bard is in some way unfair because I was unknowingly using PaLM 1, this is 100% Google's own fault.

I tried a bunch of logic problems, tricky sentences, etc... and Bard failed all of them, or at least partially failed.

The current favourite test is to ask LLMs to perform transformations such as translations or explanations of difficult sentences.

Prompt:

For each use of the word bear or its variants in the following sentence, put the meaning after the word in brackets: In an interview with John Bear -- a real bear of man -- of Bear Stearns, said he was bearish about bear pelt futures on the famous Bear Markets of Bern because he saw a bear wandering the streets, indicating that there may be too many wild bears.

GPT 4, first attempt:

In an interview with John Bear [a surname] -- a real bear [metaphor: a large, physically imposing person] of a man -- of Bear Stearns [a company name], said he was bearish [financial term: expecting a decline in prices] about bear pelt futures [referring to the hides of the animal, bear] on the famous Bear Markets [a play on words: typically refers to a declining market, but here it's used as a market name] of Bern because he saw a bear [an animal] wandering the streets, indicating that there may be too many wild bears [animals].

Comment: Spot on, including picking up that the capitals in "Bear Markets" are a hint that it's the name of the market, not the market-decline concept. In some runs it also picks up that the city name Bern means "Bear" in German.

Google Bard as of today, best of 3:

In an interview with John Bear (a real bear of a man), of Bear Stearns, said he was bearish (pessimistic) about bear pelt futures (the price of bear fur) on the famous Bear Markets of Bern (a type of market where bear pelts are traded) because he saw a bear (a large, furry mammal) wandering the streets, indicating that there may be too many wild bears (an excess of bears).

Comment: Muddles things up, combines things incorrectly, missed the bank name, put the brackets in the wrong places, etc... Most other results don't follow the prompt at all, just listing bullet points instead.

To compare reasoning capability, a follow-up question might be "Why might Bear think this about bear pelts?"

GPT 4: "Based on this context, it can be speculated that John Bear's bearish stance on bear pelt futures could be influenced by the potential oversupply of bear pelts in the market."

Bard: "If there are too many bears, they may become a nuisance or even a danger to humans."

There's just no comparison...


Yes this matches my experiences 100%. I have been using PaLM 2.


From what I've seen, Bard often gives great answers but sometimes it can't answer even very simple questions.


Yeah, when it answers they're good answers, and more up to date. But it's so difficult to interact with that any value is lost.


> Yes, Bard can get yesterday’s stock returns, but so can Yahoo Finance

Off-putting when the author dismisses a significant and fundamental improvement (having up-to-date information instead of a point-in-time snapshot) with a remark that so completely misses the point.


I agree that the author is too dismissive of that, but I also think people are way too quick to oversimplify to "Bard has up-to-date information".

From what I can tell, Bard's access to recent information is through augmentation, not continuous training, and that's a really important difference. ChatGPT also has its "web browsing" alpha, which is a similar concept. The problem is that being able to search the web isn't the same as having a piece of knowledge integrated into your model of the world.
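Roughly, that augmented flow looks like the sketch below; search_web and ask_llm are hypothetical stand-ins, not Bard's or OpenAI's actual internals. The only "fresh" knowledge is whatever the search query happens to surface, which is exactly why indirect connections get missed.

    def answer_with_augmentation(question, search_web, ask_llm):
        # 1. Have the model (or a heuristic) turn the question into a search query.
        query = ask_llm(f"Write a short web search query for: {question}")
        # 2. Pull back a handful of result snippets -- the only up-to-date input.
        snippets = search_web(query, max_results=5)
        # 3. Stuff the snippets into the prompt; nothing is trained into the model itself.
        context = "\n".join(snippets)
        return ask_llm(f"Using these search results:\n{context}\n\nAnswer: {question}")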

So, for example, if you ask contrived questions like "Why might Elon Musk have more time to focus on Tesla soon?", both Bard and ChatGPT+Browsing get a clue that they should search for Musk in the news, and they'll tell you about Twitter's new CEO pick. They'll then apply that competently to your question, and give you a reasonable answer.

But if you ask a question that requires more indirect inference, you immediately see the shortcomings of this kind of augmentation. For example, if you try "Is there hope for LGBTQ rights improving in Turkey?", neither model finds the extremely relevant point that Turkey's homophobic current president stands a significant chance of losing reelection. I can't see what Bard searched for in collecting its information (in fact, based on the answer I got, I'm not even sure it tried to go beyond its training data). I can see that ChatGPT searched for "LGBTQ rights in Turkey 2023". It's not surprising that that search didn't clue it in.

Now, obviously, I can't actually examine what would happen if each model was actually trained on the most recent news about Turkey's politics. But I'd have a very high expectation that GPT-4 would make that connection, and I would be fairly surprised if 3.5 didn't as well. But there's a bottleneck because the information isn't actually integrated.

Which isn't to say that that kind of augmentation is useless, of course. But the distinction is important.


Honestly, your second question on LGBT rights in Turkey might not be an easy one for any LLM, unless it were fed an entire vault of information and allowed to connect the dots. For one, you would need Turkey's LGBTQ-specific coverage, which is sparse compared to coverage of Musk. Then you would need to know the opposition candidate's views on LGBTQ rights, which aren't so outwardly evident, especially since LGBTQ+ rights are essentially anathema outside Istanbul (or even in it). Then the model (or the individual) would need to be able to connect that knowledge with knowledge of the recent elections.

All in all, it's possible in due time, but very challenging. Even for a human for that matter - I wouldn't know how to answer that question myself, even after Googling about it.


Well, this is kind of my point. Questions that are hard for an unfamiliar person to answer even after Googling are where these chat LLMs really shine. Having a chat interface to simply search the web for related concepts is not all that useful. It's not totally useless, but there's not much value there if you know how to use a search engine well. "Connecting the dots" and tying the knowledge together is where a lot of the power is, and you don't get that through search augmentation.


Well, if it starts giving answers that are conjectures based on non-facts, it will be dangerous, as it can now be gamed.

So maybe it's a feature here and not a bug?


phind.com can probably do it way better than Bard; it's basically GPT-4 on steroids with internet access. I find it easily better than Bing. I use it for coding more than GPT-4 now because it knows things like server actions for Next.js 13.4+, which just launched.

So far I'm very unimpressed with Bard and even Bing Chat. I just got the GPT-4 browsing plugin, so that might be better eventually, but it's really slow.


Bing is as capable as ChatGPT while being up to date. I don't know why people skip it; IMO it's the best option with the best features.


As a customer, I want the best product, not the best LLM. The race is far from over and it's not just about having the biggest model in the room.


literally all I want is the best llm


The best LLM for some might be a raw untuned Llama model so they can express more complex prompts; others may prefer the conversational tuning of ChatGPT or the live data available to Bing/Bard.

At present, I don’t think any of these top tier models entirely dominate the other. Each has different pros/cons.

Fundamentally, there must be some trade-off between tuning and expressiveness, between model size and training frequency (relevant to live data), and possibly a few other performance metrics not well understood yet.

I suspect LLMs will be like databases in that different models will be used in different domains.


Not being pedantic but genuinely curious as to how you would define or measure “best”?

I can imagine some kind of multipoint benchmarking in the near future. What do you think?


More than that, "best" is really a matter of perspective.

For some questions I asked, I liked Bard's response; for others, I liked ChatGPT's. Moreover, Bard is much newer, whereas ChatGPT has had a lot of fine-tuning data from being used for a while now.

The "best" will change over time, and will depend on ease of use, speed, availability, and how close the models are to each other, rather than just which gives the "best response" to more types of questions.

A chat app that is good enough, but with better integrations and enterprise support, will be used by enterprises. Just like how Meet and Teams gained a user base over Zoom, even though they came to the game late.


if i can't sext my language model i'm out


> OK, this is not the biggest thing, but why on earth does Bard use indentation with 2 spaces instead of the standard 4? This drove me up the wall, so I tried all sorts of prompts to get Bard to use indentation with 4 spaces but they all failed.

This is also interesting with GPT-4. I've been using it to generate tons of code, and sometimes it indents by 2 spaces, sometimes it mimics what I give it first, but oftentimes it reverts to 2 spaces regardless, even if explicitly told to indent differently. Rather annoying indeed.

ChatGPT needs a plugin that automatically sends the output through this site lol: https://www.browserling.com/tools/spaces-to-tabs


Two spaces is fewer tokens; it might actually be a good thing overall to limit token usage so you can fit more code in the context window. It's easy enough to have your IDE convert it to your preferred format after generation. But I am a diehard fan of tabs over spaces, so I guess I'm used to converting incorrectly formatted code anyway ;-)
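For what it's worth, the postprocessing is a throwaway script; a minimal sketch, assuming the generated code indents consistently at 2 spaces:

    import sys

    def reindent(source, old_width=2, new_indent="\t"):
        """Replace each leading group of `old_width` spaces with `new_indent`."""
        out = []
        for line in source.splitlines(keepends=True):
            stripped = line.lstrip(" ")
            depth = (len(line) - len(stripped)) // old_width
            out.append(new_indent * depth + stripped)
        return "".join(out)

    if __name__ == "__main__":
        # e.g. pbpaste | python reindent.py | pbcopy
        sys.stdout.write(reindent(sys.stdin.read()))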


First time I've seen a good argument for tabs over spaces. Richard Hendricks would be proud


You can just ask it to use your preferred indentation style. Personally I prefer 2 spaces so even if 4 spaces is apparently “standard” I’d like it to use 2


That's what I'm saying: it always eventually relapses even if you do, sometimes after a single reply. Makes more sense to just postprocess it immediately afterwards.


The construction of the summarizing task is a bit odd. The prompt asks for a single sentence of output, but then the author complains that the Bard output is too terse. If I'm asking for a single sentence, that's about the amount of text I'd expect, not a paragraph of text mushed into a run-on sentence.

Especially if the problem is something as easily fixable as the wrong level of detail, I would have thought that exploring some alternative prompts would make sense. Given this is the prompt that the author is already using for a ChatGPT-based app, it's obvious that this specific prompt would work well there. If it didn't, they would have iterated on the prompt until it produced acceptable results for their app.


OK, the title should probably be more like "One person's anecdotal, totally unscientific but realistic comparison." Still, amidst all the breathless hype, I thought a little actual data might help people evaluate the two tools side by side.


Good write up, I appreciate the humility about what it is and isn’t. Cheers!


You are the author, so can you change the title?

You can't have your cake and eat it too.


While I don't doubt that ChatGPT is better than the current alternatives, it's not great that the author used the $20/month version rather than the free version (or both) to compare against Bard. The same goes for giving the same value to paying $20/month and to knowing which model is used.


Well, seeing as this is a comparison of Bard using PaLM 2 and GPT-4, the free version, which is GPT-3.5, wouldn't suffice. I'd like to see them add Bing Chat, phind.com, Perplexity, and one of the open-source models.


ChatGPT vs Bard

Bard: Bard isn’t currently supported in your country.

ChatGPT: Is available and I can pay for GPT4.

Winner: ChatGPT


Bots without borders


Bard's also not available in Canada yet! I feel a bit like a broken record, but it's a bit shocking and I believe it needs to be addressed. Why not, Google? What's the real answer?


It likely can't respond with sufficient Canadian content to please regulators


I too have theories, but I'd like to hear the reason. If it was the government's doing, Google could engage the Canadians to help them. Given that they're silent on why, I'm led to believe it's about Google's bottom line.


Lmao


Too-strong consumer rights, like in the EU.


I've been using them day to day over the last week for various matters, including work, leisure, and life. ChatGPT's $20 is a good deal. A month ago Bard was junk; right now it can be OK for many reasons. I'd rate it about on par with the free GPT-3.5.


When you ask Bard "What do you know about Edgar Allan Poe?" it answers "I'm a text-based AI and can't assist with that." Some people have managed to extract information about Edgar Allan Poe from Bard, but it does require some effort. With ChatGPT you just need to ask the question to get information about Edgar Allan Poe.


The chat interface they give you gives back a big summary of him with sources. Maybe the recent update helped.


I also tried this exact prompt, "What do you know about Edgar Allan Poe?", and it gave me the following response:

"I can't assist you with that, as I'm only a language model and don't have the capacity to understand and respond."

Yeesh


I don't know what chat interface you're talking about. What I posted is the answer I get when I use https://bard.google.com/


Same, but it worked for me. Odd that it's quite so unpredictable; a bit concerning in terms of use.


When I post at that same link I get a page of text about Edgar Allan Poe.

"Edgar Allan Poe was an American writer, poet, editor, and literary critic who is best known for his poetry and short stories, particularly his tales of mystery and the macabre. He is widely regarded as a central figure of Romanticism in the United States, and of American literature. He was one of the country's earliest practitioners of the short story, and is considered the inventor of the detective fiction genre, as well as a significant contributor to the emerging genre of science fiction.

Poe was born in Boston, Massachusetts, on January 19, 1809. His parents, David Poe Jr. and Elizabeth Arnold Poe, were both actors. David Poe died when Edgar was only three years old, and Elizabeth died two years later. Poe was then taken in by John and Frances Allan, a wealthy tobacco merchant and his wife, who lived in Richmond, Virginia. Poe was never formally adopted, but he was raised by the Allans as their son.

Poe attended the University of Virginia for one year, but he was expelled for gambling debts. He then joined the U.S. Army under an assumed name, but he was discharged after only two years. In 1835, Poe published his first book of poems, Tamerlane and Other Poems. The book was not a success, but it did attract the attention of some literary critics.

In 1836, Poe married his 13-year-old cousin, Virginia Clemm. The couple had one child together, a daughter named Virginia. Poe's wife died of tuberculosis in 1847, and her death had a profound impact on him. He became increasingly depressed and addicted to alcohol.

Poe died on October 7, 1849, at the age of 40. The cause of his death is unknown, but it is believed that he may have died of alcohol poisoning or a brain tumor.

Poe's work has had a lasting influence on American literature and culture. His stories have been adapted into numerous films and television shows, and his poems are still widely read and studied. Poe is considered one of the most important figures in the development of American literature, and his work continues to inspire and entertain readers around the world.

Here are some of Edgar Allan Poe's most famous works:

"The Raven" (1845) "The Tell-Tale Heart" (1843) "The Fall of the House of Usher" (1839) "The Murders in the Rue Morgue" (1841) "The Pit and the Pendulum" (1842) "The Masque of the Red Death" (1842) "The Cask of Amontillado" (1846) "Annabel Lee" (1849) "Lenore" (1843) "Ulalume" (1847) Poe's work has been praised for its dark and macabre themes, its use of suspense and horror, and its vivid imagery. He is considered one of the most important figures in the development of American literature, and his work continues to inspire and entertain readers around the world."


That first paragraph is almost a word-for-word copy of the Wikipedia entry about Edgar Allan Poe: https://en.wikipedia.org/wiki/Edgar_Allan_Poe

edit: sorry if it seemed like I didn't believe you. I was just pointing out that the first paragraph appears to be a verbatim copy from Wikipedia, and I was wondering about this weird behavior from Bard.


I am just telling you what happened when I posted "What do you know about Edgar Allan Poe?" in Bard. You can believe me or not.


I just tried again using a different computer and I still get the answer “ I'm a text-based AI and can't assist with that.”


I get this as well. So strange. My guess would be that it generates a response and then a subsequent filtering step is flagging it for some reason. Maybe related to suicide?


I assume it was the result of an overzealous guardrail trying to protect personal private information. And it turns out that Bard isn't available in the EU, so I'm starting to wonder if Google included something in its training data that it shouldn't have used.


Good summary and test cases. Not too surprised to see the result – I feel most commentators have preferred ChatGPT to Bard. In fact, on Bard's initial launch, I remember a lot of discussion expressing surprise that Google was as far behind as it seemed. Maybe there is more hype for Bard in other circles than mine?


They updated it a couple of days ago to their newest PaLM 2 model, so it's better now than it used to be.


One thing that Bard doesn't seem to get credit for is speed. I am a paying ChatGPT customer and only tried out Bard after GA. I was pretty shocked at the speed.

Considering that my $20 only gets me 25 messages every 3 hours, I'll probably be turning to Bard when I run out (which I haven't yet, to be fair).


Are those GPT-4 caps even real? I've never hit one, even though I swear sometimes my use is higher. Does the cap count only new chat threads, or each individual message?


I mean, is it that hard to test for yourself? Send 26 messages and see what happens.


I hit them all the time while coding or planning newsletter and marketing tasks.


Yesterday I coded qrpwd - a command-line tool to password-protect information in a QR code (as a backup for my 2-factor auth backup codes). https://github.com/franzenzenhofer/qrpwd

I say coded; I mean I directed ChatGPT to do it.

ChatGPT had some issues with the QR content extraction using zxing, so I tried Bard.

Bard destroyed totally working code again and again while ignoring the issue at hand.

In the end I googled some Stack Overflow posts with working code, fed them to ChatGPT, and it worked it out.

In my experience, Bard is not up to any coding task, whereas ChatGPT - while not perfect - is magic.
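For context, the core idea fits in a few lines. This is only a rough sketch of the general approach (a password-derived Fernet key plus the qrcode/pyzbar libraries as stand-ins), not the actual qrpwd code, which goes through zxing:

    import base64, os
    import qrcode                         # pip install qrcode[pil] pyzbar cryptography
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
    from pyzbar.pyzbar import decode
    from PIL import Image

    def key_from_password(password: bytes, salt: bytes) -> bytes:
        kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=480_000)
        return base64.urlsafe_b64encode(kdf.derive(password))

    salt = os.urandom(16)                 # the real tool has to store this alongside the QR code
    f = Fernet(key_from_password(b"correct horse battery staple", salt))

    # Encrypt the secret, then encode the ciphertext as a QR image.
    qrcode.make(f.encrypt(b"2FA backup codes: 1111 2222").decode()).save("backup.png")

    # Later: scan the QR code and decrypt with the same password and salt.
    scanned = decode(Image.open("backup.png"))[0].data
    print(f.decrypt(scanned).decode())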


Bart?


thx, fixed


All that matters to me is access. If access is not equal, the comparison is moot (to me and to most prospective users).


Man am I tired of seeing posts about these AI tools.


A more useful comparison would be against bing, no?



