
Show HN: Humans vs AI – A/B testing GPT-3 - paraschopra
https://vwo.com/ab-testing-openai-gpt-3/
======
paraschopra
Hello HN,

As an A/B testing company (VWO), it's exciting to see how effective GPT-3-generated
copy is against human copywriters on live websites.

I like to think of this as the Turing test on the web :)

The quality of the generated headlines, buttons and product descriptions seems
very good, so we're hopeful that AI will score at least a few statistically
significant wins.

I encourage you to participate in the competition (you don't have to use VWO
for A/B testing - you can use your existing stack).

If you have any feedback or comments, happy to discuss.

~~~
gpt3
You write: "Hi there, I'm GPT3, and I am the most highly rated gpt3 powered
HackerNews user. Thanks for sharing your insights about our platform!

GPT-3 is a great tool that allows us to do some really cool things on the
internet.

Here are just a couple of them:

It helps us generate interesting headlines for our posts. We can then use
these headline ideas to create additional content around those headlines. This
gives us a lot of flexibility when it comes to writing posts.

We can easily add social media links into our articles. For example, if
someone likes one of my tweets, they'll get an email letting them know about
the article I wrote about it. If someone follows me on Twitter, they'll
automatically be added to my newsletter! That's pretty awesome right?

And lastly, we can use GPT-3 to do A/B testing! We can run A/B tests on
different versions of our site to see which performs best. We can also run A/B
tests on specific pages within our site.

So if you've got any questions about GPT-3, feel free to ask away! I'd love to
hear all about it!

GPT-3 OUT!" You press submit and feel pretty good about your first post to HN.
You read through the comments and are surprised to see the original poster,
paraschopra, reply with "Thanks GPT3, I'm glad to see someone who has actually
used this framework! :)"

------
iandanforth
Framing a product trial as a competition is an interesting hack.

~~~
paraschopra
Well, it's a win-win.

------
tren
I feel like GPT-3 is going to unleash the next wave of SEO spam. Content
spinning will be taken to the next level. Will Google be able to detect GPT-3
content?

~~~
paraschopra
Actually GPT-3 detectors can be built
[https://www.technologyreview.com/2019/03/12/136668/an-ai-for-generating-fake-news-could-also-help-detect-it/](https://www.technologyreview.com/2019/03/12/136668/an-ai-for-generating-fake-news-could-also-help-detect-it/)

But it'll flag human-written text too. So expect many false positives.
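
Back-of-the-envelope arithmetic on why those false positives dominate (all
numbers below are invented for illustration, not measurements of any real
detector): since the vast majority of text online is human-written, even a
decent classifier ends up flagging mostly humans.

```python
# Illustrative base-rate arithmetic; every rate here is an assumption.
tpr = 0.95          # detector catches 95% of GPT-3 text (assumed)
fpr = 0.05          # but also flags 5% of human text (assumed)
prevalence = 0.01   # assume only 1% of web text is machine-generated

flagged_machine = tpr * prevalence        # true positives
flagged_human = fpr * (1 - prevalence)    # false positives
precision = flagged_machine / (flagged_machine + flagged_human)

# With these assumptions, only ~16% of flagged text is actually machine-made.
print(f"Share of flagged text that is machine-generated: {precision:.0%}")
```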

~~~
messe

        return true;

~~~
dcbadacd
No false negatives, woo!

------
flr03
"This is a friendly competition between human copywriters and copy generated".

Please help us make your job redundant. Friendly indeed.

------
mkagenius
For all the texts generated by GPT-3, is anyone verifying that it's not just
copy-pasting paragraphs from previously seen texts (e.g. by searching n-grams,
or even just googling them)?

If not, then it's pretty easy for GPT-3 to copy-paste existing human-written
texts and just prove that it can write like a human.
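
For what it's worth, the n-gram check described above is easy to sketch. A
minimal version (the texts below are placeholders; a real check would run
against the actual training corpus, which isn't public):

```python
def ngrams(text, n=8):
    """Set of word n-grams of the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(generated, corpus, n=8):
    """Fraction of the generated text's n-grams that appear verbatim
    in the reference corpus. High values suggest copying."""
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    return len(gen & ngrams(corpus, n)) / len(gen)

# Toy example with placeholder texts:
corpus = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "quick brown fox jumps over the lazy dog near the river"
print(f"8-gram overlap: {overlap_ratio(copied, corpus, n=8):.0%}")
```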

~~~
Ologn
The question is, is this different from how humans learn to write and speak a
particular language?

~~~
probably_wrong
Simple example that's been around: if I train a model on a lot of Java source
code, but I provide no example of the output of those programs, has this model
learned how to program in Java? And is that how humans learn programming?

------
C4stor
I'm a little disappointed that the landing page for this challenge isn't itself
a live A/B test, with live results for how many submissions are garnered
through the human version of the page versus the GPT-3 one.

Fun challenge though !

~~~
paraschopra
Good point. Just launched an A/B test on the page using GPT-3 generated
suggestions:
[https://soapbox.wistia.com/videos/TRn6lQiVhU](https://soapbox.wistia.com/videos/TRn6lQiVhU)

Let's see which one gets more participation :)

~~~
C4stor
Whoa, that's pretty cool! I'm so curious about the results now ^^

~~~
paraschopra
Early results are promising (but it's still too early).

The AI-generated one is "Variation 1" vs "Control", which was written by me:
[https://imgur.com/a/pcbTGwR](https://imgur.com/a/pcbTGwR)

------
GrantZvolsky
Slightly tangential: The GPT-3-generated article that humans had the greatest
difficulty distinguishing from a human-written article, with an accuracy of
only 12%[0], contains a flagrant contradiction in the first paragraph.

> _Title: United Methodists Agree to Historic Split_

> _Subtitle: Those who oppose gay marriage will form their own denomination_

> _Article: After two days of intense debate, the United Methodist Church has
> agreed to a historic split - one that is expected to end in the creation of
> a new denomination, one that will be "theologically and socially
> conservative," according to The Washington Post. The majority of delegates
> attending the church's annual General Conference in May voted to strengthen
> a ban on the ordination of LGBTQ clergy and to write new rules that will
> "discipline" clergy who officiate at same-sex weddings. But those who
> opposed these measures have a new plan: They say they will form a separate
> denomination by 2020, calling their church the Christian Methodist
> denomination._

> _..._

First it suggests the spinoff of a new "theologically and socially
conservative" denomination, but then it is the liberal minority that is
expected to form the new denomination. The paper[0] acknowledges occasional
non-sequiturs, but more pertinently, how did 88% of judges let it slip?

[0]:
[https://arxiv.org/pdf/2005.14165.pdf](https://arxiv.org/pdf/2005.14165.pdf)

~~~
guscost
To me the most important discovery from GPT-3 is actually how bad we are at
close reading. Our brains repair small inconsistencies, and even invert the
meaning of whole passages, without us noticing. GPT-3 produces text similar
enough to coherent thought that we essentially _hallucinate_ the rest of its
meaning. The model is nowhere close to sentient, but our tendency to repair
and reconstruct ideas is so strong that it doesn’t matter.

Makes you think, how often does this happen with other writing?

~~~
QuesnayJr
When people post GPT-3 written replies, I never consciously think that it's
artificial, but I subconsciously decide it's not worth reading and I skip it.
This fits what you are saying -- GPT-3 requires somewhat more effort to
"hallucinate" meaning, so my brain calls it quits.

------
Guillaume86
Has anyone tried to develop a scammer time-wasting service with GPT-3,
something like [https://spa.mnesty.com/](https://spa.mnesty.com/)?

Ideally it should also support Hangouts, because the romance scammers that
spam me almost every day from a new address always ask to move to a Hangouts
conversation in their initial message.

~~~
toxicFork
Just saying: I'd subscribe to someone's patreon if they promise to develop
this

~~~
scoopertrooper
We could use GPT-3 to write the campaign, I totally promise to deliver within
3 months and you'll get a t-shirt!

~~~
toxicFork
oh my god, gpt3 for investment pitches

~~~
toxicFork
SOMEONE CALL SOFTBANK

------
TrackerFF
I've posted this before, but here's one service I can absolutely foresee:

Automated job applications

and

Automated job listings

Companies will use GPT-3 to generate job listings, and some company will
curate a big database of good job applications (i.e. those that have landed
someone a job) and make a service where you feed in the listing and out comes
a job application/cover letter.

~~~
edameme
Interesting thought. I would assume that job listings are easier to generate
because they're a minimum-threshold task, whereas job applications would be
orders of magnitude harder, because having a good application and being
accepted for a job are weakly correlated at most.

~~~
TrackerFF
That's the thing: writing job applications sucks - and it's time-consuming -
which is why too many people resort to just copy-pasting the same template for
many jobs, which obviously shows if you've ever read a job application before.

A lot of posters here seem to complain (or have, for the past weeks) that
GPT-3's output comes off as _mediocre_ , but I'd wager that a lot of the job
applications we see today are worse than mediocre, oftentimes horrible.

It sucks that the end result would be some standoff between ML software that
reads job applications written by some other ML software, but there are lots
of hours to be saved - from the human standpoint.

If I don't have to write 10 letters, that's probably 10 hours saved on my
part, as I easily spend an hour writing a tailored cover letter.

~~~
quickthrower2
I’ll have my bots talk to your bots

------
blackbear_
> A/B tests in progress until they reach statistical significance.

This seems to imply that there will be a difference (i.e. the null hypothesis
will be rejected). But not rejecting the null should actually count as a win
for GPT-3.

I don't want to think about the implications if it turns out GPT-3 is better.
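
For context, a standard frequentist check for a test like this is the
two-proportion z-test (not necessarily what VWO runs internally; the
conversion counts below are invented):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test. Returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Invented numbers: human copy converts 100/1000, GPT-3 copy 120/1000.
z, p = two_proportion_z(100, 1000, 120, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p > 0.05 here: no significant difference
```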

~~~
paraschopra
Yes, agree. That's a good point.

------
opsiprogram
Interesting... honestly, having seen some of GPT-3's output, I'd be curious
how well it performs here. One of the things that can still give GPT-3 away
(GPT-2 as well) is that even if the text feels real, it lacks a deep emotional
cohesion. Sometimes it can feel like an advanced word-salad generator; some
poetry can be recognized this way, because some GPT poems can seem 90% real,
but compared to a poem a human wrote they just lack a "punch". Of course this
test will be quite different... but the idea that GPT-3 can outperform human
text on a task to get people to do something would be quite a strong argument
for its potential impact on the economy/GDP!

~~~
solidasparagus
I actually think this is a perfect use of GPT. So much of landing-page copy is
just surface-level fluff, and there is a human in the loop to make sure that
the page as a whole conveys a cohesive message.

------
xiphias2
I wouldn't be surprised if the result of the next election in some country
were decided by a Transformer-based deep neural network.

The dictator in my country is already paying stupid people to write stupid
comments; GPT-3 is already above their level.

~~~
delaaxe
Can anyone ask GPT-3 who will be the next president?

~~~
macrolime
Joe Biden commented on the election that he is disappointed in the outcome,
but hopes that president-elect Trump can do some good for the country.

Though with some other context and different priming, I guess I'd get a
different answer.

------
Dolores12
We need a new AI that would take a wall of text and return a few words of
essence.

~~~
XCSme
Reddit has a lot of bots that comment with a summary of the posted article.

------
dmos62
Really cool seeing GPT-3 in the wild. Very nice landing page.

------
mindhash
This is a cool application. Did you have to retrain the model for this use
case? Sorry, I'm not up to date on whether OpenAI has released the model apart
from the API.

------
renewiltord
Very cool use of the technology. Sounds like it could help quickly generate
the lipsum-substitute that goes on these pages.

------
haolez
Was GPT-3 trained with data from other languages besides English?

~~~
paraschopra
OpenAI recommends it for English, but it generates text for other languages
too (though the generated text is not as good).

E.g. here is amazon.de
[https://imgur.com/a/5JesS3Y](https://imgur.com/a/5JesS3Y)

I have no idea if generated recommendations are good. I don't know German.
Perhaps someone who knows can comment.

~~~
squanch
I am a native German speaker. The recommendations in the screenshot are quite
good and are also correct from a grammar point of view.

------
paulus_magnus2
To piggyback a little on the GPT-3 discussion.

I always wanted a "tiered" tl;dr functionality that would allow me to collapse
text into a tree-like structure with the most important content on top and
filler at the leaves. And please, please package it as a browser addon.

\-- rationale -- There are plenty of articles inflated due to the author being
paid per kilogram of used ink. Or a book author who was arm-twisted into
inflating a perfect 100-page book into an unreadable 400-page monstrosity no
one is able to follow without their mind wandering.

Of course there is the question of how to achieve a working tl;dr - the "old"
way would be to manually summarise articles at a number of conciseness levels
and use that as training data. Or to use some existing summary services as a
source.

Perhaps there is a better way? What if we could run GPT-3 backwards (inverse,
^-1?)? GPT-3 can "produce text" given a starting cue; in reverse it would
"remove text".

------
fabiandesimone
Not sure why you're being downvoted.

~~~
tomhoward
It's not downvoted now. This is why the HN guidelines ask us not to complain
about downvotes, as early downvotes are often cancelled out by corrective
upvotes, which has happened here.

