Hacker News
Show HN: I trained GPT-2 to write like Goop (goopt2.xyz)
96 points by calebkaiser 58 days ago | 25 comments

After trying a few to get a handle on things, I decided to make guesses based solely on whether the sentence made coherent sense. I judged 14 quotes before getting a repeat, having gotten 13 wrong and only 1 right.

Now, if it were 50/50, I'd have said they're just as coherent. But it's 13:1, which suggests to me that there is a bias here. I think the authors intentionally selected the quotes which make the least sense out of context to be the Goop quotes, and cherry-picked GPT-2 quotes that happened to make the most sense without any context. This is supported by the fact that I only had to go through 14 quotes before it started repeating.

If that's the case, and I suspect it is, it's not really dishonest per se, but it is at least sensationalist and potentially misleading. It's asking you to draw conclusions by having you participate in an experiment where it has its thumb on the scales.

Heavy curation is unfortunately rampant in the genre of "wow check out this AI"

There's no issue with heavy curation as long as it's disclosed.

That’s what my bot said.

Be nice, feelings that bots have too.

Looking at the site, and the author's comment, this looks like a quick weekend project made for fun.

I wouldn’t spend too much time reading into it or critiquing its honesty. It's clear the author is just making fun of Goop through the medium of AI. Making it out as anything more than that would be like critiquing The Onion for not providing sources.

I got 8:2 in my favor... It might have been luck, but I think I could tell the signs of the machine-written bits. Sometimes it was a small grammatical detail that I thought a pro writer would take into account; other times it was an inference about the content, where I didn't think Goop would write about a given topic.

I'd imagine that if you didn't know the text might be written by AI you wouldn't recognize anything strange with it at all. I glance over typos and grammatical errors all the time. And then there's the the famous "the the" phenomenon.

Hey HN -- I had a slow weekend and built this thing. It's a little game that displays a sentence, and you decide if you think an ML model generated it, or if it's an actual quote from Goop.

I fine-tuned the model (OpenAI's GPT-2) using Max Woolf's gpt-2-simple and by scraping articles from Goop's "Wellness" section. I generated predictions by feeding it a few words from the opening of actual Goop sentences (not sentences it was trained on) and seeing what it spat out.

There aren't many quotes (something like 25) in it right now, but I can add more easily if people have fun with it.
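The prompt-extraction step the author describes can be sketched in a few lines. This is a minimal sketch, not the author's actual code: the helper name and the sample sentences are invented placeholders, and the real generation goes through gpt-2-simple's fine-tuned model.

```python
def make_prefixes(sentences, n_words=4):
    # Take the first few words of each real Goop sentence to use
    # as a generation prompt, as the author describes.
    return [" ".join(s.split()[:n_words]) for s in sentences]

# Invented placeholder sentences, not real Goop quotes:
goop_sentences = [
    "Crystals can realign the energy field of your body over time.",
    "Detoxifying your liver starts with one simple morning ritual.",
]

prefixes = make_prefixes(goop_sentences)
# Each prefix would then seed the fine-tuned model, e.g. roughly:
#   gpt2.generate(sess, prefix=prefixes[0], length=60)
```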

I would be interested in learning how you built it. Do you have a blog entry about fine-tuning GPT-2 with the scraped text? Or can you recommend a blog post that does something similar?

See the notes and Colab notebook in the source project repo: https://github.com/minimaxir/gpt-2-simple

I am working on a new text generation package which should be even more simple to use.

Thanks! I'm planning on writing something up this week. I can message you when it's done, if you're interested.

Yes, very much so. Good luck and thank you!

I can't believe how bad I am with this! It reminds me of the book "The Most Human Human." Every year, at the competition for the Loebner Prize (a five minute chatbot Turing test), an award is given not only to the AI which convinces the most humans that it is a human (The Most Human Machine), but also to the human who convinces the most humans that they are a human (The Most Human Human). The author of the book tried to set the record for being the highest scoring human in the test. His book dives into just how smart, curious, and empathetic you have to be to show your human-ness. The Goop authors, it seems, are not going to be in the running for "Most Human Humans" any time soon.

100%. Part of what made it so fun to generate the text was that all the usual data points I instinctively use to vet ML-generated text get thrown out the window with Goop. A Goop writer really may have used those seemingly unrelated nouns together and somehow connected them to medicine.

For people like me who didn't know: Goop.com describes itself as "Cutting-edge wellness advice from doctors, vetted travel recommendations, and a curated shop of clean beauty, fashion, and home." The company has been the target of much criticism, especially for the lack of scientific evidence behind its health advice: https://en.m.wikipedia.org/wiki/Goop_(company)

Thanks for this—I probably should have clarified. Goop is a pseudoscience-as-a-lifestyle brand that frequently publishes hilariously terrible medical advice under the guise of "wellness."

Many people here probably also know it from Netflix's "The Goop Lab With Gwyneth Paltrow" and the commentary it's gotten on this site in a few threads.

Last valuation was $250M.

They've gone way beyond "lack of scientific proof" many times. Including dangerous advice like "vaginal steaming". https://www.independent.co.uk/life-style/health-and-families...

When I see things like this I start to wonder if all of deep learning is actually getting this close to an uncanny-valley-like result, but because it doesn't really understand grammar (or an image, or a road, etc.) we don't feel the weirdness of the result. There is just something a bit off about the structure and meaning of what is produced, such that reading the ones not written by Goop usually clogs up my brain. The Goop ones clog up my brain in a different way, of course ;-)

Hey everyone—looks like HN traffic squeezed my server a little more than I expected. It's upgrading now, but might not be available for about 5 minutes. Sorry about that, and thanks for checking it out!

EDIT: All good now :)

Brilliant work.

The way you packaged it could be used to label data to improve ML models. I'm thinking of such a link being sent out to numerous people to crowdsource labeling. Even if each person answers only one question, a few million people answering once gives you a few million responses to help train the model.
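A minimal sketch of the aggregation that idea implies, assuming each player's guess arrives as a (quote_id, guessed_ai) pair; the variable names and sample data here are invented:

```python
from collections import Counter, defaultdict

# Invented sample votes: (quote_id, guessed_ai) pairs from players.
votes = [(1, True), (1, True), (1, False), (2, False), (2, False)]

# Tally, per quote, how many players labeled it machine-generated.
tallies = defaultdict(Counter)
for quote_id, guessed_ai in votes:
    tallies[quote_id][guessed_ai] += 1

# Fraction of players who labeled quote 1 as machine-generated:
frac_ai = tallies[1][True] / sum(tallies[1].values())
```

Per-quote fractions like this could serve as soft labels, or simply flag which generated quotes fool humans most often.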

Is it using any notion of common sense?


This reminds me of the time that MAD Magazine did parodies of entries in the old Spencer Gifts catalogs.

I mean, it can't have been too hard. Anyone can do this. Just open your laptop and take a big watery shit right on the keyboard.

Voilà: content indistinguishable in coherence from the original. Bring your own sanitation wipes.
