Show HN: A tool to make a bot that speaks like you, learning from WhatsApp chats (github.com)
134 points by spandan-madan on Nov 27, 2018 | 39 comments

To everyone - I literally came back home from lab to find this blown up. A little overwhelming, frankly.

I built the bot as a gift for a friend, and didn't really see the Black Mirror angle, even though I have seen the series.

Anyway, if someone is interested in building extensions of this, please write to me at smadan@mit.edu, I'd be happy to collaborate and guide :)

Better be careful with that...

From "Bicycle Repairman", by Bruce Sterling (1996) (This is a spoiler, BTW!):

"The mook speaks just like the Senator did, or the way the Senator used to speak, when he was in private and off the record. The way he spoke in his diaries. As far as we can tell, the mook was his diary.... It used to be his personal laptop computer. But he just kept transferring the files, and upgrading the software, and teaching it new tricks like voice recognition and speech-writing, and giving it power of attorney and such.... And then, one day the mook made a break for it. We think that the mook sincerely believes that it’s the Senator."

Oh, and just note that this was way before Black Mirror.

I mean, off topic, but it's not like any (citation needed) of the ideas in Black Mirror are original; it repackages existing ideas and tropes in a modern context. It does an excellent job of that, but it's a bit disingenuous to use it as the gold standard for wacky sci-fi prophecies.

It would be nice to create a bot that would learn how I write my shitty code to be able to write code for me when I retire and eventually die. Only to be resurrected in JavaScript.


that could be made recursive ;)

Sounds interesting. I'll report on my results here once I have uploaded my otherwise end to end encrypted private and personal conversations into this unknown script...

It gets better. Soon it will imitate you without you uploading your conversations, because enough people around you did it (making themselves guilty of private conversation disclosure, at least 6 months imprisonment in France).

It’s not that privacy is an endless pursuit, it’s more that the governments enjoy it so much that they don’t really work on preventing it.

Pretty crazy. It's not synthesizing new sentences, but it's basically a huge index of yourself. You can ask it "what's your favorite food?" and you get a response back.
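To make the "huge index of yourself" idea concrete, here is a minimal sketch of a retrieval-style bot. It is not the linked project's code: the function names, the token-overlap (Jaccard) scoring, and the sample chat are all assumptions chosen for illustration. It stores each message someone sent you together with the reply you gave, then answers a new question with the stored reply whose prompt overlaps most.

```python
import re

def tokens(text):
    """Lowercase word set used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Overlap between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def build_index(messages, me):
    """Pair each incoming message with the reply `me` sent right after it."""
    index = []
    for (sender, text), (next_sender, next_text) in zip(messages, messages[1:]):
        if sender != me and next_sender == me:
            index.append((tokens(text), next_text))
    return index

def reply(index, query):
    """Return the stored reply whose prompt best matches the query, or None."""
    q = tokens(query)
    best_score, best_reply = 0.0, None
    for prompt_tokens, answer in index:
        score = jaccard(q, prompt_tokens)
        if score > best_score:
            best_score, best_reply = score, answer
    return best_reply

# Hypothetical chat history: (sender, text) pairs in chronological order.
chat = [
    ("Friend", "What's your favorite food?"),
    ("Me", "Pizza, obviously."),
    ("Friend", "Are you coming to the lab tomorrow?"),
    ("Me", "Yes, around ten."),
]
index = build_index(chat, me="Me")
print(reply(index, "what is your favorite food"))
```

Nothing new is synthesized here: every answer is a sentence you already wrote, which is why the result behaves like an index rather than a generator.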

This makes private conversations that much more valuable to ad companies: if WhatsApp sells all your conversation packaged into a model similar to this, most of your personality is easily summarised to be targeted.

This is very interesting.

I had been toying with the idea of a bot that is a clone of yourself. The bot learns by hearing what you say and how you say it. When you talk with the bot, it replies in your own voice: basically Siri with your own voice and attitude. It would be the best thing one could leave behind after one's death.

Is it possible to pull off something like this?

This is literally a Black Mirror episode -


Synopsis: "The episode tells the story of Martha (Hayley Atwell), a young woman whose boyfriend Ash Starmer (Domhnall Gleeson) is killed in a car accident. As she mourns him, she discovers that technology now allows her to communicate with an artificial intelligence imitating Ash, and reluctantly decides to try it. "Be Right Back" had two sources of inspiration: the question of whether to delete a dead friend's phone number from one's contacts, and the idea that Twitter posts could be made by software mimicking dead people."

I understand the value of providing hypothetical situations, but the constant mention of Black Mirror episodes with very little other substance here is getting tiring. Without having seen Black Mirror, that synopsis doesn't add much to the conversation other than "somebody made a tvshow/movie about that". Other than the fact that a similar situation was explored, what new conclusion did the episode reach that warrants mentioning?

Mainly that it was creepy and the widow realized she didn’t really want a dead boyfriend/android thing.

It's an old idea. The premise of Caprica was that you could upload your experiences and that would reconstitute your soul (or would it?).

That's a mass media instantiation of the premise of I Am a Strange Loop, by Douglas Hofstadter, which is, to be honest, a tome lamenting the passing of his wife.

This is the story behind Replika.

"Three years ago, Kuyda hadn’t intended to make an emotional chatbot for the public. Instead, she’d created one as a “digital memorial” for her closest friend, Roman Mazurenko, who had died abruptly in a car accident in 2015. At the time, Kuyda had been building a messenger bot that could do things like make restaurant reservations. She used the basic infrastructure from her bot project to create something new, feeding her text messages with Mazurenko into a neural network and creating a bot in his likeness. The exercise was eye-opening. If Kuyda could make something that she could talk to—and that could talk back—almost like her friend then maybe, she realized, she could empower others to build something similar for themselves." https://www.wired.com/story/replika-open-source/

Or a more in depth read: https://www.theverge.com/a/luka-artificial-intelligence-memo...

You can talk with the deceased friend: https://itunes.apple.com/us/app/roman-mazurenko/id958946383?...

As mentioned elsewhere in the thread, Be Right Back (a Black Mirror episode with a product that is exactly this).

Lol, I love seeing that no idea is completely unique. I have been thinking about it a lot over the last month. I am honestly wondering what your ethical concerns are with this "digital replication".

With bot-like digital replication of oneself, the possibilities are huge, and so are the ways such tech could be misused.

Along with the other stories suggested here, Alastair Reynolds' Revelation Space features "beta simulations" which are essentially the same thing: reconstructions of people based on recordings of them and their speech/writing/interactions.

(There are also "alpha simulations" referenced which were experimental direct mind uploads. The experiment was never repeated because, due to some deficiency in the process, the uploads went mad.)

Lyrebird (https://lyrebird.ai/) is trying to do the voice imitation component.

It's scary that you can't even read their privacy policy or ToS without signing up first!

Great! Seems promising.

Check out MariFlow, an RNN trained to play Mario Kart as the creator would. https://www.youtube.com/watch?v=Ipi40cb_RsI

I'd love that. It'd be like Google Duplex, but save me the effort of having to say "no" to going to things.

I wonder if one could create clones of every user on a site such as hackernews for instance, and create bots for each one, such that when you feed in a random article, comments are automatically generated and replies to comments appear recursively. Comments as a Service.

Poor bastard who stumbles into such a forum never realizing everyone is a bot.

I could probably write one of those with just 'if' statements.

if "facebook" in $Title: print("I deleted my facebook years ago and it was the best thing I ever did")

if "uber" in $Title: print("Uber has only $dollars left of runway. They only have $months left before they'll need another rescue")

if "I made" in $Title: print("Why would I use $new_app when I could just use $old_app_from_90s? I really don't like the trend of using $technology instead of $older_technology. I hope $technology dies a cold death at the bottom of a ditch")
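For anyone who wants to actually run the joke, here is a small sketch of the same idea in Python. The rule table and the function name `canned_comments` are made up for illustration, matching is made case-insensitive (the pseudocode above is case-sensitive), and the `$placeholders` are deliberately left unfilled.

```python
# Keyword -> stock comment. The $placeholders are left unfilled, as in the
# comment above, and the last rule is shortened to its first sentence.
RULES = [
    ("facebook", "I deleted my facebook years ago and it was the best thing I ever did"),
    ("uber", "Uber has only $dollars left of runway. They only have $months left "
             "before they'll need another rescue"),
    ("i made", "Why would I use $new_app when I could just use $old_app_from_90s?"),
]

def canned_comments(title):
    """Return every stock comment whose keyword occurs in the title (case-insensitive)."""
    lowered = title.lower()
    return [comment for keyword, comment in RULES if keyword in lowered]

for comment in canned_comments("Show HN: I made an Uber clone"):
    print(comment)
```

Like the chain of independent `if` statements, a title can trigger more than one rule, so the example title prints both the Uber and the "I made" comments.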

I think people have done something similar with Reddit, actually. They tried to generate Reddit conversations, and, as expected, they were pretty funny. Try a Google search; you might be able to find it.

Why not learn from normal text-files instead?

Why connect it to a facebook-owned, proprietary service?

If you follow the instructions you’ll discover that it works by reading normal text files.

It doesn't connect to any service, just understands text logs exported from WhatsApp

You can probably skip that module from the code.

A more interesting tool: a tool to stylometry-fuzz one's writing.

I'm getting this error when I try to run it:

    Traceback (most recent call last):
      File "clean_whatsapp_chats.py", line 39, in <module>
        all_text[-1] += line
    IndexError: list index out of range

You're supposed to change the name globals as mentioned in the README.md file.

Then you also need to create a `res` directory in the root directory and install the necessary dependencies. Then you also need to change the encoding when you load the serialized data in the preprocessing notebook.
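Why unchanged name globals could produce that IndexError: the only line of the script visible in this thread is `all_text[-1] += line`, so the following is a guess at the mechanism, not the repository's code. Assume the script starts a new entry only when a line comes from a configured sender and otherwise appends the line to the previous entry; if no sender ever matches, the list is still empty when the first append happens. The regex, the function name `collect_messages`, and the sample lines are all hypothetical, and real WhatsApp exports vary by locale and app version.

```python
import re

# Assumed export line shape: "<date>, <time> - <sender>: <text>".
# Real exports differ by locale and app version; this pattern is illustrative only.
MESSAGE_START = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, [^-]+ - ([^:]+): (.*)$")

def collect_messages(lines, names):
    """Collect messages sent by `names`, joining multi-line messages.

    Continuation lines are appended only while the most recent header
    belonged to a known sender, so the list is never indexed while empty.
    """
    messages = []
    keep = False  # True while the current message belongs to a known sender
    for raw in lines:
        line = raw.rstrip("\n")
        match = MESSAGE_START.match(line)
        if match:
            keep = match.group(1) in names
            if keep:
                messages.append(match.group(2))
        elif keep:
            messages[-1] += "\n" + line
    return messages

# Hypothetical export lines.
export = [
    "11/27/18, 9:15 PM - Alice: hello\n",
    "second line of the same message\n",
    "11/27/18, 9:16 PM - Bob: hi\n",
    "bob continues\n",
]
print(collect_messages(export, {"Alice"}))
print(collect_messages(export, {"Placeholder Name"}))
```

With a sender set that matches nobody, this version returns an empty list instead of raising, which at least makes the misconfiguration easier to spot.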

If you’re looking for a quick, fun sci-fi read with this concept run amok, check out Darknet by Matthew Mather.

Be Right Back

This comment is referencing Be Right Back, a Black Mirror episode featuring this exact idea as a premise.


One of the best episodes of Black Mirror. Truly made me have deep thoughts about myself, life, and where humanity is going. Immortality seems indeed achievable, but not in a biological way our ancestors imagined.

black mirror anyone?
