Twss.js (github.com)
305 points by nezzor on Jan 12, 2012 | 47 comments

Now I'm just waiting for someone to make a Twitter bot that randomly samples tweets and responds to them with this...

Already done (and undone), it would seem:


I tried to do this last year, but every single twss-related handle was taken. Don't believe me? Look for yourself. I can't wait for the day when Twitter starts reclaiming unused/squatted names (and offers a paid service that lets you claim one for life).

There's this, too.


It randomly samples tweets for training, but it doesn't write back automatically. If you find one you like on the site, though, you can tweet it.


This would interest me much more if you open-sourced it. Have you already decided against doing so?

edit: I see you do actually have it at https://github.com/mattspitz/yepthatswhatshesaid - is it up to date? Thanks for sharing!

http://twitter.com/Michael_Scott here's agent Michael Scarn's friend, although not very active recently.

I think your negative sample set is a little biased. Since all the phrases start with verbs like "was in the car" or "went to the park", these kinds of phrases are given lower probabilities.

For example:

    > twss.prob("was on a stiff pole");
Only 1.6% chance of that's what she said?!?

EDIT: Counter example:

    > twss.prob("that's one stiff pole");

An interesting (and funny) exercise.
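The bias can be demonstrated with a toy Bayesian word-count model. The training phrases below are made up for illustration, not taken from the library's actual corpus:

```javascript
// Toy Naive Bayes: a skewed negative set ("was ...", "went ...")
// drags down any phrase that shares those leading words.
const positives = ["that's one stiff pole", "it's so big"];
const negatives = ["was in the car", "went to the park", "was at work"];

function counts(docs) {
  const c = {};
  let total = 0;
  for (const doc of docs)
    for (const w of doc.split(' ')) { c[w] = (c[w] || 0) + 1; total++; }
  return { c, total };
}

// Laplace-smoothed P(word | class)
function wordProb(word, { c, total }, vocabSize) {
  return ((c[word] || 0) + 1) / (total + vocabSize);
}

function prob(phrase) {
  const pos = counts(positives), neg = counts(negatives);
  const vocab = new Set(
    [...Object.keys(pos.c), ...Object.keys(neg.c)]
  ).size;
  let logPos = 0, logNeg = 0;
  for (const w of phrase.split(' ')) {
    logPos += Math.log(wordProb(w, pos, vocab));
    logNeg += Math.log(wordProb(w, neg, vocab));
  }
  return Math.exp(logPos) / (Math.exp(logPos) + Math.exp(logNeg));
}

console.log(prob('was on a stiff pole') < prob("that's one stiff pole")); // true
```

Because "was" occurs only in the negative set, any phrase containing it is pulled toward the negative class regardless of the rest of its content.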

For those interested in neural networks and Bayesian classifiers check out the brain.js library: http://harthur.github.com/brain/

It works in both node and the browser.

Well that open source project left me satisfied and smiling

that's what she said

A while back I was interested in implementing a much less naive algorithm for classifying TWSS expressions, based on this [1] paper. Never actually got around to finishing the work.

Interesting problem though, and nice work.

[1] - http://www.cs.washington.edu/homes/brun/pubs/pubs/Kiddon11.p...

Soon to be implemented in all IRC bots the world over

There was a similar project before that implemented this for an IRC bot. You can also train this bot by telling it what jokes are good and bad ones. :)

https://github.com/jsocol/scottbot http://coffeeonthekeyboard.com/say-hi-to-scottbot-594/

Our IRC bot had this feature before it was popular...

This is probably the first time I've understood node.js.

I was wondering if anyone knew of a place where I could learn about this stuff in general. I know nothing about unigrams, bigrams, trigrams, tf-idf, Bayesian filtering, etc. Maths - while not awful - is not my strongest point, but I think I could grok a well-written tutorial to this stuff (with code examples!).

I was hoping/wondering if anyone knew of sites I could start learning about this from? I find this very interesting, and I'm sure it could be highly useful and applicable to many different types of problems...
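Not a tutorial site, but the terms themselves are small enough to sketch in a few lines of JavaScript (helper names are made up here, not from any particular library):

```javascript
// unigrams: individual tokens; bigrams: adjacent token pairs.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

function ngrams(tokens, n) {
  const out = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(' '));
  }
  return out;
}

const tokens = tokenize("That's what she said");
console.log(ngrams(tokens, 1)); // ["that's", "what", "she", "said"]
console.log(ngrams(tokens, 2)); // ["that's what", "what she", "she said"]

// tf-idf: weight a term by its frequency in one document,
// discounted by how many documents in the corpus contain it.
function tfidf(term, doc, corpus) {
  const tf = doc.filter(t => t === term).length / doc.length;
  const df = corpus.filter(d => d.includes(term)).length;
  const idf = Math.log(corpus.length / (1 + df));
  return tf * idf;
}
```

Bayesian filtering then combines per-term statistics like these into a class probability, which is roughly what this library's default classifier does.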

DanielRapp: in file twss.js/lib/classifier/knn.js, the number of nearest neighbors should be odd to prevent ties. [EDIT: also, NN should be large enough to prevent over-fitting; a small NN would mean the decision boundary between twss and not-twss is highly non-linear; you need to implement cross-validation to find the best NN.]
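The tie problem is easy to see in a binary majority vote (a minimal sketch; the real knn.js computes distances over its own feature space):

```javascript
// Majority vote over the labels of the k nearest neighbors.
// With an even k, a 50/50 split is possible and the vote is undecided.
function knnVote(neighbors) {
  const yes = neighbors.filter(l => l === 'twss').length;
  const no = neighbors.length - yes;
  if (yes === no) return 'tie'; // can only happen when k is even
  return yes > no ? 'twss' : 'not';
}

console.log(knnVote(['twss', 'not', 'twss', 'not'])); // k=4 -> 'tie'
console.log(knnVote(['twss', 'not', 'twss']));        // k=3 -> 'twss'
```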

Note to self: machine learning using node.js; what's the speed of calculations, what's the memory management in node.js, can I find pure JS implementation of SVM?

Thanks. I did do a simple analysis[1] and changed it[2] to 5 neighbors. Though when I look at the graph now, I see that 4 is actually the optimal value.

Swedish graph (täckning = recall): http://cl.ly/BJRa/pr.png

[1] https://github.com/DanielRapp/twss.js/blob/master/lib/analyz...

[2] https://github.com/DanielRapp/twss.js/commit/3cfcda785583084...

Why don't you try 10-fold CV (http://en.wikipedia.org/wiki/Cross-validation_%28statistics%...)? The graph might change drastically. Here is an example of how to do it: https://onlinecourses.science.psu.edu/stat857/book/export/ht...

If precision and recall monotonically go down as NN increases, it means you don't have enough training data.
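A generic k-fold skeleton, with hypothetical `train` and `evaluate` callbacks standing in for the actual classifier, might look like:

```javascript
// Split the data into k folds; train on k-1 folds, score on the held-out
// fold, and collect one score per fold.
function kFoldScores(data, k, train, evaluate) {
  const foldSize = Math.ceil(data.length / k);
  const scores = [];
  for (let i = 0; i < k; i++) {
    const test = data.slice(i * foldSize, (i + 1) * foldSize);
    const trainSet = data
      .slice(0, i * foldSize)
      .concat(data.slice((i + 1) * foldSize));
    const model = train(trainSet);
    scores.push(evaluate(model, test));
  }
  return scores; // average these to compare settings, e.g. values of NN
}
```

Averaging the fold scores for each candidate NN gives a much less noisy curve than a single train/test split.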

I'm still looking for a classifier that will take a phrase, determine if and what the "In Soviet Russia X Y you" response would be.


I don't think that would be a classifier, or at least not reasonably. You could have "In Soviet Russia X Y you" for each X, Y as your classes, but that would be unreasonable.

Yakov Smirnoff's is a structural joke. You would need to parse sentences, pattern-match, transform them, and then do some kind of regression on the phrase to get its humor quotient.

The Stanford Parser for structural parsing, then some custom pattern matching and transforming code, might get you somewhere.

A first approximation might be just taking simple permutations such as

You can X Y <-> Y X you

and finding the probability that the result is an English sentence.
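The rewrite step alone can be roughed out with a regex. This is a naive sketch: the article handling and the trailing "s" pluralization are hard-coded assumptions, and anything beyond this template really would need a parser:

```javascript
// "You can VERB (a|the) OBJECT." -> "In Soviet Russia, OBJECT VERBs you!"
function sovietRussia(sentence) {
  const m = sentence.match(/^You can (\w+) (?:the |a )?(.+?)\.?$/i);
  if (!m) return null; // sentence doesn't fit the template
  const [, verb, object] = m;
  return `In Soviet Russia, ${object} ${verb}s you!`;
}

console.log(sovietRussia('You can drive a car.'));
// "In Soviet Russia, car drives you!"
```

Scoring whether the transformed string is plausible English is the hard part; a language model over n-gram frequencies would be one way to approximate it.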

Has anyone thrown this on a web server with a simple interface?

It seems to be returning a lot of false positives, at least with the default options. "Good morning" = true, "How are you?" = true

Based on my linear algebra homework:

> "You need to use Gauss-Jordan elimination."

> That's what she said.

"Capitalism without failure isn't capitalism."

That's what she said.

I tried adding dropdowns to change the algorithm and threshold, but changing to knn crashes out ("ReferenceError: trainingPrompt is not defined"), so scrapped that and just left the demo running the defaults.

MRIs have shown that humans are able to do this because of a dedicated site in the brain called "Scott's region". Once activated, this linguistic region is constantly searching for linguistic cues, surfacing signals to our conscious thoughts when the cues are strong enough.

I've seen a Siri proxy TWSS implementation: http://www.youtube.com/watch?v=p4LamngB070

We made our IRC bot respond to TWSS jokes, but ours was just a dumb match against a set of a few thousand jokes that we scraped. You can look at the code at: https://github.com/jfriedly/jenni

Now that I took Stanford's Machine Learning class though, I think I might just duplicate what this guy did for our bot.

The training data is pretty funny. I suppose he collected it from an online TWSS thread.


You seem to have been hellbanned 132 days ago.

While on the surface it seems like a waste of time (albeit an amusing one), I actually expect this is a great project to learn from because of its use of Bayesian classifiers.

In other words, I'm TOTALLY going to be using this on my next project.

Great start -- interesting to watch it go vs the twitter stream. (If you restrict to < 8 word tweets)

Looks like this could easily be integrated into a script for Hubot

I've never had a script do that for me


What exactly is Node.js specific about this?

Using exports, NPM package, and an executable that depends on /usr/bin/node? What's your point?

I just hate when people release JavaScript libraries that needlessly depend on specific platforms. For a while that dependency was usually jQuery; then, with the rise of server-side JavaScript, it was the DOM in general; now it appears to be Node.js.

Just write "X for JavaScript", dammit.

That said, this doesn't appear to have any Node.js specific dependencies, it could be used in any CommonJS environment.

The reason I chose node over "browser-js" is that it was originally going to be a Twitter bot, but I decided to simplify the GitHub repo into just a node module to make it more useful.

But you're totally correct. This could easily have been written in any language.

The source is open and it's really easy to port to the browser, so this shouldn't warrant a complaint. Everyone works with what they feel most comfortable with.


I approve this post.
