
That's like saying "why don't you use algorithms and code". Like, sure, but what is it you're proposing? What features would you learn from and match against?

(For those unfamiliar with "algorithms and code" as a solution, it's a reference to this: https://www.reddit.com/r/ProgrammerHumor/comments/5ylndv/so_... )




Actually, we have implemented something like that for HTTP requests. The features would be: the IP address (the first three octets are probably enough), posting time, post length, time taken to solve the captcha, time between clicks, the country the IP is located in, whether the post contains certain words (which can be learnt from spam posts), and whether the post contains a link.

I think I would start with these, and probably look into what other people are doing.
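A rough sketch of turning those signals into a feature dictionary a classifier could consume. Everything here is an illustrative assumption rather than the implementation described above: the `Submission` record and its field names are hypothetical, the country code is assumed to come from some IP geolocation lookup done elsewhere, and "learning" spam words is reduced to a simple frequency-ratio rule.

```python
import ipaddress
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Hypothetical record of one submitted post; field names are illustrative only.
@dataclass
class Submission:
    ip: str                     # IPv4 address of the poster
    posted_at_hour: int         # hour of day (0-23) the post was submitted
    text: str                   # body of the post
    captcha_seconds: float      # time taken to solve the captcha
    click_intervals: List[float]  # seconds between successive clicks
    country: str                # country code from an (assumed) geolocation lookup

LINK_RE = re.compile(r"https?://|www\.", re.IGNORECASE)

def tokenize(text: str) -> List[str]:
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ip_prefix(ip: str) -> str:
    """Keep only the first three octets, i.e. the /24 network the IPv4 address is in."""
    return str(ipaddress.IPv4Network(f"{ip}/24", strict=False))

def learn_spam_words(spam_posts: Iterable[str], ham_posts: Iterable[str],
                     min_ratio: float = 2.0) -> Set[str]:
    """Words at least `min_ratio` times more frequent in spam than in normal posts
    (add-one smoothing on the normal-post count avoids division by zero)."""
    spam_counts = Counter(w for p in spam_posts for w in tokenize(p))
    ham_counts = Counter(w for p in ham_posts for w in tokenize(p))
    return {w for w, c in spam_counts.items() if c / (ham_counts[w] + 1) >= min_ratio}

def extract_features(sub: Submission, spam_words: Set[str]) -> Dict[str, object]:
    intervals = sub.click_intervals
    return {
        "ip_prefix": ip_prefix(sub.ip),
        "hour": sub.posted_at_hour,
        "length": len(sub.text),
        "captcha_seconds": sub.captcha_seconds,
        "mean_click_interval": sum(intervals) / len(intervals) if intervals else 0.0,
        "country": sub.country,
        "spam_word_count": sum(1 for w in tokenize(sub.text) if w in spam_words),
        "has_link": bool(LINK_RE.search(sub.text)),
    }
```

Categorical features such as `ip_prefix` and `country` would still need encoding (e.g. one-hot) before most classifiers could use them.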


An ideal machine learning implementation would also need context, such as the original post itself, the parent comment(s), other comments in the thread, etc.

It can be more difficult than one might think. For example, since we are talking about spam, the word "Viagra" shouldn't get my comment blocked, even though my parent post doesn't mention the word and nobody else in the thread has used it.
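To make the difficulty concrete, here is a minimal sketch comparing a plain keyword filter with one obvious context rule: ignore a spam word if it already appears somewhere in the thread. Both functions and the keyword set are hypothetical illustrations, not anyone's actual filter. Because nobody else in this thread wrote "Viagra", the context rule still flags the comment, which is exactly the case described above.

```python
import re
from typing import Iterable, Set

def tokenize(text: str) -> Set[str]:
    """Set of lowercase word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def naive_flag(comment: str, spam_words: Set[str]) -> bool:
    """Flag whenever the comment contains any known spam word."""
    return bool(tokenize(comment) & spam_words)

def context_flag(comment: str, context: Iterable[str], spam_words: Set[str]) -> bool:
    """Ignore spam words already used elsewhere in the thread (original post,
    parent and sibling comments); flag only on the remaining ones."""
    seen: Set[str] = set()
    for text in context:
        seen |= tokenize(text)
    return bool((tokenize(comment) & spam_words) - seen)

spam_words = {"viagra"}
thread = ["Why not use machine learning to detect spam?",
          "Features would be IP, posting time, length, ..."]
comment = 'The word "Viagra" shouldn\'t block my comment.'
print(naive_flag(comment, spam_words))            # True
print(context_flag(comment, thread, spam_words))  # True: still a false positive
```

Avoiding this false positive would take something closer to understanding that the thread is about spam, which word overlap alone does not capture.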




