
Those are used. Search for the minimum description length (MDL) principle and entropy-based classifiers. The performance is poor, but it is definitely there, and the approach is really easy to deploy. I have seen gzip used for plagiarism detection, since similar texts compress better together than unrelated ones. You can then use the compression ratios as edge weights in a spring model for visualisation. It also works with network communication metadata ...
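For anyone who wants to try the gzip idea, here is a minimal sketch. It assumes Python's standard-library `gzip` as the compressor and uses the normalized compression distance (NCD), which is one common way to turn "similar text compresses better together" into a number. It is not necessarily what any particular plagiarism tool does. The names `compressed_size`, `ncd`, and `classify` and the sample strings are hypothetical, chosen for illustration.

```python
import gzip


def compressed_size(data: bytes) -> int:
    # Size of the gzip output, used as a rough proxy for information content.
    return len(gzip.compress(data))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: lower means more shared content."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    # If y repeats much of x, DEFLATE encodes it as back-references,
    # so the concatenation costs little more than the larger input alone.
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(text: str, labelled: list[tuple[str, str]]) -> str:
    # Hypothetical 1-nearest-neighbour classifier: return the label of the
    # reference text with the smallest NCD to the query.
    return min(labelled, key=lambda pair: ncd(text, pair[0]))[1]


# Illustrative sample texts (made up for this sketch).
original = (
    "Compression-based similarity measures treat a compressor as an "
    "approximation of Kolmogorov complexity. If two documents share long "
    "substrings, compressing them together costs little more than "
    "compressing either one alone."
)
edited = (
    "Compression based similarity measures treat a compressor as an "
    "approximation of Kolmogorov complexity. When two documents share long "
    "substrings, compressing them together costs barely more than "
    "compressing one of them alone."
)
unrelated = (
    "Spring embedders place nodes on a plane and simulate attractive forces "
    "along edges and repulsive forces between all pairs until the layout "
    "settles into equilibrium, which often reveals clusters visually."
)

print(ncd(original, edited), ncd(original, unrelated))
print(classify(edited, [(original, "original"), (unrelated, "unrelated")]))
```

Gzip's header and trailer add a fixed overhead, so NCD is noisy on very short strings. The pairwise distances (or 1 − NCD as a similarity) could then serve as edge weights for a force-directed layout, but that step is not shown here.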


