
They acknowledge that NSFW (or pornographic) content is hard to define, à la "I know it when I see it".

But looking at the meagre three sample images, I'm already confused about the scoring. Why does the one in the middle score the highest?

The question is an honest one. The two rightmost images seem interchangeable to me, and both are ~boring~: people at the beach. Has this network therefore already been trained to include its creators' biases?
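For context on what that score is: classifiers like this typically output a single softmax probability per image, and "higher" just means the model puts more probability mass on its NSFW class. Below is a minimal sketch of that idea, assuming a fine-tuned binary classifier; the model choice, checkpoint path and class ordering are placeholders, not the actual network from the article.

    # Hypothetical sketch of how an image gets a single NSFW "score".
    # Assumes a binary classifier (here a ResNet-50 with 2 outputs) whose
    # second logit corresponds to the "NSFW" class. The checkpoint path
    # and file names are placeholders.
    import os
    import sys
    import torch
    import torchvision.transforms as T
    from torchvision.models import resnet50
    from PIL import Image

    model = resnet50(num_classes=2)  # class 0 = SFW, class 1 = NSFW (assumed)
    if os.path.exists("nsfw_resnet50.pt"):  # placeholder fine-tuned weights
        model.load_state_dict(torch.load("nsfw_resnet50.pt", map_location="cpu"))
    model.eval()

    # Standard ImageNet-style preprocessing.
    preprocess = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def nsfw_score(path: str) -> float:
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            probs = torch.softmax(model(img), dim=1)
        return probs[0, 1].item()  # probability assigned to the "NSFW" class

    # Usage: python score.py beach_1.jpg beach_2.jpg beach_3.jpg
    for path in sys.argv[1:]:
        print(f"{path}: {nsfw_score(path):.3f}")

So when two beach photos get very different scores, the model is simply assigning them different probabilities, and the question of where that difference comes from (the training data, i.e. the creators' labelling choices) is exactly the bias question raised above.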




All ML networks are inherently biased towards their creators. A colleague recently described this issue to me as the "old, white, male" problem. This is why most voice recognition services fail drastically when presented with foreign accents.


> This is why most voice recognition services fail drastically when presented with foreign accents

As someone with a broad Norwegian accent: This has gotten massively better over the last few years.

Not that long ago, my local cinema chain started using voice recognition to discriminate between the city names on a list, and it would consistently think I had said "Birmingham" when I said "London" (!).

These days, both my Amazon Fire and the YouTube app correctly recognise most things I throw at them, including e.g. the names of random YouTube channels that bear no relation to real English words.

It's by no means perfect, but it's getting there. As for the "old, white, male" problem (well, I do somewhat fit that description), the improvement presumably comes from these systems now finally being trained on huge and varied data sets.



