Comparison of image moderation APIs (dataturks.com)
57 points by mohi13 7 months ago | 22 comments

It would be nice if someone could compare these commercially available APIs with Yahoo's open_nsfw model in terms of accuracy: https://github.com/yahoo/open_nsfw

I'm currently building an API wrapper around it and running it on a Hetzner server with a GTX 1080. Prediction takes about 0.25 seconds, and while I haven't optimised it for parallel execution, I think it should be able to handle at least 10 images/sec comfortably. I'm also testing video moderation by using ffmpeg to slice the video into frames and computing the min/avg/max scores across them.
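A rough sketch of the frame-sampling idea, assuming ffmpeg is on the PATH. `score_frame` is a hypothetical callable that returns an NSFW probability in [0, 1] for an image path; it stands in for the open_nsfw wrapper, whose interface isn't shown in this thread. The one-frame-per-second sampling rate is also an assumption.

```python
import statistics
import subprocess
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def extract_frames(video: str, out_dir: str, fps: float = 1.0) -> List[str]:
    """Slice a video into JPEG frames with ffmpeg (must be installed)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pattern = str(Path(out_dir) / "frame_%05d.jpg")
    subprocess.run(
        ["ffmpeg", "-loglevel", "error", "-i", video, "-vf", f"fps={fps}", pattern],
        check=True,
    )
    return sorted(str(p) for p in Path(out_dir).glob("frame_*.jpg"))


def summarize_scores(scores: Iterable[float]) -> Dict[str, float]:
    """Collapse per-frame scores into min/avg/max for the whole video."""
    values = list(scores)
    if not values:
        raise ValueError("no frame scores to summarize")
    return {
        "min": min(values),
        "avg": statistics.fmean(values),
        "max": max(values),
    }


def moderate_video(
    video: str, out_dir: str, score_frame: Callable[[str], float]
) -> Dict[str, float]:
    """Extract frames, score each one (hypothetical scorer), and summarize."""
    frames = extract_frames(video, out_dir)
    return summarize_scores(score_frame(f) for f in frames)
```

The max score is probably the one to threshold on for moderation, since a single explicit frame is enough to flag a video; the average mostly tells you how much of it is explicit.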

Moderating 25 million explicit images using Google Cloud Vision would cost around $19,500/mo vs €99/mo on Hetzner.
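For anyone checking the arithmetic: the sketch below reproduces a figure of roughly $19,500 under an assumed tiered price list (first 1,000 images free, $1.50 per 1,000 up to 5 million, $0.60 per 1,000 beyond that). Those tiers are my assumption, not something stated in this thread, so check the current pricing page before relying on them.

```python
# Assumed pricing tiers in USD per 1,000 images; not taken from the thread.
FREE_UNITS = 1_000
TIERS = [
    (5_000_000, 1.50),     # from the free quota up to 5M images/month
    (float("inf"), 0.60),  # everything beyond 5M images/month
]


def monthly_cost(images: int) -> float:
    """Return the monthly bill in USD for a given image count."""
    cost = 0.0
    prev_cap = FREE_UNITS
    for cap, price_per_1000 in TIERS:
        if images <= prev_cap:
            break
        billable = min(images, cap) - prev_cap
        cost += billable / 1000 * price_per_1000
        prev_cap = cap
    return cost


if __name__ == "__main__":
    print(f"25M images/month: ${monthly_cost(25_000_000):,.2f}")
```

The flat €99/mo Hetzner figure needs no formula, though it leaves out the engineering time spent building and running the wrapper.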

Makes a lot of sense. Actually, it's really difficult to get a large enough dataset for moderation tasks to build a decent in-house model for a fair comparison.

Sure, we can try scraping that from Pornhub etc., but then the negative classes would be very domain-specific, and using stock images may not provide a good measure.

Also, it's really weird to assign such a task to any of your employees; feels kinda strange :)

Yahoo's model could be fine-tuned: http://caffe.berkeleyvision.org/gathered/examples/finetune_f...

Yeah, it's definitely not a nice task, but what's stopping someone (well, besides potential legal issues) from using these commercial APIs to create datasets programmatically and training a cloned model from that?

I'm curious what the profit margins are on these APIs because I think they are way overpriced.

I really hope you haven’t picked a dataset that features only female nudity, but the sample images suggest otherwise: https://dataturks.com/projects/Mohan/NSFW(Nudity%20Detection...

Here are all 90 nude images: http://jsbin.com/hufulewaji/2/edit?html,js,output . Truly professional take on the topic.

:) There are a few examples of males as well.

I tried out various image labeling APIs, including Google Vision (Safe Search), for exactly this use case (moderation). I was honestly astonished at the pricing of these APIs. Google is somewhere around 1.50€ per 1000 images, which is, imo, very expensive. I tried out the default models that come with TensorFlow, but they are trained on scientific datasets which typically involve species and flowers, so no luck there either. Any good tips for pre-trained models that solve this (for TensorFlow)?

You can use nudebox from https://machinebox.io; it's an API in a Docker container (disclaimer: I built it).

I've found promising results trying this one out:


Download 1000 NSFW images, 1000 racy images and 100 completely SFW images. Train a model and publish it on GitHub?

Would be really interested to see the results of this.

Also, consider using Dataturks to create and host the dataset.

For those interested, PixLab lets you analyze 50K images or video frames via its /nsfw endpoint for $25. They charge $0.9 per 1000 requests after you reach this quota.



What is the verdict? I've just been scrolling down to see which is best and can't find any.

The sample images shown only feature white women. Is that a limitation of the dataset?

Tila Tequila is Vietnamese, and features in the majority of the sample images shown.

At the same time, the data set is atrocious.

She's also a white nationalist and neo-Nazi...

Not relevant to the conversation

BTW, we have a general question for anyone who can help us: does hosting such a dataset cause issues with SEO etc.? Anything else we should be aware of?

Probably wouldn't be wise to host a dataset like this with "example" images embedded or linked where a search engine could find them.

Thanks, will see how I can block crawling on this dataset. BTW, how does it hurt exactly? Couldn't find much except in the case of safe-search mode.
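One common way to block crawling is a robots.txt rule. Below is a minimal sketch that checks such a rule with Python's standard `urllib.robotparser`. The `/projects/` prefix and the example.com URLs are illustrative assumptions, not the site's actual layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to skip the dataset pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /projects/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/projects/nsfw"))
    print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/pricing"))
```

Note that robots.txt only asks well-behaved crawlers not to fetch those pages; a URL that is linked from elsewhere can still show up in an index, so a noindex directive may be needed as well.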
