
Comparison of image moderation APIs - mohi13
https://dataturks.com/blog/image-moderation-api-comparison.php
======
speeq
It would be nice if someone could compare these commercially available APIs
with Yahoo's open_nsfw model in terms of accuracy:
[https://github.com/yahoo/open_nsfw](https://github.com/yahoo/open_nsfw)

I'm currently building an API wrapper around it and running it on a Hetzner
server with a GTX 1080. Prediction takes about 0.25 seconds, and while I
haven't optimised it for parallel execution, I think it should be able to
handle at least 10 images/sec comfortably. I'm also testing video moderation
by using ffmpeg to slice the video into screenshots and predicting the
min/avg/max scores.
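
A minimal sketch of the frame-slicing approach described above, not the
actual wrapper. It assumes ffmpeg is on PATH and that one screenshot per
second is enough; the function names and the `score_image` helper mentioned
in the comments are hypothetical.

```python
from __future__ import annotations

import statistics
import subprocess
from pathlib import Path


def extract_frames(video: str, out_dir: str, fps: float = 1.0) -> list[Path]:
    """Slice a video into JPEG screenshots using ffmpeg's fps filter."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video, "-vf", f"fps={fps}", str(out / "frame_%05d.jpg")],
        check=True,  # raise if ffmpeg exits with an error
    )
    return sorted(out.glob("frame_*.jpg"))


def summarize_scores(scores: list[float]) -> dict[str, float]:
    """Reduce per-frame NSFW scores to min/avg/max for the whole video."""
    if not scores:
        raise ValueError("no frame scores to summarize")
    return {
        "min": min(scores),
        "avg": statistics.fmean(scores),
        "max": max(scores),
    }


# Hypothetical usage, where score_image() wraps the model and returns a
# probability between 0 and 1 for a single frame:
#   frames = extract_frames("clip.mp4", "frames")
#   summary = summarize_scores([score_image(f) for f in frames])
```

The max is probably the most useful number for moderation, since a single
explicit frame is enough to flag a video, while the average says how much of
the clip is affected.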

Moderating 25 million explicit images using Google Cloud Vision would cost
around $19,500/mo vs €99/mo on Hetzner.
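
A back-of-the-envelope check of those figures, using only the numbers quoted
above. The 30-day month and the helper names are assumptions, and the two
prices are in different currencies, so this is not a direct comparison.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def required_throughput(images_per_month: int) -> float:
    """Sustained images/sec needed to process a month's volume."""
    return images_per_month / SECONDS_PER_MONTH


def price_per_thousand(monthly_cost: float, images_per_month: int) -> float:
    """Effective cost per 1000 images for a given monthly bill."""
    return monthly_cost / (images_per_month / 1000)


print(round(required_throughput(25_000_000), 2))         # 9.65 images/sec
print(round(price_per_thousand(19_500, 25_000_000), 2))  # 0.78 (USD, quoted API bill)
print(round(price_per_thousand(99, 25_000_000), 4))      # 0.004 (EUR, server rental only)
```

So 25 million images a month is roughly the 10 images/sec mentioned earlier,
running around the clock. The server figure covers rental only, not the time
spent building and operating the service.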

~~~
mohi13
Makes a lot of sense. Actually, it's really difficult to get a large enough
dataset for moderation tasks to build a decent in-house model for a fair
comparison.

Sure, we can try scraping that from Pornhub etc., but then the negative
classes would be very domain-specific; using stock images may not provide a
good measure.

Also, it's really weird to assign such a task to any of your employees, feels
kinda strange :)

~~~
speeq
Yahoo's model could be fine-tuned:
[http://caffe.berkeleyvision.org/gathered/examples/finetune_f...](http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html)

Yeah, it's definitely not a nice task, but what's stopping someone (well,
besides potential legal issues) from using these commercial APIs to create
datasets programmatically and training a cloned model from that?

I'm curious what the profit margins are on these APIs because I think they are
way overpriced.

------
konradzikusek
I really hope you haven’t picked a dataset that features only female nudity,
but the sample images suggest you have:
[https://dataturks.com/projects/Mohan/NSFW(Nudity%20Detection...](https://dataturks.com/projects/Mohan/NSFW\(Nudity%20Detection\)%20Image%20Moderation%20Datatset)

~~~
mohi13
:) There are a few examples of males as well.

~~~
konradzikusek
No, there are not.
[http://jsbin.com/bewinalezu/1/edit?html,js,output](http://jsbin.com/bewinalezu/1/edit?html,js,output)

------
yaseen-rob
I tried out various image labeling APIs, including Google Vision (Safe Search)
for exactly this use case (moderation). I was honestly astonished at the
pricing of these APIs. Google is somewhere around 1.50€ per 1000 images, which
is, imo, very expensive. I tried out the default models that come with
TensorFlow, but they are trained on scientific datasets which typically
involve species and flowers, so no luck there either. Any good tips for
pre-trained models that solve this (for TensorFlow)?

~~~
dannyw
Download 1000 NSFW images, 1000 racy images and 100 completely SFW images.
Train a model and publish it on GitHub?

~~~
mohi13
Would be really interested to see the results of this.

Also, consider using Dataturks to create and host the dataset.

------
symisc_devel
For those interested, PixLab lets you analyze 50K images or video frames via
its /nsfw endpoint for $25. They charge $0.9 per 1000 requests after you reach
this quota.

[https://pixlab.io/cmd?id=nsfw](https://pixlab.io/cmd?id=nsfw)

[https://pixlab.io/pricing](https://pixlab.io/pricing)
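
For a rough idea of how that pricing scales, here is a small sketch based
only on the figures quoted above. It assumes overage is billed pro rata per
request and ignores any other plan details; the function name is made up for
illustration.

```python
def pixlab_cost_usd(
    requests: int,
    included: int = 50_000,         # requests covered by the base price
    base_price: float = 25.0,       # USD, as quoted above
    overage_per_1000: float = 0.9,  # USD per 1000 requests past the quota
) -> float:
    """Estimate the bill for a number of /nsfw requests."""
    extra = max(0, requests - included)
    return base_price + extra / 1000 * overage_per_1000


for n in (50_000, 150_000, 1_000_000):
    print(n, round(pixlab_cost_usd(n), 2))
```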

------
jaequery
What is the verdict? I've just been scrolling down to see which is best and
can't find any.

------
edent
The sample images shown only feature white women. Is that a limitation of the
dataset?

~~~
specializeded
Tila Tequila is Vietnamese, and features in the majority of the sample images
shown.

~~~
ummonk
She's also a white nationalist and neo-Nazi...

~~~
invalidusernam3
Not relevant to the conversation

------
mohi13
BTW, we have a general question for anyone who can help: does hosting such
a dataset cause issues with SEO etc.? Anything else we should be aware of?

~~~
judge2020
Probably wouldn't be wise to host a dataset like this with "example" images
embedded or linked where a search engine could find them.

~~~
mohi13
Thanks, I'll see how I can block crawling on this dataset. BTW, how does it
hurt exactly? I couldn't find much except in the case of safe-search mode.
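
One common way to block crawling is a robots.txt rule. The sketch below uses
Python's standard `urllib.robotparser` to check that a rule covers the
dataset pages before deploying it. The `/projects/Mohan/` prefix is an
assumption taken from the dataset link earlier in the thread, and the example
page URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask all crawlers to stay out of the dataset pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /projects/Mohan/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_crawlable(ROBOTS_TXT, "https://dataturks.com/projects/Mohan/example"))
print(is_crawlable(ROBOTS_TXT, "https://dataturks.com/blog/image-moderation-api-comparison.php"))
```

Note that robots.txt is only a request to well-behaved crawlers; it does not
restrict access to the pages themselves.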

