
Show HN: Quickly build a production ready image classifier from your images - matryer
https://blog.machinebox.io/how-anyone-can-build-a-machine-learning-image-classifier-from-photos-on-your-hard-drive-very-5c20c6f2764f
======
natch
I think it would be worthwhile for HN to come up with some sort of rule or
strong guideline at least that says if you are going to post a Show HN item,
you should pop in and answer questions in a timely manner. There are many
great questions here going unanswered 18 hours later so far.

ObML: It would be nice if the tutorial covered categories that aren't already
in the standard big generic models. For example, instead of dog and cat, use
something new that doesn't exist in the model, and show how to train for that.

Also as feedback, the terminology "submit your images" clashes with the
statement that data doesn't leave your own system. Does the latter statement
mean "in normal use after training is completed"? As opposed to during
training, where the images do leave your system? There is wiggle room for
either meaning here, so more clarity would be helpful.

------
aisofteng
>Teaching essentially involves opening each image, converting it into a base64
string, and submitting it to the /classificationbox/teach API endpoint in a
request that resembles this.
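A rough sketch of what that flow might look like in Python (the field names in the payload are assumptions for illustration, not Classificationbox's documented schema):

```python
import base64
import json

def build_teach_payload(path, label):
    """Build a JSON body for a teach call: read the image file,
    base64-encode its bytes, and wrap them with a class label.
    The "class"/"inputs" field names are guesses, not the real schema."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "class": label,
        "inputs": [{"key": "image", "type": "image_base64", "value": encoded}],
    })

# The resulting body would then be POSTed to the
# /classificationbox/teach endpoint described in the article.
```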

Why base64?

~~~
spooneybarger
My assumption would be because base64 is a very common format to create a
textual representation of binary data.

~~~
samhunta
Yes but why represent data this way when HTTP can already handle binary? Does
base64 actually compress binary data enough to make these optimized requests?
I'm genuinely asking because I have not googled it yet.

~~~
JimDabell
Base64 doesn't compress at all. In fact, it's only 75% as efficient as binary,
meaning that encoding a file as Base64 increases its size by a third.

Base64 uses a 64 character alphabet (hence the name), so you can only
represent 64 different values within each byte. 64 == 2^6, so basically you
are using 6 bits out of every byte and losing the other 2 bits.

HTTP can use compression to reduce the size of the Base64 representation, but
assuming the images were already using a compressed format, all this is going
to do is mitigate the inefficiency, it's not going to be more efficient
overall.
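The 4/3 overhead is easy to verify with a throwaway sketch (standard library only):

```python
import base64

raw = bytes(range(256)) * 4      # 1024 bytes of arbitrary binary data
encoded = base64.b64encode(raw)

# Every 3 input bytes become 4 output characters (6 usable bits per byte),
# so the encoded form is a third larger: 1024 -> 1368 (including padding).
print(len(raw), len(encoded))    # 1024 1368
```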

~~~
gregmac
It's worth noting that for highly-compressible types of files, if you do need
to use base64 for some reason, it's still more space-efficient to compress
first (before base64). If you can compress to 3/4 the size, base64 will only
bring it back to the original size again.

This mainly applies when you have to transmit binary data using a text-based
format like JSON or XML. I've also used this before to build a shell-script-
based installer, where a compressed archive is base64'd in a variable.
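The compress-first approach can be demonstrated with the standard library (zlib stands in here for whatever compressor you'd actually use):

```python
import base64
import zlib

# A highly compressible payload: repetitive text, 10,000 bytes.
data = b"the quick brown fox " * 500

naive = base64.b64encode(data)                 # base64 alone: size grows by 1/3
smart = base64.b64encode(zlib.compress(data))  # compress first, then base64

# For repetitive data, compress-then-encode ends up far smaller than
# either the raw payload or the naively encoded one.
print(len(data), len(naive), len(smart))
```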

------
dmitrygr
It is interesting that the words "production ready" can be used in connection
to image classifiers. Google is unarguably the leader of this in the world,
and yet Google photos search for cats shows me dogs and a search for dogs
shows me cats and sometimes rats. We're probably still a few years away from
production-ready image classifiers of any sort.

~~~
omalleyt
Google doesn't run those images through a pixel-level ConvNet; Google's algo
uses a combo of the HTML info surrounding the image and the image's
popularity. Running every image on the web through a CNN would be
computationally infeasible.

ConvNets for image classification (like what the article is about) often
achieve better-than-average-human performance

~~~
akvadrako
_> better-than-average-human performance_

That is false, except under some contrived tests. NNs still aren't very good
at handling rotations, can be confused by changing single pixels, and often
recognise shapes out of complete static.

------
pitchups
This looks great!

But it's not entirely clear from the site: what would the pricing be to
classify, say, 20,000 images from 20 different folders?

------
taeric
This is neat, but sadly useless for what I'd imagine many of us have:
pictures of people we want to classify, and possibly of events.

Classifying events is often most easily done by looking at the metadata of the
image, so base64-encoding just the image data (and discarding the metadata) is
probably a bad idea. (Specifically, I'd expect the date/time and possibly
location data to be of help.)

Classifying the people is a helluva challenge. As someone in a family of 6, I
don't think machine classification has much of a chance on getting pictures of
my kids right. Especially if it doesn't have the dates on the pictures. I've
seen some pictures of my oldest daughter that I would swear are my youngest
son.

~~~
Iv
Give it a try.

Many AI courses make the point that metadata is less relevant than we tend to
think, and that the useful signal is often embedded in the image itself in a
way the classifier easily learns.

For an event, the classifier will probably spot a few architectural elements
that make it unnecessary to rely on the timestamp of the picture.

On my first attempt at image classification with deep learning, I was
surprised at how a badly designed, non-custom-tailored network managed to
outperform a classifier that had several days of work put into it.

I'd say give it a try anyway, you could be surprised.

~~~
taeric
As I said, there are some pictures where the only thing in them are my kids.
Lighting was such that they are literally all that is in the picture. And I'm
talking about 3 month old pictures. They look identical. Not surprisingly,
even the well trained classifiers of Google and Amazon get them wrong.

None of this was to say this isn't impressive. It truly is. I don't know why
my bar for the magic of image recognition is so high.

------
smortaz
Looks great! Uploaded to

https://notebooks.azure.com/smortaz/libraries/hvasslab-TF-tutorials

ready to clone & run...

------
jvm_
Would it be sensitive enough to pick out differences in wear patterns on the
backs of playing cards? We play Rook daily at lunch, and there's a kitty, and
we always want to know what the cards were. If you took a picture of the backs
and let machine learning figure out what cards they were, we could know after
the hand is done.

------
statictype
Do I need a lot of images for this to work? One of the biggest barriers to
entry in machine learning is not having enough sample data to train on. I
don’t have 10,000 images of my family and friends. Would this still be useful?

------
amelius
Is that Docker image capable of utilizing a GPU?

