Show HN: Quickly build a production ready image classifier from your images (machinebox.io)
142 points by matryer 11 months ago | 20 comments

I think it would be worthwhile for HN to come up with some sort of rule or strong guideline at least that says if you are going to post a Show HN item, you should pop in and answer questions in a timely manner. There are many great questions here going unanswered 18 hours later so far.

ObML: It would be nice if the tutorial covered categories that aren't already in the standard big generic models. For example, instead of dog and cat, use something new that doesn't exist in the model, and show how to train for that.

Also as feedback, the terminology "submit your images" clashes with the statement that data doesn't leave your own system. Does the latter statement mean "in normal use after training is completed"? As opposed to during training, where the images do leave your system? There is wiggle room for either meaning here, so more clarity would be helpful.

>Teaching essentially involves opening each image, converting it into a base64 string, and submitting it to the /classificationbox/teach API endpoint in a request that resembles this.

Why base64?

My assumption would be because base64 is a very common format to create a textual representation of binary data.

Yes but why represent data this way when HTTP can already handle binary? Does base64 actually compress binary data enough to make these optimized requests? I'm genuinely asking because I have not googled it yet.

Base64 doesn't compress at all. In fact, it's only 75% as efficient as binary, meaning that encoding a file into Base64 increases its size by a third.

Base64 uses a 64 character alphabet (hence the name), so you can only represent 64 different values within each byte. 64 == 2^6, so basically you are using 6 bits out of every byte and losing the other 2 bits.
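
To make the 6-bits-per-byte arithmetic concrete, here is a minimal Python sketch using only the standard library. The random bytes are just a stand-in for image data (an assumption for illustration, not something taken from the tutorial).

```python
import base64
import os

# Stand-in for binary image data (illustrative assumption).
raw = os.urandom(3000)
encoded = base64.b64encode(raw)

# Every 3 input bytes become 4 output characters, so the size grows by one third.
print(len(raw), len(encoded))   # 3000 4000
print(len(raw) / len(encoded))  # 0.75
```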

HTTP can use compression to reduce the size of the Base64 representation, but assuming the images were already in a compressed format, all this does is mitigate the inefficiency; it won't be more efficient overall than sending the binary directly.
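
A rough way to check this yourself, sketched in Python. The assumptions here: random bytes stand in for an already-compressed image, and `gzip.compress` stands in for whatever compression the HTTP layer applies. The gzipped Base64 should land close to the raw size, but not below it.

```python
import base64
import gzip
import os

# Random bytes approximate an already-compressed image (illustrative assumption).
raw = os.urandom(30000)
encoded = base64.b64encode(raw)

# Compress the Base64 text, as transport-level compression would.
gzipped = gzip.compress(encoded)

# Compare the three sizes: raw binary, Base64 text, gzipped Base64 text.
print(len(raw), len(encoded), len(gzipped))
```

The compression claws back most of the one-third overhead, yet the result is still slightly larger than the original binary.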

It's worth noting that for highly-compressible types of files, if you do need to use base64 for some reason it's still more space-efficient to compress first (before base64). If you can compress to 2/3 the size, base64 will only bring it back to the original size again.

This mainly applies when you have to transmit binary data using a text-based format like JSON or XML. I've also used this before to build a shell-script-based installer, where a compressed archive is base64'd in a variable.
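
A small sketch of the compress-then-encode order in Python. The repetitive CSV-like payload is made up purely for illustration, and `zlib` is just one possible compressor.

```python
import base64
import zlib

# Made-up, highly compressible payload (illustrative assumption).
payload = b"label,count\n" + b"cat,1\ndog,2\n" * 500

# Option A: Base64 the data as-is.
plain_b64 = base64.b64encode(payload)

# Option B: compress first, then Base64 the compressed bytes.
compressed_b64 = base64.b64encode(zlib.compress(payload))

print(len(payload), len(plain_b64), len(compressed_b64))

# The receiver reverses the steps: Base64-decode, then decompress.
restored = zlib.decompress(base64.b64decode(compressed_b64))
```

The resulting text is safe to embed in JSON, XML, or a shell variable, and for data like this it ends up smaller than the original payload.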

No, base64 is normally about 33% larger than binary to represent the same data (4 output characters for every 3 input bytes).

It is interesting that the words "production ready" can be used in connection to image classifiers. Google is unarguably the leader of this in the world, and yet Google photos search for cats shows me dogs and a search for dogs shows me cats and sometimes rats. We're probably still a few years away from production-ready image classifiers of any sort.

The definition of "production ready" for machine learning is when it's at an accuracy level that is high enough that it becomes "useful" for some sense of the word. Machine learning models will almost never achieve 100% accuracy, so it comes down to what a "useful accuracy" level is to you.

Google doesn't run those images through a pixel-level ConvNet. Google's algorithm uses a combination of the HTML info surrounding the image and the image's popularity. Running every image on the web through a CNN would be computationally infeasible.

ConvNets for image classification (like what the article is about) often achieve better-than-average-human performance

> better-than-average-human performance

This is false, except under some contrived tests. NNs still aren't very good at handling rotations, can be confused by changing single pixels, and often recognise shapes in pure static.

It DOES run the photos that you store in Google Photos through one (at least according to public statements).

This looks great!

But it's not entirely clear from the site: what would the pricing be to classify, say, 20,000 images from 20 different folders?

This is neat, but sadly useless for what I'd imagine many of us have: pictures of people we want to classify, possibly by event.

Classifying events is often most easily done by looking at the metadata of the image, so converting just the image to base64 is probably a bad idea. (Specifically, I'd expect the date/time and possibly location data to be of help.)

Classifying the people is a helluva challenge. As someone in a family of 6, I don't think machine classification has much of a chance of getting pictures of my kids right, especially if it doesn't have the dates of the pictures. I've seen some pictures of my oldest daughter that I would swear are my youngest son.

Give it a try.

Many AI classes talk about the fact that metadata are less often relevant than we think and are often embedded in the image in a way the classifier easily learns.

At an event, the classifier will probably spot a few architectural elements that make it unnecessary to rely on the time of the picture.

On my first attempt at image classification with deep learning, I was surprised at how a very badly designed, non-custom-tailored network managed to outperform a classifier that had several days of work put into it.

I'd say give it a try anyway, you could be surprised.

As I said, there are some pictures where the only things in them are my kids. The lighting was such that they are literally all that is in the picture. And I'm talking about pictures taken at 3 months old. They look identical. Not surprisingly, even the well-trained classifiers of Google and Amazon get them wrong.

None of this was to say this isn't impressive. It truly is. I don't know why my bar for the magic of image recognition is so high.

Looks great! Uploaded to


ready to clone & run...

Would it be sensitive enough to pick out differences in wear patterns on the backs of playing cards? We play Rook daily at lunch, and there's a kitty, and we always want to know what the cards were. If you took a picture of the backs and let machine learning figure out what cards they were, we could know after the hand is done.

Do I need a lot of images for this to work? One of the biggest barriers to entry in machine learning is not having enough sample data to train on. I don't have 10,000 images of my family and friends. Would this still be useful?

Is that Docker image capable of utilizing a GPU?
