It's hard to compete with Google on a task like image classification when Google has immense computational resources, tons of data, and hordes of top researchers.
But that is not the case, which suggests Google isn't that good at productizing things.
When you open an email with images, Google will proxy the request for the tracking image and then cache it. If each user has a unique tracking image, you know when it was opened. Google does not cache the image before the first open, so you still know that it was opened.
What you possibly lose is repeat opens, which may be served from the cached image.
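A minimal sketch of how that per-recipient pixel works (the domain, framework, and storage here are made up for illustration): each recipient gets a unique pixel URL, and the first hit per token still reaches your server even through Google's proxy, since the proxy only caches after that first fetch.

    import uuid
    from datetime import datetime, timezone

    from flask import Flask, Response

    app = Flask(__name__)

    TOKENS = {}  # token -> recipient; in practice this lives in your mailing DB

    # Smallest valid transparent 1x1 GIF.
    PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
             b"!\xf9\x04\x01\x00\x00\x00\x00"
             b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"
             b"\x02\x02D\x01\x00;")

    def pixel_url_for(recipient):
        """Return the <img src=...> URL to embed in the outgoing email."""
        token = uuid.uuid4().hex
        TOKENS[token] = recipient
        return f"https://track.example.com/open/{token}.gif"  # made-up domain

    @app.route("/open/<token>.gif")
    def opened(token):
        # First request per token = first open, even behind Google's proxy.
        recipient = TOKENS.get(token)
        if recipient is not None:
            print(f"{datetime.now(timezone.utc).isoformat()} opened by {recipient}")
        return Response(PIXEL, mimetype="image/gif")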
That seems like a loss, but with this change Google turned images on by default. So you get loads of Gmail users loading images automatically, rather than the old behavior where many more people would read your message with images off.
MailChimp has a little write-up about it: https://blog.mailchimp.com/how-gmails-image-caching-affects-...
tl;dr: Google's change didn't stop marketers from knowing you opened the email, but did potentially block cookie sending, IP address information, referrer information, etc.
Unfortunately, it's looking more and more like it's going to be a competition on training data and raw computational power, and it's hard to compete with Google's corpora from the web, Gmail, CAPTCHAs, Maps, etc. -- not to mention Google's tremendous number-crunching resources.
And possibly, though for privacy reasons in a much more filtered-down form (see the word2vec Google News dataset). I find it unlikely for private data, though.
Could be any kind of niche, from specific industries (utilities, media, transport...) to specific use cases (I don't have many in mind, one could be https://sightengine.com). Ideas welcome :)
For example, Google probably could've made Reader profitable, but it was not on track to be used by a billion people. Relative to Google's overall business, it was uninteresting to them. It might've been a great startup, though.
IBM Watson has a suite of vision APIs available that have some similar features.
For example, the demo at http://vision.alchemy.ai/#demo has example images that demonstrate facial detection and identification, label extraction, object identification, and so on.
Another demo at http://visual-insights-demo.mybluemix.net/ uses the Visual Insights API to identify a set of relevant tags.
And the recently released Visual Recognition API allows you to train the model with your own dataset. Demo: http://visual-recognition-demo.mybluemix.net/
Disclosure: I am an evangelist for the Watson Developer Cloud suite of services at IBM.
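If you're curious what training a custom classifier looks like over the wire, here's a rough sketch in Python (endpoint, parameter names, and version string follow IBM's v3 REST docs; the exact values may differ for the release discussed here, and the zip file names are hypothetical):

    import requests

    # Each hypothetical archive holds example images for one class.
    URL = "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classifiers"
    with open("beagles.zip", "rb") as pos, open("not_dogs.zip", "rb") as neg:
        resp = requests.post(
            URL,
            params={"api_key": "YOUR_API_KEY", "version": "2016-05-20"},
            data={"name": "dogs"},
            files={
                "beagle_positive_examples": pos,  # positives for "beagle"
                "negative_examples": neg,         # shared negatives
            },
        )
    print(resp.json())  # includes a classifier_id to use with /v3/classify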
Some feedback though:
- As an independent developer, pricing is important to me. It was very difficult to find pricing for the Watson APIs (apparently it was in Bluemix?), and if I hadn't been a little more determined (thanks to the ability to train my own classifier), I wouldn't have persevered.
- If I already have a wealth of labelled data (I do), it seems difficult to train a new classifier for the Visual Recognition service. If I have 200,000 images, each with an average of 20 labels (from a set of ~2,000 labels), gathering positive and negative samples per label is very time- and bandwidth-consuming: I'd have to train ~2,000 classifiers using ~5,000 images per classifier (for plenty of training data), for a total of ~10,000,000 uploads. It'd be far nicer to upload a folder of JPEGs with a JSON blob per file containing labels (or a classifier name) and have Watson derive positive and negative samples from it (see the sketch below).
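To sketch what I mean (this sidecar manifest format is purely my invention, not something Watson supports): one small JSON file per image carrying its labels, so labels ride along with a single upload per JPEG.

    import json
    import pathlib

    def write_sidecars(folder, labels_by_file):
        """labels_by_file: {'img_00001.jpg': ['beach', 'sunset', ...], ...}"""
        pathlib.Path(folder).mkdir(exist_ok=True)
        for name, labels in labels_by_file.items():
            sidecar = pathlib.Path(folder, name).with_suffix(".json")
            sidecar.write_text(json.dumps({"labels": labels}))

    # Upload cost, roughly:
    #   sidecar scheme: ~200,000 uploads (one per JPEG, labels ride along)
    #   current scheme: ~2,000 classifiers x ~5,000 images = ~10,000,000

    write_sidecars("photos", {"img_00001.jpg": ["beach", "sunset"]})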
As a work-around for mass uploading, I might suggest signing up for a 30-day free Bluemix trial. You could then upload your data to a container and script the creation of sample archives and uploads from there.
(I would also say the exact same thing for Microsoft and its Project Oxford APIs, whoever here is working as its evangelist.)
Disclaimer: I'm CEO & CTO of Imagga.
Also wonder if they'll bring the ability to train your own classifiers using their networks...
Funnily enough, I just answered a similar question a few days ago here: http://kaptur.co/10-questions-for-a-founder-imagga/
The bottom line is that we believe we can provide a much better API service with a competitive level of technical precision.
At the same time, we put a lot of emphasis on specific things like custom categorization training and enterprise/on-premise installations (both of which differ from custom software).
Actually, we don't plan to retreat into a niche market, though some people suggest that as the proper strategy. We'll give them a good run for their money on the broad use case.
I believe that the "Hacker" culture works in our favour in this case and I hope you help us prove it :)
But this data shouldn’t belong to a corporation.
It's unacceptable that Google can use this dataset, and its monopoly in this online market, as a competitive advantage in the self-driving car market.
Btw, black hats have no demand at all for captcha-solving via trained solvers – instead, hiring people from Bangladesh is far cheaper and faster.
Anyone who has successfully signed up: where is the user account ID?
Is anyone else having the same issue / know where I can find this ID?
This custom categorization is actually one of our most requested services at Imagga - here is some info if you are interested - http://imagga.com/solutions/custom-categorization.html
But this is the nature of machine learning algorithms: the supervision process, the visibility into how feedback affects the model, and the quality of the training set all shape the outcome. At a lesser company, the problem could be as simple as having very few black people represented in the training set, so that when the algorithm sees a dark-colored human-like shape, it is more "likely" that the shape is a gorilla (which is human-like and pretty much always has dark fur) than a human, because the algorithm was trained mostly on light-colored humans. The Google Photos algorithm obviously takes in more kinds of input and factors besides visual composition, so there was probably more to it than this.
Or maybe not... who knows? I'm not interested in reviving a discussion about the importance of diversity in the engineering workforce, but this is one kind of problem that can slip past the most competent and well-intentioned engineers simply because they're less aware of how disenfranchisement can propagate into technical problems, no matter how correct and powerful the math behind the algorithm is.
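A toy sketch of that training-set effect (a single synthetic feature, nothing like a real vision pipeline): a classifier fit on data where dark-skinned people are barely represented assigns that whole region of feature space to the wrong class.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # One made-up "darkness" feature; class 0 = human, class 1 = gorilla.
    humans_light = rng.normal(0.2, 0.1, 980)  # many light-skinned examples
    humans_dark = rng.normal(0.7, 0.1, 20)    # very few dark-skinned examples
    gorillas = rng.normal(0.8, 0.1, 500)

    X = np.concatenate([humans_light, humans_dark, gorillas]).reshape(-1, 1)
    y = np.array([0] * 1000 + [1] * 500)

    clf = LogisticRegression().fit(X, y)
    # A dark-skinned human (feature ~0.75) lands on the wrong side of the
    # decision boundary because the training data barely covered that region.
    print(clf.predict([[0.75]]))  # -> [1], i.e. classified as "gorilla"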
Another example from a few years back was when HP released an auto-tracking webcam that became infamous after a black retail employee uploaded a YouTube video of how the camera ignored him but not his white co-worker:
I'm in 100% agreement that this was likely not HP's intentional fault, and also that face detection for darker complexions is computationally harder than for lighter complexions because of how the algorithm uses contrast... but I most definitely know that if I were an HP engineer, and the CEO and/or my direct boss were black and had tried out a prototype that behaved as it does in the aforementioned YouTube video, there is almost no fucking way the product would have been released as-is, with my excuse being "Well, accurately detecting black faces requires a much more complicated training set -- that's just how math works!"
This is why you shouldn't just train your system to hit higher accuracy figures but also investigate the type of errors it's making. This needs to be done while thinking about your specific use case and domain.
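For example (a generic sketch, not any particular vendor's tooling), a per-class error breakdown surfaces exactly the failure mode above even when aggregate accuracy looks fine:

    from sklearn.metrics import confusion_matrix, classification_report

    # Toy predictions: aggregate accuracy is 6/8 = 75%, which hides that the
    # only errors are the catastrophic person -> gorilla kind.
    y_true = ["person"] * 6 + ["gorilla"] * 2
    y_pred = ["person", "person", "gorilla", "person", "gorilla", "person",
              "gorilla", "gorilla"]

    print(confusion_matrix(y_true, y_pred, labels=["person", "gorilla"]))
    print(classification_report(y_true, y_pred))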
If they'd had an African-American engineer working on the product, they would have detected this during development.