I hope that we will soon see open-source versions of these functionalities, because I don't like the idea of using an external service for reasons including privacy, reliability and latency.
The problem with machine learning software often isn't a lack of openly available algorithms. Those are well-researched and well-documented.
The problem in most cases is a lack of openly available training data. No open source library helps you with that. It's the data itself that needs to be open-sourced. Companies like Google and Amazon naturally have a huge amount of relevant data but they aren't exactly eager to share that data with the world (which by the way would have huge privacy implications as well).
I'm not sure if open sourcing the data is absolutely necessary to reproduce the results given access to the prediction API. "Stealing Machine Learning Models via Prediction APIs" (https://arxiv.org/abs/1609.02943) examines techniques in this space. The API might also be used in adversarial training. The underlying data would be useful for advancing research, but not necessarily a prerequisite for reproducing the API.
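To make the idea concrete, here's a toy version of the simplest case from that paper: if the hosted model happens to be linear, a handful of probe queries recovers it exactly. (black_box_predict below is a made-up stand-in for the remote API, not anything Amazon actually exposes.)

    # Sketch: recovering a black-box *linear* model purely from its prediction API.
    # `black_box_predict` is a hypothetical stand-in for the remote endpoint.
    import numpy as np

    rng = np.random.default_rng(0)
    true_w, true_b = np.array([2.0, -1.0, 0.5]), 3.0

    def black_box_predict(X):
        # Pretend this is a network call to the hosted model.
        return X @ true_w + true_b

    # Query the API with d+1 (or more) probe points, then solve for the weights.
    d = 3
    X_probe = rng.normal(size=(d + 1, d))
    y_probe = black_box_predict(X_probe)

    A = np.hstack([X_probe, np.ones((d + 1, 1))])   # append a column for the bias
    w_hat, *_ = np.linalg.lstsq(A, y_probe, rcond=None)

    print(w_hat)  # ~ [2.0, -1.0, 0.5, 3.0] -- weights and bias recovered exactly

Real services obviously serve far more complex models, but the paper shows extraction-style attacks scale well beyond this toy case.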
AWS wins not because they have created an impenetrable black box, but because reverse engineering (and then maintaining) an equivalent isn't cost-effective for the majority of users.
My area of expertise isn't AI, so correct me if this is wrong, but isn't the important part the model? Would it be possible to open-source the model without revealing the data which it was trained on?
(For applications where the problem you're trying to solve is relatively similar to the ones Apple/Google/Amazon are trying to solve - conversational AI)
At best you can partly do this, and even that is debatable.
Here's a simple example: say Google builds a linear regression for the incremental likelihood of clicking a social post based on the distance of the poster from you in the social network. By merely revealing the (single) coefficient in this model, they've revealed something about the underlying data set that they may not have intended.
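Concretely (with made-up numbers): for simple least squares the slope is literally cov(x, y) / var(x), so publishing the coefficient publishes a ratio of two statistics of the training set.

    # The slope of a univariate least-squares fit *is* a dataset statistic:
    # beta = cov(distance, clicked) / var(distance).
    import numpy as np

    rng = np.random.default_rng(1)
    distance = rng.integers(1, 6, size=1000).astype(float)      # hops in the social graph (made up)
    clicked = 0.5 - 0.08 * distance + rng.normal(0, 0.1, 1000)  # synthetic click propensity

    beta = np.cov(distance, clicked, bias=True)[0, 1] / np.var(distance)
    slope = np.polyfit(distance, clicked, 1)[0]

    print(beta, slope)  # the two agree: the "model" is a fact about the data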
It gets exponentially (I use this term deliberately) more difficult with multivariate nonlinear models. I have no doubt that substantial information about a data set could be reverse engineered from complex ML models, even if we don't yet know how. But even in the case of basic multivariate regression, the model directly encodes both the mean values and the covariances of all the variables. So that's a fair amount of disclosure already.
In some sense all models are (usually lossy) data compression, and it's just a matter of understanding the compression logic to go backwards from facts about the model to facts about the source data.
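To illustrate that last point with a toy multivariate case (synthetic data, plain numpy): the normal equations give beta = Cov(X)^-1 Cov(X, y), so anyone holding the coefficients plus the feature covariances (often public or guessable) can read off how every feature covaries with the target, without ever seeing it.

    # For OLS, beta = Cov(X)^-1 Cov(X, y): the coefficients plus the feature
    # covariance matrix hand you the covariance of each feature with the target.
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(5000, 3))
    y = X @ np.array([1.5, -2.0, 0.3]) + rng.normal(0, 0.5, 5000)

    # "Published" model: just the fitted coefficients (centered data, no intercept).
    Xc, yc = X - X.mean(0), y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

    # Going backwards: Cov(X) @ beta recovers Cov(X, y) without ever touching y.
    cov_X = np.cov(Xc, rowvar=False)
    recovered = cov_X @ beta
    actual = np.array([np.cov(Xc[:, j], yc)[0, 1] for j in range(3)])

    print(recovered, actual)  # near-identical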
It says 'No, not unless you provide us permission to do so.'
How is permission provided? I mean, I don't foresee people calling up Amazon and saying, "Hey, dataset 2001-G? Yeah, you have permission to use that to train on!"
So I guess I would expect Amazon to ask you when you submit your training data, but how do they ask? Is it opt-in or opt-out, and so forth? I would expect them to make it easy to share the data with them and difficult to opt out.
Look at https://deepdetect.com for an open-source server that covers most of the functionality needed to build a personal deep-learning-as-a-service API platform. It's not perfect, but it does a good job at image classification, object detection, and character-based text processing. There are actually AMIs available.
That's the big secret of AI. 95% of the "AI" at companies like Google and Facebook is rules: programmed by humans, sequenced by humans, curated by humans. A relatively close open-source example would be SpamAssassin.
Now, these systems do use advanced AI algorithms (Bayesian classifiers and so on), but each is one factor out of hundreds, and they are nowhere near the top.
Now at Google and Facebook, the rules are like the rules of a firewall. Some new type of spam comes out, it gets past the filters, they look at it, they write a new rule. After 10 years of this, with 50 people doing it full-time, you have thousands of rules and it doesn't look like a set of rules anymore. This is why those companies don't get blown out of the water every time a PhD figures out how to make the basic algorithm perform 5% better.
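For anyone who hasn't looked inside something like SpamAssassin, the core loop really is that simple: a pile of hand-written rules, each with a hand-tuned score, summed against a threshold. A toy sketch (invented rules and scores, not SpamAssassin's actual syntax):

    # Toy SpamAssassin-style scorer: hand-written rules accumulate over the years,
    # each test adds (or subtracts) a hand-tuned score, and a threshold decides.
    import re

    RULES = [
        # (name, compiled pattern, score) -- invented examples, not real rules
        ("VIAGRA_SUBJECT", re.compile(r"viagra", re.I), 2.5),
        ("ALL_CAPS_SUBJECT", re.compile(r"^[A-Z\s!]{12,}$"), 1.2),
        ("KNOWN_NEWSLETTER", re.compile(r"list-unsubscribe", re.I), -1.0),
    ]
    THRESHOLD = 3.0

    def score(message: str) -> float:
        return sum(s for _, pattern, s in RULES if pattern.search(message))

    def is_spam(message: str) -> bool:
        return score(message) >= THRESHOLD

    # A new spam campaign slips through? Someone just appends another rule.
    RULES.append(("FAKE_PACKAGE_NOTICE", re.compile(r"your parcel is waiting", re.I), 2.0))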
They don't really use the newest algorithms at all. With Gmail it's obvious, since some of the tricks it knows are things a machine learning algorithm could never figure out on its own, for instance how to look up flight times or package status. No amount of training on emails will ever yield that. Furthermore, they have custom "cards" and other small custom UIs that get triggered by their "AI" classification.
Of the remaining 5%, 4.95% follows the same principle, but basic algorithms can adapt it a bit: human-designed rules, where a linear or logistic regression over the data of the past 5 minutes (or hours, days, weeks, months) can make a rule 10% stricter or looser.
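Here's a rough sketch of what I mean, with invented names and numbers (scikit-learn used only for the fit): the rule and its baseline threshold stay human-written, and a small regression over the recent window is only allowed to nudge the threshold within 10%.

    # Sketch: keep a human-written rule, but let a tiny logistic regression over
    # the most recent window nudge its threshold up or down by at most 10%.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    BASE_THRESHOLD = 3.0  # the value a human originally chose

    def tuned_threshold(recent_scores: np.ndarray, recent_labels: np.ndarray) -> float:
        """recent_scores: rule scores from the last window; recent_labels: 1 = spam."""
        clf = LogisticRegression().fit(recent_scores.reshape(-1, 1), recent_labels)
        # Score at which the model says p(spam) = 0.5: solve w*x + b = 0.
        crossover = -clf.intercept_[0] / clf.coef_[0][0]
        # The human rule stays in charge; the data may only move it +/- 10%.
        return float(np.clip(crossover, 0.9 * BASE_THRESHOLD, 1.1 * BASE_THRESHOLD))

    # Usage with made-up recent traffic:
    rng = np.random.default_rng(3)
    scores = rng.normal(3.2, 1.0, 500)
    labels = (scores + rng.normal(0, 0.5, 500) > 3.3).astype(int)
    print(tuned_threshold(scores, labels))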
The only real "advanced" AI is speech-to-text and, in Google's case, image search.
I'd say just the opposite. A lot of people probably think they need deep learning and complex neural networks when they could create value with just linear / logistic regression. Note that I'm talking about simply delivering business value here, not doing ML research per se.
Since tools like Siri grow more useful as the crowd feeds the cloud and Siri learns from the masses, I wonder if the same will happen with Amazon. Will its speech recognition service evolve to reflect what the service itself has learned from its many clients? Will the service really have the same capabilities as Alexa, or will Amazon reserve the best part of the technology for itself?
Are there white papers on some of the algorithms they're using? For example, they mention getting a face similarity score, but there are quite a few ways that could be done.
The idea of lowering the bar for rolling out these kinds of tasks seems great, but I'd miss being able to think about the underlying models.
No standalone ASR? I am looking for an ASR as a Service for audio/video files where I can use my data to train my own models. AFAIK nobody's offering that yet.
I remember the individual products like Lex, Polly, and Rekognition being discussed on HN after they were announced. However, these were individual threads, not about this page.
Lex: https://news.ycombinator.com/item?id=13072813
Polly: https://news.ycombinator.com/item?id=13072944
Rekognition: https://news.ycombinator.com/item?id=13072956