
Amazon AI - jitbit
https://aws.amazon.com/amazon-ai/
======
merricksb
Discussed at time of launch 7 weeks ago:

Lex:
[https://news.ycombinator.com/item?id=13072813](https://news.ycombinator.com/item?id=13072813)

Polly:
[https://news.ycombinator.com/item?id=13072944](https://news.ycombinator.com/item?id=13072944)

Rekognition:
[https://news.ycombinator.com/item?id=13072956](https://news.ycombinator.com/item?id=13072956)

------
amelius
I hope that we will soon see open-source versions of these capabilities,
because I don't like the idea of relying on an external service, for reasons
including privacy, reliability, and latency.

~~~
BjoernKW
The problem with machine learning software often isn't a lack of openly
available algorithms. Those are well-researched and well-documented.

The problem in most cases is a lack of openly available training data. No open
source library helps you with that. It's the data itself that needs to be
open-sourced. Companies like Google and Amazon naturally have a huge amount of
relevant data but they aren't exactly eager to share that data with the world
(which by the way would have huge privacy implications as well).

~~~
bjacobel
My area of expertise isn't AI, so correct me if this is wrong, but isn't the
important part the model? Would it be possible to open-source the model
without revealing the data it was trained on?

(For applications where the problem you're trying to solve is relatively
similar to the ones Apple/Google/Amazon are trying to solve, e.g.
conversational AI.)

~~~
_dps
[AI/ML researcher/engineer here]

At best you can partly do this, and even that is debatable.

Here's a simple example: say Google builds a linear regression for the
incremental likelihood of clicking a social post based on the distance of the
poster from you in a social network. Merely by revealing the (single)
coefficient in this model, they've revealed something about the underlying
data set that they may not have intended.

It gets exponentially (I use this term deliberately) more difficult with
multivariate nonlinear models. I have no doubt that substantial information
about a data set could be reverse engineered from complex ML models, _even if
we don't yet know how_. But even in the case of basic multivariate
regression, the model directly encodes both the mean values and the
covariances of all the variables. So that's a fair amount of disclosure
already.

In some sense all models are (usually lossy) data compression, and it's just a
matter of understanding the compression logic to go backwards from facts about
the model to facts about the source data.
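
Here's a minimal sketch of that leak in the univariate case (hypothetical
data, just to illustrate the point):

    # The fitted slope is cov(x, y) / var(x), so publishing even this single
    # coefficient publishes a ratio of statistics about the training set.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)             # stand-in for "distance in the graph"
    y = 0.3 * x + rng.normal(size=1000)   # stand-in for "click likelihood"

    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # what OLS recovers
    beta = np.polyfit(x, y, 1)[0]                   # same number via a library fit
    print(slope, beta)  # both ~0.3: the model *is* a summary of the data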

------
MichaelBurge
Does anyone find the "Amazon Machine Learning" product actually useful?
Scalable linear/logistic regression seems like a pretty niche product.

~~~
feral
A great many practical ML systems actually delivering business value are just
scaled linear and logistic regressions (if not custom-coded rules!).

~~~
candiodari
That's the big secret of AI. 95% of the AI at companies like Google and
Facebook is rules: programmed by humans, sequenced by humans, curated by
humans. A relatively close open-source example would be SpamAssassin.

These systems do use advanced AI algorithms (Bayesian methods, etc.), but each
is one factor out of hundreds, and they are nowhere near the top.

At Google and Facebook, the rules are like the rules of a firewall. Some new
type of spam comes out, it gets past the filters, they look at it, they write
a new rule. After 10 years of this, with 50 people doing it full-time, you
have thousands of rules, and it no longer looks like a set of rules. This is
why those companies don't get blown out of the water every time a PhD figures
out how to make the basic algorithm perform 5% better.

They don't really use the newest algorithms at all. With Gmail it's obvious,
since some of the tricks it knows are things a machine-learning algorithm
could never figure out on its own, for instance how to look up flight times
or package status. No amount of training on mail will ever yield that.
Furthermore, they have custom "cards" and other small custom UIs that get
triggered by their "AI" classification.

Of the remaining 5%, 4.95% is still the same principle, but basic algorithms
can adapt it a bit: human-designed rules, where a linear or logistic
regression over the data of the past 5 (minutes, hours, days, weeks, months)
can make a rule 10% stricter or looser. A toy sketch of that pattern follows.
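
Something like this (all names and numbers hypothetical): the rule itself
stays hand-coded, and a logistic regression over recent labeled traffic only
adjusts how strict it is.

    # Toy sketch of "human-written rule, machine-tuned threshold".
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def rule_score(msg: dict) -> float:
        # Hand-written rule: "many links + unknown sender looks spammy"
        return 0.7 * msg["link_count"] + 0.3 * (not msg["known_sender"])

    # Recent traffic: rule scores and whether each message really was spam.
    recent_scores = np.array([[0.3], [0.7], [1.0], [1.3], [0.1], [1.7]])
    was_spam = np.array([0, 0, 1, 1, 0, 1])
    clf = LogisticRegression().fit(recent_scores, was_spam)

    def is_spam(msg: dict) -> bool:
        # The regression recalibrates the fixed rule score, effectively
        # loosening or tightening the hand-picked threshold over time.
        return clf.predict_proba([[rule_score(msg)]])[0, 1] > 0.5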

The only real "advanced" AI is speech-to-text and, in Google's case, image
search.

------
hellofunk
Since tools like Siri grow more useful as the crowd feeds the cloud and Siri
learns from the masses, I wonder if the same will happen with Amazon. Will
its speech recognition service evolve to reflect what the service itself has
learned from its many clients? Will the service have the same capabilities as
Alexa, or will Amazon reserve the best parts of the technology for itself?

------
closed
Are there white papers on some of the algorithms they're using? For example,
they mention getting a face similarity score, but there are quite a few ways
that could be done.

The idea of lowering the bar for rolling out these kinds of tasks seems great,
but I'd miss being able to think about the underlying models.
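
For instance, one common family of approaches (an assumption on my part;
Amazon doesn't document Rekognition's internals) is to embed each face with a
deep network and compare the embeddings with cosine similarity:

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # embed() would be a trained network (e.g. a FaceNet-style model);
    # here we just fake two 128-dim embeddings for illustration.
    rng = np.random.default_rng(1)
    emb_a = rng.normal(size=128)
    emb_b = emb_a + 0.1 * rng.normal(size=128)  # a "similar" face

    print(cosine_similarity(emb_a, emb_b))  # near 1.0 => similar faces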

~~~
erikbye
Look for papers on SGD optimizations and on loss functions for
classification.

------
braindead_in
No standalone ASR? I'm looking for ASR-as-a-Service for audio/video files,
where I can use my own data to train my own models. AFAIK nobody's offering
that yet.

~~~
andyjohnson0
ASR is Automatic Speech Recognition, yes?

------
PaulHoule
The positioning is very much like that of IBM's Watson brand.

------
ahier
Give me all your data ~ resistance is futile

------
anacleto
Not to be nitpicky, but it was announced months ago.

~~~
tomhoward
The first snapshot on Archive.org was about 6 weeks ago on Dec 6th. So, less
than 2 months.

And as far as my searches can dig up, it hasn't been discussed on HN yet.

~~~
techwizrd
I remember the individual products like Lex, Polly, and Rekognition being
discussed on HN after they were announced. However, those were individual
threads, not threads about this page.

There are a number of duplicates as well.

Lex:
[https://news.ycombinator.com/item?id=13072813](https://news.ycombinator.com/item?id=13072813)

Polly:
[https://news.ycombinator.com/item?id=13072944](https://news.ycombinator.com/item?id=13072944)

Rekognition:
[https://news.ycombinator.com/item?id=13072956](https://news.ycombinator.com/item?id=13072956)

