

Datasight.io – Machine Learning for the Masses - ncvc
http://datasight.io/

======
minimaxir
The flaw with "machine learning for the masses" is that the data still
requires preprocessing in order to derive meaningful, statistically
significant conclusions. That skill is as difficult and tedious as
implementing a machine learning algorithm itself.

More importantly, any valid ML analysis requires that the source data is
_good_, which is impossible to guarantee with this service.

~~~
agibsonccc
That's one of my biggest problems with any of the ML-as-a-service offerings.
You can't simply claim you can make things simpler for a lay audience when
there is still a lot of infrastructure required for handling data,
preprocessing/cleaning, etc. At that point you will need some sort of data
expertise anyway.

I've found most people who need machine learning usually want control of
it/hosting it in house.

~~~
muzakthings
Hi there, founder here. Great point. We actually create a wrapper that, once
you're finished, lets you simply cut and paste a line from a CSV to get
predictions, with preprocessing, scaling, dummy encoding, PCA, etc. all
included.

We'll also deploy to an HTTP endpoint, and you can call it and get a JSON
response.
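
For readers who want to picture that flow, here is a minimal sketch of "one CSV line in, JSON out". It is not Datasight's actual wrapper or API: the column names, fitted statistics, weights, and the `encode_row` / `predict_json` helpers are all hypothetical, PCA is left out for brevity, and a plain logistic model stands in for whatever the service trains.

```python
import csv
import io
import json
import math

# Hypothetical fitted state; a real service would learn these from training data.
COLUMNS = ["age", "plan", "monthly_spend"]
NUMERIC_STATS = {"age": (40.0, 10.0), "monthly_spend": (50.0, 25.0)}  # (mean, std)
CATEGORIES = {"plan": ["free", "pro", "enterprise"]}
WEIGHTS = [0.5, -1.0, 0.2, 0.8, 0.3]  # one made-up weight per encoded feature
BIAS = -0.1


def encode_row(csv_line):
    """Parse one CSV line into a scaled, dummy-encoded feature vector."""
    values = next(csv.reader(io.StringIO(csv_line)))
    row = dict(zip(COLUMNS, (v.strip() for v in values)))
    features = []
    for name in COLUMNS:
        if name in NUMERIC_STATS:
            # Standardize numeric columns with the stored mean and std.
            mean, std = NUMERIC_STATS[name]
            features.append((float(row[name]) - mean) / std)
        else:
            # One-hot ("dummy") encode categorical columns.
            features.extend(1.0 if row[name] == c else 0.0 for c in CATEGORIES[name])
    return features


def predict_json(csv_line):
    """Return a JSON string with the encoded features and a probability."""
    features = encode_row(csv_line)
    score = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    probability = 1.0 / (1.0 + math.exp(-score))
    return json.dumps({"features": features, "probability": round(probability, 4)})
```

Calling `predict_json("50, pro, 75")` returns a JSON string; behind an HTTP endpoint, that string would be the response body.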

Does that address your concerns?

~~~
agibsonccc
It's great that you're doing all of this, but there still comes a time when
you're making a lot of assumptions. That said, I believe a good default
pipeline will work great for most use cases.

I know you guys are only targeting the very simple needs of people, which is
great. Keep in mind I'm far from your target audience, being almost a little
too deep in the machine learning side. Best of luck with it!

~~~
muzakthings
Making preprocessors for creating bag-of-words vectors from free text, a
special IP address feature processor, etc.: these are all ways we're
starting to make the automation smarter and more of a value add to experienced
folks like yourself.

What value add would you want to see?

~~~
agibsonccc
I usually roll my own.

I've found that certain things, especially NLP pipelines, are usually very
tricky to get right.

Let me give you one example from my day to day:

Sentence segmentation/tokenization ---> part of speech tagger ---> filter
words by part of speech and then rank by tfidf scores for topic relevance
scoring.
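
A rough sketch of that chain, for illustration only. The regex segmenter and the `TOY_POS` lexicon are stand-ins (assumptions) for a real sentence splitter and a trained part-of-speech tagger, and the tf-idf here is the plain `tf * log(N / df)` variant.

```python
import math
import re
from collections import Counter

# Toy lexicon standing in for a trained part-of-speech tagger (an assumption).
TOY_POS = {
    "model": "NOUN", "data": "NOUN", "churn": "NOUN", "users": "NOUN",
    "pipeline": "NOUN", "predicts": "VERB", "cleans": "VERB", "the": "DET",
    "a": "DET", "our": "DET", "raw": "ADJ",
}


def segment(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def keep_nouns(tokens):
    """Filter tokens by part of speech, keeping only nouns."""
    return [t for t in tokens if TOY_POS.get(t) == "NOUN"]


def rank_terms(documents):
    """Return, per document, its nouns sorted by descending tf-idf."""
    noun_docs = [
        [t for s in segment(doc) for t in keep_nouns(tokenize(s))]
        for doc in documents
    ]
    n_docs = len(noun_docs)
    doc_freq = Counter(t for nouns in noun_docs for t in set(nouns))
    ranked = []
    for nouns in noun_docs:
        counts = Counter(nouns)
        total = len(nouns) or 1
        scores = {
            t: (c / total) * math.log(n_docs / doc_freq[t])
            for t, c in counts.items()
        }
        # Sort by score, breaking ties alphabetically for determinism.
        ranked.append(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
    return ranked
```

Each stage is a separate function, which is the point being made: swapping the tagger or the scoring changes the result, so it helps to control every step.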

Computer vision also has its own problems ---> binarize images for detection
of certain kinds of features vs needing colors, so different kinds of image
transformations for handling of different kinds of object recognition, or
scene detection.
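
To make the binarization step concrete, here is a small sketch on a nested-list image. The fixed threshold of 128 and the helper names are assumptions for illustration; real pipelines often pick the threshold adaptively.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to luminance (ITU-R BT.601 weights)."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=128):
    """Map each pixel to 1 if it is at least `threshold`, else 0."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray_image]
```

Binarizing discards color entirely, which is why a task that depends on color needs a different transformation.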

I also do churn prediction. By the time I'm done vectorizing users, why
should I take time out of my day to then upload all of this to an external
service when I could just run logistic regression, random forest, or what have
you locally? I typically calculate profit curves and the like as well.
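
As an illustration of the profit-curve idea, here is a sketch under simple assumptions: every contacted user costs a fixed amount, and every contacted user who really was going to churn yields a fixed benefit. Both values and the function names are hypothetical.

```python
def profit_curve(scores, labels, benefit_per_churner, cost_per_contact):
    """Cumulative profit from contacting the top-k users ranked by churn score.

    Element k-1 of the result is the profit of contacting the k users with
    the highest predicted churn scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    curve = []
    profit = 0.0
    for _, is_churner in ranked:
        profit += (benefit_per_churner if is_churner else 0.0) - cost_per_contact
        curve.append(profit)
    return curve


def best_cutoff(curve):
    """Return (k, profit) for the most profitable number of users to contact."""
    best_index = max(range(len(curve)), key=lambda i: curve[i])
    return best_index + 1, curve[best_index]
```

The scores can come from any locally trained classifier; the curve only needs predicted scores and true labels from a held-out set.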

Re: your word vectors and deep learning. I do this in the distributed deep
learning lib I wrote by hand[1]. In my word2vec computations, I have found
it takes a significant amount of data to tune right, so I usually want control
over this pipeline as well.

Again, not your target audience ;).

To give you some helpful feedback: don't try being everything to everybody.
Own a certain niche really well and run with it. If you'd like to take this
discussion offline, I'd be more than happy to help; email is in my profile.
Again, good luck with the service!

[1] [http://deeplearning4j.org/](http://deeplearning4j.org/)

------
agibsonccc
It's not a bad idea, but it just doesn't seem to me that "general purpose"
machine learning as a service has any real value for lay users.

I'd love to get others' thoughts on this: if you would use a service like
this, what would you want in it?

I would imagine something like churn prediction as a service, face recognition
as a service, sentiment analysis, ... more concrete needs could be good.

The landscape is mainly APIs (wise.io, algorithms.io, ...).

BigML does decision trees as a service which at least allows you to interpret
the results with a good visualization.

Ersatz does neural nets as a service with a decent GUI and an API.

------
golergka
I like this idea a lot, and I am a potential customer — I'm developing an app
that would heavily rely on complicated data analysis, and I planned to find a
data scientist after completing first prototypes.

However, there's nothing on this page right now that would convince me to use
this service. There are no actual examples of what your technology can achieve
and no programming API, just a promise. I understand that you're still in
development, and I'm not asking for full documentation and a page of famous
logos under "they use us", but at least a code snippet or something like that
would make it easier to visualize.

Right now, I'm lost: my data is a lot more complicated and interconnected
than a mere spreadsheet. Will you support it? How will it work? What will the
pricing be, and how will it be calculated? Yes, I understand that you haven't
decided on all these things yet, but that's the problem: there's nothing to
show yet except the idea.

Anyway, I really hope that this service will work great, and I wish you all
the luck with it.

~~~
swatthatfly
Datacratic has been doing what you ask for a while now (disclaimer, I work for
them). We do ML, audience optimization, segmentation, multivariate and
nonlinear models for small clients or for some of the largest shops in the
industry. But the algorithms are indeed a small problem compared to building
the on-demand infrastructure, managing the dataset, and all the rest of the
challenges related to managing other people's data. Trust to handle data that
doesn't belong to you is earned with great difficulty, unless it comes from
shady sources.

------
Gonzih
Doesn't look like a solid project right now. One screenshot, a few words about
the idea. Nothing more. There is prediction.io, which looks like a much more
solid implementation, but I still don't like it. I don't like ML software that
feels like a black box. If I can't see learning stats/graphs, I don't trust
the results. There is also bigml.com, but I haven't played with it yet.

~~~
dave_sullivan
You should try [http://www.ersatzlabs.com](http://www.ersatzlabs.com) -- sign
up for the app at [http://api.ersatzlabs.com](http://api.ersatzlabs.com)
with invite code "ersatzbeta".

It's still in beta, but it's a lot more full-featured than it was a few months
ago. We're trying to make deep learning easy, and we provide a variety of
neural network algorithms backed by GPUs. Documentation still sucks... I'm a
co-founder.

It's a tricky problem w/ all of our products right now (BigML, Ersatz,
prediction.io, datasight.io, wise.io, more) -- how to strike the right balance
between appealing to Excel power users versus people who are very comfortable
in MATLAB or Python but still don't want to build their own if there's a
significantly easier option.

~~~
muzakthings
Founder here. We also do deep learning, and we include hyperparameter
optimization which is a very necessary component of training any good DBN.
Feel free to contact me at will@datasight.io if you have questions.
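
For anyone unfamiliar with the term, hyperparameter optimization in its simplest form is an exhaustive grid search. The sketch below is a generic illustration, not Datasight's method: `toy_validation_error` is a made-up stand-in for "train the network with these settings and return its validation error".

```python
import itertools


def grid_search(objective, grid):
    """Try every combination in `grid` and return (best_params, best_score).

    `grid` maps a parameter name to a list of candidate values; lower
    objective values are treated as better.
    """
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_error(learning_rate, hidden_units):
    """Hypothetical objective with its minimum at (0.01, 64)."""
    return (learning_rate - 0.01) ** 2 + ((hidden_units - 64) / 100) ** 2
```

Grid search scales poorly as parameters are added, which is one reason random or model-based search is often used instead.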

------
cschmidt
That's very interesting. I'm working on a similar concept,
[http://www.predictobot.com](http://www.predictobot.com). I'm about two weeks
from launching my beta, so I'm not sure who is further along. If any
datasight.io guys are on here, we should talk. I'm in Boston, if you ex-MIT
guys are still here.

------
afandian
Unfortunate phonetic naming clash.
[http://www.datacite.org/](http://www.datacite.org/)

~~~
muzakthings
Haha, this is great. Thanks for the link.

~~~
afandian
Seriously though. If this is a new thing you're launching, you may want to
consider renaming before it becomes painful.

------
twog
How can I get access to this? I tried to email contact@datasight.io, but the
address bounced.

