
Yhat Sciencebox - yiedyie
http://blog.yhathq.com/posts/yhat-sciencebox.html
======
rguldener
I think this is cool, but for me personally it would be so much cooler if it
were open source. Sure, I understand they want to make money with it, but
this doesn't offer enough over my current setup that I would be willing to
pay $0.05 - $1.40 per hour. That's up to $33 per day or $1041 per month,
which is quite pricey for some background execution and rsync functionality.

My current setup is a workhorse server where I set up a Samba share for my
home folder (so I can edit scripts remotely from my notebook without the
git-push-pull dance) and a remote IPython notebook. Both config files are
write-once-and-forget, and I get 90% of their functionality built in (without
the web GUI). I would love to learn about other people's setups, though. I am
sure mine can be improved.

~~~
darkxanthos
Well, the range is really $36 - $1,041 per month. That seems reasonable
depending on what you need.
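
For reference, the figures quoted in this thread can be reproduced with a
quick calculation. This is only a sketch: it assumes an instance that runs 24
hours a day for the whole month, and the function names are made up for
illustration. Under that assumption, $36 matches the low rate over a 30-day
month and $1,041 matches the high rate over a 31-day month.

```python
from decimal import Decimal

HOURS_PER_DAY = 24  # assumption: the instance is never shut down


def daily_cost(hourly_rate: str) -> Decimal:
    """Cost of one full day at the given hourly rate (in dollars)."""
    return Decimal(hourly_rate) * HOURS_PER_DAY


def monthly_cost(hourly_rate: str, days: int = 30) -> Decimal:
    """Cost of a full month of `days` days at the given hourly rate."""
    return daily_cost(hourly_rate) * days


if __name__ == "__main__":
    # Hourly rates quoted in the thread: $0.05 (low tier) and $1.40 (high tier).
    print("low tier, 30-day month: ", monthly_cost("0.05", days=30))
    print("high tier, per day:     ", daily_cost("1.40"))
    print("high tier, 31-day month:", monthly_cost("1.40", days=31))
```

Anyone who only runs an instance during working hours would pay a fraction
of these numbers, so the always-on case is the upper bound.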

~~~
fyrabanks
Assuming the upper end of that--for a lab with 10 researchers (forget the
students, postdocs, etc. for a second), you could save maybe half the cost by
hiring a single IT person to do this. Aside from their sb client, I don't see
anything in this that can't be downloaded and installed (either from source or
automagically from any of the freely available science-related R, Python, etc.
repos).

Neat idea, I just find it difficult to justify a $1000/mo per-machine cost,
especially in academia.

~~~
clebio
You can hire IT people for 12k a year?

------
elliott34
This pairs well with the recent Airbnb fraud detection article. Airbnb has a
lot of people to construct a robust machine learning architecture. For small
startups trying to make a recommendation engine, why not use Sciencebox + Yhat
cloud/enterprise? Especially for data science people who don't have great
dev skills.

------
evancasey
Seems like a useful tool that would make doing data analysis on a remote
machine much easier than with plain old AWS + Anaconda.

However, the incremental improvement from renting an EC2 instance (even one
with 8 cores and 65 GB) pales in comparison to using a distributed data
processing approach with medium to large datasets. Any plans for supporting
Hadoop/Spark + Amazon EMR in the future?

------
radikalus
Two things:

- GPU instance support would be really, really helpful.
- Pricing of the higher-perf tiers is a bit painful, which changes the normal
  AWS $ optimization metrics pretty massively.

Otherwise, it's definitely something we'd consider using; I end up doing 99%
of my work on my laptop because the fixed time cost of AWS setup is 30-plus
minutes every time.

------
aroch
Every time I see one of these pop up I'm reminded that one of these days I'll
need to get around to automating my data analysis and re-learning R :(

And perhaps updating the COBOL scripts from 1970-something that I inherited.

------
ehurrell
Oh brilliant, I'd be interested to hear how this compares with Statwing, which
seems to be going for the same market. Definitely a useful service.

~~~
cschmidt
Statwing is trying to do nice visualizations and analysis in a web browser,
without programming. This is more of a way to use R or Python to do data
analysis on a remote AWS machine. You still have to know what you're doing in
R or Python, but it makes the "remote" part easy.

~~~
ehurrell
I had thought Statwing set up the remote stuff and then automated a lot of
tasks in-browser, but it doesn't seem to give the easy access this does. This
is a lot more attractive!

------
aye
Awesome! The "get an email when your job has finished" takes me back to the
days of True BASIC on the Dartmouth time-sharing system.

