
Data Science Toolkit as Virtual Machine - tomaskazemekas
http://www.datasciencetoolkit.org/
======
logn
(excuse the shameless plug that follows)

Here's another API (available as a service) that would complement their Data
Science Toolkit: Search Query to Structured Results List --
[http://screenslicer.com/](http://screenslicer.com/)

It would work quite nicely with "HTML to Story" --
[http://www.datasciencetoolkit.org/developerdocs#html2text](http://www.datasciencetoolkit.org/developerdocs#html2text)

You could take the results list from ScreenSlicer and plug those into HTML to
Story. I might bundle this all together myself as a service.

------
jhancock
I'm looking for an appliance solution for the textbook example of users-books-
genres recommendations. I've read articles and books that present this as a
classic problem but haven't seen a packaged appliance that helps me put data
in and get recommendations out. Is there such a thing? Is there a reason such
a thing can't exist without putting the developer through a big learning
curve?

I have an ebook site with around 100k registered users and 1M bookmarks (what
they read). Every book is well categorized with one or more tags. I'm guessing
I have enough data to feed the recommender but have put off trying as the
learning curve seems like more than I have time for.

~~~
reverius42
Disclaimer: I work for GraphLab Inc.

GraphLab Create aims to do what you're describing in terms of solving this
problem without a big learning curve. I'm not sure what you are looking for in
an appliance solution -- it's a Python package, but does not require a lot of
code to get good results, so hopefully this may qualify. With 5 lines of
Python code to import your dataset and train an out-of-box model, you can get
up and running with a basic recommender. Some data munging may be required to
get things into the format our model expects, but I'd love to hear what format
your data is in if some munging is necessary so we can make our API as easy to
use as possible.

You can follow the tutorial [1] to create a basic recommender (if you want to
try out GraphLab Create on your machine, just pip install graphlab-create and
come to our website to get a product key [2]).

[1]
[http://graphlab.com/learn/notebooks/basic_recommender_functi...](http://graphlab.com/learn/notebooks/basic_recommender_functionalities.html)
[2] [http://graphlab.com/products/create/quick-start-guide.html](http://graphlab.com/products/create/quick-start-guide.html)

~~~
jhancock
Thanks! I'll give this a try. My data is in mongodb using ruby/rails as a web
front. Data structure shouldn't be too hard... it's just books, users,
bookmarks, and tags/genre. All have unique ids (mongo UUIDs for users, books,
and basic integer ids for the tags) and simple associations so I guess it
couldn't be too hard to import. I'm thinking an offline system would be fine
for some experiments. Not sure I need some real-time recommender
integrated...but I'm new to this, so I guess I'm still not sure what I need.
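
For an offline experiment, the bookmarks described above could be flattened
into one (user, book) row each. This is a minimal sketch, assuming each
bookmark document carries `user_id` and `book_id` fields (hypothetical names;
the real schema isn't given here) and that the documents have already been
read out of MongoDB into Python dicts:

```python
import csv
import io

def bookmarks_to_csv(bookmarks, out):
    """Write one (user_id, book_id) row per bookmark document."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["user_id", "book_id"])
    for doc in bookmarks:
        # str() so ObjectId-like values serialize as plain text
        writer.writerow([str(doc["user_id"]), str(doc["book_id"])])

# Hypothetical sample standing in for documents read from MongoDB
sample = [
    {"user_id": "u1", "book_id": "b1"},
    {"user_id": "u1", "book_id": "b2"},
    {"user_id": "u2", "book_id": "b1"},
]
buf = io.StringIO()
bookmarks_to_csv(sample, buf)
csv_text = buf.getvalue()
```

Tags could be dumped the same way as (book, tag) rows if the recommender is
meant to use them.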

~~~
reverius42
The simplest way to start out would be to import from CSV (assuming it's easy
to dump users/books/tags from MongoDB to CSV), use that to train the model,
then dump the recommendations from the model back to CSV (or, using Python
code, write them directly to MongoDB). That would give you an offline system
with one-time recommendations. You could then run this in batch, on a
schedule, to re-train as you get new tags/bookmarks and update the
recommendations periodically.
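
To make the shape of that batch job concrete, here is a rough sketch of the
CSV in, train, CSV out loop. It does not use GraphLab Create; a simple
"readers of this book also read" co-occurrence count stands in for the model,
and the column names (`user_id`, `book_id`, `rank`) and function names are
assumptions made up for illustration:

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import permutations

def load_bookmarks(f):
    """Read user_id,book_id rows into {user: set(books)}."""
    read = defaultdict(set)
    for row in csv.DictReader(f):
        read[row["user_id"]].add(row["book_id"])
    return read

def train(read):
    """Count how often each ordered pair of books shares a reader."""
    co = defaultdict(Counter)
    for books in read.values():
        for a, b in permutations(sorted(books), 2):
            co[a][b] += 1
    return co

def recommend(read, co, user, k=5):
    """Score books the user hasn't read by co-occurrence with their books."""
    scores = Counter()
    for book in read[user]:
        for other, n in co[book].items():
            if other not in read[user]:
                scores[other] += n
    # Sort by score descending, then by book id, for stable output
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:k]]

def write_recommendations(read, co, out, k=5):
    """Dump the top-k recommendations per user as CSV rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["user_id", "book_id", "rank"])
    for user in sorted(read):
        for rank, book in enumerate(recommend(read, co, user, k), start=1):
            writer.writerow([user, book, rank])

# Tiny made-up dataset standing in for the exported bookmarks file
sample = io.StringIO(
    "user_id,book_id\n"
    "u1,b1\nu1,b2\n"
    "u2,b1\nu2,b3\n"
    "u3,b2\nu3,b3\nu3,b4\n"
)
read = load_bookmarks(sample)
co = train(read)
out = io.StringIO()
write_recommendations(read, co, out)
```

Re-running the whole script on a schedule gives the periodic re-train. Note
that the pair counting is quadratic in the number of books per user, so a
user with thousands of bookmarks may need capping or sampling.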

------
justin66
Thanks to these guys for creating a VMware-compatible image as well.

