

Show HN: Seldon – Open Predictive AI released on GitHub under Apache 2.0 - ahousley
http://www.seldon.io/open-source/

======
LukeB_UK
The main site seems to be down. The only other one I can find is their docs
site: [http://docs.seldon.io/](http://docs.seldon.io/)

~~~
Chlorus
They really should have seen that coming...

------
coding4all
Main site is down. Here's the GitHub:
[https://github.com/SeldonIO](https://github.com/SeldonIO)

~~~
ahousley
Website will be back soon, meanwhile here are the docs:
[http://docs.seldon.io/](http://docs.seldon.io/)

Sign up for early access to betas here:
[http://eepurl.com/6X6n1](http://eepurl.com/6X6n1)

Thanks!

------
ahousley
Seldon includes the following algorithms, which can be used to build highly
scalable predictive analytics and recommender systems.

Seldon is available on GitHub:
[https://github.com/SeldonIO/seldon-server](https://github.com/SeldonIO/seldon-server)

Technical docs: [http://docs.seldon.io](http://docs.seldon.io)

# User Clusters

_Improve relevance of recommendations in high-churn media services._

- Cluster users based on historical activity.
  - Configurable taxonomy (category, price range, brand, visit referrer, ..)
  - Unsupervised (fuzzy k-means)
- Apache Spark to handle large historical data sizes.
- Load user clusters into front-end servers periodically and count content hits for users in the same cluster.
- Decay counts to provide activity dynamics as new content is published.
- Recommend by combining counts for content based on the user's cluster membership.
- Real-time stream processing for adding short-term dynamics to recommendations.
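
To make the decay-and-combine steps above concrete, here is a minimal sketch in plain Python. It is not Seldon's implementation: the `ClusterCounter` class, the exponential half-life decay, and the membership-weighted sum are all assumptions chosen for illustration, and the fuzzy k-means memberships are taken as given input.

```python
import math
from collections import defaultdict


class ClusterCounter:
    """Decayed per-cluster content hit counts (hypothetical, for illustration)."""

    def __init__(self, half_life_secs):
        # Assumption: counts halve every `half_life_secs` seconds.
        self.decay_rate = math.log(2) / half_life_secs
        # cluster id -> item id -> (decayed count, time of last update)
        self.counts = defaultdict(dict)

    def hit(self, memberships, item, now):
        """Record a content hit by a user with the given cluster memberships."""
        for cluster, weight in memberships.items():
            count, last = self.counts[cluster].get(item, (0.0, now))
            decayed = count * math.exp(-self.decay_rate * (now - last))
            self.counts[cluster][item] = (decayed + weight, now)

    def recommend(self, memberships, now, n=3):
        """Combine decayed counts, weighted by the user's cluster memberships."""
        scores = defaultdict(float)
        for cluster, weight in memberships.items():
            for item, (count, last) in self.counts[cluster].items():
                age = now - last
                scores[item] += weight * count * math.exp(-self.decay_rate * age)
        return sorted(scores, key=scores.get, reverse=True)[:n]


counter = ClusterCounter(half_life_secs=3600)
counter.hit({"sports": 1.0}, "old_story", now=0)
counter.hit({"sports": 1.0}, "new_story", now=7200)
print(counter.recommend({"sports": 1.0}, now=7200))
```

With a one-hour half-life, the two-hour-old hit has decayed to a quarter of its weight, so the newer story ranks first for users in the same cluster.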

# Item Activity Correlation

_Built for static, slowly changing historical inventory._

- Similar to Amazon’s “people who bought this also bought…”
- Use historical user activity to find items that share similar user activity.
- Apache Spark scalable offline implementation.
- Upload for each item: top-N similar items.
- For each user: item recommendations based on their historical activity.
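
As a rough sketch of the idea above (not the Spark job itself), the following plain Python computes top-N similar items from user activity and then scores recommendations from a user's history. The Jaccard similarity measure and the function names `top_n_similar` and `recommend` are assumptions for illustration; the source does not say which similarity measure Seldon uses.

```python
from collections import defaultdict
from itertools import combinations


def top_n_similar(user_items, n=2):
    """For each item, find the n items that share the most similar user activity."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(dict)
    for a, b in combinations(sorted(item_users), 2):
        shared = len(item_users[a] & item_users[b])
        if shared:
            # Assumption: Jaccard similarity over the sets of users per item.
            score = shared / len(item_users[a] | item_users[b])
            sims[a][b] = score
            sims[b][a] = score

    return {
        item: sorted(others.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for item, others in sims.items()
    }


def recommend(history, similar, n=3):
    """Sum similarity scores of items related to the user's history."""
    scores = defaultdict(float)
    for item in history:
        for other, score in similar.get(item, []):
            if other not in history:
                scores[other] += score
    return sorted(scores, key=lambda k: (-scores[k], k))[:n]


activity = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"c", "d"}}
similar = top_n_similar(activity)
print(recommend({"a"}, similar))
```

The offline step produces the per-item top-N table; the online step only needs lookups and a sum, which is why the table can be uploaded to serving nodes.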

# Topic Models

_Built for sites needing long-tail recommendation._

- Assume activity is associated with a set of topics.
- Each user's individual tastes are covered by a subset of topics.
- Describe users by the set of keywords for the items they have interacted with.
- Built with Apache Spark and the Vowpal Wabbit implementation of Latent Dirichlet Allocation.
- Online serving layer scores user association with items in real time.

# Latent Factor Models

_Best for e-commerce and other lower-churn sites._

- Netflix Prize-winning solution.
- Use matrix factorization to reduce the activity matrix to two low-dimensional user and item factor matrices.
- Load factors into API servers and score users and items in real time.
- Fold in new users and items until the next batch update of the model.
- Utilize Apache Spark MLlib and Streaming modules.
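
The scoring and fold-in steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Seldon's code: the item factors are made-up numbers standing in for a batch-trained model, and `fold_in_user` estimates a new user's vector with a few gradient steps while holding the item factors fixed, which is only one of several ways to fold in.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fold_in_user(item_factors, interactions, k, steps=200, lr=0.05, reg=0.01):
    """Estimate factors for a new user without retraining the item factors."""
    user = [0.1] * k
    for _ in range(steps):
        for item, rating in interactions.items():
            q = item_factors[item]
            err = rating - dot(user, q)
            # Gradient step on squared error with L2 regularization.
            user = [u + lr * (err * qi - reg * u) for u, qi in zip(user, q)]
    return user


def recommend(user, item_factors, seen, n=2):
    """Score unseen items by the dot product of user and item factors."""
    scores = {i: dot(user, q) for i, q in item_factors.items() if i not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical item factors, as if loaded from the last batch model.
item_factors = {
    "drama1": [1.0, 0.0],
    "drama2": [0.9, 0.1],
    "action1": [0.0, 1.0],
}
new_user = fold_in_user(item_factors, {"drama1": 1.0}, k=2)
print(recommend(new_user, item_factors, seen={"drama1"}))
```

A user who has only interacted with `drama1` ends up with a vector close to that item's direction, so `drama2` scores above `action1` until the next batch update supplies proper factors.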

# Content Similarity

_Built for services with rich metadata and high sparsity._

- Requirement: a fast content-based technique to match user history to similar content based on the text/tags of the content.
- Utilize the random vectors technique. Each word/tag is assigned a random high-dimensional vector.
- Open-source Semantic-Vectors and word2vec implementations.
- Periodically process recent content into vectors and update servers.
- Servers load vectors into memory.
- Recommend on recent user activity to find similar content in real time.
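
For readers unfamiliar with the random vectors technique, here is a simplified sketch in plain Python. It does not use Semantic-Vectors or word2vec; the dense ±1 vectors, the dimension of 512, the CRC32-based seeding, and the function names are all assumptions made to keep the example short and deterministic.

```python
import math
import random
import zlib

DIM = 512  # assumed dimensionality; real systems tune this


def tag_vector(tag):
    """Assign each tag a fixed pseudo-random vector, seeded by the tag itself."""
    rng = random.Random(zlib.crc32(tag.encode("utf-8")))
    return [rng.choice((-1.0, 1.0)) for _ in range(DIM)]


def content_vector(tags):
    """Represent a piece of content (or a user history) as the sum of its tag vectors."""
    vec = [0.0] * DIM
    for tag in tags:
        for i, x in enumerate(tag_vector(tag)):
            vec[i] += x
    return vec


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


history = content_vector(["football", "league", "goal"])
related = content_vector(["football", "goal", "transfer"])
unrelated = content_vector(["election", "budget", "tax"])
print(cosine(history, related) > cosine(history, unrelated))
```

Because independent random high-dimensional vectors are nearly orthogonal, content sharing tags with the user's history scores noticeably higher than content with no tags in common, and no training pass is needed before new content can be vectorized.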

(via [http://seldon.io/algorithms](http://seldon.io/algorithms))

