Hacker News
Giving Away Our Recommendation Engine (mortardata.com)
297 points by kky on Apr 9, 2014 | 43 comments

FWIW, here are two lightweight, feature-based recommendation engines I built for Node.js, for situations where you have the cold-start problem and therefore can't rely on user- or item-based collaborative filtering: Alike [1] and Look-Alike [2].

[1] https://github.com/axiomzen/Alike

[2] https://github.com/axiomzen/Look-Alike
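For readers wondering what "feature-based" means here: rather than learning from other users' behaviour, items are compared by their own attributes, so even a brand-new item with no interaction history can be recommended. The Python sketch below assumes each item is described by a numeric feature vector and uses cosine similarity; it is a generic illustration, not the API of Alike or Look-Alike (which are Node.js libraries), and the item names and features are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, items, k=2):
    """Return the k item ids whose feature vectors are closest to `query`.

    `items` maps item id -> feature vector. No user interaction data is
    needed, which is why this approach works in the cold-start case.
    """
    scored = [(cosine(query, feats), item_id) for item_id, feats in items.items()]
    # Highest similarity first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:k]]


# Hypothetical features: [action, comedy, drama]
items = {
    "movie_a": [1.0, 0.0, 0.0],
    "movie_b": [0.9, 0.1, 0.0],
    "movie_c": [0.0, 1.0, 0.0],
    "movie_d": [0.0, 0.0, 1.0],
}
print(most_similar([1.0, 0.0, 0.0], items))  # ['movie_a', 'movie_b']
```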

Thanks for sharing. What do you mean by the "cold-start" problem? Just want to know exactly when I can use your engines.

Just speculating: not having a recommendation when you first begin because you don't have any data.

Exactly right. I borrowed that term from Chapter 2 of Programming Collective Intelligence [1].

[1] http://shop.oreilly.com/product/9780596529321.do
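A common way to soften the cold-start problem for new users is to fall back to globally popular items until the user has some history. The sketch below is a generic illustration of that idea, not code from Alike, Look-Alike, or the book; the `personalized` hook, the `min_history` threshold, and the input shapes are all assumptions.

```python
from collections import Counter


def recommend(user, history, personalized, n=3, min_history=1):
    """Recommend up to n items for `user`, never repeating seen items.

    history: dict user -> list of item ids the user interacted with.
    personalized: hypothetical callable(user) -> ranked list of item ids,
        used only once the user has at least `min_history` interactions.
    New (cold-start) users get the most popular items overall instead.
    """
    seen = set(history.get(user, []))
    if len(seen) >= min_history:
        ranked = personalized(user)
    else:
        counts = Counter(item for items in history.values() for item in items)
        # Most popular first; ties broken by item id for determinism.
        ranked = [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    return [item for item in ranked if item not in seen][:n]


history = {"alice": ["x", "y"], "bob": ["x", "z"], "carol": ["x", "y"]}
# "dave" has no history, so he gets the popularity fallback.
print(recommend("dave", history, personalized=lambda u: [], n=2))  # ['x', 'y']
```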

So hang on, what exactly is a recommendation engine?

They give examples of LinkedIn (people you may know) and Amazon (presumably "customers who bought this also bought", or so-and-so's list of books on such-and-such a subject).
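The Amazon-style "customers who bought this also bought" list can be approximated by counting how often two items appear in the same customer's purchase history. A minimal Python sketch, assuming purchases arrive as (user, item) pairs; it is a generic illustration, not how Amazon or Mortar actually compute their lists.

```python
from collections import defaultdict
from itertools import combinations


def also_bought(purchases, item, n=3):
    """Rank other items by how many users bought them together with `item`.

    purchases: iterable of (user_id, item_id) pairs.
    """
    # Group purchases into one basket (set of items) per user.
    baskets = defaultdict(set)
    for user, bought in purchases:
        baskets[user].add(bought)

    # Count co-occurrences of every item pair, in both directions.
    co_counts = defaultdict(int)
    for basket in baskets.values():
        for a, b in combinations(sorted(basket), 2):
            co_counts[(a, b)] += 1
            co_counts[(b, a)] += 1

    neighbours = [(count, other) for (first, other), count in co_counts.items() if first == item]
    neighbours.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in neighbours[:n]]


purchases = [
    ("u1", "book"), ("u1", "lamp"),
    ("u2", "book"), ("u2", "lamp"), ("u2", "pen"),
    ("u3", "book"), ("u3", "pen"),
    ("u4", "pen"),
    ("u5", "book"), ("u5", "lamp"),
]
print(also_bought(purchases, "book"))  # ['lamp', 'pen']
```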

That makes sense, though the segment of businesses that might actually benefit seems limited. Social stuff, sure, but most of us? What's the minimum number of recommendable entities, categories, or users at which this starts to make sense? Is success with these sorts of engines merely a reflection of poor UI design in your normal UX? (Of the above examples, the first seems very one-dimensional, in that it's basically a simple graph distance, and the second is also rather rudimentary and often irrelevant.)
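To make the "simple graph distance" point concrete: a basic "people you may know" list can be built from friends of friends (graph distance 2), ranked by the number of mutual connections. This is a simplified sketch under that assumption; the thread doesn't describe LinkedIn's actual system, which very likely uses many more signals.

```python
from collections import Counter


def people_you_may_know(graph, user, n=5):
    """Suggest users two hops away from `user`, ranked by mutual connections.

    graph: dict user -> set of directly connected users (undirected).
    """
    direct = graph.get(user, set())
    mutual = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themself and people they are already connected to.
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    ranked = sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:n]]


graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
print(people_you_may_know(graph, "ann"))  # ['dan', 'eve']
```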

So what exactly is this thing providing? Graph analysis? I think not. It reads more like processing raw, timestamped user behavioural event data to infer relationships between users or the products they interact with. Reading through the docs, it seems this is a layer on top of Apache Pig (https://pig.apache.org/), a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating those programs. I think the explanation could be clearer, particularly in selling what a recommendation is and when it's useful. Phrases like "award winning" don't help.

PS. Why all the downvotes? Sheesh.
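The reading above, turning raw timestamped behavioural events into user-item relationships, can be illustrated by giving each event type a weight and decaying older events. The event types, weights, and 30-day half-life below are hypothetical choices for illustration, not values taken from Mortar's code.

```python
from collections import defaultdict

# Hypothetical signal weights: stronger actions count for more.
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def affinity_scores(events, now, half_life_days=30.0):
    """Aggregate (user, item, event_type, timestamp) events into user-item scores.

    Timestamps are in seconds. An event's weight halves every
    `half_life_days`, so recent behaviour counts more. Unknown event
    types are ignored.
    """
    scores = defaultdict(float)
    for user, item, event_type, ts in events:
        weight = EVENT_WEIGHTS.get(event_type)
        if weight is None:
            continue
        age_days = (now - ts) / 86400.0
        decay = 0.5 ** (age_days / half_life_days)
        scores[(user, item)] += weight * decay
    return dict(scores)


now = 30 * 86400  # "now" is 30 days after timestamp 0
events = [
    ("u1", "shoes", "view", now),      # today, weight 1.0
    ("u1", "shoes", "purchase", now),  # today, weight 5.0
    ("u1", "hat", "view", 0),          # 30 days old, decays to 0.5
    ("u1", "hat", "click", 0),         # unknown type, ignored
]
print(affinity_scores(events, now))  # {('u1', 'shoes'): 6.0, ('u1', 'hat'): 0.5}
```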

I suspect you're being downvoted for having a dismissive tone in the same breath as you admit to not understanding the problem space. My guess is that the marketing copy on the site isn't targeted towards you, so it shouldn't surprise you that you don't understand it, whereas their target customers know all sorts of things about, say, Pig. That's fine, but then your comment should read like, "Can someone explain to me what this is for?" Instead, your comment is dripping with condescending snark, lecturing someone or another about how this thing you don't understand probably isn't useful, and incredulous that you don't understand something on the internet.

Imagine opening an advanced textbook on a subject you don't understand, reading two paragraphs of it, and throwing up your hands in disgust because what does this even mean?

Apologies, condescending snark was not the intent (I can't even see where you see that, actually!). In response to your more salient points, it could be argued that a web+EC2 layer on top of existing software is hardly an advanced textbook. Likewise, their announcement's stated intent is to gain customers, so feedback on what's unclear should be well within an acceptable scope of discussion. Finally, I doubt any of us are excluded from their intended market as software people move frequently between problem domains.

For the record, this reply seems as snarky and dismissive as the first. Hopefully you'll take that as constructive criticism.


https://www.coursera.org/course/recsys will provide you with a good introduction to the topic

Thanks. For others who are interested, that course apparently uses a different piece of software called LensKit http://lenskit.grouplens.org/

Could anyone summarize the difference between Pig and LensKit when applied to recommendation systems?

The Mortar recommendation system is written in Pig and can scale to any data size, even petabytes. The LensKit tool cannot.

> benefit seems limited. Social stuff, sure. Most of us?

Any business where you have a large catalog that users are going to want to filter through. This gives you the ability to offer a shortcut to things they might find interesting. Other examples would be Netflix, Spotify, app stores, or Coursera.

Basically those are all media discovery applications. (PS. I can't think of a single app store experience worth replicating...)

Drop the media part and yeah sounds about right. Anything that has a large enough collection of things to need discovery.

FWIW, we've been using the Mortar platform to run large Pig jobs without any fuss at http://datadog.com, and we've been very happy with it. Glad to see them contribute their recommender code too.

Can you explain why you need a recommendation engine for Datadog?

We don't use the recommendation engine, just the underlying platform, which makes it really simple to write and run Pig jobs. Though the majority of our business deals with real-time data processing, the ability to crunch numbers in batch without dev or ops overhead is attractive and well worth the price to us.

Is this better than, or similar to, Hue?

I'm curious what the business case was for open sourcing the code. Maybe to create an ecosystem?

"We’re giving over a year’s worth of work on our recommendation engine away because we want to earn your business on our platform."

From the "What you'll need" section of the first tutorial:

A Mortar account. You can sign up for a free Public account with Mortar here. If you want to keep your customized recommendation engine code private, you will need a Solo-level account ($99/month). Beyond that, you'll only pay for your actual usage of AWS cloud services (we never add an upcharge).

Kudos for open sourcing it, but it looks like to actually use this for business you'll still need to pay. Unless I'm misreading it, "open source, but you'll still have to go through our platform" is pretty disingenuous.

The code is all released under the Apache 2.0 license, so calling this "disingenuous" is itself disingenuous (IMO).

I'm not trying to start a flame war over the use of the word open, and I think it's great that they're releasing code that others can learn from.

It's just that a big press release and blog post bragging about open sourcing, versus the reality that you can't do anything substantial with the code without paying for it, seems off to me.

I get what they're trying to do, but to me the whole point of open-source code is that you can self-host it and/or modify it for business use if you so choose.

To me this would be better served by advertising "We like you so much that we're giving away access to our service for free for noncommercial and test use, and opening up the library's code so you can see how it works", but that's less interesting as clickbait.

Maybe I'm just misreading the whole thing and you can self-host.

It seems to me that you can self-host. There is reasonable installation documentation at http://mortar-framework.org/. You just have to set up the infrastructure yourself, which you would have had to do anyway if they didn't offer hosting. It also seems that you can use their platform for only the cost of AWS if you don't mind open-sourcing your recommendation engine, which sounds terrific if your project is itself open source. Honestly, I have no qualms with this. Their business model seems to be that you will probably pay them $100/month (or $500/month for team access) because it's easier and cheaper than doing it yourself.

Thanks for the clarification. I'm in agreement with your opinions regarding false promises of open source, and also that this is increasingly a problem. However, I don't think that actually applies here. Specifics:

1. Everything in this GitHub repository (https://github.com/mortardata/mortar-recsys) appears to be truly open: it's just a bunch of Pig scripts, some Java UDF definitions, and some Python management code. There don't appear to be any dependencies on anything proprietary from Mortar Data. All the code is licensed under the Apache 2.0 license.

2. The blog post states: " You can run this code anywhere. It’s built on widely-adopted open source technologies—Hadoop, Pig, and Python. But we think you’ll want to use our platform."

I can't find the part that provisions the MapReduce cluster. Isn't that part of their platform lock-in?

You can use Amazon Elastic MapReduce to provision your own cluster. I'm guessing the value prop they bring is the ease of handling that part for you.
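For anyone who wants to provision the cluster themselves, here is a minimal sketch using the EMR client in boto3 (the AWS SDK for Python) and its `run_job_flow` call. The cluster name, release label, instance types and count, and IAM role names are placeholder assumptions to replace with your own; the request is built by a separate function so it can be inspected without an AWS account.

```python
def emr_cluster_request(name="recsys-cluster", instance_count=3):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Every value here is a placeholder assumption (release label, instance
    types, default IAM role names); adjust them for your own account.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",
        "Applications": [{"Name": "Hadoop"}, {"Name": "Pig"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": instance_count,
            # Keep the cluster up so Pig jobs can be submitted later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(region="us-east-1"):
    """Start the cluster and return its id.

    Requires boto3 and valid AWS credentials, and incurs AWS charges.
    """
    import boto3  # imported lazily so the request builder works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**emr_cluster_request())
    return response["JobFlowId"]
```

Keeping the request as plain data makes it easy to review before calling `launch_cluster()`, which starts a billed cluster that stays up until you terminate it.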

If you want a relatively simple way to provision a Hadoop cluster locally, you may want to try out Ferry (http://ferry.opencore.io). It's based on Docker, so in theory you could also write a nice Dockerfile to deploy Mortar's recommendation engine. (Full disclosure: I'm the author of Ferry.)

To my mind it depends a little on what functionality is behind the account. If it's a huge chunk of functionality that currently depends on their infrastructure but doesn't have to, then I think that's fine (though releasing enough to self-host would certainly be better). If it's basically a thin wrapper and all the actual functionality lives in their proprietary server code, then it's hugely disingenuous.

The code does not depend on their infrastructure. You can execute the Pig code locally on your machine or on your own Hadoop cluster. This really is a 'free' giveaway of code.
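Running a Pig script in local mode uses `pig -x local`, and script parameters can be passed with `-param KEY=VALUE`. The small Python wrapper below is a sketch that assumes Pig is installed and on the PATH; the script name and parameter names in the example are hypothetical and do not come from the mortar-recsys repository.

```python
import subprocess


def pig_local_command(script, params=None):
    """Build a `pig -x local` command line with -param KEY=VALUE options."""
    cmd = ["pig", "-x", "local"]
    # Sorted so the command line is deterministic.
    for key, value in sorted((params or {}).items()):
        cmd += ["-param", f"{key}={value}"]
    cmd.append(script)
    return cmd


def run_pig_local(script, params=None):
    """Run the script locally; raises CalledProcessError if Pig fails."""
    subprocess.run(pig_local_command(script, params), check=True)


# Hypothetical script and parameter names, for illustration only.
print(pig_local_command("recommend.pig", {"INPUT": "data/events.tsv", "OUTPUT": "out"}))
```

Pointing the same script at a real cluster is mostly a matter of dropping `-x local` (Pig's default mode is MapReduce) and using HDFS or S3 paths for the parameters.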

Fantastic! Thanks.

You can easily run the Pig code they released yourself.

I can't help but feel that this is akin to complaining that someone gave you the cart, but not the horse. You can buy or build your own horse, or rent someone else's. It doesn't stop the cart from being freely given.


The code is all released under Apache, but is all the code needed to use this thing released? If the parent poster is right that a user still needs to engage with their platform, this conversation is just pedantry and sophistry over what "open" means.

You can run the code on your local machine, on Amazon Elastic MapReduce, or on your own Hadoop cluster. This really is a giveaway of useful open source code.

It reads like "open source but not free to make proprietary." First, it's awesome just to see the source as something to learn from. Second, it seems reasonable that they don't want people forking, modifying, and then profiting from their work without contributing back, either by also releasing source or by paying.

I think it's a nice model actually.

It's open-sourced under the Apache 2.0 license. That means it's free as in freedom, with full legal permission to be forked and profited from. AFAIK, the main substantive difference between Apache 2.0 and GPLv3 is that Apache 2.0 is not a copyleft license. That means a fork isn't even required to remain open source, as long as it keeps the required attribution to the original Apache-licensed work.

So either Mortar open-sourced a feature-crippled fragment of their platform that relies heavily on features of their proprietary platform, or the Mortar account requirement is a property of the tutorial's approach, not of the open-sourced code itself.

It's called the GPL. If that's really the model they're trying to create, it would be nice if they just used the GPL.

Anyone know of any comparisons between this and Apache Mahout? I've used Mahout's Item-Item recommender in the past, and it's worked well, just wondering if there were advantages to this recommender.
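One concrete point of comparison: Mahout's item-based recommender is often paired with its log-likelihood similarity (`LogLikelihoodSimilarity`), which scores an item pair from a 2x2 table of co-occurrence counts. Below is a Python sketch of Ted Dunning's log-likelihood ratio in the entropy form Mahout uses; it is for illustration only, and the thread doesn't say whether Mortar's recommender uses the same measure.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of user counts.

    k11: users who interacted with both item A and item B
    k12: users who interacted with B but not A
    k21: users who interacted with A but not B
    k22: users who interacted with neither
    Higher values mean the co-occurrence is less likely to be chance.
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    if row_entropy + column_entropy < matrix_entropy:
        return 0.0  # guard against tiny negative values from rounding
    return 2.0 * (row_entropy + column_entropy - matrix_entropy)


# Items that always co-occur score high; independent items score ~0.
print(round(log_likelihood_ratio(10, 0, 0, 10), 4))  # 27.7259
```

Unlike raw co-occurrence counts, this measure discounts pairs that co-occur only because both items are very popular.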

I'm sure plenty of good karma (even the non-HN kind) is headed your way - kudos.

WOW, Awesome Documentation and Product!! Kudos and Greetings from Germany 😊

Those who know what Hadoop, Pig, and the whole "data science stack" are will surely find this useful.
