
Beyond Amazon: How to Make Recommendations Smarter - Tichy
http://www.fastcompany.com/1721659/how-recommendations-could-get-smarter
======
mgkimsal
"Researchers won’t be able to look at the data, but they will be able to dump
their algorithms in and have the box spit out results, which the researchers
can then use to refine their hypotheses."

I may be thick, but I'm not sure how much value there's going to be if
researchers can't get access to the data. Anonymized data would still be more
useful than a black box. I can think of some general things that can be
tested, but it still seems like this will be an uphill battle for people
wanting to use it.

But... I'm not a maths/algorithm guy, so perhaps this will still be useful to
people.

~~~
martey
I think that _truly_ making data anonymous is really hard. See AOL, Netflix,
or almost every other company that has tried to release an anonymous dataset.
By "black boxing" the data, the company in question will still be able to keep
control of it.
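
To make the "black box" idea concrete, here is a minimal sketch of what such an interface might look like. Everything in it is an assumption for illustration: the `BlackBox` class, the `evaluate` method, and the `(user_id, item_id, rating)` layout are hypothetical and not taken from the article. The point is only that the caller hands in an algorithm and gets back an aggregate score, never the rows themselves.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types; the article does not describe the real data layout.
Rating = Tuple[str, str, float]          # (user_id, item_id, rating)
Predictor = Callable[[str, str], float]  # submitted algorithm


class BlackBox:
    """Holds private ratings and exposes only an aggregate error score."""

    def __init__(self, private_ratings: List[Rating]) -> None:
        if not private_ratings:
            raise ValueError("need at least one rating")
        self._ratings = list(private_ratings)  # never returned to callers

    def evaluate(self, predict: Predictor) -> Dict[str, float]:
        # Run the submitted algorithm on every hidden (user, item) pair
        # and report only the root-mean-squared error and the sample count.
        squared_errors = [
            (predict(user, item) - actual) ** 2
            for user, item, actual in self._ratings
        ]
        rmse = (sum(squared_errors) / len(squared_errors)) ** 0.5
        return {"rmse": rmse, "n": float(len(squared_errors))}


def always_three(user_id: str, item_id: str) -> float:
    """Trivial baseline: predict a rating of 3 for everything."""
    return 3.0


box = BlackBox([("u1", "i1", 4.0), ("u2", "i1", 2.0), ("u1", "i2", 3.0)])
result = box.evaluate(always_three)
```

Note that in this sketch the submitted function still sees the user and item ids, so it does not by itself stop a determined caller from leaking information through repeated queries. A real service would presumably need query limits or similar safeguards.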

------
izendejas
What a simple idea that will go a long way. There are a great many
algorithms/models out there that look great and do wonders on toy data but
fail in the real world. Partly that's because the toy data isn't comprehensive
(it isn't sampled well), but it's also because people misuse existing models
and tweak them just enough to work for that data, and those tweaks make it
into acclaimed journals/conferences. These kinds of efforts will help
tremendously to filter out the noise, because now you'll have to show results
on real-world data to argue that your algorithms generalize and/or that your
tweaks actually matter.

Besides Amazon, Google has taught us that more data is better, and that
sometimes less complex algorithms that scale are preferred even if they take
a marginal hit on accuracy; i.e., cost functions matter, and that's something
you don't get practice with in the academic world.

When you think about data sets to boost performance, think outside the
directly dependent ones. There is so much data out there now and it's not
being used. This I think will change soon.

~~~
cdavid
That's indeed a good idea. It does not solve one real issue with
recommendation, though: the objective measure is whether people buy more
stuff, which means running experiments not only on real data but in the real
business environment.

~~~
izendejas
Right, the cost functions could be defined in terms of profit maximization
(thus, the cost being lost sales, e.g.).
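
A rough sketch of what a profit-based cost might look like on held-out purchases, as opposed to a plain accuracy score. This is an illustrative proxy only: the function name `lost_profit`, the per-item `margin` table, and the idea of counting a held-out purchase as a "lost sale" when the item was not recommended are all assumptions made for the example.

```python
from typing import Dict, List, Set


def lost_profit(
    recommended: Dict[str, Set[str]],
    purchased: Dict[str, List[str]],
    margin: Dict[str, float],
) -> float:
    """Sum the margin of every held-out purchase that was not recommended.

    A margin-weighted miss count, used here as a stand-in for lost sales.
    """
    cost = 0.0
    for user, items in purchased.items():
        shown = recommended.get(user, set())
        for item in items:
            if item not in shown:
                cost += margin[item]  # a miss on a high-margin item costs more
    return cost


# Hypothetical toy data.
recommended = {"u1": {"a", "b"}, "u2": {"c"}}
purchased = {"u1": ["a", "d"], "u2": ["b"]}
margin = {"a": 1.0, "b": 2.0, "c": 0.5, "d": 5.0}

cost = lost_profit(recommended, purchased, margin)
```

Under a cost like this, a model that misses one high-margin item can score worse than one that misses several cheap ones, which is a different ranking than plain hit rate would give.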

Either way, these experiments are very difficult to control. You can't fully
validate models by predicting sales a posteriori with holdouts, because the
data you already have doesn't show which sales a different algorithm might
have influenced. You could try A/B testing, but I'm not sure the Amazons are
willing to run someone else's models very readily. Still, this is a good
thing.
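
For the A/B testing route, the comparison usually comes down to whether one recommender converts better than the other by more than chance. A minimal sketch using a standard two-proportion z-test follows; the function name and the visitor and purchase counts are made up for illustration, and a real experiment would also need to worry about sample-size planning and about peeking at results early.

```python
import math
from typing import Tuple


def two_proportion_z(
    conv_a: int, n_a: int, conv_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for conversion rate B versus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10,000 visitors per arm,
# 200 purchases with recommender A and 250 with recommender B.
z, p_value = two_proportion_z(200, 10_000, 250, 10_000)
```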

