

Help us develop a default algorithm for ranking NeoCities sites - kyledrake
https://github.com/kyledrake/neocities-web/issues/36

======
noelwelsh
Repost of the comment I left on the Github issue:

This is going to be a bit of a drive-by comment because I don't have a great
deal of time. Apologies in advance.

Firstly, I think you need to define the problem more precisely. I would define
it thus: "the algorithm should return a list of sites ordered according to the
likelihood of the user finding them engaging." Very quickly you need to decide
if you are personalizing this list per-user or not. I'm assuming you're not.

With this definition we can go places because: 1. there is a huge literature
on measuring engagement; and 2. there is a huge literature on generating
rankings.

For the former we can start by looking here:
[http://www.dcs.gla.ac.uk/~mounia/](http://www.dcs.gla.ac.uk/~mounia/)
(There's lots more work in the field. I just happen to have some familiarity
with her work.) For the latter searching for "submodular diversity ranking"
will return a lot of papers with complicated algorithms. Yay!

The number one problem I see is that you don't have sufficient information to
get a useful measurement of engagement. If you don't have useful feedback you
can't learn if what you're suggesting is any good. At the minimum you want to
measure clickthrough. Better would be time-on-page (possibly normalised by
page length). Solve this problem, even poorly, and we can then look at using
this to generate rankings.
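
To make the two measurements concrete, here is a minimal sketch in Python. The function names, the idea of using word count as "page length", and the 200 words-per-minute reading rate are all assumptions for illustration, not anything NeoCities actually records.

```python
def clickthrough_rate(clicks, impressions):
    """Fraction of times a listed site was clicked when shown."""
    if impressions <= 0:
        return 0.0  # never shown, so nothing measured yet
    return clicks / impressions


def normalised_dwell(seconds_on_page, word_count, words_per_minute=200.0):
    """Time on page divided by a rough estimate of the time needed to read it.

    words_per_minute is an assumed reading rate; a value near 1.0 means the
    visitor stayed about as long as a full read would take.
    """
    if word_count <= 0:
        return 0.0  # no text to normalise against
    expected_seconds = word_count / words_per_minute * 60.0
    return seconds_on_page / expected_seconds
```

Either number could serve as the engagement measure; the point is only that the raw inputs (impressions, clicks, dwell time) have to be logged before any ranking can be evaluated.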

As for generating rankings, while the submodular optimisation approach has
some very nice properties (e.g. it maintains diversity in the suggestions)
there are simpler methods to start with. Let's say we have some measure of
engagement. We can put a confidence interval on this measure using standard
maths. Then just display the sites with the highest upper bounds on their
confidence intervals. You should decay the measure over time, widening the
confidence interval, to allow for changing interests.
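
A rough sketch of that simpler method, assuming the engagement measure is clickthrough (clicks out of impressions). The Wilson score interval is one standard choice of confidence interval for a proportion; it is my pick for the example, not the only option. The decay here shrinks both counts by the same factor, which keeps the rate but reduces the effective sample size, so the interval widens. The field names and the 30-day half-life are made up.

```python
import math


def wilson_upper(clicks, impressions, z=1.96):
    """Upper end of the Wilson score interval for a clickthrough rate."""
    if impressions <= 0:
        return 1.0  # no data at all: maximally uncertain
    p = clicks / impressions
    denom = 1.0 + z * z / impressions
    centre = p + z * z / (2.0 * impressions)
    spread = z * math.sqrt(
        p * (1.0 - p) / impressions + z * z / (4.0 * impressions ** 2)
    )
    return (centre + spread) / denom


def decay(clicks, impressions, days_old, half_life_days=30.0):
    """Shrink old evidence; same rate, fewer effective samples, wider interval."""
    factor = 0.5 ** (days_old / half_life_days)
    return clicks * factor, impressions * factor


def rank(sites):
    """Order sites by the upper confidence bound of their decayed rate."""
    def upper(site):
        c, n = decay(site["clicks"], site["impressions"], site["days_old"])
        return wilson_upper(c, n)
    return sorted(sites, key=upper, reverse=True)


if __name__ == "__main__":
    sites = [
        {"name": "busy", "clicks": 400, "impressions": 1000, "days_old": 0},
        {"name": "small", "clicks": 5, "impressions": 10, "days_old": 0},
        {"name": "new", "clicks": 0, "impressions": 0, "days_old": 0},
    ]
    print([s["name"] for s in rank(sites)])
```

Ranking on the upper bound means sites with little data get shown until there is enough evidence to rule them out, which is where the exploration comes from.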

Hope that's useful.

------
thaumaturgy
Here's one approach:
[http://pastebin.com/af8N2hjE](http://pastebin.com/af8N2hjE)

I find that for stuff like this, it's nice to have a lot of tunables that you
can tweak to your taste, and basic exponential (or exponential decay)
functions seem to do a pretty nice job at that.

I set up some default values and ran them through a few imaginary site
profiles, and the scores came out pretty sane -- ~ 1.3 for a brand new, empty
site with an email address, ~ 173 for a basic site that's been around for a
while and has a little traffic, ~ 1875 for a popular site with content that's
getting some traffic and was updated recently. If e.g. the number of days
since the last update is more important to you, then just increase the value
for that tunable and you'll get a larger "falling off the cliff" result for
sites that haven't been updated recently.
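
For readers who don't want to open the pastebin, here is a minimal sketch of the general idea in Python. It is not the pastebin code and will not reproduce the scores quoted above; every tunable name and default value is invented for illustration.

```python
import math

# Hypothetical tunables; tweak to taste.
TUNABLES = {
    "hits_weight": 100.0,      # maximum points traffic can contribute
    "hits_scale": 1000.0,      # hits needed to earn roughly 63% of that
    "age_weight": 10.0,        # maximum points site age can contribute
    "age_scale": 90.0,         # days needed to earn roughly 63% of that
    "update_decay_rate": 0.02, # per-day penalty for not updating
}


def saturating(x, scale):
    """Grows quickly at first, then levels off towards 1."""
    return 1.0 - math.exp(-x / scale)


def score(hits, age_days, days_since_update, overrides=None):
    """Combine saturating bonuses with an exponential staleness penalty."""
    t = {**TUNABLES, **(overrides or {})}
    base = (
        1.0
        + t["hits_weight"] * saturating(hits, t["hits_scale"])
        + t["age_weight"] * saturating(age_days, t["age_scale"])
    )
    freshness = math.exp(-t["update_decay_rate"] * days_since_update)
    return base * freshness
```

Raising `update_decay_rate` in this sketch gives the steeper drop for stale sites described above, e.g. `score(500, 200, 60, {"update_decay_rate": 0.1})`.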

None of this is very advanced math. There's probably a cooler, more mathy way
to do it. But it should do the job.

------
didgeoridoo
>"Also, there's a lot of subjectivity here."

This feels like a case where subjectivity is the PRIMARY consideration. All of
the proposed metrics are trivially gameable, and anyway don't represent
"goodness" in any way.

I'd throw this one to the users with a typical popularity/voting dynamic, with
a twist: separate voting and leaderboards for "best neocities site" and "best
geocities site", roughly corresponding to "good + modern" and "entertaining +
retro". It would be sad to have the top sites populated entirely by slick
parallax and flat design when Neocities is supposed to be a direct descendant
of the weird and wild Geocities.

