
Hacker News Needs Honeypots - tansey
http://www.nashcoding.com/2011/10/28/hackernews-needs-honeypots/
======
pg
"Nevertheless, I do believe that we are seeing a continuing trend downward in
overall article quality on the front page."

I agree comment quality has decreased, but I'm not so sure about the
frontpage. I created <http://news.ycombinator.com/classic> so I could detect
frontpage decline. It shows what the frontpage would look like if we only
counted votes of users who joined HN in the first year. Usually it looks the
same as the frontpage, but with a time lag because there are fewer voters.

~~~
tansey
But that seems like a poor heuristic in general. For instance, maybe the early
users are not necessarily great at filtering articles; they were just good at
not submitting off-topic ones. A honeypot approach is a much more flexible and
robust way to enforce guidelines without visibly punishing users.

~~~
pg
Yeah, you may be right. I might try this. Does anyone have any opinions about
whether this is a good idea, and if so how best to do it?

~~~
DanielBMarkham
Something about this bothers me. Can't put my finger on it.

I think the assumption is that there are articles so bad that it should be
obvious that they are off-topic. In the past, I've submitted things on the
border. I imagine many others have too. I've always counted on the groupthink
to correct any errors in submission I might make. This seems to assume that
there are hard guidelines. Watching the board over the years, I'm not sure
that assumption is accurate.

Put differently, if you had a cache of really bad articles, shouldn't we see
them? That way we'd know not to submit. But if you already know they're bad,
then what's the point of voting or flagging?

Perhaps I'm just mentally adrift here. Honeypots make sense to me when we are
talking about boolean things: a website visitor is either harmful or not. An
email sender is either a spammer or not. But I'm not sure at all that this
concept applies to something like an essay. It seems that if it did, you could
just use the flagging behavior mentioned to rank the articles and dump
everybody else's votes. Right? This is like verifying the voting behavior by
setting up some completely different system to rank quality detectors. But if
you could rank quality detectors, why keep the old system? And if not, how
would you separate which parts of which system are useful and which are not?

~~~
DanielRibeiro
I think it is becoming increasingly clear that Google's approach is the best
way: your ideal front page has to be personalized (when you are logged in,
Google personalizes your results with location, previous searches, +1s, and
even who you follow on Twitter).

Trying to be everything to everybody means there will be people left with sub-
optimal results.

~~~
DanielRibeiro
For those downvoting: Please share your thoughts. If you disagree, I'd love to
know why.

~~~
palish
It's not you; it's that pg has said many times (and I agree) that fragmenting
the front page is a bad idea. We must all see the same front page to judge the
same quality.

~~~
DanielRibeiro
Interesting. However, many people already filter the front page, whether by
points or via the Twitter accounts:

<http://twitter.com/#!/newsyc20>

<http://twitter.com/#!/newsyc50>

<http://twitter.com/#!/newsyc100>

<http://twitter.com/#!/newsyc150>

And from each filter, people auto-select things that interest them. Sometimes
I only see a story when it is retweeted to me.

And this you can't prevent. It comes down to the fact that different people
have different definitions of what "quality" means.

Which is the core problem highlighted by the linked blog post.

~~~
jvdongen
The front-page of HN is a filtered list in and of itself. And I for one
actually like the result.

Some people would like to see a different filter and thus create one, because
they can. That is indeed not something you can, or should want to, prevent.

But the fact that some people create their own filters is not a motivation to
_not_ tweak the HN front-page filter in such a way that the front-page matches
the intended goal (pg's goal in this case, presumably adopted by the majority
of HN readers, more-or-less codified in the guidelines) as closely as
possible.

------
ryan-allen
We're assuming people are using the vote buttons to vote, and not to save
articles for later. What if a percentage of people think "that looks good, that
looks good" and vote for stories just so they can review them in their 'saved
stories' at a later date?

What if HN would not allow people to vote on things unless they actually
clicked on the link?

~~~
thegrossman
I think this is a good suggestion. Even worse than people using the vote
button to save stories for later: I've found myself occasionally voting on
articles because I _assume_ I'd like the article, without reading it.

I scold myself whenever I catch myself doing it. But I bet others do the same
thing.

~~~
Wilduck
I do the exact opposite. I only upvote articles if I've read them and know that
I will almost certainly want to read them again later. I'm not sure if this is
a better or worse practice...

------
necro
The quality is relative. As HN has gotten more popular, the level of experience
of the average user has gone down, and those users now dictate more of what
"quality" is. In the past you had a higher percentage of core "hackers", and
they posted things that they found of quality, but now you have much more
varied submitters and voters. More people equals a lower lowest common
denominator.

To fix this you need segregation: an ultra code/tech area, a business/startup
area, and a fanboy/fluff area. You basically want to give a high
signal-to-noise ratio to the different groups of people. For example, I would
just visit the code/tech area and not have to deal with all the noise of the
other sections.

------
eric-hu
Am I missing something with the second formula?

h = (f - v)/(s * t)

The perfect flagger would have v = 0 and f = s. His total flag score is t = s
+ x, where x is the number of non-honeypots flagged. His score would be:

h = s/(s * s + s * x) = 1/(s + x)

As s → ∞, h → 0. This score would actually _punish_ a good flagger
over time, no? A perfect veteran flagger with 100 for 100 honeypots flagged
would have a lower honeypot ratio than a perfect newbie flagger who's 5 for 5
honeypots flagged.
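eric-hu's argument checks out numerically; here is a quick sketch using the formula as quoted (variable names follow the comment: `s` honeypots seen, `f` honeypots flagged, `v` honeypots upvoted, `t` total flags):

```python
def h_ratio(f, v, s, t):
    """The h-ratio exactly as written above: h = (f - v) / (s * t)."""
    return (f - v) / (s * t)

# A perfect flagger upvotes no honeypots (v = 0) and flags every one (f = s).
# For simplicity, assume no flags on non-honeypots (x = 0), so t = s.
newbie = h_ratio(f=5, v=0, s=5, t=5)         # perfect, 5 for 5 honeypots
veteran = h_ratio(f=100, v=0, s=100, t=100)  # perfect, 100 for 100 honeypots

assert veteran < newbie  # the formula ranks the longer perfect record lower
```

With `x = 0` the score collapses to `1/s`, so the longer a perfect flagger's history, the worse they score, which is exactly the problem raised.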

~~~
tansey
Correct you are! I've updated the article with a fixed version of this
formula.

This version should in fact rank the 5 for 5 guy slightly lower than the 100
for 100 guy. It's still not perfect but I believe it does its job in theory.

Thank you for pointing that out. :)

~~~
eric-hu
You're welcome!

The current version looks better. I would add that the flag adjustment should
also factor in honeypots seen. This version will punish flaggers (good or bad)
with a large flagging history who've seen no honeypots.

-(1 - (f/(t+1)))

goes to zero as t goes to infinity--desirable when someone has seen and not
flagged honeypots, but not as desirable when someone hasn't (which would be
any user with many flags at the time of algorithm implementation).

I went back and read your footnotes. Footnote [1] is indeed a linkbait-y
article. To me, it demonstrates a behavior described in another comment here:
upvoting as a save function. The title looks interesting, and in the middle of
a work day, one may not have time for a long article. There's even more
incentive to upvote-as-save as front-page volume cycles faster.

Personally, I think that this 'noise' in upvote value can be mitigated by
adding a separate save function and perhaps even eliminating an upvote history
visible to the user (migrating current upvote history over to save history
first so users can still access their clippings).

------
Mz
Let's so _not_ do this. The worst possible way to educate people is by showing
them all the things they shouldn't do. It's painful and slow compared to
cutting to the chase and showing them what to do. I see no good coming of
this. None whatsoever.

~~~
tansey
The whole point is you show them _nothing_ with honeypots. They simply get
their vote penalized in the background because they didn't follow the
guidelines of the site.

When a site is growing, there is no way to handle the constant influx of new
users. The result is a dilution of quality on the front page, at least as
measured by the guidelines of the site.

~~~
Mz
_When a site is growing, there is no way to handle the constant influx of new
users._

Just because we don't yet have an established means to effectively do so does
not mean it cannot be done. I would rather work on solving this question and
other important questions concerning managing culture. Manipulative tactics
like this have very serious limits and substantial downsides. I used to give a
lot of parenting advice and I can't tell you how many parents have essentially
asked me "How do I manipulate my child into being less manipulative?" And the
answer is you can't. They learned that crap from you. Don't like it? Then stop
doing it.

The same applies to online forums. "Do as I say, not as I do" fails as a
moderating tactic just as badly as it does as a parenting tactic, only worse in
some ways because it's magnified thousands of times (i.e., by the number of
members emulating what the leadership does) rather than a handful of times
(i.e., by however many children you have at home doing the same stupid stuff
the parents are doing).

~~~
tansey
I am not sure if we disagree or if you are fundamentally misunderstanding the
proposed solution.

There is nothing preventing using both methods simultaneously. Your approach
is to teach people how to act. My approach is to punish people who break the
rules. They are not mutually exclusive techniques.

~~~
Mz
A) Unless I completely misunderstood your article, you are suggesting the
admins put out links to things they do not approve of to see who screws up and
upvotes them. In other words, you are suggesting that admins basically model
what not to do by doing it themselves. This is precisely one of my points:
Don't do yourself what you don't want others to do. Lead by example.

B) There is a time and place for "punishment" but it should be a last resort,
not a first line of defense. It fosters an uncivilized environment and is
therefore counterproductive to solving the issues people here most strongly
express concerns about.

So I can't say I agree with your assertion that they are not mutually
exclusive techniques. They mostly are in my experience.

~~~
tansey
_> Unless I completely misunderstood your article, you are suggesting the
admins put out links to things they do not approve of to see who screws up and
upvotes them._

Great! We now know that you did not completely understand my article. :)

The whole point of implicit honeypots is to leverage the fact that articles
are already making it to the front page that violate guidelines (e.g.,
politics, religion, etc). The admins can then flag _these_ articles, so as to
not have to spam their own site.

~~~
heelhook
I have to say, I tend to partially agree with both of you: admins shouldn't be
submitting articles that won't add to the discussion, so I would drop the
first part of your proposal.

For submissions that have already made it into the site and are detected to be
honeypots, _those_ votes and flags would be used to punish users.

------
Joakal
Could go the fully complicated path:

1) Public voting: revealing who votes on which articles. If people want their
votes public, they can mark themselves as such (hopefully opt-in to public
votes).

2) Blacklisting voters: let people mark public voters as bad, as a form of
blacklist. May lead to ghosted users who post but whom no one can see.

3) Whitelisting voters: votes from whitelisted voters are weighted more
heavily, or are the only ones counted. May lead to 'power voters' seeking
votes, but that happens already ("Vote and add to the HN discussion here").

People seem to crave certain votes over others. I have no idea regarding
comments. It's a mixed bag.

~~~
prodigal_erik
Even more complicated would be
<http://en.wikipedia.org/wiki/Collaborative_filtering>, increasing the weight
of votes from users who vote like I do.
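A toy sketch of that kind of collaborative filtering (the vote encoding and all data here are invented for illustration): weight each other user's vote on a new story by how similarly they have voted to you in the past, e.g. via cosine similarity over vote histories:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Similarity of two users' vote histories over the same stories.
    Entries: +1 upvoted, -1 flagged/downvoted, 0 no vote."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def personalized_score(my_votes, other_users):
    """Score a new story: similarity-weighted sum of other users' votes.
    other_users: list of (vote_history, vote_on_new_story) pairs."""
    return sum(cosine_similarity(my_votes, hist) * vote
               for hist, vote in other_users)

me = [1, 1, 0, -1]
others = [([1, 1, 0, -1], 1),    # votes just like me, upvoted the story
          ([-1, -1, 0, 1], 1)]   # votes opposite to me, also upvoted it
score = personalized_score(me, others)
# The like-minded upvote counts fully; the opposite user's upvote cancels it,
# so the net score is ~0.
```

Real systems use sparse matrices and latent-factor models rather than raw vote vectors, but the weighting idea is the same.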

------
rdl
The main feature I'd like is a killfile. USENET had this; too few websites do.

I'd like a way to 1) killfile comments by particular users and especially 2)
killfile articles by keyword or submitter.

~~~
PotatoEngineer
That's easy enough to do with a Greasemonkey script or other extension. Are
you interested enough to write it?

------
there
there's already enough junk on the /newest page that makes it difficult for
legitimate articles to gain traction. adding fake articles to it is just going
to make the problem worse.

i would rather see flagged articles get removed from /newest quicker, and have
some mechanism for letting articles with at least 1 other upvote, or maybe
those submitted by users with enough karma, linger on /newest longer than they
would otherwise.

~~~
tansey
_> adding fake articles to it is just going to make the problem worse_

Hence the implicit honeypot extension that I proposed. :)

~~~
there
your honeypot concept appears to be targeting sockpuppets or voting rings that
try to vote up bad content that gets flagged by legitimate users. that's not
the same as legitimate articles that don't get enough upvotes because they
roll off of /newest too fast.

even with zero spam, the site is now big enough that legitimate articles can
get submitted at a rate that makes the /newest page move too fast for things
to get traction.

~~~
muxxa
If the /newest page is moving too fast, how about aging articles based not on
time, but on the number of times they have been clicked by users?
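A minimal sketch of that idea, swapping clicks for hours in an HN-style gravity formula (the classic `(points - 1) / (age + 2)^1.8` form is widely reported but the real ranking details, and this adaptation, are assumptions):

```python
def newest_rank(points, clicks, gravity=1.8):
    """Rank a /newest story by decaying its score with views, not age.
    Modeled loosely on the oft-quoted HN formula (points - 1) / (age + 2)^g,
    with click count standing in for age in hours."""
    return (points - 1) / (clicks + 2) ** gravity

# A story seen by few users keeps its spot; a heavily viewed one ages out,
# regardless of how fast /newest is scrolling.
fresh = newest_rank(points=3, clicks=5)
stale = newest_rank(points=3, clicks=500)
assert fresh > stale
```

The appeal is that a story submitted during a flood of traffic isn't penalized for clock time nobody spent looking at it.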

------
droithomme
It seems this method works on the same sorts of psychology as when the Russian
KGB would test people to see if they are loyal to the state by having agents
make anti-government statements, and then see which citizens report them.

------
pilooch
Another missing feature is the automated grouping of similar posts. Having
five times the same (tech gossip) news on the first two pages is annoying,
even sad, considering the state of the art of computer science these days. HN
could definitely be a better flagship for hackerness IMO...

~~~
DanBC
> _Having five times the same (tech gossip) news on the first two pages is
> annoying_

Especially when they're all to blogs with sloppy / link-bait / contrarian
reporting of the same source article which is usually more interesting.

------
jeswin
Although not exactly the same, honeypots are similar to meta-moderation on
Slashdot. I am not sure if they still have it, or if it played any part in
slowing a decline, but Slashdot is very ordinary these days.

Personally, I would love to see more startup related articles. I don't care
about the stuff I might pick up from elsewhere, such as Techcrunch articles,
or Gruber's opinion, or how A sells more than B, or Politics.

IMO, the front page can do without:

1. Google denies requests to hand over data
2. Samsung overtakes Apple. Last week, Apple overtook someone else.
3. Gates to students, "....."
4. Righthaven, copyrights and piracy
5. Forrester's thoughts about supporting Macs in IT
6. Stallman vs. Steve Jobs
7. Ripples visualization

------
neworder
Maybe the article itself is a honeypot.

------
hugh3
It seems to me that instead of worrying about fancy ways to enforce the
guidelines, perhaps the guidelines should be rewritten to be more explicit
about what is and isn't on-topic.

This:

 _Most stories about politics, or crime, or sports, unless they're evidence of
some interesting new phenomenon... If they'd cover it on TV news, it's
probably off-topic._

is pretty vague.

The worst threads are the ones straddling the line between "politics" and
"economics", where a lot of people with bees in their bonnet get a chance to
wheel out their favourite hobby horses (with apologies for mixing equine and
apiaristic metaphors). These are the stories I'd like to see squashed,
somehow.

------
angelbob
This is deviously awful, yet highly effective. It makes for a desperate
underclass of the automatically ignored, yet never lets them _know_ that they
are members of said class.

Better yet, it accuses them of deserving it.

Ten Machiavelli points to you, sir.

~~~
tansey
Thanks? :)

But this is not serfdom. You have mobility in the case of implicit honeypots,
because if you follow the guidelines well then you'll float to the top and
become a super flagger. And even better, if you stop consistently upvoting
crap, then you will rise from the ignored to the heard again. :)

~~~
angelbob
Ah, but only if the honeypots are chosen to match the explicit and stated
guidelines -- which are currently often ignored. Should that drift, the tone
of the site will change slowly, invisibly and inexorably, and the old guard
will be automatically shifted out with a silent coup de grace.

The honeypots become a way for moderators to upvote or downvote the _whole
tone of HN_ and do so _without telling any of the users_.

I look forward to the bot- and crowd-based tools that will evolve to watch the
front page of the site and try to guess which articles increase or decrease
your HN influence. It's a mathematically interesting problem.

~~~
ja2ke
You probably meant "HNfluence."

------
jorangreef
You could try and trap with a honeypot.

Or you could educate and convert.

It may be that the decrease in comment quality is at least partly due to an
increase in exposure of the wrong sort of entrepreneurial hacker motive ("take
VC, do whatever it takes, exit, be financially independent") as opposed to the
right sort of entrepreneurial hacker motive ("serve the community honorably at
a profit"), in the spirit of Packard, Hewlett, Bezos, Edison, Ford, Watson.

The arc of the startup has become more about 15 minutes of fame, and less
about hundreds of years of employing thousands of people. More about not
offending and not doing evil, and less about asserting truth and doing good.
Culture has become more about free lunches and less about doing hard things
and standing in the gap when it hurts. Some have forgotten what humility
means, that "we are all grains of sand", that we exist "for others" not "for
ourselves", to serve and not to take. And some no longer believe this is even
possible.

If HN begins to reward the others-centered motive and rebuke the self-centered
motive, then the ground will be prepared for the true startup spirit
to again take root and flourish. If we can educate the next generation of
hackers, and get the motivation right, the methods will follow, and there will
be less and less need for honeypots.

To do this, there needs to be a Hacker Credo, and it needs to be at least as
radical as the Johnson and Johnson credo, and as definitive and steadfast as
Henry Ford's magnum opus "My Life and Work".

~~~
pbhjpbhj
> _You could try and trap with a honeypot. / Or you could educate and
> convert._ //

So, something like: when I downvote/flag, you put up an info box saying "this
post was upvoted by 94% of top-ranked users; are you sure?"?

------
ehsanu1
_If the h-ratio of a user is greater than an admin-specified threshold, we
flag the user as detrimental to the overall quality of the site and their
upvotes would either be discounted or ignored entirely._

Nit-picking here, but I suppose tansey meant "If the h-ratio is _smaller_ ",
rather than "greater", since you'd want to ignore upvotes from those who
upvote honeypots too much, rather than flag them too much.

~~~
tansey
Yes, thank you! I originally had the formulas differently and forgot to update
the text to reflect the flip. :)

------
16s
What is link-bait? Seems open to interpretation. Some topics are highly
controversial and bring passionate views out, but should they be banned
because of that?

Honestly, I much prefer the pure coding articles or stories about code or
coders than the "how I launched in 36 hours and had one million users"
articles. In my opinion, the latter are link-bait and decrease the value of
HN.

------
rkalla
I don't expect this comment to get read; it's likely buried under the 100+
already here, but here is my two cents on the subject...

Chasing spammers with greater and greater automated systems inevitably starts
catching real people in the net. People who don't _know_ they are in the net,
and who otherwise contributed to the community, get enraged at the fact that
their contributions are obviously being ignored.

Over time, the set of people that get through the ever-growing net of automated
spam blocking grows smaller and smaller, eventually turning the site into an
effort driven by a small group of users so highly rated that the spam algorithm
simply doesn't look at them anymore; in Digg v3 parlance, "super users".

Digg had one of the most advanced anti-spam algorithms in social news for 3.0,
and they STILL couldn't control it as the site became dominated by a few select
people who had escaped the initial watchful eye of the spam algorithms.

Once their "rep" was high enough, they became impervious to getting knocked
down by it.

Unfortunately for all the new users, there was no hope unless they played
EXACTLY by the rules of this nebulous anti-spam algorithm, and no one could
tell whether it was doing a good job or not... unless you had people manually
review the spam submissions all day long, which is impossible at this volume.

The net-net of these honeypot and other highly advanced ideas is that you catch
a lot of decent people in the net and they have no way of getting out.

That is a lot of time spent on fighting a battle that isn't really the right
focal point.

The reality is that as this site's popularity grows, submissions and comments
_are_ going to get more normalized. That is the nature of folding more and more
people into the mix.

That isn't spam, that is human nature.

Create a group (of any kind, like one for organizing birthday parties) of 3
people and see how it performs and behaves. Now add 40 people to it... it will
be significantly less efficient and more "spammy", with stupid email forwards
and questions about whether international desserts are "appropriate".

This isn't spam, this is just the nature of a much larger group.

If you deploy a spam algorithm and start muting half those people, you might
knock out some of the distracting emails (at least the ones that the person
writing the spam filter deems distracting), but you also piss off half the
group, who then go elsewhere to contribute.

Digg v4 took this to an extreme and we saw what happened with their community.
Reddit still plays by their original rules even though they _dominate_ the
social news sector with traffic and they manage just fine.

If HN was crushed by pharma submissions and link bait I'd say we have a
problem, but traffic seems to continue to grow and I haven't seen any obvious
degradation in the last year.

I am sure that the HN of today is much _different_ than the HN of 3 years ago,
but that doesn't necessarily mean worse. If the people complaining about HN's
quality really mean they just want a different type of elite site that isn't
open to all this riff-raff (I consider myself riff-raff), that is a very
different problem than spam-blocking.

This idea that every submission should be amazing and every comment will make
you cry because of its intelligence is not realistic.

The site is fine.

------
codex
If this proposal were implemented, I might only upvote comments and articles
that I think the top HN users would like, not necessarily the ones I like,
turning HN into a cliché of HN. "How to bootstrap your minimum viable product
using Node.js". "Scala, Clojure, or Erlang?". "LISP for Bayesian A/B testing."

I'd much prefer a system which correlated my votes with other users and
preferentially showed me articles and comments which matched my own tastes.
Sure, if I only upvote to match my own biases, I'll get more biased articles.
But if I also upvote good but contrarian opinions (and I would) I'll also get
more good and contrarian opinions. Best of all, this encourages non-strategic
voting--so, later on, if you find a good use for someone's voting record, you
can trust the veracity of that record.

------
yaix
Things tend to look better the further you move them into the past. Nostalgia.
Things just aren't anymore what they never used to be.

HN was and is the same. IMHO the best place of its kind. While the article
shows an interesting formula, I don't think there is a need for it on HN.

------
adulau
The idea is neat. Concerning the terminology, it might be more appropriate to
call it a honeytoken ( <http://en.wikipedia.org/wiki/Honeytoken> ) than a
honeypot.

------
AmazingBytecode
No one will upvote anything anymore. They'll move their cursors to the up
arrow and pause for a moment.

What if this is the one? What if this link is the buried landmine that will
explode and destroy my perfect Hacker News karma score? I can see the headlines
now: "Respected Hacker News User Clicks on Obvious Flamebait". Think of the
scandal.

And then they'll move their mouse cursor away, pining for a HN where they can
express their opinions about articles without worrying about what the group
will think.

------
pilooch
HN needs to grow a social graph in the background. Users often fall within the
same threads and discussions without even noticing it. The graph should then
be used to personalize results, on a group or individual basis. This is a call
for a fragmented view, but with a social touch, preserving the herds around
multiple topics. A button could let you opt out of the personalized view.

~~~
parfe
Reddit attempted this but gave up. From what I can remember, the personalized
stories were not noticeably better than random.

For instance, here on news.yc, I posted in an iPhone thread today, but I don't
actually want iPhone news highlighted. I posted in a Steve Jobs thread, but I
certainly did not enjoy the cacophony of stories that flooded the main page
following his death.

------
johnsonman
This discriminates against users who cannot flag and users who do not flag.
Since the only way to improve one's h-value is to flag more honeypots, it
basically means that someone who can't/doesn't flag will have at best an
h-value of -1. So would the h-value not be counted for people with few/no
flags?

------
shasta
I don't think you need explicit honeypots. Just give a select few special
up/down votes that mean "this should never be flagged" and "this should never
be upvoted". Then use this labeling of the data to compute a metric for vote
quality. (And I'd recommend these moderation-type votes be retractable.)
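shasta's moderator labels could feed a vote-quality metric along these lines (the data layout and names here are invented for illustration):

```python
def vote_quality(user_votes, labels):
    """Fraction of a user's labeled votes that agree with moderator labels.
    labels: story_id -> 'never_upvote' | 'never_flag' (retractable, so the
    dict can shrink over time); user_votes: story_id -> 'up' | 'flag'."""
    judged = correct = 0
    for story, vote in user_votes.items():
        label = labels.get(story)
        if label is None:
            continue  # unlabeled stories don't affect the metric
        judged += 1
        if (label == 'never_upvote' and vote == 'flag') or \
           (label == 'never_flag' and vote == 'up'):
            correct += 1
    return correct / judged if judged else None

labels = {1: 'never_upvote', 2: 'never_flag'}
good = vote_quality({1: 'flag', 2: 'up', 3: 'up'}, labels)  # agrees on both
bad = vote_quality({1: 'up', 2: 'flag'}, labels)            # agrees on neither
assert good == 1.0 and bad == 0.0
```

Because retracting a label just removes its dict entry, previously judged votes silently become unjudged again, matching the retractable-votes suggestion.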

------
mrcode925
Interesting. I've mostly been a consumer here and rarely submit or comment, but
even so I never realized there was a guidelines page. Honestly, unless I need
to find a "contact us" link, I rarely look at the footer of any website.
Perhaps a little more visibility of its existence would go a long way.

------
DanBC
Honeypots would be great - they'd sort out lots of problems.

But there's still the problem of people submitting lousy articles; or
submitting blogs / reports about an article instead of the original article.
These aren't just new users either. Some of them are established long time
users.

Some way of sorting those would be useful.

------
three14
It might be more effective to simply have a page showing sample good comments,
so people have a reference point when deciding to comment. I'm not sure how to
pick a list of good comments, but it would be best if it specifically included
examples of comments that went against HN groupthink.

------
nickknw
I think it is a really neat idea, and certainly couldn't do any harm.

It is targeted to solve a specific problem that DOES occur on HN. Not
necessarily every single day, but often enough that it would be nice to have a
countermeasure.

I do think that implicit honeypots are the way to go, rather than explicit.

------
ck2
Can't the same problem be solved by giving additional qualified members the
ability to downvote?

------
zerostar07
What about an initial barrier to entry, of the sort of peer review that
journals do (i.e., review of links by the community before they are put up for
voting)? We've been testing it on <http://textchannels.com/>

------
joshu
This is just supervised learning. No need to be so fancy about it.

~~~
tansey
It's not really supervised learning. I suppose one could argue that the
bootstrapping phase resembles supervised learning, since you are in fact
measuring how well each user discriminates between honeypots and acceptable
articles. However, after that it's more like unsupervised learning.

If I were going to use this terminology, I would say that implicit honeypots
are a generative model that is bootstrapped via a discriminative learning
phase.

And who is being fancy? It's not like those formulas are that confusing, are
they? :)

------
qeorge
Might it be simpler to penalize people who vote for an article which later
becomes flagged/killed?

Maybe a temporary ding on their votes' impact.
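A minimal sketch of qeorge's temporary ding (the decay window, weighting curve, and function names are all invented for illustration):

```python
import time

def vote_weight(bad_vote_times, now=None, window=7 * 24 * 3600):
    """Discount a user's future votes for each upvote they cast on a story
    that was later flagged/killed; each ding expires after `window` seconds
    (one week here; the window and the 1/(1+n) curve are made up)."""
    now = time.time() if now is None else now
    recent = sum(1 for t in bad_vote_times if now - t < window)
    return 1.0 / (1 + recent)

now = 100_000_000
assert vote_weight([], now=now) == 1.0             # clean record: full weight
assert vote_weight([now - 100], now=now) == 0.5    # one fresh ding halves it
assert vote_weight([now - 10**7], now=now) == 1.0  # old ding has expired
```

Making the penalty decay automatically keeps it a "temporary ding" rather than the permanent underclass other commenters in the thread worry about.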

------
swah
Since the decreased quality can only be perceived by a few (if everyone
noticed it, there would be no problem, right?) perhaps those could be selected
by the benevolent dictator to get 50-point votes or something? IOW,
moderation.

------
ThaddeusQuay2
Article quality, much like beauty, is in the eye of the beholder. Therefore,
quality assignment should be calculated specifically for a particular user,
based on all available criteria, using a method selected by said user.
Methodologies based on group-think or admin-think will always lead to a
measure of quality which is "ugly" or "bad" for someone, at some point. So,
center quality around what the user wants. In today's social networks, central
management of data quality is an absurd notion left over from the early days
of the Internet. It always leads to data deletion, user exclusion, or other
forms of censorship. All data should remain, but should be filtered, for each
user, based on what the user wants. To that end, the social network's job is
to provide more selection criteria, for all users, and better methods to put
that criteria to work, for each user.

------
raldi
Let's post some honeypot suggestions. I'll start with two potential honeypot
comments:

"Fuck Republicans."

"Fuck Democrats."

~~~
unabridged
yeah only take people who upvote both

