
How to get your IP unbanned on HN - pg
HN has a hair trigger about banning IPs that request too fast (sorry about that; we don't have a lot of spare performance), so I wrote something people can use to get their IP unbanned once if it gets banned by accident.

http://news.ycombinator.com/unban?ip=<ip address>

Obviously you have to use it from another IP address, like your phone.
======
ck2
Use iptables' xt_connlimit module to regulate overly aggressive client requests.

Since there are so few images on HN, there is no reason to have more than a
couple connections per IP on port 80.

It will radically reduce your server load and there will be no
blacklists/whitelists to maintain.
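A rule of roughly this shape would do it (an illustrative sketch, not a tested config - the port and the limit of 2 are assumptions taken from the comment above):

```shell
# Illustrative xt_connlimit rule: drop new connections to port 80
# from any source IP that already has more than 2 open. Dropped
# SYNs are retransmitted by the client after a short delay, so the
# effect is throttling, not a ban.
iptables -A INPUT -p tcp --syn --dport 80 \
  -m connlimit --connlimit-above 2 -j DROP
```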

~~~
pandemicsyn
...except people also browse HN from behind corporate gateways/firewalls.

~~~
ck2
It won't ban them, it just throttles simultaneous connections from the same
IP.

Unless there are dozens of people from the exact same IP and not an IP pool,
it won't be a problem.

Worst case, they will see the initial page load in their browser delayed by
half a second. But it helps the server tremendously, especially since HN seems
to use Apache.

~~~
cbhl
In this age of NAT and IPv4 address exhaustion, it's not uncommon to have
dozens of people from the exact same IP.

~~~
ck2
...and if they are all hitting HN at the exact same millisecond, then their
connections should be delayed.

HN serves with connection-close, not keep-alive, so as soon as one request is
done, the connection is freed for the next visitor on the same IP. This would
just force them to be in single file on a very quickly moving line instead of
requiring dozens of connections to be served all at the same time.

Think of a grocery store with one super-fast express lane vs. no express lane
and a dozen very slow cashiers with people with full carts ahead of you.

Don't knock connlimit until you try it. Again, it's not a ban, just backlogs
the requests.
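The single-file argument can be made concrete with a back-of-the-envelope calculation (the 50 ms per-request service time is an assumed figure, not a measurement of HN):

```python
# With connlimit, requests from one IP are served one at a time.
# If each request takes service_time seconds, the request in queue
# position k completes after (k + 1) * service_time - a short,
# bounded delay rather than a pile of simultaneous connections.

SERVICE_TIME = 0.05  # assumed per-request time, in seconds

def worst_case_wait(queue_position, service_time=SERVICE_TIME):
    """Time until the request at this 0-based queue position completes."""
    return (queue_position + 1) * service_time

# A dozen people behind one NAT, all clicking at once:
waits = [worst_case_wait(k) for k in range(12)]
print(round(max(waits), 3))  # the last person waits 0.6 s
```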

~~~
cbhl
That sounds _better_, but it feels like a band-aid solution to me. For
example, I worry about whether it will actually fix the load problems if a bad
network has lots of requests, resulting in a very long queue and lots of open
connections. It sounds like it's worth trying, at least.

------
smanek
pg: I have a fair bit of Lisp dev experience. If, as a weekend project, I
modified the HN src to use postgres and memcache would you consider using it
in production? Obviously, I don't expect carte blanche prior agreement, but I
wouldn't want to invest the time unless I thought it was plausible the work
could actually help.

I would expect it to solve most of your performance problems for the
foreseeable future (at the very least, by letting you scale horizontally and
move the DB, frontends, and memcaches to separate boxes - plus ending memory
leaks/etc by moving most of the data off the MzScheme heap).

The obvious downside is that it would use your (or someone at YC's) time.
First to merge the changes I make to <http://ycombinator.com/arc/arc3.tar>
into the production code, then to buy/setup some extra boxes and do the
migration. We're probably talking, roughly, a day. It also has the unfortunate
side effect of costing HN's src some of its pedagogical value, since it adds
external dependencies and loses 'purity'.

Been looking for an excuse to learn arc for a while now ...

~~~
marcusmacinnes
I suspect there's good reason why HN is still using this old codebase. YC
after all is not short of the cash needed for a complete revamp.

The site is very much _hacked together_, but it works... In a lot of ways, this
reflects the hacker ethos of getting something up and running quickly at low
cost while still producing value.

A revamp might have negative impact too by attracting a wider, more mainstream
audience which could possibly dilute the purity of the community here.

~~~
Osmium
> dilute the purity of the community here

Careful now :) It's not like there's anything stopping HN from attracting a
wider audience anyway; there's no restriction on who can register. Anyone can
come and join in, which (in my opinion) is as it should be.

~~~
marcusmacinnes
Of course. I'm not suggesting that there should be any limitations on who can
join, but as the community moves more mainstream, quality will be diluted. As the
site is rather un-sexy right now, it seems to attract those who are genuinely
interested. Remember what happened to Digg...

------
dylanpyle
Doesn't sanitize HTML, FYI - may leave you open to XSS.

~~~
pg
Ack, what was I thinking? Fixed. Thanks!

~~~
tptacek
The same thing every smart developer who ever committed or deployed a line of
vulnerable code thought: "I'm just trying to get this feature done, not write
a formal proof". You're in good company.

~~~
javajosh
It makes me think that one non-negotiable feature of any webapp architecture
is to detect situations when inbound strings are placed in any context where
they can be interpreted as code, and either refuse to run or at least spit out
a severe warning.

And there are no webapp architectures which do this.
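As a sketch of what such escape-by-default behavior might look like (a hypothetical mini-renderer, not any real framework's API - the `Unsafe` marker and `render` helper are made up for illustration):

```python
import html

class Unsafe(str):
    """Marker for strings explicitly vouched for as safe HTML."""

def render(template, **values):
    """Interpolate values into template, HTML-escaping everything
    that has not been explicitly marked as safe."""
    escaped = {
        k: v if isinstance(v, Unsafe) else html.escape(str(v))
        for k, v in values.items()
    }
    return template.format(**escaped)

# Inbound user data cannot be interpreted as code by accident:
print(render("<p>{comment}</p>", comment="<script>alert(1)</script>"))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```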

~~~
samstokes
Yesod (a Haskell web framework) tries pretty hard. e.g.
<http://www.yesodweb.com/book/shakespearean-templates>

~~~
javajosh
Cool, thanks for that.

(My hobby: posting "nothing like X exists" in a Hacker News thread. :)

------
someone13
Do you have a rough set of guidelines for how fast we should request from HN?
For a side project, I was thinking of writing something that scraped the HN
frontpage and all the associated comment threads every 10 minutes or so, and
I'd rather not cause performance issues or get banned. I'd be happy to rate-
limit requests to whatever is convenient.
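In the absence of official guidance, a scraper can at least enforce its own minimum spacing between requests. A minimal sketch (the 5-second interval is an assumption, not an HN-sanctioned figure):

```python
import time

class RateLimiter:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval=5.0):
        self.min_interval = min_interval
        self._last = None

    def wait(self, now=None, sleep=time.sleep):
        """Block until min_interval has passed since the last call."""
        now = time.monotonic() if now is None else now
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                sleep(remaining)
                now += remaining
        self._last = now
        return now

# Usage in a scraper loop (URLs illustrative):
#   limiter = RateLimiter(min_interval=5.0)
#   for url in front_page_urls:
#       limiter.wait()
#       page = urllib.request.urlopen(url).read()
```

Injecting `now` and `sleep` keeps the timing logic testable without real delays.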

~~~
unreal37
May be better to use the official API.

<http://www.hnsearch.com/api>

~~~
laumars
That's not an official API: <http://www.hnsearch.com/about>

Quote: "HNSearch was built by the team at ThriftDB to give back to the
community and to test the capabilities of the ThriftDB flexible datastore with
search built-in."

Interesting API all the same though.

~~~
zargon
Regardless of whether it is official or not, it is pg's preferred API:
<http://news.ycombinator.com/item?id=4694308>

------
freditup
I'm curious as to why HN would be walking such a performance tightrope. I could
speculate, but it would be uninformed rambling, so I'd love it if someone more
knowledgeable than I could explain.

~~~
grinich
It's a side project by a couple of guys with full-time jobs, written in an
experimental Lisp dialect and running on a single machine.

~~~
akkartik
The last bit is key. HN is served off flat files, and caches state in-memory
in global variables. That -- and not cost -- makes it hard to add a second
machine.

~~~
sneak
It also makes it nearly impossible to slowly read one's own comment history,
as the "next" pagination links depend on session data that is garbage
collected quite frequently.

This is, quite possibly, the worst webapp I use on a regular basis.
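The mechanism behind those dying links can be sketched like this (a toy model of closure-based pagination, not HN's actual code - the tiny eviction cap is just to make the behavior visible):

```python
import random
import string

# Each "More" link points at a server-side closure kept in memory.
# When memory runs low, old closures are evicted - and their links die.
_continuations = {}
MAX_LIVE = 2  # deliberately tiny cap, to make eviction easy to see

def make_link(items, offset, page_size=30):
    """Store a closure that renders the next page; return its link id."""
    key = ''.join(random.choices(string.ascii_letters, k=8))
    _continuations[key] = lambda: items[offset:offset + page_size]
    # Evict the oldest continuations once we're over the cap
    # (dicts preserve insertion order, so the first key is oldest).
    while len(_continuations) > MAX_LIVE:
        _continuations.pop(next(iter(_continuations)))
    return key

def follow_link(key):
    closure = _continuations.get(key)
    return closure() if closure else "Unknown or expired link."
```

Browse slowly enough that new visitors' links push yours out, and `follow_link` greets you with the familiar error.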

~~~
vacri
Thank you for explaining why the 'unknown link' error happens at all. It's
terrible - the same as when you spend time thinking about and formulating a
response, only to see it disappear with the same error.

~~~
TeMPOraL
The back button usually helps in recovering the response.

~~~
akkartik
Yeah, modern browsers are great at not losing form data. When I hit this error
I hit back, then reload, then resubmit with full confidence that my comment
will be unharmed.

So HN doesn't bother me so much. It's the #$%# smartypants webapps that try to
reinvent textareas in javascript that piss me off. I'm looking at you, Quora
and new Gmail compose.

~~~
csense
> modern browsers are great at not losing form data

I wouldn't know. I had a bad experience with a lost webmail ~7 years ago. As a
result, I ALWAYS copy form data to the clipboard before hitting Next.

~~~
hnriot
Maybe it's time to let it go... 7 years ago is the dark ages in browser time.
You are clearly the Adrian Monk of web users :)

~~~
csense
If the old ways work well enough, why bother to change?

I still call my Windows scripts .bat files (instead of .cmd, that's clearly
for OS/2 programs).

It was only in the last three or four years that I stopped naming my files
with all-uppercase names not longer than eight letters, with an extension not
longer than three letters, to be sure they would be compatible with a FAT16
filesystem.

I'm rather distrustful of GUIs for doing things like moving or copying files.

I never drag-and-drop files into programs, partly because I seldom use GUI
file managers, but mainly because most programs didn't support the metaphor
when Windows 95 first came out, and I haven't bothered to check if things have
gotten better yet.

Given these facts, you might find it surprising to learn that my age is less
than 30.

------
malandrew
Awesome. I've gotten my IP banned several times after the browser crashed and
I reopened the tabs (I had too many HN threads open prior to the crash -
enough to trigger the ban).

~~~
saurik
Yeah... if I open Chrome I am pretty much guaranteed to be banned for days. :(
The mechanism should really be changed to account for this: a ton of requests
per second for only a few seconds should not trigger an issue, it should be a
number of requests per second spike along with some sustained usage per
minute. I actually made modifications to Chrome to change how it loads tabs
mainly because of Hacker News' weird IP ban system, but I still got burned
recently as I accidentally hit "undo close tab" one too many times, which
reopened an entire window.
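The policy saurik suggests - tolerate a brief spike, flag only sustained load - is essentially a token bucket. A sketch, with entirely made-up numbers:

```python
class BurstTolerantLimiter:
    """Token bucket: allows short bursts (e.g. a browser restoring
    tabs) but flags clients whose sustained rate stays too high."""

    def __init__(self, rate=1.0, burst=30):
        self.rate = rate            # tokens replenished per second
        self.burst = burst          # bucket capacity = tolerated burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        """Return True if a request at time `now` is within policy."""
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A tab restore fires dozens of requests in one instant and stays within the burst allowance; only a client that keeps hammering after the bucket drains gets flagged.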

~~~
ars
In Firefox, turn on the option "Don't load tabs until selected". I don't see
this option in Chrome.

It speeds up browser startup dramatically. Especially when you leave lots of
tabs open as your "to read" list.

~~~
saurik
Yeah: I ended up figuring out a way to add it. I now generally like having the
feature, but it was a complete necessity due to the Hacker News IP ban rules
(although, as I mentioned, it still doesn't solve the underlying problem for
this site, which is incredibly touchy).

<http://news.ycombinator.com/item?id=4717730>

------
evx
In my experience the banning is too strict.

It is triggered very quickly and it seems to last forever (maybe 15min would
be better?).

I ask pg to kindly consider making it a bit more lenient.

I doubt HN comes under deliberate/malicious attacks, etc...

I'm making an HN extension that preloads some data, such as the comments and
the links on the next page (still with reasonable delays).

But at the moment it's impossible for it to function without risking the user
getting banned.

~~~
EwanToo
I've no doubt that HN is under pretty much constant deliberate, malicious
attacks.

Pretty much any site with decent traffic is under constant attack, and the
high profile of HN means it'll be under far more scrutiny than others.

------
nkurz
Repost from "Show dead" that relates to this issue:

 _[−]sunstone1 10 hours ago | link [dead]_

 _Well I never had my IP banned but I did have my account hell banned after
about a dozen posts as you can see. Oh, actually no, you can't see, because
it's banned. No, I never bothered to get another account, now I'm just a taker
not a giver._

Most of the time it's clear why a user was banned, but looking at sunstone's
history I don't really see a reason. While the algorithm will never be
perfect, it would be nice if there were a clearer solution for misfires.

------
CWIZO
Great news! I was banned last week
(<http://news.ycombinator.com/item?id=4736919>); the ban was lifted in the
meantime. But this will come in handy the next time I'm developing an
extension for HN and refreshing it all the time :)

------
gprasanth
Oh! The benefits of dynamic IPs.. :)

------
mindslight
Well, I might as well try striking while the code is hot...

It occurs to me that I would like to interact with noprocrast in a different
manner. Currently, I leave noprocrast disabled most of the time. I like to use
longish minaway times (~day), but this makes me feel as if my first visit to
HN will start the clock ticking, and I'd better be sure to get my HN fill
before the timer runs out (yes, this is kind of ridiculous). So I only enable
noprocrast (with a short maxvisit) upon realizing I'm stuck in a web loop.

The mechanism that I envision is either a button that immediately starts a
one-shot noprocrast ban, or a page-count based maxvisit. The latter might be
better since it could always be left enabled.

------
exolxe
Interesting fix. Is the trigger weighted based on request action type or user
karma?

------
ddod
Thanks Paul! I'm reluctant to try this in conjunction with developing any HN
scrapers since I'm not sure what set it off in the first place and your
language suggests it will only unban the IP once (I will, however, make sure
the CMU IP I was using gets unbanned). It would be helpful to know what,
precisely, that hair trigger is so we can make sure to avoid it.

