
Ask PG: Are there IP blocks? - halostatue
I feel a little silly submitting this via Ask, but I don't see any other way to ask this.

Yesterday, I was no longer able to reach HN via my work wifi. I can ping HN and get responses, but the web server explicitly sends nothing back:

    % curl http://news.ycombinator.com
    curl: (52) Empty reply from server

I can still see HN through other networks (I'm doing this currently through my wifi hotspot), and apparently am still active.

Has a block been placed on my work network? I have captured the work network's root IP address if it's needed.
======
pg
Yes, we block IPs that seem to be crawlers ignoring robots.txt. We've always
blocked abusive IPs, but I tightened up the blocking a few weeks ago. A lot of
people were crawling HN, most of them unnecessarily because they were doing
things they could have done more efficiently through HNSearch's API.

Some users may remember that the site had gotten really slow a few weeks ago.
One of the reasons it's faster now is that we cracked down on crawlers.
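
For anyone writing a crawler, a minimal sketch of checking robots.txt before fetching, using Python's standard `urllib.robotparser`. The rules in `SAMPLE_ROBOTS` and the agent name `mybot` are made up for illustration; they are not HN's actual robots.txt, and nothing here reflects how HN decides whom to block.

```python
import urllib.robotparser

# Hypothetical robots.txt rules, for illustration only (not HN's real file).
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 30",
]

def load_rules(lines):
    """Build a parser from robots.txt lines that have already been fetched."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(lines)
    return rp

def allowed(rp, agent, url):
    """Return True if the rules permit `agent` to fetch `url`."""
    return rp.can_fetch(agent, url)

rp = load_rules(SAMPLE_ROBOTS)
print(allowed(rp, "mybot", "http://example.com/item?id=1"))
print(allowed(rp, "mybot", "http://example.com/private/page"))
# Seconds to wait between requests, if the file specifies a Crawl-delay.
print(rp.crawl_delay("mybot"))
```

In a real crawler you would call `set_url()` and `read()` on the parser to fetch the site's own file, and sleep for the crawl delay between requests.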

~~~
ronnier
I think it's time I shut down <http://ihackernews.com>. It's getting
impossible to keep the IP address from being blocked. Really unfortunate,
because a lot of people like to use the site.

~~~
pg
If you tell me the ip addr I can whitelist it. pg@ycombinator.com

~~~
ronnier
Wow, thanks! Email sent.

------
Pent
Beware of a Chrome extension called "Hacker News Sidebar"; it presumably got
me IP-banned this week. It cross-checks every page you visit against HN to see
whether it has a thread and, if so, displays the thread.

Here is the extension:
[https://chrome.google.com/webstore/detail/hhedbplnihmkekhgma...](https://chrome.google.com/webstore/detail/hhedbplnihmkekhgmaoikgfbkjjaocnl)

~~~
halostatue
Hmmm. I think I have that extension installed, too. Out it comes.

------
bialecki
I got blocked two weeks ago when I was playing with creating a "realtime" view
of comments so you didn't have to refresh the page. To test I had it polling
one story every five seconds and I think I left it running overnight. (Sorry
about that.)
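
A fixed five-second poll left running overnight adds up to thousands of requests. One common way to soften that is to back off when nothing has changed. The sketch below is only an illustration: the function name, the 60-second base, and the one-hour cap are assumptions, not anything HN documents as acceptable.

```python
def next_delay(current, changed, base=60.0, cap=3600.0):
    """Return the number of seconds to wait before the next poll.

    Resets to `base` when the last poll saw new content; otherwise
    doubles the previous delay, never exceeding `cap`.
    """
    if changed:
        return base
    return min(current * 2, cap)

# Example: three quiet polls in a row, then one that finds new comments.
delay = 60.0
for saw_new_comments in (False, False, False, True):
    delay = next_delay(delay, saw_new_comments)
    print(delay)
```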

Next day, no HN, so I spent the next week browsing HN on Firefox with a proxy
setup through an EC2 instance. Thankfully, my IP changed or the ban is gone.

For what I was doing the HNSearch API wouldn't have helped, but if there was
an API like the one at ihackernews.com that's running and live, that'd be
great.

------
sixtofour
I was automatically banned a while ago, after doing something silly (checking
_all_ my bookmarks for dead links). The explanation was that the server
thought I was DDOSing. It was OK after a week or so. Maybe it'll work out for
you too.

~~~
noahc
I was doing something even more silly and was banned for 48 hours or so.

------
tlogan
Question: Is there any software that can be easily installed on Apache or
inside an app to detect crawlers?

We use ipban, but that is not what we want: we want a system that can easily
detect a "bad" crawler or an "abusing" user and ban them for some time.

As of now, we have a simple script going through the Apache logs and sending us
a list of IPs and their activity.
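
A rough sketch of the log-scanning idea, for illustration only: it flags any IP that makes more than `max_hits` requests within a sliding time window. It assumes Apache's common/combined log format and a log in chronological order; the function name and thresholds are hypothetical, and it only reports IPs rather than banning them.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Matches the client IP and the bracketed timestamp of a common-format line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"

def find_heavy_clients(lines, max_hits=100, window=timedelta(seconds=60)):
    """Return the set of IPs exceeding `max_hits` requests within `window`."""
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        ip, stamp = m.groups()
        when = datetime.strptime(stamp, TIME_FMT)
        hits = recent[ip]
        hits.append(when)
        # Drop requests that have fallen out of the sliding window.
        while when - hits[0] > window:
            hits.popleft()
        if len(hits) > max_hits:
            flagged.add(ip)
    return flagged

sample = [
    '10.0.0.1 - - [10/Oct/2011:13:55:%02d +0000] "GET / HTTP/1.1" 200 512' % s
    for s in range(5)
]
sample.append('10.0.0.2 - - [10/Oct/2011:13:55:00 +0000] "GET / HTTP/1.1" 200 512')
print(find_heavy_clients(sample, max_hits=3))
```

Feeding the flagged set into a firewall rule or a temporary deny list is left out on purpose, since that part depends on the server setup.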

~~~
cft
We use fail2ban (available as an Ubuntu package) very successfully for this.
You can point it at the Apache log and fine-tune the rules by browser string,
URL or whatever you want.

------
ronnier
At Amazon, it's not uncommon for both ycombinator and twitter to be
unreachable because of our IPs being blocked.

------
rwl
I also get empty responses when I try to browse HN over Tor. I assume this is
because my IP address looks like a spammer's. I, too, would like to know if
there are IP blocks, and what (if anything) legitimate users can do to get
around them.

~~~
pg
There were some people doing some bad stuff over Tor a while ago, so we banned
all the Tor exit nodes they were using. They seem to have given up so I'll try
unbanning these IPs.

------
ctide
Our work got blocked as well this week, it seems. I can still browse HN via
https though. I don't know if that's intentional or not. :)

Have you tried loading: <https://news.ycombinator.com/>?

~~~
mjdwitt
That sounds more like your work added some new rules to their filter that ban
some of the content here. Using https would encrypt that content and keep
the filter from scanning it. Now you just have to hope that they don't ban it
by URL.

~~~
ctide
No, I can assure you that's not the case, unless Comcast decided to do it. It
appears to be fine now, but for this week we were getting 0-byte responses
from port 80, while everything worked fine over HTTPS.

------
tonyarkles
Hi,

Is there a proxy at work? I was experimenting with an HTTP proxy as part of my
thesis work a few weeks ago and found similar results. I didn't end up ever
solving my problem though...

------
ajju
Isn't it likely that the admins at work blocked HN via a proxy? A few years
ago my comments to HN were being mangled by a web proxy at work.

~~~
halostatue
I don't think there's a web proxy—but I could be wrong. Our sites themselves
are sites that might be blocked by your average proxy (no, it's not gambling
or porn).

I am also checking that hypothesis, but news.yc is the only site returning an
empty result from my normal reading list.

Until this morning, I had a hn-related Chrome extension installed that could
have been misbehaving.

------
SwaroopH
Our primary ISP (BSNL NIB) at work is entirely blocked as well. Bharti Airtel
is able to get access.

------
tomh-
I have a similar problem visiting HN from my mobile network. I get a 502
response then.

