

Ask HN: Are you running a web crawler off the following IPs? It's broken - latitude

Fellas,

Whoever is running a web crawler on one of the following IPs, check your code. It is sending some garbage as the User-Agent string, e.g.

    RMID=007f010001c2513733c300ea;
    SERVERID=w2;
    SS_MID=672bfcf3-7c0e-492c-b4f5-c4006e444f5chdygi1wn;
    ss_lastvisit=1362572526119;
    hbfec=OePK5KX7vFq7S0Uk-9GxBTrsv9aAR0LR8BxQl+rkdBh+nV56OyysSNYKFbLQ-;
    gae_b_id=X2dhZV9iaW5nb19yYW5kb206NjFDcDJxVzZNRGxadHNGdEFmaTVOZ25...
    GOOGAPPUID=582;
    BBC-UID=556113279384de3e4f43d838f1fec2a6700c2f5db484512e9aa1244e...
    ...

Over 4KB of what appears to be some internal crawler's state or data. The IPs involved are:

    107.22.13.225        (amazonaws.com)
    207.97.227.247       (aux1-ext.rs.github.com)
    46.51.203.140        (amazonaws.com)
    54.235.37.145        (amazonaws.com)
    75.101.190.67        (amazonaws.com)

The crawler is accessing only one page on my server, the one linked from the HN front page, so I think it's someone from here.
======
latitude
Mods, stop it with the overzealous title editing, will you? Compare:

    HN: Are you running a web crawler off the following IPs? Fix it, it's broken.

to

    Ask HN: Are you running a web crawler off the following IPs? It's broken

This is _not_ "Ask HN", I wasn't asking and I don't want it to come across as
a question. If you feel like fixing capitalization, typos and punctuation -
fine, but then _stop_. If you edit in a way that affects the tone and style of
the title, then change the submitter name to your own! Because that title is
certainly no longer mine.

~~~
logn
Grammatically, "Fix it, it's broken" is incorrect. The correct punctuation is
a semicolon, not a comma (that's a comma splice). Alternatively, you could use
a colon ("Fix it: it's broken") or a conjunction (e.g., "Fix it, as it's
broken"). Anyhow, for hackers, "broken" and "fix it" are redundant.

Also, people occasionally use "Tell HN:" but almost never "HN:".

That's probably why it was edited.

------
ig1
At first glance, it looks like whatever it is, it's dumping its cookies into
the user agent. The "SS_" ones are Squarespace cookies, hbfec is the Humble
Bundle cookie, and BBC-UID is the BBC cookie.
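
If you want to spot this kind of request on your own server, here is a rough sketch of a heuristic that flags a User-Agent value that looks like a Cookie header, i.e. mostly `name=value` pairs separated by semicolons. The function name, the regex, and the thresholds are my own assumptions for illustration, not anything the crawler or a particular server provides.

```python
import re

# Assumption: a cookie pair is "name=value" with no whitespace or ';' in the value.
_COOKIE_PAIR = re.compile(r"^[A-Za-z0-9_.\-]+=[^;\s]*$")


def looks_like_cookie_header(user_agent: str, min_pairs: int = 3) -> bool:
    """Heuristic: True if at least 80% of the ';'-separated parts are name=value pairs."""
    parts = [p.strip() for p in user_agent.split(";") if p.strip()]
    if len(parts) < min_pairs:
        return False
    pairs = sum(1 for p in parts if _COOKIE_PAIR.match(p))
    return pairs / len(parts) >= 0.8


# A few of the untruncated values from the post above, joined as they were sent.
garbage = "RMID=007f010001c2513733c300ea; SERVERID=w2; ss_lastvisit=1362572526119; GOOGAPPUID=582"
# A typical browser User-Agent, for contrast.
browser = "Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0"

print(looks_like_cookie_header(garbage))
print(looks_like_cookie_header(browser))
```

Running it over the User-Agent column of an access log should separate the broken crawler from ordinary browsers, though a real User-Agent that happens to contain several `key=value` parts could trip it.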

------
orangethirty
It's not the Nuuton crawler.

