

Fun With HTTP Headers - parenthesis
http://www.nextthing.org/archives/2005/08/07/fun-with-http-headers

======
qw
The ultimate header is produced by a newspaper. Try running this line and see.
The header it returns is random, so you might want to try it a few times :-)

    
    
       curl -i -s --head http://www.vg.no/ | grep N

~~~
wozer
Use <http://web-sniffer.net/> if you are too lazy to install curl:
[http://web-sniffer.net/?url=vg.no&submit=Submit&http...](http://web-sniffer.net/?url=vg.no&submit=Submit&http=1.1&type=GET&uak=0)

------
aristus
The "Cneonction: close" thing is a quirk of Netscaler load balancers. It's done
to nullify any "Connection: close" headers the web server spits out, since the
Netscaler wants to manage connections itself. The header is scrambled instead
of removed so that the Netscaler doesn't have to regenerate packets (the length
stays the same), and it's scrambled semi-randomly so that people don't just
assume it's a misspelling and add compatibility for it.

~~~
axod
Interesting, I wonder why they didn't go with something more self explanatory:

    
    
      Connection: -> X-Ignore-X:

~~~
seekely
X-Ignore-X is longer than close, which I suppose would mess up the packet
length. Or maybe having an unrecognized value for the Connection key would
still default to a close? Just guessing here.

~~~
limmeau
TCP checksums are fairly simple; a TCP stack basically just adds up the 16-bit
words in a packet (a ones' complement sum) and stores the complemented result
in the checksum field. Because addition doesn't care about order, this will
not detect 16-bit words being swapped around.

My guess is that the load balancer tried to invalidate the header while
preserving the TCP checksum.
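
A quick way to check that guess is to compute the checksum over both spellings. The sketch below is only an illustration: `internet_checksum` is a name made up here, it implements the ones' complement sum described in RFC 1071 over a bare byte string, and it ignores the TCP pseudo-header and the rest of the packet. Note that the scramble moves single bytes rather than whole 16-bit words, but each byte stays at an offset of the same parity, which is enough to leave the sum unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement sum of 16-bit big-endian words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


original = b"Connection: close\r\n"
scrambled = b"Cneonction: close\r\n"
replaced = b"Xonnection: close\r\n"  # hypothetical alternative, for contrast

# Same length and same checksum contribution for the scrambled form.
print(len(original) == len(scrambled))
print(internet_checksum(original) == internet_checksum(scrambled))
# Overwriting a byte with a different one changes the sum.
print(internet_checksum(original) == internet_checksum(replaced))
```

If the checksum is left alone, a rewrite like the third line would force the load balancer to patch the checksum field too, whereas the scrambled spelling would not.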

~~~
axod
Ah yes, forgot about the checksum field :) thx

------
dhimes
_X-No: I will not give you a job for poking at my headers. But nice try._

------
tptacek
The simplest explanation for "OCR is watching you" is an old web attack called
"HTTP Response Splitting"; it happens when a server generates headers (like a
Referer) based on user input, but doesn't escape out newlines.
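
To make the attack concrete, here is a small sketch of a response builder that copies user input into a header, plus a guard that refuses CR and LF. The function names (`naive_redirect`, `checked_header_value`, `safe_redirect`) and the `X-Injected` header are invented for illustration; real frameworks have their own checks.

```python
def naive_redirect(target: str) -> str:
    # Vulnerable: user input goes straight into the header block.
    return f"HTTP/1.1 302 Found\r\nLocation: {target}\r\n\r\n"


def checked_header_value(value: str) -> str:
    # Reject anything that could end the current header line early.
    if "\r" in value or "\n" in value:
        raise ValueError("CR or LF in header value")
    return value


def safe_redirect(target: str) -> str:
    return naive_redirect(checked_header_value(target))


# Hypothetical attacker input: a path followed by a smuggled header.
attack = "/home\r\nX-Injected: yes"

# The naive version now emits an extra header the server never intended.
print(naive_redirect(attack).split("\r\n"))

try:
    safe_redirect(attack)
except ValueError as err:
    print("rejected:", err)
```

Rejecting the value outright is one option; stripping or percent-encoding the newlines is another, depending on what the header is for.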

------
jmtame
certainly easier to run this in terminal. the firefox way (a bit more tedious,
but nice if you're already on the website):

open firebug, enable net monitoring, look at the very first GET request, check
out the headers, scan for sense of humor.

\---

stick this in your controller:

    headers["We-Are-Uh-Meh-Zing"] = "true on sunny days"

------
euroclydon
I have a web scraping script written in Python that I want to make multi-
threaded. It scrapes web pages from a list, and enters results into a DB. Can
someone (the author maybe) show me a simple example of how to make a multi-
threaded Python script?

~~~
lacker
Check out the docs on the "multiprocessing" or "threading" module. In
particular multiprocessing.Pool is handy for controlling the number of
parallel things you have going on at once.

    
    
      # Assume we have functions GetUrls() that retrieves a list
      # of the urls we want to get, and Download(url) which
      # downloads the content of a url and sticks it in the
      # database.
    
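      import multiprocessing
      # The __main__ guard keeps worker processes from re-running this
      # block on platforms that start them by re-importing the script.
      if __name__ == "__main__":
          pool = multiprocessing.Pool(processes=100)
          urls = GetUrls()
          pool.map(Download, urls)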
    

See:

<http://docs.python.org/library/multiprocessing.html>

<http://docs.python.org/library/threading.html>

Also, if you're entering results into a database, the easiest way may be just
to spawn multiple python processes from the command line.
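
If you specifically want threads rather than processes, a thread pool is usually enough for a scraper, since the work is mostly waiting on the network. This is a minimal sketch using the standard `concurrent.futures` module; `scrape_all`, `fetch`, and `fake_fetch` are hypothetical names, and the database write is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every url on a pool of threads.

    Returns a dict mapping each url to whatever fetch returned.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input urls.
        return dict(zip(urls, pool.map(fetch, urls)))


def fake_fetch(url):
    # Stand-in for a real download function, so the sketch runs offline.
    return "content of " + url


pages = scrape_all(["http://a.example/", "http://b.example/"], fake_fetch)
print(pages)
```

Doing the database inserts in the main thread, after the pool returns, avoids sharing a DB connection between threads.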

------
joshuaxls
If you're using curl to poke around headers, as the author suggests, use the
-I flag instead of the -i flag. -I gives you the headers only.

~~~
wooster
That works, however some sites return different headers depending on whether
they get a HEAD or GET request. -I sends a HEAD request, while -i sends a GET
request.
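
To see whether a given site does this, you can send both request types and compare what comes back. A rough sketch with Python's standard `http.client`; `fetch_headers` and `header_diff` are made-up helper names, it only speaks plain HTTP, and repeated header names are collapsed to the last value.

```python
import http.client


def fetch_headers(host, method="HEAD", path="/", port=80, timeout=10):
    """Send one request and return (status, headers as a dict)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        resp.read()  # discard the body, if there is one
        return resp.status, dict(resp.getheaders())
    finally:
        conn.close()


def header_diff(host, path="/", port=80):
    """Return {name: (head_value, get_value)} for headers that differ."""
    _, head = fetch_headers(host, "HEAD", path, port)
    _, get = fetch_headers(host, "GET", path, port)
    names = set(head) | set(get)
    return {n: (head.get(n), get.get(n))
            for n in names if head.get(n) != get.get(n)}


if __name__ == "__main__":
    # Hypothetical target; substitute the host you are poking at.
    print(header_diff("www.example.com"))
```

Headers such as `Date` can differ between any two requests, so a non-empty result is a hint to look closer rather than proof.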

------
kubrick
_Speaking of P2P technologies, I was interested to run across a KaZaA server:

HTTP/1.0 404 Not Found
X-Kazaa-Username: anonymous_user
X-Kazaa-Network: KaZaA
X-Kazaa-IP: xx.xx.xx.xx:1348
X-Kazaa-SupernodeIP: xx.xx.xx.x:3699

It looked like it was running on someone’s DVR. Anyone have any pointers as to
what software does that?_

Uh, Kazaa does that. Not a DVR.

~~~
wooster
IIRC, it was full of recently-recorded TV shows. I hypothesized someone was
running some DVR software that also shared the shows on the Kazaa network.

~~~
ars
I see you watch your referrer log. Created an account here just to reply?
Welcome to the site.

