
Parsing Headers - CaliforniaKarl
https://vsoch.github.io/2020/url-headers/
======
montroser
> For fun, let’s inspect history.com and see what the heck all those headers
> are.

I don't know for sure, but those seem like they might be response headers from
app servers meant to be picked up on the way out by proxy servers for the
purposes of analytics, or possibly for applying caching rules.

Normally you'd have the proxy strip these out before sending the response back
to the end user.
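A minimal sketch of that stripping step, in plain Python (the header names and the `strip_internal` helper are illustrative, not taken from any particular proxy):

```python
# Hypothetical list of internal header prefixes a reverse proxy might
# strip before a response reaches the end user (names are illustrative).
INTERNAL_PREFIXES = ("x-cache", "x-backend", "x-debug")

def strip_internal(headers):
    """Drop headers whose names start with a known internal prefix."""
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith(INTERNAL_PREFIXES)
    }

# Headers as they come off the app server...
upstream = {
    "Content-Type": "text/html",
    "X-Backend-Server": "app-07",
    "X-Cache-Status": "HIT",
}
# ...vs. what the proxy would actually forward:
print(strip_internal(upstream))  # only Content-Type survives
```

In practice you'd do this in the proxy config itself rather than in application code, but the effect is the same: the analytics/caching breadcrumbs never leave the data center.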

------
woodruffw
Great writeup! It's always interesting to see what cruft is coming along for
the ride (and what makes it past the reverse proxies).

I did a similar (not nearly as well visualized) analysis of every domain
registered under the .nyc gTLD about a year ago[1]. Lots of servers running
old, vulnerable HTTPd/SSHd/FTPd services, even for a relatively new TLD.

[1]: [https://blog.yossarian.net/2018/11/11/Scanning-the-nyc-gtld](https://blog.yossarian.net/2018/11/11/Scanning-the-nyc-gtld)

------
cosmie
It's interesting to see how many sites are setting analytics-related IDs and
optimization/split-testing cookies.

I wasn't expecting that, since the dataset is purely the immediate response of
the request to the root domain/homepage, and doesn't include any additional
asset calls that would have occurred once the page rendered.

Which indicates all of that analytics and testing infrastructure is integrated
right smack-dab in the middle of the hot path of the server response.

As an analytics guy, it'd be fascinating to get a peek behind the curtain at
those architectures. Even more so if any of those implementations have gone
through a CCPA/GDPR compliance review.
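To make "hot path" concrete: the app server mints a visitor ID and attaches it as a `Set-Cookie` header on the very first HTML response, before any client-side JS has run. A rough sketch (the `_vid` cookie name and `make_visitor_cookie` helper are made up for illustration, not from any real analytics vendor):

```python
import uuid

def make_visitor_cookie(existing_cookies):
    """Return a Set-Cookie header value if no visitor ID is present yet."""
    if "_vid" in existing_cookies:
        return None  # returning visitor, nothing to set
    vid = uuid.uuid4().hex
    # One-year cookie, scoped to the whole site.
    return f"_vid={vid}; Path=/; Max-Age=31536000; SameSite=Lax"

print(make_visitor_cookie({}))               # new visitor: a Set-Cookie value
print(make_visitor_cookie({"_vid": "abc"}))  # returning visitor: None
```

Which is why those cookies show up even in a dataset built purely from the first response to the root domain.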

------
red_admiral
Neat little easter egg with the x-secret header. (It's there, but for some
reason Ctrl-F on [https://researchapps.github.io/url-headers/](https://researchapps.github.io/url-headers/) won't find it.)

~~~
FreeFull
It's because the whole graph is a single canvas element, so there's no
selectable/searchable text for Ctrl-F to match.

------
gsnedders
[https://httparchive.org](https://httparchive.org) has a much larger dataset,
if you want to start playing around with something truly vast.

