

Adding GIT SHAs to your application's HTTP headers - andrewvc
http://blog.andrewvc.com/making-life-easier-with-git-shas-in-your-http

======
bradgessler
Using a header isn't quite right. Just implement a URL in your app at
/_revision that returns the HEAD commit. Then write a git-diff-prod command
like this:

    
    
      #!/bin/sh
      git diff $(curl -s "http://www.yourapp.com/_revision")
    

Get your paths squared away, chmod +x it, then type in:

    
    
      git diff-prod
    

And boom! Instant diff with head in production.
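One caveat with a script like that: if the endpoint is down, curl prints nothing and you silently diff against an empty ref. A hedged variant with basic error handling (the URL and variable name are illustrative, not from the comment above):

```shell
#!/bin/sh
# Hypothetical hardened git-diff-prod: abort instead of diffing
# against an empty string when the /_revision endpoint is unreachable.
# REVISION_URL is an illustrative name; point it at your own app.
REVISION_URL=${REVISION_URL:-"http://www.yourapp.com/_revision"}

diff_prod() {
  rev=$(curl -fs "$REVISION_URL") || {
    echo "error: could not fetch $REVISION_URL" >&2
    return 1
  }
  git diff "$rev"
}
```

The -f flag makes curl exit non-zero on HTTP errors, so a 500 from production fails loudly instead of producing a bogus diff.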

Of course, you might want to get your deploy scripts squared away to figure
out why your servers aren't restarting, if that's the reason for this.

~~~
andrewvc
Yeah, that was the original reason I implemented it (since fixed), but I just
found having it live was too useful to take away.

------
morganpyne
While it's a neat little trick, I'm not so sure I like this. This is adding
unnecessary headers and traffic overhead. It doesn't even address some of
the points the author mentioned, e.g. "someone deployed something in a weird
way and bypassed the normal deployment process" (if somebody is doing this you
have bigger problems anyway). If you are managing your releases in a sensible
and automated way then you will also typically have an easy way to check which
version is currently deployed or running which doesn't involve adding more
baggage to the HTTP headers.

~~~
andrewvc
True, but I'm not worried about the extra bytes of an HTTP header.

It's actually really useful when coupled with Rails+Unicorn w/ downtimeless
deploy. Every once in a while unicorn will run a deploy but stop picking up
code changes, and this is a good way to verify that your deploy didn't go all
the way through.

Additionally, it integrates with shell scripts more easily than most other
solutions. Lastly, if you're really worried about the header overhead, you
don't need to make it an HTTP header; you can just make a special /git_sha
page that renders it out.

Additionally, GIT_SHAs are good for cache busting in some situations (though
totally inappropriate for others).
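The cache-busting use could be sketched like this (the helper name and asset paths are illustrative, not from the comment): embed the deployed SHA in asset URLs so every deploy changes the URL and forces exactly one fresh fetch per release.

```shell
#!/bin/sh
# Illustrative SHA-based cache busting: each deploy changes the query
# string, so browsers refetch the asset once per release and then
# cache it indefinitely.
asset_url() {
  sha=$(git rev-parse --short HEAD)
  echo "/assets/$1?v=$sha"
}
```

For example, `asset_url app.js` might print something like `/assets/app.js?v=1a2b3c4`, with the suffix changing on every deploy.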

~~~
marcc
That's a really good point. Doing this will invalidate browser cache on most
sites whenever you do a deployment. If you are the type that does production
deployments throughout the day, then this could slow down the user experience.
However, maybe you want an easy-to-use javascript cache reset tool for every
deployment.

------
listrophy
Neat, but kinda overkill to send it on every request. If you're doing Rails
(as the article is), try out pig ( <https://github.com/bendyworks/pig> ) and
go to /revision on your site.

Does not work with Heroku, since they strip the .git folder... we should
probably add that to the README.

------
sophacles
You know, the other day I was lamenting the sorry state that is caching on the
web. I decided a pretty good protocol would be similar to this: make a
standard http header that includes a SHA of the content for that url. That way
if the content changes the SHA changes, and you get really good cacheability
without the worries of content timeouts, etc.[1] Further, cache-control
pragmas/headers/etc. don't accidentally get misconfigured in a bad way. Caches
just send a header request upstream and, if the hash hasn't changed, blast the
content over the presumably faster/cheaper local link.

[1] They could still exist for hints or what not, but I presume reasonably
advanced caches will be able to heuristically figure out how long the content
lasts and do things like start sending results while waiting on the header to
return... with some sort of reset if it turns out to be a bad cached result.
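The proposed scheme could be sketched roughly like this (sha256 is an assumed choice for the "standard" hash, and the two files stand in for the cached and upstream copies of a URL's body):

```shell
#!/bin/sh
# Sketch of content-hash revalidation: a cache decides whether to serve
# its local copy by comparing content hashes rather than trusting
# expiry headers.
content_hash() {
  sha256sum "$1" | cut -d' ' -f1
}

revalidate() {
  # $1 = cached copy, $2 = upstream copy (stand-ins for HTTP bodies)
  if [ "$(content_hash "$1")" = "$(content_hash "$2")" ]; then
    echo "304 serve-from-cache"
  else
    echo "200 refetch"
  fi
}
```

With a hash that is standardized across sites, a cache could in principle also recognize identical content fetched from different URLs.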

~~~
prodigal_erik
Are you describing <http://en.wikipedia.org/wiki/HTTP_ETag> or something
different?

~~~
sophacles
Apparently ETag, but you know, with people actually using it, and a
standardized hashing mechanism (to reduce duplication of common elements
across sites).

(honestly I had never even heard of it before just now).

~~~
riffraff
Common elements are shared by using a single source (e.g. Google's CDN for
JavaScript), but standardizing makes little sense imvho. For one, you can use
an ETag without generating the page and hashing it if it only depends on a
db-stored resource, and that is mightily application dependent.

~~~
sophacles
1) What does a CDN have to do with the simple real-world fact that content
gets duplicated around sites all the time, particularly stupid crap like funny
picture memes? A good global hash of this would certainly help optimize web-
caching.

2) A well designed app will hopefully not have every page rendered dynamically
at the server anymore, but just have an api to send data back to the page for
rendering client side. This benefits from content caching (provided the
content is big enough to benefit from caching rather than retransmission).

3) Using something like Varnish in front of your site will allow you to
dynamically invalidate the hash when needed, and in the interim, a generated
result for a URL is no different than a static one.

~~~
riffraff
1) That common stuff reused as-is (such as jQuery) can be cached and reused by
many; funny picture memes are usually resized, cropped, and modified, and I
frankly do not see many of them, but I load jQuery 200 times a day. It seems
to me you want to optimize a non-bottleneck.

2) How do you send such content? Suppose it's in the db: you need to convert
it to a wire format (JSON or whatever) and then generate a hash for it. But
you do not need to; you could just send a revision id as the ETag without even
reading the full data from the db.

3) Whatever you want to do with a hash, you can do using a hash as the ETag.
The latter is simply more generic.

~~~
sophacles
1) You are doing browser level cache optimization and bike-shedding, I am
looking at it from an ISP level. Once you have a big pipe and many users,
things like this actually start to have a noticeable effect on bandwidth
usage. Further, no matter what your use case, it has almost no bearing on the
normal (statistically normal no digressions please) use case.

2) So I see what you are saying, but you are wrong -- the ETag still requires
a lookup of "did content change since then, based on this tag". My system does
not preclude a similar mapping of (tag value, last change) in the server.

3) Yeah, I know, my point is that a standardized hashing method for the ETag
provides benefits on top of the ETag. Sometimes everyone playing nicely
together actually works out better than lots of flexibility.

------
wulczer
Sounds useful for early deployment stages, but later on you really should tag
the code you deploy and perhaps include the tag in your application headers
(or in the page footer, or somewhere). And even better, build packages after
tagging in your VCS, which gives you numerous advantages, like being able to
quickly check if someone modified the code in-place on the production machine.
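A minimal sketch of that workflow (tag names and helper names are illustrative): tag before building the package, then compare the deployed tree against the tag to catch in-place edits.

```shell
#!/bin/sh
# Sketch: tag a release before packaging it, then compare the deployed
# tree against the tag to spot in-place modifications on production.
tag_release() {
  tag="release-$(date +%Y%m%d%H%M%S)"
  git tag -a "$tag" -m "production release" && echo "$tag"
}

check_untouched() {
  # On the production machine: empty output means nothing was
  # modified in place since the tagged release.
  git diff --stat "$1"
}
```

Because an annotated tag is an immutable ref, any non-empty output from `check_untouched` is evidence that someone edited code directly on the box.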

------
dman
So you have git installed in production?

~~~
jrockway
I bet he has "ls" also. The question is, "what's your point"?

~~~
emillon
If it is not necessary, don't install it. Rsync or scp is probably "good
enough" for production.

~~~
jrockway
Apparently not if you want to include the git commit id in your HTTP headers
:)

------
ulf
Awesome. A real hack. Beautiful, simple, useful.

