

Ask HN: Can we safely rely on HTTP HEAD to check remote filesizes? - SchizoDuckie

I'm currently implementing a system that will download external resources and poll them regularly for updates, to redistribute further to mobile devices.

The external URLs can come from anywhere, as long as each one is a valid HTTP URL.

The HTTP 1.1 spec says that HEAD is the proper method to check stuff like this, but I'm not sure what percentage of servers actually implement it properly. Does anybody have any stats on this? Experiences? Do you support it yourself?
======
1SaltwaterC
As others pointed out, no, you cannot rely on HEAD requests. You may send
conditional GETs though, as you suggested: by using ETag/If-None-Match
and/or Last-Modified/If-Modified-Since. Keeping a hash database in order to
track changes is also reliable. You can also place a proxy between you and the
rest of the world in order to deal with that, if the disk space isn't an
issue.
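A minimal sketch of the conditional GET idea, in Python rather than PHP or node.js. The function names conditional_headers and fetch_if_changed are made up for illustration, and it assumes you stored the ETag and Last-Modified values from your previous fetch. Note that urllib reports a 304 as an HTTPError, so that case is caught explicitly.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request validators from values saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified), or None if the server answers 304."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: keep the cached copy
            return None
        raise
```

A server that ignores the validators will simply answer 200 with the full body, so the worst case is an unnecessary download, not a missed update.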

I wrote a couple of services that need to poll for remote resources. They look
like your usage model. I'd advise against PHP though. We already tried that
and failed. Several times actually. The timeouts kill the workload if the
remote isn't responding. Using an async framework like node.js proved to be
more appropriate.
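To illustrate the timeout point (using Python's asyncio here as a stand-in, not the node.js code we actually ran), something like the following keeps one unresponsive remote from stalling the rest. poll_all and the check coroutine are hypothetical names; check is whatever per-URL request you make.

```python
import asyncio


async def poll_all(urls, check, timeout=5.0):
    """Run check(url) for every URL concurrently.

    A remote that does not answer within `timeout` seconds yields None
    instead of blocking the other checks.
    """

    async def guarded(url):
        try:
            return await asyncio.wait_for(check(url), timeout)
        except asyncio.TimeoutError:
            return None

    results = await asyncio.gather(*(guarded(url) for url in urls))
    return dict(zip(urls, results))
```

The total run time is then bounded by roughly one timeout, not by the number of dead hosts times the timeout.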

------
chc
From my recollection of randomly curling sites to look at their headers, many
sites don't respond to HEAD with a valid content length for web pages
(presumably because their CMS doesn't bother to generate the page for HEAD
requests).

Quickly checking with curl -LI www.google.com www.apple.com, I see neither of
them do. And weirdly, it looks like Cloudflare chops it off all the time.
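The same check as a rough Python sketch, if you want to script it over a list of URLs. head_content_length and parse_content_length are names I made up; the function returns None when the header is missing or not a plain number, which is the case described above. Redirect handling is left to urllib's defaults.

```python
import urllib.request


def parse_content_length(value):
    """Return the size in bytes, or None if the header is missing or malformed."""
    if value is None:
        return None
    value = value.strip()
    if not (value.isascii() and value.isdigit()):
        return None
    return int(value)


def head_content_length(url, timeout=10):
    """Send a HEAD request and report the advertised Content-Length, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers.get("Content-Length"))
```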

~~~
SchizoDuckie
I've also dreamt up some tests, and I have a theory:

I could be okay, since I will be downloading mostly ZIP files. These will most
likely not be served through a server-side scripted proxy, but handled as raw
files, so HEAD _might_ work.

Now just to find the time to do these tests :) I hope to whip something up
tomorrow.

~~~
toomuchtodo
Sounds like an awesome idea for an API: Submit a URL/URI and get back metadata
without your app or mobile device having to do the heavy lifting.

------
mappu
Why not just test it for yourself?

It's a very, very common request, it's "required", and apache2/nginx/IIS will
all support it out of the box, but it might depend on the application server
in question. If you're dealing with an internal application, all bets are off.

~~~
SchizoDuckie
I'm trying to get an overview of what's 'out in the wild'. The URLs that will
be checked are not under my control.

------
ig1
How are you using this data? Are you using the file size to detect whether it
has changed, or to ensure you have enough disk space, etc.?

~~~
SchizoDuckie
Depending on filesize and Last-Modified or ETag (whichever is available),
I'll determine whether or not the file has to be re-downloaded. Then I
(additionally) use md5_file in PHP to see if it has changed in comparison with
our last copy of the file.
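Roughly, the decision looks like the sketch below. It is written in Python for illustration, not the actual PHP; needs_download and md5_of_file are hypothetical names, and the dict keys are my own convention. One assumption worth flagging: when no field can be compared at all, it downloads to be safe.

```python
import hashlib


def needs_download(stored, remote):
    """Decide from header metadata whether a re-download looks necessary.

    Both arguments are dicts with optional keys "size", "etag" and
    "last_modified"; a missing key or None means the value is unavailable.
    """
    compared = False
    for key in ("size", "etag", "last_modified"):
        old, new = stored.get(key), remote.get(key)
        if old is None or new is None:
            continue  # field unavailable on one side, so it proves nothing
        compared = True
        if old != new:
            return True
    # Nothing comparable at all: download to be safe.
    return not compared


def md5_of_file(path, chunk_size=65536):
    """Hex MD5 of a file, read in chunks (similar in spirit to PHP's md5_file)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After a download, comparing md5_of_file of the new file against the hash of the last copy tells you whether the content really changed.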

(This is actually part of a matryoshka-doll / onion-like mechanism I've built
to synchronize data from our servers to mobile devices, based on
global/group/user rights. I'll be publishing a blog post on that later.)

------
staunch
It will _generally_ work for static files served up by the major web servers.
Other than that: no.

