
HTTP GZIP Compression remote date and time leak - LukasReschke
http://jcarlosnorte.com/security/2016/02/21/date-leak-gzip-tor.html
======
endymi0n
Public Service Announcement:

Use UTC.

[http://yellerapp.com/posts/2015-01-12-the-worst-server-setup...](http://yellerapp.com/posts/2015-01-12-the-worst-server-setup-you-can-make.html)

~~~
api
I always always always write code to just use the numeric time in seconds or
milliseconds since the epoch. Time is only converted into messy Gregorian
cruft to display to the user. Anything else is begging for crazy bugs.

In a SQL database a time field should be an unsigned bigint.

~~~
rtkwe
While that works, it'd be a real annoyance to deal with when doing a lot of
the debugging queries I wind up writing for work doing batch ETL. As far as I
have found, there's no built-in function that would take that epoch timestamp
and convert it into a usable date that could then be truncated for daily
binning, etc.
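
For the daily-binning case, here is a minimal Python sketch (not SQL; whether a given database offers a built-in conversion varies and is not covered here). It assumes timestamps are whole seconds since the epoch and bins them by UTC calendar day. `utc_day` and `bin_by_day` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timezone


def utc_day(epoch_seconds: int) -> str:
    """Convert seconds since the Unix epoch to a UTC calendar date (YYYY-MM-DD)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date().isoformat()


def bin_by_day(timestamps):
    """Count how many epoch timestamps fall on each UTC day."""
    return Counter(utc_day(ts) for ts in timestamps)


if __name__ == "__main__":
    # Two events on the first day of the epoch, one on the second.
    print(bin_by_day([0, 86399, 86400]))
```

Storage stays numeric; the Gregorian conversion happens only at the point where a human (or a daily report) needs it.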

------
hollander
Basically it comes down to this. Many webservers use gzip to compress data.
Gzip creates a header which has a date field in it. Most webservers fill this
date field with zeros; about 10% use the actual date, including the timezone,
which can reveal the location of the server.

Tor is not to blame here, and probably neither are gzip or most webservers.
It's an unforeseen side effect of a combination of tools.

The article provides a script to test if a server reveals the date and
timezone or not.

~~~
Houshalter
I think you can blame gzip. Tools shouldn't insert unnecessary metadata into
files. At least not without asking or warning. What does putting a timestamp
into gzip add? Or the operating system, for that matter?

~~~
masklinn
> What does putting a timestamp into gzip add?

Roundtripping the file's mtime through compression/decompression is
convenient. Same reason why the original filename can be stored (roundtrip it
in case the filename is truncated at one point e.g. by having the gzip archive
move through an msdos system). Here's how the spec defines it:

    
    
        MTIME (Modification TIME)
                This gives the most recent modification time of the original
                file being compressed.
    

Sadly the spec then goes on to recommend leaking the compression date:

    
    
                The time is in Unix format, i.e.,
                seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
                may cause problems for MS-DOS and other systems that use
                local rather than Universal time.)  If the compressed data
                did not come from a file, MTIME is set to the time at which
                compression started.  MTIME = 0 means no time stamp is
                available.
    

_However_, note that the spec specifies a Unix timestamp, which is ~UTC, and
leaves no room for a timezone. Reading the POC[0], it apparently assumes any
non-zero mtime is immediate (ignoring cached assets) and local.

[0] [https://github.com/jcarlosn/gzip-http-time/blob/master/time....](https://github.com/jcarlosn/gzip-http-time/blob/master/time.php#L29)
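
A minimal Python sketch of reading that field, assuming the fixed 10-byte header layout from the gzip spec (two magic bytes, compression method, flags, then a 4-byte little-endian MTIME). `gzip_mtime` is a hypothetical helper name, and the `mtime=` argument to `gzip.compress` assumes Python 3.8 or later.

```python
import gzip
import struct
from datetime import datetime, timezone


def gzip_mtime(blob: bytes) -> int:
    """Return the raw MTIME field (seconds since the Unix epoch) of a gzip stream."""
    if len(blob) < 10 or blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Bytes 4..7 hold MTIME as an unsigned 32-bit little-endian integer.
    (mtime,) = struct.unpack("<I", blob[4:8])
    return mtime


if __name__ == "__main__":
    stamped = gzip.compress(b"hello", mtime=1456012800)  # 2016-02-21 00:00:00 UTC
    blank = gzip.compress(b"hello", mtime=0)             # "no time stamp available"
    ts = gzip_mtime(stamped)
    print(ts, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
    print(gzip_mtime(blank))
```

The field is a bare count of seconds. Any timezone a client derives from it is an inference, for example from comparing it against another clock, not something stored in the header.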

------
oxplot
This problem bit me in the butt when I was trying to set up my blog on S3. On
every sync, all the pages, which were gzipped on my machine (CloudFront
doesn't do compression on the fly), would upload again. Then I discovered
`gzip -n`, which avoids storing the original filename and timestamp.
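
For anyone compressing from a script rather than the CLI, a rough Python equivalent is sketched below. It assumes Python 3.8+ (for the `mtime=` argument) and that byte-for-byte stability is only needed on the same Python and zlib version; `deterministic_gzip` is a hypothetical helper name.

```python
import gzip


def deterministic_gzip(data: bytes) -> bytes:
    """Compress data with MTIME zeroed so repeated runs give identical bytes."""
    # mtime=0 writes "no time stamp available" into the header;
    # gzip.compress works on in-memory bytes, so no filename is stored either.
    return gzip.compress(data, mtime=0)


if __name__ == "__main__":
    page = b"<html><body>hello</body></html>"
    first = deterministic_gzip(page)
    second = deterministic_gzip(page)
    print(first == second)
```

With the timestamp zeroed, an unchanged page hashes the same after every build, so a sync tool that compares checksums has nothing to re-upload.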

~~~
social_quotient
Just as a heads-up, CloudFront does compress on its own if you set it to. It's
super helpful when you have less control over the origin.

[http://docs.aws.amazon.com/AmazonCloudFront/latest/Developer...](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html)

~~~
oxplot
Nice! Back when I started using it, it didn't have that (or I didn't look
carefully enough).

~~~
nitrogen
It's a fairly recent addition.

------
brlewis
tl;dr for comments:

Article seems to fail to notice that gzip timestamps are specced to be
timezone-independent, and two of the three example domains at the end of the
article adhere to that spec. We don't know what happens to the article's "10%"
number after servers that adhere to the spec are removed from the data. For
all we know bing.com could be the only service that leaks this info.

------
jjoe
This is where this date&time retrieval mechanism sort of breaks:

1) Timezone of server is set to one other than the geographical location's

2) Gzip-compressed page or asset is _cached_. Therefore the embedded date/time
is from an unknown past.

So this reduces the good guesses a bit.

------
hannob
This is interesting, I wonder if there are more related issues. There is a lot
of stuff that embeds timestamps.

The first I could think of is the TLS handshake, but it's GMT and several
implementations no longer set it to a valid value (they use a random value
instead). Various image formats use timestamps; they seem to be timezone
unaware.

Probably something to look into, and probably a good idea to question whether
we need timestamps everywhere. If they don't serve a useful purpose, better to
skip them. (They also happen to be a major issue with things like reproducible
builds.)

------
oxplot
Somewhat related, JPEG files have the potential to leak a whole lot of
information (through EXIF metadata) if put online straight off a camera/phone.
Things like time but also camera model and even precise GPS location. That's
why I always use `jhead -purejpg` to strip that info.
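
To illustrate what such stripping involves, here is a rough standard-library Python sketch that drops APP1 segments (where EXIF lives) from a JPEG byte string. It is not how any particular tool works: it assumes a simple file where every segment before the image data is a marker followed by a 2-byte big-endian length, and it ignores other metadata carriers such as comments. `strip_app1` is a hypothetical name.

```python
import struct


def strip_app1(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string with APP1 (EXIF/XMP) segments removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should start")
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # SOS: compressed image data follows, so copy the rest verbatim.
            out += jpeg[pos:]
            return bytes(out)
        # The segment length counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker != 0xE1:  # keep every segment except APP1
            out += jpeg[pos:end]
        pos = end
    raise ValueError("no SOS marker found")
```

For real photos a maintained tool remains the safer choice, since it also handles the edge cases this sketch skips.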

~~~
jlgaddis
A few years ago, the FBI was able to catch a hacker by examining the EXIF
metadata in an image posted by his girlfriend:
[http://gizmodo.com/5901430/these-breasts-nailed-anonymous-ha...](http://gizmodo.com/5901430/these-breasts-nailed-anonymous-hacker-in-fbi-case)

------
raverbashing
The concerns are valid, but it is ironic that this is being served from an
HTTP-only page.

(Why does GZIP need a compression date again?)

~~~
filmor
AFAIK it's not the compression date but instead the last-modified date stored
in the filesystem metadata. It's likely there so that
`gzip file; gunzip file.gz` keeps the mtime information intact.

~~~
masklinn
The problem is that it can be any date, so which date is stored depends on the
library and the data source. If you're gzipping and caching dynamic content
there's no filesystem mtime, so you'll probably get the compression time
instead (unless the remote end or library defaults to 0).

------
77pt77
That's why you always use deflate.
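
A small Python sketch of why that helps, assuming "deflate" here means the zlib container that HTTP's `deflate` coding is specified to use: the zlib header is just two bytes (CMF and FLG), so there is no field where a timestamp could go. `zlib_header` is a hypothetical helper name.

```python
import zlib


def zlib_header(blob: bytes) -> dict:
    """Decode the two-byte zlib header; note there is no timestamp field."""
    cmf, flg = blob[0], blob[1]
    return {
        "method": cmf & 0x0F,                    # 8 means deflate
        "window_bits": (cmf >> 4) + 8,           # log2 of the LZ77 window size
        "check_ok": (cmf * 256 + flg) % 31 == 0,  # header checksum rule
        "preset_dict": bool(flg & 0x20),
    }


if __name__ == "__main__":
    payload = b"hello world"
    first = zlib.compress(payload)
    second = zlib.compress(payload)
    print(zlib_header(first))
    print(first == second)  # nothing time-dependent in the output
```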

------
ryanlol
When is this actually relevant?

It really shouldn't be in the context of hidden services, considering your
setup has to be seriously flawed for the webserver to be able to divulge any
information about its physical location.

Perhaps it'd help with attacking some RNGs, but beyond that it's really just a
novelty.

