Chromium: Secretly stores referer and url for downloaded files (2017) (debian.org)
124 points by IanSanders on Mar 14, 2019 | 64 comments

Here is the finest example of privacy derangement syndrome that we will ever see: an open-source program implements an open standard in a way that's completely above-board and it's described as a nefarious scheme.

If an open standard has features that violate user privacy and don't provide sufficient value in exchange to justify it, it's reasonable to discuss violating or reforming that standard for the sake of privacy. The existence of a standard doesn't make the privacy issue go away.

Note that per the report GNU wget does this as well. So other well known, well regarded, widely used tools are following this same standard doing the same thing.

That doesn't excuse the standard, but it does mean Chromium is just matching platform on this, not setting any particular policy on its own.

My beef is with the adverb "secretly".

I don't have an issue with this feature, but to play devil's advocate: it obviously isn't being adequately advertised if it's surprising to so many people on HN. And if many among the audience of HN find it surprising, you can be certain it's even more so to the general public.

Since the behavior is unexpected and very non-obvious for the average user, I don't see any issue with calling it "secretly".

You confused secrecy with ignorance.

Well, I think the fact that it does this even in incognito mode is a somewhat valid point. I assume that most people are not aware of this feature at all, so incognito mode should just try to do the right thing.

The web is what I know well, this may not be true in other areas, but the W3C has a long history of being hostile to end users, ignoring privacy, and pushing corporate interests over others.

Following their recommendations and doing things just because they were in the spec does not make those decisions ethical or even legal. Developers have an obligation to push back against standards boards and corporations that make bad decisions.

Hiding behind the business rules or the standards committee is no longer acceptable, especially when it comes to bad security/privacy practices that generally favor a market over an end user.

The W3C staff I've talked with seem very decent, in a public-interest kind of way. Though there's always been a tendency of Web standards (de facto, and de jure) to serve the interests of dotcoms, a bit like an industry consortium.

I think that industry-savvy people used to be mainly concerned with avoiding abusive monopolies, since we had examples of that. What I think many early Internet and Web people (who tended to be altruistic) didn't anticipate was the current culture of pervasive sneaky privacy abuses and often questionable engineering.

> Following [the W3C] recommendations and doing things just because they were in the spec does [not] make those decisions ethical or even legal.

The spec being followed here is Freedesktop's, not W3C's https://www.freedesktop.org/wiki/CommonExtendedAttributes/

> user.xdg.origin.url: Set on a file downloaded from a url. Its value should equal the url it was downloaded from.

As the "xdg" in the name suggests, this is a Freedesktop thing. Silly to blame Chrome for supporting the very extended attributes the premier Linux desktop project defined.

totally, IIRC curl does this too and it's extremely useful later on..

The bug report's comments say wget does too, and you can't turn it off.

"premier Linux desktop project"

That's the real yikers to me.

The metadata for downloaded files thing is all over the place (i.e. how macs will tell you that you're running something downloaded from the internet and where).

This is a standard feature of many browsers, including Safari.

E.g: On OSX you can download a .dmg file or .zip file, and when opening the OS will warn: "XYZ is an application downloaded from the internet. Are you sure you want to open it?". The information about the origin of the file comes from extended attributes.

See: https://www.idownloadblog.com/2017/04/20/fix-application-fro...

Windows has the functionality you describe too, but it works only by storing a flag specifying what kind of origin the file has, not specifically what the origin was. Your article seems to indicate that macOS uses basically the same system as Windows.

EDIT: According to some other comments in this thread, I'm wrong. Mac OS does store the whole origin.

EDIT 2: Looks like I'm wrong about Windows too, which also stores the whole origin. This actually disagrees with what is written in the bug report, so perhaps it needs to be updated.

Windows doesn't just have a flag.

For example:

- If I download this using Chrome: https://aka.ms/getvsdbgps1

- Open a command prompt, cd to my Downloads directory

- Execute this:

    notepad GetVsDbg.ps1:Zone.Identifier
I get:

Some files I see also have a ReferrerUrl
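For anyone who would rather script this than open notepad, here is a minimal Python sketch. It assumes the `Zone.Identifier` stream is a small INI-style text with a `[ZoneTransfer]` section holding keys such as `ZoneId`, `HostUrl`, and `ReferrerUrl`, and that it decodes as UTF-8. The function names are made up for illustration, and reading the stream only works on Windows with NTFS.

```python
import configparser


def parse_zone_identifier(text):
    """Parse the INI-style text of a Zone.Identifier stream into a dict."""
    # interpolation=None so '%' in URLs is left alone; only '=' separates keys.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case (HostUrl, not hosturl)
    parser.read_string(text)
    if not parser.has_section("ZoneTransfer"):
        return {}
    return dict(parser["ZoneTransfer"])


def read_zone_identifier(path):
    """Open the alternate data stream next to `path` (Windows/NTFS only)."""
    # Assumption: the stream decodes as UTF-8; undecodable bytes are replaced.
    with open(path + ":Zone.Identifier", encoding="utf-8", errors="replace") as stream:
        return parse_zone_identifier(stream.read())
```

On a Windows machine, `read_zone_identifier("GetVsDbg.ps1")` should return whatever keys the browser wrote for that download.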

I've always thought NTFS streams were a really cool idea; I wish MS had kept them in ReFS.

Apparently they added it back after the initial release.

Support for alternate data streams was initially not implemented in ReFS. In Windows 8.1 64-bit and Server 2012 R2 the file system reacquired support for alternate data streams, with lengths of up to 128K


No, macOS stores the whole thing in the "com.apple.metadata:kMDItemWhereFroms" xattr as a plist. e.g.,

      0 => "https://raw.githubusercontent.com/marco-c/code-coverage-reports/master/web-platform-tests.tar.xz"
      1 => "https://github.com/marco-c/code-coverage-reports/blob/master/web-platform-tests.tar.xz"
Pretty sure [0] is the downloaded file, and [1] is the page from which the download was initiated.

No, MacOS stores the origin of the file. Downloaded with Safari:

    $ xattr -l Downloads/Ethiopian_Airlines_ET-AVJ_takeoff_from_TLV_\(46461974574\).jpg
    00000000  D9 76 8A 5C 00 00 00 00 9F E5 89 0D 00 00 00 00  |.v..............|
    00000000  62 70 6C 69 73 74 30 30 A1 01 33 41 C1 1D 57 2B  |bplist00..3A..W+|
    00000010  FF B1 3A 08 0A 00 00 00 00 00 00 01 01 00 00 00  |..:.............|
    00000020  00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  |................|
    00000030  00 00 00 00 13                                   |.....|
    00000000  62 70 6C 69 73 74 30 30 A2 01 02 5F 10 7D 68 74  |bplist00..._.}ht|
    00000010  74 70 73 3A 2F 2F 75 70 6C 6F 61 64 2E 77 69 6B  |tps://upload.wik|
    00000020  69 6D 65 64 69 61 2E 6F 72 67 2F 77 69 6B 69 70  |imedia.org/wikip|
    00000030  65 64 69 61 2F 63 6F 6D 6D 6F 6E 73 2F 64 2F 64  |edia/commons/d/d|
    00000040  32 2F 45 74 68 69 6F 70 69 61 6E 5F 41 69 72 6C  |2/Ethiopian_Airl|
    00000050  69 6E 65 73 5F 45 54 2D 41 56 4A 5F 74 61 6B 65  |ines_ET-AVJ_take|
    00000060  6F 66 66 5F 66 72 6F 6D 5F 54 4C 56 5F 25 32 38  |off_from_TLV_%28|
    00000070  34 36 34 36 31 39 37 34 35 37 34 25 32 39 2E 6A  |46461974574%29.j|
    00000080  70 67 3F 64 6F 77 6E 6C 6F 61 64 5F 10 19 68 74  |pg?download_..ht|
    00000090  74 70 73 3A 2F 2F 65 6E 2E 77 69 6B 69 70 65 64  |tps://en.wikiped|
    000000A0  69 61 2E 6F 72 67 2F 08 0B 8B 00 00 00 00 00 00  |ia.org/.........|
    000000B0  01 01 00 00 00 00 00 00 00 03 00 00 00 00 00 00  |................|
    000000C0  00 00 00 00 00 00 00 00 00 A7                    |..........|
    com.apple.quarantine: 0083;5c8a76d8;Safari;ADF309D2-762B-4FE2-AEC6-104E019BDBF9

Additionally, the same information is stored in an SQLite database in your home directory. In fact, the ID at the very end of the output is the primary key to the table:

    sqlite3 ~/Library/Preferences/com.apple.LaunchServices.QuarantineEventsV2 "select * from LSQuarantineEvent where LSQuarantineEventIdentifier = 'ADF309D2-762B-4FE2-AEC6-104E019BDBF9'"
The entry in the table also doesn't seem to be deleted when you delete the downloaded file. That is, you can get a list of all files you've ever downloaded:

    sqlite3 ~/Library/Preferences/com.apple.LaunchServices.QuarantineEventsV2 "select * from LSQuarantineEvent"
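The same lookup can be sketched in Python. This assumes the `com.apple.quarantine` value is four `;`-separated fields (flags, a hex Unix timestamp, the agent name, and the event UUID), which matches the outputs in this thread but is not something I have seen documented. The table and column names are the ones used in the queries here; the function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def parse_quarantine(value):
    """Split a com.apple.quarantine value into its assumed four fields."""
    flags, stamp, agent, event_id = value.split(";", 3)
    return {
        "flags": flags,
        # Assumption: the second field is seconds since the Unix epoch, in hex.
        "when": datetime.fromtimestamp(int(stamp, 16), tz=timezone.utc),
        "agent": agent,
        "event_id": event_id,
    }


def lookup_event(conn, event_id):
    """Return (agent name, data URL) for one quarantine event, or None."""
    return conn.execute(
        "SELECT LSQuarantineAgentName, LSQuarantineDataURLString "
        "FROM LSQuarantineEvent WHERE LSQuarantineEventIdentifier = ?",
        (event_id,),
    ).fetchone()
```

To use it against the real database, open the file with `sqlite3.connect(os.path.expanduser("~/Library/Preferences/com.apple.LaunchServices.QuarantineEventsV2"))` and pass the `event_id` from `parse_quarantine`.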

Hmmm, I'm not seeing any entries in that table from Safari (my primary browser). I am seeing all the Homebrew Cask downloads, iChat (Messages), and a couple of Firefox- and Brave-based downloads.

    > select LSQuarantineDataURLString from LSQuarantineEvent where LSQuarantineAgentName = 'Safari';

Maybe you've got Safari configured in some way that prevents it, but with the default configuration, Safari definitely gets entries in that db.

Safari only records that the file should be quarantined, not the URL.

Here are the extended attributes for an image downloaded by Safari in Sierra:

  ecthelion ~>xattr -l Downloads/fm_800-2.jpg
  com.apple.quarantine: 0083;5c8aa472;Safari;1021CF85-4F78-492B-A8E3-766C44A3A671
Here is the equivalent in Chrome Canary:

  ecthelion ~>xattr -l Downloads/fm_480.jpg
  00000000  62 70 6C 69 73 74 30 30 A0 08 00 00 00 00 00 00  |bplist00........|
  00000010  01 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00  |................|
  00000020  00 00 00 00 00 00 00 00 00 09                    |..........|
  00000000  62 70 6C 69 73 74 30 30 A2 01 02 5F 10 29 68 74  |bplist00..._.)ht|
  00000010  74 70 73 3A 2F 2F 62 6C 6F 67 2E 6D 61 6A 69 64  |tps://blog.majid|
  00000020  2E 69 6E 66 6F 2F 69 6D 61 67 65 73 2F 66 6D 5F  |.info/images/fm_|
  00000030  34 38 30 2E 6A 70 67 5F 10 18 68 74 74 70 73 3A  |480.jpg_..https:|
  00000040  2F 2F 62 6C 6F 67 2E 6D 61 6A 69 64 2E 69 6E 66  |//blog.majid.inf|
  00000050  6F 2F 08 0B 37 00 00 00 00 00 00 01 01 00 00 00  |o/..7...........|
  00000060  00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  |................|
  00000070  00 00 00 00 52                                   |....R|
  com.apple.quarantine: 0081;5c8aa4ab;Google Chrome Canary;B718AF12-557C-47AD-840A-0CA98281F256

That is contrary to what I'm seeing. The extended attributes for an image downloaded by Safari (in Mojave) do contain the kMDItemWhereFroms attribute.

    00000000  62 70 6C 69 73 74 30 30 A2 01 02 5F 10 7D 68 74  |bplist00..._.}ht|
    00000010  74 70 73 3A 2F 2F 75 70 6C 6F 61 64 2E 77 69 6B  |tps://upload.wik|
    00000020  69 6D 65 64 69 61 2E 6F 72 67 2F 77 69 6B 69 70  |imedia.org/wikip|
    [snipped for brevity]
    com.apple.quarantine: 0083;5c8a76d8;Safari;ADF309D2-762B-4FE2-AEC6-104E019BDBF9
Have you modified the configuration of Safari in any way? I never use Safari, except this once to download an image to test with. It seems possible to me that Safari might disable it if you flip some privacy or security switches, which I haven't done.

It's interesting that this is considered a bug by Linux users. On the OSX side, populating the file metadata with the URL source has always been looked upon as a feature.

I'm a Linux user and I don't see it as a bug. Rather, I wish I had that in Firefox.

I also once contemplated making a browser extension that actually stores the URL in metadata. I'm also not quite sure how this affects user privacy, as the image content might be far more telling than the origin. Imo this compromises the origin of the file...

I was curious if Firefox supported this feature, and I found an unassigned issue from 8 years ago:


I hate how (in open-source projects in particular) anything even remotely associated with security/privacy can be bike-shedded for almost a decade with no actual work done whatsoever.

Talk about snailing your way to irrelevance. And I say that as a Firefox-user.

Sometimes I think Firefox would benefit from a more benevolent leader who just stomped down on issues like this and settled things properly without spending months or years doing so.

This is ridiculous.

On MacOS, FF marks files as downloaded from the web for the system. And also, in its standard 'Downloads' dialog you can copy the address of each file and, purportedly, go to the page it was downloaded from (the latter not working for me).

I use Safari, which does the same thing, and I actually find it useful. It's nice to be able to go back and find where you downloaded something from.

IMO, complaining that this metadata violates the user's privacy is as silly as complaining that storing EXIF location metadata in JPGs violates privacy. They're both forms of metadata that can be useful in certain situations, and which many users are unaware of. Yeah, there is a technical difference in that EXIF data is stored within the file while this metadata is stored in the file attributes, but I think the analogy holds.

I agree that they are comparable; both can be violations of privacy. It should be clear to the user that such data is being recorded.

I concur, and there should also be a simple UI option to disable both features.

Agreed. I'd say what would be relevant is whether this data is ever transferred to a third-party like Google. If not, I don't quite see the problem. There's a lot of data I gather about myself, and my having access to this kind of metadata would be fantastic.

I find this super useful on a Mac and would routinely dump the URLs with xattr. That was before I switched back to FF, which runs better on older hardware.

Can you please explain it a little bit more? I am just curious.

Browsers on OS X store this information in extended file system attributes (accessible via the xattr command). It's also how the OS knows to prompt you the first time you open an executable--"you downloaded this from googlechrome.com via Safari, are you sure you want to run this?".

Calling it "secret", as the article does, seems disingenuous. (Further, even if somebody downloads an application in an incog window, it is probably better for the system's security posture to record these xattrs for such a "hey, is this what you actually meant to download and execute?" situation.)

Users could reasonably expect that that information be stored in a separate database. My Linux distro doesn't make it immediately obvious that user.xdg.* attributes are attached to files in the file system. Normal users don't read EULAs, but we should expect them to read in-depth manuals to be up to speed on every gotcha that the operating system and browser could throw at them?

I consider XDG to be pretty critical to a Linux desktop environment, so...yeah, kinda?

Like, if you are going to Have Opinions about something like this--you are of the temperament to care (which is a way of saying "I don't and you probably shouldn't either, tbh, encrypt your drive if you're that geeked up about it")--you should probably know.

So, if I send the file to someone else, (1) on Linux and (2) on Mac, does this URL metadata come along, or no? Sounds like no, and the issue is just your device being compromised.

They might if you package the file in some archive format that stores xattrs. For example, tar(1) can store them, although it doesn't by default (you must pass --xattrs).

I could be wrong, but my understanding matches yours, and this data is never read back in by common HTTP clients (browsers, curl, etc.).

Thank you. Appreciate the response.

    xattr -p com.apple.metadata:kMDItemWhereFroms <file downloaded with chrome> | xxd -r -p | plutil -convert json -o - - | jq "."
And one can catalog the sources of downloads made with Chrome.
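If you'd rather not chain `xxd`, `plutil`, and `jq`, the same decoding is roughly this in Python. It is a sketch that assumes you already have the hex text that `xattr -p` prints for the attribute; the function name is made up. `plistlib.loads` detects the binary plist format on its own.

```python
import plistlib


def where_froms_from_hex(hex_dump):
    """Decode the hex text printed by `xattr -p` into the plist's list of URLs."""
    # Strip all spaces and newlines before converting the hex digits to bytes.
    raw = bytes.fromhex("".join(hex_dump.split()))
    return plistlib.loads(raw)
```

Going by the examples elsewhere in this thread, the result is a list with the file URL first and the referring page second.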

It's not incredibly clear in the ticket, but the bug seems to be referring to Linux file system extended attributes.

Apart from whether this is nefarious and/or intended behavior, it seems odd that the bug report specifically uses protection of illegal content as motivation.

Well, it's invisible, not secret. You can also recover deleted files from a hard drive, but that's not a secret. Both things need to be more widely known, but the fact that they exist is still useful (both for individuals and law enforcement)

For some perspective here:

Various WWW tools for OS/2 back in the 1990s also did this, putting the source URL into a .SUBJECT extended attribute. The OS/2 port of wget was also modified to do this.

It wasn't in any way secret. The .SUBJECT of a file was visible in its Properties dialogue on the Workplace Shell desktop. Which one could also use to edit it. People like me wrote other tools for manipulating and viewing these .SUBJECTs, which were also used for file descriptions by 4OS2 and various OS/2 file management and BBS softwares.

* https://jdebp.eu./Softwares/os2/

I reported this to chromium recently, but I wasn't the first, it was marked as a duplicate. I think it should be fixed in latest versions or a fix should come soon. wget had the exact same issue and they recently disabled the attribute storage by default.

See e.g. also

* https://www.openwall.com/lists/oss-security/2019/01/01/1

* https://lists.gnu.org/archive/html/bug-wget/2018-12/msg00034...

I wrote a small OS X app based on this that sorts your downloads into subfolders named after the domain you downloaded the file from (it's just a small shell script in a wrapper, so it might run on Linux as well with some modifications): https://github.com/grothkopp/sortDownloads.app
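The core of that idea fits in a few lines of Python. This is only a sketch of the general approach, not the linked app's actual code: it assumes you have already collected a mapping of file name to origin URL (from xattrs or anywhere else), and `folder_for` and `plan_moves` are names invented for this example.

```python
from urllib.parse import urlsplit


def folder_for(origin_url, fallback="unknown"):
    """Pick a folder name from the URL's host, or a fallback if there is none."""
    host = urlsplit(origin_url).hostname
    return host if host else fallback


def plan_moves(origins):
    """Map each file name to '<domain>/<file name>' based on its origin URL."""
    return {name: f"{folder_for(url)}/{name}" for name, url in origins.items()}
```

It only computes target paths; actually moving the files (for example with `shutil.move`) is left out so that nothing gets touched by accident.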

Very cool.

I wrote a similar script to copy the source URL to the file "comments" field so it's viewable/sortable in Finder.

How did you package your script as an .app like that? Platypus perhaps?

And so does wget if this report is correct! How can I verify this?

I’ve often wanted this information but without having to rely on external book-keeping.

On most Linux distros you can use the `getfattr` command from the `attr` package, but it's usually not installed by default. http://man7.org/linux/man-pages/man1/getfattr.1.html

With Python 3.3+, which often is installed by default, you can use os.listxattr() and os.getxattr(). https://docs.python.org/3/library/os.html#linux-extended-att...

    getfattr -d <downloaded file>
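A rough Python version of the same check, using the `os.listxattr()` and `os.getxattr()` calls mentioned above. It is Linux-only and assumes the attribute values are UTF-8 text; the helper names are made up. The decoding is split out so it can be tried on a plain dict without a real file.

```python
import os

XDG_PREFIX = "user.xdg."


def decode_origin_attrs(raw_attrs):
    """Keep only user.xdg.* entries and decode their byte values as UTF-8."""
    return {
        name: value.decode("utf-8", errors="replace")
        for name, value in raw_attrs.items()
        if name.startswith(XDG_PREFIX)
    }


def read_origin_attrs(path):
    """Read every extended attribute on `path` (Linux only) and decode the XDG ones."""
    raw = {name: os.getxattr(path, name) for name in os.listxattr(path)}
    return decode_origin_attrs(raw)
```

Calling `read_origin_attrs()` on a downloaded file returns an empty dict if the tool that saved it wrote no `user.xdg.*` attributes.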

On macOS:

    mdls FILENAME
`kMDItemWhereFroms` is what you are looking for. And for a list of all the URLs from the current directory:

    mdls * | grep kMDItemWhereFroms -A 1 | grep http | ruby -ne "puts \$_.strip[1..-2]"

    xattr -l <downloaded file>

It's missing a NSFW warning on that photo

If that is NSFW allow me to recommend you to go get a new job.
