
".DS_Store", or, "how to tell if a ZIP file was created by a macOS user".

"Desktop.ini" and "thumbs.db", or, "how to tell if a ZIP file was created by a Windows user".

Yes, but these files only get made when:

- There is something to create a thumbnail cache for

- The user has applied custom settings, like a folder icon or designation, to the folder

".tar.gz" or how to tell the file was archived by a Linux user.

.tar.Z or how to tell the file was archived by a Real Man[tm].

I received some ZIP files from macOS users which contain a directory called __MACOSX.

I seem to recall that the __MACOSX folder is there to support resource forks? Something along those lines at any rate.

Yep. And since many files don’t have resource forks it’s kinda rare. But any directory you look at basically gets a .DS_Store.

I remember hiding those from windows clients with samba, and hiding thumbs.db from mac clients over netatalk (and samba)

It looks like they store the extended attributes in the __MACOSX tree too. And everything you download from the web has extended attributes describing the original URL and page that linked to it. (With chrome/safari, I haven't checked firefox.)

It looks like the Finder "Compress" functionality will include __MACOSX, and the command-line zip doesn't.

If you run `xattr -l` on something in your Downloads, you should see the kMDItemWhereFroms metadata. mdls shows it too, but also includes other data that is extracted from the file itself.

You can also search on that metadata:

    mdfind kMDItemWhereFroms:citeseerx
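If you want that metadata in a script rather than eyeballing `xattr -l`, here is a rough Python sketch. It assumes the attribute is stored under the full name `com.apple.metadata:kMDItemWhereFroms` as a binary plist, and that `xattr -px` prints the value as hex. The function names are mine, and `where_froms` only works on macOS.

```python
import plistlib
import subprocess

# Assumed full attribute name behind the kMDItemWhereFroms metadata key.
ATTR = "com.apple.metadata:kMDItemWhereFroms"


def decode_where_froms(hex_dump: str) -> list:
    """Decode the hex dump of the attribute (a binary plist) into a list of URLs."""
    # bytes.fromhex ignores the spaces and newlines in the dump.
    return plistlib.loads(bytes.fromhex(hex_dump))


def where_froms(path: str) -> list:
    """macOS only: ask xattr for the attribute as hex, then decode it."""
    result = subprocess.run(
        ["xattr", "-px", ATTR, path],
        capture_output=True, text=True, check=True,
    )
    return decode_where_froms(result.stdout)
```

Calling `where_froms` on a file without the attribute raises `CalledProcessError`, since `xattr` exits non-zero in that case.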

Another issue with zip files on OSX:

> Zip files can encode their file names in two ways: CP437, or unicode.

> Each operating system does it wrong, but in a different way. For instance, Mac OS encodes its zip files as unicode, but doesn't set bit 11 correctly, so Python (correctly) reads them as CP437, and garbles the non-ASCII characters in file names.

> I wrote a quick and dirty workaround for Mac OS archives: if the file doesn't exist, encode the name as CP437 and check again. I'll think of something more clever if I ever switch to another OS.
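A minimal sketch of that workaround in Python, as I understand it: `zipfile` decodes names as CP437 when bit 11 is not set, so re-encoding the garbled name as CP437 gives back the original bytes, which can then be tried as UTF-8. `recover_name` and `entry_names` are names I made up, not anything from the quoted post.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: "file name is UTF-8"


def recover_name(name: str) -> str:
    """Undo a CP437 mis-decoding of UTF-8 bytes; return the name unchanged if that fails."""
    try:
        return name.encode("cp437").decode("utf-8")
    except UnicodeError:
        # Not representable in CP437, or the bytes are not valid UTF-8:
        # assume the name was decoded correctly in the first place.
        return name


def entry_names(path: str) -> list:
    """List entry names, repairing only those stored without the UTF-8 flag."""
    with zipfile.ZipFile(path) as zf:
        return [
            info.filename if info.flag_bits & UTF8_FLAG else recover_name(info.filename)
            for info in zf.infolist()
        ]
```

This is a heuristic: a genuine CP437 name whose bytes happen to form valid UTF-8 would be "repaired" into something wrong, though that should be rare in practice.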


Woah, that explains a lot of weird bugs, actually! This is very useful information, thanks!

Yep, I write scripts to clean up zip files because the default behavior is so bad for cross-platform.

Could you please share it?

This is my script to “clean up” a typical macOS zip file. It assumes you’ve started by asking the Finder to “Compress” (which creates a zip file but one riddled with macOS-isms).

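    #!/bin/sh
    # The zip file to clean is the first argument.
    zipfile="$1"
    if [ "x$zipfile" = "x" ] ; then
      echo "$0: .zip file expected" >&2
      exit 1
    fi
    # One rule per line, so each can be commented out on its own.
    zip -d "${zipfile}" "__MACOSX*"
    zip -d "${zipfile}" ".DS_Store"
    zip -d "${zipfile}" "*/.DS_Store"
    # Show what is left, sorted by name.
    unzip -l "${zipfile}" | sort -k 5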

You probably want to test:

    if [ ! -f "$zipfile" ]

instead of

    if [ "x$zipfile" = "x" ]

Thank you! I took your script, suggestions in this thread and added some of my sauce:

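    #!/usr/bin/env bash

    set -exuo pipefail

    # Default must be set explicitly because of "set -u".
    VERBOSE=false
    while getopts "v" arg; do
        case $arg in
            v) VERBOSE=true;;
            *) exit 1;;
        esac
    done
    shift $((OPTIND-1))

    zipfile="${1:-}"

    if [ ! -f "$zipfile" ] || [ ! "${zipfile##*.}" = "zip" ] ; then
        echo "$0: .zip file expected" >&2
        exit 1
    fi

    # Note: zip exits non-zero when no entry matches, which stops the script under "set -e".
    zip -d "${zipfile}" "__MACOSX*" ".DS_Store" "*/.DS_Store" "Thumbs.db" "*/Thumbs.db"

    if [ "$VERBOSE" = true ]; then
        unzip -l "${zipfile}" | sort -k 5
    fi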

Processing the zip file three times over seems a bit excessive?

    zip -d "${zipfile}" "__MACOSX*" ".DS_Store" "*/.DS_Store"

True but performance wasn’t the goal (I tend to run it once on smallish files).

This is easier to extend if I see something new to exclude (or comment-out some rule).
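For machines without the `zip` command, roughly the same cleanup can be sketched with Python's `zipfile` module. This is not a port of the scripts above, just the same idea: it writes a cleaned copy instead of editing in place, and the junk list (including `desktop.ini`, compared case-insensitively) is my own assumption about what to drop.

```python
import zipfile

# Base names to drop, compared in lowercase. Extend or comment out as needed.
JUNK_NAMES = {
    ".ds_store",
    "thumbs.db",
    "desktop.ini",
}


def is_junk(name: str) -> bool:
    """True for anything under a top-level __MACOSX directory or with a junk base name."""
    parts = name.rstrip("/").split("/")
    return parts[0] == "__MACOSX" or parts[-1].lower() in JUNK_NAMES


def clean_zip(src: str, dst: str) -> list:
    """Copy src to dst without junk entries; return the names that were skipped."""
    removed = []
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if is_junk(info.filename):
                removed.append(info.filename)
                continue
            # Passing the ZipInfo keeps the timestamp, attributes and compression type.
            zout.writestr(info, zin.read(info))
    return removed
```

Something like `clean_zip("upload.zip", "upload-clean.zip")` (hypothetical file names) does the whole thing in one pass over the archive.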

BetterZip has an option for “clean” zips you can enable as the default for create.

Why does Windows not deal with hidden files correctly?

It's 2021. Hidden dot-files are not a new thing.

Hidden dot files were introduced as a bug and left to linger because they were kind of useful. It all started when someone tried to hide . and .. from the output of ls and messed up the if statement to only check if the first letter in the filename was a period instead of testing for the intended use cases. People then copied that behaviour around because it was a cool new trick, not because it was set up as a standard.

The Windows method, leveraging file attributes, is actually much cleaner in my opinion. You can set the hidden attribute in ZIP files, and most tools do for OS-specific files and folders, but I don't want my ZIP tool to put files on my file system that I don't get to see first, so I always turn on the display of hidden files.
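To see which entries an archive marks as hidden before extracting, something like this Python sketch could work. It assumes the archiver stored MS-DOS attributes in the low byte of the entry's external attributes (hidden is bit 0x02), which is not guaranteed for archives made on other systems, and it also counts dot-files as hidden. `is_hidden` and `hidden_entries` are made-up names.

```python
import posixpath
import zipfile

DOS_HIDDEN = 0x02  # MS-DOS "hidden" bit, assumed to live in the low byte of external_attr


def is_hidden(info: zipfile.ZipInfo) -> bool:
    """True if the entry is a dot-file or carries the DOS hidden attribute."""
    base = posixpath.basename(info.filename.rstrip("/"))
    return base.startswith(".") or bool(info.external_attr & DOS_HIDDEN)


def hidden_entries(path: str) -> list:
    """Names of all entries in the archive that would be hidden after extraction."""
    with zipfile.ZipFile(path) as zf:
        return [info.filename for info in zf.infolist() if is_hidden(info)]
```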

Windows does the same thing with desktop.ini, but I rarely encounter those anymore. It used to be that every ZIP had a bunch of thumbs.db files but Microsoft seems to have cut that out.

Alternatively, https://www.google.com/search?q=intitle%3A%22index+of+%2F%22...: "how to tell if an open directory index was created by a macOS user"

Not just zip files, USB drives too.

Came here to say this. Leaving satisfied.

Also, how to tell the new Sr dev mgmt just hired is really a jr dev.

Yes, I am talking about .gitignore

I can’t tell if you mean they included `.DS_Store` in their ignore file or left it out, but including it is just good practice.
