

Ask HN: Checking uploaded file types - jayzee

My start-up is in an esoteric corner of the universe where users often upload and share myriad file formats with each other. So far I have been validating (and restricting) the files that they upload. But there is a backlash brewing since they want to share file types that I am not currently allowing.

My question is: what do I do? It is practically impossible to check for all the file types, and new ones keep popping up. Should I:

1\. Allow them to upload whatever and show a pop-up every time somebody tries to download something, warning them that they are responsible?

2\. Restrict what they can upload?

3\. Really try to validate each and every file format that they want? Do the rest of you do this for esoteric formats?

Thanks!
======
divtxt
(I'm assuming you're doing the above for security)

It's hard to say what you should do without more specifics about the app,
especially (i) the purpose of the uploads, e.g. image sharing can probably
be restrictive, and (ii) the level of trust: public vs intranet, sharing with
known people vs strangers, etc.

Regardless, if you want to go liberal (option #1), I'd use the following
logic:

1\. serve all files so that the browser will save the file instead of opening
it

2\. optional: have a whitelist of exceptions to the above e.g. images

3\. optional: block (or fix) executables e.g. .exe, .vbs etc which the user
could still run after download
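
A minimal sketch of the three steps above, assuming a Python backend that
builds the response headers itself. The function name, the extension tables
and the choice of headers are illustrative assumptions, not something from
the original app; checking only the extension is also a simplification.

```python
import os
from urllib.parse import quote

# Hypothetical policy tables; tune them to what the site actually needs.
INLINE_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
}
BLOCKED_EXTENSIONS = {".exe", ".vbs", ".bat", ".scr"}


def download_headers(filename):
    """Return response headers for a stored upload, or None if it is blocked."""
    name = os.path.basename(filename)
    ext = os.path.splitext(name)[1].lower()

    # Step 3 (optional): refuse extensions the user could run after download.
    if ext in BLOCKED_EXTENSIONS:
        return None

    # Percent-encode the name so it is safe inside the header value.
    encoded = quote(name, safe="")

    # Step 2 (optional): a small whitelist the browser may display inline.
    if ext in INLINE_TYPES:
        disposition, content_type = "inline", INLINE_TYPES[ext]
    else:
        # Step 1: everything else is saved to disk instead of being opened.
        disposition, content_type = "attachment", "application/octet-stream"

    return {
        "Content-Type": content_type,
        "Content-Disposition": f"{disposition}; filename*=UTF-8''{encoded}",
        # Ask the browser not to guess a different type from the content.
        "X-Content-Type-Options": "nosniff",
    }
```

With this, `download_headers("report.xyz")` yields an `attachment`
disposition, while `download_headers("setup.exe")` returns `None` so the
caller can reject or rename the file.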

~~~
jayzee
Thanks a ton!

------
adamzochowski
This is a loaded question:

1) you can't guarantee safety by controlling formats. You can have malformed
files (like the pdfs used to hack google) or valid files with a payload (most
doc viruses are valid files)

2) you can try to identify files and help users by providing more
information about them. Start by running the commands 'file -b' and 'file -ib',
and then do deeper inspections: 'pdfinfo' for pdfs, etc. Also read zipinfo/file
comments (old school stuff like: files.bbs / descript.ion / zip global comment
/ zip per file comment, etc)

3) you can help people mark files up, i.e. is this a secure / verified file,
and is the person who uploaded it trusted? This heads into a whole can of
worms regarding trust relationships.

4) in the end you can't guarantee anything. Instead of a popup just before
download, give an indication of how you see a file. Mark .exe as a less secure
file than .txt . If you want cutesy go with a cake. A text file will be a nice
cake. EXE will be a rotten cake (and add a note 'cake is a lie' as a gamers'
reference)

5) for internal things rely on various hash methods. For my stuff I use:
md4/md5/ sha1/sha224/sha256/sha384/sha512 /ripemd160/whirlpool/ crc32/adler32/
ed2k_chain/ed2k_aich_chain/ed2k_root (I use these to detect dupe files)

An additional benefit of these hashes is that you can block a whole series of
files. Many a pirate site will provide either a SHA1 file hash (mostly gnutella)
or ed2k_root (which is the same as md4 for files below 9.8mb). Some torrent
files will contain an md5 per file. CRC32 is also quite popular in the pirate
scene (at least rely on crc32 to flag potential pirate content that could land
you in prison).
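
A rough sketch of the dupe detection and hash blocking described above,
using only Python's standard library. The names `file_digests`,
`classify_upload`, `seen` and `blocklist` are made up for illustration, and
only a few of the listed hashes are computed. MD5, SHA-1 and CRC32 are here
purely to match published lists; SHA-256 is used as the dedupe key.

```python
import hashlib
import zlib


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's content in chunks so large uploads are not loaded whole."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return
            yield chunk


def file_digests(chunks):
    """Compute several digests in one pass over an iterable of byte chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    sha256 = hashlib.sha256()
    crc = 0
    for chunk in chunks:
        md5.update(chunk)
        sha1.update(chunk)
        sha256.update(chunk)
        crc = zlib.crc32(chunk, crc)  # running CRC32 across chunks
    return {
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "sha256": sha256.hexdigest(),
        "crc32": f"{crc:08x}",
    }


def classify_upload(digests, seen, blocklist):
    """Return 'blocked', 'duplicate' or 'new'; records new files in `seen`."""
    # Any digest found on the blocklist flags the file.
    if any(value in blocklist for value in digests.values()):
        return "blocked"
    if digests["sha256"] in seen:
        return "duplicate"
    seen.add(digests["sha256"])
    return "new"
```

Typical use would be `classify_upload(file_digests(read_chunks(path)), seen,
blocklist)`, with `seen` and `blocklist` backed by a database rather than
in-memory sets. A CRC32 match alone is weak evidence, so it is better treated
as a flag for review than as proof.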

Also rely on fuzzy hashing like ssdeep, which will help you detect viruses.
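
For the identification step in point 2, here is a small sketch that shells
out to the 'file' command from Python. It assumes a Linux-style 'file' where
'-i' prints the MIME type (other platforms may use a different flag), and the
function name `describe_file` is hypothetical. The output is a hint to show
users, not a safety guarantee.

```python
import shutil
import subprocess


def describe_file(path):
    """Return the 'file -b' and 'file -ib' output for path, or None if
    the 'file' command is not installed."""
    if shutil.which("file") is None:
        return None

    def run(flags):
        # '--' stops option parsing so odd filenames are not read as flags.
        result = subprocess.run(
            ["file", flags, "--", path],
            capture_output=True,
            text=True,
            errors="replace",
            check=False,
        )
        return result.stdout.strip()

    return {
        "description": run("-b"),  # human-readable description
        "mime": run("-ib"),        # MIME type and charset on Linux
    }
```

Deeper inspections (pdfinfo, zip comments and so on) could be dispatched on
the reported MIME type in the same way.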

\---

Finally, talk to your users about file formats. They can help you deal with
files, give hints on how to inspect them, etc.

Kind regards

------
tombaker85
I suppose it depends on the nature of your site.

If this is an esoteric group then responsibility for what they download should
be understood.

You could always have a specific warning that flags up for file types that
fall outside of your prescribed ones.

~~~
jayzee
Thanks tombaker. Appreciate the help!

------
bricestacey
Serve all files as zip files.

If you're going to open it up to everything, you may want to virus-scan each
upload too... unless your esoteric corner of the universe also shares viruses.
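
A sketch of the zip idea, assuming uploads are small enough to wrap in
memory; `wrap_in_zip` is a made-up helper name. Virus scanning is left out
because it depends on whichever scanner is available.

```python
import io
import zipfile


def wrap_in_zip(filename, data):
    """Return the bytes of a zip archive containing one file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(filename, data)
    return buffer.getvalue()
```

The result would be served as a download with a '.zip' name, so the browser
never opens the original file directly.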

~~~
jayzee
That is an interesting option but that makes it too much work for the end
user. But appreciate your help!

