Given the title I was expecting the article to provide a solution.
From personal experience, the bigger the file, the more likely you will experience a connection cut in the middle of the upload. That is why the most important thing is to support resumable uploads.
At the moment there is no clear consensus on how to handle that. Amazon S3 has one protocol[1], and Google uses two revisions of a different protocol, one on YouTube[2] and another on Google Cloud Storage[3]. Both work by first creating a session that you then refer to when uploading the chunks. There is also the Nginx upload module[4], which delegates the session ID to the client for some reason.
And there is no browser client available to my knowledge.
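For illustration, the session-then-chunks pattern that both the S3 and Google protocols share looks roughly like this from the browser side. This is only a sketch: the /uploads endpoints, the JSON shape, and the 5MB chunk size are all invented here, and the real protocols differ in their details.

    // Sketch of a resumable chunked upload (hypothetical endpoints).
    const CHUNK_SIZE = 5 * 1024 * 1024;

    async function resumableUpload(file: File): Promise<void> {
      // 1. Create an upload session; the server returns an ID that every
      //    subsequent chunk refers to.
      const res = await fetch("/uploads", { method: "POST" });
      const { id } = await res.json();

      // 2. Send the file chunk by chunk. If the connection drops, only the
      //    failed chunk needs to be retried, not the whole file.
      for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
        const chunk = file.slice(offset, offset + CHUNK_SIZE);
        const last = Math.min(offset + CHUNK_SIZE, file.size) - 1;
        await fetch(`/uploads/${id}`, {
          method: "PUT",
          headers: { "Content-Range": `bytes ${offset}-${last}/${file.size}` },
          body: chunk,
        });
      }
    }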
For the HTTP/2.0 discussion that was here earlier: a way to continue an interrupted file upload.
Because POST variables are sent in order, if you put the file first and the other variables after it, the server never sees those variables when the upload is interrupted. So when I code a form I always put the hidden ones first; that way I can at least give a useful error message, since I know what the user was trying to do.
It would be better to decouple them and upload the files and the rest of the variables separately.
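Short of full decoupling, ordering alone gets you a long way, because FormData serializes parts in append order. A sketch (the field names are made up):

    // Append the small fields before the file so a server that streams the
    // multipart body can read them even if the file part never completes.
    const input = document.querySelector<HTMLInputElement>("input[type=file]")!;
    const form = new FormData();
    form.append("action", "profile-upload"); // hypothetical hidden fields first
    form.append("userId", "42");
    form.append("file", input.files![0]);    // the large part goes last
    fetch("/submit", { method: "POST", body: form });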
I'd really like to be able to use Dropbox as a magic upload handler for any file I upload on my local HD, not just those in my Dropbox folder. They handle the logic of getting all my files into the cloud. Why can't I point a website to my Dropbox and say here, this is handling the file upload?
Dropbox has an API that will (theoretically) let you do this, but not a ton of people have jumped up and implemented it yet. It'll be cool when it shows up.
I've used the dropbox API before to automatically upload photos in a dropbox folder to Flickr. It occurs on an interval (cron job every 2 mins). I'm sure you could do the same thing using FTP or a custom API that exists on your destination server.
They released a delta API recently. It's a bit of a pain to perform this task as you can't differentiate between a new file and a renamed file or a file that's been moved from one location to another.
In all cases you get a delete (if the file isn't new) and then a new file event.
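Concretely, the consuming loop ends up looking something like the sketch below (fetchDelta is a made-up stand-in for the real API call, though the shape it assumes, cursor in, entries out, null metadata meaning delete, matches how the delta feed behaved when I used it), and it shows why a rename is indistinguishable from a delete plus an add:

    type DeltaEntry = { path: string; metadata: object | null };
    type DeltaPage = { entries: DeltaEntry[]; cursor: string; hasMore: boolean };
    // Hypothetical transport; in real code this wraps the HTTP call.
    type FetchDelta = (cursor: string | null) => Promise<DeltaPage>;

    async function drainDelta(fetchDelta: FetchDelta, cursor: string | null) {
      let page: DeltaPage;
      do {
        page = await fetchDelta(cursor);
        for (const { path, metadata } of page.entries) {
          if (metadata === null) {
            console.log("deleted:", path);       // a rename shows up here first...
          } else {
            console.log("added/changed:", path); // ...then here, as a brand-new file
          }
        }
        cursor = page.cursor;
      } while (page.hasMore);
      return cursor; // persist this for the next cron run
    }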
I implemented it, and it kinda sucks for this sort of thing. Its purpose seems to be maintaining local state that mirrors the state on Dropbox. Not terribly interested in that...I just want to subscribe to specific events (webhooks, anyone?).
This was their solution to the frequently requested webhooks. It falls short. Way short.
ifttt.com has a dropbox-to-flickr recipe (http://ifttt.com/recipes/6804). Their dropbox channel provides 2 triggers, one for any new file in your public folder, and one specifically for new photos in your public folder (it doesn't say exactly what the definition of "photo" is, though).
I haven't tried it, so I don't know how gracefully it handles renames or moves.
Doesn't matter too much if you're just using Dropbox as a storage backend for a more complex service. If your box is full of encrypted 2MB segments backing a pseudo-filesystem you can upload files as big as you like and not have to worry about broken connections.
The best part is that you can use other services with similar APIs for pseudo-RAID or just a union-mount. Sharing is more difficult than vanilla Dropbox, of course, but I'm working on it...
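The segmenting itself is the easy part; the bookkeeping (which segments make up which logical file) is where the real work lives. A rough sketch, with encrypt() left as a placeholder for whatever cipher the backend uses:

    // Split a large file into fixed-size segments so each piece can be
    // uploaded (and retried) independently of the others.
    const SEGMENT_SIZE = 2 * 1024 * 1024; // the 2MB mentioned above

    async function encrypt(segment: Blob): Promise<Blob> {
      return segment; // placeholder: a real version would use WebCrypto here
    }

    async function* segments(file: File): AsyncGenerator<Blob> {
      for (let offset = 0; offset < file.size; offset += SEGMENT_SIZE) {
        yield await encrypt(file.slice(offset, offset + SEGMENT_SIZE));
      }
    }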
Dragged and dropped an 8GB+ file and left it on for 5 hours. Worked perfectly. No timeouts, no errors, and I'm on a shared hosting account at 1and1.
My problem with them is that it wasn't possible to hide the FTP username and password; they were always in the JavaScript files. I whined, I complained, I bitched, and there was nothing they could do about it. :( So you basically had to password-protect the whole directory with .htaccess and be very careful with whom you shared the credentials.
If you don't want people to download and install software, just stick with Java FTP applets.
What exactly did you expect them to do about it? For a client-side tool to establish a plain FTP connection, it needs to possess authentication credentials.
You could always just hard-code the username/password into the applet and recompile. That shouldn't be too hard...
Or, if you control the FTP server, you could dynamically add and remove random virtual users/passwords to the FTP server (hopefully virtual users). Then when the client javascript gets the username/password, it could only be used once.
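Something like this sketch on the server side (addVirtualUser/removeVirtualUser are stubs standing in for however your FTP daemon manages virtual accounts):

    import { randomBytes } from "node:crypto";

    // Stubs standing in for the ftpd's account management (e.g. writing to
    // the virtual-user db that the FTP daemon reads).
    async function addVirtualUser(user: string, pass: string) { /* ... */ }
    async function removeVirtualUser(user: string) { /* ... */ }

    // Mint a throwaway login for one client and revoke it after a short TTL,
    // so the credentials embedded in the page are useless to anyone later.
    async function mintOneTimeLogin(ttlMs = 10 * 60 * 1000) {
      const user = "u_" + randomBytes(6).toString("hex");
      const pass = randomBytes(12).toString("base64url");
      await addVirtualUser(user, pass);
      setTimeout(() => removeVirtualUser(user), ttlMs);
      return { user, pass }; // hand these to the page instead of real creds
    }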
It's been a long time since I've been on shared hosting, but I thought they usually offered some kind of anonymous upload-only FTP directory. Couldn't your users upload to that and then your application can read from that directory?
I've been dealing with browser-based large file uploads, which means dealing with lots of browser-specific issues.
Fortunately, things are getting better, especially for the webkit-based browsers. Firefox still has some issues, and I check https://bugzilla.mozilla.org/show_bug.cgi?id=678648 pretty regularly. Just today this bug, which was filed in 2003, changed from Status = NEW to Status = ASSIGNED.
To clarify, bug 678648 was logged in 2011 and is marked as a duplicate of bug 215450 (the one from 2003): "uploading files that are larger the 2GB fails" @ https://bugzilla.mozilla.org/show_bug.cgi?id=215450
That title is a bit misleading. On some platforms you can already use Firefox to do >2GB uploads, but there is still a 4GB limit.
If anyone wants to help beta-test an HTML5 uploader that calls archive.org's S3-like endpoint under the hood (no IE or Opera support yet, though Opera 12 is now working): http://archive.org/upload/
I've experienced this issue before when building a publisher backend for a D2D PC game business. It seems to be basically impossible without a Java applet of some kind, and even then it's wonky at best and just 'fails' at worst. The real fix seemed to be simply providing an FTP connection and letting people connect through the native client of their choosing.
That really seems to be the key to this problem: develop a simple native app capable of FTP uploads that makes it easy for users to deliver files to your app within the context of their use. Most browsers are capable of opening native applications via a unique protocol, so you could easily enrich the process by having the native app be a part of (or try to blend seamlessly with) major browsers.
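The browser side of that handoff is tiny; for example (the myuploader:// scheme is invented here, and the native helper would register it with the OS at install time):

    // Hand the transfer off to a native helper registered for a custom URI
    // scheme. The OS launches the helper, which then does the FTP upload
    // with proper resume support.
    function launchNativeUploader(target: string): void {
      location.href = "myuploader://upload?target=" + encodeURIComponent(target);
    }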
As plenty of file transfer protocols, clients, and servers support resumable transfers (FTP, SFTP, rsync, proprietary browser-based tools, etc., or even basic HTTP if you arrange for the file to be pulled rather than pushed and your "client's server" has byte-range support), perhaps this should be titled "why you shouldn't use a single HTTP POST request from a browser to upload a large file". The general reason seems to be "because this is not a use case this feature is commonly designed for and tested against."
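The pull variant deserves a sketch, since plain HTTP already has everything it needs: the receiving server checks how much of the file it has and requests only the rest with a Range header. Illustrative Node code, assuming the origin supports byte ranges:

    import { createWriteStream, existsSync, statSync } from "node:fs";
    import { Readable } from "node:stream";
    import { pipeline } from "node:stream/promises";

    // Resume an interrupted pull: request only the bytes we don't have yet.
    async function resumePull(url: string, dest: string): Promise<void> {
      const have = existsSync(dest) ? statSync(dest).size : 0;
      const res = await fetch(url, { headers: { Range: `bytes=${have}-` } });
      if (!res.body) throw new Error("empty response body");
      const body = Readable.fromWeb(res.body as import("node:stream/web").ReadableStream);
      if (res.status === 206) {
        await pipeline(body, createWriteStream(dest, { flags: "a" })); // append the tail
      } else if (res.status === 200) {
        await pipeline(body, createWriteStream(dest)); // no range support: start over
      } else {
        throw new Error(`unexpected status ${res.status}`);
      }
    }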
I ran into this problem with https://truefriender.com/. The solution I used was nginx instead of Apache: nginx streams the file to disk and then I can handle it with PHP. I still have the 2GB problem, but I've tested Perl and I can go past it; now I just have to implement that.
Excuse me if this is a stupid question, but why would timeout issues on large files affect something like Heroku more often than other types of hosting services?
[1]: http://docs.amazonwebservices.com/AmazonS3/latest/API/mpUplo...
[2]: https://developers.google.com/youtube/2.0/developers_guide_p...
[3]: https://developers.google.com/storage/docs/developer-guide
[4]: http://www.grid.net.ru/nginx/resumable_uploads.en.html