I have a question though... regarding the resuming you do the HEAD request to see what has been uploaded:
> HTTP/1.1 200 Ok
> Content-Length: 100
> Content-Type: image/jpg
> Content-Disposition: attachment; filename="cat.jpg"
> Range: bytes=0-69
Is it possible that the data that is already there could be corrupt?
I'm also wondering how things like proxies deal with this. A lot of mobile networks have nasty transparent caching proxies in their network. Also, when uploading a file through Nginx (when the upload works correctly), it won't send anything to the backend until it has the complete data; is this the same if the connection cuts halfway through?
TCP error correction should take care of this for the most part, but we're actively discussing adding support for stronger guarantees: https://github.com/tus/tus-resumable-upload-protocol/issues/...
> I'm also wondering how things like proxies deal with this. A lot of mobile networks have nasty transparent caching proxies in their network.
That's a good question. An HTTP proxy could always cause issues, but most proxies should leave POST/PUT/HEAD requests untouched. That being said, we won't freeze the protocol until we've had a chance to try it against a variety of mobile networks, which is why we're already starting to implement an initial iOS client over here: https://github.com/tus/tus-ios-client (not ready yet, but keep an eye on it).
> Also, when uploading a file through Nginx (when the upload works correctly), it won't send anything to the backend until it has the complete data; is this the same if the connection cuts halfway through?
NGINX is somewhat unsuitable as a proxy for file uploads due to the buffering you mentioned. Ideally people will implement the tus protocol as an NGINX module like this one: http://www.grid.net.ru/nginx/resumable_uploads.en.html
Meanwhile, clients are free to choose a small chunk size for individual PUT requests (e.g. 1 MB), which will still give them resumability (at 1 MB intervals) without changing their architecture.
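To make the 1 MB chunking concrete, here is a minimal sketch of the client-side bookkeeping. The helper names and the exact formatting are my own assumptions for illustration, not part of the tus draft:

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB, as suggested above


def chunk_ranges(total_size, offset=0, chunk_size=CHUNK_SIZE):
    """Yield inclusive (start, end) byte ranges covering the part of
    the file that still needs to be uploaded, starting at `offset`
    (e.g. the offset reported by a HEAD request when resuming)."""
    start = offset
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start, end):
    # Format a range the way the example above does ("bytes=0-69").
    return "bytes=%d-%d" % (start, end)
```

Resuming the 100-byte example after a HEAD reports 70 bytes received would then mean issuing PUTs for the ranges starting at byte 70, one chunk per request.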
Last but not least, we'll implement the tus protocol for our commercial uploading service, transloadit.com.
So I'm reasonably optimistic that NGINX won't be a major hurdle for the adoption of the protocol.
Oh no it doesn't! We have an analytics service receiving HTTP POSTs from browsers all over the world as JSON. There is an astonishing number of single-bit errors going on. Usually the initial 20 bytes are okay, but after that we see all sorts of patterns, including a bit flip every 8 bytes or so. Note that these will have been received at Google's App Engine servers with the correct checksum. I believe that much of the cause is intermediary devices (e.g. NAT boxes or routers) that are responsible for the corruption and then recalculate the checksum, putting a good checksum on what is now corrupted data.
For that service we have to use HTTP (grumble grumble IE grumble). For our regular stuff we use HTTPS, where we do still see the problem, but it is considerably rarer. In that case the cause is most likely the client device having problems (e.g. RAM bit flips, cosmic rays, overclocked/overheated CPUs, etc.).
All else being equal, I'd recommend you add a layer of checksums as a helpful sanity check. Using SSL also does that for you, but it sees the data late.
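A checksum layer like the one suggested could be as simple as hashing each chunk body and carrying the digest in a custom header. The header idea and function names here are made up for illustration; the tus spec does not define this:

```python
import hashlib


def payload_digest(data: bytes) -> str:
    """Hex SHA-256 digest of a request body. A client could send this
    in a custom header so the server can detect the single-bit
    corruption described above, which slips past the TCP checksum."""
    return hashlib.sha256(data).hexdigest()


def verify_payload(data: bytes, claimed_digest: str) -> bool:
    # On mismatch the server would reject the chunk so the
    # client can retry just that range.
    return payload_digest(data) == claimed_digest
```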
These guys know what they're doing.
I started playing with Go late last year; so far I'm under the impression that it's easier to write reliable software with it than with node.js. Callbacks and exception handling are a huge PITA in node.js, and the community has chosen to refuse improvements that would help with some of the issues (promises).
Go is also an incredible joy to work with given the modern nature of the standard library, static typing, gofmt, built-in testing, and many other small things that the Go team has done right.
That being said, tus.io is not a Go project. Our first server implementation (tusd) is written in Go, but we're working on support for other platforms like node.js as well.
Generally speaking, node.js will continue to be part of our toolkit at Transloadit (for the quick & dirty), but I suspect that we'll use Go for the more critical parts we work on going forward.
S3 is an incredible offering, but since I'm working on tus.io, I'll focus on what's wrong with it : )
- Multipart chunks need to be at least 5 MB. An interrupted part cannot be resumed. This kills the mobile use case.
- Throughput to S3 is bad from outside of EC2; uploads often start at very slow speeds and in many cases won't reach the capacity of the connection.
- S3 does not let you stream/access an upload in progress easily, so you can't start to transcode a video while it's still uploading.
- The S3 API is the opposite of RESTful.
- S3 is a proprietary service, their protocols are not intended/documented for adoption, and IMO they don't deserve great people like you making free contributions to their ecosystem.
edit: I'm not trying to say S3 isn't a good choice for many people. But our goal is to bring resumable file uploads to every iOS, jQuery, WordPress, Drupal, Rails, etc. application in the world, and S3 is not the right starting point for that.
I realized my comment sounded overly negative, so I added a clarification to it: our goal is to bring resumable file uploads to the entire planet, and S3 or any other proprietary protocol should not be the base for that.
HTTP/1.1 200 Ok
Also, if you're going to return a zero-length 200 response, you might as well use 204 No Content instead.
Then, when resuming an upload, you send a HEAD that returns the following:
HTTP/1.1 200 Ok
Content-Disposition: attachment; filename="cat.jpg"
Finally, the response to the resumed PUT has the same problems as the first PUT response. It should probably just be a 204 No Content response - no Content-Length or Range headers required.
We're currently discussing how to interpret RFC 2616 (http 1.1) for this here: https://github.com/tus/tus-resumable-upload-protocol/issues/...
If you have a better suggestion than using the Range header that will still allow clients to send multiple file chunks in parallel, I'd be very interested in it!
> Do you really need this info in the response anyway? The sender knows what they sent, and either it was entirely successful (a 2xx response) or it wasn't.
We don't need it for the PUT request, but we do need it for HEAD. Adding it to PUT is redundant, but simplifies the logic for clients who choose to upload multiple chunks in parallel.
> Also, if you're going to return a zero-length 200 response, you might as well use 204 No Content instead.
Good point, I'll look into that: https://github.com/tus/tus-resumable-upload-protocol/issues/...
> And the Content-Length should surely be 70, since that's how much content would be returned if this was a GET request.
It's 100. We haven't specified GET requests yet, but a server could stream an upload in this case until all bytes have been received.
Anyway, this is awesome feedback - thank you so much!
I've followed up with further comments there.
> If you have a better suggestions than using the Range header that will still allow clients to send multiple file chunks in parallel, I'd be very interested in it!
I don't see a way to support parallel transfers using only existing HTTP headers (without violating the HTTP spec). I would suggest proposing a new header in the HTTPbis WG. For example, something like Available-Ranges that returns a ranges-specifier indicating the set of ranges that are available.
This could possibly be returned as part of a 416 response when attempting to GET a file that isn't entirely available yet. A HEAD request would thus return the same thing.
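For the sake of argument, a server or client handling a header along those lines might parse it like this. The `Available-Ranges` name and its `bytes=0-69,80-99` grammar are the hypothetical proposal above, borrowed from the Range syntax; nothing like this exists in RFC 2616:

```python
def parse_available_ranges(header_value):
    """Parse a hypothetical 'Available-Ranges: bytes=0-69,80-99'
    value into a list of inclusive (start, end) tuples."""
    unit, _, ranges = header_value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("unsupported range unit: %r" % unit)
    result = []
    for spec in ranges.split(","):
        start, _, end = spec.strip().partition("-")
        result.append((int(start), int(end)))
    return result
```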
> It's 100. We haven't specified GET requests yet, but a server could stream an upload in this case until all bytes have been received.
The reason I brought up a GET request is because "the metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request." (section 9.4 of RFC2616)
If you haven't got all 100 bytes yet, your GET request can't return a Content-Length of 100, thus your HEAD request shouldn't be returning 100 either.
I would have thought you would return whatever content you had available (hence the 70 bytes), but if you want to support parallel transfers, then a 416 error response indicating the available ranges might make more sense.
It is an issue where you object to a certain subset of language, the kind that is used to express strong or passionate feelings about things. I have never given swearing much thought and I don't understand what makes you object to it.
I guess I have always felt that the less people have to censor themselves, the better.
The Wikipedia article on the subject [http://en.wikipedia.org/wiki/Profanity] is not as informative as I'd hoped.
That sort of language, in a context like that, reminds me of a Zed-style rant. It makes it harder for me to take it seriously, you know? The whole project ends up coming off as an amateur effort, even if that may not be the case.
Also: The protocol is still under heavy development, so please post any additional ideas, issues, patches or feedback you may have!
I'm also not seeing how the client indicates that the upload is complete. It could be done server-side, by just detecting when a file has no more holes in it, but that seems hacky. Holes can also be useful; suppose I make a 32GB .vmdk file (non-sparse) and put 2GB of data on it. If the server can support holes, then I can upload (and the server only has to store) about 2GB of data; if the server can't support holes, then I'll have to upload a bit more data (assuming compression), and the server will have to store a lot more data. If there were some final message the client could submit to the resource saying "I'm done, commit it!", I think the protocol would be a bit more complete.
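The "no more holes" detection described above can be sketched server-side as a gap scan over the ranges received so far. This is just an illustration of the idea, not something the spec defines:

```python
def missing_ranges(received, total_size):
    """Given inclusive (start, end) byte ranges received so far and
    the declared total size, return the holes that remain. An empty
    result is the (admittedly hacky) signal that the upload is
    complete; for a sparse .vmdk-style upload, the holes would
    simply never be sent."""
    holes = []
    pos = 0
    for start, end in sorted(received):
        if start > pos:
            holes.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos < total_size:
        holes.append((pos, total_size - 1))
    return holes
```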
Technically the Range header isn't valid in an HTTP response (something they are aware of), but conceptually I think the idea works fairly well.
Valid points, IMO. Those sound like use cases that might not have been contemplated originally (the idea for the spec grew out of the author's work here: https://transloadit.com).
That said, the spec, and the code around it are both still very much evolving, and are welcoming of input. You can join us on GitHub or in #tusio on Freenode.
It looked nice from my quick skim, it just brought that classic xkcd to mind!
If this can be put into any page/service, it'd be a huge contribution.
> Servers MUST handle overlapping PUT requests in an idempotent fashion given that the overlapping data is identical. Otherwise the behavior is undefined.
Does that address your concern?
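A server could enforce that MUST with a byte-for-byte comparison of the overlapping region before applying a chunk. The function name and signature below are made up for illustration:

```python
def check_overlap_idempotent(stored, stored_offset, incoming, incoming_offset):
    """Return True if the incoming chunk can be applied idempotently:
    either it doesn't overlap already-stored data, or the overlapping
    region is byte-for-byte identical, per the quoted requirement."""
    lo = max(stored_offset, incoming_offset)
    hi = min(stored_offset + len(stored), incoming_offset + len(incoming))
    if lo >= hi:
        return True  # no overlap at all
    return (stored[lo - stored_offset:hi - stored_offset]
            == incoming[lo - incoming_offset:hi - incoming_offset])
```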