
Open Protocol for Resumable File Uploads - gluegadget
https://tus.io/
======
kvz
Was just casually (well ok maybe it’s more compulsive than that) browsing HN
and was pleasantly surprised to find tus on the front page. I’m one of the
core contributors and happy to answer questions. Although it’s late here so it
may take a few hours while I’m asleep :)

~~~
geekodour
I have not looked into tus properly yet, but how does this compare with
BitTorrent seeding, and can both be combined somehow?

~~~
kvz
People ask that a lot, yes; on the surface they have much in common. Both can
be used to transmit huge files, both can chunk files up and transmit only the
remaining parts, pick up and resume at a later point in time, and (in the case
of tus, optionally with the Concat extension) send these chunks simultaneously.

Tus however works as a thin layer on top of HTTP, so it’s easy to drop into
existing web sites/load balancers/auth proxies/firewalls. BitTorrent ports are
often closed off on airports/hotels/corporate networks. But websites work. And
if you can access a website, you will be able to upload files to it with tus.

Another difference is that tus assumes classic client/server roles. The client
uploads to the server. Downloading is done via your regular http stack and not
facilitated by tus. BitTorrent facilitates both uploading and downloading in
single clients. It is more peer-to-peer and decentralized in nature, where tus
clients typically upload to a central point (for example, many video producers
uploading to Vimeo; not a contrived example, as Vimeo has adopted tus).

There are more differences (Discoverability, trackers, pull vs push, pulling
from many peers at once) but the comment is getting very long so I hope this
already helps a bit :)

Happy to dive deeper into this at request tho :)

------
chillaxtian
S3 Multi-Part Upload API can be used to chunk an object into smaller parts,
which can succeed or fail independently.

[https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview....](https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html)

~~~
kvz
Yes, that is very helpful. Our S3 storage backend for tusd uses it, and our
[https://uppy.io](https://uppy.io) file uploader does too, usable directly
from the browser (so you can choose not to use tus at all with it). S3
resumable uploads do come with a few limitations that make some people still
choose tus tho:

* chunks need to be >5 MB, which can be problematic on flaky/poor connections (rural areas, tunnels, clubs/basements, people on the move switching connections all the time)

* your S3 bucket needs to allow writes by the world, or you need to deploy signature authentication

* there’s an S3 vendor lock-in some might worry about

* not an open protocol, so no chance of advancing it with the community

That said, that still leaves a large audience for direct s3 resumable uploads
and I’m thankful aws offers it!
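For context on the first bullet: S3’s multipart API requires every part except the last to be at least 5 MB, so a client has to plan its byte ranges around that floor. A minimal, hypothetical chunking sketch (not AWS SDK code, just the range arithmetic):

```python
MIN_PART = 5 * 1024 * 1024  # S3's minimum size for every part except the last


def part_ranges(total_size, part_size=MIN_PART):
    """Yield (part_number, start, end) byte ranges for a multipart upload."""
    if part_size < MIN_PART:
        raise ValueError("S3 rejects parts smaller than 5 MiB (except the last)")
    ranges = []
    start, part_number = 0, 1
    while start < total_size:
        end = min(start + part_size, total_size)
        ranges.append((part_number, start, end))
        start, part_number = end, part_number + 1
    return ranges
```

On a connection that keeps dropping before 5 MB of a part has been transmitted, that part never completes, which is exactly the failure mode the first bullet describes.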

~~~
michaelmior
As far as vendor lock-in goes, it seems like there are a large number of other
vendors supporting the S3 API, so this doesn't seem like a huge concern.

~~~
kvz
That’s a fair point. And I guess with e.g. Minio you could self-host too.

S3 is great and in fact, at Transloadit we deploy a content ingestion network
(reverse cdn) of many regional tusd servers, close to our customers’ end
users, but they all ultimately save to S3 using multipart. We’re happy S3
customers.

So why the extra layer? Because this lets us offer resumability below 5MB,
lower regional latencies, roll our own auth, and switch to a different cloud
provider without introducing breaking changes on the customer-facing side
(even if the new cloud bucket provider does not offer an S3-compatible
interface, or offers only a slightly incompatible one).

Ultimately you’re still locked-in with AWS protocol-wise, and there’s no
community platform for advancing it, so addressing any of these issues is
going to be hard.

~~~
michaelmior
My comment wasn't intended to be a dig at tus. I like the work you all are doing!

~~~
kvz
<3

------
eps
If I read the spec correctly, the PATCH method is actually more of an APPEND, no?

It would seem logical and practical to allow PATCH to modify any part of a
resource that is already present on the server and/or to extend it by
appending. This would also make the whole thing useful beyond resuming of
interrupted uploads, e.g. to allow for rsync-style updating of existing files.

~~~
kvz
Yes, though APPEND is not an official HTTP method. Allowing modification of
parts at any location makes things a little more complex and comes with some
overhead. If you do need to upload multiple chunks simultaneously, you can opt
into our Concat extension, which does exactly that. Our latest blog post has
some images to illustrate.

~~~
eps
What overhead is that exactly?

My point is that you appear to be pushing for adoption of an extension that
handles one specific use case for PATCH, when a more general extension is
trivially possible with little to no extra effort.

~~~
kvz
(I hope I understand your proposal correctly; I fear I might not, so please
clarify if I don’t, but) more chunks come at the expense of more requests.
After a connection drop, each separate chunk needs to be renegotiated and
transmitted. For some use cases that trade-off is well worth it, like when
latency is low but TCP settings or QoS policies won’t let you saturate a
single connection, so tus does offer sending multiple chunks in parallel, as
an opt-in, via the Concat extension.

If your question is why not make Concat the default mode of operation, the
additional roundtrips are the reason. For fragile connections these are often
very costly, and we want tus to really shine in those situations, by default.
If your users are all operating on big tubes, you’ll likely want to deploy
Concat, but that’s not an assumption we want to make.
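Concretely, the Concat extension marks each parallel chunk as a partial upload and then stitches them together with one final request. A sketch of the headers involved, per the tus 1.0 Concatenation extension (the header names come from the spec; the helper functions themselves are hypothetical):

```python
TUS_VERSION = "1.0.0"


def partial_creation_headers(length):
    """Each parallel chunk is created as its own upload, flagged 'partial'."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Length": str(length),
        "Upload-Concat": "partial",
    }


def final_creation_headers(partial_urls):
    """One last POST stitches the partial uploads together, in order.
    No Upload-Length here: the final length is the sum of the partials."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Concat": "final;" + " ".join(partial_urls),
    }
```

That final POST is the extra roundtrip the comment above mentions: each partial upload pays its own negotiation cost, which is why Concat is opt-in rather than the default.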

------
digianarchist
The HTML5 File API has been around for a few years now, yet a lot of sites
don't support resumable uploads. I know it adds a bunch of complexity
server-side, as you have to stitch those pieces back together, but it makes
for a good user experience.

~~~
kvz
I hope with a client like [https://uppy.io](https://uppy.io) and a server like
tusd, it’s much more manageable these days. Less boilerplate writing and more
battle-tested components, for sure.

------
aiCeivi9
Slightly off-topic: why, after so many years, do Chrome & Firefox have such
poor support for resuming interrupted file downloads? In the case of Firefox I
am almost sure it was better in the past. I have to use 'wget -c' or
[https://www.freedownloadmanager.org/](https://www.freedownloadmanager.org/)
for bigger files.

~~~
chrisrhoden
As I suspect you may already know, this is dependent on the server 1)
indicating support for byte-range requests and 2) correctly implementing them.

I don't think I have noticed Firefox getting worse at this over time, but I'm
not downloading large files every day. Would you be willing to share where
you're noticing this?
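Those two conditions map directly onto standard HTTP/1.1 headers. A hypothetical sketch of the client side of a resume (the helper names are made up; the headers are standard range semantics):

```python
def can_resume(response_headers):
    """Condition 1: the server advertises byte-range support
    via an 'Accept-Ranges: bytes' response header."""
    return response_headers.get("Accept-Ranges", "").lower() == "bytes"


def resume_headers(local_size):
    """Condition 2: ask for everything from the bytes we already have onward.
    A server that implements ranges correctly answers 206 Partial Content;
    a 200 means it ignored the header and is resending the whole file."""
    return {"Range": "bytes=%d-" % local_size}
```

This is essentially what 'wget -c' does under the hood: stat the partial local file and request the remainder with a Range header.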

------
speeq
See also:
[https://news.ycombinator.com/item?id=10591348](https://news.ycombinator.com/item?id=10591348)

------
ioquatix
There is a Ruby implementation too: [https://github.com/janko-m/tus-ruby-server](https://github.com/janko-m/tus-ruby-server)

~~~
kvz
Love the work that Janko is doing in our ecosystem! There are implementations
for most major languages, so a tus server could even just be some PHP code
that you install with Composer and add to your existing Apache setup.

------
amelius
Finally a Request-For-Comments that actually contains a comments section!

------
treve
This came a long way since 2013. Congrats, looks very robust now!

~~~
kvz
Thank you for the kind words!

------
JdeBP
Zawinski's Law needs some revision. Not only do WWW apps expand until users
can chat asynchronously, but WWW protocols expand until they incorporate
ZMODEM. (-:

------
silvestrov
Tus-Version: 1.0.0,0.2.2,0.2.1

seems like over-design. The list will get very long over time.

Just use a single integer instead and have the header include min and max
version supported. E.g.

Tus-Version: 1-4

meaning it supports versions 1 through 4. There's no reason to be able to say
versions 1 and 4 but not 2 and 3.
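The proposed range form is also trivial to parse. A hypothetical sketch of the suggested 'min-max' scheme (this is the commenter's proposal, not part of the actual tus spec):

```python
def supports(header_value, version):
    """Parse the proposed form: 'Tus-Version: 1-4' means versions 1 through 4."""
    if "-" in header_value:
        lo, hi = (int(part) for part in header_value.split("-"))
    else:
        lo = hi = int(header_value)  # a bare '3' would mean exactly version 3
    return lo <= version <= hi
```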

~~~
kvz
We are discussing this very topic here:
[https://github.com/tus/tus-resumable-upload-protocol/issues/...](https://github.com/tus/tus-resumable-upload-protocol/issues/96).
It has stalled a bit, so I would be very happy to see you or other
interested/concerned HN readers weigh in. People sharing concerns on GitHub is
the main way the protocol has progressed.

------
aaaaaaaaaab
What’s wrong with HTTP PUT with Content-Range?

~~~
treve

      > An origin server that allows PUT on a given target resource MUST send
      > a 400 (Bad Request) response to a PUT request that contains a
      > Content-Range header field (Section 4.2 of [RFC7233]), 
    

[https://tools.ietf.org/html/rfc7231#section-4.3.4](https://tools.ietf.org/html/rfc7231#section-4.3.4)
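The quoted MUST translates into a one-line server-side check. A hypothetical handler sketch (no particular framework, just the decision the RFC mandates):

```python
def handle_put(headers):
    """Per RFC 7231 section 4.3.4: reject any PUT carrying Content-Range."""
    if "Content-Range" in headers:
        return 400  # Bad Request, as the RFC mandates
    return 204  # No Content: the PUT replaced the whole resource
```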

~~~
zzo38computer
Maybe that should be fixed, then. HTTP PUT with the range specified seems to
me like it would be sensible.

~~~
treve
Responding with 400 Bad Request is actually something that was added after
some servers allowed Content-Range on PUT and others didn't.

It was never standard, but the end-result was that some clients assumed PUT +
Content-Range would work, which meant that some servers would apply the change
while others would ignore the header and overwrite the entire resource with
the chunk.

There's no sane way to add support for this header and make older servers
behave correctly, so now we have better facilities for this.

The standard way is to use PATCH + a mimetype that describes the update +
perhaps using Accept-Patch to find out what formats are available. It's
extremely doubtful that Content-Range for PUT will ever be standard. If
there's going to be a future standard, it's likely PATCH based.

It could be possible with PUT and a new 'Expect' header, but I'm not sure that
gives any advantages over PATCH now.

~~~
zozbot123
> There's no sane way to add support for this header and make older servers
> behave correctly, so now we have better facilities for this

KISS. Endow the "400 Bad Request" server response with a special header that
acts like a cookie or nonce, with the semantics "this server does support
Content-Range uploads and won't corrupt your resource". If the client resends
the PUT + Content-Range request with the correct cookie/nonce added to it, it
has acknowledged these semantics in turn, and the upload can now go through.
This adds a roundtrip, but it's still trivial compared to what's being
proposed here, and keeps the semantics of PATCH open for more complicated
cases.

~~~
irishsultan
You would still need to update all those old servers to stop ignoring the
Content-Range and abort in its presence.

------
dcbadacd
Uhh, 206 partial content??

~~~
wtfrmyinitials
206 is for downloads, not uploads.

~~~
dcbadacd
Oh, right. I misread the title.

------
eximius
Is there a TL;DR? I see the whole spec is there but I don't have time to read
it just this second.

Does it use anything fancy like fountain codes or does it just renegotiate
chunks each time or something else?

~~~
kvz
The latter.

1. The client POSTs to the upload endpoint; this allocates a unique Location which the server returns, and

2. the client saves this (e.g. in localStorage) along with local file identifiers so it can be looked up later, and can

3. query that URL with a HEAD request to check how many bytes were already received, and then

4. PATCH the remaining bytes.

Repeat steps 3 & 4 on failures/resumes.
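The four steps above map onto plain HTTP headers. A minimal sketch of what each request carries, per the tus 1.0 core protocol (the header names come from the spec; the helper functions themselves are hypothetical):

```python
import base64

TUS_VERSION = "1.0.0"


def creation_headers(length, metadata=None):
    """Step 1: POST these headers to the upload endpoint; the server
    answers 201 Created with a Location header for the new upload."""
    headers = {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Length": str(length),
    }
    if metadata:
        # Upload-Metadata is comma-separated "key base64(value)" pairs.
        pairs = ("%s %s" % (key, base64.b64encode(value.encode()).decode())
                 for key, value in metadata.items())
        headers["Upload-Metadata"] = ",".join(pairs)
    return headers


def head_headers():
    """Step 3: HEAD the upload URL; the server answers with Upload-Offset."""
    return {"Tus-Resumable": TUS_VERSION}


def patch_headers(offset):
    """Step 4: PATCH the remaining bytes, starting at the server's offset."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }
```

A real client would send these with any HTTP library and persist the Location (step 2) between sessions so a later run can resume from the server's reported offset.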

------
gsich
rsync?

Yes I know this is mainly for browsers.

~~~
kvz
Yes, for browsers it’s cheaper to build upon HTTP, and it lets you move
through airport/hotel/corporate firewalls without problems.

Tus is also used in datacenters for high-throughput & reliable transmissions.
Probably in most cases rsync is a sensible choice, but sometimes maybe you
already have tus, HTTP-based auth, load balancing, etc. in place that you want
to leverage, or maybe you want to avoid exchanging SSH secrets.

