
Serverless File Uploads – Netlify - peterdemin
https://www.netlify.com/blog/2016/11/17/serverless-file-uploads/
======
mnutt
This article makes it sound like upload requests are restricted to your domain
via CORS, but that is one of the pitfalls of thinking about everything through
a serverless lens: while another site could not use JavaScript to directly
upload files to your S3 bucket, a malicious user could absolutely make a
backend request to receive signed tokens for uploading to your S3 bucket.
Additionally, it doesn't look like the policy is locked down, so they could
overwrite your existing files with malicious ones.

~~~
foota
A practical implementation of this would probably want to do a couple things:

1. As you mention, make sure they can't overwrite files

2. Have a content-type whitelist on the requestUploadURL function

3. Maybe authentication to keep track of who is submitting these requests?

Assuming that you're okay allowing someone to upload files with a given
content-type to your bucket, is there anything I'm missing?
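
Point 2 can be enforced in the signing endpoint itself before any URL is
minted; a minimal sketch in Python (the allowed list and function name are
illustrative assumptions, not from the article):

```python
# Illustrative content-type whitelist for a signing endpoint.
# The allowed set and the function name are assumptions.
ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png", "application/pdf"}

def may_request_upload_url(content_type: str) -> bool:
    """Refuse to mint a signed upload URL for unexpected content types."""
    return content_type in ALLOWED_CONTENT_TYPES
```

The signing function would bail out early with a 4xx response whenever this
check fails, so no signed URL ever exists for a disallowed type.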

~~~
mnutt
For #1, I would probably disallow writing to arbitrary paths and instead
generate a path prefix using a UUID and return that to the client to ensure
that every upload is unique.
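
That scheme might look like this sketch (the `uploads/` layout is an
assumption; the point is just that the server, not the client, picks the
prefix):

```python
import uuid

def unique_upload_key(filename: str) -> str:
    # Prepend a random UUID so two uploads can never collide and an
    # attacker cannot target an existing object's path.
    # The "uploads/" layout is an illustrative assumption.
    return f"uploads/{uuid.uuid4()}/{filename}"
```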

------
mamisp
How is using a node.js server to generate signed requests to upload files onto
S3 servers, "serverless"?

~~~
jedberg
It's serverless because you (your entity, company, whatever) don't have to
admin any servers to make it work.

~~~
khedoros1
They might call it "serverless" for that reason, just like someone might say
they drove to work "carless" by carpooling with a coworker, but it's a little
misleading.

~~~
jedberg
When you get to work via carpooling, do you say, "I drove to work" or do you
say, "I carpooled to work"?

~~~
khedoros1
Doesn't matter. Driving and carpooling both imply the use of a car. Uploading
denotes a server at the receiving end, even if it's a dynamically-provisioned
transient server uploading into some cloud data store.

------
chrisballinger
We built something similar to this for our Kickflip.io HLS/S3 live video
streaming service. The original version is closed source but we rewrote a
generic open source Python library called storage_provisioner [1] for a
client. It's minimal and super simple.

We also made a Django-Rest-Framework module that wraps storage_provisioner
called django_broadcast [2]. Working with AWS/S3 can be a pain, hopefully
these tools can help.

1. [https://github.com/PerchLive/storage_provisioner](https://github.com/PerchLive/storage_provisioner)

2. [https://github.com/PerchLive/django-broadcast](https://github.com/PerchLive/django-broadcast)

------
amelius
Silly nomenclature. This isn't "serverless" at all.

Suggestion: "File uploads using 3rd party server".

As an added benefit, this title shows that the data might not be safe from the
curiosity of other entities.

~~~
ceejayoz
As with "hackers" vs. "crackers", I suspect this little terminology war is
already lost.

~~~
mrmondo
Hackers hack computers and networks; crackers crack software. There is a clear
difference between those two words.

~~~
ceejayoz
A clear difference that's meaningless to and ignored by most, as with the fact
that "serverless" isn't really serverless. Like I said, I suspect the war's
lost.

------
shakeel_mohamed
What's wrong with doing all of this on the front-end? I recently did just that
after generating the signature and policy locally.

See this guide:
[https://aws.amazon.com/articles/1434](https://aws.amazon.com/articles/1434)

~~~
CaveTech
You leak your secret key to every user who can view that page.

~~~
petethepig
No you don't. You leak the AWSAccessKeyId, which is not a secret. You use a
signature to authorize the file upload.

~~~
CaveTech
I should've been more verbose. You cannot calculate the signature client side
without leaking the key. So you need a server. That step is identical to what
this "serverless" implementation is doing.
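
Concretely, under the older S3 browser-POST scheme in the AWS article linked
above, that server-side step is just an HMAC-SHA1 over the base64-encoded
policy document using the secret key, which is exactly what can't be computed
client-side without shipping the key. A stdlib sketch (the policy contents
here are illustrative):

```python
import base64
import hashlib
import hmac
import json

def sign_post_policy(policy: dict, secret_key: str) -> tuple[str, str]:
    """Server-side signing of an S3 browser-POST policy (legacy scheme).

    Returns the base64 policy and its base64 HMAC-SHA1 signature; both go
    into the HTML form fields alongside the (non-secret) access key id.
    """
    policy_b64 = base64.b64encode(
        json.dumps(policy).encode("utf-8")
    ).decode("ascii")
    signature = base64.b64encode(
        hmac.new(secret_key.encode("utf-8"),
                 policy_b64.encode("ascii"),
                 hashlib.sha1).digest()
    ).decode("ascii")
    return policy_b64, signature
```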

~~~
shakeel_mohamed
Correct. But the signature doesn't necessarily need to be per-file upload, so
I have it embedded in JS. For my use case, saving the extra network hop is
worthwhile.

~~~
kuschku
So I can extract it from the JS, and just upload terabytes?

------
mrmondo
I really wish people would stop using the term serverless, it's not useful and
it's highly misleading.

------
danielrhodes
You do need a server to create a token from your access key and secret.
However, this doesn't really go very far in protecting your bucket, as
somebody could just grab that token and upload whatever they want.

So an additional layer of security is creating an upload bucket with a policy
where all objects over 24h old are deleted. When somebody finishes uploading a
file, you ping your server and move the file from the upload bucket to the
real bucket.
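
That cleanup can be expressed as an S3 lifecycle rule rather than code; a
sketch of the configuration as it would be passed to boto3's
`put_bucket_lifecycle_configuration` (the prefix and rule ID are illustrative,
and note that lifecycle expiration granularity is whole days, so one day is
the closest you get to "24h"):

```python
# Expire everything under the staging prefix after one day.
# Prefix, rule ID, and bucket name below are illustrative assumptions.
lifecycle_config = {
    "Rules": [
        {
            "ID": "expire-stale-uploads",
            "Filter": {"Prefix": "uploads/"},
            "Status": "Enabled",
            "Expiration": {"Days": 1},
        }
    ]
}
# e.g. s3.put_bucket_lifecycle_configuration(
#          Bucket="my-upload-bucket",
#          LifecycleConfiguration=lifecycle_config)
```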

Another trick is putting CloudFront in front of that bucket. You can then
upload to any CloudFront edge, which will then put the file in your bucket --
the reduced latency to a CloudFront edge (vs. S3) will increase the speed at
which you upload by quite a bit.

------
cyberferret
Nice work. I have several web apps running on EC2 instances where my users
upload via Browser -> EC2 -> S3. This can cause high latency on some of my
smaller EC2 boxes, which is annoying, or else it forces Elastic Beanstalk to
spool up more instances unnecessarily because it thinks traffic is being
flooded when large files are being uploaded.

I've always wondered about the best strategy to go 'serverless' with the file
uploads and have the user's browser essentially upload direct to S3, and this
tutorial gives a great insight into that - thanks.

------
curiousAl
"Each returned URL is unique and valid for a single usage, under the specified
conditions."

Where and how is the url actually invalidated after it is used? (or are you
relying on expiration as invalidation?)

------
jest3r1
"Serverless"

So is this 100% client-side, or is there a (server) dependency?

------
asciimike
What I've found people really want when they say "serverless" in this context
is "direct file upload without a proxy server", which is basically what BaaS
offerings like Firebase and Parse do...

<pitch>

Firebase Storage
([https://firebase.google.com/docs/storage/](https://firebase.google.com/docs/storage/))
provides clients that perform secure, serverless uploads and downloads.
Instead of doing the dance with a server minting signed URLs, it uses a rules
engine that lets developers specify declarative rules to authorize client
operations. You can write rules to match object prefixes, restrict access to
a particular user or set of users, check the contents of the file metadata
(size, content type, other headers), or check for an existing file at the
prefix so as not to overwrite it.

If HackerNews formatting were more forgiving of code snippets, I'd post one
here, but instead have to link to the docs
([https://firebase.google.com/docs/storage/security/secure-files](https://firebase.google.com/docs/storage/security/secure-files)).

We've found that this model is more performant and less expensive (no need for
a proxy server), as well as lower cognitive load on developers, as they think
about what they want the end result to be, rather than how they need to build
up the end result.

And since I know people will bring it up: there are definitely limitations in
flexibility (you're using a DSL), and a steeper learning curve for the very
complicated use cases. The goal here is to make it trivial for 90% of use
cases and possible for 9%, rather than making it possible for 100% and equally
difficult for everyone. Tradeoffs...

</pitch>

And if you want other examples, Parse did a similar thing with role based
access control to a Parse File, allowing direct client upload and access by
only a set of users. S3 and GCS can do this as well, assuming their
(relatively coarse) IAM models are granular enough for you (and you're an
authorized principal in their systems, which is often the harder thing).

Bringing this full circle, "serverless" typically involves a switch from
writing code (imperative) to writing config (declarative). You're not
validating JWTs, signing URLs, or writing middleware; you're letting services
know how to configure those primitives for you. In some ways the Serverless
framework does abstract this for you (hey look, I didn't provision a VM), and
in some ways it doesn't (you still wrote code to generate a signed URL).

Disclosure: I built Firebase Storage

~~~
walterbell
Thanks for pointing out that "serverless" is a rebranding of Backend-as-a-
Service.

Any examples of the 9% "possible" Firebase use cases which are less easy to
configure?

What do you think of the Rebol/RED approach to DSLs? There are also a few
Ocaml papers on DSLs for finance.

Are particular languages better suited to implementing DSLs for
configurable/declarative interfaces to a BaaS like Firebase?

------
madmod
I use this technique extensively in several production systems.

As others have mentioned, having an expiration policy is a good idea. Also,
you can mitigate charges from malicious activity by using rate limits on the
signing endpoint. (API Gateway supports this.) Using infrequent access or
reduced redundancy storage might also be a good idea if you expect a lot of
traffic. It's also good to limit the CORS policy on the bucket to the needed
domains and headers.

Signed metadata headers are very useful when combined with S3 event handlers
(SQS or straight to Lambda) using a HEAD request on the uploaded objects. This
is a great technique for post-processing an upload without requiring client
trust or an external data store (which would need a separate, fallible request
and could lead to consistency issues).
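
The Lambda side of that is handed the standard S3 notification payload;
pulling out bucket and key before issuing the HEAD looks roughly like this
(the event shape is the documented S3 notification format, but the function
name is mine):

```python
import urllib.parse

def objects_from_s3_event(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 notification event.

    Object keys arrive URL-encoded in the event payload, so decode them
    before using them in a HEAD request for the signed metadata.
    """
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs
```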

Edit: It is also critically important to have some randomness in each key
path so it is unguessable. Otherwise user files would be overwritable by an
attacker. (Many file names are easily guessable, and an attacker with many
tries could eventually stuff malware in, for example.) I used GUIDs for this
because they are both URL and S3 key safe. If keeping the original file name
is needed, I put it in a metadata value and rename the file on download using
a Content-Disposition header. Making the S3 headers work with symbols in file
names can be tricky, but encoding the name as a JSON string works around most
issues.
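
That filename trick can be sketched as follows (the helper name is mine;
JSON-encoding produces a quoted, backslash-escaped string, which survives
header handling for most awkward symbols):

```python
import json
import uuid

def upload_key_and_disposition(original_name: str) -> tuple[str, str]:
    # Store under an unguessable GUID key; keep the human-readable name
    # in a Content-Disposition header, JSON-encoded so quotes and other
    # symbols in the filename don't break the header value.
    key = str(uuid.uuid4())
    disposition = "attachment; filename=" + json.dumps(original_name)
    return key, disposition
```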

In order to overcome the 30-second request limit in API Gateway for longer
post-processing while still offering real-time client feedback, you can set up
an S3 event handler to trigger the post-processing Lambda, which then updates
a DynamoDB record with the S3 key as its id. A status endpoint Lambda is then
polled by the client with the S3 key for status events.

For more complex post-processing and client-side workflows I have used key
prefixes (folders), each with separate event handlers or CORS configurations.
IAM policies with conditions including S3 key prefixes are used to restrict
access. Using the S3 API copy command can move large objects quickly between
workflow steps.

Also, enabling server-side encryption is a must imo. Be sure to specify AWS
signature version 4 in the S3 constructor so that all parts of the request are
signed. (Otherwise some older regions may not sign metadata headers.)

Also the S3 API copy command has an interesting append feature which can be
used to build objects iteratively. I once toyed with the idea of using it to
create large zip files of many S3 objects efficiently but ended up not needing
it. Someday I would like to try that because it could be great for a lot of
web apps where users can select a random list of files to download.

Also, I (re)implemented most of the above this week using CloudFormation and
the newer AWS Serverless template (not the serverless.com project but the
actual AWS feature), which allows for really easy deployment.

