
S3: Plus sign is interpreted as space in the path part of URLs - ysh7
https://forums.aws.amazon.com/thread.jspa?threadID=55746
======
ryanbrunner
I don't necessarily think this is even breaking the HTTP standard. While '+'
should not be interpreted as a space in a URL _while it's being treated as a
URL_, the HTTP spec doesn't specify or care what file that URL may map to on
a server.

Edit: As mentioned below, this isn't correct: URLs should be able to be
escaped and still return the same resource, and an escaped + differs from an
unescaped + on S3.
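
To make that concrete: S3 decodes a literal "+" in the path as a space, so
the only way to address a key that really contains a plus is to
percent-encode it. A minimal PHP sketch of such an encoder (the helper name
is mine, and restoring "/" for nested keys is an assumption about how the
paths are built):

    <?php
    // Hypothetical helper (not from any AWS SDK): encode an S3 object key
    // so that a literal "+" survives the round trip. S3 decodes "+" in
    // the path as a space, so "a+b.txt" must be sent as "a%2Bb.txt".
    function s3KeyToPath($key) {
        // rawurlencode() percent-encodes "+" as %2B (and spaces as %20),
        // but it also encodes "/", which we restore for nested keys.
        return str_replace('%2F', '/', rawurlencode($key));
    }

    echo s3KeyToPath('reports/a+b.txt'); // reports/a%2Bb.txt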

~~~
jamix
Exactly! The OP's point is summarized in this sentence:

> My point is that the spec requires + to be escaped only inside the
> querystring.

So what? What the standard mandates for query strings is irrelevant here. It's
up to the server how to interpret and map the URLs. "Unconventional and
unfortunate"? Yes. But breaking the HTTP spec? No.

~~~
zAy0LfpBZLC8mAC
Please read the actual spec before telling people whether something is
conforming to it or not. Just making stuff up is exactly how this mess gets
created. The relevant section in this case:

[https://tools.ietf.org/html/rfc3986#section-6.2.2.2](https://tools.ietf.org/html/rfc3986#section-6.2.2.2)

------
tazjin
Amazon has a difficult time with the HTTP standard sometimes. The last time I
had to touch an AWS project, we discovered a bug[1] in the C++ code backing a
Java library (sic).

They had implemented their own HTTP client, but forgot to add the "Host"
header to requests, which is required by HTTP/1.1.
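
For reference, the smallest valid HTTP/1.1 request to the instance metadata
service would look something like this (the Host line is mandatory; RFC 7230
even obliges servers to answer 400 when it's missing):

    GET /latest/meta-data/ HTTP/1.1
    Host: 169.254.169.254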

Interestingly, this client sent requests only to their own services, which
means they either released it without testing it, or the backend once
accepted faulty requests.

[1]: [https://github.com/awslabs/amazon-kinesis-producer/issues/61](https://github.com/awslabs/amazon-kinesis-producer/issues/61)

~~~
hnlmorg
It's common for HTTP servers to accept requests without a Host header. The
server doesn't usually need it unless you're hardening it (I don't class a
missing Host header as a security issue, but some security audits will flag
it if you don't force the server to reject invalid Host headers) or running
name-based virtual hosts (which is more common than it used to be thanks to
SNI, but you still often see a 1:1 relationship between (virtual) hosts and
IPs). So Amazon could easily have tested their client on 3rd party servers
and still not spotted the problem.

As an anecdote, about 15-20ish years ago I wrote my own web browser.
Obviously it was highly rudimentary, though browsers were much easier to
implement back then anyway. I was too lazy to read the HTTP spec (it was a
hobby project and I was young and impatient), so a lot of what I did was
trial and error. I too wasn't sending a Host header, but it took a long
while before I ran into any sites that rejected my HTTP requests. The web
landscape was very different back then, though, and IPs were plentiful; it
just goes to show how servers have coded around bad clients for years.

~~~
tazjin
> So Amazon could easily have tested their client on 3rd party servers and
> still not spotted the problem

This would still be a red flag, as the service in question is their instance
metadata service, which provides authentication tokens.

Something that important should be integration-tested against the actual
service.

~~~
hnlmorg
> This would still be a red flag,

Perhaps I don't understand the issue you're discussing, but how would the
client working on 3rd party services be a red flag when that is the desired
behavior?

~~~
tazjin
Sorry if this was unclear: it's a client that they specifically wrote to talk
to their own services, and they're releasing it to their customers as an
official way to talk to those services. Yet it could not talk to their own
service.

Their own documentation refers to that library (or did at that point in time,
not sure about now).

~~~
hnlmorg
Ahh I did misunderstand you then. Sorry. Yeah that does sound bad.

------
gldalmaso
Does anyone know if this behavior persists when the bucket is served as a
website?

------
ComputerGuru
Anyone who's dealt with S3 in any capacity should be aware of this; it's
literally one of the first encoding problems to come up when signing
requests.

@dang can you please add (2010) to the title?

------
tolmasky
Funnily enough, "URLs and plus signs" is still my most upvoted question on
Stack Overflow ([https://stackoverflow.com/questions/1005676/urls-and-plus-signs](https://stackoverflow.com/questions/1005676/urls-and-plus-signs)) --
same a+b example too. 7 years later, it seems even the big names have issues
with this.

------
mike503
This burned me: I can't host a specific static site on S3 because it requires
plus signs. I can't change the files being uploaded, since the system
generates them... I tried to rig up some sort of Akamai rewrite rule to
rewrite the paths at the CDN level, but couldn't get it to work.

------
mmahemoff
(Update: the original title mentioned AWS had been breaking standards since
2010. The new title is fine. Thanks for updating it.)

A little bit of hyperbole in the title, imo. S3 has generally been very good
at embracing the fundamental principles of HTTP and REST, leaving aside
corner cases like this.

~~~
majewsky
I don't see any hyperbole. The title is technically correct (the best kind of
correct), although questionable from a grammar standpoint.

------
pawelkomarnicki
I change the "+" into the escaped code :-) It helped

------
mfer
&tldr; A legacy behavior is to treat + as a space. When you've been around you
need to keep backwards compatibility.

URLs and URIs have standards separate from HTTP, and those standards have
changed over time (older ones have been replaced by newer ones).

Many years ago it was common to encode a space as a + sign. For example, the
PHP function urlencode[1] still encodes a space as a + sign. If you're a PHP
user, don't use this function unless you know you need to. There are better
functions now.

[1]
[http://php.net/manual/en/function.urlencode.php](http://php.net/manual/en/function.urlencode.php)

~~~
jordanlev
> _If you're a PHP user, don't use this function unless you know you need to.
> There are better functions now._

Don't leave me hanging! What are the better functions now?

~~~
godDLL
`rawurlencode()` is what you're after.

And here is where you'd ask that question, a coding forum:
[https://stackoverflow.com/questions/996139/urlencode-vs-rawurlencode](https://stackoverflow.com/questions/996139/urlencode-vs-rawurlencode)
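
A quick illustration of the difference (output shown in the comments):

    <?php
    // urlencode() produces application/x-www-form-urlencoded data:
    // spaces become "+", so a literal "+" must become %2B.
    echo urlencode('a+b c');     // a%2Bb+c
    // rawurlencode() follows RFC 3986: spaces become %20 instead.
    echo rawurlencode('a+b c');  // a%2Bb%20c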

~~~
jordanlev
Thanks!

And here is where you'd answer that question, a coding forum:
[https://stackoverflow.com/questions/996139/urlencode-vs-rawurlencode](https://stackoverflow.com/questions/996139/urlencode-vs-rawurlencode)

;)

------
marindez
It's reasonable that they don't want to fix it, because it would break
existing URLs. Welcome to the ugly world of backward compatibility.

~~~
Liquid_Fire
They could make it configurable on a per-bucket basis (perhaps defaulting to
the old behaviour if necessary; ideally you would make the conformant
behaviour the default, of course).

That way you could opt in to the standards-conformant behaviour if you
require it, while they still keep backward compatibility.

~~~
majewsky
I'm not familiar with how S3 works in detail, but I imagine this could
require additional API calls in the backend, which would increase the
latency and resource usage of API requests. In the worst case, such a change
could easily require Amazon to purchase dozens, if not hundreds, of
additional servers.

~~~
acranox
With the rate of AWS growth, they probably bought dozens more servers in the
time it took you to write out your response. :)

~~~
majewsky
Likely, but Amazon didn't get where they are by ignoring small costs.

------
_pmf_
Should this really be considered a spec violation? It's a restriction, sure,
but S3 should be considered a specific application with specific
constraints.

~~~
rkeene2
Does S3 use HTTP? If so, it's a violation of the specification of S3 by way
of incorporation of the HTTP specification.

Otherwise, if S3 does not use HTTP, we would need to see the S3 specification
to determine whether it (the implementation Amazon uses) is in violation.

------
bmn__
In response to the reported RFC violation, elving@AWS writes: "I agree that's
unconventional and unfortunate." My corporate bullshit detector is off the
scale.

In earlier times, we would have had both the ability and the balls to treat
that unwillingness to uphold the rules we all set out with as damage to the
Internet, and to route around it. But sadly, AWS has become too big to fail,
so engineers introduce special cases into their products and deploy them.

~~~
mmahemoff
On the contrary, I think it's actually a refreshingly honest response. A
"corporate bullshit" response would be to ignore it altogether, try to argue
it's a feature rather than a bug, or give a canned statement about how we
respect the environment and want the world to be a better place.

AWS support is explicitly acknowledging it's an issue, while giving a
rational reason why it probably won't be fixed (even if you disagree with
the reason). The back-compat concern is unfortunate, but a good argument can
be made that changing it isn't in users' interests either (beyond just being
a cost to AWS to implement the change).

~~~
cm2187
But can they even change it without risking breaking tens of thousands of
websites?

~~~
mmahemoff
They could, by versioning the API (e.g. adding a /v2/ to all paths), but
that would benefit no one and should only be done alongside any number of
much more important changes.

~~~
bpicolo
That opens a whole new can of worms. They have not deprecated a single part
of the S3 API ever, in its history (okay, I'm partially lying: the SOAP API
is now officially deprecated, in the sense that they won't add new features
to it).

They're not going to change it just for the sake of some minor path issues
that have a workaround. (Side note: they tend to use headers rather than
paths for API declarations.) I have personally been bitten by this same
issue, but I would never recommend they change pieces of the service to
accommodate it. It's handled easily in client code. What they could do is
add an obvious "gotchas" section to the documentation.

AWS has done a tremendous job of getting things right enough the first time.
They have never killed an AWS service. They'd need a much bigger reason to
version an API.

