HTTP/1.1 just got a major update (evertpot.com)
248 points by treve on June 7, 2014 | 73 comments



Why is it not called HTTP/1.2? Or, how will clients (or servers) tell the difference between a peer implementing RFC 2616 (HTTP/1.1 old version) and RFC 723x (HTTP/1.1 new version)?

Could it be that there is so much software hardcoded to look for "HTTP/1.1" that an "HTTP/1.2" string would break it all?


Because it does not really change the protocol. It clarifies details and applies spec fixes (i.e. it aligns the spec better with actual real-world use).


There are significant changes of semantics according to TFA, for example:

> Default charset of ISO-8859-1 has been removed

> The 204, 404, 405, 414 and 501 status codes are now cachable.

> The Location header can now contain relative uri's as well as fragment identifiers


All three are things pretty much all HTTP/1.1 clients have done for over a decade. It's incompatible with RFC2616, yes — but not with implementations of "HTTP/1.1".


Python's requests library maintainers insisted on sticking with this annoying and surprising rule (default charset of ISO-8859-1), because that's what the RFC says they should do. Hopefully, they'll reconsider now.

https://github.com/kennethreitz/requests/issues/1737

https://github.com/kennethreitz/requests/issues/2086
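
In the meantime, a rough workaround sketch (this assumes the requests attributes `r.encoding` and `r.apparent_encoding`; example.com stands in for any page served without a charset parameter):

    import requests

    r = requests.get("http://example.com/")
    # Per RFC 2616, requests falls back to ISO-8859-1 when a text/*
    # Content-Type carries no charset parameter, which mangles UTF-8 pages.
    if "charset" not in r.headers.get("Content-Type", "").lower():
        r.encoding = r.apparent_encoding  # let the charset detector guess instead
    print(r.text)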


Can you send a 308 permanent redirect to existing software and get the expected behavior?


Try it out: http://webdbg.com/test/308/

I get failures in Chrome and IE. Firefox passes if I click OK in a funny dialog.


> Firefox passes if I click OK in a funny dialog.

308 preserves the HTTP verb[1], and the form on that page uses POST. POST is not idempotent[2], which means repeating it with the same parameters may not have the same effect. For example, POSTing this comment form twice would append to the resource twice, as opposed to GETting it twice, which just returns the resource unmodified both times.

Firefox correctly (under the old RFC for a 301 redirect[3]) asks for confirmation before automatically repeating a request that is not guaranteed to be safe to repeat. Some implementations will instead convert the request into a GET, which is why 308 was needed in the first place.

[1]: http://tools.ietf.org/html/rfc7238#section-3

[2]: http://tools.ietf.org/html/rfc2616#section-9.1.2

[3]: http://tools.ietf.org/html/rfc2616#section-10.3.2

Edit: links
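
Edit 2: for anyone who wants to poke at this locally, a minimal sketch of a server that answers POSTs with a 308 (Python's http.server, made-up path):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class PermanentRedirect(BaseHTTPRequestHandler):
        def do_POST(self):
            # 308 asks the client to repeat the *same* method (POST here)
            # against the new location, unlike 301/302 where many clients
            # switch the repeated request to GET.
            self.send_response(308)
            self.send_header("Location", "/new-endpoint")
            self.send_header("Content-Length", "0")
            self.end_headers()

    HTTPServer(("localhost", 8080), PermanentRedirect).serve_forever()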


No, idempotency doesn't matter because the action is not applied when a 3xx is returned. Proof: permanent 3xx replies are cacheable.

The reason confirmation is asked for is that the user might not wish to apply the action to the new URI. (Codified as "safety" in the new 1.1.)

Sources: your links and http://tools.ietf.org/html/rfc7231#section-4.2.1


Responses to POST requests are only cacheable when they include explicit freshness information[1], which is not the case with the linked test page.

[1]: http://tools.ietf.org/html/rfc7231#section-4.3.3


Worked in iOS Safari but failed in iOS Chrome.

So, this new "HTTP/1.1" feature broke my existing (old-)HTTP/1.1 Chrome browser!

If they had called it HTTP/1.2 they could send a non-308 redirect to HTTP/1.1 clients and 308 to HTTP/1.2.

Instead, we now have servers and clients both speaking "HTTP/1.1" (whatever that is this week) not able to interoperate.

Poor job.


RFC 7238, which defines the 308 status code, isn't part of HTTP/1.1 (even under the new revision, which is in RFCs 7230-7235). It's an experimental extension that expressly notes that implementers must be aware that HTTP/1.1 clients not specifically written to that extension will fall back to the behavior for status code 300 when status code 308 is encountered, and that 308 should not be used where that behavior is not acceptable. (This is the standard mechanism for extensibility in response codes within the existing high-level groupings in HTTP/1.1.)


It is working in Chrome Beta for Android (v36)


Per the applicable RFC (7238):

   Section 6 of [RFC7231] requires recipients to treat unknown 3xx
   status codes the same way as status code 300 Multiple Choices
   ([RFC7231], Section 6.4.1).  Thus, servers will not be able to rely
   on automatic redirection happening similar to status codes 301, 302,
   or 307.

   Therefore, initial use of status code 308 will be restricted to cases
   where the server has sufficient confidence in the client's
   understanding the new code or when a fallback to the semantics of
   status code 300 is not problematic.


The argument is that the two should be able to transparently interop together, and specifically that RFC 723[0-5] simply codifies the way HTTP/1.1 already works in the real world.

We'll see how well that goes.


Shouldn't minor revisions of HTTP interoperate anyway? I.e. an HTTP/1.2 server should be able to talk to an HTTP/1.1 client even if they use different minor versions in their protocol string. So it "SHOULDN'T" hurt to bump the version to 1.2, plus it would make it easier to identify up-to-date spec compliance?


Great writeup. One issue...

   It's now suggested to use the about:blank uri in the
   Referer header when no referer exists, to distinguish
   between "there was no referrer" and "I don't want to 
   send a referrer".
For the sake of privacy, would it not be better if there were no such distinction? Basically, any privacy-conscious service now needs to send 'about:blank' as the referrer when users do not want to have their behaviour categorised and fingerprinted?


If a user doesn't want to send the referrer when there is no referrer, no referrer should be sent. This then allows sites to distinguish between direct traffic from users that don't block referrers and traffic with blocked referrers. I wouldn't expect this to be a significant concern, because the volume of actual direct traffic is not very large.


> This then allows sites to distinguish between direct traffic from users that don't block referrers and traffic with blocked referrers

Any example of benefits for servers to distinguish direct traffic vs. blocked referrers?


When analyzing traffic sources for your site, you could use this to remove noise created by privacy-conscious users. For example, if you wish to evaluate the efficiency of a magazine ad, today you can't distinguish between ad conversions and privacy-conscious users.

It'll take a while for clients to be compliant, if they ever are, though.


> if you wish to evaluate the efficiency of a magazine ad

Sorry, I still don't get it. No referrer and about:blank are both "noise" in such a case; I still don't see how the distinction helps the server evaluate the efficiency of a particular ad.


"about:blank" usually means "was opened from an external program, such as an IM client".

"No referrer" means "A referrer may have existed, but inclusion of that information was explicitly declined as a part of the request".

Both are useful.


Use a custom landing-page URL in the magazine ad that isn't linked anywhere online.


The standard doesn't seem to suggest that. Quote from http://tools.ietf.org/html/rfc7231#section-5.5.2

    If the target URI was obtained from a source that does not have its
    own URI (e.g., input from the user keyboard, or an entry within the
    user's bookmarks/favorites), the user agent MUST either exclude the
    Referer field or send it with a value of "about:blank".
Am I missing anything?


Yep.

Yet another way to fingerprint.


And I just finished implementing 2616 in a toy server of mine. Sigh. C'est la vie.

I'm glad this new spec apparently resolves a lot of ambiguities. I hated reading 2616 and some specs it depended on (email, URI, etc).


These specs clarify 2616. 'Major' may have been a poor choice of words, but if you were 2616 compliant, you should be largely compliant with these specs as well.


Is your toy server open source? I would love to see the source.


Are you lacking OSS HTTP/1.1 servers to study? There are tons.


I don't think splitting it up like that is such a good idea; now, instead of searching through one file, I have to remember that there are several and look through them all, just for one conceptual protocol. (TCP has a similar issue, although most of it is still in 793.)

As for the extra verbosity, I'm not sure what to think; while some things may be specified more precisely, standards should also attempt to be concise and to-the-point. Some of the sentences in the new RFCs seem almost parenthetical (e.g. look at the description of GET.)


OTOH, that means I don't have to dive through the minutiae of the response message format when I'm just looking for the basic header stuff. All the important concerns (core, caching, conditional requests, auth and forwarding) get their own RFC and are thus easier to skim and search through. Although 308 and Range (and Prefer) also getting their own RFCs is a bit weird. Likewise, syntax and routing get RFC 7230, so if you're implementing a client or server the reading experience should be much tighter.


Copy+paste them into one text file?


Finally, the spec is crystal clear that message bodies on GET requests are not illegal.


Pardon my ignorance but how are GET payloads useful? Does that not violate REST principles?


Elasticsearch: it accepts JSON bodies in GET requests to define the parameters of a search. These can get quite large, so this is preferable to encoding everything in the query string. The operation is read-only, so an argument can be made that a GET makes more sense than a POST.

That said, I POST my search params to Elasticsearch anyways.
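
To make that concrete, a rough sketch of both styles with requests (the index name and localhost:9200 endpoint are made up for illustration):

    import json
    import requests

    query = {"query": {"match": {"title": "http"}}, "size": 50}
    body = json.dumps(query)

    # Read-only search expressed as a GET with a JSON body...
    r1 = requests.get("http://localhost:9200/posts/_search", data=body)

    # ...and the same search as a POST, which every intermediary understands.
    r2 = requests.post("http://localhost:9200/posts/_search", data=body)

    print(r1.status_code, r2.status_code)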


Aren't message bodies on GET supposed not to impact the returned result at all?



Is that really restful though? If you need that many parameters to specify which resource you mean, you lose pretty much all of the benefit of a restful architecture.


GET vs. POST is a question of whether the request is intended to change state. Using JSON for a query is nice for preserving structure (e.g. nesting Boolean expressions, not needing PHP-style array notation, etc.), but it doesn't change the idea that it's a GET request which can safely be repeated without affecting server state, may be cached for other clients, etc.


Is there something wrong with violating a REST principle?


There is if you don't want to lose advantages that following them bring. That said, I don't see how this would break any REST principles.


There's something wrong with breaking the semantics of GET. All the revision does is clarify that the syntax of HTTP is method-independent, so that message bodies aren't syntax errors; GET bodies still have no semantic meaning, and ascribing one to them likely means your server is doing things that clients, caches, etc. won't correctly handle.


I'm not sure how. As long as a GET request doesn't change your resources, it should be fine.

The reason I can think of for sending a payload in a GET is if the data you wanted to send in a query string is too large.


It's still a bad idea. It has no defined semantics, meaning that servers, clients, proxies and anything else are free to ignore or drop it.

It also defeats caching and any other reason why you would want to use GET over some other HTTP method.

I would argue that the only reason left why you would use GET for that, is because it's aesthetically pleasing.


> I would argue that the only reason left why you would use GET for that, is because it's aesthetically pleasing.

Not all systems support practically unlimited URI payloads[0]. In the past, this forced either contortions (gzipping the URI payload) or piping everything through POST.

For instance, let's say you've built a music identification system: people send in files and get information about the file's data. This is a purely read-only request; the backend just uses its information to identify the file and fetch whatever data and metadata it has on the file's content. But historically there was no choice but to route it via a semantically unsuitable POST.

[0] Apache defaults to 8190 bytes, IIS to 16384, Safari and Firefox support >100k but MSIE is limited to 2048


Another example: Searching for images similar to a specified one.


A great example, image search already being fairly common (and useful for all sorts of purposes).


> It also defeats caching and any other reasons why you would want to use GET over some other http requests.

It does not break caching.

Instead of just using the URL as the caching key, use URL+body. Same URL and same body = same response.


> Instead of just using the URL as the caching key, use URL+body. Same URL and same body = same response.

Well, except that since the body isn't defined as having semantics that determine the response, a cache based on the HTTP protocol spec has no reason to do that.


From a practical standpoint, to have GET requests with large parameters you could put the data in the message body and then a hash of that body in the URL to avoid caching issues.


Do any caches operate like that?


Not many I would think. For compatibility with existing caches, you could do something like add an extra parameter to the URL with a hash of the message body. In the case of e.g. an image or music file given elsewhere in this thread, the cache doesn't actually need the full contents of the image file (like the backend does), just something like a hash to match it to a previous request.
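
Sketching that out (the query parameter name and endpoint are made up; the point is just that the URL alone now distinguishes bodies):

    import hashlib
    import json
    import requests

    body = json.dumps({"query": {"match": {"title": "http"}}}, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(body).hexdigest()

    # The body hash rides along in the URL, so URL-keyed caches see a
    # distinct cache key for each distinct body.
    url = "https://api.example.com/search?body_sha256=" + digest
    r = requests.get(url, data=body, headers={"Content-Type": "application/json"})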


My thoughts exactly - GET with payload virtually means no caching.


If you've got to use body payloads, the requests are probably way too big and unique for the involvement of browser or intermediate caches to be a good idea.


They aren't. HTTP has a layered design, and the structure of requests is a lower-level feature than the semantics of a specific request type -- a request can always have a body, but that doesn't mean there aren't request types (e.g., GET) for which the body is meaningless.

> Does that not violate REST principles?

Most attempts to impose semantics on GET bodies would violate REST principles as well as breaking HTTP (given that GET responses are cacheable and the body is treated as not relevant to the response.)


From the horse's mouth; they are not useful: https://groups.yahoo.com/neo/groups/rest-discuss/conversatio...


Having the parameters of your GET request in the query string or the request body doesn't have any semantic difference. You're just fetching a resource given some parameters, which doesn't violate REST principles.


Is it really so different to URI encoded parameters?


Phusion Passenger throws a 500 on GET requests with a body; I guess now they might change that.


Actually we've already changed that. It's slated for 4.0.45, due next week.


Forwarded and a permanent redirect are nice additions. This is a welcome change.

It will likely be a while before widespread adoption, but to see a standard move forward in such a seemingly small but considerable way is great.

Kudos to those involved. I can't imagine it was an easy feat.


The clarifications are very welcome, but I wish it included embedded unit-less progress information for chunked encoding without having to rely on a side channel [0] (shameless plug, but any progress — ha ha — on this front would be fine).

[0]: https://github.com/lloeki/http-chunked-progress/blob/master/...


The thing I'm most surprised by is the change in the default cacheability of 404 responses from uncacheable to cacheable. Though I guess, since defaulting to cacheable doesn't mean that responses must be cached, you can still be compliant with RFC 7231 by never caching 404s.

RFC2616:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13...

"A response received with [a status code other than 200, 203, 206, 300, 301 or 410] MUST NOT be returned in a reply to a subsequent request unless there are cache-control directives or another header(s) that explicitly allow it."

RFC7231:

http://tools.ietf.org/html/rfc7231#section-6.5.4

"A 404 response is cacheable by default"


As someone new to HTTP, what would be the most pragmatic way to read through these RFCs in the context of building web applications or HTTP APIs, but not to the level of wanting to implement an HTTP server or client? For example: order of reading, what can be avoided, what is not widely used or implemented, the basics of the protocol, etc.


HTTP methods, status codes, and headers are all you need to understand for developing at the level of HTTP APIs.


The problem being that it's been spread all over the specs.

The essentials would be:

* 7231 which covers core methods, statuses and basic headers. It looks like the spec authors have also added security considerations sections

* 7232 is probably a good read as it covers conditional requests (304 and 412 statuses)

* 7234 covers caching and cache controls, don't skip it. Even if you don't want your response to be cached, you need to know how caching works, which actors are involved and how to disable it

* 7238 is the 308 redirection, understanding it and the background for its introduction is a good idea and will help with understanding other redirection statuses (301, 302 and 307)

"Various others" would be

* 7233 is Range requests, can probably be skipped unless you have big media payloads. On one hand it's underused, on the other hand it has limited general applicability

* 7235 is Authentication, which can be useful for APIs (the user experience in browsers being terrible) but can probably be skipped unless absolutely necessary

* 7239 is forwarding, to understand what happens when your HTTP endpoint is behind a proxy. Although I'd guess proxies don't implement it yet, the ideas already existed as non-standard extensions, and reading this is a good idea for "real-world" concerns. Not completely necessary, but useful

* 7240 is the Prefer header. It's a fairly recent and quite advanced addition, probably useful but not utterly necessary

You can ignore

* 7230 is about req/resp format. The only interesting parts are the URI and Host parts which your HTTP library probably handles for you

* 7236 complements 7235 with auth scheme registration for standard auth types. Only read it if you've read 7235

* 7237 is a registry of additional (wrt 7231) methods, mostly from WebDAV


And media-types and rel types if you're developing HTTP APIs


On a high level:

You can certainly skip most of "Message Syntax and Routing". That's the stuff that concerns server and client implementers who just have TCP sockets to work with.

I would absolutely read "Semantics and Content". It's a really good idea to be aware of "Conditional Requests", and you only really have to read "Caching", "Range requests" and "Authentication" if you need to know about those features.


I wonder what Googlebot will make of this. There is a lot of debate already amongst search engine developers on which type of redirect to use.


A crawler like that should typically only do GET requests. The 308 is really mainly useful for HTTP clients doing, for example, a PUT or POST request on some URL, where the server wants the client to repeat that exact request against, say, a different server.


Search engines largely handle all redirects the same, because they know nobody uses them correctly. If you aren't seeing the behavior you want, you can use Webmaster Tools or metadata to fix it.


RFC7238 is still marked as experimental.


I am still happily speaking HTTP/1.0 [1] :)

[1] http://mr.gy/software/httpd0/



