
Let's talk about WSGI - arthurk
http://www.b-list.org/weblog/2009/aug/10/wsgi/
======
ajross
FTA:

 _The WSGI spec impresses upon its readers (or upon this reader, at least) the
overwhelming desire for everybody to just quiet down and use ISO-8859-1
instead of whatever character set is actually convenient._

Is this really true? What a disaster. If you're going to pick/encourage a
single encoding, how can that choice _not_ be UTF-8?

~~~
andyn
Looking at PEP333, the only reference to ISO-8859-1 is:

 _Note also that strings passed to start_response() as a status or as response
headers must follow RFC 2616 with respect to encoding. That is, they must
either be ISO-8859-1 characters, or use RFC 2047 MIME encoding._

start_response() is the callable that sets the HTTP status and headers, so I
suspect that's a requirement of the HTTP spec for headers (correct me if I'm
wrong).

I don't believe WSGI cares about the character set you use for the HTTP
message body.
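
To make that split concrete, here is a minimal sketch. It assumes Python 3 and the later PEP 3333 conventions (status and headers as native strings limited to ISO-8859-1, body as bytes) rather than PEP 333 as quoted above. The names `app` and `fake_start_response` are made up for illustration, and `fake_start_response` only stands in for what a real server would pass in.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Body: any encoding you like, as long as it is declared and sent as bytes.
    body = "na\u00efve caf\u00e9 \u2603".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # Status and headers: these are the strings that must fit in ISO-8859-1.
    start_response("200 OK", headers)
    return [body]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    # Hypothetical stand-in for the server side; it just records the arguments.
    captured["status"] = status
    captured["headers"] = headers


environ = {}
setup_testing_defaults(environ)  # fills in a plausible test environ
result = b"".join(app(environ, fake_start_response))
print(captured["status"], len(result), "bytes")
```

The body here contains a snowman (U+2603), which ISO-8859-1 cannot represent, yet nothing in the header restriction gets in its way.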

~~~
ubernostrum
Well, that and all the places where it goes on about how all "strings" it
mentions must be bytestrings (even on platforms which don't have bytestrings),
must only contain code points no higher than 0xff, must be str, precisely str and
neither any other type of string nor any subtype... and all of it to try to
pretend that Unicode doesn't exist.

Yeah, HTTP is a byte-based protocol, and yeah, headers have to be ISO-8859-1
or MIME-encoded. But that doesn't mean that the particular bytes HTTP uses
have to leak up into what are supposed to be high-level applications. There's
no earthly reason why -- with _every_ Python implementation moving to native
Unicode strings -- WSGI should still have this attitude.

~~~
ajross
OK, I'll byte: what Python platforms lack bytestrings? Is there a PDP-10 port
I missed somewhere? :)

~~~
ubernostrum
CPython 2.x and Python implementations built directly on it have bytestrings.
PyPy does, as far as I know. And Unladen Swallow is a CPython fork, so it
does.

But everything else either has strings which are natively Unicode (and hence
not byte-based in the sense we want here) or runs on platforms where the
underlying string abstraction is natively Unicode. This includes Jython,
IronPython, Python 3.x, etc.
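
As a rough illustration of what "natively Unicode" means for the restriction discussed here, the sketch below assumes Python 3, where `str` is Unicode and `bytes` is a separate type. The helper name `as_wire_bytes` is hypothetical. It relies on the fact that ISO-8859-1 maps code points 0x00 through 0xFF one-to-one onto single bytes, so a Unicode string kept within that range can stand in for a bytestring.

```python
def as_wire_bytes(native):
    # ISO-8859-1 maps code points 0x00-0xFF one-to-one onto single bytes
    # and raises UnicodeEncodeError for anything above that range.
    return native.encode("iso-8859-1")


header = "caf\u00e9"          # U+00E9 is within the ISO-8859-1 range
wire = as_wire_bytes(header)
print(wire)                                   # b'caf\xe9'
print(wire.decode("iso-8859-1") == header)    # True: lossless round trip

try:
    as_wire_bytes("\u2603")   # snowman, well above 0xFF
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")
```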

WSGI would like very much for those platforms to do something else, because it
has to expend a bit of verbiage to basically say "shame on you for having
these dangerous Unicode strings, don't you dare try to take advantage of
them!"

