

Building an invalid string in Python 2.x - bmease
https://gist.github.com/rspeer/7559750

======
twoodfin
I'm not a Python geek, but I found the C implementation for unicode strings in
CPython really interesting code reading:

[http://hg.python.org/cpython/file/tip/Objects/unicodeobject....](http://hg.python.org/cpython/file/tip/Objects/unicodeobject.c)

CPython supports several internal representations from one to four bytes per
character to optimize for space and performance. There's also a nifty sort of
Bloom filter for quick discrimination of strings that might contain characters
of interest.

~~~
gsnedders
That's new in Python 3.3, which doesn't fall foul of that bug. This is PEP 393
([http://www.python.org/dev/peps/pep-0393/](http://www.python.org/dev/peps/pep-0393/))
if you want more reading about it.
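
For anyone who wants to see the variable-width storage from the Python side, here is a rough sketch. It assumes CPython 3.3 or later, where PEP 393 picks 1, 2, or 4 bytes per character based on the widest character in the string. The helper name `bytes_per_char` is made up for illustration, and `sys.getsizeof` is an implementation detail, so other interpreters may report different numbers.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate per-character storage by comparing two strings of `ch`.

    Subtracting the size of a short string cancels out the fixed object
    header, leaving only the cost of the extra `n` characters.
    """
    short = ch * 2          # two characters, so we never hit a cached 1-char string
    long = ch * (n + 2)     # same character, n more copies
    return (sys.getsizeof(long) - sys.getsizeof(short)) // n


samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),        # e with acute accent
    ("BMP", "\u20ac"),          # euro sign
    ("astral", "\U0001F600"),   # an emoji outside the BMP
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On a PEP 393 build the ASCII and Latin-1 samples should cost one byte per character, the BMP sample two, and the astral sample four.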

------
excitom
When working as an AIX kernel programmer in 1985, I set registers to a unique
value so it would be easy to spot code that tried to use an uninitialized value.
My choice: 0xdeadbeef. Good to see that constant is still in use.

~~~
estebank
Whenever I find myself having to change a MAC address, I end up using
DEADBEEFCAFE.

I hope I never forget about changing them back and end up having to debug two
different machines with the same MAC (which has actually happened to me in the
wild, with two machines coming out of factory with the same MAC, talk about
bad luck and shitty quality control).

~~~
nknighthb
I've seen duplicate MACs twice in the last few years, on two different lines
of embedded/consumer electronics boards from two different factories. There
was a kind of Abbott & Costello routine that went on the first time, when a
Taiwanese colleague with limited English reported the problem to me.

------
Beltiras
I'm not seeing any meaningful exploits coming from this. You can maybe send a
request that will fail but I can't see any sort of injection taking place.

~~~
dlitz
It managed to insert invalid Unicode into a SQLite database, causing a
subsequent SELECT to fail. That's at least a DoS attack.

~~~
icebraining
Yes, but only if you're decoding user input as UTF-7, which would be insane.

~~~
doki_pen
What if you were scraping a webpage and it reported its encoding as UTF-7?

~~~
mattdeboard
...which is, in fact, exactly how this bug was exposed.
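
One way a scraper could avoid this class of problem is to treat the declared charset as a hint and only honor it if it is on an allowlist. The sketch below is an assumption about how that might look, not what the affected code did. `ALLOWED_CODECS` and `safe_decode` are hypothetical names, and the allowlist contents are only an example.

```python
import codecs

# Hypothetical allowlist, written using Python's normalized codec names.
ALLOWED_CODECS = {"utf-8", "iso8859-1", "cp1252", "ascii"}


def safe_decode(data: bytes, declared: str, fallback: str = "utf-8") -> str:
    """Decode `data` with the declared charset only if it is allowlisted.

    Unknown or disallowed charsets (UTF-7 among them) fall back to
    `fallback`, and undecodable bytes are replaced rather than raising.
    """
    try:
        # codecs.lookup normalizes aliases, e.g. "latin-1" -> "iso8859-1".
        name = codecs.lookup(declared).name
    except LookupError:
        name = fallback
    if name not in ALLOWED_CODECS:
        name = fallback
    return data.decode(name, errors="replace")


# A page claiming UTF-7 is decoded as UTF-8 instead, so the bytes
# "+ACI-" stay literal rather than turning into a double quote.
print(safe_decode(b"+ACI-", "utf-7"))
print(safe_decode("caf\xe9".encode("latin-1"), "latin-1"))
```

The trade-off is that a page genuinely encoded in an unlisted charset will come out garbled, which for a scraper is usually preferable to feeding attacker-chosen bytes through a rarely exercised decoder.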

------
drunkpotato
That's really cool! Character encoding issues are something we wrestle with all
the time, and it is surprisingly hard to reason about all the ways supposedly
"string" data are handled in the course of a typical workflow. I cringe; I
hadn't even considered bugs in the encoding and decoding process itself.

------
mzs
Here's the bug (UTF-7 decoder) so you don't have to log in to GitHub:
[http://bugs.python.org/issue19279](http://bugs.python.org/issue19279)

------
brokentone
This reminds me of Gödel's incompleteness theorem - which I'll poorly present
as: Any system that is sufficiently complex and complete will contain legal
assertions that will disprove or destroy the system. (Those that do not are
not complete).

[http://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_the...](http://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems)
[http://www.amazon.com/G%C3%B6del-Escher-Bach-Eternal-Golden/...](http://www.amazon.com/G%C3%B6del-Escher-Bach-Eternal-Golden/dp/0465026567)

~~~
jerf
Neither throwing an exception nor having a perfectly-deterministic buggy
behavior is what Gödel was referring to. This shouldn't remind you of anything
related to the incompleteness theorem, because it's completely unrelated.

~~~
anaphor
Not completely unrelated:
[https://en.wikipedia.org/wiki/G%C3%B6del_numbering](https://en.wikipedia.org/wiki/G%C3%B6del_numbering)
but what he/she was talking about is unrelated.

