
> having two distinctly different types of strings

You must be talking about Python 2, which has been deprecated for 12 years now and loses support next year.

Because last time I checked, current, modern Python only has one type of string.




Try passing a std::string in from C/C++ code. It treats it as a byte string, which you have to prepend with 'b', and Python 3 will not do any nice casting under the hood back and forth between the two types.

Sure, I am picking out one particular use case, but it isn't uncommon to wrap C code in Python scripts to munge data going in and out of it.

You are right though. String manipulation does appear to be easier than Python 2's implementation.


That's because std::string does not carry any sort of encoding information; it is basically a wrapper around bytes (hopefully I'm not misreading this, I'm far from an expert C/C++ programmer). Because of this, Python can't make any assumptions about encoding/decoding without the possibility of getting it wrong.

"Note that this class handles bytes independently of the encoding used: If used to handle sequences of multi-byte or variable-length characters (such as UTF-8), all members of this class (such as length or size), as well as its iterators, will still operate in terms of bytes (not actual encoded characters)."

https://stackoverflow.com/questions/1010783/what-encoding-do...

http://www.cplusplus.com/reference/string/string/
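A minimal Python sketch of the same point, assuming the bytes are UTF-8 (the sample text is made up for illustration): counting bytes and counting characters give different answers as soon as one character needs more than one byte.

```python
# Illustrative text only: "é" (U+00E9) takes two bytes in UTF-8,
# the other three letters take one byte each.
text = "caf\u00e9"
raw = text.encode("utf-8")   # roughly what a std::string holding this text contains

char_count = len(text)       # counts characters (code points)
byte_count = len(raw)        # counts bytes, which is what std::string's size() reports

print(char_count, byte_count)
print(raw)
```

Nothing in `raw` itself says it is UTF-8; that knowledge lives only in the code that produced it.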


b"" is not "a byte string". It's a raw byte sequence:

    >>> type(b"foo")
    <class 'bytes'>
    >>> b'foo'[0]
    102

It can hold any bytes; it just happens that one way to construct or represent it uses a string-like syntax as a convenience for developers. But you can actually build it another way, or make it hold data in any other format:

    >>> import struct
    >>> bytes([102, 111, 111])
    b'foo'
    >>> struct.unpack('I', b'\x01\x01\x00\x02')
    (33554689,)

Also, std::string is exactly that, a raw byte sequence, with some string operations attached to it. But you don't have any encoding attached to it: https://stackoverflow.com/questions/1010783/what-encoding-do...

So it makes sense that Python is treating it as a raw bytes array (what you call "a byte string"): it has no way to know whether it is UTF8 or CP850 if you don't tell it.

But because of c/c++ experience or habits from python 2, one tends to confuse the concept of text (represented with the type "str" in python) with some specific low level implementation (the raw bytes array).

Python explicitly avoids this problem by defining that either you know what the data is (utf8 text, big endian number, etc) or you don't (raw bytes array). Manually manipulating text as a raw byte sequence would be the equivalent of directly manipulating the IEEE 754 representation of a number: it's not what you want in a high level scripting language, and that's why Python 3 doesn't do it anymore.
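To make that concrete, here is a small sketch (the byte value and the encodings tried are picked purely for illustration). The same raw byte becomes different text, or fails to decode at all, depending on which encoding you declare, and Python 3 refuses to guess when the two types are mixed.

```python
raw = b"\xe9"  # one byte, with no encoding information attached

as_latin1 = raw.decode("latin-1")  # U+00E9 under Latin-1
as_cp850 = raw.decode("cp850")     # a different character under CP850

# A lone 0xE9 byte is not valid UTF-8, so this decode fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Mixing str and bytes is an error rather than a silent conversion.
try:
    "text: " + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(repr(as_latin1), repr(as_cp850), utf8_ok, mixed_ok)
```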


> Try passing a std::string in from C/C++ code. It treats it as a byte string

Because that's exactly what it is? std::string is a bytes buffer, not actual text. There's no guarantee that the contents of std::string will be in any encoding, let alone a specific one.
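You can see the same behaviour from pure Python with the standard `ctypes` module. This is only a sketch: the buffer stands in for data handed over by C/C++ code, and the UTF-8 decode is an assumption the caller has to supply, not something Python can detect.

```python
import ctypes

# A C-style char buffer, standing in for data coming from C/C++ code.
buf = ctypes.create_string_buffer(b"caf\xc3\xa9")

raw = buf.value              # comes back as bytes, not str
text = raw.decode("utf-8")   # the caller states the encoding explicitly

# Going the other way, a str is not accepted where a char* is expected.
try:
    ctypes.c_char_p("hello")
    accepted_str = True
except TypeError:
    accepted_str = False

print(type(raw).__name__, text, accepted_str)
```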


You need to use `Py_BuildValue` with the `s` format to get that into a Python string; it decodes the C string as UTF-8.


Lots of shops still use Python 2

I was writing brand-new applications in it for my job as late as early last year

Why the defiant tone?


FUD is never easy to read, and I'm not always wise enough to answer in the proper tone.

As I do a lot of Python for a living, and still work on both Python 2 and Python 3, I have talked a lot with people writing new P2 apps in the last few years.

My experience is, either you have very niche constraints, or somebody made an unwise/uneducated engineering decision. Unfortunately, I meet way more of the second type, and of course, most of them pretend to be of the first.


> FUD is never easy to read

On what planet was my comment you were replying to FUD? I simply pointed out a few things I'm not fond of in the language (with one point in particular stemming from my own ignorance of the language, which was corrected by folks). I was quite clear in stating that python is a language with pros and cons like any other. At which point did I claim anything to be objectively wrong with the language or spread any FUD whatsoever about using it?


Ours were niche constraints (legacy hardware still in-use), as I bet much of the world is. Oh the tyranny of libc...

I would, however, like to question this wisdom of always having to run the latest and greatest. Yes, there are security considerations, but properly hardened old software and runtimes run fine. You need an actual security guru, though, and not some startup promising turn-key solutions.


Yes, although at this point, Python 3 is hardly new. 12 years in IT is a long time.


The problem is that "modern python" really includes Python 2 in addition to 3. You can't practically ignore it the way you can ignore ruby 1.x. Yes, things are very slowly moving to python 3, but there is a reason most systems have both python 2 and python 3, and even now it isn't uncommon for /usr/bin/python to be python 2.


The fact that Python 2 has been well supported because the community cares does not make it modern, nor recommended. Just like I wouldn't call php 4 modern, nor recommend it over php 5 or 7.

And the reason you don't see that for ruby or node is because their community said "move or die". And many, many projects just died. I've seen the graveyard in the corporate world.



