
Because many of these languages were created when Unicode already existed. Someone listed Java and JavaScript; both of them started from the point that Python 3 tries to reach.

When Python was created in 1989, Unicode didn't exist yet.

As for your second argument, many people bring up Go, which had the amazing idea of using UTF-8 everywhere, and it works great. They don't realize that Go is pretty much doing the same thing Python does (ignoring how the string is represented internally, since that shouldn't really be the programmer's concern).

Go clearly distinguishes between strings (the string type) and bytes (the []byte type): to use a string as bytes you have to cast it to []byte, and to convert bytes to a string you cast them to string.

That's the equivalent of doing variable.encode() to get bytes and variable.decode() to get a string.
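A minimal sketch of that round trip in Python 3 (the variable names are just for illustration):

```python
# Explicit conversion between str and bytes in Python 3,
# analogous to Go's []byte(s) and string(b) casts.
text = "héllo"

data = text.encode("utf-8")   # str -> bytes (like []byte(s) in Go)
assert isinstance(data, bytes)

back = data.decode("utf-8")   # bytes -> str (like string(b) in Go)
assert back == text
```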

What Python 3 introduced is two types, str and bytes, and it blocked any implicit casting between them. That's exactly the same thing Go does.

The only difference is an implementation detail. Go stores strings as UTF-8, so casting doesn't require any work; the distinct types exist just so the compiler can catch errors. Go also ignores environment variables and always uses UTF-8. Python has an internal[1] representation and does perform a conversion. It respects LANG and other locale variables and uses them for stdin/stdout/stderr. Initially, when those variables were undefined it assumed US-ASCII, which created some issues, but I believe that has since been fixed and UTF-8 is now the default.

[1] Python 3 actually tries to be smart and uses UCS-1 (Latin-1), UCS-2, or UCS-4 depending on what characters the string contains. If a UTF-8 conversion is requested it will also cache that representation (as a C string) so it won't redo the conversion next time.
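You can observe this flexible representation (PEP 393) on CPython with sys.getsizeof: strings of equal length take more memory as their widest character requires a wider internal encoding. (Exact byte counts vary by CPython version, so only the ordering is checked here.)

```python
import sys

# CPython picks the narrowest representation that fits the
# widest character in the string (PEP 393).
ascii_s  = "a" * 100           # 1 byte/char (Latin-1 range)
bmp_s    = "\u20ac" * 100      # euro sign: 2 bytes/char (UCS-2)
astral_s = "\U0001F600" * 100  # emoji: 4 bytes/char (UCS-4)

print(sys.getsizeof(ascii_s), sys.getsizeof(bmp_s), sys.getsizeof(astral_s))
assert sys.getsizeof(ascii_s) < sys.getsizeof(bmp_s) < sys.getsizeof(astral_s)
```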




> Because many of these languages were created when Unicode already existed. Someone listed Java and Javascript, both of them started from the point that python 3 tries to bring.

That was me in a parallel thread. Java and JavaScript internally use UTF-16 encoding. I also mentioned C, which treats strings as byte arrays, and C++, which supports C strings as well as introducing a string class that is still just byte arrays.

> As for your second argument, many people bring out Go, that had such amazing idea of using everything as UTF-8 and it works great.

Has Go ever broken backwards compatibility? Let me clarify my second argument: if you are going to break backwards compatibility, you should do so in a minimal way that eases the pain of migration. The Python maintainers decided that breaking backwards compatibility meant throwing in the kitchen sink, succumbing to second system effect, and essentially forking the language for over a decade. The migration from Ruby 1.8 to 1.9 was less painful, though in fairness I suppose the migration from Perl 5 to Perl 6 was even more painful.


Actually migrating from Perl5 to Raku may be less painful than migrating from Python2 to Python3 for some codebases.

That is because you can easily use Perl5 modules in Raku.

    use v6;

    use Scalar::Util:from<Perl5> <looks_like_number>;

    say ?looks_like_number( '5.0' ); # True
Which means that all you have to do to start migrating is make sure that the majority of your Perl codebase is in modules and not in scripts.

Then you can migrate one module at a time.

You can even subclass Perl classes using this technology.

Basically you can use the old codebase to fill in the parts of the new codebase that you haven't transferred over yet.

---

By that same token you can transition from Python to Raku in much the same way. The module that handles that for Python isn't as featureful as the one for Perl yet.

    use v6;

    {
        # load the interface module
        use Inline::Python;

        use base64:from<Python>;

        my $b64 = base64::b64encode('ABCD');

        say $b64;
        # Buf:0x<51 55 4A 44 52 41 3D 3D>

        say $b64.decode;
        # QUJDRA==
    }

    {
        # Raku wrapper around a native library
        use Base64::Native;

        my $b64 = base64-encode('ABCD');

        say $b64;
        # Buf[uint8]:0x<51 55 4A 44 52 41 3D 3D>

        say $b64.decode;
        # QUJDRA==
    }

    { 
        use MIME::Base64:from<Perl5>;

        my $b64 = encode_base64('ABCD');

        say $b64;
        # QUJDRA==
    }

    {
        use Inline::Ruby;
        use base64:from<Ruby>;

        # workaround for apparent missing feature in Inline::Ruby
        my \Base64 = EVAL 「Base64」, :lang<Ruby>;

        my $b64 = Base64.encode64('ABCD');

        say $b64;
        # «QUJDRA==
        # »:rb

        say ~$b64;
        # QUJDRA==
    }
I just used four different modules from four different languages, and for the most part it was fairly seamless. (Updates to the various `Inline` modules could make it even more seamless.)

So if I had to I could transition from any of those other languages above to Raku at my leisure.

Not like Python2 to Python3 where it has to mostly be all or nothing.


> That was me in a parallel thread. Java and JavaScript internally use UTF-16 encoding. I also mentioned C, which treats strings as byte arrays, and C++, which supports C strings as well as introducing a string class that is still just byte arrays.

C and C++ don't really have Unicode support, and most C and C++ applications don't support Unicode. There are libraries you need to use to get that kind of support.

> Has Go ever broken backwards compatibility? Let me clarify my second argument: if you are going to break backwards compatibility, you should do so in a minimal way that eases the pain of migration. The Python maintainers decided that breaking backwards compatibility meant throwing in the kitchen sink, succumbing to second system effect, and essentially forking the language for over a decade. The migration from Ruby 1.8 to 1.9 was less painful, though in fairness I suppose the migration from Perl 5 to Perl 6 was even more painful.

Go is only 10 years old; Python is 31. And in fact Go has had some breaking changes, for example in 1.4 and 1.12. Those are easy to fix since they show up during compilation. Python is a dynamic language, and unless you use something like mypy you don't have that luxury.

Going back to Python: what was broken in Python 2 is that the str type could represent both text and bytes, and the result was that most Python 2 applications were broken (yes, they worked fine with ASCII text, but broke in interesting ways whenever Unicode was used). You might say: so what, why should I care if I don't use Unicode? The problem was that mixing these two types, combined with the implicit casting Python 2 did, made it extremely hard to write correct code even when you knew what you were doing. With Python 3 it takes no effort.
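To make that concrete: in Python 3, mixing the two types fails immediately and loudly, instead of silently coercing via ASCII and blowing up only when non-ASCII data shows up at runtime.

```python
# Python 3 refuses to implicitly mix str and bytes.
# Python 2 would have silently coerced via ASCII here,
# deferring the failure until non-ASCII data appeared.
try:
    "price: " + b"42"
except TypeError as e:
    print("TypeError:", e)
```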

There is a good write-up by one of the Python developers on why Python 3 was necessary[1].

[1] https://snarky.ca/why-python-3-exists/


> Going back to python, what was broken in Python 2 is that str type could represent both text and bytes...

You know, it’s astounding to me that you managed to quote my entire point and still didn’t even bother to acknowledge it, let alone respond to it.

If they had to break backwards compatibility to fix string encoding, that’s fine and I get it. That doesn’t explain or justify breaking backwards compatibility in a dozen additional ways that have nothing to do with string encoding.

Are you going to address that point or just go on another irrelevant tangent?


There is no migration from Perl 5 to Perl 6, but mainly because Perl 6 has been renamed to Raku (https://raku.org using the #rakulang tag on social media).

That being said, you can integrate Perl code in Raku (using the Inline::Perl5 module), and vice versa.


Yes, that was the joke :)



