

Go enjoy Python3 - crncosta
http://blog.surgut.co.uk/2015/08/go-enjoy-python3.html

======
crawshaw
There are several ways to solve this in Go. The first that comes to mind,
assuming you want to truncate to the first 12 runes, not bytes:

    
    
            package main

            import (
                "fmt"
                "os"
            )

            func main() {
                v := []rune(os.Args[1]) // runes, so we cut by code point, not byte
                if len(v) > 12 {
                    v = v[:12]
                }
                fmt.Println(string(v))
            }
    

Or more in the spirit of the C example in the post:

    
    
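            func main() {
                    res := make([]rune, 12)
                    // copy reports how many runes it wrote; slice to that count
                    // so a short input doesn't print trailing NUL characters.
                    n := copy(res, []rune(os.Args[1]))
                    fmt.Println(string(res[:n]))
            }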
    

Note that res will stay on the stack, just like C.

I expect the author is trying to say something about Go that I'm not quite
getting. Perhaps that it is not an expression-based language, so to make code
readable you need to make use of multiple statements. That's by design, but I
understand it may be unappealing if you want to program in an expression-heavy
style.

~~~
jerf
"I expect the author is trying to say something about Go that I'm not quite
getting."

I assume "Go sucks because, look, this one weird case is a bit ugly." (that
is, as rhetoric, not dialectic; it is not literally claiming "one case bad" ->
"Go is bad" in the logical sense.) A weird case that I have never once
encountered in many thousands of lines of Go code. Taking a slice
out of a string blind like that is actually a bit rare; usually in some way it
turns out you actually have length information somewhere in the environment.
It's hardly like "slice index out of bounds" is some sort of terrible error...
it is, at least, arguable that Python is in the wrong here for being so
willing to return a string generated by [0:12] that is not 12 bytes/characters
in length, which seems like a reasonable assumption to make of such an
operation.

Now, if we want to talk about little examples like this, let's talk about
sending on something like a channel in Python, to say nothing of Python's
implementation of the "go" keyword... oh, yes, I see, suddenly this is an
unfair way to compare languages.

Yes, it is.

~~~
bsaul
This post shows two very common issues that programmers have with the Go
language when they start using it (that includes me), especially since Go is
advertised as a compiled language with the feel of a dynamic one:

A low-level feeling when manipulating arrays (or slices), and poor support
for generic functions (that would be math.min in this example).
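
To make the math.min point concrete, here is a minimal sketch of what an int caller ends up writing, since the standard library's math.Min only takes float64. The names minInt and truncateBytes are made up for illustration, and the sketch assumes n is not negative and that byte-based (not rune-based) truncation is what you want.

```go
package main

import "fmt"

// minInt is a hypothetical helper: math.Min only accepts float64,
// so code working with ints usually carries its own version.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// truncateBytes returns at most n bytes of s. It counts bytes, not
// runes, and assumes n >= 0.
func truncateBytes(s string, n int) string {
	return s[:minInt(len(s), n)]
}

func main() {
	fmt.Println(truncateBytes("hello", 12))
	fmt.Println(truncateBytes("hello, world!!", 12))
}
```

The clamp through minInt is what keeps the slice expression in bounds for short inputs.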

~~~
jerf
If it said that explicitly, I'd be fine with it.

But given the last paragraph, I don't think that's the most likely
interpretation.

And it's still a terrible way to judge languages without a _lot_ more context.
All languages have gotchas that fit into 3-5 lines. Python's got a pretty
decent set:
[https://www.google.com/search?q=python%20gotchas](https://www.google.com/search?q=python%20gotchas)
It's still a good language.

And let me be very clear: I'm not "defending" Go here... I quite like both
Python and Go. I've got no trouble saying Python is incrementally easier than
Go when it comes to dealing with strings (but both are beat by Perl).
(Especially since the incremental advantage comes at a _stiff_ performance
price. Sometimes that's fine, sometimes that's not.) I'm specifically saying
as a computer language polyglot, this _metric_ for measuring languages is
terrible. It's a rationalization, not a rational argument.

~~~
bsaul
I see your point, but after having coded a full (minor) project in Go, I can
assure you that those two points alone (cumbersome array data structure and
lack of generic code) made me think twice about using this language for the
common "web service for CRUD to DB" use.

Then I looked at Go's data access layer libraries, and that finished
convincing me not to use it unless performance and memory usage were
a crucial matter.

------
masklinn
> Simple enough, in essence given first argument, print it up to length 12. As
> an added this also deals with unicode correctly

That's not true, Python 3 uses codepoint-based indexing but it will break if
combining characters are involved. For instance:

    
    
        > python3 test.py देवनागरीदेवनागरी
        देवनागरीदेवन
    

because there is no precomposed version of the multi-codepoint grapheme
clusters, some of these 10 user-visible characters take more than a single
codepoint, so you end up with 8 user-visible characters rather than the
expected 10.

edit: the original version used the input string "ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇȋȏȗ" where
clusters turn out to have precomposed versions after all. Replaced it by
devanāgarī repeated once (in the devanāgarī script)

~~~
hahainternet
That's a shame, it works as you'd expect in perl6:

    
    
      sub MAIN($s) { say $s.substr(0,12) }
    
      $ perl6 test.p6 ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇȋȏȗ
      ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇ

~~~
masklinn
Turns out there are precomposed versions of these clusters, so your system
might just be using these.

Could you retry with the input "देवनागरीदेवनागरी"?

~~~
hahainternet
I'm not quite sure how to interpret the output as it doesn't render
particularly kindly in my terminal:

    
    
      sub MAIN($s) {
      	say "{$s.chars}: $s";
      	my $b =  $s.substr(0,12);
      	say "{$b.chars}: $b";
      }
    
      $ perl6 hn-test2.p6 देवनागरीदेवनागरी
      16: देवनागरीदेवनागरी
      12: देवनागरीदेवन

~~~
masklinn
So apparently perl6 is also "wrong" and operates on codepoints. Your system
composed my original string, so each (base, diacritic) pair was pasted as a
single precomposed character (I expect that if you try out the Python version
on your system you'll also get the "right" answer).

The new string is composed of 10 user-visible characters (5 characters
repeated twice) but 16 codepoints (and this time I carefully checked that
there was no precomposed version):

    
    
        DEVANAGARI LETTER DA
        DEVANAGARI VOWEL SIGN E
        DEVANAGARI LETTER VA
        DEVANAGARI LETTER NA
        DEVANAGARI VOWEL SIGN AA
        DEVANAGARI LETTER GA
        DEVANAGARI LETTER RA
        DEVANAGARI VOWEL SIGN II
        DEVANAGARI LETTER DA
        DEVANAGARI VOWEL SIGN E
        DEVANAGARI LETTER VA
        DEVANAGARI LETTER NA
        DEVANAGARI VOWEL SIGN AA
        DEVANAGARI LETTER GA
        DEVANAGARI LETTER RA
        DEVANAGARI VOWEL SIGN II
    

Operating on codepoints, both versions cut after the second DEVANAGARI LETTER
NA (न) breaking that grapheme cluster (it should be ना) and not displaying the
final two clusters ग and री.
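
The same cut can be reproduced in Go, as a rough sketch. The string is spelled out as escapes taken from the code point list above so there is no ambiguity about what was pasted; truncateRunes is a made-up name for the usual []rune approach, and it knows nothing about grapheme clusters.

```go
package main

import "fmt"

// word is one copy of the test input, written as escapes:
// DA, VOWEL SIGN E, VA, NA, VOWEL SIGN AA, GA, RA, VOWEL SIGN II.
const word = "\u0926\u0947\u0935\u0928\u093e\u0917\u0930\u0940"

// devanagari is the full 16-code-point input (the word twice).
const devanagari = word + word

// truncateRunes keeps at most n code points of s. It is unaware of
// grapheme clusters, so it can separate a letter from its vowel sign.
func truncateRunes(s string, n int) string {
	r := []rune(s)
	if len(r) > n {
		r = r[:n]
	}
	return string(r)
}

func main() {
	all := []rune(devanagari)
	kept := []rune(truncateRunes(devanagari, 12))

	fmt.Println(len(devanagari), len(all)) // bytes, code points
	fmt.Printf("last kept:     %U\n", kept[len(kept)-1])
	fmt.Printf("first dropped: %U\n", all[len(kept)])
}
```

The last rune kept is U+0928 (LETTER NA) and the first one dropped is U+093E (VOWEL SIGN AA), which is exactly the broken cluster described above.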

~~~
raiph
> I expect that if you try out the Python version on your system you'll also
> get the "right" answer.

I don't think so. In my tests standard python (2.7 and 3.5) ignores grapheme
clusters.

~~~
masklinn
Python ignores grapheme clusters; that point was about my original test case,
which used grapheme clusters I later found out had precomposed equivalents, so
a transfer chain performing NFC would leave the test case with no combining
characters (or multi-codepoint grapheme clusters) left in it.
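
A small Go sketch of why that matters for the test: the same visible character has a different code point count depending on whether it arrives precomposed or decomposed, so any rune-based truncation gives different results for the two forms. The constant and function names here are only illustrative. Nothing below normalizes anything; in Go that would need a package outside the standard library, such as golang.org/x/text/unicode/norm.

```go
package main

import "fmt"

const (
	precomposed = "\u01ce"  // LATIN SMALL LETTER A WITH CARON, one code point
	decomposed  = "a\u030c" // 'a' followed by COMBINING CARON, two code points
)

// runeLen counts code points, which is what rune-based truncation sees.
func runeLen(s string) int {
	return len([]rune(s))
}

func main() {
	fmt.Println(runeLen(precomposed), runeLen(decomposed))
	// String comparison is bytewise, so the two forms are not equal
	// even though they render as the same character.
	fmt.Println(precomposed == decomposed)
}
```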

~~~
raiph
Gotchya.

------
flohofwoe
Doesn't the C version have a serious bug? If the input string has 12 or more
characters, the destination string will not be zero-terminated.

From the strncpy docs:

"No null-character is implicitly appended at the end of destination if source
is longer than num. Thus, in this case, destination shall not be considered a
null terminated C string (reading it as such would overflow)."

~~~
ansible
I'm usually sticking +1s to the storage for any strings for this purpose. So
if I want to operate on MAXLEN number of characters, I'll allocate MAXLEN+1
for the character array.

And oftentimes I'll be memset()'ing the destination to all NULs when doing a
string copy operation. I'm not real happy with string handling in C... as if
that should be surprising to anyone.

Say, is there a nice, small string library suitable for embedded use that
anyone would care to recommend in C? I just want a nice string type that
carries around its length and storage length, handles copies properly, and has
the usual utilities. I suppose I could just write one...

~~~
rch
You might look at the one from Redis:

[https://github.com/antirez/sds](https://github.com/antirez/sds)

~~~
ansible
That's interesting. Thanks for the link.

------
Ianvdl
The author awards some arbitrary points to C even though his implementation of
the solution is broken. His similarly poor Go implementation receives zero of
these arbitrary points.

Why does this deserve the attention of everyone here? The author did not
compare languages, he compared his aptitude with these languages, and
considered broken implementations to somehow be comparable.

A more meaningful comparison would be to implement simple, efficient,
_working_ solutions to these problems and comparing them. This, as it stands,
does not lead to any useful discussion.

------
BinaryIdiot
I'm not sure what the takeaway is from this blog entry. Is it that Python 3
can do substrings easier than the other languages therefore we should use
Python 3? That was what I thought it was, anyway.

Seems silly to pick a language based off this single, silly criterion;
otherwise, why not JavaScript or probably other languages that can make the
code even smaller?

    console.log(mystring.substring(0, 12));

So it just seems arbitrary and weak in my opinion.

~~~
steeleduncan
The entire scenario seems to have been constructed to highlight the runtime
panic caused by out of bounds slices in Go. Either that or the well-known and
well-discussed lack of generics.
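
For anyone who hasn't hit it, the panic in question can be shown with a short sketch. sliceUnchecked is a hypothetical wrapper that performs the naive s[:n] and uses recover only to report whether the slice expression panicked; real code would check len(s) first instead.

```go
package main

import "fmt"

// sliceUnchecked performs a naive s[:n] and reports whether it panicked.
// The recover is only here to make the failure observable in a demo.
func sliceUnchecked(s string, n int) (out string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return s[:n], false
}

func main() {
	out, panicked := sliceUnchecked("short", 12)
	fmt.Printf("%q %v\n", out, panicked)

	out, panicked = sliceUnchecked("long enough input", 12)
	fmt.Printf("%q %v\n", out, panicked)
}
```

The first call panics with a slice-bounds error because the input has only 5 bytes; the second succeeds.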

------
_kst_
There are at least three major flaws in the 7-line C program, even ignoring
character set issues. (main returns int, argv[1] can be null, and strncpy
doesn't always null-terminate the target). If you're going to compare
languages, you should find someone who knows each of them well.

------
Daishiman
The Unicode situation in most languages is dismal.

Honestly though, the lack of generics for that Math.min function makes me
happy I'm not programming in Go.

~~~
insertnickname

        if a > b {
            // use a
        } else {
            // use b
        }

~~~
ridiculous_fish
Oh dear. You had one job, min!

------
BossHogg
Article content aside, the slide out side menu that covers the scroll bar is
incredibly annoying. Is that Blogger? Whatever it is needs to stop. Now.

------
Sir_Cmpwn
The C code there fails if the unicode string includes characters whose width
is greater than one octet.

~~~
zokier
Which is noted right in the post:

> This treats things as byte-array instead of unicode, thus for unicode test
> it will end up printing just 車賈滑豈.

~~~
rakoo
Which is useless then, because the output can't safely be considered a string
anymore. I don't really see the point of writing the C "equivalent" and giving
it any point when it doesn't even do the right thing.

~~~
masklinn
None of the snippets comes even remotely close to doing the right thing so it
doesn't really matter.

------
darkstalker
Rust version:

    
    
        fn main()
        {
            if let Some(arg) = std::env::args().nth(1)
            {
                println!("{}", arg.chars().take(12).collect::<String>()); // chars() iterates over codepoints
            }
        }

~~~
Veedrac
Idiomatic Rust would probably avoid allocations, which means something more
like

    
    
        fn main() {
            if let Some(arg) = std::env::args().nth(1) {
                println!("{}", {
                    match arg.char_indices().nth(12) {
                        Some((idx, _)) => &arg[..idx],
                        None => &*arg
                    }
                });
            }
        }
    

With the `unicode-segmentation` crate[1], you can just swap `char_indices()`
with `grapheme_indices(true)`.

[1] [https://crates.io/crates/unicode-segmentation](https://crates.io/crates/unicode-segmentation)

------
Skunkleton
How is this on the front page of hacker news? What a shit post.

------
edofic
A mandatory smart-ass Haskell response

    
    
        import System.Environment (getArgs)
        main = do
          [str] <- getArgs
          putStrLn $ take 12 str

~~~
coldtea
Well, for a smart-ass response (and I know you meant it as a joke) it's not
very impressive. It doesn't do anything more than the others, and the syntax
is not so great either.

~~~
Veedrac
On the contrary, his is the only one that crashes when more arguments than
expected are passed. Hooray progress!

------
_pmf_
Of course, the C version could be just

    
    
        printf("(%.12s)\n", argv[1]);

~~~
pjmlp
Assuming using 7 bit ASCII

~~~
_kst_
No, it merely assumes one byte per character. For example, it would work
correctly in Latin-1 or EBCDIC.

In any case, the problem statement (though it's a bit vague) requires building
a truncated string, not just printing it.
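
For comparison, here is a sketch of the same precision trick in Go. Per the fmt documentation, precision on %s limits the input in runes rather than bytes, and Sprintf builds the string instead of printing it. truncateFmt and sample are names invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// sample is a hypothetical input: 20 copies of U+00E9, which takes
// two bytes per code point in UTF-8.
var sample = strings.Repeat("\u00e9", 20)

// truncateFmt builds a string holding at most 12 code points of s,
// using the same "%.12s" precision syntax as the C printf call.
func truncateFmt(s string) string {
	return fmt.Sprintf("%.12s", s)
}

func main() {
	out := truncateFmt(sample)
	fmt.Println(len([]rune(out)), len(out)) // code points, bytes
	fmt.Println(truncateFmt("short"))
}
```

This still counts code points, so it has the same grapheme cluster problem discussed elsewhere in the thread.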

~~~
pjmlp
It is enough to have mixed 8-bit code pages and then it is worthless.

------
jackielii
why can't I downvote this!!! erhhhh

------
IshKebab
Now try distributing your Python code as a single statically linked exe.

~~~
PyComfy
[http://nuitka.net/pages/overview.html](http://nuitka.net/pages/overview.html)

------
chapium
Completely off topic, so if you are looking for discussion about the article
skip this.

The low contrast ratio and bright colors on this blog are a bit hard to read.
I normally switch to readability mode in Safari when I encounter this, but the
site's layout prevents this from working.

~~~
jofer
The text is black on white... Am I missing something?

