
Data munging in Perl 6 vs. Perl 5 - lelf
http://perl6advent.wordpress.com/2014/12/09/day-9-data-munging-in-perl-6-vs-perl-5/
======
petercooper
Perl was my go-to language for 8 years before Ruby so I fancied a comparison.
And.. Ruby is rarely as brief, but even with my Perl hat on, it seems much
easier to read, especially if you didn't write it:

Perl 6:

    
    
        say "  {.key}: {+.value} student{"s" if .value != 1}"
        for %grade.classify(*.value.comb[0]).sort(*.key);
    

Ruby:

    
    
        grades.group_by { |name, grade| grade[0] }
              .sort
              .each do |grade, names| 
          puts "  #{grade}: #{names.length} student#{'s' if names.length != 1}"
        end
    

Or if it had to be as 'functional' as the Perl..

    
    
        puts grades.group_by { |name, grade| grade[0] }
                   .map { |grade, names| "  #{grade}: #{names.length} student#{'s' if names.length != 1}" }
                   .sort.join("\n")
    

Both can be improved, of course! :-) I'd be keen to see a Python version
actually because I know so few of its idioms but it often does well with these
sorts of things as well..

~~~
WayneS
Agreed, the new perl looks like Ruby. And I am willing to bet Ruby is more
self consistent and easier to learn. (Says the perl 5 expert)

~~~
kbenson
In what ways do you not find Perl self consistent? There's lots of criticisms
that can be leveled at Perl, but I'm not sure I follow that one.

~~~
elektronjunge
A lot of the standard lib constructs behave completely differently in
different contexts. Often subtly. For instance,

    
    
        @globvals = glob("*"); # returns all matches to @globvals
    
        $glob1 = glob("*"); # returns the first match and stores it in $glob1
    
        $glob2 = glob("*.c"); # returns the second match of "*" and stores it in $glob2
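
A rough Python model of the behaviour claimed in the comments above, for readers who don't know Perl. It is only a sketch of "scalar calls share one hidden iterator", not Perl's actual implementation; the class `ScalarGlob` and its method names are made up for illustration, and the file names are invented.

```python
import fnmatch


class ScalarGlob:
    """Toy model: every 'scalar' call shares one hidden iterator."""

    def __init__(self, names):
        self._names = sorted(names)
        self._pending = None  # hidden state shared between calls

    def matches(self, pattern):
        # "List context": return every match at once.
        return [n for n in self._names if fnmatch.fnmatchcase(n, pattern)]

    def next_match(self, pattern):
        # "Scalar context": the pattern is only looked at when no
        # iteration is in progress; otherwise it is silently ignored.
        if self._pending is None:
            self._pending = iter(self.matches(pattern))
        try:
            return next(self._pending)
        except StopIteration:
            self._pending = None
            return None


files = ScalarGlob(["a.c", "b.txt", "c.c"])
first = files.next_match("*")     # first match of "*"
second = files.next_match("*.c")  # second match of "*", pattern ignored
print(first, second)
```

The surprise is in the second call: the caller asked for `*.c` but gets the next item of the earlier `*` iteration.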

~~~
zzzcpan
Consistency is about expectations. It is possible to check if caller wants an
array or a scalar and return either or a reference to either. And so nobody
expects a single way of returning things across all the functions. Therefore
such behavior is consistent.

~~~
elektronjunge
Having implicit shared state dependent on the requested type is certainly not
consistent. A reasonably experienced Perl programmer would expect the
following:

    
    
        @globs = glob("*");   # all matches
        $glob1 = glob("*");   # the first match of "*"
        $glob2 = glob("*.c"); # the first match of "*.c"

~~~
kbenson
I agree that's the behavior you would want for that example. That is almost
the behavior you get when you use File::Glob's bsd_glob function. For some
reason it appears to return the last alphanumerically sorted match
instead of the first. I've submitted a bug about that as well as the general
lack of documentation on the true behavior of the CORE::glob() function, as
outlined in my other comment.

~~~
zzzcpan
I disagree. You would expect what it says in documentation, and it says it
should iterate over the list.

If you want the first element of the list, you should say exactly that to the
compiler, i.e. force list context and take the first element:

    
    
        ($file1) = glob("*");

~~~
kbenson
> I disagree

I assume because you misread the example given there. If you note that $glob1
and $glob2 come from what should be different lists (all files in working dir
and all files ending in .c in the working dir), then $glob1 and $glob2 should
contain the first item each of their respective lists. That's exactly what the
documentation says should happen.

> You would expect what it says in documentation, and it says it should
> iterate over the list.

Unfortunately it doesn't even do what it says in the documentation
consistently. I have another comment here that outlines that fairly
thoroughly. The behavior is very weird and specific to glob, and is not
documented accurately.

~~~
zzzcpan
I guess I misread the example, sorry. I thought we were talking in the context
of inconsistent behavior of Perl as a language and therefore its syntax, but
not in the context of an undefined behavior of glob().

~~~
kbenson
We are and we aren't. elektronjunge chose an example that definitely is
inconsistent, but I'm not sure that it's really indicative of Perl in general.
IMHO, it's a fairly specific kind of broken, where we can't really fix the
behavior because of backwards compatibility, but the documentation is just
plain inadequate in this case as well.

------
cletus
I don't mean to be snarky but... Ok, sorry I'm going to be snarky.

In all honesty, does anyone even care about Perl 6 at this point? It's become
the Duke Nukem Forever of programming languages as far as I can tell.

Back in the 90s I remember the first CGI scripts I wrote were in Perl but
really... the world has moved on.

~~~
mercurial
Minus the snarkiness, that's not a bad question. The linked article doesn't
showcase most of the interesting stuff in Perl 6 like optional typing,
junctions or macros, but there are some cool things about it.

That said, I think Python has eaten Perl's lunch, and who knows how things
will evolve with the proliferation of statically-typed, non-verbose, fast
functional languages? Personally, I'm not going back to dynamic languages as
my primary workhorse.

~~~
hyp0
Haskell and ocaml fit the bill, but aren't new... which ones do you mean?

~~~
mercurial
The difference is that Haskell has gotten way more popular in the last few
years, but you can take for example Scala or F#.

~~~
hyp0
OK, I thought you meant there were many more such languages - but
"proliferated" means spread, so my mistake, yours is valid use.

Yes, Haskell is better known, but not proliferated in real-world usage (though
_jq_ was based on it). F# and Scala aren't seeing much adoption.

I do think type inference will be borrowed from them (and probably many other
features), but not _pure_ fp (it makes some tasks unnecessarily difficult).
The growth of dynamically typed languages suggests to me that developers value
ease of development over almost everything.

------
mercurial
It's definitely leaning toward the functional side, but some things look very
odd (like calling "filename".IO...). The new operators leave me dubious (eg,
'*' to create a one-argument lambda function). And '»'? At least the
equivalent '>>' is apparently available, but damn...

~~~
delluminatus
The * operator made me laugh out loud. Only Perl would have an operator called
a "Whatever star".

Although, I actually think a lot of the stuff Perl does is really cool. I
wouldn't want to have to maintain a codebase of it, but it never fails to be
interesting. Like the whatever star. "*.value" basically saying, "Whatever
value is here, gimme the value of it". It's different from a typical lambda
declaration, but intuitive in its own way.
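
For comparison, a small Python sketch of the same idea: `*.value` builds a one-argument function that returns the `value` of whatever it is given. The `Pair` type and the sample data are hypothetical stand-ins, not taken from the article.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for a Perl 6 key/value pair.
Pair = namedtuple("Pair", ["key", "value"])
pairs = [Pair("Peter", "B-"), Pair("Celine", "A")]

get_value = attrgetter("value")        # roughly what `*.value` builds
as_lambda = lambda pair: pair.value    # the "typical lambda" spelling

print([get_value(p) for p in pairs])
print([as_lambda(p) for p in pairs])
```

Both spellings produce the same function; the Whatever star just skips naming the parameter.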

~~~
munificent
> It's different from a typical lambda declaration, but intuitive in its own
> way.

Isn't it pretty similar to Scala's use of "_"? I think _.value creates a one-
parameter lambda in Scala.

~~~
tormeh
Yeah, but it would be neat if it was a word instead, for example "that". "*"
is hilarious, though. How do you multiply in Perl if the star is taken? I also
like that it's named "whatever star", reminds me of Scala's Pimp my Library
feature[0]

0:
[http://www.artima.com/weblogs/viewpost.jsp?thread=179766](http://www.artima.com/weblogs/viewpost.jsp?thread=179766)

~~~
cygx
An asterisk is only a Whatever if it's in term position:

    
    
        (* * *)(2, 3)
    

will happily print 6.
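
Read as "whatever times whatever", that expression is a two-argument function that multiplies its arguments. A rough Python equivalent, as a sketch only:

```python
from operator import mul

multiply = lambda a, b: a * b   # roughly what `* * *` builds
print(multiply(2, 3))           # 6
print(mul(2, 3))                # same thing via the standard library
```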

~~~
SwellJoe
I didn't find the whatever star funny until this point.

~~~
pwr22
Whatever ;)

------
Mithaldu
To be clear here, that Perl 5 code is not production grade code and i'd send
anyone trying to commit that to a repo back to the hash mines.

~~~
bglazer
Could you elaborate a little? I ask because the Perl 5 code looks a lot like
the Perl that I write. I'd like to improve, but the existing Perl code that
I'm writing on top of is really bad. Like really, really bad. I don't have
much quality Perl to compare against.

~~~
Mithaldu
Two examples:

    
    
        $freq{substr $grade{$_}, 0, 1}++ for keys %grade;
    

better:

    
    
        $freq{$_}++ for map { substr $_, 0, 1 } values %grade;
    

best:

    
    
        my @letters = map { substr $_, 0, 1 } values %grade;
        $freq{$_}++ for @letters;
    

\----------------------------

and

    
    
        open my $fh, '<:utf8', "grades.txt"
            or die "Failed to open file: $!";
        
        my %grade;
        while (<$fh>) {
            m/^(\w+) \s+ ([A-F][+-]?)$/x
            or die "Can't parse line '$_'";
            $grade{$1} = $2;
        };
    

better:

    
    
        use 5.010;
        use IO::All -binary, -utf8;
        
        my $line_format = qr/
            ^                       # line start
            (?<student>\w+)         # name
            \s+                     # gap
            (?<grade>[A-F][+-]?)    # grade
            $                       # line end
        /x;
        
        my $fh = io->file("grades.txt");
        my %grade;
        while (my $line = $io->getline) {
            die "Can't parse line '$line'" if $line !~ $line_format;
            $grade{$+{student}} = $+{grade};
        }

~~~
pjfl
You should have used `use strict; use warnings;`. Then you would have noticed
the `my $fh` vs. `$io->getline` error.

~~~
Mithaldu
Dammit, you're right. I was working in a buffer without perl instrumentation
enabled because i was composing a comment, not a script. Now i can't edit the
comment anymore. :/

------
themckman
Gosh...I want my next job to be in Perl...it just looks like such a different
language. It makes me smile every time I read it and read about it.

------
jrochkind1
Question which is meant to be an honest question and not a troll, I really
want to know and accept that some might have reasonable answers:

Why would you use Perl6 instead of ruby?

~~~
debacle
Speed. Always speed. Perl is still faster than every other scripting language
out there.

~~~
Ultimatt
Not true for Perl6 at the moment. Perl6 is an entirely different language with
its own implementation. At the moment it will be either on par or slower than
Ruby. For some select numerical operations Perl6 is faster than Perl5. But for
the rest it's between 10x and 100x slower than Perl5. This is what is being
worked on now up to the official release of 6.0.0 though.

------
Skunkleton
I feel like python provides a more readable solution.

    
    
        #!/usr/bin/env python3
        from itertools import groupby
    
        with open('data.txt', 'r') as f:
            data = dict(x.split() for x in f if x.strip())
            print("Zsófia's grade:", data["Zsófia"])
            print("List of students with a failing grade:\n  ",end="")
            print(*[x for x in data.keys() if data[x] == 'F'], sep=", ")
            print("Distribution of grades by letter:")
            for k,g in groupby(sorted(data.values()), lambda x: x[0]):
                print("  {}:".format(k), sum(1 for _ in g))

~~~
viraptor
A working version with some stuff improved - groupby implemented, len()
instead of sum(), a less squished version of printing, "student(s)" added:

    
    
        #!/usr/bin/env python3
    
        from collections import defaultdict
    
        def invert_dict(data):
            res = defaultdict(list)
            for k, v in data.items():
                res[v[0]].append(k)
            return res
    
        with open('data.txt', 'r') as f:
            data = dict(x.split() for x in f if x.strip())
    
        print("Zsófia's grade:", data["Zsófia"])
        print("List of students with a failing grade:")
        print("  " ,end="")
        print(*[x for x in data.keys() if data[x] == 'F'], sep=", ")
        print("Distribution of grades by letter:")
        grouped = invert_dict(data)
        for k, g in sorted(grouped.items()):
            print("  {}: {} student{}".format(k, len(g), 's' if len(g) > 1 else ''))

~~~
Veedrac
No need for `invert_dict`; you're just counting so `collections.Counter` would
work fine:

    
    
        from collections import Counter
    
        with open('data.txt', 'r') as datafile:
            students = dict(line.rsplit(maxsplit=1) for line in datafile)
    
        failing = [name for name, grade in students.items() if grade >= 'E']
        grade_counts = Counter(grade[0] for grade in students.values())
    
        print("Zsófia's grade:", students["Zsófia"])
    
        print("List of students with a failing grade:")
        print("  " + ", ".join(failing))
    
        print("Distribution of grades by letter:")
        for grade, count in sorted(grade_counts.items()):
            print("  {}: {} student{}".format(grade, count, 's' * (count != 1)))

------
sigzero
I don't think the P6 is more readable myself.

------
discardorama
> "and is joined by the new < > variant for literal strings."

... why? What was wrong with {} or [], as most other languages use? Sigh.

~~~
masak
The postcircumfix {} is still there, and used. It just unambiguously takes an
expression, and doesn't try to intuit string quotes in some cases. So you use
{} for indexing with a general expression, and <> for string indexing.

~~~
discardorama
I dunno, this sounds like syntactic bloat to me.

~~~
elektronjunge
Welcome to perl

