Hacker News new | past | comments | ask | show | jobs | submit login
List comprehensions across languages (2007) (langexplr.blogspot.com)
31 points by tosh 59 days ago | hide | past | web | favorite | 35 comments



For whatever it's worth, here are Prolog implementations of these examples (the second one is specific to SWI-Prolog, there is no standardized file system library, I believe). Programming in pure Prolog is a bit like programming only in terms of the constraints inside a list comprehension.

These implementations don't actually produce lists but enumerate answers one by one, using backtracking. If you don't need all the answers, you just stop the enumeration whenever you like, and later elements are not computed. This is similar to lazy evaluation, and also works if there is an infinite sequence of answers. If you do want a list of all (finitely many) solutions, you would use one of several standard predicates (findall/3, setof/3, bagof/3) on the backtracking version. But this is needed less often than beginners think.

    list_evenmember(List, EvenMember) :-
        member(EvenMember, List),
        EvenMember mod 2 =:= 0.

    directory_minsize_file(Directory, MinSize, File) :-
        directory_files(Directory, Files),
        member(File, Files),
        size_file(File, Size),
        Size > MinSize.

    abcd(A, B, C, D) :-
        member(A, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
        member(B, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
        member(C, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
        member(D, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
        A =\= 0,
        (1000 * A + 100 * B + 10 * C + D) * 4 =:= 1000 * D + 100 * C + 10 * B + A.


I have never understood the benefits of making syntactic sugar for lists in languages having higher order functions. Python kind-of gets away with it, but I still don't get the benefits compared to using map and filter.


Because for more advanced cases requires flatMap instead of just map and filter. You can only nested as a callback hell instead of streamlined syntactic sugar.

Take an example in 'Learn You a Haskell':

  [ (a,b,c) | c <- [1..10], b <- [1..c], a <- [1..b], a^2 + b^2 == c^2]
b depends on c while a depends on b. You do need a Monadic operation instead of Functor/Applicative to do it because of the dependency in the contexts.

And then you can think of solving the eight queen problem[0]. It's far more readable if you have a syntactic sugar, I just found a Scala example: [1].

[0] https://en.wikipedia.org/wiki/Eight_queens_puzzle [1] https://stackoverflow.com/questions/42789854/n-queens-in-sca...


I think a huge benefit is the readability of more complex expressions (e.g. when you start introducing zip, groupBy, folds etc into the mix).


If you want to show some code examples of list comprehensions, I'd be glad to show the equivalent without list-comprehensions in JS (which had a rejected proposal for list-comphensions https://www.reddit.com/r/javascript/comments/5eottt/why_the_..., no regret though), so we can compare the readability


I'll show an example in C#, which has some of the best list comprehension syntax in my opinion (LINQ):

  from category in categories
  where category.Name.Contains("food")
  join product in products on product.CategoryID equals category.ID
  select new {
      Name = category.Name,
      MostExpensive = 
        (from categoryProduct in productGroup
         orderby categoryProduct.Price descending
         select categoryProduct)
        .First()
  }
In C#, with HOFs, this becomes:

  categories
   .Where(
      category => category.Name.Contains("food"))
   .GroupJoin(products, 
      category => category.ID, 
      product => product.CategoryID, 
      (category, productGroup) => new {
         Name = category.Name,
         MostExpensive = 
           productGroup.OrderByDescending(
              product => product.Price)
           .First()
         }
      )
The second higher-oder function form has 2 disadvantages that I see: (1) for every subclause, it must mention the lambda parameter again (with more filtering/ordering/mapping this becomes much more noticeable); (2) it is easier to mess up the order of parameters, especially for something like Join/GroupJoin, which takes 5 parameters.


It's probably this in JS:

  categories
     .filter(
        category => category.Name.includes('food'))
     .flatMap(category => {
       const productGroup = products.filter(product => product.CategoryID === category.ID);

       return productGroup.length === 0 ? [] : [{
         Name: category.Name,
         MostExpensive: products.reduce((a, b) => a.Price < b.Price ? b : a)
       }];
     })


How about finding triples a<b<c<30 of pairwise relatively prime integers such that the sum of their cubes is a perfect cube? Here's a list comprehension in Python (notice the weird syntax Python uses for "let x = y", namely "for x in [y]" :P):

    [(a,b,c) for a in range(1,30)
             for b in range(a+1,30)
             if gcd(a,b)==1
             for c in range(b+1,30)
             if gcd(a,c)==1 and gcd(b,c)==1
             for d in [int((a**3+b**3+c**3)**(1.0/3)+0.5)]
             if a**3+b**3+c**3==d**3]
I doubt that would look any nicer with higher order functions. Specially if you want to get the same efficiency: notice how here we test that gcd(a,b)==1 as soon as possible. In general efficient comprehensions involve pushing any filtering as early as possible, so with higher order functions you get an alternation of filter with map/flatmap.

Here's a (probably lame) attempt at a Python higher order version:

    flatmap = lambda f, l: [x for t in l for x in f(t)]
    
    flatmap(lambda a: flatmap(lambda b: flatmap(lambda c: (lambda d: [(a,b,c)] if a**3+b**3+c**3==d**3 else [])(int((a**3+b**3+c**3)**(1.0/3)+0.5)) if gcd(a,c)==1 and gcd(b,c)==1 else [], range(b+1, 30)) if gcd(a,b)==1 else [], range(a+1,30)), range(1,30))
    
Maybe it would look nicer with some filters thrown in instead of abusing flatmap of empty lists, but it would be longer since you definitely need a flatmap for a and b, and either a flatmap or map for c.

I'd love to see a cleaner version with higher order function.


The same sensibilities apply in Haskell. These sort of longer, more structured problems are exactly when do notation comes out

    do a <- [1 .. 29]
       b <- [a+1 .. 29]
       guard $ gcd a b == 1
       c <- [b+1 .. 29]
       guard $ gcd a c == 1
       guard $ gcd b c == 1
       let z = a ** 3 + b ** 3 + c ** 3
       let d = z ** (1/3) + 0.5
       guard $ z == d ** 3
       return (a, b, c)
Translation of this into HOFs is straightforward (the method guard uses helps tremendously), but it's not worth it most of the time. In Python where HOFs are so restricted it would be even worse.


I agree.

Also as minor nitpick, I doubt your Haskell typechecks (haven't tried it): you forgot to take the floor of d.


Probably, I didn't check it.

I think the actual answer is messier: fromIntegral on a, b, and c.


How about finding triples a<b<c<30 of pairwise relatively prime integers such that the sum of their cubes is a perfect cube?

I'd have to ask why you'd want to do that, it seems more like the problem was fit to the solution you wanted to write rather than to fulfill some piratical purpose. I'm not going to contest that there's a lot of business logic that gets more complicated than that, but they are not described so mathematically elegantly. There's no way you could tell what your code does, or why, without a comment at the top describing why. And you can do that with your example. But real business logic comes from the fact that Bob in accounting entered the wrong job codes for two months, but they can't be changed because they've already been archived, and Karen in sales told one of her customers the job code would default to the wrong value if they entered 42 as the compliance code, then the CEO said we should make that the default from now on, but we realized 2 years later that it messed some other stuff up and our manager had it changed without telling anyone, but Frank the DBA already added a flag for that, but it stopped getting updated when he left 6 months ago.

In those real world situations, I doubt anyone would question that very long winded individual statements with a description for each one is way more easy to understand what's going on rather than the most concise code possible.


I am sure you are right about the real world, I wouldn't know anything about it. :)

I'm a mathematician and I do use a computer whenever I can. In that domain the problem I gave is a reasonable toy model of the code I sometimes write. In particular mathematicians like comprehensions because, as I'm sure you are aware, programmers borrowed that notation from us.


Here's the flatmap version with better formatting:

    flatmap(lambda a:
            flatmap(lambda b:
                    flatmap(lambda c:
                            (lambda d: [(a,b,c)] if a**3+b**3+c**3==d**3 else [])(
                                int((a**3+b**3+c**3)**(1.0/3)+0.5))
                            if gcd(a,c)==1 and gcd(b,c)==1 else [],
                            range(b+1, 30)) if gcd(a,b)==1 else [],
                    range(a+1,30)),
            range(1,30))


the a:b notation for JS is still in proposal phase https://github.com/tc39/proposal-slice-notation/pull/32, but I'll use it below to create ranges

I don't understand your `+0.5`, I've written below the transcription for your description

    [...1:30].map(a => 
      [...a+1:30].filter(b => gcd(a,b)==1).map(b => 
        [...b+1:30].filter(c => gcd(a,c)==1
          && gcd(b,c)==1 
          && Number.isInteger((a**3+b**3+c**3)**(1/3))
        .map(c => [a, b, c])
      )
    ).flat(2)
Or another equivalent way:

    [...1:30].flatMap(a => 
      [...a+1:30].filter(b => gcd(a,b)==1).flatMap(b => 
        [...b+1:30].filter(c => gcd(a,c)==1 
          && gcd(b,c)==1 
          && Number.isInteger((a**3+b**3+c**3)**(1/3)))
        .flatMap(c => [[a, b, c]])
Or another way (generating first all the triples):

    [...1:30].flatMap(a => [...a+1:30].flatMap(b => [...b+1:30].flatMap(c => [[a, b, c]])))
      .filter(([a, b, c]) => gcd(a,b)==1 && gcd(a,c)==1
                          && gcd(b,c)==1 && Number.isInteger((a**3+b**3+c**3)**(1/3)))
with https://github.com/tc39/proposal-iterator-helpers, which would allow lazy evaluation, and make more sense than above

    (1:30).flatMap(a => (a+1:30).flatMap(b => (b+1:30).flatMap(c => [[a, b, c]])))
      .filter(([a, b, c]) => gcd(a,b)==1 && gcd(a,c)==1
                          && gcd(b,c)==1 && Number.isInteger((a**3+b**3+c**3)**(1/3)))
without the slice-notation, it's really ugly (hence the proposal in the first link)

    Array.from({length: 30},(_,a)=>a+1)
      .flatMap(a=>Array.from({length: 30-a-1},(_,b)=>a+b+1)
        .flatMap(b => Array.from({length: 30-b-1},(_,c)=>b+c+1).flatMap(c => [[a, b, c]])))
    .filter(([a, b, c]) => gcd(a,b)==1 && gcd(a,c)==1
                          && gcd(b,c)==1 && Number.isInteger((a**3+b**3+c**3)**(1/3)))


Now that I think of it, even the flatMap version builds a lot of unnecessary arrays. The comprehension isn't just more readable, it's also more efficient. (Unless you're using a fancy compiler like GHC that does deforestation, in which case the flatMap/filter version should be as efficient as the list comprehension.)


Those 2 solutions below, that currently work in JS, are equivalent to the python solution in terms of complexity

But, yes I'm not satisfied in terms of readability, I linked to a proposal for iteration helpers, which would do this lazy-evaluation, I don't know if it can do it deeply like you say, interesting thing to see

    function gcd(a, b) {
      return b ? gcd(b, a % b) : Math.abs(a);
    }

    function isInteger(n) {
      return Math.abs(Math.floor(n + .5) - n) <= Number.EPSILON;
    }

    const range = (start, end) => Array.from({length: end-start}, (_,i) => start+i)

    console.log(
      range(1, 30)
      .flatMap(a => range(a+1, 30).filter(b => gcd(a,b)==1)
        .flatMap(b => range(b+1, 30).filter(c => gcd(a,c)==1 && gcd(b,c)==1
                  && isInteger((a**3+b**3+c**3)**(1/3)))
          .flatMap(c => [[a, b, c]])
        )
      )
    )

    function* tripleSortedRelPrimeCubes(start, end) {
      for (let a=start; a<end; a++) {
        for (let b=a+1; b<end; b++) {
          for (let c=b+1; c<end; c++) {
            if (
              gcd(a,b)==1
              && gcd(a,c)==1
              && gcd(b,c)==1
              && isInteger((a**3+b**3+c**3)**(1/3))
            )
              yield [a, b, c];
          }
        }
      }
    }

    console.log(
      [...tripleSortedRelPrimeCubes(1, 30)]
    )


I think this looks about as clean as I expected, I definitely like the look of the list comprehension better. (But of course, this is just a superficial syntactic matter: there is no essential difference.)

Does JavaScript not come with a .flatmap or similar method? I'm not sure I like having to count the nesting to pass the appropriate number to .flat. Also building up a nested list and then flattening is less efficient than the comprehension which builds the flattened version directly.

The +0.5 is to deal with floating point. Sadly the cube root isn't always exact enough to be an integer when it should, so I used floor(cube root + 0.5) to round to the near integer and then checked if that rounded integer really was the cube root.

Have you tried running this? Is Number.isInteger lenient enough to actually find the solution?


Once you have lazy iterators (eg: generator expressions in Python) you can split your complex iterator operations into multiple lines without losing on performance nor conciseness, and increase readability over even list compréhensions.


I'd say we should have a proper looping facility that allows for building lists without the fuzz of having special syntax. Common lisp has one. The ITER macro in CL has is user-extensible, as are racket's for-loops.

For non-lazy languages, you either need some sort of loop fusion or a construct like transducers to get similar functionality.


Common Lisp's iter or loop macros are special syntax for, among other things, building lists. I think maybe you should have said instead that it's nice when a language let's you add such special syntax in libraries. :)


That is not quite what I want to get across. I think python's list comprehensions are great, but I think they are a shit replacement for a proper looping facility that does everything python's list comprehensions do and more.

Racket's for loops are also macros that expand to tail recursive procedures within a letrec, but they are "standard syntax" (in the sense that they are included in the base library) just as loop is.


Well, Python does have a looping facility that can do everything it's comprehensions can do and more. It's just more special syntax and looks somewhat different from the comprehensions (though they do share keywords where appropriate).

I mean, I too prefer loop or iter but Python isn't looping-poor.


It can't return a list :)

It is as if someone said "wow! Returning something from a loop is useful", but instead of going all the way they added special syntax for it.


Some LOOP implementations are user-extensible, too.


You made me read sbcls loop implementation. 1900 lines of macro code :)


In SBCL one could use ADD-LOOP-PATH...

see also here:

https://github.com/sshirokov/CLSQL/blob/master/sql/loop-exte...


What you get is multiple bound variables. Using only map and filter it would be harder to do i.e.:

[(x*y) for x in range(0,5) for y in range(0,x)]


To get multiple variables you "just" have to nest lambdas. In Haskell, for instance:

    Prelude> concatMap (\x -> concatMap (\y -> [x * y]) [0..x]) [0..5]
    [0,0,1,0,2,4,0,3,6,9,0,4,8,12,16,0,5,10,15,20,25]

    Prelude> [x * y | x <- [0..5], y <- [0..x]]
    [0,0,1,0,2,4,0,3,6,9,0,4,8,12,16,0,5,10,15,20,25]

I'd take the latter over the former any day. And yes, I did get an obscure type error on my first attempt at the concatMap version because I forgot to bracket x * y.


I'd replace the inner concatMap (\y -> [x+y]) with a simple map (\y -> x+y), or, if I'm honest, with map (x+).

Here's hoping HN doesn't mess up the asterisks...

EDIT: I've replaced all asterisks with plus signs, because I couldn't figure out how to protect them from HN.


Yeah, I realized that later. It's still more obscure than the version with the list comprehension. In particular, the place where x is introduced is at the very beginning, but the values it can be bound to are at the end of the expression, and that is much less clear than "x <- [0..5]".

The rule for asterisks is that they must be followed by whitespace to show up as asterisks, so you just need to format your code nicely :-) x * y works.


What's kinda missing in the whole article is that the member expression is actually map. The example is {x | generator(x) & filter(x) & filter(x)} while you could actually {f(x) | generator(x) ...} which is basically map(f,xs). The shown languages do support expressions in this component.


I don't think map is sufficient, apart from the trivial example where you bind a single generator. Even in python, these behave as bind?

Or I didn't understand what you mean :)


The generator for X is shown and a filter is shown but not that you can put any expression for X instead of just X. And this corresponds to map while the other part, as shown corresponds to filter. So the full power is missing when you don't put an expression.


pet peeve, but at least the C# examples (and probably others as well) can IMHO be written more elegantly:

Example 1:

    static IEnumerable<int> GetEvenNumbers(IEnumerable<int> l) => l.Where(x => x%2 == 0)

Example 2:

    static IEnumerable<FileInfo> GetFilesGreaterThan(int size,string directory)
    {
      DirectoryInfo dirInfo = new DirectoryInfo(directory);
      return dirInfo.GetFiles().Where(x => x.Length >= size)
    }


Example 3:

    static void SolveProblem()
    {
        var results = 
        from a in Enumerable.Range(0,9)
        from b in Enumerable.Range(0,9)
        from c in Enumerable.Range(0,9)
        from d in Enumerable.Range(0,9)
        where (1000*a + 100*b + 10*c + d)*4 == 
            (1000*d + 100*c + 10*b + a) 
            && a != 0
        select new {A=a,B=b,C=c,D=d};

        foreach(var r in results) {
            Console.WriteLine($"A={r.A} B={r.B} C={r.C} D={r.D}");
        }
    }
One important point is not to confuse List and IEnumerable of .NET. What is called "lists" in many languages are actually more akin to IEnumerable in .NET




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: