Hacker News new | past | comments | ask | show | jobs | submit login
Map/Reduce for Mortals (billwadge.wordpress.com)
75 points by herodotus 4 days ago | hide | past | web | favorite | 25 comments





The conversation Bill's touching on -- the tradeoffs between different notations -- is really valuable but I think it misses something a lot of us desire in a syntax: compositionality. Take the provided "onion" notation, loops, and the "new" syntax. They all look something like this:

  ┌────────┬────────────────────┐
  │myreduce│┌─────┬────────────┐│
  │        ││mymap│┌────────┬─┐││
  │        ││     ││myfilter│x│││
  │        ││     │└────────┴─┘││
  │        │└─────┴────────────┘│
  └────────┴────────────────────┘
(The math one looks more like this:)

  ┌────────┬────────────┐
  │myreduce│┌────────┬─┐│
  │        ││mymap   │x││
  │        ││        │ ││
  │        ││myfilter│ ││
  │        │└────────┴─┘│
  └────────┴────────────┘
But none of them provide the same kind of "putting pieces together" feeling as this:

  ┌────────┬─────┬────────┬─┐
  │myreduce│mymap│myfilter│x│
  └────────┴─────┴────────┴─┘
which we see in the wild as this:

  (myreduce ∘ mymap ∘ myfilter)(x)
or this:

  x | myfilter | mymap | myreduce
or this:

  myreduce mymap myfilter x

I didn't spot Clojure here yet:

  (->> [1,2,3,4,5]
       (filter #(-> % (mod 3) (= 1)))
       (map    #(* % %))
       (reduce +))
#( ... ) is an anonymous function, arrows are threading macros: https://clojure.org/guides/threading_macros

If you wanted more trickery you could play with transducers and compose them with "comp".


List comperhension version without threading:

  (reduce +
    (for [n [1, 2, 3, 4, 5]
          :when (= 1 (mod n 3))]
      (* n n)))

> reduce(lambda t,x t+x,map(square,filter(lambda x: x % 3 == 1,[1,2,3,4,5])))

> Clear? As mud

> the lambdas and the functions obscure what is being computed.

Simply use J!

g =: +/ @: : @ (] * =&1@(3&|))

Or:

f =: +/ @: *: @ (#~ (=&1@(3&|)))


This already exists in Python in the form of comprehensions

  sum(x * x for x in [1,2,3,4,5] if x % 3 == 1)
I'm not familiar with Lucid but maybe evaluating the implementation of comprehensions could help solve some of the design challenges.

Might work, worth thinking about

Having pointfree functions and curried application makes the functional example more readable, as it removes the obscuring lambdas:

    (reduce (+) . map (^2) . filter ((==1).(%3))) [1,2,3,4,5]
The above is pseudo-Haskell. If we want to

In actual Haskell I would write it as

    sum [ n^2 | n <- [1..5], n % 3 == 1 ]
which seems to express the idea fairly directly and I'm fairly sure has a close analog in python.

python: sum([n * n for n in range(1,5,3)])

haskell: sum [n^2 | n <- [1,3..5]]

I just remembered haskell's "range" operator ".." can also take a step size.

['a', 'd' .. 'z'] => "adgjmpsvy"


yeah to me this just hollers for the kind of "pour it through the functions" approach we see in functional languages or even with Ramda.pipe in JS. Ezpz.

Or in functional rust

  (1..5)
    .into_iter()
    .filter(|i| i % 3 == 1)
    .map(|i| i * i)
    .sum();
I don't know if we really need a new syntax for map reduce.

That's also basically the same syntax as doing it in JS, aside from the lack of builtins:

    import { range } from 'ramda';

    range(1, 6)
      .filter(i => i % 3 === 1)
      .map(i => i * i)
      .reduce((a, b) => a + b, 0);
Or, if you're using the pipeline operator (experimental, stage 1, nobody can quite decide on exactly the syntax but this gives the rough idea):

    import { range, sum } from 'ramda';

    range(1, 6)
      |> x => x.filter(i => i % 3 === 1)
      |> x => x.map(i => i * i)
      |> sum

We can get rid of these arrows using ramda's curried list functions, at the expense of being slower and harder to read but making us feel more clever:

  pipe(
    range,
    filter(pipe(
      modulo(__, 3), // x => x % 3
      equals(1) // y => y === 1
    )),
    map(flip(Math.pow)(2)),
    sum
  )(1,6)

> "that’s as close as I can get in wordpress but you know what it looks like"

Um, I still don't know what it looks like. I know mod and there is a range of numbers it seems and an i to the power of two, but I really have no idea what the other symbols mean or if their position/layout in that fashion is important. I can probably reverse engineer what the mathematician writes by reading the code but I'm still confused as to why many articles about software assume a complete mathematics syntax knowledge.


Agreed, I think the syntax used is not that common. (This is a problem, I think, with math notation -- everyone's notation means different things, google "substitution notation history" for the worst.) They definitely meant to write:

https://latex.codecogs.com/gif.latex?\sum_{\substack{i\in[1,...

"sum of all i² where i is in [1,5] and i mod 3 = 1".


https://en.wikipedia.org/wiki/Element_(mathematics)#Notation...

I'm not sure what the third line is supposed to look like, but I guess it's a filter. So: Sum i^2 where i is in the set {1,2,3,4,5} and i % 3 = 1.


What exactly is this pyLucid? I searched for it and found a CMS and a generative video ML library. Don't see how either one connects with this.

https://github.com/jedie/PyLucid https://github.com/yelantingfeng/pyLucid


The author of the post is the co-inventor of the Lucid programming language. In an earlier blog posts he describes a Python interpreter for a version of Lucid that he named pyLucid.

Thanks. For anyone else reading, it may or may not be this https://code.google.com/archive/p/plucid/

List comprehension in Haskell is probably the best approach so far, imho. However once you're using these tools then map, reduce, filter should be easy to understand, so the terseness and clarity of it doesn't come without a cost.

let mod3 n = if n `mod` 3 == 1 then n else 0

sum [ i^2 | i <- [ mod3 n | n <- [1..5] ] ]


> sort of objects are ranges? In Python they would be iterators, and that works out fine. In Haskell they would be lists.

> I think the best choice is to make them vectors (arrays)

In Haskell they are lazy linked lists! Which in theory is more efficient than vectors (in terms of space).


this seems to be a syntax problem. The for loop is not really 'equivalent' in complexity to the functional solution. In Elixir, I'd do this:

   list
   |> Enum.reduce(0, fn ->
      x, acc when mod(x, 3) == 0 -> acc + x * x
      _, acc -> acc
   end)
But the presented functional solution throws more functions at it, which might look like this, which I think is quite clear.

    list
    |> Enum.filter(&(mod(&1, 3) == 0))
    |> Enum.map(&(&1 * &1))
    |> Enum.reduce(acc, &Kernel.+/2)  #better yet use Enum.sum

You might even want to use the Stream module. Beautiful Elixir.

Stream would technically better, however given the discussion is about Map/Reduce, the only thing Stream has in common with Map/Reduce is it's lazy. If you wanted something comparable (mapping is done in parallel, reducing as well just over partitions), then you'd want to use the Flow[1] library. As it does the same thing as Stream.map |> Enum.reduce just parallelized/partitioned, and what's great is the Flow module is more-or-less a drop in replacement for Enum/Stream (with a few caveats like calling Flow.partition before Flow.reduce). But, with just some quick a dirty benchmarks you can see Flow outperforms Stream on all but the smallest data set (range 1..100):

    with_stream = fn range ->
      range
      |> Stream.filter(&(rem(&1, 3) == 0))
      |> Stream.map(&(&1 * &1))
      |> Enum.reduce(0, &Kernel.+/2)
    end
    
    with_flow = fn range ->
      range
      |> Flow.from_enumerable()
      |> Flow.filter(&(rem(&1, 3) == 0))
      |> Flow.map(&(&1 * &1))
      |> Flow.partition()
      |> Flow.reduce(fn -> [0] end, fn val, [acc | _] ->
        [Kernel.+(val, acc)]
      end)
      |> Enum.sum()
    end
    
    iex(4)> Benchee.run(
    iex(4)>   %{"stream" => with_stream, "flow" => with_flow},
    iex(4)>   inputs: %{"small" => 1..100, "medium" => 1..10_000, "large" => 1..10_000_000}
    iex(4)> )
    Operating System: macOS
    CPU Information: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
    Number of Available Cores: 4
    Available memory: 8 GB
    Elixir 1.9.4
    Erlang 22.2.1
    
    Benchmark suite executing with the following configuration:
    warmup: 2 s
    time: 5 s
    memory time: 0 ns
    parallel: 1
    inputs: large, medium, small
    Estimated total run time: 42 s
    
    Benchmarking flow with input large...
    Benchmarking flow with input medium...
    Benchmarking flow with input small...
    Benchmarking stream with input large...
    Benchmarking stream with input medium...
    Benchmarking stream with input small...
    
    ##### With input large #####
    Name             ips        average  deviation         median         99th %
    flow          0.0994        10.06 s     ±0.00%        10.06 s        10.06 s
    stream        0.0782        12.78 s     ±0.00%        12.78 s        12.78 s
    
    Comparison:
    flow          0.0994
    stream        0.0782 - 1.27x slower +2.72 s
    
    ##### With input medium #####
    Name             ips        average  deviation         median         99th %
    flow           83.87       11.92 ms    ±20.48%       11.30 ms       25.53 ms
    stream         74.88       13.35 ms    ±32.02%       12.32 ms       30.22 ms
    
    Comparison:
    flow           83.87
    stream         74.88 - 1.12x slower +1.43 ms
    
    ##### With input small #####
    Name             ips        average  deviation         median         99th %
    stream        4.98 K        0.20 ms    ±87.16%       0.169 ms        0.56 ms
    flow          0.70 K        1.42 ms    ±21.58%        1.35 ms        2.52 ms
    
    Comparison:
    stream        4.98 K
    flow          0.70 K - 7.06x slower +1.22 ms
[1] https://hexdocs.pm/flow/Flow.html



Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: