

Ask HN: Does the idea of a "reduce-map" function make any sense? - dhs

Describing Google Search, I would say that it takes a formatted string of words and returns a set of (links to) documents. I want to design a program that does the opposite: take an arbitrary text document (and maybe, eventually, a set of them), compute a formatted string of words, and return that. This would work by passing the text of the input document to a hierarchy of pattern matchers that look for and extract certain values and their relations. Next comes a function that takes these values and relations and knows how to distribute the data to the appropriate subfunctions, each of which computes and returns a substring; the parent then puts the substrings in the right order, formatting them for the return.

I'm looking for a name for this concept, and wondered whether "reduce-map" would be a good one. Maybe I could say that I reduce the document to a function, which returns a map (the formatted string). To find out whether the "reduce-map" moniker had any currency, and if so, in which context, I googled _program "reduce-map"_:

http://www.google.com/search?hl=en&q=program+%22reduce-map%22

But because Google doesn't search for an exact string or substring even if you put it in double quotes (which it does seem to promise; compare 'M.I.A.'s album "/\/\/\Y/\" is ungoogleable', http://news.ycombinator.com/item?id=1363489 ), what I got back was a slew of ordinary map-reduce tutorials. So I still don't know whether "reduce-map" would mean to other people what I want it to mean. I would be thankful for your take on that.
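
To make the pipeline described above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two matchers (`year`, `email`), the `format_*` subfunctions, and the `key:value` output format are hypothetical, and it handles only extracted values, not relations between them.

```python
import re

# Hypothetical pattern matchers: each one extracts one kind of value.
MATCHERS = {
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "email": re.compile(r"\b[\w.]+@[\w.]+\.\w+\b"),
}


def extract(text):
    """Run every matcher over the text; return {kind: [values found]}."""
    return {kind: pattern.findall(text) for kind, pattern in MATCHERS.items()}


# Hypothetical subfunctions: each one turns its values into a substring.
def format_years(values):
    return "years:" + ",".join(sorted(set(values)))


def format_emails(values):
    return "emails:" + ",".join(sorted(set(values)))


# The parent fixes the order in which the substrings appear in the result.
SUBFUNCTIONS = [("year", format_years), ("email", format_emails)]


def document_to_query(text):
    """Distribute extracted values to the subfunctions and join the results."""
    values = extract(text)
    parts = [fn(values[kind]) for kind, fn in SUBFUNCTIONS if values[kind]]
    return " ".join(parts)


query = document_to_query("Written in 2009 and revised in 2010 by dhs@example.com.")
print(query)
```

The parent function only decides which values go to which subfunction and in what order the substrings are joined; what each substring looks like is left to the subfunction.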
======
rarestblog
It's really unclear what you are trying to achieve. My wild guess would be
that you're trying to build a Markov chain generator (generation of random
texts from a sample base text).

Other than that, here are unclear parts:

 _"to a hierarchy of pattern matchers, looking for and extracting certain
values and their relations"_

What are the "certain values" and how are they "related"?

 _"takes these values and relations and knows how to distribute these data to
the appropriate subfunctions"_

What are those "appropriate subfunctions", and what do they do? How do they
differ, so that the function can "know" where to send each value?

The concept of "reduce-map" doesn't seem to make any sense to me. It's like
taking word frequencies (the output of MapReduce) and rebuilding the original
text from them? You just don't have the data to do that.

BTW, Google searches for double-quoted "reduce map" just fine; it's just that
there's no such thing as "reduce map". You collect some data and output it
piece by piece ("map"), then you aggregate it by key ("reduce"). "Map" often
works as a "splitter" or a "tokenizer", so it wouldn't make sense to supply
aggregated ("reduced") data to it, since aggregated data is already
"tokenized".
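
The point about missing data can be illustrated with a small word count written as a map step and a reduce step. This is only a sketch in plain Python, not any real MapReduce API; `map_words` and `reduce_counts` are made-up names.

```python
from collections import defaultdict


def map_words(text):
    """'Map' step: split the text and emit one (word, 1) pair per token."""
    return [(word.lower(), 1) for word in text.split()]


def reduce_counts(pairs):
    """'Reduce' step: aggregate the emitted pairs by key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


a = reduce_counts(map_words("the dog bit the man"))
b = reduce_counts(map_words("the man bit the dog"))
print(a == b)
```

Both sentences reduce to the same counts, so the counts alone can't say which sentence was the input.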

~~~
dhs
Thanks a lot. Your _"It's like taking a word frequency (output of MapReduce)
and building original text with that? You just don't have data to do that"_
made me understand what doesn't make sense to you. Your last paragraph made me
understand why it doesn't make sense to you. The weird thing is that, from my
point of view, your "you don't have the data" statement is not really true,
because in my model I have an oracle (more specifically, one or more human
authors) that _supplies_ the missing data (before compilation), so I can in
fact "build original text" from the input. Now I'm looking for a name which
describes that this happens. Any suggestions?

