Hacker News new | comments | ask | show | jobs | submit login

More readable, and IMHO more idiomatic version: (Disclaimer: never tested)

    let mut idx = HashMap::new();
    for fname in &args {
      let f = match fs::File::open(fname) {
        Ok(f) => f,
        Err(e) => panic!("input file {} could not be opened: {}", fname, e),
      };
      let f = io::BufReader::new(f);
      let mut words = HashSet::new();
      for line in f.lines() {
        for w in line.unwrap().split_whitespace() {
          if words.insert(w.to_string()) { // new word seen
            idx.entry(w.to_string()).or_insert(Vec::new()).push(fname);
          }
        }
      }
    }
People who was introduced to the functional approach for the first time seems to enjoy it so much that everything becomes a hard-to-read mess of functions. I had similar experiences with Python list comprehension and C# LINQ.



You can have it all though - at least, you can in Scala, Rust ought to support something similar. Something like (hybrid syntax/pseudocode):

    (for {
      fname <- args
      f = fs::File::open(fname).orElse(
        e => panic!("input file {} could not be opened: {}", fname, e))
      r = io::BufReader::new(f)
      w <- f.lines().flatMap {
          line => line.unwrap().split_whitespace()
        } .distinct
    } yield Map(fname -> Vec(w.toString))).sum
Just as concise as yours (more concise even), but easier to reason about and safer to refactor. Possibly you might need to do the mutable version for performance if this was a hot loop, but you don't get to even ask that question until you've profiled it and found out that it was the bottleneck; 99.9% of the time it isn't.


That one looks to be more efficient too - all those `collect`s in the original are a tad concerning...


Do the "collects" mean that memory is allocated and the whole collection stored temporarily, or is this all iterative generators?

(I thought I knew Rust, but I haven't used it for a year, and its usage has changed a lot even if the core language hasn't.)


collect will ... collect it all into some kind of data structure, so yeah, that version would be doing a bunch of allocation.


In my taste, it's also more correct to write "for" in a code which is more like statements (not expressions), and only use map/fold for stuff which is pure.


Sorry, but I disagree. Although more verbose (as in more characters of source code), I could easily understand what the original code did. Your code has a high cyclomatic complexity and I have to keep all those nested for loops in mind when trying to figure out what you are doing.

I write C++ for most of my day job, and this would not pass code review because of readability problems in my team.


I like to use (even abuse) lazy iterators in C++ as the next guy, but IMHO the imperative version in this case is not only shorter but significantly more readable.

The two loops contain a single statement and it is trivial to see what's going on. It is also easier to extend. And it is still using lazy iterators (I assume) where it makes sense, i.e. in the split_whitespace call.


I cannot upvote you enough.


I know there are obviously lots of rooms for improvements. Indeed, if the code would become more complex I would immediately refactor. I can list some problems with my example:

- The error handling should really have been refactored. `try!` would make this easier. (I haven't used it since it is in the `main` function and the code was a direct replacement.)

- Error during reading words is not accurately handled. Again, `try!` would make this easier.

- `words` and `idx` look coupled to each other, which ideally shouldn't have been.

- `words.insert` is quite an opaque method; not everyone is sure if it returns true on a duplicate key or not. I've added a comment but frankly I'm not satisfied of that. <deleted> One alternative is to use `words.entry` instead, which gives a named enum variant. </deleted> Oops, `HashSet` does not have `entry`...

- Ultimately, the body of the outermost loop should go to a function.

That said, do you really think that `line.unwrap().split_whitespace().map(|w| w.to_string()).collect::<Vec<_>>().into_iter()` is a chunk of code which can be read at a glance? It at least has to be named (like T-R's example). Functional approach means that you can split functions and individually review them; the original code, IMHO, didn't.


I think the issue with procedural loops (not to be critical, just in general) is that there's no abstraction - it's not clear what's getting mutated, or what the result is (or its type), it's harder to look at the individual steps, and it's harder to refactor.

The nice thing about functions like "map", "filter", "fold", "flatmap", etc., is that they describe intent - when you see a "map", you know it's not aggregating things, just applying a function: "map" has a distinct purpose from "fold" (and depending on how pure things are, you may not even be allowed to do anything crazy).

Aside from that, the higher-order functions have pretty intuitive algebraic laws for refactoring (like with "compose" mentioned in my other comment) - It's not clear how you'd factor code out of a nested loop without understanding the whole thing, whereas "flatmap" (a.k.a. "bind" for the list monad) has laws for refactoring it.


I agree to you with the general sentiment. That said, Rust is not a functional language per se; the degree of functional decomposition is thus fundamentally limited. This is partly because it tries to be efficient, and as arielb1 pointed out, ownership sometimes makes it worse.

> It's not clear how you'd factor code out of a nested loop without understanding the whole thing, [...]

You are right, loops are particularly hard to refactor. I still argue that my code is better (barring any future expansion) because it fits within handful lines; you cannot easily understand 100 lines of code with the cyclomatic complexity of 1, but you can often easily understand 10 lines of code with the cyclomatic complexity of 8 (e.g. triple loops). The size of code, syntax and contextual information matters as much as algebraic laws.


That part is certainly extreme because of ownership. It can be refactored to

        let mut words = HashSet::new();
        for line in file.lines() {
            let line_words = line.unwrap().split_whitespace();
            words.extend(
                line_words.map(|s| s.to_string())
            );
        }
        words.into_iter().map(move |word| (word, f))


The "idomatic" version (I agree with that) simply works better. You can insert the typical `try!()` calls that short-circuit on error and return errors to the caller.

Retrofitting better error handling in the single expression version requires changing the shape of the whole thing, editing the expression throughout.


Definitely more readable to my eyes.


Yes, it is both much shorter and immediately readable and understandable (without comments!), unlike all the Functional Programming style versions which were proposed in the comments.


This is maybe true, given that Rust is intentionally multi-paradigm, but it's a pretty unfair comparison: My FP-style version deliberately made almost no changes to the original code aside from adding names - I even stated in the comment that I don't even know Rust (haven't touched it beyond knowing what "Affine types" means, and once having gotten someone else's "hello world" to compile for ARM). The imperative version you're comparing it to completely changes the semantics of the program, and is written by someone very familiar with the language.


Yes. Functional programming is really useful and powerful in some situations, but sometimes the old school imperative programming is what you need. There is a reason why all cooking recipes are written in an imperative style.


The upper map-full code is also badly formatted and has comments everywhere. It's true that chaining functional abstractions is tempting at first; but with a bit of balance:

    let words = fn (f, file) => {
            file.lines().flat_map( |line| { 
                line.unwrap()
                    .split_whitespace()
                    .map(|w| w.to_string())
                    .collect::<Vec<_>>()
                    .into_iter()
            })
            .collect::<HashSet<_>>()    // prune duplicates
            .into_iter()
            .map(move |word| (word, f)) // and emit inverted index entry
        }

    let idx = args.iter()
        .map(|fname     | (fname.as_str(), fs::File::open(fname.as_str()))) //+ (str, fd)
        .map(|(fname, f)| { f.and_then(|f| Ok((fname, f))).expect(&format!("input file {} could not be opened", fname)) })
        .map(|(fname, f)| (fname, io::BufReader::new(f)))                   //+ (str, buf)
        .flat_map(words)                                                    //+ {str}
        // absorb all entries into a vector of file names per word
        .fold(HashMap::new(), |mut idx, (word, f)| { idx
                                                     .entry(word)
                                                     .or_insert(Vec::new())
                                                     .push(f);
        });                                                                 //+ {word:[filename]}


I think it'd actually be good to show both to show how versatile Rust can be in this regard. I added it to the article with a link back here. Thanks!




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: