    authors_of_long_books = set()
    for book in books:
        if len(book.pages) > 1000:
            authors_of_long_books.add(book.author)
    return authors_of_long_books
You are told explicitly at the beginning what the type of the result will be, you see that it's a single pass over books and that we're matching based on page count. There are no intermediate results to think about and no function call overhead.
When you read it out loud it's also natural, clear, and in the right order: "for each book if the book has more than 1000 pages add it to the set."
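For anyone who wants to run the loop above, here is a minimal self-contained sketch. The `Book` dataclass, its `pages` list, the wrapping function, and the sample data are all assumptions made up for illustration; only the loop body comes from the snippet.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    # Hypothetical record type; the thread never defines Book.
    author: str
    pages: list = field(default_factory=list)


def authors_of_long_books(books):
    """Return the set of authors with at least one book over 1000 pages."""
    result = set()
    for book in books:
        if len(book.pages) > 1000:
            result.add(book.author)
    return result


# Made-up sample data: two long books by one author, one short book by another.
sample_books = [
    Book("Author A", pages=[""] * 1200),
    Book("Author A", pages=[""] * 1300),
    Book("Author B", pages=[""] * 150),
]
long_book_authors = authors_of_long_books(sample_books)
```

The set absorbs the repeated author, so no separate de-duplication step is needed.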
That isn't natural to anyone who is not intimately familiar with procedural programming. The language-natural phrasing would be "which of these books have more than a thousand pages? Can you give me their authors?" -- which maps much closer to the parent's LINQ query than to your code.
> That isn't natural to anyone who is not intimately familiar with procedural programming.
This is not about "procedural programming" - this is exactly how this works mentally. For kicks I just asked my 11-year-old kid to write down the names of all the books behind her desk (20-ish of them) and give me the names of the authors of books that are 200 pages or more. She "procedurally":
1. took a book
2. flipped to last page to see page count
3. wrote down the name of the author if the page count was 200 or more
The procedural approach is natural, it is clear, and it is in the right order.
That's what happens when you're doing the job, not the mental representation of the solution. I strongly believe that if you asked her to describe the task, she would go:
1. (Take the books)->(that have 200 pages or more)->(and mark down the name of the authors)->(only once)
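The arrow chain above maps fairly directly onto a pipeline of lazy steps. A minimal Python sketch follows, assuming a hypothetical `Book` tuple with `author` and `page_count` fields and a made-up shelf of books:

```python
from collections import namedtuple

# Hypothetical record type and sample shelf, for illustration only.
Book = namedtuple("Book", ["author", "page_count"])

shelf = [
    Book("Author A", 304),
    Book("Author A", 215),
    Book("Author B", 62),
    Book("Author C", 612),
]

# (Take the books) -> (that have 200 pages or more)
long_books = (book for book in shelf if book.page_count >= 200)
# -> (and mark down the name of the authors)
authors = (book.author for book in long_books)
# -> (only once); dict.fromkeys keeps the first occurrence of each name, in order
unique_authors = list(dict.fromkeys(authors))
```

Each line corresponds to one parenthesised step, which is roughly the argument being made for the declarative reading.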
I respectfully disagree. And I think one of the core reasons SWEs struggle with the functional style of programming is that it is neither intuitive nor how the average Joe's brain works.
I haven't really encountered software engineers who really struggle with functional style in almost 20 years of seeing it in mainstream languages. It's just another tool that one has to learn.
Even the people arguing against functional style are able to understand it.
Strangely, this argument is quite similar to arguments I encounter when someone wants to rid the codebase of all SQL and replace it with ORM calls.
> Strangely, this argument is quite similar to arguments I encounter when someone wants to rid the codebase of all SQL and replace it with ORM calls.
We must be in completely different worlds, because I have yet (close to 30 years of hacking now) to see or hear of someone trying to introduce an ORM on a project that did not start with one. The opposite, though, is a constant: “how do we get rid of the ORM” :)
> I haven't really encountered software engineers who really struggle with functional style in almost 20 years. It's just another tool that one has to learn.
I vividly recall when Java 8 came out (not the greatest example, but perhaps not too bad either) having to explain over and over the concept of flatMap (wut is that fucking thing?) or even zipping two collections. Even to this day I see a whole lot of devs (across several teams I work with) doing “procedural” handling of collections in for loops etc…
I'm more talking about projects that do start with an ORM, but have judicious (and correct) usage of inline SQL for certain parts. It's not uncommon to see developers spending weeks refactoring into an ORM-mess.
The argument is always that "junior developers won't know SQL".
But yeah I've also seen the opposite happening once. People going gung-ho on deleting all ORM code "because there's so much SQL already, why do we need an ORM then".
And then the argument is that "everyone knows SQL, the ORM is niche".
I guess it's a phase that all devs go through in the middle of their careers. They see a hammer and a screwdriver in a toolbox, and feel the need for throwing one away because "who needs more than one tool"...
You are describing how to execute the procedure, while the gp is describing what the result should be. Both are valuable, but they're very different.
My personal take is that "how to execute" is more useful for lower-level and finer-grained control, while "what the results should be" is better for wrangling complex logic.
I tried scaling up the original into an intentionally convoluted, nonsensical problem to see what a more complicated solution would look like for each approach. Do these look right? And which seems the most readable?
    # Functional approach
    var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = books
        .filter(book =>
            book.pageCount > 100 and
            book.language == "Chinese" and
            book.subject == "History" and
            book.author.mentions > 10_000
        )
        .flatMap(book => book.author.pets)
        .filter(pet => pet.is_furry)
        .map(pet => pet.favoriteFood)
        .distinct()
    # Procedural approach
    var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = set()
    for book in books:
        if (book.pageCount > 100 and
                book.language == "Chinese" and
                book.subject == "History" and
                book.author.mentions > 10_000):
            for pet in book.author.pets:
                if pet.is_furry:
                    favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory.add(pet.favoriteFood)
    # Comprehension approach
    var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = {
        pet.favoriteFood
        for pets in [
            book.author.pets for book in books
            if book.pageCount > 100 and
               book.language == "Chinese" and
               book.subject == "History" and
               book.author.mentions > 10_000
        ]
        for pet in pets
        if pet.is_furry
    }
FWIW, for more complex problems, I think the second one is the most readable.
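To check whether the three variants really agree, here is a runnable Python sketch of all of them side by side. The `Pet`, `Author`, and `Book` dataclasses, the snake_case field names, the shared `is_match` predicate, and the sample data are all assumptions introduced to keep the sketch short; the functional variant uses `filter`, `map`, and `itertools.chain.from_iterable` as a stand-in for the dot-chained pseudocode.

```python
from dataclasses import dataclass, field
from itertools import chain


@dataclass
class Pet:
    favorite_food: str
    is_furry: bool


@dataclass
class Author:
    mentions: int
    pets: list = field(default_factory=list)


@dataclass
class Book:
    page_count: int
    language: str
    subject: str
    author: Author


def is_match(book):
    """Shared predicate (hypothetical helper) so the three variants stay comparable."""
    return (
        book.page_count > 100
        and book.language == "Chinese"
        and book.subject == "History"
        and book.author.mentions > 10_000
    )


def foods_functional(books):
    # chain.from_iterable plays the role of flatMap; set() plays the role of distinct().
    pets = chain.from_iterable(book.author.pets for book in filter(is_match, books))
    furry = filter(lambda pet: pet.is_furry, pets)
    return set(map(lambda pet: pet.favorite_food, furry))


def foods_procedural(books):
    foods = set()
    for book in books:
        if is_match(book):
            for pet in book.author.pets:
                if pet.is_furry:
                    foods.add(pet.favorite_food)
    return foods


def foods_comprehension(books):
    # The for-clauses run outermost first, in the same order as the nested loops.
    return {
        pet.favorite_food
        for book in books
        if is_match(book)
        for pet in book.author.pets
        if pet.is_furry
    }


# Made-up sample data.
famous = Author(mentions=50_000, pets=[Pet("tuna", True), Pet("crickets", False)])
obscure = Author(mentions=12, pets=[Pet("kibble", True)])
books = [
    Book(400, "Chinese", "History", famous),
    Book(90, "Chinese", "History", famous),
    Book(500, "Chinese", "History", obscure),
    Book(500, "English", "History", famous),
]
results = [f(books) for f in (foods_functional, foods_procedural, foods_comprehension)]
```

In Python the comprehension reads in the same order as the nested loops, so the choice between those two is mostly a matter of taste.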
I'm more partial to the first one because it keeps a linear flow downwards, and a uniform structure. The second one kind of drifts off, and reshuffling parts of it is going to be … annoying. IME the dot style lends itself much better to restructuring.
Depending on language you might also have some `.flat_map` option available to drop the `.reduce`.
True! Good point on the restructuring, I hadn't thought about it in that way.
I think I like the second approach because the loop behavior seems clearest, which helps me analyze the time complexity or when I want to skim the code quickly.
A syntax like something below would be perfect for me if it existed:
    var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = books[i].author.pets[j].favoriteFood.distinct()
        where i = pagecount > 100,
                  language == "Chinese",
                  subject == "History",
                  author.mentions > 10_000
        where j = is_furry == True
Hm, LINQ query syntax form is kinda going in that direction
    (from book in books
     where book.pagecount > 100
        && book.language == "Chinese"
        && book.subject == "History"
        && book.author.mentions > 10_000
     from pet in book.author.pets
     where pet.is_furry == true
     select pet.favoriteFood)
    .Distinct()
But it also demonstrates the...erm, chronic "halfassedness" of LINQ's query syntax form with distinct() not available there and having to fall back to method syntax form anyway...
You would likely approach it in any style with some helper functions once whatever's in the parentheses or ifs starts feeling big. E.g. in the dot style you could
    fn bookFilter(book: Book) -> bool {
        return book.pageCount > 100 and
            book.language == "Chinese" and
            book.subject == "History" and
            book.author.mentions > 10_000
    }
    var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = books
        .filter(bookFilter)
        .flatMap(book => book.author.pets)
        .filter(pet => pet.is_furry)
        .map(pet => pet.favoriteFood)
        .distinct()
I slightly prefer this style with such a long pipeline, because to me it’s now built from standard patterns with relatively simple and semantically meaningful descriptions of what fills their holes. Obviously there’s some subjective judgement involved with anything like this; for example, if the concept of an author being famous was a recurring one then I’d probably want it defined in one place like an `isFamous` function, but if this were the only place in the code that needed to make that decision, I might inline the comparison.
Without syntax highlighting, "book.author for book in books if book.page_count > 1000" requires a lot more effort to parse because white space like newlines is not being used to separate things out.
You've had some answers already, but I also think this is a good argument for syntax highlighting. With tools like tree-sitter it's pretty easy these days to get high quality syntax highlighting, which allows us humans to receive more information in parallel. A lot of the information we pick up in our daily lives is carried through color, and being colorblind is generally seen as a disability (albeit often a mild one which can be undetected for decades).
Syntax highlighting in print is more limited because of technological and economic constraints, which might leave just bold, italics and underlines on the table, while dropping color. On screens and especially in our editors where we see the most code, a lack of color is often a self-imposed limitation.
That's not the point though. If you need the syntax highlighting to quickly make out the structure, perhaps the visual layout is not as good as it could be.
Yeah, if you treat it as JavaScript vs Python they're likely correct (I'm not that familiar with JS). The article and original comment were about functional vs imperative though, so I assumed half-decent runtimes for both.
True, but now you're relying on a specific implementation and optimization of the compiler, unless the language semantics explicitly say that lambdas will be inlined.
This is true of literally anything and everything your compiler emits. In practice the functional style is much easier to optimize to a far greater degree than the imperative style.
> You are told explicitly at the beginning what the type of the result will be
I would argue that's a downside: you have to pick the appropriate data structure beforehand here, whereas .distinct() picks the data structure for you. If, in the future, someone comes up with a better way of producing a distinct set of things, the functional code gets that for free, but this code is locked into a particular way of doing things. Also, .distinct() tells you explicitly what you want, whereas the intention of set() is not as immediately obvious.
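As a rough illustration of that point, here is a sketch of a hypothetical `distinct` helper in Python. The name and its behaviour (first-seen order, lazy output) are assumptions for illustration, not an existing library API; the idea is only that callers state "give me each value once" and the helper's internals can change without touching them.

```python
def distinct(items):
    """Yield each item once, in first-seen order (hypothetical helper).

    The set used for bookkeeping is an implementation detail; it could be
    swapped for another strategy without changing any caller.
    """
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


# Made-up sample data.
authors = ["Author A", "Author B", "Author A", "Author C", "Author B"]
unique_authors = list(distinct(authors))
```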
> There are no intermediate results to think about
I could argue that there aren't really intermediate results in my example either, depending on how you think about it. Are there intermediate results in the SQL query "SELECT DISTINCT Author FROM Books WHERE Books.PageCount > 1000"? Because that's very similar to how I mentally model the functional chain.
There are also intermediate results, or at least intermediate state, in your code: at any point in the loop, your set is in an intermediate state. It's not a big deal there either though: I'd argue you don't really think about that state either.
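The SQL comparison can be tried directly with Python's built-in `sqlite3` module. This is only a sketch: the table layout and the rows are made up for illustration, and only the `SELECT DISTINCT` statement comes from the comment above.

```python
import sqlite3

# In-memory database with a hypothetical Books table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (Title TEXT, Author TEXT, PageCount INTEGER)")
conn.executemany(
    "INSERT INTO Books (Title, Author, PageCount) VALUES (?, ?, ?)",
    [
        ("Long Book One", "Author A", 1225),
        ("Long Book Two", "Author A", 1400),
        ("Short Book", "Author B", 150),
        ("Long Book Three", "Author C", 1079),
    ],
)

# The query says what is wanted; the engine decides how to compute it.
rows = conn.execute(
    "SELECT DISTINCT Author FROM Books WHERE Books.PageCount > 1000"
).fetchall()
sql_authors = {row[0] for row in rows}
conn.close()
```

Whether the engine builds intermediate results internally is not visible from the query, which is the mental model being described.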
> and no function call overhead
That's entirely a language-specific thing, and volatile: new versions of a language may change how any of this stuff is implemented under the hood. It could be that "for ... in" happens to be a relatively expensive construct in some languages. You're probably right that the imperative code is slightly faster in most languages today, and if it has been shown via performance analysis that this particular code is a bottleneck, it makes sense to sacrifice readability in favor of performance. But it is a sacrifice in readability, and the current debate is over which is more readable in the first place.
> a single pass over books
Another detail that may or may not be true, and probably doesn't matter. The overhead of different forms of loops is just not what's determining the performance of almost any modern application. Also, my example could be a single pass if those methods were implemented in a lazy, "query builder" form instead of an immediately-evaluated form.
In fact, whether this query should be immediately evaluated is not necessarily this function's decision. It's nice to be able to write code that doesn't care about that. My example works the same for a wide variety of things that "books" could be, and the strategy to get the answer can be different depending on what it is. It's possible the result of this code is exactly the SQL I mentioned earlier, rather than an in-memory set. There are lots of benefits to saying what you want, instead of specifying exactly how you want it.
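To make the "lazy, single pass" point concrete, here is a small Python sketch. Python's built-in `filter` and `map` return lazy iterators, so the chain below reads the source once and only when the result is consumed. The `Book` tuple, the `read_books` wrapper, and the sample catalog are assumptions added purely to observe when the source is read.

```python
from collections import namedtuple

# Hypothetical record type and sample data, for illustration only.
Book = namedtuple("Book", ["author", "page_count"])

visited = []  # records every time a book is pulled from the source


def read_books(source):
    """Yield books one at a time, logging each read."""
    for book in source:
        visited.append(book.author)
        yield book


catalog = [
    Book("Author A", 1225),
    Book("Author B", 150),
    Book("Author C", 1079),
    Book("Author A", 1400),
]

# Building the chain does no work yet.
long_books = filter(lambda book: book.page_count > 1000, read_books(catalog))
authors = map(lambda book: book.author, long_books)
visited_before = list(visited)

# Consuming the chain drives a single pass over the catalog.
distinct_authors = set(authors)
```

Here `visited` ends up with one entry per book, which shows that the two chained steps did not each iterate the catalog separately.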
Set is a well defined container for unique values. It's much clearer what it is than some non-existent .distinct() function with no definition and unclear return value.
Procedural code in JS doesn't say how you want something done any more closely than the functional style variant. for-of is far more generic than .map/.filter() since .map() only works on Array shaped objects, and for-of works on all iterables, even generators, async generators, etc. In any case you're not saying how the iteration will happen with for-of, you're just saying that you want it. Implementation of Set is also the choice of a language runtime. You're just stating what type of container you want.
Sometimes functional style may be more readable, sometimes procedural style may.
> When you read it out loud it's also natural, clear, and in the right order: "for each book if the book has more than 1000 pages add it to the set."