Teeing, a hidden gem in the Java API (frankel.ch)
242 points by sidcool on May 10, 2021 | 110 comments



I prefer imperative code over collector like APIs for two reasons, both visible in this example:

1. The intention of the code is obscured by more layers of abstraction, increasing complexity.

2. Small changes to what the code is conceptually doing tend to lead to larger changes to the actual code than with imperative code.

For the first point, reading through this post and the previous post linked, the code is doing the following:

Taking the entries of a map of product to count, and turning those entries into a different class: a row of each product and count. On one hand, it collects these rows into a list. On the other hand, it sums the total, based on the per-product cost and the count of the product in the cart. Then it combines the row list and the total cost into one object and returns it.

Deconstructed, half of this code is just unnecessary complexity. A map of products to count, and a list of unique product/count tuples, are theoretically identical. You can iterate over them, find specific products, etc. There /might/ be a reason to specifically desire a list; however, why would that code be coupled with the code that sums the cost of the cart?

All in all, why is the code not just:

  public BigDecimal sumPrice(Cart cart) {
    BigDecimal sum = BigDecimal.ZERO;
    for (Map.Entry<Product, Integer> entry : cart.getProducts().entrySet()) {
      sum = sum.add(entry.getKey().getPrice().multiply(new BigDecimal(entry.getValue())));
    }
    return sum;
  }
For the second point, briefly: Consider how the code would have to change to calculate a deal, such as Buy One Get One Free. Such a calculation would add another layer to the collector, with another function defined somewhere (or hidden in some other existing abstraction, such as OP's CartRow::getRowPrice), instead of being visible in the function that calculates the cart subtotal. If the deal relied on concepts not limited to one row at a time, e.g. buy any two flavors of chips for 25% off, the proposed solution would have to be completely rewritten.
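To make the argument concrete, here is a hedged sketch of how the imperative loop might absorb a Buy One Get One Free rule. Product, sumPriceWithBogo, and the per-entry application of the deal are all my stand-ins, not the article's classes:

```java
import java.math.BigDecimal;
import java.util.Map;

public class BogoSum {
    // Stand-in for the article's Product class.
    record Product(String name, BigDecimal price) {}

    static BigDecimal sumPriceWithBogo(Map<Product, Integer> products) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Map.Entry<Product, Integer> entry : products.entrySet()) {
            int count = entry.getValue();
            int charged = count - count / 2; // every second unit is free
            sum = sum.add(entry.getKey().price()
                    .multiply(new BigDecimal(charged)));
        }
        return sum;
    }

    public static void main(String[] args) {
        Product chips = new Product("chips", new BigDecimal("2.00"));
        // 3 units at 2.00, one of them free
        System.out.println(sumPriceWithBogo(Map.of(chips, 3))); // 4.00
    }
}
```

The deal logic sits directly in the loop body, which is the commenter's point; a cross-row deal (two different flavors) would need a second pass or extra state in either style.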


What you write could be written as:

   ...stream().
     map(e -> e.getKey().getPrice().multiply(new BigDecimal(e.getValue()))).
     reduce(BigDecimal.ZERO, BigDecimal::add)
The problem here is not the use of streams but that the author goes at the problem in a confusing and roundabout way.


That's very pretty to look at and easy to read. But it's a huge pain in the ass to debug when getPrice starts failing. You get a 42000-line stack trace and the only information is that it failed somewhere in that single long line.

The first thing you do is unravel all that crap to a normal loop so you can debug it properly.

And when you're done, you really don't want to go back to the stream way of doing it, just in case it breaks again and you need to start debugging once more.


   >> ...stream().
   >>   map(e -> e.getKey().getPrice().multiply(new BigDecimal(e.getValue()))).
   >>   reduce(BigDecimal.ZERO, BigDecimal::add)
> That's very pretty to look at and easy to read. But it's a huge pain in the ass to debug when getPrice starts failing.

The trick with Java and chained calls is that you have to break the chain across lines, so the stack trace can pinpoint the line it failed at.

so instead of the above, write the code as:

    ...stream().
                map(e -> e.getKey()
                          .getPrice()
                          .multiply(new BigDecimal(e.getValue())))
               .reduce(BigDecimal.ZERO, BigDecimal::add)
And the stack trace will tell you if the getKey() or getPrice() failed or the multiply(...) failed.


If someone would kindly explain this to google-java-format, life would be so much better for everyone involved.


IntelliJ IDEA does this formatting very easily, and you can also customize it pretty easily to fit your aesthetics.


Plus: you can put your formatting rules into your `.editorconfig` and put it under source control.


One downside of this approach is that it only works when you can require everyone who touches the code to use IntelliJ.


Friends don't let friends use Eclipse.

But less flippantly, what other tool is workable for Java?


IntelliJ is my favorite Java IDE, too, but I'm not in the business of telling people what editor to use.

I can see a company deciding to do that for internal projects. But if it's open source, I'm certainly not interested in creating a situation where it's difficult to comply with a project's coding standards without buying a $500 piece of software.


The community edition is free, and usable for most projects that aren't enterprise level.


Looks like it shouldn't when the fluent chain wouldn't fit on one line.

https://github.com/google/google-java-format/issues/341


> you have to do the chain on each line, so the stack trace can pinpoint the line it failed at

...this, to me, speaks to a huge tooling failure. The stack trace should have precise information about the region of the source file that the call was embedded in (line+column start+end), as opposed to merely the line.


I don't think you should be downvoted for this view, but my experience has been pretty good with streams; they took a little getting used to, but now I find they're generally a LOT more robust than the handcrafted stuff. Off-by-ones and NPEs are fairly hard to achieve, and if well done the semantics can be a lot clearer. A lot of enterprise code IS basically plumbing after all!

Where it goes wrong IMO is where a mix of styles and too many inline lambdas splatter a mess of logic into some god-method.

As other comments note, good formatting will help with intelligibility of failures - and IMO sensible logging (at least of error paths) makes it all rather nice to debug.


Why would getPrice start failing? If you have side effects or anything other than a trivial implementation inside getPrice, you have more fundamental problems with what you are building than whether to use a for loop or streams.


Why does any code start failing?

That’s not the point they were making. The point is that when you start chaining together so many calls, it can become difficult to debug. Prettier to write, but more difficult to debug. But that’s okay, you need to find the balance that’s appropriate for the particular project.


> but more difficult to debug.

I've found the stream debugger in IntelliJ to be very useful when looking at debugging streams.

https://www.jetbrains.com/help/idea/analyze-java-stream-oper...

The old plugin version of it has the same functionality (and better pictures) - https://plugins.jetbrains.com/plugin/9696-java-stream-debugg... (click the 'more' link)

Selecting a piece of data within the stream shows you how it moved through the stream.

The other part of streams, for me, is that with the discipline of "each line does one, and only one, thing" - and not putting too much complexity in a single map - I feel they force you to write simpler code that doesn't need much debugging. The question of "how did that data get into the stream?" is where most of the debugging comes from.


First.

If getPrice fails, the stack trace will start there. If it returns null (which it should not) and triggers an NPE, then the line:

   map(e -> e.getKey().getPrice().multiply(new BigDecimal(e.getValue()))).
Is just as dense as the original:

   sum = sum.add(entry.getKey().getPrice().multiply(new BigDecimal(entry.getValue())));
And the stack trace would be just as confusing.

Second.

The key thing with streams is to borrow from the functional programming paradigm: Split data and functions, avoid or isolate side-effects.

Do this correctly and there is a quite real plus in productivity.


Could I trouble you for an example or a reference on what this would look like refactored with "Split data and functions, avoid or isolate side-effects"? I would like to understand better but I don't know enough about functional programming to grasp your meaning just from the comment.


You might read up on pure functions [1] which are unable to produce side-effects when run.

Imagine you have two methods in Java:

  int add(int a, int b) {
    log.info("Adding two numbers {} {}", a, b);
    return a + b;
  }

  void doStuff() {
    add(1, 2);
    add(2, 3);
    add(3, 4);
  }
The result of add in doStuff is unused. However, add has a log statement which someone might be relying on elsewhere. The log line makes it much harder to judge the usefulness of this code, i.e., can you delete this call? It's impossible to know without understanding everything that might consume the log line. The log line is a side-effect of these methods.

In languages that understand "pure functions" there are optimizations that can be done by the toolchain (think automatic memoization, deferred computation, and much more) when only pure functions are called.

[1] https://en.wikipedia.org/wiki/Pure_function


I don't know any good resource. And not all functional programmers are good programmers; but try to understand what they are trying to do.

There is a bit of zen to it, less is more, in the sense that a language gets more powerful if it is more constrained. For example, if you know (by the type system or just coding conventions) that p.getPrice() never returns null, it is easier to reason about (prove, test, read) the code.

Likewise if you know that p1 == p2 implies p1.getPrice() == p2.getPrice() (that would be no side effects).

If, as someone suggested, you need to support some crazy localization, then don't put it into p.getPrice(). If you must, change the name to something telling and make its input explicit: p.calculateLocalizedPrice(locale). Or better, make it an explicit function (a static method, or maybe something sitting in a service) calculateLocalizedPrice(product, locale), and again have it be free of side effects.


I think definitions vary slightly, but the quality you mentioned (a == b => f(a) == f(b)) would be called a deterministic function [1] - very useful, often applicable or even type check enforced in functional programming languages.

Having no side effects is a different very useful quality - function doing exactly as specified and no more (I/O, setting variables ..). I'm not sure if it means not accessing global state, though it is usually better if both inputs and outputs are explicit.

All in, it is usually easier to reason about functions that are explicit, deterministic and side-effect free, yet I find it profoundly more valuable if it can actually be relied on (a known subset of) functions having those qualities.

[1] https://maksimivanov.com/posts/pure-functions-and-side-effec...


Because stuff fails?

Maybe getPrice used to be a static lookup from a map that couldn't fail, but then the Sales Team wanted to go multi-national and now it's a database lookup with multiple dependencies, that can fail.

"But why wasn't it caught in a code review" etc...

Have you actually worked with a big team ever? Why would (or how could) anyone (outside of Google) go through every dependency of the getPrice function and check that every use case is handling errors/exceptions properly?

Stuff breaks, code is read and debugged more often than it's written. Optimising stuff to be easy and fast to write is the wrong way to make maintainable code. Unroll your loops, add toggleable debug logging and add comments why stuff is done the way it is.


The method getPrice is not really correctly named if it did the things you describe. Naming things carefully will give you far more productivity than "unrolled loops, toggleable debug logging and comments on why stuff is done the way it is".


Functions aren't always refactored to their perfect names, for multiple reasons.

getPrice might start out as something that just gets the price; after 5-10 years it might be a complex process accessing some ERP systems.


> Stuff breaks, code is read and debugged more often than it's written. Optimising stuff to be easy and fast to write is the wrong way to make maintainable code.

This is exactly why people like collection functions. For loops can do anything, you have to spend more time reading and understanding the loop to build your mental model of what is happening. Mapping does one thing, transforms a collection into another collection. Same with filter, etc. If you are optimizing for readability, collection functions give way more information to the reader. Your approach is to optimize for debugging, which I'm not saying is wrong, but it's not optimizing for readability.
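A tiny illustration of the claim that each collection function names exactly what it does, so the reader gets one fact per line. The class name, method name, and data here are mine, purely for demonstration:

```java
import java.util.List;
import java.util.stream.Collectors;

public class MapFilter {
    // Each step states its intent: filter keeps, map transforms.
    static List<Integer> longNameLengths(List<String> names) {
        return names.stream()
                .filter(s -> s.length() > 4)  // keep only long names
                .map(String::length)          // transform to lengths
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(longNameLengths(List.of("cart", "price", "total"))); // [5, 5]
    }
}
```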


What map returns depends greatly on the lambda inside. So your collection of beans is changing into a collection of god knows what, and you have to keep that in mind while chaining - because it is nowhere visible.


Whatever exists inside the map lambda would have to exist inside the for loop as well. So if you're dealing with a confusing transformation, a loop doesn't offer you any extra tools for making that transformation more apparent to the reader. Loops have plenty of advantages (computer execution is more obvious, stack traces can be cleaner, easier concept for beginners to grasp), but I have never seen a loop be more readable than a well written functional composition.


Yes, and it tends to be more apparent what its type is. It also tends to have a name that helps understanding a lot. It is not even primarily a loop-vs-stream difference; this particular frustration is the fluent-API-vs-procedural difference.

It is the chaining that obfuscates in this case. Though, I really don't find functional style more readable in general.


If it were Python, I could probably write something like:

    sum(key.price * val for (key, val) in cart.items())
But the Java example makes me think "Wait, what?"

Personally, the lesson I'm drawing is that your library shouldn't actively fight the native syntax of the language you are using - just use what your language provides, that's the cleanest and easiest way to do it.

But I guess people's tastes are different.


Python's `sum()` works only on numbers and assumes an empty list sums to 0. Java's `reduce()` is quite a bit more flexible.

I use Joda Money and frequently find myself doing:

    final Money bucks = stream.map(Thing::getPrice).reduce(Dollars.ZERO, Money::plus);
If you don't pass in a baseline, you get optional:

    final Optional<Money> bucks = stream.map(Thing::getPrice).reduce(Money::plus);
Also, StreamEx improves the ergonomics of Java streams. The way I would implement the original example:

    EntryStream.of(cart.getProducts())
        .mapKey(Product::getPrice)
        .mapValue(BigDecimal::new)
        .mapKeyValue(BigDecimal::multiply)
        .reduce(BigDecimal.ZERO, BigDecimal::add);
Unfortunately BigDecimal is not especially ergonomic (it could use a `multiply(long)` method, which would eliminate the annoying `mapValue()` above). But unlike the python version of this, it will preserve scale. And streams work as-is on Money types.


What. Python's `sum()` works on anything that defines `__add__`, which can be numbers, strings, lists, or your own custom classes.

  sum([[1, 2], [3, 4]], []) == [1, 2, 3, 4]


Actually it seems sum is special-cased to fail for strings, but otherwise you are correct.


Half the problem is that the BigDecimal class is kind of horrid in Java, and so is Map.Entry. The language lacks features that would make them more workable, like operator overloading, implicit constructors and destructuring/pattern matching.

Now, the first two are very fair not to want in a language because they allow programmers to bring in an unlimited amount of user-defined complexity.

Some destructuring however, would just make it easier to work with what's already in the language and libraries.


I don't find 'reduce' to be more readable than + inside a loop.


I think your point is valid with a trivial example as shown in the article.

In a more complex example, with multiple steps and potentially multiple streams, using the Streams API allows the algorithmic and business logic parts of the whole calculation to be kept separate, more visible and more maintainable/adaptable as a result. Imperative code in such cases usually involves factoring out chunks of the for-loop into separate functions or classes which IME goes in the opposite direction.


Your advice is OK for when you are doing a one-off, simple thing.

In general, though, streaming abstractions leave that paradigm in the dust once things become just a tad more complicated.

Using streaming abstractions, you can reason about the flow of your data at a much higher level and create abstractions which would require 100s or 1000s of lines of code to duplicate.

For example, if he decides he needs to buffer the stream, then perform processing on it, then send it out into the world, he can do all of this super easily by using a few streaming combinators while everything is super clear and type-safe.


> create abstractions which would require 100s or 1000s of lines of code to duplicate.

I would like to see an example of this, I just don't think streaming libraries, or abstractions in general, can get rid of that much complexity. In most cases, abstractions can't really get rid of complexity, they just move it around.

The issue I have with streaming abstractions is that the implementation tends to be overly complex. For instance, Rx libraries tend to be thousands of lines of code, and they're often written in ways that make it impossible to trace code or understand a call-stack. It makes some things easier, but when you run into a problem it becomes a nightmare to debug.

I'm just not sure it's warranted a majority of the time. Those libraries tend to be huge because they have to cover every single use-case, and they work at a very high level of abstraction. When you're working with stream-like domains, it's often possible to implement the subset of what you would need of that streaming abstraction with normal code, and in a way which is much easier to understand and debug.


> In most cases, abstractions can't really get rid of complexity, they just move it around.

People say this pithy truism all the time, and it's just not true. You're probably just talking about indirection and encapsulation.

Real abstractions by their very nature reduce complexity. You could say "No true Scotsman" but I deal with actual abstractions in Haskell all the time and it only simplifies code.

So maybe Java is just deficient in its abstraction capability, but parametric polymorphism and pure streams sound like a good start to me. If a stream makes you uncomfy because it's doing O(n) things "under the hood" and you "like to know what the computer is doing", that's probably just a personal comprehension issue.


Who said I was talking about Java?

There are some abstractions which objectively reduce complexity: i.e. the C programming language abstracts over assembly, and essentially formalizes a subset of assembly in a way which reduces cognitive load for the programmer. This is a good abstraction, but this is not what streaming libraries do.

> If a stream makes you uncomfy because it's doing O(n) things "under the hood" and you "like to know what the computer is doing" that's probably just a personal comprehension issue.

If you have to reach for "you can only not like this because you don't understand this", it means you don't have a real argument. I have implemented event-stream-like systems before, and I am perfectly aware what they do "under the hood".

I can give you an example of how these libraries go wrong. I was working a few years ago on a project which made heavy use of event streams, and one of my colleagues opened a PR which passed all unit tests locally, but failed when the CI system ran the tests. After about half a day of trying to figure out the problem, we determined that it was because a different scheduler was being used when the tests were run during the CI pipeline, which resulted in some concurrency related issues. What's worse, it didn't fail reliably, but only intermittently, so this made it extremely difficult to track down. This is what happens when you hand over your flow of control to another system: it makes it very difficult to reason about.

Abstractions are good when they reduce boilerplate and provide a shorthand for monotonous work. When abstractions try to be smart or magical and do work for you, it's almost always a bad thing.


> Abstractions are good when they reduce boilerplate and provide a shorthand for monotonous work. When abstractions try to be smart or magical and do work for you, it's almost always a bad thing.

Depends on your environment ... and the abstractions.

Haskell has a lot of guarantees that C will never have. These guarantees often mean that building abstractions is a lot less error-prone and potentially a lot more useful. In general, stronger language semantics give the implementation of the language more leeway to optimize. All of the undefined behavior in C is a pretty good reason to step extremely carefully while doing anything, especially creating abstractions.

> the C programming language abstracts over assembly, and essentially formalizes a subset of assembly in a way which reduces cognitive load for the programmer.

Languages like Haskell attempt to abstract over computation in general; it makes C and Java seem a lot closer together on the spectrum of languages. I'm by no means an expert Haskell'er, but just dabbling with it has been illuminating, and I've written code in various languages since the 90's.

I hope the GP comment "personal comprehension issue" hasn't thrown you too far off. I think there might be a more tasteful way to express what they were thinking but I can't speak for them.


> you run into a problem it becomes a nightmare to debug.

this happens sometimes due partly to Java being a PITA to do functional programming with, and partly to the initial programmer of the streaming logic not breaking it down into composable functions.

And with most people being more familiar with imperative style, it makes the functional style harder to maintain _for them_.


I'm not just talking about Java - I've also seen this in mobile projects in Swift for example.

> And with most people being more familiar with imperative style, it makes the functional style harder to maintain for them.

It has nothing to do with functional style. It's perfectly feasible to write functional code with minimal abstraction which is easy to trace and understand. The problem is in handing over your flow of control to an overly complex black box of an abstraction layer.


All of that is good when you need it. None of that is good when you don't actually need higher level reasoning about data.


Why do you need to program at a higher level than assembly?


Too much abstraction harms as much as too little abstraction. And Java is exactly the environment that produced "too much abstraction to the point of hurting maintainability" in the past.

If you need higher level abstraction, you should use it. If you don't need it, you absolutely should not use it.


This x1000. We are not writing code for ourselves, we write code for our employers, and it behooves us all to keep it as simple and maintainable as possible. Will you be there in 10 years to explain it? Will you want to be, if you are? If this is your open source project or whatever, do as you please; otherwise follow the principle of least abstraction.


The first point I won't refute. Some people like imperative style and that's fine. I personally don't and find the stream based approaches much simpler.

As for your second point, you could easily add such functionality to the code with a stateful mapping operation that sets the price to 0 for every other item encountered. In your code, you'd have to add another pass of the loop or you'd have to stick the logic for computing the price inside your for loop.

Personally, I've found that decomposing problems into stream based pipelines makes it much easier to decorate additional functionality than imperative code but that's just my personal experience.

  public PriceAndRows getPriceAndRows(Cart cart) {
    DiscountApplier discounts = new DiscountApplier();
    return cart.getProducts()
        .entrySet()
        .stream()
        .map(CartRow::new)
        .map(cartRow -> discounts.apply(cartRow)) // Stateful discount application logic
        .collect(Collectors.teeing(
            Collectors.reducing(BigDecimal.ZERO, CartRow::getRowPrice, BigDecimal::add),
            Collectors.toList(),
            PriceAndRows::new
        ));
  }


The "teeing" combinator is giving you separation of concerns. There are two separate calculations, potentially defined in separate functions, that are composed into a single operation that is performed in a single pass of the stream. Sure, you could write a monolithic imperative for-loop that does it all, but such an approach will not scale.
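A minimal, self-contained sketch of what Collectors.teeing composes: two independent collectors, each its own concern, merged after a single pass. The class name, method name, and data are illustrative, not from the article:

```java
import java.util.List;
import java.util.stream.Collectors;

public class TeeingDemo {
    static double average(List<Integer> xs) {
        return xs.stream().collect(Collectors.teeing(
                Collectors.counting(),                    // first concern: how many
                Collectors.summingInt(Integer::intValue), // second concern: total
                (count, sum) -> (double) sum / count));   // merge the two results
    }

    public static void main(String[] args) {
        System.out.println(average(List.of(2, 4, 6))); // 4.0
    }
}
```

Each downstream collector could be defined and tested on its own; teeing only wires them to the same element flow.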


You don't want the "buy one get one free" to apply directly to the cart subtotal. You want to show the customer which entry is free (or discounted), so they can see that the rebate is applied correctly.

You probably want to check all 'rebate rules' attached to all products in the cart whenever one is added/removed. An active rebate will then modify the price of the cart item, and the sum method can be stupidly simple.


> Consider how code would have to change to calculate a deal, such as Buy One Get One Free.

There's nothing wrong with passing in a function other than `CartRow::getRowPrice`. That's one of the beautiful things about functional programming, you can alter behavior by passing different functions as parameters.

However, as someone who writes ecommerce code that deals with exactly this situation all the time, I can tell you that the question doesn't really make sense. Discounts are usually represented by data in the cart - either a special field, or a negative line item that gets summed along with the others. You don't just mysteriously have a second cheaper item in the cart.


The original is almost impossible to debug when something inevitably goes wrong too.

I have encountered dozens of places where streaming API calls have been reworked into imperative code by whoever ends up maintaining it, just so they can figure out why the hell it is breaking in some unforeseen edge case.


Side-effects do not mix with lazy on-demand streams! This is unfortunately a problem with bringing functional programming constructs to a language with idiomatic pervasive mutation.
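The laziness half of that can be shown in a few lines: an intermediate operation like map() runs nothing until a terminal operation pulls elements, so a side effect buried in it fires later than a reader might expect (or never). This demo, including the run() helper, is my own illustration:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyStreams {
    // Returns {side-effect count before terminal op, count after}.
    static int[] run() {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> doubled = Stream.of(1, 2, 3)
                .map(x -> { calls.incrementAndGet(); return x * 2; });
        int before = calls.get();             // still 0: map has not run yet
        doubled.collect(Collectors.toList()); // terminal op pulls the elements
        int after = calls.get();              // now 3
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println(counts[0] + " then " + counts[1]); // 0 then 3
    }
}
```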


BTW, this is a non-issue in C# since 2.0, due to the `yield return` syntax.


Sorry, I'm skeptical -- how exactly does a 'yield return' resolve the divergences between functional programming & mutable data structures?

My layman's impression was that 'yield return' was largely syntactic sugar around an iterator, rather similar to Java.


The comment you responded to was talking about the debugging experience.


>All in all, why is the code not just:

Because it's hard to review; too many things are happening on one line.

What if you have to negate the multiplication? You're messing that line up even more; with streams, you're adding one line that's easy to read, easier to review, easier to track in changes.


> What if you have to negate multiplication?

Add sum *= -1; on the next line if you think the line is already too complex.

Or just do return -sum;


The example is using the BigDecimal class, where arithmetic operators don't work; you need to call .negate() on the object, or multiply by new BigDecimal("-1"). Negating inside the loop will give a different result than multiplying at the end in the return line. Nevertheless, you're adding ANOTHER operation to a line which is already complex and hard to read.


The exact same thing can be said about the stream version. There is no meaningful difference between the two when it comes to this change - either you do it in the middle of the loop or add a line for it.

Except that the stream version is harder to debug and read, but more effective on a potentially unlimited stream.


You add a new line with .map or .peek before .collect


Which is oh so different than adding the new line to the imperative code? Adding a new line to imperative code tends to be simple.


A new .map line has limited scope of access, and thus highly restricted possibilities.

The same line in a for-loop can access anything, including things from other lines.

The primary benefit of these kinds of functional chains is that each step is highly constrained in their capabilities, mainly by their function name and access scope, so you can better verify that it really does do what it says on the tin.

A for loop's content ultimately has to be read thoroughly, because anything goes - including modifying lists unrelated to the list in question.


The price for that limited scope is that it is much harder to figure out what the actual parameter of the lambda going into the map function is. It is also not even all that limited: in a lambda, everything from the surrounding scope is visible too.

My personal issue with these is that it is all harder to reason about, harder to read, harder to debug. And each time there is a real issue, I have to unpack these into procedural code, fix it, and then encrypt it again.

I don't see fewer bugs in the code since we started using these. Bug counts did not drop.


> The price for that limited scope is that it is much harder to figure out what is actual parameter of lambda that goes into map function

I'm not sure why that would be the case... it's whatever came out of the previous action? You're doing list operations... always.

Nested list operations get convoluted fast, so I generally avoid them (break them up into separate iterations), but otherwise it's fine. The only thing I can remember complicating things is issues with type conversion, but the IDE usually tells you what's up.

The loss of print debugging and useful debugger support is a pain but usually resolved by commenting out half the manipulations and printing immediately.

Otherwise IME it's a lot simpler to see a series of independent operations on the total list than it is to follow a for loop interacting with one element at a time.

But personally I switch between the two styles freely, depending on which one is “cleaner” so I likely naturally avoid the more convoluted/difficult cases.


Fluent interfaces spread the complexity of reading complex operations across multiple lines; you're squeezing everything into one line, where tracking a few characters of change will be difficult to review, difficult to compare across changes, difficult to bisect for bugs (a change in a complex line vs. the addition of a new simple line).

Adding a new line is a simple change with no increased reading complexity. Modifying an already complex line to add a new operation will only increase complexity and the change log of that line, making it all harder. That's all I'm saying: I'd rather read 11 simple lines than 4 complex lines.


I can add one new line into the procedural code that does the exact same thing.

It won't increase reading complexity either. It won't even force me to think about hidden map parameters. It will all be visible directly.


While I understand the reason for Streams and collectors, the Java implementation is really ugly and obscure. It is a language in itself. I want to like them, but I really can't.

On the other hand, in Clojure they are totally natural.


beauty is in the eye of the beholder


streams are best used against data of unknown size (potentially infinite) that isn't necessarily all held in memory or cannot fit within memory limits.


> All in all, why is the code not just:

In the other code, you could substitute parallelStream for stream and have it execute in parallel for "free".
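The "free" parallelism the comment refers to really is a one-word substitution; a minimal sketch (with an artificial sum, not the article's cart code):

```java
import java.util.List;

public class ParallelDemo {
    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 4, 5);
        // Sequential version
        int seq = nums.stream().mapToInt(n -> n * n).sum();
        // Parallel version: only the stream-creation method changes
        int par = nums.parallelStream().mapToInt(n -> n * n).sum();
        System.out.println(seq == par); // true
    }
}
```

This only works "for free" when the collector/accumulator is associative and side-effect free, which is exactly what the imperative for-loop version does not guarantee.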


Another point to consider that is strictly superior is how streams offer ~effortless parallelism.


Not Java, but an example of streams enabling parallelism in another language - https://developers.redhat.com/blog/2021/04/30/how-rust-makes.... Change 4 characters and it goes from saturating one core to saturating all.


True, sometimes imperative loops are easier than streams. But your code is not a suitable substitute in this case where the constraints are clear. He does want the total sum, and he does want the list, and he does only want to iterate once.


In Haskell terms, Collectors have an Applicative instance, and teeing corresponds to the liftA2 function:

> liftA2 :: (x -> y -> z) -> Fold a x -> Fold a y -> Fold a z

The "teeing" functionality is actually one of the "selling points" of the Collector-like library, not a hidden gem:

> This module provides efficient and streaming left folds that you can combine using Applicative style.

It's curious how the same abstraction can have different emphases across languages.
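On the Java side, the liftA2 correspondence looks like this: Collectors.teeing takes two downstream collectors (the two Folds) and a merge function (the x -> y -> z). A small self-contained sketch with an artificial average calculation:

```java
import java.util.List;
import java.util.stream.Collectors;

public class TeeingDemo {
    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        // teeing(foldX, foldY, f) corresponds to liftA2 f foldX foldY
        double avg = xs.stream().collect(Collectors.teeing(
                Collectors.summingInt(Integer::intValue), // Fold a x
                Collectors.counting(),                    // Fold a y
                (sum, count) -> (double) sum / count));   // x -> y -> z
        System.out.println(avg); // 2.5
    }
}
```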

Collectors are also Comonads:

- You can always extract a value of the type that parameterizes the Collector, just by "closing" it.

> extract :: Fold a x -> x

- You could, in theory, "duplicate" a Collector<X> and get a Collector<Collector<X>>. This seems like a dumb function, but it would allow you, for example, to feed different Streams to the "same" collector: duplicate it before consuming a Stream, then take the resulting Collector, duplicate it again, pass it to another Stream...

> duplicate :: Fold a x -> Fold a (Fold a x)

http://hackage.haskell.org/package/foldl-1.4.11/docs/Control...


I imagine part of the reason is because Java has the idiomatic usage of

list.stream().functionalOperations.collect(Collectors.toList())

whereas in Haskell you can just do whatever on your lists and it should fuse... or so I thought. Clearly I don't know too much about which way is the "right way" in Haskell since I haven't used Control.Foldl before and I just sort of assumed fusion would happen at least for most list operations.


You should look into iteratees. It's the same insight: if you have the concept of something that receives values and eventually yields a value of a given type, that's a structure that has some very nice algebraic properties (e.g. they're monads).


I'm not sure you could implement a useful flatMap() for Collector-like types, at least without forcing the collector to hold all the received values in memory, which would defeat the purpose.

It would be like a function that takes

- a Collector that produces an X

- a function that takes an X and returns a Collector that produces an Y

and returns a Collector that produces an Y.

The thing is: while being fed, the result Collector should first feed the initial Collector and, at some time, "switch" to the Collector produced by the function. But when to perform the switch?


Iteratees use a slightly different interface: when you feed one a value you get the next iteratee state (which is either "done" or "in-progress", roughly - you can feed an EOF if you want an iteratee to finish, and you don't have a "current" value until then) and any unconsumed values (possibly all of them). It's counterintuitive to start with, but it makes for a really nice representation.


Here's a possible implementation of "duplicate": https://stackoverflow.com/a/67475265/1364288


Just in case it's not obvious to everyone, the name is a reference to the tee(1) [1] shell command, part of "coreutils" [2]. The manual page begins:

tee - read from standard input and write to standard output and files

It's a proper vintage tool, listing RMS as an author.

Edit: less repetitive repetition, added coreutils link.

[1]: https://man7.org/linux/man-pages/man1/tee.1.html

[2]: https://www.gnu.org/software/coreutils/


The name "tee" is also a reference to a physical piece of plumbing which looks like the letter T, and allows connecting one "input" pipe to two "output" pipes.


Of course, the command evokes an association with the Unix tee command which fulfills a similar purpose: you can use it to pipe the output of some other command into a file while having it be printed to stdout at the same time. So, for instance:

    grep "error\|warning" log.txt | tee /tmp/issues.txt
would find mentions of the terms "error" and "warning" in a file and both print them to the terminal window as well as write them to the file /tmp/issues.txt. This can be quite handy at times.

According to Wikipedia, the name "tee" is a reference to a T-splitter used in plumbing, which makes sense.


You can write either imperative for-loops or a set of connected stream-processors. An arbitrary set of connected "streamers" can always be converted to imperative code and I assume that is what is happening under the covers.

But the reverse is not true, an arbitrary set of for-loops can not be translated into a set of streamers. Right?

That means that the structure of your program is much more constrained when you compose it out of streams, than if you compose it out of arbitrary for-loops.

And if you know that your program obeys a set of constraints imposed on it by the connected streams, the program becomes easier to understand, because you can RE-use your knowledge of how those streams always work to understand every component of the system, meaning every stream-component of it.


Multiple leaps of faith. "More constraints mean easier to understand" is a little too reductive.

Replace streams with "operators". All your program code can be written with operators like dot or plus, and since those operators have constraints, the whole program will be simpler, right?

If that were true, we'd still be writing code in assembly, managing registers by hand. Because hey, just 256 registers means you can reuse your knowledge of registers everywhere, which means programs will be easier to understand, right?

There are examples in this thread showing real cases where plain code is better.


All streams work very similarly. Much more similarly than if you mix and match all kinds of operators together.

Think Unix pipes: they are easy to understand because they all behave similarly.

Yes, sometimes plain code is better, but in cases where streams fit the job, they are better.

I guess the main point is that streams operate on multiple elements and they operate the same on every element. Therefore you don't need to reason what happens to every element that goes through the pipe.

It is a bit like adding and multiplying matrices, you can understand the calculation without having to mentally follow how each matrix element is processed.


There is a talk from Venkat Subramaniam if you want to explore Collectors to death.

Here is the part about teeing https://youtu.be/pGroX3gmeP8?t=5499


I don't think I've watched this one! Thanks!


At first, I thought "this article just made me realize I didn't use Python's itertools.tee to its full potential".

But then I tried to think of code where I would rather use this than a list comprehension, or yield and more manual control flow.

And I couldn't.

Those streams are elegant when the business logic flows perfectly like a river. Unfortunately, reality is messy, and production code will have matching, conditions, casting, extractions and transformations all over the place, leading to:

- very long chains of calls

- hard to change code when feedback pushes for it

- limiting your tooling (especially debuggers) to the stuff that has exceptional support for chaining, and that's rare


This article made me realise that I don't know how to perform step-through debugging on chained of operations on Streams in Java, so I looked it up. I found out that there's Stream Trace Dialog in IntelliJ IDEA [0]. I guess it proves your point that working with Stream API requires additional tooling in cases when the code doesn't 'just work'.

[0]: https://www.jetbrains.com/help/idea/analyze-java-stream-oper...


The ideal scenario with functional programming like this is to reason about the code, and maybe algebraically model it mentally so that you "know" it works.

But I find a lot of programmers don't do that; instead they write a first version which they don't truly (or completely) understand, and then use step debugging to tweak the program until they get to a version that produces their desired outcome.


Ideal scenarios rarely exist IRL. You may be in a rush. Inexperienced. Tired. On a problem you don't completely understand yet. With incomplete information. Exploring data or the problem space. Experimenting with an API. Trying to debug the code your colleague wrote, or a bug in the underlying lib.

That's why practicality beats purity in the vast majority of situations.

There is a place for purity, but you need a hell of a setup.


The way out of this is called unit testing. Code like this is just over engineered crap without it. Most of the effort is writing good tests. Show me the test that tells me in a concise way what this is actually supposed to do.

Mostly purity in this context boils down to weird combinations of premature optimization or complete disregard for that. Mostly it doesn't matter of course since code like this runs on trivial amounts of data so giving the garbage collector a little more work with silly stream objects, boxing/unboxing, etc. does not matter. Code like this does not matter, at all. Unless it's wrong. Hence the need for tests. Without tests it's just more likely to end up in tears. With tests, it doesn't really matter what the code looks like as long as the tests pass. If it's convoluted without tests, it's a problem waiting to happen.


Being tired won't make you write a proper test, and tests won't make you less of a beginner or make you explore data more efficiently, etc.

In fact, it's very hard to write a test first when you are exploring.


Completely agree. Streams are good for when you have a (you guessed it) stream of things where you need to do a limited and well defined number of things.


I think this is made more complex and confusing than it should be.

Notice that the author uses the word price (or row price) for two different things: the price of a product and the total (price * quantity) in a shopping cart line (cart row).

The set of CartRow can be calculated as a straightforward entrySet().map(...) of the shopping cart in its products-map form (Map<Product, Integer>).

The PriceAndRows object is really just the total for all the cart rows plus the union of all the cart rows. Both things can be calculated as a straightforward map / reduce.
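To make the "straightforward map / reduce" concrete, here is a self-contained sketch. The Product and CartRow shapes are hypothetical, reconstructed from the thread's description of the article, not the article's actual classes:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CartDemo {
    record Product(String name, BigDecimal price) {}
    record CartRow(Product product, int quantity) {
        // total for this line: unit price * quantity
        BigDecimal rowPrice() {
            return product.price().multiply(BigDecimal.valueOf(quantity));
        }
    }

    public static void main(String[] args) {
        Map<Product, Integer> products = Map.of(
                new Product("apple", new BigDecimal("0.50")), 4,
                new Product("bread", new BigDecimal("2.00")), 1);
        // map: each entry becomes a CartRow
        List<CartRow> rows = products.entrySet().stream()
                .map(e -> new CartRow(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
        // reduce: sum the row prices into a total
        BigDecimal total = rows.stream()
                .map(CartRow::rowPrice)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        System.out.println(total); // 4.00
    }
}
```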


In theory, the streaming APIs are meant to make it possible to operate on streaming database queries, where the entire result set is simply too large to fit into memory. A single, unified API over both in-memory data structures and streaming queries is supposed to make it easy to do both with the same set of APIs, without needing to reason about the details or materialize the entire intermediate result in memory [1].

But I find the opposite to be true; it's hard to reason about where/when the entire result might need to be collected, and the streaming APIs cannot really match the query language underneath (SQL, e.g.). Instead, streaming APIs are frankly confusing, and less efficient than just doing the straightforward for loops, IMHO. Especially when things get complicated, with multiple joins and map/reduce.

[1] Another way to achieve the incremental streaming result effect is to write everything in terms of generators. It is sooo much clearer to see a loop over a data structure and a yield to know how much the computation is actually incrementalized, IMHO.


As an aside, I’ve recently been onboarded into a heavy Java ecosystem (backend, Java 8). What are the best resources to follow for growing in this language - and not just mapping patterns from one language to another?


C# Aggregate(..) extension method (linq)?


That’s just reduce. Teeing is basically creating multiple streams from a single one and after doing something with those “branches”, aggregating them.


Aggregate does the job just fine in this case. Multiple streams are not needed.

    .Aggregate(new PriceAndRows(), 
    //here result is the PriceAndRows instance and next is a cart row.
    (result,next) => 
    {
       result.Price+=next.Price; 
       result.Rows.Add(next);
       return result;
    })
I'm trying to think of a better use case where multiple collectors would be cleaner than LINQ but LINQ has a lot of tools in the toolbox, SelectMany, Aggregate, temporary anonymous types, etc.

However, even in the Java side, the example could be done with reduce alone, I think.


> However, even in the Java side, the example could be done with reduce alone, I think.

That’s my point: teeing itself is not aggregate/reduce. If after the branching the streams differ in size, reduce no longer applies, for example.


But my point is that Aggregate and reduce can handle that in this case because they both can simply add to the PriceAndRows instance incrementally. You can sum the total and build the list in the body of the Aggregate/reduce method. Teeing is pointless here. There's no need to use another collector.
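The Java counterpart of the C# Aggregate snippet above is a mutable reduction via the three-argument collect(). A self-contained sketch, with hypothetical Row/PriceAndRows shapes standing in for the article's classes:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class ReduceDemo {
    record Row(String name, BigDecimal price) {}

    static final class PriceAndRows {
        BigDecimal price = BigDecimal.ZERO;
        final List<Row> rows = new ArrayList<>();
    }

    public static void main(String[] args) {
        List<Row> input = List.of(
                new Row("a", new BigDecimal("1.50")),
                new Row("b", new BigDecimal("2.25")));
        // Single pass, no teeing: accumulate the sum and the list together
        PriceAndRows result = input.stream().collect(
                PriceAndRows::new,
                (acc, row) -> { acc.price = acc.price.add(row.price()); acc.rows.add(row); },
                (a, b) -> { a.price = a.price.add(b.price); a.rows.addAll(b.rows); });
        System.out.println(result.price); // 3.75
    }
}
```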


Would the Python3 version be:

map(merge_func, zip(X,Y))?


It looks to be the inverse of that, i.e. splitting a single stream of stuff so two different things can consume it. In Python you'd do something like that using itertools.tee, though I think the Java API goes about it quite differently.

https://docs.python.org/3/library/itertools.html#itertools.t...


No. In idiomatic python, you would not use a functional approach for something like this, but if you really wanted to, you could do:

     def teeing(reducer1, reducer2):
         return lambda acc,elem: (
             reducer1(acc[0],elem),
             reducer2(acc[1],elem)
         )


    functools.reduce(
        teeing(
            (lambda l,elem: [elem] + l),
            (lambda l,elem: elem.price + l)
        ),
        cart.getProducts(),
        ([], 0)
    )


It is quaint how there is supposed to be an "idiomatic python" just as mypy, "pattern statements", walrus operators, etc. are being added.

"Idomatic python" is dead, and lives now only as a pretty dumb ideology which states: everything must be phrased as a naively-typed naively-imperative program. Python is being made, retroactively, a bad imperative programming language: largely because its creator always thought that's what he'd made (he is wrong).

It's a great tragedy "pythonic" has become this: a back-reaction against the times (of increasing adoption of functional programming driven by increasing data-transformation needs).


That seems a very broad statement. Are there any such articles on this topic? I don’t see how Pythonic as a concept has to be mutually exclusive with new language features.

To me, the features you cite seem to fit the Pythonic approach intuitively.


Ah I didn't read their description well and didn't realize it was doing reductions.


Can somebody provide a link what this is all about? What's the objective?



