On the other hand, when companies and maintainers are honest about the limitations, it allows for a much easier conversation about how those limitations affect our products and how to trade off mitigation approaches against risk.
When the vendor claims there's no need for developers to use their brains they're all too eager to believe it.
I've often heard wisdom like "it's irresponsible to use Kafka when you could just use RabbitMQ" but why does that statement not work equally well in the opposite direction?
It seems like some "you don't need this fad technology" sentiments don't provide a whole lot of justification for sticking with the unfashionable choice. Does it all come down to "the devil you know?" What if you aren't that experienced with either choice?
It could, in some circumstances. Generally, though, the two have only a very small overlap with respect to messaging use cases. One was designed to be general purpose, solving a variety of MQ use cases and serving even "dumb" clients. The other was designed very specifically around a paradigm that imposes a high burden of state tracking on clients.
> It seems like some "you don't need this fad technology" sentiments don't provide a whole lot of justification for sticking with the unfashionable choice.
The "unfashionable choice" that works doesn't need justification. The new shiny one does.
Why? That's what I'm asking.
- More people who know how to operate it
- A greater portion of its failure modes that are known, with workarounds
- Mature tooling and libraries
New technologies need a compelling reason that their pros will outweigh the above incumbent benefits.
Yeah well that's a poor sales strategy. Being honest does not pay the rent.
There's also a lowest-common-denominator effect: the other guy's marketing makes false claim X, and if your marketing doesn't, you are at a big disadvantage. Most of the time competitors aren't dumb enough to make a claim so blatantly false as to be legally actionable; as long as they steer clear of that high bar, they stand to gain a lot by making a misleading claim. (Note: they may gain a lot even if they can be held legally accountable, since that is a slow and expensive process, and the expected penalty may fall far short of the expected profit.)
1) Their product hasn't launched yet.
2) They desperately regret using Mongo and are trying to get rid of it.
Everything is a mind game. People have identified their use of things like Mongo as putting them on the forefront of a developing technology; it makes them feel important and interesting. Try taking that meaning away from them and see how it goes for you. The practicalities of actually using the thing hardly matter.
And is the guy who initially advocated for Mongo going to show up with his tail between his legs and admit he made a mistake? Nope, even if he wants to, that would be a big hit for him career-wise and after our mid-20s most of us have been disabused of our egalitarian notions and know better than to do that.
RethinkDB is the "good engineering" counter to MongoDB. Didn't oversell, worked hard to build a world-class product that targeted the same general product class. Compare for yourself and see what you get by proceeding with an engineering emphasis.
Marketing is mandatory, and developers are naive if they believe they or their field is immune.
The recent discussion between Sam Harris and Scott Adams might be interesting to you. Take into account that, to Sam Harris (a strict rationalist), "moral" and "good" are synonymous with truth, whereas Scott Adams is a consequentialist/utilitarian.
Admit your mistake, clarify that you really mean "idempotency" or "effectively once" (and only if you stay completely within the bounds of Kafka), and move on.
It's becoming a bit of a joke having to combat their fairy dust in my profession.
Another objection I’ve heard to this is that it isn’t really “exactly once” but actually “effectively once”. I don’t disagree that that phrase is better (though less commonly understood), but I’d point out that we’re still debating the definitions of undefined terms!
I have to say I agree with this.
How do the two guarantees, "effectively once" and "at most once", differ?
"Effectively once" implies "you got the same thing 1000 times but ignored 999 of them because you already got it."
With an "exactly once" guarantee, I could send you a stream of integers (1,2,3,4,5) and you could blindly/naively/simplistically add them with no special concern and be confident that your answer of 15 was correct.
With "effectively once," you'd have to keep track of what you've seen before so you know not to add 4 an extra 6 times and come up with the wrong sum of 39.
With "at most once" you may be sitting around with a sum of 0 and think that's correct.
Surely Kafka isn't claiming "exactly-once" semantics AND availability? I thought the claim was that they will do exactly-once, or none-at-all in the event of an outage (until the outage is cleared, at which point you'll get the messages you were waiting for).
That seems solvable by consensus - indeed, it's the equivalent of what Zookeeper offers.
What am I missing here?
* What about network failures? In the FLP model the network is reliable, so there are no network failures.
* What about node failures? In the FLP model node failures are permanent, so there is nothing illuminating about saying you cannot deliver a message to a node that is permanently offline.
* What if node failures were transient? If the network is still reliable and state transitions are atomic, then failures are completely unobservable.
* What if state transitions are not atomic, and you cannot process a message and record that it has been processed in a single step? That would mean exactly-once delivery is impossible even within a single node, and has nothing to do with the distributed nature of the computation.
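A hedged sketch of that last point (the names and storage are invented stand-ins, not any particular system): if "apply the message" and "record that it was applied" are two separate steps, a crash between them means a redelivered message gets applied twice, with no network involved at all.

```python
# Non-atomic state transition on a single node: apply, then record.
applied_ids = set()  # pretend this is a durable record of processed ids
balance = 0          # pretend this is durable application state

def handle(msg_id: int, amount: int, crash_between_steps: bool = False):
    global balance
    if msg_id in applied_ids:
        return                      # already applied; ignore redelivery
    balance += amount               # step 1: apply the message
    if crash_between_steps:
        raise RuntimeError("crash") # simulated crash before step 2
    applied_ids.add(msg_id)         # step 2: record that it was applied

handle(1, 100)                      # normal case: applied exactly once

try:
    handle(2, 50, crash_between_steps=True)  # crash between steps 1 and 2
except RuntimeError:
    pass

handle(2, 50)   # sender redelivers; msg 2 was never recorded, so it applies again
print(balance)  # 200, not the 150 you'd expect from exactly-once processing
```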
FLP says it's always possible to not achieve consensus, but it says nothing about the probability of it. In practical systems, unless you have a partition such that no quorum of nodes can talk to each other, the probability of not reaching consensus rather quickly is effectively 0. Such partitions are rare in real systems (they basically require multiple data-center failures or multiple fiber cuts). You are much more likely to run into other problems that affect your availability, like code bugs or failed isolation between some components.
It says that the probability is not zero. That is a very important distinction for some people.
> You are much more likely to run into other problems that affect your availability, like code bugs or failed isolation between some components.
All the more reason not to claim that your system provides the guarantee.
I'm not a mathematician, but I'm pretty sure it doesn't even say the probability of failed consensus must be nonzero. For example, the Cantor set shows that it is possible to have a set with infinitely many elements but zero measure. That is analogous to an infinite number of interleavings where consensus doesn't occur, but the probability of hitting any of them being zero.
Sure, in real-world systems the probability would always be nonzero, but when it's still so close to zero it practically doesn't matter. Which is why, in the real world, people do build very reliable distributed systems out of unreliable components.
Getting into a quantitative evaluation of the likelihood of consensus is beyond my ken, but different people have different beliefs about what constitute "in real systems", and different interpretations of "practically doesn't matter". For example, the fact that you are talking about data centers rather than satellites suggests (to me) that your beliefs about the scarcity and transience of partitions may not generalize.
We can afford to be clear about what is true and what isn't, especially when trying to build reliable distributed systems.
Note: Ben-Or's consensus algorithm uses randomization to drive the probability of non-termination to zero (it terminates with probability 1), but for any finite length of execution the probability of not yet having terminated is still non-zero.
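For intuition, here's a toy model in Python (a simplifying assumption that each round decides independently with probability p, not Ben-Or's actual analysis): the chance of still being undecided after n rounds is (1-p)^n, which shrinks toward zero but is strictly positive for every finite n, which is all FLP requires.

```python
# Toy model: assume each round of a randomized consensus protocol decides
# with independent probability p (an illustrative assumption only).

def prob_not_terminated(p: float, rounds: int) -> float:
    """Probability of still being undecided after `rounds` rounds."""
    return (1 - p) ** rounds

for n in (1, 10, 50, 100):
    print(n, prob_not_terminated(0.5, n))
# Shrinks geometrically toward zero, yet is non-zero for every finite n.
```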