What a meaningless parade of speculative bullshit. There's a science for discussions like this -- it's called corpus linguistics. There are people who have spent years of their lives trying to understand observed patterns in behavior using real data and statistical analysis. This is a handful of anecdotes that even the most generative of armchair linguists would roll their eyes at.
Back in January I went through the effort of outlining a solution that was approved by the maintainers before implementing it. After I provided a PR, I responded to revision requests by the maintainers, and still haven't seen this change go into the project.
It's a simple change. If this feature isn't the architectural direction Docker wants, they need to close the issue and reject the pull request, instead of changing the project over and over again so that I have to maintain a PR that's over three months old.
The fact that it's still open probably means it is under some level of serious consideration. It was opened before they released Machine (and Swarm?), so maybe they just didn't know how or when it should fit in until the dust settled. Agree that they could have said something to this effect, though.
Hey can we not just make up stacks that spell things?
Believe it or not, each one of these components is a choice you should make an independent, educated decision about, and then you should make sure that those educated decisions integrate well. Integration with best-in-class solutions is A criterion, but not the only one.
Like, can you imagine someone using a stack like this to build a to-do list app, just because?
Also, what I'm seeing is someone lumping one of the best graph databases I've seen in with some utter schlock technologies just to force a catchy acronym.
Holy shit! Stop the presses! Enriching source data by filtering out low-signal information and selecting for high-signal information improves unsupervised clustering?
It's great to see physicists getting into other disciplines to show everyone else how wrong they are. As we all know, when you study physics and math, you're at the top of the reductionism pyramid, so you don't even really need to worry about having a background or experience in the thing you're studying. You just throw data at a formula, and if it doesn't work, you get to go to town berating professionals in the field using that formula. That's how it works, right?
Pre-processing your topic-modeling data is a prerequisite to getting good results. This is fairly common knowledge. Not pre-processing is certainly the more naive approach. Computational linguists have backgrounds in things like syntax, semantics, and discourse, so they know which components of language to select for and how to format them without misrepresenting the source data.
This article characterizes a straw man to tear down in the service of advertising a proprietary solution. It's basically an advertisement with a scientific reference for good measure. We call these "white papers" -- not serious journalism.
I can make these assertions from personal experience. I used latent Dirichlet allocation on high-fidelity natural language data to provide topic modeling for hundreds of thousands of wikis. We used the data for ad optimization as well as recommendations -- both of which provided statistically significant improvements in engagement. The approach worked. The recommendations were reproducible. I used all open-source software.