How to (not) invent concurrent algorithms (concurrencyfreaks.blogspot.com)
36 points by greghn 7 months ago | 16 comments



Good Lord. Someone else independently came up with something similar to what we came up with a while ago and didn't credit us. How sad. And by the way, they are idiots and we are so much smarter than them. Look, we even wrote a paper.


Indeed, and even worse: lots of people have probably used the same algorithm (which is quite basic, in the end) for plenty of things before, without making a fuss about it. But "we wrote an academic paper about it, so we consider ourselves its inventors", and "how come we don't get credit for being geniuses".


Actually read the article.

That is neither the point the author makes, nor the tone he takes.


> Turns out, their concurrency control is a poor-man's version of Left-Right

> Obviously, it's a bit sad when due credit is not given where it should be, but smart readers have added to the blog's comments and on hackernews that what is being described is indeed Left-Right (or a simpler but not as powerful version of it).

> This is ok because the good stuff will be re-discovered several times over, and Left-Right is certainly one of those gold nuggets.


IMHO the tone was quite annoying and overshadowed any good point the article might have made about the complexity of proving concurrent algos.

But slightly on a tangent, re: the QuestDB algo, wouldn't hazard pointers solve the concurrent map problem in a cleaner way and avoid a globally contended reader counter?
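
For what it's worth, here's a rough sketch of the hazard-pointer read path that question is getting at (C++, every name invented for illustration; this is not QuestDB's or the article's code):

    #include <atomic>
    #include <map>

    constexpr int kMaxThreads = 64;

    // The live map plus one hazard slot per reader thread (all names hypothetical).
    std::atomic<std::map<int, int>*> g_map{new std::map<int, int>()};
    std::atomic<std::map<int, int>*> g_hazard[kMaxThreads] = {};

    // Read side: publish the pointer you are about to dereference, then re-check
    // that it is still current. Readers write only to their own slot, so there is
    // no shared counter to contend on.
    int read_key(int tid, int key) {
        std::map<int, int>* m;
        do {
            m = g_map.load(std::memory_order_acquire);
            g_hazard[tid].store(m, std::memory_order_seq_cst);
        } while (m != g_map.load(std::memory_order_acquire));  // writer swapped it? retry

        auto it = m->find(key);
        int result = (it != m->end()) ? it->second : -1;

        g_hazard[tid].store(nullptr, std::memory_order_release);  // done with m
        return result;
    }

    // Write side (single writer): install a new map, then reclaim the old one
    // only once no hazard slot still points at it.
    void replace_map(std::map<int, int>* new_map) {
        std::map<int, int>* old = g_map.exchange(new_map, std::memory_order_acq_rel);
        for (int i = 0; i < kMaxThreads; ++i)
            while (g_hazard[i].load(std::memory_order_acquire) == old) {
                // spin, or better: push old onto a retire list and move on
            }
        delete old;
    }

So yes, the read path touches only a per-thread slot instead of a shared counter, but the writer still has to wait out (or defer reclamation past) any reader that has the old map published, so it's not a free lunch either.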


That's exactly the tone he takes.


Read past the intro.

> it's ok to re-discover existing algorithms, it just validates the usefulness and correctness of such algorithms, and it also validates that you, as a researcher, are on the right track to making important discoveries in that field.

> But what is not ok is to read other people's paper and code, make a stripped down version of it, claim it as your own, and after being so stripped down it is not correct anymore, claim that this is correct and lock-free.

> But my feelings don't matter, what matters is that when applied to generic code (or even non-generic code), the technique described in the Peredvizhnikov Engine is neither correct nor lock-free.

I thought the jab at HN was a bit rough:

> I blame hackernews and its sensationalist take on the world of Computer Science, because it gives voice to anyone who shouts loud enough

But then again, the top comment as of posting this is from someone who didn't read the article.


Yeah, I really want to read this guy’s stuff because he’s definitely knowledgeable on the topic and I also work on high-performance distributed computing. The way this is written… sheesh, it’s just insufferable. I’m just going to tell myself it’s some kind of Swiss cultural thing not translating well to English.


> Obviously, it's a bit sad when due credit is not given where it should be

The post actually references your paper. :)


And it does so with three links across three paragraphs.

From the QuestDB article:

> First, I found the Double Instance Locking[1] pattern at the amazing Concurrency Freaks blog[2]. This pattern is very similar to the one I described here. It also uses 2 internal structures and readers are alternating between them. It uses a read-write lock to protect the map being mutated. Given there is only a single writer then at any given time there is at least one internal map which is available for reading. This gives readers lock freedom.

> It's fair to say that the Double Instance Locking pattern is simpler to reason about. It decomposes the problem better. But I'd still argue my contribution is the trick with delayed replay of the last operation - if writers are sufficiently rare, then writers won't be blocked at all.

> The same blog also links to the paper Left-Right: A Concurrency Control Technique with Wait-Free Population Oblivious Reads[3] which on the surface also looks similar. It claims not only Lock-Freedom, but also Wait-Freedom. I have yet to deep dive into it.

[1] http://concurrencyfreaks.blogspot.com/2013/11/double-instanc...

[2] http://concurrencyfreaks.blogspot.com/

[3] https://master.dl.sourceforge.net/project/ccfreaks/papers/Le...
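
For anyone skimming, a rough sketch of the two-instance shape that quote describes, with names invented here (this is closer to the rwlock-based Double Instance Locking than to full Left-Right, which swaps the rwlocks for reader counters to get wait-free reads):

    #include <atomic>
    #include <map>
    #include <shared_mutex>

    struct DoubleInstanceMap {
        std::map<int, int> inst[2];
        std::shared_mutex locks[2];
        std::atomic<int> readIdx{0};  // which instance readers should currently use

        // Readers: try the published instance; if the (single) writer happens to
        // hold it exclusively, fall back to the other one. At most one instance
        // is ever write-locked, so a reader never has to block behind the writer.
        int get(int key) {
            for (int i = readIdx.load(std::memory_order_acquire);; i = 1 - i) {
                std::shared_lock<std::shared_mutex> rl(locks[i], std::try_to_lock);
                if (!rl) continue;
                auto it = inst[i].find(key);
                return it != inst[i].end() ? it->second : -1;
            }
        }

        // Single writer: mutate the instance readers are NOT using, publish it,
        // then drain stragglers off the old instance and replay the write there.
        void put(int key, int value) {
            int cur = readIdx.load(std::memory_order_relaxed);
            int other = 1 - cur;
            {
                std::unique_lock<std::shared_mutex> wl(locks[other]);
                inst[other][key] = value;
            }
            readIdx.store(other, std::memory_order_release);  // switch readers over
            {
                std::unique_lock<std::shared_mutex> wl(locks[cur]);  // wait for old readers
                inst[cur][key] = value;                              // replay
            }
        }
    };

In this eager form the writer blocks while old readers drain; deferring that replay until the next write, so that rare writers never block, is the "delayed replay" trick the QuestDB post claims as its contribution.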


The lock free / wait free datastructure literature is not an especially fertile land to go searching in. The most common flaw is probably assuming you're running on a garbage collector with better properties than the structure you're describing. A close second is assuming CAS on two unrelated addresses exists.
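
To make the second point concrete: what you actually get is compare-and-swap on a single location (plus, at best, a double-width CAS on two adjacent words, e.g. cmpxchg16b), so everything has to be funneled through one word, as in this minimal lock-free stack push step (names invented):

    #include <atomic>

    struct Node { int value; Node* next; };

    // One retry step of a Treiber-stack push, built on single-word CAS only.
    // A CAS spanning two unrelated addresses, which some papers quietly assume,
    // is not a primitive any mainstream hardware provides.
    bool try_push(std::atomic<Node*>& head, Node* n) {
        Node* old_head = head.load(std::memory_order_acquire);
        n->next = old_head;
        // Succeeds only if head is still old_head; otherwise the caller loops.
        return head.compare_exchange_weak(old_head, n,
                                          std::memory_order_release,
                                          std::memory_order_relaxed);
    }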

A fair few are just wrong, since these things are difficult to design by thinking hard and people don't tend to write a formal proof of correctness.

Even the ones that look right, where the authors actually wrote the algorithm in something approximating a formal language and don't assume they're built on magic, are difficult to search for and to compare with other designs.

At this point I'm just going to continue DIYing the things and previous inventors can only have credit if I know their thing exists and can reasonably determine whether it's the same as my construct. Which will presumably annoy people much like the authors of this post.

I note that I hadn't seen the left/right paper prior to this thread so have no idea whether I've recreated it without proper attribution. Seems likely that'll be the case for other people writing stuff too.


Locking algorithms are great, but before applying them to your problem, I think it is important to take a step back and ponder whether you really need lots of threads fighting over the same resources. If you're working with in-memory data structures, that's expected; but if you're dealing with an external database or another shared external resource, you may be able to do better.

Often, you can batch the requests and hit the external resource with less concurrency but larger payloads. If the resource handles batches well, you'll need far less concurrency and locking. For example, if you're working with Postgres, you'll need fewer connections, and you may be able to forgo adding PgBouncer, which complicates things.

Granted, batching requests is not something most programming languages are well suited to do. Those that are optimized for high concurrency, like Go (channels) and Elixir (processes), can do it well, but for languages that do everything with threads it can be painful.
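
Even in a plain threads-and-locks language you can get most of the benefit with a shared buffer and a single flusher thread. A minimal sketch in C++, where Request and send_batch() are stand-ins for your payload and your database/HTTP call, not any real API:

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <string>
    #include <vector>

    struct Request { std::string payload; };

    // Stand-in for one big round trip, e.g. a single multi-row INSERT
    // instead of N separate statements on N connections.
    void send_batch(const std::vector<Request>& batch) {
        std::printf("flushing %zu requests in one round trip\n", batch.size());
    }

    class Batcher {
        std::mutex mu_;
        std::condition_variable cv_;
        std::vector<Request> pending_;

    public:
        // Called from many threads: the only contention is a brief mutex hold,
        // not a connection (or a lock on the external resource) per caller.
        void submit(Request r) {
            std::lock_guard<std::mutex> lk(mu_);
            pending_.push_back(std::move(r));
            cv_.notify_one();
        }

        // A single flusher thread drains whatever has accumulated and issues
        // one large request against the external resource.
        void run_flusher() {
            for (;;) {
                std::vector<Request> batch;
                {
                    std::unique_lock<std::mutex> lk(mu_);
                    cv_.wait(lk, [this] { return !pending_.empty(); });
                    batch.swap(pending_);
                }
                send_batch(batch);
            }
        }
    };

It's clunkier than channels or processes, sure, but the contended section shrinks to a vector push, and the external resource sees one connection doing large writes.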


Gosh! The sneering tone of this article is quite painful. I can only imagine the “well actually…”-isms they inflict on coworkers.

That said, I found the description of catching up with the cutting edge through invention quite interesting. Does this ring true for people who have earned a PhD?


That sounds more like a bizarre brag than anything else, a "we would have invented it if you hadn't" statement.

There's value in being unencumbered by the directions of others, but there's an even larger value in being able to understand the state of the field and your new work's place in it, even if your new work is just a spark of an idea.

Doing otherwise is wildly inefficient, like trying to place an order at a restaurant with no menus and a no modifications policy. Unless you need to do it yourself to truly learn (plenty of people learn better by doing than reading), I'd recommend a healthy practice of reading about the cutting edge and building/inventing, without restriction on either.


> Doing otherwise is wildly inefficient

I hate to break it to you, but irrespective of any pursuit of efficient progress in science, the majority of papers reinvent things unencumbered by an obligation to situate the work within a broader body of literature - in every paper I've been on, citations are post hoc. And everyone I know works the same way - build a thing suited to your needs, craft the sales pitch for the reviewers (including how you're different from competitors), sprinkle citations for good measure.

Clearly I don't think this is bad.


The majority of papers are also garbage that adds nothing to scientific progress, and are just there to satisfy some department's publication or project-deliverable quotas.



