I don’t like the snarky attitude in the post. I don’t work on backend systems, but the problem described is similar elsewhere: a less knowledgeable or naive programmer writes a feature that looks simple but will create more maintenance work in the long run, and a more expert programmer tries to sway said programmer to change the design without forcing a total rewrite.
What irritated me was the underlying tone that programmers of category 1 are always building bad solutions and programmer 2 has to fight to save the day. It feels like this person has worked too long in a bad position and started to see the world in this simple black-and-white form.
To be fair, I’ve been there before myself. I maintain a ton of build pipelines and build tools, and for a while I had the world view that it was them against me. Them, who seemingly by design create bad build scripts that never scale and never survive a second use-case change, and me, the unsung hero fighting the verbal battle over why one should not write build output into the build source directory, etc.
For me it boiled down to understanding that these other people were not standing in line to make my life harder. They simply didn’t know, or had no capacity to learn this, because they also had their fair share of problems in their respective domains.
I think there is some good information in this post, but I would have liked it if it had been wrapped differently.
About 10 years ago I joined a new company, and I looked at their code and thought "what idiots wrote this?", which was strange, because everyone there was very bright. A little while later, some more new devs joined, looked at the code, and thought "what idiots wrote this?", and I was a bit annoyed, because some of that code was mine. They did a big rewrite. A little while after that, some more devs joined, looked at the rewrite and thought "what idiots wrote this?".
Fundamentally, no-one likes working with other people's code or solutions. They always think that they'd have done it better if they'd had the chance.
I see this attitude a lot and I now kind of categorize it within "contempt culture", which I think mostly just falls out of insecurity. I've been guilty of this myself and I try pretty hard to not do it now, but life is a journey and all that.
I do two things to dig out of this particular hole:
- Have some (relatively) objective standards by which I judge code/a project
- Have some empathy for people who were probably just doing their best, in the most multivariate sense ("I wrote this in Clojure so I didn't burn out", "I had to ship this in three days and I was just back from paternity leave", "I was onboarding a junior engineer and dumped this spaghetti in a spare 3 hours to mollify an important client")
#2 is very useful because it snaps me out of contempt and helps prevent me from spreading that culture to the engineers around me. But #1 is actionable in that when I do get time to "do things right" I have some values to aim at. Mostly they're:
- Does this code build/respect a coherent mental model of the problem space?
- Is it possible to represent invalid state?
- Are surprises/pitfalls noted? (I'm not a big commenter in general, but this is a case where I find them useful)
- Does this code consider likely engineer workflows? (e.g. is it easy to add another case to handle, is it easy to enable debugging, etc.)
Notably none of these are like, "methods are too long" or "variable names are too short" or whatever. I find orthodoxies like these just serve to create in/out groups and have no impact on code quality. Or, another way to say this is that if you're building a coherent mental model, your methods are probably the right size.
> Fundamentally, no-one likes working with other people's code or solutions.
I dislike this generalization, as it doesn't get to the gist: solutions that adhere to "some" well-known patterns or ways of building things are more likely to be accepted by other fellow developers.
I, personally, can happily work with other people's code or solutions. :)
> no-one likes working with other people's code or solutions
I think there's a frequent cause to this... It's difficult to know the requirements and constraints in place when that code or solution came to be. Many of us would have done something very similar, at least in spirit, under the same conditions. But those conditions constantly change and we tend to judge solutions based on current conditions. Try judging your own solutions by current conditions years after developing them and they sometimes won't look as good as you remember them being. When doing that, it's tempting to think "but x wasn't available and we didn't know y." With other people's solutions, we often lack the knowledge to be aware of those justifications. Investigating to find some of those justifications has helped me better figure out when replacing something is the right or wrong thing to do.
Agreed. If there was a thesis in there, I wouldn't have minded reading through it, but it became tl;dr and I wanted so hard to find the life lesson. I also don't understand what's hip about ignoring the uppercasing of the initial letter in sentences. In code or short tweets, fine, but in composition of an essay it looks unscholarly and pretentious.
> In code or short tweets, fine, but in composition of an essay it looks unscholarly and pretentious.
I think a good general rule is: for a single sentence, not capitalizing the first letter (and not using a period) is fine. Longer than that and it both makes parsing more difficult and looks strongly like an affectation. There is a practical purpose for sentence start/end punctuation.
Snarky gen-X attitudes can be fun, but I think these days there's too high a risk that they'll be taken the wrong way.
But yes, this is the lament of the senior developer or manager. The ideal is that people have the freedom to explore, and that they feel good about figuring things out, but that ultimately they're subtly shepherded such that they don't fall into infinite gravity wells or crash the ship along the way.
Use a database. Wrap it with a scheduler. Have your workers poll it. Use Redis for locks if you must in higher-contention situations. Doing this will solve countless swathes of distributed systems problems, and you won't get woken up at 2am every third night.
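For what it's worth, here's a minimal sketch of that approach in Python with sqlite3, purely for illustration - the table name, columns, and polling interval are all made up, and a real setup would use Postgres or similar:

```python
import sqlite3
import time

def process(payload: str) -> None:
    ...  # the actual work; raise on failure

# Illustrative schema only; names are invented for this sketch.
conn = sqlite3.connect("jobs.db", isolation_level=None)  # autocommit
conn.execute("""
    CREATE TABLE IF NOT EXISTS jobs (
        id         INTEGER PRIMARY KEY,
        payload    TEXT NOT NULL,
        status     TEXT NOT NULL DEFAULT 'pending',  -- pending | running | done | failed
        attempts   INTEGER NOT NULL DEFAULT 0,
        updated_at REAL NOT NULL
    )
""")

def enqueue(payload: str) -> None:
    conn.execute("INSERT INTO jobs (payload, updated_at) VALUES (?, ?)",
                 (payload, time.time()))

def claim_one():
    """Claim the oldest pending job, or return None if there is nothing to do."""
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, payload = row
    # The status check in the WHERE clause makes the claim safe against other
    # workers: whoever updates the row first wins, the rest see rowcount == 0.
    claimed = conn.execute(
        "UPDATE jobs SET status = 'running', attempts = attempts + 1, updated_at = ? "
        "WHERE id = ? AND status = 'pending'",
        (time.time(), job_id))
    return (job_id, payload) if claimed.rowcount == 1 else None

def worker_loop() -> None:
    while True:
        job = claim_one()
        if job is None:
            time.sleep(1.0)  # the "scheduler": a dumb polling interval
            continue
        job_id, payload = job
        try:
            process(payload)
            new_status = "done"
        except Exception:
            new_status = "failed"  # or flip back to 'pending' to retry
        conn.execute("UPDATE jobs SET status = ?, updated_at = ? WHERE id = ?",
                     (new_status, time.time(), job_id))
```

The nice part is that "which jobs exist, who is running them, how many attempts" is all sitting in one table you can query, which is exactly the observability a bare queue makes you bolt on later.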
Most of these issues stem from the fact that queues provide no useful guarantees about how many times each entry will be processed (zero? one? many?), at least not without a large amount of work around the queues, usually enough to make them unrecognizable.
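A rough sketch of one common piece of that work - consumer-side deduplication via an idempotency key - again with made-up names, assuming each message carries a stable ID:

```python
import sqlite3

def do_work(body: str) -> None:
    ...  # the actual side effect; ideally itself idempotent

conn = sqlite3.connect("processed.db", isolation_level=None)
conn.execute("CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)")

def handle(message_id: str, body: str) -> None:
    try:
        # The primary key acts as the dedupe gate: a redelivered message
        # violates the unique constraint and gets dropped.
        conn.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
    except sqlite3.IntegrityError:
        return  # already seen this message, skip the duplicate
    # Caveat: if do_work() crashes here, the message is marked seen but the work
    # never happened; doing the insert and the side effect in one transaction
    # (or inserting last) just shifts the failure mode back towards "maybe twice".
    do_work(body)
```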
Despite the prose, the article is dripping with actual production experience, and contains many things I wish I had known earlier. To craft a good design, one must be aware of all the problems that the design should fix in order to be good (and accept that settling early into a design will prevent some problems from ever being fixed).
One of the points of the article is that duplicate messages are a side-effect of error recovery, which is external to the queue itself. Even if your queue guaranteed exactly-once delivery, the thing which processes the delivery may fail, which requires re-enqueuing the message, and such error recovery re-enqueues may trivially race with normal enqueuing.
I would add that some queue solutions (such as Azure) implement a lease system where a dequeued message re-appears at the head of the queue after a while, unless its removal is explicitly confirmed.
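That lease behaviour (Azure's visibility timeout / peek-lock; SQS works the same way) can be modelled with a toy in-memory class, just to show why an unacknowledged message comes back. This is illustrative only, not any real SDK:

```python
import time
import uuid
from collections import deque

class LeasedQueue:
    """Toy model of lease/visibility-timeout semantics, not a real broker."""

    def __init__(self):
        self._ready = deque()   # message bodies visible to consumers
        self._leased = {}       # receipt -> (body, lease deadline)

    def put(self, body):
        self._ready.append(body)

    def _reap_expired(self):
        now = time.monotonic()
        for receipt, (body, deadline) in list(self._leased.items()):
            if deadline <= now:              # lease expired without a delete:
                del self._leased[receipt]    # the message becomes visible again
                self._ready.append(body)

    def receive(self, visibility_timeout=30.0):
        self._reap_expired()
        if not self._ready:
            return None
        body = self._ready.popleft()
        receipt = uuid.uuid4().hex
        self._leased[receipt] = (body, time.monotonic() + visibility_timeout)
        return receipt, body

    def delete(self, receipt):
        # Explicit confirmation: only now is the message really gone.
        self._leased.pop(receipt, None)
```

If the consumer crashes mid-processing and never calls delete(), the message reappears after the lease expires, which is exactly the at-least-once behaviour the parent comments describe.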
It's a little confusing to me that the author seems to object to "pipelines" and then equate them with message queues. For me at least, "pipeline" vs "workflow engine" vs "scheduler" are all basically synonyms in this context. Those things may or may not be implemented with a message queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. Like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.
I agree with the author that message queues should not be a knee-jerk response to most problems, because the LoE for edge cases/observability/monitoring is huge. (Maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) But don't build the scheduler from scratch either: use Argo Workflows, Kubeflow, or a more opinionated framework like Airflow, MLflow, Databricks, AWS Lambda or Step Functions. All of these should have config or an API robust enough to express the rate-limit/retry stuff, and almost any of them has better observability out of the box than you can easily get from a queue. But most importantly, they provide idioms for handling failure that data-science folks and junior devs can work with: the right way to structure code is much clearer, and things like structuring messages/events, subclassing workers, and repeating/retrying tasks are just harder to mess up.
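For example, in Airflow the retry policy is a couple of operator arguments rather than hand-rolled requeue logic. A sketch with made-up DAG/task names (older Airflow versions spell `schedule` as `schedule_interval`):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def generate_thumbnail(**context):
    ...  # the actual work; an exception here triggers the retry policy below

with DAG(
    dag_id="generate_thumbnails",        # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule=None,                       # triggered externally, not on a cron
    catchup=False,
    default_args={
        "retries": 3,                             # automatic re-runs on failure
        "retry_delay": timedelta(minutes=5),
        "retry_exponential_backoff": True,
    },
) as dag:
    PythonOperator(task_id="generate", python_callable=generate_thumbnail)
```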
Full disclosure, the author sounds like an arrogant ass and I couldn't stomach the full article...
But I find that queues are often misunderstood/misused. You have to be careful not to use them in systems where the producer cares about the output (true story - I've had someone earnestly suggest an "ack queue," essentially implementing TCP-over-SQS), you absolutely cannot use them in latency sensitive contexts (if you need an async job pool, use a stack. Your peak load customers will thank you), and you have to be careful about idempotency, ordering, and delivery guarantees.
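To spell out the stack point: under overload, FIFO makes every waiting caller eat the whole backlog's delay, while LIFO serves the most recent requests (whose callers are probably still around) first and lets stale work be dropped or expired. A toy illustration:

```python
from collections import deque

jobs = deque()  # shared buffer of pending jobs

def submit(job):
    jobs.append(job)

def next_job_fifo():
    # Oldest first: during a backlog, everyone waits behind the whole queue.
    return jobs.popleft() if jobs else None

def next_job_lifo():
    # Newest first: recent callers see low latency; old entries can be
    # expired instead of served long after anyone stopped waiting.
    return jobs.pop() if jobs else None
```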
Also to put on my own arrogant hat, most of the time I've seen linked queues it's been better to orchestrate the system as a state machine.
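By "state machine" I mean making the job states and legal transitions explicit, instead of letting them be implied by which queue a message happens to be sitting in. Something like this (the states are illustrative):

```python
from enum import Enum, auto

class JobState(Enum):
    PENDING = auto()
    RUNNING = auto()
    SUCCEEDED = auto()
    FAILED = auto()

# Legal transitions written down in one place, so "what can happen next"
# is inspectable rather than scattered across queue consumers.
TRANSITIONS = {
    JobState.PENDING:   {JobState.RUNNING},
    JobState.RUNNING:   {JobState.SUCCEEDED, JobState.FAILED},
    JobState.FAILED:    {JobState.PENDING},   # a retry is an explicit, observable transition
    JobState.SUCCEEDED: set(),
}

def transition(current: JobState, new: JobState) -> JobState:
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new
```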
I have coworkers debating how to automatically update half a hundred build scripts to pick up new runtime versions.
I’ve long, long since given up on getting them to realize that the separate build plans are only accomplishing one thing: enabling (as in codependency) a bunch of bad, slow tests in two sections of the code to persist instead of needing to be rewritten.
This project has no proper reason to be more than a dozen modules, and we’ve spent time on tooling to reduce the burden of managing so many repos at once.
If you don’t have a small mountain of builds you have to compose, you don’t have the queuing problems that go along with them.
What kind of situation would have a producer push into a queue but not care about the output? The only one I could come up with is doing some computation in advance that would otherwise need to be done just-in-time when the result is requested later.
I think this was bitten by HN’s title algorithm (but I’ve never fully understood it, so maybe the “how” was just dropped accidentally by the submitter).
I rarely comment on HN, but this site’s lack of basic grammar (seems to be their “style”) meant I just stopped reading the article. Maybe it should have been called, “How to make your content hard to read”.
I am not sure the author is familiar with RabbitMQ.
I am also not sure the author has valid assumptions about the business requirements - thumbnails not being generated doesn't look like a reason to be woken up at night to me.
Also, you usually have an intern or two on the support team who can re-run failed jobs, so you don't need to waste the time of your devs or ops or devops on that.
Re-running failed jobs should be automated wherever possible. Expected, routine failures should not require manual intervention. If you don't have this attitude, toil will gradually increase over time until all anyone ever does is put out small fires.
Thumbnails not being generated might not be worth an early morning alarm, but running out of disk space might be, or not getting to do other work because it's blocked by the failure of thumbnail generation.
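On the automation point: even a dumb retry wrapper with backoff and jitter removes most of the routine toil. A sketch (the limits here are arbitrary):

```python
import random
import time

def run_with_retries(task, max_attempts=5, base_delay=1.0, max_delay=300.0):
    """Retry routine failures automatically instead of paging a human.
    `task` is any callable that raises on failure; the numbers are illustrative."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # only now does it become someone's problem
            # Exponential backoff with jitter so retries don't stampede the system.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.5))
```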
Nothing should be done if it is not economical. In my experience, issues with message buses happen very often in the first weeks after their rollout and then disappear for a while or forever.
This means: merge the PR first, let it go live, use your working students or interns to rerun stuff, and wait for a month - if it is still happening, then you have proof of a problem that needs to be fixed.
Disk space: use your monitoring tool to proactively warn you when the free disk space is below 20% or is shrinking too quickly.
If some other work is blocked by failed thumbnails, that is a logical bug and not a consequence of the message bus. That work was blocked even before the introduction of the message bus anyway.
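For reference, the check itself is trivial - something along these lines (the path and threshold are made up), wired into cron or whatever monitoring agent you already run:

```python
import shutil

def disk_free_fraction(path: str) -> float:
    # shutil.disk_usage returns (total, used, free) in bytes for the given path.
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

# Hypothetical path and threshold, matching the "warn below 20%" rule of thumb.
if disk_free_fraction("/var/lib/thumbnails") < 0.20:
    print("WARNING: less than 20% disk space free")  # or page/notify however you normally do
```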