The satire in the title is reminiscent of how Firebase was born.
We were previously working on a chat system called Envolve (https://www.envolve.com), that was 'Facebook Chat for any website'. A game that was using us for in-game chat created channels, used display: none on them, and passed game state through the chat.
We scratched our heads, asked them why, and learned they wanted to focus on the frontend rather than deal with realtime message passing.
This led us to create a 'headless version' of our chat infra (re-written in Scala) that became the Firebase Realtime Database.
This is an important lesson - if someone is using your tool in unexpected ways don’t just shut them down; there’s likely a business case that could be identified and specialized in.
Nobody wants the product you want to make. They all want you to make the product they need, and will do things the devs of your product could never imagine with your product.
We make a CRUD desktop app, with a lot of integrations to customer systems. We got a standard set of XMLs, but we'll deal with whatever the customer has if we need to.
We recently got a support call from an upset client. A recent change had broken their integration, halting production.
This led to some head scratching on our side, as we couldn't figure out what integration they were talking about.
Turned out they had paid some consultant quite a lot to script some tool similar to AutoHotkey to transfer data from their order system to our system. It would emulate copy pasting fields, tabbing along.
Our recent update had introduced a new field, thus screwing it all up.
As an aside, the "integration" took half an hour to transfer a single order due to how it did its work. An XML would have taken seconds tops.
I had a client who wanted to build his business on top of our business via scripting our administrative back end - basically he wanted to be a value added reseller. He'd get angry when we modified our admin console. We offered him API access for a fee, but any fee was too much for his business model. It was quite frustrating and eventually we had to ignore his pleas for UI stability over years of time. All the while he could have had access to the API for less than he was paying the person to script our admin console. This was circa 2007, so perhaps API programmers just weren't the thing back in the day.
I think this is unfortunately what is going to happen in the next few years as more tech companies price their APIs out of reach of small businesses :/
Indeed. Tangentially, there is another side to the coin.
The antithesis of this is the contract model, when a client comes to you and asks you to make something specific, rather than describing what they need. In this case they are almost always wrong. They’ll rarely admit it unless you show them some amazing alternative, though.
Requirements = a set of problems to be solved. More often you get handed a diagram of a 12” tall Stonehenge sketched on a napkin with exceeding confidence that it will be 12 feet tall.
This happens outside development, too. More often than not in IT, when the business has a need, they start with a solution and approach IT asking them to implement it. I've gotten better over the years (although always room to improve) at steering the conversation around to "what exactly are you trying to accomplish?" Let's start there.
The quote is alleged to be Henry Ford, and I've always encountered it as "better horse."
In either case, the customers aren't stupid but instead are using the language and paradigm they are familiar with to describe their needs. Ford did indeed provide a faster/better horse, because the horse was the main means of non-human motive force in the US.
The technologist's job is to understand the universe of possible solutions and to understand the customer's needs and goals, which often includes interrogating and parsing the customer's language/worldview, and to craft a product that achieves those needs/goals (and doesn't introduce negatives that overwhelm the benefits).
Regardless of the provenance of that quote, it is quite apt wrt IT clients. They always come asking for a faster horse, ie: something they know that is just better in a specific way.
A lot of the times this might be wrong but I think the interesting part is how that request gets handled.
The horses were themselves a more immediate sanitation problem; in New York alone there was a million pounds of horse manure every day, plus thousands of horses that dropped dead from overwork. Disposing of all the feces and corpses was a challenge.
And to be fair there was a lot of grumbling about the "horseless carriage" for quite a while.
What's interesting to me is that the oldest streets (ignoring the Main Street) in the local small town are the widest, because they needed to be able to turn around a team of horses and a carriage/wagon even when other wagons are parked on the side. The newer streets (but still 1940-50s) around are narrower because they didn't have to accommodate that.
It is frustrating how often I have to tell my non-techy CEO to rewind a step, and tell me the problem he is trying to solve not the half-baked solution he came up with. One bad idea he has had for over a decade, and I have to keep beating it down since it is an extremely stupid method to solve nothing.
There is another side to this coin, too. It is clients that describe their needs, not some specific and wrong solution, and then either get told they did not do their homework, or their needs getting ignored by a dev who thinks he is smarter than the client in the client's own field.
It is correct that "Requirements = a set of problems", and therefore it does not make sense at all to repeatedly ask the client for a technical solution instead of asking for the problems to solve, and then be surprised that the client does not talk about problems anymore.
The kamikaze was a grim solution to Japan's losing of the carrier offence/defence balance; the rate of pilots returning from sorties was poor, and the rate of pilots returning from tours to be instructors was extremely low, so the solution was to take very young men with minimal training and use them as human guided missiles. Pilots were probably going to be killed anyway, so the focus was on trying to achieve carrier kills at any cost.
(Humans will really adapt anything into a weapon, including human lives themselves, especially in a total war)
Also consider that more than half of the hits to Japanese aircraft in the Pacific were caused by the use of the top-secret VT radar proximity fuse.
The Japanese predicted they were going against "dumb" timer-based anti-aircraft, which was probably two orders of magnitude less accurate and less effective, where their weak armoring would have been a good approach: if you think it will take 200 shots to bring down your planes, fielding twice as many units that could each be brought down by one lucky direct hit makes great sense. But if it only takes two shots, you need to be able to survive the shrapnel.
The strategy they put in motion in 1942 suddenly became obsolete in 1943, and it was too late to pivot.
> This is an important lesson - if someone is using your tool in unexpected ways don’t just shut them down; there’s likely a business case that could be identified and specialized in.
One of the craziest cases of this I've seen was with a web-based survey application I was the principal engineer on in the mid to late 2000s. A big feature we implemented was the ability to create surveys that supported multiple languages. To make it easier for our customers to translate surveys outside of the application, there were ways to export/import the text strings as well as a standalone screen that allowed finding and editing them. Both of these were made easier by the fact that the strings had semantic identifiers like "/survey/question/choice" or similar.
Since this functionality worked so well, we also used it internally for all text in the application. As a convenience, both the import/export and edit screen were capable of editing these strings. One customer figured this out and, due to the naming, was easily able to identify the text for all screens and dialogs. They ended up completely modifying the application interface through this interface by editing text, adding HTML elements, and injecting CSS blocks. They added descriptive tutorial text, guides for users, and branded the application well beyond built-in functionality. It was pretty amazing.
It's amazing with how much creativity users will abuse your creations. ;) And in many cases, something new is born out of it. The problem is getting the information about just how people are using your service differently than you've intended. Sometimes it's impossible to tell from traces, analytics or logfiles alone. But finding out can be quite an advantage, especially if you're a startup probing for PMF.
The best thing that can happen is that you have a channel to your customers that's constantly open. At WunderGraph, we use Slack for that, and it considerably lowers the barrier to just check in and have a quick discussion. The sooner you find out about use cases, the better - and ideally create a product around it. :)
Also, you can actually make it cost competitive if the object you store is the last n milliseconds of packets instead of one packet each. So, instead of incurring two API calls per packet, you incur two API calls per minimum buffering time. If S3 is zero-rated for any ingress/egress then you get “infinite” bandwidth for 4.22$ * 3 (for the active case) = 12.66$ a day if you are willing to accept 500 ms minimum latency, or ~600$ a day at a more reasonable 10 ms. If you are saturating even just a 1 Gb link for a whole day that is ~10,000 GB which would be ~700$ via the blessed channel, so you could very well come out ahead.
You could do even better if you out-of-band signal the readiness so you do not need to poll while idle. Then you only incur a cost while actively transmitting so as long as you average 1 Gb/s on the channel you should be coming out even or ahead with minimal latency impact.
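For anyone who wants to check the arithmetic above, here is a rough sketch. It only reuses the figures quoted in the comment (4.22$ a day at a 500 ms buffer, the "* 3" factor for the active case) and assumes the request cost scales inversely with the buffering window. It does not look up any real AWS prices, and the function names are made up for illustration.

```python
# Back-of-envelope check of the numbers quoted above.
# Every input is a figure from the comment, not a real AWS price lookup.

BASE_COST_PER_DAY = 4.22   # $ per day at a 500 ms buffer, as quoted
ACTIVE_MULTIPLIER = 3      # the "* 3 (for the active case)" factor, as quoted
BASE_BUFFER_MS = 500       # buffering window the base cost refers to


def request_cost_per_day(buffer_ms: float) -> float:
    """API-call cost per day, assuming it scales inversely with the buffer window."""
    return BASE_COST_PER_DAY * ACTIVE_MULTIPLIER * (BASE_BUFFER_MS / buffer_ms)


def gigabytes_per_day(link_gbit_per_s: float) -> float:
    """Data moved in 24 hours over a saturated link (8 bits per byte)."""
    return link_gbit_per_s / 8 * 86_400


if __name__ == "__main__":
    print(f"500 ms buffer: ${request_cost_per_day(500):.2f}/day")
    print(f" 10 ms buffer: ${request_cost_per_day(10):.2f}/day")
    print(f"1 Gb/s for a day: {gigabytes_per_day(1):,.0f} GB")
```

Under those assumptions the 10 ms case works out to about 633$ a day and a saturated 1 Gb/s link moves 10,800 GB a day, which lines up with the "~600$" and "~10,000 GB" figures.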
This isn't theoretical; many companies do PostgreSQL async 1:N physical replication, by using e.g. https://pgbackrest.org/ to have the primary push WAL segment files (a.k.a. "the last n milliseconds of packets" in the write-ahead log) as objects to S3. All the read-replicas then independently discover the new objects in S3 as they become available; fetch them; and replay them.
> You could do even better if you out-of-band signal the readiness so you do not need to poll while idle.
S3 and its clones have "object lifecycle notifications", where you can be informed by a push-based mechanism whenever a new object is put into the bucket.
But — what do you have to do, to get these notifications?
Subscribe to a real message queue, that S3 puts these notifications into.
Not really, but you can use Patroni for that, which automatically promotes one of your replicas to master. It can also create the replica from the WAL archive in S3 using wal-g/pgbackrest.
I once used a MySQL database as a replacement for a message queue. This was the easiest solution to implement since all the servers were already connected to the database anyways. A server would write a new row to the table and all the servers would remember the last row they had already seen. Occasionally the table is cleared. I'm sure there are some race conditions in the system but its only purpose is to send Discord notifications when someone breaks a highscore in a video game, so it's not really critical. It's still working that way today.
The code is in there for Postgres, MS SQL and MySQL (which all support SKIP LOCKED) though at some point I abandoned all but Postgres.
If I was to write another message queue then I wouldn’t use a database, I’d use the file system based around Linux file moves, which are atomic. What I really want is a message queue that is fast and requires zero config, file based message queues are both…. better than a database.
I really feel like file systems aren't used for enough things. File systems come with so much useful metadata.
I've experimented with using the file system for storing configuration data where each config value is a single file. Nested structures just use directories. The name of the file is the field name. The file extension is a hint about the data type contained within (.txt is obvious, but I also liked .bool). Parsing data is trivial. I don't need any special viewing or editing tools, just a file manager and text editor. You can see when a specific config value was changed by checking the file update time. You don't have to load the whole configuration just to access one field. And you could conceivably TAR up the whole thing if you wanted to transmit it somewhere.
I use it to configure little sub-projects in my personal website. I really like it, but I shudder to think of the complaining I'd hear from other developers if I ever used it in a work project, just because it's not whatever they've ever seen before and would require a moment or two of thinking on their behalf to get over ingrained habits.
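Here is roughly what a reader for such a layout could look like. It is a sketch under assumptions rather than the setup described above: the function names, the `.int` extension and the exact rule for `.bool` (empty or "0" is false) are invented for illustration.

```python
from pathlib import Path


def read_value(root, *keys, default=None):
    """Read one config value; nested keys map to nested directories."""
    directory = Path(root).joinpath(*keys[:-1])
    if not directory.is_dir():
        return default
    # The field name is the file name; the extension is only a type hint.
    for path in sorted(directory.glob(keys[-1] + ".*")):
        text = path.read_text().strip()
        if path.suffix == ".bool":
            # Assumption: an empty file or "0" means false, anything else true.
            return text not in ("", "0")
        if path.suffix == ".int":  # hypothetical extra type hint
            return int(text)
        return text  # .txt and anything unrecognised come back as a string
    return default  # a missing file means "use the default"


def write_value(root, *keys, value, ext="txt"):
    """Write one value to its own file, creating directories as needed."""
    path = Path(root).joinpath(*keys[:-1], f"{keys[-1]}.{ext}")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(str(value))
    return path
```

With this, `read_value("config", "site", "theme")` reads `config/site/theme.txt` and touches nothing else. The change time of a single value is just the mtime of its file.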
A company I used to work for extensively used this method. It's incredibly useful to be able to read a config or state value from any language and even bash scripts quickly.
However, and this is a big drawback, once you have too many config files and you start reading and writing from different processes, you get into bottleneck situations quickly.
I haven't used this system extensively yet. But I don't really see how that situation gets improved by having a single-file configuration system.
First of all, if you have multiple processes trying to read/write the same config, that's kind of suspect, and if file I/O is a bottleneck for your config system, that's a different suspicious situation. Why are your processes writing to the config so... often?
But regardless, I can't see how those problems get immediately better by storing that config in a single file. If anything, having it split across multiple files would improve the situation, as different processes that might only be concerned with different sections of the config won't need to wait on file locks from process unrelated to their concerns.
I realize it may have sounded like I was suggesting the approach. Quite the contrary: I would never do that again or suggest it. I was merely pointing out that it was quite useful at times :-) We had lots of problems similar to what you were describing. Luckily that's all in the past now.
The paradigm is also used by /proc and /sys, so I guess other developers won't get confused. However I never tried to tar -x into /proc to start the same set of processes on another node, or as an alternative to /etc/sysctl.conf :)
This was tried and called Elektra I think around Y2K. Don’t believe the idea was even new then, but there was also research into tiny file performance at the time, resulting in things like reiserfs. I think it packed tiny files into the directory itself resulting in blistering speed.
Anyway it’s an elegant idea. Silly to have dozens of config file formats when the fs already has everything it needs. We have xattr too.
The flaw on the OS level is that it is hard to get everyone to change. For new apps not a problem, and any performance concerns are no longer an issue for config.
Oh man, reiserfs. Seeing that name reminds me that the original developer, Hans Reiser, is currently spending time in prison for murdering his wife. She was the interpreter during his first meeting with a Russian "mail-order bride".
It's a single byte file. You read the entire file contents and if it's not zero, it's true. The existence of the file tells you whether or not to use a default value. An INI file would have to be fully parsed before we know whether it contains a value for that config value.
You read the entire file contents, trim leading and trailing whitespace and toLower for good measure if you want, then validate against your list of installed themes. Done. No goofy JSON or YAML parser in sight.
And what reinvention is there? If you're just using a system that already exists, you're not reinventing anything.
I actually did this once (over SMB on Windows, though) - and unintentionally crippled our corporate SAN with all of its polling and locking activity. I had a cluster of 20 workers which would poll every five seconds for messages, and I believe we had an EMC VNX storage appliance. I never did figure out why that was enough to bring the whole thing to its knees, but IT was very quick to track the problem back to me.
Interesting. What makes you want to switch to the file system? I wrote one for a project[0] a while back (for MongoDB) and it didn't seem like the database introduced too much complexity. I didn't write the implementation from scratch, but the couple hundred lines of code were easy to reason about.
I found almost all message queues to be horribly complex to configure, debug and run. Even database queues require a lot of config, relative to using the file system.
I did actually write a file system based message queue in Rust and it instantly maxed out the disk at about 30,000 messages a second. It did about 7 million messages a second when run as a purely RAM message queue but that didn’t use file system at all.
It depends what you’re doing of course… running a bank on a file system queue wouldn’t make sense.
A fast message queue should be a tiny executable that you run and you’re in business in seconds, no faffing around with even a minute of config.
> I did actually write a file system based message queue in Rust and it instantly maxed out the disk at about 30,000 messages a second. It did about 7 million messages a second when run as a purely RAM message queue but that didn’t use file system at all.
Did you try an in-memory filesystem through tmpfs?
Database config should be two connection strings, one for the admin user that creates the tables and another for the queue user. Everything else should be stored in the database itself. Each queue should be in its own set of tables. Large blobs may or may not be referenced to an external file.
Shouldn't a message send be, worst case, a CAS? It really seems like all the work around garbage collection would have some use for in-memory high speed queues.
Are you familiar with the LMAX Disruptor? It is a Java-based cross-thread messaging library used for day trading applications.
Since you seem to be from citusdata: I used cstore_fdw 2 - 3 years back and at least when paired with TPC-H it was horrendously broken for both small (10 gig) and large (100 gig) datasets. It has been integrated into some other product in the meantime; I hope you managed to improve it.
This is actually pretty common, and usually a "good enough" solution. You can also add things like scheduling (add a run_at column), at least once execution (mark a row when it is being processed, delete it only when successful), topics, etc with minor modifications to your table.
If you want something that works "well enough" I'd say it's a reasonable choice.
Yeah, I'm using it as a transactional outbox to ensure at least once delivery to SNS.
Can't really think of a better way to ensure that a message is always sent if the DB transaction succeeds and is never sent if the DB transaction fails.
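For readers who haven't met the outbox pattern, a small sketch of the idea follows. It is only an illustration: the table names are hypothetical, sqlite3 stands in for the real database, and `send` is a placeholder callback where an actual SNS publish call would go.

```python
import sqlite3


def create_tables(conn):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )


def place_order(conn, item, fail=False):
    """Write the business row and the outgoing message in ONE transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (f"order placed: {item}",)
        )
        if fail:
            raise RuntimeError("simulated failure before commit")


def relay(conn, send):
    """Forward committed outbox rows; delete each one only after send() returns."""
    rows = conn.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        send(payload)  # if this raises, the row stays and is retried later
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    return len(rows)
```

A rolled-back transaction leaves no outbox row, so nothing is ever sent for it. A crash between `send` and the delete means the message goes out again on the next relay pass, which is where the "at least once" comes from.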
You can get so far by ensuring at least once and making everything idempotent (will get you as close to "exactly once" as you can). With a database, the most common pattern is: insert the row for the job, when a worker starts working on it, mark it as in progress so it doesn't get started again, if the task fails, or after some reasonable time-out period, another worker can pick up the task again, ultimately the row for the task is only ever deleted when a worker successfully completes it.
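The claim / time-out / delete-on-success cycle described above can be sketched like this. Treat it as one possible shape, not a reference implementation: the schema, the 30 second lease and the function names are all assumptions, and sqlite3 is used only so the example is self-contained.

```python
import sqlite3
import time

LEASE_SECONDS = 30  # hypothetical time-out before another worker may retry


def create_jobs_table(conn):
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL, "
        "run_at REAL NOT NULL DEFAULT 0, claimed_at REAL)"
    )


def enqueue(conn, payload, run_at=0.0):
    with conn:
        conn.execute(
            "INSERT INTO jobs (payload, run_at) VALUES (?, ?)", (payload, run_at)
        )


def claim(conn, now=None):
    """Mark one due job as in progress and return (id, payload), or None."""
    now = time.time() if now is None else now
    expired = now - LEASE_SECONDS
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE run_at <= ? AND (claimed_at IS NULL OR claimed_at < ?) "
            "ORDER BY id LIMIT 1",
            (now, expired),
        ).fetchone()
        if row is None:
            return None
        # Conditional update: only succeeds if nobody claimed it in between.
        cur = conn.execute(
            "UPDATE jobs SET claimed_at = ? WHERE id = ? "
            "AND (claimed_at IS NULL OR claimed_at < ?)",
            (now, row[0], expired),
        )
        return row if cur.rowcount == 1 else None


def complete(conn, job_id):
    """The row is deleted only once the work has actually succeeded."""
    with conn:
        conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
```

A worker that crashes simply never calls `complete`, so the job becomes claimable again once the lease runs out. That is why the handlers need to be idempotent.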
> We decided to store Centrifuge data inside Amazon’s RDS instances running on MySQL. RDS gives us managed datastores, and MySQL provides us with the ability to re-order our jobs.
If you don't want to publish events from uncommitted transactions you'll have to first store them in a local table and then move them to the queue after the commit. But if all consumers have direct access to the database anyway...
I am doing the same with SQL Server. The messages table is more of a bus than a queue in our case (columns like ReplyToId, etc). Using it for RPC communication between cloud bits. Much cheaper than Azure Service Bus and friends.
I don’t know the OP's answer, but I’d hazard a guess that it's because ssmb is a completely neglected feature with very little in the way of community or UI. In theory it would be great, but MS basically never invested in it after its release and now it’s just a random, “who knows when we’ll drop support for this” SQL Server feature.
I briefly worked for a major corporation 15 years ago that did this with SQL Server to create distributed worker processes to handle all the AI-generated used car listings and photo recolorings [0] for almost all of the used car lots in the country.
[0] Why take hundreds of photos of Honda Civics in red, green, blue, and black when you already have a dozen in white?
Why even take the dozen in white when they have a model you can render in any manner? Most car commercials do not have real cars in them. Maybe the shots of a car actually in motion, but most of the static shots are 3D models placed onto backgrounds. I don't know why, but I was surprised by this when I worked in a post house that did a lot of car commercials. One of the roles for a coworker was to get flown around to locations to take the images for the background plates using photogrammetry. "Can't fly an Alexa through the back glass to zoom in on the dash now can we" was one comment.
I've built a hybrid task queue/process supervisor on top of SQL. Classical task queues like Celery didn't exactly fit our use case: a single process could run for hours or days, but in case of a node failing, it must be resurrected elsewhere as soon as possible (within seconds). I didn't have the time to re-architect everything for Kubernetes, or rewrite half the product in Erlang; so I built that weird thing. It's been super stable, running mission critical code, and making us money - for several years now.
I implemented a message queue in MySQL too and it worked pretty well. Incoming messages would be written to the table and the workers would poll the database each cron period and process whatever rows were in the queue. To avoid race conditions, the workers would lock the records they were working on and then delete them as soon as the work was complete. It was simple but it worked just fine for my purposes.
This has been a thing since before databases were relational. 4G languages (Progress, etc.) were especially nice for their ability to wrap a queue table around a series of reversible transactions, if you coded things right .. meaning a lot of modules written for app infrastructure were based on an 'inbox table' methodology ..
I’ve run into all sorts of database locking issues and concurrency issues when using a database as a queue. I saw that mistake made a long time ago and I would never do it myself.
Database engines are getting features like SELECT FOR UPDATE SKIP LOCKED, so what were once serious blockers on this idea may no longer be as much of a problem.
It’s not necessary, but it is a lot less fiddly: you automatically look at only the tasks that someone else isn’t currently working on, and because the lock is held by the database connection you get automatic retries if your worker crashes and drops the connection. You could figure out all of the interactions needed to make this work yourself, but if the database already has support built in you may as well use it (and there’s a straightforward path to migrate if you need more sophistication later).
No? Unless there's some edge case with that statement I don't know about. That statement is basically tailor made for queues so you can select jobs that aren't currently being worked on by other workers.
Inasmuch as you trust your db's locking correctness it eliminates the concurrency issues. You can very naively have n workers pulling jobs from a queue not stepping on each-other.
It’s not so out of the ordinary. A few libraries in Rails create message queues in Postgres using advisory locks and listen/notify.
Hell, if it’s not an RDBMS then it’ll be Redis (at a much greater expense for a managed instance). I’ve seen that setup in the Ruby world far more often than using a dedicated message queue.
One of my customers used email (a gmail account, no less) as a message queue between their front end site and the back-office processor. This worked quite well for close to a decade I think.
It basically evolved from when applications from their original customer facing site were emailed and manually entered into the back-office system by a human. They were looking to automate this with a minimum of changes to too many of the moving parts at once, so I reformatted the sent email to contain an XML payload so the new back-office automation could read and process it (and in a pinch, a human could still review any problem applications) using Java's mail APIs.
Things evolved, the front-end web site got replaced with a Wordpress site, but the email message queue kept working for a long time. In the last year or so it was getting more and more onerous though. Reconciling information between the front-end and the mail box showed not all emails were being delivered, and authentication to gmail was becoming more and more of a burdensome moving target.
I just recently replaced the whole thing with an API call made from the back-office to the Wordpress site to access the stored data. (The original site didn't store, just emailed, which was why this was not an option historically.)
Yes, it's actually somewhat of a decent (if cursed) message queue for many usecases too. Not to mention the debuggability (you already know how to use an email client).
As long as you keep it internal-only, get a reliable server that doesn't do any anti-spam shit (AKA, nothing from MS or any other large company), doesn't use some unreliable proprietary database (again no MS), and have a reliable journaled disk, your only important failure mode will be losing disks. That's an easy failure mode to deal with.
I love email, and to me it is a very important part of our infrastructure, but it is quite cursed as an arbitrary transport. The semantics are very complex, and any transported data will generally need to be encoded as base64, which is very inefficient.
You also need to add misbehaving clients to your list of things not to use with a message queue: just opening the queue's folder in Outlook to inspect it may be enough to completely corrupt it (Outlook will happily reorder, even rewrite, header fields of incoming messages, replicate messages to infinity, or who knows what else).
I'd be worried to have a message queue that is compatible with such a cursed ecosystem and transport protocol.
Before there was REST and other http based standards ebXML was a hotness and tools like bizspark etc. You could see the logical progression through:
I fill in the order form and post it. They mail back an invoice.
I fax the order form, they fax back an invoice.
My computer sends the order form in a structured way over email.
My computer sends the order form in a structured way over http.
And that's why many large telcos and banks still use ebXML in their B2B transactions. It's fundamentally the same business process and logic it's always been, with glacially slow improvement in performance over time.
I co-founded a webmail provider in '99, and when we needed a message queue reaching for our heavily customized Qmail setup was a relatively natural choice. I've mentioned it here before. Provides all the routing and retries we needed, and made debugging trivial (e-mail from our desktop clients to the queues worked; cc:'ing a real mailbox with copies worked; checking out queues with POP3 worked...)
I always thought using e-mail in this would be good for systems that occasionally need a human in the loop. Normally one service processes and mails results to the next, occasionally something is forwarded to a human, who can make some decision, edit the mail, then forward it along back into the automated path.
I was hoping someone would notice this by arguing about my pricing estimates in the arguments, but nobody did so I'm gonna spoil the surprise: if you have JavaScript enabled your browser will render slightly different prices if you view this post from Hacker News. This will persist to when you visit from not-Hacker News too, just so you don't notice the difference. The prices range from 0.8x to 5x the ones I figured out in the AWS calculator.
If you want to see how I did it, view source and search "gaslight".
Not really my type of humor to begin with, but having to explain the "joke" to the community you're trying to bait because nobody cared really ruins it.
I'm going to have to spoil some of it to point out that part of what he's done is reinventing the idea of delay-line memory, although he makes no mention of this: https://en.wikipedia.org/wiki/Delay-line_memory
...of course if that's all there were to it it wouldn't be interesting, but I won't reveal what the rest is as per your suggestions. :)
I always thought it would be fun to host a Rube Goldberg competition for systems engineering. Whoever could accomplish a simple task with the most ridiculous system would win.
Something like: read the 54th line of this file hosted at xx address.
Then a submission could look like:
1. FTP a file to a server with the address.
2. the server reads the file and spins up a VM.
3. The VM polls an endpoint to download code to pull the address.
4. The code downloads the file and splits it into a single file per line.
5. A script loads all the files into an array and accesses array[53] to get the answer.
I used to host a competition called Anything But Ethernet.
(You could use ethernet hops, they just didn't score points, so say you had some ethernet-to-whatever bridge, only the 'whatever' segment would count.)
Over the years, we saw physical proof that a T1 circuit would in fact run over barbed wire, we had a human-in-the-loop following a DDR-esque stream of arrows to stomp on a pad that encoded data from one hop to the next, we had a writable RFID tag stuck to the front bumper of an R/C car that would drive back and forth between readers...
That's awesome. Was this part of a company? University? I've brought the idea up to coworkers but nobody wants to waste extra cycles for something this dumb.
Notacon was a hack-tech-art-music-demoscene conference in Cleveland, Ohio that ran from 2003 to 2014. I knew some of the organizers and got heavily involved in volunteering and setup for the event.
The drive from my home north of Detroit to Cleveland is a couple hours, so I carpooled with another Michigander making the trip, a guy who ran large parts of the campus network at MSU. I was in telecom, mostly slinging SONET networks at the time, he was mostly Ethernet, but adjacent to a lot of other technologies. We spoke just enough of each other's language to have a lot to learn from each other.
One year (I think it was 2005) in the car, we got to talking about transport networks, and lamenting the fact that "traceroute hides all the fun stuff"; it only shows IP-aware hops that decrement the TTL, and there could be thousands of miles of glass and hundreds of pieces of equipment, which considered the IP traffic to be "payload" and thus would never dare to alter it, between two hops that show up in a traceroute. (Plenty of otherwise-bright LAN engineers might naïvely assume that a packet leaves their router in Chicago and just arrives at another router in Denver because there was just a long, long piece of glass between them -- I think a lot of people have inflated ideas of fiber's capabilities because they're ignorant of the dozens of payload-transparent transport nodes and repeaters handling their circuit between the hops they see.)
"Wouldn't it be fun", we mused, "to somehow shine some light on all those other networks that aren't IP, aren't ethernet, aren't visible to the normal user?" And as the miles ticked by, Anything but Ethernet was born. We proposed it to Notacon's organizers, who gleefully encouraged us to take it and run with it. I'm not sure how it ended up being mostly my baby, but I ran around and found sponsors for prizes, spent the year needling potential contestants into actually building the crazy stuff they verbalized while couched in hypotheticals, and then emceed the chaos when the conference weekend and judging time came around.
This was pre-hackerspace, when all our nerd energy for the year was focused on one or two insane weekends (many of us also attended Penguicon or Defcon). So, extra cycles were less likely to be otherwise spoken for, and a few folks were highly motivated by the contest's appreciation of perverse, obscure, flashy, and MacGyvered hacks.
My greatest regret looking back on the whole thing is that I didn't put more emphasis on recording and documenting the insanity. That wasn't valuable at the time -- it existed in and for a short-duration event -- but it would be amazing to look back on more of it now.
As a junior dev I used folder pairs (to_process and processed) as a way to move messages between loosely coupled systems, with file system watchers picking up new files in the to_process folder. Very lightweight, got the job done, and I was told it was in production for over a decade (even after better practices came into use).
This is a very sensible design for small-scale queueing systems. A few implementation notes I’ve learned along the way:
- You’ll want writers to create a temp file and then rename it into place; otherwise your reader might pick up a file that’s only half-written. (POSIX rename() is atomic within a filesystem.)
- It’s also helpful to have an `in_process` folder, and a cronjob to send alerts if a task sits in that place too long. That way you can quickly catch crashes and other failures.
- You can have multiple readers; they cooperate by rename()ing the input file into in_process/ before they start working on it, and ignoring ENOENT (which indicates some other process got to it first).
- This pattern is great for loosely-coupled systems. In my case I was importing orders into a system that ran very slowly; once a reliable import queue was available new use cases kept presenting themselves, and it was very easy to add them; since the API was just “write a file in the correct format”, individual scripts could be written in whatever way was easiest for the particular task at hand. (It helped that the input format hadn’t changed in a couple decades; if it had needed changing a lot you’d want a more structured system. But this is the same as any distributed systems interface design.)
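For anyone who wants to see the notes above in one place, here is a minimal Python sketch. The folder names `tmp/`, `to_process/`, `in_process/` and `processed/` and all function names are my own assumptions for illustration, not anything from the systems described. It assumes every folder lives on the same filesystem, since that is what makes the rename atomic.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def enqueue(queue_root: Path, name: str, payload: bytes) -> Path:
    """Write the payload to a temp file, then rename it into to_process/."""
    tmp_dir = queue_root / "tmp"
    inbox = queue_root / "to_process"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    inbox.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    final = inbox / name
    os.rename(tmp_path, final)  # readers never see a half-written file
    return final


def claim_next(queue_root: Path) -> Optional[Path]:
    """Claim one task by renaming it into in_process/, or return None."""
    inbox = queue_root / "to_process"
    working = queue_root / "in_process"
    working.mkdir(parents=True, exist_ok=True)
    if not inbox.exists():
        return None
    for candidate in sorted(inbox.iterdir()):
        target = working / candidate.name
        try:
            os.rename(candidate, target)
        except FileNotFoundError:
            continue  # ENOENT: some other reader got to it first
        os.utime(target)  # reset mtime so it records when the claim happened
        return target
    return None


def finish(queue_root: Path, claimed: Path) -> Path:
    """Move a completed task from in_process/ to processed/."""
    done = queue_root / "processed"
    done.mkdir(parents=True, exist_ok=True)
    target = done / claimed.name
    os.rename(claimed, target)
    return target


def stale_tasks(queue_root: Path, max_age_seconds: float) -> List[Path]:
    """What a cron job might alert on: tasks claimed too long ago."""
    working = queue_root / "in_process"
    if not working.exists():
        return []
    cutoff = time.time() - max_age_seconds
    return [p for p in working.iterdir() if p.stat().st_mtime < cutoff]
```

A worker would loop on `claim_next()`, do its work, and call `finish()`. Anything still sitting in `in_process/` after a crash is what `stale_tasks()` would report.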
With respect to how to move things into folders, the simplest advice is to tell people to just read the Maildir spec. And read Qmail as well. While Qmail is dated as a mailsystem, as a description of how to compose a decoupled system that reliably processes filesystem based queues, it contains lots of little nuggets.
3 years ago, I extensively used the all-company's shared filesystem to pass information between 2 independent Jenkins instances (One on Windows for jobs that worked best under a Windows machine and one on Redhat which was considered as the "main" instance); and between the Jenkins and target application servers (Windows and Redhat) which all had the company's filesystem mounted. It took time to perfect, but worked wonderfully once I adopted the rename-in-place technique as described by parent.
I wonder if the system is still in place. Last time I checked the Jenkins folder was occupying 270 GB (!) of the 10 TB shared FS, most likely because FS block size was 1 MB.
Literally spooling message-filled tape onto a tape reel! I never even thought about that.
On the topic of message queues, I wonder if anyone ever designed a mechanical, asynchronous hardware IPC mechanism based on tape. Imagine: an endless-loop tape that stretches between a "sender" tape-drive on one machine, and a "receiver" tape-drive on another machine; with each tape drive locally tensioning the tape as it passes through, but letting it hang slack outside the drive; with two tape-width buffer boxes in between the two drives, for taking up the slack; and where each tape-drive has a strain gauge on its input-end capstan.
With this design, you'd never need to seek the tape (backward, at least) — the receive head would just read the tape forward at whatever speed it could until it ran out of slack in the tape (i.e. blocked on read); and the send head would write the tape forward at whatever speed it could until it ran out of slack (i.e. blocked on write.) The tape itself would be a literal, physical ring buffer.
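The software analogue of that tape loop is a bounded blocking queue. Here is a rough Python sketch, where `slack` stands in for how much tape the buffer boxes can hold. The function name and parameters are made up for illustration.

```python
import queue
import threading

_END_OF_TAPE = object()  # sentinel marking the end of the message stream


def run_tape_loop(messages, slack=4):
    """Pass messages from a sender thread to a receiver through a bounded buffer."""
    tape = queue.Queue(maxsize=slack)
    received = []

    def sender():
        for message in messages:
            tape.put(message)  # blocks on write when there is no slack left
        tape.put(_END_OF_TAPE)

    def receiver():
        while True:
            message = tape.get()  # blocks on read when nothing is unread
            if message is _END_OF_TAPE:
                break
            received.append(message)

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Neither side ever seeks. Each one just runs forward until the buffer makes it wait, which is the same back-pressure the strain gauges would provide.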
Western Union built a store-and-forward message switching system in pretty much exactly the way you describe (but with paper tape rather than magnetic): https://en.wikipedia.org/wiki/Plan_55-A
Analog audio delay/echo machines work like this. They could have multiple read heads and you could move them relative to each other and adjust the speed of the tape to get various cool effects.
So having "discovered" this approach as a young .NET dev retroactively gives me a smug smile that I came up with such a battle-tested pattern. Then again, it's also quite an obvious and simple pattern which is why it was used in the first place, I'm sure.
At my old company, we had a system where we had a limited number of physical machines that multiple testers would need to remote into. We didn't want to disallow two people logging in simultaneously because sometimes that was necessary, and if someone stayed logged in and then went to lunch, there was no way to kick them off to get access without walking to the server closet, pulling up the machine on the KVM, and kicking them that way.
I wrote some VBScript that wrote a locking folder into a sub-folder in a network folder, then launched VNC targeting the proper VM. When VNC was closed, the script would complete and delete the locking folder. That let us know what was available and when. I always thought it was hacky, but at ~60 lines it's incredibly simple and has never really failed.
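A rough Python equivalent of that idea, for the curious. This is not the original VBScript. The layout (one sub-folder per machine, one marker folder per user) and the names `session_marker` and `current_users` are assumptions on my part. The markers are advisory only, so two people can still share a machine when they need to.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def session_marker(lock_root, machine, user):
    """Create <lock_root>/<machine>/<user> for the duration of a session."""
    marker = Path(lock_root) / machine / user
    marker.mkdir(parents=True, exist_ok=True)
    try:
        yield marker
    finally:
        try:
            marker.rmdir()
        except OSError:
            pass  # already removed, or the share is unreachable


def current_users(lock_root, machine):
    """List who currently has a marker folder for this machine."""
    machine_dir = Path(lock_root) / machine
    if not machine_dir.is_dir():
        return []
    return sorted(p.name for p in machine_dir.iterdir() if p.is_dir())
```

Launching the remote-desktop client would go inside the `with session_marker(...)` block, for example through `subprocess.run`, so that the marker disappears when the viewer exits.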
Sometimes a filesystem-based queue like that is all you really need. Recently I used S3 API compatible storage for something similar, as introducing AMQP or pubsub would have been immensely overkill for this low volume component.
> <Cadey> Hello! Thank you for visiting my website. You seem to be using an ad-blocker. I understand why you do this, but I'd really appreciate if it you would turn it off for my website. These ads help pay for running the website and are done by Ethical Ads. I do not receive detailed analytics on the ads and from what I understand neither does Ethical Ads. If you don't want to disable your ad blocker, please consider donating [snip] or sending some extra cash to [snip] or [snip]. It helps fund the website's hosting bills and pay for the expensive technical editor that I use for my longer articles. Thanks and be well!
> The bytes are stored in the cloud, which is slightly slower to read from than it would be to read data out of the heap.
Given latency and bandwidth differences, that's like saying that it's slightly slower to transport water by driving standard, 1-gallon jugs between the US east and west coasts than it is to transport it a few miles using a tanker truck.
The only thing worse would be this “on the blockchain.”
On second thought it seems likely that some charlatan has already years ago raised a couple of millions in an ICO for such a project. “We’re making the entire Internet web3-compatible!”
This is astoundingly well-written and informative. It's the sort of thing I come here for. It's also engaging and charming in a way rarely seen in tech writing.
I like that they started out by talking about $0.07/G cost and went through the whole exercise before pointing out what immediately came to mind for me when they started pushing bytes in and out of S3.
"anything stored in that pointer to memory you got back from malloc() is stored in an area of ram called "the heap", which is moderately slower to access than it is to access the stack."
Is this true, or a myth? Ignoring the allocation cost and access patterns making cache misses more likely, surely memory is just memory
By definition, everything on the heap got where it is dynamically, so the minimum number of pointers to chase to find it is 1.
In contrast, what’s where on the stack can often be (and often must be) known statically by the compiler and accessed directly, even moved entirely out of memory and into a register. (It’s possible to do this with suitably constrained dynamic allocations, but the optimization is much harder.)
Some architectures (e.g. the 65816, the 6502's "big brother" used in the Apple IIgs) include stack pointer relative addressing modes, which will complete the memory read and return data faster than the same instruction with a full address encoded.
Many architectures have hardware support for stacks, which could be slightly faster than arbitrary load/stores. That only works in the function owning the stack frame, of course; if you pass a pointer to a stack object somewhere else, it's back to being normal memory.
I'm pretty sure this is still the case. I'm not sure how cross-platform that assumption is (I don't think it will work in Go, where it has heapstacks), but classically yeah, the stack is put in slightly faster memory with fewer access barriers.
I believe that’s inaccurate, at least on a modern CPU. The bookkeeping for the stack is faster, since ‘allocating’ and ‘deallocating’ is just subtracting from and adding to a register. And the area of the stack in active use at any given time is usually tiny (well under a kilobyte, even though the full stack is usually several megabytes), so it’s likely to stick around in L1 cache. And return addresses on the stack get special treatment by the branch predictor. But other than that, it’s treated the same as any other memory.
>"Access to S3 is zero-rated in many cases with S3, however the real advantage comes when you are using this cross-region."
Or cross-Country... as in two Countries that are geoblocked from one another...
>"This lets you have a worker in us-east-1 communicate with another worker in us-west-1 without having to incur the high bandwidth cost per gigabyte when using Managed NAT Gateway."
Simplified algorithm for bypassing geoblock via the above method:
1) Select a cloud storage provider (could be any cloud storage provider or shared persistent storage platform; doesn't necessarily need to be Amazon/S3) that works or does business in two countries, Country A and Country B, where normal IP traffic is geoblocked between Country A and Country B.
2) Use shared cloud storage objects/buckets/rows (call them whatever you will, "keyed persistent storage discrete thingies" for lack of a better term!) as the article suggests, to emulate IP traffic between user A in Country A and user B in blocked country B...
3) Combined with a P2P or other front end app that knows how to use this method of communication (along with code stubs, such that people could customize it to their own cloud storage provider or platform) if/when normal country-to-country IP communication is blocked for whatever reason (zombie apocalypse? <g>) could make a powerful future P2P communications tool, for lawful purposes...
I just had a flashback of creating a forwarding-buffer-queue on memcached, a complete abomination but it was able to drastically reduce the load on the endpoints that saved game progress in a casual game.
It updated or created an entry for each call. But if a buffer was already in the queue, it would write the most recent one into the existing slot. This had the effect of reducing the load by the multiple of the update rate. So if you had clients sending save game payloads every 30s and your queue depth is 2.5mins, then your write rate to disk is 1/5. I think.
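To check the "I think": 2.5 minutes is 150 s, and 150 / 30 = 5 incoming saves per client per flush, so 1/5 looks right. Below is a small Python sketch of the coalescing idea, with a plain dict standing in for memcached. The class and function names are invented for illustration.

```python
class CoalescingBuffer:
    """Keep only the most recent payload per key until the next flush."""

    def __init__(self):
        self._pending = {}

    def save(self, key, payload):
        self._pending[key] = payload  # overwrites any older pending save

    def flush(self):
        batch, self._pending = self._pending, {}
        return batch


def simulate(save_every_s=30, flush_every_s=150, duration_s=600, players=3):
    """Return (incoming saves, writes that reach disk) over the duration."""
    buffer = CoalescingBuffer()
    incoming = disk_writes = 0
    for t in range(1, duration_s + 1):
        if t % save_every_s == 0:
            for player in range(players):
                buffer.save(player, {"progress": t})
                incoming += 1
        if t % flush_every_s == 0:
            disk_writes += len(buffer.flush())
    return incoming, disk_writes
```

With the defaults, each of the 3 players sends 20 saves in 10 minutes (60 total), and the 4 flushes write 3 entries each (12 total), which is the 1/5 ratio.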
But memcached wasn't Redis, this was pre-Redis, and memcached could have evicted any of those keys at any time. We gave it lots of space, it never GCd, and we never needed to fix it. The game slid from above the fold and wasn't fun enough to be viable long term.
I'm not part of this subculture, but I honestly enjoy this type of idiosyncratic presentation, which always was, and hopefully always will be, part of hacker culture.
The content is not only relevant and hilarious, but I'd much rather personal blogs like this keep being updated instead of seeing the sterile Medium or LinkedIn Pulse versions, usually written to impress future employers.
Zany visuals, doing things with your friends just because you can and writing up detailed accounts of it to share with everyone is the quintessential hacker culture. If Hackers' Cyberdelia was set in the 2020s, no doubt Mara would fit in.
Author here. Thank you for liking this. One of the most amusing things is that doing this zany visuals/detailed writeups/unabashed character in my writing actually makes me more impressive to future employers than any sterile LinkedIn posts ever will. I estimate that at this point if I wanted to get hired I could literally post a banger like this, add a banner to it halfway in that says "by the way, I'm looking for gainful employment, if you want someone doing DevRel that writes like this and gives talks like [link], please get in contact" and I could probably get a job in a week or two.
I want to keep hacker culture alive by not accepting the gentrification of it. Hacker culture is queer, neurodiverse, furry, weaboo, and more. As a philosopher of the arts, that is the kind of culture I want to create more of. Not platitudes on LinkedIn. I want to create the kind of culture that celebrates art for the sake of art.
Only if you make it. Relax. What someone is wearing or what innocent pictures they put alongside their presentation does not necessarily have to concern you at all. Just ignore it and focus on the actual subject matter.
I’m sorry to hear that you felt offended by the cartoon shark in the article about programming. I understand that you may have different preferences and expectations for the content you consume. However, I hope you can also respect my creative choices and freedom of expression. I am not obligated to cater to your personal tastes or opinions, especially when I am providing my content to the world for free. I appreciate your feedback, but I also ask you to be more considerate and respectful of other readers who may enjoy my work. Thank you.
I would ask you to perform self-reflection to ascertain as to why the cartoon shark discomforts you. You may want to sit with that discomfort to understand why you feel that way so you can hopefully move past that and towards a more peaceful future.
There are a number of logical fallacies being presented in your assertions, however I don't feel like I need to spend the time to pick them apart.
sneak wanted OP to be less openly queer in their personal blog for their own comfort, due to their own inability to imagine that the subculture OP is part of is more than their own narrow definition of it.
I guess the disrespect is clearer when one has grown up with its many forms, but assuming your request for more context comes out of a sincere desire to increase your understanding, I hope this helped.
> assuming your request for more context comes out of a sincere desire to increase your understanding, I hope this helped.
I appreciate the benefit of the doubt -- I was trying to approach it with the assumption that there's probably some perspective that I wasn't aware of.
> sneak wanted OP to be less openly queer in their personal blog for their own comfort
I can see how it may be interpreted that way. To me it read more like "hey, great content, presentation of it wasn't for me" in a way to convey feedback in the event the author wanted to better capture their attention. I didn't get the sense that there was an expectation that the author change their writing style to accommodate their preferences.
re-reading this, I don't think I appreciated that it was really in defense of the readers rather than the content
> I also ask you to be more considerate and respectful of other readers who may enjoy my work
I'm surmising that the real problem statement is:
> I'm sure there are non-sexual furries but the subculture is primarily a sexual one (same as with BDSM)
which I can appreciate as a generalization, I suppose I wouldn't have personally considered furries as under the queer umbrella, but I'm certainly no voice of authority on the matter. I suppose there's also the possibility that 'furries' was a dog whistle for 'queer', and being less charitable to sneak, I could better understand finding disrespect in the post.
So queer the fandom got booted out of the anime cons it spawned out of for being so. Unchained from the narrow views of the anime community, our power only grew.
> sneak wanted OP to be less openly queer in their personal blog for their own comfort, due to their own inability to imagine that the subculture OP is part of is more than their own narrow definition of it.
This is entirely false.
soft_dev_person, what I do want is for you to constrain your assertions about me to facts, and refrain from inaccurate or baseless speculation. It does neither you nor me any favors.
I understand that is what you want, but you cannot reserve yourself from being interpreted when posting on a public forum.
The language you chose to use and the details you chose to focus on are what triggered the response. If your point was that the cartoons were distracting, you could have made that clear in a less disrespectful way (or let the content speak for itself), or perhaps considered whether it was worth contributing to the discussion.
Again, you double down on one definition of this particular subculture in another reply. It is reminiscent of how others have tried to define what certain subcultures related to sexual orientation (some now more mainstream) entail, while not being part of them, and I hope you understand how this can be problematic. I appreciate that this may not have been your intention.
I used CouchDB as an "RPC" pseudo-HTTP request-response transport layer: essentially two queues, one in each direction. Especially useful for mobile devices, where connectivity is a random thing. It hugely simplified both client and server: let the DB handle all the multiplication, connections, switching on and off, timeouts, retries, delays, repeats, etc.
Estimate includes the additional latency you will experience while downloading the post through the mechanism described in the post, which is the only true way to read it.
I've been trying to play with the constants over the years to make the read time estimate more "accurate", but it's a tough nut to crack in general. So I can go over my numbers more accurately, how long did it take you to read it?
fwiw, it took 5m to read. Nb: I was already familiar with a lot of the terms in the post (partly because I've already experimented relaying IP over Cloudflare's DurableObjects instead of S3), and skipped the dialectics.
This is one of those articles where the journey is far more interesting than the destination.
(Edit: this comment made more sense when it was replying to a different complaint about the article; the parent comment seems to have been edited in the interim.)
> In Linux, you can create a TUN/TAP device to let applications control how network or datagram links work. In essence, it lets you create a file descriptor that you can read packets from and write packets to. As long as you get the packets to their intended destination somehow and get any other packets that come back to the same file descriptor, the implementation isn't relevant. This is how OpenVPN, ZeroTier, FreeLAN, Tinc, Hamachi, WireGuard and Tailscale work: they read packets from the kernel, encrypt them, send them to the destination, decrypt incoming packets, and then write them back into the kernel.
So the premise of the problem is the need to egress traffic to the internet from a private VPC, without using an expensive NAT gateway. The solution is to do NAT manually by having an EC2 instance in a public VPC, and tunneling traffic from the private VPC through that instance.
The meat of the article is how to create that tunnel using S3 for data transfer instead of using a more traditional VPN service.
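For anyone wondering what "create a file descriptor that you can read packets from and write packets to" looks like in practice, here is a minimal Python sketch for Linux. This is not the article's code. The constants are the usual values from `<linux/if_tun.h>` on x86 and ARM (the ioctl number differs on some other architectures), the 40-byte request size assumes a 64-bit `struct ifreq`, the helper names are mine, and actually opening the device needs root or CAP_NET_ADMIN.

```python
import fcntl
import os
import struct

TUNSETIFF = 0x400454CA  # ioctl number on x86/ARM; an assumption elsewhere
IFF_TUN = 0x0001        # layer-3 device: raw IP packets, no Ethernet header
IFF_NO_PI = 0x1000      # do not prepend the 4-byte packet-info header


def build_ifreq(name: str) -> bytes:
    """Pack the interface name and flags into a 40-byte ifreq-shaped buffer."""
    encoded = name.encode("ascii")
    if len(encoded) > 15:
        raise ValueError("interface names are limited to 15 characters")
    return struct.pack("16sH22x", encoded, IFF_TUN | IFF_NO_PI)


def open_tun(name: str = "tun0") -> int:
    """Open /dev/net/tun and attach to the named interface; returns the fd."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name))
    except OSError:
        os.close(fd)
        raise
    return fd
```

After that, each `os.read(fd, 65535)` should return one IP packet the kernel routed to the interface, and each `os.write(fd, packet)` injects one back. What carries the bytes in between (UDP, S3 objects, a tape loop) is up to the application.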
My dear child, the fun is not the destination; the fun is the slow descent into madness to get there. For other people: don't read the summary, read the whole thing. It's a work of art.
Find someone who reads this and goes "I want to learn what all those words meant." The explanation could be done amusingly too, I think.
Dunno if the effort would be worth it, but I'm thinking of people like me who read the BOFH stories and decided they needed to learn more, to understand them properly. "First hit free, kids!"
It’s unfortunate that you were offended by the cartoon shark in the programming article. You are entitled to your own preferences and expectations, but so am I. I have the right to express myself creatively and share my content with the world for free. I don’t have to please you or anyone else with my personal choices. Your feedback is noted, but please be more mindful and respectful of other readers who may like my work and the artists who were contracted to create the art in question. Thank you.
One path to find peace is to kill the part of yourself that cringes rather than try to smother out others who are "cringe". Otherwise, there are plenty of cookie-cutter medium.com blogs and LinkedIn posts if you want something more to your standards.
It’s not clear to me which bit you’re complaining about. I assume it’s either that the format is still used or that the name describes an obsolete use. But the former seems perfectly reasonable if it still serves a useful purpose, and the latter largely irrelevant (these days “tarball” is generally just an arbitrary word, like “zip file”), so I’m curious to hear your perspective on what the problems are.
Don't fear the furry. Embrace your inner 6ft anthropomorphic wolf thing, it's not going to kill you. (Well, not unless you get heat stroke; furries die in hot convention centres, people.)
Sadly, they fundamentally misunderstood the "everything is a file" paradigm of *nix. It's not that everything is an extent of octets, it's that everything has a directory entry so C programs can use the open call to create a file handle. It might be more appropriate to say "Everything looks like a file in the file system and most every operation on a thing represented by a directory entry goes through a filehandle."