Also, the justification for the Actor model is flimsy compared to the other concepts.
I'd love to see an actual diagram of the architecture that was built, and maybe also why sharded + replicated postgres (+/- kafka maybe) wasn't enough to solve the data needs of this application.
Assuming they farm out the actual processing of payments (talking to the credit card companies) to Stripe/PayPal/Braintree or whatever, all this "payment system" is doing is taking API requests to make payments, calling out to whatever processor they're using, saving the results, and probably maintaining the state machine for every transaction (making sure that invalid state transitions don't happen). It sounds like they broke up a monolith, but it's completely unclear how they made it better.
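For what it's worth, that state-machine piece is the part I'd most like to see. A minimal sketch of what "don't allow invalid state transitions" might look like (the states and names here are my guesses, not anything from the article):

    # Hypothetical payment states and allowed transitions -- not Uber's actual model.
    ALLOWED = {
        "created":    {"authorized", "failed"},
        "authorized": {"captured", "voided", "failed"},
        "captured":   {"refunded"},
        "failed":     set(),
        "voided":     set(),
        "refunded":   set(),
    }

    class Payment:
        def __init__(self):
            self.state = "created"

        def transition(self, new_state):
            # Reject anything the machine doesn't explicitly allow,
            # e.g. capturing a payment that was never authorized.
            if new_state not in ALLOWED[self.state]:
                raise ValueError(f"illegal transition {self.state} -> {new_state}")
            self.state = new_state

    p = Payment()
    p.transition("authorized")    # ok
    p.transition("captured")      # ok
    # p.transition("authorized")  # would raise: captured -> authorized is illegal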
I found this piece interesting; it introduces some basic concepts in a structured and easy-to-understand manner. Not many writers take the time to develop such articles, so they are rare and thus useful as a reference to point coworkers to, or for a quick recap.
The details of the precise technologies used will be irrelevant in 3 years; the fundamental concepts will not. It's not a recipe, it's a conceptual primer for building an architecture that scales.
> Why did the actor model matter when building a large payments system? We were building a system with many engineers, a lot of them having distributed experience. We decided to follow a standard distributed model over coming up with distributed concepts ourselves, potentially reinventing the wheel.
Literally, the word "distributed" is just used repeatedly to get the point across. I might have a bit of an axe to grind with people who throw the actor model (or any paradigm, for that matter; functional programming doesn't solve every problem, and sometimes you should just mutate some state and call it a day) around as a way to fix things, and I'm well aware that it is indeed a useful paradigm, but this was a bad way to explain the choice to use it. If he built the kind of system I think he did, I'm not sure the actor model even conveys much of a benefit, and I was looking to that explanation to drop some wisdom on me. It did not.
My problem is that this article is indeed too basic. He goes to great lengths to describe why the architecture he built uses the methodologies/approaches he's laid out, but then forgets to lay out the actual architecture -- this post is all ivory tower and none of the rubber-hits-the-road that you'd expect from a good technical piece. Things fall apart when the rubber hits the road, and sharing how they fell apart enriches people who read the article.
Most intermediate engineers working with distributed systems (in good engineering orgs) have already run into these concepts. For those who haven't heard of them, the article is very helpful, but for those who already know them, it's basically a rehashing of level 1.
For example, take any single chapter of the Google SRE handbook (e.g. https://landing.google.com/sre/book/chapters/service-level-o...); his post reads like the first half of a chapter, not the complete thing. Of course, the SRE handbook is a high bar to set for quality, but at least a whiff of this production-grade payments system he built would have been nice.
Even rudimentary experience/skills with the former gets you hot $400K+ jobs, while even advanced experience/skills with the latter gets you only $200K jobs.
> Why did data durability matter when building a payments system? For many parts of the system, no data could be lost, given this being something critical, like payments.
I wish the author had shared more first-hand experience, especially those little anecdotes you can never find in textbooks...
- Efficient (and intelligent) testing of distributed systems (check out the Clojure libraries he uses)
- Effective Clojure
- Introduction (and re-introduction, which is IMO one of the best ways to learn) to distributed-system guarantees (linearizability, serializability, and lesser guarantees), in the context of database system guarantees
- How to write good, information-dense blog posts
- How to deftly (and openly) navigate the ethics of paid testing of a product by a corporate entity
- How to think critically about distributed-database hype trains
Also, a plug for RethinkDB, the document-store database that got it just about right the first time he tested it (https://aphyr.com/posts/329-jepsen-rethinkdb-2-1-5) and when re-configured (https://aphyr.com/posts/330-jepsen-rethinkdb-2-2-3-reconfigu...)
Before anyone dismisses one of his tested databases, however, I'd like to caution against trusting his work fully at this point. Some of the documents are 5+ years old; that's a lot of time for the database developers to resolve issues.
Thankfully, the tests are all replicable, so if you're evaluating Cassandra, for example, please repeat his tests before dismissing the database.
So I opened the link.
> What Happened? The problem went something like: Foursquare uses MongoDB to store user data on two EC2 nodes, each of which has 66GB of RAM
There's your problem. Using Mongo sharding in production in 2010.
Main pro-tip: think very hard about how you're sharding your data (hash-based? range-based?) and how you intend to rebalance it or get more granular should the need arise.
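To make the hash-based vs. range-based distinction concrete, a toy sketch (the shard counts and key names are made up for illustration):

    # Toy illustration of the two common shard-key strategies.
    import hashlib

    NUM_SHARDS = 8

    def hash_shard(user_id: str) -> int:
        # Hash-based: spreads keys evenly, but range scans hit every shard
        # and changing NUM_SHARDS remaps most keys.
        digest = hashlib.md5(user_id.encode()).hexdigest()
        return int(digest, 16) % NUM_SHARDS

    def range_shard(created_year: int) -> int:
        # Range-based: great for "everything from 2017" queries, but new data
        # piles onto the latest shard unless you plan rebalancing up front.
        return (created_year - 2010) % NUM_SHARDS

    print(hash_shard("user-42"), range_shard(2017))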
There are more solutions than M/R now, though (Spark, Presto, etc.), but they require high-performance servers (Presto's suggested RAM is 128GB).
The author writes how they didn't think a mainframe could handle the load in the future. But even super old mainframes still handle their load 40+ years on. They just had better design parameters to make sure they were used appropriately, and they knew how to keep their requirements compact. You may only have 8 alphanumeric characters to store a customer's record name in, but you're never going to have more than 2.8 trillion customers. That hard limit also means you know how large the data will get, which makes capacity planning easier. (Even in the cloud you must do capacity planning, if for nothing else than cost projections.)
If you wrote to all nodes, your performance would be terrible.
I also do not think they needed more than one master in their payment system. I mean, it's highly unlikely that a big machine couldn't handle all their transactions.
If it were still a problem, I would use an old-school RDS and shard the hell out of it.
However, keep in mind that Uber did a lot of weird things in the past, from an engineering perspective.
Re: scaling Cassandra, I haven't experienced it personally, but I've heard horror stories regarding replication and load balancing when nodes start to go down. With other systems, you see things like uneven loading, running out of storage or compute, limits on IOPS or network bandwidth, and literally the probability of overall failures increasing because math. It's also just harder in general to manage 250 nodes than 50. Running commands takes longer, you need bigger infrastructure, you start running into weird limits like what your original subnet sizes were set to, etc. There are myriad variables that just become more difficult the bigger things get.
Now, compare this to three giant-ass nodes. You start getting close to capacity, and maybe you add a fourth, and that lasts you another year. So. Much. Freaking. Easier. AND more reliable. There's no contest - if you can scale mostly-vertically, do it. (Unless you're Google or Facebook and have a few billion dollars to spend on engineering, in which case this actually _is_ scaling mostly-vertically, it's just datacenters instead of nodes)
Access services (e.g. browsing and searching in a product catalog) scale horizontally without any limitations. For this type of application, 4 huge servers are more expensive, less reliable, and not significantly easier.
On the other hand services around scarce resources (e.g. a last-minute travel booking application) are indeed very hard to scale horizontally, so in such cases the parent's advice is sound.
> Distributed systems often have to store a lot more data, than a single node can do so.
Microsoft's full product name is "Azure Service Fabric", which properly disambiguates it rather than staking a claim to a single, common network-related English word, as you claim the Python fabric library does.
Workflow engines neatly solve a lot of the challenges you mention in the article, essentially for free (rough sketch after the list):
* Model your problem as a directed graph of (small) dependent operations. Adding new steps does not require adding new queues, just new code.
* Let the workflow engine manage the state of your transactions (the important, strongly consistent part) and just focus on implementing the logic. Scale horizontally by adding more worker nodes.
* Get free error-handling tools, such as automatic retries on failed steps, support for special error-handling steps, and metrics you can alarm on (e.g. number of open workflows, number of errored workflows, etc.)
* Get a free dashboard where you can look up the state of a given transaction, look at error messages, (mass) re-drive failed transactions (e.g. after an outage or after having fixed a bug), etc.
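A minimal, engine-agnostic sketch of the idea (the step names, the in-memory "engine", and the retry policy are all illustrative, not any particular product's API):

    # Toy workflow: a directed graph of small dependent steps with automatic retries.
    # A real engine would persist this state durably instead of running in-process.
    import time

    def charge_card(ctx):   ctx["charge_id"] = "ch_123"
    def write_ledger(ctx):  ctx["ledger_ok"] = True
    def send_receipt(ctx):  ctx["receipt_sent"] = True

    WORKFLOW = [
        # (step name, callable, dependencies)
        ("charge_card",  charge_card,  []),
        ("write_ledger", write_ledger, ["charge_card"]),
        ("send_receipt", send_receipt, ["write_ledger"]),
    ]

    def run(ctx, max_retries=3):
        done = set()
        for name, step, deps in WORKFLOW:               # already in topological order
            assert all(d in done for d in deps)
            for attempt in range(max_retries):
                try:
                    step(ctx)
                    done.add(name)
                    break
                except Exception:
                    time.sleep(2 ** attempt)             # back off, then retry the step
            else:
                raise RuntimeError(f"workflow stuck at {name}")  # alarmable state
        return ctx

    print(run({"payment_id": "p_1"}))

Adding a new step is just another entry in the graph; the queueing, retry, and visibility concerns stay in the engine.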
Design up front for reuse is, in essence, premature optimization. - @AnimalMuppet
Pike's 2nd Rule: Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest. - Rob Pike, Notes on C Programming (1989)
... quotes from http://github.com/globalcitizen/taoup
Consider a Redis-based cache: for future-proofing you could introduce a sharding scheme that deterministically allocates keys based on cluster size. That would be operationally pretty complex, not to mention it would introduce complexity in edge-case scenarios (e.g. while your cluster size is transitioning).
Alternatively, you could do a little back of the napkin math to figure out what the size of your keyspace is going to realistically be and just...allocate a machine that fits your needs up front.
Obviously we could make arguments about budget and all that, but those are optimizations that are better to defer.
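To make the "operationally pretty complex" part concrete, a toy sketch of the resizing edge case (shard counts and key names are arbitrary):

    # Deterministic key -> shard mapping, and why growing the cluster hurts:
    # with naive modulo sharding, most keys land on a different shard after a resize.
    import hashlib

    def shard_for(key: str, n_shards: int) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_shards

    keys = [f"user:{i}" for i in range(10_000)]
    moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
    print(f"{moved / len(keys):.0%} of keys remap when going from 4 to 5 shards")
    # ~80% of keys move, which is why consistent hashing or centralized shard
    # placement exists -- and why one suitably sized box up front is tempting.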
This is implemented in Twemproxy and resizing the cluster is a known edge case for which Twitter developed Nighthawk (centralized shard placement).
OTOH, my team owns ~10 micro-services, and some of their build times, due to integration-tests, are already painfully slow. I'd hate to have to run all of them every time we build.
In fact, a distributed system can have some advantages over a monolith in an application like that, if done properly. For instance, you could set up a central message log that keeps data for a period of x, allowing you to "roll back and replay" to your heart's content.
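A stripped-down version of that rollback-and-replay idea, assuming some kind of append-only log with retention (this is my sketch, not a description of any specific system):

    # Append-only event log kept for a retention window; consumers can rewind
    # to an earlier offset and rebuild state by replaying events.
    log = []                        # stand-in for a durable log with x days of retention

    def append(event):
        log.append(event)
        return len(log) - 1         # offset of the event

    def replay(apply, from_offset=0):
        state = {}
        for event in log[from_offset:]:
            apply(state, event)
        return state

    def apply_payment(state, event):
        state[event["id"]] = event["status"]

    append({"id": "p1", "status": "authorized"})
    append({"id": "p1", "status": "captured"})
    # A bug corrupted downstream state? Rebuild it by replaying from offset 0:
    print(replay(apply_payment))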
The new payment system unified all of these into pure money movement where any entity can move money to any other entity. This enables all current and future products to have flexibility in payment models. (Think about how payments would be modeled for Uber Freight or with the new Uber Transit partnership.)
Source: I work on the Uber Payment System
What we have built at Uber is a realtime system that is reliable, provably correct, and highly available.
As I've said in another comment, for many other systems, if something fails, people can just try again later. If we fail, someone's mother, grandfather, son, or daughter is stuck on the side of the road in the middle of the night somewhere. Our needs (or at least the standard we hold ourselves to) are higher than those of many other payment systems.
I'm really impressed with the diversity of their offering as well:
I'm currently in Japan and it's actually relatively common to pay for things at the convenience store here, I'm amazed they support it.
"I joined Uber two years ago as a mobile software engineer with some backend experience. I ended up building the payments functionality in the app - and rewriting the app on the way".
Following the "rewriting the app" link in that paragraph takes you to another page "Engineering the Architecture Behind Uber’s New Rider App" - although that's co-written by two people, neither of which are the author of this article it would seem.
Modern computer systems can scale to 500 or more processor cores. Each core runs billions of instructions per second.
A system for a billion accounts on the scale of Uber probably has a million active users at a time, with probably a quarter of that involving payment.
Is Uber saying that they can't support 250k payment transactions per second on the largest system today? That's maybe 1000 transactions per second per core on the largest systems, or about 1ms per transaction. Why is that impossible for them?
Or, to put it another way, why can't one transaction be completed in fewer than 1 million CPU instructions?
And that's for the very largest companies, like Uber... I can't even imagine a typical startup needing to scale horizontally for payment processing.
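To put those numbers in one place (every figure here is an assumption from the comment above, not a measured value):

    # Back-of-envelope: what CPU budget does one big box give per transaction?
    cores = 500                      # "500 or more processor cores"
    instr_per_sec_per_core = 3e9     # "billions of instructions per second" per core
    tps = 250_000                    # the assumed 250k payment transactions per second

    tps_per_core = tps / cores                                # 500 tx/s per core
    instr_budget_per_tx = instr_per_sec_per_core / tps_per_core
    print(f"{tps:,} tx/s across {cores} cores -> "
          f"{instr_budget_per_tx/1e6:.0f}M instructions of budget per tx")
    # ~6M instructions (roughly 2 ms of core time) per transaction, so a
    # sub-1M-instruction transaction fits comfortably -- assuming the work
    # were actually CPU-bound.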
One box can’t be distributed across multiple racks in the data center to guard against downtime if a switch crashes. Never mind that—one box can’t be deployed across multiple data centers. If you deploy to multiple DCs you can fail over if one DC starts having issues.
Then there’s deploys. Do you canary your deploys? Deploy the next release to a subset of production nodes, watch for regressions and let it ramp up from there? Okay, I’ll give you that one, it could be done on one big box.
In any case, payments aren’t CPU intensive but it’s a prime case of hurry-up-and-wait. Lots of network IO, so while you won’t saturate the CPU with millions of transactions on the same box, I could easily imagine saturating a NIC. Deploying to shared infrastructure? Better hope none of your neighbors need that bandwidth too.
One transaction likely involves checking account and payment method status, writing audit logs, checking in with anti-fraud systems and a number of other business requirements.
(I lead a payments team, not at Uber but another major tech company)
Wouldn't you just have multiple NICs on one box for redundancy there, with any backups being sent the database write-log for replication?
> In any case, payments aren’t CPU intensive but it’s a prime case of hurry-up-and-wait. Lots of network IO, so while you won’t saturate the CPU with millions of transactions on the same box, I could easily imagine saturating a NIC.
If you're vertically scaling, wouldn't you just have the main database server host the database files locally, using fast NVMe SSDs (or Optane) in the box itself, instead of going over the network?
Enterprise NVMe drives can perform 500,000-2,000,000 IOPS, with about 60us latency. And Optane is about 4x faster. Why would a database server need to saturate network bandwidth?
Anyways, I'd love to see the actual SQL query for one of their transactions...
What happens when the FBI raids the DC to confiscate the servers of another person, and also takes yours? https://blog.pinboard.in/2011/06/faq_about_the_recent_fbi_ra...
20 years ago we had 1000+ days of uptime on DEC kit. No one was even impressed by 500 days. Nowadays people build all sorts of elaborate contraptions to do what used to be entirely ordinary.
Uber? The upper bound of transactions is restricted to the total number of cab rides a day. That number isn't in the billions, is it?
Transactions in this system are almost certainly network bound. The relevant CPU overhead is likely trivial: likely comparison rather than arithmetic in nature, even. In that context, “add more NICs” is effectively an exercise in horizontal scaling. On top of that, any network operation has consistency concerns to contend with.
You could contrive a system that is block device I/O bound but it’s likely to have significant network overhead as most block devices are network attached these days anyway!