> - The demands of a consumer Web site/API or multitenant enterprise application simply exceed the computing capacity of any one machine.
> - An enterprise moves an existing application, such as a three-tier system, onto a cloud service provider in order to save on hardware/data-center costs.
When you've exhausted the capacity of a single machine, typically you don't jump straight to a distributed system. You can scale out horizontally with a stateless application layer, as long as the data storage on the backend can handle all the load. You can also scale database reads horizontally, using read replicas.
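For concreteness, the read/write split usually amounts to a few lines of routing logic. A minimal sketch, assuming Postgres with psycopg2 (the connection strings here are made up):

```python
import itertools

import psycopg2  # assumed driver; any DB-API library routes the same way

# Hypothetical connection strings: one primary, N read replicas.
PRIMARY_DSN = "host=db-primary dbname=app"
REPLICA_DSNS = ["host=db-replica-1 dbname=app", "host=db-replica-2 dbname=app"]

_replicas = itertools.cycle(REPLICA_DSNS)

def get_connection(is_write: bool):
    """Route writes to the primary; spread reads round-robin across replicas."""
    return psycopg2.connect(PRIMARY_DSN if is_write else next(_replicas))
```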
This horizontal scale-out is not a distributed system, since the authoritative state (the "source of truth") still lives on one machine.
So I think a better phrasing of #1 would be "When your write patterns exceed the computing capacity of any one machine".
But just because your storage layer is abstracted behind an API doesn't mean eventual consistency isn't your concern.
Take S3 for example.
LIST calls lag behind deletes but exists() doesn't, so exists() can return 404 for a key that still shows up in LIST results. On top of that, every exists() call goes through a caching load balancer, so two consecutive exists() calls can give you different answers.
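To make that window concrete, here's a rough sketch with boto3 (the bucket and key names are made up); a HEAD request is the usual way an exists() check gets implemented:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "docs/report.pdf"  # hypothetical names

def exists(bucket, key):
    """HEAD the object; a 404 means 'not found' (possibly a stale answer)."""
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return False
        raise

# Right after a delete, these two views can disagree:
listed = any(obj["Key"] == KEY
             for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix="docs/")
                          .get("Contents", []))
print(listed, exists(BUCKET, KEY))  # can print "True False" during the lag window
```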
That's just a small subset. There's a good comment here on S3's consistency semantics.
Point is, 12FA (twelve-factor apps) makes things easier, but any time you have more than one computer involved (i.e., always), consistency becomes complicated.
Read replicas, though, don't have much to do with that; a read replica is essentially a point-in-time snapshot of the DB that gets replaced with a newer snapshot every few milliseconds as WAL events stream in. That isn't "eventually consistent"; it's just "inconsistent." You're always looking at the past when you're talking to a read replica.
If you need to update values using a client-side calculation, you'll need to do all of that in a single transaction to get ACID guarantees. Since a transaction can only run against one node, and since this transaction contains a write, the whole transaction (its reads included) is forced to happen against a master node. A read replica just isn't involved in the writing "story."
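A minimal sketch of that read-modify-write shape, assuming Postgres via psycopg2 (the accounts table is hypothetical); both the SELECT and the UPDATE have to run on the master, because a replica could hand you a stale balance and can't take the write anyway:

```python
import psycopg2

def debit(conn, account_id, amount):
    """conn must point at the primary; the read and write share one transaction."""
    with conn:  # commits on success, rolls back on exception
        with conn.cursor() as cur:
            # FOR UPDATE locks the row so no concurrent transaction can
            # read-modify-write it underneath us.
            cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE",
                        (account_id,))
            (balance,) = cur.fetchone()
            if balance < amount:
                raise ValueError("insufficient funds")
            cur.execute("UPDATE accounts SET balance = %s WHERE id = %s",
                        (balance - amount, account_id))
```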
If you use synchronous replication, you now require at least two systems to commit a write, so you've increased the chance that your commit will fail. Maybe that's the right choice, but it's a tradeoff, and the decision ultimately rests on your business requirements.
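For what it's worth, in Postgres that tradeoff is literally a config setting on the primary (the standby names here are made up):

```
# postgresql.conf on the primary (standby names are hypothetical)
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (standby_a, standby_b)'
# Commits now block until at least one listed standby confirms the WAL flush;
# if no standby is reachable, commits hang instead of succeeding.
```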
My point is simply that there is no magic bullet for ensuring that all relevant systems give the same answer to the same question at the same time.
Modern computers are not only CPU-bound; they are also interrupt-bound, and sadly very limited when it comes to reliable timestamps. And controlling which clock you actually use is hellish.
And as if the limited availability of a monotonic raw clock were not enough, I feel bad that in 2017 we still rely on the 8254 programmable interval timer.
Documentation on getting a reliable clock on any OS is close to zero.
Without reliable clocks and reliable synchronisation, we are in any case unable to build "good" distributed systems at low cost (yes, GPS is a solution, but one that can easily be hijacked).
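The one thing application code can do is at least pick the right clock for the job. A quick Python illustration (do_work is just a stand-in):

```python
import time

def do_work():
    time.sleep(0.1)  # stand-in for real work

# time.time() follows the wall clock: NTP slews and steps can move it
# backwards, so naive "end - start" durations can come out negative.
# time.monotonic() never goes backwards, but its absolute value is
# meaningless and only comparable within one machine (and one boot).
start_wall, start_mono = time.time(), time.monotonic()
do_work()
elapsed_wall = time.time() - start_wall        # unsafe: clock may have jumped
elapsed_mono = time.monotonic() - start_mono   # safe, but local-only
print(elapsed_wall, elapsed_mono)
```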
x86 won because of legacy support, but that same legacy support means we are stuck with outdated components.
I'm sure some apps need to be distributed systems, but I bet it's a tiny minority.
If anything, considering the fat client trend we're in, I'd go the other direction and claim a single web server backend with multiple clients (browsers) is a distributed system.
I think this is a key idea, though my experience with it only comes from writing game servers. Things are much simpler when there is a single "source of truth" per user. In my system, there is an "active" state maintained in an "instance" where the user is rapidly updated in memory, and a "persisted" state stored to the database when a user joins an instance. This keeps the frequency of database writes down. (And enables many users to be able to interact with each other in realtime.)
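In case it helps anyone, here's a stripped-down sketch of that pattern (the db.load/db.save interface is made up): memory is the source of truth while the user is active, and the database only sees occasional snapshots.

```python
import time

class Instance:
    """One authoritative in-memory "source of truth" per connected user."""
    PERSIST_INTERVAL = 30.0  # seconds between DB writes (made-up figure)

    def __init__(self, db):
        self.db = db
        self.active = {}        # user_id -> mutable in-memory state
        self.last_saved = {}    # user_id -> time of last DB write

    def join(self, user_id):
        # Load the persisted snapshot once; from here on, memory is truth.
        self.active[user_id] = self.db.load(user_id)
        self.last_saved[user_id] = time.monotonic()

    def update(self, user_id, **changes):
        # Hot path: mutate memory at game-tick frequency, no DB involved.
        self.active[user_id].update(changes)
        # Cold path: flush a snapshot to the DB only occasionally.
        if time.monotonic() - self.last_saved[user_id] > self.PERSIST_INTERVAL:
            self.db.save(user_id, self.active[user_id])
            self.last_saved[user_id] = time.monotonic()
```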
The tl;dr is that you analyze successful system outcomes and inject failures along those paths to see if they subsequently fail. If they do, you've found a bug. It's the next generation of Chaos Monkey.
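A toy sketch of that loop (the trace_request/run_with_fault harness functions are entirely hypothetical; the real tooling is far more involved):

```python
def find_bugs(request, trace_request, run_with_fault):
    """trace_request(req) -> (succeeded, calls_made) records the downstream
    calls a successful run depends on; run_with_fault(req, fail=call) re-runs
    the request with one of those calls forced to fail."""
    ok, calls = trace_request(request)
    assert ok, "start from a request that succeeds"
    return [call for call in calls
            if not run_with_fault(request, fail=call)]  # one injected failure broke it
```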
Although I've only worked at Amazon for about a year, I've learned that you should always consider building siloed/regionalized applications; if you don't, expect major headaches when the service needs to be deployed in multiple environments.
If you're interested in joining a professional society for computer scientists, software engineers, and electrical engineers that doesn't resort to dark patterns, I'd recommend checking out IEEE. I don't get spam from them and they respected my decision to cancel my membership. I generally find their digital library to be of higher quality as well.
My two cents. YMMV.
Yes, they do send a lot of material I don't care for. But if you are a student, it is definitely cheap and worthwhile to join. As a professional, you have to make your own call; I pay for it because I feel the money goes to conferences, student subsidies, etc. (not really charity, but I feel like I am paying it forward). You get discounts at conferences, but my employer pays for those anyway.
I did find it difficult to unregister from ACM SIGs (Special Interest Groups); I had to press through several screens to get to one that let me unselect them. That was obnoxious, but it only resulted in my being a SIGGRAPH member one year longer than I wanted and being out $20 or so. The material made for interesting reads, but it was no longer an active pursuit or a career-applicable topic for me.
My personal ideal system design is one that you can pick up and drop onto a single machine, a LAN, a cloud network, or geographically dispersed colocated datacenters, without relying on third-party service providers. If you go from a start-up to a billion-dollar company, you will eventually have offices with their own labs, dev, QA, middleware, and ops teams, plus datacenters and production facilities, and your hardware and software service providers will run the gamut. If you can abstract the individual components of your system so that dependencies can be replaced live, without changes to any other part of the system, you have the start of a decent design.
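A minimal sketch of what I mean by that abstraction, in Python (all names are illustrative): the rest of the system programs against the interface, and backends can be swapped without touching any caller.

```python
from typing import Protocol

class BlobStore(Protocol):
    """The only contract the rest of the system ever sees."""
    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...

# One concrete backend per environment: local disk for a laptop or lab,
# S3 or similar for the cloud. Swapping one for another touches no caller.
class DiskStore:
    def __init__(self, root: str):
        self.root = root

    def get(self, key: str) -> bytes:
        with open(f"{self.root}/{key}", "rb") as f:
            return f.read()

    def put(self, key: str, data: bytes) -> None:
        with open(f"{self.root}/{key}", "wb") as f:
            f.write(data)
```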
However, nobody I've ever worked for designed their system this way initially, and they made millions to billions of dollars, so there certainly is no requirement that you have a perfect distributed system design for your emoji app start-up.
What makes him a tech "bro", exactly?