The architecture is, in general, 'fine'. But the communication paths between subsystems are probably the easiest part of the problem. And re-organizing the architecture of a system is usually possible if, and only if, the underlying data model is sane.
The more important questions are:
- What is the convention for addressing assets and entities? Is it consistent, and is it useful for informing both security and data routing?
- What is the security policy for any specific entity in your system? How can it be modified? How long does it take to propagate that change? How centralized is the authentication?
- How can information created from failed events be properly garbage collected?
- How can you independently audit consistency between all independent subsystems?
- If a piece of "data" is found, how complex is it to find the origin of this data?
- What is the policy/system for enforcing that subsystems have only a very narrow capability to mutate information?
If you get these questions answered correctly (amongst others not on the tip of my tongue), you can grow your architecture from a monolith to anything you want.
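To make the first question concrete, here is a minimal sketch of one possible addressing convention. The `tenant/service/kind/entity_id` layout and the names `Address`, `route` and `in_scope` are all made up for illustration, not taken from any real system. The point is only that a single consistent address string can drive both routing and a security check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Hypothetical address of the form tenant/service/kind/entity_id."""
    tenant: str
    service: str
    kind: str
    entity_id: str

    @classmethod
    def parse(cls, text: str) -> "Address":
        parts = text.split("/")
        # Reject anything that is not exactly four non-empty segments.
        if len(parts) != 4 or not all(parts):
            raise ValueError(f"malformed address: {text!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return "/".join((self.tenant, self.service, self.kind, self.entity_id))


def route(addr: Address) -> str:
    """Routing: only the service segment decides where a message goes."""
    return addr.service


def in_scope(addr: Address, scope: str) -> bool:
    """Security: a scope is a path prefix, compared segment by segment."""
    scope_parts = scope.split("/")
    addr_parts = str(addr).split("/")
    return addr_parts[:len(scope_parts)] == scope_parts


addr = Address.parse("acme/billing/invoice/42")
print(route(addr))                     # billing
print(in_scope(addr, "acme/billing"))  # True
print(in_scope(addr, "acme/bill"))     # False: whole segments, not substrings
```

Comparing whole segments rather than raw string prefixes is a deliberate choice in this sketch: it keeps a scope such as `acme/bill` from accidentally covering `acme/billing`.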
I can answer the above for systems I've built, but I've spent quite a bit of time with those systems. How do I get better at doing this during the planning phases, or even better, for a system I'm unfamiliar with (i.e., are there tools you lean on here)?
But in general, the cliché "Great artists steal" applies here. If AWS/GCE/Azure (or any other major software vendor) is offering a service or a feature, then it is almost certainly solving a problem somebody has. If you don't understand what problem is being solved, then you cannot possibly account for that problem in your design. Today, these features are documented with unprecedented accuracy. Read the manuals, and try to reverse engineer in your head how you would build them.
For example: AWS's IAM roles seem like a problem that could be solved by far more trivial solutions. Just put permissions in a DB and query it when a user wants to do something. Why do we need URNs for users, resources, services, operations, etc.? And why do those URNs need to map to URIs? Well, if you look at the problem, it ends up being a big graph, which is in the general case immutable, over namespaced assets. So reverse engineer that: how would you build it?
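As a starting point for that exercise, here is a toy sketch of "a graph over namespaced assets". It is not how AWS IAM works internally. The names `Grant`, `MEMBER_OF`, `GRANTS`, `principals_of` and `is_allowed`, the identifier formats, and the use of shell-style wildcards are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Grant(NamedTuple):
    """An immutable edge: principal may perform action on matching resources."""
    principal: str
    action: str
    resource: str


# Hypothetical membership edges: a user or role inherits from other roles.
MEMBER_OF = {
    "user/alice": {"role/billing-reader"},
    "role/billing-reader": {"role/viewer"},
}

# Hypothetical grants; a frozenset of tuples keeps the policy itself immutable.
GRANTS = frozenset({
    Grant("role/billing-reader", "invoice:read", "acme/billing/invoice/*"),
    Grant("role/viewer", "*:list", "acme/*"),
})


def principals_of(start: str) -> set:
    """Return the start principal plus everything reachable via MEMBER_OF."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(MEMBER_OF.get(node, ()))
    return seen


def is_allowed(principal: str, action: str, resource: str) -> bool:
    """Allow if any grant held by a reachable principal matches the request."""
    reachable = principals_of(principal)
    return any(
        g.principal in reachable
        and fnmatchcase(action, g.action)
        and fnmatchcase(resource, g.resource)
        for g in GRANTS
    )


print(is_allowed("user/alice", "invoice:read", "acme/billing/invoice/42"))   # True
print(is_allowed("user/alice", "invoice:write", "acme/billing/invoice/42"))  # False
```

Even this toy version hints at why consistent names for principals, actions and resources matter: wildcard grants only work if every asset sits in a predictable namespace.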
I agree with you about reverse engineering the giants; it is one way of acquiring knowledge.
However, I disagree on:
> If AWS/GCE/Azure (or any other major software vendor) is offering a service or a feature, then it is almost certainly solving a problem somebody has.
AWS/GCE/Azure have industrialized the process of offering new building blocks. The cost for them to launch and maintain a new service is lower than it was a few years ago. They can logically afford to experiment more with users, and eventually shut down services that meet no actual need (or that overlap with another service they offer). This is especially true for Google.
My intuition is that it also works as a marketing process: the more time you spend reading their documentation, the more you accept their brand, and the more likely you are, statistically, to buy something from them.
I suspect you have a miscalibrated notion of "usually". "Usually", as in, for the majority of systems designed and in use in the world, a well-tuned, reliable RDBMS will be able to do this absolutely fine. The scale of systems that the world needs versus the quantity of them is an extremely long-tailed curve.