All engineering advice comes with the implicit caveat "subject to your local concerns".
Yes, of course you may be working in a startup that requires 25 GPUs to serve even one customer. No sarcasm. I can imagine some startups that might meet that requirement.
But there are an awful lot of startups that start massively overengineering their footprint early, when all they need is a web server with a cold or hot spare and a database server with a good backup story and some mechanism for quickly bringing up a new one if necessary. For the web server, use a load balancer with automatic failover if your cloud offers one; if not, I wouldn't stress, since automatic failover on prototype-level application code can cause its own issues. For the database, this generally leads you to some sort of clustering or a hot replicated spare, because it doesn't take long before your database requires hours to rebuild from a backup.
You're often better served just giving occasional thought to how you might split things up in the future and using that to at least hint your design than actually splitting things up immediately.
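To make "hint your design" a bit more concrete, here is a minimal sketch of what I mean, assuming a Python codebase. Every name in it (`UserStore`, `InProcessUserStore`, `change_email`) is made up for illustration. The idea is just that the application code talks to a narrow seam, so the data could move behind a separate service later without you having to build that service today.

```python
from typing import Dict, Optional, Protocol


class UserStore(Protocol):
    """Hypothetical seam: the only way the rest of the app touches user data."""

    def get_email(self, user_id: int) -> Optional[str]: ...

    def set_email(self, user_id: int, email: str) -> None: ...


class InProcessUserStore:
    """Today's implementation: a dict standing in for the single database server."""

    def __init__(self) -> None:
        self._emails: Dict[int, str] = {}

    def get_email(self, user_id: int) -> Optional[str]:
        return self._emails.get(user_id)

    def set_email(self, user_id: int, email: str) -> None:
        self._emails[user_id] = email


def change_email(store: UserStore, user_id: int, email: str) -> bool:
    """Application code depends on the seam, not on where the data lives."""
    if "@" not in email:
        return False
    store.set_email(user_id, email)
    return True
```

If the day ever comes when user data needs its own service, you write a second class with the same two methods and `change_email` doesn't change. Until then it's one process and one database, which is the point.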