While the application is still very profitable, the cost is ~10x what it would be on traditional servers. I'm not going to argue the pros and cons, just providing some numbers. At this point, we're working to optimize the original stack in order to reduce costs.
I will say though...honestly, working with Serverless and the AWS stack is a very pleasant experience.
Can you talk about the cost of your development time (one of the premises of serverless is increased developer productivity)?
Again, thanks for the real-world numbers--always nice to see them.
Development time was roughly the same: frontend work was identical to a standard enterprise-grade SPA, and the backend is all Lambda functions, which would've been implemented similarly on, for example, NodeJS and Express (which is what was done anyway for testing).
The primary cost (~60%) is due to the high volume of API Gateway requests. (Edit: see below comment for reason and plans to optimize)
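To get a rough feel for why request volume dominates, here's a back-of-the-envelope sketch. The monthly volume is made up for illustration; $3.50 per million is the published first-tier REST API request price in us-east-1 (check current pricing for your region and tier):

```python
# Back-of-the-envelope API Gateway request cost. All numbers are
# illustrative: the volume is hypothetical, and the per-million price
# is the assumed first-tier REST API rate in us-east-1.

PRICE_PER_MILLION = 3.50        # USD per million requests (assumed tier)
MONTHLY_REQUESTS = 500_000_000  # hypothetical event volume

def api_gateway_cost(requests, price_per_million=PRICE_PER_MILLION):
    """Monthly request cost in USD."""
    return requests / 1_000_000 * price_per_million

print(api_gateway_cost(MONTHLY_REQUESTS))  # → 1750.0
```

At that kind of volume the request charge alone dwarfs the Lambda compute behind it, which is why moving collection off API Gateway is the obvious lever.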
A HUGE time saving comes from being able to orchestrate AWS infrastructure easily via API, and it's well worth the premium in most cases. Our frontend application performs many automated management tasks on AWS resources such as S3, CloudFront, Lambda, Route53, etc. This of course does not directly relate to "Serverless," as it may be done by any application able to make API calls to AWS; however, just the fact that it's possible for these resources is very appealing to a multi-hat developer with limited time on a side project.
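As one concrete flavor of this kind of orchestration, here's a minimal sketch of building a CloudFront cache invalidation request. This is not the author's actual tooling; the function name and distribution ID are hypothetical, and only the payload construction is shown (the real call goes through boto3):

```python
import time

def build_invalidation(paths, distribution_id="E2EXAMPLE", caller_ref=None):
    """Build the kwargs for boto3's cloudfront create_invalidation call.

    distribution_id is a hypothetical placeholder; paths are CloudFront
    path patterns like "/index.html" or "/assets/*".
    """
    return {
        "DistributionId": distribution_id,
        "InvalidationBatch": {
            "Paths": {"Quantity": len(paths), "Items": list(paths)},
            # CallerReference must be unique per request; a timestamp works.
            "CallerReference": caller_ref or str(time.time_ns()),
        },
    }

# With boto3 installed and credentials configured, the actual call would be:
#   boto3.client("cloudfront").create_invalidation(**build_invalidation(["/index.html"]))
```

The same pattern (build a payload, hand it to the SDK) covers S3 object management, Lambda deploys, Route53 record updates, and so on.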
The GraphQL API is backed by DynamoDB and is served via API Gateway. The frontend was built using ReactJS and Relay. I am currently seeing anywhere from 200 to 450ms per GraphQL request on the frontend, which is more than acceptable as it's just a management frontend application. Of course it could be optimized using techniques such as prerendering, caching, etc., but squeezing these applications for every millisecond is not a requirement.
I can confirm a couple of things. It seems like we spend a bit more time getting things to work as expected.
Troubleshooting can get expensive, given that the underlying system is ephemeral and disappears out from under you.
Telemetry on the Lambdas needs to improve. Finding what's eating time gets tough when you're looking to optimize. (There's a lot baked into this statement, but getting 800ms of compute down to 400ms is sometimes important whilst staying on Lambda.)
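Absent better built-in telemetry, one workaround is hand-rolled per-section timing inside the handler, since the platform only reports total duration out of the box. A minimal sketch (the section names and handler body are hypothetical stand-ins):

```python
# Per-section wall-clock timing inside a Lambda-style handler, so the
# breakdown lands in CloudWatch Logs instead of just the total duration.
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(section):
    """Accumulate wall-clock time for a named section of the handler."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[section] = timings.get(section, 0.0) + time.perf_counter() - start

def handler(event, context=None):
    with timed("parse"):
        payload = dict(event)      # stand-in for real request parsing
    with timed("db_write"):
        time.sleep(0.01)           # stand-in for a DynamoDB call
    # Print the breakdown; in Lambda this ends up in CloudWatch Logs.
    print({k: round(v * 1000, 1) for k, v in timings.items()}, "ms")
    return payload
```

It's crude, but it answers "which section is eating the 800ms" without pulling in a tracing product.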
We're currently working on migrating the event collectors to a pure CloudFront-based solution (inspired by SnowPlow), where events will be submitted to CloudFront via signed GET requests and the CF logs will be streamed into the same Lambda-based processing pipeline. Doing this will eliminate almost all of the API Gateway overhead (API Gateway is still required for certain events that cannot be collected via HTTP GET).
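In case it helps picture the scheme: event fields get URL-encoded into the query string of a request for a CloudFront-hosted pixel, CloudFront's access logs capture the query string, and the log-processing Lambda verifies a signature before accepting each event. The endpoint, parameter names, and HMAC signing scheme below are all my assumptions, not the author's actual design:

```python
# Sketch of SnowPlow-style event collection over signed GETs. The pixel URL,
# shared secret, and signature format are hypothetical placeholders.
import hmac
import hashlib
from urllib.parse import urlencode

SECRET = b"hypothetical-shared-secret"
COLLECTOR = "https://d111111abcdef8.cloudfront.net/i"  # hypothetical pixel URL

def event_url(event: dict) -> str:
    """Encode an event into a signed GET URL for the collector pixel."""
    query = urlencode(sorted(event.items()))
    sig = hmac.new(SECRET, query.encode(), hashlib.sha256).hexdigest()
    return f"{COLLECTOR}?{query}&sig={sig}"

def verify(query: str, sig: str) -> bool:
    """Run by the log-processing Lambda over each CloudFront log line."""
    expected = hmac.new(SECRET, query.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Since CloudFront logs the full query string, the pipeline recovers the event from the log line alone; the GET itself just returns the pixel, so no API Gateway (or any compute) sits in the request path.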