I'm the tech lead for GCE. I'm sorry to hear that App Engine didn't work out for the poster. Perhaps someone from that team may have some suggestions. In addition, I'm happy that he was able to stay with the Google Cloud Platform.
With that said, I'd really like to encourage the OP to store his database on (and boot his instance from) persistent disk. Running any database on scratch disk (without replication) is probably not a good idea. Even with hourly backups (make sure you are testing restores!) you still stand to lose up to an hour of data, plus the pain of doing the restore, if your instance fails.
In addition, when using PD for all block storage you can start with a smaller instance. If you need more horsepower, you can terminate that instance and boot a larger one with minimal downtime.
Can you give your colleagues on the App Engine team the number of the Dell sales rep you buy your servers from? A quick comparison of App Engine vs. Compute Engine prices shows that App Engine is at best 10x more expensive per unit of RAM.
I was hoping that eventually App Engine would be implemented on top of GCE, with all the services App Engine now provides available on GCE, and that it would move to a container-based sandbox like OpenShift/Heroku/Elastic Beanstalk instead of an application-based sandbox, especially now that App Engine has switched to instance-based pricing.
Unfortunately, since all of this effort went into securing yet another runtime (PHP), I don’t think it will happen.
It would be great if it were at least possible to use Appscale on top of GCE, or just the Datastore (by far the most attractive feature of App Engine, other than Task Queue) with a custom deployment solution.
Google Cloud Datastore could deliver just that if you don’t mind paying twice:
> You should be aware that Cloud Datastore has a serving component that runs on Google App Engine, so there will be instance hour costs.
> Finally, App Engine has a hard 30 second limit on frontend requests. For the most part, this was fine. But certain requests started to take longer than 30 seconds, particularly when empires started getting larger
This is a very sensible limit. As your game is growing, more and more of these requests will pile up and will take resources away from quicker requests, making the game slower for everybody and crashing your architecture.
Adding limits like this is one of the very basic things you do when you need to scale. Changing platforms in order to get rid of limits like this will only mean that you'll have to re-add them at some point.
The correct fix for not running into such limits is to fix the code, not remove the limits.
If your requests need more than 30s, something is wrong. Most apps strive for <200ms responses. If you need a long time, rearchitect it. Run a background thread. Poll for completion or email the user when complete. Again, annoying, but your app will be better for it.
Saying that a legacy product sucks, but that you still need it to run is not an effective mindset. These platforms have constraints. Work within them and the app will scale and perform very well.
That is a very narrow-minded attitude. Not all apps have to scale and not all parts of an app have to scale. It may not make sense or be worthwhile to spend the effort to make a long running request shorter if it's used infrequently or by a small number of users or if the system as a whole does not require a high level of concurrency. Not every company is Google.
No, when using a REST protocol it is not desirable to have long requests. The correct way to do something that could potentially take a long time is to queue it for background processing and then have the client poll the server until it's complete, or push the result to the client when done.
Keeping a connection open for 30+ seconds is bad practice, and App Engine is a shared resource. Arbitrary limits are sometimes bad, but in this case the limit deters poorly designed APIs from starting in the first place.
Out of curiosity, can you describe a situation where 30+ second requests on a large-scale, high-traffic system is acceptable (in the sense that it is better to hold the connection waiting rather than erroring out)?
EDIT: long-polling aside, and even then, 30 seconds is a perfectly acceptable limit.
I have a game running on GAE (www.runesketch.com); our initial design was really inefficient. Using Khan Academy's mini profiler (https://github.com/kamens/gae_mini_profiler) we slashed our gets, puts and general overheads by two orders of magnitude. Our main cost, running a live game, we managed to completely remove gets and puts and run in memcache. We hardly scratch our daily usage now.
One overhead I did not realise was so costly was using channels. We found them about 3x slower than a get or put. By parallelising those sends on another thread we reduced our latency hugely. Our game went from taking 8 seconds to service a move to 300ms.
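The trick described above, getting the slow send off the request path, looks roughly like this. It's a generic sketch, not the Channel API itself: `slow_send`, `sender_loop`, and `handle_move` are hypothetical stand-ins, with `slow_send` playing the part of the ~3x-slower channel call.

```python
import queue
import threading
import time

sent = []

# Hypothetical stand-in for a slow channel send; in the real app this
# would be the Channel API call that was ~3x slower than a get or put.
def slow_send(message):
    time.sleep(0.01)
    sent.append(message)

send_queue = queue.Queue()

def sender_loop():
    """Drain the queue on a background thread so handlers never block."""
    while True:
        msg = send_queue.get()
        if msg is None:          # sentinel to shut the loop down
            break
        slow_send(msg)
        send_queue.task_done()

threading.Thread(target=sender_loop, daemon=True).start()

def handle_move(move):
    """Request handler: enqueue the notification instead of sending inline."""
    send_queue.put(move)         # returns immediately
    return "ok"
```

The handler's latency is now just the cost of a queue put; the sends still happen, just off the critical path.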
Just my experience, did you work hard with a profiler before moving all the code?
This is basically a fundamental observation that applies to nearly all service-oriented-computing: unless you've collected the data of how your system operates on a platform, you have no basis to complain about the platform, except for subjective issues. And subjective issues are boring.
> Our main cost, running a live game, we managed to completely remove gets and puts and run in memcache. We hardly scratch our daily usage now.
Using memcache to store gameplay data is not the best idea: memcache entries can be evicted at any time without warning. Using a backend instance and keeping the data in RAM is an order of magnitude faster, and thanks to the runtime environment API the service can stop gracefully.
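To make the eviction risk concrete: if gameplay state does live in memcache, it needs a durable fallback so an eviction isn't data loss. A minimal read-through/write-through sketch, with plain dicts standing in for memcache and the datastore (all names here are illustrative):

```python
# `cache` stands in for memcache (entries can vanish at any time);
# `store` stands in for a durable backing store like the datastore.
cache = {}
store = {"game:1": {"turn": 7}}

def get_game(key):
    """Serve from cache, but survive eviction by falling back to the store."""
    value = cache.get(key)
    if value is None:
        value = store.get(key)      # durable copy
        if value is not None:
            cache[key] = value      # repopulate for the next read
    return value

def put_game(key, value):
    """Write-through: durable store first, then cache."""
    store[key] = value
    cache[key] = value
```

With this shape, an eviction costs one slow read instead of a wiped game.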
I agree losing all games in progress is a pain, but our card game is meant to have about a 5-minute time investment, so it's not a huge deal that users occasionally have their games wiped. Annoying 2% of our users is sufferable at the moment.
I totally bought into GAE when it was first released - evangelized it, wrote and adapted a bunch of libraries for it and ended up implementing a dozen or so projects on it.
The price hike completely killed me. I optimized what I could, but there wasn't much to gain: because of the way GAE is architected, it already forces you into efficient design.
I switched to AWS, and now with a lot more traffic across all the apps I am still paying ~20% of what my GAE bill was. I invested a lot of time into the platform, only some of which can be applied to other platforms, and fell for the bait and switch.
I really don't like what Larry has done with squeezing profit out of each business unit; the pricing just doesn't make sense. E.g., implementing a simple spam filter for blog and forum comments using the Prediction API cost $30 per month alone for a site with thousands of visitors.
Completely killed a product that could have been central to making Google the best cloud platform for developers.
Hm. $230 a month can pay for a lot of dedicated servers. If you only need one instance for all of this, it is far cheaper to just get a dedicated server at a hosting provider with decent bandwidth and reliability and run it from there. That should come in at around $50 max. You could even do a distributed setup with two or three boxen at different geos.
I would be interested to know if he started using the traffic splitting functionality. That feature is a sure way to multiply the number of instance hours an app engine app is using. It doesn't seem right that the cost would just increase on its own the way the author described.
Can someone comment on data processing and statistics on GAE? Any good experiences? Any bad ones? I'm not an expert on GAE by any means, but I ported my app away from GAE to Django + MySQL because the app was data and statistics oriented (fantasy football) and I found myself wasting a lot of time on data processing, data cleaning, import, and export. Switching to a SQL database in this case felt like getting out of handcuffs (not that I know how that feels).
The GAE data store is really nice for persisting objects that come from users filling in HTML forms, or equivalent. One user can only type in data so fast and that kind of volume is fine on GAE. Plus you get great read scalability.
However, if you're creating data in some sort of simulation or loading it in from elsewhere, use a relational database (you can connect to external servers from your GAE app). Loading a few million rows into the GAE data store would cost you a fortune and the lack of joins or proper SQL probably means you have to get them all back out into memory again before you can do anything useful.
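A toy illustration of the point above, using sqlite3 as the relational database: the join and aggregate run inside the database, right next to the data, instead of hauling every row back into app memory the way a join-less datastore forces you to. The schema and data are made up for the example.

```python
import sqlite3

# Hypothetical schema for the example: players and their per-game scores.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scores  (player_id INTEGER, points INTEGER);
    INSERT INTO players VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO scores  VALUES (1, 10), (1, 5), (2, 7);
""")

# Join + aggregate executed by the database engine, not by app code.
rows = conn.execute("""
    SELECT p.name, SUM(s.points) AS total
    FROM players p JOIN scores s ON s.player_id = p.id
    GROUP BY p.name
    ORDER BY total DESC
""").fetchall()
```

On the GAE data store, the equivalent would mean fetching both entity kinds and doing the join and sum in Python.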
QueryTree uses the GAE data store as a central hub for user account info and settings, then shards lots of MySQL instances for actual data work, which seems to work well.
An SQL query can do just about anything, it'll do it right next to the data, and a lot of time and effort has gone into making that fast. For an app with data up to a certain size and usage, I'd totally go with SQL for all of the reasons you mentioned.
What worries me with SQL is that queries can do just about anything, and it'll do it right next to the data. That means that you want your database machine(s) to be hulking monsters, and sharding / replication gets complicated.
In my personal experience, the App Engine datastore exposes fewer and simpler operations which scale horizontally more or less perfectly. It's harder to write for initially, but it scales up incredibly smoothly.
I host a game on App Engine as well. I have about 3k DAU, for which I pay about $3 a day. I can _almost_ pay for this in ads alone. Then the premium players I convert are pure profit!
The secret of affordable App Engine is the pending latency slider. The default is CRAZY FAST. I have it set to min 5 seconds and max automatic. This is about a 10x price reduction over the default settings: it prevents new instances from spinning up until a request has been waiting for 5 seconds.
The client replicates all the game rules, so as far as the players are concerned there is no latency at all.
The game is called Neptune's Pride 2: Triton if any of you would like to have a look.
I'd stay away from GAE, personally. The lockin is not worth the benefits, though you will certainly find lots of happy GAE customers.
The big thing is that GAE is only suitable for certain types of apps, so make sure yours is one of those before even starting.
The one small, hobby app I wrote for GAE was unsuited and it was a painful process to port away. My company, with a real GAE app, has a cost threshold where we know it's cost effective to move from GAE; write heavy apps are expensive in the GAE environment. Each day we inch closer to that threshold.
At this point the headaches of using GAE are more than the headaches of not using GAE, but I won't miss it when we pull the trigger on moving away.
For getting around the limitations of the GAE task queue, you may want to consider using PiCloud (http://www.picloud.com) to handle background processing. We have a number of users who have successfully used GAE for their front-end and PiCloud for handling background jobs.
> Similarly with backups: on App Engine, data integrity is guaranteed so backups are really not required.
How would you have dealt with a data corrupting code bug? If you don't have real backups, wouldn't those screw you?
> I ended up coming up with a simple cron task that takes a full dump of the database every hour and copies it to a "permanent" storage device. In the event of catastrophic failure, I should be able to spin up a new instance, copy my image to it and get the last hour's backup copied over in fairly short order. The main issue is the fact that I sleep for ~8 hours a day and work for ~8 hours a day, meaning it could be some time between when a problem occurs and I'm aware of it.
If you are just keeping the latest backup, you are still vulnerable to data corrupting code bugs. Generally for this kind of application you'll want to keep several backups, from several different times. Say, hourly backups going back 24 hours, then daily backups going back a week, and so on. The details depend on what kind of data you are storing and how important it is to your users that you can recover from data corruption.
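A rough sketch of that kind of retention schedule, keeping every hourly backup for a day plus one backup per day for a week. The function name and the windows are just the example numbers above; a real policy would be tuned to the data.

```python
from datetime import datetime, timedelta

def backups_to_keep(backups, now):
    """Grandfather-father-son style retention (hypothetical example policy):
    keep all backups from the last 24 hours, plus the newest backup of
    each day going back a week. `backups` is a list of datetimes."""
    keep = set()
    daily_seen = set()
    for ts in sorted(backups, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(hours=24):
            keep.add(ts)                      # every hourly from the last day
        elif age <= timedelta(days=7):
            if ts.date() not in daily_seen:   # newest backup of each older day
                daily_seen.add(ts.date())
                keep.add(ts)
    return keep
```

A cron job would run this after each backup and delete whatever isn't in the returned set.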
I've been using WF for a long time for a small site. Their resource use limits can be a little stringent at times though.
If you keep running into the limits, I heartily suggest switching away from their default stacks (especially when they use Apache -- uggghh) to uWSGI, for instance if you're running a Django app (`--http-socket :99999`, or whatever public port they assign your app, will get you going easily).
EDIT: Comparing WF to GAE is slightly apples-to-oranges, though, as WF gives you a slice of a dedicated server -- GAE just runs your app without letting you touch the infrastructure, much like Heroku.