App Engine charges $6,500 to update a ListProperty on 14.1 million entities (groups.google.com)
221 points by branola on Jan 5, 2012 | 141 comments



> it cost me a few thousand dollars to delete my millions of entities from the datastore after a migration job (ikai never replied to my post though...) and I'm still paying since the deletion is not completed yet (spending $100-300 a day for the past 2 weeks now!!).

I don't know much about GAE, but a datastore-as-a-service that takes 2 weeks to delete your data and charges $300 a day to do so just seems... absurd.


In my opinion Google App Engine is a non-starter for serious applications. I only know of experiments and toys running there. Look at their gallery of successful applications:

http://code.google.com/intl/it-IT/appengine/casestudies.html

Contrast with AWS:

http://aws.amazon.com/solutions/case-studies/


There are definitely serious applications and businesses built on appengine, mostly started before their price hike coming out of Beta last November. http://www.optimizely.com (YC W10) and http://shoesofprey.com are two notable examples.


We built http://www.schoolbookings.net on Appengine. Over a thousand schools and half a million parents use it - not enormous numbers, but a very real and profitable business.

We pay under a hundred dollars a month for extremely reliable and scalable infrastructure. I'm happy that our platform is designed and supported by the best sysadmins in the world - we've had less than five minutes of degraded service (high latency but still accessible) in the last six months. I'm soooooooooo glad we didn't choose AWS!

Appengine is perfect for developer-centric startups. You can concentrate on your app instead of the infrastructure until you've got traction, and if you do hit the big time you'll be popping champagne corks instead of server cores. At that stage you'll be able to make an informed decision about hiring some kick-ass sysadmins to build your own platform, and migration really isn't as big a deal as some people make out. Python apps are WSGI compliant so will run with minimal modification on any WSGI server, and you can access all your data without difficulty. You would need to write some wrappers around API calls, but this isn't a huge deal and realistically you're going to have to do this for any platform migration.
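
To make the portability point concrete, here is a minimal sketch of a WSGI app that keeps its storage calls behind a small wrapper. The `InMemoryStore` class and its `get`/`put` methods are hypothetical names invented for this illustration, not part of any App Engine API; on GAE the wrapper would delegate to the datastore, and elsewhere to whatever database you migrate to.

```python
class InMemoryStore:
    """Hypothetical storage wrapper; swap the body for a real backend."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


store = InMemoryStore()


def application(environ, start_response):
    """Plain WSGI callable: nothing here is specific to one host."""
    # All persistence goes through the wrapper, so only the wrapper
    # has to change when the platform changes.
    store.put("hits", store.get("hits", 0) + 1)
    body = ("hits: %d\n" % store.get("hits")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Any WSGI server should be able to serve `application` as-is, for example `wsgiref.simple_server.make_server("", 8000, application)` from the standard library for local testing.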


We (Optimizely) do use App Engine and have been very happy with it. We handle hundreds of millions of requests every day and make heavy use of both GAE and custom components running on EC2. EC2 does most of the heavy lifting, but our web front end is all on App Engine and the benefits (great deployment and mgmt tools) far outweigh the cost.


We're built on appengine at Getaround.


Rescuetime (YC W08) also uses it I believe


I wouldn't call those two "serious" examples -- especially since the competitors (Amazon, Heroku, Rackspace etc) have orders of magnitude more popular, complex and profitable services running on them.

This is like saying, "Famous actors frequent my restaurant" and pointing to a picture of Ralph Macchio. OK, somewhat known, but De Niro, Brad Pitt, Clooney and co eat at the joint across the street.


Actually, I think it's great for startups. If you're a small team trying to work out what your product is, how to sell it and who is going to buy it, you have more important things to do than learn how to set up and run a server.

Once you've worked out you have a product that people want you can start thinking about managing your own servers.

People complain about migration, but it's really not that big a deal.


So instead of learning how to admin a server - a skill that is transferable anywhere and is pretty much a commodity skill for most web-based startups when they're - you know, "starting up" - you suggest people tie themselves to a proprietary app engine and datastore which limits what you can do in ways that aren't really transportable anywhere else? Spending time learning about GAE oddities, pricing limits, language limits/restrictions and such... seems a pretty horrible waste of time for people to engage in.


I've been developing on appengine for over 2 years now. Thanks to the platform, I've never been worried about traffic spikes. Whether you get profiled on TC or a celebrity tweets about your app, you know it will run. Granted, you would not learn admin skills, but you'll learn a lot of other things.

I learnt how denormalized data in the datastore can help speed things up for your app. I learnt sharding thanks to app engine. The pricing has made me use memcache more often and try to avoid hitting the datastore. When I look at it, I feel I've brought more discipline into my code because of app engine. I am now in the habit of building APIs that respond instantly. If an HTTP call takes too long, my instinct is to make it run as a background task and then return the result.

I've learnt all of these because I've been on App Engine. I can see a significant improvement in what I'm building now. The apps we build now, all our users say it's "fast and responsive". A lot of that credit goes to App Engine and the things it has taught me.

Nothing you learn in life is really a waste of time.


Great that you've learned things, but nothing you've mentioned there is specific to GAE. Many people learned about sharding and such well before GAE even existed (indeed, perhaps even before Google existed).

This is akin to saying "Rails made me a better developer", which in and of itself is not bad or wrong, but isn't really terribly useful as a datapoint when deciding which framework to choose. Frameworks by their very nature (almost always) force concepts on you that improve your code. So too with GAE - by limiting some of what you can do, they can provide a more focused service.


That's because he/she was replying to the post above that said you can't learn any transferable skills from GAE. They were just mentioning some more abstract/general things they learned from GAE.


I've been building apps professionally on GAE for the last year, and one of the reasons I'm leaving my job is because the knowledge is non-transferable and the platform is not really growing, as far as I can tell. It's a dead end career-wise.


Learning how to admin a server is a total waste of time which brings zero value to the end user. If I can hand that drudgery off to someone else, I'll be adding features and stealing your customers while you're giddy about colorizing your bash prompt.


Setting up a load balanced system and front-end web cache brings "zero value" to end users? You may as well make the same argument over design or for that matter development.

Learning how to program a loop is a total waste of time which brings zero value to the end user. If I can hand that drudgery off to someone else, I'll be adding features and stealing your customers while you're giddy about nesting your loops.

Learning how to use photoshop is a total waste of time which brings zero value to the end user. If I can hand that drudgery off to someone else, I'll be adding features and stealing your customers while you're giddy about reducing your PNGs.


And both those statements are true. If there's a third party that you can pay to do your photoshopping, and they deliver high-quality product, damn right I'm going to hand my photoshop jobs off to them rather than do it myself. Likewise if there were a third party development shop _that delivered higher quality than doing it in house_. Concentrate on your USP; for anything else, if you can get it done competently externally, hand it off.


Learning how to admin a server is a simple enough task for a small-ish company.

We set up on Rackspace Cloud in a couple of hours, and we're running just fine. We're lucky in that our core market doesn't give us traffic spikes.

If and when we run into scaling problems, we'll have a learning experience but that's the same whether you're on App Engine, Amazon or just a cheap web-host running PHP/cPanel/MySQL for £20/year.


Yeah, good luck stealing customers from someone else by developing for a third party's arbitrary stack, one that they manage and update as they see fit, that lacks tons of critical components, that often sees unreported downtime, and that isn't flexible when your business needs change.


As someone who's been building major production applications on GAE for the last three years (and on traditional stacks for the decade prior), it's obvious that you haven't even cracked the manual.

GAE is not perfect but it's a much faster environment to develop on than IaaS services like EC2. I spend 100% of my time developing features and 0% of my time developing infrastructure. "DevOps" isn't even a job description in my present company.

It is true that there are occasionally some components (eg large memory indexes) that need to be "outsourced" to other parts of the cloud. It's trivial to do, but we judge each very carefully because of the added operational load.


GAE skills are also transferable anywhere. It's by far the quickest way to set up little reliable APIs like the Stripe liaison service I built to send email alerts about transactions with secure links to the card numbers. There are only a few transactions a week, so App Engine is free.

It's also a great queuing service for running background tasks on an infrastructure that needs zero maintenance.

The cloud is the future and GAE greatly increases your power as a developer, so I'd say learning App Engine will make you a hot commodity rather than someone with unwanted skills.



Instead of learning to admin a server, and then follow up with admin for the rest of your life, you can hand that duty over to experts. Should hundreds of thousands of people be engaged in doing essentially the same thing?

If you really think appengine is shitty or overpriced, then it should be pretty easy for others to replicate and eat their lunch. The fact that no one else has managed to yet says a lot.

As an anecdote, reddit frequently has outages, and they run on aws. If they switched to gae they wouldn't; it's as simple as that.


> Instead of learning to admin a server, and then follow up with admin for the rest of your life, you can hand that duty over to experts. Should hundreds of thousands of people be engaged in doing essentially the same thing?

Yep. Everything I've ever taken up in life I'm still doing X years later. I've never ever transferred a job to someone else, outsourced a task I learned, or hired someone to do what I do. Nope. 100% of the time I'm doing everything I ever used to. Still. WTF?

I think most cell phone service is shitty and overpriced, but it doesn't mean anyone can just set up a competitor - it's extremely capital intensive to compete at that scale.

Yeah, I forgot that GAE never goes down.

http://www.crn.com/news/cloud/231001916/google-app-engine-cl...

http://groups.google.com/group/google-appengine-downtime-not...

Reddit is one of the top 1% of the 1% in terms of traffic. I suspect GAE would probably have trouble with Reddit on a global 24/7 basis. Perhaps not as much as Reddit's having themselves at times, but nothing's perfect. And I suspect Reddit's pocketbook wouldn't be able to afford the marginal benefit GAE would provide.


You're leaving out several key parts:

* reddit on GAE would have no options if they ran into a feature which is too expensive on the GAE architecture. Reddit on EC2 has been able to change their backend architecture dramatically to deal with growth and new features — that's a really powerful thing to give up since it means you aren't limited to GAE's lowest-common-denominator functionality.

* Even assuming reddit could afford to spend kilobucks per day on GAE, how many good sysadmins and developers could they hire for that same amount of money? At their scale, the answer is a LOT of people - and that gives a lot more flexibility since money going to GAE is a sunk cost which scales linearly with traffic whereas people generally give better than linear returns.


> reddit on GAE would have no options

This is not true at all. It's very easy to run a hybrid with parts of your application in other parts of the cloud. I do this. There's even a remote API stub so you have direct access to the full suite of GAE APIs from servers in EC2 or whatnot.

There are certainly limitations to what you can run inside of GAE, but there's almost always a good way to work around it while preserving the benefits of scalability, multi-datacenter reliability, etc. I'm not saying it's for everyone, but it's great for most common web applications.


Costs between aws and gae are roughly the same, and architectures when you get beyond RDBMSs are also roughly the same.

The only real difference is that gae forces you to code for scalability while aws leaves you with plenty of ways to shoot yourself in the foot, which reddit apparently does.


Reddit historically had an extremely poorly designed backend, resulting in lousy performance and thus the outages that people experienced at scale. This has little to do with AWS vs AppEngine, and more to do with suboptimal code being forced to run at extreme scales. Not the best anecdote, sorry.

I'll say they're getting better now--for a while they were a major PITA for AWS folks.


Pulse is pretty big: http://googleappengine.blogspot.com/2011/11/scaling-with-kin...

Does Heroku have any particularly high-traffic sites?


At least a half dozen at that scale, I believe.

Urban Dictionary, as an example.

Here's another one I haven't heard of before now:

http://success.heroku.com/Banjo


Banjo only has 500K users[1].

Urban Dictionary is bigger (30MM page views/month[2]), but Pulse is an order of magnitude bigger ("100Ms of requests per day" [3]). Pulse ships on a lot of Android devices by default now.

[1] http://ban.jo/blog/banjo-hits-500k-users-in-under-six-months

[2] http://success.heroku.com/urbandictionary

[3] http://googleappengine.blogspot.com/2011/11/scaling-with-kin...


Khan Academy (http://www.khanacademy.org/) runs on App Engine.


Is Khan Academy a serious application, or just really good content? I think its smartest software is all JS anyway - you could host that on GoDaddy.


I think they are serious examples. Of course once they get to a certain size the cost of running on App Engine becomes exorbitant. At that point, there are two options:

1. Migrate away from Google-hosted App Engine to your own AppScale instances.

2. Negotiate a better rate with Google.

In my experience App Engine has been fine for apps that only get a few hundred or thousand visits per day. The convenience of not worrying about the environment or deployment too much is a great enabler.

It definitely depends on the type of app. You probably wouldn't want to run anything that is particularly heavy on datastore use. Using AJAX rather than full pageloads is a good way to keep bandwidth and response time (and therefore instance hours) down. For example, one of my apps is able to pre-load the most important AJAX calls, so it can handle running with high latency and I don't exceed the free instance hours.


Frankly, having worked on a serious AppEngine application for over a year, it most certainly is not. Of course, that isn't an absolute statement--perhaps some serious applications do exist on AppEngine, and leverage it in a way that doesn't cost countless thousands in hosting and development costs. The one I work on doesn't fall into that camp. I miss managing my own infrastructure more than anything.


Sorry but are you saying that your serious AppEngine application _does_ cost countless thousands in hosting and development costs? Implying that it's more expensive than it might be on competing platforms?


Yes, it's probably an order of magnitude more expensive when accounting for the countless hours spent pre-optimizing everything, tracking down intensely opaque performance issues, dealing with the AppEngine team's tendency to introduce changes that entirely alter performance characteristics without notice, etc.

This, of course, is on top of the now rather ludicrous cost associated with maintaining a write-heavy application that must be both highly concurrent and consistent, with a large number of composite indexes and relational schemes that the datastore was simply not designed to cater to. That isn't AppEngine's fault, but it is by no means a generic hosting platform and I truly believe that their goal of creating an "infinitely scalable" and robust system is directly at odds with providing a product that is a good choice for the majority of applications.


There are quite a few businesses that are run on AppEngine that don't necessarily advertise the fact that they are running on AppEngine.

I've worked with two clients that have extremely successful websites running on top of AppEngine and are happy with it, but the costs mentioned in the article are absolutely true and something that they chose to bite the bullet on for a one time transition. I can't imagine what it would cost to get all of the data out to do a migration.


http://btbuckets.com/ runs the heavy lifting on App Engine. It's a lot of lifting. Another one I know of is http://www.enstore.com/.

Without seeing the code, it's hard to tell what's going on. I would suspect cascading ungrouped datastore puts and their index updates, but, from what's on the list, I can't even make an informed guess.


We use GAE for our app (http://www.webfilings.com/) and have had quite a bit of success with it. It really depends on your usage. For certain types of applications it makes a ton of sense and is still quite affordable.


Humble Bundle (YC W11) uses GAE as well and is definitely a profitable business, although it runs on a much smaller scale than the big AWS applications.

http://googleappengine.blogspot.com/2010/06/how-app-engine-s...


AWS has been around for longer. Appengine just moved out of beta. There are a lot of "serious" applications on App Engine. Many applications have their infra distributed over multiple providers. A combination of AWS and App Engine will become more common in the future. Tweetdeck has (had?) a backend implementation on App Engine - http://goo.gl/38nSU

The point is, calling an infra stack as great as App Engine unfit for serious applications is a gross misunderstanding of what App Engine is capable of. You should first give it a try, develop an app and then make a statement.


I am curious why Dropbox isn't in the case studies list. They seem to be the biggest success of note that is entirely built on S3.


And 3scale is listed on both.


I love GAE, but deleting stuff is really the major WTF point.

I am about to shut down one application that declined in popularity, because it costs me $20 / week to run it and revenue just dropped under $20 / week. The cost is not from instance hours, but purely from the stored data. Deleting the data from the data store would cost more than I could recoup, so that is not an option either.

Also I really feel frustrated giving hours of thought to something that should be a really simple operation. Perhaps .delete() should be free? After all, when I shut down the app, Google does delete everything for free.


My suggestion would be to turn off billing. That's what I did when the price went up on an app I had running and I didn't want to deal with deleting the data.


Well, if you have a few million entities, each needing an individual write, and each write takes 200 ms, that's your 2 weeks right there. I don't know how they are using GAE or whether there is an alternative to their methods, but I personally always saw GAE as a very specialized platform. You should really only use it if you need to tie into a Google account.
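
As a quick sanity check of that estimate, here is a small sketch of the arithmetic. The figure of 6 million entities is an assumption standing in for "a few million", and the calculation assumes writes happen strictly one after another with no parallelism.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sequential_days(entity_count, seconds_per_write=0.2):
    """Days needed to write entity_count entities one at a time."""
    return entity_count * seconds_per_write / SECONDS_PER_DAY


# Assumed workload: 6 million entities at 200 ms per write.
print(f"{sequential_days(6_000_000):.1f} days")  # prints "13.9 days"
```

Running deletes in parallel would shorten the wall-clock time but, under per-operation billing, would not change the number of billed writes.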


Even then it's not necessary. It's straightforward enough to use OAuth on any platform.

In my experience, App Engine is mainly useful for in-house infrastructure apps for companies using the Google Apps platform. That or cheapskate developers throwing together a proof of concept / toy app in their spare time.


Even toy apps can be had for free on other platforms. Dotcloud, Heroku, and AppHarbor all don't charge for the first instance.


The details have finally been posted in that thread. And while $6500 is a lot of money, we have to realize that this is a lot of writing - the headline doesn't give the full sense of it.

For those unfamiliar with GAE, a ListProperty is really a collection of properties. The author is using the property as a geohash with a significant number of values, plus he has additional multiproperty indexes defined, plus he's doing a rewrite (delete + write). All combined it appears to be ~460 writes per entity.

So what we're talking about is $6500 for 6.5 billion writes... exactly what is printed on the sales brochure. Is that a lot? Most datastores don't charge by the operation so I don't have a lot to compare it to. It seems expensive but not crazy, especially considering that the data is replicated via Paxos to 3+ datacenters with automatic load balancing and failover.
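
The numbers can be checked with a few lines of arithmetic. This sketch uses the 14.1 million entities from the headline, the ~460 writes per entity estimated above, and the $0.10 per 100k writes price quoted elsewhere in this thread; the function name is just for illustration.

```python
PRICE_PER_100K_WRITES = 0.10  # USD, the price quoted in this thread


def write_cost(entities, writes_per_entity, price_per_100k=PRICE_PER_100K_WRITES):
    """Return (total low-level writes, cost in USD)."""
    total_writes = entities * writes_per_entity
    return total_writes, total_writes / 100_000 * price_per_100k


total, usd = write_cost(14_100_000, 460)
print(f"{total:,} writes, ${usd:,.0f}")  # prints "6,486,000,000 writes, $6,486"
```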


"Geoboxing is a technique used to search for entities near a point on the earth in a database that can only perform equality queries (like App Engine)"

So their implementation is a compromise on account of GAE's limitations, and they have to pay through the nose to use it. This is when I'd be looking at hosting some features outside of GAE, which is what we do with Full-text Search.


No single platform or environment encompasses every use case. GAE doesn't offer R-trees or other spatial indexes; for that matter neither does MongoDB or MySQL.

Geohashing is a reasonable solution for some spatial problem domains; it's one solution along the spectrum of "precalculate a lot up front and make queries cheap" vs "write in a cheap & easy format but make queries more expensive". Pre-calculation strategies are usually more scalable when you have large query loads, but they suck bigtime if you need to fully recalculate a large body of data (as the original blog author is doing).

Maybe the blogger would be better off using PostGIS; but then, scaling and synchronizing a large cluster of PostGIS systems is nontrivial. The issues here are too application-specific to draw any positive or negative conclusions about appengine.


FYI, MySQL supports R-trees on MyISAM tables: http://dev.mysql.com/doc/refman/5.0/en/creating-spatial-inde...


Reading the thread (I'm curious about GAE, not an expert), it seems like the details aren't clear at all. Neither the original poster nor the Google rep seems to have a clear idea of what I/O operations are being generated.

But this bit stood out: $0.10 per 100k writes. That price seems to be far too high. The poster is doing (something like) a reindex of 10M entries (that kind of data is pretty small really: it's the kind of database you might use as a test set on your laptop interactively). Figure each modification is atomic, and that the b-tree height of the storage is ~4. So that's 40M writes to create an index, or $400!

Seriously? Again, this is the kind of task you'd expect to do quickly and interactively on your development box, and it costs a price of the same order as your day's salary (!) to execute in the cloud?

Looking at this from the perspective of the underlying I/O device: this index consumes just a tiny, tiny fraction of a hard disk drive's capacity. Yet creating it costs enough to buy the device several times over?

Something is wrong. Is that a misquote or have I misunderstood?


$0.10 per 100k writes is just a way of them billing you. Sure it's a lot compared to your average HDD price, but the data is hosted in at least 3 datacenters at all times, running Google's software stack & everything is being constantly monitored by Google's SRE team.

App Engine pricing might seem expensive if you try to do a simple table comparison with alternatives, but when you get more deeply into it you'll find that a lot of stuff that is included in the service with GAE will cost you extra when you use the alternatives.


Combine that with the fact that the High Replication Datastore will most probably never see any downtime and you tend to be ok with the cost (of course as an App Engine customer, you would always want lower costs, but the current write cost is not a deal breaker).

The only problem here is that "delete" is considered a write and when you want to delete data you just cannot accept the fact that you need to pay for something you do not want to keep. I think GAE should definitely look into this aspect and try to get some cheaper alternatives for data deletion.


One of the things that is not included is any real support, unless you go for the $500/month Premium package, which then only buys you a 2-hour turnaround with 18h/5d support. For that much money you should at least be able to get some support on the weekend.


I believe that $0.10 per 100k writes is what both Windows Azure Storage and Amazon S3 cost. But honestly, the cloud is the future!

Disclaimer: I work at Microsoft and am required by the terms of my employment to believe in "the cloud".


> I believe that $0.10 per 100k writes is what both Windows Azure Storage and Amazon S3 cost.

Correct. Of course, those are object writes -- Elastic Block Store disk I/O is 10x cheaper.


Right. The granularity to be expected with an S3 object (~= "a file") is much higher than with a database index of tree nodes of a few kilobytes each. And of course EBS is subject to caching on the host. Rolling the index into a transaction can eliminate almost all the duplicate I/O to the tree nodes.


> So that's 40M writes to create an index, or $400!

At $0.10 per 100k writes, 40M would be $40.


According to the billing page [1] writes are indeed billed $0.10 for 100k operations.

Also, reads, writes and small operations (which are the ones billed) are low level operations. An API operation actually translates into several low-level operations. And the way it is described [2] I think the poster is doing more writes to reindex 10M entries.

Considering that reindexing takes 1 write for the entity itself (existing put) + 4 writes for each element in the list property and considering that the poster has on average 18 elements in that list for each entity, then he's probably doing on average 73 writes per entity (I'm taking the "Existing Entity Put" scenario into account, otherwise for new entities it would be 2 + 2 per list element == 38 writes).

So by these numbers, that's 730,000,000 writes, or a cost of $730 -- if you go over them sequentially, only one time. But considering that he's doing manual full-text indexing, maybe he had to go over those items several times for the reindexing being done.

Maybe I'm missing something here. I don't know.

[1] http://code.google.com/appengine/docs/billing.html

[2] http://code.google.com/appengine/docs/python/datastore/entit...
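
The estimate above can be written out as a short sketch. The per-put formulas (1 + 4 per list element for an existing entity, 2 + 2 per list element for a new one) and the average of 18 list elements are taken from this comment, not independently checked against the billing docs; the function names are made up for the example.

```python
def writes_existing_put(list_elements):
    """Low-level writes for re-putting an existing entity, per the comment above."""
    return 1 + 4 * list_elements


def writes_new_put(list_elements):
    """Low-level writes for putting a brand-new entity, per the comment above."""
    return 2 + 2 * list_elements


def cost_usd(total_writes, price_per_100k=0.10):
    return total_writes / 100_000 * price_per_100k


per_entity = writes_existing_put(18)   # 73
total = 10_000_000 * per_entity        # 730,000,000
print(per_entity, writes_new_put(18), f"${cost_usd(total):,.0f}")  # prints "73 38 $730"
```

One full pass costs about $730 under these assumptions, so several passes over the same entities would multiply that figure accordingly.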


Those are "writes" from the user's perspective; they don't take into account writes due to index updates. So that's 1 write multiplied by however many indexes need updating.


AWS seems to be $0.11 per 1 million I/O requests to EBS/RDS.


A lot of these "AppEngine costs too much" notices have been coming up recently, but upon further inspection, they all tend to boil down to operator error.

Unfortunately, AppEngine isn't forgiving of that and there is a real monetary value associated with questionable engineering design. Or, design that wasn't thought through enough in the context of a service like AppEngine.

This leads to a few people getting upset and making a lot of noise when the reality is that AppEngine is actually an amazing service.

So, to boil down the operator error from a quote in the thread:

"We're running a mapreduce to change the geobox sizes/precision for a large number of entities."

That is the real source of the problem. Instead of using geoboxes, they should be using geohashes, which allow arbitrary precision.

http://code.google.com/apis/maps/articles/geospatial.html http://en.wikipedia.org/wiki/Geohash

Instead of an indexed property that looks like this (what they currently have):

[u'37.3411|-121.8940|37.3395|-121.8926', u'37.3411|-121.8929|37.3395|-121.8916', ...]

They would have an indexed List<String> property that looks like this:

[8, 8f, 8f1, 8f12, 8f12a, 8f12ac, 8f12ac6, 8f12ac60, 8f12ac605, 8f12ac605f, 8f12ac605fb, 8f12ac605fb3, 8f12ac605fb34]

Finding if the location is in a box would be computing the hash from the lat/lng (there is free code out there to do that) and then doing an indexed 'in' query. The indexes would only need to be updated if the location of the entity changes, not when they want varying levels of precision.
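
Here is a minimal sketch of the prefix-list idea, assuming the standard geohash encoding (base-32 alphabet without a, i, l and o, with bits alternating between longitude and latitude). The function names are made up for this example, and a real app would probably use an existing geohash library instead.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lng, precision=12):
    """Encode a lat/lng pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits = (bits << 1) | 1
                lng_lo = mid
            else:
                bits <<= 1
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_prefixes(lat, lng, precision=12):
    """All prefixes of the geohash: the values stored in the indexed list property."""
    full = geohash_encode(lat, lng, precision)
    return [full[:i] for i in range(1, len(full) + 1)]


# An entity matches a cell when the cell's hash is one of its stored prefixes.
stored = geohash_prefixes(57.64911, 10.40744, precision=4)
print(stored)           # prints ['u', 'u4', 'u4p', 'u4pr']
print("u4p" in stored)  # prints True
```

Because every precision level is stored up front, switching the query to a coarser or finer cell only changes the string you look up, not the stored data.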


This is incorrect for several reasons.

First off, they mention that when the initial design decision was made, a similar operation cost ~$160, which is tenable for an operation that only happens once in a while. This is in fact a case of them getting bitten by the pricing structure changing after a reasonable design decision (at the time) was implemented.

Secondly, they mention that this is part of a larger issue: "In our most common case we might have to add and delete a couple items to the list property every once in a while. That would still cost us well over $1,000 each time. Most of the reasons for this type of data in our product is to compensate for the fact that there isn't full text search yet. I know they are beta testing full text, but I'm still worried that that also might be too expensive per write."

This is a real problem that GAE needs to solve.

Finally, their problem doesn't seem to be that they need arbitrary precision; it's that they seem to need fast location-centric queries of a large database.

Geoboxes allow you to solve this problem correctly (and quickly), returning the results in the database that are closest to you. Matching on a geohash can end up serving the incorrect data unless you resort to hacks involving a number of queries.


1) You are making an assumption that the original design decision was made around cost. I bet the more likely fact is that they made the change and found out that it cost them $160. Remember, that was when AppEngine was a _beta_ product and it seems they got lucky the first time.

2) They seem to have an extreme use case. No one is going to argue that maybe AppEngine doesn't fit the bill for them. Or, one could argue that doing 6.5 billion writes times a large number of customers, across multiple datacenters is something that a lot of databases would choke on.

3) Running more queries, while admittedly hacky, is less expensive than doing more writes.


We wrote some App Engine code to do geospatial lookups.

Before anyone gets the idea that the links in the parent are worth trying, they're not: the performance is absolutely atrocious – on our data set, 13 seconds per lookup.

We wrote our own approach on App Engine and now get stable performance on our datasets at ~300ms per lookup.

(We're doing Foursquare-type lookups.)


I can't imagine how an indexed query on a List<String> of ~10 items takes 13 seconds.

'We wrote our own approach on App Engine'

Hmm... details?


You can find the code we wrote here, along with the testing infrastructure:

https://github.com/erichocean/appengine-geospatial-lookups

Like Foursquare, mobile clients send their current location, plus a search radius, and App Engine code returns a result set ordered by distance.

We were getting query times with realistic data set sizes in the ~13 second range from the code you linked to. With the code above, we're ~40x faster. YMMV.
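
For readers unfamiliar with this kind of lookup, the interface described above (location plus radius in, results ordered by distance out) can be sketched as below. This is not the approach used in the linked repository, which is not described here; it is a brute-force illustration using the haversine great-circle formula, and all names and sample coordinates are made up for the example. A production version would first narrow the candidates with an index rather than scanning every place.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two lat/lng points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(places, lat, lng, radius_km):
    """Return names of places within radius_km, nearest first.

    places is an iterable of (name, lat, lng) tuples.
    """
    hits = []
    for name, plat, plng in places:
        d = haversine_km(lat, lng, plat, plng)
        if d <= radius_km:
            hits.append((d, name))
    hits.sort()
    return [name for _, name in hits]


# Made-up sample data for illustration only.
places = [
    ("far", 38.0, -122.0),
    ("near", 37.3400, -121.8930),
    ("mid", 37.3600, -121.9000),
]
print(nearby(places, 37.3411, -121.8940, radius_km=5))  # prints ['near', 'mid']
```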


You should write this up somewhere and post it to HN and the App Engine Reddit. Getting geospatial right is hard, and it sounds like you've done a good job.


That's awesome. How does it work, basically?


Yes, blame the operators. The operators designed their apps around the original pricing incentives published by Google, then had the rug pulled out from under them when Google hiked prices, completely changing those incentives and invalidating their designs.

I got burnt by AppEng, too. Picking AppEng as a platform is one of my worst technical decisions.


http://googleblog.blogspot.com/2008/04/developers-start-your...

"Google App Engine is free to use during the preview release, but the amount of computing resources any app can use is limited. In the future, developers will be able to purchase additional computing resources as needed, but Google App Engine will always be free to get started."

You got 2+ years of use for 'free' and now they decided to turn it into a supported business model and are asking you to pay for what you use. Seems reasonable to me.

I don't disagree that they fubar'd their original pricing release announcement and should have had multithreaded Python 2.7 ready for those folks.

But they did listen to the (loud) feedback, made adjustments and even apologized (were you there at the ThirstyBear meetup where they bought us all beers?).

Not having to hire an IT staff or be woken up in the middle of the night when AWS decides to reboot the host and your servers go down is worth its weight in gold.


Yes, stupid of me to trust the marketing speak in that blog post. I've already said it was one of my worst technical decisions. I advocated AppEng to clients, and I looked real stupid when the projects failed because of the operating cost hike.

Software is architected and designed with constraints in mind. Features are feasible or infeasible because of these constraints. Cost is one of the big constraints. With the new cost structure, the apps have to be re-architected and redesigned, and lots of things aren't possible.

I didn't go to the ThirstyBear meetup because I have given up on AppEng and moved on. I simply do not trust them as a platform vendor.


If you correctly architected your application around the constraints of AppEngine, then it wouldn't be costing you a lot of money. That is the whole point... you know the constraints, so you architect wisely and you spend money where you need to. You decide which indexes you need and which ones you don't need. You decide how many reads and writes you are going to make. You figure this stuff out in advance and if you make a mistake, you refactor your code to improve things.

I'm not saying it is easy, it isn't. It takes a skilled engineer to learn this stuff and make it work. But, when it does work, it really does work.

It sounds like you failed to create an application that took all of that into account and of course you are going to look 'real stupid' when your clients figure out that you took shortcuts. There is no way that could be the fault of AppEngine.

I'm curious where you went after AppEngine. Are you hosting on AWS now? Heroku? EngineYard? Do you have the same reliability and scalability as what AppEngine provides? Maybe that isn't something that is beneficial to you and in that case, I can see how AppEngine is not your cup of tea.

For me, AppEngine is amazing. I love the fact that I'll never have to hire a sysadmin. I love not carrying a pager. I love knowing that when my site gets an asston of traffic, I won't ever have to think about making it scale. I love not having to worry if my database is on big enough hardware, replicated across data centers, backed up. I'll never have to think about whether or not my OS needs an upgrade to plug a security hole or ssh'ing into a server at 2am to figure some esoteric problem out. To me, all of these things are worth the 'cost' of AppEngine. I'd rather spend my time adding features than doing sysadmin.


At a certain point, I am not sure whether you are trolling or not. All I see is you repeating the AppEng marketing nonsense.

A skilled engineer would know that not all problems are the same and that one platform can't solve all of them. You don't know what product requirements I had, yet you assert it's my failing that I couldn't beat my product into AppEng's square, even after AppEng's square turned into a circle.

Just because your niche app happens to fit into AppEng's mold doesn't mean everyone else can do the same.


We don't know your application, so of course we can't tell if you made poor design decisions or not. But have you considered that the poor match went both ways?

Google said that they adjusted their prices because people were getting wrongly incentivized to do things that were expensive at their end. And the way the free market is set up, pricing is one of the ways that signals are sent to tell customers that they should make different optimization decisions --- including, perhaps, switching to a different technology which is a better match for their requirements.

Yes, that can be painful --- but it's the free market. You might as well complain that some silly folks moved to exurbs hours away from their work, and bought gas-guzzling SUVs, and then got upset when the price of gasoline went up to 3-4 dollars/gallon. Whose fault was that? OPEC? Or the consumer for choosing to live far away from work and to buy a car that had horrible mileage?


Valid points except "people getting wrongly incentivized." People were rightly incentivized to optimize their apps under Google's pricing guideline. It was Google who got it wrong by misaligning its infrastructure cost and pricing, and who was unable to bring down its cost over the years.

I don't think your gas example is relevant, but if you want to stretch it, it just means Google, like OPEC, is not a trusted platform vendor.


I'm definitely not trolling. I'm trying to have a conversation and understand your rationale for why GAE didn't work for you. So far, you've left me totally confused.

I heartily agree with you about GAE being a niche. Not all apps (or developers) belong on GAE.

I ask again, if GAE didn't work for you, where are you hosted now?


[deleted]


Wow, you seem angry. Why do you keep replying and not answering the question?


You seem disappointed. Ask again.


Now look who is trolling. First you respond with the now deleted statement, which I'll post back here:

"ww520: You were not trying to have a conversation. You were trying to have a bragging session. I don't find it constructive to continue the 'conversation.'"

Then you come back and respond again? I'm so confused. Anyway, I'll consider this thread over, unless you actually want to have a real grown up conversation.


It's funny to see a hotshot troll feign confusion. It's even funnier to see a rude brat pretending to have a grown up conversation. If you want a grown up conversation, act like one first.


If something is marketed as a preview release, don't rely on it for critical infrastructure.


I know. I misplaced my trust.


> (were you there at the ThirstyBear meetup where they bought us all beers?)

How is "must live in the Bay Area in order to receive friendly customer service" a reasonable hidden addition to the T&C of a supposedly global service?


GAE has become completely infeasible as a hosting solution for me (ThatHigh.com). My hosting cost increased by 90x, and I did not get nearly enough notice.

I don't have the time or resources to move the site, so I'm forced to shut it down. It really, really sucks.

Personally I'm more disappointed by the lack of notice (1 month is nowhere near enough time) than the actual increase. I totally understand the need to charge.


If you implement optimizations, you can really significantly curb the cost.

I spent some time tuning SharedCount's API, which would have cost me $30-$50/day, and it's now at about $1-$2/day.

- Move to Python 2.7 and enable multithreading

- Set up Cloudflare (this swallows about half of all my requests)

- Increase minimum latency and reduce the maximum number of idle instances. (I have 5-8s and 1-2 set, respectively)

- Set up the semi-undocumented Google edge cache (basically, just a Cache-Control: public, max-age=[seconds] header).

- Take advantage of memcache.

With this setup, I'm doing 3 million API calls per day at $2.
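
Two of the items above (the Cache-Control header and memcache) follow a pattern that can be sketched in plain Python. This is an illustration only, not SharedCount's code: `cache_headers`, `TTLCache`, and `cached_lookup` are hypothetical names, and the in-process dictionary merely stands in for a memcache-style service.

```python
import time


def cache_headers(max_age_seconds):
    """Build the response header that marks a response as publicly cacheable."""
    return {"Cache-Control": "public, max-age=%d" % max_age_seconds}


class TTLCache:
    """Tiny in-process stand-in for a memcache-style key/value store."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._clock():
            # Missing or expired: drop any stale entry and report a miss.
            self._items.pop(key, None)
            return None
        return entry[1]

    def set(self, key, value, ttl_seconds):
        self._items[key] = (self._clock() + ttl_seconds, value)


def cached_lookup(cache, key, compute, ttl_seconds=300):
    """Cache-aside: return the cached value, or compute, store, and return it.

    Note: a computed value of None is treated as a miss and never cached.
    """
    value = cache.get(key)
    if value is None:
        value = compute(key)
        cache.set(key, value, ttl_seconds)
    return value
```

The idea is that every request answered by an upstream cache or by the key/value store is one that never reaches the billed request handler or backend.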


Python 2.7 is experimental and requires migrating to the new datastore (high replication) which is a lot more expensive. So you might not want to do this if your costs are coming from datastore writes.

Also, high replication queries can return stale results unless you use ancestor queries. Ancestor queries require putting entities in groups by giving them all the same parent (which can never be changed). Basically it's a very inflexible semaphore and kind of sucks IMO.

Your suggestions in general are very good though. Thanks, I'm switching my DNS to CloudFlare now.


The HRD is the same price as the Master/Slave datastore. In the old pricing regime it was more expensive, but now they are at parity.

It's true that eventual consistency of queries on the HRD can be tricky to program around. On the other hand... your data is replicated to 3+ datacenters and failed over in realtime. Pretty rad.


Ah, yes. My unusual advantage here is that SharedCount doesn't use the datastore, which I now realize is likely not the case with ThatHigh.

(In fact, when I had to migrate my app from MS to HRD, it kept failing because I didn't have any datastore entities. The workaround was to just create a single entity)


Do you see much benefit to CloudFlare swallowing half of your requests when you're already using the GAE edge cache?


As far as I can tell, the edge cache doesn't work predictably; it relies solely on standard caching at the Google infrastructure level. The Cloudflare caching appears to be more centralized and covers a higher percentage of hits. But I haven't looked into the effect of relying solely on the edge cache and getting rid of Cloudflare. Can't think of any reason to try, though, since Cloudflare is free.


I checked out SharedCount, and was wondering why the front-end uses PHP.


Inertia; I initially built the API in PHP, but switched it to AppEngine when ProgrammableWeb listed it and people actually started using it. The front end uses the API though, and doesn't get enough traffic to justify worrying about scale.


As a fan I urge you not to shut it down. Please consider selling it or something!


Hey I shot you a tweet, I'd be interested in helping migrate the site. Email is in my HN profile, shoot me a message if you're interested.


Not to be an asshole, I'm genuinely curious: what about that site makes it impossible to run on a $10/month VPS with plenty of capacity to spare for running mail and 5 other sites of the same complexity? From the dates on the posts it doesn't seem like you get much traffic.


Nothing is impossible about it. It just takes time. It also means downtime, which I suppose is unavoidable at this point.

Honestly it has more to do with how much time I want to spend on the site, how much it returns, and whether or not I should spend my nights and weekends transferring it to another host.

The last time I backed up all the data from GAE it took 4 days to download all of it to a VPS. 4 days to download all of it. Migrating means 4 days of downtime, or alternatively some complex solution involving posting all new data to BOTH places while the migration takes place.

That takes time and energy, and quite frankly I'd rather see someone with the resources do it right instead of trying to hack it.


If you are up for SQL, the downtime issue is solvable with ChronicDB pain-free.


Did you turn multithreading on? It's helped on my sites somewhat.


For those questioning the use of Google App engine as a serious platform for applications, we at Pulse use Google App Engine: http://googleappengine.blogspot.com/2011/11/scaling-with-kin...

So yes, you can build serious applications on GAE, but like everything else, it depends on what you really need.


I like the spot price idea (in the comments http://groups.google.com/group/google-appengine/msg/fe9a05c6...). It's similar to adwords' automatic auction for how close to the top your ad is, with the same benefits of getting the best market price (for buyers and sellers) of a limited resource. If no one else is using it, it could become close to free.

It also casts the other users as the opponent, instead of google.


I wonder what a managed or dedicated server would cost to perform the same calculations.

Sometimes it's still cheaper to have your own managed / self-managed gear... and from the looks of this pricing, even to hire someone full-time or freelance to manage it all for you.


>"I wonder what a managed or dedicated server would cost to perform the same calculations."

It wouldn't cost you any money (unless you have metered electricity), but rather just opportunity cost of being able to do other work with your resources.

Unless you only need a short-term lease on the equipment, cloud servers will be more expensive than dedicated/colocated servers.


I have run my own servers in a datacenter for 7-8 years, and going to a VM setup was the quantum leap. I can copy and paste a server in minutes and be up and running with a nearly identical setup, which I have to set up for my dev environment anyway.

Hardening the server is something that can be outsourced for a lot less than thousands. Services like Linode seem to be a nice middle ground. While I don't see myself going back to running my own hardware in a datacenter rack, I do still see the benefit of knowing your stack a bit beyond coding. Knowing how the stack works quite often helps when building software.

Anyhow, those are just my experiences. VPS' with a very strong toolkit to take the edge off self-administering like Linode, etc, seem to be a very nice option. Heroku has caught my eye too but they have completely different measurements.


s/sometimes/all the time/;

I honestly do not get why people are so fascinated with the cloud. It's a very expensive way to avoid having to know what you're doing.


In my experience, all ways to 'avoid knowing what you're doing' are expensive.

It's all about time, and lack of it. If you can spend 1/10th of the time and still make a good profit, you could spend the other 9/10ths doing other profitable things.


At the end of the day, you're still going to need to A) learn what you're doing or B) hire someone who does know what they're doing - and given that one of the major selling points of these PaaS offerings is that you can get up and running without needing a dedicated sys admin, this seems particularly silly to me!


In a way, putting off getting to know your hosting stack is a very real form of technical debt.


Like any other tool "the cloud", as you say, can be the right fit for the right job.


Besides the cost, if your startup can survive without SSL support on your own domain, go for App Engine.

See:

http://code.google.com/p/googleappengine/issues/detail?id=79...

People requested custom SSL support in 2008, and it is now 2012. If you still believe in App Engine, good luck!



See the last comment of my link:

>> the "trusted tester program" is a joke . They never respond so it's just a waste of time .

Even if they launched this feature TODAY, that would be 4 years for a basic requirement. What can you expect from them?


I can think of TONS of services I've used over the years, both in and out of the software field that have talked about adding features and never did. Big deal, don't play the victim.

If you really wanted onto the trusted tester program, you'd bring it up on the app engine mailing list or contact someone at google directly (their emails are all over the place, Ikai is a great guy, and they are very responsive). I'm sure they'd be happy to have enthusiastic beta testers.

I find great irony in your quote on your G+ profile:

"Do you create anything, or just criticize others work and belittle their motivations? -- Steve Jobs"


Thanks for your reminder, I shouldn't belittle the enthusiasm of GAE's engineers/supporters, my bad.

But as SSL support is not public yet, my statement above is still valid:

If your startup can survive without SSL support on your own domain, go for App Engine!

Good luck!


Google has a great platform here with lots of potential if they opened it up more, but they're pricing themselves into a corner for this already-niche service.

Sadly, this is kind of "typical Google" -- great product, decent execution, but a bad identity problem -- it really feels like they're not sure yet what they want to do with this.


This is all because of their new pricing model. Overnight our pricing went up by 5X, and that was after a 50% discount, which makes it really a 10X hike. Here is a graph of it: https://plus.google.com/114790424055754975707/posts/eUMhYDVf...


GAE is a mistake. By that I mean that it's got a big design flaw that's bound to cost Google money, which means it'll always be more expensive than alternatives. Consider shared PHP hosting: a request comes in, apache finds which PHP file is responsible for it, then directs the request to the PHP interpreter. The PHP interpreter will parse the file (or more likely load the parsed bytecode from a cache) and return a response, which apache will then forward to the user agent. Notice that aside from the cache the PHP interpreter is stateless. As soon as it is done serving a request from site foo.com it can immediately jump on a request from bar.com and the context switch doesn't cost anything (once again disregarding finite cache size issues).

Contrast this with running a stand-alone application server for each site, which is what GAE does. Here, even if your code is not serving any requests it's still waiting to get them. Now, GAE has powerful magic in it to retire request handlers which aren't frequently used. This way if site foo.com is getting 1 request/minute, it only really needs one process/thread/handler abstraction at a time. However, it is expensive to start/stop these "processes", so instead GAE is forced to keep this "process" around for a while after a request has been served, hoping that the cost of keeping it alive will be justified by a second request. Thus these stateful, slow-to-start processes are always taking up resources that could be used to serve other requests.

Disclaimer: all my knowledge of GAE has been from reading their docs/blog, not from deploying projects to it.

Disclaimer 2: I am not saying that PHP is better/worse than GAE in any way. However, I am saying that the model that GAE uses is more costly for a typical application. This can be easily seen by comparing the cost of running a basic site on GAE vs $2/month shared hosting.


Heroku uses the same model. They're doing quite well.

GAE has problems, but I think the root is just how unique everything is. That manifests itself in people using a datastore that they don't understand, with Google expecting them to know how many writes an action will take and whether that feels like the right number of writes or two orders of magnitude more than if they made a different decision about how to store their data and solve their problems.

It also manifests itself in the lockin that Heroku mostly avoids (which is a huge problem if some subset of users get to a point where they realize "whoops, this would be much easier if I could do things Google won't let me do, time to leave").

I think a good counterexample is Engine Yard and GitHub. Engine Yard had a somewhat limited offering (especially for what GitHub was willing to pay) that didn't really fit with GitHub's heavy direct disk I/O. (Most Rails apps almost entirely read and write from the db, but GitHub does a lot of direct operations on the git repositories.) But GitHub was still just a Rails app, not an app for some specially-designed Engine Yard framework. So it was fairly painless for them to decide to solve the problem in a way that didn't fit with what Engine Yard would offer them and migrate to their own hardware. It wasn't easy, especially since they weren't solving an easy problem, but at least they didn't have to replace their database.


The GitHub example is the reason why I never wrote any code against GAE. The lock-in into their data store is a hard pill to swallow and was an early warning sign.

I am not familiar with the internals of Heroku and don't know how they solve the problems I outlined. Maybe someone else can elaborate.


Heroku runs a process for each application (or multiple). The first one for an application is free. But just like with GAE, they'll spin it down if it's idle long enough. So I know that if I go to a site of mine that no one has visited in months, where I'm not paying for extra instances, it'll take a few seconds to load. But having to run those free instances doesn't kill them because all but the first one for an application is paid for at 5 cents/hour, which presumably is enough margin to cover the free instances, and then some.

But they run normal applications. It started out being any Rack (Ruby web standard--Rails, Sinatra, etc) app; they've expanded into other languages now, but it's always some open framework that they're running for you, not something they own and keep proprietary. They give you a normal Postgres database. There are a few restrictions that you might not have on your own hosting (like a read-only filesystem). But you could basically take an app running on Heroku, install a webserver and Rails, install Postgres, make sure any config you had was the same, and run it.

Another huge advantage Heroku has is the ability to give other people access to their datacenters, since they're just running on EC2. So there's lots of Addon services that can add various pieces of functionality, like hosting a different database, many of which are only tractable because they're also hosted on EC2 and thus have very good latency to Heroku servers.

This means that you're not stuck with Postgres. If you think a part of your data, or all of your data, would be better stored in Mongo or Couch or Redis or flat files on S3, there are hosted services for that, and you can even deploy your own solution on EC2 if you'd rather. This leads to nice halfway solutions where you use Heroku to have super-scalable application servers, and maybe to manage your main relational db, but then you can tack on other things where that doesn't fit with your problem. Now, if you're running some of your own EC2 instances, you're losing some of the "never have to worry about hosting again" value of Heroku, but at least it's possible. It could be a temporary solution that keeps you above water while you migrate off of Heroku, or maybe you decide it really is the best long-term solution.


Thanks for the great overview.

It's interesting, since GAE charges $0.08/hour per front end instance, with 28 front end instance hours free per day. However, I once helped someone debug an application that on average gets one hit a minute, and yet manages to max the 28 free hours and then some. Closer examination showed that GAE was having these instances hang around for much longer than seemed necessary after they fulfilled the initial request.

So either Heroku has more magical magic than GAE, or there is some other kind of efficiency that they are tapping into. One thing I can think of is that possibly Heroku is more conservative with spinning up extra processes, preferring longer response times.
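
To make the instance-hour effect concrete, here is a small sketch of how idle keep-alive time turns one hit per minute into a nearly full day of billed time. The idle timeouts below are assumptions chosen for illustration, not documented GAE values, and request processing time is ignored; only the $0.08/hour price and the 28 free hours come from this thread.

```python
def instance_hours(request_times_s, idle_timeout_s, day_s=86400):
    """Hours one instance stays alive in a day, given request arrival times
    (seconds since midnight) and how long it lingers after each request."""
    alive_s = 0.0
    window_start = window_end = None
    for t in sorted(request_times_s):
        if window_end is not None and t <= window_end:
            # Request arrives while the instance is still warm: extend.
            window_end = max(window_end, t + idle_timeout_s)
        else:
            # Instance had shut down: close the old window, start a new one.
            if window_end is not None:
                alive_s += min(window_end, day_s) - window_start
            window_start, window_end = t, t + idle_timeout_s
    if window_end is not None:
        alive_s += min(window_end, day_s) - window_start
    return alive_s / 3600.0


def daily_cost(hours, free_hours=28.0, price_per_hour=0.08):
    """Billed amount for instance hours beyond the free daily quota."""
    return max(0.0, hours - free_hours) * price_per_hour


one_per_minute = range(0, 86400, 60)
# Assumed 15-minute linger: the instance never gets a chance to shut down.
always_warm = instance_hours(one_per_minute, idle_timeout_s=900)
# Assumed 5-second linger: the instance is alive only briefly per request.
short_lived = instance_hours(one_per_minute, idle_timeout_s=5)
```

With the long linger, a single instance is alive for the whole day, leaving only a few free hours of headroom, so any second instance that also lingers pushes the app past the free quota. With the short linger the same traffic uses a small fraction of that.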


I may be misunderstanding what you mean but heroku does not spin up or down any processes, the amount of workers you have is controlled entirely by the end user and you will be billed by the worker regardless of how much traffic you're getting. I'd venture that 95% of heroku apps have a single process, and that 95% of the time those workers are in some sort of suspended state.


If you have 1 dyno, heroku spins that dyno down after 20-30 minutes of inactivity.

Also, there are services out there that will monitor your queue depth and increase your dyno count for you. Or, you could use the heroku gem to do that yourself.


Depending on how long it takes a call to be dealt with it might be worth telling GAE to never spin up more than one instance. Sure, the occasional user will have to wait a couple of extra seconds to be dealt with, but you won't have to worry about cost.


I don't understand the downvotes. This is an on-topic, detailed discussion.


Speaking for myself, I down-voted because of the absurd false dichotomy of comparing Google App Engine to $2/month shared PHP hosting! It's like complaining that an oil tanker costs more than a rowboat.

The value proposition of App Engine is that with no systems administration expertise you can rent an extremely reliable, massively scalable web platform that is managed around the clock by a world class devops team. Unsurprisingly this costs money. If you don't need the reliability or scalability of App Engine, no one is forcing you to pay for it. But it's absurd to suggest that you can get anything remotely comparable in PHP for $2/month.


Thanks for the explanation. The reason for the comparison is two-fold. First, there seems to be a section of GAE users who use it solely to host a blog, a need satisfied by a much cheaper service elsewhere.

Second, while I understand the value of high availability and not having to worry about Ops, I am talking about cost per HTTP request. Here, the $2 PHP host is a clear winner. As I said, that does not mean it is better. Your analogy with the oil tanker and a row boat is applicable: one is more appropriate if you just want to cross the pond. The other is better for going from Alaska to California, with the caveat that GAE exists in the world where there are vanishingly few cost-effective ways to use it.


This comparison is still absurd (and incredibly naive). If you have a few hundred requests per day, sure a PHP server is far more efficient use of your box. For that matter, if you have zero requests per day, it's more efficient to just power it off.

At any kind of scale, PHP's model of "start the interpreter, load the code, start executing" is abysmal. To make efficient use of CPU resources you need compiled code - JIT or otherwise. You often need to make use of instance RAM and pooled database connections. Long-running processes are vastly more efficient when running flat-out.

There is a reason Facebook invented hiphop.


    start the interpreter, load the code, start executing
For pedantry's sake, there is of course FastCGI and/or mod_php to keep the interpreter alive, and APC to keep the code around.

Hiphop is of course much faster.


If you have the interpreter live and the code loaded, you're circumventing the parent's "point" that GAE is inefficient because it keeps your application running in a process vis-a-vis the standard way appservers work in java, python, ruby, node, etc.


The difference is that PHP's interpreter is stateless, while a Django app server is not.


But if you need the reliability and scalability that App Engine provides, I still think it's cost effective compared to attempting to build the same functionality yourself. You get the benefit of directly using the work of Google's top flight App Engine programmers (e.g., Guido van Rossum).

By way of comparison, I've done quite a bit of work w/ the AWS components (e.g. EC2, S3, SQS) that in theory allow you to build a highly reliable, highly scalable site but in practice there's still a lot of assembly required, whereas w/ App Engine that's provided out of the box.


I think you misunderstand him. He was using PHP to illustrate the efficiency of request handling and the lack of it in AppEng. Whether he got it right or wrong technically, it doesn't deserve a downvote.


With Amazon AWS you can handle the scaling yourself, when you need it, with components tailored to your use cases, and better latency.

And with Heroku you can have it taken care of for you, following a few simple rules.

So, why exactly would one use the crippled GAE platform, that constantly breaks its promises (re: reliability), forces you to code with very little flexibility (and, no, not every app that needs to automatically and massively scale "has to be coded exactly like a GAE app anyway"), costs a fortune (and sometimes an unexpected fortune), and breaks for you as soon as you need a technology not on offer?


They jacked up our price by 5X, here is a nice graph of it: https://plus.google.com/114790424055754975707/posts/eUMhYDVf...


I understand you're angry, but did you really just post that same thing twice in this thread and twice in the linked thread??


Hate to take it off topic, but your site (http://www.f8daily.com/) is throwing errors: ValueError: Values may not be more than 1000000 bytes in length; received 1053462 bytes


The image he linked said he was forced to shut down his site, so I'm not sure that he cares that there is an error.



