"We’ve effectively hired them to deal with managing the hard parts of our scaleability infrastructure and they are working for a fraction of what it would cost to do this ourselves."
This, in my mind, is the most insightful line in this post. When cloud hosts like Heroku and GAE are discussed on HN, there is often a cost comparison between using them and doing the sysadmin yourself on Amazon or on your own hardware. But what doesn't figure in is that with Heroku and GAE (I suppose more with GAE) you aren't just getting out of doing your own sysadmin work, you are also getting out of doing a lot of scaling work yourself. This is expensive, difficult work. Of course it is the type of work that many of us here salivate over, but that's another issue... ;)
It may be expensive, but you don't have to do "scaling" right now. A startup doesn't even know the business requirements or bottlenecks yet, so how can it even begin to talk about scaling?
Scaling on GAE really works because of all the restrictions, and nothing stops you from restricting yourself in a similar way, as the FriendFeed people have done: http://bret.appspot.com/entry/how-friendfeed-uses-mysql
The sysadmin part is also easy when you've got only 1 or 2 servers to worry about; the hard part of sysadmin also comes when "scaling". Personally I can get an initial EC2 instance up and running in 2 hours tops, and I have been automating that.
I've also worked on GAE apps, and in my experience the sysadmin stuff is replaced by at least 1.5 times as much developer time, and this for trivial stuff.
Sorry for the analogy / bad language, but do you know what else scales? Fucking in the ass, i.e. no unexpected pregnancies, but that would be a stupid suggestion to make, wouldn't it?
At the moment, it's a tradeoff. Google App Engine frees you from operations but requires more development time. A VPS like EC2 requires less development time but a lot more operations time.
It's good to have the options but there's no silver bullet yet.
"On top of the great steady-state service, we were mentioned on ABC’s The View and had a massive surge in traffic which was handled flawlessly by AppEngine. It transparently scaled us up to 20+ instances that handled all of the traffic without a sweat. In the end, this surge cost us less than $1.00"
I'm the original author of the post. If anyone has any questions about AppEngine or what it's like to develop on that platform, feel free to post here.
I'd love to hear about your view on which frameworks to use for python (tipfy, webapp, tornado, django ports etc) and their advantages over each other. I've developed a webapp using django-nonrel, it was pretty quick to have it going but I have this uneasy feeling in my gut that something may be wrong.
I'm not as strong with GAE/Python as I am with GAE/Java, but my feeling is that your AppEngine experience will be far more productive if you use AppEngine's native Datastore APIs rather than any abstraction layer.
There's a lot of detail at the low level that you miss by using any layers of abstraction (on both Python and Java platforms). That detail is important for getting maximum performance out of your application and correctly dealing with AppEngine's unique transaction model.
The choice of web frameworks is tougher. Startup time is important right now for AppEngine (although slightly less important with some of the new features rolling out). You'll want to use the lightest-weight framework you can find.
On Python, I've had great luck with webapp + Django templates + raw Datastore APIs.
On Java, Guice + Guice servlets + raw Datastore API is really lightweight and fast. We've also developed our own Django-like templating system for Java that we hope to open-source at some point.
Anyone else with more GAE/Python experience want to chime in?
Could you share some information about how your app code is structured? The main reason I'm using Django is because I like that it gives me a directory structure. The only examples I've seen of webapp are a ton of handlers in one main.py file, which doesn't seem maintainable.
Usually I put one handler per .py file and keep it to the point: it does what that particular module should do and nothing else. I leave the routing stuff to the app.yaml file.
For a dir structure, I have admin, css, js, prog and view. It's not hard to know what goes in each folder. Right now I have about fifty progs and fifty views, one for each specific task.
For every program there is usually one line to get the data from models and one line to render it using a template. Nothing else.
I really don't get what a framework would do to help me, I already have everything I need at my fingertips.
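As a rough illustration of the one-handler-per-module layout described above, here is a minimal sketch in plain Python. The names `ITEMS`, `ITEM_VIEW`, `get_item` and `show_item` are hypothetical stand-ins: on App Engine the data would come from a datastore model and the template from a file in the view folder.

```python
from string import Template

# Hypothetical stand-ins for a datastore model and a template in view/.
ITEMS = {"42": {"title": "Hello", "body": "First post"}}
ITEM_VIEW = Template("<h1>$title</h1><p>$body</p>")


def get_item(item_id):
    """Stand-in for the model lookup."""
    return ITEMS[item_id]


def show_item(item_id):
    """The whole handler: fetch, then render."""
    item = get_item(item_id)           # one line to get the data
    return ITEM_VIEW.substitute(item)  # one line to render it
```

Calling `show_item("42")` returns the rendered HTML string for that item.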
I've heard really good things about it. It's written for GAE and GAE only, so there aren't any abstractions that insulate you from how the datastore works. It's still worth your time to write some code against the raw datastore APIs. It takes some time for entity groups, keys and parents to sink in and working with the raw API helps you see what's going on.
I started with the raw datastore API and was able to do what I wanted after quite a bit of research and trial and error. I was then turned on to Objectify and couldn't be happier. I know what is going on underneath and the limitations but I have a much easier way to deal with them. Objectify is highly recommended, but my app doesn't have massive scale so I am not sure if it adversely affects performance much.
I did a study of the bigger templating systems available on Java earlier this year and the landscape wasn't that great: JSP, Freemarker, Velocity and a number of smaller projects. We decided that it was worth some time developing a system inspired by Django's lightweight syntax. If you send me a note (matthew@mastracci.com), I can let you know when we get our project up on github. It's rendering every page on gri.pe right now and has been stable for a few months now.
We use python on GAE, and chose tornado, and admittedly this is partly because at the time it was the new hotness and I like friendfeed a lot. That said, even though with the wsgi adapter you've neutered what tornado is famous for (evented python framework), it is a nice concise web framework with a flexible templating system that is easy to grok and get working quickly, yet still has some nice features like UI modules, named url matching with reverse_url construction, xsrf protection and secure (in the sense that they are unforgeable) cookies. Its templating system doesn't auto escape, so we hacked it to do so.
Using a minimalist framework is a pretty good match for app engine since other frameworks might assume something about the environment that won't be true of GAE.
And you can always mix in other à la carte libraries. For instance, we use wtforms. That, combined with a tornado UI module to render the forms nicely, has been effective.
If I were to choose again, I might just use the built in web.py since they often include utility handlers for things like parsing multipart uploads; we've had to reimplement those for our tornado BaseHandler. Then I might use mustache for templating since we have started using that on the client side.
Kind of random thoughts I guess, hopefully somewhat illuminating nonetheless :)
I used Flask in two of my currently running apps. It works nicely with AppEngine. AppEngine's limitations never get in the way, at least not as much as when running Django on AppEngine.
I can't remember exactly why, but it had something to do with how tipfy handled WSGI middleware back then (and it was very confusing, since tipfy also calls its plugin system a "middleware"), and the framework being a bit too verbose for my liking.
My first impression of GAE (admittedly, many months ago) was that it was very painful to deal with loading or otherwise manipulating data in BigTable. Do you have any shortcuts or advice here? Has it improved?
Annoying Example: I realized that I needed to delete a lot of records from the table. I created a URL that I could ping periodically to delete a bunch, then get killed because it was taking too long, repeat.
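One way to make the "ping a URL until everything is deleted" approach less painful is to delete in small batches and stop before the request deadline. The sketch below only shows the control flow: `store` is a plain dict standing in for the datastore, and `delete_batch`, `batch_size` and `budget_seconds` are hypothetical names, not App Engine APIs.

```python
import time


def delete_batch(store, batch_size=100, budget_seconds=20.0, clock=time.monotonic):
    """Delete keys in small batches until the time budget runs out.

    Returns True if records remain, so the caller knows to schedule another run.
    """
    deadline = clock() + budget_seconds
    while store and clock() < deadline:
        # Remove up to batch_size keys, then re-check the deadline.
        for key in list(store)[:batch_size]:
            del store[key]
    return bool(store)
```

Each request then does a bounded amount of work and reports whether another pass is needed, rather than running until it is killed.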
I am really glad someone wrote a response from another POV. I was thinking about doing the same if I get the time on the weekend. We have been very happy appengine campers and the 1.4.0 SDK release looks very promising.
There are and have been some rough edges but overall it is a great platform and it was well worth it for us to invest time to evaluate and embrace the constraints.
Video is in development, but the system is pretty straightforward. We're integrated with Zencoder. Our clients (e.g. mobile/web) request an upload endpoint from our server (GAE). The endpoint is a pre-signed S3 upload url computed by the server. The client uploads to that url and then triggers a "process" API on our server which issues a request to Zencoder on the backend to pick up the videos from S3 and begin transcoding them. Zencoder sends back a response that allows us to associate the "video" entity in our datastore with job ids/output locations. You can set up a callback so that Zencoder will notify GAE when transcoding is complete. The output is available in S3, including thumbnails.
So I guess this is also an endorsement of Zencoder: it rocks and was very easy to integrate with!
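To illustrate the pre-signed upload URL step in the flow above, here is a minimal sketch of the general idea: the server signs the object key and an expiry time, and the storage side checks that signature. This is a generic HMAC scheme written for illustration, not S3's actual signing format; `SECRET`, `sign_upload_url`, `verify_upload` and the example host are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"demo-secret"  # hypothetical; a real service would use its storage credentials


def _signature(key, expires_at, secret):
    # Sign the method, object key and expiry so none of them can be altered.
    message = f"PUT\n{key}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_upload_url(base_url, key, expires_at, secret=SECRET):
    """Return a URL that authorizes uploading `key` until `expires_at` (Unix time)."""
    query = urlencode({"expires": expires_at, "signature": _signature(key, expires_at, secret)})
    return f"{base_url}/{key}?{query}"


def verify_upload(key, expires_at, signature, now=None, secret=SECRET):
    """Check the expiry and the signature, as the storage side would."""
    now = time.time() if now is None else now
    expected = _signature(key, expires_at, secret)
    return now < expires_at and hmac.compare_digest(expected, signature)
```

The client never sees the secret; it only receives a URL that is valid for one key and a limited time.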
Thanks! Our startup (realtimefarms.com) also runs on GAE (Python though), and we use the blob store image serving to great benefit, but have been toying with supporting video uploads. It helps to hear how you guys are doing it.
Thanks, Nick! That 1,000 deployment limit is daily, AFAICT. Our quota dashboard shows 9/1000 deployments, which matches up with the number of builds we've pushed live today.
Do you know of any good links for GAE development patterns?
I've read tips in various comments on various websites, such as using filters instead of GQL, using task queues and handling storage exceptions, but very little in one place :)
I'm more familiar with GAE/J than the Python stuff myself. I don't have specific links to patterns, but I highly recommend watching the AppEngine videos from Google I/O. I watched nearly a dozen hours of video during development:
It's worth spending some time and writing the ORM stuff by hand for your first application. The low-level datastore APIs are well-written and you'll learn a lot about how stuff works by being close to the metal.
AppEngine rarely throws storage or memcache exceptions outside of maintenance periods anymore. Unless there's someone here who's had a different experience, I would consider them rare.
Use as few abstractions as possible, no matter which language you are choosing.
If you don't need something that interacts with an external system done right away, stick it in a task queue, no matter how quick you think it will run. Assume that it'll succeed in your code (it probably will, eventually).
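The "stick it in a task queue and assume it will succeed eventually" advice can be sketched with a tiny in-memory queue that retries failed tasks. This is only an illustration of the pattern, not the App Engine task queue API; `TaskQueue`, `add`, `run_once` and `max_attempts` are hypothetical names.

```python
from collections import deque


class TaskQueue:
    """Toy queue: tasks that raise are put back and retried on a later pass."""

    def __init__(self, max_attempts=5):
        self.pending = deque()
        self.max_attempts = max_attempts

    def add(self, func, *args):
        # The request handler returns immediately after enqueueing.
        self.pending.append((func, args, 0))

    def run_once(self):
        """Run every task queued right now; re-queue any that fail."""
        for _ in range(len(self.pending)):
            func, args, attempts = self.pending.popleft()
            try:
                func(*args)
            except Exception:
                if attempts + 1 < self.max_attempts:
                    self.pending.append((func, args, attempts + 1))
```

The handler code calls `add(...)` and moves on; a flaky external service only delays the task instead of failing the user's request.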
One particularly good use case for App Engine is a simple back end for thick client applications that do most of their heavy lifting on the client side.
I'm working on a moderately sized app that uses a fairly complex range of data, but I only have a few tables with a few columns each in the datastore. All the relational complexity of the schema is tucked away in blob fields and handled on the client. Besides greatly streamlining the back end code, it also cuts down on cpu cycles drastically by dumping most work to the user's machine, thereby saving lots of money.
I'm using this approach with a Flex UI, but it applies just as well to javascript.
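A minimal sketch of the "relational complexity tucked away in blob fields" idea: the nested structure is serialized into one opaque field, and only the few columns the server needs stay as real properties. JSON is an assumption made here for illustration (the comment does not say which format is used), and `pack`, `unpack` and the `record` layout are hypothetical.

```python
import json


def pack(document):
    """Serialize a nested structure into bytes for a single blob field."""
    return json.dumps(document, sort_keys=True).encode("utf-8")


def unpack(blob):
    """Inverse of pack; in the approach above this would run on the client."""
    return json.loads(blob.decode("utf-8"))


# Hypothetical record: one queryable column plus one opaque payload.
record = {
    "owner": "alice",
    "payload": pack({"layers": [{"id": 1, "links": [2, 3]}]}),
}
```

The server only stores and returns `payload`; interpreting the links between objects is left to the client.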
As a programming noob, I've used RailsTutorial's method (RoR/Git/Heroku), DjangoBook's method (Py-Dj/no mention of version control/Apache+mod_python), and GAE's method.
I found RailsTutorial to be the easiest for my noobishness - though all were admittedly pretty good. GAE didn't seem easily portable at all compared to the other two - which was concerning, while DjangoBook's system seemed incomplete without teaching version control - like R-T does. For noobs, I like the easy way RoR/Git/Heroku are together, but I like Python better.
I'm anxious for http://www.Djangy.com to come out as the Heroku for Django and encouraged to see them use Git and not Hg.
It is an interesting conundrum right now. I am considering investing in using Ruby/Rails DUE TO Heroku. If djangy was ready and of comparable quality it would be an easy choice for me (I am comfortable in python, completely new to Ruby/Rails). But having to compare Heroku vs GAE vs Python-Stack is much more complex. I am not a fan of the limits in GAE, but I don't want to deal with system administration at this point. But I want to use python. Arg.
Yea, I feel like I'm in the same boat - only I'm the "business guy" at my startup so I do this to relax and learn about Python (which our code-base is mostly in).
HN member: endlessvoid94 is the guy behind Djangy - I hope he sees a lot of us begging for this!
Just wanted to say that we're happy with GAE too. We use GAE for small, fast projects, like our main "brochure-ware" website. For big, burly projects, we use Amazon.
This really isn't about good and bad... it's about right tool for the job. The guy yesterday whining about GAE picked wrong.
I understand your article and the benefits you seem to have received through GAE. But apart from sticking it through, I don't see a useful message. I would like to know the exact scalability issues you faced and how you leveraged GAE infrastructure to tackle them.
We face the same scalability issues as most other startups: a fairly steady stream of normal traffic and the occasional big spike from big mentions on blogs/TV/etc.
We could have done it the same way that we did with DotSpots: EC2 + MongoDB/MySQL. Instead we chose AppEngine to host our system and have spent very little time managing infrastructure.
An added bonus: we've also spent somewhere around $2.00 total to keep the lights on.
The main thing keeping me off AppEngine for my current project is that there doesn't seem to be an easy way to populate the blobStore. If there was an easy way to upload 10000 binary blobs I'd switch tomorrow.
I have tried quite a few different languages/frameworks on App Engine and Gaelyk is the one I have stuck with. I love it now. If I don't know how to do something in Groovy there are no hiccups because I can just default to writing Java code. Pairing it with Objectify makes it a very productive, fairly lightweight setup.
Yes, it's great. There are some limitations in the types of queries you can run, but other than that it's a pretty seamless transition. You also still have full access to the GAE framework's api if you need to do something specialized that web2py doesn't cover, though I haven't needed to do this very often (it also breaks portability of course). Overall an excellent combination.
The main point of this article is how GAE can scale very, very fast... Basically, it sounds like: "you can win a lottery too, and GAE is there to scale in case you win a lottery".
Of course, installing everything on a more expensive EC2/Rackspace machine or MySQL service would take half a day...
Using Heroku would be way easier, but Heroku is not as cool and hip as Google.
I don’t think it has anything to do with hype.
Heroku is a great choice to run apps written in ruby. It is the only language they support — with support for node.js coming soon.
GAE, through Java, can run a multitude of languages, basically everything that runs on the JVM. Not every language is a good choice for GAE, but if you want, you can.
You’re right that installing many things on EC2/Rackspace is becoming easier every day. But tuning the system to get the most out of the infrastructure requires advanced knowledge in all layers of the stack. On AppEngine, you just have to write your app; you don't worry about the infrastructure. It’s a fairly good thing for many people.
With the generous free quotas, you don't have to pay a cent until your project gets some traction, which is not the case with EC2 for instance, because they charge you by the hour and not by the amount of resources used.
GAE is not perfect, but it enables small teams to rapidly launch a prototype and iterate until the project takes off. Whether you have 10 users per day or 10M, you’re running on the same stack, built by some of the brightest engineers at Google.
Yes, you’re right, I had forgotten this. I’ll argue that their nature is different though. Small-EC2 won’t help you deal with a surge in traffic like GAE will. But small-EC2 will let you install anything you want. And like the OP pointed out, the two are not mutually exclusive.
Heroku would be easier for some, but not myself. I'm not a Ruby developer. I do most of my work in Java and dabble in Python for the occasional project. AppEngine fits this perfectly.
We spent three years working on DotSpots on the EC2 platform. While you can certainly boot up an instance of your application and get it running, there's still a lot more infrastructure work required to keep it going, scale beyond a single webserver, etc. On top of that, you've got to keep something like Pagerduty running to make sure that your instances aren't wedged or your MySQL box isn't running close to its storage limit.
The big benefit for me was being able to choose a language I'm strong in and just deploy something that automatically scales. This is particularly important during the early phase of the project where you are a lone coder doing all of the work.
The main point of the article was to pick the right tool for the job. The poster even mentions they are using EC2 for videos instead of wedging it into GAE.
Of course Heroku can be easier if you're using Ruby/RoR, but again using Ruby means Heroku is the right tool. Heroku is useless for someone that wants to use Java or Python to build their site.
At the end of the day a real entrepreneur doesn't care what's cool or hip and instead focuses on the end product and how to deliver it.