
Are Google employees being discouraged from using Python for new projects? - megamark16
http://groups.google.com/group/unladen-swallow/browse_thread/thread/4edbc406f544643e
======
iseff
It sounds less like they're being discouraged from using Python, and more like
they're being encouraged to think critically about what sorts of projects
Python would excel at.

Put another way, they're being encouraged to use the right tool for the job.

~~~
amichail
What's wrong with prototyping a service using Python and if it takes off port
it to a better language for scalability?

~~~
lacker
The problem is that at Google, "taking off" is often equivalent to
"launching". When Google launches stuff, it can have millions of users on day
1.

~~~
aristus
This. Working for a company with a large audience means that your problem
often is no longer getting people to pay attention. Your problem is getting it
right, at scale, in multiple languages and locales. This can be alleviated
somewhat with internal tests, invite-only alphas, bucket testing, and "labs"
features.

~~~
rbanffy
That and good monitoring.

I always try to launch services with very detailed server monitoring - I want
to know how much memory is being used with what, how much I/O and how much
time the CPUs spend doing non-application stuff. I want to monitor response
times, queue and dataset sizes and anything that helps me say if we will need
more servers, different servers or what parts of the application we should
port to amd64 assembly.

~~~
voxio
Out of interest.. How do you do that?

~~~
rbanffy
Munin and some custom plug-ins. It does not give me all the data I would like,
but we found a bug the other day by looking at some graphs and how one related
to the rest of them.
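For readers wondering what a custom Munin plug-in looks like, here is a sketch. Munin runs each plug-in as an executable in two ways. With the argument `config` it expects the graph definition. With no argument it expects the current values. This is a hypothetical example, not rbanffy's actual plug-in. It assumes a job queue whose pending jobs are stored as files in a spool directory (the `QUEUE_DIR` path is made up) and graphs that queue's length, one of the "queue sizes" mentioned above.

```python
#!/usr/bin/env python3
"""Hypothetical Munin plug-in sketch: graphs the length of a job queue.

Munin calls the plug-in with "config" to learn how to draw the graph,
and with no argument to fetch the current value.
"""
import os
import sys

# Hypothetical spool directory holding one file per pending job.
QUEUE_DIR = os.environ.get("QUEUE_DIR", "/var/spool/myapp/queue")


def config_lines():
    """Graph metadata printed when Munin runs the plug-in with 'config'."""
    return [
        "graph_title Job queue length",
        "graph_vlabel jobs",
        "graph_category myapp",
        "queue.label pending jobs",
    ]


def fetch_lines(queue_dir=QUEUE_DIR):
    """Current value; 'U' tells Munin the value is unknown."""
    try:
        length = len(os.listdir(queue_dir))
    except OSError:
        return ["queue.value U"]
    return [f"queue.value {length}"]


def main(argv):
    lines = config_lines() if argv[1:2] == ["config"] else fetch_lines()
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

More fields (memory, I/O, response times) are just more `name.label` / `name.value` pairs in the same plug-in, or separate plug-ins.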

------
staunch
The biggest risk Google faces is _not_ developing some awesome new thing.
There is very little risk that they will develop some awesome new thing, but
not be able to make it scale.

If discouraging developers from using Python means that one great project
doesn't happen, it's probably a net loss for them.

~~~
joe_the_user
But would it mean that? Would using server-side javascript make the project
less likely to happen? How many projects absolutely depend on the language
they're implemented in - especially given that javascript gives you all the
scripting features that c++/java lack?

~~~
SwellJoe
_Would using server-side javascript make the project less likely to happen?_

Yes. The library infrastructure of server-side JavaScript is pitiable compared
to Python (or C++ or Java). This will change, I have no doubt, and I believe
JavaScript will become the most popular language for practically everything
short of systems programming in the not too distant future (7-10 years,
perhaps), but it's definitely not as easy to build a server-side app in
JavaScript today as it is in Python, Ruby, Perl, etc.

~~~
litewulf
I'm using Rhino on the JVM.

Imagine writing Java code, without having to type classnames all the time and
you have a rough approximation of what it's like.

(And yes, you can import arbitrary Java bits into Rhino. It is "freaking
sweet")

~~~
SwellJoe
But Rhino has the same performance issues as Python and Perl and Ruby, doesn't
it?

According to one benchmark ( <http://ejohn.org/apps/speed/> ), Rhino is
generally several orders of magnitude slower than Spidermonkey and Tamarin.
Which probably means that it is also orders of magnitude slower than Python,
Perl and Ruby.

Since JavaScript was suggested as an alternative to Python to act as a
reasonable substitute for C++ and Java for performance reasons, suggesting a
JavaScript implementation that seems to be _dramatically_ slower than even
Python seems nonsensical.

While JavaScript is a lovely language, and I'm all for it being used more on
the server-side, Rhino only solves the lack of libraries problem when compared
to Python...it does not solve the performance/memory problem of using Python
at Google scale.

Unless, of course, things have changed dramatically since any of the
benchmarks I found were run.

~~~
litewulf
Rhino is a mature project, so at least for my purposes, this was actually a
bit more important than performance.

The other thing is that it allowed my team to rewrite pieces into Java as
needed without it looking dramatically different.

------
Kirby
Also, keep in mind that you have to get _very_ big before this becomes an
issue. If you're at Yahoo, Google, MSN - yes, language issues can become a
performance design consideration.

If you're at merely a big site, like Ticketmaster, IMDb, or Livejournal, with
good software design you can handle a lot of load with reasonable
responsiveness. (All those three sites are written in perl, in fact. I've
worked for one of them.)

If your page views per day on your project aren't peaking in the billions,
you're probably better off optimizing for the language that your team is most
competent in.

~~~
fauigerzigerk
No, you don't have to be _very_ big, not even big, for the language to become a
design consideration. You just have to do more than ship a few strings back
and forth between the browser and the database.

------
Tichy
"I don't think it's possible to make an implementation like CPython as fast as
an engine like V8 or SquirrelFish Extreme that was designed to be fast above
all else."

Are they saying that JavaScript is already faster than Python? I've only
recently started using some Python, and while it seems to be a decent
language, I have not yet seen much that would make me prefer it over
JavaScript. Some things are a bit smoother in Python, others are smoother in
JS. Here's hoping that JavaScript will win :-)

~~~
rbranson
V8 is very impressive. It is one of the fastest virtual machines for dynamic
languages. Look where it scores on the shootout.

[http://shootout.alioth.debian.org/u32/benchmark.php?test=all...](http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=all&box=1)

It is nearly 10x faster than Ruby 1.8 and 5x faster than CPython. Of the
shootout, it is the fastest dynamically typed language other than LuaJIT.

~~~
jrockway
Uh, SBCL is dynamically-typed.

~~~
bad_user
The SBCL tests submitted to the shootout are using type annotations heavily.

In that test, it doesn't count as a dynamically typed language.

~~~
jrockway
So is C dynamically typed when it converts an integer to a pointer?

~~~
bad_user
No, that's called loosely typed.

Having type annotations is a big deal because you can infer certain things at
compile time.

Take for instance (+ x y). In CLisp "+" is a multi-method, the dispatch being
done at runtime, but if you know that x and y are both integers, there's no
need to search for the right method to call at runtime; its address is already
known. And then you can also decide to inline its code at compile-time if
there aren't any conflicts (mostly like a macro, but without the laziness).

Of course, in dynamically-typed languages you have the freedom to infer these
things at runtime, in certain situations you can infer the types of "x" and
"y", you can use a cache for the call sites, and so on, but it's a lot more
complicated, and one of the reasons is that for every optimization you do, you
have to be ready to de-optimize when the assumptions have been invalidated.
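Here is a toy Python sketch of the difference described above. It assumes a hypothetical method table for a generic `+`. `generic_add` looks up the method on every call, which is runtime dispatch. `add_int_int` is the method resolved once ahead of time, as static type knowledge allows. `AddCallSite` is a minimal monomorphic inline cache: it remembers the method for the last type pair it saw and checks (guards) that assumption on each call. When the guard fails, it falls back to the full lookup, the "de-optimize" step. This is only an illustration, not how SBCL, V8 or any other real VM implements it.

```python
from fractions import Fraction

# Hypothetical method table for a generic "+", keyed on the argument types.
# In a real VM these would be distinct machine-level routines.
ADD_METHODS = {
    (int, int): lambda x, y: x + y,
    (float, float): lambda x, y: x + y,
    (Fraction, Fraction): lambda x, y: x + y,
}


def lookup_add(types):
    """Full (slow) dispatch: find the method for a pair of argument types."""
    try:
        return ADD_METHODS[types]
    except KeyError:
        raise TypeError(
            f"no + method for {types[0].__name__}, {types[1].__name__}"
        ) from None


def generic_add(x, y):
    """Runtime dispatch on every call, as when nothing is known about x and y."""
    return lookup_add((type(x), type(y)))(x, y)


# With static knowledge that both arguments are ints, the lookup can be done
# once, ahead of time, and the method called directly (or inlined).
add_int_int = lookup_add((int, int))


class AddCallSite:
    """Monomorphic inline cache for a single call site of "+"."""

    def __init__(self):
        self.cached_types = None   # the type pair the cached method assumes
        self.cached_method = None
        self.misses = 0            # how often the cached assumption failed

    def __call__(self, x, y):
        types = (type(x), type(y))
        if types != self.cached_types:
            # Guard failed: drop the specialization, do the full lookup,
            # and re-specialize for the newly observed types.
            self.misses += 1
            self.cached_method = lookup_add(types)
            self.cached_types = types
        return self.cached_method(x, y)
```

With `site = AddCallSite()`, the calls `site(1, 2)` and `site(3, 4)` cost a single lookup. A later `site(1.5, 2.5)` fails the guard and triggers a second one.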

That's why I said SBCL doesn't count as being dynamic in that test because it
probably makes full use of those static annotations.

------
lacker
It just depends on what sort of project it is. If you know that you will get
millions of users per day starting from day 1, then you have to design for
some level of scale from the beginning. Prototypes in Python are common at
Google, but usually it will have to be rewritten before launch.

------
maigret
So (at least small) Python programs tend to have fewer lines than Java ones -
OK. That maybe makes them less bug-prone. But Java, despite being laughed at all
the time, has got to be one of the most mature and best-supported programming
languages out there. It has an outstanding toolset on every platform, and the
Virtual Machine is extremely well optimized. Damn, Java is almost as fast
as C++ when written well. After that, it depends on the programmer's skill.
That's another discussion :)

------
coffeemug
The scalability argument doesn't make sense to me. When you're dealing with
Google's scale, you need to parallelize horizontally, and you need to design
your software in a way which lends itself to horizontal parallelization. For
this type of software how much you squeeze out of a given machine is almost
irrelevant to scalability (other than the cost of maintaining the extra
machines). Of course once the cost goes up too much, you can always rewrite in
C.
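Here is a back-of-the-envelope sketch of coffeemug's point, with made-up numbers. Assuming perfect horizontal scaling, a slower implementation still scales; it just needs proportionally more machines. The difference therefore shows up as cost rather than as a scalability limit. The request rates and the 4x slowdown below are hypothetical, not measurements of any language.

```python
import math


def machines_needed(peak_rps: float, rps_per_machine: float) -> int:
    """Machines required for a peak load, assuming perfectly horizontal scaling."""
    return math.ceil(peak_rps / rps_per_machine)


PEAK_RPS = 50_000  # hypothetical peak requests per second

fast = machines_needed(PEAK_RPS, rps_per_machine=1_000)  # hypothetical fast implementation
slow = machines_needed(PEAK_RPS, rps_per_machine=250)    # hypothetical 4x slower one

print(f"fast: {fast} machines, slow: {slow} machines, extra: {slow - fast}")
```

Both configurations can serve the load. The extra machines are purely a running cost, which is exactly the cost ewjordan's reply argues is too large to ignore at Google's scale.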

~~~
ewjordan
The cost _starts out_ too high to justify releasing something that's not
optimized when you're operating at Google's scale, especially since the
development costs are largely up front whereas the ongoing compute costs keep
going for at least a few years. If they didn't keep an eye on this stuff,
those factors of 2-10x would eat them alive.

For some value of M and N, having N million users around for M months means
that development costs are actually cheaper than compute costs, and I'd assume
that they've now reached a point where most new products expect to see more
than that magic number of users. A few engineers for a few extra months is
only ~100k, and I have no trouble believing that many Google products cost
them at least that much over their lifetimes due to resource consumption.
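ewjordan's "magic number" can be written as a one-line break-even formula. The slower implementation becomes the more expensive one once its extra compute spend exceeds the one-off extra development cost of the faster one. This is a minimal sketch. The per-user compute cost, user count and 3x slowdown in the example are hypothetical; only the ~100k development figure comes from the comment above.

```python
def breakeven_months(extra_dev_cost: float, users: float,
                     compute_cost_per_user_month: float, slowdown: float) -> float:
    """Months after which a slower implementation's extra compute spend
    exceeds the one-off extra development cost of a faster one.

    compute_cost_per_user_month is the cost with the faster implementation;
    the slower one is assumed to cost `slowdown` times as much per user.
    """
    extra_compute_per_month = users * compute_cost_per_user_month * (slowdown - 1)
    return extra_dev_cost / extra_compute_per_month


# ~100k of extra engineering (from the comment), 5 million users,
# a hypothetical $0.001 per user-month of compute, and a 3x slowdown.
months = breakeven_months(100_000, 5_000_000, 0.001, 3)
print(f"break-even after about {months:.1f} months")
```

For a fixed development cost, M shrinks in proportion to N: ten times the users means one tenth of the months before the optimization pays for itself.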

The rest of us can safely ignore all this because our products actually need
to grow before we pass that threshold, at which point we deal with the issues;
Google is in an enviable position where no optimization is premature.

Even for the rest of us, though, given the performance differences in Python
vs. Java (best case 2-4x, worst case more like 40x) relative to the
productivity differences (I can't believe this is more than 10x, even for
someone very comfortable in Python and merely competent at Java), I'd suspect
that many high use software projects even outside of behemoths like Google are
actually cheaper in the long run if they're done in Java than they would be in
Python.

Prototypes are another issue altogether, but I haven't seen anything that says
Google is discouraging people from doing those in whatever language they want;
AFAIK it's production code that we're talking about here.

~~~
btilly
Some concrete numbers on the productivity differences would help.

The chart in _Software Estimation_ by McConnell, adapted from _Software Cost
Estimation with COCOMO II_, shows that projects are 2.5 times bigger in C than in
Java, and 6 times bigger in C than they would be in Perl or Smalltalk. Assuming that
Python is equivalent to Perl, that would mean that Java requires 2.4 times as
many lines. Those estimates are from 2000. Both languages have improved since
then, but I'll go with that estimate because I don't have more recent figures
backed up by quantitative data rather than someone's opinion.

Research across many languages suggests that lines of code/developer/day are
roughly constant, so Python development should average 2.4 times as fast as
Java.

Let's suppose that your programmers are paid $80K/year. On average people cost
about double their salary (after you include benefits, tools, office space,
etc), so each programmer costs you $160K/year. Per Python programmer replaced
you need 2.4 Java programmers which means an extra $224K/year in cost. Let's
suppose your computers have an operating cost of $14K/year. (I am throwing out
a figure for regular replacement, networking, electricity, sysadmins, etc.)
Then the cost per Python programmer replaced will cover 16 more machines.
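Here is btilly's arithmetic spelled out as a small script, so the assumptions can be varied. The inputs are the figures from the comment: the 2.5 and 6 size ratios, $80K salary, 2x overhead and $14K/year per machine. It also keeps the comment's assumption that Python is equivalent to Perl. Exact fractions keep the results free of floating-point rounding.

```python
from fractions import Fraction

# Program-size ratios from the chart cited above, with C as the baseline.
C_OVER_JAVA = Fraction("2.5")  # C programs are 2.5x bigger than Java ones
C_OVER_PERL = Fraction(6)      # ... and 6x bigger than Perl ones

# Assuming Python ~ Perl, Java needs this many lines (and, with constant
# lines/developer/day, this many developers) per Python line.
JAVA_OVER_PYTHON = C_OVER_PERL / C_OVER_JAVA


def machines_per_replaced_developer(ratio, salary, overhead, machine_cost):
    """Extra yearly cost of replacing one Python developer, and how many
    machines that money would pay for."""
    loaded_cost = salary * overhead         # salary plus benefits, tools, office
    extra_cost = (ratio - 1) * loaded_cost  # the additional Java developers
    return extra_cost, extra_cost / machine_cost


extra, machines = machines_per_replaced_developer(
    JAVA_OVER_PYTHON, salary=80_000, overhead=2, machine_cost=14_000
)
print(f"ratio {float(JAVA_OVER_PYTHON)}, extra ${extra}/year, {machines} machines")
```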

And it can get worse. Research into productivity versus group size suggests
that productivity peaks at about 5-7 people. When you have more developers
than that the overhead for communication exceeds productive work done unless
you introduce processes to limit direct communication. Those processes
themselves reduce productivity. As a result you don't get back to the same
productivity a team of 5-7 people has until you have a team of around 20
people. If you grow you'll eventually need to make this transition, but it
should be left as long as possible.

Therefore, if a Python shop currently has fewer than a dozen webservers per
developer, switching to Java does not look like it makes sense. And
furthermore if you've got 3-7 Python developers then the numbers suggest that
a Java switch will force you over the maximum small team size and force you to
wind up with a large team at _much_ higher expense.

For this reason I believe that the vast majority of companies using agile
languages like Perl, Ruby and Python would be worse off if they switched to
Java. When you serve traffic at the scale of Google this dynamic changes. But
most of us aren't Google.

~~~
nearestneighbor
> Research across many languages suggests that lines of code/developer/day are
> roughly constant, so Python development should average 2.4 times as fast as
> Java.

Some people, with more Java experience than me, claim that modern IDEs like
IntelliJ make them as productive with Java as they would be with Python.
Java's verbosity is "mechanical", and if your tools help you with that, it's
hard for me to buy 2.4x productivity difference. You are also assuming that
static typing has no effect on how many people can work together. In Python
(which I use for prototyping and appreciate), one might often wonder "does
this method take a class, an instance of a class, or the returned value as an
argument", etc.

~~~
btilly
Debates on relative productivity are endless. I used the only numbers I have
available that are backed by actual quantitative data rather than opinion. I'd
love to see more up to date numbers.

However even if you assume that the difference is much smaller, say a factor
of 1.2, then you can afford an extra 2.3 computers for every developer. Oddly
enough in the various companies I know well with small teams of experienced
scripting programmers I've never seen that high a ratio of webservers to
developers, so even then, switching to Java doesn't make sense.
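The 2.3 figure follows from the same assumptions as the earlier estimate ($160K loaded cost per developer, $14K per machine per year), now with a productivity ratio of 1.2:

```latex
\frac{(1.2 - 1) \times 160{,}000}{14{,}000} = \frac{32{,}000}{14{,}000} \approx 2.3 \ \text{machines per developer}
```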

On the question of how many people can work together, needing to talk about data
types is such a small portion of what people talk about that I would be
shocked if it moved either the point where small teams break down or the
point where large teams become as productive as that peak. That said, I
fully agree that Java is designed to let large teams cooperate, and that's
likely to matter when you have teams of 50+ programmers. However if the
productivity difference really is a factor of 2.4, and we assume linear growth
in productivity for large teams, then you'll actually _need_ a team of 50 or
so programmers in Java to match the team of 5-7 programmers working in a
modern scripting language. Given that, if you're working in a Java team below
that size you should seriously ask yourself whether having a team size that
requires getting that many people working together is a self-inflicted
problem.
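One way to reconstruct the "team of 50" figure from btilly's earlier numbers starts from his team-size claim. A large team of about 20 only gets back to the productivity of a peak 5-7 person team. Matching that peak in Java takes 2.4 times the effort. With linear growth beyond 20 people, the stated assumption, that gives roughly:

```latex
N_{\text{Java}} \approx 2.4 \times 20 = 48 \approx 50 \ \text{programmers}
```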

------
yason
Where does the imperative come from that Google ought to always consider using Python
for new projects? Or use more Python in general?

They already contribute to Python and apparently use it in production, too,
but I don't see that translating to the above.

------
draegtun
Now if "all" the opensource dynamic languages could run off the same
opensource VM then that would be something?

The combined effort of all communities working towards producing a scalable &
fast VM could make a big difference?

Is it time to think Parrot?

~~~
easp
Are you posting this from 2005?

~~~
draegtun
Are you posting from 1991? ;-)

------
megaman821
If Google had to choose its development stack today without worrying about any
legacy development, what do you think it would look like?

I think Python/C/Erlang or Jython/Java/Erlang would be a good fit.

------
Virax
Question popped into my head: why is Collin Winter wasting his time with idle
chit-chat like this? Is he bored of Unladen Swallow already?

------
mkelly
No. Next question?

------
jxcole
<sarcasm>Maybe they should just develop everything in lisp!</sarcasm>

PS Here's to hoping the W3C adopts the sarcasm tag.

------
fjabre
Java instead of python..? ick..

Also, with the resources that Google has I'm a little baffled as to why they
couldn't devote some serious effort to getting Python up to spec
performance-wise.

~~~
rgoddard
They are, hence the Unladen Swallow project. But the improvements that are
possible are constrained by the language and having to maintain backwards
compatibility. If Google were to just take the language and do with it what
they want, they would risk alienating the community which would weaken the
utility of the language. So while improvements are being made, the types of
improvements are limited.

~~~
fjabre
Thanks. Makes a little more sense now.

So basically: it's complicated.

