PyCon 2011: How Dropbox Did It and How Python Helped (blip.tv)
135 points by trevorb on March 13, 2011 | 25 comments

The really interesting technical bits start at 18:17. "We use Python everywhere. 99.99% of the code at Dropbox is in Python ... People talk to each other in Python, express ideas in Python ... and it works."

~22:23: To reduce the CPU time your app uses, make sure all your inner loops are in C

Also, "Optimizing CPU is easy, Optimizing Memory is Hard"

I was at the keynote and his explanation about using any() to move a tight loop to C blew my mind. I had never thought about optimizing that way, and that is a very enlightening look at making tight loops faster. Quite a dramatic improvement, too.

Correct me if I got it wrong, but does it mean that putting any Python function in any() makes it run in C???

From my understanding, the any() version is faster primarily because it avoids looking up the .update() method on each iteration. The usual Python way of optimizing such cases is:

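    import hashlib
    import itertools

    # the original slow version: _md5.update is looked up on every iteration
    # 2.87753200531 sec
    def run():
        _md5 = hashlib.md5()
        for i in itertools.repeat("foo", 10000000):
            _md5.update(i)

    # slight change: look up the bound method once, outside the loop
    # 1.82029104233 sec
    def run():
        _md5 = hashlib.md5()
        update = _md5.update

        for i in itertools.repeat("foo", 10000000):
            update(i)

    # using any() (Python 2: itertools.imap is the lazy version of map)
    # 1.50683498383 sec
    def run():
        _md5 = hashlib.md5()
        any(itertools.imap(_md5.update, itertools.repeat("foo", 10000000)))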

The iterator is consumed in C using any(), instead of the Python bytecode being executed to perform the loop. I don't know if it'd work for "any Python function," but I'd definitely look at it for long-running iterators.
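For anyone who wants to reproduce the comparison on Python 3, here is a minimal sketch under a few assumptions of mine: map() stands in for itertools.imap, the payload is bytes because md5.update() no longer accepts str, and the iteration count N is cut down so it finishes quickly. The function names are made up for illustration, and the relative timings will depend on your interpreter version and machine.

```python
import hashlib
import itertools
import timeit

N = 200_000      # assumption: far fewer iterations than the 10,000,000 used above
CHUNK = b"foo"   # Python 3's md5.update() requires bytes


def plain_loop():
    # Attribute lookup of md5.update happens on every iteration.
    md5 = hashlib.md5()
    for chunk in itertools.repeat(CHUNK, N):
        md5.update(chunk)
    return md5.hexdigest()


def hoisted_lookup():
    # Look up the bound method once, then call the local name in the loop.
    md5 = hashlib.md5()
    update = md5.update
    for chunk in itertools.repeat(CHUNK, N):
        update(chunk)
    return md5.hexdigest()


def any_loop():
    # any() drives the iteration from C; update() returns None, so any() never stops early.
    md5 = hashlib.md5()
    any(map(md5.update, itertools.repeat(CHUNK, N)))
    return md5.hexdigest()


if __name__ == "__main__":
    for fn in (plain_loop, hoisted_lookup, any_loop):
        seconds = timeit.timeit(fn, number=3)
        print(f"{fn.__name__:15s} {seconds:.3f} s")
```

All three variants feed the same bytes to the hash, so they should produce the same digest; only the loop machinery differs.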

Just a little note so readers don't get the wrong impression:

- imap returns an iterator; any is used to "force" the iterator to run through all of its elements.

- the loop runs in C because of any. Instead of a for loop where the jump and body are repeatedly evaluated by the interpreter, any consumes the iterator through a loop implemented in C.
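One caveat worth spelling out, as a sketch rather than anything from the talk: any() stops at the first truthy result, so the trick only drains the whole iterator when the mapped function always returns something falsy (md5.update returns None). Also, only the loop moves to C; a function written in Python still runs as Python bytecode on each call. The record function below is a hypothetical stand-in, and consume follows the deque(maxlen=0) recipe from the itertools documentation, written here for Python 3.

```python
import collections


def consume(iterator):
    """Drain an iterator completely, discarding every result."""
    collections.deque(iterator, maxlen=0)


calls = []


def record(x):
    # Hypothetical function that returns its argument, so non-zero inputs are truthy.
    calls.append(x)
    return x


# any() stops as soon as record() returns a truthy value (here, at x == 1).
any(map(record, [0, 1, 2, 3]))
calls_with_any = list(calls)

# consume() ignores return values, so every element is processed.
calls.clear()
consume(map(record, [0, 1, 2, 3]))
calls_with_consume = list(calls)

print(calls_with_any)      # [0, 1]
print(calls_with_consume)  # [0, 1, 2, 3]
```

If the function you are mapping can return a truthy value, the deque-based version is the safer way to get the same "loop in C" effect.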

Does anybody know if App Engine supports any() and itertools?

any() (and all()) were added in Python 2.5, and itertools has been around since Python 2.3, so Google App Engine (which runs Python 2.5) should have both of them.

I'm at pycon and really enjoyed this keynote yesterday. The numbers he gave were amazing. More files shared per day than tweets on Twitter! 1 million files every 15 minutes.

Just a point of clarification: I think it's files synced per 15 minutes, not shared. Although I suppose "shared" is a loose term, because you could be sharing with the Dropbox server or your other devices.
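To put the quoted rate on a per-day footing, here is the back-of-the-envelope arithmetic, assuming the 1 million per 15 minutes figure holds steady around the clock (my assumption; the talk may have quoted a peak or an average):

```python
FILES_PER_WINDOW = 1_000_000   # figure quoted from the keynote
WINDOW_MINUTES = 15

windows_per_day = 24 * 60 // WINDOW_MINUTES        # 96 fifteen-minute windows in a day
files_per_day = FILES_PER_WINDOW * windows_per_day

print(files_per_day)  # 96000000
```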

slides are here: http://db.tt/ZWBTfdl

I love how at around 06:15 he takes a snack break right in the middle of his talk.

I love Python. It's the Swiss Army knife of the coding world: whether it's web development or file management, Python can do it... in a clean and concise manner.

Great talk. I really appreciate when conferences put up videos. I'm 15, so I can't make it there yet :)

Dropbox and Python. 2 of my favorite things.

Can someone please post the summary? Will really appreciate all the help. Totally OK, otherwise ;)

Video was down for me; this is a direct link to the mp4


Great talk. I didn't, like, like the excessive "like" in the beginning, but it got better :)

He graduated in 2008, so he's only ~23, and it's his first conference. I think he did a good job for his inexperience. It takes time to learn to calm down on stage and stay composed.

He also has a bit of a lisp, so yeah, the fact that he gave such a good speech is an impressive feat of self-confidence.

Because people with lisps should default to not being confident or capable?

I don't think he's suggesting that they _should_ default to lacking confidence, but rather that many are very self-conscious about it, so it's probably even more difficult for him. Hence, his performance is even more impressive.

Yeah, you're right. And it was only "like". "uhm", "uhh", "like", and "basically" can get pretty bad sometimes.

Dropbox is the last solution for filesharing.

Once they integrate a way to sync google docs & docs.com then they are done!

Is that because, Google docs or docs?
