

Building Lanyrd - mace
http://lanyrd.com/2011/brightonpy-building-lanyrd/sgptt/

======
coderholic
There's a comment on slide 43 about finding a way to replay access logs. Siege
is a great tool for this, as it takes a URLs file as an argument.

You can get a list of URLs from your Apache access log with

    
    
        cut -d ' ' -f7 /var/log/apache2/access.log > urls.txt
    

And then hammer your test server with

    
    
        siege -c<concurrency rate> -f urls.txt
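
The `cut` command above keeps field 7 of every line, including POSTs and other writes you may not want to replay. A minimal Python sketch of the same extraction that keeps only GET requests follows. It assumes the common/combined log format, and `replayable_urls` and `base_url` are hypothetical names, with the base URL added on the assumption that the replay tool wants absolute URLs.

```python
def replayable_urls(log_lines, base_url="http://localhost:8000"):
    """Yield absolute URLs for the GET requests in access-log lines.

    Assumes common/combined log format, where splitting on single spaces
    puts '"METHOD' at index 5 and the request path at index 6 (the same
    field that `cut -d ' ' -f7` selects).
    """
    for line in log_lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) < 7:
            continue  # skip lines that do not look like log entries
        method = fields[5].lstrip('"')
        path = fields[6]
        if method == "GET":
            yield base_url + path
```

To write the file, something like `open("urls.txt", "w").writelines(u + "\n" for u in replayable_urls(open("/var/log/apache2/access.log")))` should do.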

~~~
sciurus
Siege is great. httperf is also worth trying. There's a good guide at
[http://www.comlore.com/redist_files/httperf-quickstart-
guide...](http://www.comlore.com/redist_files/httperf-quickstart-guide.pdf)

------
po
Just went through the slides... watching now. Very interesting. One thing that
is a bit surprising is the diversity of key-value/caching technologies. I felt
like there was a lot of overlap in capability between Varnish, redis,
memcached, mongodb, etc...

~~~
simonw
I'm a big fan of polyglot persistence - we're using MySQL, memcached, redis,
Solr, Varnish and MongoDB now.

MySQL is where all of our key data lives. I trust it, it's backed up, and
the entire site can be recreated just from the MySQL dump.

Redis and Solr are both used for denormalisation. Solr provides search and our
core calendar view, and is updated every 60-90 seconds by a cron job. It's
replicated, which means that our calendar view (the most expensive page on the
site) scales horizontally with the number of replicas.

Redis powers a few features, most notably pages that show which of your
Twitter contacts are attending an event (a simple Redis set intersection,
which Redis will happily perform 100,000 times a second). It's also used for
our message queue, which means I don't have to run RabbitMQ as well.
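
As a rough illustration of the set intersection described above, here is a minimal sketch. It assumes `client` is a redis-py `redis.Redis` instance (or anything with a compatible `sinter` method). The key names and the `contacts_attending` function are hypothetical, not Lanyrd's actual schema.

```python
def contacts_attending(client, user_id, event_id):
    """Return the user's Twitter contacts who are attending the event.

    `client` is assumed to be a redis-py client. Both key names below are
    made up for this sketch.
    """
    contacts_key = "twitter-contacts:%s" % user_id  # hypothetical key
    attendees_key = "attendees:%s" % event_id       # hypothetical key
    # SINTER returns only the members present in every named set.
    return client.sinter(contacts_key, attendees_key)
```

The point of the design is that both sets are maintained ahead of time, so the page itself only pays for a single SINTER call.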

memcached is used for caching. I could use Redis for this, but the nice thing
about memcached is that it has a hard memory limit and will throw away keys
without any fuss when it hits that limit. It's also a good idea to keep
resources set aside for caching separate from resources being used for other
purposes, in my opinion.

Varnish is currently just used as a layer in front of our JavaScript badges (
<http://lanyrd.com/services/badges/> ), purely to protect us against a super
high traffic site deploying our badges (all badge requests are cached for 10
minutes, and Varnish handles dogpiling for us). I'd like to use Varnish in
front of the main site as well just for logged out users, but I haven't had
time to deploy that yet. Our badges are also designed to not block the loading
of your site if we're down for some reason - Varnish helps a bit there as
well. See our badge performance notes here:
<http://lanyrd.com/services/badges/docs/#performance>

We recently started storing application logs in MongoDB, mainly as an
experiment. MongoDB is very fast at writes, and lets us easily run structured
queries across our logs. I don't care too much about persistence here, since
the data isn't as valuable as e.g. our core database of conferences.

~~~
Swannie
So is this a question of:

Learning new tools and understanding their idiosyncrasies vs. building new
functionality on top of existing tools?

Where learning new tools is more interesting, so it wins? :P

------
mccutchen
I like this approach to static assets, where each file's name is changed to
include the hash of the file's contents on deployment. But I've never
implemented it myself, because I can't figure out a good way to refer to those
static assets in my templates.

simonw, if you're listening, how do you solve this problem?

How do you go from, e.g.,

<link rel="stylesheet" href="/style.css">

to

<link rel="stylesheet" href="/style.{current-hash}.css">

?

~~~
simonw
I'm using a Django template tag, {% static "css/example.css" %}

In development, the above tag would output "/static/css/example.css?0.234234"
- the random number at the end cache busts so e.g. IE will always load the
latest version of the file.

In production, the tag looks up the transformed filename in a dictionary,
which looks something like this:

    
    
        STATIC_ASSETS = {
            "css/core.css": "css/core.b1b09227.min.css"
        }
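
A minimal sketch of the lookup behind such a tag might look like the following. The function name `static_url`, the `debug` argument, the `/static/` prefix, and the fallback to the original path when a file is missing from the dictionary are all assumptions for illustration. The Django template tag registration is left out so the block runs without Django.

```python
import random

# Same shape as the generated dictionary shown above.
STATIC_ASSETS = {
    "css/core.css": "css/core.b1b09227.min.css",
}


def static_url(path, debug, assets=STATIC_ASSETS, prefix="/static/"):
    """Return the URL for a static file.

    In development (debug=True) append a random number as a cache buster.
    In production look the path up in the generated dictionary. Falling
    back to the unhashed path for unknown files is an assumption of this
    sketch, not something the original describes.
    """
    if debug:
        return "%s%s?%s" % (prefix, path, random.random())
    return prefix + assets.get(path, path)
```

In production the prefix would presumably point at wherever the renamed files are served from.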
    

The deployment script includes a bit of code that goes through every file in
the static directory, figures out the hash, renames it and then writes out
that dictionary in a generated static_assets.py file ready to be deployed to
the servers. There's a separate management script that pushes the renamed
files to S3 - I run that before doing a deploy.
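
A rough sketch of that kind of build step, under several assumptions: it uses the first 8 hex characters of an MD5 digest (the original does not say which hash is used), it copies files into a separate output directory rather than renaming in place, and it skips minification. `build_static_assets` and `write_assets_module` are hypothetical names.

```python
import hashlib
import os
import pprint
import shutil


def build_static_assets(static_dir, output_dir):
    """Copy every file under static_dir to output_dir with a content hash
    in its name, and return a dict mapping original to hashed paths.

    output_dir should live outside static_dir, otherwise the walk would
    pick up the copies as well.
    """
    mapping = {}
    for root, _dirs, files in os.walk(static_dir):
        for name in sorted(files):
            source = os.path.join(root, name)
            rel = os.path.relpath(source, static_dir).replace(os.sep, "/")
            with open(source, "rb") as f:
                # Assumption: MD5 truncated to 8 hex characters.
                digest = hashlib.md5(f.read()).hexdigest()[:8]
            stem, ext = os.path.splitext(rel)
            hashed = "%s.%s%s" % (stem, digest, ext)
            target = os.path.join(output_dir, *hashed.split("/"))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copyfile(source, target)
            mapping[rel] = hashed
    return mapping


def write_assets_module(mapping, module_path):
    """Write the mapping out as an importable Python module."""
    with open(module_path, "w") as f:
        f.write("STATIC_ASSETS = %s\n" % pprint.pformat(mapping))
```

Because the name depends only on the file's contents, an unchanged file keeps the same URL across deploys and stays cached.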

The only really fiddly bit is that the script needs to rewrite all of the CSS
files to include the updated filename of any referenced images. I'm using a
dumb regex to do this:

    
    
        css_url_re = re.compile(r'url\((["\']?)([^)]+?)\1\)')
    

Since we control the coding standards for our own CSS, there's no need to do
anything more robust than that.
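
For illustration, here is a minimal sketch of how that regex could drive the rewrite. It assumes the paths inside `url(...)` match the keys of the asset dictionary exactly. Relative references such as `../img/logo.png` would need resolving first, which this sketch does not do. `rewrite_css_urls` is a hypothetical name.

```python
import re

# The regex from above: optional quote, the path, then the same quote.
css_url_re = re.compile(r'url\((["\']?)([^)]+?)\1\)')


def rewrite_css_urls(css_text, assets):
    """Replace each url(...) reference with its hashed filename.

    Paths missing from `assets` are left untouched, and the original
    quoting style is preserved.
    """
    def replace(match):
        quote, path = match.group(1), match.group(2)
        return "url(%s%s%s)" % (quote, assets.get(path, path), quote)

    return css_url_re.sub(replace, css_text)
```

The images have to be hashed before the CSS is, since rewriting a stylesheet changes its contents and therefore its own hash.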

~~~
arthur_debert
On universalsubtitles.org, we're doing something pretty similar to that.

1) Move all static media to a unique hash (we get that from git's commit id,
making it trivial to correlate code & static) to /[static-media]/[static-
cache]/[git commit uid]/...path to file

2) Set the MEDIA url accordingly

The nice thing about this is that the generated file names are very readable
and you always know what changeset generated it just by looking. It's open
source if someone finds it useful
[https://github.com/8planes/mirosubs/tree/master/apps/unisubs...](https://github.com/8planes/mirosubs/tree/master/apps/unisubs_compressor)

~~~
mccutchen
That's a pretty good approach. I guess the main drawback is that the git
commit hash will change far more often than the contents of most of the static
files, though, right?

~~~
arthur_debert
Of course, but when in development, we use a template tag that inserts the
original url (no commit hash mangling on MEDIA URL), which means that yes, for
each deployment we nuke statics, but on our dev cycle that would happen any
way (it's very rare to have a release that does not touch static files), so
it's not an issue in practice

~~~
simonw
We've written our CSS and JavaScript to have a bunch of reusable components,
so often we can deploy new features without changing our static assets at all
(we reuse classes for components that are already in use elsewhere on the
site). I can see how for heavier JS sites this wouldn't be worthwhile though.

------
danielknell
I've noticed some multi-variant testing on some of the design elements, and
that you've rolled things out to beta testers first in the past. Is there
anything juicy behind the scenes powering these features?

~~~
simonw
Nothing fancy at the moment - just some if blocks in our templates. We'll
probably start using redis set membership for feature flags in the future
though.
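
A minimal sketch of what set-membership feature flags could look like, assuming `client` is a redis-py `redis.Redis` instance (or anything with compatible `sadd` and `sismember` methods). The `feature:<name>` key scheme and both function names are hypothetical.

```python
def enable_feature(client, feature, *user_ids):
    """Add users to the set of people who can see a feature."""
    # Hypothetical key scheme: one Redis set per feature.
    client.sadd("feature:%s" % feature, *user_ids)


def feature_enabled(client, feature, user_id):
    """Return True if the user is in the feature's set."""
    # SISMEMBER is a constant-time membership check.
    return bool(client.sismember("feature:%s" % feature, user_id))
```

A template or view would then branch on `feature_enabled(client, "new-calendar", user.id)` instead of a hard-coded if block.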

------
sarp
Could you elaborate on your read-only mode?

Is it only hiding functionality that would cause writes or is there more to
it? Is it built into the application logic?

~~~
simonw
Yup, it's hiding functionality that causes writes. Lots of messy template
logic and a few bits of app logic as well. It isn't very neatly abstracted at
the moment.

