
Realtime pixel tracking with nginx, syslog-ng, and Redis - benwilber0
http://benwilber.net/realtime-pixel-tracking-nginx-syslog-ng-redis
======
seiji
Sir, I can't tell if you're brilliant or insane.

For those who didn't read the post: he's formatting syslog messages as redis
wire commands and telling syslog to forward those messages to redis over the
network (normally you'd be forwarding to another syslog server, but because of
the custom message format here, a redis server accepts them as commands), so
syslog ends up as a write-only redis client.
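For readers unfamiliar with the Redis wire protocol, here is a minimal Python sketch of what one such command looks like on the wire. It assumes the RESP request format (an array of bulk strings). The key name `pixel` and the choice of `RPUSH` are illustrative assumptions, not necessarily what the post's syslog-ng template produces. Redis also accepts plain inline commands (space-separated words ending in a newline), which can be easier to emit from a log template.

```python
def encode_resp(*args: str) -> bytes:
    """Encode one Redis command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg.encode("utf-8")
        # Bulk string: $<byte length>\r\n<bytes>\r\n
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


# Hypothetical key and payload; the post's actual template may differ.
cmd = encode_resp("RPUSH", "pixel", "1379192937.393,test=2")
print(cmd)
```

Because each argument carries its own byte length, payloads containing spaces or commas need no extra escaping.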

Insane. And brilliant.

~~~
spc476
He's already patching nginx to support logging via syslog, so why didn't he
just patch nginx to log directly to redis? One less place for things to go
wrong.

~~~
seiji
Some ad contracts require you keep server logs of hits and not post-processed
versions in your DB (in case any discrepancies come up). Running through
syslog allows you to centralize where your logs are being processed/sent since
each log hit can branch out to multiple endpoints.
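The same branching idea can be sketched with Python's standard `logging` module standing in for syslog-ng: one log call, two handlers, each writing its own format. This is only an analogy under stated assumptions. In the post the fan-out happens inside syslog-ng's configuration, and the in-memory sinks and `pixel` key below are hypothetical.

```python
import io
import logging

# Hypothetical sinks: an untouched archive copy of every hit, and a
# second destination that reformats each hit for Redis.
archive = io.StringIO()
reformatted = io.StringIO()

logger = logging.getLogger("pixel")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep hits out of the root logger

# Branch 1: the raw log line, exactly as received.
raw_handler = logging.StreamHandler(archive)
raw_handler.setFormatter(logging.Formatter("%(message)s"))

# Branch 2: the same hit rendered as a Redis inline command
# (the key name "pixel" is an assumption).
redis_handler = logging.StreamHandler(reformatted)
redis_handler.setFormatter(logging.Formatter("RPUSH pixel %(message)s"))

logger.addHandler(raw_handler)
logger.addHandler(redis_handler)

logger.info("1379192937.393,test=2")
print(archive.getvalue(), reformatted.getvalue())
```

The raw copy never passes through the reformatting step, which is the property an audit requirement like the one above cares about.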

~~~
spc476
Okay, that I can buy.

------
gfodor
Nice. A similar (in spirit) hack I've used is to host the pixel on a CDN (like
Akamai or CloudFront) and run MapReduce over its logs. You don't get
real-time like this, but look ma, no server! CloudFront is especially useful
since it can log directly to S3, and you can run Elastic MapReduce directly on
those files.
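A minimal sketch of the map and reduce steps in plain Python. It assumes CloudFront's standard tab-separated web access-log layout (date in the first field, `cs-uri-stem` in the eighth) and counts requests per day per path. The sample lines are invented for illustration. On EMR, the same two functions could be adapted to run as a Hadoop Streaming mapper and reducer.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Assumed field positions in CloudFront's tab-separated web access logs.
DATE, URI_STEM = 0, 7

Key = Tuple[str, str]


def map_hits(lines: Iterable[str]) -> Iterator[Tuple[Key, int]]:
    """Map phase: emit ((date, uri_stem), 1) for every logged request."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the #Version / #Fields headers
        fields = line.rstrip("\n").split("\t")
        yield (fields[DATE], fields[URI_STEM]), 1


def reduce_hits(pairs: Iterable[Tuple[Key, int]]) -> Counter:
    """Reduce phase: sum the 1s for each (date, uri_stem) key."""
    counts: Counter = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


# Invented sample lines (only the first 12 fields shown).
sample = [
    "#Version: 1.0",
    "2013-09-14\t21:08:57\tSEA50\t337\t192.0.2.10\tGET\td1.example.net"
    "\t/pixel.gif\t200\t-\tMozilla/5.0\ttest=2",
    "2013-09-14\t21:11:44\tSEA50\t337\t192.0.2.11\tGET\td1.example.net"
    "\t/pixel.gif\t200\t-\tMozilla/5.0\ttest=5",
]
counts = reduce_hits(map_hits(sample))
print(counts)
```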

~~~
wiremine
I've started doing this as well, and it works well. CloudFront lists its log
delivery as "best effort", which gave me pause at first. However, after a
month of testing, I've found the stats to be on par with Google Analytics.

I'd love to see an open source package emerge to implement the collection and
the data processing. CloudFront + Redshift would rock.

~~~
alexatkeplar
It exists :-) Check out
[https://github.com/snowplow/snowplow](https://github.com/snowplow/snowplow)

~~~
wiremine
Awesome! Thanks for the link

------
jbyers
As others have mentioned, nginx + one of the redis (or lua-redis) modules does
this very well without the complexity of syslog in the middle. We load many
millions of values a day via httpredis2. It's been rock solid.

[http://wiki.nginx.org/HttpRedis2Module](http://wiki.nginx.org/HttpRedis2Module)

We log the same requests to a file in a custom log format that gets batched to
S3 and then into Cassandra and EMR/Hive. Makes a great platform for realtime +
historical analytics.

~~~
ClifReeder
Seconding this. I've been doing something similar using OpenResty[0] and
Redis. Handles millions of page views a day on a pretty low end server without
breaking a sweat. Documentation on OpenResty is kinda tricky to wade through,
but man is it lightweight and fast.

[0] [http://openresty.org/](http://openresty.org/)

------
zvikara
A saner approach would be to script nginx using Lua and lua-resty-redis.

------
joshstrange
I've been playing around with Docker and containers a little recently, and I
decided to test my skills at creating a container, installing software, and
packaging it up for reuse, and possibly even doing it all in a Dockerfile
after I do it by hand with "docker run -i -t ubuntu /bin/bash".

I followed your guide but I am not able to see the Redis records. I was able
to find pixel.log in /var/log/nginx and inside was what I would expect to see
in Redis:

    
    
        1379192937.393,test=2
        1379193104.365,test=5
        1379193105.308,test=7
        1379193106.311,test=28
    

.....

At this point I went to check the syslog-ng program, as I realized I never
verified it was running/working. I am getting an "Error parsing source, source
plugin system not found in /usr/local/etc/syslog-ng.conf at line 2, column 3:"
error, and after some googling I found some people suggesting[0] adding
'@include "scl.conf"' to the syslog-ng config file. I tried adding this, and it
caused another syntax error. I googled for a while longer to no avail.

I know HN isn't really the forum for tech support but I couldn't find a better
place to post it other than over email (My email is in my profile if you
prefer that). If you have any pointers please let me know. Thank you for any
help you may be able to provide.

[0] [http://comments.gmane.org/gmane.comp.syslog-ng/15325](http://comments.gmane.org/gmane.comp.syslog-ng/15325)
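One way to confirm that the nginx side is producing well-formed hits, independently of syslog-ng, is to parse the log excerpt directly. A minimal sketch, assuming each line is `<unix timestamp with milliseconds>,<query string>` as in the excerpt above. The helper name `parse_pixel_line` is hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs


def parse_pixel_line(line: str) -> Tuple[float, Dict[str, List[str]]]:
    """Split '1379192937.393,test=2' into (1379192937.393, {'test': ['2']})."""
    ts, _, query = line.strip().partition(",")
    return float(ts), parse_qs(query)


log_excerpt = """\
1379192937.393,test=2
1379193104.365,test=5
1379193105.308,test=7
1379193106.311,test=28
"""
hits = [parse_pixel_line(l) for l in log_excerpt.splitlines() if l.strip()]
for ts, params in hits:
    print(ts, params)
```

If these lines parse cleanly, nginx is doing its part, and the remaining problem is in the syslog-ng configuration.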

~~~
spc476
Did you patch nginx?
([https://github.com/yaoweibin/nginx_syslog_patch](https://github.com/yaoweibin/nginx_syslog_patch))

~~~
joshstrange
Yes, I patched it. I think the issue is with syslog-ng as it is complaining
about the configuration file. Nginx is working fine (and patched).

------
hackerboos
If you're like me and wondering why on earth you'd do this instead of using
Google Analytics, there are two reasons:

1. JavaScript disabled

2. Tracking email opens

[http://skillcrush.com/2012/07/19/tracking-pixel/](http://skillcrush.com/2012/07/19/tracking-pixel/)
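For the email case, the "tracking" is simply an `<img>` tag whose URL carries identifying query parameters. When the mail client loads the image, the request shows up in the pixel log. A minimal sketch follows. The endpoint `pixel.example.com` and the parameter names `c` and `r` are hypothetical. Opens are only counted when the client actually loads images.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical endpoint; point this at wherever the pixel is served.
PIXEL_URL = "https://pixel.example.com/pixel.gif"


def tracking_img_tag(campaign: str, recipient_id: str) -> str:
    """Build an <img> tag whose fetch records an email open."""
    query = urlencode({"c": campaign, "r": recipient_id})
    # Escape for an HTML attribute: '&' in the query becomes '&amp;'.
    src = escape(f"{PIXEL_URL}?{query}", quote=True)
    return f'<img src="{src}" width="1" height="1" alt="">'


tag = tracking_img_tag("sept-launch", "42")
print(tag)
```

No JavaScript is involved, which covers both reasons listed above.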

------
Sirupsen
If you're just popping off a queue, using something like POSIX Message Queues
or SysV IPC might be nice to avoid depending on Redis on every single app
server.

~~~
benwilber0
Increases the complexity immensely. Anyone can do this. Not everyone can use
POSIX msg queues or IPC effectively.

------
thezilch
Now, can syslog-ng be made to wrap each flush in a Redis pipeline?

~~~
benwilber0
This is ideal :-)
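Whether syslog-ng itself can be configured this way is left open in the thread. On the Redis side, though, a pipeline is easy to show: several commands concatenated into one buffer, sent in a single write, with the replies read back afterwards. A minimal sketch, assuming RESP-encoded commands and a hypothetical `pixel` list key:

```python
import socket
from typing import Iterable, List, Sequence


def encode_resp(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def pipeline_batch(commands: Iterable[Sequence[str]]) -> bytes:
    """Pipelining is concatenation: N commands, one buffer, one write."""
    return b"".join(encode_resp(*cmd) for cmd in commands)


def send_pipelined(host: str, port: int,
                   commands: List[Sequence[str]]) -> List[bytes]:
    """Send a batch in one write, then read one reply line per command.

    Assumes every reply is a single line, which holds for RPUSH's
    integer reply but not for bulk-string or array replies.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(pipeline_batch(commands))
        with sock.makefile("rb") as f:
            return [f.readline() for _ in commands]


batch = pipeline_batch([("RPUSH", "pixel", "a"), ("RPUSH", "pixel", "b")])
print(batch)
```

For this particular workload there is also a simpler option: `RPUSH` accepts multiple values in a single command (since Redis 2.4), so one flush could become one `RPUSH pixel v1 v2 ...`.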

------
pyotrgalois
Good hack.

------
misiti3780
this is GREAT!

