Hacker News
Realtime pixel tracking with nginx, syslog-ng, and Redis (benwilber.net)
125 points by benwilber0 on Sept 14, 2013 | 27 comments

Sir, I can't tell if you're brilliant or insane.

For those who didn't read the post: he's formatting syslog messages as redis wire commands, telling syslog to forward those messages to redis over the network (normally you're forwarding to another syslog client, but because of the custom message format here, it acts against a redis server), so syslog ends up as a write-only redis client.
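For anyone wondering what "a syslog message formatted as a redis wire command" looks like on the wire, here is a minimal sketch in Python. It assumes the array-of-bulk-strings request framing Redis accepts; the command, key, and field names are made up for illustration, and the article's actual syslog-ng template may differ.

```python
def encode_command(*args: str) -> bytes:
    """Encode one Redis command as an array of bulk strings."""
    # "*<argc>" announces how many arguments follow.
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is "$<byte length>", then the bytes, each ending in CRLF.
        parts.append(b"$%d\r\n" % len(data))
        parts.append(data + b"\r\n")
    return b"".join(parts)


# Hypothetical key and field names, not taken from the article.
message = encode_command("HINCRBY", "pixel:hits", "/pixel.gif", "1")
print(message)
# b'*4\r\n$7\r\nHINCRBY\r\n$10\r\npixel:hits\r\n$10\r\n/pixel.gif\r\n$1\r\n1\r\n'
```

Anything that can write those bytes to the Redis port acts as a write-only client, which is all syslog has to do here.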

Insane. And brilliant.

He's already patching nginx to support logging via syslog, so why didn't he just patch nginx to log directly to redis? One less place for things to go wrong.

Some ad contracts require you to keep the raw server logs of hits, not post-processed versions in your DB (in case any discrepancies come up). Running through syslog lets you centralize where your logs are processed and sent, since each log hit can branch out to multiple endpoints.

Okay, that I can buy.

Brilliantly lazy, I would say: since he used out-of-the-box software to do the piping, he didn't have to write any code. My first inclination would be to format nginx's logs as a JSON array, like so:

log_format json '["$time_local","$msec","$remote_addr","$remote_user","$http_host","$request","$status","$upstream_addr","$upstream_cache_status","$bytes_sent","$request_time","$upstream_response_time","$http_referer","$http_user_agent","$http_x_forwarded_for","$http_cookie","$upstream_http_x_session_key","$upstream_http_x_session_user"]';

(just copied and pasted from my own config, many fields are not relevant to this use case in particular).

Then I'd log to a named pipe read by a program that would RPUSH the entries onto a LIST, or maybe PUBLISH them to a channel. Several analytics programs, each computing a different metric from the same data, could then consume them in parallel.
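A rough sketch of that pipe reader in Python, standard library only. The list key "nginx:hits", the function names, and the FIFO path in the usage note are all assumptions for illustration. Lines that fail to parse are skipped, which can happen when a logged value contains characters that nginx writes as \xXX escapes, since JSON does not accept those.

```python
import json
import socket


def encode_command(*args: str) -> bytes:
    """Encode one Redis command as an array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n" % len(data) + data + b"\r\n")
    return b"".join(out)


def entry_to_rpush(line: str, list_key: str = "nginx:hits") -> bytes:
    """Turn one JSON-array log line into an RPUSH command (hypothetical key name)."""
    fields = json.loads(line)  # raises ValueError on malformed input
    if not isinstance(fields, list):
        raise ValueError("expected a JSON array")
    payload = json.dumps(fields, separators=(",", ":"))
    return encode_command("RPUSH", list_key, payload)


def pump(fifo_path: str, host: str = "127.0.0.1", port: int = 6379) -> None:
    """Read log lines from a named pipe and push each one to Redis."""
    with socket.create_connection((host, port)) as conn, open(fifo_path) as fifo:
        for line in fifo:
            line = line.strip()
            if not line:
                continue
            try:
                conn.sendall(entry_to_rpush(line))
            except ValueError:
                continue  # skip lines that are not valid JSON
            conn.recv(4096)  # read and discard the reply
```

Usage would be something like pump("/var/run/nginx-hits.fifo") with nginx's access_log pointed at the same FIFO. Swapping RPUSH for PUBLISH in entry_to_rpush gives the channel variant.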

Can you please break up the 3rd line of your post? It is badly breaking the comment page layout.

The HN stylesheet really needs a "word-wrap: break-word;" in there somewhere.

This is what I have in my Stylish extension for HN:

@namespace url(http://www.w3.org/1999/xhtml); @-moz-document domain("news.ycombinator.com") { p { word-break: break-all; } }

Awesome, thanks!

Hacker. In all senses of the word. Brilliant post!

Nice. A similar (in spirit) hack I've used is to host the pixel on a CDN (like Akamai or Cloudfront) and run map reduce over the logs there. You don't get real-time, like this, but look ma, no server! CloudFront is especially useful since it can log directly to S3 and you can run elastic map reduce directly on those files.
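To give a feel for the log-crunching side, here is a single-process stand-in for the map reduce step, sketched in Python. It assumes the tab-separated access log layout with a "#Fields:" header line and reads column positions from that header instead of hardcoding them. The sample lines are invented and use far fewer columns than a real log.

```python
from collections import Counter


def count_hits(lines, field="cs-uri-stem"):
    """Count log records per value of `field`, using the #Fields header for column names."""
    columns = []
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # The header names the columns, separated by spaces.
            columns = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not columns:
            continue  # skip other comment lines and anything before the header
        record = dict(zip(columns, line.split("\t")))
        if field in record:
            counts[record[field]] += 1
    return counts


# Invented sample with a truncated column set, for illustration only.
sample = [
    "#Version: 1.0",
    "#Fields: date time cs-uri-stem sc-status",
    "2013-09-14\t00:00:01\t/pixel.gif\t200",
    "2013-09-14\t00:00:02\t/pixel.gif\t200",
    "2013-09-14\t00:00:03\t/other.gif\t404",
]
print(count_hits(sample).most_common())
# [('/pixel.gif', 2), ('/other.gif', 1)]
```

Passing field="sc-status" instead counts by response code. On real volumes the same function would be the per-file map step, with the Counters summed across files as the reduce.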

I've started doing this as well, and it works well. CloudFront lists its log delivery as "best effort", which gave me pause at first. However, after a month of testing, I've seen the stats to be on par with Google Analytics.

I'd love to see an open source package emerge to implement the collection and the data processing. Cloudfront + Redshift would rock.

It exists :-) Check out https://github.com/snowplow/snowplow

Awesome! Thanks for the link

As others have mentioned, nginx + one of the redis (or lua-redis) modules does this very well without the complexity of syslog in the middle. We load many millions of values a day via httpredis2. It's been rock solid.


We log the same requests to a file in a custom log format that gets batched to s3 and then Cassandra and EMR/Hive. Makes a great platform for realtime + historical analytics.

Seconding this. I've been doing something similar using OpenResty[0] and Redis. Handles millions of page views a day on a pretty low end server without breaking a sweat. Documentation on OpenResty is kinda tricky to wade through, but man is it lightweight and fast.


A more sane approach would be to script nginx using lua and lua-resty-redis.

I've been playing around with Docker and containers a little recently, and I decided to test my skills in creating a container, installing software, and packaging it up for re-use. I might even try to do it all in a Dockerfile after doing it by hand through "docker run -i -t ubuntu /bin/bash".

I followed your guide but I am not able to see the Redis records. I was able to find pixel.log in /var/log/nginx and inside was what I would expect to see in Redis:


At this point I went to check the syslog-ng program, as I realized I had never verified it was running/working. I am getting an "Error parsing source, source plugin system not found in /usr/local/etc/syslog-ng.conf at line 2, column 3:" error, and after some googling I found some people suggesting[0] adding '@include "scl.conf"' to the syslog-ng config file. I tried adding this and it caused another syntax error. I googled for a while longer to no avail.

I know HN isn't really the forum for tech support but I couldn't find a better place to post it other than over email (My email is in my profile if you prefer that). If you have any pointers please let me know. Thank you for any help you may be able to provide.

[0] http://comments.gmane.org/gmane.comp.syslog-ng/15325

Yes, I patched it. I think the issue is with syslog-ng as it is complaining about the configuration file. Nginx is working fine (and patched).

If, like me, you're wondering why on earth you would do this instead of using Google Analytics, there are 2 reasons:

1. Javascript disabled

2. Tracking emails opened


If you're just popping off a queue, using something like POSIX message queues or SysV IPC might be nice to avoid depending on Redis on every single app server.

Increases the complexity immensely. Anyone can do this. Not everyone can use POSIX msg queues or IPC effectively.

Now, can syslog-ng be made to wrap each flush in a Redis pipeline?

This is ideal :-)

Good hack.

this is GREAT!
