

Analytics on the Cheap - ksri
http://0x74696d.com/posts/analytics-on-the-cheap/

======
tszming
Be sure to check out
[https://github.com/snowplow/snowplow](https://github.com/snowplow/snowplow)
before rolling out your own custom analytics solution.

~~~
0x74696d
Yeah, using the CloudFront CDN logs is a very similar approach. If we'd known
about Snowplow and it had been production-ready when this service was written,
it would certainly have been worth looking at.

That being said, this service does double-duty for us: when an analytics ping
comes in from the video player we're also using that same data to write
"bookmarks" for resuming play. This data has different liveness requirements
than the analytics and can't be done as a nightly batch. So we'd end up having
to write our own Collector (in the Snowplow architecture) anyway to perform
that fan-out of incoming events.
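
A minimal sketch of the fan-out described above, assuming a single handler receives each ping and hands it to two sinks with different liveness needs. Every name here (`Ping`, `BookmarkStore`, `AnalyticsBuffer`, `handle_ping`) is hypothetical and not taken from the service in the post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ping:
    """One analytics ping from the video player (hypothetical shape)."""
    user_id: str
    video_id: str
    position_seconds: int


@dataclass
class BookmarkStore:
    """Live sink: keeps the latest playback position per (user, video)."""
    positions: dict = field(default_factory=dict)

    def write(self, ping: Ping) -> None:
        self.positions[(ping.user_id, ping.video_id)] = ping.position_seconds


@dataclass
class AnalyticsBuffer:
    """Batch sink: accumulates pings for a later (e.g. nightly) load."""
    pending: list = field(default_factory=list)

    def write(self, ping: Ping) -> None:
        self.pending.append(ping)


def handle_ping(ping: Ping, sinks) -> None:
    """Fan one incoming event out to every sink."""
    for sink in sinks:
        sink.write(ping)
```

The bookmark sink overwrites on every ping so a resume position is available right away, while the analytics sink only accumulates and can be drained on whatever schedule the batch load uses.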

~~~
alexatkeplar
Hey 0x74696d - Snowplow co-founder here. Very cool post! The Snowplow Kinesis
architecture gives you "fan-out" for free - you could write your bookmarking
service as a Kinesis KCL app which reads from the Kinesis enriched event
stream written by our Kinesis Enrich
([https://github.com/snowplow/snowplow/tree/master/3-enrich/sc...](https://github.com/snowplow/snowplow/tree/master/3-enrich/scala-kinesis-enrich)).
In this case you'd be using our Scala Stream Collector
(Spray with a Kinesis back-end), not the CloudFront CDN Collector.

~~~
0x74696d
Nice. If I were starting this project today (it's pretty mature at this point)
that'd definitely be something I'd look into.

That being said, when Kinesis previewed it had only Java bindings for KCL. Not
sure if that's still the case, but that'd be a limiting factor for our shop
unfortunately.

~~~
alexatkeplar
You're right 0x74696d - originally the KCL was Java only. The Java KCL now
includes something called the MultiLangDaemon, which means you can write apps
in other languages. There is an official Python KCL
([https://github.com/awslabs/amazon-kinesis-client-python](https://github.com/awslabs/amazon-kinesis-client-python)) but no
others I know of yet. There's also AWS Lambda for processing Kinesis streams
with JavaScript, and of course you can use Storm or Spark Streaming, although
those are JVMish too.
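
For a sense of what a non-JVM consumer might look like, here is a rough sketch of a Lambda-style handler for a batch of Kinesis records, written in Python rather than JavaScript. It assumes the event shape Lambda uses for Kinesis sources, where each record's payload arrives base64-encoded under `Records[].kinesis.data`, and it assumes the payloads are JSON, which the thread does not state. The function names are made up for illustration.

```python
import base64
import json


def decode_records(event):
    """Yield the decoded JSON payload of each Kinesis record in the batch."""
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        yield json.loads(raw)


def handler(event, context):
    """Entry point: decode the batch and report how many records it held."""
    payloads = list(decode_records(event))
    # A real consumer would write bookmarks or forward the events here.
    return {"processed": len(payloads)}
```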

------
nissimk
Very interesting. You say "Within an hour or so each GET shows up in the logs
for the S3 bucket wherever we’re sending the logs for our S3 example.bucket."
Is this frequency configurable? I also see at the link below that the logs are
best effort and they may drop messages. How is this for you in practice?

[http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.ht...](http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html)
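
To make the "each GET shows up in the logs" idea concrete, here is a sketch of pulling analytics fields out of one S3 server access log line. It assumes the documented layout (a bracketed timestamp, then later a quoted request line followed by the HTTP status) and only captures those fields. The sample line, the `ping.gif` key, and the `video`/`pos` query parameters are invented placeholders, not data from the post.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Matches only the fields used here: the bracketed timestamp, the quoted
# request line, and the HTTP status that follows it.
LOG_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'.*?"(?P<method>[A-Z]+) (?P<uri>\S+) [^"]*" '
    r'(?P<status>\d{3}) '
)


def parse_line(line):
    """Return a dict for one access-log line, or None if it does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    uri = urlsplit(match["uri"])
    return {
        "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": match["method"],
        "path": uri.path,
        # Keep the first value of each query parameter as the event payload.
        "params": {k: v[0] for k, v in parse_qs(uri.query).items()},
        "status": int(match["status"]),
    }


# Invented sample line; IDs, bucket name, and query parameters are placeholders.
SAMPLE = (
    'OWNERID example.bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 - REQID '
    'REST.GET.OBJECT ping.gif "GET /ping.gif?video=v1&pos=30 HTTP/1.1" '
    '200 - 43 43 7 6 "-" "player/1.0" -'
)
```

Calling `parse_line(SAMPLE)` yields the request path and the query parameters as a plain dict, which is roughly the shape a later bulk load would want. Because log delivery is best effort, an ingest built on this should tolerate missing or late lines.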

~~~
0x74696d
The frequency isn't configurable. We execute the ingest daily for a 24 hour
window that's IIRC 4 hours old. This works fine for us in practice.
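
The window arithmetic can be sketched as follows, assuming the window simply ends 4 hours before the job runs and covers the 24 hours before that. The function name and the end-exclusive convention are assumptions; the actual service may align its window differently.

```python
from datetime import datetime, timedelta, timezone


def ingest_window(now, lag=timedelta(hours=4), length=timedelta(hours=24)):
    """Return (start, end) of the log window to ingest, end-exclusive."""
    end = now - lag
    return end - length, end
```

For a job that runs at 04:00 UTC, this selects exactly the previous calendar day, leaving the last 4 hours for late-arriving log files.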

------
socialist_coder
That's really cool. You're saving money in 2 places:

1) not paying for SQS, Kinesis, or some other kind of queue to store the
events.

2) not paying for a backend queue processor job to read the events from the
queue and write them to S3/Redshift.

You still need a backend processor to do the Redshift copy commands, but this
is much lighter weight and you'd never need to scale this process.

You'd also be paying some additional money for the S3 logging but probably
pretty minimal.

Very cool!

------
cedricthecrow
Kinesis is not hosted Kafka in the way that ElastiCache is hosted Redis. It
was developed internally within Amazon.

~~~
pixelmonkey
Source?

~~~
cedricthecrow
Should have added a disclaimer. I work for Amazon.

------
0x74696d
Author here. Happy to answer any questions.

~~~
jamesblonde
The font color is unreadable on Ubuntu/Chrome.

~~~
0x74696d
Weird. It's #000 on #FFF, unless my meager CSS skills have failed me. Maybe
it's just too thin or something? I'll see what I can do but probably not
today. /points to the markdown link in sibling comment.

~~~
lmm
The 'Lato' font is unreadably thin for me on Linux. If I turn it off (via
inspect element) and allow it to use the next one (Helvetica) then it looks
fine.

~~~
hobarrera
Chrome renders thin fonts stupidly thin on Linux (remember that Chrome
reinvents the whole wheel when it comes to rendering!).

Are you using Chrome by any chance?

~~~
lmm
Yeah, should've said, Chrome.

