
Install Snowplow Analytics on the Google Cloud Platform - cosmie
https://www.simoahava.com/analytics/install-snowplow-on-the-google-cloud-platform/
======
cosmie
Snowplow Analytics[1] is a really great tool for web analytics, especially for
companies that want to _own_ their data or have privacy concerns. It's pretty
straightforward to replicate a lot of the reporting that you can do with
Google Analytics, while also giving you full access to the raw clickstream
data, the lack of which hampers more custom analysis within GA (at least in the
free version; the premium GA360 product allows raw access to the clickstream
data Google Analytics collects[2]).

[1] [https://snowplowanalytics.com/](https://snowplowanalytics.com/)

[2] [https://support.google.com/analytics/answer/3437618?hl=en](https://support.google.com/analytics/answer/3437618?hl=en)

~~~
shermozle
You can in fact use it to get the raw GA clickstream data into a database in
real time: you send a copy of each GA hit to Snowplow alongside the one sent to
Google.

~~~
cosmie
Thanks for stating that directly! Re-reading my comment, it didn't come off as
clearly as intended. And in fact, Simo also has a guide that walks through
doing precisely what you mention[1].

What I meant was that with a standard and free GA installation, barring a paid
GA360 subscription, you can't gain raw access directly to the GA clickstream
_from_ Google. And GA makes a lot of unintuitive assumptions in its data
processing that people tend to become aware of only once one of them
spectacularly breaks down for their use case, and you can't do any
post-processing to account for it historically, because you can't access the
clickstream data.
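As an illustration of the kind of historical post-processing raw hits make possible, here is a minimal sketch that re-derives sessions from a raw clickstream using your own inactivity timeout. GA's default session rules include a 30-minute inactivity timeout plus session splits at midnight and on campaign changes; this sketch applies only the timeout. The `RawHit` shape and the `sessionize` function are hypothetical, assuming you already have each hit's client ID and timestamp.

```typescript
// Hypothetical shape of one raw hit from a duplicated clickstream.
interface RawHit {
  clientId: string;
  timestamp: number; // milliseconds since epoch
}

interface SessionizedHit extends RawHit {
  sessionIndex: number; // 1-based session counter per client
}

// Re-derive sessions using only an inactivity timeout.
// Unlike GA's default processing, this does not split at midnight or on
// campaign changes; such rules could be added inside the loop.
function sessionize(hits: RawHit[], timeoutMs: number): SessionizedHit[] {
  // Order hits by client, then chronologically within each client.
  const sorted = [...hits].sort((a, b) =>
    a.clientId === b.clientId
      ? a.timestamp - b.timestamp
      : a.clientId < b.clientId
        ? -1
        : 1,
  );

  const out: SessionizedHit[] = [];
  let prev: SessionizedHit | undefined;
  for (const hit of sorted) {
    let sessionIndex = 1; // first hit of a client starts session 1
    if (prev !== undefined && prev.clientId === hit.clientId) {
      // A gap longer than the timeout starts a new session.
      sessionIndex =
        hit.timestamp - prev.timestamp > timeoutMs
          ? prev.sessionIndex + 1
          : prev.sessionIndex;
    }
    const current: SessionizedHit = { ...hit, sessionIndex };
    out.push(current);
    prev = current;
  }
  return out;
}
```

Because the raw hits are kept, the same history can be re-run with a different `timeoutMs` (or extra splitting rules) whenever an assumption turns out not to fit your use case.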

An alternative to paying for GA360 would be to duplicate your clickstream to
Snowplow, using a technique like [1]. This lets you both leverage the GA
interface and integrations for run-of-the-mill needs where they work, and fall
back to the duplicated data within Snowplow once you hit a wall inside of GA.
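The duplication technique linked above works with analytics.js tasks: a `customTask` wraps GA's `sendHitTask` so each hit still goes to Google and the same `hitPayload` is also sent to a Snowplow collector. The sketch below shows that wrapping pattern under stated assumptions: the `GaModel` interface is a local minimal typing (not an official one), `forkToSnowplow` is a hypothetical helper, the `/com.google.analytics/v1` path is assumed to be the collector's GA adapter endpoint (verify it against your Snowplow setup), and the actual network call is injected as `send` so the logic can be tested outside a browser.

```typescript
// Minimal local typing for the analytics.js model passed to tasks.
// Only get/set are used; this is an assumption, not an official type.
interface GaModel {
  get(fieldName: string): unknown;
  set(fieldName: string, value: unknown): void;
}

type SendFn = (url: string, body: string) => void;

// Hypothetical helper: returns a customTask that keeps GA's normal
// sendHitTask and also forwards the same payload to a Snowplow collector.
function forkToSnowplow(
  collectorUrl: string,
  send: SendFn,
): (model: GaModel) => void {
  // Assumed GA adapter path on the Snowplow collector; check your setup.
  const endpoint =
    collectorUrl.replace(/\/+$/, "") + "/com.google.analytics/v1";

  return (model: GaModel) => {
    const originalSendHitTask = model.get("sendHitTask") as (
      m: GaModel,
    ) => void;
    model.set("sendHitTask", (sendModel: GaModel) => {
      originalSendHitTask(sendModel); // the hit still goes to Google
      send(endpoint, String(sendModel.get("hitPayload"))); // duplicate copy
    });
  };
}
```

In a browser you might register it with something like `ga('set', 'customTask', forkToSnowplow('https://collector.example.com', (url, body) => { navigator.sendBeacon(url, body); }))`, where the collector URL is a placeholder. Calling the original task first keeps GA's reporting unchanged, so the Snowplow copy is purely additive.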

It's a really successful, cost-effective, low-friction method I've used in
organizations with a really immature analytics environment that's expected to
mature over time. The familiarity and accumulated knowledge base around GA
make it easy to get started, then cut over to Snowplow as needs and use cases
evolve, before eventually hitting a critical point where the investment in
GA360 finally makes sense. It creates an incredibly low-friction maturation
process for end users, since the data models and historical data are
consistent throughout. So there's minimal change management, and any modeling,
reporting, use cases, knowledge/training/skills transferability, etc. that
you've built up internally stay reliable. It's analogous to abstracting the
backend architecture behind a consistent and stable API, so you don't break
anything along the way.

[1] [https://www.simoahava.com/analytics/automatically-fork-google-analytics-hits-snowplow/](https://www.simoahava.com/analytics/automatically-fork-google-analytics-hits-snowplow/)

