
Ask HN: How to transform event/clickstream data into personalized user metadata - tkhunta
I would like to capture every time a user views a brand, seller, or listing on our app. Then I would like to have a method that returns a list of their brands, sellers, or listings based on the ones they visit the most.<p>I see a lot of software like mixpanel allows you to capture click stream data, but that seems more like &quot;business analyst wants to track x stat against y stat in the last 30 days.&quot;<p>Requirements:<p>- relatively low cost (we&#x27;re small-medium scale)<p>- can&#x27;t use postgres extensions like pipelinedb (on gcloud managed sql)<p>Current idea:<p>Create an event record in postgres (e.g. user_id, &quot;viewed&quot;, entity_id, entity_type)<p>then write an sql query to fetch their favorite brands, sellers, etc.. based on the aggregation&#x2F;sum of their events in real time.<p>Concerns:<p>a) Should raw event data be stored in postgresql? Won&#x27;t it just cause an unnecessary amount of writes, and requires involvement to maintain performance. We would probably have 10M events every few months.<p>b) Moving to something like big query, red shift, or whatever &quot;analytics&quot; db seems like the wrong tool &#x2F; overkill. We would like to rapidly query the data for user facing apps, gather personalized stats for users on the fly (e.g. &quot;my favorite brands&quot;), these tools seem like they&#x27;re more for generating reports, big data analysis, etc...<p>c) this leads me to believe that raw event data should be stored somewhere like s3, and aggregated offline, and the results be stored in postgresql for rapid querying<p>Am I over complicating this?
======
rywalker
I'd suggest using a service like Segment or MetaRouter.io to collect the event
data, then use Airflow to prep it.

