
I take it everything is based on schedules, there’s no runtime magic to handle incremental updates in a timely or efficient way? I’ve been watching @frankmcsherry’s Materialize.io with interest, for example.



Possibly controversial, but I don't believe there are many situations in data analytics where you really need realtime/streaming analytics. It usually comes at the cost of significant additional complexity and maintenance overhead that ultimately hurts reproducibility and agility.

Having said that, we are working on triggering table builds by watching for changes to consumed data sets, which, paired with incremental table builds, would let you reliably achieve latency of under a few minutes.
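The change-triggered incremental build described here could be sketched roughly as follows. This is a minimal illustration, not the actual implementation being discussed; the high-water-mark scheme and all names are hypothetical, assuming an append-only source table with monotonically increasing row ids.

```python
# Hypothetical sketch: instead of rebuilding on a fixed schedule, rebuild
# whenever the upstream data set changes, and process only the rows added
# since the last build (tracked via a high-water mark in `state`).

def incremental_build(source_rows, state):
    """Process only rows newer than the recorded high-water mark.

    `source_rows` is an append-only list of dicts with increasing "id"
    keys; `state` persists the high-water mark and the built output
    between runs. Returns the number of new rows processed.
    """
    mark = state.get("high_water_mark", 0)
    new_rows = [r for r in source_rows if r["id"] > mark]
    if new_rows:
        # Placeholder transform: double each value into the built table.
        state.setdefault("built", []).extend(r["value"] * 2 for r in new_rows)
        state["high_water_mark"] = max(r["id"] for r in new_rows)
    return len(new_rows)
```

A watcher process would call `incremental_build` each time it notices the source changed; because only the delta is processed, the end-to-end latency is bounded by the polling interval plus the (small) delta build time.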


I disagree.

Near-realtime (say, sub-1-minute) feeds can be leveraged to extract a lot of value from the data you're collecting. Perhaps not for your typical SaaS startup, but free-to-play video games leave a lot of value on the table if they can't quickly (and automatically) respond to spend or churn indicators.

I think a good rule of thumb for this is looking at how fast your decisions are made. If you're making decisions daily then refreshing your data every 24 hours is probably good enough. If you're making decisions every hour, then every 60 minutes is probably good enough.

Another factor is scale. When you're dealing with 6-figure CCUs and trying to optimize conversion or retention through split-testing, quickly identifying which variants are performing anomalously poorly can save you a whole lot of money.
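One common way to flag an underperforming variant early is a two-proportion z-test on conversion counts; fed by a near-realtime pipeline, it lets you pause a bad variant within minutes rather than after a full day of traffic. A minimal sketch (the function name and thresholds are illustrative, not from the comment above):

```python
import math

def conversion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-score comparing variant B's conversion rate
    against control A's. A large negative value means B is converting
    anomalously poorly and may be worth pausing early.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# With 6-figure concurrents, samples like these accumulate in minutes:
# control converts at 5%, variant at 4% -> z is around -3.4, a strong
# signal long before a daily batch job would surface it.
z = conversion_z(500, 10_000, 400, 10_000)
```

Note that repeatedly peeking at a z-score inflates false positives; a production system would use a sequential test or corrected thresholds, but the latency argument is the same.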

I reckon there are at least 50 titles that could benefit from streaming analytics, which immediately affects anywhere from 100-500 employees (analysts, engineers) and likely influences over 1,000 across the broader companies. That's a significant portion of the field.


That’s why I don’t like the term real time. Decision time should be the criterion.


Also, a true data warehouse has its data frozen on a set recurring schedule, say every hour or every 24 hours, so that users running related jobs at different times throughout the day see the same results and work with the same data.
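The freeze-on-a-schedule behavior amounts to snapshot publication: writes accumulate in a staging area, and all reads are served from the last published snapshot until the next scheduled refresh. A toy sketch of that separation (class and method names are hypothetical):

```python
import copy

class FrozenWarehouse:
    """Sketch of schedule-frozen reads: incoming writes land in staging,
    and reads always see the last published snapshot, so every job run
    between refreshes works against identical data.
    """

    def __init__(self):
        self._staging = {}    # where new data accumulates
        self._snapshot = {}   # what queries actually see

    def write(self, table, rows):
        self._staging.setdefault(table, []).extend(rows)

    def refresh(self):
        # Invoked on the fixed schedule (e.g. hourly or daily):
        # atomically publish the staging state as the new snapshot.
        self._snapshot = copy.deepcopy(self._staging)

    def read(self, table):
        return self._snapshot.get(table, [])
```

Two jobs reading the same table an hour apart (but within one refresh window) get byte-identical inputs, which is exactly the reproducibility property the comment describes.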



