Whom the gods would destroy, they first give real-time analytics (2013) (mcfunley.com)
165 points by sbdchd on July 25, 2023 | 64 comments



Author here. The main thing that inspired this happened a few years before I wrote it down. Etsy had gotten a new CEO, and they spent long hours at my desk during one of their first few weeks, working on the homepage design in what could only be described as a radically fast iteration loop. We'd ship a tweak, look at statsd for ten minutes, then change something else. This would have been a bad idea for all of the reasons of statistical validity listed, even if we hadn't built statsd to use UDP.
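For context, statsd metrics are fire-and-forget UDP datagrams, so on top of the statistics problem there is no delivery guarantee on the numbers themselves. A rough sketch of what a counter increment looks like on the wire (the metric name is made up for illustration):

    import socket

    # statsd line protocol: "<metric>:<value>|<type>", one UDP datagram per update.
    # UDP is connectionless and unacknowledged, so dropped packets silently undercount.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"homepage.clicks:1|c", ("localhost", 8125))  # "|c" marks a counter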

Emphasizing working on the homepage was also analytically dumb in a more meta way, since item/shop/search were nearly all of traffic and sales back then. Anyway, I felt motivated to get that person to think first and fire the code missiles second.

At the end of the day, I think back on it fondly even though it was ludicrous. Shipping that much stuff to production that quickly and that safely was a real high water mark in my engineering career and I've been chasing the high ever since.


Isn’t that last sentence sort of a reason to prefer real-time analytics? If you can make development a fast paced game, no doubt you’ll keep your team more productive and engaged. Granted, it needs to be engineered in a way to ensure that productivity is aimed correctly (“how we decide which things we do”) as you point out in your great article.


There is a good chance the OP shipped changes that would have positively impacted the bottom line, but that were replaced with something else after 10 minutes of real-time analytics because they performed poorly in a single 10-minute period.

You can ship A/B tests quickly and many websites do, but decisions are made only after a statistically meaningful sample has accumulated.
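As a back-of-the-envelope illustration of why that takes a while, here is a rough per-bucket sample-size calculation (the baseline rate and lift are made-up numbers):

    from math import ceil

    # Two-proportion test at alpha = 0.05 (two-sided) with 80% power; numbers illustrative.
    p1, p2 = 0.020, 0.022          # 2% baseline conversion, hoping to detect a 10% relative lift
    z_alpha, z_beta = 1.96, 0.84   # standard normal quantiles for alpha/2 and power
    n = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    print(ceil(n))                 # ~80,000 visitors per bucket, i.e. days or weeks, not an hour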


Good question, though what you have in mind might be real-time metrics, not analytics. Even then, you might not need real-time metrics to know whether your rapid changes are breaking things. An established dev culture built on CI/CD, actionable health checks, feature flags/toggles, and easy release rollbacks in emergencies is what you'd want. This way, your deploys are boring and you can focus on introducing new regressions, uh I mean features, fearlessly. :)
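One concrete version of the boring-deploy idea is a percentage-rollout feature flag; a rough sketch, where the function and environment variable names are hypothetical:

    import hashlib
    import os

    def new_homepage_enabled(user_id: str) -> bool:
        """Ship the code dark, ramp exposure via config, and ramp back down instead of redeploying."""
        rollout_pct = int(os.environ.get("NEW_HOMEPAGE_ROLLOUT_PCT", "0"))
        bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100  # stable per user
        return bucket < rollout_pct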


No, it’s orthogonal to the analytics


I really liked this article, and I thought this statement hit the nail on the head: "Confusing how we do things with how we decide which things to do is a fatal mistake." I've worked at companies that practice what I call "thrash management" (constantly jumping from one priority to the next based on whichever fire happens to be burning brightest that day) and it's no fun, to put it mildly.

That said, once you build a system for operational metrics (i.e. what you need to detect anomalies that indicate outages, security concerns, etc.) you're already most of the way toward having real-time analytics. I still wholeheartedly agree with the author that these real-time metrics should only be in the service of operations, not product planning.


I suspect a lot of bad design comes from a myopic focus on The Analytics. Yes, there were a lot of failures in the bad old days due to Big Design Up Front and only iterating once a year after you ship 1.0, but we've gone to the opposite extreme. A lot of organizations seem to just jerk and twitch as the raw numeric results come in, with no central cognition.

Build for the future you want; don't just head for the nearest local optimum. Take risks and damn the statistics.


Related:

Whom the Gods Would Destroy, They First Give Real-Time Analytics (2013) - https://news.ycombinator.com/item?id=15379660 - Oct 2017 (70 comments)

Whom the Gods Would Destroy, They First Give Real-time Analytics - https://news.ycombinator.com/item?id=6515805 - Oct 2013 (1 comment)

Whom the Gods Would Destroy, They First Give Real-time Analytics - https://news.ycombinator.com/item?id=5032588 - Jan 2013 (55 comments)


I think the point of real-time analytics is not to make product decisions but to get a sense of presence from your product and celebrate with your team.

As an engineer on many teams shipping features I've found that it's somehow underwhelming to finally launch something after months of work. You launch, and the only thing you get to celebrate is some donuts in the office, plus a notification from Sentry or Datadog if something goes wrong :P

I've spent the past 3 years building a product analytics tool (https://june.so) and I think product analytics can deliver some real-time value to teams.

Some of the ways we've built our product to do this are:

- Live notifications in Slack for important events - to get pinged in a Slack channel when users use the feature you just launched

- Achievements on reports for your features - to celebrate the first 5, 10, 25 and 50 users using your product and see the progress live

I think for team morale, especially in the earlier days of a company, it's great to celebrate small wins, and as engineers we should be more connected to what happens inside the products we build, not only when things go wrong.


Not sure about this for larger teams, but for very early stage teams I agree.

Seeing people using a hard-to-build feature a couple of times a day, then more, until eventually you have to mute notifications to focus on work, is a great way to A/ feel the progress and B/ notice trends you can't pick out in averages.

Example for A: Just yesterday our CTO wrote in a feature-specific channel: "This page is now unreadable due to volume of usage pings! Go team!!"

Example for B: Intuitively noticing whether your tool, which has, say, 6 DAUs on a team, is being used once by all 6 people, or in 3 pairing sessions, or something in between. Yes, you could run an analysis for this, but at an early-stage co it's easier to just notice.

We became June users at our pre-launch co a few months ago, and the feature 0xferruccio mentioned is part of what sold me initially.

Not sure how long it'll remain useful but loving it for now.


If that's all you're going for, you don't need timeliness, comprehensiveness, accuracy, accessibility, performance, and durability.


Agreed 100%, you don't need the same guarantees.

Just saying that, as the OP asks in the conclusion of the article, “what do you need this information for?” My understanding is that the people asking for “real-time metrics” aren't trying to do anything complex, just to get a pulse of the product.


As I have grown older, I have realized more and more that the most important things can't really be measured directly.

Yes you can measure some related things that give you some hints about the thing you care about, but they are fragile. To borrow from Goodhart, if you make the related things a target, they will stop giving you even these hints.

This applies not just in software development, but life in general.


I agree with you.

This is why I think the idea of technocracy or "evidence-based politics" is ultimately a mirage. Sure, you can maybe assess some policy, but the metrics you choose to measure or optimise for are political by their very nature. One person's evidence-based policy isn't the same as mine.

Health-outcomes-wise it would be better to force everyone to eat salad or whatever, but that's only one dimension to optimise on, at the expense of freedom and life enjoyment.

Tying it back to tech: maybe going down-market improves your conversion and lowers your CAC, but maybe you've just acquired a bunch of customers with low value, high churn and high costs.

Maybe the sales of Amazon Prime are showing gangbusters returns with the dark patterns, but now people loathe your brand and are hoping to see you hit with an FTC banhammer.

There's no silver bullet to this stuff. Sure, you should probably measure it, but ultimately you have to make a decision and be guided by gut instinct and beliefs.


Conversion rate and/or sales. That's the most important thing, and it can be measured directly.

The problem is analyzing why it is or isn't hitting your desired value.


> Let’s say you’re testing a text change for a link on your website. Being an impatient person, you decide to do this over the course of an hour. You observe that 20 people in bucket A clicked, but 30 in bucket B clicked. Satisfied, and eager to move on, you choose bucket B.

Horrifying but all too common. A wise man once taught me that humans feel considerable discomfort in the presence of uncertainty, and will tend to jump on the first solution that presents itself. His answer to this was to strive to stay in 'exploration mode' for as long as possible - explore the solution space until you've hit the sides and the back and only then make a decision.
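For concreteness, a quick check of the quoted example, assuming (hypothetically) 1,000 visitors per bucket:

    from scipy.stats import chi2_contingency

    # 20/1000 clicks in bucket A vs 30/1000 in bucket B; bucket sizes assumed for illustration.
    table = [[20, 980],   # A: clicked, did not click
             [30, 970]]   # B: clicked, did not click
    chi2, p, dof, expected = chi2_contingency(table)
    print(round(p, 3))    # ~0.2, nowhere near evidence that B is actually better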


This is what you consider horrifying? This seems like an incredibly tame example.


Building a commercial real-time analytics system and running ops for it myself was the single hardest dev challenge I've ever tackled successfully. I came out of it one hell of a developer. GA real-time killed us a year later. We pivoted into cybersecurity and have a kickass business and team today that is 100% founder owned. Forged in the fires of Mount Doom comes to mind. It's very hard, and performance issues and bugs are impossible to hide from end users.


I don't believe the comment about a CAP theorem violation / treating this as a technically unsolved problem is true. E.g., see the Dataflow paper, which sets out clearer tradeoffs between latency and correctness in large-scale data processing [1]. I think it makes sense to always hold a high bar for your technology: if something is technically feasible and fits within your budgets (time and complexity for the team), accepting artificial limitations because they soften social problems feels like a mistake, a false "ignorance is bliss" belief. I think the problem presented here is really one of popular understanding of statistics and game theory, not a technical problem.

[1] "The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing" https://research.google/pubs/pub43864/


Massive fan of this piece. I linked it in another comment yesterday.

The place its truth has been most obvious to me is in analyzing subscription businesses. Your customers pay once a month, or once a year. Nothing, absolutely nothing you do on a minute-by-minute basis is relevant to strategic business decision making. Yet these businesses will invariably want real-time analytics. It serves no purpose! You simply cannot look at your fancy real-time dashboard, and then take action based on it on that same time scale. Meanwhile it costs you 10x what daily data would.


This is something I have been grappling with for a while. I work with data in an industrial plant; there are a lot of legacy data systems here and "data latency" has been a real issue. Real-time dashboards are seen by some subset of management as the holy grail: being able to look at any part of our plant and see what is happening in real time is extremely attractive to certain people.

For a long time there was a lot of resistance; "real time is too hard" used to get thrown around a lot. What is really driving it in the last two years or so is machine learning and computer vision applications. There has been a huge push to integrate ML models (for example, live defect detection) into our operating process, which has necessitated low-latency access to real-time plant data.

The data pipelines we've had to build to enable these ML applications have brought latency way down all over the place, and have kind of brought other applications like real-time dashboards with them for "free".


I started in manufacturing with a similar view. What turned me around was reading “The Goal” by Goldratt and “Out of the Crisis” by Deming, and seeing just small applications of their principles yield significant results.


Out of the Crisis (and Deming more generally) deserves to be more widely known and quoted.

It's incredible how much of what he says (especially the stuff about how production workers feel) maps directly to the technology industry.


If the main objection to constructing a real-time product monitoring system for A/B(C/D/E...) decisions is that optional stopping is bad, why not throw away the null-hypothesis significance testing and instead treat the problem as a multi-armed bandit?
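For reference, a minimal Thompson-sampling sketch of what treating the problem as a bandit looks like in practice (the arm names and click rates are invented):

    import random

    arms = {"A": [1, 1], "B": [1, 1]}      # Beta(1, 1) prior per arm: [alpha, beta]
    true_rates = {"A": 0.02, "B": 0.03}    # hypothetical, and unknown in real life

    for _ in range(10_000):
        # Sample a plausible rate from each arm's posterior and serve the best-looking arm
        choice = max(arms, key=lambda a: random.betavariate(*arms[a]))
        clicked = random.random() < true_rates[choice]
        arms[choice][0 if clicked else 1] += 1   # update that arm's Beta posterior

    print(arms)  # traffic drifts toward B as evidence accumulates, with no hard stopping rule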


I've built a multi-armed bandit system which lived alongside our A/B system.

1. Product didn't have any idea how to interpret its behavior and therefore never made any decisions based on it

2. Experimentation != product design. It's one thing to look at the results of a test; it's another thing to consider patterns of user behavior observed over months or years, which is what Product Analytics is actually for.


How does a typical NHST A/B system resolve this?


You tend to leave it alone while the experiment is running, and then spend time looking at the results once it's done.

The real benefits here are getting a better understanding of what levers drive your product metrics, as you'll inevitably mess up the first n or so experiments (if I could give you only one piece of advice, it would be to use stratified randomisation, but everyone seems to have to make this mistake for themselves).
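For what it's worth, a small sketch of stratified randomisation (stratifying by platform here is just an example):

    import random
    from collections import defaultdict

    def stratified_assign(users, stratum_of):
        """Randomise to A/B separately within each stratum so the split stays balanced per stratum."""
        by_stratum = defaultdict(list)
        for u in users:
            by_stratum[stratum_of[u]].append(u)
        assignment = {}
        for members in by_stratum.values():
            random.shuffle(members)                          # random order within the stratum
            for i, u in enumerate(members):
                assignment[u] = "A" if i % 2 == 0 else "B"   # alternate for an even split
        return assignment

    platform = {"u1": "mobile", "u2": "mobile", "u3": "desktop", "u4": "desktop"}
    print(stratified_assign(platform.keys(), platform))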


Advice appreciated, but I'm exceedingly familiar with experimental design haha; what I understand far less well is the integration of the toolset into a business/product development context. I can see how having a staggered cadence of stopping, reflecting on the experimental design, and making a decision is wise. But it still seems that you could perform the experiment using MAB to keep the profit motive happy (you don't want to waste potential click-throughs just because you like p-values; maybe tune it to be more conservative about shifting heavily to one arm) and then have some period where you stop the experiment to pause and reflect.

Heck you don't even have to do MAB if you don't want to, just don't use NHST. The Bayesian "flavor" of NHST (credible intervals around posterior expected values) has absolutely no problem with optional stopping. Run the experiment until you've got a precise enough estimate, then sit back and make your product decisions.

I guess where I'm going with all this is that it seems like the post's strongest point is "good product decisions require time, and realtime analytics bamboozle us into thinking fast decisions are better". All the stuff about NHST seems kind of tangential. Looking at it again I see that it's like a decade old, so I think this is the best explanation for why they were targeting NHST more aggressively. I would hope in our post-replication crisis world (hopefully "post", anyways) data scientists and A/B testers are more prudent about some of these better-known pitfalls.
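A minimal sketch of the run-until-the-estimate-is-precise-enough idea mentioned above, using Beta posteriors over each bucket's rate (all counts invented):

    from scipy.stats import beta

    # Posterior over each bucket's click-through rate after some hypothetical traffic
    for name, clicks, views in [("A", 180, 9_000), ("B", 240, 9_000)]:
        post = beta(1 + clicks, 1 + views - clicks)   # Beta(1, 1) prior + binomial likelihood
        lo, hi = post.ppf([0.025, 0.975])
        print(f"{name}: 95% credible interval {lo:.4f} to {hi:.4f}")
    # Keep collecting data until the intervals are narrow enough to act on,
    # rather than stopping the moment a p-value dips under 0.05.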


MAB and its friends like contextual MAB have always been the dream. Closing the loop, so analytics data is pushed back to the decision point in code and isn't a one-way pipe to some dashboard, is the hardest part though. For non-technical reasons.


Sort of a generalized PEBCAK


Because it is difficult to map that onto real business decisions, and it oftentimes requires supporting a large space of possible UI combinations because they haven't been fully ruled out yet.


Doesn’t that problem also exist with NHST based A/B testing?


I think business decisions map very well onto the binary decisions implied by NHST A/B testing, which is partly why we put so many resources into studying those problems in the early 20th century.


How well does that dodge the problem? I'd imagine a multi-armed bandit should settle into always sampling from many fair coins, as it were. I would be delighted to read a study on that.


I can’t say that I did the proof out, but intuitively I would expect the posterior distribution over arm-probabilities would converge to something equal? The other option is spurious convergence to a bad posterior, which could maybe happen with poor sampling techniques, but I can’t imagine it’s more than an edge case


Right, that is what I meant about it should continue to sample from fair coins. I don't know that I've seen experiments to see how long that takes, though.

There is also the question of how long you'd leave multiple treatments out there. Presumably, even if there is no difference in outcomes, there can be benefits to having fewer deployed behaviors.

I'm now also curious if there are non-transitive situations. For example, three treatments together that all act fair if all deployed, but for reasons any two of them deployed alone will show a preference. Ideally, of course, treatments should be done such that this can't happen, but mistakes are often made.

Edit: Fully concede that this is likely chasing edges. The motivation for fewer deployed arms is far more compelling than the edge cases.


> I can understand why engineers are predisposed to see instantaneous A/B statistics as self-evidently positive

This is the crucial misunderstanding: in actuality, you are running a panel.

(There is no such thing as an A/B test outside of marketing. Running a meaningful panel requires some information on the population, your samples, the homogeneity of those, etc., just to pick the right test to begin with. Also, you need a controlled setup, which notably includes a predetermined, fixed timeframe for your panel to run. Before this is over, you have no data at all. You are merely tossing coins…)


Basically, what the OP did is what you get when software developers with little understanding of stats or analytics cosplay as data scientists.


Data scientists also do A/B testing on algorithms to see which one has better fit for a use case against real-world, real-time data.


I mean, if done right (there are dedicated sciences for this), it's a panel with two samples. To me, the notion of an A/B test is all about tossing the scientific basics overboard in order to get a rough estimate, which we will call good enough. However, there are all those statistical methods meant to even out the bumps in the road that you will encounter, e.g., if you're running your load-balancer performance A/B test at 2 am versus running it at 5 pm, or running it on a Sunday versus running it on a Tuesday.

(In this specific case, as a data analyst, you will probably have an intricate understanding of your population, i.e. your data, the structure of the samples you're running against the algorithms, which you have tailored according to this understanding in the first place. However, while we may assume the best, this may still be what's called pre-scientific knowledge in statistical terms.)


I've always built analytics daily by default, and certain metrics real-time, via a different system, only when a compelling and permanent use case exists (operational analytics excepted). I mean something like a YouTube view counter/likes, or delivering order counts/totals as a B2B2C so your customers can watch their promotions in real time. Major bonus: you have the super complicated, hard-to-maintain real-time system to point to when someone wants everything real-time, but you've invested the minimum possible.


I also think real time is mostly useless (aside from alerting, which is probably a different tool), but I don't think the one-day delay is much of (if any) protection against the experimental pitfalls described.


I absolutely agree with this.

In my entire career I’ve never come across a situation where a company either needed or could even use real time data analytics.

Real time data: occasionally but definitely. Real time data analytics: never.


Them: We need metrics to know if the users like this new feature we're pushing on them.

Me: Or you know, we could maybe see what the users' biggest issues are first and try to build stuff to solve those problems.


>"Homer: There's three ways to do things. The right way, the wrong way, and the Max Power way!

Bart: Isn't that the wrong way?

Homer: Yeah. But faster!

- "Homer to the Max" "

Now that's funny! <g> :-)


This has aged well.


(2013)


Exactly. And it betrays the biases of the era. This author really got it wrong.


No, he got it exactly right. It is just as relevant today as when it was written.


And… near real-time with high uptime is relatively more costly to build / maintain / deploy / operate than batch — so save your org the cost!


While there are probably all sorts of problems with marxism when it comes to economics, in large companies there should be a 'vanguard party' of statisticians who prevent the masses from making false claims of causality from p-hacked tests.


Andrew Gelman is truly our Vladimir Lenin


Just use Amplitude


Last I used Amplitude it was insanely expensive. Is that not still the case?


We’re working on it! Free up to 10M events, more to come later this year.

What would reasonably costed look like to you?

More on my view on real-time analytics here: https://news.ycombinator.com/item?id=15380607


I need to remember that on HN, sometimes the founders of companies I'm talking about will respond :D. I should've used more words.

I don't really know what "reasonably costed" would've looked like for the time I used Amplitude (2018-2019), but I know that the value my team extracted from it was not commensurate with its cost. Whether that was because of overzealous assumptions on our part, or something else, I don't really know, but I know we signed up, used it for a few months and then canceled/downgraded to a cheaper service whose name is escaping me.

I was mostly challenging the "Just use ____" notion of the commenter above, not really that Amplitude is worth the money for correctly-intentioned businesses. Regardless I appreciate the ask.


Real-time, good, cheap: pick two.


At best it's pick two. Often you get one or none.


This is very 2013. Meanwhile in 2023, a decade later, you literally have systems detecting credit card fraud in milliseconds. [Disclosure: I work for StarTree, which is powered by Apache Pinot. We eat petabytes of data for breakfast.]


Given this answer, I'm not sure you actually read the article. The article is explicitly talking about using real-time data to make design decisions; detecting something like credit card fraud is a completely different problem space.


I wonder if these are the same systems that "detect fraud" and freeze my bank account (requiring manual intervention to fix) the two times a year I send a random family member less than $2,000.


What has that to do with product decisions?


Credit cards are products.


Did you read the article?



