Load Testing with Koi Pond (slack.engineering)
47 points by uaas 11 days ago | 7 comments





As someone with a koi pond, I got excited, because bio-load is a real issue, so I went in thinking there might be info on how to determine food, fish quantity, etc. I shouldn't have been surprised by what this actually is.

Capacity planning is not easy; it usually involves understanding the user journey and then estimating, from that defined journey, the capacity each subsystem needs. What's interesting in this article is how they simulate the journey in the load test: they created a "mini" Slack client and used it to generate the traffic. Even though they have explained their situation, I think they could still have simulated it with the usual API-hit-based tools, and that would be much easier to maintain.
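A rough sketch of the journey-to-capacity arithmetic described above. Every number and subsystem name here is hypothetical, picked only to show the calculation; none of it comes from the article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    subsystem: str            # hypothetical subsystem name, e.g. "messages"
    calls_per_journey: float  # average calls this step makes per journey


def required_rps(concurrent_users: int,
                 journeys_per_user_per_hour: float,
                 steps: list[Step]) -> dict[str, float]:
    """Estimate the requests per second each subsystem must sustain."""
    journeys_per_second = concurrent_users * journeys_per_user_per_hour / 3600
    totals: dict[str, float] = {}
    for step in steps:
        # A subsystem hit by several steps accumulates all of their load.
        totals[step.subsystem] = (totals.get(step.subsystem, 0.0)
                                  + journeys_per_second * step.calls_per_journey)
    return totals


# Illustrative journey: view a message, mark it read, react half the time.
journey = [Step("messages", 1), Step("read_state", 1), Step("reactions", 0.5)]
print(required_rps(36000, 2, journey))
```

With 36,000 users each doing two journeys an hour, that is 20 journeys per second, so the messages and read-state subsystems each need about 20 rps and reactions about 10 rps, before any headroom.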

Amazing to think they were actually spawning 150k headless browsers to simulate the traffic. That sounds like throwing money at the problem and it probably worked (for a while anyway).

Having built a load-test tool as well, I can say that making it realistic enough, and keeping it that way, is possibly the hardest challenge. Maintenance cost is high, especially in a feature-focused environment.


> Having built a load-test tool as well.

Which tool. Curious.

To your other points,

> That sounds like throwing money at the problem and it probably worked (for a while anyway).

> Maintenance cost is high, especially in a features focused environment.

Isn't it really just choosing which way to throw money at the problem? Hardware costs vs. person-hours to maintain a thin-client version?


> Which tool. Curious.

We built something similar at Mattermost, which (funnily enough) is a comparable application.

https://github.com/mattermost/mattermost-load-test-ng

https://mattermost.com/blog/improving-performance-through-lo...

> Isn't it really just choosing which way to throw money at the problem? Hardware costs vs. person-hours to maintain a thin-client version?

That's fair, although the second option has (in my opinion) a better return on investment, given the knowledge and experience gained.


The new tool seems like an early version as well, with pretty basic functionality.

In the example where it is supposed to be "viewing a message, marking the message as read, and finally calling reactions.add", it doesn't really do those things as a real chain. There's just a 5-second delay after "view a message", then "mark message as read" runs, then a 60-second delay, then reactions.add is called. I'm not sure that mimics real end-user behavior terribly well.
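A minimal sketch of one way to read the distinction being drawn, assuming a hypothetical client with `view_message`, `mark_read` and `add_reaction` methods (these are made-up stand-ins, not Slack's API). The first scenario fires calls on fixed timers and ignores earlier responses; the second chains every step through the message returned by the first and uses randomized think times. `sleep` is injectable so the scenarios can be exercised without actually waiting.

```python
import random
import time


class StubClient:
    """Hypothetical in-memory stand-in for an API client (no network)."""

    def __init__(self):
        self.calls = []

    def view_message(self, channel):
        self.calls.append(("view", channel))
        # Pretend the server returned the newest message in the channel.
        return {"channel": channel, "ts": "1700000000.000100"}

    def mark_read(self, channel, ts):
        self.calls.append(("mark_read", channel, ts))

    def add_reaction(self, channel, ts, name):
        self.calls.append(("reactions.add", channel, ts, name))


def fixed_delay_scenario(client, sleep=time.sleep):
    """Timer-driven: each call fires after a fixed delay, ignoring earlier responses."""
    client.view_message("C1")
    sleep(5)
    client.mark_read("C1", "0")  # placeholder timestamp, not tied to the viewed message
    sleep(60)
    client.add_reaction("C1", "0", "thumbsup")


def chained_scenario(client, rng, sleep=time.sleep):
    """Dependent chain: later steps act on the message returned by the first."""
    msg = client.view_message("C1")
    sleep(rng.uniform(1, 10))    # arbitrary "reading" time range, for illustration
    client.mark_read(msg["channel"], msg["ts"])
    sleep(rng.uniform(5, 120))   # arbitrary pause before reacting
    client.add_reaction(msg["channel"], msg["ts"], "thumbsup")


delays = []
client = StubClient()
chained_scenario(client, random.Random(42), sleep=delays.append)
print(client.calls)
```

The chained version is more work to maintain, since each step depends on the previous response, but a failure or slowdown early in the journey then actually changes what the rest of the journey does, which is closer to a real user.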

It seems like they could have used JMeter rather than building a home-grown WebSocket test client. Perhaps there's some requirement that existing tools don't handle well.


For those who have yet to read the article: this story is about stopping the money-throwing and switching to a more scalable (cheaper) solution.

It's kind of interesting to see them choose a rather "declarative" (that is, JSON-centric) approach instead of adopting a small language like Lua for scenario-based scripting.

Maybe the declarative approach suits auto-generation from user stats data, as they described? After all, there are usually fewer people who like writing stress tests than people who like writing the features that need to be stress-tested.
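A small sketch of why a declarative format lends itself to generation from usage stats. The JSON schema (`name`, `clients`, `actions[].api`, `actions[].weight`) is invented for illustration and is not the format described in the article; apart from `reactions.add`, the action names are hypothetical placeholders.

```python
import json
import random


def scenario_from_stats(action_counts, total_clients, name="generated"):
    """Turn observed per-action counts into a declarative JSON scenario.

    The schema here is hypothetical, made up for this sketch.
    """
    total = sum(action_counts.values())
    actions = [
        {"api": api, "weight": round(count / total, 4)}
        # Most frequent actions first, purely for readability.
        for api, count in sorted(action_counts.items(), key=lambda kv: -kv[1])
    ]
    return json.dumps({"name": name, "clients": total_clients, "actions": actions}, indent=2)


def pick_action(scenario_json, rng):
    """A generic runner only needs to interpret the data: pick an action by weight."""
    spec = json.loads(scenario_json)
    apis = [a["api"] for a in spec["actions"]]
    weights = [a["weight"] for a in spec["actions"]]
    return rng.choices(apis, weights=weights, k=1)[0]


# Hypothetical counts, e.g. pulled from a day of client telemetry.
counts = {"messages.view": 600, "messages.mark_read": 300, "reactions.add": 100}
scenario = scenario_from_stats(counts, total_clients=5000)
print(scenario)
print(pick_action(scenario, random.Random(0)))
```

Because the scenario is plain data, a stats job can regenerate it on a schedule without anyone writing code. That is the trade-off against a scripting language like Lua: less expressive, but much easier to produce automatically.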



