
Launch HN: Speedscale (YC S20) – Automatically create tests from actual traffic - Inchull
We're Ken, Nate and Matt, co-founders of Speedscale (https://speedscale.com), a tool that automatically generates continuous integration (CI) tests from past traffic. Carefully scaling rollouts to ever larger groups of customers is the safest deployment strategy, but it can take weeks. Even for elite DevOps organizations, up to 15% of changes to production can result in degraded service [1] [2].

We met as undergrads at Georgia Tech and come from a DevOps and operations background, so we've seen this firsthand. Each of us has over 15 years of experience building high-reliability systems, starting in the early days with satellite earth station monitoring. As interns we once wrote a bug that caused a 32-meter antenna to try to point down through the earth, almost flattening the building we were in. It was a great environment to learn about engineering reliability. We leveraged this experience to tackle monitoring Java app servers, SOA, SaaS observability, and cloud data warehouses. What if we could use a form of observability data to automatically test the reliability of new deployments before they hit production? That's the idea that got us started on Speedscale.

Most test automation tools record browser interactions or use AI to generate a set of UI tests. Speedscale works differently: it captures API calls at the source using a Kubernetes sidecar [3] or a reverse proxy. We can see all the traffic going in and out of each service, not just the UI. We feed the traffic through an analyzer process that detects calls to external services and emulates a realistic request and response -- even authentication systems like OAuth =). Unlike guessing how users call your service, Speedscale automation reflects reality because the data was collected from your live system. We call each interaction model a Scenario, and Speedscale generates them without human effort, leading to an easily maintained, full-coverage CI test suite.

Scenarios can run on demand or in your build pipeline, because Speedscale inserts your container into an ephemeral environment where we stress it with different performance, regression, and chaos scenarios. If it breaks, you decide the alerting threshold. Speedscale is especially effective at ensuring compliance with subtle Service Level Objective (SLO) conditions like performance regressions [4].

We're not public yet but would be happy to give you a demo if you contact us at hello@speedscale.com. We are also doing alpha customer deployments to refine our feature set and protocol support -- if you have this problem or have tried to solve it in the past, we would love to get your feedback. Eventually we'll end up selling the service via a subscription model, but the details are still TBD. For the moment we're mainly focused on making the product more useful and collecting feedback. Thanks!

[1] https://services.google.com/fh/files/misc/state-of-devops-2019.pdf

[2] https://aws.amazon.com/builders-library/automating-safe-hands-off-deployments/

[3] https://kubernetes.io/blog/2015/06/the-distributed-system-toolkit-patterns/

[4] https://landing.google.com/sre/sre-book/chapters/service-level-objectives/
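To make the capture model above concrete, here is a minimal sketch of the recording-reverse-proxy idea (purely illustrative -- the endpoint, ports, and payloads are invented, and this is not Speedscale's actual implementation): every request that flows through the proxy is forwarded to the backend, and both sides of the exchange are captured for later replay.

```python
import http.server
import threading
import urllib.request

recorded = []  # captured (method, path, status, body) exchanges

class Backend(http.server.BaseHTTPRequestHandler):
    """Stand-in for the service whose traffic we want to observe."""
    def do_GET(self):
        body = b'{"todo": "buy milk"}'
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

class RecordingProxy(http.server.BaseHTTPRequestHandler):
    """Forwards each request to the backend and records both sides."""
    backend_port = None  # filled in after the backend starts

    def do_GET(self):
        url = f"http://127.0.0.1:{self.backend_port}{self.path}"
        with urllib.request.urlopen(url) as resp:
            status, body = resp.status, resp.read()
        recorded.append((self.command, self.path, status, body))
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

def serve(handler):
    # Port 0 lets the OS pick a free port.
    srv = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

backend = serve(Backend)
RecordingProxy.backend_port = backend.server_address[1]
proxy = serve(RecordingProxy)

# A normal client call through the proxy is transparently recorded:
with urllib.request.urlopen(
        f"http://127.0.0.1:{proxy.server_address[1]}/todo/5y22") as r:
    r.read()
print(recorded)  # [('GET', '/todo/5y22', 200, b'{"todo": "buy milk"}')]

backend.shutdown()
proxy.shutdown()
```

The `recorded` list is the raw material a Scenario would be generated from; outbound calls the service makes can be captured the same way, just with the proxy sitting on the other side.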
======
kahrensdd
For a little background story on the satellite antenna. I was building a
monitoring device driver for an Antenna Control Unit (ACU). It's like a 4
rack-unit computer with special hardware for talking to the antenna motors
(azimuth, elevation and polarization). After sending it a command, the device
froze up, so we rebooted it. The CMOS battery was dead so when it came back up
the date was wrong, but I did not notice. I sent it a command to reposition
and it began moving to point below the horizon... The bad date meant that it
had the wrong geolocation for itself. Well, it turns out "below
horizon" is really important, because the building was just a structure to hold
up the antenna and the dish was going to crash into the ground. Fortunately someone
ran in and hit the STOP button while I was staring at the monitor. That day I
learned that monitoring and alerting is important stuff.

~~~
nunez
Dead CMOS batteries suck.

------
d_watt
Using products like this in the past, I've run into a pretty simple issue:

- Request 1 is a POST that generates a "todo" with a random id "5y22".

- Request 2 is a GET for /todo/5y22.

That works in production, but on replay of the traffic:

- Request 1 generates a different random id, "86jj".

- Request 2 is still a replayed GET for /todo/5y22, which now returns a 404.

How does your tooling handle this nondeterminism in replays?

~~~
kahrensdd
For one thing, we look at both inbound and outbound traffic and treat them
separately. That use case looks different depending on whether we are trying
to "test" TODO or the TODO service is a backend that our app relies upon.

So if you mean that we want to "test" TODO, our analyzer looks for data in
subsequent requests that was provided by the system-under-test (SUT) in
previous responses. A common example of this is an HTTP cookie. The SUT gives
us a session id through the Set-Cookie response header. So in a subsequent
request we use the cookie from the app, not the one that was recorded. This is
done in a general way that looks for tokens.
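Using d_watt's todo example above, a toy sketch of that token substitution during replay (all names hypothetical; this is not the actual analyzer) might look like:

```python
# Recorded traffic: the POST minted id "5y22", and the later GET used it.
recording = [
    {"request": "POST /todo", "minted_id": "5y22"},
    {"request": "GET /todo/5y22"},
]

def replay(recording, live_call):
    """Replay recorded requests, rewriting tokens the live system re-mints.

    live_call(request) -> id minted by the live system, or None.
    """
    substitutions = {}  # recorded token -> live token
    for step in recording:
        req = step["request"]
        for old, new in substitutions.items():
            req = req.replace(old, new)  # use the live value, not the recorded one
        live_id = live_call(req)
        if live_id is not None and "minted_id" in step:
            substitutions[step["minted_id"]] = live_id

# Fake live system that mints a different id on POST than the recording saw:
issued = []
def live_call(req):
    issued.append(req)
    return "86jj" if req.startswith("POST") else None

replay(recording, live_call)
print(issued)  # ['POST /todo', 'GET /todo/86jj'] -- the GET follows the live id
```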

Of course nobody is perfect so we'd love to see your real world app and test
our algorithms against it. :)

------
pbiggar
This sounds great! We actually discussed doing this at the very start of
CircleCI (we had a partnership with an exception handling service, but we
never executed on it). Coincidentally, my current company, Dark
([https://darklang.com](https://darklang.com)) is based around a similar
concept -- using live traffic as an assistant as you're writing code.

~~~
kahrensdd
For sure we see a lot of synergy with the CI systems. One of our alpha
customers is using CircleCI (no surprise there). They have an issue where devs
deploy services on top of each other in staging and accidentally take it down
for their internal users. So Speedscale lets them detect that a new build is
not a good candidate to deploy to staging.

Thanks for sharing the info about your project, I am checking it out on GitHub
right now. :)

------
billyhoffman
Congratulations on the launch Ken!

Ken worked as a top SE at New Relic and has been involved in the Atlanta Web
Performance meetup for many years. It was a wonderful surprise to see you all
on HN!

~~~
kahrensdd
Thank you Billy, we are very excited to be making progress with Speedscale. I
miss seeing you and the rest of the crew from the ATL Web Performance meetup.

------
decentCapitalYC
Hi Ken, I have been researching your project for a while and would love to
talk before Demo Day starts. I will attend next Mon/Tue anyway, but timing is
critical for decisions in YC, and I want more time to dig into Speedscale. I
really like that your solution can potentially bring huge value to
complicated production releases (how many times does SpaceX test even just the
combustion chamber pressure before the real launch? :). Your solution seems
like a really systematic one, given the three pillars you mentioned, and it
goes under the hood (vs. recording -- first principles win). I trust your 32m
antenna lesson was scary enough to stay in memory. :) I have a physics
background, and yes, you should definitely learn where the panic button is
before poking around in a sophisticated lab. :)

Anyway, I tried emailing founders at your domain but got no response. Can you
reply to my email? My handle is lli at my company domain (also in my profile).
Would love to have a quick chat this week to get well warmed up before the big
day.

------
vii
Real traffic exposes wrong assumptions in code, which cannot be caught with
unit or integration testing. Awesome to automate this kind of testing. I
encourage people to invest heavily in setting up artificial environments to
replay historical data.

One benefit of the artificial environment is efficiency (it can cover, e.g., a
week of historical data); this also requires mocking out time with
simulation/replay time. Integration with data platforms makes a big
difference, as it means data scientists can help set up the right test
scenarios.

------
nserrino
This is cool! Is the traffic curated in any way? Like if the database isn't
initialized, do you start with create requests before moving on to GETs for
those IDs? Also does this only support HTTP or does it support other protocols
as well?

~~~
mleray
Ultimately, the idea is to mock the database itself so we just return whatever
the real database returned during the recording. We don't have to run create
commands because we aren't actually managing a real database's internal state.
We "only" need to accurately return the responses the database gives the
system-under-test for a particular GET sequence. During the alpha we are
limiting support to HTTP/S, but protocols like MongoDB, Redis, MySQL, etc. are
on the backlog. Until we have more database support we're asking alpha
customers to deploy test data into a test database, which seems to be a fairly
normal part of the CI process for big apps.
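As a toy illustration of the idea (hypothetical names; not the product's actual responder), mocking the database this way only requires mapping each recorded request to the response that was observed:

```python
# Exchanges captured during recording: request -> response the real DB gave.
recorded_traffic = [
    ("GET /todo/5y22", '{"todo": "buy milk"}'),
    ("GET /todo/7abc", '{"todo": "walk dog"}'),
]

class RecordedResponder:
    """Answers requests with whatever the real backend returned when recorded."""
    def __init__(self, traffic):
        self.by_request = dict(traffic)

    def handle(self, request):
        # No real database state to manage: just look up the recorded answer.
        return self.by_request.get(request, '{"error": "no recorded response"}')

mock_db = RecordedResponder(recorded_traffic)
print(mock_db.handle("GET /todo/5y22"))  # {"todo": "buy milk"}
print(mock_db.handle("GET /todo/zzzz"))  # {"error": "no recorded response"}
```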

------
techdragon
I love the look of this, I can't even remember all the times I've run across
applications out in the wild that were built without any thought about testing
until long after the original devs are gone. All you have is production
traffic and reverse engineering. It’s a real pain, and I would love to have
more tools to attack this problem when it comes up.

So how open is that alpha? I’d really love to try this out.

~~~
kahrensdd
Reverse engineering protocols has morphed from a hobby into a full-time job.
I've spent a lot of time going through logs, looking at data, and knowing that
I am not seeing the TCP-level request and response. So we are trying to
solve that problem. :)

Please send a note to hello@speedscale.com so we can get the details of your
environment and determine if it is a fit for our alpha.

------
RabbitmqGuy
Is this like goreplay[1]? How are you different?

1. [https://github.com/buger/goreplay](https://github.com/buger/goreplay)

~~~
kahrensdd
Yes, I really like goreplay and it was an early inspiration for our sidecar.
It uses gopacket (like tcpdump) to collect data; we currently use the TCP
proxy route instead because we can use TLS libraries for HTTP/S support. There
are similarities in that both can collect and replay data.

One of the early observations from my co-founder Nate was that there are three
key ingredients to testing in an SOA environment [1]:

* Automation

* Dependencies

* Data

While goreplay has a form of automation, it doesn't help you with dependencies
(no mocking), and the data is either streamed or locked in a special file
format. Of course, like any open source project, there are pieces of the
solution but you have to assemble them. For instance there is no UI, no
reports, no pass/fail assertions, no integration with CI systems, etc. I'm by
no means an expert; there is likely a path to combining it with other open
source projects to fill those gaps.

Our stuff isn't perfect, but we primarily see overlap between the automation
capability of Speedscale and the goreplay project.

[1]
[https://speedscale.com/2020/02/06/triplethreat/](https://speedscale.com/2020/02/06/triplethreat/)

(edited formatting)

------
sramam
Congratulations on surviving the 32 meter odyssey and living to launch!

Wondering how you deal with stateful services? You mention an "analyzer
process" for external services; what about internal services?

It seems this would work well at a single service level, but would it also be
possible to apply the analysis at both a single service and a group of
services? Some form of unit-testing _and_ integration-testing at a service
level...

~~~
kahrensdd
Awesome, thank you for the note. Fortunately there was a "stop antenna" button
which saved the day, lol.

We've been down the path of stateful services before and actually reflect the
proper state in our responder. Because we control the test that is being
played against the system-under-test, we understand the sequence and order of
calls that will be made to the downstream system as well.
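As a toy sketch of what that buys you (invented names; not the actual responder): because the replayed test fixes the order of downstream calls, the mock can hand back recorded responses for each host in that same order, so create-then-read sequences keep their state:

```python
from collections import defaultdict, deque

class StatefulResponder:
    """Serves recorded responses per downstream host, in recorded order."""
    def __init__(self, recording):
        # recording: (host, response) pairs in the order they were observed
        self.queues = defaultdict(deque)
        for host, response in recording:
            self.queues[host].append(response)

    def respond(self, host):
        queue = self.queues[host]
        return queue.popleft() if queue else None  # nothing left recorded

responder = StatefulResponder([
    ("todo-svc", '{"id": "5y22"}'),        # answer to the recorded create
    ("todo-svc", '{"todo": "buy milk"}'),  # answer to the follow-up read
])
print(responder.respond("todo-svc"))  # {"id": "5y22"}
print(responder.respond("todo-svc"))  # {"todo": "buy milk"}
```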

In addition, the analysis actually captures all outbound services at once. We
are able to identify each separate hostname that is invoked and mock them all
out as a group. One of our first alphas was stunned that it auto-mocked 7
backend systems on the first try.

------
2rsf
Sounds great. For use in a financial environment this will need an option for
data anonymization; I'm not sure how you can identify what needs to be
anonymized without human interaction, though.

~~~
mleray
You're spot on :). We got this feedback from one of our financial services
alphas, so we built a DLP rules engine to cover it. That wasn't enough. So we
offered to integrate with Google DLP. Still no. So in the end we architected a
split-plane design (similar to Databricks) so big customers can host their own
data while we manage the control stack. It's not something we're doing during
the alpha but it's part of the plan. Would that work?

~~~
2rsf
I'm only the end user so I can't judge the details

------
jameslk
I had a similar idea, but from the frontend. I think it would solve a few edge
cases where frontend logic is needed to complete full state-based
interactions. But I think there are now some solutions out there for this. Good
to see some traction here though, since automated testing and regression
testing is still a very difficult and manual thing to set up and keep updated.
Lots of opportunity to make it less painful.

------
ab_goat
There's another company that has a bit of an earlier start doing this:
ProdPerfect (www.prodperfect.com)

"Reach new heights with weightless test automation. ProdPerfect is the first
autonomous end-to-end (E2E) regression testing solution that continuously
identifies, creates, maintains, and evolves E2E test suites via data-driven,
machine-led analysis of live user traffic."

~~~
gkapur
These are very different; one is clickstream based and the other sits on the
networking layer.

~~~
ab_goat
Whoops! Failed to read past the first paragraph. Also should have added a
"similar" in there. Thanks for pointing this out.

------
scraig2020
Super excited about this. Data variance is hard to solve, and the bucket of
data that you're helping folks access is impressive...

------
mrkurt
Can you run tests from different geos?

~~~
kahrensdd
You can run it in your own environment. If you're running Kubernetes we
provide an operator that orchestrates the test runs. If you are using Docker,
we give you containers that you control with env vars.

Do you have more background on the multi geo use case?

~~~
mrkurt
We run a multi-geo service (Fly.io). Replicating user load on distributed apps
is hard.

Containers with env vars are easy though!

~~~
kahrensdd
Yes I recently went through a similar use case with one of my alpha users.
They wanted to run the reverse proxy and playback as docker containers spread
through their environment. Will drop you an email with more info...

------
CodeNasty
Terrific stuff, and glad this is becoming industrialized at scale. Will keep
you guys in mind as we expand.

------
samblr
Much needed! Congrats on the launch.

------
nunez
Congrats on the launch, y’all!!!

------
scraig2020
Sweeet... can't wait to get my hands on it and leverage it.

------
jmartens
Love it!

------
pixiemaster
does it work for gameserver-style multi-client websocket communication?

~~~
kahrensdd
One of our early inspirations was a video game company that would replay
recorded gameplay against new builds of a game server. Our proxy assembles
each request and response in the order they were received, even if they belong
to different user sessions. For the alpha we are focused on API-type calls
over HTTP/S, but WebSocket support is definitely something we are tracking.

------
jtchang
Congrats on the launch!

