
Launch HN: Orbiter (YC W20) – Autonomous data monitoring for non-engineers - zhangwins
Hello HN! We are Victor, Mark, and Winston, founders of Orbiter ([https://www.getorbiter.com](https://www.getorbiter.com)). We monitor data in real time to detect abnormal drops in business and product metrics. When a problem is detected, we alert teams in Slack so they never miss an issue that could impact revenue or the user experience.

Before Orbiter, we were product managers and data scientists at Tesla, DoorDash, and Facebook. It often felt impossible to keep up with all the dashboards and metrics while also actually doing work and building things. Even with tools like Amplitude, Tableau, and Google Data Studio, we would still catch real issues days or weeks late. This led to lost revenue and bad customer experiences (i.e. angry customers who tweet Elon Musk). We couldn't stare at dashboards all day, and we needed to quickly understand which fluctuating metrics were concerning. We also saw that our engineering counterparts had plenty of tools for passive monitoring and alerting (PagerDuty, Sentry, Datadog, etc.), but the business and product side didn't have many. We built Orbiter to solve these problems.

Here's an example: at a previous company, a number of backend endpoints were migrated, which unknowingly caused a connected product feature in the Android shopping flow to disappear. Typically, users in that part of the shopping flow progress to the next page at a 70% rate, but because of the missing feature, this rate dropped by 5% absolute. This was a serious issue but was hard to catch by looking at dashboards alone because: 1) it was just one number changing out of hundreds of metrics that change every hour, 2) the number naturally fluctuates daily and weekly, especially as the business grows, and 3) it would have taken hours of historical data analysis to ascertain that a 5% drop was highly abnormal for that day. It wasn't until the metric stayed depressed for many days that someone found it suspicious enough to investigate. All in, including the time to implement and deploy the fix, conversion was depressed for seven days, costing more than $50K in reduced sales.

It can be especially challenging for the human eye to judge the severity of a changing metric; seasonality, macro trends, and sensitivity all cloud the conclusion. To solve this, we build machine learning models for your metrics that capture the normal/abnormal patterns in the data. We use a supervised learning approach for our alerting algorithm to identify real abnormalities: we forecast the expected "normal" metric value and also classify whether an abnormality should be labeled as an alert. Specifically, forecasting models identify macro trends and seasonality patterns (e.g. this particular metric is over-indexed on Mondays and Tuesdays relative to other days of the week), and classifier models determine the likelihood of true positives based on historical patterns. Each metric has an individual sensitivity threshold that we tune with our customers so the alerting conditions catch real issues without being overly noisy. Models are retrained weekly, and we take user feedback on alerts to update the models and improve accuracy over time.

Some of our customers are startups with sparse data. In these cases, it can be challenging to build a high-confidence model, so instead we work with our customers to define manual "guardrail" settings that trigger alerts. For example: "Alert me if this metric falls below 70%!" or "Alert me if this metric drops more than 5% week over week." As our customers and their datasets grow, we can apply greater intelligence to their monitoring by moving over to the automated modeling approach.
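
To make the guardrail idea concrete, here is a minimal sketch of what such a check boils down to (the function and metric names are hypothetical and for illustration only, not our production code):

```python
# Minimal sketch of a manual "guardrail" check; names and numbers are
# illustrative only, not Orbiter's actual implementation.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Guardrail:
    metric: str
    floor: Optional[float] = None         # "Alert me if this metric falls below 70%!"
    max_wow_drop: Optional[float] = None  # "Alert me if this drops more than 5% week over week"


def check_guardrails(metric: str, daily_values: List[float], rules: List[Guardrail]) -> List[str]:
    """Evaluate the latest daily value against each guardrail and return alert messages."""
    alerts = []
    latest = daily_values[-1]
    week_ago = daily_values[-8] if len(daily_values) >= 8 else None
    for rule in (r for r in rules if r.metric == metric):
        if rule.floor is not None and latest < rule.floor:
            alerts.append(f"{metric} fell below {rule.floor:.0%}: now at {latest:.1%}")
        if rule.max_wow_drop is not None and week_ago is not None:
            drop = (week_ago - latest) / week_ago
            if drop > rule.max_wow_drop:
                alerts.append(f"{metric} dropped {drop:.1%} week over week")
    return alerts


# Example with the Android shopping-flow conversion story above.
rules = [Guardrail("android_checkout_conversion", floor=0.70, max_wow_drop=0.05)]
history = [0.71, 0.72, 0.70, 0.71, 0.72, 0.71, 0.70, 0.65]  # most recent value last
for message in check_guardrails("android_checkout_conversion", history, rules):
    print(message)  # in production this would go to Slack
```

The automated modeling path replaces these hard-coded limits with a forecast band learned from the metric's own trend and seasonality.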

We made Orbiter so that it's easy for non-technical teams to set up and use. It's a web app, requires no engineering work, and connects to existing analytics databases the same way dashboard tools like Looker or a SQL editor plug in. Teams connect their Slack to Orbiter so they get immediate notifications when a metric changes abnormally.

We suspect the HN community has members, teammates, or friends who are product managers, businesspeople, or data scientists who have run into the problems we experienced. We'd love for you and them to give Orbiter a spin. Most importantly, we'd love to hear your feedback! Please let us know in the thread, and/or feel free to send us a note at hello@getorbiter.com. Thank you!
======
generatorguy
I work on power stations which normally have about 1000 monitored variables
per turbine-generator and another 500 for the plant in general. So typically
2500 for a two unit plant.

Alarms are generated if a variable exceeds a threshold, or a binary variable
is in the wrong state.

Is Orbiter something that would benefit power plants?

~~~
jacques_chester
I think some sort of anomaly detection would be useful in your case. There are
a bunch of libraries floating about; I remember at least Netflix[0], Yelp and
Datadog talking about them. There appears to be a really good links page
available too[1]. You can also learn a lot from _Forecasting: Principles and
Practice_, which is free online[2].

I have previously pitched using a kind of SPC-for-metrics approach, with
Nelson rules[3] to help surface metrics which are starting to move out of
control. I think it would have the advantage over ML techniques that it's easy
to understand.
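
For a feel of how little machinery the SPC route needs, here is a rough sketch of the first two Nelson rules (my own illustration, not code from any of the libraries linked below):

```python
# Rough sketch of the first two Nelson rules for a control-chart-style check;
# the data and thresholds below are made up for illustration.
from statistics import mean, stdev
from typing import List


def nelson_rule_1(series: List[float]) -> List[int]:
    """Rule 1: indices of points more than 3 standard deviations from the mean."""
    m, s = mean(series), stdev(series)
    return [i for i, x in enumerate(series) if abs(x - m) > 3 * s]


def nelson_rule_2(series: List[float], run: int = 9) -> List[int]:
    """Rule 2: indices where `run` or more consecutive points fall on the same side of the mean."""
    m = mean(series)
    hits, streak, side = [], 0, 0
    for i, x in enumerate(series):
        current = 1 if x > m else -1
        streak = streak + 1 if current == side else 1
        side = current
        if streak >= run:
            hits.append(i)
    return hits


daily_conversion = [0.70, 0.71, 0.69, 0.70, 0.72, 0.71, 0.70, 0.68, 0.67,
                    0.67, 0.66, 0.66, 0.65, 0.65, 0.64, 0.64, 0.64]
print(nelson_rule_1(daily_conversion))  # [] -- no single point beyond 3 sigma
print(nelson_rule_2(daily_conversion))  # [16] -- a sustained run below the mean
```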

My experience is that alerting thresholds are a very poor mechanism for
managing systems. They just ossify past disasters and typically become noise.
Alert fatigue renders them meaningless. If they're set by the manufacturer,
the incentives are broken: they will favour false alerts in order to push
legal responsibility onto the operator.

[0] [https://github.com/Netflix/Surus](https://github.com/Netflix/Surus)

[1] [https://github.com/yzhao062/anomaly-detection-resources](https://github.com/yzhao062/anomaly-detection-resources)

[2] [https://otexts.com/fpp2/](https://otexts.com/fpp2/)

[3] [https://en.wikipedia.org/wiki/Nelson_rules](https://en.wikipedia.org/wiki/Nelson_rules)

~~~
generatorguy
thanks for the links.

We only create an alert if there is a problem the operator can solve,
otherwise there is no point in waking them up at 3 AM, so if anything our
thresholds are set as loose as possible instead of as tight as possible.

However there are many instances where the operator could be alerted earlier
that the machine operation is abnormal. For example the stator windings are
rated for operation up to 155 degrees C but the machine is lightly loaded for
a long time, the ambient temperature is normal, and the windings are 140
degrees. No alert would be generated from the stator winding temperature but
something is amiss.

I think this is the case where some ML/AI/hypeword techniques might be
applicable: for the controller to know, based on half a dozen variables and
past operation, what the expected values for other variables should be.
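
Something along these lines is what I have in mind; a toy sketch with made-up numbers and variable names, and a plain linear fit standing in for whatever model would actually be appropriate:

```python
# Toy illustration: fit a simple regression of stator winding temperature on
# load and ambient temperature from historical (normal) operation, then flag
# readings that deviate far from the prediction even though they are well
# below the 155 C rating. All numbers here are invented for the example.
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical operation: [load_MW, ambient_C] -> winding_temp_C
X_hist = np.array([[100, 20], [150, 22], [200, 25], [120, 18], [180, 24], [90, 19]])
y_hist = np.array([95.0, 115.0, 135.0, 100.0, 125.0, 92.0])

model = LinearRegression().fit(X_hist, y_hist)
residual_limit = 3 * np.std(y_hist - model.predict(X_hist))

# New reading: lightly loaded, normal ambient, but windings at 140 C.
load_mw, ambient_c, winding_c = 110, 20, 140.0
expected = model.predict([[load_mw, ambient_c]])[0]
if abs(winding_c - expected) > residual_limit:
    print(f"Winding temp {winding_c} C is abnormal: expected ~{expected:.0f} C at this load")
```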

~~~
vladsanchez
You should take a look at [http://riemann.io](http://riemann.io)

------
dataminded
Really excited about this.

We're very early into doing a PoC where we use DataDog/Cloudwatch for our
business metrics for this specific use case. We're also looking at tracking
data quality metrics. The standard BI reporting tools are very immature when
it comes to alerting based on changes in data over time.

I hope at some point you consider ingesting metrics like the ops tools do.
Giving you direct access to my database is going to be really challenging but
I'm glad to send you what I want you to keep track of.

~~~
zhangwins
Ah very interesting, and agreed on the immaturity of alerting/time-series
changes in current BI reporting tools. It would be great if you could send me
more info about what you're thinking of tracking, and I'd also love to hear
more about the PoC you guys are planning. Would you mind sending a note to
winston[at]getorbiter.com?

------
idrism
This is interesting. If you can deliver on this, I'm guessing you can deliver
on a lot more. Figuring out what warrants an alert is a non-trivial problem,
and it's in the same problem space as answering other business questions like
"what is our true organic traffic".

Also, on metric drops I'm interested not just in the alerts but also in the
narrowing down of what is causing the drop. For example, the first question we
always ask is "could marketing blend be causing this". I imagine your ML can
figure that out. You could also point out where to look, like "iOS 13 is fine,
but there is a severe drop in conversion for iOS 12" or "Conversion dropped
for app version 13.2 on Android".

Great stuff! I'd love to see if it works!

~~~
tixocloud
Funny you mention that; I actually know a startup in Edinburgh that has
figured out “true organic traffic”, and they’ve used ML to fix the data for
their marketing attribution model.

~~~
zhangwins
Was the output of this ML attribution model explainable / deterministic? I've
seen some really complicated marketing attribution models in the past and
heard it was something of a never-ending battle to understand and arrive at
the "right" model.

~~~
tixocloud
I believe it is explainable as I didn’t hear anything fancy about the model
being built. It’s been tested and proven to cut marketing spend quite a bit
while delivering the same results. A patent has also been filed.

You are spot on that sometimes we just overcomplicate models; sometimes it’s
best to go with something explainable and deterministic but less accurate, as
opposed to something more accurate but complicated.

------
slap_shot
Congrats on the launch. This is a really interesting space that I think has a
ton of potential - I'm watching pretty closely to see what comes out of it.

Have you heard of Outlier ([https://outlier.ai](https://outlier.ai))? Do you
have any thoughts? How does Orbiter compare to Outlier?

(I haven't used Outlier but see it come up in anomaly detection discussion a
lot recently).

~~~
zhangwins
Thank you! There's definitely a lot of growth and potential in this space and
we're really excited too. We're focused on intelligent monitoring and alerting
for metrics that the user cares about & defines. We also automate the
diagnostic playbooks that teams use today after detecting an issue (e.g. check
data, check user segments, check geographies, etc.). Outlier seems to focus on
insights and less on monitoring/alerting. They comb through data to surface
4-5 "unexpected insights" about your customers or business every day in a FB
feed-type product.

~~~
ares2012
Hi! (Founder of Outlier.ai here) You are right, our platform is designed to
produce the most important insights from massive amounts of data, without
requiring human supervision/configuration. It is most useful in applications
when there is too much data to set up guardrails, or the teams don't know what
guardrails to create. Our typical customers are very large consumer businesses
who have data spread across dozens of systems and need to ensure they never
miss important emerging trends or problems.

We are not an alerting or monitoring system, so I don't think you'd use us for
the same applications as Orbiter. The typical users of Outlier are the
business users ranging from executives to business operations who want to make
sure they are asking the right questions about the business.

Orbiter looks like a great product, good luck in building your business!

------
knightelvis
Without looking into details of your solution, what's the difference between
your solution and CloudWatch anomaly detection?
[https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitori...](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Anomaly_Detection.html)

~~~
zhangwins
Orbiter anomaly detection is for any DB (e.g. Postgres, Snowflake) and metrics
that business/product teams tend to track such as transaction conversion %,
user growth, add item to basket %, etc.

Amazon CloudWatch anomaly detection is for AWS resources & apps, and covers
infra metrics like resource utilization, app performance, ops health.

In terms of the anomaly detection capabilities -- both are using similar
machine learning processes to detect metric issues automatically!

P.S. If you get curious about the details of our solution, we have a 2 minute
video demo ;) Cheers!
[https://www.youtube.com/watch?v=R7P_M6j0P2A](https://www.youtube.com/watch?v=R7P_M6j0P2A)

~~~
knightelvis
Thanks for replying. That's a good demo. However, I don't necessarily agree
that CloudWatch is only for infra metrics. Theoretically, you could send any
metrics to CW and leverage the anomaly detection feature. Given that it
aggregates data over time and you could lose granularity in your data, that's
probably not a good idea for business-centric data. Then I found AWS QuickSight
([https://aws.amazon.com/quicksight/features-ml/](https://aws.amazon.com/quicksight/features-ml/?nc=sn&loc=2&dn=2)),
which seems to have similar features?

------
dodata
Congrats on launching! Looks very helpful!

As a data scientist, I found that a drop in metrics was just as often due to a
data pipeline issue as it was an actual business problem. This unfortunately
causes business users to lose trust in the metrics quickly. How do you plan to
differentiate between those two root causes of metric changes?

~~~
zhangwins
Ah, I can empathize with you here (as a former DS) -- we had incidents in the
past where data pipeline / instrumentation changes caused bad data, which then
caused metric drops. These weren't real product issues, but they nonetheless
caused a loss of confidence in the data.

We think there are a number of diagnostic features that could be helpful here
(to be built!). Teams today run playbooks to root cause issues when metric
drops happen. We should be able to take that playbook and automate it. Say,
Orbiter identifies an abnormal change in Metric X. The team is then probably
analyzing sub-funnel metrics Y and Z, or looking at various dimension cuts to
isolate the issue. Maybe they're also checking data quality by comparing the
count of event volume vs. count of user IDs vs. count of device IDs, etc. If
we run all of these diagnostic checks when Metric X drops, we could give the
team insight into what we know is OK vs. not OK.
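
To sketch what I mean by automating the playbook (entirely illustrative check names and thresholds, not something we've built yet):

```python
# Illustrative sketch of an automated diagnostic playbook; the checks, names,
# thresholds, and example output are hypothetical.
from typing import Callable, Dict, List


def check_data_quality(day: str) -> str:
    # Hypothetical: compare event volume vs. distinct user IDs vs. distinct device IDs
    # against their usual ratios to rule out a pipeline/instrumentation break.
    events, users, devices = 1_000_000, 180_000, 210_000  # would come from the warehouse
    return "OK" if 4 < events / users < 8 and devices >= users else "SUSPECT: ID ratios look off"


def check_sub_funnel(day: str) -> str:
    # Hypothetical: re-check the upstream steps (view item -> add to basket -> checkout)
    # to see which step actually moved.
    return "SUSPECT: add-to-basket rate down ~4pp on Android"


def check_dimension_cuts(day: str) -> str:
    # Hypothetical: cut the metric by platform, app version, and geography.
    return "OK"


PLAYBOOK: Dict[str, Callable[[str], str]] = {
    "data quality": check_data_quality,
    "sub-funnel metrics": check_sub_funnel,
    "dimension cuts": check_dimension_cuts,
}


def run_playbook(metric: str, day: str) -> List[str]:
    """Run every diagnostic check for the day a drop in `metric` was detected."""
    return [f"[{metric}] {name}: {check(day)}" for name, check in PLAYBOOK.items()]


for line in run_playbook("checkout_conversion", "2020-03-02"):
    print(line)  # summarized alongside the Slack alert: what's OK vs. not OK
```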

~~~
sanabriarenato
That's really cool! Besides identifying abrupt changes in metric X, for me the
most difficult part is trying to understand what caused the change in X. Great
to know that you have this on the roadmap, but do you think it's possible to
develop a model/automation that is generic enough to be used in different
businesses? Maybe analysing the correlation between different time series
could be a way to go?

~~~
tixocloud
It’s definitely possible if you have the underlying data definitions, so you’re
not having to compare time series across industries (it’ll be hard because
every single business’s metrics could be so different based on the way the
metrics themselves are set up).

Avora ([https://avora.com/product/](https://avora.com/product/)) and
Thoughtspot ([https://Thoughtspot.com](https://Thoughtspot.com)) both have
root-cause capability.

------
arciini
This is really cool! Our search-engine-based impressions dropped substantially
in early Feb. and because we didn't have that in our main dashboards, it took
us almost 2 weeks to discover that. Orbiter would've been pretty useful for
that - got in touch!

~~~
zhangwins
Thanks! Looking forward to getting connected. We've heard SEO-specific use
cases come up with some of the other companies we've worked with too -- you
basically need to find out the exact time that your SEO ranking saw a material
change, because it's usually driven by something that shipped at that time.
Otherwise it takes a long time to win back the traffic from GOOG.

------
photonios
This sounds really cool! I've wished for something like this many times. I am
mostly attracted by the fact that it would be mostly automatic. I am hoping it
lives up to the hype.

Signed up for the beta. All the best!

------
DEADBEEFC0FFEE
I've been talking with AppDynamics, and much of what you have said in this
thread could have been said by AppD. Are you hoping to get some of their
market?

------
longtermd
I really like the manual configurability. In our startup, we work a lot with
influencers, and it's very common for us to have strong spikes in signups (and
also high/low CVR depending on the quality of influencers and "strength of
promotion"). This behavior would cause a purely ML-based model to constantly
shout alerts.

------
arzel
I saw y’all on PH and immediately submitted to get early access. Super excited
to try it out, and congrats on the launch!

~~~
zhangwins
Woohoo! Thanks so much. Looking forward to getting in touch soon :D

------
BlackJack
Congrats on the launch! This is definitely a real, big problem.

------
thedrake
This is what Maxly.com used to do. They learned several valuable lessons; it
would be worth talking to one of the founders.

~~~
zhangwins
Thanks for the tip - haven't heard of Maxly before but will do some research

------
tixocloud
Congrats on the launch. We’re in an adjacent/overlapping space with drift
detection/model monitoring so it’s always very exciting to see automatic data
monitoring tools come into place. We're hoping that the more startups come on
board, the better it is for all of us. Cheers and best wishes!

~~~
zhangwins
Thank you! We're actually a team of Canadians too (but have been
living/working in SF Bay Area) :D Always great to see more applications for
data science - best wishes to you too!

------
djiddish98
Go Winston! So much better than trying to do this in Tableau

