
Launch HN: API Tracker (YC W20) – Track and manage the APIs you use - cameroncooper
Hey HN!<p>We’re Cameron, Trung and Matt from API Tracker (<a href="https:&#x2F;&#x2F;www.apitracker.com" rel="nofollow">https:&#x2F;&#x2F;www.apitracker.com</a>). We make tools to help with using third-party APIs in production.<p>When software teams integrate with APIs, they often run into outages, network issues, interface changes or even bugs that cause unexpected behavior in the rest of their system. These problems are hard to predict and prepare for, so most teams don’t deal with them until there&#x27;s an outage and then have to do an emergency build to add logging and get to a root cause.<p>This is what happened to us. Trung and I are both software engineers and we spent a lot of time and energy trying to make our API integrations robust and reliable in production. We found ourselves instrumenting all our API calls so we could know how many calls we were making, how long they were taking and if they were failing. We set up alerts for errors and latency increases and integrated with PagerDuty. We wrote retry logic with exponential backoff. We wrote failover from one API provider to another. At the end of it all we built a lot of tooling that required maintenance and wasn’t even applied uniformly across all of our integrations.<p>After building all this infrastructure we realized that many other teams are reinventing the same wheel.<p>To solve this problem we built an API proxy that takes requests and relays them to the API provider. By proxying this traffic we are able to instrument each call to measure latency, record status codes, headers and bodies, and add reliability features like automatic retry with exponential backoff. From there we can monitor and alert on issues and provide a searchable call log for debugging and auditability.<p>We knew that because we were asking teams to run their mission-critical API calls through us, we had to build a highly available and scalable proxy architecture.
We’ve done this by designing a proxy that can be distributed across multiple regions and clouds. We are currently running out of AWS. Global Accelerator allows us to use AWS’s private internet backbone to quickly get traffic to our proxies, which run behind AWS Network Load Balancers. While this helps us ensure resilience against infrastructure outages, we also need to protect against self-inflicted wounds like bugs and bad deployments. Upon release we bring up a new set of proxy instances, deploy the code, and run our full test suite to make sure that each instance is able to proxy requests correctly. Once all instances are healthy, they are added to the load balancer.<p>For companies with more stringent needs we support on-premise installations, as well as a client-side SDK that can do instrumentation without the proxy.<p>Today we offer the service as a subscription. We hope to make it easy for teams to get visibility and control across all their integrations without having to build it themselves. This includes:<p>- Detailed logging on all of their third-party API calls<p>- Monitoring and alerting for increased latency and error rates<p>- Reliability features like automatic retry, circuit breaker and request queueing<p>- Rate limit and quota monitoring<p>We would love to hear from the community how you are managing your API integrations. Our story is a result of our experiences and how we dealt with them, but we know the HN community has seen it all. We would love to hear about problems you’ve had and how you dealt with them. Please leave a comment or send us an email at founders@apitracker.com. Looking forward to the discussion!
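The retry-with-exponential-backoff logic described above can be sketched roughly like this. This is a minimal illustration of the pattern with made-up names, not API Tracker's actual implementation:

```python
import random
import time

def call_with_backoff(make_request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call make_request(), retrying with exponential backoff plus jitter.

    make_request should raise on transient failure and return a response on
    success. sleep is injectable so callers (and tests) can observe delays.
    """
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error
            # Delay doubles each attempt; random jitter avoids many clients
            # retrying in lockstep against a struggling provider.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

The benefit of doing this in a proxy rather than in each client is that the policy is applied uniformly across all integrations, which is exactly the gap the post describes.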
======
dolftax
We've been using API Tracker in production for a few weeks now. The primary
use case for us is to reliably handle webhooks from GitHub, which our product
relies on heavily (app installation, commit, and pull request events).

Unfortunately, GitHub doesn't retry failed webhooks, so when our service goes
down even for a few seconds, thousands of webhooks fail and pile up. GitHub
also doesn't provide an API to query the failed webhooks and retry them. We
had to go through the painstaking task of visiting GitHub's app dashboard and
clicking retry on each webhook, one by one.

With API Tracker in place, we've updated our GitHub app's webhook delivery
URL to send the webhooks to API Tracker, and they forward them to our
services. In the worst case, when our service goes down for a while, API
Tracker gracefully retries all the failed webhooks.

Ref: [https://github.community/t5/GitHub-API-Development-and/Handl...](https://github.community/t5/GitHub-API-Development-and/Handling-GitHub-webhook-retry/td-p/25465)

~~~
thorgaardian
Interesting use case for it. Without prior knowledge of a solution like this,
I would have suggested sending the webhooks to a queue-backed notification
system (e.g. SNS backed by SQS) and subscribing to the event topic, but it
sounds much easier to configure and manage the way you instrumented it. Might
be a good use case for me to try out!

~~~
cameroncooper
This is something you can easily configure with our automatic retry function.
We have an option to return a pre-configured response to the caller, and put
the request in a queue to be retried until successful. This allows you to have
a sustained outage while making sure all calls are eventually delivered.
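Conceptually, the flow is: return a canned response to the caller right away, park the request, and keep retrying delivery until it succeeds. An illustrative sketch of that shape (made-up names, not our actual implementation):

```python
from collections import deque

class RetryQueue:
    """Accept requests immediately with a pre-configured response;
    deliver them to the real upstream later."""

    def __init__(self, canned_response="202 Accepted"):
        self.canned_response = canned_response
        self.pending = deque()

    def accept(self, payload):
        # The caller gets an immediate answer even if the upstream is down;
        # actual delivery happens asynchronously from the queue.
        self.pending.append(payload)
        return self.canned_response

    def flush(self, deliver):
        """One delivery pass. deliver(payload) returns True on success;
        failed payloads stay queued, in order, for the next pass."""
        still_pending = deque()
        for payload in self.pending:
            if not deliver(payload):
                still_pending.append(payload)
        self.pending = still_pending
        return len(self.pending)
```

A real implementation would persist the queue and space out the flush passes, but the caller-facing contract is the same.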

~~~
ignoramous
> This allows you to have a sustained outage while making sure...

Re-driving queue backlogs at services recovering from sustained outages
almost always ends in tears. Tread carefully. :)
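One common mitigation is to pace the redrive instead of replaying the whole backlog at once, so the recovering service isn't immediately knocked back over. A rough sketch (illustrative, hypothetical names):

```python
import time

def paced_redrive(backlog, deliver, max_per_second=10, sleep=time.sleep):
    """Replay a backlog at a capped rate so the recovering service isn't
    hit with the entire queue at once. deliver(payload) returns True on
    success; failures are collected for a later, slower pass."""
    interval = 1.0 / max_per_second
    failed = []
    for payload in backlog:
        if not deliver(payload):
            failed.append(payload)
        sleep(interval)  # fixed pacing between sends caps the redrive rate
    return failed
```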

~~~
jrockway
Typically people use two pools for circuit breaking, with the limit set lower
on retries:
[https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overv...](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/circuit_breaking#arch-overview-circuit-break-cluster-maximum-connection-pools)
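Envoy's version is configuration-driven, but the core idea behind any circuit breaker (trip open after consecutive failures, then allow a probe after a cooldown before closing again) can be sketched in a few lines. This is a toy illustration, not Envoy's implementation:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; after `cooldown`
    seconds, let a probe call through and close again if it succeeds."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a probe once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, success):
        if success:
            self.failures = 0
            self.opened_at = None  # probe succeeded: close the circuit
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # trip open
```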

------
orliesaurus
There have been a number of players in this area throughout the years (Galileo
[RIP], Runscope [semi-RIP], Newrelic just to mention a few) for the analytical
part ... and countless more for the proxying part (Kong, Envoy, Tyk, etc)

Can you elaborate a little more on where you place yourself in the market?
Why should someone trust you over any of the bigger, older, and more stable
competitors? Thanks

~~~
cameroncooper
You're right that there are a number of proxy solutions out there, but most
are focused on exposing an API for external consumption (i.e. API producers).
We think that by focusing on outbound API calls we can go deep on features
that make less sense in those products. The same is true for the analytics
solutions (e.g. New Relic). For example, it wouldn't make sense for them to
add automatic retry or request caching, but it's still a common pain point
with integrations and makes a lot of sense for us to build. Finally, some of
the tools (e.g. Runscope) are meant for development debugging and don't solve
the production pain point.

~~~
thorgaardian
What you described in the first sentence is commonly referred to as an API
gateway: protecting ingress traffic into a publicly accessible service/app
(e.g. Kong, AWS API Gateway, Ambassador, etc.). Lately there have been a lot
more generalized solutions in this category for inter-process communication
via service meshes like Istio, Gloo, AWS App Mesh, and others, all of which
seem to offer a solution that works for both internal traffic routing as
well as external (when whitelisted).

Can you offer a description of your product that differentiates it from
service mesh solutions? Did you build your own proxy software, or are you
built on top of Envoy like many of the other available solutions?

~~~
cameroncooper
We are not built on top of Envoy and have built our own proxy.

Many of the service mesh solutions require you to deploy and manage them as
an on-premise installation. Our primary offering is a hosted solution, but we
also offer a managed service for on-premise installations.

As you've correctly pointed out, the service mesh solutions can allow routing
of external traffic, but by focusing on external calls there are features
that make sense for us to build that wouldn't make sense in something like
Istio/Gloo/AppMesh. For example, we can build an enhanced experience around
third-party APIs to better understand the calls, errors, quotas, etc. that
are specific to each provider.

~~~
candiddevmike
Why did you build your own proxy instead of using Envoy? What shortcomings
did Envoy have?

~~~
cameroncooper
We wanted to architect a system that made it easy to deploy proxy nodes to
multiple regions and clouds. We also wanted it to be easy to add functionality
specific to our feature set. While we might have been able to achieve our
goals by modifying an existing proxy, it made more sense to us to build our
own. I have built proxies at previous companies, so this was something I was
very comfortable doing.

~~~
candiddevmike
Can you expand on what specific part of Envoy prohibited that?

Additionally, as other commenters mentioned, almost every company has rallied
around Envoy and is spending considerable time/money making it better. If
your solution isn't as performant as Envoy, it seems like a poor
architectural choice to roll your own, especially given the time/money
constraints startups have.

------
thdxr
This is great, I can see the potential of something like this and am jealous
I'm not the one working on it!

Don't take the pushback in the other comments too seriously. There is
definitely an audience (myself included) who'd want a focused, specific
tool.

~~~
james_s_tayler
Ditto. I face the problem this solves every day and from time to time think
about the fact someone must be trying to solve this problem.

------
openthc
We've had to build similar tools -- but one step further, to make three
different upstream services behave in a common way. We also added pre- and
post-flight error checking for cases where the backend wouldn't behave
nicely.

Any plans to "commonize" some different backends like Twilio / Plivo, or
SendGrid, Mandrill, etc.?

Very nice work!

~~~
cameroncooper
Thanks for sharing your experience; we have heard similar things from other
companies. We do have plans to create common interfaces for different
services like SMS/email, as you have suggested. This will allow us to
seamlessly fail over between providers to maintain uptime and performance
without any action on the client's part.

------
time0ut
Congratulations on your launch! This is very interesting. I have a few
questions. I apologize if your website answers these, but I couldn't find
clear answers after a cursory glance:

Can you tell me more about how the on-premise installation works and/or is
licensed?

Can it manage my authentication mechanisms for me? For example, can I
configure it with my client-side certificates or have it fetch and cache
OAuth tokens? We do this in our current solution and it is very nice being
able to hide all these details from our applications.

Can it do request/response transformation at all? We have a lot of cases where
we want to massage things a little here and there. I realize this might be out
of the scope of what you are trying to do, but it would be a nice to have.

We currently do this sort of stuff with a cluster of IBM Datapower gateways.
They perform very well but are expensive, difficult to configure, and somewhat
opaque.

~~~
cameroncooper
The standard model for on-premise is an annually licensed managed service. We
deploy, manage and monitor the platform on the customer's resources (usually
AWS account).

Great questions on credential management and transformations. These are not in
the offering today, but they are on our near-term roadmap and we are very
excited about their potential. As you've alluded to, there's a lot we can do
there.

~~~
time0ut
Thank you for your response.

I'll be keeping an eye out for enhancements. We have to renew our Datapower
licenses annually and are always on the look out for a replacement.

------
incognos
It is a nice solution, but I am wary of anything that proxies my traffic,
especially considering the legislative environment. I've been using Bearer
[[https://www.bearer.sh](https://www.bearer.sh)], which does not use a proxy
but a library that hooks into the low-level calls. It gives us a great view
of what is going on with our third-party API calls. You can filter out the
calls that do not interest you, separate Production from Staging, etc. I did
not want to have to build the monitoring infra myself; it's not a core
competency, and for the money it is cheaper to use an external service over
5 years vs. building in-house.

------
mc3
Hi, I have two questions:

1\. How can I be sure sensitive data sent via the APIs is secure / private,
etc.?

2\. Is your reliability and availability 100%? Because if I use you, my
app's availability is now only as good as yours. We've been bitten by
cascading effects of outages of upstream cloud services, and I'd guess
something like this would knock out everything if it went down.

------
tonylucas
Have just signed up; I was (yet again) looking for a solution like this for
monitoring outbound API calls. Looking forward to trying it.

------
sachinag
[https://cloud.ibm.com/catalog/services/api-connect](https://cloud.ibm.com/catalog/services/api-connect)
seems to do a lot of this for free. Probably could also use the community
version of Mulesoft:
[https://developer.mulesoft.com/mulesoft-products-and-licensi...](https://developer.mulesoft.com/mulesoft-products-and-licensing)

~~~
erik_landerholm
Two of the last companies I'd ever want to work with or rely on, other than
that...

------
hn_throwaway_99
Congrats on the launch. I have a ton of 3rd-party APIs I'm integrating with,
so, like you, I have been thinking about all the stuff I'll need to do to
make it reliable in production.

What do you guys do for masking or encrypting sensitive data? I like the
opportunity to log everything but a lot of what I'd want to log is PII or
sensitive financial data.

~~~
cameroncooper
We have two approaches to securing this kind of data. Once you specify which
fields you want secured, we can either simply mask them out, or hash the data
in a way that allows you to search for it if you know the value you are
looking for.
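For illustration, the "searchable hash" idea can be sketched with a keyed hash: the raw value is unrecoverable from the logs, but anyone who knows the value (and the key) can recompute the digest and search for it. This is a hypothetical sketch with made-up field names, not our production code:

```python
import hashlib
import hmac

SECRET_KEY = b"example-only-key"  # illustration; real keys live in a secret store

def mask(value):
    """Irreversibly redact a sensitive field."""
    return "****"

def searchable_hash(value):
    """Keyed (HMAC-SHA256) hash: redacted, yet still searchable by anyone
    who can recompute the digest from a known value and the key."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()

def redact(record, masked=(), hashed=()):
    """Return a copy of a log record with fields masked or hashed."""
    out = dict(record)
    for field in masked:
        if field in out:
            out[field] = mask(out[field])
    for field in hashed:
        if field in out:
            out[field] = searchable_hash(out[field])
    return out
```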

------
FanaHOVA
Started using apitracker a week or two ago; it's been great for logging
requests and inspecting failed/slow ones. Haven't tried automated retrying
yet, but excited to do that soon as well.

~~~
cameroncooper
Glad it's been able to help! Please let us know if there's anything else we
can do.

------
jconley
Interesting service. Have built things like this a couple times. How is the
on-premise version priced? Didn't see that on the site anywhere.

~~~
cameroncooper
Thanks! Our standard model for on-premise is an annual license and depends on
some factors such as request volume and features.

------
datboitom
Is this any different than Bearer.sh?

~~~
cameroncooper
Yes, it is. Bearer relies on client-side instrumentation, which today is
limited to just Node.js and Ruby applications. While we also support
client-side instrumentation, the proxy is an important element in our
offering because it is language agnostic and enables a new class of features
that can only be implemented in the proxy (e.g. caching).

~~~
gmontard
Hi, I'm the co-founder of Bearer.sh.

Indeed, Bearer.sh works as a package (gem, npm) inside your application, and
it automatically instruments your HTTP stack, meaning there are zero code
changes to make on your existing integrations; it works instantly.

But more interestingly, since we're not a proxy at all, you don't have to
trust us to deliver that very important API traffic of yours (who would?),
we have a sub-millisecond impact on your performance, and we work with any
public, private, or crazy certificate- or IP-restricted APIs! APIs are a
liability and a dependency for your app; let's not add ourselves to that
list!

We're going to launch support for many other stacks soon, and also a whole
new set of "active features" as you mentioned, while still being 100% NOT a
proxy. Stay tuned in the coming days :)

Feel free to try it; we offer 1M API calls per month for free and you can
quickly jump to 20M for only $49.

We're super happy to see all of the interest around this space these days.
Let's change the API space altogether!

~~~
incognos
You said it better than I did. That was one of the issues we ran into since we
have third party APIs that require IP whitelisting, certs and VPNs, a proxy
just won't work in that instance. Can't wait for the Python implementation...

------
ignoramous
Would it be right to say this is sentry.io meets envoy, grpc, and konghq?
Super interesting. Congratulations.

How do I manage my API integrations, you ask?

Global Accelerator (GLA) is a key infrastructure piece for a HA service I'm
building but for the data-plane. It is such a hassle-free but slightly
expensive way to vend anycast IPs (no need to purchase ASNs and/or announce
routes from colos across the globe) and have the traffic load-balanced to 25+
AWS regions, that I recommend it instantly to anyone architecting HA services.
[https://fly.io](https://fly.io) and
[https://stackpath.com/edge-computing](https://stackpath.com/edge-computing)
are viable alternatives. Cloudflare announced Magic Transit, which isn't as
smooth as AWS GLA in terms of developer experience, whilst Azure and Google
offer global load balancers too, and maybe did even before AWS announced GLA
in 2018. So, really, I think utilizing GLA is something folks should do if
they run global HA services. The
only issue with using NLB behind AWS GLA is the client-IP is not preserved. In
our case, we needed it, so we had to get creative with sticky routing and port
assignment (listeners) to do load-balancing / traffic-shaping.

Another HA trick I plan to employ is to use Cloudflare-Workers (200+ PoPs) to
front https-traffic to our control-plane endpoints. It lacks observability,
monitoring, and alerting unless you're on Cloudflare's enterprise plans. The
rate-limiting option is expensive ($0.05 per 10k good requests). I'm sure
there's no way to queue requests out-of-the-box, so I can very much see a need
for what you've built, and where you guys fit in.

To be honest, I'd be surprised if Firebase or API Gateway or KongHQ don't
already do what you do as well. Is that the case? If so, keep at it. It is a
real need, and as you point out, something that I've _had to_ build for
every service and integration point.

A few questions (I went through your website and docs, but here I am):

\- How do you handle secrets that the clients might need to share with your
service, like API keys or access/secret keys?

\- Do you also push logs to the customers in addition to them pulling it from
your endpoints / UI?

\- A bit curious about your logging, monitoring, and alerting
infrastructure: is it run on top of CloudWatch or Prometheus or Loggly or
Elasticsearch or Lightstep or...?

\- Do you support proxying http/REST APIs only?

[https://autocode.stdlib.com/](https://autocode.stdlib.com/) which was
discussed a few weeks ago here looks, to me, like a good addition to what
you're building.

~~~
cameroncooper
Thanks for sharing your experience. We love GLA as well.

Great questions.

\- For sensitive fields that you do not want retained or searchable, we can
mask them out.

\- We don't currently have integrations to push our logs to another service,
but this is a good use case for us and it's on our near-term roadmap.

\- We use Elasticsearch in the product, but we also use CloudWatch extensively
for our own operations.

\- Right now we only support proxying HTTP requests, but are open to
supporting other protocols.

~~~
ignoramous
Thanks a lot, Cameron. I'll watch this space [0] as you continue to add
features and improve upon efficiency to pass on the cost savings to your
customers :) All the best!

[0] I'd have opted for a newsletter, but I couldn't find any sign-up form
for it.

------
notlukesky
Is there an SLA roadmap?

------
derricgilling
We at Moesif
([https://www.moesif.com/solutions/track-third-party-api](https://www.moesif.com/solutions/track-third-party-api))
released a similar tool in 2017 and found that many of our customers
(including Deloitte, UPS, iFit, and Trung's previous company, Snap Kitchen)
were looking for a way to track APIs without the complexity of a full
service mesh like Envoy, especially if you're hosted somewhere that can't
run an on-prem service mesh or gateway.

We're a little different in that we also support an agent-based approach
rather than just a proxy, meaning we have an SDK that sits out-of-band.

