Launch HN: Paigo (YC S22) – Measure and bill SaaS customers based on usage
89 points by twosdai on Oct 25, 2022 | 35 comments
Hey HN! Daniel here, I’m a software engineer and hobbyist hacker. I’m joined by my cofounder Matt. We’re building Paigo (https://paigo.tech). We make it easy for SaaS businesses to bill customers based on usage.

To get your hands dirty a bit, we have a stateless and signupless demo you can try out: https://hn.paigo.tech/ and a video of me walking through the system in a bit more detail: https://youtu.be/T6J1Yh8GhdU.

The idea of our platform is fairly straightforward: you give us read-only access to your SaaS backend and, based on tenant metadata for your infrastructure, we measure, persist, and aggregate SaaS tenant usage data to give a clear picture of per-client usage. We can measure metrics like API requests, compute time, data storage, transaction volume, and many more. Some common scenarios: an ML platform could use Paigo to track processed input files for customers, a data platform could use Paigo to determine the data size customers have consumed, and an API company could use Paigo to track customers' API requests. Additionally, we also help you understand your cost to serve your clients' usage, and this data allows us to provide your SaaS with usage-based billing.

What’s the problem we are solving? Many SaaS products need to measure their customers' usage in some form, and many want to incorporate it into their billing plans. It’s fairly annoying either to build the entire system in house or to build a measurement system in house and then connect it to a billing provider. It takes months to get a usage-based billing system up and running, and it usually requires several engineers (if not more) to maintain and operate. Also, when Sales wants to offer specific discounts or deals to major enterprises, it’s typically handled outside the in-house system, in Excel spreadsheets with some good guesses. This is how a lot of money gets lost on major deals.

With Paigo we handle 100% of the measurement and collection of SaaS customers' usage for the business. A SaaS business can see its customers' usage within 10 minutes, because all it needs to do is give us read access to its cloud account. Since we pull the lower-level infrastructure data, we can additionally give information like per-tenant cost and profit margin.

Matt and I came to this project after we built similar internal billing systems at previous jobs and we realized how error-prone these systems can be—one incident might have even undercharged a client by a few million dollars! We also realized there was no solution that integrated directly with a backend system and handled the measurement and gathering of usage data, as well as providing the end billing integration to platforms like Stripe and AWS Marketplace, or through ACH.

To get into the technical details, Paigo has a few measurement systems for different forms of usage data: infrastructure-based, where we connect directly to cloud APIs and then slice and dice per-tenant usage data; agent-based, where our agent is deployed into a runtime to gather usage like pod CPU time, memory, and file reads/writes, along with any exported metrics that are Prometheus-compatible; and datastore-based, where we connect directly to datastores like S3, Kinesis, or log files. We require that data in the datastore-based approach adhere to a standard data format so we can process it. However, this allows us to pull any custom metrics and dimensions directly from your datastore. All of this data is then processed and sent to our backend usage journal, where we store it in an append-only ledger pattern.
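
To make the standard format concrete, a single usage record is roughly shaped like the sketch below. The field names here are illustrative rather than our exact schema, but the idea is one measurement of one dimension for one tenant, plus free-form metadata tags:

    // Illustrative sketch of a usage record; field names are hypothetical,
    // not Paigo's exact schema.
    interface UsageRecord {
      timestamp: string;                // ISO-8601 time the measurement was taken
      customerId: string;               // the SaaS tenant the usage belongs to
      dimension: string;                // what is being measured, e.g. "api-requests"
      value: number;                    // the measured amount
      metadata: Record<string, string>; // arbitrary tags, e.g. region or plan
    }

    const record: UsageRecord = {
      timestamp: "2022-10-25T14:30:00Z",
      customerId: "3f1c9a2e-1111-4b7d-9c11-abcdef123456",
      dimension: "api-requests",
      value: 1520,
      metadata: { region: "us-east-1", plan: "pro" },
    };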

For clients to search and aggregate their data into an end bill, or to slice and dice their customers' cost and usage, we have an API they can use. We're an API-first company, which is why our demo can work with Retool—the demo is just a very thin skin over our API. The API is a NestJS application, currently running in AWS Lambda behind API Gateway.
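
For the curious, the usual way to run a NestJS app inside Lambda behind API Gateway looks roughly like this. It's a generic sketch of the pattern rather than our exact bootstrap code, and AppModule stands in for the application's root module:

    // Generic sketch of the NestJS-on-Lambda pattern, not Paigo's actual code.
    import { NestFactory } from '@nestjs/core';
    import { ExpressAdapter } from '@nestjs/platform-express';
    import serverlessExpress from '@vendia/serverless-express';
    import express from 'express';
    import type { Handler } from 'aws-lambda';
    import { AppModule } from './app.module'; // the app's root module (assumed)

    let cachedHandler: Handler;

    async function bootstrap(): Promise<Handler> {
      const expressApp = express();
      const nestApp = await NestFactory.create(AppModule, new ExpressAdapter(expressApp));
      await nestApp.init();
      return serverlessExpress({ app: expressApp });
    }

    // Lambda keeps the container warm between invocations, so cache the app.
    export const handler: Handler = async (event, context, callback) => {
      cachedHandler = cachedHandler ?? (await bootstrap());
      return cachedHandler(event, context, callback);
    };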

We bill based on invoiced revenue (surprise, surprise: it's usage-based) and we have a platform fee; roughly, it breaks down to 1% of revenue invoiced through Paigo. Note that pricing is not currently transparent on our website. Our typical customers are mid-sized enterprises where an initial sales call is typically expected. However, we will be updating our main webpage soon to add some self-service options.

For a bit of a deeper dive on the measurement engine, we have some docs here: https://docs.paigo.tech/

Thanks for taking time to read! Let us know what you hate and maybe what you love :P. We’d also love to hear your thoughts and experiences with measuring customer usage and usage-based billing!




Congrats on launching a product of this complexity! Best of luck.

I'm staring down the barrel of a potential usage pricing implementation, and I'm glad the majority of the foundational work is already done. It'd be no cakewalk to implement from scratch.

How do you generally address the risks of read access, GDPR, and other similar security and privacy concerns related to your technical model?


Thanks so much! Yeah, it's a lot broader than we initially thought it would be, but we love backend programming a lot, so it's a joy to work on (most of the time).

For the security concerns, at a policy level we're currently going through SOC 2 compliance. In the future we will be pursuing FedRAMP compliance as well, but we haven't started on that.

Regarding read-access risks, the client has complete control in their account over exactly what access they give us via IAM. Additionally, right now our application is multi-tenant behind the scenes; however, early on Matt and I saw that we would need to offer a single-tenant option for increased data privacy, so we've designed the system with that in mind, and it's not a major lift for us to provide a single-tenant, isolated environment within any region that data needs to reside in. It just hasn't come up as a concern for us yet.
As an aside, we have some background working with government agencies where this was a major concern, and single-tenant, region-localized storage was frequently table stakes for deals.

For GDPR, we don't store or process PII, which sounds kind of insane saying it out loud, but it's true. We integrate with end payment providers like Stripe and AWS Marketplace, and all we report is a UUID that is associated in their platform with the end customer's billing info, which we never need to see or touch.

Now, it is possible that someone could manually enter client PII into the platform, in which case we would need to deal with that, but it has yet to come up. If it did, we have API endpoints that can delete all data pertaining to specific clients by request.

I suspect that in the future this will change (we may start persisting PII), and we will need a more cohesive strategy for the right to be forgotten, but in the near term it hasn't come up.


Great coverage on those answers. Thanks!

One followup: have you considered handling the actual usage calculation and aggregation? One use case that comes to mind is accepting API request logs from (CloudFlare|CloudFront|Logstash), processing them, and directly deriving billing from those. That moves the entire process outside of a system your customer has to touch (in cases where complex application-layer knowledge isn't needed). One less thing for a potential customer to worry about (and removes one of the reasons to need read access to a database in many cases).

Again, all the best! Happy hacking. :)


> have you considered handling the actual usage calculation and aggregation?

So for aggregation we definitely do some of that work already: we enable clients to aggregate their raw data with a few different methods, like a total, average, or count, over any arbitrary time frame.
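
As a rough illustration, a client-side aggregation query looks conceptually like the sketch below; the endpoint path and parameter names are made up for the example, not our real API:

    // Hypothetical sketch of querying aggregated usage over a time window;
    // the endpoint and parameter names are illustrative, not the real API.
    async function getAggregatedUsage(customerId: string): Promise<unknown> {
      const params = new URLSearchParams({
        customerId,
        dimension: "api-requests",
        aggregation: "count",            // or "sum" / "average"
        start: "2022-10-01T00:00:00Z",
        end: "2022-10-31T23:59:59Z",
      });
      const res = await fetch(`https://api.example.com/v1/usage?${params}`, {
        headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
      });
      return res.json();                 // e.g. { dimension: "api-requests", value: 1284032 }
    }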

For us doing the actual calculation of the dimensions at a specific time, we haven't thought about it, but we're always interested in building more, so we might be able to prototype something.

Your specific example, though, sounds like another way for metrics to be pushed into our platform. This really wouldn't be that big of a problem for us to implement, and we've even floated the idea of exposing a stream that clients could push data to, which we would process.

Given that the logs would be in an open standard format, we could definitely do that and it sounds like a good idea. :) Thanks for the suggestion.


A technical question:

I'm curious how you ship, aggregate and store usage data in a resilient (network partition tolerant), scalable and cost efficient way.

Your documentation doesn't seem to peek under the hood.

(Disclaimer: I'm currently building such a system, but using a third party wouldn't be viable in that context)


Great question.

For the resiliency part, our workers/agents use a write-ahead log (WAL) https://en.wikipedia.org/wiki/Write-ahead_logging to track what data has been collected and sent; when agents or workers fail and need to be restarted, they read from the log to make sure the data was sent appropriately. For a good starting point, I recommend looking at the Prometheus agent for agent design and construction; they have implemented a lot of resiliency into their agents, and if you're familiar with Go, forking their work might be a good starting point.
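
For a flavor of the idea, a toy write-ahead log for a collection agent looks roughly like the sketch below (grossly simplified; a real WAL uses segments, checksums, fsync policies, and so on):

    // Toy write-ahead log sketch: append every measurement to disk before
    // attempting delivery, and replay unacknowledged entries after a restart.
    import { appendFileSync, existsSync, readFileSync, writeFileSync } from "fs";

    const WAL_PATH = "./usage.wal";
    const OFFSET_PATH = "./usage.wal.offset"; // index of the last delivered line

    function append(record: object): void {
      appendFileSync(WAL_PATH, JSON.stringify(record) + "\n"); // durable before send
    }

    async function replayUnsent(send: (r: object) => Promise<void>): Promise<void> {
      if (!existsSync(WAL_PATH)) return;
      const lines = readFileSync(WAL_PATH, "utf8").split("\n").filter(Boolean);
      let offset = existsSync(OFFSET_PATH) ? Number(readFileSync(OFFSET_PATH, "utf8")) : 0;
      for (; offset < lines.length; offset++) {
        await send(JSON.parse(lines[offset]));          // at-least-once delivery
        writeFileSync(OFFSET_PATH, String(offset + 1)); // advance only after success
      }
    }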

Additionally, all the off-the-shelf products we use to process and transmit the data have at-least-once delivery guarantees, which realistically under the hood means a WAL.

For scalability, our agents are deployed into serverless and managed components from SaaS providers, Confluent and AWS right now. These components have autoscaling built in, as with AWS Lambda.

But for some of our measurement components we're utilizing a Kubernetes cluster with our own custom workers. Right now we just statically provision them, but we have the ability to autoscale infrastructure based on node resource utilization (using AWS ASGs), and for our pods we currently scale horizontally based on resource consumption, like CPU and memory, using the Kubernetes Horizontal Pod Autoscaler: https://kubernetes.io/docs/tasks/run-application/horizontal-...


Much needed! Congrats on the launch, Daniel and Matt.

> Matt and I came to this project after we built similar internal billing systems at previous jobs and we realized how error-prone these systems can be—one incident might have even undercharged a client by a few million dollars!

A BigCloud provider (no points for guessing) I worked for found out they were undercharging customers due to a bug, and so they fixed the bug for new customers but continued to undercharge the customers who were grandfathered in.

> However, this allows us to pull any custom metrics and dimensions directly from your datastore.

Most SaaS providers would rather push data than have it pulled, is what I'd imagine. Are you hearing otherwise from folks you've been speaking with? For instance, in serverless environments (which is the poison of choice for me, at least), pull is much harder to accomplish, even where possible.

> All of this data is then processed and sent to our backend usage journal, where we store it in an append-only ledger pattern.

Apparently, a BigCloud, in perhaps a case of NIH, ended up creating a highly-parallel event-queue as a direct result of the scale it was dealing with: https://archive.is/IUKvT Curious to hear how you deal with the barrage of multi-dimensional events?

> Additionally, we also help you understand your cost to serve your clients’ usage, and this data allows us to provide your SaaS with usage based billing.

2 cents: Fly.io Machines is a tremendous platform atop which I fully expect businesses to build multiple successful SaaS products; maybe that's one niche for you folks to focus on and own.

> We bill based on invoiced revenue (surprise, surprise: it's usage-based) and we have a platform fee; roughly, it breaks down to 1% of revenue invoiced through Paigo.

This sounds a bit steep. I know for a fact that togai.com is also in private beta (their choice of datastore is TimescaleDB, and their event store is NATS), but I'm unsure what their pricing model is; I'd be surprised if it is the same as Paigo's.


I am not a cloud provider, but I'm in subscription-based carpooling, and we were in similar situations.

Point is, if you have existing happy early customers, there's no point in hiking the price due to a mistake on our side (be it ours, or our tech vendor's, or our tech integration partner's!), provided you are making money and are ramen profitable from those early users. :)


Thanks so much!

> Most SaaS providers would rather push data than have it pulled, is what I'd imagine. Are you hearing otherwise from folks you've been speaking with? For instance, in serverless environments (which is the poison of choice for me, at least), pull is much harder to accomplish, even where possible.

We totally offer push-based as well; all of our workers just use the same API endpoint to push the data we collect. It's just not the strong highlight, since other providers already offer push.
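
As a sketch (the endpoint path is illustrative, not our real API), pushing a record is just an HTTP POST of the same shape of data our agents collect:

    // Hypothetical sketch of pushing one usage record; the endpoint path is
    // illustrative. (Top-level await assumes an ES module.)
    await fetch("https://api.example.com/v1/usage", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.API_TOKEN}`,
      },
      body: JSON.stringify({
        timestamp: new Date().toISOString(),
        customerId: "3f1c9a2e-1111-4b7d-9c11-abcdef123456",
        dimension: "api-requests",
        value: 42,
        metadata: { region: "us-east-1" },
      }),
    });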

We went down the pull path because during our discovery process about 6 months ago, we were chatting with some DB and infra companies who had just built out an integration with a billing provider. All of them mentioned being annoyed with the amount of engineering commitment it took to measure, persist, and then transmit the usage data to the provider for them to handle the rest. So we wanted to offer a pull-based solution to help with this need.

You're totally right that it's architecture-dependent, and we don't want to cause a huge load (and cost) on a serverless platform. So for some dimensions, push is definitely an option.

> Apparently, a BigCloud, in perhaps a case of NIH, ended up creating a highly-parallel event-queue as a direct result of the scale it was dealing with: https://archive.is/IUKvT Curious to hear how you deal with the barrage of multi-dimensional events?

So to dive into more technical detail: we have an event queue that our workers drop data into, and it then gets persisted into our ledger by workers reading from the queue. The queue is hosted by a major cloud platform and offered as a managed service, similar to Kinesis.

For the different dimensions, we have a standard data format they need to be in before we can persist them. This transformation typically occurs on the client side, though in some cases we can transform the data from an open standard format (Prometheus https://prometheus.io/docs/concepts/data_model/) to our backend format.

At a criminally high level, this data format consists of a measurement, a value, a field, and a set of metadata tags. Our ledger is built on a schema-less time-series DB, so it doesn't matter if the same measurement has a different set of metadata from another. This gives us a boatload of flexibility when it comes to how we want to query data.

The different types of dimensions and their different data types become an issue when you want to aggregate on them. For instance, you may want the total of one dimension, while average and count wouldn't make any sense for it.

To get around this, clients need to tell us what aggregation method to use per dimension.
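
Conceptually, each billable dimension carries its aggregation method in its configuration, something like the sketch below (names and prices are made up for the example):

    // Illustrative sketch of per-dimension configuration; names and prices
    // are hypothetical.
    type AggregationMethod = "sum" | "average" | "count";

    interface DimensionConfig {
      name: string;
      unit: string;
      aggregation: AggregationMethod; // how raw records roll up into a bill
      pricePerUnit: number;           // in the billing currency
    }

    const dimensions: DimensionConfig[] = [
      { name: "api-requests", unit: "request", aggregation: "count", pricePerUnit: 0.0001 },
      { name: "data-storage", unit: "GB-hour", aggregation: "sum", pricePerUnit: 0.002 },
    ];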

This per-dimension configuration really isn't present in the demo, since it's a fairly simplistic version of the whole app, but it's a requirement we have implemented in the API.

> 2 cents: Fly.io Machines is a tremendous platform

Thanks for the hot tip, looks awesome! I'll definitely check this out and how we can sneak it into our product.


No, why thank you for being so considerate, genuine, and detailed in your responses throughout this thread (:


I'm curious how you'd compare yourselves to https://metronome.com which is another player in the space?


Hey, it's Matt here to answer this question. Our differentiation compared to Metronome is twofold.

First, we actually, and literally, measure and collect the usage (whereas they don't). So in Metronome's case, a SaaS company needs to develop an internal system to measure the usage amount and be responsible for sending the measured amount to their REST API or via Segment. However, based on our own experience and many others', that work actually represents the majority of the challenge and engineering time for usage-based billing. Therefore, Paigo takes full ownership of literally measuring the usage whenever we can and pulling the data into our backend journal. So there is no development or integration needed.

Secondly, Metronome has a sole focus on billing infrastructure, whereas we have a broader footprint across the usage-based business model. We not only provide billing infrastructure, but also a pricing toolkit to optimize unit price based on usage, and business analytics such as MRR/ARR/retention for the usage-based model.

Hope that helps. Happy to dive deep in whatever way you are interested!


Sounds useful. Not sure I’d trade a 1% reduction in revenue across the board for this though.


For transparency, we're at an early stage, and it's much more useful for us to have more clients using and working with the software: reporting bugs, telling us what they need, etc. We're definitely open to negotiating on price, and we don't want it to be a blocker for people to use the platform.

That being said, we still want to charge something to indicate a level of seriousness from users.

For some context on how we arrived at the figure: ~1% is at the low end of the ballpark for how much we saw companies spend internally on billing systems built in house, and it's slightly higher than, but still in the same range as, other billing providers.


Just my 2 cents - don't reduce the price. If someone isn't getting enough value out of it they likely aren't the right customer and might pull you all over the place with requests.

If someone is running $100m through your system they are going to call to negotiate anyway, you don't have to advertise it.


Spot on. I'm sure the YC group partners have told them as much:

https://www.ycombinator.com/blog/why-does-your-company-deser...

https://blog.ycombinator.com/users-you-dont-want/

Though, I understand the apprehension behind having to pay 1% of the revenue. It is a psychological thing mostly (in a way, the potential costs seem bottomless like with card payments, even though 1% is actually a super-good deal for small tech shops).


Yeah, it's probably not a good fit if a customer for usage based pricing doesn't want to pay usage based pricing themselves ;)


Super excited to see you launching here Dan, wishing you the best of luck!


Thanks Doug! :) You rock dude.


Thanks! (on behalf of Daniel)


Congrats Daniel and Matt - great to see you guys hit this milestone :)


Appreciate it!


Do you act as merchant of record (to handle VAT)? There are so many solutions out there that use Stripe for billing which involves so much extra work for European founders.


Hey, it's Matt here. This is an area we are looking into. We have current customers being onboarded, and we've yet to discuss the best option for this with them. I'm wondering whether being a merchant of record is the best solution for you, or for SaaS businesses in general?


Heads up, you forgot to handle the confused deputy problem in your IAM role policy (in 'Configuring IAM role' at https://docs.paigo.tech/; can't link to the page directly), which means anyone can pass a role (e.g. for another user) and you'll assume it.

Check out https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-de... for how to handle it. You need to require and use an 'ExternalId'.
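
For reference, a sketch of the mitigation: the customer's role trust policy requires a per-customer ExternalId, and your backend must supply it on AssumeRole (ARNs and IDs below are placeholders):

    // Sketch of the confused-deputy mitigation; ARNs and IDs are placeholders.
    // The customer's role trust policy would carry a condition like:
    //   "Condition": { "StringEquals": { "sts:ExternalId": "customer-unique-id" } }
    import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";

    const sts = new STSClient({ region: "us-east-1" });

    const { Credentials } = await sts.send(
      new AssumeRoleCommand({
        RoleArn: "arn:aws:iam::111122223333:role/PaigoReadOnlyRole", // placeholder
        RoleSessionName: "usage-measurement",
        ExternalId: "customer-unique-id", // must match the trust policy condition
      })
    );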


Drat, we even have the option in the API to pass it in and use it. It just didn't get propagated everywhere else.

Thanks for that callout, I'll update everything ASAP.


FYI your "stateless and signupless demo" link isn't linking and copying/pasting the link didn't work.


Thanks for the callout, that sucks. I shouldn't use URL shorteners :/

https://paigo.retool.com/embedded/public/99bd5e9a-af3c-4e9c-...

Here is the expanded link.

And here is a fun HN specific one linking to the same thing: https://hn.paigo.tech/


there's nothing in there, all the tables are empty


Yeah, so to utilize the platform, during the same session you'd need to create an offering and a service, during which you'd provide an IAM role for your account so that Paigo can go and read usage data for you and then show you your usage and cost.

I walk through it all during the first 3 minutes of the YouTube video if you want to follow along.

https://www.youtube.com/watch?v=T6J1Yh8GhdU

For creating the IAM role, we have some information under https://docs.paigo.tech/

If you get stuck or have issues feel free to ping me!


So basically this is subscription/recurring invoicing based on usage/metered billing, for SaaS, hosting providers, electricity providers, etc.

Try/check Zoho Subscriptions (I am a happy customer of Zoho Subscriptions). They support many features: metered billing, payment methods, many integrations, etc., without taking much data (beyond the minimum that is needed). And very good tech as well as non-tech customer support! And they charge a reasonable fixed, transparent, and predictable price.

I understand the need and the pain point, and multiple players can exist; the key is to have transparent pricing. (You can always have custom plans/discounts/pricing changes in the future for future customers.)

Also, who is your target customer? A tech company will likely implement its own billing software or use existing proven players.

I think (happy to be wrong!): 1. Try to find non-core-tech/offline subscription services and see what their pain points are. 2. Why only usage-based billing?

I know there are very few tools for SaaS/recurring/subscription-based billing (Zoho, Chargebee, Recurly, and Shopify being a few of them), so I wish you good luck.


Hey, it's Matt here. So there is research by others showing that the usage-based model has doubled its adoption in the past four years, and the acceleration has not slowed even a bit. The data points to it becoming mainstream five years down the road. Our personal experience speaks to the same observation. That's why we aim to solve the next-gen challenge. :-)

Subscription billing is a solved problem, admittedly. Much of the existing software does a decent job, and only marginal improvement is needed. However, those old models are very different from the usage-based model. Imagine the volume of change for a customer's subscription: it's probably at most once a month. But for the usage-based model, the volume of events can easily reach 10K per second. These are not the same kind of world. Other challenges include the underlying infrastructure to support the usage-based model, flexibility in pricing, integration with the usage-based reporting of cloud marketplaces, etc. All of those things are net new and necessary for the next-gen SaaS model.


> Matt and I came to this project after we built similar internal billing systems at previous jobs and we realized how error-prone these systems can be—one incident might have even undercharged a client by a few million dollars!

This honestly doesn't lend me the confidence in your solution that I assume you think it does ;P. "We've done this before, and dude... we failed at it SO BAD, you just don't understand: this is HARD. So, instead of doing it yourself, you'd be much better off out-sourcing that effort... to us." :( It is a subtle difference, I admit, between versions of this pitch that work and the ones that don't, but this one just didn't work for me.


I edited that sentence when I was helping these guys with their text and it's possible I introduced a misleading connotation. The way I understood what they originally wrote is that they had observed such a billing lapse inside some organization, but not that they were responsible for it.

On the other hand, even if so, there's the classic proverb about 'expensive training' etc...


No worries :)

Yeah, to add more color to this: we just observed the failure of a process and system we weren't a part of. Basically, there were manual elements in the billing and aggregation of usage data on a SaaS platform, and those were forgotten about over the course of many months, which led to the company underbilling.




