Show HN: Oodle – serverless, fully-managed, drop-in replacement for Prometheus (oodle.ai)
84 points by kirankgollu 14 days ago | 82 comments
Hello HN!

My co-founder, Vijay, and I are excited to open up Oodle for everyone.

We used to be observability geeks at Rubrik. Our metrics bills grew about 20x over 4 years. We tried to control spend by getting better visibility and blocking high-cardinality labels like pod_id, cluster_id, and customer_id. But that made debugging issues complicated. App engineers hated having their metrics blocked, and blocking others' code reviews was not fun for platform engineers either! Migrations (and lock-ins) were very painful: the first migration from Influx to SignalFx took 6+ months, and the second migration from Splunk took over a year.

Oodle is taking a new approach to building a cost-efficient serverless metrics observability platform. It delivers fast performance at high scale. We leverage a custom storage format on S3, tuned for metrics data. Queries are serverless. The hard part is how to achieve fast performance while optimizing for costs (every CPU cycle and every storage/memory byte counts!). We've written about the architecture in more detail on our blog: https://blog.oodle.ai/building-a-high-performance-low-cost-m...

Try out our playground with 13M+ active time series/hr & 13B+ samples/day: https://play.oodle.ai

Explore all features with live data via Quick Signup: https://us1.oodle.ai/signup
- Instant exploration (<5min): Run one command to stream synthetic metrics to your account
- Easy integration (<15min): Explore with your data from existing Prometheus or OTel setup.

We’d love your feedback!

cheers




The UI feels _very_ similar to Grafana. Even the dashboard folders look exactly the same to me. I would have thought that Grafana being AGPL would specifically forbid this?

Edit: Or maybe the AGPL just requires releasing any code you change? I could be confused.


It’s indeed Grafana. We’ve been maintaining a public fork of Grafana.


Where do you keep the code?

Found it: https://github.com/oodle-ai/grafana

Why do this instead of just building a data source?

Edit: Not to be that guy (but I'm about to be that guy). You have links to grafana.com (which is your competitor) all over the source in your page. This also lists the version as 11.1.0, which was released 6-21.

All of the versions in your fork repo mention 11.0.0-pre. Did I find the wrong repo, or are you using code that you haven't published?

The reason I mostly care is that this is the sort of reason that good open source projects get closed down, and that makes me a bit sad.


Oodle can be utilized solely as a datasource, but we also wanted to provide a solution for customers who don’t have a visualization platform in place.

Here is the branch we use: https://github.com/oodle-ai/grafana/tree/v11.1.0-oodle-stabl..., which has all the changes we have made in Grafana.


So the vast majority of your fork is just rebranding? Customers get to lose thousands of commits worth of improvements for that?


We are reasonably close to the latest version of Grafana. We periodically pull in new changes.


I presume it's more due to licensing than rebranding.

I have been meaning to ask the observability experts this question:

Why not dump all metrics, events and logs into ClickHouse and purge data as necessary? For a small to medium-sized business or solution ecosystem, will this be enough?


Very different tools, applications almost don't intersect actually.

ClickHouse is an analytical database (for events). Yes, you can do metrics in there (or in PostgreSQL, for that matter). But observability has its own needs, so specialized solutions work better: they come with more integrations and out-of-the-box tooling.

With generic databases, it's more like a construction kit: you'd need to develop a lot before it becomes workable. For example, let's say you have 50m active time series, with a 1%/hour churn rate. What would be the database structure in ClickHouse? (My answer is that I don't know. I know mostly VictoriaMetrics, and there's no such question there; the structure is already implemented.)


It'll work. ClickHouse even has experimental support for storing Prometheus metrics natively. A big missing piece is alerting.


ClickHouse is great for logs and traces, however, for metrics, it is still in the early phase. ClickHouse is also a general purpose, real time analytics database. See clickhouse.com. Whereas Oodle is specifically built for end-to-end metrics observability.


Curious to know what you mean by "early phase" here.

ClickHouse recently got support for the TimeSeries table engine [1]. It is marked as experimental, so yes, early stage. This engine is quite interesting: data can be ingested via the Prometheus remote write protocol and read back via the Prometheus remote read protocol. But reading back is the weakest part here, because Prometheus remote read requires sending blocks of data back to Prometheus, where Prometheus unpacks those blocks and does the filtering and transformations on its own. As you can see, this doesn't allow leveraging the true power of ClickHouse: query performance.

Yes, you can use SQL to read metrics directly from ClickHouse tables. However, many people prefer the simplicity of PromQL compared to the flexibility of SQL. So until ClickHouse gets native PromQL support, it is in the early stages.

[1] https://clickhouse.com/docs/en/engines/table-engines/special...


Not a fan of bad-mouthing other offerings. As @iampims is saying, alerting is a big missing piece. ClickHouse is also a general-purpose database for many use cases including analytics, financial servers, ML & Gen AI, fraud, and observability.

Pretty cool. I wonder where it would develop - will Oodle release an Open Source version or will ideas be implemented in some new (or existing) Open Source solutions - ClickHouse, VictoriaMetrics etc.



It'd be nice to post a disclaimer if you are working on a competing offering before you post a comment. (for the benefit of everyone)

Sorry, I didn't understand your point... What did I do wrong here? )

Your comment said "no magic" and the link points to "#magic-behind-oodle". Maybe I misread your comment or you didn't mean that.

Very interesting solution, and great idea to have a playground. Would love to know some details on the implementation of the architecture you have shared:
1. How do you query across multiple files? Do you have a query engine like DataFusion doing that heavy lifting, or is this a custom implementation?
2. How do you manage a WAL with real-time queryability across files? Have you seen any failures (recent entries going missing, that sort of issue)?
Thanks once again, really interesting design, and intuitively it looks more economical.

Thanks for your feedback, and great questions.
1. We create serverless functions to process each file and then combine the results, optimized for columnar file formats.
2. This is one of our core innovations :) We created custom representations of WALs which help us with query performance and ingesting them quickly.
3. Once a WAL is ingested, it is available for query within a few seconds. So far it has been reliable and we have not had issues with missing data.
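To make the "process each file, then combine the results" idea in point 1 concrete, here is a toy sketch. It is not Oodle's implementation: the in-memory `FILES` dict, the `scan_file` worker, and the thread pool standing in for serverless invocations are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for columnar files on object storage:
# each "file" holds parallel columns of series names and sample values.
FILES = {
    "file-a": {"series": ["cpu", "cpu", "mem"], "value": [1.0, 3.0, 10.0]},
    "file-b": {"series": ["cpu", "mem", "mem"], "value": [5.0, 20.0, 30.0]},
}


def scan_file(name, series):
    """Per-file worker (stands in for one serverless invocation).

    Returns a partial aggregate (sum, count) for the requested series.
    """
    cols = FILES[name]
    values = [v for s, v in zip(cols["series"], cols["value"]) if s == series]
    return sum(values), len(values)


def query_avg(series):
    """Fan out one worker per file, then merge the partial aggregates."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda name: scan_file(name, series), FILES))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count if count else None
```

Each worker returns a (sum, count) pair rather than an average, so the combine step can merge partial results from any number of files without rereading the data.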

Cool! The website says “No Lock-In”. Does it mean that I can bring my own compute and storage?

Also, found a few typos and a broken link, see error report here: https://triplechecker.com/s/xEd4Hp/oodle.ai?v=uxGS1


Thanks for the report - we just deployed a fix.

No lock-in means it’s 100% open source (PromQL) compatible. You can swap out vendors or move to self-hosted open source solutions should you need to move away from Oodle. When you migrate out, you get to export all your data, dashboards and alerts. You don't need to make any code changes.

We support bringing your own bucket (BYOB) for large enterprise customers; however, you cannot bring your own compute at this time. Our thoughts are along the lines of how Snowflake approached the problem - everything fully managed to keep the operational overhead minimal. https://jack-vanlightly.com/blog/2023/9/25/on-the-future-of-...


Why did you name your startup the same name as the most popular network compression library for video games? This seems short sighted. Even if you don't run afoul of trademark/copyright, you're sharing a lot of SEO and marketing terminology.


Point taken. Thanks for the feedback. Our reasoning is that we’d like the name to be short and memorable. A bunch of observability companies have the "observe" keyword overloaded all over the place, and we wanted our name to stand out. Oodle = Optimized Observability Data Lake.


The logo on your main page for oodle.ai is blurry.

Why use a .ai domain? I love LLMs, but this is a turnoff to me.


We are still early in our journey, and are currently working on leveraging LLMs for incidents and query / dashboard generation.

We do use pre-LLM-era AI and statistical analysis to provide insights and auto create dashboards for alerts (currently in alpha).


Thanks for the pointer on the logo - we are fixing it later today.

Edit: fixed now.


It's steam from the Japanese onsen.

"fully managed, serverless"

So it's not really a drop-in replacement for Prometheus then; it's more of a send-all-your-data-to-some-other-bloke kind of replacement.

Software as a service is fine, but you don't need to hide it behind hip marketing terminology.


Technically you are correct, the scraper will still exist. However, the hard part is scaling the query and storage layers which we replace.


I'm wondering something: how is the storage/compaction solved? AFAIK S3 lacks append semantics, so data must be accumulated somewhere else before storing it. Kinesis?


We use a local disk to temporarily stage data before putting it on S3. We have smaller WAL (write ahead log) objects, and a periodic compaction process which creates read-optimized files on S3.
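As a rough illustration of that compaction step (a toy sketch, not the actual file format or process): small WAL objects arrive in write order, and a periodic job merges them into a layout grouped by series and sorted by timestamp. The `compact` function and the tuple layout are hypothetical.

```python
from collections import defaultdict


def compact(wal_objects):
    """Merge small WAL objects into one read-optimized layout.

    Each WAL object is a list of (series, timestamp, value) tuples in
    arrival order. The result groups samples by series, sorted by
    timestamp, so a query for one series reads one contiguous run.
    """
    by_series = defaultdict(list)
    for wal in wal_objects:
        for series, ts, value in wal:
            by_series[series].append((ts, value))
    return {series: sorted(samples) for series, samples in sorted(by_series.items())}


# Two small WAL objects, as they might be staged before upload.
wal_1 = [("cpu", 2, 0.5), ("mem", 1, 100.0)]
wal_2 = [("cpu", 1, 0.4), ("mem", 2, 110.0)]
compacted = compact([wal_1, wal_2])
```

Because S3 objects cannot be appended to, the compacted output would be written as a new object and the small WAL objects retired afterwards.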


We, at Workorb, migrated from Grafana to Oodle and are very happy so far. The observability space does need a ground-up reimagination and we think Oodle is positioned to do that.


I'm curious, why did you move off of grafana?

For the same reasons op listed or for other reasons?


For us, cost was a factor, especially as we grow and the number of metrics and tags increases. We are also starting to use Oodle AI to help discover the root cause of problems faster.

I don't know how trademarks work or anything like that (not a lawyer etc. etc.), but lots of things are called Oodle. I wish you luck.


This is the one that came to mind for me when I saw Oodle: https://www.radgametools.com/oodle.htm


Same. Oodle is extremely well known in the game dev sphere. It’s literally baked into PS5 silicon for hardware decompression.


Thanks for the heads up. We did check on IP/trademarks just to be sure to avoid violations.


Oodle is a registered trademark:

https://uspto.report/TM/88478792

RAD is now owned by Epic Games (acquired in 2021) so they have very deep pockets.

A lot of people, including myself, were clearly initially confused that there must be some association given you are using this name in a not-entirely-unrelated field.

IANAL but I hope you're real sure that you are legally in the clear before you commit too deeply to the name.


Thanks for your input, we will follow up.


There is this obscure website, google.com I think is its name.


Love the observability feature here. Would love to see a detailed feature set comparison across the competitor landscape.


Thanks for the kind words - we will be posting a feature comparison matrix in the upcoming weeks on our website.


Is it SaaS-only?


If it weren't, then you'd need servers and it couldn't be serverless! :)


"Serverless" is an overloaded term marketing that really means functions-as-a-service. Looking at the stack, I don't actually see any components that you couldn't easily port to an on-prem solution.


This architecture diagram (https://oodle.ai/product#magicbehindoodle) goes into more detail on where we leverage serverless. For ingestion, we still use dedicated compute, but for queries, we leverage serverless.


Yes, it's only fully managed at this time. However, Oodle is very cost-efficient; it's cheaper than your self-hosted infra costs. https://oodle.ai/usecases/self-hosted


I would love to see an actual breakdown of oodle vs self hosted costs. I seriously doubt that it’s cheaper.


Unless you have heavy experience hosting and tuning Prometheus, it's not easy to be that cost efficient. It has a tendency to OOM crash on heavy queries if enough memory isn't provided, and provisioning huge memory for occasional queries becomes expensive quickly. Not to mention backups and replicas.

Actual mileage may vary here, but we found this to be true for our early customers. Our pricing is simple and transparent: https://oodle.ai/pricing. Based on our conversations, we consistently hear this is very cost-efficient. Please let us know if you feel otherwise, but one can easily input the number of active time series / hour to get an estimate from Oodle.


any plans to open source? I feel very comfortable using neon.tech (separates compute from storage for postgres) b/c they open source their stack but it would be hard for me to adopt something like oodle without an open source version.


We don’t have plans to open source at this time. Many observability solutions are closed source. Could you please describe why you would require this to be open source vs open source compatible?

We think Oodle combines the benefits of open source (compatibility, no lock-in) with the operational simplicity and reliability of commercial vendors. Some products might give an illusion of no lock-in just because they're open source, but that doesn't mean you don't have lock-ins. We believe what really matters is being "Open Source Compatible", i.e. how easy is it to get in? How easy is it to switch out (to a de facto open source standard like Prometheus/Grafana, should you need to cut ties with the vendor)? Security and compliance is the other big part: we are working on adding compliance like SOC2, CCPA etc.


The primary benefit of FOSS is the ability to inspect and patch the software yourself. Another important benefit for any large FOSS project is knowing it’s been heavily reviewed.

Protocol compatibility is a baseline requirement for a drop-in replacement. Without that, your product would need to be like 10x cheaper than Prometheus for anyone to consider devoting engineering resources to switching to it.

Existing FOSS solutions are reliable and inexpensive enough, not even considering the effort and unknowns involved with swapping out a major component of the observability stack. Between that, your distorted view of open source, and your focus on checkbox compliance, I’m unlikely to ever consider trying your software. I wish you luck though, and I hope you learn more about the value of FOSS.


Thank you for the feedback. We will keep an eye out for more input from our users on this topic.

We believe there are clear benefits of open source. In fact, we believe in it so much that we made our product OSS compatible from Day 1. (btw, not every observability product is OSS compatible).

We also understand not everyone's evaluation criteria are the same. Upon speaking with a number of our early users, we repeatedly found that what really matters is "is the product reliable and cost-effective?", "is the product easy to use and open source compatible?", "is the product easy to migrate in/out?". So, we tried to address those head-on. We also heard some open source solutions are indeed unreliable, especially at scale, e.g. Prometheus has scaling challenges beyond 2M+ active time series / hour and it is not horizontally scalable. People tend to over-provision CPU/memory; despite that, high-cardinality queries time out at any meaningful scale (this is a well-understood problem).

The ability to inspect code and apply patches themselves may be an evaluation criterion for some users, but we found that it was not the major one among our users. I've led multiple evaluations at Rubrik (open source + non-open source), and the ability to patch software was not the most important criterion: reliability, operational overhead, cost, ease of use, and ability to switch in/out were way more important.


I’m curious what you think it means to be "open source compatible"

Some comparison to Thanos would be great!


Great question! Vijay here, I'm one of the co-founders of Oodle. Compared to Thanos:
1. We use object store (S3) for all queries, even recent time ranges. Object store is not just an archival solution.
2. Customized indexing to minimize memory usage. The index is also on object storage.
3. Custom columnar file format optimized for storing metrics on object storage.
4. Serverless functions for achieving good query performance. This helps us break down and parallelise queries without impacting cost with pre-provisioned compute.
5. No downsampling. Downsampling is not required to improve query performance or reduce costs with serverless and object storage.


Yup, fan of the LGTM stack + Alloy


The Oodle team is great! If you're looking for a cheap metrics (prom/otel) store, check it out!

Not to mention they offer the best metrics free tier in the entire space... Let me know if you know of a better free tier ;)

Unfortunately, Grafana Cloud only offers 10k active series, which is really easy to surpass even in a homelab; meanwhile, Oodle offers 100k.


Is this comment legit? New account, no activity…

Yeah it's legit. Never bothered to create a HN account but caught wind of this post and had to come drop some good words about the team. I use it (for free) and have no skin in the game. Just want to let folks know of a free tier that is literally 10x better than the next best thing!

VictoriaMetrics is still cheaper!


Oodle is a fully managed offering. Curious why you think Victoria Metrics Cloud is still cheaper. https://victoriametrics.com/products/cloud/

This translates to about $5.2 per 1k metrics, Oodle is $1 per 1k metrics!

Am I missing something?


Hmm, where do you see $5.2?

I see $0.182/h/1M ts = $0.182 per 1k metrics (S.STARTER.A instance)

Also Oodle website says to "Contact for pricing beyond 5M". While VictoriaMetrics Cloud only starts at 1M (S.STARTER.A), and "Contact Us" starts at 125M active time series.

PS: time series is only one dimension, there's also storage size, retention, query/alerts frequency etc.


Oodle is fully managed and supports high availability, so I was comparing against the cluster mode of the VictoriaMetrics Cloud offering.

RE: VictoriaMetrics pricing: please see https://victoriametrics.com/products/cloud/. The pricing that you are referring to doesn't seem to be public; it looks like you have to sign up to see it. $190/month is single-node pricing. For any real use case, you need HA, and VictoriaMetrics enterprise pricing for a cluster starts at $1300/month for 250k metrics. This translates to $5.2 per 1k metrics (5x more expensive than Oodle). At real scale, about ~2.5M or ~5M time series / hour, Oodle is around half the cost of VictoriaMetrics.
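For anyone checking the arithmetic, the $5.2 figure is just the quoted cluster price divided by the series count in thousands. A minimal sketch using only the numbers quoted in this thread (the helper name `cost_per_1k` is made up for illustration):

```python
def cost_per_1k(monthly_price_usd, series_count):
    """Monthly price divided by the number of series, in thousands."""
    return monthly_price_usd / (series_count / 1000)


# Figures as quoted in this thread, not independently verified.
vm_cluster = cost_per_1k(1300, 250_000)  # $1300/month for 250k metrics
oodle = 1.0                              # $1 per 1k metrics
ratio = vm_cluster / oodle               # how many times more expensive
```

With those inputs, `vm_cluster` comes out to 5.2 dollars per 1k metrics and `ratio` to 5.2, which is the "5x" above after rounding.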

RE: Pricing dimensions, we've simplified our pricing by indexing on a single dimension. We don't require our customers to choose machine type, RAM, CPU etc. There are other limits but for the most part, they don't matter so much in our pricing.

RE: Pricing tiers, for anything more than $30-50k/year (>5M time series / hour), companies usually talk with someone for volume discounts rather than go with online pricing.


Ah I see, I use single-node.

Although I see for C.XLARGE.HA they go down to $0.56 per 1k time-series per month, just not for the smallest instance. So looks like VictoriaMetrics can be 2x less expensive than Oodle on large scale.


It's not a fair comparison. You can't compare the base-tier pricing of the Oodle offering with the high-scale tier of VictoriaMetrics. As you scale, volume benefits kick in, just as they do for VM. It's common wisdom that enterprise companies don't swipe credit cards beyond 50k/year. We request our customers to speak with us when they get to high-scale tiers.

250k metrics - Oodle is 5x cheaper, < 5M metrics - Oodle 2-3x cheaper, beyond 5M - talk to us (our pricing will be competitive and will be better)


Yep, all true.

$0.253/h/1M = $0.182 per 1k metrics per month.
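The hourly-to-monthly conversion above works out if you assume a 720-hour (30-day) month, which is my assumption since the comment does not state it. A small sketch (the helper name is hypothetical):

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours


def monthly_cost_per_1k(hourly_usd_per_1m):
    """Convert $/hour per 1M series into $/month per 1k series."""
    monthly_per_1m = hourly_usd_per_1m * HOURS_PER_MONTH
    return monthly_per_1m / 1000  # 1M series = 1000 groups of 1k


quoted = monthly_cost_per_1k(0.253)  # roughly 0.182
```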

As suggested in an earlier message, this pricing is only for a single node. IMO, this needs to be compared to cluster mode: any company with meaningful scale will need reliability and needs to run in cluster mode, since you’d need your observability systems to be up and running when the rest of your systems are down.

Why is the primary sales call to action that it's serverless if it's a hosted solution? Who cares.


Because at the early stages it’s really important to talk to customers.

This also helps find users for whom this is a huge pain point - metrics costs are so high that they’d love to talk to someone and complain about the problem.


“fully-managed, cheap metrics, ideal for serverless applications”


We leverage a serverless and S3-based architecture for much lower costs. However, it applies to any application, not just serverless applications.


Your costs and deployment pattern are your problem, customers don’t care about them.

Saying it is serverless means nothing to customers unless the serverless aspect applies to them, which in this case it doesn’t. If you’re only selling access to your product for a fee, then whether it’s serverless or not, customers couldn’t care less.


That's fair.

We actually started with "reduce your costs by 3-10x with infinite scalability" without talking about storing in s3 and serverless (along the lines of your thinking - only talk about benefits + what you do). But our users, engineers by their very nature, were skeptical about how Oodle works, so we ended up settling on a combination of why, what and how. This resonated better with our early customers and prospects.


Not to be confused with Oodle[1]

[1] https://www.radgametools.com/oodle.htm


[flagged]


Our P99 query latency is under 3 seconds. We have tested up to 100M unique time series / hour, and the architecture can scale up to a billion time series / hour. To get a feel for the performance at high scale, give us a try at https://play.oodle.ai


[flagged]


With our custom columnar format and indexes, we are able to filter down to the relevant data files where a high-cardinality column is present. This helps us keep queries fast for high-cardinality labels as well, allowing us to quickly drill down on specific labels like pod_id/cluster_id/customer_id.
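One way to picture this kind of file pruning (purely a sketch under my own assumptions, not the actual index format): keep a small per-file index of which label values each data file contains, and only open the files that can match. `FILE_INDEX` and `files_to_scan` are hypothetical names.

```python
# Hypothetical per-file index: label name -> set of values present in that file.
FILE_INDEX = {
    "file-1": {"pod_id": {"pod-a", "pod-b"}},
    "file-2": {"pod_id": {"pod-c"}},
    "file-3": {"pod_id": {"pod-a", "pod-d"}},
}


def files_to_scan(label, value):
    """Return only the files whose index says they contain label=value."""
    return sorted(
        name
        for name, index in FILE_INDEX.items()
        if value in index.get(label, set())
    )
```

A query for pod_id="pod-a" would then touch only file-1 and file-3, so the work grows with the number of matching files rather than with the total cardinality of the label.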



