Show HN: Spice.ai – materialize, accelerate, and query SQL data from any source (github.com/spiceai)
177 points by lukekim 11 months ago | 49 comments
Hi HN, we're Luke and Phillip, and we're building Spice.ai OSS - a lightweight, portable runtime, built in Rust and powered by Apache DataFusion, to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.

Phillip and I first introduced Spice on Show HN in September 2021. Since then, we’ve been schooled and humbled in every way building 100TB+ data and ML systems for the https://spice.ai cloud platform. Along with our customers, we struggled with getting fast, low-latency, high-concurrency SQL query within a budget, accessing and combining data from many sources, trade-offs between OLTP/OLAP compute engines, and managing datasets as code.

Today, we’re re-launching Spice, completely rebuilt from the ground up, to directly solve several of the problems we had accessing data quickly and providing it cost-effectively to applications, dashboards, and machine learning. Spice provides federated SQL query across databases (MySQL, PostgreSQL, etc.), data warehouses (Snowflake, BigQuery, etc.) and data lakes (S3, MinIO, Databricks, etc.) with the ability to materialize remote datasets locally using in-memory Arrow, DuckDB, SQLite, or PostgreSQL. Accelerated engines run in your infrastructure, giving you flexibility and control over price and performance.

You can read the full announcement blog post at https://blog.spiceai.org/posts/2024/03/28/adding-spice-the-n....

We’d appreciate it if you check Spice out, give us feedback, and if you'd like to contribute, we'd love to build with you.

Thanks!

GitHub: https://github.com/spiceai/spiceai




Any sense of comparison to Dremio, which helped steward the Arrow ecosystem for doing this kind of thing?

(The idea is great fwiw, I've been following them one-off for years, and we have to do elements of these things in how we build louie.ai and Graphistry for the GPU equivalent. Real pain point!)


Dremio is awesome. We've followed the Dremio journey from one of Jacques' original talks a couple of years back. Dremio's idea of caching tiers and reflections is powerful for performance.

Spice takes it further and provides flexibility for materialization, giving you full control, down to the dataset level, over where that materialization exists (same machine, same pod, same network, same cluster, same region, etc.), what engine/processing it uses (OLTP - SQLite/PostgreSQL, OLAP - DuckDB/Arrow), and what tier it is stored on (in-memory, attached NVMe, etc.).


That's awesome! I'll definitely give it a try if there's a suitable scenario.


Thanks! Feedback and GitHub issues welcome!


Looks great! Is flightsql supported over the wire too, so one could hook it up to grafana? Any plans to support iceberg?


Yes! It can connect to FlightSQL compatible servers (see https://docs.spiceai.org/data-connectors/flightsql ) and it's also a FlightSQL compatible server.


We also have a Grafana plugin we'll continue to improve to make it super easy to connect to Grafana, and Spice has a metrics endpoint and example Grafana dashboard for monitoring itself https://github.com/spiceai/spiceai/blob/trunk/monitoring/gra...


And yes, Iceberg is very high up on our list


Hey guys - how does this compare to cube?


I'm not too familiar with https://cube.dev/ - but my initial impression is they are focused more on providing APIs backed by SQL. They have a SQL API that emulates the PostgreSQL wire protocol, whereas Spice implements Arrow and Flight SQL natively. Their pre-aggregations are a similar concept to Spice's data accelerators. It also looks like they have their own query language, whereas Spice uses native SQL.


Interesting one. Any plans for clickhouse data connector?


Yes, it's on the backlog and we'll prioritize it as we see demand, as with https://github.com/spiceai/spiceai/issues/999.


Congratulations. Is this similar to Trino/Starburst, Drill?


Thank you!

Yes, in terms of federated queries, there are similarities, but Spice is designed to be much smaller, faster, and more lightweight (single-binary, 140MB) so you can run it next to your application as a sidecar, or eventually even in the browser. Spice also gives you more options and flexibility for materialization, so you can choose where and how to store local materialized data.


Congrats on the launch! This is exciting. The video demo is awesome: https://youtu.be/AZyrecVWnEs?si=j7JVKhhcUor1_y-f


Thank you!


Congrats Luke & Phillip, exciting day!


Do you support subqueries and joins?


Spice supports what DataFusion supports, so generally yes, but there is still work to do to push down more queries to TableProviders. For example, joins within a single source are not yet pushed down to the underlying provider.

You can write a single query across many data sources, which is what we show in the demo in the GitHub repo.


There is an effort within DataFusion to support pushing down joins across tables from the same remote provider that we will likely contribute to as well: https://github.com/datafusion-contrib/datafusion-federation
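
To make the pushdown idea concrete, here is a toy sketch of the decision a federating planner has to make. It is not DataFusion's or datafusion-federation's API; `Table`, `plan_join`, and the source names are hypothetical and exist only for illustration. The assumption is simply that two tables on the same remote provider can be joined by that provider, while tables on different providers have to be scanned separately and joined locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    source: str  # hypothetical identifier for the remote provider


def plan_join(left: Table, right: Table, on: str) -> dict:
    """Decide where a two-table equi-join would run in this toy planner."""
    if left.source == right.source:
        # Same provider: send one SQL statement and let the source do the join.
        sql = (
            f"SELECT * FROM {left.name} JOIN {right.name} "
            f"ON {left.name}.{on} = {right.name}.{on}"
        )
        return {"strategy": "pushdown", "remote_queries": {left.source: [sql]}}
    # Different providers: scan each table remotely, then join in the local engine.
    return {
        "strategy": "local_join",
        "remote_queries": {
            left.source: [f"SELECT * FROM {left.name}"],
            right.source: [f"SELECT * FROM {right.name}"],
        },
    }
```

Without pushdown, every join takes the second path, so both tables cross the network in full even when they live in the same database.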


Congratulations on the launch!!


Congrats on the launch, team!


This looks great - I've been meaning to dig into Rust - seems like a solid choice for you.


Wow, looks promising


This looks awesome!


Looks great. Going to try this out.


So great to see another project built on DataFusion!


Thanks Andrew! I'm looking forward to contributing back to DataFusion as well.


Very cool!

One thing to keep in mind:

DuckDB can directly query Parquet files (and many other file types[1]), MySQL, PostgreSQL[0], and SQLite. So if you're in need of something like this, DuckDB on its own might work for your use case.

0 - https://duckdb.org/docs/extensions/postgres

1 - https://twitter.com/thisritchie/status/1767922982046015840


Yes, we're huge fans of DuckDB, Mark, Hannes and the team.

What we've found is that sometimes you want to materialize data in an OLTP DB. Spice gives you the choice to store some datasets in DuckDB and some in something like SQLite/PostgreSQL and join them together in a single SQL query, so you can get the best of both worlds.


DuckDB can both read from and write to PG. What exactly is the use case you are unlocking?


DuckDB is awesome. As an OLAP columnar-store database it excels at certain operations, like aggregations. If your use-case is row-based lookups where an OLTP database would perform better, you now get a choice of engine, while still having a single place to access your data from your app.
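
A toy illustration of why the engine choice matters. This is a simplification in plain Python, not how DuckDB or SQLite actually store data, and the table contents are made up. The same three records are held once row by row and once column by column.

```python
# Row-oriented layout: one record per entry, keyed by primary key (OLTP-style).
rows = {
    1: {"id": 1, "region": "us", "amount": 10.0},
    2: {"id": 2, "region": "eu", "amount": 25.0},
    3: {"id": 3, "region": "us", "amount": 5.0},
}

# Column-oriented layout: one list per column (OLAP-style).
columns = {
    "id": [1, 2, 3],
    "region": ["us", "eu", "us"],
    "amount": [10.0, 25.0, 5.0],
}


def lookup_row_store(pk):
    # A point lookup touches exactly one record.
    return rows[pk]


def lookup_column_store(pk):
    # A point lookup must find the position, then read every column at it.
    i = columns["id"].index(pk)
    return {name: values[i] for name, values in columns.items()}


def total_row_store():
    # An aggregate has to visit every full record to read one field.
    return sum(r["amount"] for r in rows.values())


def total_column_store():
    # An aggregate scans a single column and ignores the rest.
    return sum(columns["amount"])
```

Both layouts return the same answers. The difference is how much data each operation has to touch, which is why lookups tend to favor the row layout and aggregations the column layout.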

Originally, we only supported DuckDB in our cloud product Spice Firecache, but actually lost a customer because their use-case was optimized for an OLTP DB. Now, you can get a choice... down to the dataset level and still be able to join across them in a single query. With Spice, you can load both SQLite and DuckDB together in the same process for local materialization and acceleration.

Finally, Spice OSS does more than just data query. You can read about the vision to power AI-driven applications by co-locating data with models at https://docs.spiceai.org/intelligent-applications.


> If your use-case is row-based lookups where an OLTP database would perform better, you now get a choice of engine, while still having a single place to access your data from your app.

My understanding is that if you run some SQL in DuckDB against PG using the extension, say select * from t where id = 2; it will perform the actual lookup on the PG server, but the results will be accessible in DuckDB.

> With Spice, you can load both SQLite and DuckDB together in the same process for local materialization and acceleration.

You can do this in any Python, Java, C++, or other program.


You're right, and that might be a good choice if you wanted to deploy and operate an additional PostgreSQL server locally.

## Using DuckDB:

app -> duckdb -> network -> remote postgres (data) | local postgres (materialization)

## Using Spice:

app -> localhost gRPC/HTTP -> [Spice <duckdb|sqlite>] -> network -> [postgres|S3|snowflake|etc]

In addition, Spice manages the materialization for you. In the DuckDB-only case, you'd have to do a COPY FROM [remote postgres] to [local postgres] manually every time, and manage the data lifecycle yourself. That gets even more complicated if you want to do append or incremental updates of data to your local materialization.
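
As a rough sketch of the lifecycle work described above, here is what a hand-rolled full refresh and append refresh might look like. This is not Spice's implementation: two in-memory SQLite databases stand in for the remote source and the local materialization, the `events` table and function names are hypothetical, and the append path assumes ids only ever increase and rows are never updated or deleted.

```python
import sqlite3


def make_db():
    # Stand-in for either side; a real remote would be PostgreSQL, S3, etc.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return db


def full_refresh(remote, local):
    """Replace the local copy with everything currently in the remote table."""
    rows = remote.execute("SELECT id, payload FROM events").fetchall()
    with local:  # delete and reload commit together as one transaction
        local.execute("DELETE FROM events")
        local.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return len(rows)


def append_refresh(remote, local):
    """Copy only rows newer than the highest id already materialized."""
    (high_water,) = local.execute(
        "SELECT COALESCE(MAX(id), 0) FROM events"
    ).fetchone()
    rows = remote.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (high_water,)
    ).fetchall()
    with local:
        local.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return len(rows)
```

Even this small version has to track a high-water mark and decide when to run each refresh. Handling updates, deletes, or retention adds more bookkeeping on top.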


DuckDB is an in-process DB similar to SQLite - so every application in your stack would need to embed it. Spice is a binary that has Flight SQL and HTTP query endpoints - so multiple applications can connect to it from any language.


> Today, we're re-launching Spice...

  Obtaining blockchain and smart-contract data is hard ... Spice makes it easy.
http://web.archive.org/web/20220414105622/https://docs.spice...

A slight detour from the company's original vision (https://archive.is/88IoQ)?


Actually, we posted the original vision in Sep 2021 at https://blog.spiceai.org/posts/2021/09/07/introducing-spice.... for AI-driven applications and discussed needing a good source of data at https://blog.spiceai.org/posts/2021/12/05/ai-needs-ai-ready-....

We believe blockchain data is one of the most interesting time-series datasets to work with in developing an AI-driven application platform, because it's continuous, well-structured, has many applications, and is open to index. Regardless of views on crypto, from a purely technical/data feed perspective, it's quite useful for testing time-series systems.


> original vision ... https://blog.spiceai.org/posts/2021/09/07/introducing-spice....

Agree. I shared the same blog link (via archive.is) in the comment you replied to.

Detour into blockchain, even if warranted, might have been a distraction? Either way, congratulations on this relaunch.


Thank you! Much appreciated.


Wow, thanks for pointing this out.

I have a short circuit for whenever I see the B word, or anyone still pushing this smart-contract nonsense that hasn't been used in serious real-world projects with legal repercussions in the 10+ years this technology has existed.



My first thought when I saw your post pointing out the congratulations comments was one of the ending scenes from Neon Genesis Evangelion where they say congratulations repeatedly.

https://youtu.be/oyFQVZ2h0V8?si=oOYSIjVmpJK6mwft

Regardless, this is very spammy marketing.


If you think someone is posting abusively, email the mods. You've seen the thing about not-posting shillage insinuations in the site guidelines.


I posted a straight fact with the links to prove it. You are making that connection from the information I gave you.


https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

It's an accusation of abuse. Those go to hn@ycombinator.com, not in the threads where they are meta noise.


I posted true, verifiable information about this thread (that a lot of people voted up). I didn't accuse anyone of anything, people are smart, they can make up their own minds.


This isn't some complicated thing, there's a site guideline specifically about it and you should try to stick to it like everyone else because it trashes the forum. You can just mail this stuff in.


You can mail to mods too if you think something is wrong with parent comments?


People have friends who want to support them



