This one might be more of a “Show HN” because it’s a pre-release - something that you might want to try yourself or contribute to. The GitHub repo is here: https://github.com/supabase/wrappers
For context, Postgres has Foreign Data Wrappers (FDW) [0]. These allow you to connect one Postgres database to another, and then query the secondary database directly in the first.
CREATE FOREIGN TABLE other_database_table (
id integer,
title text
)
SERVER other_database;
The data does not need to be “moved” to the primary database - it stays in the secondary and is fetched when you run a select query:
select * from other_database_table;
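For completeness, the server and user mapping that the foreign table above depends on look roughly like this (the host, database names, and credentials here are placeholders):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Connection details for the secondary database (placeholder values).
CREATE SERVER other_database
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'db.example.com', port '5432', dbname 'other_db');

-- Credentials the local role uses on the secondary database.
CREATE USER MAPPING FOR CURRENT_USER
  SERVER other_database
  OPTIONS (user 'remote_user', password 'remote_password');
```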
Our release today is a framework which extends this functionality to other databases/systems. If you’re familiar with Multicorn[1] or Steampipe[2], then it’s very similar. The framework is written in Rust, using the excellent pgx[3].
We have developed FDWs for Stripe, Firebase, BigQuery, Clickhouse, and Airtable - all in various pre-release states. We'll focus on the systems we’re using internally while we stabilize the framework.
There’s a lot in the blog post about our goals for this release. It’s early, but this is one of the things I’m most excited about.
The blog post, repo and your comment still leave me a little confused about what Supabase's wrappers project is, the intent and its current state.
So this "wrappers" project is just a framework that simplifies the development and manages the curation and distribution of Postgres Foreign Data Wrappers? At the end of the day, are these just FDWs that run on top of any Postgres instance? Do I have to run Supabase to take advantage of the FDWs created through this framework?
Additionally one of your examples is Snowflake, but I don't see a matching SF FDW in your repo? Is there a generic JDBC/SQL/etc FDW for any SQL based database like Snowflake, Oracle, etc? Or is this just to create a spark in someone's mind (guilty!) about what is possible and yet to be implemented?
Despite my confusion, this sounds like a very exciting project to follow.
| At the end of the day, are these just FDWs that run on top of any Postgres instance?
Yes, the FDWs developed by Wrappers can be used on any Postgres db.
| Do I have to run Supabase to take advantage of the FDWs created through this framework?
No, you don't need to. Those are just normal FDWs that can be used on any Postgres db.
| Is there a generic JDBC/SQL/etc FDW for any SQL based database like Snowflake, Oracle, etc?
The Snowflake FDW isn't developed yet - the example is just a concept demo. There are ClickHouse and BigQuery FDWs, although they are not JDBC based. A generic JDBC FDW would need an embedded JVM, so it's not planned at this moment.
I looked into using a postgres FDW years ago when I was trying to come up with an easy way to ELT data out of Mongo into a DB. Effectively using Postgres as the ETL tool. Understanding where to find FDWs, who maintains them, etc... was overwhelming. I can see the merits of this framework that you're building, and def going to keep an eye on it.
Also always excited to hear about Rust being used.
I love the pragmatism behind tools like osquery and steampipe that expose a lot of APIs as database tables. It makes these datasets available to non-programmers that are more comfortable with a database/tabular format.
Is it fair to say though, that FDWs have to run as compiled shared libs? I've always wondered if there can be (like with VS code and language servers) a protocol where we can run a generic FDW that speaks a particular API over the network, and then you can build out-of-process connectors to it and just have to know the usual tools (HTTP, JSON, etc).
Thoughts? Maybe this already exists.
Then anyone could potentially build a bespoke API/dataset and just point an FDW to it.
it's an interesting idea, and one of the things that we were toying with in our pg_net extension (https://github.com/supabase/pg_net). This is a "generic" async network extension, so you can fetch/put/post. It works well for APIs.
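A rough sketch of the pg_net flow - fire an async request, then read the response out of a table once it completes (check the pg_net README for exact signatures; treat this as illustrative):

```sql
-- Queue an async GET; returns a request id immediately.
SELECT net.http_get('https://api.example.com/items') AS request_id;

-- Once the request completes, the response shows up here.
SELECT status_code, content
FROM net._http_response
WHERE id = 1;  -- the request_id returned above
```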
I think the generic approach works for some things where the data is less "fixed" - for example, an OpenAI API endpoint.
But for "fixed" data (data warehouses), the wrapper usually needs some custom work for security, protocols, and "push down". I'll be interested to get HN's take on this - they might have some suggestions for this framework.
I don't think so... in your example those are functions we just call in SQL right? I'm talking about exposing arbitrary data as a table like an FDW, but without having to compile a whole module.
So more like, (this is a bit contrived but bear with me):
Let's say I wanted to make a table that lists all my keybase proofs (so sad Keybase is basically dead, but I digress).
I'd build a graphql endpoint where you can query those proofs.
Then I'd have to define a metadata endpoint that maybe returns a JSON schema?
Then you can tell the FDW to define a virtual table that (as much as possible) uses the JSON schema to generate columns in this virtual table.
You'd define the FDW to say: here's the virtual table (like you'd do now), then put in an option that says "you'll get the column mappings for this table from the JSON schema returned by this metadata endpoint", and then tell the FDW "send select queries to this REST/graphql endpoint" (which, in our example, would be something that returns keybase proofs).
I wouldn't need to build a whole new keybase FDW. I'd just use the generic one to hit that existing endpoint.
Is that making sense? It's kind of a dynamic FDW that can be configured for the long tail of API endpoints.
> define a metadata endpoint that maybe returns a JSON schema
We could use an OpenAPI spec[0], which many REST services expose as a JSON endpoint. I think that could work: develop a generic "OpenAPI" FDW, which you can connect like this:
create server example_api
foreign data wrapper example_api_wrapper
options (
open_api_url 'https://api.example.com/'
);
Then we'd map:
GET -> SELECT (which would map selected columns into a table)
POST -> INSERT
PATCH -> UPDATE
DELETE -> DELETE
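A foreign table on that server might then look something like this - entirely speculative, since the OpenAPI wrapper doesn't exist yet (the columns would be generated from the spec's schema for the resource):

```sql
-- Speculative: columns derived from the OpenAPI schema for /customers.
CREATE FOREIGN TABLE api_customers (
  id text,
  email text,
  created timestamptz
)
SERVER example_api
OPTIONS (object 'customers');  -- SELECT -> GET https://api.example.com/customers

SELECT id, email FROM api_customers LIMIT 10;
```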
So much great postgres stuff coming out from supabase. I'm really impressed with how much focus and direction the team has. So many companies would try to broaden their product/feature suite, but fail at understanding what made the core successful and stretch the company in many directions. It really seems like they are just making themselves better and better experts at postgres, and how to use it well. Really looking forward to seeing where this is going.
On a sidenote, aren't the wrappers for Airtable, BigQuery and ClickHouse open sourced? Or why did they skip that column in the second table?
Is all of supabase open sourced now? I mean, I heard something about Functions not being open sourced, but I could be remembering wrong. Most stuff is on GitHub, I see.
The `self-hosted` column is only missing from the "under development" wrappers in the blog post's table because those are not production ready and shouldn't be self hosted (yet).
> Is all of supabase opensourced now?
Yep! The whole stack is open source and can be self hosted
I'm excited about Wrappers because of an idea born in the GraphQL community: using GraphQL as the point of integration for many different services in your app. A use case GraphQL seemed made for. It's not without problems, though, such as when you need data from one of those services inside your database. So this takes that idea and moves it one level down into the database. You still get the benefits of a unified API (through REST or GraphQL) without the same limitations. I think there couldn't be a more ideal point of integration!
disclaimer: I am a Supabase employee, but these views are my own.
We (https://wundergraph.com) do exactly this, API integration with GraphQL as the integration layer. Integrating APIs is messy. It could require custom middleware, etc... Gluing something together is the easy part. How do you mock it? How do you test it? Add custom logic? Doing it at the API layer allows you to seamlessly scale your serverless workloads. Doing all this work in the database means that we're eating up resources for tasks that the database is not designed to do. Just because FDW exists doesn't mean we should use it for this use case.
Stripe and Firebase both offer test endpoints for their services. The foreign data wrappers for each allow subbing in a user defined endpoint in their `options`, so that's how I'd recommend testing the two that have released. Some of the pre-release integrations can be spun up locally in Docker, e.g. ClickHouse. The FDWs for those similarly take an arbitrary connection string, making it pretty straightforward to write tests. Here's an example from the repo https://github.com/supabase/wrappers/blob/5fac8afb62e6e8362b...
> Add custom logic?
Views!
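i.e. custom logic can live in a plain view layered over the foreign table (column names here are illustrative, not the actual Stripe wrapper schema):

```sql
CREATE VIEW recent_customers AS
SELECT
  id,
  email,
  -- derived column: flag customers created in the last 90 days
  (created > now() - interval '90 days') AS is_recent
FROM stripe.customers
WHERE email IS NOT NULL;
```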
> tasks that the database is not designed to do
The C API for foreign data wrappers is baked right into Postgres proper. They've been around since 2013 and are pretty battle hardened at this point. Supabase Wrappers exposes that API in Rust so users can write FDWs without worrying about accidentally tanking the Postgres process. It's more about making a great Postgres feature more accessible than tricking the database into doing anything new.
another possibility for application developers - mocking can be done at the "application" level. For example, using something like Jest, or Supabase can build mocking into client libs.
I'm not clear how mocking in the middleware is materially different from this approach (except perhaps you are testing network connectivity too).
I love Supabase and what they're doing! I evaluated them heavily when designing architecture for a healthcare product.
I'm not sure about this one though - rust is a great systems language, but it wouldn't be my first choice for bridging the db <-> api gap.
I wonder why this wasn't built on top of, or an enhancement to, the existing (excellent) multicorn[1] project. Python seems like a better choice of language for dealing with io bound problems like talking to remote APIs.
Multicorn is mature, stable, well tested and has a ton of FDW implementations already.
The dynamic nature of python simplifies the development/debug cycle, and serialization to/from JSON is easier than in any mainstream language except for javascript.
I'd love to understand more about the technical rationale that drove this.
> I wonder why this wasn't built on top of, or an enhancement to, the existing (excellent) multicorn[1] project
Have to agree with you there, multicorn is extremely cool. I'm a big sqlalchemy fan so their default SQLA wrapper was a killer feature to give up (although maybe we could do something similar with launchbadge/sqlx[1]). We investigated using multicorn early this year and had a few hiccups. Activity on the original repo[2] quieted way down ~3 years ago. For example, pg14 support hasn't landed and the newest supported python version is EOL in 2022. There is a new fork[3] with pg14 support (15 in the pipe) that might pick up in adoption, but that's still TBD.
Supabase aims to support new major Postgres versions within 2-3 months so we have to be very careful taking on dependencies that might slow that process.
> I'd love to understand more about the technical rationale that drove this.
Architecturally, multicorn has postgres communicate with a separate python process on the host that does all the hard work. That's convenient, but it can bloat over time and/or be memory hungry for larger result sets. The rust implementation runs in-process and is generally a lot lighter.
Currently I'd say supabase/wrappers is a safer/easier version of the C API vs a direct analog to multicorn. Over time I think we'll see that comparison become more appropriate. There's a lot of excitement around the concept internally and we've already been floating some ideas wrt `auto-mapping` tables for common SQL dialects, a generic JSON HTTP API wrapper, etc. Stay tuned!
Sorry I couldn't figure this out from the docs, but Stripe data is queried "live" from Stripe, right? The abstraction is great, but won't this lead to unexpected N API calls when joining across my domain + Stripe?
> unexpected N API calls when joining across my domain
I'm not sure why they would be unexpected (because it should displace some other API calls). I'll hazard a guess that you're worried about fetching the same data multiple times? If that's the case, then yes, you should materialize the data into your database.
The Wrapper itself handles pagination, but you'd also want to make sure you're using filters & limits in your SQL statements.
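i.e. prefer queries the wrapper can translate into API-side filtering, rather than pulling everything and filtering locally (illustrative columns):

```sql
-- The wrapper can turn this filter + limit into API query parameters
-- instead of paginating through the whole customer list.
SELECT id, email
FROM stripe.customers
WHERE email = 'someone@example.com'
LIMIT 1;
```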
Thanks for the reply. I'll flesh out my thought process in case it's helpful. My immediate reaction was excitement about the abstraction. An example use case is joining my users to their corresponding Stripe Customers in SQL. The kinds of queries I can reasonably write depend on implementation details of the connector. For example, if Stripe has a bulk customer lookup (list of customer IDs -> Customers), and the connector uses it, I can estimate I'd be able to query on the order of 500 Users at a time in a performant way. But if the API only supports looking up one customer at a time, that 500 User query kicks off 500 API requests, which isn't going to work.
You're right -- it's not unexpected -- maybe more like a leaky abstraction.
I understand now, and this is a similar problem to how some GraphQL engines work
I imagine you want to do something like:
select
*
from
public.users join stripe.customers
on
public.users.stripe_id = stripe.customers.id
limit 100;
Then yes, it might make 100 consecutive calls to your stripe account. There are 3 options here:
1. Materialize your customers into your database (like I mention in the previous comment)
2. We build a "smart" FDW, so that it parses your query and fetches the Stripe data first, then performs the join.
3. Use a CTE:
with
customers as (
select * from stripe.customers
),
users as (
select * from public.users
)
select
*
from
users join customers
on
users.stripe_id = customers.id
The only option that Postgres's materialized views give you is to "refresh" the view, meaning: re-run the query used to define the view, fetching an entirely new dataset, and use that to replace all the existing data in the view.
There's no facility for deltas, or any other way of propagating changes.
Edit: And to answer your original question, yes you can absolutely base the materialized view on data obtained from a foreign table.
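Concretely, the refresh cycle looks like this (assuming a foreign table stripe.customers already exists):

```sql
-- Snapshot the foreign data locally.
CREATE MATERIALIZED VIEW local_customers AS
SELECT * FROM stripe.customers;

-- Re-run the defining query and replace the entire dataset.
REFRESH MATERIALIZED VIEW local_customers;
```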
Yeah, this was the approach I was imagining before someone mentioned Materialized View (proper).
Once I heard those words mentioned, I began to imagine what sort of interaction an FDW might have with a foreign data source's operations log.
Of course, and depending on the foreign data source, you could probably even expose an operations log to Postgres via a FDW by defining it as a foreign table... Effectively opening up the possibility of eventually consistent replication. Lots of data stores' operations logs are accessible data collections/tables anyways.
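A sketch of that idea - everything here is hypothetical, including the shape of the oplog:

```sql
-- Hypothetical: the source store's operations log exposed as a foreign table.
CREATE FOREIGN TABLE remote_oplog (
  seq bigint,          -- monotonically increasing sequence number
  op text,             -- 'insert' | 'update' | 'delete'
  doc jsonb,           -- the affected record
  changed_at timestamptz
)
SERVER source_store;

-- Pull only the operations after the last sequence number we applied.
SELECT * FROM remote_oplog WHERE seq > 12345 ORDER BY seq;
```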
So I could implement one myself for arbitrary REST APIs?
I can't wait for somebody offering a generator, where you plug in your API, it pulls and parses the JSON, then you can select the fields you want and it generates the wrapper. (Alternatively, for put/post you could supply your own JSON).
Practically like some low/no code tools like Appsmith/Budibase and the likes already do today.
Sadly I lack the necessary skills and, more importantly, the time to dive into that.
Is there documentation on how to interact with an API for the wrapper? The API I'm probably going to try this with needs some crazy logic to parse the responses and does everything through query parameters.
I really want to love foreign data wrappers for Postgres and this seems like a big improvement over existing Python library, but the lack of support for them in managed databases services makes them a non-starter for so many use-cases.
Because RDS, for example, will only support the foreign data wrapper for reading from another "Postgres", what we really need is a server that supports the Postgres wire protocol (easier said than done) and you implement your drivers as a handler to that server.
Never tried this with RDS, but it's entirely possible to use postgres_fdw to interact with a foreign table on another postgres server, where you might have more choice about what extensions you run.
I'm doing this right now because I have a postgres installation that it's not yet convenient to upgrade beyond v12, but where I'd really like the benefit of the recently-improved JDBC FDW, which requires at least v13.
I'm eager to start using foreign data wrappers in PG, but there are close to zero benchmarks. Is my database going to break when transactions start piling up because an API I'm using is slow? Will my TPS drop significantly due to longer transactions? In other words, what am I paying for all this good stuff?
How well do foreign data wrappers work with Postgres's query planner? Doesn't the planner need to estimate the cardinality of results from each table? Are the Supabase wrappers providing this info even though it might be hard to get from the data source?
Also, is the Firebase connector for Realtime Database or Firestore?
This ends up being a way to hide an ETL pipeline behind a SQL query. Using this and some materialized views it makes it easy to just pull data in from an external source and just have it in your system.
I'm not sure on the value of this for customer facing production systems, but for internal reporting / product analytics this should make it really easy to pull in disparate datasets w/o having to spend eng time to keep each one running.
Pg newbie here, first time I'm reading about this. Super useful for internal tools indeed!
Could you point me to any tools or resources on achieving that (pulling data from several external databases into a single one for the purpose of analytics/aggregation)?
[0] Postgres FDW: https://www.postgresql.org/docs/current/sql-createforeigndat...
[1] Multicorn: https://multicorn.org/
[2] Steampipe: https://steampipe.io/
[3] pgx: https://github.com/tcdi/pgx