Show HN: Ingest data from your customers (Prequel YC W21)
40 points by ctc24 on March 15, 2023 | 13 comments
Hey HN! Charles here from Prequel (https://prequel.co). We just launched the ability for companies to import data from their customers’ data warehouses or databases, and we wanted to share a bit more about it with the community.

If you just want to see how it works, here’s a demo of the product that Conor recorded: https://www.loom.com/share/4724fb62583e41a9ba1a636fc8ea92f1.

Quick background on us: we help companies integrate with their customers’ data warehouses and databases. We’ve been busy helping companies export data to their customers – we’re currently syncing over 40bn rows per month on their behalf. But folks kept asking us if we could help them import data from their customers too. They wanted the ability to offer a 1st-party reverse ETL to their customers, similar to the 1st-party ETL capability we already helped them offer. So we built that product, and here we are.

Why would people want to import data? There are actually plenty of use cases here. Imagine a usage-based billing company that needs a daily pull from its customers of all the billing events that happened, so that it can generate accurate invoices. Or a fraud detection company that needs the latest transaction data from its customers so it can flag the fraudulent ones.

There’s currently no great way to import customer data. People typically solve this in one of two ways. The first is importing data via CSV. This works well enough, but it requires ongoing work on the part of the customer: they need to put a CSV together and upload it to the right place on a daily/weekly/monthly basis. That’s painful and time-consuming, especially for data that needs to be continuously imported. The second is making the customer write custom code to feed data into the company’s API. That requires the customer to do a bunch of solutions-engineering work just to get started using the product – a suboptimal onboarding experience.

So instead, we let the customer connect their database or data warehouse and we pull data directly from there, on an ongoing basis. They select which tables to import (and potentially map some columns to required fields), and that’s it. The setup only takes 5 minutes, and requires no ongoing work. We feel like that’s the kind of experience every company should provide when onboarding a new customer.
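
To make the flow concrete, here's a rough sketch (in Go, with invented names – not our actual API) of the kind of configuration a customer ends up producing: which tables to pull, and how their columns map onto the required fields:

    package main

    import "fmt"

    // ColumnMapping and TableImport are illustrative stand-ins for an import
    // configuration: a source column in the customer's warehouse mapped to a
    // field the receiving product expects.
    type ColumnMapping struct {
        SourceColumn string // column name in the customer's warehouse
        TargetField  string // field required by the receiving product
    }

    type TableImport struct {
        SourceTable string
        Mappings    []ColumnMapping
    }

    func main() {
        cfg := TableImport{
            SourceTable: "billing_events",
            Mappings: []ColumnMapping{
                {SourceColumn: "event_ts", TargetField: "occurred_at"},
                {SourceColumn: "amount_cents", TargetField: "amount"},
            },
        }
        fmt.Printf("import %s with %d mapped columns\n", cfg.SourceTable, len(cfg.Mappings))
    }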

Importing all this data continuously is non-trivial, but thankfully we can reuse about 95% of the infrastructure we built for data exports. It turns out our core transfer logic stays pretty much exactly the same; all we had to do was ship new CRUD endpoints in our API layer to let users configure their source/destination. As a brief reminder about our stack, we run a Go backend and a TypeScript/React frontend on k8s.
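
For a flavor of what one of those configuration endpoints could look like, here's a minimal, hypothetical sketch using Go's net/http – the route, fields, and in-memory store are illustrative only, not our actual implementation:

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
        "sync"
    )

    // SourceConfig is a hypothetical payload for registering a customer's
    // database as an import source. Field names are illustrative only.
    type SourceConfig struct {
        ID     string `json:"id"`
        Vendor string `json:"vendor"` // e.g. "postgres", "snowflake"
        Host   string `json:"host"`
        DBName string `json:"db_name"`
    }

    var (
        mu      sync.Mutex
        sources = map[string]SourceConfig{}
    )

    // createSource sketches a POST /sources handler: decode the config,
    // store it, and echo it back to the caller.
    func createSource(w http.ResponseWriter, r *http.Request) {
        if r.Method != http.MethodPost {
            http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
            return
        }
        var cfg SourceConfig
        if err := json.NewDecoder(r.Body).Decode(&cfg); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        mu.Lock()
        sources[cfg.ID] = cfg
        mu.Unlock()
        w.Header().Set("Content-Type", "application/json")
        _ = json.NewEncoder(w).Encode(cfg)
    }

    func main() {
        http.HandleFunc("/sources", createSource)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }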

In terms of technical design, the most challenging decisions we had to make are around getting databases’ type systems to play nicely with each other (kind of an evergreen problem, really). For imports, we let the data recipient specify whether they want to receive the data as a JSON blob or as a nicely typed table. If they choose the latter, they specify exactly which columns they’re expecting, as well as what type guarantees those should uphold. We’re also working on the ability to feed that data directly into an API endpoint, and on adding post-ingestion validation logic.
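
As a rough illustration of the typed-table option (invented names and types, not our code): the recipient declares the columns and types they expect, and incoming values get checked against that declaration before loading:

    package main

    import (
        "fmt"
        "time"
    )

    // ColumnSpec is an invented stand-in for a declared destination column:
    // the recipient says what the column is called and what type it must be.
    type ColumnSpec struct {
        Name string
        Type string // "string", "int64", or "timestamp" in this sketch
    }

    // validate checks a single incoming value against the declared type.
    func validate(spec ColumnSpec, v interface{}) error {
        switch spec.Type {
        case "string":
            if _, ok := v.(string); ok {
                return nil
            }
        case "int64":
            if _, ok := v.(int64); ok {
                return nil
            }
        case "timestamp":
            if _, ok := v.(time.Time); ok {
                return nil
            }
        }
        return fmt.Errorf("column %q: expected %s, got %T", spec.Name, spec.Type, v)
    }

    func main() {
        spec := ColumnSpec{Name: "occurred_at", Type: "timestamp"}
        fmt.Println(validate(spec, time.Now()))   // <nil>
        fmt.Println(validate(spec, "2023-03-15")) // type mismatch error
    }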

We’ve mentioned this before, but it bears repeating: we know that security and privacy are paramount here. We’re SOC 2 Type II certified, and we go through annual white-box pentests to make sure all our code is up to snuff. We never store any of the data on our servers. Finally, we offer on-prem deployments, so data never even has to touch our servers if our customers don’t want it to.

We’re really stoked to be sharing this with the community. We’ll be hanging out here for most of the day, but you can also reach us at hn (at) prequel.co if you have any questions!




So is this like Fivetran, except between clients as opposed to vendor-client?

If so, any idea why most data integration tools have not done this (or have they)? What is so tricky that they could not extend their tools to cover a customer's Postgres database?


Not sure if I'm understanding the analogy. The way I usually describe it is that it's like Census / Hightouch, but it's offered by the vendor as a first-party feature.

Let's take Salesforce as an example. Let's say they want to pull in data from their customers' databases -- maybe so that sales reps can keep track of how much volume the customer did in the last month -- instead of requiring the customer to instrument their code with Salesforce API calls. Salesforce could use this tool to connect directly to all of their customers' databases / data warehouses, regardless of whether they're Postgres, Snowflake, ClickHouse, etc.

As far as why it's non-trivial: you have to support a lot of different databases / data warehouses, which all have slightly different query languages, type systems, and optimizations. Then you've got to move the data reliably, dealing with things like eventual consistency etc. We feel like that's the reason this hasn't been built yet.
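
One common way to tame that sprawl – a sketch with invented names, not necessarily how we've structured it internally – is a per-vendor dialect interface that owns quoting and type mapping:

    package main

    import "fmt"

    // Dialect is an invented interface: one implementation per warehouse,
    // each owning its quoting rules and type mapping.
    type Dialect interface {
        QuoteIdent(name string) string
        MapType(sourceType string) string
    }

    type postgresDialect struct{}

    func (postgresDialect) QuoteIdent(name string) string { return `"` + name + `"` }

    func (postgresDialect) MapType(t string) string {
        switch t {
        case "timestamp":
            return "timestamptz"
        case "int64":
            return "bigint"
        default:
            return "text"
        }
    }

    func main() {
        var d Dialect = postgresDialect{}
        fmt.Println(d.QuoteIdent("last_updated_at"), d.MapType("timestamp"))
    }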


Is it a full refresh from the source each time, or is it incremental, and if incremental, what assumptions do you make or not make about keys, dupes, etc?


It depends -- mostly on whether the vendor (the company receiving the data) is comfortable requiring the source to map some fields.

For low volume cases, we can operate with zero mapping of fields. In those cases, we run every transfer as a full refresh.

If the volumes are higher, then we'll typically ask the source to expose a primary key and last_updated_at timestamp field. In those cases, we run incremental transfers. We use the last_updated_at to figure out what data to transfer, and the primary key to merge it into the destination table without creating dupes.
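
As a rough sketch of the shape of those incremental transfers (illustrative SQL and column names, not what we actually generate):

    package main

    import (
        "fmt"
        "time"
    )

    // incrementalSelect pulls only rows updated since the previous watermark.
    // The last_updated_at column name is an example.
    func incrementalSelect(table string, watermark time.Time) string {
        return fmt.Sprintf(
            "SELECT * FROM %s WHERE last_updated_at > '%s' ORDER BY last_updated_at",
            table, watermark.UTC().Format(time.RFC3339),
        )
    }

    // mergeStatement upserts the staged batch into the destination on the
    // primary key, so re-running a transfer never creates duplicate rows.
    // Columns (id, amount, last_updated_at) are invented for the example.
    func mergeStatement(dest, staging, pk string) string {
        return fmt.Sprintf(
            "MERGE INTO %s d USING %s s ON d.%s = s.%s "+
                "WHEN MATCHED THEN UPDATE SET d.amount = s.amount, d.last_updated_at = s.last_updated_at "+
                "WHEN NOT MATCHED THEN INSERT (id, amount, last_updated_at) VALUES (s.id, s.amount, s.last_updated_at)",
            dest, staging, pk, pk,
        )
    }

    func main() {
        lastSync := time.Date(2023, 3, 14, 0, 0, 0, 0, time.UTC)
        fmt.Println(incrementalSelect("billing_events", lastSync))
        fmt.Println(mergeStatement("billing_events", "billing_events_staging", "id"))
    }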


Thanks! Can you detect deleted rows as part of that?

Do you have support for the target table maintaining history via record effective and termination dates with a current record indicator, or do you just support maintaining current state at the target?

Can the target be a cloud filestore or old school SFTP site?


We can detect deleted rows for incremental transfers (and propagate those) if they're soft-deleted in the source, whether through a deleted_at column or an is_deleted column.
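
Roughly, one way to think of it (illustrative column names, not our actual generated SQL) is that the soft-delete pass is just another watermark query whose results become deletes at the destination:

    package main

    import "fmt"

    // softDeleteSelect finds rows flagged as deleted since the last watermark;
    // their ids then become deletes at the destination. Column names are examples.
    func softDeleteSelect(table, watermark string) string {
        return fmt.Sprintf(
            "SELECT id FROM %s WHERE is_deleted = true AND last_updated_at > '%s'",
            table, watermark,
        )
    }

    func main() {
        fmt.Println(softDeleteSelect("transactions", "2023-03-14T00:00:00Z"))
    }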

For now, we only support maintaining current state in the target.

Yup! We support all common cloud file storage as destinations (S3, R2, GCS, Azure Blob Storage) as well as vanilla SFTP servers.


So is the value-add that the customers of Company-A (who is your customer) entrust you with credentials to their databases versus entrusting Company-A with them directly?


That can be part of the value-add, though for on-prem deployments, we never touch the credentials ourselves.

Not to sound like a consultant, but there are three value-adds I'd call out:

1. Handling the dialect, types, and connection modalities of many different databases. This takes a lot of time to build and there's a lot of nuance that's non-trivial to work through.

2. Replicating data and guaranteeing data integrity + reliability. There's again a lot of nuance here, especially once you start considering that data is eventually consistent in most sources, that you want to transfer it as efficiently as possible, etc.

3. Providing a UX that end customers can use out of the box, so that the onboarding experience is clean and intuitive. We spend a lot of time thinking about how it makes sense for people to connect their data, so that our customers don't have to.

edit: fmt


If one side of your business is ingesting data, is the other side excreting it?


Pretty much! We also offer data exports.


The setup only takes 5 minutes,

Nothing ever takes 5 minutes. Remember these are engineers you're talking to here.


Ha, fair enough! We did our best to make the setup flow as yak-shaving-proof as possible, but there's no such thing as a guarantee.


Hey HN -- Conor here, aka the guy from the demo. Happy to answer any questions you have!



