Hacker News

The fundamental challenge of open-source ETL is that high-quality connectors require understanding and working around all kinds of corner cases in the API of each data source. It’s very hard to get open source contributors to do this kind of work; it’s a real slog. Hence at Fivetran we’ve always stuck with the commercial route.



Personally, the most infuriating thing about a tool is when I could fix the damn thing if I had the source code, but instead I have to go through the support staff to the engineering team and then wait for "this is on our roadmap but not something we're currently prioritizing". Right, I know. I don't expect other people to do work for me. I just need them to let me do the work myself.

The massive advantage of the OSS route isn't that you can ask the community to build a tool for you; it's that when you inevitably hit a corner case or some behaviour you want to encode, you can just make a RenesPostgres connector, copy in the Postgres connector's code, and fix it.

I don't understand why anyone keeps their source completely closed. Even one of those "you can't release this but you can edit it" licenses is better.

Half of why I use Kong as an API Gateway is that I can just edit the source code of their plugins. Thank fuck for that.


I think that if the wider open source community can maintain API client libraries for every imaginable SaaS API and every popular programming language, there's no reason that it can't maintain open source ELT connectors for all of these sources as well.

I work at GitLab as project lead of Meltano (https://meltano.com/), which embraces Singer instead of abandoning it. We've seen a lot of interest from data consultancies looking for mature tooling for deploying and developing Singer taps. Many of them have said they'd be happy to maintain open source ELT connectors for data sources their clients commonly use, if doing so lets them significantly cut the ELT costs that would otherwise be passed on to those clients.

Of course, only one data consultancy (or data team at a company) would need to maintain an open source tap, and others that need the same source for _their_ clients can contribute and help keep it up to date.


We couldn’t agree more that producing high-quality connectors requires a lot of work. The hardest part about this task is that connectors must evolve quickly (due to changes in the API, new corner cases, etc). The quality of the connector is not just how well the first version works but how well it works throughout its entire lifetime.

Our perspective is that by providing these connectors as open source we can arrive at higher-quality connectors. With a closed-source solution, a user has to go through customer service and persuade them that there is indeed a problem. A story we have heard countless times is that SaaS ETL providers are slow to fix corner cases discovered by users, leading to extended downtime. With an OSS solution, a user can fix the problem themselves and be back online immediately.

We proactively maintain all connectors, but we believe that by sharing that responsibility with the OSS community, we can achieve the highest quality connectors.

One of the main focuses of Airbyte is to provide a strong, MIT-licensed open-source standard for developing and testing connectors (base packages, standard tests, best practices…) in order to achieve the highest quality.


Similar thoughts (btw I came here looking for your comment ha!).

I believe you mentioned in one of the videos that at Fivetran, ensuring data integrity across all of the sources/integrations is your responsibility, and has been since the early days. That led customers to trust the product early on, and let the team learn by abstracting common patterns across sources.

I have come to believe that having explicit ownership of issues is THE MOST important thing whenever data physically moves across an org's ecosystem.


How do you determine this explicit ownership of issues? I've come across many governance problems linked to a lack of transparency in "bug ownership", but I've often failed to find common ground between clients and third parties: who's responsible? Who should pay for it?

Quite often it's the one with the loudest mouth or the biggest sponsor who wins.


A customer perspective from a mid-stage CFO: I like saving money, but prefer to pay for software solutions like this directly. I pay you, and you make sure this set of connectors {in and out} keeps working reliably. Meanwhile, our engineers can focus on building our product.


This is something we will definitely offer as well, with an SLA. And because maintenance is done not only by us but also by the community, fixes will propagate to all users much faster than if they had to go through customer support.

Open-source doesn't mean you can't have both. You can check how Databricks or Confluent are doing.


Couldn't agree more.

For me, work like ELT (https://fivetran.com or https://getcensus.com) is the type of work that no engineer in the world will get promoted for.

data|software|backend|platform|etc. engineers' time is better spent on something else.


Exactly!

Every engineer we've talked to wants it off their plate. That's why we believe it should be commoditized with an open-source standard.


Yeah, Fivetran often seems to have issues that come down to "we had a discussion with the data source/sink provider and found they had a bug in their latest release." Even if an open source contributor gets to that point, they won't have the leverage to strong-arm the provider into fixing the bug ASAP.


Fivetran has built all the custom OAuth flows for their 150 custom integrations, and you can build them into your own (internal or external) applications, which is neat. @goergewfraser When do you plan to add the ability to configure connectors that need extra config after the initial connection, e.g. choosing reports from Google Analytics?


That's an excellent point, and one that's hard to demonstrate until someone actually hits an edge case with their connectors. The main value of open-sourcing a framework for integrations (e.g. Singer) is that it lets customers easily support the large number of long-tail integrations out there.



