I understand Reverse ETL as a concept, but why does it require different software to solve? Aren't most ETL/ELT tools designed to move data from any source to any other source? We've got a pretty vanilla requirement of pulling data from CRM to data warehouse for reporting and then pushing rollups back to CRM. And we use the same tool for both flows.
There are many nuances to sending data from a data store to business tools that don't apply to ETL tools. For example, data warehouses are designed to take as much data, in whatever format, as you can throw at them. Business tools, on the other hand, have very custom payload/API formats, rate limits, and more.
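To make that concrete, here's a rough sketch (in Python) of what pushing rollups into a CRM-style API tends to look like; the endpoint, field names, and limits below are invented for illustration:

    # Unlike a warehouse bulk load, the destination dictates batch size,
    # payload shape, and request rate. All names here are hypothetical.
    import time
    import requests

    API_URL = "https://api.example-crm.com/v2/accounts/batch"  # made-up endpoint
    MAX_BATCH = 200          # many SaaS APIs cap records per request
    REQUESTS_PER_SECOND = 5  # a typical published rate limit

    def push_rollups(rows):
        for i in range(0, len(rows), MAX_BATCH):
            batch = rows[i:i + MAX_BATCH]
            payload = {"records": [
                {"Id": r["account_id"], "LTV__c": r["lifetime_value"]}
                for r in batch
            ]}
            resp = requests.post(API_URL, json=payload, timeout=30)
            resp.raise_for_status()  # a real tool would retry, not just fail
            time.sleep(1.0 / REQUESTS_PER_SECOND)  # stay under the rate limit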
From Wikipedia: extract, transform, load (ETL) is a three-phase process in which data is extracted, transformed (cleaned, sanitized, scrubbed), and loaded into an output data container. The data can be collated from one or more sources, and it can also be output to one or more destinations.
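A toy illustration of the three phases (not any particular tool's pipeline; the data and the "load" step are stand-ins):

    def extract():
        # Pull raw records from a source system
        return [{"email": " Ada@Example.com ", "amount": "42.50"}]

    def transform(rows):
        # Clean, sanitize, scrub
        return [{"email": r["email"].strip().lower(),
                 "amount": float(r["amount"])} for r in rows]

    def load(rows):
        # Stand-in for writing to the output data container
        for r in rows:
            print("INSERT INTO facts VALUES", r)

    load(transform(extract()))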
"Reverse" is only meaningful if you hold the narrower view that ETL can only load into a single destination (usually a data warehouse).
Just like the term "push notification", which describes a specific kind of communication from the server to a mobile device, when in the end it's the device calling a service to check whether any notifications are waiting for it.
A message queue, in other words...
You have been downvoted for this but you are entirely correct.
The differentiation in terminology is entirely a fiction created by marketing teams for these kinds of tools.
I'm saying this as someone who sees the benefit in this specific tool and wants to implement it for my team.
There is no reason that tools like Fivetran couldn't handle the same things that tools like Census do. They just focused on a specific set of use cases and ignored others that were more of a paradigm shift. Then, when other teams created products to fill this void, those products were labeled "reverse ETL" for entirely marketing reasons.
It's an example of how enterprise software gatekeepers such as Gartner drive negative value by deliberately confusing the language in order to sell their Magic Quadrant reports to risk-averse executives at massive corporations.
I have one very practical question, having hit frustrations pushing data from our data warehouse into Salesforce (I see you have a Salesforce adapter).
When you push data into Salesforce and the Salesforce API returns error messages indicating that some rows in the target can't be modified because they are busy, is your tool able to detect which rows failed to load and retry those rows N times (hopefully with exponential backoff, or perhaps a fixed delay between retries)? And if those retries fail, can it report back which rows/IDs never made it to Salesforce as permanent failures? (I haven't run across a tool that does this, but this was my pain point.)
We've engineered the platform to manage rate limiting, implement retry logic, and perform logging and monitoring of failures. Regarding Salesforce, I'm unclear about the specific issue you're referencing, but I'm eager to help. Could you initiate a discussion on our Slack channel? I will pick it up from there.
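For what it's worth, the retry behavior described in the question can be sketched roughly like this; push_batch is a hypothetical stand-in for whatever Salesforce client is in use (it returns (row, error) pairs, with error set to None on success), while UNABLE_TO_LOCK_ROW is Salesforce's actual error code for the "row is busy" case:

    import time

    TRANSIENT = {"UNABLE_TO_LOCK_ROW"}  # retryable row-lock error

    def sync_with_retries(rows, push_batch, max_retries=5, base_delay=2.0):
        pending, permanent = rows, []
        for attempt in range(max_retries):
            retryable = []
            for row, error in push_batch(pending):
                if error is None:
                    continue                        # loaded successfully
                if error in TRANSIENT:
                    retryable.append(row)           # busy row: try again later
                else:
                    permanent.append((row, error))  # don't retry, just report
            pending = retryable
            if not pending:
                return permanent
            if attempt < max_retries - 1:
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
        permanent.extend((row, "retries exhausted") for row in pending)
        return permanent  # the rows/IDs that never made it to Salesforce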
Cool! I'm working on something on the other side of the same pipeline. Real time data ingestion and identity resolution, delivering into a data lake.
Have you considered adding source connectors for S3-based data lakes? For example, Parquet files or Delta Lake? Maybe via AWS Athena, to make it similar to the Redshift connector?
Yes! An S3 source connector is on our roadmap and should be out soon. By connecting to S3 as a source and using Athena, data models can be created and synced to various business apps. Files in Parquet format are supported through Athena.
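For anyone curious what the Athena route looks like from the client side, here's a minimal sketch with boto3 (these are real Athena API calls, but the region, database, bucket, and query are placeholders):

    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    def run_query(sql):
        qid = athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": "analytics"},  # placeholder
            ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
        )["QueryExecutionId"]
        while True:
            state = athena.get_query_execution(QueryExecutionId=qid)[
                "QueryExecution"]["Status"]["State"]
            if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
                break
            time.sleep(1)  # poll while Athena scans the Parquet files in S3
        return athena.get_query_results(QueryExecutionId=qid)

    # e.g. a rollup model that could then be synced into a business app:
    results = run_query(
        "SELECT account_id, SUM(amount) AS ltv FROM events GROUP BY 1")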