I was recently researching ways of anonymizing production data for staging, and I also found existing tools either cumbersome to set up or lacking in features.
I stumbled upon clickhouse-obfuscator[1], and really liked that it worked on standalone dump formats (CSV, Parquet, etc.) rather than any specific DBMS. I think that's a great approach for this, since it keeps things simple and generic, and it can be conveniently added as a middle step in the backup-restore pipeline. Unfortunately, the tool is quite barebones, and has issues maintaining referential integrity, so we had to abandon it.
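To illustrate what referential integrity means for standalone dumps, here is a minimal sketch. It assumes a keyed hash (HMAC), which is not necessarily how clickhouse-obfuscator works, and the key, file contents, and column names are all hypothetical. The point is only that the same input value maps to the same pseudonym in every file, so joins between dumps keep working.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical secret; in practice it should never reach the staging environment.
SECRET_KEY = b"rotate-me"

def pseudonymize(value, key=SECRET_KEY):
    """Map a value to a stable pseudonym: same input and key give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

def anonymize_csv(text, columns):
    """Rewrite the named columns of a CSV dump and leave the other columns untouched."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in columns:
            if col in row:
                row[col] = pseudonymize(row[col])
        writer.writerow(row)
    return out.getvalue()

# Two hypothetical dumps that share a user_id key.
users = "user_id,email\n42,alice@example.com\n"
orders = "order_id,user_id,total\n1,42,9.99\n"

anon_users = anonymize_csv(users, {"user_id", "email"})
anon_orders = anonymize_csv(orders, {"user_id"})
```

Because both files are processed with the same key, the `user_id` in `anon_orders` still matches the one in `anon_users`. This does nothing to preserve value formats or distributions, which a real tool would also have to handle.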
This is still an unsolved problem in our team, so I'll keep an eye on your tool. We would need support for ClickHouse as well, so it's good you're planning support for other DBMSs. Good luck!
[1]: https://clickhouse.com/docs/en/operations/utilities/clickhou...