
Show HN: Format-preserving redaction for structured test data - lvh
https://github.com/latacora/wernicke
======
lvh
We have a lot of structured data at Latacora, most of which comes off of APIs and
appliances. You often want to build regression tests against it, but the real
data is sensitive. We built this tool to redact the data so that the exact
IPs, security groups, et cetera are replaced, but their replacements have the
same shape (e.g. IPs to IPs, MAC addresses to MAC addresses, random hex
strings to random hex strings of the same length...) and the _same_ IP
occurring more than once gets mapped to the _same_ replacement IP. This is
useful because e.g. I might care that two instances are in the same security
group, and that doesn't work if I just blindly randomize each one separately.
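
To make the idea concrete, here is a minimal Python sketch of shape-preserving, consistent redaction. It is not wernicke's actual code or API: the `Redactor` class, the `PATTERN` regex and the generator functions are all hypothetical names made up for illustration, and the three rules (IPv4, MAC, lowercase hex of 8+ characters) are an assumed, simplified rule set.

```python
import ipaddress
import random
import re

# Hypothetical rules, tried left to right at each position in the text.
# Only lowercase hex is handled, to keep the sketch small.
PATTERN = re.compile(
    r"(?P<ip>\b(?:\d{1,3}\.){3}\d{1,3}\b)"
    r"|(?P<mac>\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b)"
    r"|(?P<hex>\b[0-9a-f]{8,}\b)"
)

# Each generator returns a random value with the same shape as the original.
GENERATORS = {
    "ip": lambda rng, orig: str(ipaddress.IPv4Address(rng.getrandbits(32))),
    "mac": lambda rng, orig: ":".join(f"{rng.getrandbits(8):02x}" for _ in range(6)),
    "hex": lambda rng, orig: "".join(rng.choice("0123456789abcdef") for _ in orig),
}


class Redactor:
    """Replaces sensitive values, remembering each mapping so repeats agree."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._seen = {}     # original value -> replacement
        self._used = set()  # replacements handed out so far

    def _replace(self, match):
        original = match.group(0)
        if original not in self._seen:
            generate = GENERATORS[match.lastgroup]
            candidate = generate(self._rng, original)
            # Retry so distinct originals never share a replacement.
            while candidate in self._used or candidate == original:
                candidate = generate(self._rng, original)
            self._seen[original] = candidate
            self._used.add(candidate)
        return self._seen[original]

    def redact(self, value):
        """Walks nested dicts/lists and redacts every string it finds."""
        if isinstance(value, str):
            return PATTERN.sub(self._replace, value)
        if isinstance(value, dict):
            return {key: self.redact(item) for key, item in value.items()}
        if isinstance(value, list):
            return [self.redact(item) for item in value]
        return value


if __name__ == "__main__":
    sample = {
        "instances": [
            {"ip": "10.0.0.5", "sg": "sg-0123456789abcdef0"},
            {"ip": "10.0.0.7", "sg": "sg-0123456789abcdef0"},
        ]
    }
    print(Redactor().redact(sample))
```

Both instances in the sample keep a shared (but different from the original) security group ID, which is the property a regression test would care about. A single dictionary of already-seen values is the simplest way to get that consistency; it only holds within one `Redactor` instance, so redacting related files would need them to go through the same instance.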

It's extensible, but right now that involves editing code. If you can express
the thing you want redacted as a regex, you should be good to go. (I'd like to
make it configurable via a config file, but not enough people have asked so far.)

