Hacker News new | past | comments | ask | show | jobs | submit login

Pretty cool! I did a similar project for flatfiles, but used bloom filters to generate an “index” of row contents to test against later. I feel like a similar idea could work for identifying divergent rows within your segments more quickly/with less repeated work.

Making that work across databases could be a huge pain though, I had some success in Postgre but bitfields in the other DBs were painful.




> Making that work across databases could be a huge pain though

That was indeed the main challenge. Each DB has a different syntax, different set of features, different format for timestamps and floats, different max precision, and so on. I'd say most of our work on data-diff went to making sure the behavior of the different DBs aligned with each other.




Applications are open for YC Winter 2023

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: