
There's a whole subfield of information science dedicated to basically this exact problem: entity resolution.

Hilariously, it has dozens of names, because it just comes up in so many places for so many people. It appears that "record linkage" is the term that has won the top spot at Wikipedia: https://en.wikipedia.org/wiki/Record_linkage

Record linkage seems to be unrelated. While OP isn't sure how to segregate and join data, he has perfect joining capability through unique indices.

Record linkage seems to be concerned with joins that aren't guaranteed to be correct because there are no unique keys.
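To make the distinction concrete, here is a minimal sketch contrasting the two cases. All of the records, field names, and the 0.75 threshold are made up for illustration, and the string similarity from Python's standard `difflib` is only a stand-in for the more careful matching a real record linkage system would use.

```python
from difflib import SequenceMatcher

# Hypothetical data, invented for this example.
customers = [
    {"id": 1, "name": "Jonathan Smith"},
    {"id": 2, "name": "Maria Garcia"},
]
orders = [
    {"customer_id": 1, "item": "keyboard"},
    {"customer_id": 2, "item": "monitor"},
]
leads = [  # no shared key with customers, only a free-text name
    {"name": "Jon Smith"},
    {"name": "Garcia, Maria"},
    {"name": "Wei Chen"},
]


def exact_join(customers, orders):
    """Join on a unique key: every match is guaranteed correct."""
    by_id = {c["id"]: c for c in customers}
    return [
        (by_id[o["customer_id"]]["name"], o["item"])
        for o in orders
        if o["customer_id"] in by_id
    ]


def normalize(name):
    """Lowercase, drop commas, and sort tokens so word order is ignored."""
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_link(leads, customers, threshold=0.75):
    """Link each lead to its most similar customer, or None if below threshold.

    The threshold is an arbitrary assumption; a match here is a guess,
    not a guarantee.
    """
    links = []
    for lead in leads:
        best = max(customers, key=lambda c: similarity(lead["name"], c["name"]))
        score = similarity(lead["name"], best["name"])
        linked = best["name"] if score >= threshold else None
        links.append((lead["name"], linked))
    return links


joined = exact_join(customers, orders)
links = fuzzy_link(leads, customers)
```

With a unique key, `exact_join` is a plain dictionary lookup. Without one, `fuzzy_link` has to pick a similarity measure and a cutoff, and both choices can produce false matches or missed matches, which is the part record linkage is concerned with.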
