Hacker News

Users having multiple addresses is something I've cursed at a lot. I work in a team that does data analytics for a news publishing company, and our print business is still very important. Unfortunately, in our database of print customers, users are basically addresses, because as a distributor you don't really need to know how many people receive your paper, only where the papers go and how many. Since it has also been a safe assumption for a century that people share newspapers with each other, market research was done street by street to inform ad buyers of which markets we reached. Many people have more than one home. Some people take out another subscription for a relative.

This mapped very awkwardly to digital subscribers, on whom we had individual data. We were able to join the databases in a way that sort of works, through more or less (mostly less) comfortable assumptions. The queries are not pretty.
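To make the mismatch concrete, here is a rough sketch of that kind of join. It is not the actual queries; the row shapes, the names `normalize_address` and `join_by_address`, and the sample data are all made up for illustration. It assumes print rows are keyed by address and digital rows by person, and that a lightly normalized address string is the only thing the two sides share.

```python
from collections import defaultdict


def normalize_address(address: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(address.lower().split())


def join_by_address(print_rows, digital_rows):
    """Attach every digital subscriber found at an address to the print delivery there.

    print_rows:   iterable of (address, copies)  -- hypothetical shape
    digital_rows: iterable of (person, address)  -- hypothetical shape
    """
    people_at = defaultdict(list)
    for person, address in digital_rows:
        people_at[normalize_address(address)].append(person)

    joined = []
    for address, copies in print_rows:
        key = normalize_address(address)
        # An empty list means we deliver paper there but know no individual.
        joined.append((address, copies, sorted(people_at.get(key, []))))
    return joined


print_rows = [("12 Elm Street", 1), ("4 Harbour Road", 2)]
digital_rows = [
    ("alice", "12 elm street"),
    ("bob", "12  Elm Street"),
    ("carol", "9 Mill Lane"),
]
result = join_by_address(print_rows, digital_rows)
```

One print row can fan out to several people, and some people match no print row at all, which is where the uncomfortable assumptions come in.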






There's a whole subfield of information science dedicated to basically this exact problem: entity resolution.

Hilariously, it has dozens of names, because it just comes up in so many places for so many people. It appears that "record linkage" is the term that has won the top spot at Wikipedia: https://en.wikipedia.org/wiki/Record_linkage


Record linkage seems to be unrelated. While OP isn't sure how to segregate and join data, he has perfect joining capability through unique indices.

Record linkage seems to be concerned with joins that aren't guaranteed to be correct because there are no unique keys.
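For illustration, a minimal sketch of that kind of keyless matching, using string similarity from Python's standard `difflib`. The function names, the sample records, and the 0.85 threshold are assumptions picked for the example; real record linkage systems typically use more careful comparison and blocking than this.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its most similar right record.

    A pair is kept only if it scores at or above the threshold
    (an arbitrary value chosen for this sketch).
    """
    links = []
    for l in left:
        best = max(right, key=lambda r: similarity(l, r), default=None)
        if best is not None and similarity(l, best) >= threshold:
            links.append((l, best))
    return links


left = ["Jon Smith, 12 Elm St", "Pat Lee, 9 Mill Lane"]
right = ["John Smith, 12 Elm St.", "Mary Jones, 4 Harbour Rd"]
links = link_records(left, right)
```

With a unique key this would be an exact join; here every link is only a guess whose quality depends on the threshold.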


Multiple email support does indeed seem complex; even the most popular CRM on the market doesn't support this feature, even though it's requested a lot. https://success.salesforce.com/ideaview?id=08730000000BrPIAA...


